flex lexical analyser
Developer(s) | Vern Paxson
Stable release | 2.5.35 / February 26, 2008
Operating system | Unix-like
Type | Lexical analyzer generator
License | BSD license
Website | flex.sf.net
flex (fast lexical analyzer generator) is a free software alternative to lex.[1] It is frequently used together with the free Bison parser generator. Unlike Bison, flex is not part of the GNU project.[2] Flex was written in C around 1987 by Vern Paxson, who translated an earlier Ratfor generator whose development had been led by Jef Poskanzer.[3]
A similar lexical scanner for C++ is flex++, which is included as part of the flex package.
Example lexical analyzer
This is an example of a scanner (written in C) for the instructional programming language PL/0.
The symbols recognized are: '+', '-', '*', '/', '=', '(', ')', ',', ';', '.', ':=', '<', '<=', '<>', '>', '>=';
numbers: 0-9 {0-9};
identifiers: a-zA-Z {a-zA-Z0-9};
and keywords: begin, call, const, do, end, if, odd, procedure, then, var, while.
External variables used:
FILE *source /* The source file */
int cur_line, cur_col, err_line, err_col /* For error reporting */
int num /* Last number read stored here, for the parser */
char id[MAX_ID + 1] /* Last identifier read stored here, for the parser */
int id_len /* Length of the last identifier read */
Hashtab *keywords /* List of keywords */
External routines called:
error(const char msg[]) /* Report an error */
Hashtab *create_htab(int estimate) /* Create a lookup table */
int enter_htab(Hashtab *ht, char name[], void *data) /* Add an entry to a lookup table */
Entry *find_htab(Hashtab *ht, char *s) /* Find an entry in a lookup table */
void *get_htab_data(Entry *entry) /* Returns data from a lookup table */
FILE *fopen(char fn[], char mode[]) /* Opens a file for reading */
fgetc(FILE *stream) /* Read the next character from a stream */
ungetc(int ch, FILE *stream) /* Put-back a character onto a stream */
isdigit(int ch), isalpha(int ch), isalnum(int ch) /* Character classification */
External types:
Symbol /* An enumerated type of all the symbols in the PL/0 language */
Hashtab /* Represents a lookup table */
Entry /* Represents an entry in the lookup table */
Scanning is started by calling init_scan, passing the name of the source file. If the source file is opened successfully, the parser calls getsym repeatedly to obtain successive symbols from it.
The heart of the scanner, getsym, is straightforward. First, whitespace is skipped. The next character is then classified. If it can begin a multiple-character symbol, further characters are read to resolve it. Numbers are converted to internal form, and identifiers are looked up to see whether they are keywords.
int read_ch(void) {
    int ch = fgetc(source);
    cur_col++;
    if (ch == '\n') {
        cur_line++;
        cur_col = 0;
    }
    return ch;
}

void put_back(int ch) {
    if (ch == EOF) return;  /* nothing to put back at end of file */
    ungetc(ch, source);
    cur_col--;
    if (ch == '\n') cur_line--;
}

Symbol getsym(void) {
    int ch;
    while ((ch = read_ch()) != EOF && ch <= ' ')
        ;                   /* skip whitespace */
    err_line = cur_line;
    err_col = cur_col;
    switch (ch) {
    case EOF: return eof;
    case '+': return plus;
    case '-': return minus;
    case '*': return times;
    case '/': return slash;
    case '=': return eql;
    case '(': return lparen;
    case ')': return rparen;
    case ',': return comma;
    case ';': return semicolon;
    case '.': return period;
    case ':':
        ch = read_ch();
        if (ch == '=') return becomes;
        put_back(ch);       /* ':' alone is not a valid symbol */
        return nul;
    case '<':
        ch = read_ch();
        if (ch == '>') return neq;
        if (ch == '=') return leq;
        put_back(ch);
        return lss;
    case '>':
        ch = read_ch();
        if (ch == '=') return geq;
        put_back(ch);
        return gtr;
    default:
        if (isdigit(ch)) {
            num = 0;
            do {            /* no checking for overflow! */
                num = 10 * num + ch - '0';
                ch = read_ch();
            } while (ch != EOF && isdigit(ch));
            put_back(ch);
            return number;
        }
        if (isalpha(ch)) {
            Entry *entry;
            id_len = 0;
            do {
                if (id_len < MAX_ID) {
                    id[id_len] = (char)ch;
                    id_len++;
                }
                ch = read_ch();
            } while (ch != EOF && isalnum(ch));
            id[id_len] = '\0';
            put_back(ch);
            entry = find_htab(keywords, id);
            return entry ? (Symbol)get_htab_data(entry) : ident;
        }
        error("getsym: invalid character");
        return nul;
    }
}
int init_scan(const char fn[]) {
    if ((source = fopen(fn, "r")) == NULL) return 0;
    cur_line = 1;
    cur_col = 0;
    keywords = create_htab(11);
    enter_htab(keywords, "begin", (void *)beginsym);
    enter_htab(keywords, "call", (void *)callsym);
    enter_htab(keywords, "const", (void *)constsym);
    enter_htab(keywords, "do", (void *)dosym);
    enter_htab(keywords, "end", (void *)endsym);
    enter_htab(keywords, "if", (void *)ifsym);
    enter_htab(keywords, "odd", (void *)oddsym);
    enter_htab(keywords, "procedure", (void *)procsym);
    enter_htab(keywords, "then", (void *)thensym);
    enter_htab(keywords, "var", (void *)varsym);
    enter_htab(keywords, "while", (void *)whilesym);
    return 1;
}
Now, contrast the above code with the code needed for a flex-generated scanner for the same language:
%{
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */
#include <string.h>  /* strdup */
#include "y.tab.h"   /* token definitions generated by yacc/bison */
%}
digit [0-9]
letter [a-zA-Z]
%%
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return TIMES; }
"/" { return SLASH; }
"(" { return LPAREN; }
")" { return RPAREN; }
";" { return SEMICOLON; }
"," { return COMMA; }
"." { return PERIOD; }
":=" { return BECOMES; }
"=" { return EQL; }
"<>" { return NEQ; }
"<" { return LSS; }
">" { return GTR; }
"<=" { return LEQ; }
">=" { return GEQ; }
"begin" { return BEGINSYM; }
"call" { return CALLSYM; }
"const" { return CONSTSYM; }
"do" { return DOSYM; }
"end" { return ENDSYM; }
"if" { return IFSYM; }
"odd" { return ODDSYM; }
"procedure" { return PROCSYM; }
"then" { return THENSYM; }
"var" { return VARSYM; }
"while" { return WHILESYM; }
{letter}({letter}|{digit})* {
yylval.id = (char *)strdup(yytext);
return IDENT; }
{digit}+ { yylval.num = atoi(yytext);
return NUMBER; }
[ \t\n\r] { /* skip whitespace */ }
. { printf("Unknown character [%c]\n",yytext[0]);
return UNKNOWN; }
%%
int yywrap(void){return 1;}
The flex specification runs to about 50 lines, versus roughly 100 lines of hand-written code.
Issues
Time complexity
A flex lexical analyzer typically has time complexity <math>O(n)</math> in the length of the input: it performs a constant number of operations for each input character. This constant is quite small: GCC generates about 12 instructions for the DFA match loop. The constant is independent of the length of the token, the length of the regular expression, and the size of the DFA.
However, one optional feature, the REJECT macro, can cause flex to generate a scanner with non-linear performance when the scanner can match extremely long tokens. With REJECT, the programmer explicitly tells flex to "go back and try again" after it has already matched some input, which forces the DFA to backtrack to other accepting states. In theory the time complexity is <math>O(n+m^2) \ge O(m^2)</math>, where <math>m</math> is the length of the longest token (this reverts to <math>O(n)</math> if tokens are small relative to the input size).[4] REJECT is not enabled by default, and its performance implications are documented in the flex manual.
External links
- Flex Homepage
- Flex Manual
- ANSI-C Lex Specification
- JFlex: Fast Scanner Generator for Java
- Brief description of Lex, Flex, YACC, and Bison
References
- ↑ Levine, John (August 2009). flex & bison. O'Reilly Media. p. 304. ISBN 978-0-596-15597-1.
- ↑ "Is flex GNU or not?", flex FAQ.
- ↑ "When was flex born?", flex FAQ.
- ↑ flex manual, "Performance Considerations", http://flex.sourceforge.net/manual/Performance.html (last paragraph).