Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").
Lex is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt and described in 1975, is the standard lexical analyzer generator on many Unix systems, and an equivalent tool is specified as part of the POSIX standard.
Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language.
Open source
Though originally distributed as proprietary software, some versions of Lex are now open source. Versions based on the original AT&T code are distributed as part of open source systems such as OpenSolaris and Plan 9 from Bell Labs. One popular open source version of Lex, called flex (the "fast lexical analyzer"), is not derived from proprietary code.
Structure of a Lex file
The structure of a Lex file is intentionally similar to that of a yacc file; files are divided into three sections, separated by lines that contain only two percent signs, as follows:
    Definition section
    %%
    Rules section
    %%
    C code section
Example of a Lex file
The following is an example Lex file for the flex version of Lex. It recognizes strings of digits (non-negative integers) in the input and prints each one.
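A minimal flex specification matching this description could look like the sketch below. It follows the three-section layout described earlier; the comments and the flex-specific `%option noyywrap` line are illustrative choices, and the message text is chosen to match the output shown later in this section.

```lex
/*** Definition section ***/
%{
/* C code copied verbatim into the generated scanner */
#include <stdio.h>
%}

/* flex-specific: assume a single input, so no yywrap() is needed */
%option noyywrap

%%
    /*** Rules section ***/

    /* [0-9]+ matches a run of one or more digits */
[0-9]+  {
            /* yytext holds the text matched by the pattern */
            printf("Saw an integer: %s\n", yytext);
        }

.|\n    {   /* ignore every other character */   }

%%
/*** C code section ***/

int main(void)
{
    /* run the scanner until end of input, then quit */
    yylex();
    return 0;
}
```

The two comments in the rules section are indented on purpose: in flex, an unindented line in that section is read as a pattern rather than as a comment.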
If this file is given to flex, it will be converted into a C file, lex.yy.c. This can be compiled into an executable which matches and outputs strings of integers. For example, given the input:
abc123z.!&*2gj6
the program will print:
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
Using Lex with parser generators
Lex and parser generators, such as Yacc or Bison, are commonly used together. Parser generators use a formal grammar to parse an input stream, something which Lex cannot do using simple regular expressions (Lex is limited to simple finite state automata). For example, a regular expression cannot match arbitrarily deep nested parentheses, whereas a grammar can.
It is typically preferable to have a (Yacc-generated, say) parser be fed a token-stream as input, rather than having it consume the input character-stream directly. Lex is often used to produce such a token-stream.
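As a rough sketch of this division of labour, a Lex specification written for use with a parser returns a token code from each action instead of printing. The example assumes a Yacc grammar that declares a hypothetical token `NUMBER` with `%token`, uses the default integer type for `yylval`, and was processed with `yacc -d` so that the header `y.tab.h` exists; none of these names come from the text above.

```lex
%{
/* y.tab.h is the header written by yacc -d (or bison -y -d).
   NUMBER is a hypothetical token assumed to be declared in the grammar;
   the header also declares yylval, assumed here to have type int. */
#include <stdlib.h>
#include "y.tab.h"
%}

/* flex-specific: no yywrap() function is required */
%option noyywrap

%%
[0-9]+    { /* hand the parser a NUMBER token and its value */
            yylval = atoi(yytext);
            return NUMBER;
          }
[ \t]+    { /* skip blanks between tokens */ }
.|\n      { /* pass any other character through as its own token */
            return yytext[0];
          }
%%
```

The generated parser calls `yylex()` each time it needs the next token, so this file defines no `main` of its own.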
Scannerless parsing refers to parsing the input character-stream directly, without a distinct lexer.
Lex and make
make is a utility that can be used to maintain programs involving Lex. make assumes that a file with the extension .l is a Lex source file. The make internal macro LFLAGS can be used to specify Lex options that make passes to Lex automatically.
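A small Makefile along these lines illustrates the idea. It is only a sketch: `scanner.l` is a hypothetical source file assumed to define its own `main` and to need no Lex library, and setting `LEX` to `flex` assumes flex is installed.

```make
# scanner.l is a hypothetical Lex source file in the current directory.

# Which Lex implementation make should run (assumption: flex is installed).
LEX = flex

# Options that make's built-in .l rules pass to $(LEX); -v prints statistics.
LFLAGS = -v

# scanner.o is produced from scanner.l by make's built-in rules,
# so only the final link step is written out here.
scanner: scanner.o
	$(CC) $(LDFLAGS) -o scanner scanner.o
```

Running `make scanner` would then invoke Lex with the options in `LFLAGS` before compiling and linking the result.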