Let's assume we have a simple programming language with keywords `if`, `else`, `while`, and identifiers (variable names) consisting of letters and digits. We want to tokenize a given program written in this language.
Here's the Lex specification file (`lexer.l`) along with explanations:
```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
if { printf("IF\n"); }
else { printf("ELSE\n"); }
while { printf("WHILE\n"); }
[a-zA-Z][a-zA-Z0-9]* { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n] { /* skip whitespace */ }
. { printf("UNKNOWN CHARACTER: %s\n", yytext); }
%%
int main(void) {
    yylex();  /* scan standard input until end of input */
    return 0;
}
```
Explanation of each section:
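- `%{ ... %}`: Everything between these markers is copied verbatim into the generated C file. It is used for including header files and declaring global variables or definitions. In this case, we include `stdio.h` for `printf`.
- `%option noyywrap`: This option tells the scanner not to call `yywrap()` when it reaches the end of input and to simply stop scanning there. Without it, we would have to define `yywrap()` ourselves or link against the Lex library. `%option` directives are a flex extension, so this file assumes your `lex` command is provided by flex.
- `%%`: This delimiter appears twice. The first `%%` separates the definitions section from the rules, and the second separates the rules from the user code.
- `if`, `else`, `while`: These are keywords in our language. For each keyword, we specify a pattern (here a literal string) followed by an action to be taken when a match is found. In this case, we print the corresponding token name. Lex always prefers the longest match, and when two rules match the same text it picks the one listed first. That is why `if` is reported as `IF` while `ifx` is reported as an identifier, and why the keyword rules must come before the identifier rule.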
- `[a-zA-Z][a-zA-Z0-9]*`: This regular expression matches identifiers. It starts with a letter (uppercase or lowercase) and can be followed by letters or digits. When an identifier is matched, we print its value using `yytext`.
- `[ \t\n]`: This regular expression matches whitespace characters (spaces, tabs, newlines). We skip these characters.
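- `.`: This regular expression matches any single character except a newline. Because it is the last rule and matches only one character, it is chosen only when none of the previous patterns match. When such an unknown character is encountered, we print it using `yytext`.
- The final section (`main()`) is ordinary C code copied into the generated scanner. It calls `yylex()`, which reads from standard input and runs the matching rules until the end of input.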
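To make the matching rules above more concrete, here is a rough hand-written C sketch of what the generated scanner does for this particular specification. It is not the code Lex produces; the names `TokenKind` and `next_token` are made up for illustration, and it assumes the default "C" locale so that `isalpha` and `isalnum` cover exactly `[a-zA-Z]` and `[a-zA-Z0-9]`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds mirroring the rules in lexer.l (illustrative names). */
typedef enum {
    TOK_IF, TOK_ELSE, TOK_WHILE, TOK_IDENTIFIER, TOK_UNKNOWN, TOK_EOF
} TokenKind;

/*
 * Scan one token starting at src[*pos], copy its text into lexeme
 * (truncated to cap - 1 characters; cap must be at least 1), and
 * advance *pos past the token.
 */
TokenKind next_token(const char *src, size_t *pos, char *lexeme, size_t cap)
{
    /* Rule [ \t\n]: skip whitespace without producing a token. */
    while (src[*pos] == ' ' || src[*pos] == '\t' || src[*pos] == '\n')
        (*pos)++;

    size_t start = *pos;
    if (src[start] == '\0') {
        lexeme[0] = '\0';
        return TOK_EOF;
    }

    int starts_with_letter = isalpha((unsigned char)src[start]);
    if (starts_with_letter) {
        /* Longest match: take the whole run of letters and digits. */
        while (isalnum((unsigned char)src[*pos]))
            (*pos)++;
    } else {
        /* Rule `.`: any other single character. */
        (*pos)++;
    }

    /* Copy the matched text, as yytext would hold it. */
    size_t len = *pos - start;
    size_t n = len < cap - 1 ? len : cap - 1;
    memcpy(lexeme, src + start, n);
    lexeme[n] = '\0';

    if (!starts_with_letter)
        return TOK_UNKNOWN;

    /* Tie-break: a keyword wins only when it is the entire match. */
    if (len == 2 && memcmp(src + start, "if", 2) == 0)
        return TOK_IF;
    if (len == 4 && memcmp(src + start, "else", 4) == 0)
        return TOK_ELSE;
    if (len == 5 && memcmp(src + start, "while", 5) == 0)
        return TOK_WHILE;
    return TOK_IDENTIFIER;
}
```

Calling `next_token` repeatedly with the same `pos` variable walks through the input one token at a time until it returns `TOK_EOF`. The function consumes the longest run of letters and digits first and only then checks whether that run is a keyword, which mirrors the longest-match and first-rule-wins behaviour of the Lex rules.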
To compile and run the program:
1. Save the Lex specification in a file named `lexer.l`.
2. Open a terminal and navigate to the directory containing `lexer.l`.
3. Run the following commands:
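   - `lex lexer.l` (generates the C source for the scanner, `lex.yy.c`; on many systems `lex` is provided by flex)
   - `gcc lex.yy.c -o lexer` (compiles the generated code; because the file defines its own `main()` and sets `noyywrap`, linking with `-ll` is not needed)
   - `./lexer` (runs the program)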
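The compiled lexer reads your program from standard input and prints one line per token. You can type the input directly and finish with Ctrl-D, or redirect a file into it, for example `./lexer < program.txt` (where `program.txt` is whatever file holds your source).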