
2. Implement a Lexical Analyzer for a given program using Lex Tool.

Let's assume we have a simple programming language with the keywords `if`, `else`, and `while`, and identifiers (variable names) that start with a letter followed by any number of letters and digits. We want to tokenize a program written in this language.


Here's the Lex specification file (`lexer.l`) along with explanations:


```lex
%{
#include <stdio.h>
%}

%option noyywrap

%%
if      { printf("IF\n"); }
else    { printf("ELSE\n"); }
while   { printf("WHILE\n"); }
[a-zA-Z][a-zA-Z0-9]*  { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]               ; /* skip whitespace */
.                     { printf("UNKNOWN CHARACTER: %s\n", yytext); }
%%

int main(void) {
    yylex();   /* scan standard input until end of file */
    return 0;
}
```


Explanation of each section:


- `%{ ... %}`: Text between these delimiters is copied verbatim into the generated C file, so this is the place for header includes and global declarations. Here we include `stdio.h` for `printf`.


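- `%option noyywrap`: When the scanner reaches the end of its input, it normally calls `yywrap()` to ask whether more input follows (for example, another file). This option tells the scanner not to call `yywrap()` and to simply assume there is no more input, so we don't have to define `yywrap` ourselves or link against the Lex library.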


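- `%%`: The first `%%` separates the definitions section (the `%{ ... %}` block and `%option` lines) from the rules section; the second `%%` separates the rules from the user code section, which is copied verbatim into the generated C file.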


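- `if`, `else`, `while`: These are the keywords of our language. Each rule consists of a pattern followed by an action (C code) that runs when the pattern matches; here the action prints the token name. Because `if` also matches the identifier pattern, rule order matters: when two rules match input of the same length, Lex picks the rule listed first, so the keyword rules must come before the identifier rule. Lex also prefers the longest match, so `iffy` is recognized as a single identifier rather than `IF` followed by `fy`.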


- `[a-zA-Z][a-zA-Z0-9]*`: This regular expression matches identifiers. An identifier starts with a letter (uppercase or lowercase) and can be followed by any number of letters or digits. When an identifier is matched, we print it using `yytext`, which holds the text of the current match.


- `[ \t\n]`: This regular expression matches a single whitespace character (space, tab, or newline). Its action is the empty statement `;`, so the character is discarded.


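- `.`: This regular expression matches any single character except newline. Since every other rule matches at least as much text and is listed earlier, `.` applies only when no other rule matches, for example for `;` or `+`, which our language does not define. We print the unknown character using `yytext`.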


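- The final section (user code) defines `main()`, which calls `yylex()`. `yylex()` reads from standard input (`yyin`) and applies the rules repeatedly until it reaches the end of the input.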


To compile and run the program:


1. Save the Lex specification in a file named `lexer.l`.

2. Open a terminal and navigate to the directory containing `lexer.l`.

3. Run the following commands:

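   - `lex lexer.l` (generates the C source file `lex.yy.c` from the specification)

   - `gcc lex.yy.c -o lexer` (compiles the generated scanner; because of `%option noyywrap` and our own `main()`, no Lex library is needed. Without `noyywrap`, link with `-ll`, or `-lfl` when using flex)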

   - `./lexer` (runs the program)


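Now you can provide input text (your program) to the lexer, and it will print one token per line. Type the program at the terminal and press Ctrl+D (end of input on Unix-like systems) when done, or redirect a file with `./lexer < program.txt`. For example, the input `while count1 if x` produces `WHILE`, `IDENTIFIER: count1`, `IF`, and `IDENTIFIER: x`.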

