<HTML>
<HEAD>
<TITLE>Using LEX with ACCENT</TITLE>
</HEAD>
<BODY bgcolor="white">
<TABLE cellspacing=20>
<TR>
<TD valign="top">
<img src="logo.gif">
</TD>
<TD valign="bottom" align="left">
<a href="index.html">The Accent Compiler Compiler</a>
<h1>Using LEX with ACCENT</h1>
</TD>
</TR>
<TR>
<TD align="right" valign="top">
<!-- MENU -->
<font face="helvetica">
<a href="index.html">Accent</a><br>
<a href="overview.html">Overview</a><br>
<a href="tutorial.html">Tutorial</a><br>
<a href="language.html">Language</a><br>
<a href="installation.html">Installation</a><br>
<a href="usage.html">Usage</a><br>
Lex<br>
<a href="algorithms.html">Algorithms</a><br>
<a href="distribution.html">Distribution</a><br>
</font>
</TD>
<TD valign="top">
<!--- begin main content -->
<h3>The Scanner Function</h3>
The representation of terminal symbols (tokens) is not defined
by the <i>Accent</i> specification. An <i>Accent</i> parser
cooperates with a lexical scanner that converts the source text into
a sequence of tokens. This scanner is implemented by a function
<tt>yylex()</tt> that reads the next token and returns a value
representing the kind of the token.
<h3>The Kind of a Token</h3>
The kind of a token is indicated by a number.
<p>
A terminal symbol denoted by a literal in the <i>Accent</i> specification,
e.g. <tt>'+'</tt>, is represented by the numerical value of the character.
So <tt>yylex()</tt> returns this value when it has recognized the literal:
<pre>
return '+';
</pre>
A terminal symbol denoted by a symbolic name declared
in the token declaration part of the <i>Accent</i> specification,
e.g. <tt>NUMBER</tt>, is represented by a constant with a symbolic name
that is the same as the token name. So <tt>yylex()</tt> returns
this constant:
<pre>
return NUMBER;
</pre>
The definition of the constants is generated by <i>Accent</i>
and is contained in the generated file <tt>yygrammar.h</tt>.
Hence the file defining <tt>yylex()</tt> should include this file:
<pre>
#include "yygrammar.h"
</pre>
<h3>The Attribute of a Token</h3>
Besides having a kind (e.g. <tt>NUMBER</tt>),
a token can also carry a semantic attribute.
The function <tt>yylex()</tt>
assigns this attribute value to the variable <tt>yylval</tt>.
For example,
<pre>
yylval = atoi(yytext);
</pre>
(here <tt>yytext</tt> is the text of the token that has been recognized
as a <tt>NUMBER</tt>; the function <tt>atoi()</tt> converts this
string into a numerical value).
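<p>
To make the contract concrete, here is a minimal hand-written sketch of such a scanner function. It is not generated by <i>Lex</i> and is not part of <i>Accent</i>: the value <tt>256</tt> for <tt>NUMBER</tt>, the local definitions of <tt>YYSTYPE</tt> and <tt>yylval</tt> (stand-ins for what <tt>yygrammar.h</tt> and <tt>yygrammar.c</tt> would provide), the helper <tt>scan_string()</tt>, and the return value <tt>0</tt> at end of input are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for the generated files (assumed values, for illustration only). */
#define NUMBER 256          /* token code that yygrammar.h would define */
typedef long YYSTYPE;       /* default attribute type */
YYSTYPE yylval;             /* attribute of the most recent token */

/* Hypothetical input source: a NUL-terminated string. */
static const char *scan_ptr = "";

void scan_string(const char *text)
{
    scan_ptr = text;
}

/* Return the kind of the next token; set yylval for NUMBER tokens. */
int yylex(void)
{
    while (*scan_ptr == ' ' || *scan_ptr == '\t')
        scan_ptr++;                         /* skip blanks */

    if (*scan_ptr == '\0')
        return 0;                           /* assumed end-of-input code */

    if (isdigit((unsigned char)*scan_ptr)) {
        char *end;
        yylval = strtol(scan_ptr, &end, 10);    /* attribute: numeric value */
        scan_ptr = end;
        return NUMBER;                          /* kind: symbolic token */
    }

    return (unsigned char)*scan_ptr++;      /* kind: the literal character */
}
```

After <tt>scan_string("12 + 7")</tt>, successive calls of <tt>yylex()</tt> in this sketch yield <tt>NUMBER</tt> (with <tt>yylval</tt> set to 12), <tt>'+'</tt>, and <tt>NUMBER</tt> (with <tt>yylval</tt> set to 7).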
<p>
The variable <tt>yylval</tt> is defined
in the generated file <tt>yygrammar.c</tt>.
An <tt>extern</tt> declaration for this variable
is provided in the generated file <tt>yygrammar.h</tt>.
<p>
<tt>yylval</tt> has type <tt>YYSTYPE</tt>.
<i>Accent</i> defines this
in the file <tt>yygrammar.h</tt> as a macro standing for <tt>long</tt>:
<pre>
#ifndef YYSTYPE
#define YYSTYPE long
#endif
</pre>
The user can define his or her own type before including the file
<tt>yygrammar.h</tt>.
For example, a file <tt>yystype.h</tt> may define
<pre>
typedef union {
   int intval;
   float floatval;
} ATTRIBUTE;
#define YYSTYPE ATTRIBUTE
</pre>
Now the file defining <tt>yylex()</tt> includes two
header files:
<pre>
#include "yystype.h"
#include "yygrammar.h"
</pre>
and sets the semantic attribute by:
<pre>
yylval.intval = atoi(yytext);
</pre>
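<p>
The following sketch illustrates why the order of the two includes matters. It places the union from <tt>yystype.h</tt> and the guard from <tt>yygrammar.h</tt> in a single file, so that the guard can be seen to leave the user definition untouched. The definition of <tt>yylval</tt> here is a stand-in for the one in <tt>yygrammar.c</tt>, and the helpers <tt>set_int_attribute()</tt> and <tt>set_float_attribute()</tt> are hypothetical names introduced only for this example.

```c
#include <stdlib.h>

/* Contents a user-written yystype.h might have (as in the text above). */
typedef union {
    int intval;
    float floatval;
} ATTRIBUTE;
#define YYSTYPE ATTRIBUTE

/* The guard from yygrammar.h: skipped here, because YYSTYPE is already defined. */
#ifndef YYSTYPE
#define YYSTYPE long
#endif

YYSTYPE yylval;     /* stand-in for the definition in yygrammar.c */

/* Hypothetical helpers showing how a scanner action would fill each member. */
void set_int_attribute(const char *text)
{
    yylval.intval = atoi(text);
}

void set_float_attribute(const char *text)
{
    yylval.floatval = (float)atof(text);
}
```

Because the members share storage, an action should read back only the member that the scanner last assigned for that kind of token.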
<h3>The <i>Lex</i> Specification</h3>
The function <tt>yylex()</tt> can be generated by the scanner generator
<i>Lex</i> (or the free implementation <i>Flex</i>).
<p>
The <a href="http://dinosaur.compilertools.net"><i>Lex &amp; Yacc Page</i></a>
has online documentation for <i>Lex</i> and <i>Flex</i>.
<p>
A <i>Lex</i> specification gives rules that define for each token how it
is represented and how it is processed.
A rule has the form
<pre>
pattern { action }
</pre>
<tt>pattern</tt> is a regular expression
that specifies the representation of the token.
<p>
<tt>action</tt> is <i>C</i>
code that specifies how the token is processed.
This code sets the attribute value and returns the kind of the token.
<p>
For example, here is a rule for the token <tt>NUMBER</tt>:
<pre>
[0-9]+ { yylval.intval = atoi(yytext); return NUMBER; }
</pre>
The <i>Lex</i> specification starts with a definition section
that can be used to include header files and to declare variables.
For example,
<pre>
%{
#include "yystype.h"
#include "yygrammar.h"
%}
%%
</pre>
Here the section includes <tt>yystype.h</tt>, which provides a user-specific
definition of <tt>YYSTYPE</tt>, and <tt>yygrammar.h</tt>,
which defines the token codes.
The <tt>%%</tt> separates this
section from the rules part.
<h3>The <i>Accent</i> Specification</h3>
In the <i>Accent</i> specification, tokens are introduced in the token
declaration part.
<p>
For example,
<pre>
%token NUMBER;
</pre>
introduces a token with name <tt>NUMBER</tt>.
<p>
Inside a rule the token can be used with a parameter,
for example
<pre>
NUMBER&lt;x&gt;
</pre>
This parameter can then be used in actions to access the attribute of the token.
It is of type <tt>YYSTYPE</tt>.
<pre>
Value : NUMBER&lt;x&gt; { printf("%d", x.intval); } ;
</pre>
or simply
<pre>
Value : NUMBER&lt;x&gt; { printf("%ld", x); } ;
</pre>
if there is no user-specific definition of <tt>YYSTYPE</tt>
(the default type is <tt>long</tt>, hence the <tt>%ld</tt> format).
<p>
Unlike the <i>Lex</i> specification, the <i>Accent</i> specification
does not include <tt>yygrammar.h</tt>.
If the user defines a custom type <tt>YYSTYPE</tt>,
this has to be done in the global prelude part, e.g.
<pre>
%prelude {
#include "yystype.h"
}
</pre>
<h3>Tracking the Source Position</h3>
Like <tt>yylval</tt>, which holds the attribute of a token,
there is a further variable, <tt>yypos</tt>, that holds the
source position of the token.
<p>
<tt>yypos</tt> is declared in the <i>Accent</i> runtime
as an <tt>extern</tt> variable of type <tt>long</tt>.
Its initial value is <tt>1</tt>.
<p>
This variable can be set in rules of the <i>Lex</i> specification.
For example,
<pre>
\n { yypos++; /* adjust line number and skip newline */ }
</pre>
When a newline character is seen, <tt>yypos</tt> is incremented,
so it holds the current line number.
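<p>
The effect of that rule can be sketched in plain <i>C</i>. The definition of <tt>yypos</tt> below is a stand-in for the variable in the <i>Accent</i> runtime, and <tt>count_lines()</tt> is a hypothetical helper that does for a whole string what the <i>Lex</i> rule does for one newline at a time.

```c
/* Stand-in for the runtime variable; the text above gives 1 as its initial value. */
long yypos = 1;

/* Hypothetical helper: advance yypos once per newline, like the \n rule. */
void count_lines(const char *text)
{
    for (; *text != '\0'; text++) {
        if (*text == '\n')
            yypos++;
    }
}
```

Starting from the initial value, scanning text that contains two newlines leaves <tt>yypos</tt> at 3, the number of the line on which scanning stopped.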
<p>
The variable <tt>yypos</tt> is managed in such a way that
it holds the correct value when <tt>yyerror</tt> is invoked to
report a syntax error
(even though, due to lookahead, the next token has already been read).
<p>
It also has the correct value when semantic actions are executed
(note that this is done after lexical analysis and parsing).
Hence it can be used inside semantic actions,
for example
<pre>
value:
   NUMBER&lt;n&gt; { printf("value in line %ld is %ld\n", yypos, n); }
;
</pre>
<!--- end main content -->
<br>
<br>
<font face="helvetica" size="1">
<a href="http://accent.compilertools.net">accent.compilertools.net</a>
</font>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>