Quelle tutorial.html

Sprache: HTML

products/Sources/formale Sprachen/C/Accent/doc/tutorial.html

<HTML>

<HEAD>
<TITLE>A Short Tutorial on the ACCENT Compiler Compiler</TITLE>
</HEAD>

<BODY bgcolor="white">

<TABLE cellspacing=20>

   <TR>

      <TD valign="top">
         <img src="logo.gif">
      </TD>

      <TD valign="bottom" align="left">
         <a href="index.html">The Accent Compiler Compiler</a>
         <h1>A Short Tutorial</h1>
      </TD>

   </TR>

   <TR>

      <TD align="right" valign="top">
         
         <font face="helvetica">
         <a href="index.html">Accent</a><br>
         <a href="overview.html">Overview</a><br>
         Tutorial<br>
         <a href="language.html">Language</a><br>
         <a href="installation.html">Installation</a><br>
         <a href="usage.html">Usage</a><br>
         <a href="lex.html">Lex</a><br>
         <a href="algorithms.html">Algorithms</a><br>
         <a href="distribution.html">Distribution</a><br>
         </font>
      </TD>

      <TD valign="top">
      
The input of a computer program is often written in a specific language.
This holds for traditional compilers but it is also true for many other
application programs. For example, a program that displays molecules
reads input written in a molecule language.
<p>
Accent is a tool that supports the development of language processors.
It generates programs that process source text written
in the language specified by the user and make the underlying structure
of the text explicit. Following this structure semantic action linked to
certain constructs are executed.
<ul>
<li><a href="#describe">How to Describe Languages</a>
<li><a href="#assign">How to Assign Meaning</a>
<li><a href="#abbreviate">How to Abbreviate Specifications</a>
<li><a href="#resolve">How to Resolve Ambiguities</a>
</ul>

<h1><a name="describe">How to Describe Languages</a></h1>

<h3>Grammars</h3>

The user describes the language by providing a grammar.
Such a grammar is given by rules.
<p>
A rule describes how to build a particular construct of the language.
This is done by listing one or more alternatives
how to build the construct from constituents.
<p>
For example, if we define a programming language, we can express the
fact that a <tt>program</tt> is constructed from a <tt>declaration_part</tt>
and a <tt>statement_part</tt>
by the rule
<pre>
   program : declaration_part statement_part ;
</pre>
A rule has a left hand side that names the construct defined by the rule.
A colon  ("<tt>:</tt>") separates the left hand side from the right hand side.
The right hand side specifies how to build the construct.
A semicolon ("<tt>;</tt>") terminates the rule.
<p>
The name on the left hand side is called a nonterminal.
A nonterminal may be used in right hand sides to specify constituents.
The possible representations defined by a nonterminial are called
the phrases of the nonterminal.
<p>
The rule above says that phrases for <tt>program</tt> are
composed from a phrase for <tt>declaration_part</tt> followed by a phrase
for <tt>statement_part</tt>.
<p>
A rule may provide more than one alternative.
Here is a specification of the nonterminal <tt>statement</tt>:
<pre>
   statement :
      variable '=' expression
   |  IF expression THEN statement ELSE statement
   |  WHILE expression DO statement
   |  BEGIN statement_seq END
   ;
</pre>
It describes four alternatives how to construct a statement.
Alternatives are separated by a bar ("<tt>|</tt>").
<p>
The nonterminal that is defined by the first rule of the grammar
is called the start symbol.
The phrases of the start symbol constitute the language defined by the grammar.
<p>
Grammars of this kind are called context-free grammars.
Accent can process all context-free grammars without restriction.

<h3>Lexical Elements</h3>

Nonterminal symbols are defined by rules.
There are also elementary items
called terminal symbols or tokens.
<p>
They may be given literally, if they are represented by a single character.
For example, the <tt>'='</tt> element in the first alternative for
<tt>statement</tt> stands for the character "<tt>=</tt>".
<p>
They may also be referred by a symbolic name such as <tt>IF</tt>
in the second alternative for <tt>statement</tt>.
In our language an <tt>IF</tt> symbol could be represented by
the two character string "<tt>if</tt>".
Such a mapping from strings to tokens is not defined
by the Accent specification.
In Accent, one just introduces symbolic names that are used for terminal
symbols.
<p>
For example,
<pre>
   %token IF, THEN, ELSE, WHILE, DO, BEGIN, END;
</pre>
introduces names for the tokens in the rule for <tt>statement</tt>.
<p>
The declaration precedes the first rule of the grammar.
<p>
The actual representation is given by rules for a further tool:
The generator Lex can be used to create a lexical analyzer
that partitions the input text into tokens. A specification for Lex
is a list of lexical rules.
<p>
A lexical rule states a pattern (a regular expression) that matches the token
and an action that is carried out when the token is recognized.
Here is are two example rules:
<pre>
   "="  { return '='; }
   "if" { return IF;  }
</pre>
If "<tt>=</tt>" is recognized then the code of the character <tt>'='</tt>
is returned as an indicator.
If "<tt>if</tt>" is recognized then the value of the constant <tt>IF</tt>
is returned.
<p>
A token may have a more complex structure. An example is the token
<tt>NUMBER</tt> which represents a sequence of digits.
This can be specified by a Lex rule like this:
<pre>
[0-9]+ { return NUMBER; }
</pre>
A string that matches the pattern <tt>[0-9]+</tt>
(i.e. a sequence of digits) is indicated as a <tt>NUMBER</tt>.
<p>
The lexical analyzer (described in the Lex specification)
structures the input into a sequence of tokens.
The syntactical analyzer (described in the Accent specification)
hierarchically structures this sequence into phrases.
<p>
An item is specified as token if it has a fixed representation
such as "<tt>=</tt>" or "<tt>if</tt>"
or if it can be defined by a simple expression such as the pattern
for <tt>NUMBER</tt>.
In many cases tokens are those items that can be separated
by additional white space.
A Lex rule can specify to skip white space so that it can be ignored
in the syntactical grammar.
<p>
The Lex rule
<pre>
   " " { /* skip blank */ }
</pre>
skips blanks by not returning a token indicator.

<h3>A Grammar for Expressions</h3>
We now present a simple but complete example:
a grammar for expressions like
<pre>
   10+20*30
</pre>
Here is the Accent specification:
<pre>
   01  %token NUMBER;
   02
   03  expression :
   04    term
   05  ;
   06
   07  term :
   08    term '+' factor
   10  | term '-' factor
   11  | factor
   12  ;
   13
   14  factor :
   15    factor '*' primary
   16  | factor '/' primary
   17  | primary
   18  ;
   19
   20  primary :
   21    NUMBER
   22  | '(' term ')'
   23  | '-' primary
   24  ;
</pre>
These rules not only define what constitutes a valid expression
but also give it structure.
The different nonterminals reflect the binding strength of the operators.
The operators of a factor ("<tt>*</tt>" and "<tt>/</tt>")
have a stronger binding than the
operators of a term ("<tt>+</tt>" and "<tt>-</tt>")
because <tt>factor</tt> is a constituent of a <tt>term</tt>
(a <tt>term</tt> appearing inside a <tt>factor</tt> must be enclosed in
parentheses).
<p>
For example, the input
<pre>
   10+20*30
</pre>
is structured as follows:
<pre>
   expression
   |
   +-term
     |
     +-term
     | |
     | +- factor
     |    |
     |    +-primary
     |      |
     |      +-NUMBER
     |
     +-'+'
     |
     +-factor
       |
       +-factor
       | |
       | +-primary
       |   |
       |   +-NUMBER
       |
       +-'*'
       |
       +-primary
  |
         +-NUMBER
</pre>
or more precisely
(this representation, generated with Accent, specifies which alternative
has been chosen and lists in curly braces the constituents):
<pre>
   expression alternative at line 4, col 6 of grammar {
     term alternative at line 8, col 6 of grammar {
       term alternative at line 10, col 8 of grammar {
         factor alternative at line 16, col 9 of grammar {
           primary alternative at line 20, col 8 of grammar {
             NUMBER
           }
         }
       }
       '+'
       factor alternative at line 14, col 8 of grammar {
         factor alternative at line 16, col 9 of grammar {
           primary alternative at line 20, col 8 of grammar {
             NUMBER
           }
         }
         '*'
         primary alternative at line 20, col 8 of grammar {
           NUMBER
         }
       }
     }
   }
</pre>
The tree above indicates that
<pre>
   10+20*30
</pre>
is structured as
<pre>
   10+(20*30)
</pre>
and not as
<pre>
   (10+20)*30
</pre>

Here is the Lex specification for the expression grammar:
<pre>
   %{
   #include "yygrammar.h"
   %}
   %%
   "+"    { return '+'; }
   "-"    { return '-'; }
   "*"    { return '*'; }
   "/"    { return '/'; }
   [0-9]+ { return NUMBER; }
   " "    { /* skip blank */ }
   \n     { /* skip newline */ }
   .      { yyerror("illegal token"); }
</pre>

(The file <tt>yygrammar.h</tt>, which is included in the header of the
Lex specification, is generated by Accent and contains the definition of
the constant <tt>NUMBER</tt>.)

<h1><a name="assign">How to Assign Meaning</a></h1>

<h3>Semantic Actions</h3>

From the above grammar Accent generates a program that analyzes its
input syntactically: it rejects all texts that do not conform to the grammar.
<p>
In order to process the input semantically we have
to specify semantic actions.
These actions may be embedded into the grammar at arbitrary positions.
They are executed when the particular alternative is processed.
The members of a selected alternative are processed from left to right.
<p>
A semantic action is arbitrary C code inclosed in curly braces.
The text (without the braces) is copied verbatim into the generated
program.
<p>
Here is an example
<pre>
   N:
      { printf("1\n"); } A { printf("2\n"); } B { printf("3\n"); }
   |  { printf("x\n"); } C { printf("y\n"); }
   ;

   A: 'a' { printf("inside A\n"}; };
   B: 'b' { printf("inside B\n"}; };
   C: 'c' { printf("inside C\n"}; };
</pre>
For the input
<pre>
   a b
</pre>
the generated program produces the output
<pre>
   1
   inside A
   2
   inside B
   3
</pre>
For each nonterminal Accent generates a tree walker function.
Here is the code generated for <tt>N</tt>
(slightly edited and without <tt>#line</tt> pragmas for C preprocessor):
<pre>
   N ()
   {
      switch(yyselect()) {
      case 1:
  {
     printf("1\n");
     A();
     printf("2\n");
     B();
     printf("3\n");
  }
  break;
      case 2:
  {
     printf("x\n");
     C();
     printf("y\n");
  }
  break;
      }
   }
</pre>

<h3>Attributes of Nonterminals</h3>

Like functions in C, nonterminal can have parameters.
Parameters may be of mode <tt>in</tt> or <tt>out</tt>.
<tt>in</tt> parameters are used to pass information from the context
to a particular nonterminal (often called inherited attributes).
<tt>out</tt> parameters pass information from a nonterminal to its context
(often called synthesized attributes).

At the left hand side of rule the name of
the nonterminal is followed by a signature that specifies mode, type,
and name of parameters.
The signature is enclosed in the braces
"<tt><</tt>" and "<tt>></tt>".
<p>
For example
<pre>
   N < %in int context, %out int result > : ... ;
</pre>
<tt>N</tt> has an input parameter <tt>context</tt> and an output parameter
<tt>result</tt>, both are of type <tt>int</tt>.
<p>
If a nonterminal appears on the right hand side, actual parameters
follow the nonterminal name, enclosed in
"<tt><</tt>" and "<tt>></tt>".
<p>
For example
<pre>
   N<actual_context, actual_result>
</pre>
Parameters can be accessed inside semantic actions.
The values of input parameters must be defined inside semantic actions
or be the output of other members.
<p>
For example
<pre>
   demo :
      { actual_context = 10; }
      N<actual_context, actual_result>
      { printf("%d\n", actual_result); }
   ;
</pre>
An alternative for a nonterminal must define its output parameters,
either by using them as output parameters for  members
or by assigning a value inside a semantic action.
If an output parameter of the left hand side (a formal output parameter)
is used inside a semantic action, it must be dereferenced with the "<tt>*</tt>"
operator (output parameters are passed by reference to the generated
tree walker function).
<p>
For example
<pre>
   N<%in int context, %out int result> : { *result = context+1; } ;
</pre>
An elaboration of <tt>demo</tt> prints <tt>11</tt>.
<p>
Here are the generated functions:
<pre>
demo ()
{
   int actual_context;
   int actual_result;

   switch(yyselect()) {
   case 1:
      {
  actual_context = 10;
  N(actual_context, &actual_result);
  printf("%d\n", actual_result);
      }
      break;
   }
}

N (context, result)
   int context;
   int *result;
{
   switch(yyselect()) {
   case 2:
      {
         *result = context+1;
      }
      break;
   }
}
</pre>
As you see, identifiers that appear as parameters of nonterminals
are automatically declared.
<p>
Formal parameters, if present, are specified in the form
<pre>
   < %in parameter_specifications %out parameter_specifications >
</pre>
where either the <tt>%in</tt> group or the <tt>%out</tt> group may be omitted.
<p>
The most frequent case, where we have only output parameters,
can simply be written without a mode indicator:
<pre>
   < parameter_specifications >
</pre>
<tt>parameter_specifications</tt> is a list of the form
<pre>
   Type_1 Name_1 , ... , Type_n Name_n
</pre>
The type may be omitted.
In this case the special type <tt>YYSTYPE</tt> is assumed.
This may be defined by the user as a macro.
If there is no user specific definition <tt>YYSTYPE</tt>
stands for <tt>long</tt>.
<p>
Hence in most cases a left hand side of a rule simply looks like this:
<pre>
   Block<b> : ...
</pre>

<h3>Attributes of Tokens</h3>

All items declared as tokens have an output parameter of type
<tt>YYSTYPE</tt>. If a token is used on the right hand side of a rule,
an actual parameter may be specified to access the attribute value of
the token which is computed by the scanner.
<p>
For example
<pre>
    Value :
       NUMBER<n> { printf("%d\n", n); }
    ;
</pre>
Here <tt>n</tt> represents the numeric value of <tt>NUMBER</tt>.
It can be used in the semantic action.
<p>
The attribute value of a token must be computed in the semantic action
of the Lex rule for the token. It must be assigned to the special
variable <tt>yylval</tt> which is of type <tt>YYSTYPE</tt>.
<p>
For example if we want to access the value of a <tt>NUMBER</tt>
the corresponding Lex could be
<pre>
   [0-9]+ { yylval = atoi(yytext); return NUMBER; }
</pre>
The special variable <tt>yytext</tt> holds the string that matched the
pattern. The C function <tt>atoi</tt> converts it into a integer.

<h3>Global Prelude</h3>

If <tt>YYSTYPE</tt> is defined by the user,
it should be declared in an <tt>include</tt> file,
because it is used in the grammar as well as in the Lex specification.
<p>
<tt>#include</tt> statements can be placed in the global prelude part
at the beginning of the grammar file.
Text which is enclosed by
<pre>
   %prelude {
</pre>
and
<pre>
   }
</pre>
is copied verbatim into the generated program.
<p>
For example
<pre>
   %prelude {
   #include "yystype.h"
   }
</pre>

<h3>Rule Prelude</h3>

Identifiers that are used as attributes need not be declared.
One may use semantic actions to declare additional variables
(the curly braces surrounding a semantic do not appear in the generated code).
<p>
For example
<pre>
   demo :
      {int i = 0;} alternative_1 ;
   |  alternative_2
   ;
</pre>
Such variables are local to the alternative.
<tt>i</tt> is visible in the sematic action of <tt>alternative_1</tt>
but not in those of <tt>alternative_2</tt>
<p>
Variables that are visible to all alternatives (but local to the rule)
can be declare in rule prelude
which has the same form as the global prelude but appears before
the alternative list of a rule.
<p>
For example
<pre>
   demo :
      %prelude {int i = 0;}
      alternative_1
   |  alternative_2
   ;
</pre>
<tt>i</tt> is visible in the sematic actions of both alternatives.
<p>
The rule prelude can also be used to provide code that should be execute
as initialization for all alternatives.

<h3>A Calculator</h3>

We are now ready to turn the expression grammar into a calculator.
<p>
For this purpose nonterminals get an output parameter that holds the
numerical value of the phrase represented by the nonterminal.
This value is compute from the numerical values of the constituents.
<p>
For example, in the left hand side
<pre>
   term<n> :
</pre>
<tt>term</tt> gets an attribute <tt>n</tt>.
<p>
In the right hand side
<pre>
     term<x> '+' factor<y> { *n = x+y; }
</pre>
the nonterminal <tt>term</tt> gets an attribute <tt>x</tt>
and the nonterminal <tt>factor</tt> gets an attribute <tt>y</tt>.
The attribute of the left hand side is the computed as the sum
of <tt>x</tt> and <tt>y</tt>.
<p>
Here is the complete grammar:
<pre>
   %token NUMBER;

   expression :
     term<n> { printf("%d\n", n); }
   ;

   term<n> :
     term<x> '+' factor<y> { *n = x+y; }
   | term<x> '-' factor<y> { *n = x-y; }
   | factor<n>
   ;

   factor<n> :
     factor<x> '*' primary<y> { *n = x*y; }
   | factor<x> '/' primary<y> { *n = x/y; }
   | primary<n>
   ;

   primary<n> :
     NUMBER<n>
   | '(' term<n> ')'
   | '-' primary<x> { *n = -x; }
   ;
</pre>
See
<a href="usage.html"><i>Using the Accent Compiler Compiler</i></a>
how to process this specification to obtain a functioning calculator.

<h1><a name="abbreviate">How to Abbreviate Specifications</a></h1>

<h3>Extended Backus Naur Form</h3>

So far we have only considered grammars where members of alternatives
are nonterminal and terminal symbols. A formalism of this kind
was used in the Algol 60 report and named after the editors
of that document: Backus Naur Form.
<p>
Accent also supports a notation that is known as Extended Backus Naur Form.
In this formalism one can write structured members to specify
local alternatives and optional and repetitive elements.

<h3>Local Alternatives</h3>

A member of the form
<pre>
   ( alt_1 | ... | alt_n )
</pre>
can be used to specify alternative representations of a member
without introducing a new nonterminal.
<p>
For example, instead of
<pre>
   signed_number :
      sign NUMBER
   ;

   sign :
      '+'
   |  '-'
   ;
</pre>
one can write
<pre>
   signed_number :
      ( '+' | '-' ) NUMBER
   ;
</pre>
Semantic actions may be inserted.
The actions of the selected alternative are executed.
<p>
For example,
<pre>
   signed_number<r> :
      { int s; }
      ( '+' { s = +1; } | '-' { s = -1; } ) NUMBER<n>
      { *r = s*n; }
   ;
</pre>

<h3>Optional Elements</h3>

A member can also have the form
<pre>
   ( M_1 ... M_n )?
</pre>
in which case the enclosed items <tt>M_1</tt>,  ... , <tt>M_n</tt>
may appear in the input or not.
<p>
For example,
<pre>
   integer :
      ( sign )? NUMBER
   ;
</pre>
specifies specifies that <tt>integer</tt> is a <tt>NUMBER</tt>
preceded by an optional <tt>sign</tt>.
<p>
So both
<pre>
   123
</pre>
and
<pre>
   + 123
</pre>
are valid phrases for <tt>integer</tt>.
<p>
More than one alternative may be specified between
"<tt>(</tt>" and "<tt>)?</tt>":
<pre>
   ( alt_1 | ... | alt_n )?
</pre>
For example,
<pre>
    integer :
       ( '+' | '-' )? NUMBER
    ;
</pre>
specifies that an <tt>integer</tt> is a <tt>NUMBER</tt>
that is optionally preceded by either a
"<tt>+</tt>" or a "<tt>-</tt>".
<p>
In case of semantic actions, proper initialization is required,
because none of the alternative may be processed:
<pre>
   integer<r> :
      { int s = +1; }
      ( '+' | '-' { s = -1; } )? NUMBER<n>
      { *r = s*n; }
   ;
</pre>

<h3>Repetitive Elements</h3>

A further form of a member is
<pre>
   ( M_1 ... M_n )*
</pre>
in which case the enclosed items <tt>M_1</tt>,  ... , <tt>M_n</tt>
may be repeated an arbitrary number of times (including zero).
<p>
For example,
<pre>
   number_list :
      NUMBER  ( ',' NUMBER )*
   ;
</pre>
specifies that a <tt>number_list</tt> is given by at least one <tt>NUMBER</tt>
which is followed by arbitrary number of comma-separated <tt>NUMBER</tt>s.
<p>
Semantic action inside repetions are executed as often as there are instances.
<p>
For example,
<pre>
   number_list :
      NUMBER<sum>  ( ',' NUMBER<next> { sum += next;} )*
      { printf("%d\n", sum); }
   ;
</pre>
adds all the numbers and prints their sum.
<p>
Again, several alternatives may specified:
<pre>
   ( alt_1 | ... | alt_n )*
</pre>
For example,
<pre>
   statements :
      ( simple_statement | structured_statement )*
   ;
</pre>
<tt>statements</tt> matches an arbitrary number of statements,
each of which may be a <tt>simple_statement</tt> or a
<tt>structured_statement</tt>.

<h1><a name="resolve">How to Resolve Ambiguities</a></h1>

<h3>Ambiguities</h3>

The phrase structure of a source text determines the actions that are
executed. There are grammars, where the same source text
can have different phrase structures.
Such grammars are called ambiguous.
<p>
As an example consider this rule from the C language report:
<pre>
   selection_statement :
     IF '(' expression ')' statement
   | IF '(' expression ')' statement ELSE statement
   | SWITCH '(' expression ')' statement
   ;
</pre>
For the source text
<pre>
   if (x) if (y) f(); else g();
</pre>
two interpretations are possible.
One treats the text like
<pre>
   if (x) { if (y) f(); else g(); }
</pre>
the other one like
<pre>
   if (x) { if (y) f(); } else g();
</pre>
In the first case <tt>g()</tt> is invoked if
<tt>x</tt> is true and <tt>y</tt> is false.
In the second case <tt>g()</tt> is invoked if
<tt>x</tt> is false.
<p>
The intended interpretation could be specified by an unambiguous grammar
(as has been done in the case of Java). But such a grammar would be more
clumsy because the problem can not be solved locally.
The C report uses an ambiguous grammar and states:
"<i>The <tt>else</tt> ambiguity is resolved by connecting an <tt>else</tt>
with the last-encountered <tt>else</tt>-less <tt>if</tt>
at the same block nesting level.</i>"
<p>
In traditional systems ambiguous grammars lead to violations of
the constraints of the underlying grammar classes (all LL(1) and all LALR(1)
grammars are unambiguous). The result is a conflict between parser actions
(such as a "shift/reduce" conflict in Yacc,
these conflicts are also possible if the grammar is unambiguous).
Such systems allow the user to resolve the parser conflicts
or take a default action.
<p>
Since Accent has been designed so that it can be used without knowledge of
parser implementation, we have developed an annotation framework that allows
the user to resolve ambiguities at the abstract level of the grammar.
Moreover, this framework is complete, it allows to resolve every ambiguity.
<p>
For this purpose we have to give a complete classification of ambiguities.
We distinguish ambiguities between alternatives
and ambiguities inside alternatives.
<p>
It is undecidable whether a given grammar is unambiguous.
But an ambiguity can be detected at runtime.
In this case the Accent Runtime prints a detailed analysis
and gives an indication how to resolve the ambiguity via an annotation.
If the annotated grammar is processed by Accent, the parser selects
the phrase structure that has been specified by the user.

<h3>Ambiguities Between Alternatives</h3>

Here is a simple example for an ambiguity between alternatives:
<pre>
   01  Start:
   02   'x' N 'z'
   03  ;
   04
   05  N:
   06    A { printf("A selected\n"); }
   07  | B { printf("B selected\n"); }
   08  ;
   09
   10  A: 'y' ;
   11  B: 'y' ;
</pre>
For the input
<pre>
   x y z
</pre>
there are two possible derivations for <tt>N</tt> since both,
<tt>A</tt> and <tt>B</tt>, can produce a "<tt>y</tt>".
Hence there are two possible outputs for the same input:
<pre>
   A selected
</pre>
and
<pre>
   B selected
</pre>
When confronted with that input the parser emits the following diagnostics:
<pre>
   GRAMMAR DEBUG INFORMATION

   Grammar ambiguity detected.
   Two different ``N'' derivation trees for the same phrase.

   TREE 1
   ------

   N alternative at line 6, col 3 of grammar {
     A alternative at line 10, col 4 of grammar {
       'y'
     }
   }

   TREE 2
   ------

   N alternative at line 7, col 3 of grammar {
     B alternative at line 11, col 4 of grammar {
       'y'
     }
   }

   Use %prio annotation to select an alternative.

   END OF GRAMMAR DEBUG INFORMATION
</pre>
A "<tt>%prio</tt>" annotation can be used to give an alternative
a certain priority.
It is written in the form
<pre>
   %prio number
</pre>
and attached at the end of an alternative. The alternative then gets the
priority <tt>number</tt>.
<p>
When two different rules  can produce the same string, both must have
a "<tt>%prio</tt>" annotation. The alternative with the higher priority
is selected.
<p>
To select the second alternative for <tt>N</tt> we give it the
priority <tt>2</tt> and assign <tt>1</tt> to the first alternative.
<pre>
   N:
     A { printf("A selected\n"); } %prio 1
   | B { printf("B selected\n"); } %prio 2
   ;
</pre>
Now the ambiguity is resolved and the output of the generated program is:
<pre>
   B selected
</pre>
In the same style we can resolve the <tt>else</tt> ambiguity:
<pre>
   selection_statement :
     IF '(' expression ')' statement %prio 1
   | IF '(' expression ')' statement ELSE statement %prio 2
   | SWITCH '(' expression ')' statement
   ;
</pre>
It is not necessary to provide a priority for the third alternative because
it is not involved in the ambiguity.

<h3>Ambiguities Inside Alternatives</h3>

We have seen that a derivation is not unique if one can replace a subtree
by a different one that produces the same string.
But there is also another source of ambiguity that is less obvious.
<p>
In this kind of ambiguity inside the same alternative
two neighbor derivations are replaced by different derivations
that produce different strings.
But together the new derivations produce the same string than the old ones.
<p>
It is clear that the string produced by a new derivation must have
a different length than the string produced by an old derivation.
The ambiguity can be resolved by specifying whether
the short or the long string should be preferred.
<p>
Here is an example:
<pre>
   01  N :
   02    'x' L R 'z'
   03  ;
   04
   05  L :
   06    'a'     { printf("( a )");   }
   07  | 'a' 'b' { printf("( a b )"); }
   08  ;
   09
   10  R :
   11    'c'     { printf("( c )\n");   }
   12  | 'b' 'c' { printf("( b c )\n"); }
   13  ;
</pre>
To parse the input
<pre>
   x a b c z
</pre>
one can either use the first alternative of <tt>L</tt> and the second
alternative of <tt>R</tt> or the second alternative of <tt>L</tt>
and the first alternative of <tt>R</tt>.
<p>
The diagnostic message produced by the parser is:
<pre>
   GRAMMAR DEBUG INFORMATION

   Grammar ambiguity detected.
   There are two different parses
   for the beginning of ``N'',
   alternative at line 2, col 3 of grammar,
   up to and containing ``R'' at line 2, col 9 of grammar.

   PARSE 1
   -------

   'x'
   L alternative at line 6, col 3 of grammar {
     'a'
   }
   R alternative at line 12, col 3 of grammar {
     'b'
     'c'
   }

   PARSE 2
   -------

   'x'
   L alternative at line 7, col 3 of grammar {
     'a'
     'b'
   }
   R alternative at line 11, col 3 of grammar {
     'c'
   }

   For ``R'' at line 2, col 9 of grammar,
   use %long annotation to select first parse,
   use %short annotation to select second parse.

   END OF GRAMMAR DEBUG INFORMATION
</pre>
The "<tt>%long</tt>" annotation in front of a nonterminal
indicates that we prefer that the nonterminal produces a longer string.
<p>
If we prefix the nonterminal symbol <tt>R</tt> with <tt>%long</tt> as in
<pre>
   N :
     'x' L %long R 'z'
   ;
</pre>
the ambiguity is resolved and we get the output
<pre>
   ( a )( b c )
</pre>
The "<tt>%short</tt>" annotation indicates that we prefer the shorter string.
If we prefix <tt>R</tt> with <tt>%short</tt> as in
<pre>
   N :
     'x' L %short R 'z'
   ;
</pre>
we get the output
<pre>
   ( a b )( c )
</pre>
      
      <br>
      <br>
      <font face="helvetica" size="1">
      <a href="http://accent.compilertools.net">accent.compilertools.net</a>
      </font>
      </TD>
   </TR>
</TABLE>

</BODY>
</HTML>

Messung V0.5 in Prozent

¤ Dauer der Verarbeitung: 0.17 Sekunden (vorverarbeitet am 2026-06-06) ¤

Wurzel

Suchen

PVS Prover

Isabelle Prover

NIST Cobol Testsuite

Cephes Mathematical Library

Vienna Development Method

Haftungshinweis

Die Informationen auf dieser Webseite wurden nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit, noch Qualität der bereit gestellten Informationen zugesichert.

Bemerkung:

Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.