/usr/share/doc/libopentoken4-dev/examples/ASU_Example_5

This directory contains various implementations of Example 5.10,
section 5.3, page 295 from [1] "Compilers Principles, Techniques,
and Tools" by Aho, Sethi, and Ullman (aka: "The Dragon Book").

The grammar in that example is:

L -> E n       print (val[top])
E -> E1 + T    val[ntop] := val[top - 2] + val[top]
E -> T
T -> T1 * F    val[ntop] := val[top - 2] * val[top]
T -> F
F -> ( E )     val[ntop] := val[ntop - 1]
F -> digit

'n' is end of file. val, top, and ntop implement an operand stack,
with the following rule:

When a production with r symbols on the right side is reduced, the
value of ntop is set to top - r + 1. After each action is executed,
top is set to ntop.

There are three implementations of this or equivalent grammars in this
directory:

asu_example_5_10_lr-run          LR parser generator
asu_example_5_10_rd_commute-run  rewrite grammar using commutivity
asu_example_5_10_rd_list-run     rewrite grammar to use List tokens

Each of these versions is provided as an executable, that can accept
input from the console or from a file. In addition, if the -t option
is given, it will output appropriate trace information, which is very
useful in understanding how the parsers operate.

The LR version uses the OpenToken Production type to represent the
productions, and the OpenToken LALR Parser generator to generate a
parser. The generated parser implements the operand stack implicitly,
since it implements a token stack for the productions. Thus the user
does not need to provide the operand stack separately.

This parser requires one lookahead.

The other two versions use recursive descent. The grammar above is
left recursive, so it cannot be used directly for recursive descent;
parsing E requires looking for E, which requires looking for E, etc.
So the grammar must be rewritten.

The rd_commute version uses recursive descent implemented by the
OpenToken Selection and Sequence tokens. It rewrites the grammar as
follows, taking advantage of the commutivity of + and *:

L -> E EOF     print (pop)
E -> T + E     push (pop + pop)
E -> T
T -> F * T     push (pop * pop)
T -> F
F -> ( E )
F -> integer   push (integer)

Here the operand stack is provided by the user, in the Build actions
for the 'T + E' and 'F * T' sequence tokens, and the integer token.

This parser requires infinite lookaheads; it needs to see the + or *
of the sequences for E and T, _after_ a parentheses. This means it
always looks ahead to the end of input, at each step in the parse. It
also requires backtracking during lookahead; it backs up to retry the
alternatives for E and T. OpenToken can handle this, as long as there
is memory space.

Note that E and T are implemented by Selection tokens, and the order
of the tokens is important, because selection tokens accept the first
choice that matches. If E is implemented as T | T + E, then it will
never choose T + E.

This implementation is very inefficient, as can be seen by running it
with -t and comparing the output to the other two. It is included as
an illustration that OpenToken allows implementing naive grammars that
still work, which can be an advantage when you are just getting
started with parsers.

The rd_list version uses recursive descent implemented by the
OpenToken List, Selection, and Sequence tokens. It rewrites the
grammar as follows, using {} to indicate possible repetition (as in
Extended Backus-Naur form; see
http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form):

L -> E  EOF      print (L.val)
E -> T {+ T}     element action: E.val := E.val + T.val
T -> F {* F}     element action: T.val := T.val * F.val
F -> ( E )       F.val := E.val
F -> integer

E is initialized to 0, F to 1.

The List token implements the repetition {}, and executes an action
every time the non-operator token is recognized. It keeps a local copy
of the result token on the CPU function call stack; that implements
the operand stack.

This parser requires two lookaheads, to distinguish E from T. However,
it does not require backtracking during lookahead. It gets around the
parenthesis lookahead problem in the commute version by _always_
looking for the operator after an element of a list is parsed.
libopentoken4-dev 5.0a-1 / usr / share / doc / libopentoken4-dev / examples / ASU_Example_5_10 / README.text