/usr/share/mozart/doc/gump/node6.html is in mozart-doc 1.4.0-8ubuntu1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>3.1 Example</TITLE><LINK href="ozdoc.css" rel="stylesheet" type="text/css"></HEAD><BODY><TABLE align="center" border="0" cellpadding="6" cellspacing="6" class="nav"><TR bgcolor="#DDDDDD"><TD><A href="node5.html">- Up -</A></TD><TD><A href="node7.html#section.parser.reference">Next >></A></TD></TR></TABLE><DIV id="section.parser.example"><H2><A name="section.parser.example">3.1 Example</A></H2><P> This section presents the parser for the functional language <SPAN class="language">Lambda</SPAN><A name="label168"></A> for which a scanner was specified in the last chapter. </P><H3><A name="label169">3.1.1 Writing a Parser Specification</A></H3><P> <A href="node6.html#program.parser.example">Program 3.1</A> shows the parser specification which will serve as an example to demonstrate the basic concepts of the Gump Parser Generator. This example will be examined in detail in the following. </P><DIV class="apropos"><P class="margin">Class Descriptors</P><P> Again, a Gump specification resembles a class definition introduced by a special keyword, <CODE><SPAN class="keyword">parser</SPAN></CODE>, and augmented by additional declarations. The usual class descriptors <CODE><SPAN class="keyword">from</SPAN></CODE> and <CODE><SPAN class="keyword">meth</SPAN></CODE> are also used in this specification in lines 2 to 8. The switches <SPAN class="ignore"><CODE><SPAN class="reference">\switch +</SPAN></CODE></SPAN><CODE><SPAN class="reference">gumpparseroutputsimplified</SPAN></CODE><A name="label170"></A> and <SPAN class="ignore"><CODE><SPAN class="reference">\switch +</SPAN></CODE></SPAN><CODE><SPAN class="reference">gumpparserverbose</SPAN></CODE><A name="label171"></A> simply cause additional information to be output at parser generation time; we will see this in the next section. </P></DIV><P> The <CODE>error</CODE><A name="label173"></A> method will be called upon detection of <A name="label174"></A><SPAN class="index">parse errors</SPAN><A name="label175"></A><A name="label176"></A>. Its parameter is a virtual string describing the error. We redefine this method (which has a default implementation in the super class <CODE>GumpParser<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN></CODE>) since we want to provide the user with the line number<A name="label177"></A> information we maintain in the scanner. </P><DIV class="apropos"><P class="margin">Token Declarations</P><P> In line 10 begin the token declarations<A name="label178"></A>. All token classes (which must be atoms) that the scanner can produce are listed after the <CODE><SPAN class="keyword">token</SPAN></CODE> keyword. Additionally, some tokens are assigned an <A name="label179"></A><EM>associativity</EM> (here: <CODE>leftAssoc</CODE>) and a <A name="label180"></A><EM>precedence</EM> value (a nonzero positive integer) after a colon. These are used to resolve ambiguities in the syntax rules<A name="label181"></A>. The reason for the assignments in our example are explained below. (You may notice that one of the listed tokens cannot be produced by the scanner, the <CODE><SPAN class="string">'APPLY'</SPAN></CODE> token. This is called a <A name="label182"></A><EM>pseudo-token</EM> and is solely defined for its associativity and precedence information.) </P></DIV><DIV class="apropos"><P class="margin">Syntax Rules</P><P> <A name="label183"></A> Line 19 marks the start of the syntax rules themselves. For each nonterminal, a syntax rule (introduced by the keyword <CODE><SPAN class="keyword">syn</SPAN></CODE>) must be given. Nonterminals may be named by atoms or variables. </P></DIV><DIV class="apropos"><P class="margin">Start Symbols</P><P> <A name="label184"></A> <A name="label185"></A> An atom means that this nonterminal is a start symbol. Several start symbols may be defined - the one to reduce to is selected when a concrete parse is initiated. </P></DIV><DIV class="apropos"><P class="margin">Formal Parameter Lists</P><P> Following the nonterminal is its parameter list, consisting of zero or more variables in parentheses. The start symbol <CODE>program</CODE> has two parameters: a list of definitions and a list of terms. These are both output parameters, as is indicated by the commentary <CODE>?</CODE>. </P></DIV><DIV class="apropos"><P class="margin">EBNF Phrases</P><P> The body of each syntax rule is an <A name="label186"></A><SPAN class="index">EBNF</SPAN> phrase (EBNF is an abbreviation of <A name="label187"></A><EM>Extended Backus-Naur-Formalism</EM>). As in Oz, we distinguish between statements and expressions: Some EBNF phrases carry values and may thus only stand at expression position, others don't and must be used at statement position. </P></DIV><P> The basic building blocks of EBNF expressions are <A name="label188"></A><EM>grammar symbol applications</EM>, denoted by the name of a terminal or nonterminal followed by the actual parameter list in parentheses. An example of this is the <CODE>Definition($)</CODE> in line 20, which is an application of the nonterminal <CODE>Definition</CODE> with a single actual parameter. Since this is the <A name="label189"></A><SPAN class="index">nesting marker</SPAN>, the application is an expression (as opposed to a statement) with the value of the corresponding actual parameter as its value. This application is written inside the <A name="label190"></A><SPAN class="index">repetition</SPAN> symbols <CODE>{ </CODE>...<CODE> }<SPAN class="keyword">*</SPAN></CODE>, which means that the application is to be repeated 0 to <I>n</I> times. The repetition construct builds a list of its argument's values at each iteration, since it is used in expression position. This list is assigned to the formal parameter <CODE>Definitions</CODE>. </P><P> The next line, line 21, is similar: Here, a nonempty list (note the <CODE><SPAN class="keyword">+</SPAN></CODE>) of <CODE>Term</CODE>s is expected, seperated by semicolons. The values computed by each <CODE>Term</CODE> are collected in a list, which is assigned to the formal parameter <CODE>Terms</CODE>. </P><DIV class="apropos"><P class="margin">Local Variables</P><P> The next syntax rule introduces a new feature: <A name="label191"></A><SPAN class="index">local variables</SPAN>. All variables in <A name="label192"></A><EM>pattern position</EM> in syntax rules are implicitly declared local. EBNF pattern positions are the left side of an assignment (such as in line 20) and the actual parameters of grammar symbol applications. If in any of these places a single non-escaped variable (i. e., written without <CODE><SPAN class="keyword">!</SPAN></CODE>) is used, it is implicitly declared local to the EBNF construct it is used in. Such is the case for the variables <CODE>I</CODE> and <CODE>T</CODE> in line 24. The formal parameter variables assigned to in lines 20 and 21 had to be escaped to avoid their implicit (re-)declaration. </P></DIV><P> The syntax rule for <CODE>Definition</CODE> in line 23 has a single parameter. Since this is the nesting marker, an EBNF expression is expected as body of this rule. The value of a sequence of EBNF expressions is the value of the last expression (as in Oz, where the value of a sequential composition is the value of the composition's second argument). </P><DIV class="apropos"><P class="margin">Semantic Actions</P><P> The last EBNF expression in line 23 is the <EM class="noindex">semantic action</EM><A name="label193"></A>, introduced by the arrow <CODE><SPAN class="keyword">=></SPAN></CODE>. This action constructs an abstract syntax tree node (represented as a tuple). </P></DIV><DIV class="apropos"><P class="margin">Alternatives</P><P> Lines 26 to 32 show the rule for <CODE>Term</CODE>. This rule has several <A name="label194"></A><SPAN class="index">alternatives</SPAN>, separated by the choice operator <CODE><SPAN class="keyword">[]</SPAN></CODE>. These alternatives also imply the need for the given token precedences and associativities mentioned above: Not all inputs have a unique parse tree. If, for example, we wrote </P><BLOCKQUOTE class="code"><CODE><SPAN class="keyword">lambda</SPAN> <SPAN class="variablename">x</SPAN>.<SPAN class="variablename">y</SPAN> <SPAN class="variablename">z</SPAN></CODE></BLOCKQUOTE><P> this could be parsed as either </P><BLOCKQUOTE class="code"><CODE>(<SPAN class="keyword">lambda</SPAN> <SPAN class="variablename">x</SPAN>.<SPAN class="variablename">y</SPAN>) <SPAN class="variablename">z</SPAN></CODE></BLOCKQUOTE><P> or </P><BLOCKQUOTE class="code"><CODE><SPAN class="keyword">lambda</SPAN> <SPAN class="variablename">x</SPAN>.(<SPAN class="variablename">y</SPAN> <SPAN class="variablename">z</SPAN>)</CODE></BLOCKQUOTE><P> We want to enforce the second meaning (that is, the application has a higher precedence than the abstraction); furthermore, the application should be left-associative (i. e., <CODE><SPAN class="variablename">x</SPAN> <SPAN class="variablename">y</SPAN> <SPAN class="variablename">z</SPAN></CODE> means <CODE>(<SPAN class="variablename">x</SPAN> <SPAN class="variablename">y</SPAN>) <SPAN class="variablename">z</SPAN></CODE>). </P></DIV><DIV class="apropos"><P class="margin">Resolving Conflicts</P><P> This is why the <A name="label195"></A><SPAN class="index">pseudo-token</SPAN> <CODE><SPAN class="string">'APPLY'</SPAN></CODE> was introduced. Each alternative may also have, like the tokens, a precedence and an associativity. If the alternative contains a terminal, than the values of the last terminal are used. Alternatively, a special <A name="label196"></A><EM>precedence token</EM> may be specified via <CODE>prec(</CODE><I>terminal</I><CODE>)</CODE>; then the values of this are used instead. Thus, the application <CODE>Term Term</CODE> is left-associative. Higher precedence values mean tighter binding of operators. Thus, the application (token <CODE><SPAN class="string">'APPLY'</SPAN></CODE> of precedence 2) has precedence over the abstraction (token <CODE><SPAN class="string">'.'</SPAN></CODE> of precedence 1). </P></DIV><P> However, one anomaly remains because the application has no (visible) operator - to resolve conflicts, the precedence/associativity values of the lookahead token are compared to the values of the (potentially) applicable rules. So if the lookahead is one of the tokens with which a <CODE>Term</CODE> can begin, it is in fact an application we have to parse. This is why all these tokens are assigned the same precedence as the application. (For a more detailed description of how operator precedence information is used to resolve conflicts, consult the <A name="label197"></A><SPAN class="index"><SPAN class="tool">bison</SPAN></SPAN> manual <A href="bib.html#donellystallman95">[DS95]</A>. </P><DIV class="apropos"><P class="margin">Epsilon Productions</P><P> The last nonterminal, <CODE>Line</CODE> in line 33, is actually only introduced for the semantic value it computes. The empty sequence of grammar symbols is denoted by <CODE><SPAN class="keyword">skip</SPAN></CODE>. </P></DIV><DIV class="program" id="program.parser.example"><HR><P><A name="program.parser.example"></A></P><P> </P><BLOCKQUOTE class="linenumbers"><PRE><SPAN class="reference">\switch +gumpparseroutputsimplified +gumpparserverbose</SPAN> <BR> <BR><SPAN class="keyword">declare</SPAN> <BR><SPAN class="keyword">parser</SPAN> <SPAN class="type">LambdaParser</SPAN> <SPAN class="keyword">from</SPAN><SPAN class="type"> GumpParser.</SPAN><SPAN class="string">'class'</SPAN> <BR> <SPAN class="keyword">meth</SPAN> <SPAN class="functionname">error</SPAN>(VS) Scanner <SPAN class="keyword">in</SPAN> <BR> GumpParser<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> getScanner(?Scanner)<BR> {System<SPAN class="keyword">.</SPAN>showInfo <SPAN class="string">'line '</SPAN><SPAN class="keyword">#</SPAN>{Scanner getLineNumber($)}<SPAN class="keyword">#</SPAN><SPAN class="string">': '</SPAN><SPAN class="keyword">#</SPAN>VS}<BR> <SPAN class="keyword">end</SPAN> <BR> <BR> <SPAN class="keyword">token</SPAN> <BR> <SPAN class="string">'define'</SPAN> <SPAN class="string">';'</SPAN> <SPAN class="string">'='</SPAN> <SPAN class="string">')'</SPAN> <BR> <SPAN class="string">'.'</SPAN>: leftAssoc(1)<BR> <SPAN class="string">'APPLY'</SPAN>: leftAssoc(2)<BR> <SPAN class="string">'lambda'</SPAN>: leftAssoc(2)<BR> <SPAN class="string">'('</SPAN>: leftAssoc(2)<BR> <SPAN class="string">'id'</SPAN>: leftAssoc(2)<BR> <SPAN class="string">'int'</SPAN>: leftAssoc(2)<BR> <BR> <SPAN class="keyword">syn</SPAN> <SPAN class="functionname">program</SPAN>(?Definitions ?Terms)<BR> <SPAN class="keyword">!</SPAN>Definitions={ Definition($) }<SPAN class="keyword">*</SPAN> <BR> <SPAN class="keyword">!</SPAN>Terms={ Term($) <SPAN class="keyword">//</SPAN> <SPAN class="string">';'</SPAN> }<SPAN class="keyword">+</SPAN> <BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">syn</SPAN> <SPAN class="functionname">Definition</SPAN>($)<BR> <SPAN class="string">'define'</SPAN> <SPAN class="string">'id'</SPAN>(I) <SPAN class="string">'='</SPAN> Term(T) <SPAN class="string">';'</SPAN> <SPAN class="keyword">=></SPAN> definition(I T)<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">syn</SPAN> <SPAN class="functionname">Term</SPAN>($)<BR> <SPAN class="string">'lambda'</SPAN> <SPAN class="string">'id'</SPAN>(I) <SPAN class="string">'.'</SPAN> Term(T) <SPAN class="keyword">=></SPAN> lambda(I T)<BR> <SPAN class="keyword">[]</SPAN> Term(T1) Term(T2) prec(<SPAN class="string">'APPLY'</SPAN>) <SPAN class="keyword">=></SPAN> apply(T1 T2)<BR> <SPAN class="keyword">[]</SPAN> <SPAN class="string">'('</SPAN> Term(T) <SPAN class="string">')'</SPAN> <SPAN class="keyword">=></SPAN> T<BR> <SPAN class="keyword">[]</SPAN> <SPAN class="string">'id'</SPAN>(I) Line(L) <SPAN class="keyword">=></SPAN> id(I L)<BR> <SPAN class="keyword">[]</SPAN> <SPAN class="string">'int'</SPAN>(I) <SPAN class="keyword">=></SPAN> int(I)<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">syn</SPAN> <SPAN class="functionname">Line</SPAN>($)<BR> <SPAN class="keyword">skip</SPAN> <SPAN class="keyword">=></SPAN> {GumpParser<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> getScanner($) getLineNumber($)}<BR> <SPAN class="keyword">end</SPAN> <BR><SPAN class="keyword">end</SPAN> <BR></PRE></BLOCKQUOTE><P> </P><P class="caption"><STRONG>Program 3.1:</STRONG> The <CODE>LambdaParser</CODE> parser specification.</P><HR></DIV><H3><A name="label198">3.1.2 Invoking Gump</A></H3><P> Parser specifications are processed in the same way scanner specifications are. First we prepare the Gump Parser Generator by feeding:<A name="label199"></A> </P><BLOCKQUOTE class="code"><CODE><SPAN class="reference">\switch +gump</SPAN></CODE></BLOCKQUOTE><P><A name="label200"></A> </P><P> Then the file to translate is simply fed into the compiler. Suppose you saved the example specification in the file <CODE>LambdaParser.ozg</CODE>; feed: </P><BLOCKQUOTE class="code"><CODE><SPAN class="reference">\insert LambdaParser.ozg</SPAN></CODE></BLOCKQUOTE><P> The extension <CODE>.ozg</CODE><A name="label201"></A> indicating, as before, an Oz file with embedded Gump specifications. </P><DIV class="apropos"><P class="margin">Output Files</P><P> <A name="label202"></A> Two files are generated from the <CODE><SPAN class="keyword">parser</SPAN></CODE> definition: <CODE>LambdaParser.simplified</CODE> contains a simplified version of the syntax rules where the EBNF constructs have been expanded to equivalent BNF forms (because the <SPAN class="ignore"><CODE><SPAN class="reference">\switch +</SPAN></CODE></SPAN><CODE><SPAN class="reference">gumpparseroutputsimplified</SPAN></CODE><A name="label203"></A> switch was set), whereas the file <CODE>LambdaParser.output</CODE> contains the output from the <A name="label204"></A><SPAN class="index"><SPAN class="tool">bison</SPAN></SPAN> parse table generator (because the <SPAN class="ignore"><CODE><SPAN class="reference">\switch +</SPAN></CODE></SPAN><CODE><SPAN class="reference">gumpparserverbose</SPAN></CODE><A name="label205"></A> switch was set). These names are generated from the parser specification's name. </P></DIV><H3><A name="label206">3.1.3 Using the Generated Parser</A></H3><P> <A href="node6.html#program.parser.test">Program 3.2</A> shows an example Oz program that uses both the generated scanner from the last chapter and the generated parser from above. </P><DIV class="apropos"><P class="margin">Initialization</P><P> First, the scanner and parser classes are loaded. After instantiating and initializing the scanner, a parser object is created. This needs as initializer a single parameter, a scanner. This is, technically speaking, a unary procedure that understands the messages <CODE>putToken</CODE><A name="label208"></A> and <CODE>getToken</CODE><A name="label210"></A> described in <A href="node7.html#section.parser.class">Section 3.2.3</A>. </P></DIV><DIV class="apropos"><P class="margin">Initiating a Parse</P><P> The most interesting message sent to the parser is the <CODE>parse</CODE><A name="label212"></A> message. The first argument has to be a tuple. The label specifies the start symbol to use, the features correspond to the actual parameters of the start symbol. In this case, the actual parameter variables <CODE>Definitions</CODE> and <CODE>Terms</CODE> are bound to lists of definitions and terms, respectively. The second argument to the <CODE>parse</CODE> message is the result status. This is either unified with <CODE><SPAN class="keyword">true</SPAN></CODE> if parsing was successful or with <CODE><SPAN class="keyword">false</SPAN></CODE> otherwise. </P></DIV></DIV><DIV class="program" id="program.parser.test"><HR><P><A name="program.parser.test"></A></P><P> </P><BLOCKQUOTE><PRE><SPAN class="reference">\switch +gump</SPAN> <BR><SPAN class="reference">\insert gump/examples/LambdaScanner.ozg</SPAN> <BR><SPAN class="reference">\insert gump/examples/LambdaParser.ozg</SPAN> <BR> <BR><SPAN class="keyword">local</SPAN> <BR> MyScanner = {New LambdaScanner init()}<BR> MyParser = {New LambdaParser init(MyScanner)}<BR> Definitions Terms Status<BR><SPAN class="keyword">in</SPAN> <BR> {MyScanner scanFile(<SPAN class="string">'Lambda.in'</SPAN>)}<BR> {MyParser parse(program(?Definitions ?Terms) ?Status)}<BR> {MyScanner close()}<BR> <SPAN class="keyword">if</SPAN> Status <SPAN class="keyword">then</SPAN> <BR> {Browse Definitions}<BR> {Browse Terms}<BR> {System<SPAN class="keyword">.</SPAN>showInfo <SPAN class="string">'accepted'</SPAN>}<BR> <SPAN class="keyword">else</SPAN> <BR> {System<SPAN class="keyword">.</SPAN>showInfo <SPAN class="string">'rejected'</SPAN>}<BR> <SPAN class="keyword">end</SPAN> <BR><SPAN class="keyword">end</SPAN> <BR></PRE></BLOCKQUOTE><P> </P><P class="caption"><STRONG>Program 3.2:</STRONG> A program making use of the generated parser.</P><HR></DIV><TABLE align="center" border="0" cellpadding="6" cellspacing="6" class="nav"><TR bgcolor="#DDDDDD"><TD><A href="node5.html">- Up -</A></TD><TD><A href="node7.html#section.parser.reference">Next >></A></TD></TR></TABLE><HR><ADDRESS><A href="http://www.ps.uni-sb.de/~kornstae/">Leif Kornstaedt</A><BR><SPAN class="version">Version 1.4.0 (20110908185330)</SPAN></ADDRESS></BODY></HTML>
|