/usr/share/mozart/doc/gump/node3.html is in mozart-doc 1.4.0-8ubuntu1.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>2.1 Example</TITLE><LINK href="ozdoc.css" rel="stylesheet" type="text/css"></HEAD><BODY><TABLE align="center" border="0" cellpadding="6" cellspacing="6" class="nav"><TR bgcolor="#DDDDDD"><TD><A href="node2.html">- Up -</A></TD><TD><A href="node4.html#section.scanner.reference">Next &gt;&gt;</A></TD></TR></TABLE><DIV id="section.scanner.example"><H2><A name="section.scanner.example">2.1 Example</A></H2><P> As a running example, we will specify throughout the manual a front-end for a compiler or an interpreter for a small functional language <SPAN class="language">Lambda</SPAN><A name="label18"></A>. In this section we define the scanner for this language; in <A href="node6.html#section.parser.example">Section 3.1</A> we build a parser on top of this scanner. </P><H3><A name="label19">2.1.1 Writing a Scanner Specification</A></H3><P> <A href="node3.html#program.scanner.example">Program 2.1</A> shows the specification of the sample scanner we will consider in this section. In the following we will examine this example line by line. </P><DIV class="apropos"><P class="margin">Class Descriptors</P><P> At first glance the scanner specification closely resembles a class definition with some extra elements, introduced by the keyword <CODE><SPAN class="keyword">scanner</SPAN></CODE> instead of <CODE><SPAN class="keyword">class</SPAN></CODE>. This is intentional, since the specification will ultimately be replaced by a class. This is why all descriptors allowed in a class definition are also allowed at the beginning of a scanner specification. Consider the <CODE><SPAN class="keyword">from</SPAN></CODE>, <CODE><SPAN class="keyword">attr</SPAN></CODE> and <CODE><SPAN class="keyword">meth</SPAN></CODE> constructs used in lines 2 to 10. </P></DIV><DIV class="apropos"><P class="margin">Lexical Abbreviations</P><P> The scanner-specific declarations begin at line 12. 
Two kinds of definition can be introduced by the keyword <CODE><SPAN class="keyword">lex</SPAN></CODE>: either a <A name="label20"></A><EM>lexical abbreviation</EM>, as seen in lines 12 to 15, or a <A name="label21"></A><EM>lexical rule</EM>, as found from line 17 to the end of the specification. A lexical abbreviation <CODE><SPAN class="keyword">lex</SPAN> </CODE><I>I</I><CODE> = <SPAN class="string">&lt;</SPAN></CODE><I>R</I><CODE><SPAN class="string">&gt;</SPAN> <SPAN class="keyword">end</SPAN></CODE> associates an identifier <I>I</I> with a given regular expression <I>R</I>. Occurrences of <CODE>{</CODE><I>I</I><CODE>}</CODE> in other regular expressions are then replaced by <CODE>(</CODE><I>R</I><CODE>)</CODE>. </P></DIV><P> <A name="label22"></A> Note that regular expressions use the same syntax as in <A name="label23"></A><SPAN class="index"><SPAN class="tool">flex</SPAN></SPAN> <A href="bib.html#paxson95">[Pax95]</A>, with a few exceptions (detailed in <A href="node4.html#section.scanner.syntax">Section 2.2.1</A>). Furthermore, we must either enclose them in angle brackets or give them as Oz strings. (The latter proves useful when the angle-bracketed version confuses <A name="label24"></A><SPAN class="index">Emacs</SPAN>' fontification<A name="label25"></A> mode, but is a bit harder to read, since more characters must be escaped.) </P><P> The example defines four lexical abbreviations: <CODE>digit</CODE> stands for a decimal digit, <CODE>letter</CODE> for an uppercase or lowercase letter; <CODE>id</CODE> defines the syntax of identifiers to consist of a letter, followed by an arbitrary sequence of letters and digits; and finally, <CODE>int</CODE> defines the syntax of unsigned decimal integers as a nonempty sequence of digits. 
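</P><P> As a minimal sketch of the second notation, the four abbreviations could also be written as Oz strings instead of in angle brackets. The fragment below is an illustration only and is not part of the original example; it assumes that the string form is accepted in abbreviations just as in rules, and it is meant to stand in for lines 12 to 15 of <A href="node3.html#program.scanner.example">Program 2.1</A>. The comments spell out the result of the substitution of <CODE>{letter}</CODE> and <CODE>{digit}</CODE> described above. </P><BLOCKQUOTE><PRE>% Lexical abbreviations written as Oz strings (illustrative variant).
% None of these expressions contains a character that needs escaping.
lex digit  = "[0-9]" end
lex letter = "[A-Za-z]" end
% After substitution, id denotes ([A-Za-z])(([A-Za-z])|([0-9]))*
lex id     = "{letter}({letter}|{digit})*" end
% After substitution, int denotes ([0-9])+
lex int    = "{digit}+" end
</PRE></BLOCKQUOTE><P> 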
</P><DIV class="apropos"><P class="margin">Lexical Rules</P><P> Lexical rules of the form <CODE><SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;</SPAN></CODE><I>R</I><CODE><SPAN class="string">&gt;</SPAN> </CODE><I>S</I><CODE> <SPAN class="keyword">end</SPAN></CODE> are more interesting, since together they make up the actual scanner specification. When a prefix of the input character stream matches the regular expression <CODE>R</CODE>, the statement <CODE>S</CODE> is executed as a method body (i.e., <CODE><SPAN class="keyword">self</SPAN></CODE> may be accessed and modified). Two methods are provided by the mixin class <A name="label26"></A><SPAN class="index"><CODE>GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN></CODE></SPAN> (from which the scanner inherits in line 2) to append tokens to the token stream: <CODE>putToken1</CODE><A name="label28"></A>, which appends a token of a given class without a value (<CODE><SPAN class="keyword">unit</SPAN></CODE> being used instead), and <CODE>putToken</CODE><A name="label30"></A>, which allows a specific token value to be provided. Token classes may be represented by arbitrary Oz values, but the parser generator in <A href="node5.html#chapter.parser">Chapter 3</A> expects them to be atoms. In lines 18 and 21 you can see how constants are used as token classes. In line 33 the token class is computed from the lexeme. </P></DIV><DIV class="apropos"><P class="margin">Accessing the Lexeme</P><P> The lexeme itself may be accessed in several ways. The method <CODE>getAtom</CODE><A name="label32"></A> returns the lexeme as an atom, which is the representation for identifier token values chosen in line 25. The method <CODE>getString</CODE><A name="label34"></A> returns the lexeme as a string, as in line 28, where it is subsequently converted to an integer. </P></DIV><P> The remaining lexical rules are easily explained. 
Lines 36 and 37 respectively describe how <A name="label35"></A><SPAN class="index">whitespace</SPAN> and <A name="label36"></A><SPAN class="index">comments</SPAN> are to be ignored. This is done by calling neither <CODE>putToken1</CODE> nor <CODE>putToken</CODE>. (Note that an action can also invoke them several times to append multiple tokens to the token stream, just as it may choose not to invoke them at all to simply ignore the lexeme or only produce <A name="label37"></A><SPAN class="index">side effects</SPAN>.) The rule in line 38 ignores any matched newlines, but updates the line counter<A name="label38"></A> attribute <CODE>LineNumber</CODE> as it does so. The rule in line 41 reports any remaining <A name="label39"></A><SPAN class="index">unmatched characters</SPAN> in the input as <A name="label40"></A><SPAN class="index">lexical errors</SPAN><A name="label41"></A> and returns the token <A name="label42"></A><SPAN class="index"><CODE><SPAN class="string">'error'</SPAN></CODE></SPAN><A name="label43"></A><A name="label44"></A>, which the parser can recognize as an erroneous token. </P><DIV class="apropos"><P class="margin">End-of-File Rules</P><P> The final rule, in line 46, has the special syntax <A name="label45"></A><SPAN class="index"><CODE><SPAN class="keyword">&lt;&lt;</SPAN>EOF<SPAN class="keyword">&gt;&gt;</SPAN></CODE></SPAN><A name="label46"></A><A name="label47"></A> (it might also have been written as <CODE><SPAN class="string">"&lt;&lt;EOF&gt;&gt;"</SPAN></CODE>) and only matches the end of the character stream. It returns the token <A name="label48"></A><SPAN class="index"><CODE><SPAN class="string">'EOF'</SPAN></CODE></SPAN>, which can be recognized by the parser as the end of input. Note that the action might just as well open another file<A name="label49"></A> to read from. 
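</P><P> As a hedged sketch of such an action, the end-of-file rule below continues with further files before it reports the end of input. It is an illustration under stated assumptions, not part of the original example: the attribute <CODE>Pending</CODE> and the file name <CODE>'More.in'</CODE> are hypothetical, and the sketch assumes that <CODE>scanFile</CODE> (the method of <CODE>GumpScanner.'class'</CODE> introduced in Section 2.1.3) may be called from within an action. </P><BLOCKQUOTE><PRE>% Hypothetical variant of the rule in line 46. It also requires
%    attr Pending
% in the class descriptors and the initialization
%    Pending &lt;- ['More.in']
% in the init() method.
lex &lt;&lt;EOF&gt;&gt;
   case @Pending of File|Rest then
      % More input is queued: remember the rest, scan the next file.
      Pending &lt;- Rest
      GumpScanner.'class', scanFile(File)
   [] nil then
      % Nothing is left to read: report the end of input.
      GumpScanner.'class', putToken1('EOF')
   end
end
</PRE></BLOCKQUOTE><P> Keeping the queue in an attribute means that the rule only emits <CODE>'EOF'</CODE> once every queued file has been scanned. <P> 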
</P></DIV><P> More information about acceptable sets of regular expressions in scanner specifications, conflict resolution and grouping into lexical modes is given in <A href="node4.html#section.scanner.syntax">Section 2.2.1</A>. </P><DIV class="program" id="program.scanner.example"><HR><P><A name="program.scanner.example"></A></P><P> </P><BLOCKQUOTE class="linenumbers"><PRE><SPAN class="keyword">declare</SPAN> <BR><SPAN class="keyword">scanner</SPAN> <SPAN class="type">LambdaScanner</SPAN> <SPAN class="keyword">from</SPAN><SPAN class="type"> GumpScanner.</SPAN><SPAN class="string">'class'</SPAN> <BR> <SPAN class="keyword">attr</SPAN> LineNumber<BR> <SPAN class="keyword">meth</SPAN> <SPAN class="functionname">init</SPAN>()<BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> init()<BR> LineNumber <SPAN class="keyword">&lt;-</SPAN> 1<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">meth</SPAN> <SPAN class="functionname">getLineNumber</SPAN>($)<BR> <SPAN class="keyword">@</SPAN>LineNumber<BR> <SPAN class="keyword">end</SPAN> <BR> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="type">digit</SPAN> = <SPAN class="string">&lt;[0-9]&gt;</SPAN> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="type">letter</SPAN> = <SPAN class="string">&lt;[A-Za-z]&gt;</SPAN> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="type">id</SPAN> = <SPAN class="string">&lt;{letter}({letter}|{digit})*&gt;</SPAN> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="type">int</SPAN> = <SPAN class="string">&lt;{digit}+&gt;</SPAN> <SPAN class="keyword">end</SPAN> <BR> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;define&gt;</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken1(<SPAN class="string">'define'</SPAN>)<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN 
class="keyword">lex</SPAN> <SPAN class="string">&lt;lambda&gt;</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken1(<SPAN class="string">'lambda'</SPAN>)<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;{id}&gt;</SPAN> A <SPAN class="keyword">in</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> getAtom(?A)<BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken(<SPAN class="string">'id'</SPAN> A)<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;{int}&gt;</SPAN> S <SPAN class="keyword">in</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> getString(?S)<BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken(<SPAN class="string">'int'</SPAN> {String<SPAN class="keyword">.</SPAN>toInt S})<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="keyword">&lt;</SPAN><SPAN class="string">"."</SPAN><SPAN class="keyword">|</SPAN><SPAN class="string">"("</SPAN><SPAN class="keyword">|</SPAN><SPAN class="string">")"</SPAN><SPAN class="keyword">|</SPAN><SPAN class="string">"="</SPAN><SPAN class="keyword">|</SPAN><SPAN class="string">";"</SPAN><SPAN class="keyword">&gt;</SPAN> A <SPAN class="keyword">in</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> getAtom(?A)<BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken1(A)<BR> <SPAN class="keyword">end</SPAN> <BR> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;[ \t]&gt;</SPAN> <SPAN class="keyword">skip</SPAN> <SPAN class="keyword">end</SPAN> 
<BR> <SPAN class="keyword">lex</SPAN> <SPAN class="keyword">&lt;</SPAN><SPAN class="string">"%"</SPAN><SPAN class="keyword">.*&gt;</SPAN> <SPAN class="keyword">skip</SPAN> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;\n&gt;</SPAN> <BR> LineNumber <SPAN class="keyword">&lt;-</SPAN> <SPAN class="keyword">@</SPAN>LineNumber <SPAN class="keyword">+</SPAN> 1<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;.&gt;</SPAN> <BR> {System<SPAN class="keyword">.</SPAN>showInfo <SPAN class="string">'line '</SPAN><SPAN class="keyword">#@</SPAN>LineNumber<SPAN class="keyword">#</SPAN><SPAN class="string">': unrecognized character'</SPAN>}<BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken1(<SPAN class="string">'error'</SPAN>)<BR> <SPAN class="keyword">end</SPAN> <BR> <BR> <SPAN class="keyword">lex</SPAN> <SPAN class="string">&lt;&lt;EOF&gt;&gt;</SPAN> <BR> GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN><SPAN class="keyword">,</SPAN> putToken1(<SPAN class="string">'EOF'</SPAN>)<BR> <SPAN class="keyword">end</SPAN> <BR><SPAN class="keyword">end</SPAN> <BR></PRE></BLOCKQUOTE><P> </P><P class="caption"><STRONG>Program 2.1:</STRONG> The <CODE>LambdaScanner</CODE> scanner specification.</P><HR></DIV><H3><A name="label50">2.1.2 Invoking Gump</A></H3><P> Now that we have finished writing our specification, we want to translate it into an Oz class definition that implements our scanner. For this, we issue the compiler directive </P><BLOCKQUOTE class="code"><CODE><SPAN class="reference">\switch +gump</SPAN></CODE></BLOCKQUOTE><P> <A name="label51"></A> whereupon the compiler<A name="label52"></A> will accept Gump specifications. </P><DIV class="apropos"><P class="margin">Running Gump</P><P> Save the above specification in a file <CODE>LambdaScanner.ozg</CODE>. 
The extension <CODE>.ozg</CODE><A name="label53"></A> indicates that this file contains Oz code with additional Gump definitions, so that <A name="label54"></A><SPAN class="index">Emacs</SPAN> will fontify<A name="label55"></A> Gump definitions correctly. Feeding </P><BLOCKQUOTE class="code"><CODE><SPAN class="reference">\insert LambdaScanner.ozg</SPAN></CODE></BLOCKQUOTE><P> will process this file. Switch to the Compiler buffer (via <KBD>C-c C-c</KBD>) to watch Gump's status messages and any errors occurring during the translation. </P></DIV><DIV class="apropos"><P class="margin">Output Files</P><P> <A name="label56"></A> When the translation is finished, you will notice several new files in the current working directory. These will be named after your <CODE><SPAN class="keyword">scanner</SPAN></CODE> specification. If your scanner is called <CODE>S</CODE>, you will find the files <CODE>S.l</CODE>, <CODE>S.C</CODE>, <CODE>S.o</CODE> and <CODE>S.so</CODE>. The first three are intermediate results (respectively the input file for <A name="label57"></A><SPAN class="index"><SPAN class="tool">flex</SPAN></SPAN>, the <SPAN class="tool">flex</SPAN>-generated <A name="label58"></A><SPAN class="index">C++</SPAN> file and the object code produced by the C++ compiler) and the last one is the resulting <A name="label59"></A><SPAN class="index">dynamic library</SPAN><A name="label60"></A><A name="label61"></A> used by the generated scanner. </P></DIV><DIV class="apropos"><P class="margin">Implementation Limitation</P><P> <A name="label62"></A> Note that due to limitations of dynamic linking, a scanner may only be loaded once into the system. When interactively developing a scanner, this means that changes you make to the set and order of the regular expressions will not take effect reliably. You should thus halt and restart Mozart each time you make changes to the regular expressions. 
</P></DIV><P> See also <A href="node4.html#section.scanner.params">Section 2.2.2</A> for a workaround for this limitation. </P><H3><A name="label63">2.1.3 Using the Generated Scanner</A></H3><P> <A href="node3.html#program.scanner.test">Program 2.2</A> shows a sample program running our generated scanner. </P><P> The generated <CODE>LambdaScanner</CODE> class is instantiated as <CODE>MyScanner</CODE>. We have to call the method <CODE>init()</CODE> first to initialize the internal structures of <A name="label64"></A><SPAN class="index"><CODE>GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN></CODE></SPAN>. </P><DIV class="apropos"><P class="margin">Requesting Tokens</P><P> The procedure <A name="label65"></A><SPAN class="index"><CODE>GetTokens</CODE></SPAN> repeatedly invokes the <CODE>GumpScanner<SPAN class="keyword">.</SPAN><SPAN class="string">'class'</SPAN></CODE> method<A name="label67"></A> </P><BLOCKQUOTE class="code"><CODE>getToken(</CODE><CODE>?<I>X</I></CODE><CODE> </CODE><CODE>?<I>Y</I></CODE><CODE>)</CODE></BLOCKQUOTE><P> which returns the next token's token class in <CODE>X</CODE> and token value in <CODE>Y</CODE> and removes it from the token stream. <CODE>GetTokens</CODE> exits when the end of the token stream is reached, which is recognized by the token class <A name="label68"></A><SPAN class="index"><CODE><SPAN class="string">'EOF'</SPAN></CODE></SPAN>. </P></DIV><DIV class="apropos"><P class="margin">Providing Inputs</P><P> To actually start scanning we have to provide an input character stream. 
This is done via one of the methods<A name="label70"></A><A name="label71"></A> </P><BLOCKQUOTE class="code"><CODE>scanFile(</CODE><CODE>+<I>FileName</I></CODE><CODE>)</CODE></BLOCKQUOTE><P> or<A name="label73"></A><A name="label74"></A> </P><BLOCKQUOTE class="code"><CODE>scanVirtualString(</CODE><CODE>+<I>V</I></CODE><CODE>)</CODE></BLOCKQUOTE><P> Each of these pushes the currently used buffer (if any) onto an internal stack of buffers<A name="label75"></A> and builds a new buffer from the given source. Each time the end of a buffer is reached, the <A name="label76"></A><SPAN class="index"><CODE><SPAN class="keyword">&lt;&lt;</SPAN>EOF<SPAN class="keyword">&gt;&gt;</SPAN></CODE></SPAN> rule is matched. Its action may pop a buffer and continue scanning the next-outer buffer where it left off, using the <CODE>closeBuffer</CODE><A name="label78"></A> method described in <A href="node4.html#section.scanner.class">Section 2.2.3</A>. </P></DIV><DIV class="apropos"><P class="margin">Closing Scanners</P><P> When a scanner is no longer needed, it should be sent the message<A name="label80"></A> </P><BLOCKQUOTE class="code"><CODE>close()</CODE></BLOCKQUOTE><P> so that it can close any open files and release any allocated buffers. (This is necessary even when scanning virtual strings, due to the underlying implementation in <A name="label81"></A><SPAN class="index">C++</SPAN>.) </P></DIV><P> The following is a sample input for the scanner. 
The above example expects this to be placed in the file <CODE>Lambda.in</CODE> in the current directory: </P><BLOCKQUOTE><PRE><SPAN class="comment">% some input to test the class LambdaScanner<BR></SPAN><SPAN class="keyword">define</SPAN> <SPAN class="functionname">f</SPAN> = <SPAN class="keyword">lambda</SPAN> <SPAN class="variablename">y</SPAN>.<SPAN class="keyword">lambda</SPAN> <SPAN class="variablename">z</SPAN>.(<SPAN class="variablename">add</SPAN> <SPAN class="variablename">y</SPAN> <SPAN class="variablename">z</SPAN>);<BR><SPAN class="keyword">define</SPAN> <SPAN class="functionname">c</SPAN> = <SPAN class="reference">17</SPAN>;<BR><SPAN class="variablename">f</SPAN> <SPAN class="variablename">c</SPAN> <SPAN class="reference">7</SPAN>;<BR>((<SPAN class="variablename">f</SPAN>) <SPAN class="variablename">c</SPAN>) <SPAN class="reference">7</SPAN> <BR></PRE></BLOCKQUOTE><P> </P></DIV><DIV class="program" id="program.scanner.test"><HR><P><A name="program.scanner.test"></A></P><P> </P><BLOCKQUOTE><PRE><SPAN class="reference">\switch +gump</SPAN> <BR><SPAN class="reference">\insert gump/examples/LambdaScanner.ozg</SPAN> <BR> <BR><SPAN class="keyword">local</SPAN> <BR> MyScanner = {New LambdaScanner init()}<BR> <SPAN class="keyword">proc</SPAN><SPAN class="variablename"> </SPAN>{<SPAN class="functionname">GetTokens</SPAN>} T V <SPAN class="keyword">in</SPAN> <BR> {MyScanner getToken(?T ?V)}<BR> <SPAN class="keyword">case</SPAN> T <SPAN class="keyword">of</SPAN> <SPAN class="string">'EOF'</SPAN> <SPAN class="keyword">then</SPAN> <BR> {System<SPAN class="keyword">.</SPAN>showInfo <SPAN class="string">'End of file reached.'</SPAN>}<BR> <SPAN class="keyword">else</SPAN> <BR> {System<SPAN class="keyword">.</SPAN>show T<SPAN class="keyword">#</SPAN>V}<BR> {GetTokens}<BR> <SPAN class="keyword">end</SPAN> <BR> <SPAN class="keyword">end</SPAN> <BR><SPAN class="keyword">in</SPAN> <BR> {MyScanner scanFile(<SPAN class="string">'Lambda.in'</SPAN>)}<BR> {GetTokens}<BR> 
 {MyScanner close()}<BR><SPAN class="keyword">end</SPAN> <BR></PRE></BLOCKQUOTE><P> </P><P class="caption"><STRONG>Program 2.2:</STRONG> A program making use of the generated scanner.</P><HR></DIV><TABLE align="center" border="0" cellpadding="6" cellspacing="6" class="nav"><TR bgcolor="#DDDDDD"><TD><A href="node2.html">- Up -</A></TD><TD><A href="node4.html#section.scanner.reference">Next &gt;&gt;</A></TD></TR></TABLE><HR><ADDRESS><A href="http://www.ps.uni-sb.de/~kornstae/">Leif Kornstaedt</A><BR><SPAN class="version">Version 1.4.0 (20110908185330)</SPAN></ADDRESS></BODY></HTML>