/usr/share/doc/m4-doc/Input-processing.html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- 
This manual (31 December 2016) is for GNU M4 (version
1.4.18), a package containing an implementation of the m4 macro
language.

Copyright (C) 1989-1994, 2004-2014, 2016 Free Software
Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and no
Back-Cover Texts.  A copy of the license is included in the section
entitled "GNU Free Documentation License." -->
<!-- Created by GNU Texinfo 6.3, http://www.gnu.org/software/texinfo/ -->
<head>
<title>GNU M4 1.4.18 macro processor: Input processing</title>

<meta name="description" content="GNU M4 1.4.18 macro processor: Input processing">
<meta name="keywords" content="GNU M4 1.4.18 macro processor: Input processing">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Indices.html#Indices" rel="index" title="Indices">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Syntax.html#Syntax" rel="up" title="Syntax">
<link href="Macros.html#Macros" rel="next" title="Macros">
<link href="Other-tokens.html#Other-tokens" rel="prev" title="Other tokens">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en">
<a name="Input-processing"></a>
<div class="header">
<p>
Previous: <a href="Other-tokens.html#Other-tokens" accesskey="p" rel="prev">Other tokens</a>, Up: <a href="Syntax.html#Syntax" accesskey="u" rel="up">Syntax</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="How-m4-copies-input-to-output"></a>
<h3 class="section">3.5 How <code>m4</code> copies input to output</h3>

<p>As <code>m4</code> reads the input token by token, it will copy each token
directly to the output immediately.
</p>
<p>The exception is when it finds a word with a macro definition.  In that
case <code>m4</code> will calculate the macro&rsquo;s expansion, possibly reading
more input to get the arguments.  It then inserts the expansion in front
of the remaining input.  In other words, the resulting text from a macro
call will be read and parsed into tokens again.
</p>
<p><code>m4</code> expands a macro as soon as possible.  If it finds a macro call
when collecting the arguments to another, it will expand the second call
first.  This process continues until there are no more macro calls to
expand and all the input has been consumed.
</p>
<p>For a running example, examine how <code>m4</code> handles this input:
</p>
<div class="example">
<pre class="example">format(`Result is %d', eval(`2**15'))
</pre></div>

<p>First, <code>m4</code> sees that the token &lsquo;<samp>format</samp>&rsquo; is a macro name, so
it collects the tokens &lsquo;<samp>(</samp>&rsquo;, &lsquo;<samp>`Result is %d'</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;,
and &lsquo;<samp>&nbsp;<!-- /@w --></samp>&rsquo;, before encountering another potential macro.  Sure
enough, &lsquo;<samp>eval</samp>&rsquo; is a macro name, so the nested argument collection
picks up &lsquo;<samp>(</samp>&rsquo;, &lsquo;<samp>`2**15'</samp>&rsquo;, and &lsquo;<samp>)</samp>&rsquo;, invoking the eval macro
with the lone argument of &lsquo;<samp>2**15</samp>&rsquo;.  The expansion of
&lsquo;<samp>eval(2**15)</samp>&rsquo; is &lsquo;<samp>32768</samp>&rsquo;, which is then rescanned as the five
tokens &lsquo;<samp>3</samp>&rsquo;, &lsquo;<samp>2</samp>&rsquo;, &lsquo;<samp>7</samp>&rsquo;, &lsquo;<samp>6</samp>&rsquo;, and &lsquo;<samp>8</samp>&rsquo;; and
combined with the next &lsquo;<samp>)</samp>&rsquo;, the format macro now has all its
arguments, as if the user had typed:
</p>
<div class="example">
<pre class="example">format(`Result is %d', 32768)
</pre></div>

<p>The format macro expands to &lsquo;<samp>Result is 32768</samp>&rsquo;, and we have another
round of scanning for the tokens &lsquo;<samp>Result</samp>&rsquo;, &lsquo;<samp>&nbsp;<!-- /@w --></samp>&rsquo;,
&lsquo;<samp>is</samp>&rsquo;, &lsquo;<samp>&nbsp;<!-- /@w --></samp>&rsquo;, &lsquo;<samp>3</samp>&rsquo;, &lsquo;<samp>2</samp>&rsquo;, &lsquo;<samp>7</samp>&rsquo;, &lsquo;<samp>6</samp>&rsquo;, and
&lsquo;<samp>8</samp>&rsquo;.  None of these are macros, so the final output is
</p>
<div class="example">
<pre class="example">&rArr;Result is 32768
</pre></div>

<p>As a more complicated example, we will contrast an actual code
example from the Gnulib project<a name="DOCF1" href="#FOOT1"><sup>1</sup></a>,
showing both a buggy approach and the desired results.  The user desires
to output a shell assignment statement that takes its argument and turns
it into a shell variable by converting it to uppercase and prepending a
prefix.  The original attempt looks like this:
</p>
<div class="example">
<pre class="example">changequote([,])dnl
define([gl_STRING_MODULE_INDICATOR],
  [
    dnl comment
    GNULIB_]translit([$1],[a-z],[A-Z])[=1
  ])dnl
  gl_STRING_MODULE_INDICATOR([strcase])
&rArr;  <!-- /@w -->
&rArr;        GNULIB_strcase=1
&rArr;  <!-- /@w -->
</pre></div>

<p>Oops &ndash; the argument did not get capitalized.  And although the manual
is not able to easily show it, both lines that appear empty actually
contain two trailing spaces.  By stepping through the parse, it is easy
to see what happened.  First, <code>m4</code> sees the token
&lsquo;<samp>changequote</samp>&rsquo;, which it recognizes as a macro, followed by
&lsquo;<samp>(</samp>&rsquo;, &lsquo;<samp>[</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;, &lsquo;<samp>]</samp>&rsquo;, and &lsquo;<samp>)</samp>&rsquo; to form the
argument list.  The macro expands to the empty string, but changes the
quoting characters to something more useful for generating shell code
(unbalanced &lsquo;<samp>`</samp>&rsquo; and &lsquo;<samp>'</samp>&rsquo; appear all the time in shell scripts,
but unbalanced &lsquo;<samp>[]</samp>&rsquo; tend to be rare).  Also in the first line,
<code>m4</code> sees the token &lsquo;<samp>dnl</samp>&rsquo;, which it recognizes as a builtin
macro that consumes the rest of the line, resulting in no output for
that line.
</p>
<p>The second line starts a macro definition.  <code>m4</code> sees the token
&lsquo;<samp>define</samp>&rsquo;, which it recognizes as a macro, followed by a &lsquo;<samp>(</samp>&rsquo;,
&lsquo;<samp>[gl_STRING_MODULE_INDICATOR]</samp>&rsquo;, and &lsquo;<samp>,</samp>&rsquo;.  Because an unquoted
comma was encountered, the first argument is known to be the expansion
of the single-quoted string token, or &lsquo;<samp>gl_STRING_MODULE_INDICATOR</samp>&rsquo;.
Next, <code>m4</code> sees &lsquo;<samp><span class="key">NL</span></samp>&rsquo;, &lsquo;<samp> </samp>&rsquo;, and &lsquo;<samp> </samp>&rsquo;, but this
whitespace is discarded as part of argument collection.  Then comes a
rather lengthy single-quoted string token, &lsquo;<samp>[<span class="key">NL</span>&nbsp;&nbsp;&nbsp;&nbsp;dnl
comment<span class="key">NL</span>&nbsp;&nbsp;&nbsp;&nbsp;GNULIB_]</samp>&rsquo;.  This is followed by the token
&lsquo;<samp>translit</samp>&rsquo;, which <code>m4</code> recognizes as a macro name, so a nested
macro expansion has started.
</p>
<p>The arguments to the <code>translit</code> are found by the tokens &lsquo;<samp>(</samp>&rsquo;,
&lsquo;<samp>[$1]</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;, &lsquo;<samp>[a-z]</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;, &lsquo;<samp>[A-Z]</samp>&rsquo;, and finally
&lsquo;<samp>)</samp>&rsquo;.  All three string arguments are expanded (or in other words,
the quotes are stripped), and since neither &lsquo;<samp>$</samp>&rsquo; nor &lsquo;<samp>1</samp>&rsquo; need
capitalization, the result of the macro is &lsquo;<samp>$1</samp>&rsquo;.  This expansion is
rescanned, resulting in the two literal characters &lsquo;<samp>$</samp>&rsquo; and
&lsquo;<samp>1</samp>&rsquo;.
</p>
<p>Scanning of the outer macro resumes, and picks up with
&lsquo;<samp>[=1<span class="key">NL</span>&nbsp;&nbsp;]</samp>&rsquo;, and finally &lsquo;<samp>)</samp>&rsquo;.  The collected pieces of
expanded text are concatenated, with the end result that the macro
&lsquo;<samp>gl_STRING_MODULE_INDICATOR</samp>&rsquo; is now defined to be the sequence
&lsquo;<samp><span class="key">NL</span>&nbsp;&nbsp;&nbsp;&nbsp;dnl comment<span class="key">NL</span>&nbsp;&nbsp;&nbsp;&nbsp;GNULIB_$1=1<span class="key">NL</span>&nbsp;&nbsp;</samp>&rsquo;.
Once again, &lsquo;<samp>dnl</samp>&rsquo; is recognized and avoids a newline in the output.
</p>
<p>The final line is then parsed, beginning with &lsquo;<samp> </samp>&rsquo; and &lsquo;<samp> </samp>&rsquo;
that are output literally.  Then &lsquo;<samp>gl_STRING_MODULE_INDICATOR</samp>&rsquo; is
recognized as a macro name, with an argument list of &lsquo;<samp>(</samp>&rsquo;,
&lsquo;<samp>[strcase]</samp>&rsquo;, and &lsquo;<samp>)</samp>&rsquo;.  Since the definition of the macro
contains the sequence &lsquo;<samp>$1</samp>&rsquo;, that sequence is replaced with the
argument &lsquo;<samp>strcase</samp>&rsquo; prior to starting the rescan.  The rescan sees
&lsquo;<samp><span class="key">NL</span></samp>&rsquo; and four spaces, which are output literally, then
&lsquo;<samp>dnl</samp>&rsquo;, which discards the text &lsquo;<samp> comment<span class="key">NL</span></samp>&rsquo;.  Next
comes four more spaces, also output literally, and the token
&lsquo;<samp>GNULIB_strcase</samp>&rsquo;, which resulted from the earlier parameter
substitution.  Since that is not a macro name, it is output literally,
followed by the literal tokens &lsquo;<samp>=</samp>&rsquo;, &lsquo;<samp>1</samp>&rsquo;, &lsquo;<samp><span class="key">NL</span></samp>&rsquo;, and
two more spaces.  Finally, the original &lsquo;<samp><span class="key">NL</span></samp>&rsquo; seen after the
macro invocation is scanned and output literally.
</p>
<p>Now for a corrected approach.  This rearranges the use of newlines and
whitespace so that less whitespace is output (which, although harmless
to shell scripts, can be visually unappealing), and fixes the quoting
issues so that the capitalization occurs when the macro
&lsquo;<samp>gl_STRING_MODULE_INDICATOR</samp>&rsquo; is invoked, rather then when it is
defined.  It also adds another layer of quoting to the first argument of
<code>translit</code>, to ensure that the output will be rescanned as a string
rather than a potential uppercase macro name needing further expansion.
</p>
<div class="example">
<pre class="example">changequote([,])dnl
define([gl_STRING_MODULE_INDICATOR],
  [dnl comment
  GNULIB_[]translit([[$1]], [a-z], [A-Z])=1dnl
])dnl
  gl_STRING_MODULE_INDICATOR([strcase])
&rArr;    GNULIB_STRCASE=1
</pre></div>

<p>The parsing of the first line is unchanged.  The second line sees the
name of the macro to define, then sees the discarded &lsquo;<samp><span class="key">NL</span></samp>&rsquo;
and two spaces, as before.  But this time, the next token is
&lsquo;<samp>[dnl comment<span class="key">NL</span>&nbsp;&nbsp;GNULIB_[]translit([[$1]], [a-z],
[A-Z])=1dnl<span class="key">NL</span>]</samp>&rsquo;, which includes nested quotes, followed by
&lsquo;<samp>)</samp>&rsquo; to end the macro definition and &lsquo;<samp>dnl</samp>&rsquo; to skip the
newline.  No early expansion of <code>translit</code> occurs, so the entire
string becomes the definition of the macro.
</p>
<p>The final line is then parsed, beginning with two spaces that are
output literally, and an invocation of
<code>gl_STRING_MODULE_INDICATOR</code> with the argument &lsquo;<samp>strcase</samp>&rsquo;.
Again, the &lsquo;<samp>$1</samp>&rsquo; in the macro definition is substituted prior to
rescanning.  Rescanning first encounters &lsquo;<samp>dnl</samp>&rsquo;, and discards
&lsquo;<samp> comment<span class="key">NL</span></samp>&rsquo;.  Then two spaces are output literally.  Next
comes the token &lsquo;<samp>GNULIB_</samp>&rsquo;, but that is not a macro, so it is
output literally.  The token &lsquo;<samp>[]</samp>&rsquo; is an empty string, so it does
not affect output.  Then the token &lsquo;<samp>translit</samp>&rsquo; is encountered.
</p>
<p>This time, the arguments to <code>translit</code> are parsed as &lsquo;<samp>(</samp>&rsquo;,
&lsquo;<samp>[[strcase]]</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;, &lsquo;<samp> </samp>&rsquo;, &lsquo;<samp>[a-z]</samp>&rsquo;, &lsquo;<samp>,</samp>&rsquo;, &lsquo;<samp> </samp>&rsquo;,
&lsquo;<samp>[A-Z]</samp>&rsquo;, and &lsquo;<samp>)</samp>&rsquo;.  The two spaces are discarded, and the
translit results in the desired result &lsquo;<samp>[STRCASE]</samp>&rsquo;.  This is
rescanned, but since it is a string, the quotes are stripped and the
only output is a literal &lsquo;<samp>STRCASE</samp>&rsquo;.
Then the scanner sees &lsquo;<samp>=</samp>&rsquo; and &lsquo;<samp>1</samp>&rsquo;, which are output
literally, followed by &lsquo;<samp>dnl</samp>&rsquo; which discards the rest of the
definition of <code>gl_STRING_MODULE_INDICATOR</code>.  The newline at the
end of output is the literal &lsquo;<samp><span class="key">NL</span></samp>&rsquo; that appeared after the
invocation of the macro.
</p>
<p>The order in which <code>m4</code> expands the macros can be further explored
using the trace facilities of GNU <code>m4</code> (see <a href="Trace.html#Trace">Trace</a>).
</p>
<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>

<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
<p>Derived from a patch in
<a href="http://lists.gnu.org/archive/html/bug-gnulib/2007-01/msg00389.html">http://lists.gnu.org/archive/html/bug-gnulib/2007-01/msg00389.html</a>,
and a followup patch in
<a href="http://lists.gnu.org/archive/html/bug-gnulib/2007-02/msg00000.html">http://lists.gnu.org/archive/html/bug-gnulib/2007-02/msg00000.html</a></p>
</div>
<hr>
<div class="header">
<p>
Previous: <a href="Other-tokens.html#Other-tokens" accesskey="p" rel="prev">Other tokens</a>, Up: <a href="Syntax.html#Syntax" accesskey="u" rel="up">Syntax</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Indices.html#Indices" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>
m4-doc 1.4.18-1 / usr / share / doc / m4-doc / Input-processing.html