/usr/share/doc/flex-old-doc/flex_7.html is in flex-old-doc 2.5.4a-10ubuntu1.
This file is owned by root:root, with mode 0o644.
<html>
<!-- Created on February 6, 2012 by texi2html 1.82
texi2html was written by:
Lionel Cons <Lionel.Cons@cern.ch> (original author)
Karl Berry <karl@freefriends.org>
Olaf Bachmann <obachman@mathematik.uni-kl.de>
and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>Flex - a scanner generator: 7. Patterns</title>
<meta name="description" content="Flex - a scanner generator: 7. Patterns">
<meta name="keywords" content="Flex - a scanner generator: 7. Patterns">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.82">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Patterns"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="flex_6.html#Format" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="flex_8.html#Matching" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="flex_6.html#Format" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="flex_8.html#Matching" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flex_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flex_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Patterns-1"></a>
<h1 class="chapter">7. Patterns</h1>
<p>The patterns in the input are written using an extended
set of regular expressions. These are:
</p>
<dl compact="compact">
<dt> ‘<samp>x</samp>’</dt>
<dd><p>match the character ‘<samp>x</samp>’
</p></dd>
<dt> ‘<samp>.</samp>’</dt>
<dd><p>any character (byte) except newline
</p></dd>
<dt> ‘<samp>[xyz]</samp>’</dt>
<dd><p>a "character class"; in this case, the pattern
matches either an ‘<samp>x</samp>’, a ‘<samp>y</samp>’, or a ‘<samp>z</samp>’
</p></dd>
<dt> ‘<samp>[abj-oZ]</samp>’</dt>
<dd><p>a "character class" with a range in it; matches
an ‘<samp>a</samp>’, a ‘<samp>b</samp>’, any letter from ‘<samp>j</samp>’ through ‘<samp>o</samp>’,
or a ‘<samp>Z</samp>’
</p></dd>
<dt> ‘<samp>[^A-Z]</samp>’</dt>
<dd><p>a "negated character class", i.e., any character
but those in the class. In this case, any
character EXCEPT an uppercase letter.
</p></dd>
<dt> ‘<samp>[^A-Z\n]</samp>’</dt>
<dd><p>any character EXCEPT an uppercase letter or
a newline
</p></dd>
<dt> ‘<samp><var>r</var>*</samp>’</dt>
<dd><p>zero or more <var>r</var>’s, where <var>r</var> is any regular expression
</p></dd>
<dt> ‘<samp><var>r</var>+</samp>’</dt>
<dd><p>one or more <var>r</var>’s
</p></dd>
<dt> ‘<samp><var>r</var>?</samp>’</dt>
<dd><p>zero or one <var>r</var>’s (that is, "an optional <var>r</var>")
</p></dd>
<dt> ‘<samp><var>r</var>{2,5}</samp>’</dt>
<dd><p>anywhere from two to five <var>r</var>’s
</p></dd>
<dt> ‘<samp><var>r</var>{2,}</samp>’</dt>
<dd><p>two or more <var>r</var>’s
</p></dd>
<dt> ‘<samp><var>r</var>{4}</samp>’</dt>
<dd><p>exactly 4 <var>r</var>’s
</p></dd>
<dt> ‘<samp>{<var>name</var>}</samp>’</dt>
<dd><p>the expansion of the "<var>name</var>" definition
(see above)
</p></dd>
<dt> ‘<samp>"[xyz]\"foo"</samp>’</dt>
<dd><p>the literal string: ‘<samp>[xyz]"foo</samp>’
</p></dd>
<dt> ‘<samp>\<var>x</var></samp>’</dt>
<dd><p>if <var>x</var> is an ‘<samp>a</samp>’, ‘<samp>b</samp>’, ‘<samp>f</samp>’, ‘<samp>n</samp>’, ‘<samp>r</samp>’, ‘<samp>t</samp>’, or ‘<samp>v</samp>’,
then the ANSI-C interpretation of \<var>x</var>.
Otherwise, a literal ‘<samp><var>x</var></samp>’ (used to escape
operators such as ‘<samp>*</samp>’)
</p></dd>
<dt> ‘<samp>\0</samp>’</dt>
<dd><p>a NUL character (ASCII code 0)
</p></dd>
<dt> ‘<samp>\123</samp>’</dt>
<dd><p>the character with octal value 123
</p></dd>
<dt> ‘<samp>\x2a</samp>’</dt>
<dd><p>the character with hexadecimal value <code>2a</code>
</p></dd>
<dt> ‘<samp>(<var>r</var>)</samp>’</dt>
<dd><p>match an <var>r</var>; parentheses are used to override
precedence (see below)
</p></dd>
<dt> ‘<samp><var>r</var><var>s</var></samp>’</dt>
<dd><p>the regular expression <var>r</var> followed by the
regular expression <var>s</var>; called "concatenation"
</p></dd>
<dt> ‘<samp><var>r</var>|<var>s</var></samp>’</dt>
<dd><p>either an <var>r</var> or an <var>s</var>
</p></dd>
<dt> ‘<samp><var>r</var>/<var>s</var></samp>’</dt>
<dd><p>an <var>r</var> but only if it is followed by an <var>s</var>. The text
matched by <var>s</var> is included when determining whether this rule is
the <em>longest match</em>, but is then returned to the input before
the action is executed. So the action only sees the text matched
by <var>r</var>. This type of pattern is called <em>trailing context</em>.
(There are some combinations of ‘<samp><var>r</var>/<var>s</var></samp>’ that <code>flex</code>
cannot match correctly; see notes in the Deficiencies / Bugs section
below regarding "dangerous trailing context".)
</p></dd>
<dt> ‘<samp>^<var>r</var></samp>’</dt>
<dd><p>an <var>r</var>, but only at the beginning of a line (i.e.,
when just starting to scan, or right after a
newline has been scanned).
</p></dd>
<dt> ‘<samp><var>r</var>$</samp>’</dt>
<dd><p>an <var>r</var>, but only at the end of a line (i.e., just
before a newline). Equivalent to "<var>r</var>/\n".
</p>
<p>Note that flex’s notion of "newline" is exactly
whatever the C compiler used to compile flex
interprets ’\n’ as; in particular, on some DOS
systems you must either filter out \r’s in the
input yourself, or explicitly use ‘<samp><var>r</var>/\r\n</samp>’ for ‘<samp><var>r</var>$</samp>’.
</p></dd>
<dt> ‘<samp><<var>s</var>><var>r</var></samp>’</dt>
<dd><p>an <var>r</var>, but only in start condition <var>s</var> (see
below for discussion of start conditions)
</p></dd>
<dt> ‘<samp><<var>s1</var>,<var>s2</var>,<var>s3</var>><var>r</var></samp>’</dt>
<dd><p>same, but in any of start conditions <var>s1</var>,
<var>s2</var>, or <var>s3</var>
</p></dd>
<dt> ‘<samp><*><var>r</var></samp>’</dt>
<dd><p>an <var>r</var> in any start condition, even an exclusive one.
</p></dd>
<dt> ‘<samp><<EOF>></samp>’</dt>
<dd><p>an end-of-file
</p></dd>
<dt> ‘<samp><<var>s1</var>,<var>s2</var>><<EOF>></samp>’</dt>
<dd><p>an end-of-file when in start condition <var>s1</var> or <var>s2</var>
</p></dd>
</dl>
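<p>The trailing-context entry ‘<samp><var>r</var>/<var>s</var></samp>’ above can be approximated
outside flex with a lookahead. The following is a minimal sketch using Python's
<code>re</code> module as a stand-in; it is not flex, the helper name
<code>match_with_trailing_context</code> is hypothetical, and <code>re</code> backtracks
rather than applying flex's longest-match rule, so only the simple case is modelled:
<var>s</var> must follow <var>r</var>, but only the text of <var>r</var> is returned.
</p>

```python
import re

def match_with_trailing_context(r, s, text):
    """Emulate the flex pattern r/s at the start of text.

    Hypothetical helper for illustration only; not part of flex.
    Returns the text matched by r, or None when r is not followed by s.
    """
    # (?:r)(?=(?:s)) matches r only if s follows, without consuming s.
    m = re.match("(?:%s)(?=(?:%s))" % (r, s), text)
    return None if m is None else m.group(0)

print(match_with_trailing_context("foo", "bar", "foobar"))  # foo
print(match_with_trailing_context("foo", "bar", "foobaz"))  # None
# r$ is described above as equivalent to r/\n
print(match_with_trailing_context("end", r"\n", "end\n"))   # end
```

<p>In each successful case the returned text excludes the trailing context, which
mirrors what a flex action would see.
</p>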
<p>Note that inside of a character class, all regular
expression operators lose their special meaning except escape
(’\’) and the character class operators, ’-’, ’]’, and, at
the beginning of the class, ’^’.
</p>
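<p>The rule above can be checked informally with Python's <code>re</code> module,
whose bracket expressions treat these characters the same way. This is a stand-in
for illustration, not flex itself, and the variable names are hypothetical.
</p>

```python
import re

# Operators that are special outside a character class.
operators = "*+?|(){}./$"
inside_class = "[" + operators + "]"

# Inside the brackets each one matches only itself.
for ch in operators:
    assert re.fullmatch(inside_class, ch) is not None
assert re.fullmatch(inside_class, "a") is None

# The escape and the class operators keep their meaning:
assert re.fullmatch(r"[a\-z]", "-") is not None   # escaped '-' is literal
assert re.fullmatch(r"[a\-z]", "b") is None       # so this is not a range
assert re.fullmatch(r"[a^]", "^") is not None     # '^' is literal when not first
assert re.fullmatch(r"[^a]", "a") is None         # '^' negates when first
```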
<p>The regular expressions listed above are grouped according
to precedence, from highest precedence at the top to
lowest at the bottom. Those grouped together have equal
precedence. For example,
</p>
<table><tr><td> </td><td><pre class="example">foo|bar*
</pre></td></tr></table>
<p>is the same as
</p>
<table><tr><td> </td><td><pre class="example">(foo)|(ba(r*))
</pre></td></tr></table>
<p>since the ’*’ operator has higher precedence than
concatenation, and concatenation higher than alternation (’|’).
This pattern therefore matches <em>either</em> the string "foo" <em>or</em>
the string "ba" followed by zero-or-more r’s. To match
"foo" or zero-or-more "bar"’s, use:
</p>
<table><tr><td> </td><td><pre class="example">foo|(bar)*
</pre></td></tr></table>
<p>and to match zero-or-more "foo"’s-or-"bar"’s:
</p>
<table><tr><td> </td><td><pre class="example">(foo|bar)*
</pre></td></tr></table>
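<p>The grouping described above can be tried out with any regular expression
engine that shares these precedence rules. Here is a small sketch using Python's
<code>re</code> module as a stand-in for flex; the helper <code>matches</code> and the
sample strings are assumptions made for illustration.
</p>

```python
import re

def matches(pattern, text):
    """True when pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

samples = ("foo", "ba", "barrr", "barbar", "foobar")
patterns = ("foo|bar*", "(foo)|(ba(r*))", "foo|(bar)*", "(foo|bar)*")

for pattern in patterns:
    # One True/False entry per sample string, in order.
    print(pattern, [matches(pattern, t) for t in samples])
```

<p>The first two patterns give identical results for every sample, accepting
"ba" and "barrr" but not "barbar". Only the last pattern accepts "foobar".
</p>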
<p>In addition to characters and ranges of characters,
character classes can also contain character class
<em>expressions</em>. These are expressions enclosed inside ‘<samp>[:</samp>’ and ‘<samp>:]</samp>’
delimiters (which themselves must appear between the ’[’
and ’]’ of the character class; other elements may occur
inside the character class, too). The valid expressions
are:
</p>
<table><tr><td> </td><td><pre class="example">[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
</pre></td></tr></table>
<p>These expressions all designate a set of characters
equivalent to the corresponding standard C ‘<samp>isXXX</samp>’ function. For
example, ‘<samp>[:alnum:]</samp>’ designates those characters for which
‘<samp>isalnum()</samp>’ returns true, i.e., any alphabetic or numeric character.
Some systems don’t provide ‘<samp>isblank()</samp>’, so flex defines
‘<samp>[:blank:]</samp>’ as a blank or a tab.
</p>
<p>For example, the following character classes are all
equivalent:
</p>
<table><tr><td> </td><td><pre class="example">[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
</pre></td></tr></table>
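<p>The equivalence of these classes can be sanity-checked outside flex. The sketch
below uses Python's string predicates as a stand-in for the C ‘<samp>isXXX</samp>’
functions, restricted to 7-bit ASCII. That restriction is an assumption standing in
for the "C" locale, and the variable names are hypothetical.
</p>

```python
import re

# All 7-bit ASCII characters (assumed to stand in for the "C" locale).
ascii_chars = [chr(c) for c in range(128)]

# [[:alnum:]]
alnum_by_predicate = {ch for ch in ascii_chars if ch.isalnum()}
# [[:alpha:][:digit:]]
alnum_by_union = {ch for ch in ascii_chars if ch.isalpha() or ch.isdigit()}
# [a-zA-Z0-9]
alnum_by_range = {ch for ch in ascii_chars if re.fullmatch("[a-zA-Z0-9]", ch)}

print(alnum_by_predicate == alnum_by_union == alnum_by_range)
print(len(alnum_by_range))  # 26 + 26 + 10 characters
```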
<p>If your scanner is case-insensitive (the ‘<samp>-i</samp>’ flag), then
‘<samp>[:upper:]</samp>’ and ‘<samp>[:lower:]</samp>’ are equivalent to ‘<samp>[:alpha:]</samp>’.
</p>
<p>Some notes on patterns:
</p>
<ul class="toc">
<li> -
A negated character class such as the example
"[^A-Z]" above <em>will match a newline</em> unless "\n" (or an
equivalent escape sequence) is one of the
characters explicitly present in the negated character
class (e.g., "[^A-Z\n]"). This is unlike how many
other regular expression tools treat negated
character classes, but unfortunately the inconsistency
is historically entrenched. Matching newlines
means that a pattern like [^"]* can match the
entire input unless there’s another quote in the
input.
</li><li> -
A rule can have at most one instance of trailing
context (the ’/’ operator or the ’$’ operator).
The start condition, ’^’, and "<<EOF>>" patterns
can only occur at the beginning of a pattern and,
like ’/’ and ’$’, cannot be grouped
inside parentheses. A ’^’ which does not occur at
the beginning of a rule or a ’$’ which does not
occur at the end of a rule loses its special
properties and is treated as a normal character.
<p>The following are illegal:
</p>
<table><tr><td> </td><td><pre class="example">foo/bar$
<sc1>foo<sc2>bar
</pre></td></tr></table>
<p>Note that the first of these can be written
"foo/bar\n".
</p>
<p>The following will result in ’$’ or ’^’ being
treated as a normal character:
</p>
<table><tr><td> </td><td><pre class="example">foo|(bar$)
foo|^bar
</pre></td></tr></table>
<p>If what’s wanted is a "foo" or a
bar-followed-by-a-newline, the following could be used (the special
’|’ action is explained below):
</p>
<table><tr><td> </td><td><pre class="example">foo |
bar$ /* action goes here */
</pre></td></tr></table>
<p>A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
</p></li></ul>
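<p>The first note, on negated character classes matching newlines, is easy to
overlook. Python's <code>re</code> module happens to treat negated classes the same
way, so it can serve as a stand-in to show the effect; this is a sketch, not flex,
and the helper <code>leading_match</code> is hypothetical.
</p>

```python
import re

def leading_match(pattern, text):
    """Return the text matched by pattern at the start of text."""
    # Every pattern used here can match the empty string, so a match always exists.
    return re.match(pattern, text).group(0)

sample = "abc\ndef"
print(repr(leading_match(r"[^A-Z]*", sample)))    # runs across the newline
print(repr(leading_match(r"[^A-Z\n]*", sample)))  # stops before the newline
print(repr(leading_match(r".*", sample)))         # '.' never matches a newline

# With no closing quote in the input, [^"]* swallows everything.
print(repr(leading_match(r'[^"]*', "line one\nline two\n")))
```

<p>Adding "\n" to the negated class, as in the second call, is the usual way to
keep such a pattern on one line.
</p>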
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="flex_6.html#Format" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="flex_8.html#Matching" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flex_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flex_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
<font size="-1">
This document was generated by <em>Build Daemon user</em> on <em>February 6, 2012</em> using <a href="http://www.nongnu.org/texi2html/"><em>texi2html 1.82</em></a>.
</font>
<br>
</p>
</body>
</html>