This file is indexed.

/usr/share/doc/ne/html/Regular-Expressions.html is in ne-doc 2.5-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

<html lang="en">
<head>
<title>Regular Expressions - ne's manual</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="ne's manual">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Reference.html#Reference" title="Reference">
<link rel="prev" href="Menus.html#Menus" title="Menus">
<link rel="next" href="Automatic-Preferences.html#Automatic-Preferences" title="Automatic Preferences">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; } 
  span.sansserif { font-family:sans-serif; font-weight:normal; } 
--></style>
</head>
<body>
<div class="node">
<a name="Regular-Expressions"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Automatic-Preferences.html#Automatic-Preferences">Automatic Preferences</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="Menus.html#Menus">Menus</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Reference.html#Reference">Reference</a>
<hr>
</div>

<h3 class="section">3.8 Regular Expressions</h3>

<p><a name="index-Regular-Expressions-73"></a>
Regular expressions are a powerful way of specifying complex search and
replace operations. <code>ne</code> supports the full regular expression
syntax on US-ASCII and 8-bit buffers, but has to impose a restriction on
character sets when searching in UTF-8 text. See <a href="UTF_002d8-Support.html#UTF_002d8-Support">UTF-8 Support</a>.

<h4 class="subsection">3.8.1 Syntax</h4>

<p>The following section is taken (with minor modifications) from the GNU regular
expression library documentation and is Copyright &copy; Free Software
Foundation.

   <p>A regular expression describes a set of strings.  The simplest case is one
that describes a particular string; for example, the string &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; when
regarded as a regular expression matches &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; and nothing else. 
Nontrivial regular expressions use certain special constructs so that they
can match more than one string.  For example, the regular expression
&lsquo;<samp><span class="samp">foo|bar</span></samp>&rsquo; matches either the string &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; or the string
&lsquo;<samp><span class="samp">bar</span></samp>&rsquo;; the regular expression &lsquo;<samp><span class="samp">c[ad]*r</span></samp>&rsquo; matches any of the strings
&lsquo;<samp><span class="samp">cr</span></samp>&rsquo;, &lsquo;<samp><span class="samp">car</span></samp>&rsquo;, &lsquo;<samp><span class="samp">cdr</span></samp>&rsquo;, &lsquo;<samp><span class="samp">caar</span></samp>&rsquo;, &lsquo;<samp><span class="samp">cadddar</span></samp>&rsquo; and all other
such strings with any number of &lsquo;<samp><span class="samp">a</span></samp>&rsquo;'s and &lsquo;<samp><span class="samp">d</span></samp>&rsquo;'s.
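
   <p>As a rough illustration, the two examples above can be checked with
Python's <code>re</code> module. This is only a sketch: Python's matcher is not
the GNU library that <code>ne</code> uses, although it treats these two simple
patterns the same way, and the helper name <code>matches_whole</code> is
hypothetical.

```python
import re

def matches_whole(pattern, text):
    """Return True if the pattern describes the entire string."""
    return re.fullmatch(pattern, text) is not None

# 'foo|bar' describes exactly two strings.
alternatives = [s for s in ["foo", "bar", "foobar", "fo"]
                if matches_whole(r"foo|bar", s)]

# 'c[ad]*r' describes 'c', then any number of 'a's and 'd's, then 'r'.
lisp_words = [s for s in ["cr", "car", "cdr", "caar", "cadddar", "cur", "cars"]
              if matches_whole(r"c[ad]*r", s)]

print(alternatives)
print(lisp_words)
```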

   <p>Regular expressions have a syntax in which a few characters are special
constructs and the rest are <dfn>ordinary</dfn>.  An ordinary character is a
simple regular expression which matches that character and nothing else. The
special characters are &lsquo;<samp><span class="samp">$</span></samp>&rsquo;, &lsquo;<samp><span class="samp">^</span></samp>&rsquo;, &lsquo;<samp><span class="samp">.</span></samp>&rsquo;, &lsquo;<samp><span class="samp">*</span></samp>&rsquo;, &lsquo;<samp><span class="samp">+</span></samp>&rsquo;,
&lsquo;<samp><span class="samp">?</span></samp>&rsquo;, &lsquo;<samp><span class="samp">[</span></samp>&rsquo;, &lsquo;<samp><span class="samp">]</span></samp>&rsquo;, &lsquo;<samp><span class="samp">(</span></samp>&rsquo;, &lsquo;<samp><span class="samp">)</span></samp>&rsquo;, &lsquo;<samp><span class="samp">|</span></samp>&rsquo; and &lsquo;<samp><span class="samp">\</span></samp>&rsquo;.  Any other
character appearing in a regular expression is ordinary, unless a &lsquo;<samp><span class="samp">\</span></samp>&rsquo;
precedes it.

   <p>For example, &lsquo;<samp><span class="samp">f</span></samp>&rsquo; is not a special character, so it is ordinary,
and therefore &lsquo;<samp><span class="samp">f</span></samp>&rsquo; is a regular expression that matches the string &lsquo;<samp><span class="samp">f</span></samp>&rsquo;
and no other string.  (It does <em>not</em> match the string &lsquo;<samp><span class="samp">ff</span></samp>&rsquo;.)  Likewise,
&lsquo;<samp><span class="samp">o</span></samp>&rsquo; is a regular expression that matches only &lsquo;<samp><span class="samp">o</span></samp>&rsquo;.

   <p>Any two regular expressions <var>a</var> and <var>b</var> can be concatenated. 
The result is a regular expression that matches a string if <var>a</var>
matches some amount of the beginning of that string and <var>b</var>
matches the rest of the string.

   <p>As a simple example, we can concatenate the regular expressions
&lsquo;<samp><span class="samp">f</span></samp>&rsquo; and &lsquo;<samp><span class="samp">o</span></samp>&rsquo; to get the regular expression &lsquo;<samp><span class="samp">fo</span></samp>&rsquo;,
which matches only the string &lsquo;<samp><span class="samp">fo</span></samp>&rsquo;.  Still trivial.
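
   <p>The definition of concatenation can be spelled out directly. The
following Python sketch (the function name <code>concat_matches</code> is
hypothetical, and Python's <code>re</code> module stands in for the matcher
used by <code>ne</code>) tries every split point of the string and asks whether
<var>a</var> matches the beginning and <var>b</var> matches the rest.

```python
import re

def concat_matches(a, b, text):
    """True if some prefix of text matches a and the remainder matches b."""
    return any(
        re.fullmatch(a, text[:i]) is not None
        and re.fullmatch(b, text[i:]) is not None
        for i in range(len(text) + 1)
    )

# Concatenating 'f' and 'o' gives 'fo', which matches only the string 'fo'.
by_definition = [s for s in ["f", "o", "fo", "foo"] if concat_matches("f", "o", s)]
by_pattern = [s for s in ["f", "o", "fo", "foo"] if re.fullmatch("fo", s)]

print(by_definition)
print(by_pattern)
```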

   <p>Note: special characters are treated as ordinary ones if they are in
contexts where their special meanings make no sense.  For example,
&lsquo;<samp><span class="samp">*foo</span></samp>&rsquo; treats &lsquo;<samp><span class="samp">*</span></samp>&rsquo; as ordinary since there is no preceding
expression on which the &lsquo;<samp><span class="samp">*</span></samp>&rsquo; can act. It is poor practice to depend on
this behaviour; better to quote the special character anyway, regardless of
where it appears.
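
   <p>Quoting can be automated. The sketch below is an illustration in Python,
not part of <code>ne</code>: the name <code>quote</code> and the
<code>SPECIAL</code> set are assumptions made for the example. It prefixes with
a backslash every special character named in this section (including
&lsquo;<samp><span class="samp">|</span></samp>&rsquo;, the alternation operator), so that the result matches
the original text literally.

```python
import re

# Special characters named in this section, including | (alternation).
SPECIAL = set("$^.*+?[]()|\\")

def quote(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

quoted_star = quote("*foo")      # the leading * is quoted explicitly
quoted_call = quote("a.b(c)")    # . ( and ) are quoted, letters are not

# A quoted pattern matches the literal text and nothing else.
literal_only = (re.fullmatch(quote("a.b"), "a.b") is not None
                and re.fullmatch(quote("a.b"), "axb") is None)

print(quoted_star)
print(quoted_call)
print(literal_only)
```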

   <p>The following are the characters and character sequences that have special
meaning within regular expressions. Any character not mentioned here is not
special; it stands for exactly itself for the purposes of searching and
matching.

     <dl>
<dt>&lsquo;<samp><span class="samp">.</span></samp>&rsquo;<dd>is a special character that matches anything except a newline. Using
concatenation, we can make regular expressions like &lsquo;<samp><span class="samp">a.b</span></samp>&rsquo;, which matches
any three-character string which begins with &lsquo;<samp><span class="samp">a</span></samp>&rsquo; and ends with
&lsquo;<samp><span class="samp">b</span></samp>&rsquo;.

     <br><dt>&lsquo;<samp><span class="samp">*</span></samp>&rsquo;<dd>is not a construct by itself; it is a suffix, which means the preceding
regular expression is to be repeated as many times as possible.  In
&lsquo;<samp><span class="samp">fo*</span></samp>&rsquo;, the &lsquo;<samp><span class="samp">*</span></samp>&rsquo; applies to the &lsquo;<samp><span class="samp">o</span></samp>&rsquo;, so &lsquo;<samp><span class="samp">fo*</span></samp>&rsquo; matches
&lsquo;<samp><span class="samp">f</span></samp>&rsquo; followed by any number of &lsquo;<samp><span class="samp">o</span></samp>&rsquo;'s.

     <p>The case of zero &lsquo;<samp><span class="samp">o</span></samp>&rsquo;'s is allowed: &lsquo;<samp><span class="samp">fo*</span></samp>&rsquo; does match
&lsquo;<samp><span class="samp">f</span></samp>&rsquo;.

     <p>&lsquo;<samp><span class="samp">*</span></samp>&rsquo; always applies to the <em>smallest</em> possible preceding
expression. Thus, &lsquo;<samp><span class="samp">fo*</span></samp>&rsquo; has a repeating &lsquo;<samp><span class="samp">o</span></samp>&rsquo;, not a repeating
&lsquo;<samp><span class="samp">fo</span></samp>&rsquo;.

     <br><dt>&lsquo;<samp><span class="samp">+</span></samp>&rsquo;<dd>&lsquo;<samp><span class="samp">+</span></samp>&rsquo; is like &lsquo;<samp><span class="samp">*</span></samp>&rsquo; except that at least one match for the preceding
pattern is required for &lsquo;<samp><span class="samp">+</span></samp>&rsquo;.  Thus, &lsquo;<samp><span class="samp">c[ad]+r</span></samp>&rsquo; does not match
&lsquo;<samp><span class="samp">cr</span></samp>&rsquo; but does match anything else that &lsquo;<samp><span class="samp">c[ad]*r</span></samp>&rsquo; would match.

     <br><dt>&lsquo;<samp><span class="samp">?</span></samp>&rsquo;<dd>&lsquo;<samp><span class="samp">?</span></samp>&rsquo; is like &lsquo;<samp><span class="samp">*</span></samp>&rsquo; except that it allows either zero or one match for
the preceding pattern.  Thus, &lsquo;<samp><span class="samp">c[ad]?r</span></samp>&rsquo; matches &lsquo;<samp><span class="samp">cr</span></samp>&rsquo; or &lsquo;<samp><span class="samp">car</span></samp>&rsquo;
or &lsquo;<samp><span class="samp">cdr</span></samp>&rsquo;, and nothing else.

     <br><dt>&lsquo;<samp><span class="samp">[ ... ]</span></samp>&rsquo;<dd>&lsquo;<samp><span class="samp">[</span></samp>&rsquo; begins a <dfn>character set</dfn>, which is terminated by a &lsquo;<samp><span class="samp">]</span></samp>&rsquo;. 
In the simplest case, the characters between the two form the set. 
Thus, &lsquo;<samp><span class="samp">[ad]</span></samp>&rsquo; matches either &lsquo;<samp><span class="samp">a</span></samp>&rsquo; or &lsquo;<samp><span class="samp">d</span></samp>&rsquo;,
and &lsquo;<samp><span class="samp">[ad]*</span></samp>&rsquo; matches any string of &lsquo;<samp><span class="samp">a</span></samp>&rsquo;'s and &lsquo;<samp><span class="samp">d</span></samp>&rsquo;'s
(including the empty string), from which it follows that
&lsquo;<samp><span class="samp">c[ad]*r</span></samp>&rsquo; matches &lsquo;<samp><span class="samp">car</span></samp>&rsquo;, <i>et cetera</i>.

     <p>Character ranges can also be included in a character set, by writing two
characters with a &lsquo;<samp><span class="samp">-</span></samp>&rsquo; between them.  Thus, &lsquo;<samp><span class="samp">[a-z]</span></samp>&rsquo; matches any
lower-case letter.  Ranges may be intermixed freely with individual
characters, as in &lsquo;<samp><span class="samp">[a-z$%.]</span></samp>&rsquo;, which matches any lower-case letter or
&lsquo;<samp><span class="samp">$</span></samp>&rsquo;, &lsquo;<samp><span class="samp">%</span></samp>&rsquo; or period.

     <p>Note that the usual special characters are not special any more inside a
character set.  A completely different set of special characters exists
inside character sets: &lsquo;<samp><span class="samp">]</span></samp>&rsquo;, &lsquo;<samp><span class="samp">-</span></samp>&rsquo; and &lsquo;<samp><span class="samp">^</span></samp>&rsquo;.

     <p>To include a &lsquo;<samp><span class="samp">]</span></samp>&rsquo; in a character set, you must make it
the first character.  For example, &lsquo;<samp><span class="samp">[]a]</span></samp>&rsquo; matches &lsquo;<samp><span class="samp">]</span></samp>&rsquo; or &lsquo;<samp><span class="samp">a</span></samp>&rsquo;. 
To include a &lsquo;<samp><span class="samp">-</span></samp>&rsquo;, you must use it in a context where it cannot possibly
indicate a range: that is, as the first character, or immediately
after a range.

     <p>Note that when searching in UTF-8 text, a character set may contain
US-ASCII characters only.

     <br><dt>&lsquo;<samp><span class="samp">[^ ... ]</span></samp>&rsquo;<dd>&lsquo;<samp><span class="samp">[^</span></samp>&rsquo; begins a <dfn>complement character set</dfn>, which matches any
character except the ones specified.  Thus, &lsquo;<samp><span class="samp">[^a-z0-9A-Z]</span></samp>&rsquo; matches
all characters <em>except</em> letters and digits. Also in this case, when
searching in UTF-8 text a complemented character set may contain US-ASCII
characters only.

     <p>&lsquo;<samp><span class="samp">^</span></samp>&rsquo; is not special in a character set unless it is the first character. 
The character following the &lsquo;<samp><span class="samp">^</span></samp>&rsquo; is treated as if it were first (it may
be a &lsquo;<samp><span class="samp">-</span></samp>&rsquo; or a &lsquo;<samp><span class="samp">]</span></samp>&rsquo;).

     <br><dt>&lsquo;<samp><span class="samp">^</span></samp>&rsquo;<dd>is a special character that matches the empty string &ndash; but only if at the
beginning of a line in the text being matched.  Otherwise it fails to match
anything.  Thus, &lsquo;<samp><span class="samp">^foo</span></samp>&rsquo; matches a &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; that occurs at the
beginning of a line.

     <br><dt>&lsquo;<samp><span class="samp">$</span></samp>&rsquo;<dd>is similar to &lsquo;<samp><span class="samp">^</span></samp>&rsquo; but matches only at the end of a line. Thus,
&lsquo;<samp><span class="samp">xx*$</span></samp>&rsquo; matches a string of one or more &lsquo;<samp><span class="samp">x</span></samp>&rsquo;'s at the end of a
line.

     <br><dt>&lsquo;<samp><span class="samp">\</span></samp>&rsquo;<dd>has two functions: it quotes the above special characters (including
&lsquo;<samp><span class="samp">\</span></samp>&rsquo;), and it introduces additional special constructs.

     <p>Because &lsquo;<samp><span class="samp">\</span></samp>&rsquo; quotes special characters, &lsquo;<samp><span class="samp">\$</span></samp>&rsquo; is a regular
expression that matches only &lsquo;<samp><span class="samp">$</span></samp>&rsquo;, and &lsquo;<samp><span class="samp">\[</span></samp>&rsquo; is a regular
expression that matches only &lsquo;<samp><span class="samp">[</span></samp>&rsquo;, and so on.

     <p>For the most part, &lsquo;<samp><span class="samp">\</span></samp>&rsquo; followed by any character matches only that
character.  However, there are several exceptions: characters which, when
preceded by &lsquo;<samp><span class="samp">\</span></samp>&rsquo;, are special constructs.  Such characters are always
ordinary when encountered on their own.

     <br><dt>&lsquo;<samp><span class="samp">|</span></samp>&rsquo;<dd>specifies an alternative. Two regular expressions <var>a</var> and <var>b</var> with
&lsquo;<samp><span class="samp">|</span></samp>&rsquo; in between form an expression that matches anything that either
<var>a</var> or <var>b</var> will match.

     <p>Thus, &lsquo;<samp><span class="samp">foo|bar</span></samp>&rsquo; matches either &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; or &lsquo;<samp><span class="samp">bar</span></samp>&rsquo; but no other
string.

     <p>&lsquo;<samp><span class="samp">|</span></samp>&rsquo; applies to the largest possible surrounding expressions.  Only a
surrounding &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; grouping can limit the grouping power of
&lsquo;<samp><span class="samp">|</span></samp>&rsquo;.

     <br><dt>&lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo;<dd>is a grouping construct that serves three purposes:

          <ol type=1 start=1>
<li>To enclose a set of &lsquo;<samp><span class="samp">|</span></samp>&rsquo; alternatives for other operations. 
Thus, &lsquo;<samp><span class="samp">(foo|bar)x</span></samp>&rsquo; matches either &lsquo;<samp><span class="samp">foox</span></samp>&rsquo; or &lsquo;<samp><span class="samp">barx</span></samp>&rsquo;.

          <li>To enclose a complicated expression for the postfix &lsquo;<samp><span class="samp">*</span></samp>&rsquo; to operate on. 
Thus, &lsquo;<samp><span class="samp">ba(na)*</span></samp>&rsquo; matches &lsquo;<samp><span class="samp">bananana</span></samp>&rsquo; <i>et cetera</i>, with any (zero or
more) number of &lsquo;<samp><span class="samp">na</span></samp>&rsquo;'s.

          <li>To mark a matched substring for future reference.

          </ol>

     <p>This last application is not a consequence of the idea of a parenthetical
grouping; it is a separate feature that happens to be assigned as a second
meaning to the same &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; construct because there is no
conflict in practice between the two meanings.  Here is an explanation of
this feature:

     <br><dt>&lsquo;<samp><span class="samp">\</span><var>digit</var></samp>&rsquo;<dd>After the end of a &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; construct, the matcher remembers the
beginning and end of the text matched by that construct.  Then, later on in
the regular expression, you can use &lsquo;<samp><span class="samp">\</span></samp>&rsquo; followed by <var>digit</var> to mean
&ldquo;match the same text matched the <var>digit</var>'th time by the &lsquo;<samp><span class="samp">(
... )</span></samp>&rsquo; construct.&rdquo;  The &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; constructs are numbered
in order of commencement in the regexp.

     <p>The strings matching the first nine &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; constructs appearing
in a regular expression are assigned numbers 1 through 9 in order of their
beginnings. 
&lsquo;<samp><span class="samp">\1</span></samp>&rsquo; through &lsquo;<samp><span class="samp">\9</span></samp>&rsquo; may be used to refer to the text matched by
the corresponding &lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; construct.

     <p>For example, &lsquo;<samp><span class="samp">(.+)\1</span></samp>&rsquo; matches any non-empty string that is composed of
two identical halves.  The &lsquo;<samp><span class="samp">(.+)</span></samp>&rsquo; matches the first half, which may be
anything non-empty, but the &lsquo;<samp><span class="samp">\1</span></samp>&rsquo; that follows must match exactly the same
text.

     <br><dt>&lsquo;<samp><span class="samp">\b</span></samp>&rsquo;<dd>matches the empty string, but only if it is at the beginning or
end of a word.  Thus, &lsquo;<samp><span class="samp">\bfoo\b</span></samp>&rsquo; matches any occurrence of
&lsquo;<samp><span class="samp">foo</span></samp>&rsquo; as a separate word.  &lsquo;<samp><span class="samp">\bball(s|)\b</span></samp>&rsquo; matches
&lsquo;<samp><span class="samp">ball</span></samp>&rsquo; or &lsquo;<samp><span class="samp">balls</span></samp>&rsquo; as a separate word.

     <br><dt>&lsquo;<samp><span class="samp">\B</span></samp>&rsquo;<dd>matches the empty string, provided it is <em>not</em> at the beginning or end
of a word.

     <br><dt>&lsquo;<samp><span class="samp">\&lt;</span></samp>&rsquo;<dd>matches the empty string, but only if it is at the beginning
of a word.

     <br><dt>&lsquo;<samp><span class="samp">\&gt;</span></samp>&rsquo;<dd>matches the empty string, but only if it is at the end of a word.

     <br><dt>&lsquo;<samp><span class="samp">\w</span></samp>&rsquo;<dd>matches any word-constituent character. These are US-ASCII letters,
digits and the underscore, regardless of the buffer encoding.

     <br><dt>&lsquo;<samp><span class="samp">\W</span></samp>&rsquo;<dd>matches any character that is not a word-constituent. 
</dl>
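
<p>Most of the constructs listed above can be tried out with Python's
<code>re</code> module, as in the sketch below. This is an approximation under
stated assumptions, not a description of the matcher inside <code>ne</code>:
the helper names <code>full</code> and <code>found</code> are hypothetical,
<code>re.ASCII</code> is passed so that &lsquo;<samp><span class="samp">\w</span></samp>&rsquo; and
&lsquo;<samp><span class="samp">\b</span></samp>&rsquo; consider only US-ASCII word characters, and
<code>re.MULTILINE</code> is passed so that &lsquo;<samp><span class="samp">^</span></samp>&rsquo; and
&lsquo;<samp><span class="samp">$</span></samp>&rsquo; anchor at every line. Python does not support
&lsquo;<samp><span class="samp">\&lt;</span></samp>&rsquo; and &lsquo;<samp><span class="samp">\&gt;</span></samp>&rsquo;, so they are omitted.

```python
import re

def full(pattern, text):
    """True if the pattern matches the whole of text."""
    # re.ASCII restricts \w, \W, \b and \B to US-ASCII word characters.
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None

def found(pattern, text):
    """All non-overlapping matches, with ^ and $ anchoring at each line."""
    flags = re.ASCII | re.MULTILINE
    return [m.group(0) for m in re.finditer(pattern, text, flags=flags)]

checks = {
    "dot": full(r"a.b", "axb") and not full(r"a.b", "a\nb"),
    "star": full(r"fo*", "f") and full(r"fo*", "fooo"),
    "plus": full(r"c[ad]+r", "cadr") and not full(r"c[ad]+r", "cr"),
    "question": full(r"c[ad]?r", "cdr") and not full(r"c[ad]?r", "caar"),
    "set with ]": full(r"[]a]", "]") and full(r"[]a]", "a"),
    "complement": full(r"[^a-z0-9A-Z]", "-") and not full(r"[^a-z0-9A-Z]", "q"),
    "quoting": full(r"\$\[", "$["),
    "grouping": full(r"(foo|bar)x", "barx") and full(r"ba(na)*", "bananana"),
    "backreference": full(r"(.+)\1", "abcabc") and not full(r"(.+)\1", "abcabd"),
}

line_starts = found(r"^foo", "foo bar\nfoo foo")     # one match per line
line_ends = found(r"xx*$", "axx\nbxxx")              # runs of x at line ends
words = found(r"\bball(s|)\b", "ball balls balloon")  # separate words only
inner = found(r"\Boo\B", "foo fool")                 # 'oo' strictly inside a word
word_chars = found(r"\w+", "ne_2 \u00e9!")           # ASCII word characters only
non_word = found(r"\W", "a-b")

print(checks)
print(line_starts, line_ends, words, inner, word_chars, non_word)
```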

<h4 class="subsection">3.8.2 Replacing regular expressions</h4>

<p>The replacement string also has some special features when performing a regular
expression search and replace. Exactly as during the search, &lsquo;<samp><span class="samp">\</span></samp>&rsquo; followed
by <var>digit</var> stands for &ldquo;the text matched the <var>digit</var>'th time by the
&lsquo;<samp><span class="samp">( ... )</span></samp>&rsquo; construct in the search expression&rdquo;. Moreover, &lsquo;<samp><span class="samp">\0</span></samp>&rsquo;
represents the whole string matched by the regular expression. Thus, for
instance, the replacement string &lsquo;<samp><span class="samp">\0\0</span></samp>&rsquo; has the effect of doubling any string
matched.

   <p>Another example: if you search for &lsquo;<samp><span class="samp">(a+)(b+)</span></samp>&rsquo;, replacing with
&lsquo;<samp><span class="samp">\2x\1</span></samp>&rsquo;, you will match any string composed of a series of &lsquo;<samp><span class="samp">a</span></samp>&rsquo;'s
followed by a series of &lsquo;<samp><span class="samp">b</span></samp>&rsquo;'s, and you will replace it with the
string obtained by moving the &lsquo;<samp><span class="samp">b</span></samp>&rsquo;'s in front of the &lsquo;<samp><span class="samp">a</span></samp>&rsquo;'s and
adding an &lsquo;<samp><span class="samp">x</span></samp>&rsquo; in between. For instance, &lsquo;<samp><span class="samp">aaaab</span></samp>&rsquo; will be matched and
replaced by &lsquo;<samp><span class="samp">bxaaaa</span></samp>&rsquo;.

   <p>Note that the backslash character can escape itself. Thus, to put a
backslash in the replacement string, you have to use &lsquo;<samp><span class="samp">\\</span></samp>&rsquo;.
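
   <p>The replacement rules can be sketched in Python. Python's own
<code>re.sub</code> writes the whole match as <code>\g&lt;0&gt;</code> rather than
&lsquo;<samp><span class="samp">\0</span></samp>&rsquo;, so the example expands the replacement string by hand.
The names <code>expand</code> and <code>replace_all</code> are hypothetical, and
passing through any other character that follows a backslash unchanged is an
assumption of this sketch, not documented <code>ne</code> behaviour.

```python
import re

def expand(template, match):
    """Expand \\0 to \\9 and \\\\ in a replacement string for one match."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt in "0123456789":
                # \0 is the whole match, \1 to \9 are the ( ... ) groups.
                out.append(match.group(int(nxt)) or "")
            else:
                # \\ yields one backslash; other characters pass through
                # unchanged (an assumption made for this sketch).
                out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def replace_all(pattern, template, text):
    """Replace every match of pattern in text using the template."""
    return re.sub(pattern, lambda m: expand(template, m), text)

swapped = replace_all(r"(a+)(b+)", r"\2x\1", "aaaab")
doubled = replace_all(r"fo+", r"\0\0", "foo")
backslash = replace_all(r"a", r"\\", "a")

print(swapped)
print(doubled)
print(backslash)
```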

   </body></html>