/usr/share/doc/nedit/html/parenConstructs.html is in nedit 1:5.6~cvs20081118-7.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 | <HTML>
<HEAD>
<TITLE> Regular Expressions </TITLE>
</HEAD>
<BODY>
<A NAME="Parenthetical_Constructs"></A>
<H2> Parenthetical Constructs </H2>
<P>
<H3>Capturing Parentheses</H3>
</P><P>
Capturing Parentheses are of the form `(<regex>)' and can be used to group
arbitrarily complex regular expressions. Parentheses can be nested, but the
total number of parentheses, nested or otherwise, is limited to 50 pairs.
The text that is matched by the regular expression between a matched set of
parentheses is captured and available for text substitutions and
backreferences (see below.) Capturing parentheses carry a fairly high
overhead both in terms of memory used and execution speed, especially if
quantified by `*' or `+'.
</P><P>
<H3>Non-Capturing Parentheses</H3>
</P><P>
Non-Capturing Parentheses are of the form `(?:<regex>)' and facilitate
grouping only and do not incur the overhead of normal capturing parentheses.
They should not be counted when determining numbers for capturing parentheses
which are used with backreferences and substitutions. Because of the limit
on the number of capturing parentheses allowed in a regex, it is advisable to
use non-capturing parentheses when possible.
</P><P>
<H3>Positive Look-Ahead</H3>
</P><P>
Positive look-ahead constructs are of the form `(?=<regex>)' and implement a
zero width assertion of the enclosed regular expression. In other words, a
match of the regular expression contained in the positive look-ahead
construct is attempted. If it succeeds, control is passed to the next
regular expression atom, but the text that was consumed by the positive
look-ahead is first unmatched (backtracked) to the place in the text where
the positive look-ahead was first encountered.
</P><P>
One application of positive look-ahead is the manual implementation of a
first character discrimination optimization. You can include a positive
look-ahead that contains a character class which lists every character that
the following (potentially complex) regular expression could possibly start
with. This will quickly filter out match attempts that cannot possibly
succeed.
</P><P>
<H3>Negative Look-Ahead</H3>
</P><P>
Negative look-ahead takes the form `(?!<regex>)' and is exactly the same as
positive look-ahead except that the enclosed regular expression must NOT
match. This can be particularly useful when you have an expression that is
general, and you want to exclude some special cases. Simply precede the
general expression with a negative look-ahead that covers the special cases
that need to be filtered out.
</P><P>
<H3>Positive Look-Behind</H3>
</P><P>
Positive look-behind constructs are of the form `(?<=<regex>)' and implement
a zero width assertion of the enclosed regular expression in front of the
current matching position. It is similar to a positive look-ahead assertion,
except for the fact that the match is attempted on the text preceding the
current position, possibly even in front of the start of the matching range
of the entire regular expression.
</P><P>
A restriction on look-behind expressions is the fact that the expression
must match a string of a bounded size. In other words, `*', `+', and `{n,}'
quantifiers are not allowed inside the look-behind expression. Moreover,
matching performance is sensitive to the difference between the upper and
lower bound on the matching size. The smaller the difference, the better the
performance. This is especially important for regular expressions used in
highlight patterns.
</P><P>
Positive look-behind has similar applications as positive look-ahead.
</P><P>
<H3>Negative Look-Behind</H3>
</P><P>
Negative look-behind takes the form `(?<!<regex>)' and is exactly the same as
positive look-behind except that the enclosed regular expression must
<I>not</I> match. The same restrictions apply.
</P><P>
Note however, that performance is even more sensitive to the distance
between the size boundaries: a negative look-behind must not match for
<B>any</B> possible size, so the matching engine must check <B>every</B> size.
</P><P>
<H3>Case Sensitivity</H3>
</P><P>
There are two parenthetical constructs that control case sensitivity:
</P><P>
<PRE>
(?i<regex>) Case insensitive; `AbcD' and `aBCd' are
equivalent.
</PRE>
</P><P>
<PRE>
(?I<regex>) Case sensitive; `AbcD' and `aBCd' are
different.
</PRE>
</P><P>
Regular expressions are case sensitive by default, that is, `(?I<regex>)' is
assumed. All regular expression token types respond appropriately to case
insensitivity including character classes and backreferences. There is some
extra overhead involved when case insensitivity is in effect, but only to the
extent of converting each character compared to lower case.
</P><P>
<H3>Matching Newlines</H3>
</P><P>
NEdit regular expressions by default handle the matching of newlines in a way
that should seem natural for most editing tasks. There are situations,
however, that require finer control over how newlines are matched by some
regular expression tokens.
</P><P>
By default, NEdit regular expressions will <I>not</I> match a newline character for
the following regex tokens: dot (`.'); a negated character class (`[^...]');
and the following shortcuts for character classes:
</P><P>
<PRE>
`\d', `\D', `\l', `\L', `\s', `\S', `\w', `\W', `\Y'
</PRE>
</P><P>
The matching of newlines can be controlled for the `.' token, negated
character classes, and the `\s' and `\S' shortcuts by using one of the
following parenthetical constructs:
</P><P>
<PRE>
(?n<regex>) `.', `[^...]', `\s', `\S' match newlines
</PRE>
</P><P>
<PRE>
(?N<regex>) `.', `[^...]', `\s', `\S' don't match
newlines
</PRE>
</P><P>
`(?N<regex>)' is the default behavior.
</P><P>
<H3>Notes on New Parenthetical Constructs</H3>
</P><P>
Except for plain parentheses, none of the parenthetical constructs capture
text. If that is desired, the construct must be wrapped with capturing
parentheses, e.g. `((?i<regex))'.
</P><P>
All parenthetical constructs can be nested as deeply as desired, except for
capturing parentheses which have a limit of 50 sets of parentheses,
regardless of nesting level.
</P><P>
<H3>Back References</H3>
</P><P>
Backreferences allow you to match text captured by a set of capturing
parenthesis at some later position in your regular expression. A
backreference is specified using a single backslash followed by a single
digit from 1 to 9 (example: \3). Backreferences have similar syntax to
substitutions (see below), but are different from substitutions in that they
appear within the regular expression, not the substitution string. The number
specified with a backreference identifies which set of text capturing
parentheses the backreference is associated with. The text that was most
recently captured by these parentheses is used by the backreference to
attempt a match. As with substitutions, open parentheses are counted from
left to right beginning with 1. So the backreference `\3' will try to match
another occurrence of the text most recently matched by the third set of
capturing parentheses. As an example, the regular expression `(\d)\1' could
match `22', `33', or `00', but wouldn't match `19' or `01'.
</P><P>
A backreference must be associated with a parenthetical expression that is
complete. The expression `(\w(\1))' contains an invalid backreference since
the first set of parentheses are not complete at the point where the
backreference appears.
</P><P>
<H3>Substitution</H3>
</P><P>
Substitution strings are used to replace text matched by a set of capturing
parentheses. The substitution string is mostly interpreted as ordinary text
except as follows.
</P><P>
The escape sequences described above for special characters, and octal and
hexadecimal escapes are treated the same way by a substitution string. When
the substitution string contains the `&' character, NEdit will substitute the
entire string that was matched by the `Find...' operation. Any of the first
nine sub-expressions of the match string can also be inserted into the
replacement string. This is done by inserting a `\' followed by a digit from
1 to 9 that represents the string matched by a parenthesized expression
within the regular expression. These expressions are numbered left-to-right
in order of their opening parentheses.
</P><P>
The capitalization of text inserted by `&' or `\1', `\2', ... `\9' can be
altered by preceding them with `\U', `\u', `\L', or `\l'. `\u' and `\l'
change only the first character of the inserted entity, while `\U' and `\L'
change the entire entity to upper or lower case, respectively.
<P><HR>
</P><P>
</P>
</BODY>
</HTML>
|