/usr/share/freemat/help/text/regexp.mdc is in freemat-help 4.0-5.
This file is owned by root:root, with mode 0o644.
REGEXP REGEXP Regular Expression Matching Function
Usage
Matches regular expressions in the provided string. This function is
complicated, and compatibility with MATLAB's syntax is not perfect. The
syntax for its use is
regexp('str','expr')
which returns a row vector containing the starting index of each substring
of str that matches the regular expression described by expr. The
second form of regexp returns six outputs in the following order:
[start stop tokenExtents match tokens names] = regexp('str','expr')
where the meaning of each of the outputs is defined below.
- start is a row vector containing the starting index of each
substring that matches the regular expression.
- stop is a row vector containing the ending index of each
substring that matches the regular expression.
- tokenExtents is a cell array containing the starting and ending
indices of each substring that matches the tokens in the regular
expression. A token is a captured part of the regular expression.
If the 'once' mode is used, then this output is a double array.
- match is a cell array containing the text for each substring
that matches the regular expression. In 'once' mode, this is a
string.
- tokens is a cell array of cell arrays of strings that correspond
to the tokens in the regular expression. In 'once' mode, this is a
cell array of strings.
- names is a structure array containing the named tokens captured
in a regular expression. Each named token is assigned a field in the resulting
structure array, and each element of the array corresponds to a different
match.
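As a minimal sketch of the two call forms described above, the following
uses a made-up input string and patterns (they are illustrative only and
do not come from the FreeMat sources). The patterns avoid backslash
escapes on purpose, so the example does not depend on how escapes are
handled.

```matlab
% Hypothetical sample text, chosen only for illustration.
str = 'hello world';

% Single-output form: starting index of every match of 'o'.
idx = regexp(str, 'o');

% Six-output form, in the documented order. The parentheses capture
% the lowercase letter that precedes each 'o' as a token.
[start stop tokenExtents match tokens names] = regexp(str, '([a-z])o');
```

With 1-based indexing, idx should hold 5 and 8, and the second call
should report the matches 'lo' and 'wo' with the tokens 'l' and 'w'.
Since this pattern has no named tokens, names carries no useful fields.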
If you want only some of the outputs, you can use the
following variant of regexp:
[o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ...)
where p1 etc. are the names of the outputs (and the order we want
the outputs in). As a final variant, you can supply some mode
flags to regexp:
[o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ..., 'mode1', 'mode2')
where acceptable mode flags are:
- 'once' - only the first match is returned.
- 'matchcase' - letter case must match (selected by default for regexp)
- 'ignorecase' - letter case is ignored (selected by default for regexpi)
- 'dotall' - the '.' operator matches any character (default)
- 'dotexceptnewline' - the '.' operator does not match the newline character
- 'stringanchors' - the ^ and $ operators match at the beginning and
end (respectively) of a string.
- 'lineanchors' - the ^ and $ operators match at the beginning and
end (respectively) of a line.
- 'literalspacing' - the space characters and comment characters # are matched
as literals, just like any other ordinary character (default).
- 'freespacing' - all spaces and comments are ignored in the regular expression.
You must use '\ ' and '\#' to match spaces and comment characters, respectively.
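The output selectors and mode flags can be combined in one call. The
sketch below assumes that 'match' and 'tokens' are accepted as selector
names, which follows from the output names listed earlier. The sample
string and pattern are hypothetical.

```matlab
% Hypothetical sample text, chosen only for illustration.
str = 'Cat cat CAT';

% Request only the matched text and the tokens, in that order.
% Case must match by default, so only the lowercase word qualifies.
[match tokens] = regexp(str, 'c(at)', 'match', 'tokens');

% Same request, but ignoring case and stopping after the first match.
[first_match first_tokens] = regexp(str, 'c(at)', 'match', 'tokens', 'ignorecase', 'once');
```

In the first call match is expected to be a cell array holding 'cat'.
In the second call, 'once' mode should make first_match the plain
string 'Cat' and first_tokens a cell array of strings holding 'at'.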
Note the following behavior differences between MATLAB's regexp and FreeMat's:
- If you have an old version of pcre installed, then named tokens must use the
older (?P<name>...) syntax, instead of the new (?<name>...) syntax.
- The pcre library is pickier about named tokens and their appearance in
expressions. So, for example, the regexp from the MATLAB manual
'(?<first>\\w+)\\s+(?<last>\\w+)|(?<last>\\w+),\\s+(?<first>\\w+)'
does not work correctly (as of this writing) because the same named
tokens appear multiple times. The workaround is to assign different names
to each token, and then collapse the results later.
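One way to apply the workaround is sketched below. The token names
first1, last1, first2 and last2 and the sample string are hypothetical.
The sketch also assumes that a named token belonging to the alternative
that did not match comes back as an empty string. It uses the newer
(?<name>...) syntax, so an old pcre would need (?P<name>...) instead.

```matlab
% Hypothetical input in "Last, First" form.
str = 'Doe, John';

% Give every named token a distinct name in each alternative.
alt1 = '(?<first1>[A-Za-z]+) +(?<last1>[A-Za-z]+)';
alt2 = '(?<last2>[A-Za-z]+), +(?<first2>[A-Za-z]+)';
expr = [alt1 '|' alt2];

names = regexp(str, expr, 'names');

% Collapse the results: use whichever alternative actually matched.
if isempty(names.first1)
  first = names.first2;
  last = names.last2;
else
  first = names.first1;
  last = names.last1;
end
```

For the sample string only the second alternative can match, so first
should end up as 'John' and last as 'Doe'.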