/usr/share/perl5/Pegex/API.pod is in libpegex-perl 0.55-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 | =pod
=for comment
DO NOT EDIT. This Pod was generated by Swim.
See http://github.com/ingydotnet/swim-pm#readme
=encoding utf8
=head1 The Pegex API
Pegex can be used in many ways: inside scripts, from the command line or as
the foundation of a modular parsing framework. This document details the
various ways to use Pegex.
At the most abstract level, Pegex works like this:
$result = $parser->new($grammar, $receiver)->parse($input);
Which is to say, abstractly: a Pegex parser, under the direction of a Pegex
grammar, parses an input stream, and reports matches to a Pegex receiver,
which produces a result.
The parser, grammar, receiver and even the input, are Pegex objects. These
4 objects are involved in every Pegex parse operation, so let's review
them briefly:
=over
=item L<Pegex::Parser>
The Pegex parsing engine. This engine applies the logic of the grammar to an
input text. A B<parser> object contains a B<grammar> object and a B<receiver>
object. Its primary method is called C<parse>. The default parser engine is
non-backtracking, recursive descent. However there are parser subclasses for
various alternative types of parsing.
=item L<Pegex::Grammar>
A Pegex grammar starts as a text file/string composed in the B<Pegex> syntax.
Before it can be used in by a Parser it must be compiled. After compilation,
it is turned into a data tree consisting of rules and regexes. In modules that
are based on a Pegex grammar, the grammar will be compiled into a class file.
Pegex itself, uses a Pegex grammar class called L<Pegex::Pegex::Grammar> to
parse various Pegex grammars.
=item L<Pegex::Receiver>
A parser on it's own has no idea what to do with the text it matches. A Pegex
B<receiver> is a class that contains methods corresponding to the rules in a
grammar. As a rule in the grammar matches, its corresponding receiver method
(if one exists) is called with the data that has been matched. It is the
receiver's job to take action on the data, often building it into some new
structure. Pegex will use L<Pegex::Tree::Wrap> as the default receiver; it
produces a reasonably readable tree of the matched/captured data.
=item L<Pegex::Input>
Pegex abstracts its input streams into an object interface as well. Any
operation that can take an input string, can also take an input object. Pegex
will turn regular strings into these objects. This is probably the API concept
you will encounter the least, but it is covered here for completeness.
=back
All of these object classes can be subclassed to acheive various results.
Normally, you will write your own Pegex grammar and a Pegex receiver to
achieve a task.
=head2 Starting Simple - The C<pegex> Function
The L<Pegex> module exports a function called C<pegex> that you can use for
smaller tasks. Here is an example:
use Pegex;
use YAML;
$grammar = "
expr: num PLUS num
num: /( DIGIT+ )/
";
print Dump pegex($grammar)->parse('2+2');
This program would produce:
expr:
- num: 2
- num: 2
Let's review what's happening here. The L<Pegex> module is exporting a
C<pegex> function. This function takes a Pegex grammar string as input.
Internally this function compiles the grammar string into a grammar object.
Then it creates a parser object containing the grammar object and returns it.
The parse method is called on the input string: C<'2+2'>. The string matches,
and a nice data structure is returned.
So how was the data structure created? By the receiver object, of course! But
we didn't specify one, did we? Nope. It used the default receiver,
L<Pegex::Tree::Wrap>. We could have said:
print Dump pegex($grammar, 'Pegex::Tree::Wrap')->parse('2+2');
This receiver basically generates a mapping, where rule names of matches are
the keys, and the leaf values are the regex captures.
The more basic receiver called L<Pegex::Tree> generates a tree of sequences
that contain just the data (without the rule names). This code:
print Dump pegex($grammar, 'Pegex::Tree')->parse('2+2');
would produce:
- 2
- 2
If we wrote our own receiver class called C<Calculator> like this:
package Calculator;
use base 'Pegex::Tree';
sub got_expr {
my ($receiver, $data) = @_;
my ($a, $b) = @$data;
return $a + $b;
}
Then, this:
print pegex(grammar, 'Calculator')->parse('2+2');
would print:
4
=head2 More Explicit Usage
Continuing with the example above, let's see how to do it a little more
formally.
use Pegex::Parser;
use Pegex::Grammar;
use Pegex::Tree;
use Pegex::Input;
$grammar_text = "
expr: num PLUS num
num: /( DIGIT+ )/
";
$grammar = Pegex::Grammar->new(text => $grammar_text);
$receiver = Pegex::Tree->new();
$parser = Pegex::Parser->new(
grammar => $grammar,
receiver => $receiver,
);
$input = Pegex::Input->new(string => '2+2');
print Dump parser->parse($input);
This code does the same thing as the first example, but this time we've made
all the objects ourselves.
=head2 Precompiled Grammars
If you ship a Pegex grammar as part of a CPAN distribution, you'll want it to
be precompiled into a module. Pegex makes that easy.
Say the grammar_text about is stored in a file called C<share/expr.pgx>. If
you create a module called C<lib/MyThing/Grammar.pm> with content like this:
package MyThing::Grammar;
use base 'Pegex::Grammar';
use constant file => './share/expr.pgx';
sub make_tree {
}
1;
Then run this command line:
perl -Ilib -MMyThing::Grammar=compile
It will rewrite your module to look something like this:
package MyThing::Grammar;
use base 'Pegex::Grammar';
use constant file => './share/expr.pgx';
sub make_tree {
{ '+toprule' => 'expr',
'PLUS' => { '.rgx' => qr/\G\+/ },
'expr' => {
'.all' => [
{ '.ref' => 'num' },
{ '.ref' => 'PLUS' },
{ '.ref' => 'num' }
]
},
'num' => { '.rgx' => qr/\G([0-9]+)/ }
}
}
1;
This command found the file where your grammar is, compiled it, and used
L<Data::Dumper> to output it back into your module's C<make_tree> method.
This is what a compiled Pegex grammar looks like. As soon as this module is
loaded, the grammar is ready to be used by Pegex.
If you find yourself needing to compile your grammar module a lot during
development, just set this environment variable like so:
export PERL_PEGEX_AUTO_COMPILE=MyThing::Grammar
Now, every time the grammar module is loaded it will check to see if it needs
to be recompiled, and do it on the fly.
If you have more than one grammar to recompile, just list all the names
separated by commas.
=head1 See Also
=over
=item * L<Pegex::Parser>
=item * L<Pegex::Grammar>
=item * L<Pegex::Receiver>
=item * L<Pegex::Tree>
=item * L<Pegex::Tree::Wrap>
=item * L<Pegex::Input>
=back
=cut
|