/usr/share/pyshared/gamera/ruleengine.py is in python-gamera 3.3.2-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | # -*- mode: python; indent-tabs-mode: nil; tab-width: 3 -*-
# vim: set tabstop=3 shiftwidth=3 expandtab:
#
# Copyright (C) 2001-2005 Ichiro Fujinaga, Michael Droettboom,
# and Karl MacMillan
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
#
import traceback
from sys import exc_info
from inspect import isfunction
from gamera import group, util
from fudge import Fudge
"""This module provides tools for applying graph-rewriting rules to a
set of glyphs.
It would seem nice to invent a completely new language for this task
(for instance, a logic-programming language like Prolog might have
been nice).
However, in this design, each rule is defined by a basic Python
function. This integrates much easier into the rest of the
Python-based Gamera world, since any Python function can be called at
any time, and doesn't require the user to learn a new language. The
disadvantage is that the pattern matching is single-tiered (no
recursion) and the ordering and application of the rules is left to
the end programmer, since this approach makes it difficult to do any
global optimisation. There may be ways around these limitations, but
that will require some thinking.
Rule functions take the basic form:
def rule_function0(a="dot", b="letter*"): # 1: function signature
if (Fudge(a.lr_y) == b.lr_y and
a.ul_x > b.lr_x and
a.ul_x - b.lr_x < 20): # 2: expression
a.classify_heuristic('punctuation.period') # 3: operation
return [], [] # 4: added, removed
1. The function signature itself is used to define which classes of
glyphs should be applied to the function. Each argument has a
Python default value which is a string containing the symbol name
or an "symbol name regular expression" that will match the glyph to
be passed in. Yes, this is a complete bastardisation of what
default arguments are meant for in Python, since these string
values will never actually be passed in to the function. However,
I think this syntax is convenient and kind of cute. (Another
option might be to use some sort of special doc-string syntax, I
suppose.)
Rules are indexed by the first argument, so it is a good idea to
use the least-frequently occuring symbol name (or the class the
most clearly defines the structure) as the first argument.
A "symbol name regular expression" is an instance of the
special-purpose regular expression language for symbol names
defined in id_name_matching.py. This syntax is pretty limited
compared to standard Python regular expression syntax, but is much
more convenient for the task. (For example, 'upper.*' would be
'upper\.[^.]*' as a standard Python regular expression, and nobody
wants to write code like that ;)
Informal syntax definition:
A|B # matches A or B
A.B|C # matches A.B or A.C
(A.B)|C # matches A.B or C
* # multiple-character wildcard
? # single character wildcard
() # grouping can be performed with
# parentheses
[a-z] # matches any character a-z
Example expressions:
(upper.x)|(lower.y) # match either upper.x or lower.y
upper.* # match the 'upper' category
upper.a|b|c # matches upper.a, upper.b or upper.c
upper.capital_? # ? is a single character wildcard
2. The body of the function, in a literal sense, is left entirely up
to the end-user programmer (it's just Python afterall.) However,
most rule functions will consist of some sort of test followed by
some sort of change (side-effect!!!) of the data.
For performing tests that need to be somewhat "fuzzy", you may want
to use Fudge objects. Comparisons on Fudge objects have a built-in
fuzzy zone around the core int, Point or Rect that make comparing
things to be "close enough" easier. (See fudge.py)
3. In the body, the rule function can modify the glyph attributes in
any way it sees fit.
4. Optionally, the rule function can return a pair of lists. The
first list is of glyphs that have been created by the operation
(for instance, if combining two glyphs into one union glyph.) The
second list is a list of glyphs that have been removed by the
operation (for instance removing noise glyphs.) (If your function
returns None, the RuleEngine assumes you meant [], []). These
results are appended together each time a set of rules is applied
and returned to the caller. Optionally, the rule-performing
mechanism can reapply rules to all the newly-created glyphs.
PERFORMING RULES:
Rules are grouped into RuleEngine classes, which then perform the
actually searching and call the rule functions.
# Initialise a new rule engine with a couple of rules
rule_engine = RuleEngine([rule_function0, rule_function1])
# Run the rules on the given list of glyphs
added, removed = rule_engine.perform_rules(glyphs)
An optional argument, grid_size, specifies the size of the indexing
grid cells. This will affect how far apart symbols can be matched.
TODO: There may be at some point some kind of Meta-RuleEngine for
performing sets of rules in sequence or parallel or whatever... """
class RuleEngineError(Exception):
pass
class RuleEngine:
_class_rules = []
def __init__(self, rules=[], reapply=0):
self.rules = {}
self._regexs = {}
self._rules_by_regex = {}
for rule in rules + self._class_rules:
if isfunction(rule):
self.add_rule(rule)
elif isinstance(rule, RuleEngine):
for r in rule.rules:
self.add_rule(r)
self._reapply = reapply
def add_rule(self, rule):
import id_name_matching
assert len(rule.func_defaults)
self.rules[rule] = None
regex = rule.func_defaults[0]
if not self._rules_by_regex.has_key(regex):
self._rules_by_regex[regex] = []
self._rules_by_regex[regex].append(rule)
for regex in rule.func_defaults:
if not self._regexs.has_key(regex):
self._regexs[regex] = id_name_matching.build_id_regex(regex)
if not self._rules_by_regex.has_key(regex):
self._rules_by_regex[regex] = []
def get_rules(self):
return self.rules.keys()
def _deal_with_result(self, rule, glyphs, added, removed):
try:
current_stack = traceback.extract_stack()
result = rule(*glyphs)
except Exception, e:
lines = traceback.format_exception(*exc_info())
del lines[1]
exception = ''.join(lines)
# We build a dictionary of exceptions in order to remove duplicates
self._exceptions.append(exception)
return 0
if result is None or result == ([], []):
return 0
for a in result[0]:
added[a] = None
for a in result[1]:
removed[a] = None
return 1
def perform_rules(self, glyphs, grid_size=100, recurse=0,
progress=None, _recursion_level=0):
self._exceptions = util.Set()
if _recursion_level > 10:
return [], []
elif _recursion_level == 0:
progress = util.ProgressFactory("Performing rules...")
try:
grid_index = group.GridIndexWithKeys(glyphs, grid_size, grid_size)
found_regexs = {}
for regex_string, compiled in self._regexs.items():
for glyph in glyphs:
if glyph.match_id_name(compiled):
grid_index.add_glyph_by_key(glyph, regex_string)
found_regexs[regex_string] = None
# This loop is only so the progress bar can do something useful.
for regex in found_regexs.iterkeys():
progress.add_length(
len(self._rules_by_regex[regex]) *
len(grid_index.get_glyphs_by_key(regex)))
added = {}
removed = {}
for regex in found_regexs.iterkeys():
for rule in self._rules_by_regex[regex]:
glyph_specs = rule.func_defaults
for glyph in grid_index.get_glyphs_by_key(regex):
if len(glyph_specs) == 1:
self._deal_with_result(rule, (glyph,), added, removed)
elif len(glyph_specs) == 2:
for glyph2 in grid_index.get_glyphs_around_glyph_by_key(
glyph, glyph_specs[1]):
stop = self._deal_with_result(rule, (glyph, glyph2), added, removed)
if not self._reapply and stop:
break
else:
seed = [list(grid_index.get_glyphs_around_glyph_by_key(glyph, x))
for x in glyph_specs[1:]]
for combination in util.combinations(seed):
stop = self._deal_with_result(rule, [glyph] + combination,
added, removed)
if not self._reapply and stop:
break
progress.step()
finally:
if _recursion_level == 0:
progress.kill()
if recurse and len(added):
self._deal_with_result(
self.perform_rules(added.keys(), 1, progress, _recursion_level + 1))
if len(self._exceptions):
s = ("One or more of the rule functions caused an exception.\n" +
"(Each exception listed only once):\n\n" +
"\n".join(self._exceptions))
raise RuleEngineError(s)
return added.keys(), removed.keys()
|