This file is indexed.

/usr/share/doc/python-gamera.toolkits.ocr/html/gamera.toolkits.ocr.classes.Page.html is in python-gamera.toolkits.ocr 1.2.2-2.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/" />
<title>class Page</title>
<link rel="stylesheet" href="default.css" type="text/css" />
</head>
<body>
<div class="document" id="class-page">
<h1 class="title">class <tt class="docutils literal">Page</tt></h1>

<p><strong>Last modified</strong>:</p>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#page" id="id9"><tt class="docutils literal">Page</tt></a><ul>
<li><a class="reference internal" href="#init" id="id10"><tt class="docutils literal">__init__</tt></a></li>
<li><a class="reference internal" href="#segment" id="id11"><tt class="docutils literal">segment</tt></a></li>
<li><a class="reference internal" href="#id3" id="id12"><tt class="docutils literal">page_to_lines</tt></a></li>
<li><a class="reference internal" href="#id5" id="id13"><tt class="docutils literal">order_lines</tt></a></li>
<li><a class="reference internal" href="#id6" id="id14"><tt class="docutils literal">lines_to_chars</tt></a></li>
<li><a class="reference internal" href="#id7" id="id15"><tt class="docutils literal">chars_to_words</tt></a></li>
<li><a class="reference internal" href="#show-lines" id="id16"><tt class="docutils literal">show_lines</tt></a></li>
<li><a class="reference internal" href="#show-glyphs" id="id17"><tt class="docutils literal">show_glyphs</tt></a></li>
<li><a class="reference internal" href="#show-words" id="id18"><tt class="docutils literal">show_words</tt></a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="page">
<h1><a class="toc-backref" href="#id9"><tt class="docutils literal">Page</tt></a></h1>
<p>In module <tt class="docutils literal">gamera.toolkits.ocr.classes</tt></p>
<p><p>The <tt class="docutils literal">Page</tt> object offers the page segmentation functionality by
providing a <tt class="docutils literal">segment</tt> method. See <a class="reference external" href="#segment">its documentation</a> for more information
on how to overwrite specific steps of the segmentation process.</p>
<p>After the call of <tt class="docutils literal">segment</tt>, the segmentation results are stored in the
following attributes of <tt class="docutils literal">Page</tt>:</p>
<blockquote>
<dl class="docutils">
<dt><strong>textlines</strong></dt>
<dd>List of <a class="reference external" href="gamera.toolkits.ocr.classes.Textline.html">Textline</a> objects representing all text lines</dd>
<dt><strong>img</strong></dt>
<dd>The image to which Ccs in the <em>textlines</em> refer.</dd>
</dl>
</blockquote>
</p>
<div class="section" id="init">
<h2><a class="toc-backref" href="#id10"><tt class="docutils literal">__init__</tt></a></h2>
<p>The only required argument in the constructor is the image that is to
be segmented. Note that the constructor does <em>not</em> do the segmentation; for
this, you must call the <a class="reference external" href="#segment">segment</a> method.</p>
<p>Signature:</p>
<blockquote>
<tt class="docutils literal">init (image, glyphs=None, classify_ccs=None)</tt></blockquote>
<p>with</p>
<blockquote>
<dl class="docutils">
<dt><em>image</em>:</dt>
<dd>The image to be segmented.</dd>
<dt><em>glyphs</em>:</dt>
<dd>An optional list of connected components representing the characters
in the image. In general, this is not needed, but it can be useful
for bottom up methods starting from already detected characters (e.g.
by Gamera's classification based character grouping.</dd>
<dt><em>classify_ccs</em>:</dt>
<dd>A callable class with the same interface as <a class="reference external" href="gamera.toolkits.ocr.classes.ClassifyCCs.html">ClassifyCCs</a>.
If given, it will be called during the segmentation process, right after
the splitting of lines to characters.</dd>
</dl>
</blockquote>
</div>
<div class="section" id="segment">
<h2><a class="toc-backref" href="#id11"><tt class="docutils literal">segment</tt></a></h2>
<p>Segments <em>Page.img</em> and stores the result in <em>Page.textlines</em>.
This method has no arguments.</p>
<p>It calls the following methods in the given order:</p>
<blockquote>
<ul class="simple">
<li><a class="reference external" href="#page-to-lines">page_to_lines</a> for splitting the page into segments representing text lines</li>
<li><a class="reference external" href="#order-lines">order_lines</a> for sorting the lines into reading order</li>
<li><a class="reference external" href="#lines-to-chars">lines_to_chars</a> for splitting all lines into characters</li>
<li><em>Page.classify_ccs</em> when it is set, i.e., has been passed to the
constructor (default is that it is not set)</li>
<li><a class="reference external" href="#chars-to-words">chars_to_words</a> for grouping the characters to words</li>
</ul>
</blockquote>
<p>By overwriting one (or several) of the above functions, you can
replace specific steps of the segmentation process with custom
algorithms.</p>
</div>
<div class="section" id="id3">
<h2><a class="toc-backref" href="#id12"><tt class="docutils literal">page_to_lines</tt></a></h2>
<p>Splits the image into segments representing text lines.
This method has no arguments.</p>
<p>The current implementation simply calls the <em>bbox_merging</em>
plugin from the Gamera core with <em>Ey=0</em>, such that the page is not
split into paragraphs, but into lines.</p>
<p>The segmentation result is stored in the variable <em>Page.ccs_lines</em>,
which is a list of the data type <tt class="docutils literal">Cc</tt>, i.e., with each segment (line)
represented by a different label in the image. This is the interface
used by all page segmentation plugins in the Gamera core.</p>
<div class="note">
<p class="first admonition-title">Note</p>
<p class="last">When you overwrite this method, make sure that write the
segmentation result to <em>self.ccs_lines</em>. This member variable
will then be further processed by <a class="reference external" href="#lines-to-chars">lines_to_chars</a>.</p>
</div>
</div>
<div class="section" id="id5">
<h2><a class="toc-backref" href="#id13"><tt class="docutils literal">order_lines</tt></a></h2>
<p>Sorts the segments in <em>Page.ccs_lines</em> into reading order.
This method has no arguments.</p>
<p>The current implementation uses the plugin <em>textline_reading_order</em>
from the Gamera core.</p>
</div>
<div class="section" id="id6">
<h2><a class="toc-backref" href="#id14"><tt class="docutils literal">lines_to_chars</tt></a></h2>
<p>Splits text lines into characters. Signature:</p>
<blockquote>
<tt class="docutils literal">lines_to_chars (lines=None)</tt></blockquote>
<p><em>lines</em> must be a list of <tt class="docutils literal">Cc</tt> data types, each of them representing
a text line. When not given (default), <em>Page.ccs_lines</em> is used instead.
The current implementation calls <em>get_line_glyphs</em> as defined
in the module <a class="reference external" href="functions.html">ocr_toolkit</a>.</p>
<p>The result is stored in <em>Page.textlines</em>; the characters are stored
for each textline in <em>Textline.glyphs</em>.</p>
</div>
<div class="section" id="id7">
<h2><a class="toc-backref" href="#id15"><tt class="docutils literal">chars_to_words</tt></a></h2>
<p>Groups the characters in each <tt class="docutils literal">Textline</tt> from <em>Page.textlines</em>
to words and stores the result for each <tt class="docutils literal">Textline</tt> in the property
<em>Textline.words</em>.</p>
<p>This method has an optional but generally useless argument for the list of
textlines. It is therefore usually called without arguments.</p>
<p>The current implementation calls <em>chars_make_words</em> as defined
in the module <a class="reference external" href="functions.html">ocr_toolkit</a>.</p>
</div>
<div class="section" id="show-lines">
<h2><a class="toc-backref" href="#id16"><tt class="docutils literal">show_lines</tt></a></h2>
<p>Returns an RGB image with all segmented text lines marked by hollow
rects. Makes only sense after <em>page_to_lines</em> (or <em>segment</em>) has
been called.</p>
</div>
<div class="section" id="show-glyphs">
<h2><a class="toc-backref" href="#id17"><tt class="docutils literal">show_glyphs</tt></a></h2>
<p>Returns an RGB image with all segmented/grouped characters marked by
hollow rects. Makes only sense after <em>lines_to_chars</em> (or <em>segment</em>) has
been called.</p>
</div>
<div class="section" id="show-words">
<h2><a class="toc-backref" href="#id18"><tt class="docutils literal">show_words</tt></a></h2>
<p>Returns an RGB image with all grouped words marked by
hollow rects. Makes only sense after <em>chars_to_words</em> (or <em>segment</em>) has
been called..</p>
</div>
</div>
</div>
</body>
</html>