/usr/share/doc/festival-doc/html/Utterance-chunking.html is in festival-doc 1:2.4~release-2.

This file is owned by root:root, with mode 0o644.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Festival Speech Synthesis System: Utterance chunking</title>

<meta name="description" content="Festival Speech Synthesis System: Utterance chunking">
<meta name="keywords" content="Festival Speech Synthesis System: Utterance chunking">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="TTS.html#TTS" rel="up" title="TTS">
<link href="Text-modes.html#Text-modes" rel="next" title="Text modes">
<link href="TTS.html#TTS" rel="prev" title="TTS">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Utterance-chunking"></a>
<div class="header">
<p>
Next: <a href="Text-modes.html#Text-modes" accesskey="n" rel="next">Text modes</a>, Up: <a href="TTS.html#TTS" accesskey="u" rel="up">TTS</a> &nbsp; [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Utterance-chunking-1"></a>
<h3 class="section">9.1 Utterance chunking</h3>

<a name="index-utterance-chunking"></a>
<a name="index-eou_005ftree"></a>
<p>Text to speech works by first tokenizing the file and then chunking the
tokens into utterances.  Utterance breaks are determined by the
decision tree held in the variable <code>eou_tree</code>; a default
version is given in <samp>lib/tts.scm</samp>.  Blank lines are
probably the most reliable indicator of a break, followed by certain
punctuation.  Because periods mark both sentence breaks and
abbreviations, some further heuristics are needed to guess which use
is intended.  The following tree is currently used, and works better
than simply breaking on punctuation.
</p><div class="lisp">
<pre class="lisp">(defvar eou_tree
'((n.whitespace matches &quot;.*\n.*\n\\(.\\|\n\\)*&quot;) ;; 2 or more newlines
  ((1))
  ((punc in (&quot;?&quot; &quot;:&quot; &quot;!&quot;))
   ((1))
   ((punc is &quot;.&quot;)
    ;; This is to distinguish abbreviations vs periods
    ;; These are heuristics
    ((name matches &quot;\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)&quot;)
     ((n.whitespace is &quot; &quot;)
      ((0))                  ;; if abbrev single space isn't enough for break
      ((n.name matches &quot;[A-Z].*&quot;)
       ((1))
       ((0))))
     ((n.whitespace is &quot; &quot;)  ;; if it doesn't look like an abbreviation
      ((n.name matches &quot;[A-Z].*&quot;)  ;; single space and non-cap is no break
       ((1))
       ((0)))
      ((1))))
    ((0))))))
</pre></div>
<p>The token items this tree is applied to will always (except at the
end of the file) include one following token, so lookahead is
possible.  The &quot;n.&quot;, &quot;p.&quot; and &quot;p.p.&quot; prefixes allow access to the
surrounding token context.  The features <code>name</code>, <code>whitespace</code>
and <code>punc</code> allow access to the contents of the token itself.  At
present there is no way to access the lexicon from this tree, which
is unfortunate, as it would be useful if certain abbreviations were
identified as such there.
</p>
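<p>As a rough illustration of how these features combine, the sketch
below defines a much simpler tree for line-oriented text such as lists
or verse, where a single newline should end an utterance.  It is an
assumption made for this example, not something shipped in
<samp>lib/tts.scm</samp>: the name <code>line_eou_tree</code> is hypothetical, and
the regular expression merely adapts the newline pattern used in the
default tree.  Note that it gives up the abbreviation heuristics
entirely, so every period ends an utterance.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical tree for line-oriented text; not part of Festival.
(defvar line_eou_tree
'((n.whitespace matches &quot;.*\n\\(.\\|\n\\)*&quot;) ;; 1 or more newlines
  ((1))
  ((punc in (&quot;?&quot; &quot;!&quot; &quot;.&quot;))  ;; otherwise break on final punctuation
   ((1))
   ((0)))))

;; Install it in place of the default tree.
(set! eou_tree line_eou_tree)
</pre></div>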
<p>Note that these are heuristics written by hand, not trained from data,
though problems have been fixed as they were observed in data.  The
above rules may make mistakes where abbreviations appear at the ends of
lines, and where improper spacing or capitalization is used.  This is
probably worth changing for modes where more casual text appears, such
as email messages and USENET news messages.  A possible improvement
would be to analyse a text to find its basic threshold for an
utterance break (i.e. if no sequence of a full stop, two spaces and a
capitalized word appears, and the text is of a reasonable length,
then look for other criteria for utterance breaks).
</p>
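<p>One way such a change might look for casual text is sketched below.
It is only an assumption about what could work, not a tested or
recommended setting.  The hypothetical <code>casual_eou_tree</code> copies the
default tree but no longer requires a capitalized following word when
the token before the period does not look like an abbreviation.  The
variable <code>saved_eou_tree</code> is likewise introduced only for this
example, so that the default can be restored afterwards.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical variant for casual text; not part of Festival.
(defvar casual_eou_tree
'((n.whitespace matches &quot;.*\n.*\n\\(.\\|\n\\)*&quot;) ;; 2 or more newlines
  ((1))
  ((punc in (&quot;?&quot; &quot;:&quot; &quot;!&quot;))
   ((1))
   ((punc is &quot;.&quot;)
    ((name matches &quot;\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)&quot;)
     ((n.whitespace is &quot; &quot;)
      ((0))                  ;; abbreviation-like: same rule as the default
      ((n.name matches &quot;[A-Z].*&quot;)
       ((1))
       ((0))))
     ((1)))                  ;; not abbreviation-like: always break
    ((0))))))

;; Keep the current tree so it can be restored later.
(set! saved_eou_tree eou_tree)
(set! eou_tree casual_eou_tree)
;; ... synthesize the casual text here ...
(set! eou_tree saved_eou_tree)
</pre></div>
<p>The trade-off is that a period after an ordinary word always ends
the utterance, so text that uses periods for other purposes may be
split too eagerly.
</p>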
<p>Ultimately what we are trying to do is to chunk the text into utterances
that can be synthesized quickly and start to play them quickly to
minimise the time someone has to wait for the first sound when starting
synthesis.  Thus it would be better if this chunking were done on
<em>prosodic phrases</em> rather than chunks more similar to linguistic
sentences.  Prosodic phrases are bounded in size, while sentences are
not.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Text-modes.html#Text-modes" accesskey="n" rel="next">Text modes</a>, Up: <a href="TTS.html#TTS" accesskey="u" rel="up">TTS</a> &nbsp; [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>