<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Festival Speech Synthesis System: Utterance chunking</title>
<meta name="description" content="Festival Speech Synthesis System: Utterance chunking">
<meta name="keywords" content="Festival Speech Synthesis System: Utterance chunking">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="TTS.html#TTS" rel="up" title="TTS">
<link href="Text-modes.html#Text-modes" rel="next" title="Text modes">
<link href="TTS.html#TTS" rel="prev" title="TTS">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Utterance-chunking"></a>
<div class="header">
<p>
Next: <a href="Text-modes.html#Text-modes" accesskey="n" rel="next">Text modes</a>, Up: <a href="TTS.html#TTS" accesskey="u" rel="up">TTS</a> [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Utterance-chunking-1"></a>
<h3 class="section">9.1 Utterance chunking</h3>
<a name="index-utterance-chunking"></a>
<a name="index-eou_005ftree"></a>
<p>Text to speech works by first tokenizing the file and then chunking the
tokens into utterances. Where the utterance breaks fall is
determined by the utterance tree in the variable <code>eou_tree</code>. A default
version is given in <samp>lib/tts.scm</samp>. It is a decision tree that
determines what signifies an utterance break. Blank lines are
probably the most reliable indicator, followed by certain punctuation. Because
periods are used for both sentence breaks and
abbreviations, further heuristics are needed to guess which
use is intended. The following tree is currently used; it
works better than simply using punctuation.
</p><div class="lisp">
<pre class="lisp">(defvar eou_tree
 '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
   ((1))
   ((punc in ("?" ":" "!"))
    ((1))
    ((punc is ".")
     ;; This is to distinguish abbreviations vs periods
     ;; These are heuristics
     ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
      ((n.whitespace is " ")
       ((0))                  ;; if abbrev single space isn't enough for break
       ((n.name matches "[A-Z].*")
        ((1))
        ((0))))
      ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
       ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
        ((1))
        ((0)))
       ((1))))
     ((0))))))
</pre></div>
<p>The token items this is applied to will always (except in the
end of file case) include one following token, so lookahead is
possible. The "n.", "p." and "p.p." prefixes allow access to the
surrounding token context. The features <code>name</code>, <code>whitespace</code>
and <code>punc</code> allow access to the contents of the token itself. At
present there is no way to access the lexicon from this tree, which
is unfortunate, as it might be useful if certain abbreviations were identified
as such there.
</p>
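<p>To make the decision paths concrete, the following table traces a few
token pairs through the tree above by hand. It is an illustration only, not
output from Festival, and it assumes that <code>matches</code> must match
the whole string, that <code>name</code> excludes trailing punctuation, and
that <code>punc</code> holds exactly the trailing punctuation shown.
</p><div class="example">
<pre class="example">Text          name  punc  n.whitespace  n.name   Result
Dr. Smith     Dr    .     " "           Smith    0  abbreviation-like, single space
etc.  The     etc   .     "  "          The      1  abbreviation-like, wider gap, capital follows
left. Then    left  .     " "           Then     1  single space, capital follows
left. then    left  .     " "           then     0  single space, no capital
left.  then   left  .     "  "          then     1  not abbreviation-like, more than one space
Why? because  Why   ?     " "           because  1  "?" always breaks
Tom. He       Tom   .     " "           He       0  short capitalized word looks like an abbreviation
</pre></div>
<p>The last row shows the cost of the abbreviation pattern: any capitalized
word of up to three letters is treated as a possible abbreviation, so a
genuine sentence end after such a word is missed when only a single space
follows.
</p>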
<p>Note that these are heuristics written by hand, not trained from data,
though problems have been fixed as they have been observed in data. The
above rules may make mistakes where abbreviations appear at the end of
lines, and where improper spacing or capitalization is used. This is
probably worth changing for modes where more casual text appears, such
as email messages and USENET news messages. A possible improvement
would be to analyse a text to find its basic threshold for
utterance breaks (i.e. if no sequences of a full stop, two spaces and a
capitalized word appear, and the text is of a reasonable length,
then look for other criteria for utterance breaks).
</p>
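<p>As a rough sketch of what such a change might look like for casual text,
the variant below drops the capitalization test for tokens that do not look
like abbreviations, so a period after an ordinary word always ends the
utterance. This tree is hypothetical and is not taken from
<samp>lib/tts.scm</samp>. It also assumes that resetting
<code>eou_tree</code> with <code>set!</code> before synthesis starts is
enough for the new tree to be used.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical variant for casual text (an assumption, not the default).
(set! eou_tree
 '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
   ((1))
   ((punc in ("?" ":" "!"))
    ((1))
    ((punc is ".")
     ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
      ((n.whitespace is " ")  ;; abbreviation handling is unchanged
       ((0))
       ((n.name matches "[A-Z].*")
        ((1))
        ((0))))
      ((1)))                  ;; not abbreviation-like: always break
     ((0))))))
</pre></div>
<p>The trade-off is more false breaks: a lowercase abbreviation that the
pattern does not recognise, such as "approx." followed by a number, would
now end an utterance where the default tree would not.
</p>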
<p>Ultimately, the aim is to chunk the text into utterances
that can be synthesized quickly, and to start playing them quickly, so as to
minimise the time someone has to wait for the first sound when starting
synthesis. Thus it would be better if this chunking were done on
<em>prosodic phrases</em> rather than on chunks more similar to linguistic
sentences. Prosodic phrases are bounded in size, while sentences are
not.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Text-modes.html#Text-modes" accesskey="n" rel="next">Text modes</a>, Up: <a href="TTS.html#TTS" accesskey="u" rel="up">TTS</a> [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>