
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Festival Speech Synthesis System: Letter to sound rules</title>

<meta name="description" content="Festival Speech Synthesis System: Letter to sound rules">
<meta name="keywords" content="Festival Speech Synthesis System: Letter to sound rules">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Lexicons.html#Lexicons" rel="up" title="Lexicons">
<link href="Building-letter-to-sound-rules.html#Building-letter-to-sound-rules" rel="next" title="Building letter to sound rules">
<link href="Lookup-process.html#Lookup-process" rel="prev" title="Lookup process">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Letter-to-sound-rules"></a>
<div class="header">
<p>
Next: <a href="Building-letter-to-sound-rules.html#Building-letter-to-sound-rules" accesskey="n" rel="next">Building letter to sound rules</a>, Previous: <a href="Lookup-process.html#Lookup-process" accesskey="p" rel="prev">Lookup process</a>, Up: <a href="Lexicons.html#Lexicons" accesskey="u" rel="up">Lexicons</a> &nbsp; [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Letter-to-sound-rules-1"></a>
<h3 class="section">13.4 Letter to sound rules</h3>

<a name="index-letter-to-sound-rules-1"></a>
<p>Each lexicon may define what action to take when a word is found
in neither its addenda nor its compiled lexicon.  Several options are
available, and more may be added as more general letter to sound rule
systems are developed.
</p>
<a name="index-unknown-words"></a>
<p>The method is set by the command
</p><div class="lisp">
<pre class="lisp">(lex.set.lts.method METHOD)
</pre></div>
<p>Here <var>METHOD</var> is one of the following:
</p><dl compact="compact">
<dt>&lsquo;<samp>Error</samp>&rsquo;</dt>
<dd><p>Throw an error when an unknown word is found (default).
</p></dd>
<dt>&lsquo;<samp>lts_rules</samp>&rsquo;</dt>
<dd><p>Use an externally specified set of letter to sound rules (described
below).  The name of the ruleset to use is set with the
<code>lex.lts.ruleset</code> function.  This method runs a single
ruleset on the exploded form of the word and assumes the rules
return a list of phonemes (in the appropriate phone set).  If several
rulesets must be applied in sequence, use the <var>FUNCTIONNAME</var>
method described below.
</p></dd>
<dt>&lsquo;<samp>none</samp>&rsquo;</dt>
<dd><p>Return an entry with a <code>nil</code> pronunciation field.  This is
only useful in very special circumstances.
</p></dd>
<dt>&lsquo;<samp>FUNCTIONNAME</samp>&rsquo;</dt>
<dd><p>Call the Lisp function named <var>FUNCTIONNAME</var>.  This function
is given two arguments: the word and its part of speech.  It should
return a valid lexical entry.
</p></dd>
</dl>
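<p>As a minimal sketch, the calls below select a lexicon and then set its
unknown-word method in several of the ways listed above.  The lexicon name
<code>mylex</code> and the function <code>mylex_lts</code> are hypothetical.
The lexicon is assumed to exist already, <code>lex.set.lts.method</code> is
assumed to act on the currently selected lexicon, and each call is assumed to
replace the previous setting.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical lexicon, assumed to have been created already.
(lex.select "mylex")

;; Default behaviour: signal an error for any unknown word.
(lex.set.lts.method 'Error)

;; Alternatively, return an entry with a nil pronunciation field.
(lex.set.lts.method 'none)

;; Or name a Lisp function of (WORD POS) that returns a lexical entry.
;; This hypothetical placeholder also returns a nil pronunciation.
(define (mylex_lts word pos)
  (list word pos nil))
(lex.set.lts.method 'mylex_lts)
</pre></div>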

<p>The basic letter to sound rule system is very simple but is
powerful enough to build reasonably complex letter to sound rules.
Although we&rsquo;ve found trained LTS rules to be better than hand-written
ones for complex languages, when no data is available and rules
must be written by hand, the following rule formalism is much easier to
use than the one generated by the LTS training system (described
in the next section).
</p>
<a name="index-letter-to-sound-rules-2"></a>
<a name="index-LTS"></a>
<p>The basic form of a rule is as follows
</p><div class="lisp">
<pre class="lisp">( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )
</pre></div>
<p>The interpretation is that if <var>ITEMS</var> appear in the specified left
and right context, then the output string is to contain <var>NEWITEMS</var>.
Any of <var>LEFTCONTEXT</var>, <var>RIGHTCONTEXT</var> or <var>NEWITEMS</var> may be
empty.  Note that <var>NEWITEMS</var> is written to a separate output &quot;tape&quot;
and hence cannot feed further rules within the same ruleset.  An example is
</p><div class="lisp">
<pre class="lisp">( # [ c h ] C = k )
</pre></div>
<p>The special character <code>#</code> denotes a word boundary, and the symbol
<code>C</code> denotes the set of all consonants (sets are declared before the
rules).  This rule states that a <code>ch</code> at the start of a word
followed by a consonant is to be rendered as the <code>k</code> phoneme.
Symbols in contexts may be followed by the symbol <code>*</code> for zero or
more occurrences, or <code>+</code> for one or more occurrences.
</p>
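<p>To see the example rule at work, here is a small self-contained ruleset
built with <code>lts.ruleset</code> (described below).  The ruleset name
<code>ch_demo</code> and the output phone names are hypothetical, the set
<code>C</code> is declared here as the lower-case consonant letters, and the
rules after the first are illustrative additions so that the two test words
can be rewritten completely.
</p><div class="lisp">
<pre class="lisp">(lts.ruleset
 ch_demo
 ;; Sets: C is (here) the set of lower-case consonant letters.
 ((C b c d f g h j k l m n p q r s t v w x y z))
 ;; Rules, tried in order at each position of the input.
 (
  ( # [ c h ] C = k )   ; word-initial ch before a consonant
  ( [ c h ] = ch )      ; any other ch
  ( [ c ] = k )
  ( [ h ] = hh )
  ( [ i ] = ih )
  ( [ n ] = n )
  ( [ r ] = r )
  ( [ s ] = s )
 ))

(lts.apply 'chris 'ch_demo)   ; should return (k r ih s)
(lts.apply 'chin 'ch_demo)    ; should return (ch ih n)
</pre></div>
<p>In <code>chin</code> the first rule fails because <code>i</code> is not in
<code>C</code>, so the more general <code>ch</code> rule applies instead.
</p>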
<p>A symbol in a rule is treated as a set name if it has been declared as
one, and otherwise as a symbol of the input or output alphabet.  Symbols
may be more than one character long, and names are case sensitive.
</p>
<p>The rules are tried in order until one matches the next one or more
symbols of the input tape.  That rule is applied: its right hand side is
added to the output tape and the symbols it matched are consumed.  Matching
then restarts from the first rule in the list.
</p>
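<p>Because matching always restarts from the top of the list, rule order
matters.  The sketch below defines two hypothetical rulesets,
<code>order_good</code> and <code>order_bad</code>, that differ only in the
position of the <code>ch</code> rule; the phone names are illustrative only.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical ruleset: the longer match is listed first.
(lts.ruleset
 order_good
 ()
 (
  ( [ c h ] = ch )
  ( [ c ] = k )
  ( [ h ] = hh )
  ( [ i ] = ih )
  ( [ n ] = n )
 ))

;; Hypothetical ruleset: [ c ] always matches first,
;; so the [ c h ] rule can never fire.
(lts.ruleset
 order_bad
 ()
 (
  ( [ c ] = k )
  ( [ c h ] = ch )
  ( [ h ] = hh )
  ( [ i ] = ih )
  ( [ n ] = n )
 ))

(lts.apply 'chin 'order_good)   ; should return (ch ih n)
(lts.apply 'chin 'order_bad)    ; should return (k hh ih n)
</pre></div>
<p>In practice this means specific rules (longer matches or narrower
contexts) should precede the general rules for the same letters.
</p>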
<p>When the function that applies a set of rules is given an atom, it
explodes it into a list of single characters; when given a list, it uses
the list as is.  This reflects the common use of rewriting the
individual letters of a word as phonemes, without excluding the
possibility of using the system for more complex manipulations,
such as multi-pass LTS systems and phoneme conversion.
</p>
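<p>As a sketch of phoneme conversion, the hypothetical ruleset
<code>ph_map</code> below rewrites one set of phone names as another.  Since
the input is passed as a list, each element is treated as one (possibly
multi-character) symbol.  The phone names on both sides are illustrative
assumptions, not a real phone set.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical phone-to-phone mapping over multi-character symbols.
(lts.ruleset
 ph_map
 ()
 (
  ( [ ch ] = t sh )   ; one input symbol rewritten as two output symbols
  ( [ ih ] = i )
  ( [ n ] = n )
 ))

;; A list is used as is: the input symbols are ch, ih and n.
(lts.apply '(ch ih n) 'ph_map)   ; should return (t sh i n)
</pre></div>
<p>Given the atom <code>chin</code> instead, the word would be exploded into
the letters <code>c h i n</code>; <code>ph_map</code> has no rule for
<code>c</code>, so the rules could not be applied and an error would be given.
</p>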
<p>Three basic access functions are available from Lisp; there
are corresponding functions in the C/C++ domain.
</p>
<dl compact="compact">
<dt><code>(lts.ruleset NAME SETS RULES)</code></dt>
<dd><p>Define a new set of LTS rules.  <code>NAME</code> is the name of this
ruleset, <code>SETS</code> is a list of set definitions of the form
<code>(SETNAME e0 e1 ...)</code>, and <code>RULES</code> is a list of rules as described above.
</p></dd>
<dt><code>(lts.apply WORD RULESETNAME)</code></dt>
<dd><p>Apply the set of rules named <code>RULESETNAME</code> to <code>WORD</code>.  If
<code>WORD</code> is a symbol, it is exploded into a list of the individual
characters in its print name.  If <code>WORD</code> is a list, it is used as
is.  If the rules cannot be successfully applied, an error is given.  The
result of a successful application is returned as a list.
</p></dd>
<dt><code>(lts.check_alpha WORD RULESETNAME)</code></dt>
<dd><p>The symbols in <code>WORD</code> are checked against the input alphabet of the
rules named <code>RULESETNAME</code>.  If they are all contained in that
alphabet, <code>t</code> is returned; otherwise <code>nil</code>.  Note that this
does not guarantee that the rules will apply successfully (contexts may
restrict their application), but it allows general checks, for example for
numerals or punctuation, so that an appropriate ruleset can be
chosen.
</p></dd>
</dl>
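<p>A minimal sketch of using <code>lts.check_alpha</code> as a guard before
<code>lts.apply</code>.  The ruleset <code>abc_demo</code> and the helper
<code>safe_lts</code> are hypothetical; the input alphabet is assumed to be
the letters appearing in the rules (here <code>a</code>, <code>b</code> and
<code>c</code>).  Words are passed as lists to keep the example explicit.
</p><div class="lisp">
<pre class="lisp">;; Hypothetical ruleset whose input alphabet is a, b and c.
(lts.ruleset
 abc_demo
 ()
 (
  ( [ a ] = ae )
  ( [ b ] = b )
  ( [ c ] = k )
 ))

;; Hypothetical helper: apply RULES only if every symbol of WORD is in
;; their input alphabet; otherwise return nil so that the caller can
;; choose a different ruleset.
(define (safe_lts word rules)
  (if (lts.check_alpha word rules)
      (lts.apply word rules)
      nil))

(safe_lts '(c a b) 'abc_demo)   ; should return (k ae b)
(safe_lts '(c a d) 'abc_demo)   ; should return nil, as d is not in the alphabet
</pre></div>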

<p>The letter to sound rule system may be used directly from Lisp
and can easily be used to perform relatively complex analyses of
words without requiring modification of the C/C++ system.  For example,
the Welsh letter to sound rule system consists of three rulesets: the
first explicitly identifies epenthesis, the second identifies stressed
vowels, and the third rewrites this augmented letter string as phonemes.
This is achieved by the following function:
</p><div class="lisp">
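<pre class="lisp">(define (welsh_lts word features)
  (let (epen str wel)
    ;; Pass 1: explicitly identify epenthesis.
    (set! epen (lts.apply (downcase word) 'newepen))
    ;; Pass 2: identify stressed vowels.
    (set! str (lts.apply epen 'newwelstr))
    ;; Pass 3: rewrite the augmented letter string as phonemes.
    (set! wel (lts.apply str 'newwel))
    ;; Return the lexical entry (WORD POS SYLLABLES), with POS nil.
    (list word
          nil
          (lex.syllabify.phstress wel))))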
</pre></div>
<p>The LTS method for the Welsh lexicon is set to <code>welsh_lts</code>, so this
function is called when a word is not found in the lexicon.  The
function first downcases the word and then applies the rulesets in
turn, finally calling the syllabification process and returning the
constructed lexical entry.
</p>
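<p>The same multi-pass pattern can be tried without the Welsh rulesets.  This
is a toy sketch: the rulesets <code>demo_digraph</code> and
<code>demo_phones</code>, the helper <code>demo_two_pass</code> and the phone
name <code>hl</code> are all hypothetical.  The first pass collapses the
digraph <code>ll</code> into a single symbol <code>LL</code>; the second pass
receives that result as a list and maps its symbols to phones.  Unlike
<code>welsh_lts</code>, the helper returns a bare phone list rather than a
lexical entry, so it would need a syllabification step before it could serve
as an LTS method.
</p><div class="lisp">
<pre class="lisp">;; Pass 1 (hypothetical): collapse the digraph ll into one symbol LL.
(lts.ruleset
 demo_digraph
 ()
 (
  ( [ l l ] = LL )
  ( [ l ] = l )
  ( [ a ] = a )
  ( [ n ] = n )
 ))

;; Pass 2 (hypothetical): map the pass-1 symbols to phone names.
(lts.ruleset
 demo_phones
 ()
 (
  ( [ LL ] = hl )
  ( [ l ] = l )
  ( [ a ] = a )
  ( [ n ] = n )
 ))

;; The list returned by pass 1 is used as is by pass 2.
(define (demo_two_pass word)
  (lts.apply (lts.apply (downcase word) 'demo_digraph) 'demo_phones))

(demo_two_pass "Llan")   ; should return (hl a n)
</pre></div>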



</body>
</html>
festival-doc 1:2.1~release-8 / usr / share / doc / festival-doc / html / Letter-to-sound-rules.html