/usr/share/doc/festival-doc/html/Lexical-entries.html is in festival-doc 1:2.1~release-8.
This file is owned by root:root, with mode 0o644.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Festival Speech Synthesis System: Lexical entries</title>
<meta name="description" content="Festival Speech Synthesis System: Lexical entries">
<meta name="keywords" content="Festival Speech Synthesis System: Lexical entries">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Lexicons.html#Lexicons" rel="up" title="Lexicons">
<link href="Defining-lexicons.html#Defining-lexicons" rel="next" title="Defining lexicons">
<link href="Lexicons.html#Lexicons" rel="prev" title="Lexicons">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Lexical-entries"></a>
<div class="header">
<p>
Next: <a href="Defining-lexicons.html#Defining-lexicons" accesskey="n" rel="next">Defining lexicons</a>, Up: <a href="Lexicons.html#Lexicons" accesskey="u" rel="up">Lexicons</a> [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Lexical-entries-1"></a>
<h3 class="section">13.1 Lexical entries</h3>
<a name="index-lexical-entries"></a>
<p>Lexical entries consist of three basic parts: a headword, a part of
speech, and a pronunciation. The headword is what you might normally
think of as a word, e.g. ‘<samp>walk</samp>’ or ‘<samp>chairs</samp>’, but it may be
any token.
</p>
<a name="index-part-of-speech-tag"></a>
<a name="index-POS"></a>
<a name="index-part-of-speech-map"></a>
<p>The part-of-speech field currently consists of a simple atom (or nil if
none is specified). There are many part-of-speech tag sets,
and whatever you mark in your lexicon must be compatible with the
subsystems that use that information. You can optionally set a
part-of-speech tag mapping for each lexicon. The value should be a reverse
assoc-list, in which each element pairs a list of tags with the single tag
they map to, of the following form:
</p><div class="lisp">
<pre class="lisp">(lex.set.pos.map
'((( punc fpunc) punc)
(( nn nnp nns nnps ) n)))
</pre></div>
<p>All part-of-speech tags that do not appear on the left-hand side of a pos map
are left unchanged.
</p>
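<p>As an illustration of how such a reverse assoc-list can be read, the
following sketch maps a single tag through a pos map. The helper
<code>sketch_map_tag</code> and the variable <code>example_map</code> are
hypothetical names introduced here. This is not Festival's own
implementation, only a minimal model of the behaviour described above.
</p>
<div class="lisp">
<pre class="lisp">;; Hypothetical helper, not part of Festival: look TAG up in POS_MAP.
(define (sketch_map_tag pos_map tag)
  (cond
   ((not pos_map) tag)                  ; no element matched: tag unchanged
   ((member tag (car (car pos_map)))    ; tag is on the left-hand side
    (car (cdr (car pos_map))))          ; so return the tag it maps to
   (t (sketch_map_tag (cdr pos_map) tag))))

(define example_map
  '(((punc fpunc) punc)
    ((nn nnp nns nnps) n)))

(sketch_map_tag example_map 'nns)   ; nns is listed, so it maps to n
(sketch_map_tag example_map 'vb)    ; vb is not listed, so it stays vb
</pre></div>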
<a name="index-pronunciation"></a>
<p>The third field contains the actual pronunciation of the word. This
is an arbitrary Lisp S-expression. In many of the lexicons distributed
with Festival this entry has an internal format, identifying syllable
structure, stress markings and of course the phones themselves. In
some of our other lexicons we simply list the phones with stress marking
on each vowel.
</p>
<p>Some typical example entries are:
</p>
<div class="lisp">
<pre class="lisp">( "walkers" n ((( w oo ) 1) (( k @ z ) 0)) )
( "present" v ((( p r e ) 0) (( z @ n t ) 1)) )
( "monument" n ((( m o ) 1) (( n y u ) 0) (( m @ n t ) 0)) )
</pre></div>
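<p>Because an entry is an ordinary list, its three fields can be picked
apart with the usual list operations. The sketch below assumes the
syllabified format shown above, where each syllable is a list of phones
followed by a stress value. The variable names <code>entry</code> and
<code>first_syl</code> are introduced here purely for illustration.
</p>
<div class="lisp">
<pre class="lisp">(define entry '("present" v (((p r e) 0) ((z @ n t) 1))))

(car entry)                 ; first field: the headword
(car (cdr entry))           ; second field: the part of speech
(car (cdr (cdr entry)))     ; third field: the pronunciation (list of syllables)

;; Each syllable has the form (PHONES STRESS).
(define first_syl (car (car (cdr (cdr entry)))))
(car first_syl)             ; phones of the first syllable
(car (cdr first_syl))       ; stress value of the first syllable
</pre></div>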
<a name="index-homographs"></a>
<p>Note that you may have two entries with the same headword; their different
part-of-speech fields allow them to be distinguished. For example:
</p>
<div class="lisp">
<pre class="lisp">( "lives" n ((( l ai v z ) 1)) )
( "lives" v ((( l i v z ) 1)) )
</pre></div>
<p>See <a href="Lookup-process.html#Lookup-process">Lookup process</a>, for a description of how multiple entries with the
same headword are used during lookup.
</p>
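<p>As a rough sketch of how such homograph entries might be tried out
interactively, the two entries above can be added to the current lexicon's
addenda with <code>lex.add.entry</code> and then requested by part of speech
with <code>lex.lookup</code>. This assumes that a lexicon has already been
selected (for example by loading a voice) and that its phone set includes
the phones used in these entries; the result depends on the lexicon in use.
</p>
<div class="lisp">
<pre class="lisp">;; Assumes a current lexicon whose phone set contains l, ai, i, v and z.
(lex.add.entry '("lives" n (((l ai v z) 1))))
(lex.add.entry '("lives" v (((l i v z) 1))))

;; Ask for each reading by giving the part-of-speech tag.
(lex.lookup "lives" 'n)
(lex.lookup "lives" 'v)
</pre></div>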
<a name="index-lexical-stress"></a>
<p>By current conventions, single syllable function words should have no
stress marking, while single syllable content words should be stressed.
</p>
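<p>Under this convention, a pair of single syllable entries might look as
follows. These two entries are illustrative assumptions written in the
style of the examples above: the tag <code>dt</code> and the phone strings
are not taken from any particular distributed lexicon.
</p>
<div class="lisp">
<pre class="lisp">;; Illustrative entries only; tags and phones are assumptions.
( "the" dt ((( dh @ ) 0)) )    ; function word: syllable left unstressed
( "cat" n ((( k a t ) 1)) )    ; content word: syllable stressed
</pre></div>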
<p><em>NOTE:</em> the POS field may change in the future to contain more complex
formats. The same lexicon mechanism (but a different lexicon) is
used for holding part-of-speech tag distributions for the POS prediction
module.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Defining-lexicons.html#Defining-lexicons" accesskey="n" rel="next">Defining lexicons</a>, Up: <a href="Lexicons.html#Lexicons" accesskey="u" rel="up">Lexicons</a> [<a href="Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>