/usr/share/doc/aspell-doc/aspell.html/Unicode-Normalization.html is in aspell-doc 0.60.7~20110707-4.
This file is owned by root:root, with mode 0o644.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This is the user's manual for Aspell
GNU Aspell is a spell checker designed to eventually replace Ispell.
It can either be used as a library or as an independent spell checker.
Copyright © 2000-2011 Kevin Atkinson.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License". -->
<!-- Created by GNU Texinfo 6.4.90, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Unicode Normalization (GNU Aspell 0.60.7-pre)</title>
<meta name="description" content="Aspell 0.60.7-pre spell checker user’s manual.">
<meta name="keywords" content="Unicode Normalization (GNU Aspell 0.60.7-pre)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="index.html#Top" rel="start" title="Top">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Language-Related-Issues.html#Language-Related-Issues" rel="up" title="Language Related Issues">
<link href="German-Sharp-S.html#German-Sharp-S" rel="next" title="German Sharp S">
<link href="Words-With-Symbols-in-Them.html#Words-With-Symbols-in-Them" rel="prev" title="Words With Symbols in Them">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Unicode-Normalization"></a>
<div class="header">
<p>
Next: <a href="German-Sharp-S.html#German-Sharp-S" accesskey="n" rel="next">German Sharp S</a>, Previous: <a href="Words-With-Symbols-in-Them.html#Words-With-Symbols-in-Them" accesskey="p" rel="prev">Words With Symbols in Them</a>, Up: <a href="Language-Related-Issues.html#Language-Related-Issues" accesskey="u" rel="up">Language Related Issues</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
</div>
<hr>
<a name="Unicode-Normalization-1"></a>
<h3 class="appendixsec">C.3 Unicode Normalization</h3>
<p>Because Unicode contains a large number of precomposed characters, there
are multiple ways a character can be represented. For example, the letter
ö can be represented either as
</p>
<div class="example">
<pre class="example">U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
</pre><pre class="example">or
</pre><pre class="example">U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
</pre></div>
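<p>The two representations can be compared with a short Python sketch.
It uses the standard <code>unicodedata</code> module and the generic Unicode
normalization forms NFC and NFD, which are an assumption made for
illustration and not necessarily the exact forms Aspell applies internally.
</p>

```python
import unicodedata

# U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
precomposed = "\u00f6"
# U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
decomposed = "\u006f\u0308"

# Both strings display as the same letter, but their code points differ,
# so a plain comparison treats them as different words.
same_before = precomposed == decomposed

# NFC composes to the precomposed form; NFD decomposes to base + combining mark.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
```

<p>Without normalization the comparison fails; after normalizing both
strings to the same form they compare equal.
</p>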
<p>By performing normalization first, Aspell sees only one of these
representations. The exact form of normalization depends on the
language. Given the choice of:
</p>
<ol>
<li> Precomposed character
</li><li> Base letter + combining character(s)
</li><li> Base letter only
</li></ol>
<p>if the precomposed character is in the target character set, then (1) is used;
if both the base letter and the combining character(s) are present, then (2); otherwise (3).
</p>
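<p>The preference order can be sketched as follows. This is only an
illustration of the rule and not Aspell's actual code: the function
<code>normalize_char</code> and its <code>charset</code> argument (a set of
the characters available in the target character set) are hypothetical
names, and Python's NFC and NFD forms stand in for whatever tables Aspell
uses.
</p>

```python
import unicodedata

def normalize_char(ch, charset):
    """Choose a representation of ch using the three-way preference.

    charset is a hypothetical set of characters supported by the
    target character set; it is not an Aspell data structure.
    """
    composed = unicodedata.normalize("NFC", ch)
    if len(composed) == 1 and composed in charset:
        return composed  # (1) precomposed character

    decomposed = unicodedata.normalize("NFD", ch)
    if all(c in charset for c in decomposed):
        return decomposed  # (2) base letter + combining character(s)

    # (3) base letter only: drop every combining mark
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

<p>For instance, with a character set that contains only the plain letter
<code>o</code>, the sketch maps ö to <code>o</code>; with one that contains
U+00F6, both input representations map to the precomposed character.
</p>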
<p>Unicode normalization is implemented as of Aspell 0.60.
</p>
</body>
</html>