<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This is the user's manual for Aspell

GNU Aspell is a spell checker designed to eventually replace Ispell.
It can either be used as a library or as an independent spell checker.

Copyright © 2000-2011 Kevin Atkinson.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts and no Back-Cover Texts.  A
copy of the license is included in the section entitled "GNU Free
Documentation License". -->
<!-- Created by GNU Texinfo 6.4.90, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Unicode Normalization (GNU Aspell 0.60.7-pre)</title>

<meta name="description" content="Aspell 0.60.7-pre spell checker user&rsquo;s manual.">
<meta name="keywords" content="Unicode Normalization (GNU Aspell 0.60.7-pre)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="index.html#Top" rel="start" title="Top">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Language-Related-Issues.html#Language-Related-Issues" rel="up" title="Language Related Issues">
<link href="German-Sharp-S.html#German-Sharp-S" rel="next" title="German Sharp S">
<link href="Words-With-Symbols-in-Them.html#Words-With-Symbols-in-Them" rel="prev" title="Words With Symbols in Them">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en">
<a name="Unicode-Normalization"></a>
<div class="header">
<p>
Next: <a href="German-Sharp-S.html#German-Sharp-S" accesskey="n" rel="next">German Sharp S</a>, Previous: <a href="Words-With-Symbols-in-Them.html#Words-With-Symbols-in-Them" accesskey="p" rel="prev">Words With Symbols in Them</a>, Up: <a href="Language-Related-Issues.html#Language-Related-Issues" accesskey="u" rel="up">Language Related Issues</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
</div>
<hr>
<a name="Unicode-Normalization-1"></a>
<h3 class="appendixsec">C.3 Unicode Normalization</h3>

<p>Because Unicode contains a large number of precomposed characters, there
are multiple ways a character can be represented.  For example, the letter
ö can be represented either as
</p>
<div class="example">
<pre class="example">U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
</pre><pre class="example">or
</pre><pre class="example">U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
</pre></div>
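
<p>The two representations can be compared with a short sketch.  It uses
Python's standard <code>unicodedata</code> module purely for illustration;
that choice is an assumption of this example and not how Aspell itself
performs normalization.  The variable names are hypothetical.
</p>

```python
import unicodedata

# Two ways of writing the same letter.
precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS

# Without normalization the two strings are different code point sequences.
differ_before = precomposed != decomposed

# NFC composes base + combining mark into the precomposed character;
# NFD splits the precomposed character into base + combining mark.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(differ_before, same_after_nfc, same_after_nfd)
```

<p>Once both strings are brought to the same form, a word list lookup only
has to deal with one spelling of the letter.
</p>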

<p>By performing normalization first, Aspell will only see one of these
representations.  The exact form of normalization depends on the
language.  Given the choice of:
</p>
<ol>
<li> Precomposed character
</li><li> Base letter + combining character(s)
</li><li> Base letter only
</li></ol>

<p>Aspell uses (1) if the precomposed character is in the target character
set, (2) if both the base letter and the combining character are present,
and (3) otherwise.
</p>
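
<p>The selection rule above can be sketched as follows.  This is a minimal
illustration under stated assumptions, not Aspell's implementation: the
function <code>choose_form</code> is hypothetical, the target character set
is modelled as a plain Python set of characters, and Python's
<code>unicodedata</code> module stands in for the normalization tables.
</p>

```python
import unicodedata

def choose_form(text, charset):
    """Pick a representation of text that fits charset (a set of characters).

    Hypothetical helper: it mirrors the three-way choice described above.
    """
    # (1) Precomposed character, if every code point is in the target set.
    composed = unicodedata.normalize("NFC", text)
    if all(ch in charset for ch in composed):
        return composed

    # (2) Base letter + combining character(s), if all of them are present.
    decomposed = unicodedata.normalize("NFD", text)
    if all(ch in charset for ch in decomposed):
        return decomposed

    # (3) Base letter only: drop every combining mark.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


# Assumed target sets, for illustration only.
latin1 = {chr(i) for i in range(256)}          # contains U+00F6
base_plus_marks = set("o\u0308")               # no precomposed character
ascii_letters = set("abcdefghijklmnopqrstuvwxyz")

case1 = choose_form("o\u0308", latin1)           # precomposed form
case2 = choose_form("\u00f6", base_plus_marks)   # base + combining form
case3 = choose_form("\u00f6", ascii_letters)     # base letter only
```

<p>Modelling the character set as a set of characters keeps the sketch
independent of any particular encoding; a real target character set would
come from the language's data files.
</p>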
<p>Unicode normalization is implemented as of Aspell 0.60.
</p>



</body>
</html>