/usr/share/doc/ne/html/The-Encoding-Mess.html is in ne-doc 3.0.1-2build1.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>ne&rsquo;s manual: The Encoding Mess</title>

<meta name="description" content="ne&rsquo;s manual: The Encoding Mess">
<meta name="keywords" content="ne&rsquo;s manual: The Encoding Mess">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Command-Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="History.html#History" rel="next" title="History">
<link href="Motivations-and-Design.html#Motivations-and-Design" rel="prev" title="Motivations and Design">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="The-Encoding-Mess"></a>
<div class="header">
<p>
Next: <a href="History.html#History" accesskey="n" rel="next">History</a>, Previous: <a href="Motivations-and-Design.html#Motivations-and-Design" accesskey="p" rel="prev">Motivations and Design</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="Command-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="The-Encoding-Mess-1"></a>
<h2 class="chapter">8 The Encoding Mess</h2>
<a name="index-UTF_002d8"></a>
<a name="index-ISO_002d8859-family"></a>
<a name="index-ISO_002d8859_002d1"></a>

<p>The original <code>ne</code> handled 8-bit text files, and assumed that every
byte coming from the keyboard could be output to the terminal. No other
assumption was made; for instance, the case-conversion functions did not
assume a particular encoding for non-US-ASCII characters. This choice
had a significant advantage: <code>ne</code> could easily handle several
different encodings, with only minor nuisances for the end user.
</p>
<p>Since version 1.30, <code>ne</code> supports UTF-8. It can use UTF-8 for its
input/output, and it can also interpret one or more buffers as containing
UTF-8 encoded text, acting accordingly. Note that the buffer content is
actual UTF-8 text: <code>ne</code> does not use wide characters. As a
positive side effect, <code>ne</code> can fully support the ISO-10646
standard, while non-UTF-8 texts still occupy exactly one byte per
character.
</p>
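<p>Because the buffer holds UTF-8 bytes rather than wide characters, a character outside US-ASCII occupies several bytes. The following C sketch shows how an ISO-10646 code point maps to its byte sequence; the function name <code>utf8_encode</code> is hypothetical and not part of <code>ne</code>. It assumes the RFC 3629 limit of U+10FFFF; the original ISO-10646 form of UTF-8 allowed sequences of up to six bytes, and <code>ne</code>&rsquo;s own code may handle that wider range.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not ne's API): writes the UTF-8 encoding of the
   code point cp into out and returns the number of bytes written, or 0
   if cp cannot be encoded under the RFC 3629 rules assumed here
   (UTF-16 surrogates or values above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    assert(out != NULL);
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the assumed RFC 3629 range */
}
```

<p>For instance, U+00E9 (é) becomes the two bytes 0xC3 0xA9 and U+20AC (€) the three bytes 0xE2 0x82 0xAC, whereas an 8-bit buffer using ISO-8859-1 would store é as the single byte 0xE9.</p>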
<p>More precisely, <em>any</em> piece of text in <code>ne</code> is classified as
US-ASCII, 8-bit or UTF-8. A US-ASCII text contains only US-ASCII
characters. An 8-bit text has a one-to-one correspondence between
characters and bytes, whereas a UTF-8 text is interpreted as UTF-8. Of
course, this raises a difficult question: <em>when</em> should a buffer be
classified as UTF-8?
</p>
<p>Character encodings are a mess. There is nothing we can do to change
this fact, as character encodings are <em>metadata that modify data
semantics</em>. The same file may represent different texts of different
lengths when interpreted with different encodings. Thus, there is no safe
way of guessing the encoding of a file.
</p>
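<p>To make the point about lengths concrete, here is a small C sketch that counts the characters in the same bytes under the two interpretations. The enum and function names are hypothetical; the UTF-8 branch assumes well-formed input and simply counts the bytes that are not continuation bytes.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two ways of reading the same bytes. */
enum interp { AS_8BIT, AS_UTF8 };

/* Counts the characters in the n bytes at s. Read as 8-bit text, every
   byte is one character; read as (well-formed) UTF-8, a character starts
   at every byte that is not a continuation byte of the form 10xxxxxx. */
static size_t char_count(const unsigned char *s, size_t n, enum interp how) {
    assert(s != NULL || n == 0);
    if (how == AS_8BIT)
        return n;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

<p>The five bytes <code>c a f 0xC3 0xA9</code> read as &ldquo;cafÃ©&rdquo; (five characters) in ISO-8859-1, but as &ldquo;café&rdquo; (four characters) in UTF-8. Both readings are legitimate, so the bytes alone cannot tell which one was intended.</p>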
<p><code>ne</code> stays on the safe side: it will never try to convert a file
from one encoding to another. It can, however, interpret the data
contained in a buffer according to an encoding: in other words,
encodings are truly treated as metadata. You can switch off UTF-8
at any time, and see the same buffer as a standard 8-bit file.
</p>
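<p>Treating the encoding as metadata means that switching UTF-8 off changes only how the bytes are read, never the bytes themselves. A minimal C sketch, assuming a hypothetical <code>struct buffer</code> with a boolean <code>utf8</code> flag (this is not <code>ne</code>&rsquo;s real data structure), illustrates the idea with cursor movement:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer: the bytes are never rewritten; utf8 is pure
   metadata, so toggling it reinterprets the content without converting it. */
struct buffer {
    const unsigned char *bytes;
    size_t len;
    bool utf8;
};

/* Returns the byte offset of the character that follows the one at pos. */
static size_t next_pos(const struct buffer *b, size_t pos) {
    assert(b != NULL && pos < b->len);
    if (!b->utf8)
        return pos + 1;                 /* 8-bit: one byte per character */
    pos++;
    while (pos < b->len && (b->bytes[pos] & 0xC0) == 0x80)
        pos++;                          /* skip UTF-8 continuation bytes */
    return pos;
}
```

<p>With the flag set, moving right from the 0xC3 byte of &ldquo;café&rdquo; skips both bytes of é; with the flag cleared, the same move stops on 0xA9, which the 8-bit view shows as a separate character. In both cases the stored bytes are identical.</p>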
<p>Moreover, <code>ne</code> takes a <em>lazy</em> approach to the problem: first of
all, unless the UTF-8 automatic detection flag is set
(see <a href="UTF8Auto.html#UTF8Auto">UTF8Auto</a>), no attempt is ever made to consider a file as UTF-8
encoded. Every file, clip, command line, etc., is first scanned for
non-US-ASCII characters: if it is made entirely of US-ASCII characters,
it is classified as US-ASCII. A US-ASCII piece of text is compatible
with anything else: it may be pasted into any buffer, or, if it is a
buffer, it may accept any form of text. Buffers classified as US-ASCII
are distinguished by an &lsquo;<samp>A</samp>&rsquo; on the status bar.
</p>
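<p>The lazy scan can be sketched in C as follows. All names are hypothetical, and the automatic-detection branch assumes that &ldquo;looks like UTF-8&rdquo; means &ldquo;is a well-formed UTF-8 byte sequence&rdquo; in the RFC 3629 sense; the heuristic <code>ne</code> actually applies when UTF8Auto is set may differ.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three classes described in the text. */
enum text_class { TXT_ASCII, TXT_8BIT, TXT_UTF8 };

/* Assumed detection rule: true if s[0..n) is well-formed UTF-8
   (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool valid_utf8(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp, min_cp;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;                 /* stray continuation or bad lead byte */
        if (n - i < len) return false;     /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                  /* overlong, out of range, or surrogate */
        i += len;
    }
    return true;
}

/* Lazy classification: pure US-ASCII first; UTF-8 is only ever
   considered when the (hypothetical) utf8_auto flag is set. */
static enum text_class classify(const unsigned char *s, size_t n, bool utf8_auto) {
    assert(s != NULL || n == 0);
    bool ascii = true;
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 0x80) { ascii = false; break; }
    if (ascii)
        return TXT_ASCII;
    if (utf8_auto && valid_utf8(s, n))
        return TXT_UTF8;
    return TXT_8BIT;
}
```

<p>Validity is only a necessary condition: the bytes 0xC3 0xA9 pass the check, yet they are also a perfectly good ISO-8859-1 text. With the flag cleared, any non-US-ASCII text is simply treated as 8-bit.</p>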
<p>As soon as a user action forces a choice of encoding (e.g., an accented
character is typed, or a UTF-8-encoded clip is pasted), <code>ne</code> fixes
the mode to 8-bit or UTF-8 (when there is a choice, this depends on
the value of the <a href="UTF8Auto.html#UTF8Auto">UTF8Auto</a> flag). In some cases this may
be impossible, and an error is then reported.
</p>
<p>All this happens behind the scenes, and it is designed so that in 99% of
the cases there is no need to think about encodings. In any case, should
<code>ne</code>&rsquo;s behaviour not match your needs, you can always change
the level of UTF-8 support at run time.
</p>






</body>
</html>