This file is indexed.

/usr/lib/perl5/Unicode/UTF8.pod is in libunicode-utf8-perl 0.59-1build1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
=head1 NAME

Unicode::UTF8 - Encoding and decoding of UTF-8 encoding form

=head1 SYNOPSIS

    use Unicode::UTF8 qw[decode_utf8 encode_utf8];
    
    use warnings FATAL => 'utf8'; # fatalize encoding glitches
    $string = decode_utf8($octets);
    $octets = encode_utf8($string);

=head1 DESCRIPTION

This module provides functions to encode and decode UTF-8 encoding form as 
specified by Unicode and ISO/IEC 10646:2011.

=head1 FUNCTIONS

=head2 decode_utf8

    $string = decode_utf8($octets);
    $string = decode_utf8($octets, $fallback);

Returns an decoded representation of C<$octets> in UTF-8 encoding as a character
string.

C<$fallback> is an optional C<CODE> reference which provides a error-handling 
mechanism, allowing customization of error handling. The default error-handling 
mechanism is to replace any ill-formed UTF-8 sequences or encoded code points 
which can't be interchanged with REPLACEMENT CHARACTER (U+FFFD).

    $string = $fallback->($octets, $is_usv, $position);

C<$fallback> is invoked with three arguments: C<$octets>, C<$is_usv> and 
C<$position>. C<$octets> is a sequence of one or more octets containing the 
maximal subpart of the ill-formed subsequence or encoded code point which 
can't be interchanged. C<$is_usv> is a boolean indicating whether or not 
C<$octets> represent a encoded Unicode scalar value. C<$position> is a 
unsigned integer containing the zero based octet position at which the error 
occurred within the octets provided to C<decode_utf8()>. C<$fallback> must 
return a character string consisting of zero or more Unicode scalar values. 
Unicode scalar values consist of code points in the range U+0000..U+D7FF and 
U+E000..U+10FFFF.

=head2 encode_utf8

    $octets = encode_utf8($string);
    $octets = encode_utf8($string, $fallback);

Returns an encoded representation of C<$string> in UTF-8 encoding as an octet
string.

C<$fallback> is an optional C<CODE> reference which provides a error-handling 
mechanism, allowing customization of error handling. The default error-handling 
mechanism is to replace any code points which can't be interchanged or represented 
in UTF-8 encoding form with REPLACEMENT CHARACTER (U+FFFD).

    $string = $fallback->($codepoint, $is_usv, $position);

C<$fallback> is invoked with three arguments: C<$codepoint>, C<$is_usv> and 
C<$position>. C<$codepoint> is a unsigned integer containing the code point 
which can't be interchanged or represented in UTF-8 encoding form. C<$is_usv> 
is a boolean indicating whether or not C<$codepoint> is a Unicode scalar value. 
C<$position> is a unsigned integer containing the zero based character position 
at which the error occurred within the string provided to C<encode_utf8()>. 
C<$fallback> must return a character string consisting of zero or more Unicode 
scalar values.Unicode scalar values consist of code points in the range 
U+0000..U+D7FF and U+E000..U+10FFFF.

=head1 EXPORTS

None by default. All functions can be exported using the C<:all> tag or individually.

=head1 DIAGNOSTICS

=over 4

=item Can't decode a wide character string

(F) Wide character in octets.

=item Can't decode ill-formed UTF-8 octet sequence <%s> in position %u

(W utf8) Encountered an ill-formed UTF-8 octet sequence. <%s> contains a 
hexadecimal representation of the maximal subpart of the ill-formed subsequence.

=item Can't interchange noncharacter code point U+%X in position %u

(W utf8, nonchar) Noncharacters are code points that are permanently reserved 
in the Unicode Standard for internal use. They are forbidden for use in open 
interchange of Unicode text data. Noncharacters consist of the values U+nFFFE 
and U+nFFFF (where n is from 0 to 10^16) and the values U+FDD0..U+FDEF.

=item Can't represent surrogate code point U+%X in position %u

(W utf8, surrogate) Surrogate code points are designated only for surrogate code 
units in the UTF-16 character encoding form. Surrogates consist of code points 
in the range U+D800 to U+DFFF.

=item Can't represent super code point \x{%X} in position %u

(W utf8, non_unicode) Code points greater than U+10FFFF. Perl's extended codespace.

=item Can't decode ill-formed UTF-X octet sequence <%s> in position %u

(F) Encountered an ill-formed octet sequence in Perl's internal representation 
of wide characters.

=back

The sub-categories: C<nonchar>, C<surrogate> and C<non_unicode> is only available 
on Perl 5.14 or greater. See L<perllexwarn> for available categories and hierarchies.

=head1 COMPARISON

Here is a summary of features for comparison with L<Encode>'s UTF-8 implementation:

=over 4

=item *

Simple API which makes use of Perl's standard warning categories.

=item *

Recognizes all noncharacters regardless of Perl version

=item *

Implements Unicode's recommended practice for using U+FFFD.

=item *

Better diagnostics in warning messages

=item *

Detects and reports inconsistency in Perl's internal representation of 
wide characters (UTF-X)

=item *

Preserves taintedness of decoded C<$octets> or encoded C<$string>

=item *

Better performance ~ 600% - 1200% (JA: 600%, AR: 700%, SV: 900%, EN: 1200%, 
see benchmarks directory in git repository)

=back

=head1 CONFORMANCE

It's the author's believe that this UTF-8 implementation is conformant with 
the Unicode Standard Version 6.0. Any deviations from the Unicode Standard 
is to be considered a bug.

=head1 SEE ALSO

=over 4

=item L<Encode>

=item L<http://www.unicode.org/>

=back

=head1 SUPPORT

=head2 BUGS

Please report any bugs by email to C<bug-unicode-utf8 at rt.cpan.org>, or 
through the web interface at L<http://rt.cpan.org/Public/Dist/Display.html?Name=Unicode-UTF8>. 
You will be automatically notified of any progress on the request by the system.

=head2 SOURCE CODE

This is open source software. The code repository is available for public 
review and contribution under the terms of the license.

L<http://github.com/chansen/p5-unicode-utf8>

    git clone http://github.com/chansen/p5-unicode-utf8

=head1 AUTHOR

Christian Hansen C<chansen@cpan.org>

=head1 COPYRIGHT

Copyright 2011-2012 by Christian Hansen.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.