This file is indexed.

/usr/share/doc/recode-doc/Library.html is in recode-doc 3.6-21.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
<head>
<title>The recode reference manual: Library</title>

<meta name="description" content="The recode reference manual: Library">
<meta name="keywords" content="The recode reference manual: Library">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Charset-and-Surface-Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Universal.html#Universal" rel="next" title="Universal">
<link href="Invoking-recode.html#Debugging" rel="previous" title="Debugging">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Library"></a>
<div class="header">
<p>
Next: <a href="Universal.html#Universal" accesskey="n" rel="next">Universal</a>, Previous: <a href="Invoking-recode.html#Invoking-recode" accesskey="p" rel="previous">Invoking recode</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="A-recoding-library"></a>
<h2 class="chapter">4 A recoding library</h2>

<a name="index-recoding-library"></a>
<p>The program named <code>recode</code> is just an application of its recoding
library.  The recoding library is available separately for other C
programs.  A good way to acquire some familiarity with the recoding
library is to get acquainted with the <code>recode</code> program itself.
</p>
<p>To use the recoding library once it is installed, a C program needs to
have a line:
</p>
<div class="example">
<pre class="example">#include &lt;recode.h&gt;
</pre></div>

<p>near its beginning, and the user should have &lsquo;<samp>-lrecode</samp>&rsquo; on the
linking call, so modules from the recoding library are found.
</p>
<p>The library is still under development.  As it stands, it contains four
identifiable sets of routines: the outer level functions, the request
level functions, the task level functions and the charset level functions.
They are discussed in separate sections.
</p>
<p>To use the recoding library effectively, most applications should rarely
need to study anything beyond the main initialisation function at outer
level and the various functions at request level.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top">&bull; <a href="#Outer-level" accesskey="1">Outer level</a>:</td><td>&nbsp;&nbsp;</td><td align="left" valign="top">Outer level functions
</td></tr>
<tr><td align="left" valign="top">&bull; <a href="#Request-level" accesskey="2">Request level</a>:</td><td>&nbsp;&nbsp;</td><td align="left" valign="top">Request level functions
</td></tr>
<tr><td align="left" valign="top">&bull; <a href="#Task-level" accesskey="3">Task level</a>:</td><td>&nbsp;&nbsp;</td><td align="left" valign="top">Task level functions
</td></tr>
<tr><td align="left" valign="top">&bull; <a href="#Charset-level" accesskey="4">Charset level</a>:</td><td>&nbsp;&nbsp;</td><td align="left" valign="top">Charset level functions
</td></tr>
<tr><td align="left" valign="top">&bull; <a href="#Errors" accesskey="5">Errors</a>:</td><td>&nbsp;&nbsp;</td><td align="left" valign="top">Handling errors
</td></tr>
</table>

<hr>
<a name="Outer-level"></a>
<div class="header">
<p>
Next: <a href="#Request-level" accesskey="n" rel="next">Request level</a>, Previous: <a href="#Library" accesskey="p" rel="previous">Library</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Outer-level-functions"></a>
<h3 class="section">4.1 Outer level functions</h3>

<a name="index-outer-level-functions"></a>
<p>The outer level functions mainly prepare the whole recoding library for
use, or do actions which are unrelated to specific recodings.  Here is
an example of a program which does not really do anything useful.
</p>
<div class="example">
<pre class="example">#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recode.h&gt;

const char *program_name;

int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);

  recode_delete_outer (outer);
  exit (0);
}
</pre></div>

<a name="index-RECODE_005fOUTER-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares an opaque <code>RECODE_OUTER</code>
structure, which the programmer should use for allocating a variable in
his program (let&rsquo;s assume the programmer is a male, here, no prejudice
intended).  This &lsquo;<samp>outer</samp>&rsquo; variable is given as a first argument to
all outer level functions.
</p>
<a name="index-stdbool_002eh-header"></a>
<a name="index-bool-data-type"></a>
<p>The <code>&lt;recode.h&gt;</code> header file uses the Boolean type set up by the
system header file <code>&lt;stdbool.h&gt;</code>.  But this header file is still
fairly new in C standards, and likely does not exist everywhere.  If your
system does not offer this system header file yet, the proper compilation
of the <code>&lt;recode.h&gt;</code> file could be guaranteed through the replacement
of the inclusion line by:
</p>
<div class="example">
<pre class="example">typedef enum {false = 0, true = 1} bool;
</pre></div>

<p>People wanting wider portability, or Autoconf lovers, might arrange their
<samp>configure.in</samp> for being able to write something more general, like:
</p>
<div class="example">
<pre class="example">#if STDC_HEADERS
# include &lt;stdlib.h&gt;
#endif

/* Some systems do not define EXIT_*, even with STDC_HEADERS.  */
#ifndef EXIT_SUCCESS
# define EXIT_SUCCESS 0
#endif
#ifndef EXIT_FAILURE
# define EXIT_FAILURE 1
#endif
/* The following test is to work around the gross typo in systems like Sony
   NEWS-OS Release 4.0C, whereby EXIT_FAILURE is defined to 0, not 1.  */
#if !EXIT_FAILURE
# undef EXIT_FAILURE
# define EXIT_FAILURE 1
#endif

#if HAVE_STDBOOL_H
# include &lt;stdbool.h&gt;
#else
typedef enum {false = 0, true = 1} bool;
#endif

#include &lt;recode.h&gt;

const char *program_name;

int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);

  recode_delete_outer (outer);
  exit (EXIT_SUCCESS);
}
</pre></div>

<p>but we will not insist on such details in the examples to come.
</p>
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-outer"></a>

<div class="example">
<pre class="example">RECODE_OUTER recode_new_outer (<var>auto_abort</var>);
bool recode_delete_outer (<var>outer</var>);
</pre></div>

<a name="index-recode_005fnew_005fouter"></a>
<a name="index-recode_005fdelete_005fouter"></a>
<p>The recoding library absolutely needs to be initialised before being used,
and <code>recode_new_outer</code> has to be called once, first.  It returns the
<var>outer</var> variable it initialises, and accepts a Boolean argument
telling whether or not the library should automatically issue diagnostics
on standard error and abort the whole program on errors.  When <var>auto_abort</var>
is <code>true</code>, the library later conveniently issues diagnostics itself,
and aborts the calling program on errors.  This is merely a convenience:
if this parameter is <code>false</code>, the calling program should always
take care of checking the return value of all other calls to the recoding
library functions, and when any error is detected, issue a diagnostic and
abort processing itself.
</p>
<p>Regardless of the setting of <var>auto_abort</var>, all recoding library
functions return a success status.  Most functions are geared for returning
<code>false</code> for an error, and <code>true</code> if everything went fine.
Functions returning structures or strings return <code>NULL</code> instead
of the result, when the result cannot be produced.  If <var>auto_abort</var>
is selected, functions either return <code>true</code>, or do not return at all.
</p>
<p>As in the example above, <code>recode_new_outer</code> is called only once in
most cases.  Calling <code>recode_new_outer</code> implies some overhead, so
calling it more than once should preferably be avoided.
</p>
<p>The termination function <code>recode_delete_outer</code> reclaims the memory
allocated by <code>recode_new_outer</code> for a given <var>outer</var> variable.
Calling <code>recode_delete_outer</code> prior to program termination is more
aesthetic than useful, as all memory resources are automatically reclaimed
when the program ends.  You may spare this terminating call if you prefer.
</p>
</li><li> The <code>program_name</code> declaration

<a name="index-program_005fname-variable"></a>
<p>As we just explained, the user may set the <code>recode</code> library so that,
in case of an error, it issues the diagnostic itself and aborts the
whole processing.  This capability may be quite convenient.  When this
feature is used, the aborting routine includes the name of the running
program in the diagnostic.  On the other hand, when this feature is not
used, the library merely returns error codes, giving the library user fuller
control over all this.  This behaviour is more like what usual libraries
do: they return codes and never abort.  However, I would rather not force
library users to check all return codes themselves by leaving
no other choice.  In most simple applications, letting the library diagnose
and abort is much easier, and quite welcome.  It is precisely because
both possibilities exist that the <code>program_name</code> variable is needed: it
may be used by the library <em>when</em> the user sets it to diagnose itself.
</p></li></ul>
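<p>When <var>auto_abort</var> is <code>false</code>, every status has to be checked by the
caller.  The following is only a sketch of one way to do so, and is not part
of the recoding library: the helpers <code>format_failure</code> and <code>require</code>
are hypothetical names, and the wording of the diagnostic is an assumption.
It shows a caller-side check which prints the program name and stops, which
is roughly what the text above asks of the calling program.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Set by the calling program, usually from argv[0].  */
const char *program_name = "example";

/* Hypothetical helper: write "PROGRAM: WHAT failed" into TEXT, which holds
   SIZE bytes.  Returns the length of the whole message, as snprintf does.  */
static int
format_failure (char *text, size_t size, const char *what)
{
  assert (what != NULL);
  return snprintf (text, size, "%s: %s failed", program_name, what);
}

/* Hypothetical helper: do nothing when OK is true; otherwise print a
   diagnostic naming WHAT on standard error and end the program.  */
static void
require (bool ok, const char *what)
{
  char text[256];

  if (ok)
    return;
  format_failure (text, sizeof text, what);
  fprintf (stderr, "%s\n", text);
  exit (EXIT_FAILURE);
}
```

<p>With these helpers, a call such as
<code>require (recode_scan_request (request, &quot;ibmpc..latin1&quot;), &quot;recode_scan_request&quot;)</code>
stops the program on a <code>false</code> status, and
<code>require (outer != NULL, &quot;recode_new_outer&quot;)</code> covers functions which
return <code>NULL</code> on failure.
</p>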

<hr>
<a name="Request-level"></a>
<div class="header">
<p>
Next: <a href="#Task-level" accesskey="n" rel="next">Task level</a>, Previous: <a href="#Outer-level" accesskey="p" rel="previous">Outer level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Request-level-functions"></a>
<h3 class="section">4.2 Request level functions</h3>

<a name="index-request-level-functions"></a>
<p>The request level functions are meant to cover most recoding needs
programmers may have; they should provide all usual functionality.
Their API is almost stable by now.  To get started with request level
functions, here is a full example of a program whose sole job is to filter
<code>ibmpc</code> code on its standard input into <code>latin1</code> code on its
standard output.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recode.h&gt;

const char *program_name;

int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);
  RECODE_REQUEST request = recode_new_request (outer);
  bool success;

  recode_scan_request (request, &quot;ibmpc..latin1&quot;);

  success = recode_file_to_file (request, stdin, stdout);

  recode_delete_request (request);
  recode_delete_outer (outer);

  exit (success ? 0 : 1);
}
</pre></div>

<a name="index-RECODE_005fREQUEST-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares a <code>RECODE_REQUEST</code> structure,
which the programmer should use for allocating a variable in his program.
This <var>request</var> variable is given as a first argument to all request
level functions, and in most cases, may be considered as opaque.
</p>
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-request"></a>

<div class="example">
<pre class="example">RECODE_REQUEST recode_new_request (<var>outer</var>);
bool recode_delete_request (<var>request</var>);
</pre></div>

<a name="index-recode_005fnew_005frequest"></a>
<a name="index-recode_005fdelete_005frequest"></a>
<p>No <var>request</var> variable may be used in other request level
functions of the recoding library before having been initialised by
<code>recode_new_request</code>.  There may be many such <var>request</var>
variables, in which case, they are independent of one another and
they all need to be initialised separately.  To avoid memory leaks, a
<var>request</var> variable should not be initialised a second time without
calling <code>recode_delete_request</code> to &ldquo;un-initialise&rdquo; it.
</p>
<p>As with <code>recode_delete_outer</code>, calling <code>recode_delete_request</code>
prior to program termination, in the example above, may be left out.
</p>
</li><li> Fields of <code>struct recode_request</code>
<a name="index-recode_005frequest-structure"></a>

<p>Here are the fields of a <code>struct recode_request</code> which may be
meaningfully changed, once a <var>request</var> has been initialised by
<code>recode_new_request</code>, but before it gets used.  It is not very frequent,
in practice, that these fields need to be changed.  To access the fields,
you need to include <samp>recodext.h</samp> <em>instead</em> of <samp>recode.h</samp>,
in which case there also is a greater chance that you need to recompile
your programs if a new version of the recoding library gets installed.
</p>
<dl compact="compact">
<dt><code>verbose_flag</code></dt>
<dd><a name="index-verbose_005fflag"></a>
<p>This field is initially <code>false</code>.  When set to <code>true</code>, the
library will echo to stderr the sequence of elementary recoding steps
needed to achieve the requested recoding.
</p>
</dd>
<dt><code>diaeresis_char</code></dt>
<dd><a name="index-diaeresis_005fchar"></a>
<p>This field is initially the ASCII value of a double quote <kbd>&quot;</kbd>,
but it may also be the ASCII value of a colon <kbd>:</kbd>.  In <code>texte</code>
charset, some countries use double quotes to mark diaeresis, while other
countries prefer colons.  This field contains the diaeresis character
for the <code>texte</code> charset.
</p>
</dd>
<dt><code>make_header_flag</code></dt>
<dd><a name="index-make_005fheader_005fflag"></a>
<p>This field is initially <code>false</code>.  When set to <code>true</code>, it
indicates that the program is merely trying to produce a recoding table in
source form rather than completing any actual recoding.  In such a case,
the optimisation of step sequence can be attempted much more aggressively.
If the step sequence cannot be reduced to a single step, table production
will fail.
</p>
</dd>
<dt><code>diacritics_only</code></dt>
<dd><a name="index-diacritics_005fonly"></a>
<p>This field is initially <code>false</code>.  For the <code>HTML</code> and <code>LaTeX</code>
charsets, it is often convenient to recode the diacriticized characters
only, while leaving alone other HTML code using ampersands or angular
brackets, or LaTeX code using backslashes.  Set the field to <code>true</code>
for getting this behaviour.  One can then edit, in the other charset, the
text as well as the HTML or LaTeX directives.
</p>
</dd>
<dt><code>ascii_graphics</code></dt>
<dd><a name="index-ascii_005fgraphics"></a>
<p>This field is initially <code>false</code>, and relates to characters 176 to
223 in the <code>ibmpc</code> charset, which are used to draw boxes.  When set
to <code>true</code>, while getting out of <code>ibmpc</code>, ASCII characters are
selected so as to graphically approximate these boxes.
</p></dd>
</dl>

</li><li> Study of request strings

<div class="example">
<pre class="example">bool recode_scan_request (<var>request</var>, &quot;<var>string</var>&quot;);
</pre></div>

<a name="index-recode_005fscan_005frequest"></a>
<p>The main role of a <var>request</var> variable is to describe a set of
recoding transformations.  Function <code>recode_scan_request</code> studies
the given <var>string</var>, and stores an internal representation of it into
<var>request</var>.  Note that <var>string</var> may be a full-fledged <code>recode</code>
request, possibly including surface specifications, intermediary
charsets, sequences, aliases or abbreviations (see <a href="Invoking-recode.html#Requests">Requests</a>).
</p>
<p>The internal representation automatically receives some pre-conditioning
and optimisation, so the <var>request</var> may then later be used many times
to achieve many actual recodings.  Calling <code>recode_scan_request</code>
many times with the same <var>string</var> would not be efficient; it is
better to keep many <var>request</var> variables instead.
</p>
</li><li> Actual recoding jobs

<p>Once the <var>request</var> variable holds the description of a recoding
transformation, a few functions use it for achieving an actual recoding.
Either input or output of a recoding may be a string, an in-memory buffer,
or a file.
</p>
<p>Functions with names like
<code>recode_<var>input-type</var>_to_<var>output-type</var></code> request an actual
recoding, and are described below.  It is easy to remember which arguments
each function accepts, once some simple principles are grasped for each
possible <var>type</var>.  However, one of the recoding functions escapes these
principles and is discussed separately, first.
</p>
<div class="example">
<pre class="example">char *recode_string (<var>request</var>, <var>string</var>);
</pre></div>

<a name="index-recode_005fstring"></a>
<p>The function <code>recode_string</code> recodes <var>string</var> according
to <var>request</var>, and directly returns the resulting recoded string
freshly allocated, or <code>NULL</code> if the recoding could not succeed for
some reason.  When this function is used, it is the responsibility of
the programmer to ensure that the memory used by the returned string is
later reclaimed.
</p>
<a name="index-recode_005fstring_005fto_005fbuffer"></a>
<a name="index-recode_005fstring_005fto_005ffile"></a>
<a name="index-recode_005fbuffer_005fto_005fbuffer"></a>
<a name="index-recode_005fbuffer_005fto_005ffile"></a>
<a name="index-recode_005ffile_005fto_005fbuffer"></a>
<a name="index-recode_005ffile_005fto_005ffile"></a>
<div class="example">
<pre class="example">bool recode_string_to_buffer (<var>request</var>,
  <var>input_string</var>,
  &amp;<var>output_buffer</var>, &amp;<var>output_length</var>, &amp;<var>output_allocated</var>);
bool recode_string_to_file (<var>request</var>,
  <var>input_string</var>,
  <var>output_file</var>);
bool recode_buffer_to_buffer (<var>request</var>,
  <var>input_buffer</var>, <var>input_length</var>,
  &amp;<var>output_buffer</var>, &amp;<var>output_length</var>, &amp;<var>output_allocated</var>);
bool recode_buffer_to_file (<var>request</var>,
  <var>input_buffer</var>, <var>input_length</var>,
  <var>output_file</var>);
bool recode_file_to_buffer (<var>request</var>,
  <var>input_file</var>,
  &amp;<var>output_buffer</var>, &amp;<var>output_length</var>, &amp;<var>output_allocated</var>);
bool recode_file_to_file (<var>request</var>,
  <var>input_file</var>,
  <var>output_file</var>);
</pre></div>

<p>All these functions return a <code>bool</code> result, <code>false</code> meaning that
the recoding was not successful, often because of reversibility issues.
The name of each function indicates which type it reads and which
type it produces.  Let&rsquo;s discuss these three types in turn.
</p>
<dl compact="compact">
<dt>string</dt>
<dd>
<p>A string is merely an in-memory buffer which is terminated by a <code>NUL</code>
character (using as many bytes as needed), instead of being described
by a byte length.  For input, a pointer to the buffer is given through
one argument.
</p>
<p>It is notable that there are no <code>to_string</code> functions.  Only one
function recodes into a string, and it is <code>recode_string</code>, which
has already been discussed separately, above.
</p>
</dd>
<dt>buffer</dt>
<dd>
<p>A buffer is a sequence of bytes held in computer memory.  For input, two
arguments provide a pointer to the start of the buffer and its byte size.
Note that for charsets using many bytes per character, the size is given
in bytes, not in characters.
</p>
<p>For output, three arguments provide the address of three variables, which
will receive the buffer pointer, the used buffer size in bytes, and the
allocated buffer size in bytes.  If at the time of the call, the buffer
pointer is <code>NULL</code>, then the allocated buffer size should also be zero,
and the buffer will be allocated afresh by the recoding functions.  However,
if the buffer pointer is not <code>NULL</code>, it should already be allocated,
and the allocated buffer size then gives its size.  If the allocated size
gets exceeded during the recoding, the buffer will be automatically
reallocated bigger, probably elsewhere, and the allocated buffer size will
be adjusted accordingly.
</p>
<p>The second variable, giving the in-memory buffer size, will receive the
exact byte size which was needed for the recoding.  A <code>NUL</code> character
is guaranteed at the end of the produced buffer, but is not counted in the
byte size of the recoding.  Beyond that <code>NUL</code>, there might be some
extra space after the recoded data, extending to the allocated buffer size.
</p>
</dd>
<dt>file</dt>
<dd>
<a name="index-recode_005ffilter_005fopen_002c-not-available"></a>
<a name="index-recode_005ffilter_005fclose_002c-not-available"></a>
<p>A file is a sequence of bytes held outside computer memory, but
buffered through it.  For input, one argument provides a pointer to a
file already opened for read.  The file is then read and recoded from its
current position until the end of the file, effectively swallowing it in
memory if the destination of the recoding is a buffer.  For reading a file
filtered through the recoding library, but only a little bit at a time, one
should rather use <code>recode_filter_open</code> and <code>recode_filter_close</code>
(these two functions are not yet available).
</p>
<p>For output, one argument provides a pointer to a file already opened
for write.  The result of the recoding is written to that file starting
at its current position.
</p></dd>
</dl>
</li></ul>
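<p>The three output arguments of the buffer functions can be hard to picture.
The sketch below does not call the recoding library at all:
<code>store_in_buffer</code> is a hypothetical stand-in which merely copies bytes,
but it handles the buffer pointer, the used size and the allocated size as
described above for output buffers.  How the library really grows its
buffers is not specified here, so the growth policy shown is an assumption.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a recode_*_to_buffer call: it copies LENGTH
   bytes of DATA instead of recoding them, but treats the three output
   arguments the way the text describes.  */
static bool
store_in_buffer (const char *data, size_t length,
                 char **buffer, size_t *used, size_t *allocated)
{
  /* A NULL buffer pointer must come with a zero allocated size.  */
  assert (*buffer != NULL || *allocated == 0);

  if (*allocated < length + 1)
    {
      /* Not allocated yet, or too small: grow, possibly moving the data.  */
      char *bigger = realloc (*buffer, length + 1);

      if (bigger == NULL)
        return false;
      *buffer = bigger;
      *allocated = length + 1;
    }

  memcpy (*buffer, data, length);
  (*buffer)[length] = '\0';     /* trailing NUL, not counted in *used */
  *used = length;
  return true;
}
```

<p>A caller starts with a <code>NULL</code> pointer and two zero sizes, may pass the
same three variables again to reuse the buffer for another result, and in
this sketch releases the buffer with <code>free</code> once done.
</p>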

<a name="index-recode_005fformat_005ftable"></a>
<p>The following special function is still subject to change:
</p>
<div class="example">
<pre class="example">void recode_format_table (<var>request</var>, <var>language</var>, &quot;<var>name</var>&quot;);
</pre></div>

<p>and is not documented anymore for now.
</p>
<hr>
<a name="Task-level"></a>
<div class="header">
<p>
Next: <a href="#Charset-level" accesskey="n" rel="next">Charset level</a>, Previous: <a href="#Request-level" accesskey="p" rel="previous">Request level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Task-level-functions"></a>
<h3 class="section">4.3 Task level functions</h3>
<a name="index-task-level-functions"></a>

<p>The task level functions are used internally by the request level
functions; they allow more explicit control over files and memory
buffers holding input and output to recoding processes.  The interface
specification of task level functions is still subject to change a bit.
</p>
<p>To get started with task level functions, here is a full example of a
program whose sole job is to filter <code>ibmpc</code> code on its standard input
into <code>latin1</code> code on its standard output.  That is, this program has
the same goal as the one from the previous section, but goes about it
a bit differently.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recodext.h&gt;

const char *program_name;

int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (false);
  RECODE_REQUEST request = recode_new_request (outer);
  RECODE_TASK task;
  bool success;

  recode_scan_request (request, &quot;ibmpc..latin1&quot;);

  task = recode_new_task (request);
  task-&gt;input.name = &quot;&quot;;
  task-&gt;output.name = &quot;&quot;;
  success = recode_perform_task (task);

  recode_delete_task (task);
  recode_delete_request (request);
  recode_delete_outer (outer);

  exit (success ? 0 : 1);
}
</pre></div>

<a name="index-RECODE_005fTASK-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares a <code>RECODE_TASK</code>
structure, which the programmer should use for allocating a variable in
his program.  This <code>task</code> variable is given as a first argument to
all task level functions.  The programmer ought to change and possibly
consult a few fields in this structure, using special functions.
</p>
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-task"></a>

<a name="index-recode_005fnew_005ftask"></a>
<a name="index-recode_005fdelete_005ftask"></a>
<div class="example">
<pre class="example">RECODE_TASK recode_new_task (<var>request</var>);
bool recode_delete_task (<var>task</var>);
</pre></div>

<p>No <var>task</var> variable may be used in other task level functions
of the recoding library without having first been initialised with
<code>recode_new_task</code>.  There may be many such <var>task</var> variables,
in which case, they are independent of one another and they all need to be
initialised separately.  To avoid memory leaks, a <var>task</var> variable should
not be initialised a second time without calling <code>recode_delete_task</code> to
&ldquo;un-initialise&rdquo; it.  The <code>recode_new_task</code> function accepts a <var>request</var> argument
and associates the request with the task.  In fact, a task is essentially
a set of recoding transformations with the specification for its current
input and its current output.
</p>
<p>The <var>request</var> variable may be scanned before or after the call to
<code>recode_new_task</code>; it does not matter so far.  Immediately after
initialisation, before further changes, the <var>task</var> variable associates
<var>request</var> with empty in-memory buffers for both input and output.
The output buffer will later get allocated automatically on the fly,
as needed, by various task processors.
</p>
<p>Even if a call to <code>recode_delete_task</code> is not strictly mandatory
before ending the program, it is cleaner to always include it.  Moreover,
in some future version of the recoding library, it might become required.
</p>
</li><li> Fields of <code>struct task_request</code>
<a name="index-task_005frequest-structure"></a>

<p>Here are the fields of a <code>struct task_request</code> which may be meaningfully
changed, once a <var>task</var> has been initialised by <code>recode_new_task</code>.
In fact, fields are expected to change.  Once again, to access the fields,
you need to include <samp>recodext.h</samp> <em>instead</em> of <samp>recode.h</samp>,
in which case there also is a greater chance that you need to recompile
your programs if a new version of the recoding library gets installed.
</p>
<dl compact="compact">
<dt><code>request</code></dt>
<dd>
<p>The field <code>request</code> points to the current recoding request, but may
be changed as needed between recoding calls, for example when there is
a need to achieve the construction of a resulting text made up of many
pieces, each being recoded differently.
</p>
</dd>
<dt><code>input.name</code></dt>
<dt><code>input.file</code></dt>
<dd>
<p>If <code>input.name</code> is not <code>NULL</code> at start of a recoding, this is
a request that a file by that name be first opened for reading and later
automatically closed once the whole file has been read.  If the file name is
not <code>NULL</code> but an empty string, it means that standard input is to
be used.  The opened file pointer is then held in <code>input.file</code>.
</p>
<p>If <code>input.name</code> is <code>NULL</code> and <code>input.file</code> is not, then
<code>input.file</code> should point to a file already opened for reading, which
is meant to be recoded.
</p>
</dd>
<dt><code>input.buffer</code></dt>
<dt><code>input.cursor</code></dt>
<dt><code>input.limit</code></dt>
<dd>
<p>When both <code>input.name</code> and <code>input.file</code> are <code>NULL</code>, three
pointers describe an in-memory buffer containing the text to be recoded.
The buffer extends from <code>input.buffer</code> to <code>input.limit</code>,
yet the text to be recoded only extends from <code>input.cursor</code> to
<code>input.limit</code>.  In most situations, <code>input.cursor</code> starts with
the value that <code>input.buffer</code> has.  (Its value will internally advance
as the recoding goes, until it reaches the value of <code>input.limit</code>.)
</p>
</dd>
<dt><code>output.name</code></dt>
<dt><code>output.file</code></dt>
<dd>
<p>If <code>output.name</code> is not <code>NULL</code> at start of a recoding, this
is a request that a file by that name be opened for writing and later
automatically closed after the recoding is done.  If the file name is
not <code>NULL</code> but an empty string, it means that standard output is to
be used.  The opened file pointer is then held in <code>output.file</code>.
If several passes with intermediate files are needed to produce the
recoding, the <code>output.name</code> file is opened only for the final pass.
</p>
<p>If <code>output.name</code> is <code>NULL</code> and <code>output.file</code> is not, then
<code>output.file</code> should point to a file already opened for writing, which
will receive the result of the recoding.
</p>
</dd>
<dt><code>output.buffer</code></dt>
<dt><code>output.cursor</code></dt>
<dt><code>output.limit</code></dt>
<dd>
<p>When both <code>output.name</code> and <code>output.file</code> are <code>NULL</code>, three
pointers describe an in-memory buffer meant to receive the text, once it
is recoded.  The buffer is already allocated from <code>output.buffer</code>
to <code>output.limit</code>.  In most situations, <code>output.cursor</code> starts
with the value that <code>output.buffer</code> has.  Once the recoding is done,
<code>output.cursor</code> will point at the next free byte in the buffer,
just after the recoded text, so another recoding could be called without
changing any of these three pointers, for appending new information to it.
The number of recoded bytes in the buffer is the difference between
<code>output.cursor</code> and <code>output.buffer</code>.
</p>
<p>Each time <code>output.cursor</code> reaches <code>output.limit</code>, the buffer
is reallocated bigger, possibly at a different location in memory, always
held up-to-date in <code>output.buffer</code>.  It is still possible to call a
task level function with no output buffer at all to start with, in which
case all three fields should have <code>NULL</code> as a value.  This is the
situation immediately after a call to <code>recode_new_task</code>.
</p>
</dd>
<dt><code>strategy</code></dt>
<dd><a name="index-strategy"></a>
<a name="index-RECODE_005fSTRATEGY_005fUNDECIDED"></a>
<p>This field, which is of type <code>enum recode_sequence_strategy</code>, tells
how various recoding steps (passes) will be interconnected.  Its initial
value is <code>RECODE_STRATEGY_UNDECIDED</code>, which is a constant defined in
the header file <samp>&lt;recodext.h&gt;</samp>.  Other possible values are:
</p>
<dl compact="compact">
<dt><code>RECODE_SEQUENCE_IN_MEMORY</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fIN_005fMEMORY"></a>
<p>Keep intermediate recodings in memory.
</p></dd>
<dt><code>RECODE_SEQUENCE_WITH_FILES</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fWITH_005fFILES"></a>
<p>Do not fork, use intermediate files.
</p></dd>
<dt><code>RECODE_SEQUENCE_WITH_PIPE</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fWITH_005fPIPE"></a>
<p>Fork processes connected with <code>pipe(2)</code>.
</p></dd>
</dl>

<p>The best for now is to leave this field alone, and let the recoding
library decide its strategy, as many combinations have not been tested yet.
</p>
</dd>
<dt><code>byte_order_mark</code></dt>
<dd><a name="index-byte_005forder_005fmark"></a>
<p>This field, which is preset to <code>true</code>, indicates that a byte order
mark is to be expected at the beginning of any canonical <code>UCS-2</code>
or <code>UTF-16</code> text, and that such a byte order mark should also be
produced for these charsets.
</p>
</dd>
<dt><code>fail_level</code></dt>
<dd><a name="index-fail_005flevel"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>),
sets the error level at which task level functions should report a failure.
If a detected error is equal to or greater than <code>fail_level</code>,
the function will eventually return <code>false</code> instead of <code>true</code>.
The preset value for this field is <code>RECODE_NOT_CANONICAL</code>, which means
that, unless it is reset to another value, the library will report failure on
<em>any</em> error.
</p>
</dd>
<dt><code>abort_level</code></dt>
<dd><a name="index-abort_005flevel"></a>
<a name="index-RECODE_005fMAXIMUM_005fERROR"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>), sets
the error level at which task level functions should immediately interrupt
their processing.  If a detected error is equal to or greater than
<code>abort_level</code>, the function returns immediately, but the returned
value (<code>true</code> or <code>false</code>) is still decided from the setting
of <code>fail_level</code>, not <code>abort_level</code>.  The preset value for this
field is <code>RECODE_MAXIMUM_ERROR</code>, which means that, unless it is reset to
another value, the library will never interrupt a recoding task.
</p>
</dd>
<dt><code>error_so_far</code></dt>
<dd><a name="index-error_005fso_005ffar"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>),
maintains the maximum error level met so far while the recoding task
was proceeding.  The preset value is <code>RECODE_NO_ERROR</code>.
</p></dd>
</dl>
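
<p>The behaviour of the three output pointers described above can be
pictured with a small stand-alone model.  This is only a sketch under
stated assumptions, not the library's code: <code>struct model_buffer</code>,
<code>model_append</code> and <code>model_length</code> are hypothetical names,
and the doubling growth policy is an arbitrary choice made for the illustration.
</p>
<div class="example">
<pre class="example">#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical stand-in for the three output pointers; this is not
   the type declared by the recoding library.  */
struct model_buffer
{
  char *buffer;                 /* start of the allocated area, or NULL */
  char *cursor;                 /* next free byte */
  char *limit;                  /* one past the end of the allocated area */
};

/* Append LENGTH bytes from DATA, growing the area whenever the cursor
   would pass the limit.  Return 1 on success, 0 if memory ran out.  */
int
model_append (struct model_buffer *out, const char *data, size_t length)
{
  size_t used = out-&gt;buffer ? (size_t) (out-&gt;cursor - out-&gt;buffer) : 0;
  size_t room = out-&gt;buffer ? (size_t) (out-&gt;limit - out-&gt;buffer) : 0;

  if (length == 0)
    return 1;

  if (used + length &gt; room)
    {
      size_t wanted = room ? room : 16;
      char *moved;

      while (wanted &lt; used + length)
        wanted *= 2;
      moved = realloc (out-&gt;buffer, wanted);
      if (moved == NULL)
        return 0;
      /* The area may now live elsewhere: refresh all three pointers.  */
      out-&gt;buffer = moved;
      out-&gt;cursor = moved + used;
      out-&gt;limit = moved + wanted;
    }

  memcpy (out-&gt;cursor, data, length);
  out-&gt;cursor += length;
  return 1;
}

/* Number of bytes produced so far: cursor minus buffer.  */
size_t
model_length (const struct model_buffer *out)
{
  return out-&gt;buffer ? (size_t) (out-&gt;cursor - out-&gt;buffer) : 0;
}
</pre></div>

<p>Starting from three <code>NULL</code> pointers, successive calls to
<code>model_append</code> keep adding text after the previous result, in the
same way that a second recoding may append to the output of a first one
when the three pointers are left untouched between calls.
</p>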

</li><li> Task execution
<a name="index-task-execution"></a>

<a name="index-recode_005fperform_005ftask"></a>
<a name="index-recode_005ffilter_005fopen"></a>
<a name="index-recode_005ffilter_005fclose"></a>
<div class="example">
<pre class="example">recode_perform_task (<var>task</var>);
recode_filter_open (<var>task</var>, <var>file</var>);
recode_filter_close (<var>task</var>);
</pre></div>

<p>The function <code>recode_perform_task</code> reads as much input as possible,
and recodes all of it to the prescribed output, given a properly initialised
<var>task</var>.
</p>
<p>Functions <code>recode_filter_open</code> and <code>recode_filter_close</code> are
only planned for now.  They are meant to read input in piecemeal ways.
Even if functionality already exists informally in the library, it has
not been made available yet through such interface functions.
</p></li></ul>

<hr>
<a name="Charset-level"></a>
<div class="header">
<p>
Next: <a href="#Errors" accesskey="n" rel="next">Errors</a>, Previous: <a href="#Task-level" accesskey="p" rel="previous">Task level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Charset-level-functions"></a>
<h3 class="section">4.4 Charset level functions</h3>
<a name="index-charset-level-functions"></a>

<a name="index-internal-functions"></a>
<p>Many functions are internal to the recoding library.  Some of them
have been made external and available, for the <code>recode</code> program
had to retain all its previous functionality while being transformed
into a mere application of the recoding library.  These functions are
not really documented here for the time being, as we hope that many of
them will vanish over time.  Once this set of routines stabilises,
it would be convenient to document them as an API for handling charset
names and contents.
</p>
<a name="index-find_005fcharset"></a>
<a name="index-list_005fall_005fcharsets"></a>
<a name="index-list_005fconcise_005fcharset"></a>
<a name="index-list_005ffull_005fcharset"></a>
<div class="example">
<pre class="example">RECODE_CHARSET find_charset (<var>name</var>, <var>cleaning-type</var>);
bool list_all_charsets (<var>charset</var>);
bool list_concise_charset (<var>charset</var>, <var>list-format</var>);
bool list_full_charset (<var>charset</var>);
</pre></div>

<hr>
<a name="Errors"></a>
<div class="header">
<p>
Previous: <a href="#Charset-level" accesskey="p" rel="previous">Charset level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> &nbsp; [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Handling-errors"></a>
<h3 class="section">4.5 Handling errors</h3>
<a name="index-error-handling"></a>
<a name="index-handling-errors"></a>

<a name="index-error-messages"></a>
<p>The <code>recode</code> program, while using the <code>recode</code> library, needs to
control whether recoding problems are reported or not, and then reflect
these in the exit status.  The program should also instruct the library
whether the recoding should be abruptly interrupted when an error is
met (so sparing processing when it is known in advance that a wrong
result would be discarded anyway), or if it should proceed nevertheless.
Here is how the library groups errors into levels, listed here in order
of increasing severity.
</p>
<dl compact="compact">
<dt><code>RECODE_NO_ERROR</code></dt>
<dd><a name="index-RECODE_005fNO_005fERROR"></a>

<p>No error was met on previous library calls.
</p>
</dd>
<dt><code>RECODE_NOT_CANONICAL</code></dt>
<dd><a name="index-RECODE_005fNOT_005fCANONICAL"></a>
<a name="index-non-canonical-input_002c-error-message"></a>

<p>The input text was using one of the many alternative codings for some
phenomenon, but not the one <code>recode</code> would have canonically generated.
So, if the reverse recoding is later attempted, it would produce a text
having the same <em>meaning</em> as the original text, yet not being byte
identical.
</p>
<p>For example, a <code>Base64</code> block in which end-of-lines appear elsewhere
than at every 76 characters is not canonical.  An e-circumflex in TeX
which is coded as &lsquo;<samp>\^{e}</samp>&rsquo; instead of &lsquo;<samp>\^e</samp>&rsquo; is not canonical.
</p>
</dd>
<dt><code>RECODE_AMBIGUOUS_OUTPUT</code></dt>
<dd><a name="index-RECODE_005fAMBIGUOUS_005fOUTPUT"></a>
<a name="index-ambiguous-output_002c-error-message"></a>

<p>It has been discovered that if the reverse recoding was attempted on
the text output by this recoding, we would not obtain the original text,
only because an ambiguity was generated by accident in the output text.
This ambiguity would then cause the wrong interpretation to be taken.
</p>
<p>Here are a few examples.  If the <code>Latin-1</code> sequence &lsquo;<samp>e^</samp>&rsquo;
is converted to Easy French and back, the result will be interpreted
as e-circumflex and so, will not reflect the intent of the original two
characters.  Recoding an <code>IBM-PC</code> text to <code>Latin-1</code> and back,
where the input text contained an isolated <kbd>LF</kbd>, will have a spurious
<kbd>CR</kbd> inserted before the <kbd>LF</kbd>.
</p>
<p>Currently, there are many cases in the library where the production of
ambiguous output is not properly detected, as it is sometimes a difficult
problem to accomplish this detection, or to do it speedily.
</p>
</dd>
<dt><code>RECODE_UNTRANSLATABLE</code></dt>
<dd><a name="index-RECODE_005fUNTRANSLATABLE"></a>
<a name="index-untranslatable-input_002c-error-message"></a>

<p>One or more input characters could not be recoded, because there is just
no representation for them in the output charset.
</p>
<p>Here are a few examples.  Non-strict mode often allows <code>recode</code> to
compute on-the-fly mappings for unrepresentable characters, but strict
mode prohibits such attribution of reversible translations: so strict
mode might often trigger such an error.  Most <code>UCS-2</code> codes used to
represent Asian characters cannot be expressed in various Latin charsets.
</p>
</dd>
<dt><code>RECODE_INVALID_INPUT</code></dt>
<dd><a name="index-RECODE_005fINVALID_005fINPUT"></a>
<a name="index-invalid-input_002c-error-message"></a>

<p>The input text does not comply with the coding it is declared to hold.  So,
there is no way by which a reverse recoding would reproduce this text,
because <code>recode</code> should never produce invalid output.
</p>
<p>Here are a few examples.  In strict mode, <code>ASCII</code> text is not allowed
to contain characters with the eighth bit set.  <code>UTF-8</code> encodings
ought to be minimal<a name="DOCF7" href="#FOOT7"><sup>7</sup></a>.
</p>
</dd>
<dt><code>RECODE_SYSTEM_ERROR</code></dt>
<dd><a name="index-RECODE_005fSYSTEM_005fERROR"></a>
<a name="index-system-detected-problem_002c-error-message"></a>

<p>The underlying system reported an error while the recoding was going on,
likely an input/output error.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_USER_ERROR</code></dt>
<dd><a name="index-RECODE_005fUSER_005fERROR"></a>
<a name="index-misuse-of-recoding-library_002c-error-message"></a>

<p>The programmer or user requested something the recoding library is unable
to provide, or used the API wrongly.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_INTERNAL_ERROR</code></dt>
<dd><a name="index-RECODE_005fINTERNAL_005fERROR"></a>
<a name="index-internal-recoding-bug_002c-error-message"></a>

<p>Something really wrong, which should normally never happen, was detected
within the recoding library.  This might be due to genuine bugs in the
library, or maybe due to un-initialised or overwritten arguments to
the API.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_MAXIMUM_ERROR</code></dt>
<dd><a name="index-RECODE_005fMAXIMUM_005fERROR-1"></a>

<p>This error code should never be returned; it is only used internally as
a sentinel for the list of all possible error codes.
</p></dd>
</dl>
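
<p>To make the <code>UTF-8</code> minimality example under
<code>RECODE_INVALID_INPUT</code> more concrete, here is a sketch of what such
a check could look like.  The function <code>utf8_is_minimal</code> is a
hypothetical helper written for this illustration; it is not part of the
recoding library, and it only tests sequence shape and shortest-form
encoding for one to four byte sequences.
</p>
<div class="example">
<pre class="example">#include &lt;stddef.h&gt;

/* Return 1 if the LENGTH bytes at TEXT form complete UTF-8 sequences
   that all use their shortest encoding, and 0 otherwise.  Surrogates
   and values above U+10FFFF are deliberately not examined here.  */
int
utf8_is_minimal (const unsigned char *text, size_t length)
{
  size_t i = 0;

  while (i &lt; length)
    {
      unsigned char lead = text[i];
      unsigned long value;
      unsigned long smallest;   /* least value needing this many bytes */
      size_t extra;             /* continuation bytes expected */
      size_t k;

      if (lead &lt; 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      else if ((lead &amp; 0xE0) == 0xC0)
        {
          extra = 1; value = lead &amp; 0x1F; smallest = 0x80;
        }
      else if ((lead &amp; 0xF0) == 0xE0)
        {
          extra = 2; value = lead &amp; 0x0F; smallest = 0x800;
        }
      else if ((lead &amp; 0xF8) == 0xF0)
        {
          extra = 3; value = lead &amp; 0x07; smallest = 0x10000;
        }
      else
        return 0;               /* stray continuation or unknown lead byte */

      if (length - i &lt;= extra)
        return 0;               /* sequence is cut short */

      for (k = 1; k &lt;= extra; k++)
        {
          unsigned char next = text[i + k];

          if ((next &amp; 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
          value = (value &lt;&lt; 6) | (next &amp; 0x3F);
        }

      if (value &lt; smallest)
        return 0;               /* overlong, hence not minimal */

      i += extra + 1;
    }
  return 1;
}
</pre></div>

<p>With this sketch, the two bytes <code>0xC3 0xA9</code> (e-acute) are
accepted, while <code>0xC1 0x81</code>, an overlong spelling of the letter
&lsquo;<samp>A</samp>&rsquo;, is rejected.
</p>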

<a name="index-error-level-threshold"></a>
<a name="index-threshold-for-error-reporting"></a>
<p>One should be able to set the error level threshold for returning failure
at end of recoding, and also the threshold for immediate interruption.
If several errors occur while the recoding proceeds, none of them severe
enough to interrupt the recoding, then the most severe error is retained,
while others are forgotten<a name="DOCF8" href="#FOOT8"><sup>8</sup></a>.  So, in case of an error,
the possible actions currently are:
</p>
<ul>
<li> do nothing and let go, returning success at end of recoding,
</li><li> just let go for now, but return failure at end of recoding,
</li><li> interrupt recoding right away and return failure now.
</li></ul>

<p>See <a href="#Task-level">Task level</a>, and particularly the description of the fields
<code>fail_level</code>, <code>abort_level</code> and <code>error_so_far</code>, for more
information about how errors are handled.
</p>
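
<p>The interplay of the two thresholds can be sketched with a toy model
that follows the rules stated in this manual.  It does not call the
recoding library, and it is not the library's implementation: the
<code>MODEL_</code> symbols, <code>struct model_task</code>,
<code>model_init</code> and <code>model_run</code> are hypothetical names
that merely mirror the levels and fields described above.
</p>
<div class="example">
<pre class="example">#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* Stand-ins for the error levels, in order of increasing severity.  */
enum model_error
{
  MODEL_NO_ERROR,
  MODEL_NOT_CANONICAL,
  MODEL_AMBIGUOUS_OUTPUT,
  MODEL_UNTRANSLATABLE,
  MODEL_INVALID_INPUT,
  MODEL_SYSTEM_ERROR,
  MODEL_USER_ERROR,
  MODEL_INTERNAL_ERROR,
  MODEL_MAXIMUM_ERROR
};

struct model_task
{
  enum model_error fail_level;      /* threshold for returning failure */
  enum model_error abort_level;     /* threshold for stopping at once */
  enum model_error error_so_far;    /* most severe error met so far */
  size_t steps_done;                /* events examined by the last run */
};

/* Apply the preset values given in the field descriptions.  */
void
model_init (struct model_task *task)
{
  task-&gt;fail_level = MODEL_NOT_CANONICAL;
  task-&gt;abort_level = MODEL_MAXIMUM_ERROR;
  task-&gt;error_so_far = MODEL_NO_ERROR;
  task-&gt;steps_done = 0;
}

/* Walk through COUNT simulated events.  Return true for success and
   false for failure, as decided by fail_level alone.  */
bool
model_run (struct model_task *task, const enum model_error *events,
           size_t count)
{
  size_t i;

  task-&gt;steps_done = 0;
  for (i = 0; i &lt; count; i++)
    {
      task-&gt;steps_done = i + 1;
      if (events[i] == MODEL_NO_ERROR)
        continue;
      /* Only the most severe error is retained.  */
      if (events[i] &gt; task-&gt;error_so_far)
        task-&gt;error_so_far = events[i];
      /* Reaching abort_level interrupts processing right away.  */
      if (events[i] &gt;= task-&gt;abort_level)
        break;
    }
  return (task-&gt;error_so_far == MODEL_NO_ERROR
          || task-&gt;error_so_far &lt; task-&gt;fail_level);
}
</pre></div>

<p>Feeding this model the events &ldquo;not canonical, untranslatable, not
canonical&rdquo; with the preset thresholds examines all three events and
returns <code>false</code>.  Raising <code>fail_level</code> to
<code>MODEL_INVALID_INPUT</code> turns the result into <code>true</code>,
and lowering <code>abort_level</code> to <code>MODEL_UNTRANSLATABLE</code>
stops the run after the second event without changing how the returned
value is decided.
</p>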

<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>

<h3><a name="FOOT7" href="#DOCF7">(7)</a></h3>
<p>The minimality of a <code>UTF-8</code> encoding
is guaranteed on output, but currently, it is not checked on input.</p>
<h3><a name="FOOT8" href="#DOCF8">(8)</a></h3>
<p>Another approach would have been
to define the level symbols as masks instead, and to give masks to
threshold setting routines, and to retain all errors; yet I have never
met such a need in practice myself, and so I fear it would be overkill.
On the other hand, it might be interesting to maintain counters about
how many times each kind of error occurred.</p>
</div>



</body>
</html>