/usr/share/doc/recode-doc/Library.html is in recode-doc 3.6-21.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
<head>
<title>The recode reference manual: Library</title>
<meta name="description" content="The recode reference manual: Library">
<meta name="keywords" content="The recode reference manual: Library">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Charset-and-Surface-Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Universal.html#Universal" rel="next" title="Universal">
<link href="Invoking-recode.html#Debugging" rel="previous" title="Debugging">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Library"></a>
<div class="header">
<p>
Next: <a href="Universal.html#Universal" accesskey="n" rel="next">Universal</a>, Previous: <a href="Invoking-recode.html#Invoking-recode" accesskey="p" rel="previous">Invoking recode</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="A-recoding-library"></a>
<h2 class="chapter">4 A recoding library</h2>
<a name="index-recoding-library"></a>
<p>The program named <code>recode</code> is just an application of its recoding
library. The recoding library is available separately for other C
programs. A good way to acquire some familiarity with the recoding
library is to get acquainted with the <code>recode</code> program itself.
</p>
<p>To use the recoding library once it is installed, a C program needs to
have a line:
</p>
<div class="example">
<pre class="example">#include &lt;recode.h&gt;
</pre></div>
<p>near its beginning, and ‘<samp>-lrecode</samp>’ should be given on the
linking command, so that modules from the recoding library are found.
</p>
<p>The library is still under development. As it stands, it contains four
identifiable sets of routines: the outer level functions, the request
level functions, the task level functions and the charset level functions.
These are discussed in separate sections.
</p>
<p>In most applications, using the recoding library effectively should rarely
require studying anything beyond the main initialisation function at outer
level and a few functions at request level.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top">• <a href="#Outer-level" accesskey="1">Outer level</a>:</td><td> </td><td align="left" valign="top">Outer level functions
</td></tr>
<tr><td align="left" valign="top">• <a href="#Request-level" accesskey="2">Request level</a>:</td><td> </td><td align="left" valign="top">Request level functions
</td></tr>
<tr><td align="left" valign="top">• <a href="#Task-level" accesskey="3">Task level</a>:</td><td> </td><td align="left" valign="top">Task level functions
</td></tr>
<tr><td align="left" valign="top">• <a href="#Charset-level" accesskey="4">Charset level</a>:</td><td> </td><td align="left" valign="top">Charset level functions
</td></tr>
<tr><td align="left" valign="top">• <a href="#Errors" accesskey="5">Errors</a>:</td><td> </td><td align="left" valign="top">Handling errors
</td></tr>
</table>
<hr>
<a name="Outer-level"></a>
<div class="header">
<p>
Next: <a href="#Request-level" accesskey="n" rel="next">Request level</a>, Previous: <a href="#Library" accesskey="p" rel="previous">Library</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Outer-level-functions"></a>
<h3 class="section">4.1 Outer level functions</h3>
<a name="index-outer-level-functions"></a>
<p>The outer level functions mainly prepare the whole recoding library for
use, or perform actions which are unrelated to specific recodings. Here is
an example of a program which does nothing useful.
</p>
<div class="example">
<pre class="example">#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recode.h&gt;
const char *program_name;
int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);
  recode_delete_outer (outer);
  exit (0);
}
</pre></div>
<a name="index-RECODE_005fOUTER-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares an opaque <code>RECODE_OUTER</code>
structure, which the programmer should use for allocating a variable in
their program. This ‘<samp>outer</samp>’ variable is given as a first argument to
the other outer level functions.
</p>
<a name="index-stdbool_002eh-header"></a>
<a name="index-bool-data-type"></a>
<p>The <code>&lt;recode.h&gt;</code> header file uses the Boolean type set up by the
system header file <code>&lt;stdbool.h&gt;</code>. But this header file is still
fairly new in C standards, and likely does not exist everywhere. If your
system does not offer this system header file yet, the proper compilation
of the <code>&lt;recode.h&gt;</code> file can be ensured by replacing the
inclusion line with:
</p>
<div class="example">
<pre class="example">typedef enum {false = 0, true = 1} bool;
</pre></div>
<p>People wanting wider portability, or Autoconf lovers, might arrange their
<samp>configure.in</samp> so as to be able to write something more general, like:
</p>
<div class="example">
<pre class="example">#if STDC_HEADERS
# include &lt;stdlib.h&gt;
#endif
/* Some systems do not define EXIT_*, even with STDC_HEADERS. */
#ifndef EXIT_SUCCESS
# define EXIT_SUCCESS 0
#endif
#ifndef EXIT_FAILURE
# define EXIT_FAILURE 1
#endif
/* The following test is to work around the gross typo in systems like Sony
   NEWS-OS Release 4.0C, whereby EXIT_FAILURE is defined to 0, not 1. */
#if !EXIT_FAILURE
# undef EXIT_FAILURE
# define EXIT_FAILURE 1
#endif
#if HAVE_STDBOOL_H
# include &lt;stdbool.h&gt;
#else
typedef enum {false = 0, true = 1} bool;
#endif
#include &lt;recode.h&gt;
const char *program_name;
int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);
  recode_delete_outer (outer);
  exit (EXIT_SUCCESS);
}
</pre></div>
<p>but we will not insist on such details in the examples to come.
</p>
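<p>Where Autoconf is not used, testing the compiler’s own <code>__STDC_VERSION__</code> macro is often enough to choose between <code>&lt;stdbool.h&gt;</code> and the replacement <code>typedef</code> shown earlier. The following is only a sketch. It assumes that any compiler announcing C99 or later ships <code>&lt;stdbool.h&gt;</code>. The function <code>all_ascii</code> is hypothetical and merely exercises the resulting type. A real program would include <code>&lt;recode.h&gt;</code> right after this selection.
</p>
<div class="example">
<pre class="example">/* Pick a Boolean type without Autoconf.  Compilers conforming to C99
   or later define __STDC_VERSION__ &gt;= 199901L and ship &lt;stdbool.h&gt;.
   Inside #if, an undefined macro evaluates to 0.  */
#if __STDC_VERSION__ &gt;= 199901L
# include &lt;stdbool.h&gt;
#else
typedef enum {false = 0, true = 1} bool;
#endif
#include &lt;stddef.h&gt;

/* Hypothetical helper, present only to exercise the type: tell
   whether the LENGTH bytes at TEXT are all 7-bit ASCII.  */
static bool
all_ascii (const char *text, size_t length)
{
  size_t counter;

  for (counter = 0; counter &lt; length; counter++)
    if ((unsigned char) text[counter] &gt; 127)
      return false;
  return true;
}
</pre></div>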
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-outer"></a>
<div class="example">
<pre class="example">RECODE_OUTER recode_new_outer (<var>auto_abort</var>);
bool recode_delete_outer (<var>outer</var>);
</pre></div>
<a name="index-recode_005fnew_005fouter"></a>
<a name="index-recode_005fdelete_005fouter"></a>
<p>The recoding library absolutely needs to be initialised before being used,
and <code>recode_new_outer</code> has to be called once, first. The function
returns the new <var>outer</var>. It accepts a single Boolean argument,
<var>auto_abort</var>, telling whether or not the library should automatically
issue diagnostics on standard error and abort the whole program on errors.
When <var>auto_abort</var> is <code>true</code>, the library later conveniently
issues diagnostics itself, and aborts the calling program on errors. This is
merely a convenience. If this parameter is <code>false</code>, the calling program should always
take care of checking the return value of all other calls to the recoding
library functions, and when any error is detected, issue a diagnostic and
abort processing itself.
</p>
<p>Regardless of the setting of <var>auto_abort</var>, all recoding library
functions return a success status. Most functions return
<code>false</code> on error, and <code>true</code> if everything went fine.
Functions returning structures or strings return <code>NULL</code> instead
of the result when the result cannot be produced. If <var>auto_abort</var>
is selected, functions either return successfully, or do not return at all.
</p>
<p>As in the example above, <code>recode_new_outer</code> is called only once in
most cases. Calling <code>recode_new_outer</code> implies some overhead, so
calling it more than once should preferably be avoided.
</p>
<p>The termination function <code>recode_delete_outer</code> reclaims the memory
allocated by <code>recode_new_outer</code> for a given <var>outer</var> variable.
Calling <code>recode_delete_outer</code> prior to program termination is more
aesthetic than useful, as all memory resources are automatically reclaimed
when the program ends. You may spare this terminating call if you prefer.
</p>
</li><li> The <code>program_name</code> declaration
<a name="index-program_005fname-variable"></a>
<p>As we just explained, the user may set up the <code>recode</code> library so that,
in case of errors, it issues the diagnostic itself and aborts the
whole processing. This capability may be quite convenient. When this
feature is used, the aborting routine includes the name of the running
program in the diagnostic. On the other hand, when this feature is not
used, the library merely returns error codes, giving the library user fuller
control over all this. This behaviour is more like what usual libraries
do: they return codes and never abort. However, I would rather not force
library users to check all return codes themselves by leaving them
no other choice. In most simple applications, letting the library diagnose
and abort is much easier, and quite welcome. It is precisely because
both possibilities exist that the <code>program_name</code> variable is needed: it
may be used by the library <em>when</em> the user sets it to diagnose errors itself.
</p></li></ul>
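<p>The two behaviours can be pictured with a small self-contained sketch. The function <code>report_failure</code> below is hypothetical and does not belong to the recoding library. It only models what is described above. With <var>auto_abort</var> set, the diagnostic is prefixed by <code>program_name</code> and processing stops. Otherwise, a plain <code>false</code> is returned for the caller to test. The exact diagnostic format is an assumption.
</p>
<div class="example">
<pre class="example">#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

/* Normally set from argv[0] in main, as in the examples above.  */
const char *program_name = "example";

/* Hypothetical model of the two error policies.  When AUTO_ABORT is
   true, diagnose on standard error and never return; otherwise,
   return false and let the caller decide what to do.  */
static bool
report_failure (bool auto_abort, const char *message)
{
  if (auto_abort)
    {
      /* The diagnostic needs the name of the running program.  */
      fprintf (stderr, "%s: %s\n", program_name, message);
      exit (EXIT_FAILURE);
    }
  return false;
}
</pre></div>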
<hr>
<a name="Request-level"></a>
<div class="header">
<p>
Next: <a href="#Task-level" accesskey="n" rel="next">Task level</a>, Previous: <a href="#Outer-level" accesskey="p" rel="previous">Outer level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Request-level-functions"></a>
<h3 class="section">4.2 Request level functions</h3>
<a name="index-request-level-functions"></a>
<p>The request level functions are meant to cover most recoding needs
programmers may have; they should provide all usual functionality.
Their API is almost stable by now. To get started with request level
functions, here is a full example of a program whose sole job is to filter
<code>ibmpc</code> code on its standard input into <code>latin1</code> code on its
standard output.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recode.h&gt;
const char *program_name;
int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (true);
  RECODE_REQUEST request = recode_new_request (outer);
  bool success;
  recode_scan_request (request, "ibmpc..latin1");
  success = recode_file_to_file (request, stdin, stdout);
  recode_delete_request (request);
  recode_delete_outer (outer);
  exit (success ? 0 : 1);
}
</pre></div>
<a name="index-RECODE_005fREQUEST-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares a <code>RECODE_REQUEST</code> structure,
which the programmer should use for allocating a variable in their program.
This <var>request</var> variable is given as a first argument to all request
level functions, and in most cases, may be considered as opaque.
</p>
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-request"></a>
<div class="example">
<pre class="example">RECODE_REQUEST recode_new_request (<var>outer</var>);
bool recode_delete_request (<var>request</var>);
</pre></div>
<a name="index-recode_005fnew_005frequest"></a>
<a name="index-recode_005fdelete_005frequest"></a>
<p>No <var>request</var> variable may be used in other request level
functions of the recoding library before having been initialised by
<code>recode_new_request</code>. There may be many such <var>request</var>
variables, in which case, they are independent of one another and
they all need to be initialised separately. To avoid memory leaks, a
<var>request</var> variable should not be initialised a second time without
calling <code>recode_delete_request</code> to “un-initialise” it.
</p>
<p>As with <code>recode_delete_outer</code>, calling <code>recode_delete_request</code>
prior to program termination, in the example above, may be left out.
</p>
</li><li> Fields of <code>struct recode_request</code>
<a name="index-recode_005frequest-structure"></a>
<p>Here are the fields of a <code>struct recode_request</code> which may be
meaningfully changed, once a <var>request</var> has been initialised by
<code>recode_new_request</code>, but before it gets used. In practice, these
fields rarely need to be changed. To access the fields,
you need to include <samp>recodext.h</samp> <em>instead</em> of <samp>recode.h</samp>,
in which case there also is a greater chance that you need to recompile
your programs if a new version of the recoding library gets installed.
</p>
<dl compact="compact">
<dt><code>verbose_flag</code></dt>
<dd><a name="index-verbose_005fflag"></a>
<p>This field is initially <code>false</code>. When set to <code>true</code>, the
library will echo to stderr the sequence of elementary recoding steps
needed to achieve the requested recoding.
</p>
</dd>
<dt><code>diaeresis_char</code></dt>
<dd><a name="index-diaeresis_005fchar"></a>
<p>This field is initially the ASCII value of a double quote <kbd>"</kbd>,
but it may also be the ASCII value of a colon <kbd>:</kbd>. In the <code>texte</code>
charset, some countries use double quotes to mark diaeresis, while other
countries prefer colons. This field contains the diaeresis character
for the <code>texte</code> charset.
</p>
</dd>
<dt><code>make_header_flag</code></dt>
<dd><a name="index-make_005fheader_005fflag"></a>
<p>This field is initially <code>false</code>. When set to <code>true</code>, it
indicates that the program is merely trying to produce a recoding table in
source form rather than completing any actual recoding. In such a case,
the step sequence can be optimised much more aggressively.
If the step sequence cannot be reduced to a single step, table production
will fail.
</p>
</dd>
<dt><code>diacritics_only</code></dt>
<dd><a name="index-diacritics_005fonly"></a>
<p>This field is initially <code>false</code>. For the <code>HTML</code> and <code>LaTeX</code>
charsets, it is often convenient to recode the diacriticized characters
only, while leaving untouched other HTML code using ampersands or angle
brackets, or LaTeX code using backslashes. Set the field to <code>true</code>
for getting this behaviour. In the other charset, one can then edit text as
well as HTML or LaTeX directives.
</p>
</dd>
<dt><code>ascii_graphics</code></dt>
<dd><a name="index-ascii_005fgraphics"></a>
<p>This field is initially <code>false</code>, and relates to characters 176 to
223 in the <code>ibmpc</code> charset, which are used to draw boxes. When set
to <code>true</code>, while getting out of <code>ibmpc</code>, ASCII characters are
selected so as to graphically approximate these boxes.
</p></dd>
</dl>
</li><li> Study of request strings
<div class="example">
<pre class="example">bool recode_scan_request (<var>request</var>, "<var>string</var>");
</pre></div>
<a name="index-recode_005fscan_005frequest"></a>
<p>The main role of a <var>request</var> variable is to describe a set of
recoding transformations. Function <code>recode_scan_request</code> studies
the given <var>string</var>, and stores an internal representation of it into
<var>request</var>. Note that <var>string</var> may be a full-fledged <code>recode</code>
request, possibly including surface specifications, intermediary
charsets, sequences, aliases or abbreviations (see <a href="Invoking-recode.html#Requests">Requests</a>).
</p>
<p>The internal representation automatically receives some pre-conditioning
and optimisation, so the <var>request</var> may then later be used many times
to achieve many actual recodings. It would be inefficient to call
<code>recode_scan_request</code> many times with the same <var>string</var>. It is
better to keep several <var>request</var> variables, one per recoding, and reuse them.
</p>
</li><li> Actual recoding jobs
<p>Once the <var>request</var> variable holds the description of a recoding
transformation, a few functions use it for achieving an actual recoding.
Either input or output of a recoding may be a string, an in-memory buffer,
or a file.
</p>
<p>Functions with names like
<code>recode_<var>input-type</var>_to_<var>output-type</var></code> request an actual
recoding, and are described below. It is easy to remember which arguments
each function accepts, once a few simple principles for each possible
<var>type</var> are grasped. However, one of the recoding functions escapes these
principles and is discussed separately, first.
</p>
<div class="example">
<pre class="example">char *recode_string (<var>request</var>, <var>string</var>);
</pre></div>
<a name="index-recode_005fstring"></a>
<p>The function <code>recode_string</code> recodes <var>string</var> according
to <var>request</var>, and directly returns the resulting recoded string
freshly allocated, or <code>NULL</code> if the recoding could not succeed for
some reason. When this function is used, it is the responsibility of
the programmer to ensure that the memory used by the returned string is
later reclaimed.
</p>
<a name="index-recode_005fstring_005fto_005fbuffer"></a>
<a name="index-recode_005fstring_005fto_005ffile"></a>
<a name="index-recode_005fbuffer_005fto_005fbuffer"></a>
<a name="index-recode_005fbuffer_005fto_005ffile"></a>
<a name="index-recode_005ffile_005fto_005fbuffer"></a>
<a name="index-recode_005ffile_005fto_005ffile"></a>
<div class="example">
<pre class="example">bool recode_string_to_buffer (<var>request</var>,
<var>input_string</var>,
&<var>output_buffer</var>, &<var>output_length</var>, &<var>output_allocated</var>);
bool recode_string_to_file (<var>request</var>,
<var>input_string</var>,
<var>output_file</var>);
bool recode_buffer_to_buffer (<var>request</var>,
<var>input_buffer</var>, <var>input_length</var>,
&<var>output_buffer</var>, &<var>output_length</var>, &<var>output_allocated</var>);
bool recode_buffer_to_file (<var>request</var>,
<var>input_buffer</var>, <var>input_length</var>,
<var>output_file</var>);
bool recode_file_to_buffer (<var>request</var>,
<var>input_file</var>,
&<var>output_buffer</var>, &<var>output_length</var>, &<var>output_allocated</var>);
bool recode_file_to_file (<var>request</var>,
<var>input_file</var>,
<var>output_file</var>);
</pre></div>
<p>All these functions return a <code>bool</code> result, <code>false</code> meaning that
the recoding was not successful, often because of reversibility issues.
The name of each function tells which type it reads and which
type it produces. Let’s discuss these three types in turn.
</p>
<dl compact="compact">
<dt>string</dt>
<dd>
<p>A string is merely an in-memory buffer which is terminated by a <code>NUL</code>
character (using as many bytes as needed), instead of being described
by a byte length. For input, a pointer to the buffer is given through
one argument.
</p>
<p>Note that there are no <code>to_string</code> functions. Only one
function recodes into a string, and it is <code>recode_string</code>, which
has already been discussed separately, above.
</p>
</dd>
<dt>buffer</dt>
<dd>
<p>A buffer is a sequence of bytes held in computer memory. For input, two
arguments provide a pointer to the start of the buffer and its byte size.
Note that for charsets using many bytes per character, the size is given
in bytes, not in characters.
</p>
<p>For output, three arguments provide the address of three variables, which
will receive the buffer pointer, the used buffer size in bytes, and the
allocated buffer size in bytes. If at the time of the call, the buffer
pointer is <code>NULL</code>, then the allocated buffer size should also be zero,
and the buffer will be allocated afresh by the recoding functions. However,
if the buffer pointer is not <code>NULL</code>, it should point to an already
allocated buffer, whose size is then given by the allocated buffer size. If
that size gets exceeded during the recoding, the buffer will be automatically
reallocated bigger, probably elsewhere, and the allocated buffer size will
be adjusted accordingly.
</p>
<p>The second variable, giving the used buffer size, will receive the
exact byte size which was needed for the recoding. A <code>NUL</code> character
is guaranteed at the end of the produced buffer, but is not counted in the
byte size of the recoding. Beyond that <code>NUL</code>, there might be some
extra space after the recoded data, extending to the allocated buffer size.
</p>
</dd>
<dt>file</dt>
<dd>
<a name="index-recode_005ffilter_005fopen_002c-not-available"></a>
<a name="index-recode_005ffilter_005fclose_002c-not-available"></a>
<p>A file is a sequence of bytes held outside computer memory, but
buffered through it. For input, one argument provides a pointer to a
file already opened for read. The file is then read and recoded from its
current position until the end of the file, effectively swallowing it in
memory if the destination of the recoding is a buffer. For reading a file
filtered through the recoding library, but only a little bit at a time, one
should rather use <code>recode_filter_open</code> and <code>recode_filter_close</code>
(these two functions are not yet available).
</p>
<p>For output, one argument provides a pointer to a file already opened
for write. The result of the recoding is written to that file starting
at its current position.
</p></dd>
</dl>
</li></ul>
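<p>The output buffer convention can be modelled in a few lines of C. The function <code>store_output</code> below is hypothetical and not part of the recoding library. It copies <var>size</var> bytes into a caller-owned triple in the way the <code>recode_<var>input-type</var>_to_buffer</code> functions are described as filling theirs. The current allocation is reused when it is large enough and reallocated otherwise, and an uncounted <code>NUL</code> is kept after the data. The 16-byte initial size and the doubling growth policy are arbitrary choices made for this sketch.
</p>
<div class="example">
<pre class="example">#include &lt;assert.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper, not part of the recoding library.  It stores
   the SIZE bytes at DATA into the triple *BUFFER, *LENGTH, *ALLOCATED,
   following the documented output convention: an existing allocation
   is reused when large enough, and a NUL always follows the LENGTH
   used bytes without being counted.  */
static bool
store_output (char **buffer, size_t *length, size_t *allocated,
              const char *data, size_t size)
{
  /* A NULL buffer pointer must come with a zero allocated size.  */
  assert (*buffer != NULL || *allocated == 0);

  /* Room is needed for the data plus the trailing NUL.  */
  if (size + 1 &gt; *allocated)
    {
      size_t new_allocated = *allocated ? *allocated : 16;
      char *new_buffer;

      while (size + 1 &gt; new_allocated)
        new_allocated *= 2;
      /* realloc (NULL, n) acts like malloc (n); the block may move.  */
      new_buffer = realloc (*buffer, new_allocated);
      if (new_buffer == NULL)
        return false;
      *buffer = new_buffer;
      *allocated = new_allocated;
    }

  memcpy (*buffer, data, size);
  *length = size;              /* exact byte size, NUL excluded */
  (*buffer)[size] = '\0';      /* guaranteed terminator */
  return true;
}
</pre></div>
<p>A caller would start from <code>char *output = NULL; size_t length = 0, allocated = 0;</code> and pass the addresses of these same three variables on every call, so that one allocation gets recycled. When done, the caller would release the buffer, presumably with <code>free</code>.
</p>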
<a name="index-recode_005fformat_005ftable"></a>
<p>The following special function is still subject to change:
</p>
<div class="example">
<pre class="example">void recode_format_table (<var>request</var>, <var>language</var>, "<var>name</var>");
</pre></div>
<p>and is no longer documented, for now.
</p>
<hr>
<a name="Task-level"></a>
<div class="header">
<p>
Next: <a href="#Charset-level" accesskey="n" rel="next">Charset level</a>, Previous: <a href="#Request-level" accesskey="p" rel="previous">Request level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Task-level-functions"></a>
<h3 class="section">4.3 Task level functions</h3>
<a name="index-task-level-functions"></a>
<p>The task level functions are used internally by the request level
functions. They allow more explicit control over files and memory
buffers holding input and output to recoding processes. The interface
specification of task level functions may still change a bit.
</p>
<p>To get started with task level functions, here is a full example of a
program whose sole job is to filter <code>ibmpc</code> code on its standard input
into <code>latin1</code> code on its standard output. That is, this program has
the same goal as the one from the previous section, but goes about it
a bit differently.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;recodext.h&gt;
const char *program_name;
int
main (int argc, char *const *argv)
{
  program_name = argv[0];
  RECODE_OUTER outer = recode_new_outer (false);
  RECODE_REQUEST request = recode_new_request (outer);
  RECODE_TASK task;
  bool success;
  recode_scan_request (request, "ibmpc..latin1");
  task = recode_new_task (request);
  /* An empty name stands for standard input or standard output.  */
  task->input.name = "";
  task->output.name = "";
  success = recode_perform_task (task);
  recode_delete_task (task);
  recode_delete_request (request);
  recode_delete_outer (outer);
  exit (success ? 0 : 1);
}
</pre></div>
<a name="index-RECODE_005fTASK-structure"></a>
<p>The header file <code>&lt;recode.h&gt;</code> declares a <code>RECODE_TASK</code>
structure, which the programmer should use for allocating a variable in
their program. This <code>task</code> variable is given as a first argument to
all task level functions. The programmer is expected to set, and possibly
consult, a few fields in this structure.
</p>
<ul>
<li> Initialisation functions
<a name="index-initialisation-functions_002c-task"></a>
<a name="index-recode_005fnew_005ftask"></a>
<a name="index-recode_005fdelete_005ftask"></a>
<div class="example">
<pre class="example">RECODE_TASK recode_new_task (<var>request</var>);
bool recode_delete_task (<var>task</var>);
</pre></div>
<p>No <var>task</var> variable may be used in other task level functions
of the recoding library without having first been initialised with
<code>recode_new_task</code>. There may be many such <var>task</var> variables,
in which case, they are independent of one another and they all need to be
initialised separately. To avoid memory leaks, a <var>task</var> variable should
not be initialised a second time without calling <code>recode_delete_task</code> to
“un-initialise” it. The function <code>recode_new_task</code> accepts a <var>request</var> argument
and associates the request with the task. In fact, a task is essentially
a set of recoding transformations with the specification for its current
input and its current output.
</p>
<p>The <var>request</var> variable may be scanned before or after the call to
<code>recode_new_task</code>. For now, the order does not matter. Immediately after
initialisation, before further changes, the <var>task</var> variable associates
<var>request</var> with empty in-memory buffers for both input and output.
The output buffer will later get allocated automatically on the fly,
as needed, by various task processors.
</p>
<p>Even if a call to <code>recode_delete_task</code> is not strictly mandatory
before ending the program, it is cleaner to always include it. Moreover,
in some future version of the recoding library, it might become required.
</p>
</li><li> Fields of <code>struct task_request</code>
<a name="index-task_005frequest-structure"></a>
<p>Here are the fields of a <code>struct task_request</code> which may be meaningfully
changed once a <var>task</var> has been initialised by <code>recode_new_task</code>.
In fact, these fields are expected to change. Once again, to access the fields,
you need to include <samp>recodext.h</samp> <em>instead</em> of <samp>recode.h</samp>,
in which case there is also a greater chance that you will need to recompile
your programs when a new version of the recoding library gets installed.
</p>
<dl compact="compact">
<dt><code>request</code></dt>
<dd>
<p>The field <code>request</code> points to the current recoding request, but may
be changed as needed between recoding calls, for example when a resulting
text has to be built out of many pieces, each recoded differently.
</p>
</dd>
<dt><code>input.name</code></dt>
<dt><code>input.file</code></dt>
<dd>
<p>If <code>input.name</code> is not <code>NULL</code> at the start of a recoding, this is
a request that a file by that name be first opened for reading and later
automatically closed once the whole file has been read. If the file name is
not <code>NULL</code> but an empty string, standard input is used instead.
The opened file pointer is then held in <code>input.file</code>.
</p>
<p>If <code>input.name</code> is <code>NULL</code> and <code>input.file</code> is not, then
<code>input.file</code> should point to a file already opened for reading, whose
contents are meant to be recoded.
</p>
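<p>The rules above, which <code>output.name</code> and <code>output.file</code> below also
follow (with standard output instead of standard input), can be summarised
as a small decision function. This is only a sketch: <code>enum stream_source</code>
and <code>classify_stream</code> are hypothetical names, not part of the recoding
library, which makes this choice by itself at the start of a recoding.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Where a task's input (or output) text comes from, per the rules above.
   Hypothetical names, not part of the recoding library.  */
enum stream_source
  {
    SOURCE_NAMED_FILE,  /* non-empty name: open that file, close it at the end */
    SOURCE_STANDARD,    /* empty name: use standard input (or output) */
    SOURCE_OPEN_FILE,   /* NULL name, non-NULL file: use the open stream */
    SOURCE_MEMORY       /* both NULL: use the buffer, cursor and limit pointers */
  };

/* When NAME is not NULL, FILE is ignored: it will receive the opened
   stream.  */
static enum stream_source
classify_stream (const char *name, FILE *file)
{
  if (name != NULL)
    return name[0] == '\0' ? SOURCE_STANDARD : SOURCE_NAMED_FILE;
  if (file != NULL)
    return SOURCE_OPEN_FILE;
  return SOURCE_MEMORY;
}
</pre></div>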
</dd>
<dt><code>input.buffer</code></dt>
<dt><code>input.cursor</code></dt>
<dt><code>input.limit</code></dt>
<dd>
<p>When both <code>input.name</code> and <code>input.file</code> are <code>NULL</code>, three
pointers describe an in-memory buffer containing the text to be recoded.
The buffer extends from <code>input.buffer</code> to <code>input.limit</code>,
yet the text to be recoded only extends from <code>input.cursor</code> to
<code>input.limit</code>. In most situations, <code>input.cursor</code> starts with
the same value as <code>input.buffer</code>. (Its value advances internally
as the recoding proceeds, until it reaches the value of <code>input.limit</code>.)
</p>
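<p>As an illustration of these three pointers, here is a hypothetical stand-in
for the input fields, set up for a string held in memory, with a toy loop that
advances the cursor until it reaches the limit. Upper-casing merely stands in
for a real recoding; <code>struct input_text</code>, <code>input_from_string</code> and
<code>toy_recode</code> are illustrative names, not the library's own.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for input.buffer, input.cursor and input.limit.  */
struct input_text
  {
    const char *buffer;  /* start of the whole buffer */
    const char *cursor;  /* next byte still to be recoded */
    const char *limit;   /* one past the last byte */
  };

/* Describe a NUL-terminated string, with the cursor at its start.  */
static void
input_from_string (struct input_text *text, const char *string)
{
  text->buffer = string;
  text->cursor = string;
  text->limit = string + strlen (string);
}

/* Toy recoding: upper-case the bytes from cursor to limit into OUT, which
   must hold at least limit - cursor bytes.  The cursor advances until it
   reaches the limit.  Returns the number of bytes written.  */
static size_t
toy_recode (struct input_text *text, char *out)
{
  size_t written = 0;

  assert (text->buffer <= text->cursor && text->cursor <= text->limit);
  while (text->cursor < text->limit)
    out[written++] = (char) toupper ((unsigned char) *text->cursor++);
  return written;
}
</pre></div>
<p>Moving the cursor past the start of the buffer before the call, for
instance one byte further, restricts the work to the tail of the buffer,
which is the situation where <code>input.cursor</code> differs from
<code>input.buffer</code>.
</p>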
</dd>
<dt><code>output.name</code></dt>
<dt><code>output.file</code></dt>
<dd>
<p>If <code>output.name</code> is not <code>NULL</code> at the start of a recoding, this
is a request that a file by that name be opened for writing and later
automatically closed after the recoding is done. If the file name is
not <code>NULL</code> but an empty string, standard output is used instead.
The opened file pointer is then held in <code>output.file</code>.
If several passes with intermediate files are needed to produce the
recoding, the <code>output.name</code> file is opened only for the final pass.
</p>
<p>If <code>output.name</code> is <code>NULL</code> and <code>output.file</code> is not, then
<code>output.file</code> should point to a file already opened for writing, which
will receive the result of the recoding.
</p>
</dd>
<dt><code>output.buffer</code></dt>
<dt><code>output.cursor</code></dt>
<dt><code>output.limit</code></dt>
<dd>
<p>When both <code>output.name</code> and <code>output.file</code> are <code>NULL</code>, three
pointers describe an in-memory buffer meant to receive the text once it
is recoded. The buffer is already allocated from <code>output.buffer</code>
to <code>output.limit</code>. In most situations, <code>output.cursor</code> starts
with the same value as <code>output.buffer</code>. Once the recoding is done,
<code>output.cursor</code> points at the next free byte in the buffer,
just after the recoded text, so another recoding may be called without
changing any of these three pointers, to append new text to the buffer.
The number of recoded bytes in the buffer is the difference between
<code>output.cursor</code> and <code>output.buffer</code>.
</p>
<p>Each time <code>output.cursor</code> reaches <code>output.limit</code>, the buffer
is reallocated to a larger size, possibly at a different location in memory,
and <code>output.buffer</code> is always kept up to date. It is also possible to
call a task level function with no output buffer at all to start with, in
which case all three fields should have <code>NULL</code> as their value. This is
the situation immediately after a call to <code>recode_new_task</code>.
</p>
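<p>The growth behaviour described above can be sketched as follows. This is not
the library's code: <code>struct output_text</code> and <code>output_append</code> are
hypothetical, and the initial size of 16 bytes and the doubling on each
reallocation are assumptions chosen for the example. The sketch starts from
three <code>NULL</code> pointers, recomputes all three after <code>realloc</code> since the
buffer may move, and lets a second call append after the first.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for output.buffer, output.cursor and output.limit.
   All three start as NULL, as right after recode_new_task.  */
struct output_text
  {
    char *buffer;  /* start of the allocated area, or NULL */
    char *cursor;  /* next free byte */
    char *limit;   /* one past the end of the allocated area */
  };

/* Append N bytes, reallocating a bigger (possibly moved) buffer when they
   do not fit.  Returns 0 on success, -1 if memory is exhausted.  */
static int
output_append (struct output_text *out, const char *bytes, size_t n)
{
  size_t used, size;

  if (n == 0)
    return 0;
  used = out->buffer ? (size_t) (out->cursor - out->buffer) : 0;
  size = out->buffer ? (size_t) (out->limit - out->buffer) : 0;
  if (n > size - used)
    {
      size_t new_size = size ? size : 16;  /* assumed initial size */
      char *p;

      while (new_size - used < n)
        new_size *= 2;                      /* assumed growth policy */
      p = realloc (out->buffer, new_size);
      if (p == NULL)
        return -1;
      /* The buffer may have moved: recompute all three pointers.  */
      out->buffer = p;
      out->cursor = p + used;
      out->limit = p + new_size;
    }
  memcpy (out->cursor, bytes, n);
  out->cursor += n;
  assert (out->cursor <= out->limit);
  return 0;
}
</pre></div>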
</dd>
<dt><code>strategy</code></dt>
<dd><a name="index-strategy"></a>
<a name="index-RECODE_005fSTRATEGY_005fUNDECIDED"></a>
<p>This field, which is of type <code>enum recode_sequence_strategy</code>, tells
how various recoding steps (passes) will be interconnected. Its initial
value is <code>RECODE_STRATEGY_UNDECIDED</code>, which is a constant defined in
the header file <samp><recodext.h></samp>. Other possible values are:
</p>
<dl compact="compact">
<dt><code>RECODE_SEQUENCE_IN_MEMORY</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fIN_005fMEMORY"></a>
<p>Keep intermediate recodings in memory.
</p></dd>
<dt><code>RECODE_SEQUENCE_WITH_FILES</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fWITH_005fFILES"></a>
<p>Do not fork, use intermediate files.
</p></dd>
<dt><code>RECODE_SEQUENCE_WITH_PIPE</code></dt>
<dd><a name="index-RECODE_005fSEQUENCE_005fWITH_005fPIPE"></a>
<p>Fork processes connected with <code>pipe(2)</code>.
</p></dd>
</dl>
<p>For now, it is best to leave this field alone and let the recoding
library decide its strategy, as many combinations have not been tested yet.
</p>
</dd>
<dt><code>byte_order_mark</code></dt>
<dd><a name="index-byte_005forder_005fmark"></a>
<p>This field, which is preset to <code>true</code>, indicates that a byte order
mark is to be expected at the beginning of any canonical <code>UCS-2</code>
or <code>UTF-16</code> text, and that such a byte order mark should also be
produced for these charsets.
</p>
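<p>For readers unfamiliar with byte order marks: the character U+FEFF is
serialised as the bytes FE FF in big-endian order and FF FE in little-endian
order, so the first two bytes of the text tell a reader how to interpret the
rest. The function below only illustrates that detection; <code>utf16_bom_order</code>
is a hypothetical helper and does not describe how the library itself handles
this field.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stddef.h>

/* Illustrative helper, not the library's code: look at the first two bytes
   of a UCS-2 or UTF-16 text for the byte order mark U+FEFF.
   Returns 1 for big-endian (FE FF), -1 for little-endian (FF FE),
   and 0 if there is no byte order mark.  */
static int
utf16_bom_order (const unsigned char *text, size_t length)
{
  assert (text != NULL || length == 0);
  if (length < 2)
    return 0;
  if (text[0] == 0xFE && text[1] == 0xFF)
    return 1;
  if (text[0] == 0xFF && text[1] == 0xFE)
    return -1;
  return 0;
}
</pre></div>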
</dd>
<dt><code>fail_level</code></dt>
<dd><a name="index-fail_005flevel"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>),
sets the error level at which task level functions should report a failure.
If a detected error is equal to or greater than <code>fail_level</code>,
the function eventually returns <code>false</code> instead of <code>true</code>.
The preset value for this field is <code>RECODE_NOT_CANONICAL</code>, which means
that, unless it is reset to another value, the library reports failure on
<em>any</em> error.
</p>
</dd>
<dt><code>abort_level</code></dt>
<dd><a name="index-abort_005flevel"></a>
<a name="index-RECODE_005fMAXIMUM_005fERROR"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>), sets
the error level at which task level functions should immediately interrupt
their processing. If a detected error is equal to or greater than
<code>abort_level</code>, the function returns immediately, but the returned
value (<code>true</code> or <code>false</code>) is still decided from the setting
of <code>fail_level</code>, not <code>abort_level</code>. The preset value for this
field is <code>RECODE_MAXIMUM_ERROR</code>, which means that, unless it is reset to
another value, the library never interrupts a recoding task.
</p>
</dd>
<dt><code>error_so_far</code></dt>
<dd><a name="index-error_005fso_005ffar"></a>
<p>This field, which is of type <code>enum recode_error</code> (see <a href="#Errors">Errors</a>),
maintains the maximum error level met so far while the recoding task
was proceeding. The preset value is <code>RECODE_NO_ERROR</code>.
</p></dd>
</dl>
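<p>To make the interplay of <code>fail_level</code>, <code>abort_level</code> and
<code>error_so_far</code> concrete, the following sketch replays a list of detected
errors against the rules and presets documented above. It is a model only:
<code>enum sketch_error</code> merely mirrors the severity order of
<code>enum recode_error</code> given in <a href="#Errors">Errors</a>, and
<code>struct sketch_task</code>, <code>sketch_task_init</code> and <code>sketch_run</code> are
hypothetical. A real program would instead set these fields in its task and
call <code>recode_perform_task</code>.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the severity order of enum recode_error (see Errors).
   Hypothetical names: real programs use the RECODE_* constants.  */
enum sketch_error
  {
    SKETCH_NO_ERROR,
    SKETCH_NOT_CANONICAL,
    SKETCH_AMBIGUOUS_OUTPUT,
    SKETCH_UNTRANSLATABLE,
    SKETCH_INVALID_INPUT,
    SKETCH_SYSTEM_ERROR,
    SKETCH_USER_ERROR,
    SKETCH_INTERNAL_ERROR,
    SKETCH_MAXIMUM_ERROR
  };

/* Only the three error-related task fields, nothing else.  */
struct sketch_task
  {
    enum sketch_error fail_level;
    enum sketch_error abort_level;
    enum sketch_error error_so_far;
  };

/* Apply the presets documented above.  */
static void
sketch_task_init (struct sketch_task *task)
{
  task->fail_level = SKETCH_NOT_CANONICAL;
  task->abort_level = SKETCH_MAXIMUM_ERROR;
  task->error_so_far = SKETCH_NO_ERROR;
}

/* Replay COUNT detected errors in order.  The most severe one is kept in
   error_so_far; processing stops right after an error reaching
   abort_level; the result depends on fail_level only.  If PROCESSED is
   not NULL, it receives how many errors were consumed.  */
static bool
sketch_run (struct sketch_task *task, const enum sketch_error *errors,
            size_t count, size_t *processed)
{
  size_t i = 0;

  assert (errors != NULL || count == 0);
  while (i < count)
    {
      enum sketch_error error = errors[i++];

      if (error > task->error_so_far)
        task->error_so_far = error;
      if (error >= task->abort_level)
        break;  /* interrupt the recoding right away */
    }
  if (processed != NULL)
    *processed = i;
  return task->error_so_far < task->fail_level;
}
</pre></div>
<p>With the presets, any error makes the run fail while nothing interrupts it.
Raising <code>fail_level</code> to the invalid-input level tolerates the three less
severe kinds of error, whereas lowering <code>abort_level</code> stops the run at
the first error that reaches it.
</p>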
</li><li> Task execution
<a name="index-task-execution"></a>
<a name="index-recode_005fperform_005ftask"></a>
<a name="index-recode_005ffilter_005fopen"></a>
<a name="index-recode_005ffilter_005fclose"></a>
<div class="example">
<pre class="example">recode_perform_task (<var>task</var>);
recode_filter_open (<var>task</var>, <var>file</var>);
recode_filter_close (<var>task</var>);
</pre></div>
<p>The function <code>recode_perform_task</code> reads as much input as possible
and recodes all of it onto the prescribed output, given a properly initialised
<var>task</var>.
</p>
<p>The functions <code>recode_filter_open</code> and <code>recode_filter_close</code> are
only planned for now. They are meant to read input piecemeal. Although the
functionality already exists informally in the library, it has not yet been
made available through such interface functions.
</p></li></ul>
<hr>
<a name="Charset-level"></a>
<div class="header">
<p>
Next: <a href="#Errors" accesskey="n" rel="next">Errors</a>, Previous: <a href="#Task-level" accesskey="p" rel="previous">Task level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Charset-level-functions"></a>
<h3 class="section">4.4 Charset level functions</h3>
<a name="index-charset-level-functions"></a>
<a name="index-internal-functions"></a>
<p>Many functions are internal to the recoding library. Some of them
have been made external and available, because the <code>recode</code> program
had to retain all its previous functionality while being transformed
into a mere application of the recoding library. These functions are
not really documented here for the time being, as we hope that many of
them will vanish over time. Once this set of routines stabilises,
it would be convenient to document them as an API for handling charset
names and contents.
</p>
<a name="index-find_005fcharset"></a>
<a name="index-list_005fall_005fcharsets"></a>
<a name="index-list_005fconcise_005fcharset"></a>
<a name="index-list_005ffull_005fcharset"></a>
<div class="example">
<pre class="example">RECODE_CHARSET find_charset (<var>name</var>, <var>cleaning-type</var>);
bool list_all_charsets (<var>charset</var>);
bool list_concise_charset (<var>charset</var>, <var>list-format</var>);
bool list_full_charset (<var>charset</var>);
</pre></div>
<hr>
<a name="Errors"></a>
<div class="header">
<p>
Previous: <a href="#Charset-level" accesskey="p" rel="previous">Charset level</a>, Up: <a href="#Library" accesskey="u" rel="up">Library</a> [<a href="Charset-and-Surface-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Handling-errors"></a>
<h3 class="section">4.5 Handling errors</h3>
<a name="index-error-handling"></a>
<a name="index-handling-errors"></a>
<a name="index-error-messages"></a>
<p>The <code>recode</code> program, while using the <code>recode</code> library, needs to
control whether recoding problems are reported or not, and then reflect
them in its exit status. The program should also tell the library
whether the recoding should be abruptly interrupted when an error is
met (thus sparing processing when it is known in advance that a wrong
result would be discarded anyway), or whether it should proceed nevertheless.
Here is how the library groups errors into levels, in order
of increasing severity.
</p>
<dl compact="compact">
<dt><code>RECODE_NO_ERROR</code></dt>
<dd><a name="index-RECODE_005fNO_005fERROR"></a>
<p>No error was met on previous library calls.
</p>
</dd>
<dt><code>RECODE_NOT_CANONICAL</code></dt>
<dd><a name="index-RECODE_005fNOT_005fCANONICAL"></a>
<a name="index-non-canonical-input_002c-error-message"></a>
<p>The input text was using one of the many alternative codings for some
phenomenon, but not the one <code>recode</code> would have canonically generated.
So, if the reverse recoding is later attempted, it would produce a text
having the same <em>meaning</em> as the original text, yet not byte-identical
to it.
</p>
<p>For example, a <code>Base64</code> block in which end-of-lines appear elsewhere
than at every 76 characters is not canonical. An e-circumflex in TeX
which is coded as ‘<samp>\^{e}</samp>’ instead of ‘<samp>\^e</samp>’ is not canonical.
</p>
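<p>The <code>Base64</code> rule above is easy to check mechanically. The hypothetical
<code>base64_lines_canonical</code> below, which is not the library's code, tests
line lengths only: it assumes LF end-of-lines, accepts a single final
end-of-line, and does not validate the <code>Base64</code> alphabet or padding.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BASE64_LINE 76  /* canonical number of characters per line */

/* Hypothetical check, not the library's code: true when every line of
   TEXT but the last holds exactly 76 characters and the last holds 1 to
   76.  One final end-of-line is accepted.  */
static bool
base64_lines_canonical (const char *text)
{
  size_t length;
  const char *p = text;
  const char *end;

  assert (text != NULL);
  length = strlen (text);
  if (length > 0 && text[length - 1] == '\n')
    length--;                 /* ignore a single final end-of-line */
  if (length == 0)
    return true;              /* an empty block has no misplaced lines */
  end = text + length;
  for (;;)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));

      if (nl == NULL)         /* last line: 1 to 76 characters */
        return end > p && end - p <= BASE64_LINE;
      if (nl - p != BASE64_LINE)
        return false;         /* end-of-line elsewhere than after 76 */
      p = nl + 1;
    }
}
</pre></div>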
</dd>
<dt><code>RECODE_AMBIGUOUS_OUTPUT</code></dt>
<dd><a name="index-RECODE_005fAMBIGUOUS_005fOUTPUT"></a>
<a name="index-ambiguous-output_002c-error-message"></a>
<p>It has been discovered that if the reverse recoding were attempted on
the text output by this recoding, the original text would not be obtained,
only because an ambiguity was accidentally generated in the output text.
This ambiguity would then cause the wrong interpretation to be taken.
</p>
<p>Here are a few examples. If the <code>Latin-1</code> sequence ‘<samp>e^</samp>’
is converted to Easy French and back, the result is interpreted
as an e-circumflex, and so does not reflect the intent of the original two
characters. Recoding an <code>IBM-PC</code> text to <code>Latin-1</code> and back,
where the input text contained an isolated <kbd>LF</kbd>, inserts a spurious
<kbd>CR</kbd> before that <kbd>LF</kbd>.
</p>
<p>Currently, there are many cases in the library where the production of
ambiguous output is not properly detected, as this detection is sometimes
difficult to accomplish, or to accomplish quickly.
</p>
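<p>The <code>IBM-PC</code> example above comes down to end-of-line conversion, and
the following sketch reproduces it. It assumes, as that example implies, that
<code>IBM-PC</code> text ends lines with <kbd>CR</kbd> <kbd>LF</kbd> while <code>Latin-1</code>
text uses a bare <kbd>LF</kbd>, and it ignores the character-set part of the
recoding; <code>crlf_to_lf</code> and <code>lf_to_crlf</code> are hypothetical helpers.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the end-of-line part of the example above, not
   the library's code.  OUT must hold N bytes for crlf_to_lf and 2 * N
   bytes for lf_to_crlf.  Both return the number of bytes written.  */
static size_t
crlf_to_lf (const char *in, size_t n, char *out)
{
  size_t i, o = 0;

  for (i = 0; i < n; i++)
    {
      if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
        continue;          /* drop the CR of a CR LF pair */
      out[o++] = in[i];    /* an isolated LF is copied unchanged */
    }
  return o;
}

static size_t
lf_to_crlf (const char *in, size_t n, char *out)
{
  size_t i, o = 0;

  for (i = 0; i < n; i++)
    {
      if (in[i] == '\n')
        out[o++] = '\r';   /* every LF now gets a CR in front */
      out[o++] = in[i];
    }
  return o;
}
</pre></div>
<p>Running <code>"line\r\nisolated\nend"</code> through both functions yields
<code>"line\r\nisolated\r\nend"</code>: the isolated <kbd>LF</kbd> comes back with a
spurious <kbd>CR</kbd>, which a reverse recoding cannot tell apart from a
genuine line end.
</p>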
</dd>
<dt><code>RECODE_UNTRANSLATABLE</code></dt>
<dd><a name="index-RECODE_005fUNTRANSLATABLE"></a>
<a name="index-untranslatable-input_002c-error-message"></a>
<p>One or more input characters could not be recoded, because there is simply
no representation for them in the output charset.
</p>
<p>Here are a few examples. Non-strict mode often allows <code>recode</code> to
compute on-the-fly mappings for unrepresentable characters, but strict
mode prohibits such reversible translations, so strict mode may often
trigger this error. Most <code>UCS-2</code> codes used to
represent Asian characters cannot be expressed in various Latin charsets.
</p>
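<p>The <code>UCS-2</code> example can be checked with simple arithmetic:
<code>Latin-1</code> assigns its 256 byte values to exactly the code points U+0000
to U+00FF, so any <code>UCS-2</code> code above 0xFF, such as U+4E2D (a CJK
ideograph) or U+3042 (Hiragana letter A), has no <code>Latin-1</code>
representation. The hypothetical <code>ucs2_to_latin1</code> below simply skips and
counts such codes; it does not model the on-the-fly mappings of non-strict mode.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only, not the library's code: copy the UCS-2 codes that
   Latin-1 can represent (0x00 to 0xFF) to OUT as single bytes, skip the
   others, and return how many codes were untranslatable.  *WRITTEN
   receives the number of bytes stored in OUT (at most N).  */
static size_t
ucs2_to_latin1 (const uint16_t *in, size_t n, unsigned char *out,
                size_t *written)
{
  size_t i, o = 0, untranslatable = 0;

  assert (written != NULL);
  for (i = 0; i < n; i++)
    {
      if (in[i] > 0xFF)
        untranslatable++;      /* no Latin-1 byte for this code */
      else
        out[o++] = (unsigned char) in[i];
    }
  *written = o;
  return untranslatable;
}
</pre></div>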
</dd>
<dt><code>RECODE_INVALID_INPUT</code></dt>
<dd><a name="index-RECODE_005fINVALID_005fINPUT"></a>
<a name="index-invalid-input_002c-error-message"></a>
<p>The input text does not comply with the coding it is declared to hold. So,
no reverse recoding could ever reproduce this text,
because <code>recode</code> should never produce invalid output.
</p>
<p>Here are a few examples. In strict mode, <code>ASCII</code> text is not allowed
to contain characters with the eighth bit set. <code>UTF-8</code> encodings
ought to be minimal<a name="DOCF7" href="#FOOT7"><sup>7</sup></a>.
</p>
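<p>Both examples can be expressed as short checks. In the sketch below,
<code>ascii_strict_ok</code> rejects any byte with the eighth bit set, and
<code>utf8_has_overlong</code> reports non-minimal <code>UTF-8</code> sequences, such as
C0 80 used for U+0000 instead of the single byte 00. Both names are
hypothetical and not part of the library, which, as the footnote explains,
does not currently check minimality on input.
</p>
<div class="example">
<pre class="example">#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Strict ASCII: reject any byte with its eighth bit set.  */
static bool
ascii_strict_ok (const unsigned char *s, size_t n)
{
  size_t i;

  assert (s != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (s[i] & 0x80)
      return false;
  return true;
}

/* True if S holds a non-minimal (overlong) UTF-8 sequence.  Only
   minimality is checked; other malformations are not reported.  */
static bool
utf8_has_overlong (const unsigned char *s, size_t n)
{
  size_t i;

  assert (s != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      unsigned char next = i + 1 < n ? s[i + 1] : 0;

      if (s[i] == 0xC0 || s[i] == 0xC1)
        return true;  /* two bytes for U+0000..U+007F */
      if (s[i] == 0xE0 && next >= 0x80 && next < 0xA0)
        return true;  /* three bytes for a code point below U+0800 */
      if (s[i] == 0xF0 && next >= 0x80 && next < 0x90)
        return true;  /* four bytes for a code point below U+10000 */
    }
  return false;
}
</pre></div>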
</dd>
<dt><code>RECODE_SYSTEM_ERROR</code></dt>
<dd><a name="index-RECODE_005fSYSTEM_005fERROR"></a>
<a name="index-system-detected-problem_002c-error-message"></a>
<p>The underlying system reported an error while the recoding was going on,
likely an input/output error.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_USER_ERROR</code></dt>
<dd><a name="index-RECODE_005fUSER_005fERROR"></a>
<a name="index-misuse-of-recoding-library_002c-error-message"></a>
<p>The programmer or user requested something the recoding library is unable
to provide, or used the API wrongly.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_INTERNAL_ERROR</code></dt>
<dd><a name="index-RECODE_005fINTERNAL_005fERROR"></a>
<a name="index-internal-recoding-bug_002c-error-message"></a>
<p>Something really wrong, which should normally never happen, was detected
within the recoding library. This might be due to genuine bugs in the
library, or maybe due to un-initialised or overwritten arguments to
the API.
(This error symbol is currently unused in the library.)
</p>
</dd>
<dt><code>RECODE_MAXIMUM_ERROR</code></dt>
<dd><a name="index-RECODE_005fMAXIMUM_005fERROR-1"></a>
<p>This error code should never be returned; it is only used internally as
a sentinel for the list of all possible error codes.
</p></dd>
</dl>
<a name="index-error-level-threshold"></a>
<a name="index-threshold-for-error-reporting"></a>
<p>One should be able to set the error level threshold for returning failure
at the end of a recoding, and also the threshold for immediate interruption.
If many errors occur while the recoding proceeds, none of them severe
enough to interrupt it, then the most severe error is retained,
while the others are forgotten<a name="DOCF8" href="#FOOT8"><sup>8</sup></a>. So, in case of an error,
the possible actions currently are:
</p>
<ul>
<li> do nothing and let go, returning success at end of recoding,
</li><li> just let go for now, but return failure at end of recoding,
</li><li> interrupt recoding right away and return failure now.
</li></ul>
<p>See <a href="#Task-level">Task level</a>, and particularly the description of the fields
<code>fail_level</code>, <code>abort_level</code> and <code>error_so_far</code>, for more
information about how errors are handled.
</p>
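<p>The three possible actions listed above map onto the two thresholds as
sketched below. The names are hypothetical, with <code>enum sketch_error</code>
merely mirroring the severity order of the levels listed in this section.
The sketch assumes that <code>fail_level</code> does not exceed <code>abort_level</code>;
otherwise an interrupted task could still report success, since its result
is decided from <code>fail_level</code> alone.
</p>
<div class="example">
<pre class="example">#include <assert.h>

/* Mirrors the severity order of the levels listed in this section
   (hypothetical names; real programs use the RECODE_* constants).  */
enum sketch_error
  {
    SKETCH_NO_ERROR, SKETCH_NOT_CANONICAL, SKETCH_AMBIGUOUS_OUTPUT,
    SKETCH_UNTRANSLATABLE, SKETCH_INVALID_INPUT, SKETCH_SYSTEM_ERROR,
    SKETCH_USER_ERROR, SKETCH_INTERNAL_ERROR, SKETCH_MAXIMUM_ERROR
  };

enum sketch_action
  {
    ACTION_LET_GO,      /* do nothing, return success at end of recoding */
    ACTION_FAIL_LATER,  /* let go for now, return failure at the end */
    ACTION_ABORT_NOW    /* interrupt right away and return failure now */
  };

/* Which action an error of severity LEVEL triggers under the thresholds
   FAIL_LEVEL and ABORT_LEVEL.  */
static enum sketch_action
sketch_action_for (enum sketch_error level, enum sketch_error fail_level,
                   enum sketch_error abort_level)
{
  assert (fail_level <= abort_level);  /* assumption stated above */
  if (level >= abort_level)
    return ACTION_ABORT_NOW;
  if (level >= fail_level)
    return ACTION_FAIL_LATER;
  return ACTION_LET_GO;
}
</pre></div>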
<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>
<h3><a name="FOOT7" href="#DOCF7">(7)</a></h3>
<p>The minimality of a <code>UTF-8</code> encoding
is guaranteed on output, but currently, it is not checked on input.</p>
<h3><a name="FOOT8" href="#DOCF8">(8)</a></h3>
<p>Another approach would have been
to define the level symbols as masks instead, to give masks to
threshold setting routines, and to retain all errors; yet I never
met such a need myself in practice, so I fear it would be overkill.
On the other hand, it might be interesting to maintain counters of
how many times each kind of error occurred.</p>
</div>
</body>
</html>