/usr/share/doc/chicken-bin/manual-html/Unit srfi-14.html is in chicken-bin 4.12.0-0.3.

This file is owned by root:root, with mode 0o644.

<!doctype html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rel="stylesheet" href="manual.css" type="text/css" />
<title>Chicken &raquo; Unit srfi-14</title>
<meta name="viewport" content="initial-scale=1" /></head>
<body>
<div id="body">
<div id="main"><h2 id="sec:Unit_srfi-14"><a href="#sec:Unit_srfi-14">Unit srfi-14</a></h2><p>Character set library.  An abbreviated version of the SRFI is provided in this document.  Full documentation is available in the <a href="http://srfi.schemers.org/srfi-14/srfi-14.html">original SRFI-14 document</a>.</p><p>On systems that support dynamic loading, the <tt>srfi-14</tt> unit can be made available in the interpreter (<tt>csi</tt>) by entering</p>
<pre class="highlight colorize"><span class="paren1">(<span class="default">require-extension srfi-14</span>)</span></pre><p>This library provides only the Latin-1 character set.  To get Unicode semantics, see the <a href="http://wiki.call-cc.org/egg/utf8">utf8</a> egg.  However, information on Unicode character sets is still provided in this document.</p><h2 id="sec:Specification"><a href="#sec:Specification">Specification</a></h2><p>In the following procedure specifications:</p><ul><li>A CS parameter is a character set. </li>
<li>An S parameter is a string. </li>
<li>A CHAR parameter is a character. </li>
<li>A CHAR-LIST parameter is a list of characters. </li>
<li>A PRED parameter is a unary character predicate procedure, returning a true/false value when applied to a character. </li>
<li>An OBJ parameter may be any value at all. </li>
</ul>
<p>It is an error to pass a value that does not satisfy the type of the corresponding parameter.</p><p>Unless otherwise noted in the specification of a procedure, procedures always return character sets that are distinct (from the point of view of the linear-update operations) from the parameter character sets. For example, <tt>char-set-adjoin</tt> is guaranteed to provide a fresh character set, even if it is not given any character parameters.</p><p>Parameters given in square brackets are optional. Unless otherwise noted in the text describing the procedure, any prefix of these optional parameters may be supplied, from zero arguments to the full list. When a procedure returns multiple values, the return values are likewise listed in square brackets. So, for example, the procedure with signature</p><pre>halts? F [X INIT-STORE] -&gt; [BOOLEAN INTEGER]</pre><p>would take one (F), two (F, X) or three (F, X, INIT-STORE) input parameters, and return two values, a boolean and an integer.</p><p>A parameter followed by &quot;<tt>...</tt>&quot; means zero or more elements. So the procedure with the signature</p><pre>sum-squares X ...  -&gt; NUMBER</pre><p>takes zero or more arguments (X ...), while the procedure with signature</p><pre>spell-check DOC DICT_1 DICT_2 ... -&gt; STRING-LIST</pre><p>takes two required parameters (DOC and DICT_1) and zero or more optional parameters (DICT_2 ...).</p><h3 id="sec:General_procedures"><a href="#sec:General_procedures">General procedures</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set.3f"><span class="sig"><tt>(char-set? obj) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Is the object OBJ a character set?</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set.3d"><span class="sig"><tt>(char-set= cs_1 ...) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Are the character sets equal?</p><p>Boundary cases:</p><pre>(char-set=) =&gt; TRUE
(char-set= cs) =&gt; TRUE</pre><p>Rationale: transitive binary relations are generally extended to n-ary relations in Scheme, which enables clearer, more concise code to be written. While the zero-argument and one-argument cases will almost certainly not arise in first-order uses of such relations, they may well arise in higher-order cases or macro-generated code. <i>E.g.,</i> consider</p><pre>(apply char-set= cset-list)</pre><p>This is well-defined if the list is empty or a singleton list. Hence we extend these relations to any number of arguments. Implementors have reported actual uses of n-ary relations in higher-order cases allowing for fewer than two arguments. The way of Scheme is to handle the general case; we provide the fully general extension.</p><p>A counter-argument to this extension is that R5RS's transitive binary arithmetic relations (<tt>=</tt>, <tt>&lt;</tt>, <i>etc.</i>) require at least two arguments, hence this decision is a break with the prior convention -- although it is at least one that is backwards-compatible.</p></dd>
</dl>
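<p>The boundary cases can be checked directly. This is a minimal sketch assuming CHICKEN 4 with the <tt>srfi-14</tt> unit loaded; the name <tt>cset-list</tt> is a hypothetical list built only for illustration.</p>
<pre>(require-extension srfi-14)

;; Three sets built in different ways but holding the same characters.
(define cset-list
  (list (string-&gt;char-set "abc")
        (char-set #\c #\b #\a)
        (list-&gt;char-set '(#\a #\b #\c))))

(apply char-set= cset-list)                 ; =&gt; #t
(apply char-set= '())                       ; =&gt; #t (zero-argument case)
(char-set= (car cset-list))                 ; =&gt; #t (one-argument case)
(char-set= char-set:digit char-set:letter)  ; =&gt; #f</pre>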
<dl class="defsig"><dt class="defsig" id="def:char-set.3c.3d"><span class="sig"><tt>(char-set&lt;= cs_1 ...) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Returns true if every character set CS_I is a subset of character set CS_I+1.</p><p>Boundary cases:</p><pre>(char-set&lt;=) =&gt; TRUE
(char-set&lt;= cs) =&gt; TRUE</pre><p>Rationale: See <tt>char-set=</tt> for discussion of zero- and one-argument applications. Consider testing a list of char-sets for monotonicity with</p><pre>(apply char-set&lt;= cset-list)</pre></dd>
</dl>
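<p>As a sketch of the monotonicity test mentioned above (assuming CHICKEN 4 with <tt>srfi-14</tt> loaded), the predefined sets give a convenient chain: every digit is a hexadecimal digit, and every hexadecimal digit is a letter or digit. The name <tt>chain</tt> is illustrative only.</p>
<pre>(require-extension srfi-14)

(define chain (list char-set:digit char-set:hex-digit char-set:letter+digit))

(apply char-set&lt;= chain)                     ; =&gt; #t
(char-set&lt;= char-set:letter char-set:digit)  ; =&gt; #f
(char-set&lt;=)                                 ; =&gt; #t</pre>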
<dl class="defsig"><dt class="defsig" id="def:char-set-hash"><span class="sig"><tt>(char-set-hash cs [bound]) -&gt; integer</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Compute a hash value for the character set CS. BOUND is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,BOUND).</p><p>If BOUND is either zero or not given, the implementation may use an implementation-specific default value, chosen to be as large as is efficiently practical. For instance, the default range might be chosen for a given implementation to map all character sets into the range of integers that can be represented with a single machine word.</p><p>Invariant:</p><pre>(char-set= cs_1 cs_2) =&gt; (= (char-set-hash cs_1 b) (char-set-hash cs_2 b))</pre><p>A legal but nonetheless discouraged implementation:</p><pre>(define (char-set-hash cs . maybe-bound) 1)</pre><p>Rationale: allowing the user to specify an explicit bound simplifies user code by removing the mod operation that typically accompanies every hash computation, and also may allow the implementation of the hash function to exploit a reduced range to efficiently compute the hash value. <i>E.g.</i>, for small bounds, the hash function may be computed in a fashion such that intermediate values never overflow into bignum integers, allowing the implementor to provide a fixnum-specific &quot;fast path&quot; for computing the common cases very rapidly.</p></dd>
</dl>
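<p>A minimal sketch of the invariant and of the BOUND argument, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded. The actual hash values are implementation-specific, so only their relationship and range are checked here.</p>
<pre>(require-extension srfi-14)

(define a (string-&gt;char-set "hello"))
(define b (char-set #\o #\l #\e #\h))   ; same members as A

;; Equal sets hash equally for the same BOUND.
(= (char-set-hash a 100) (char-set-hash b 100))   ; =&gt; #t

;; A positive BOUND restricts the result to [0, 100).
(let ((h (char-set-hash a 100)))
  (and (&gt;= h 0) (&lt; h 100)))                      ; =&gt; #t</pre>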
<h3 id="sec:Iterating_over_character_sets"><a href="#sec:Iterating_over_character_sets">Iterating over character sets</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set-cursor"><span class="sig"><tt>(char-set-cursor cset) -&gt; cursor</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-ref"><span class="sig"><tt>(char-set-ref cset cursor) -&gt; char</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-cursor-next"><span class="sig"><tt>(char-set-cursor-next cset cursor) -&gt; cursor</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:end-of-char-set.3f"><span class="sig"><tt>(end-of-char-set? cursor) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Cursors are a low-level facility for iterating over the characters in a set. A cursor is a value that indexes a character in a char set. <tt>char-set-cursor</tt> produces a new cursor for a given char set. The set element indexed by the cursor is fetched with <tt>char-set-ref</tt>. A cursor index is incremented with <tt>char-set-cursor-next</tt>; in this way, code can step through every character in a char set. Stepping a cursor &quot;past the end&quot; of a char set produces a cursor that answers true to <tt>end-of-char-set?</tt>. It is an error to pass such a cursor to <tt>char-set-ref</tt> or to <tt>char-set-cursor-next</tt>.</p><p>A cursor value may not be used in conjunction with a different character set; if it is passed to <tt>char-set-ref</tt> or <tt>char-set-cursor-next</tt> with a character set other than the one used to create it, the results and effects are undefined.</p><p>Cursor values are <i>not</i> necessarily distinct from other types. They may be integers, linked lists, records, procedures or other values. This license is granted to allow cursors to be very &quot;lightweight&quot; values suitable for tight iteration, even in fairly simple implementations.</p><p>Note that these primitives are necessary to export an iteration facility for char sets to loop macros.</p><p>Example:</p><pre>(define cs (char-set #\G #\a #\T #\e #\c #\h))
 
;; Collect elts of CS into a list.
(let lp ((cur (char-set-cursor cs)) (ans '()))
  (if (end-of-char-set? cur) ans
      (lp (char-set-cursor-next cs cur)
          (cons (char-set-ref cs cur) ans))))
  =&gt; (#\G #\T #\a #\c #\e #\h)
 
;; Equivalently, using UNFOLD-RIGHT (SRFI 1) and CUT (SRFI 26):
(unfold-right end-of-char-set?
              (cut char-set-ref cs &lt;&gt;)
              (cut char-set-cursor-next cs &lt;&gt;)
              (char-set-cursor cs))
  =&gt; (#\G #\T #\a #\c #\e #\h)</pre><p>Rationale: Note that the cursor API's four functions &quot;fit&quot; the functional protocol used by the unfolders provided by the list, string and char-set SRFIs (see the example above). By way of contrast, here is a simpler, two-function API that was rejected for failing this criterion. Besides <tt>char-set-cursor</tt>, it provided a single function that mapped a cursor and a character set to two values, the indexed character and the next cursor. If the cursor had exhausted the character set, then this function returned false instead of the character value, and another end-of-char-set cursor. In this way, the other three functions of the current API were combined together.</p></dd>
</dl>
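<p>Because a cursor loop controls its own termination, it can stop early. The following sketch, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded, defines a hypothetical helper <tt>char-set-find</tt> (not part of SRFI 14) that returns some member satisfying a predicate. The same effect is available through <tt>char-set-any</tt>, described below; the point here is the cursor protocol.</p>
<pre>(require-extension srfi-14)

;; Hypothetical helper, not part of SRFI 14: return some member of CS
;; that satisfies PRED, or #f if there is none.  The loop stops at the
;; first match instead of visiting every character.
(define (char-set-find pred cs)
  (let lp ((cur (char-set-cursor cs)))
    (cond ((end-of-char-set? cur) #f)
          ((pred (char-set-ref cs cur)) (char-set-ref cs cur))
          (else (lp (char-set-cursor-next cs cur))))))

(char-set-find char-upper-case? (string-&gt;char-set "abcDef"))  ; =&gt; #\D
(char-set-find char-numeric? (string-&gt;char-set "abc"))        ; =&gt; #f</pre>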
<dl class="defsig"><dt class="defsig" id="def:char-set-fold"><span class="sig"><tt>(char-set-fold kons knil cs) -&gt; object</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>This is the fundamental iterator for character sets. Applies the function KONS across the character set CS using initial state value KNIL. That is, if CS is the empty set, the procedure returns KNIL. Otherwise, some element C of CS is chosen; let CS' be the remaining, unchosen characters. The procedure returns</p><pre>(char-set-fold KONS (KONS C KNIL) CS')</pre><p>Examples:</p><pre>;; CHAR-SET-MEMBERS
(lambda (cs) (char-set-fold cons '() cs))
 
;; CHAR-SET-SIZE
(lambda (cs) (char-set-fold (lambda (c i) (+ i 1)) 0 cs))
 
;; How many vowels in the char set?
(lambda (cs) 
  (char-set-fold (lambda (c i) (if (vowel? c) (+ i 1) i))
                 0 cs))</pre></dd>
</dl>
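<p>The examples above assume some <tt>vowel?</tt> predicate. A runnable version follows, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; <tt>vowel?</tt>, <tt>fold-size</tt> and <tt>count-vowels</tt> are hypothetical helpers defined only for this sketch.</p>
<pre>(require-extension srfi-14)

;; Hypothetical ASCII-only vowel test.
(define (vowel? c)
  (memv (char-downcase c) '(#\a #\e #\i #\o #\u)))

;; Number of members, as in the CHAR-SET-SIZE example.
(define (fold-size cs)
  (char-set-fold (lambda (c i) (+ i 1)) 0 cs))

;; Number of vowels among the members.
(define (count-vowels cs)
  (char-set-fold (lambda (c i) (if (vowel? c) (+ i 1) i)) 0 cs))

(define cs (string-&gt;char-set "Education"))
(fold-size cs)     ; =&gt; 9, the same as (char-set-size cs)
(count-vowels cs)  ; =&gt; 5</pre>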
<dl class="defsig"><dt class="defsig" id="def:char-set-unfold"><span class="sig"><tt>(char-set-unfold p f g seed [base-cs]) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-unfold.21"><span class="sig"><tt>(char-set-unfold! p f g seed base-cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>This is a fundamental constructor for char-sets.</p><ul><li>G is used to generate a series of &quot;seed&quot; values from the initial seed: SEED, (G SEED), (G^2 SEED), (G^3 SEED), ... </li>
<li>P tells us when to stop: iteration ends as soon as P returns true for one of these seed values, and that final seed is not passed to F. </li>
<li>F maps each seed value to a character. These characters are added to the base character set BASE-CS to form the result; BASE-CS defaults to the empty set. <tt>char-set-unfold!</tt> adds the characters to BASE-CS in linear-update fashion: it is allowed, but not required, to side-effect BASE-CS and reuse its storage to construct the result. </li>
</ul>
<p>More precisely, the following definitions hold, ignoring the optional-argument issues:</p><pre>(define (char-set-unfold p f g seed base-cs) 
  (char-set-unfold! p f g seed (char-set-copy base-cs)))
 
(define (char-set-unfold! p f g seed base-cs)
  (let lp ((seed seed) (cs base-cs))
        (if (p seed) cs                                 ; P says we are done.
            (lp (g seed)                                ; Loop on (G SEED).
                (char-set-adjoin! cs (f seed))))))      ; Add (F SEED) to set.</pre><p>(Note that the actual implementation may be more efficient.)</p><p>Examples:</p><pre>(port-&gt;char-set p) = (char-set-unfold eof-object? values
                                      (lambda (x) (read-char p))
                                      (read-char p))
 
(list-&gt;char-set lis) = (char-set-unfold null? car cdr lis)</pre></dd>
</dl>
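<p>A minimal sketch that builds the lower-case ASCII letters from their codes, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded and the P F G argument order used in the definitions and examples above. The name <tt>lower-ascii</tt> is illustrative only.</p>
<pre>(require-extension srfi-14)

;; P stops once the code passes 122 (#\z), F turns a code into a
;; character, and G steps to the next code, starting from 97 (#\a).
(define lower-ascii
  (char-set-unfold (lambda (i) (&gt; i 122))
                   integer-&gt;char
                   (lambda (i) (+ i 1))
                   97))

(char-set= lower-ascii (string-&gt;char-set "abcdefghijklmnopqrstuvwxyz"))  ; =&gt; #t</pre>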
<dl class="defsig"><dt class="defsig" id="def:char-set-for-each"><span class="sig"><tt>(char-set-for-each proc cs) -&gt; unspecified</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Apply procedure PROC to each character in the character set CS. Note that the order in which PROC is applied to the characters in the set is not specified, and may even change from one procedure application to another.</p><p>Nothing at all is specified about the value returned by this procedure; it is not even required to be consistent from call to call. It is simply required to be a value (or values) that may be passed to a command continuation, <i>e.g.</i> as the value of an expression appearing as a non-terminal subform of a <tt>begin</tt> expression. Note that in R5RS, this restricts the procedure to returning a single value; non-R5RS systems may not even provide this restriction.</p></dd>
</dl>
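<p>Since the visiting order is unspecified, code that collects members with <tt>char-set-for-each</tt> should not rely on their order. A sketch, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded and <tt>sort</tt> taken from the <tt>data-structures</tt> unit; the variable <tt>seen</tt> is illustrative only.</p>
<pre>(require-extension srfi-14)
(require-extension data-structures)   ; for SORT

;; Collect the members by side effect, then sort them before use.
(define seen '())
(char-set-for-each (lambda (c) (set! seen (cons c seen)))
                   (string-&gt;char-set "cab"))
(sort seen char&lt;?)   ; =&gt; (#\a #\b #\c)</pre>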
<dl class="defsig"><dt class="defsig" id="def:char-set-map"><span class="sig"><tt>(char-set-map proc cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>PROC is a char-&gt;char procedure. Apply it to all the characters in the char-set CS, and collect the results into a new character set.</p><p>Essentially lifts PROC from a char-&gt;char procedure to a char-set -&gt; char-set procedure.</p><p>Example:</p><pre>(char-set-map char-downcase cset)</pre></dd>
</dl>
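<p>Because PROC need not be injective, the mapped set can be smaller than the original. A sketch assuming CHICKEN 4 with <tt>srfi-14</tt> loaded:</p>
<pre>(require-extension srfi-14)

;; #\A and #\a both map to #\a, so four members become two.
(define cs (string-&gt;char-set "AaBb"))
(char-set-size cs)                                ; =&gt; 4
(char-set-size (char-set-map char-downcase cs))   ; =&gt; 2
(char-set= (char-set-map char-downcase cs)
           (string-&gt;char-set "ab"))               ; =&gt; #t</pre>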
<h3 id="sec:Creating_character_sets"><a href="#sec:Creating_character_sets">Creating character sets</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set-copy"><span class="sig"><tt>(char-set-copy cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Returns a copy of the character set CS. &quot;Copy&quot; means that if either the input parameter or the result value of this procedure is passed to one of the linear-update procedures described below, the other character set is guaranteed not to be altered.</p><p>A system that provides pure-functional implementations of the linear-operator suite could implement this procedure as the identity function -- so copies are <i>not</i> guaranteed to be distinct by <tt>eq?</tt>.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set"><span class="sig"><tt>(char-set char_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Return a character set containing the given characters.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:list-.3echar-set"><span class="sig"><tt>(list-&gt;char-set char-list [base-cs]) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:list-.3echar-set.21"><span class="sig"><tt>(list-&gt;char-set! char-list base-cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Return a character set containing the characters in the list of characters CHAR-LIST.</p><p>If character set BASE-CS is provided, the characters from CHAR-LIST are added to it. <tt>list-&gt;char-set!</tt> is allowed, but not required, to side-effect and reuse the storage in BASE-CS; <tt>list-&gt;char-set</tt> produces a fresh character set.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:string-.3echar-set"><span class="sig"><tt>(string-&gt;char-set s [base-cs]) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:string-.3echar-set.21"><span class="sig"><tt>(string-&gt;char-set! s base-cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Return a character set containing the characters in the string S.</p><p>If character set BASE-CS is provided, the characters from S are added to it. <tt>string-&gt;char-set!</tt> is allowed, but not required, to side-effect and reuse the storage in BASE-CS; <tt>string-&gt;char-set</tt> produces a fresh character set.</p></dd>
</dl>
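<p>The difference between the pure and linear-update variants shows up in how BASE-CS is treated. A sketch, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; <tt>base</tt>, <tt>more</tt> and <tt>merged</tt> are illustrative names.</p>
<pre>(require-extension srfi-14)

(define base (string-&gt;char-set "abc"))

;; The pure variant returns a fresh set; BASE is unchanged.
(define more (string-&gt;char-set "cd" base))
(char-set= more (string-&gt;char-set "abcd"))    ; =&gt; #t
(char-set= base (string-&gt;char-set "abc"))     ; =&gt; #t

;; The linear-update variant may reuse the storage of its BASE-CS
;; argument, so hand it a copy and use only the returned set.
(define merged (list-&gt;char-set! '(#\x #\y) (char-set-copy base)))
(char-set= merged (string-&gt;char-set "abcxy")) ; =&gt; #t</pre>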
<dl class="defsig"><dt class="defsig" id="def:char-set-filter"><span class="sig"><tt>(char-set-filter pred cs [base-cs]) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-filter.21"><span class="sig"><tt>(char-set-filter! pred cs base-cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Returns a character set containing every character C in CS such that <tt>(PRED C)</tt> returns true.</p><p>If character set BASE-CS is provided, the characters of CS that satisfy PRED are added to it. <tt>char-set-filter!</tt> is allowed, but not required, to side-effect and reuse the storage in BASE-CS; <tt>char-set-filter</tt> produces a fresh character set.</p><p>An implementation may not save away a reference to PRED and invoke it after <tt>char-set-filter</tt> or <tt>char-set-filter!</tt> returns; that is, &quot;lazy,&quot; on-demand implementations are not allowed, as PRED may have external dependencies on mutable data or have other side-effects.</p><p>Rationale: This procedure provides a means of converting a character predicate into its equivalent character set; the CS parameter allows the programmer to bound the predicate's domain. Programmers should be aware that filtering a character set such as <tt>char-set:full</tt> could be a very expensive operation in an implementation that provides an extremely large character type, such as 32-bit Unicode. An earlier draft of this library provided a simple <tt>predicate-&gt;char-set</tt> procedure, which was rejected in favor of <tt>char-set-filter</tt> for this reason.</p></dd>
</dl>
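<p>A sketch of converting a predicate into a set with a bounded domain, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; the name <tt>ascii-digits</tt> is illustrative only.</p>
<pre>(require-extension srfi-14)

;; Convert CHAR-NUMERIC? into a set, bounding its domain to ASCII.
(define ascii-digits (char-set-filter char-numeric? char-set:ascii))
(char-set= ascii-digits (string-&gt;char-set "0123456789"))   ; =&gt; #t

;; With BASE-CS, the matching characters are added to it (here #\x).
(char-set= (char-set-filter char-upper-case?
                            (string-&gt;char-set "aBcD")
                            (char-set #\x))
           (string-&gt;char-set "BDx"))                        ; =&gt; #t</pre>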
<dl class="defsig"><dt class="defsig" id="def:ucs-range-.3echar-set"><span class="sig"><tt>(ucs-range-&gt;char-set lower upper [error? base-cs]) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:ucs-range-.3echar-set.21"><span class="sig"><tt>(ucs-range-&gt;char-set! lower upper error? base-cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>LOWER and UPPER are exact non-negative integers; LOWER &lt;= UPPER.</p><p>Returns a character set containing every character whose ISO/IEC 10646 UCS-4 code lies in the half-open range [LOWER,UPPER).</p><ul><li>If the requested range includes unassigned UCS values, these are silently ignored (the current UCS specification has &quot;holes&quot; in the space of assigned codes). </li>
<li>If the requested range includes &quot;private&quot; or &quot;user space&quot; codes, these are handled in an implementation-specific manner; however, a UCS- or Unicode-based Scheme implementation should pass them through transparently. </li>
<li>If any code from the requested range specifies a valid, assigned UCS character that has no corresponding representative in the implementation's character type, then (1) an error is raised if ERROR? is true, and (2) the code is ignored if ERROR? is false (the default). This might happen, for example, if the implementation uses ASCII characters, and the requested range includes non-ASCII characters. </li>
</ul>
<p>If character set BASE-CS is provided, the characters specified by the range are added to it. <tt>ucs-range-&gt;char-set!</tt> is allowed, but not required, to side-effect and reuse the storage in BASE-CS; <tt>ucs-range-&gt;char-set</tt> produces a fresh character set.</p><p>Note that ASCII codes are a subset of the Latin-1 codes, which are in turn a subset of the 16-bit Unicode codes, which are themselves a subset of the 32-bit UCS-4 codes. We commit to a specific encoding in this routine, regardless of the underlying representation of characters, so that client code using this library will be portable. <i>I.e.</i>, a conformant Scheme implementation may use EBCDIC or SHIFT-JIS to encode characters; it must simply map the UCS characters from the given range into the native representation when possible, and report errors when not possible.</p></dd>
</dl>
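<p>A sketch of the half-open range and of the optional arguments, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; <tt>digits</tt> and <tt>upper-hex</tt> are illustrative names.</p>
<pre>(require-extension srfi-14)

;; Codes 48 through 57 are #\0 through #\9; the upper bound 58 is excluded.
(define digits (ucs-range-&gt;char-set 48 58))
(char-set= digits (string-&gt;char-set "0123456789"))  ; =&gt; #t

;; Add codes 65 through 70 (#\A through #\F) to DIGITS.  ERROR? is #f,
;; so unrepresentable codes would be ignored rather than signalled.
(define upper-hex (ucs-range-&gt;char-set 65 71 #f digits))
(char-set-size upper-hex)  ; =&gt; 16</pre>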
<dl class="defsig"><dt class="defsig" id="def:-.3echar-set"><span class="sig"><tt>(-&gt;char-set x) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Coerces X into a char-set. X may be a string, character or char-set. A string is converted to the set of its constituent characters; a character is converted to a singleton set; a char-set is returned as-is. This procedure is intended for use by other procedures that want to provide &quot;user-friendly,&quot; wide-spectrum interfaces to their clients.</p></dd>
</dl>
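<p>A sketch of the &quot;wide-spectrum&quot; use described above, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded. The procedure <tt>count-delimiters</tt> is hypothetical: it accepts a string, a character or a char-set as its delimiter argument and normalises it with <tt>-&gt;char-set</tt>.</p>
<pre>(require-extension srfi-14)

;; Hypothetical: count the characters of S that belong to DELIMS.
(define (count-delimiters s delims)
  (let ((cs (-&gt;char-set delims)))
    (let lp ((i 0) (n 0))
      (if (= i (string-length s))
          n
          (lp (+ i 1)
              (if (char-set-contains? cs (string-ref s i)) (+ n 1) n))))))

(count-delimiters "a,b;c" ",;")                   ; =&gt; 2
(count-delimiters "a,b;c" #\,)                    ; =&gt; 1
(count-delimiters "a,b;c" char-set:punctuation)   ; =&gt; 2</pre>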
<h3 id="sec:Querying_character_sets"><a href="#sec:Querying_character_sets">Querying character sets</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set-size"><span class="sig"><tt>(char-set-size cs) -&gt; integer</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Returns the number of elements in character set CS.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-count"><span class="sig"><tt>(char-set-count pred cs) -&gt; integer</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Apply PRED to the chars of character set CS, and return the number of chars that caused the predicate to return true.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-.3elist"><span class="sig"><tt>(char-set-&gt;list cs) -&gt; character-list</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>This procedure returns a list of the members of character set CS. The order in which CS's characters appear in the list is not defined, and may be different from one call to another.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-.3estring"><span class="sig"><tt>(char-set-&gt;string cs) -&gt; string</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>This procedure returns a string containing the members of character set CS. The order in which CS's characters appear in the string is not defined, and may be different from one call to another.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-contains.3f"><span class="sig"><tt>(char-set-contains? cs char) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>This procedure tests CHAR for membership in character set CS.</p><p>The MIT Scheme character-set package called this procedure CHAR-SET-MEMBER?, but the argument order isn't consistent with the name.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-every"><span class="sig"><tt>(char-set-every pred cs) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-any"><span class="sig"><tt>(char-set-any pred cs) -&gt; boolean</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>The <tt>char-set-every</tt> procedure returns true if predicate PRED returns true of every character in the character set CS. Likewise, <tt>char-set-any</tt> applies PRED to every character in character set CS, and returns the first true value it finds. If no character produces a true value, it returns false. The order in which these procedures sequence through the elements of CS is not specified.</p><p>Note that if you need to determine the actual character on which a predicate returns true, use <tt>char-set-any</tt> and arrange for the predicate to return the character parameter as its true value, <i>e.g.</i></p><pre>(char-set-any (lambda (c) (and (char-upper-case? c) c)) 
              cs)</pre></dd>
</dl>
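<p>The query procedures applied to one set, as a sketch assuming CHICKEN 4 with <tt>srfi-14</tt> loaded. Note that repeated characters in the source string count only once.</p>
<pre>(require-extension srfi-14)

;; Members: H e l o , space W r d
(define cs (string-&gt;char-set "Hello, World"))

(char-set-size cs)                     ; =&gt; 9
(char-set-count char-upper-case? cs)   ; =&gt; 2
(char-set-contains? cs #\W)            ; =&gt; #t
(char-set-contains? cs #\w)            ; =&gt; #f
(char-set-every char-alphabetic? cs)   ; =&gt; #f, because of #\, and #\space

;; The visiting order is unspecified, so this may return #\H or #\W.
(char-set-any (lambda (c) (and (char-upper-case? c) c)) cs)</pre>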
<h3 id="sec:Character-set_algebra"><a href="#sec:Character-set_algebra">Character-set algebra</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set-adjoin"><span class="sig"><tt>(char-set-adjoin cs char_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-delete"><span class="sig"><tt>(char-set-delete cs char_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Add/delete the CHAR_I characters to/from character set CS.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set-adjoin.21"><span class="sig"><tt>(char-set-adjoin! cs char_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-delete.21"><span class="sig"><tt>(char-set-delete! cs char_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>Linear-update variants. These procedures are allowed, but not required, to side-effect their first parameter.</p></dd>
</dl>
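<p>A sketch contrasting the pure and linear-update forms, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; the variable names are illustrative only.</p>
<pre>(require-extension srfi-14)

(define vowels (string-&gt;char-set "aeiou"))

;; Pure versions return new sets and leave VOWELS untouched.
(define with-y (char-set-adjoin vowels #\y))
(define no-u   (char-set-delete vowels #\u))
(char-set-size with-y)   ; =&gt; 6
(char-set-size no-u)     ; =&gt; 4
(char-set-size vowels)   ; =&gt; 5

;; Linear-update versions may modify their first argument, so pass a
;; copy that is no longer needed and keep only the returned set.
(define scratch (char-set-adjoin! (char-set-copy vowels) #\y #\w))
(char-set-size scratch)  ; =&gt; 7</pre>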
<dl class="defsig"><dt class="defsig" id="def:char-set-complement"><span class="sig"><tt>(char-set-complement cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-union"><span class="sig"><tt>(char-set-union cs_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-intersection"><span class="sig"><tt>(char-set-intersection cs_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-difference"><span class="sig"><tt>(char-set-difference cs_1 cs_2 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-xor"><span class="sig"><tt>(char-set-xor cs_1 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-diff.2bintersection"><span class="sig"><tt>(char-set-diff+intersection cs_1 cs_2 ...) -&gt; [char-set char-set]</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>These procedures implement set complement, union, intersection, difference, and exclusive-or for character sets. The union, intersection and xor operations are n-ary. The difference function is also n-ary, associates to the left (that is, it computes the difference between its first argument and the union of all the other arguments), and requires at least one argument.</p><p>Boundary cases:</p><pre>(char-set-union) =&gt; char-set:empty
(char-set-intersection) =&gt; char-set:full
(char-set-xor) =&gt; char-set:empty
(char-set-difference CS) =&gt; CS</pre><p><tt>char-set-diff+intersection</tt> returns both the difference and the intersection of the arguments -- it partitions its first parameter. It is equivalent to</p><pre>(values (char-set-difference CS_1 CS_2 ...)
        (char-set-intersection CS_1 (char-set-union CS_2 ...)))</pre><p>but can be implemented more efficiently.</p><p>Programmers should be aware that <tt>char-set-complement</tt> could potentially be a very expensive operation in Scheme implementations that provide a very large character type, such as 32-bit Unicode. If this is a concern, a set CS can instead be complemented with respect to a smaller universe U using <tt>(char-set-difference U CS)</tt>.</p></dd>
</dl>
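<p>The set operations on two small sets, as a sketch assuming CHICKEN 4 with <tt>srfi-14</tt> loaded. Results are compared with <tt>char-set=</tt> because the order of members in a string or list is unspecified.</p>
<pre>(require-extension srfi-14)

(define a (string-&gt;char-set "abcd"))
(define b (string-&gt;char-set "cdef"))

(char-set= (char-set-union a b)        (string-&gt;char-set "abcdef"))  ; =&gt; #t
(char-set= (char-set-intersection a b) (string-&gt;char-set "cd"))      ; =&gt; #t
(char-set= (char-set-difference a b)   (string-&gt;char-set "ab"))      ; =&gt; #t
(char-set= (char-set-xor a b)          (string-&gt;char-set "abef"))    ; =&gt; #t

;; DIFF+INTERSECTION returns two values that partition A.
(call-with-values
  (lambda () (char-set-diff+intersection a b))
  (lambda (diff inter)
    (list (char-set-size diff) (char-set-size inter))))              ; =&gt; (2 2)

;; Complementing within a small universe (ASCII) via CHAR-SET-DIFFERENCE
;; gives the same set as intersecting the universe with the full complement.
(char-set= (char-set-difference char-set:ascii char-set:letter)
           (char-set-intersection char-set:ascii
                                  (char-set-complement char-set:letter)))  ; =&gt; #t</pre>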
<dl class="defsig"><dt class="defsig" id="def:char-set-complement.21"><span class="sig"><tt>(char-set-complement! cs) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-union.21"><span class="sig"><tt>(char-set-union! cs_1 cs_2 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-intersection.21"><span class="sig"><tt>(char-set-intersection! cs_1 cs_2 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-difference.21"><span class="sig"><tt>(char-set-difference! cs_1 cs_2 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-xor.21"><span class="sig"><tt>(char-set-xor! cs_1 cs_2 ...) -&gt; char-set</tt></span> <span class="type">procedure</span></dt>
<dt class="defsig" id="def:char-set-diff.2bintersection.21"><span class="sig"><tt>(char-set-diff+intersection! cs_1 cs_2 cs_3 ...) -&gt; [char-set char-set]</tt></span> <span class="type">procedure</span></dt>
<dd class="defsig"><p>These are linear-update variants of the set-algebra functions. They are allowed, but not required, to side-effect their first (required) parameter.</p><p><tt>char-set-diff+intersection!</tt> is allowed to side-effect both of its two required parameters, CS_1 and CS_2.</p></dd>
</dl>
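<p>A typical safe use of the linear-update operations is an accumulator that no other code can see. A sketch, assuming CHICKEN 4 with <tt>srfi-14</tt> loaded; <tt>letters-in</tt> is a hypothetical helper.</p>
<pre>(require-extension srfi-14)

;; Hypothetical helper: the set of letters occurring in any of STRINGS.
;; ACC starts as a private copy, so CHAR-SET-UNION! may safely reuse its
;; storage; the loop always continues with the returned set.
(define (letters-in strings)
  (let lp ((ss strings) (acc (char-set-copy char-set:empty)))
    (if (null? ss)
        (char-set-intersection! acc char-set:letter)
        (lp (cdr ss) (char-set-union! acc (string-&gt;char-set (car ss)))))))

(char-set= (letters-in '("ab1" "c-d" "a"))
           (string-&gt;char-set "abcd"))   ; =&gt; #t</pre>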
<h2 id="sec:Standard_character_sets"><a href="#sec:Standard_character_sets">Standard character sets</a></h2><p>Several character sets are predefined for convenience:</p><table>
<tr><td><tt>char-set:lower-case</tt></td><td>Lower-case letters</td></tr>

<tr><td><tt>char-set:upper-case</tt></td><td>Upper-case letters</td></tr>

<tr><td><tt>char-set:title-case</tt></td><td>Title-case letters</td></tr>

<tr><td><tt>char-set:letter</tt></td><td>Letters</td></tr>

<tr><td><tt>char-set:digit</tt></td><td>Digits</td></tr>

<tr><td><tt>char-set:letter+digit</tt></td><td>Letters and digits</td></tr>

<tr><td><tt>char-set:graphic</tt></td><td>Printing characters except spaces</td></tr>

<tr><td><tt>char-set:printing</tt></td><td>Printing characters including spaces</td></tr>

<tr><td><tt>char-set:whitespace</tt></td><td>Whitespace characters</td></tr>

<tr><td><tt>char-set:iso-control</tt></td><td>The ISO control characters</td></tr>

<tr><td><tt>char-set:punctuation</tt></td><td>Punctuation characters</td></tr>

<tr><td><tt>char-set:symbol</tt></td><td>Symbol characters</td></tr>

<tr><td><tt>char-set:hex-digit</tt></td><td>A hexadecimal digit: 0-9, A-F, a-f</td></tr>

<tr><td><tt>char-set:blank</tt></td><td>Blank characters -- horizontal whitespace</td></tr>

<tr><td><tt>char-set:ascii</tt></td><td>All characters in the ASCII set</td></tr>

<tr><td><tt>char-set:empty</tt></td><td>Empty set</td></tr>

<tr><td><tt>char-set:full</tt></td><td>All characters</td></tr>
</table>
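<p>The predefined sets are typically used as membership tests. As a sketch (assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>char-class</tt> is a hypothetical helper, not part of the unit), a character can be mapped to a coarse class. Test order matters where sets overlap: tab and newline are both whitespace and ISO control characters, so whitespace is tested first.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical helper: classify a character using predefined sets.
(define (char-class c)
  (cond ((char-set-contains? char-set:letter c)      'letter)
        ((char-set-contains? char-set:digit c)       'digit)
        ((char-set-contains? char-set:whitespace c)  'whitespace)
        ((char-set-contains? char-set:punctuation c) 'punctuation)
        ((char-set-contains? char-set:symbol c)      'symbol)
        ((char-set-contains? char-set:iso-control c) 'control)
        (else 'other)))

(map char-class (string->list "a1 ,$"))
;; => (letter digit whitespace punctuation symbol)
```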
<p>In Unicode Scheme implementations, the base character sets are compatible with Java's Unicode specifications. For ASCII or Latin-1, we simply restrict the Unicode set specifications to their first 128 or 256 codes, respectively.</p><p>Here are the definitions for some of the sets in an ASCII implementation:</p><table>
<tr><td><tt>char-set:lower-case</tt></td><td>a-z</td></tr>

<tr><td><tt>char-set:upper-case</tt></td><td>A-Z</td></tr>

<tr><td><tt>char-set:letter</tt></td><td>A-Z and a-z</td></tr>

<tr><td><tt>char-set:digit</tt></td><td>0123456789</td></tr>

<tr><td><tt>char-set:punctuation</tt></td><td><tt>!&quot;#%&amp;'()*,-./:;?@[\]_{}</tt></td></tr>

<tr><td><tt>char-set:symbol</tt></td><td><tt>$+&lt;=&gt;^`|~</tt></td></tr>

<tr><td><tt>char-set:whitespace</tt></td><td>Space, newline, tab, form feed, vertical tab, carriage return</td></tr>

<tr><td><tt>char-set:blank</tt></td><td>Space and tab</td></tr>

<tr><td><tt>char-set:graphic</tt></td><td>letter + digit + punctuation + symbol</td></tr>

<tr><td><tt>char-set:printing</tt></td><td>graphic + whitespace</td></tr>

<tr><td><tt>char-set:iso-control</tt></td><td>ASCII 0-31 and 127</td></tr>
</table>
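<p>The ASCII definitions above can be checked by intersecting each set with <tt>char-set:ascii</tt>. This is a sketch assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>ascii-part</tt> is a hypothetical helper. The expected sizes follow from the table (for example, 94 graphic characters plus 6 whitespace characters give 100 printing characters).</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical helper: the ASCII portion of a possibly larger set.
(define (ascii-part cs) (char-set-intersection cs char-set:ascii))

(define ascii-sizes
  (map (lambda (cs) (char-set-size (ascii-part cs)))
       (list char-set:lower-case char-set:upper-case char-set:letter
             char-set:digit char-set:punctuation char-set:symbol
             char-set:whitespace char-set:blank
             char-set:graphic char-set:printing char-set:iso-control)))
;; ascii-sizes => (26 26 52 10 23 9 6 2 94 100 33)

;; graphic = letter + digit + punctuation + symbol, within ASCII
(define graphic-ok?
  (char-set= (ascii-part char-set:graphic)
             (ascii-part (char-set-union char-set:letter char-set:digit
                                         char-set:punctuation
                                         char-set:symbol))))
;; graphic-ok? => #t
```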
<h3 id="sec:Character_set_constants"><a href="#sec:Character_set_constants">Character set constants</a></h3><dl class="defsig"><dt class="defsig" id="def:char-set:lower-case"><span class="sig"><tt>char-set:lower-case</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>For Unicode, a character is lowercase if</p><ul><li>it is not in the range [U+2000,U+2FFF], and </li>
<li>the Unicode attribute table does not give a lowercase mapping for it, and </li>
<li>at least one of the following is true: <ul><li>the Unicode attribute table gives a mapping to uppercase for the character, or </li>
<li>the name for the character in the Unicode attribute table contains the words &quot;SMALL LETTER&quot; or &quot;SMALL LIGATURE&quot;. </li>
</ul>
</li>
</ul>
<p>The lower-case ASCII characters are</p><p>abcdefghijklmnopqrstuvwxyz</p><p>Latin-1 adds another 33 lower-case characters to the ASCII set:</p><table>
<tr><td>00B5</td><td>MICRO SIGN</td></tr>

<tr><td>00DF</td><td>LATIN SMALL LETTER SHARP S</td></tr>

<tr><td>00E0</td><td>LATIN SMALL LETTER A WITH GRAVE</td></tr>

<tr><td>00E1</td><td>LATIN SMALL LETTER A WITH ACUTE</td></tr>

<tr><td>00E2</td><td>LATIN SMALL LETTER A WITH CIRCUMFLEX</td></tr>

<tr><td>00E3</td><td>LATIN SMALL LETTER A WITH TILDE</td></tr>

<tr><td>00E4</td><td>LATIN SMALL LETTER A WITH DIAERESIS</td></tr>

<tr><td>00E5</td><td>LATIN SMALL LETTER A WITH RING ABOVE</td></tr>

<tr><td>00E6</td><td>LATIN SMALL LETTER AE</td></tr>

<tr><td>00E7</td><td>LATIN SMALL LETTER C WITH CEDILLA</td></tr>

<tr><td>00E8</td><td>LATIN SMALL LETTER E WITH GRAVE</td></tr>

<tr><td>00E9</td><td>LATIN SMALL LETTER E WITH ACUTE</td></tr>

<tr><td>00EA</td><td>LATIN SMALL LETTER E WITH CIRCUMFLEX</td></tr>

<tr><td>00EB</td><td>LATIN SMALL LETTER E WITH DIAERESIS</td></tr>

<tr><td>00EC</td><td>LATIN SMALL LETTER I WITH GRAVE</td></tr>

<tr><td>00ED</td><td>LATIN SMALL LETTER I WITH ACUTE</td></tr>

<tr><td>00EE</td><td>LATIN SMALL LETTER I WITH CIRCUMFLEX</td></tr>

<tr><td>00EF</td><td>LATIN SMALL LETTER I WITH DIAERESIS</td></tr>

<tr><td>00F0</td><td>LATIN SMALL LETTER ETH</td></tr>

<tr><td>00F1</td><td>LATIN SMALL LETTER N WITH TILDE</td></tr>

<tr><td>00F2</td><td>LATIN SMALL LETTER O WITH GRAVE</td></tr>

<tr><td>00F3</td><td>LATIN SMALL LETTER O WITH ACUTE</td></tr>

<tr><td>00F4</td><td>LATIN SMALL LETTER O WITH CIRCUMFLEX</td></tr>

<tr><td>00F5</td><td>LATIN SMALL LETTER O WITH TILDE</td></tr>

<tr><td>00F6</td><td>LATIN SMALL LETTER O WITH DIAERESIS</td></tr>

<tr><td>00F8</td><td>LATIN SMALL LETTER O WITH STROKE</td></tr>

<tr><td>00F9</td><td>LATIN SMALL LETTER U WITH GRAVE</td></tr>

<tr><td>00FA</td><td>LATIN SMALL LETTER U WITH ACUTE</td></tr>

<tr><td>00FB</td><td>LATIN SMALL LETTER U WITH CIRCUMFLEX</td></tr>

<tr><td>00FC</td><td>LATIN SMALL LETTER U WITH DIAERESIS</td></tr>

<tr><td>00FD</td><td>LATIN SMALL LETTER Y WITH ACUTE</td></tr>

<tr><td>00FE</td><td>LATIN SMALL LETTER THORN</td></tr>

<tr><td>00FF</td><td>LATIN SMALL LETTER Y WITH DIAERESIS</td></tr>
</table>
<p>Note that three of these have no corresponding Latin-1 upper-case character:</p><table>
<tr><td>00B5</td><td>MICRO SIGN</td></tr>

<tr><td>00DF</td><td>LATIN SMALL LETTER SHARP S</td></tr>

<tr><td>00FF</td><td>LATIN SMALL LETTER Y WITH DIAERESIS</td></tr>
</table>
<p>(The compatibility micro character uppercases to the non-Latin-1 Greek capital mu; the German sharp s character uppercases to the pair of characters &quot;SS,&quot; and the capital y-with-diaeresis is non-Latin-1.)</p></dd>
</dl>
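<p>A minimal sketch of a string predicate built on <tt>char-set:lower-case</tt>, assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>all-lower-case?</tt> is a hypothetical name. With the SRFI 13 unit loaded, <tt>(string-every char-set:lower-case s)</tt> expresses the same test, since <tt>string-every</tt> also accepts a character set.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical predicate: #t when every character of S is lower-case
;; (vacuously #t for the empty string).
(define (all-lower-case? s)
  (let loop ((cs (string->list s)))
    (or (null? cs)
        (and (char-set-contains? char-set:lower-case (car cs))
             (loop (cdr cs))))))

(all-lower-case? "hello")   ; => #t
(all-lower-case? "Hello")   ; => #f
(all-lower-case? "a b")     ; => #f, a space is not a letter
```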
<dl class="defsig"><dt class="defsig" id="def:char-set:upper-case"><span class="sig"><tt>char-set:upper-case</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>For Unicode, a character is uppercase if</p><ul><li>it is not in the range [U+2000,U+2FFF], and </li>
<li>the Unicode attribute table does not give an uppercase mapping for it (this excludes titlecase characters), and </li>
<li>at least one of the following is true: <ul><li>the Unicode attribute table gives a mapping to lowercase for the character, or </li>
<li>the name for the character in the Unicode attribute table contains the words &quot;CAPITAL LETTER&quot; or &quot;CAPITAL LIGATURE&quot;. </li>
</ul>
</li>
</ul>
<p>The upper-case ASCII characters are</p><p>ABCDEFGHIJKLMNOPQRSTUVWXYZ</p><p>Latin-1 adds another 30 upper-case characters to the ASCII set:</p><table>
<tr><td>00C0</td><td>LATIN CAPITAL LETTER A WITH GRAVE</td></tr>

<tr><td>00C1</td><td>LATIN CAPITAL LETTER A WITH ACUTE</td></tr>

<tr><td>00C2</td><td>LATIN CAPITAL LETTER A WITH CIRCUMFLEX</td></tr>

<tr><td>00C3</td><td>LATIN CAPITAL LETTER A WITH TILDE</td></tr>

<tr><td>00C4</td><td>LATIN CAPITAL LETTER A WITH DIAERESIS</td></tr>

<tr><td>00C5</td><td>LATIN CAPITAL LETTER A WITH RING ABOVE</td></tr>

<tr><td>00C6</td><td>LATIN CAPITAL LETTER AE</td></tr>

<tr><td>00C7</td><td>LATIN CAPITAL LETTER C WITH CEDILLA</td></tr>

<tr><td>00C8</td><td>LATIN CAPITAL LETTER E WITH GRAVE</td></tr>

<tr><td>00C9</td><td>LATIN CAPITAL LETTER E WITH ACUTE</td></tr>

<tr><td>00CA</td><td>LATIN CAPITAL LETTER E WITH CIRCUMFLEX</td></tr>

<tr><td>00CB</td><td>LATIN CAPITAL LETTER E WITH DIAERESIS</td></tr>

<tr><td>00CC</td><td>LATIN CAPITAL LETTER I WITH GRAVE</td></tr>

<tr><td>00CD</td><td>LATIN CAPITAL LETTER I WITH ACUTE</td></tr>

<tr><td>00CE</td><td>LATIN CAPITAL LETTER I WITH CIRCUMFLEX</td></tr>

<tr><td>00CF</td><td>LATIN CAPITAL LETTER I WITH DIAERESIS</td></tr>

<tr><td>00D0</td><td>LATIN CAPITAL LETTER ETH</td></tr>

<tr><td>00D1</td><td>LATIN CAPITAL LETTER N WITH TILDE</td></tr>

<tr><td>00D2</td><td>LATIN CAPITAL LETTER O WITH GRAVE</td></tr>

<tr><td>00D3</td><td>LATIN CAPITAL LETTER O WITH ACUTE</td></tr>

<tr><td>00D4</td><td>LATIN CAPITAL LETTER O WITH CIRCUMFLEX</td></tr>

<tr><td>00D5</td><td>LATIN CAPITAL LETTER O WITH TILDE</td></tr>

<tr><td>00D6</td><td>LATIN CAPITAL LETTER O WITH DIAERESIS</td></tr>

<tr><td>00D8</td><td>LATIN CAPITAL LETTER O WITH STROKE</td></tr>

<tr><td>00D9</td><td>LATIN CAPITAL LETTER U WITH GRAVE</td></tr>

<tr><td>00DA</td><td>LATIN CAPITAL LETTER U WITH ACUTE</td></tr>

<tr><td>00DB</td><td>LATIN CAPITAL LETTER U WITH CIRCUMFLEX</td></tr>

<tr><td>00DC</td><td>LATIN CAPITAL LETTER U WITH DIAERESIS</td></tr>

<tr><td>00DD</td><td>LATIN CAPITAL LETTER Y WITH ACUTE</td></tr>

<tr><td>00DE</td><td>LATIN CAPITAL LETTER THORN</td></tr>
</table>
</dd>
</dl>
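<p>Within ASCII, upcasing the lower-case set yields exactly the upper-case set. The sketch below (assuming CHICKEN 4's <tt>(use srfi-14)</tt>) uses <tt>char-set-map</tt> to check this; it is deliberately restricted to ASCII because, as noted above, some Latin-1 lower-case characters have no Latin-1 upper-case counterpart.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

(define lower/ascii (char-set-intersection char-set:lower-case char-set:ascii))
(define upper/ascii (char-set-intersection char-set:upper-case char-set:ascii))

;; Map char-upcase over every ASCII lower-case letter and compare.
(define upcase-matches?
  (char-set= (char-set-map char-upcase lower/ascii) upper/ascii))
;; upcase-matches? => #t
```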
<dl class="defsig"><dt class="defsig" id="def:char-set:title-case"><span class="sig"><tt>char-set:title-case</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a character is titlecase if it has the category Lt in the character attribute database. There are very few of these characters; here is the entire 31-character list as of Unicode 3.0:</p><table>
<tr><td>01C5</td><td>LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON</td></tr>

<tr><td>01C8</td><td>LATIN CAPITAL LETTER L WITH SMALL LETTER J</td></tr>

<tr><td>01CB</td><td>LATIN CAPITAL LETTER N WITH SMALL LETTER J</td></tr>

<tr><td>01F2</td><td>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</td></tr>

<tr><td>1F88</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</td></tr>

<tr><td>1F89</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8A</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8B</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8C</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8D</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8E</td><td>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1F8F</td><td>GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1F98</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI</td></tr>

<tr><td>1F99</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9A</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9B</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9C</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9D</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9E</td><td>GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1F9F</td><td>GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1FA8</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI</td></tr>

<tr><td>1FA9</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAA</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAB</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAC</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAD</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAE</td><td>GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1FAF</td><td>GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI</td></tr>

<tr><td>1FBC</td><td>GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI</td></tr>

<tr><td>1FCC</td><td>GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI</td></tr>

<tr><td>1FFC</td><td>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</td></tr>
</table>
<p>There are no ASCII or Latin-1 titlecase characters.</p></dd>
</dl>
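<p>The absence of Latin-1 titlecase characters can be confirmed by intersecting with the range U+0000 to U+00FF, built here with <tt>ucs-range->char-set</tt> (whose upper bound is exclusive). A sketch assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>latin-1</tt> is an illustrative name.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

(define latin-1 (ucs-range->char-set 0 256))   ; U+0000 .. U+00FF

(define title/latin-1 (char-set-intersection char-set:title-case latin-1))
(char-set-size title/latin-1)   ; => 0
```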
<dl class="defsig"><dt class="defsig" id="def:char-set:letter"><span class="sig"><tt>char-set:letter</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a letter is any character with one of the letter categories (Lu, Ll, Lt, Lm, Lo) in the Unicode character database.</p><p>There are 52 ASCII letters:</p><p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ</p><p>There are 117 Latin-1 letters: the 115 characters in the union of the Latin-1 <tt>char-set:lower-case</tt> and <tt>char-set:upper-case</tt> sets, plus these two:</p><table>
<tr><td>00AA</td><td>FEMININE ORDINAL INDICATOR</td></tr>

<tr><td>00BA</td><td>MASCULINE ORDINAL INDICATOR</td></tr>
</table>
<p>(These two letters are considered lower-case by Unicode, but not by SRFI 14.)</p></dd>
</dl>
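<p>The composition of the Latin-1 letters can be checked directly. This sketch assumes CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>latin-1</tt> and <tt>latin-1-part</tt> are hypothetical helpers restricting a set to U+0000 to U+00FF.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

(define latin-1 (ucs-range->char-set 0 256))          ; U+0000 .. U+00FF
(define (latin-1-part cs) (char-set-intersection cs latin-1))

(define letter-count (char-set-size (latin-1-part char-set:letter)))
;; letter-count => 117

;; letter = lower-case + upper-case + the two ordinal indicators
(define letters-ok?
  (char-set= (latin-1-part char-set:letter)
             (latin-1-part
              (char-set-union char-set:lower-case char-set:upper-case
                              (char-set (integer->char #xAA)      ; FEMININE ORDINAL
                                        (integer->char #xBA)))))) ; MASCULINE ORDINAL
;; letters-ok? => #t
```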
<dl class="defsig"><dt class="defsig" id="def:char-set:digit"><span class="sig"><tt>char-set:digit</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a character is a digit if it has the category Nd in the character attribute database. In Latin-1 and ASCII, the only such characters are 0123456789. In Unicode, there are other digit characters in other code blocks, such as Gujarati digits and Tibetan digits.</p></dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set:hex-digit"><span class="sig"><tt>char-set:hex-digit</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>The only hex digits are 0123456789abcdefABCDEF.</p></dd>
</dl>
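<p>A typical use of <tt>char-set:hex-digit</tt> is validating input before conversion. In this sketch (assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>hex-digit-value</tt> is a hypothetical helper), the set test guards <tt>string->number</tt> with radix 16.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical helper: numeric value of one hex digit, or #f.
(define (hex-digit-value c)
  (and (char-set-contains? char-set:hex-digit c)
       (string->number (string c) 16)))

(map hex-digit-value (string->list "09afAFg"))
;; => (0 9 10 15 10 15 #f)
```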
<dl class="defsig"><dt class="defsig" id="def:char-set:letter.2bdigit"><span class="sig"><tt>char-set:letter+digit</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>The union of <tt>char-set:letter</tt> and <tt>char-set:digit</tt>.</p></dd>
</dl>
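<p>Predefined sets can be extended non-destructively with <tt>char-set-adjoin</tt>. As an illustration under an assumed identifier rule (letters, digits and underscores, not starting with a digit; the rule and the names <tt>ident-chars</tt> and <tt>identifier?</tt> are hypothetical), assuming CHICKEN 4's <tt>(use srfi-14)</tt>:</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical rule: letters, digits and #\_, no leading digit.
(define ident-chars (char-set-adjoin char-set:letter+digit #\_))

(define (identifier? s)
  (let ((cs (string->list s)))
    (and (pair? cs)                                      ; non-empty
         (not (char-set-contains? char-set:digit (car cs)))
         (let loop ((cs cs))
           (or (null? cs)
               (and (char-set-contains? ident-chars (car cs))
                    (loop (cdr cs))))))))

(identifier? "foo_1")   ; => #t
(identifier? "1foo")    ; => #f
(identifier? "a-b")     ; => #f
```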
<dl class="defsig"><dt class="defsig" id="def:char-set:graphic"><span class="sig"><tt>char-set:graphic</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>A graphic character is one that would put ink on paper. The ASCII and Latin-1 graphic characters are exactly the members of the union of these sets:</p><table>
<tr><td><tt>char-set:letter</tt></td></tr>

<tr><td><tt>char-set:digit</tt></td></tr>

<tr><td><tt>char-set:punctuation</tt></td></tr>

<tr><td><tt>char-set:symbol</tt></td></tr>
</table>
</dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set:printing"><span class="sig"><tt>char-set:printing</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>A printing character is one that would occupy space when printed, <i>i.e.</i>, a graphic character or a whitespace character. <tt>char-set:printing</tt> is the union of <tt>char-set:whitespace</tt> and <tt>char-set:graphic</tt>.</p></dd>
</dl>
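<p>The sketch below (assuming CHICKEN 4's <tt>(use srfi-14)</tt>) checks the stated union and then uses <tt>char-set:printing</tt> to mask characters that would not occupy space when printed; <tt>sanitize</tt> and the choice of <tt>#\?</tt> as the replacement are hypothetical.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; printing = whitespace + graphic
(define printing-ok?
  (char-set= char-set:printing
             (char-set-union char-set:whitespace char-set:graphic)))

;; Hypothetical helper: replace non-printing characters with #\?.
(define (sanitize s)
  (list->string
   (map (lambda (c) (if (char-set-contains? char-set:printing c) c #\?))
        (string->list s))))

(sanitize (string #\a (integer->char 7) #\space #\b))   ; => "a? b"
```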
<dl class="defsig"><dt class="defsig" id="def:char-set:whitespace"><span class="sig"><tt>char-set:whitespace</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a whitespace character is one of the following:</p><ul><li>a character with one of the space, line, or paragraph separator categories (Zs, Zl or Zp) of the Unicode character database. </li>
<li>U+0009 Horizontal tabulation (\t control-I) </li>
<li>U+000A Line feed (\n control-J) </li>
<li>U+000B Vertical tabulation (\v control-K) </li>
<li>U+000C Form feed (\f control-L) </li>
<li>U+000D Carriage return (\r control-M) </li>
</ul>
<p>There are 24 whitespace characters in Unicode 3.0:</p><table>
<tr><td>0009</td><td>HORIZONTAL TABULATION</td><td>\t control-I</td></tr>

<tr><td>000A</td><td>LINE FEED</td><td>\n control-J</td></tr>

<tr><td>000B</td><td>VERTICAL TABULATION</td><td>\v control-K</td></tr>

<tr><td>000C</td><td>FORM FEED</td><td>\f control-L</td></tr>

<tr><td>000D</td><td>CARRIAGE RETURN</td><td>\r control-M</td></tr>

<tr><td>0020</td><td>SPACE</td><td>Zs</td></tr>

<tr><td>00A0</td><td>NO-BREAK SPACE</td><td>Zs</td></tr>

<tr><td>1680</td><td>OGHAM SPACE MARK</td><td>Zs</td></tr>

<tr><td>2000</td><td>EN QUAD</td><td>Zs</td></tr>

<tr><td>2001</td><td>EM QUAD</td><td>Zs</td></tr>

<tr><td>2002</td><td>EN SPACE</td><td>Zs</td></tr>

<tr><td>2003</td><td>EM SPACE</td><td>Zs</td></tr>

<tr><td>2004</td><td>THREE-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2005</td><td>FOUR-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2006</td><td>SIX-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2007</td><td>FIGURE SPACE</td><td>Zs</td></tr>

<tr><td>2008</td><td>PUNCTUATION SPACE</td><td>Zs</td></tr>

<tr><td>2009</td><td>THIN SPACE</td><td>Zs</td></tr>

<tr><td>200A</td><td>HAIR SPACE</td><td>Zs</td></tr>

<tr><td>200B</td><td>ZERO WIDTH SPACE</td><td>Zs</td></tr>

<tr><td>2028</td><td>LINE SEPARATOR</td><td>Zl</td></tr>

<tr><td>2029</td><td>PARAGRAPH SEPARATOR</td><td>Zp</td></tr>

<tr><td>202F</td><td>NARROW NO-BREAK SPACE</td><td>Zs</td></tr>

<tr><td>3000</td><td>IDEOGRAPHIC SPACE</td><td>Zs</td></tr>
</table>
<p>The ASCII whitespace characters are the first six characters in the above list: horizontal tabulation, line feed, vertical tabulation, form feed, carriage return, and space. These are also exactly the characters recognised by the POSIX <tt>isspace()</tt> function in the &quot;C&quot; locale. Latin-1 adds the no-break space.</p></dd>
</dl>
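<p>A common use of <tt>char-set:whitespace</tt> is tokenizing text. This is a minimal sketch assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>split-on-whitespace</tt> is a hypothetical helper that collects maximal runs of non-whitespace characters.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Hypothetical helper: split S into maximal runs of non-whitespace.
(define (split-on-whitespace s)
  (let loop ((cs (string->list s)) (word '()) (words '()))
    (define (flush)   ; push the current word (if any) onto WORDS
      (if (null? word) words (cons (list->string (reverse word)) words)))
    (cond ((null? cs) (reverse (flush)))
          ((char-set-contains? char-set:whitespace (car cs))
           (loop (cdr cs) '() (flush)))
          (else (loop (cdr cs) (cons (car cs) word) words)))))

(split-on-whitespace "  one\ttwo\nthree  ")   ; => ("one" "two" "three")
```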
<dl class="defsig"><dt class="defsig" id="def:char-set:iso-control"><span class="sig"><tt>char-set:iso-control</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>The ISO control characters are the Unicode/Latin-1 characters in the ranges [U+0000,U+001F] and [U+007F,U+009F].</p><p>ASCII restricts this set to the characters in the range [U+0000,U+001F] plus the character U+007F.</p><p>Note that Unicode defines other control characters which do not belong to this set (hence the qualifying prefix &quot;iso-&quot; in the name).</p></dd>
</dl>
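<p>The ranges above give 33 ISO control characters in ASCII (0 to 31 plus 127) and 65 in Latin-1 (adding 128 to 159). A sketch that counts them, assuming CHICKEN 4's <tt>(use srfi-14)</tt>; <tt>latin-1</tt> is an illustrative name.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

(define latin-1 (ucs-range->char-set 0 256))   ; U+0000 .. U+00FF

(define ascii-controls
  (char-set-size (char-set-intersection char-set:iso-control char-set:ascii)))
(define latin-1-controls
  (char-set-size (char-set-intersection char-set:iso-control latin-1)))
;; ascii-controls => 33, latin-1-controls => 65

;; Tab is both an ISO control character and whitespace.
(char-set-contains? char-set:iso-control #\tab)   ; => #t
```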
<dl class="defsig"><dt class="defsig" id="def:char-set:punctuation"><span class="sig"><tt>char-set:punctuation</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a punctuation character is any character that has one of the punctuation categories in the Unicode character database (Pc, Pd, Ps, Pe, Pi, Pf, or Po).</p><p>ASCII has 23 punctuation characters:</p><pre>!&quot;#%&amp;'()*,-./:;?@[\]_{}</pre><p>Latin-1 adds six more:</p><table>
<tr><td>00A1</td><td>INVERTED EXCLAMATION MARK</td></tr>

<tr><td>00AB</td><td>LEFT-POINTING DOUBLE ANGLE QUOTATION MARK</td></tr>

<tr><td>00AD</td><td>SOFT HYPHEN</td></tr>

<tr><td>00B7</td><td>MIDDLE DOT</td></tr>

<tr><td>00BB</td><td>RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK</td></tr>

<tr><td>00BF</td><td>INVERTED QUESTION MARK</td></tr>
</table>
<p>Note that the nine ASCII characters <tt>$+&lt;=&gt;^`|~</tt> are <i>not</i> punctuation. They are &quot;symbols.&quot;</p></dd>
</dl>
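<p>The punctuation/symbol split can be seen directly. In this sketch (assuming CHICKEN 4's <tt>(use srfi-14)</tt>; the variable names are illustrative), <tt>#\$</tt> is a symbol rather than punctuation, the two sets are disjoint within ASCII, and together they make up the ASCII graphic characters that are neither letters nor digits.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

(char-set-contains? char-set:punctuation #\$)   ; => #f
(char-set-contains? char-set:symbol #\$)        ; => #t

(define overlap
  (char-set-size (char-set-intersection char-set:punctuation
                                        char-set:symbol
                                        char-set:ascii)))
;; overlap => 0

(define partition-ok?
  (char-set= (char-set-intersection
              char-set:ascii
              (char-set-union char-set:punctuation char-set:symbol))
             (char-set-difference
              (char-set-intersection char-set:ascii char-set:graphic)
              char-set:letter+digit)))
;; partition-ok? => #t
```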
<dl class="defsig"><dt class="defsig" id="def:char-set:symbol"><span class="sig"><tt>char-set:symbol</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>In Unicode, a symbol is any character that has one of the symbol categories in the Unicode character database (Sm, Sc, Sk, or So). There are nine ASCII symbol characters:</p><pre>$+&lt;=&gt;^`|~</pre><p>Latin-1 adds 18 more:</p><table>
<tr><td>00A2</td><td>CENT SIGN</td></tr>

<tr><td>00A3</td><td>POUND SIGN</td></tr>

<tr><td>00A4</td><td>CURRENCY SIGN</td></tr>

<tr><td>00A5</td><td>YEN SIGN</td></tr>

<tr><td>00A6</td><td>BROKEN BAR</td></tr>

<tr><td>00A7</td><td>SECTION SIGN</td></tr>

<tr><td>00A8</td><td>DIAERESIS</td></tr>

<tr><td>00A9</td><td>COPYRIGHT SIGN</td></tr>

<tr><td>00AC</td><td>NOT SIGN</td></tr>

<tr><td>00AE</td><td>REGISTERED SIGN</td></tr>

<tr><td>00AF</td><td>MACRON</td></tr>

<tr><td>00B0</td><td>DEGREE SIGN</td></tr>

<tr><td>00B1</td><td>PLUS-MINUS SIGN</td></tr>

<tr><td>00B4</td><td>ACUTE ACCENT</td></tr>

<tr><td>00B6</td><td>PILCROW SIGN</td></tr>

<tr><td>00B8</td><td>CEDILLA</td></tr>

<tr><td>00D7</td><td>MULTIPLICATION SIGN</td></tr>

<tr><td>00F7</td><td>DIVISION SIGN</td></tr>
</table>
</dd>
</dl>
<dl class="defsig"><dt class="defsig" id="def:char-set:blank"><span class="sig"><tt>char-set:blank</tt></span> <span class="type">constant</span></dt>
<dd class="defsig"><p>Blank characters are horizontal whitespace. In Unicode, a blank character is either</p><ul><li>a character with the space separator category (Zs) in the Unicode character database, or </li>
<li>U+0009 Horizontal tabulation (\t control-I) </li>
</ul>
<p>There are eighteen blank characters in Unicode 3.0:</p><table>
<tr><td>0009</td><td>HORIZONTAL TABULATION</td><td>\t control-I</td></tr>

<tr><td>0020</td><td>SPACE</td><td>Zs</td></tr>

<tr><td>00A0</td><td>NO-BREAK SPACE</td><td>Zs</td></tr>

<tr><td>1680</td><td>OGHAM SPACE MARK</td><td>Zs</td></tr>

<tr><td>2000</td><td>EN QUAD</td><td>Zs</td></tr>

<tr><td>2001</td><td>EM QUAD</td><td>Zs</td></tr>

<tr><td>2002</td><td>EN SPACE</td><td>Zs</td></tr>

<tr><td>2003</td><td>EM SPACE</td><td>Zs</td></tr>

<tr><td>2004</td><td>THREE-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2005</td><td>FOUR-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2006</td><td>SIX-PER-EM SPACE</td><td>Zs</td></tr>

<tr><td>2007</td><td>FIGURE SPACE</td><td>Zs</td></tr>

<tr><td>2008</td><td>PUNCTUATION SPACE</td><td>Zs</td></tr>

<tr><td>2009</td><td>THIN SPACE</td><td>Zs</td></tr>

<tr><td>200A</td><td>HAIR SPACE</td><td>Zs</td></tr>

<tr><td>200B</td><td>ZERO WIDTH SPACE</td><td>Zs</td></tr>

<tr><td>202F</td><td>NARROW NO-BREAK SPACE</td><td>Zs</td></tr>

<tr><td>3000</td><td>IDEOGRAPHIC SPACE</td><td>Zs</td></tr>
</table>
<p>The ASCII blank characters are the first two characters above -- horizontal tab and space. Latin-1 adds the no-break space.</p></dd>
</dl>
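<p>Since blanks are the horizontal whitespace, the remaining ASCII whitespace is vertical: line feed, vertical tab, form feed and carriage return. A sketch assuming CHICKEN 4's <tt>(use srfi-14)</tt>, using the subset test <tt>char-set&lt;=</tt>; the variable names are illustrative.</p>

```scheme
(use srfi-14)   ; CHICKEN 4: load the SRFI 14 unit

;; Every blank character is also whitespace.
(define blank-subset? (char-set<= char-set:blank char-set:whitespace))

;; ASCII whitespace minus blanks: LF, VT (11), FF (12), CR.
(define vertical-ok?
  (char-set= (char-set-intersection
              char-set:ascii
              (char-set-difference char-set:whitespace char-set:blank))
             (char-set #\newline (integer->char 11)
                       (integer->char 12) #\return)))
;; blank-subset? => #t, vertical-ok? => #t
```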
<hr /><p>Previous: <a href="Unit%20srfi-13.html">Unit srfi-13</a></p><p>Next: <a href="Unit%20srfi-18.html">Unit srfi-18</a></p></div></div></body>