/usr/share/doc/xindy-rules/style-tutorial-2.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>XINDY Style File Tutorial: A Basic Introduction</TITLE>
 <LINK HREF="style-tutorial-3.html" REL=next>
 <LINK HREF="style-tutorial-1.html" REL=previous>
 <LINK HREF="style-tutorial.html#toc2" REL=contents>
</HEAD>
<BODY>
<A HREF="style-tutorial-3.html">Next</A>
<A HREF="style-tutorial-1.html">Previous</A>
<A HREF="style-tutorial.html#toc2">Contents</A>
<HR>
<H2><A NAME="s2">2. A Basic Introduction</A></H2>

<P>This section incrementally introduces the most important aspects of
the system. After reading this chapter you should be able to specify
style files for
about 80% of the commonly used indexes. The examples are demonstrated
with a TeX markup so one can easily typeset the results <SF>xindy</SF>
produces. You need LaTeX and the ISO-Latin enhancements that come
with the <CODE>inputenc</CODE> package to run the following examples. Also the
<SF>xindy</SF> system must already be installed.
<P>
But this tutorial doesn't reflect real life any more. The concepts are
introduced below to explain them, but actual usage is probably
different. In particular, you should not expect to specify sort rules
by hand; usually one uses the language modules for that. Same way,
xindy standard modules provide lots of markup functionality that you
need for your documents, and can be used as a starting point.
Nevertheless, let's continue with the explanation of <sf>xindy</sf>'s
style file language.
<P>
<H2><A NAME="ss2.1">2.1 Running <SF>xindy</SF></A>
</H2>

<P>Create a new directory somewhere and copy some files from the
distribution directory <CODE>doc/style-tutorial/</CODE> by typing
<P>
<BLOCKQUOTE><CODE>
<PRE>
eg$ mkdir tutorial
eg$ cd tutorial
eg$ cp &lt;distrib-dir>/doc/style-tutorial/*.raw .
eg$ cp &lt;distrib-dir>/doc/style-tutorial/*.tex .
</PRE>
</CODE></BLOCKQUOTE>
<P>with <CODE>distrib-dir</CODE> replaced by the actual location. Now create a
file <CODE>style1.xdy</CODE> with the following content:
<P>
<BLOCKQUOTE><CODE>
<PRE>
;; This is a first example using `xindy'.

(define-location-class "page-numbers" ("arabic-numbers"))
(define-attributes (("definition" "usage")))
</PRE>
</CODE></BLOCKQUOTE>
<P>Now run <SF>xindy</SF> by typing
<P>
<BLOCKQUOTE><CODE>
<PRE>
eg$ xindy -t ex1.xlg -M style1 -I xindy ex1.raw
</PRE>
</CODE></BLOCKQUOTE>
<P>You should see something like
<P>
<BLOCKQUOTE><CODE>
<PRE>
Opening logfile "ex1.xlg" (done)
Reading indexstyle...
Loading module "style1.xdy"...
Finished loading module "style1.xdy".
Finished reading indexstyle.
Finalizing indexstyle... (done)

Reading raw-index "ex1.raw"...
Finished reading raw-index.

Processing index... [10%] [20%] [30%] [40%] [50%] [60%] [70%] [80%] [90%] [100%]
Finished processing index.

Writing markup... [10%] [20%] [30%] [40%] [50%] [60%] [70%] [80%] [90%] [100%]
Markup written into file "ex1.ind".
</PRE>
</CODE></BLOCKQUOTE>
<P><SF>xindy</SF> has now successfully compiled the index <CODE>ex1.raw</CODE> using
your index style <CODE>style1.xdy</CODE>. The result is now stored in file
<CODE>ex1.ind</CODE>. You can view this file but currently it only contains an
unreadable mix of data.
<P>But now let's come back to our index style. The syntax of the command
is in a Lisp-like form with lots of braces, looking a little bit
weird, but you'll soon get used to it. What is the meaning of the two
commands we specified? The first command informed <SF>xindy</SF> that we
like to process page numbers. We do this by defining a new
<EM>location class</EM> named <CODE>page-numbers</CODE>. The page numbers consist
of <CODE>arabic-numbers</CODE> as we might expect but this is not necessarily
true---imagine your page numbers consisted of roman numerals instead.
When reading the <EM>raw index</EM> contained in file <CODE>ex1.raw</CODE>
<SF>xindy</SF> checks all locations if they match all known location
classes. Since in our example the only known location class is the
class of page numbers which are written using arabic digits, all
locations will be checked if they are correct page numbers.
<P>The second command tells <SF>xindy</SF> that we use two types of
attributes for location references. Most often the locations in an
index denote different meanings. For example, in mathematical texts
one will distinguish the <EM>definition</EM> of a mathematical term from
its <EM>usage</EM>. Sometimes these are typeset using different font
shapes such as <EM>italic</EM> or font series such as <B>boldface</B>. Each
location has an associated attribute which, if it is unspecified,
defaults to the attribute <CODE>default</CODE>. With this command you have
made these attributes known to the system, which makes it possible to
assign different markup to these attributes later on.
<P>
<P>
<H2><A NAME="ss2.2">2.2 Adding some Markup</A>
</H2>

<P>Until now you haven't seen something exciting, so its time to specify
some markup. Add the following lines to our index style:
<P>
<BLOCKQUOTE><CODE>
<PRE>
(markup-index :open  "~n\begin{theindex}~n"
              :close "~n\end{theindex}~n"
              :tree)

(markup-locref :class "page-numbers" :attr "definition"
               :open  "{\bf " :close "}")

(markup-locclass-list :open "\quad{}")
(markup-locref-list :sep ", ")
</PRE>
</CODE></BLOCKQUOTE>
<P>Now run <SF>xindy</SF> again and afterwards LaTeX:
<P>
<BLOCKQUOTE><CODE>
<PRE>
eg$ xindy -t ex1.xlg -M style1 -I xindy ex1.raw
eg$ latex ex1.tex
</PRE>
</CODE></BLOCKQUOTE>
<P>You can view <CODE>ex1.dvi</CODE> with your prefered viewer (maybe <CODE>xdvi</CODE>
or something else) to get a first impression of your results. Maybe
your are not satisfied (for sure you aren't), because it still looks
very confusing. What did the above rules tell <SF>xindy</SF>? When you
view the file <CODE>ex1.ind</CODE> which is the result <SF>xindy</SF> generates,
you'll recognize some of the <EM>markup tags</EM> you specified. The
following is an excerpt of this file:
<P>
<BLOCKQUOTE><CODE>
<PRE>
\begin{theindex}
  academia\quad{}{\bf 1}acafetado\quad{}{\bf 2}acalmar\quad{}{\bf 4}
  açafrão\quad{}{\bf 3}indexflat\quad{}1hierarchical\quad{}2
  veryhierarchical\quad{}3impressive\quad{}4saber\quad{}{\bf 7}
  sabor\quad{}{\bf 8}sabão\quad{}{\bf 6}sábado\quad{}{\bf 5}
\end{theindex}
</PRE>
</CODE></BLOCKQUOTE>
<P>First of all you'll see that the file starts with the string
<CODE>\begin{theindex}</CODE> and ends with <CODE>\end{theindex}</CODE>.
Additionally some locations are correctly enclosed into a TeX macro
that typesets them in shape boldface, whereas others aren't. The
boldface ones are all those locations from the raw index that have the
attribute <CODE>definition</CODE>.
<P>The <CODE>:open</CODE> and <CODE>:close</CODE> keyword arguments each take a string as
argument. The first one is written to the file when <EM>opening</EM> an
enviroment, whereas the latter one <EM>closes</EM> an environment. What we
have specified is the markup for the whole index (which is actually
printed only once) and the markup for all locations of class
<CODE>page-numbers</CODE> which own the attribute <CODE>definition</CODE>. Here we
have cleanly separated the structured markup from the visual one,
allowing an easy redefinition if we decide, for example, to markup the
<CODE>definition</CODE>-locations in italics instead of boldface.
<P>Some words on <EM>keyword arguments</EM> and <EM>switches</EM>. Keyword
arguments such as <CODE>:open</CODE> or <CODE>:close</CODE> always take exactly one
argument which must be positioned right after the keyword separated by
a whitespace (a blank or a tab-stop). Switches don't take any
arguments. For example, <CODE>:tree</CODE> in the command <CODE>markup-index</CODE> is
a switch and thus it does not take an argument. We will use this
terminology throughout the rest of this document.
<P>The third command caused <SF>xindy</SF> to insert a horizontal space
between the keyword and the locations (the TeX command
<CODE>\quad{}</CODE> simply inserts a specific horizontal space). The last
command caused <SF>xindy</SF> to separate all location references from
each other with a comma followed by a blank, independently of any
location class.
<P>As you already may have observed, the tilde sign (<CODE>~</CODE>) serves
as a <EM>quoting character</EM>.
<P>We continue specifying markup to get a printable result by adding more
markup:
<P>
<BLOCKQUOTE><CODE>
<PRE>
(markup-indexentry :open "~n  \item "           :depth 0)
(markup-indexentry :open "~n    \subitem "      :depth 1)
(markup-indexentry :open "~n      \subsubitem " :depth 2)
</PRE>
</CODE></BLOCKQUOTE>
<P>This assigns different markup for the different hierarchy layers of
the indexentries. Our index is hierarchically organized. It contains
items which themselves contain more sub-items which also might contain
sub-sub-items. Each layer is started by a different markup which is
correctly assigned with the <CODE>:depth</CODE> keyword argument. The layers
are numbered by their <EM>depth</EM> starting from zero.
<P>Now run <SF>xindy</SF> and TeX again and enjoy your first index. It's
easy, isn't it! The output <CODE>ex1.ind</CODE> looks like the following:
<P>
<BLOCKQUOTE><CODE>
<PRE>
\begin{theindex}

 \item academia\quad{}{\bf 1}
 \item acafetado\quad{}{\bf 2}
 \item acalmar\quad{}{\bf 4}
 \item açafrão\quad{}{\bf 3}
 \item index
    \subitem flat\quad{}1
    \subitem hierarchical\quad{}2
    \subitem very
      \subsubitem hierarchical\quad{}3
      \subsubitem impressive\quad{}4
 \item saber\quad{}{\bf 7}
 \item sabor\quad{}{\bf 8}
 \item sabão\quad{}{\bf 6}
 \item sábado\quad{}{\bf 5}

\end{theindex}
</PRE>
</CODE></BLOCKQUOTE>
<P>Hmm, as you might have seen there are several problems that need
further investigation. The index contains some Portuguese words that
are printed correctly but should appear at other positions inside the
index. For instance, the word <EM>sábado</EM> should appear before the
word <EM>saber</EM> since <EM>á</EM> must be sorted as if it were simply an
<EM>a</EM>. The reason why these words are not sorted correctly is
simple---the accentuated letters have codes beyond position 128 in the
ISO-Latin alphabet. Sorting based on these codes yields this incorrect
order.
<P>What to do? We can define for each of the words containing these
special characters an explicit <EM>print key</EM>. The print key describes
the printed representation of the keyword whereas the <EM>key</EM> or
<EM>main key</EM> is used for sorting and merging. A very tedious task
which is not a very clever solution since in a non-english language
many many words contain these special cases. We follow the way
<SF>xindy</SF> offers: <EM>keyword-mappings</EM>.
<P>
<H2><A NAME="ss2.3">2.3 Keyword Mappings</A>
</H2>

<P>What are keyword mappings for? A good question. I'll try to give some
answers.
<P>
<UL>
<LI> <EM>Merging of differently written words</EM>. Some text formatting
systems allow different writings for the same word. For example, TeX
can be used with ISO-Latin characters as well as with its own
character sequences. If a document is composed from smaller ones
possibly written by different authors using different forms of writing
the index entries, the same terms may happen to be written differently
and consequently we need a mechanism to identify them as equal.
</LI>
<LI> <EM>Specifying the sort order</EM>. As outlined in the previous
section it is really difficult and error-prone to specify the sort key
for each keyword explicitly. Sometimes the sort order is even
different for the type of the document, as it happens in German, where
two different types of sortings exist, one for everyday indexes and
one for dictionaries. The sort order actually defines the position of
arbitrary language-specific letters inside of an index.
</LI>
</UL>
<P>A detailed elaboration of these ideas can be found in the paper <EM>An
International Version of MakeIndex</EM> by Joachim Schrod [3].
It describes the ideas that led to modifications on one of the
ancestors of the <SF>xindy</SF> system: <CODE>makeindex</CODE> [4].
<P>The keyword mappings are as follows. The <EM>merge key</EM> is generated
from the <EM>main key</EM> with the so called <EM>merge mapping</EM>. The
merge mapping can be specified with the command <CODE>merge-rule</CODE>. The
<EM>sort key</EM> is derived from the merge key using the <EM>sort
mapping</EM> specified with the <CODE>sort-rule</CODE> command. The following
scheme shows this mapping process:
<P>
<FIGURE>
<EPS FILE="mappings.eps">
<IMG SRC="mappings.gif">
</FIGURE>
<P>
<P>We will use this command now to define a suitable sort mapping that
fits our needs:
<P>
<BLOCKQUOTE><CODE>
<PRE>
(sort-rule "à" "a")
(sort-rule "á" "a")
(sort-rule "ã" "a")
(sort-rule "è" "e")
(sort-rule "é" "e")
(sort-rule "ç" "c")
</PRE>
</CODE></BLOCKQUOTE>
<P>These rules define mappings from a keyword to a <EM>normalized</EM>
version. In the logfile <CODE>ex1.xlg</CODE> these transformations are logged so
that one can see how these mappings are performed. In this example we
do not need any <CODE>merge-rule</CODE> but we will see applications in
further examples.
<P>Running <SF>xindy</SF> and TeXing the result now places the indexentries
at the right positions.
<P>In reality, such sort rules tend to be much more complex, due to
the idiosynchrasies of sorting natural languages. Sort rules for
many languages are available as part of the xindy distribution, as
so-called language modules. We specify the language with the option
<code>-L</code>, e.g., in our example we could have used
<BLOCKQUOTE><CODE>
<PRE>
eg$ xindy -t ex1.xlg -M style1 -L portuguese -I xindy ex1.raw
</PRE>
</CODE></BLOCKQUOTE>
<P>If we use one of the available language modules, sort rules in
user-written xindy styles are ignored.
<P>The result is now quite satisfying if the index entries weren't
clumped together that much. We usually want the different index
entries beginning with the same letter be optically separated from the
ofhers. This improves readability and there must be a way to
accomplish this---the <EM>letter groups</EM>.
<P>
<H2><A NAME="ss2.4">2.4 Letter Groups</A>
</H2>

<P>To group indexentries we must define what indexentries form a group.
The clustering is done by matching the keywords' prefixes (taken from
the <EM>sort key</EM>) with a user-defined table of prefixes and define
appropriate markup that separates the groups from each other. Here it
goes.
<P>
<BLOCKQUOTE><CODE>
<PRE>
(define-letter-groups
  ("a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
   "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"))

(markup-letter-group-list :sep "~n\indexspace")
</PRE>
</CODE></BLOCKQUOTE>
<P>This defines the given list of letter groups. When forming the letter
groups, each letter group is checked if it matches a prefix of the
indexentries' sort key. The longest match assigns the index entry to
this letter group. If no match was possible the index entry is put into
group <CODE>default</CODE>.
<P>The result now looks much better than before. You have now learned the
basic features that you need to specify everyday indexes. In the next
chapter we'll continue to make you an expert in indexing.
<P>
<P>
<HR>
<A HREF="style-tutorial-3.html">Next</A>
<A HREF="style-tutorial-1.html">Previous</A>
<A HREF="style-tutorial.html#toc2">Contents</A>
</BODY>
</HTML>
xindy-rules 2.5.1.20160104-4build1 / usr / share / doc / xindy-rules / style-tutorial-2.html