This file is indexed.

/usr/lib/R/site-library/gdata/doc/mapLevels.Rnw is in r-cran-gdata 2.13.3-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
%\VignetteIndexEntry{Mapping levels of a factor}
%\VignettePackage{gdata}
%\VignetteKeywords{levels, factor, manip}

\documentclass[a4paper]{report}
\usepackage{Rnews}
\usepackage[round]{natbib}
\bibliographystyle{abbrvnat}

\usepackage{Sweave}
\SweaveOpts{strip.white=all, keep.source=TRUE}

\begin{document}

\begin{article}

\title{Mapping levels of a factor}
\subtitle{The \pkg{gdata} package}
\author{by Gregor Gorjanc}

\maketitle

\section{Introduction}

Factors use levels attribute to store information on mapping between
internal integer codes and character values i.e. levels. First level is
mapped to internal integer code 1 and so on. Although some users do not
like factors, their use is more efficient in terms of storage than for
character vectors. Additionally, there are many functions in base \R{} that
provide additional value for factors. Sometimes users need to work with
internal integer codes and mapping them back to factor, especially when
interfacing external programs. Mapping information is also of interest if
there are many factors that should have the same set of levels. This note
describes \code{mapLevels} function, which is an utility function for
mapping the levels of a factor in \pkg{gdata} \footnote{from version 2.3.1}
package \citep{WarnesGdata}.

\section{Description with examples}

Function \code{mapLevels()} is an (S3) generic function and works on
\code{factor} and \code{character} atomic classes. It also works on
\code{list} and \code{data.frame} objects with previously mentioned atomic
classes. Function \code{mapLevels} produces a so called ``map'' with names
and values. Names are levels, while values can be internal integer codes or
(possibly other) levels. This will be clarified later on.  Class of this
``map'' is \code{levelsMap}, if \code{x} in \code{mapLevels()} was atomic
or \code{listLevelsMap} otherwise - for \code{list} and \code{data.frame}
classes. The following example shows the creation and printout of such a
``map''.

<<ex01>>=
library(gdata)
(fac <- factor(c("B", "A", "Z", "D")))
(map <- mapLevels(x=fac))
@

If we have to work with internal integer codes, we can transform factor to
integer and still get ``back the original factor'' with ``map'' used as
argument in \code{mapLevels<-} function as shown bellow. \code{mapLevels<-}
is also an (S3) generic function and works on same classes as
\code{mapLevels} plus \code{integer} atomic class.

<<ex02>>=
(int <- as.integer(fac))
mapLevels(x=int) <- map
int
identical(fac, int)
@

Internally ``map'' (\code{levelsMap} class) is a \code{list} (see bellow),
but its print method unlists it for ease of inspection. ``Map'' from
example has all components of length 1. This is not mandatory as
\code{mapLevels<-} function is only a wrapper around workhorse function
\code{levels<-} and the later can accept \code{list} with components of
various lengths.

<<ex03>>=
str(map)
@

Although not of primary importance, this ``map'' can also be used to remap
factor levels as shown bellow.  Components ``later'' in the map take over
the ``previous'' ones. Since this is not optimal I would rather recommend
other approaches for ``remapping'' the levels of a \code{factor}, say
\code{recode} in \pkg{car} package \citep{FoxCar}.

<<ex04>>=
map[[2]] <- as.integer(c(1, 2))
map
int <- as.integer(fac)
mapLevels(x=int) <- map
int
@

Up to now examples showed ``map'' with internal integer codes for values
and levels for names. I call this integer ``map''. On the other hand
character ``map'' uses levels for values and (possibly other) levels for
names. This feature is a bit odd at first sight, but can be used to easily
unify levels and internal integer codes across several factors.  Imagine
you have a factor that is for some reason split into two factors \code{f1}
and \code{f2} and that each factor does not have all levels. This is not
uncommon situation.

<<ex05>>=
(f1 <- factor(c("A", "D", "C")))
(f2 <- factor(c("B", "D", "C")))
@

If we work with this factors, we need to be careful as they do not have the
same set of levels. This can be solved with appropriately specifying
\code{levels} argument in creation of factors i.e. \code{levels=c("A", "B",
  "C", "D")} or with proper use of \code{levels<-} function. I say proper
as it is very tempting to use:

<<ex06>>=
fTest <- f1
levels(fTest) <- c("A", "B", "C", "D")
fTest
@

Above example extends set of levels, but also changes level of 2nd and 3rd
element in \code{fTest}! Proper use of \code{levels<-} (as shown in
\code{levels} help page) would be:

<<ex07>>=
fTest <- f1
levels(fTest) <- list(A="A", B="B",
                      C="C", D="D")
fTest
@

Function \code{mapLevels} with character ``map'' can help us in such
scenarios to unify levels and internal integer codes across several
factors. Again the workhorse under this process is \code{levels<-} function
from base \R{}! Function \code{mapLevels<-} just controls the assignment of
(integer or character) ``map'' to \code{x}. Levels in \code{x} that match
``map'' values (internal integer codes or levels) are changed to ``map''
names (possibly other levels) as shown in \code{levels} help page. Levels
that do not match are converted to \code{NA}. Integer ``map'' can be
applied to \code{integer} or \code{factor}, while character ``map'' can be
applied to \code{character} or \code{factor}. Result of \code{mapLevels<-}
is always a \code{factor} with possibly ``remapped'' levels.

To get one joint character ``map'' for several factors, we need to put
factors in a \code{list} or \code{data.frame} and use arguments
\code{codes=FALSE} and \code{combine=TRUE}. Such map can then be used to
unify levels and internal integer codes.

<<ex08>>=
(bigMap <- mapLevels(x=list(f1, f2),
                     codes=FALSE,
                     combine=TRUE))
mapLevels(f1) <- bigMap
mapLevels(f2) <- bigMap
f1
f2
cbind(as.character(f1), as.integer(f1),
      as.character(f2), as.integer(f2))
@

If we do not specify \code{combine=TRUE} (which is the default behaviour)
and \code{x} is a \code{list} or \code{data.frame}, \code{mapLevels}
returns ``map'' of class \code{listLevelsMap}. This is internally a
\code{list} of ``maps'' (\code{levelsMap} objects). Both
\code{listLevelsMap} and \code{levelsMap} objects can be passed to
\code{mapLevels<-} for \code{list}/\code{data.frame}. Recycling occurs when
length of \code{listLevelsMap} is not the same as number of
components/columns of a \code{list}/\code{data.frame}.

Additional convenience methods are also implemented to ease the work with
``maps'':

\begin{itemize}

\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and
  \code{as.listLevelsMap} for testing and coercion of user defined
  ``maps'',

\item \code{"["} for subsetting,

\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap}
  objects; argument \code{recursive=TRUE} can be used to coerce
  \code{listLevelsMap} to \code{levelsMap}, for example \code{c(llm1, llm2,
    recursive=TRUE)} and

\item \code{unique} and \code{sort} for \code{levelsMap}.

\end{itemize}

\section{Summary}

Functions \code{mapLevels} and \code{mapLevels<-} can help users to map
internal integer codes to factor levels and unify levels as well as
internal integer codes among several factors. I welcome any comments or
suggestions.

% \bibliography{refs}
\begin{thebibliography}{1}
\providecommand{\natexlab}[1]{#1}
\providecommand{\url}[1]{\texttt{#1}}
\expandafter\ifx\csname urlstyle\endcsname\relax
  \providecommand{\doi}[1]{doi: #1}\else
  \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi

\bibitem[Fox(2006)]{FoxCar}
J.~Fox.
\newblock \emph{car: Companion to Applied Regression}, 2006.
\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}.
\newblock R package version 1.1-1.

\bibitem[Warnes(2006)]{WarnesGdata}
G.~R. Warnes.
\newblock \emph{gdata: Various R programming tools for data manipulation},
  2006.
\newblock URL
  \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}.
\newblock R package version 2.3.1. Includes R source code and/or documentation
  contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley.

\end{thebibliography}

\address{Gregor Gorjanc\\
  University of Ljubljana, Slovenia\\
\email{gregor.gorjanc@bfro.uni-lj.si}}

\end{article}

\end{document}