This file is indexed.

/usr/lib/R/site-library/multtest/otherDocs/multtest.Rnw is in r-bioc-multtest 2.34.0-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
% \VignetteIndexEntry{multtest Tutorial}
% \VignetteKeywords{Expression Analysis}
% \VignettePackage{multtest}
\documentclass[11pt]{article}

\usepackage{amsmath,epsfig,fullpage}
\usepackage{graphicx}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\parindent 0in

\bibliographystyle{abbrvnat}

\begin{document}

\title{\bf Bioconductor's multtest package}
\author{Sandrine Dudoit$^1$ and Yongchao Ge$^2$}

\maketitle

\begin{center}
1. Division of Biostatistics, University of California, Berkeley,
   \url{http://www.stat.berkeley.edu/~sandrine}\\
2. Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York,
   {\tt yongchao.ge@mssm.edu}\\
\end{center}

\tableofcontents

% library(tools)
% Rnwfile<- file.path("/home/sandrine/CVS_stuff/madman/Rpacks/multtest/inst/doc",
%                     "multtest.Rnw") 
% Sweave(Rnwfile,pdf=TRUE,eps=TRUE,stylepath=TRUE,driver=RweaveLatex())

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview}

The {\tt multtest} package contains a collection of functions for
multiple hypothesis testing. These functions can be used to identify
differentially expressed genes in microarray experiments, i.e., genes
whose expression levels are associated with a response or covariate of
interest. \\  

{\bf Introduction to multiple testing.} This document provides a
tutorial for using the {\tt multtest} package. For a detailed
introduction to multiple testing consult the document {\tt
  multtest.intro} in the {\tt inst/doc} directory of the package. See
also \cite{Shaffer95} and  \cite{Dudoit&Shaffer02} for a review of
multiple testing procedures and complete references.\\ 

{\bf Multiple testing procedures implemented in {\tt multtest}.}
The {\tt multtest} package implements multiple testing procedures for
controlling different Type I error rates. It includes procedures for
controlling the family--wise Type I error rate (FWER): Bonferroni,
\cite{Hochberg88}, \cite{Holm79}, Sidak, \cite{Westfall&Young93} minP
and maxT procedures. It also includes procedures for controlling the
false discovery rate (FDR): \cite{Benjamini&Hochberg95} and
\cite{Benjamini&Yekutieli01} step--up procedures. These procedures are
implemented for tests based on $t$--statistics, $F$--statistics,
paired $t$--statistics, block $F$--statistics, Wilcoxon
statistics. The results of the procedures are summarized using
adjusted $p$--values, which reflect for each gene the overall
experiment Type I error rate when genes with a smaller $p$--value are
declared differentially expressed. Adjusted $p$--values may be
obtained either from the nominal distribution of the test statistics
or by permutation. The permutation algorithm for the maxT and minP
procedures is described in \cite{Ge&Dudoit}.\\ 

{\bf Help files.}  As with any R package, detailed information on
functions, their arguments and value, can be obtained in the help
files. For instance, to view the help file for the function {\tt
  mt.maxT} in a browser, use {\tt help.start()} followed by {\tt ?
  mt.maxT}. 



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Case study: the ALL/AML leukemia dataset of Golub et al. (1999)}

 We demonstrate the functionality of this package using gene
 expression data from the leukemia ALL/AML study of
 \cite{Golubetal}. To load the leukemia dataset, use {\tt
   data(golub)}, and to view a description of the experiments and
 data, type {\tt ? golub}.  

%<<eval=TRUE, echo=TRUE>>=
<<>>=
library(multtest, verbose=FALSE)
data(golub)
@

\cite{Golubetal} were interested in identifying genes that are
differentially expressed in patients with two type of leukemias, acute
lymphoblastic leukemia (ALL, class 0) and acute myeloid leukemia (AML,
class 1). Gene expression levels were measured using Affymetrix
high--density oligonucleotide chips containing $p=6,817$ human
genes. The learning set comprises $n=38$ samples, 27 ALL cases and 11
AML cases (data available at {\tt
  http://www.genome.wi.mit.edu/MPR}). Following Golub et al. (personal
communication, Pablo Tamayo), three preprocessing steps were applied
to the normalized matrix of intensity values available on the website:
(i) thresholding: floor 
of 100 and ceiling of 16,000; (ii) filtering: exclusion of genes with
$\max/\min \leq 5$ or $(\max-\min) \leq 500$, where $\max$ and $\min$ refer
respectively to the maximum and minimum intensities for a
particular gene across mRNA samples; (iii) base 10 logarithmic
transformation. Boxplots of the expression levels for each of the 38
samples revealed the need to standardize the expression levels within
arrays before combining data across samples. The data were then
summarized by a $3,051 \times 38 $ matrix $X=(x_{ji})$, where $x_{ji}$
denotes the expression level for gene $j$ in tumor mRNA sample $i$. \\


The dataset {\tt golub} contains the gene expression data for the 38
training set tumor mRNA samples and 3,051 genes retained after
pre--processing. The dataset includes 

\begin{itemize}
\item
{{\tt golub}:} a $3,051 \times 38 $ matrix of expression levels;
\item
{{\tt golub.gnames}:} a $3,051 \times 3 $ matrix of gene identifiers;
\item
{{\tt golub.cl}:} a vector of tumor class labels (0 for ALL, 1 for AML). 
\end{itemize}

%<<eval=TRUE, echo=TRUE>>=
<<>>=
dim(golub)
golub[1:4,1:4]
dim(golub.gnames)
golub.gnames[1:4,]
golub.cl
@


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The {\tt mt.teststat} and {\tt mt.teststat.num.denum} functions}


The {\tt mt.teststat} and {\tt mt.teststat.num.denum} functions
provide a convenient way to compute test statistics for each row of a
data frame, e.g., two--sample Welch $t$--statistics, Wilcoxon
statistics, $F$--statistics, paired $t$--statistics, block
$F$--statistics. To compute two--sample $t$--statistics comparing, for
each gene, expression in the ALL cases to expression in the AML cases 

%<<eval=TRUE, echo=TRUE>>=
<<>>=
teststat<-mt.teststat(golub,golub.cl)
@

The following produces a normal Quantile--Quantile (Q--Q) plot of the
test statistics (Figure \ref{fig:mtQQ})
. In our application, we are
not so much interested in testing whether the test statistics follow a
particular distribution, but in using the Q--Q plot as a visual aid
for identifying genes with ``unusual'' test statistics. Q--Q plots
informally correct for the large number of comparisons and the points
which deviate markedly from an otherwise linear relationship are
likely to correspond to those genes whose expression levels differ
between the control and treatment groups.

%%<<mtQQ,fig=TRUE,prefix=FALSE,echo=TRUE,include=FALSE>>=
%%\begin{verbatim}
<<>>=
postscript("mtQQ.eps")
qqnorm(teststat)
qqline(teststat)
dev.off()
pdf("mtQQ.pdf")
qqnorm(teststat)
qqline(teststat)
dev.off()
@
%%\end{verbatim}
%%@

We may also wish to look at plots of the numerators and denominators
of the test statistics (Figure \ref{fig:mtNumDen}) 

%%<<mtNumDen,fig=TRUE,prefix=FALSE,echo=TRUE,include=FALSE>>=
%%\begin{verbatim}
<<>>=
tmp<-mt.teststat.num.denum(golub,golub.cl,test="t")
num<-tmp$teststat.num
denum<-tmp$teststat.denum
postscript("mtNumDen.eps")
plot(sqrt(denum),num)
dev.off()
pdf("mtNumDen.pdf")
plot(sqrt(denum),num)
dev.off()
@
%%\end{verbatim}
%%@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The {\tt mt.rawp2adjp} function}

This function computes adjusted $p$--values for simple multiple
testing procedures from a vector of raw (unadjusted) $p$--values. The
procedures include the 
Bonferroni, \cite{Holm79}, \cite{Hochberg88}, and Sidak procedures for
strong control of the family--wise Type I error rate (FWER), and the
\cite{Benjamini&Hochberg95} and \cite{Benjamini&Yekutieli01}
procedures for (strong) control of the false discovery rate (FDR). \\ 

As a first approximation, compute raw nominal two--sided $p$--values
for the $3,051$ test statistics using the standard Gaussian
distribution 

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
rawp0<-2*(1-pnorm(abs(teststat)))
@

Adjusted $p$--values for these seven multiple testing procedures can
be computed as follows and stored in the original gene order in {\tt
  adjp} using {\tt order(res\$index)} 

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
procs<-c("Bonferroni","Holm","Hochberg","SidakSS","SidakSD","BH","BY")
res<-mt.rawp2adjp(rawp0,procs)
adjp<-res$adjp[order(res$index),]
round(adjp[1:10,],2)
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The {\tt mt.maxT} and {\tt mt.minP} functions}

The {\tt mt.maxT} and {\tt mt.minP} functions compute permutation
adjusted $p$--values for the maxT and minP step--down multiple testing
procedure described in \cite{Westfall&Young93}. These procedure
provide strong control of the FWER and also incorporate the joint
dependence structure between the test statistics. There are thus in
general less conservative than the standard Bonferroni procedure. The
permutation algorithm for the maxT and minP procedures is described in
\cite{Ge&Dudoit}.\\

Permutation unadjusted $p$--values and adjusted $p$--values for the
maxT procedure with Welch $t$--statistics are computed as
follows. {\tt mt.maxT} returns $p$--values sorted in decreasing order
of the absolute $t$--statistics and {\tt order(resT\$index)} is used
to obtain $p$--values and test statistics in the original gene
order. In practice, the number of permutations $B$ should be several
thousands, we set $B=1,000$ here for illustration purposes.

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
resT<-mt.maxT(golub,golub.cl,B=1000)
ord<-order(resT$index)
rawp<-resT$rawp[ord]
maxT<-resT$adjp[ord]
teststat<-resT$teststat[ord]
@


Three functions related to the {\tt mt.maxT} and {\tt mt.minP}
functions are {\tt mt.sample.teststat}, {\tt mt.sample.rawp}, and {\tt
mt.sample.label}. These functions provide tools to investigate the
permutation distribution of test statistics, raw (unadjusted)
$p$--values, and class labels, respectively.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The {\tt mt.reject} function}

The function {\tt mt.reject} returns the identity and number of rejected
     hypotheses for several multiple testing procedures and different 
     nominal Type I error rates. The number of hypotheses rejected
     using unadjusted $p$--values and maxT $p$--values for different
     Type I error rates ($\alpha=0, 0.1, 0.2, \ldots, 1$) can be
     obtained by 

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
mt.reject(cbind(rawp,maxT),seq(0,1,0.1))$r
@

The genes with maxT $p$--values less than or equal to 0.01 are 

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
which<-mt.reject(cbind(rawp,maxT),0.01)$which[,2]
golub.gnames[which,2]
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The {\tt mt.plot} function}

The {\tt mt.plot} function produces a number of graphical summaries
for the results of multiple testing procedures and their corresponding
adjusted $p$--values. To produce plots of sorted permutation
unadjusted $p$--values and adjusted $p$--values for the Bonferroni,
maxT, \cite{Benjamini&Hochberg95}, and \cite{Benjamini&Yekutieli01}
procedures use 

%%<<eval=TRUE, echo=TRUE>>=
<<>>=
res<-mt.rawp2adjp(rawp,c("Bonferroni","BH","BY"))
adjp<-res$adjp[order(res$index),]
allp<-cbind(adjp,maxT)
dimnames(allp)[[2]]<-c(dimnames(adjp)[[2]],"maxT")
procs<-dimnames(allp)[[2]]
procs<-procs[c(1,2,5,3,4)]
cols<-c(1,2,3,5,6)
ltypes<-c(1,2,2,3,3)
@

For plotting sorted adjusted $p$--values set the argument {\tt plottype="pvsr"}

%%<<mtpvsr,fig=TRUE,prefix=FALSE,echo=TRUE,include=FALSE>>=
%%\begin{verbatim}
<<>>=
postscript("mtpvsr.eps")
mt.plot(allp[,procs],teststat,plottype="pvsr",
        proc=procs,leg=c(2000,0.4),lty=ltypes,col=cols,lwd=2) 
dev.off()
pdf("mtpvsr.pdf")
mt.plot(allp[,procs],teststat,plottype="pvsr",
        proc=procs,leg=c(2000,0.4),lty=ltypes,col=cols,lwd=2) 
dev.off()
@
%%\end{verbatim}
%%@

and for plotting adjusted $p$--values vs. the test statistics use {\tt
  plottype="pvst"} 

%%<<mtpvst,fig=TRUE,prefix=FALSE,echo=TRUE,include=FALSE>>=
%%\begin{verbatim}
<<>>=
postscript("mtpvst.eps")
mt.plot(allp[,procs],teststat,plottype="pvst",
        logscale=TRUE,proc=procs,leg=c(-0.5,2),pch=ltypes,col=cols)
dev.off()
pdf("mtpvst.pdf")
mt.plot(allp[,procs],teststat,plottype="pvst",
        logscale=TRUE,proc=procs,leg=c(-0.5,2),pch=ltypes,col=cols)
dev.off()
@
%%\end{verbatim}
%%@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bibliography{multtest} 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{figure}[ht]
  %%\centerline{\epsfig{figure=mtQQ.eps,width=4in,height=4in,angle=0}}
  \begin{center}
    \includegraphics[width=4in,height=4in,angle=0]{mtQQ}
  \end{center}
  \caption{Normal Q--Q plot of $t$--statistics for leukemia data.}
  \protect\label{fig:mtQQ}
\end{figure}

\begin{figure}[ht]
  %%\centerline{\epsfig{figure=mtNumDen.eps,width=4in,height=4in,angle=0}}
  \begin{center}
    \includegraphics[width=4in,height=4in,angle=0]{mtNumDen}
  \end{center}
  \caption{Numerator vs. square root of denominator of the
    $t$--statistics for the leukemia data.} 
  \protect\label{fig:mtNumDen}
\end{figure}

\begin{figure}[ht]
  %%\centerline{\epsfig{figure=mtpvsr.eps,width=4in,height=4in,angle=0}}
  \begin{center}
    \includegraphics[width=4in,height=4in,angle=0]{mtpvsr}
  \end{center}
  \caption{Sorted adjusted $p$--values for the leukemia data.}
  \protect\label{fig:mtpvsr}
\end{figure}

\begin{figure}[ht]
  %%\centerline{\epsfig{figure=mtpvst.eps,width=4in,height=4in,angle=0}}
  \begin{center}
    \includegraphics[width=4in,height=4in,angle=0]{mtpvst}
  \end{center}
  \caption{Adjusted $p$--values (log scale) vs. $t$--statistics for the
    leukemia data.}  
  \protect\label{fig:mtpvst}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}