/usr/lib/R/site-library/Biobase/doc/legacy/Biobase.Rnw is in r-bioc-biobase 2.14.0-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 | %
% NOTE -- ONLY EDIT Biobase.Rnw!!!
% Biobase.tex file will get overwritten.
%
%\VignetteIndexEntry{Biobase Primer}
%\VignetteDepends{}
%\VignetteKeywords{Expression Analysis}
%\VignettePackage{Biobase}
\documentclass{article}
\usepackage{hyperref}
\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\classdef}[1]{%
{\em #1}
}
%%FIXME: Jianhua, can you go through the current class definitions for
%%exprSet and phenoData etc and update the documentation here accordingly.
\begin{document}
\title{Textual Description of Biobase}
\maketitle
\section*{Introduction}
Biobase is part of the Bioconductor project. It is meant to be the
location of any reusable (or non--specific) functionality.
Biobase will be required by most of the other Bioconductor libraries.
\section{Data Structures}
Part of the Biobase functionality is the standardization of data
structures for genomic data.
Currently we have designed some data structures to handle microarray
data.
The \classdef{exprSet} class has the following slots:
\begin{description}
\item[exprs] A matrix of expression levels. Arrays are columns and
genes are rows.
\item[se.exprs] A matrix of standard errors for expressions if they
are available. It will have length 0 if they
are not.
\item[phenoData] An object of class \verb+phenoData+ that contains
phenotypic and/or experimental data.
\item[description] A description of the experiment (object of class MIAME)
\item[annotation] A character string indicating the base name for the
associated annotation.
\item[notes] A set of notes describing aspects or features of the data,
the analysis, processing done, etc.
\end{description}
These data are extremely large and complex. To deal with them
effectively we will need better tools for combining data and
documentation. The \verb+exprSet+ class represents an initial attempt
by the Bioconductor project to provide better tools for documenting
and handling these large and complex data sets.
The expression data represent experimentally derived data. In most
cases these data will benefit from making use of biologically relevant
meta-data. The meta-data are very large and diverse. In order to
facilitate interactions and explorations we have taken the approach
of constructing a specialized meta-data package for each chip or instrument.
Many of these packages are available from the Bioconductor web site.
These packages contain information such as the gene name, symbol and
chromosomal location. There are other meta-data packages that contain
the information that is provided by other initiatives such as GO and
KEGG. These data can then be linked to the \verb+exprSet+ via the
\verb+annotation+ slot.
The \textit{annotate} package provides basic data manipulation tools
for the meta-data packages.
The \texttt{phenoData} class has the following slots:
\begin{description}
\item[pData] A dataframe where the rows are cases and the columns are
variables.
\item[varLabels] A list of labels and descriptions for the variables
represented by the columns of \verb+pData+.
\item[varMetaData] A \Rclass{data.frame} defining meta-data for the
variables contained in the \Robject{pData} slot.
\end{description}
Instances of this class are essentially \verb+data.frame+'s with some
additional documentation on the variables stored in the
\verb+varLabels+ and \Robject{varMetaData} slots.
%%FIXME: does this currently exist? It should be easy to require that
%%the names of the pData data.frame are the same, and in the same
%%order as the column names of the exprs
A mechanism for ensuring that the elements of the \verb+phenoData+
slot of an instance of \verb+exprSet+ are in the same order as the
columns of the \verb+exprs+ array is needed. It is important that these be
properly aligned since analyses will require this and automatic tools
for checking will probably be better than ad hoc ones.
In addition to the class definitions a number of special methods (or
functions) have been defined to operate on instances of these classes.
Some particular attention has been paid to subsetting operations.
Instances of both \verb+phenoData+ and \verb+exprSet+ are closed under
subsetting operations.
That is, any subset of one of these objects retains its class.
There are also specialized print methods for objects of both classes.
We consider an instance of an \verb+exprSet+ to be an expression array
with some additional information. Thus there are two subscripts, one
for the rows and one for the columns.
For that reason subsetting works in the following ways:
\begin{itemize}
\item If the first subscript is given then the appropriate subset
of rows from \verb+exprs+ and \verb+se.exprs+ is taken. All the data
in \verb+phenoData+ is propagated since no subset of cases was made.
\item If the second subscript is given then the appropriate set of
columns from \verb+exprs+ and \verb+se.exprs+ is taken. At the same
time the corresponding set of {\em rows} of \verb+phenoData+ are
taken.
\end{itemize}
\subsection{An exprSet Vignette}
In the data directory for Biobase there is a small anonymized data
set. It consists of expression level data for 500 genes on 26
patients. The data can be accessed with the command
\verb+data(geneData)+.
There are three artificial covariates provided as well.
These can be accessed using \verb+data(geneCovariate)+ once the Biobase
library is attached.
%%FIXME: is there a better way to do this?
The following vignette shows how to read in these data and to create
an instance of the \verb+exprSet+ class using those data.
The first step is to create an object of \verb+phenoData+ calss.
This object is used to store the phenotype data.
We will use the example data frame \verb+geneCovariate+ to create such an object.
<<R.hide, echo=F, results=hide>>=
library(Biobase)
@
%%<<R>>=
%%data(geneCov)
%%data(geneData)
%%covN <- list(cov1 = "Covariate 1; 2 levels",
%% cov2 = "Covariate 2; 2 levels",
%% cov3 = "Covariate 3; 3 levels")
%%
%%pD <- new("phenoData", pData=geneCov, varLabels=covN)
%%eSet <- new("exprSet", exprs=geneData, phenoData=pD)
%%
%%@
<<R.phenoData>>=
data(geneCovariate)
head(geneCovariate)
phenoD <- new("phenoData", pData=geneCovariate,
varLabels=list("sex"="Female/Male","type"="Case/Control","score"="Testing Score")
)
phenoD
@
We also need a \verb+MIAME+ object to represent the background information, such as the names of the experimenter and laboratory, for this data set.
<<R.MIAME>>=
descr <- new("MIAME",
name="Pierre Fermat",
lab="Francis Galton Lab",
contact="pfermat@lab.not.exist",
title="Smoking-Cancer Experiment",
abstract="An example object of expression set (exprSet) class",
url="www.lab.not.exist",
other=list(
notes="An example object of expression set (exprSet) class")
)
descr
@
Moreover, we need a \verb+data.frame+ object to group the reporters (probes) at this data set (chip).
The definition of probe types is applied from \textit{Affymetrix GeneChip Expression Analysis Data Analysis Fundamentals} (\url{http://www.affymetrix.com/Auth/support/downloads/manuals/data_analysis_fundamentals_manual.pdf}), and the result is stored in the \verb+data.frame+ object \Robject{reporter}. This is a one-column data frame. The rows represent the probe ids in the data set, and the values in the data frame are the predefined types associated with each probe.
<<R.reporterInfo>>=
data(reporter)
head(reporter)
@
The data frame \Robject{geneData}, containing 500 probes and 26 samples, is a subset of real expression data from an Affymetrix U95v2
chip. The \Robject{seD} is a \Rclass{matrix} object containing the standard errors of \Robject{geneData}. Finaly, we can put all objects together to create the \verb+exprSet+ object.
<<R.exSet>>=
data(geneData)
data(seD)
exSet <- new("exprSet",
exprs = geneData,
se.exprs = seD,
phenoData = phenoD,
annotation = "hgu95av2",
description = descr,
notes = descr@other$notes,
reporterInfo = reporter
)
exSet
@
If instead, you have the data stored in files, e.g. \texttt{.csv}
files and want to read it in using \Rfunction{read.table} you will
need to do some conversion. \Rfunction{read.table} reads data into a
\Robject{data.frame}, and this is fine for the phenotypic data, but
not for the expression data. You will need to convert the
\Robject{data.frame} into a matrix. The code example gives a short
example of the commands that you would need to execute if you have the
expression data stored in a file named \texttt{myexprs.csv} and the
phenotypic data in a file named \texttt{mycovs.csv}.
\begin{verbatim}
myExprs = read.csv("myexprs.csv")
myExprsmat = as.matrix(myExprs)
myCovs = read.csv("mycovs.csv")
if( ncol(myExprsmat) != nrow(myCovs) )
print("ERROR: these should be the same")
\end{verbatim}
You should probably also check to ensure that you have the appropriate
samples names used as row names for \Robject{myCovs} and as column
names for \Robject{myExprsmat}, and they must be in the same order.
So, again some pseudo-code that gives the essential ingredients is
given next. If you are unclear on the difference between a
\Robject{matrix} and a \Robject{data.frame} then you should consult
one of the very many introductory texts on using R as well as the
manual pages and documents that come with R.
\begin{verbatim}
if( !all.equal( row.names(myCovs), dimnames(myExprsmat)[[2]] ) )
print("ERROR: samples names are wrong")
\end{verbatim}
Now, we are almost ready to create an instance of an \Robject{exprSet}
object. The missing piece is a list of descriptions for the covariates
that constitute the phenotypic data. Suppose that we have two
variables, one the age of the patient in years and the other the
particular translocation they are known to have. And suppose that in
the \Robject{data.frame} \Robject{myCovs} the column names are
\texttt{age} and \texttt{translocation}. Then we could make up the
covariate descriptions as follows.
\begin{verbatim}
covDesc = list(age="age of the patient in years",
translocation="known translocation")
\end{verbatim}
The purpose of this level of documentation is to help others who might
want to re-analyze your data, or you, if at some time in the future
you need to go back and see what was done. By carefully annotating the
data, in a self-describing way, you increase the chances that you will
understand what was done. Below is the simplest form for creating an
\Robject{exprSet}. We note that typically you would fill in more
details, like the name of the chip used and some additional data
describing the experiment itself.
\begin{verbatim}
myphenoData = new("phenoData", pData=myCovs, varLabels=covDesc)
myEset = new("exprSet", exprs=myExprsmat, phenoData=myphenoData)
\end{verbatim}
\section{Aggregate}
When performing an interative computation such as cross--validation or
bootstrapping it is often useful to be able to aggregate certain intermediate
results.
The \verb+Aggregate+ functions (and soon the Aggregate class) provide
some simple tools for doing this.
The strategy employed is to maintain the summary statistics in an
environment. This is passed to the iterative function. It does not
need to be returned since environments have a {\em
pass--by--reference} semantic. Once the function has finished the
environment can be queried for the summary statistics.
One simple task that people often want to carry out is to determine in
a cross--validation calculation which genes are selected the most
often. In some sense these genes may form a more stable basis for
inference.
Achieving that using an Aggregator is very straight forward.
At each iteration we will pass the names of the selected genes to the
Aggregator. It has two functions, one for initializing and one for
updating (or aggregating). The aggregator also has an environment.
This environment stores the data that is being aggregated.
For our cross--validation example the process goes as follows:
\begin{enumerate}
\item At each iteration Aggregate is called with the list of genes
selected.
\item For each gene in that list we check to see if it was selected
before.
\begin{enumerate}
\item If not then \verb+initfun+ is called with that gene name.
The value returned by \verb+initfun+ is then associated with
the gene name in the aggregation environment.
\item If so, then the current value is obtained and \verb+agfun+ is
called with the the gene name and the current value. This returns
a new value that is then associated with the gene name in the
aggregation environment.
\end{enumerate}
\end{enumerate}
Basically we are using this as a form of updating hash table.
At the same time we are slightly subverting R's usual pass--by--value
semantics.
%\section*{Resampling exprSet's}
%need to fill this in again
\section{Session Information}
The version number of R and packages loaded for generating the vignette were:
<<echo=FALSE,results=tex>>=
toLatex(sessionInfo())
@
\end{document}
|