% Source: /usr/lib/R/site-library/GenomicRanges/doc/ExtendingGenomicRanges.Rnw (r-bioc-genomicranges 1.30.0-1)

% \VignetteIndexEntry{5. Extending GenomicRanges}
% \VignetteDepends{GenomicRanges, VariantAnnotation}
% \VignetteKeywords{ranges}
% \VignettePackage{GenomicRanges}

\documentclass{article}

\usepackage[authoryear,round]{natbib}

<<style, echo=FALSE, results=tex>>=
BiocStyle::latex(use.unsrturl=FALSE)
@

\title{Extending \Biocpkg{GenomicRanges}}
\author{Michael Lawrence, Bioconductor Team}
\date{Edited: Oct 2014; Compiled: \today}

\begin{document}

\maketitle

\tableofcontents

<<options, echo=FALSE>>=
options(width=72)
options(showHeadLines=3)
options(showTailLines=3)
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

\section{Introduction}

The goal of \Biocpkg{GenomicRanges} is to provide general containers
for genomic data. The central class, at least from the user
perspective, is \Rclass{GRanges}, which formalizes the notion of
ranges, while allowing for arbitrary ``metadata columns'' to be
attached to it. These columns offer the same flexibility as the
venerable \Rclass{data.frame} and permit users to adapt
\Rclass{GRanges} to a wide variety of \textit{ad hoc} use cases.

The more we encounter a particular problem, the better we understand
it. We eventually develop a systematic approach for solving the most
frequently encountered problems, and every systematic approach
deserves a systematic implementation. For example, we might want to
formally store genetic variants, with information on alleles and read
depths. The metadata columns, which were so useful during prototyping,
are inappropriate for extending the formal semantics of our data
structure: for the sake of data integrity, we need to ensure that the
columns are always present and that they meet certain constraints.
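
To make the integrity problem concrete, the following sketch keeps
variant information in metadata columns. The column names
\texttt{ref}, \texttt{alt} and \texttt{depth} and the values are
assumptions chosen for illustration only. Nothing stops a later step
from dropping a column that downstream code relies on:
<<mcols-fragile>>=
library(GenomicRanges)
## Prototype: variant information kept in ad hoc metadata columns
## (column names and values are illustrative assumptions).
gr <- GRanges("chr1", IRanges(c(100, 200), width=1),
              ref=c("A", "C"), alt=c("T", "G"), depth=c(12L, 30L))
## Any code can silently remove a column.
mcols(gr)$alt <- NULL
"alt" %in% colnames(mcols(gr))
@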

We might also find that our prototype does not scale well to the
increased data volume that often occurs when we advance past the
prototype stage. \Rclass{GRanges} is meant mostly for prototyping and
stores its data in memory as simple R data structures. We may require
something more specialized when the data are large; for example, we
might store the data as a Tabix-indexed file, or in a database.

The \Biocpkg{GenomicRanges} package does not directly solve either of
these problems, because there are no general solutions. However, it is
adaptable to specialized use cases.

\section{The \Rclass{GenomicRanges} abstraction}

Unbeknownst to many, most of the \Rclass{GRanges} implementation is
provided by methods on the \Rclass{GenomicRanges} class, the virtual
parent class of \Rclass{GRanges}. \Rclass{GenomicRanges} methods
provide everything except for the actual data storage and retrieval,
which \Rclass{GRanges} implements directly using slots. For example,
the ranges are retrieved like this:
<<granges-ranges>>=
library(GenomicRanges)
selectMethod(ranges, "GRanges")
@ 

An alternative implementation is \Rclass{DelegatingGenomicRanges},
which stores all of its data in a delegate \Rclass{GenomicRanges}
object:
<<delegating-granges-ranges>>=
selectMethod(ranges, "DelegatingGenomicRanges")
@ 
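
The relationship between these classes can be checked with generic S4
introspection. This is a minimal sketch on a small, made-up
\Rclass{GRanges} object; it only inspects class definitions and does
not construct a \Rclass{DelegatingGenomicRanges}:
<<abstraction-check>>=
library(GenomicRanges)
gr <- GRanges("chr1", IRanges(start=c(1, 20), width=10))
## GenomicRanges is virtual, so only its subclasses can be instantiated.
isVirtualClass("GenomicRanges")
## A GRanges object is a GenomicRanges and inherits its methods.
is(gr, "GenomicRanges")
## DelegatingGenomicRanges adds a slot that holds the wrapped object.
getSlots("DelegatingGenomicRanges")["delegate"]
@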

This abstraction enables us to pursue more efficient implementations
for particular tasks. One example is \Rclass{GNCList}, which is
indexed for fast range queries. It keeps the original ranges in its
\texttt{granges} slot, exposed here:
<<gnclist-granges>>=
getSlots("GNCList")["granges"]
@ 
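
As a usage sketch, a \Rclass{GNCList} can stand in for a
\Rclass{GRanges} as the subject of an overlap query. The ranges below
are made up for illustration:
<<gnclist-overlaps>>=
library(GenomicRanges)
subject <- GRanges("chr1", IRanges(start=c(1, 50, 100), width=20))
query <- GRanges("chr1", IRanges(start=c(10, 110), width=5))
## Build the index once, then reuse it across queries.
gnc <- GNCList(subject)
hits <- findOverlaps(query, gnc)
hits
## The indexed object is itself a GenomicRanges.
is(gnc, "GenomicRanges")
@
Building the index has a cost, so this is likely to pay off mainly
when the same subject is queried repeatedly.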

The \Biocpkg{MutableRanges} package in svn provides other, untested
examples.

\section{Formalizing \texttt{mcols}: Extra column slots}

A problem orthogonal to data storage is adding semantics by
formalizing metadata columns. We solve it using the ``extra column
slot'' mechanism. Whenever \Rclass{GenomicRanges} needs to operate on
its metadata columns, it also consults the internal
\Rfunction{extraColumnSlotNames} generic. Methods on this generic
should return a character vector naming the slots in the
\Rclass{GenomicRanges} subclass that correspond to columns (i.e., they
have one value per range). \Rclass{GenomicRanges} extracts the slot
values and manipulates them as it would a metadata column, except
that they are now formal slots with formal types.
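
The mechanism can be sketched with a toy subclass. The class name
\Rclass{DepthRanges} and its \texttt{depth} slot are hypothetical, and
the sketch assumes that a method on the internal generic can be
registered from the workspace by reaching it through \texttt{:::}:
<<extra-column-slot-sketch>>=
library(GenomicRanges)
## Hypothetical subclass with one formal, typed per-range slot.
setClass("DepthRanges", contains="GRanges",
         representation(depth="integer"))
## Declare which slots behave as columns (one value per range).
setMethod(GenomicRanges:::extraColumnSlotNames, "DepthRanges",
          function(x) "depth")
dr <- new("DepthRanges",
          GRanges("chr1", IRanges(c(1, 10, 20), width=5)),
          depth=c(3L, 8L, 5L))
## Subsetting the ranges subsets the slot along with them.
sub <- dr[2:3]
sub@depth
@
Unlike a metadata column, the \texttt{depth} slot cannot be removed
from an object, and its declared type is enforced on assignment.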

An example is the \Rclass{VRanges} class in
\Biocpkg{VariantAnnotation}. It stores information on the variants by
adding these column slots:
<<vranges>>=
GenomicRanges:::extraColumnSlotNames(VariantAnnotation::VRanges())
@ 
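
As a brief usage sketch, the column slots of a \Rclass{VRanges} are
filled by the constructor and read back through accessors. The two
variants below are invented for illustration:
<<vranges-accessors>>=
library(VariantAnnotation)
## Two made-up single-nucleotide variants.
vr <- VRanges(seqnames=c("chr1", "chr1"),
              ranges=IRanges(c(100, 200), width=1),
              ref=c("A", "C"), alt=c("T", "G"),
              totalDepth=c(20L, 35L), refDepth=c(12L, 30L),
              altDepth=c(8L, 5L))
## The column slots are read through accessors.
ref(vr)
altDepth(vr)
## They follow the ranges when the object is subset.
ref(vr[2])
@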

Mostly for historical reasons, \Rclass{VRanges} extends
\Rclass{GRanges}. However, since the data storage mechanism and the
set of extra column slots are orthogonal, it is probably best practice
to take a composition approach by extending
\Rclass{DelegatingGenomicRanges}. 

\end{document}