/usr/lib/R/site-library/snpStats/doc/tdt-vignette.Rnw is in r-bioc-snpstats 1.24.0+dfsg-1.

%\documentclass[a4paper,12pt]{article}
\documentclass[12pt]{article}
\usepackage{fullpage}
% \usepackage{times}
%\usepackage{mathptmx}
%\renewcommand{\ttdefault}{cmtt}
\usepackage{graphicx}

\usepackage[pdftex,
bookmarks,
bookmarksopen,
pdfauthor={David Clayton},
pdftitle={TDT and snpStats Vignette}]
{hyperref}

\title{TDT vignette\\Use of snpStats in family--based studies}
\author{David Clayton}
\date{\today}

\usepackage{Sweave}
\SweaveOpts{echo=TRUE, pdf=TRUE, eps=FALSE}

\begin{document}
\setkeys{Gin}{width=1.0\textwidth}

%\VignetteIndexEntry{TDT tests}
%\VignettePackage{snpStats}

\maketitle


\section*{Pedigree data}

The {\tt snpStats} package contains some tools for analysis of
family-based studies. These assume that a subject support file
provides the information necessary to reconstruct pedigrees in the
well-known format used in the {\it LINKAGE} package. Each line of the
support file  must contain an identifier of the {\em pedigree} to which
the individual belongs, together with an identifier of subject within pedigree,
and the within-pedigree identifiers for the subject's father and
mother. Usually this information, together with phenotype data, will
be contained in a dataframe with rownames which link to the rownames
of the {\tt SnpMatrix} containing the genotype data. The following
commands read some illustrative data on 3,017 subjects and 43
(autosomal) SNPs\footnote{These data are on a much smaller scale than
  would arise in genome-wide studies, but serve to illustrate the
  available tools. Note, however, that execution speeds are quite adequate for
  genome-wide data.}. The data consist of a dataframe containing the
subject and pedigree information ({\tt pedData}) and a {\tt
  SnpMatrix} containing the genotype data ({\tt genotypes}):
<<family-data>>=
require(snpStats)
data(families)
genotypes
head(pedData)
@ 
The first family comprises four individuals: two parents and two
sibling offspring. The parents are ``founders'' in the pedigree, {\it
  i.e.}\ there are no data for their parents, so that their {\tt father}
and {\tt mother} identifiers are set to {\tt NA}. This differs from
the convention in the {\it LINKAGE} package, which would code these as
zero. Otherwise coding is as in {\it LINKAGE}: {\tt sex} is coded 1 for
  male and 2 for female, and disease status ({\tt affected}) is coded
  1 for unaffected and 2 for affected.
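
A pedigree file that follows the {\it LINKAGE} convention therefore needs
its founder codes converted before use. The following is a minimal
sketch using an invented four-person family; the column names mimic
those of {\tt pedData}, but {\tt member} in particular is an assumption
made here for illustration:
<<linkage-recode>>=
## Toy pedigree in LINKAGE coding: founders have father = mother = 0.
## All values are invented for illustration.
ped <- data.frame(familyid = c(1, 1, 1, 1),
                  member   = c(1, 2, 3, 4),
                  father   = c(0, 0, 1, 1),
                  mother   = c(0, 0, 2, 2),
                  sex      = c(1, 2, 1, 2),
                  affected = c(1, 1, 2, 2))
## Recode the founders' parent identifiers from 0 to NA
ped$father[ped$father == 0] <- NA
ped$mother[ped$mother == 0] <- NA
ped
@ 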
  
\section*{Checking for mis-inheritances}

The function {\tt misinherits} identifies non-Mendelian inheritances in
the data. It returns a logical matrix with one row for each subject
who has any mis-inheritances and one column for each SNP that was
mis-inherited at least once.
<<mis-inheritances>>=
mis <- misinherits(data=pedData, snp.data=genotypes)
dim(mis)
@ 
Thus, 114 of the subjects and 37 of the SNPs had at least one
mis-inheritance. The following commands count mis-inheritances per
subject and per SNP, and plot the frequency distribution of each:

<<per-subj-snp,fig=TRUE>>=
per.subj <- apply(mis, 1, sum, na.rm=TRUE)
per.snp <- apply(mis, 2, sum, na.rm=TRUE)
par(mfrow = c(1, 2))
hist(per.subj, main='Mis-inheritances per subject',
     xlab='Number of mis-inheritances')
hist(per.snp, main='Mis-inheritances per SNP',
     xlab='Number of mis-inheritances')
@ 

Note that mis-inheritances must be ascribed to offspring, although the
error may lie with the parent data. The following commands first
extract the pedigree identifiers for mis-inheriting subjects and go on
to chart the numbers of mis-inheritances per family:
<<per-family,fig=TRUE>>=
fam <- pedData[rownames(mis), "familyid"]
per.fam <- tapply(per.subj, fam, sum)
par(mfrow = c(1, 1))
hist(per.fam, main='Mis-inheritances per family',
     xlab='Number of mis-inheritances')
@ 
None of the above analyses suggests serious problems with the data,
although there are clearly a few genotyping errors.
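
To make concrete what is meant by a mis-inheritance, the following
sketch checks a single parent--offspring trio at one biallelic SNP. It
is an illustration only and not the method used by {\tt misinherits}:
the function name {\tt mendel.ok} and the coding of genotypes as 0, 1 or
2 copies of one allele are assumptions made here, and missing genotypes
are not handled.
<<mendel-toy>>=
## TRUE if the child's genotype can arise from the parents' genotypes.
## Genotypes are counts (0, 1 or 2) of one of the two alleles.
mendel.ok <- function(father, mother, child) {
  ## Copies a parent carrying 0, 1 or 2 copies can transmit
  gametes <- list(0, c(0, 1), 1)
  ## Every child genotype reachable from this pair of parents
  possible <- outer(gametes[[father + 1]], gametes[[mother + 1]], "+")
  child %in% possible
}
mendel.ok(0, 1, 1)  # consistent: the mother transmits her one copy
mendel.ok(0, 0, 1)  # inconsistent: neither parent carries the allele
mendel.ok(2, 2, 1)  # inconsistent: the child must carry two copies
@ 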

\section*{TDT tests}

At present, the package only allows testing of
discrete disease phenotypes in case--parent trios; this is essentially the
Transmission/Disequilibrium Test (TDT). It is carried out by the
function {\tt tdt.snp}, which returns the same class of object as that
returned by {\tt single.snp.tests}; allelic (1~df) and genotypic
(2~df) tests are computed. The following commands compute the tests,
display the $p$-values, and draw a quantile--quantile plot of the 1~df
chi-squared statistics:
<<tdt-tests,fig=TRUE,keep.source=TRUE>>=
tests <- tdt.snp(data = pedData, snp.data = genotypes)
cbind(p.values.1df = p.value(tests, 1),
      p.values.2df = p.value(tests, 2))
qq.chisq(chi.squared(tests, 1), df = 1)
@ 
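For reference, the allelic (1~df) test has a simple classical form. As a
sketch, and not necessarily the exact computation performed by
{\tt tdt.snp}, suppose that among parents who are heterozygous at a SNP,
$b$ transmit one allele to an affected offspring and $c$ transmit the
other (the symbols $b$ and $c$ are introduced here for illustration).
Under the null hypothesis each allele is transmitted with probability
$1/2$, and the McNemar-type statistic
\[
  \chi^2_{\mathrm{TDT}} = \frac{(b - c)^2}{b + c}
\]
is referred to a chi-squared distribution on 1~df. For example, $b = 60$
and $c = 40$ give $(60 - 40)^2 / (60 + 40) = 4$.
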
Since these SNPs were all in a region of known association, the
overdispersion of test statistics is not surprising. Note that,
because each family had two affected offspring, there were twice as
many parent-offspring trios as families. In the above tests, the
contributions of the two trios in each family to the test statistic
have been assumed to be independent. When there is {\em linkage}
between the genetic locus and disease trait, this assumption is
incorrect and an alternative variance estimate can be used by
specifying {\tt robust=TRUE} in the call. However, in practice,
linkage is very rarely strong enough to require this correction.
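
The robust variance estimate is requested as follows. This sketch
reloads the package and data so that it can be run on its own; the
object name {\tt tests.robust} and the column labels are chosen here
for illustration. The two columns of $p$-values can then be compared
directly:
<<tdt-robust,keep.source=TRUE>>=
## Repeated here so that the chunk can be run on its own
require(snpStats)
data(families)
## Standard test, treating the two trios in each family as independent
tests <- tdt.snp(data = pedData, snp.data = genotypes)
## Same test with the alternative (robust) variance estimate
tests.robust <- tdt.snp(data = pedData, snp.data = genotypes,
                        robust = TRUE)
cbind(standard = p.value(tests, 1),
      robust   = p.value(tests.robust, 1))
@ 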
\end{document}