/usr/share/doc/blast2/impala.html is in blast2 1:2.2.25.20110713-3ubuntu2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st October 2002), see www.w3.org" />
<title></title>
</head>
<body>
<pre>
IMPALA: Integrating Matrix Profiles And Local Alignments
1. Files in Distribution
The following IMPALA source code files are distributed:
copymat.c
impatool.c
makemat.c
posit2.c
profiles.c
newkar.c
profiles.h
Makefile
2. Compilation
Run the following commands in the directory, containing IMPALA source code files:
make makemat
make copymat
make impala
This will result in three binary executable files:
makemat : primary profile preprocessor
(converts a collection of binary profiles, created by the -C option
of PSI-BLAST, into portable ASCII form);
copymat : secondary profile preprocessor
(converts ASCII matrices, produced by the primary preprocessor,
into database that can be read into memory quickly);
impala : search program (searches a database of score
matrices, prepared by copymat, producing BLAST-like output).
3. Conversion of profiles into searchable database
3.1. Primary preprocessing
Prepare the following files:
i. a collection of PSI-BLAST-generated profiles with arbitrary
names and suffix .chk;
ii. a collection of "profile master sequences", associated with
the profiles, each in a separate file with arbitrary name and a 3 character
suffix starting with c;
the sequences can have deflines; they need not be sequences in nr or
in any other sequence database; if the sequences have deflines, then
the deflines must be unique.
iii. a list of profile file names, one per line, named
<database_name>.pn;
iv. a list of master sequence file names, one per line, in the same
order as a list of profile names, named
<database_name>.sn;
The following files will be created:
i. a collection of ASCII files, corresponding to each of the
original profiles, named
<profile_name>.mtx;
ii. a list of ASCII matrix files, named
<database_name>.mn;
iii. ASCII file with auxiliary information, named
<database_name>.aux;
Arguments to makemat:
-P database name (required)
-G Cost to open a gap (optional)
default = 11
-E Cost to extend a gap (optional)
default = 1
-U Underlying amino acid scoring matrix (optional)
default = BLOSUM62
-d Underlying sequence database used to create profiles (optional)
default = nr
-z Effective size of sequence database given by -d
default = current size of -d option
Note: It may make sense to use -z without -d when the
profiles were created with an older, smaller version of an
existing database
-S Scaling factor for matrix outputs to avoid round-off problems
default = PRO_DEFAULT_SCALING_UP (currently defined as 100)
Use 1.0 to have no scaling
Output scores will be scaled back down to a unit scale to make
them look more like BLAST scores, but we found working with a larger
scale to help with roundoff problems.
-H get help (overrides all other arguments)
Note: It is not enforced that the values of -G and -E passed to makemat
were actually used in making the checkpoints. However, the values fed
in to makemat are propagated to copymat and impala.
3.1. Secondary preprocessing
Prepare the following files:
i. a collection of ASCII files, corresponding to each of the
original profiles, named
<profile_name>.mtx
(created by makemat);
ii. a collection of "profile master sequences", associated with
the profiles, each in a separate file with arbitrary name and a 3 character
suffix starting with c.
iii. a list of ASCII_matrix files, named
<database_name>.mn
(created by makemat);
iv. a list of master sequence file names, one per
line, in the same order as a list of matrix names, named
<database_name>.sn;
v. ASCII file with auxiliary information, named
<database_name>.aux
(created by makemat);
The files input to copymatices are in ASCII format and thus portable
between machines with different encodings for machine-readable files
The following files will be created:
i. a huge binary file, containing all profile matrices, named
<database_name>.mat;
Arguments to copymat
-P database name (required)
-H get help (overrides all other arguments)
4. Search
Before you start searching, check that you have copies of or soft
links to all the files associated with the PSSM library. If the
library has K PSSMs, you should have
K files with names ending in .mtx
K files with names ending in a 3-letter extension starting with c
1 file with name ending in .pn
1 file with name ending in .sn
1 file with name ending in .aux
1 file with name ending in .mn
1 file with name ending in .mat
Arguments to impala
-i query sequence file (required)
-P database of profiles (required)
-o output file (optional)
default = stdout
-e Expectation value threshold (E), (optional, same as for BLAST)
default = 10
-m alignment view (optional, same as for BLAST)
-z effective length of database (optional)
-1 = length given via -z option to makemat
default (0) implies length is actual length of profile library
adjusted for end effects
-H get help (overrides all other options)
5. Directory convention
Since IMPALA requires a large number of files, it may be convenient
to store your impala files in various directories. For copymat,
makemat, and impala the following parsing convention applies
to the string that follows the -P argument.
If the string starts with a '/', then it is deemed to be a full
path name. Whatever prefix occurs upto and including the rightmost
'/' is deemed to be a prefix that should be prepended to all
file names in the .sn, .pn, and .mn files.
Example: If you call any of the 3 programs including the
argument -P /foo/bar/wolf1187
then
/foo/bar/ is prepended to every filename listed in
wolf1187.pn
wolf1187.sn
wolf1187.mn
before opening the file, but the files
wolf1187.pn
wolf1187.sn
wolf1187.mn
themselves are not changed.
6. Output
IMPALA output closely mimics output of BLASTP family programs and
should be compatible with SEALS BLAST parsers.
Send suggestions, comments, complaints only to Alejandro Schaffer
schaffer@helix.nih.gov
Reference:
Schaffer, A.A., Wolf, Y.I., Ponting, C.P. Koonin, E.V.,
Aravind, L., Altschul, S. F., IMPALA: Matching a Protein Sequence
Against a Collection of PSI-BLAST-Constructed Position-Specific
Score Matrices, Bioninformatics, to appear.
Please cite the above paper if you publish any results computed by IMPALA.
</pre>
</body>
</html>
|