/usr/share/doc/ray/Documentation/NCBI-Taxonomy.txt is in ray-doc 2.3.1-2build2.
This file is owned by root:root, with mode 0o644.
Ray can be utilized to classify k-mers in a taxonomy. To do so,
Ray needs a taxonomy.
For general documentation about the graph coloring and taxonomic profiling
features (called Ray Communities), see:
- Documentation/Taxonomy.txt
- Documentation/BiologicalAbundances.txt
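To download the NCBI taxonomy and generate the required files,
first add the scripts/NCBI-Taxonomy directory of your Ray clone to your PATH
(adjust the example path below to your own clone):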
export PATH=/home/boiseb01/git-clones/ray/scripts/NCBI-Taxonomy/:$PATH
Then, run this:
CreateRayInputStructures.sh
This will generate a directory with these files:
- NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
- NCBI-taxonomy/Genome-to-Taxon.tsv
- NCBI-taxonomy/TreeOfLife-Edges.tsv
- NCBI-taxonomy/Taxon-Names.tsv
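
Before launching Ray, it can help to confirm that all four entries listed
above are present. The following is a minimal shell sketch, assuming the
output directory is NCBI-taxonomy in the current directory. The function
name check_taxonomy_files is hypothetical and is not part of Ray.

```shell
# Hypothetical helper (not part of Ray): verify the generated taxonomy files.
# Takes the output directory as an optional argument (default: NCBI-taxonomy).
check_taxonomy_files() {
    dir="${1:-NCBI-taxonomy}"
    missing=0
    for name in NCBI-Finished-Bacterial-Genomes Genome-to-Taxon.tsv \
                TreeOfLife-Edges.tsv Taxon-Names.tsv; do
        # -e accepts both regular files and directories
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: report a hint if anything is missing.
check_taxonomy_files NCBI-taxonomy || echo "run CreateRayInputStructures.sh first" >&2
```

The function returns a non-zero status if any entry is missing, so it can
also guard the mpiexec command in a job script.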
Now, you can run Ray as usual, but with additional options
to run Ray Communities plugins as well:
mpiexec -n 96 \
    Ray \
    -k 31 -o Ray-Communities \
    -p SeqA_1.fastq SeqA_2.fastq \
    -p SeqB_1.fastq SeqB_2.fastq \
    -search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes \
    -with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv \
    NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv
As usual, you can also put all the arguments in a configuration file like this:
mpiexec -n 96 Ray Ray.conf
where Ray.conf contains:
-k 31 -o Ray-Communities
-p SeqA_1.fastq SeqA_2.fastq
-p SeqB_1.fastq SeqB_2.fastq
-search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
-with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv
NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv
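
To keep a run reproducible, the configuration file can be written from a
script rather than by hand. The following is a minimal shell sketch that
writes exactly the Ray.conf contents shown above. The function name
write_ray_conf is hypothetical and is not part of Ray.

```shell
# Hypothetical helper (not part of Ray): write the configuration shown above.
# The optional argument is the output path (default: Ray.conf).
write_ray_conf() {
    cat > "${1:-Ray.conf}" <<'EOF'
-k 31 -o Ray-Communities
-p SeqA_1.fastq SeqA_2.fastq
-p SeqB_1.fastq SeqB_2.fastq
-search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
-with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv
NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv
EOF
}

# Usage: create the file, then launch Ray with it.
# write_ray_conf Ray.conf
# mpiexec -n 96 Ray Ray.conf
```

The here-document is quoted ('EOF') so the shell writes the lines verbatim,
without expanding anything inside them.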