Fraggle Readme
--------------

This directory contains the scripts used to run Fraggle.

The algorithm used in the scripts was described at the 2nd RDKit UGM (October 
2013). The presentation can be found at:
https://github.com/rdkit/UGM_2013/blob/master/Presentations/Hussain.Fraggle.pdf

The benchmarking carried out in the presentation utilised the open source
benchmarking platform described in:

Riniker, Sereina, and Gregory A. Landrum. "Open-source platform to benchmark
fingerprints for ligand-based virtual screening." Journal of cheminformatics 5.1
(2013): 26.

The following scripts were added to that platform for the benchmarking:

fraggle.py
cxn_tversky.py
atomcontrib.py
calculate_scored_lists_mod.py

The information below describes how to run the Fraggle similarity algorithm with
a query compound against a file of database compounds.

How to run Fraggle:
-------------------

Fraggle works in three steps:

1) Fragment your query molecule(s)
2) Run a Tversky search using the generated fragments
3) Post-process the results of the Tversky search to give the final output

It is recommended to run a standard RDK5 similarity search alongside Fraggle.

The scripts require RDKit (www.rdkit.org) to be installed and properly
configured. Help is available for all the scripts using the -h option.

Step 1
------

Command:
python fraggle.py <QUERY_FILE >FRAGGLE_FRAGMENTS

Example command:
python fraggle.py < data/query.smi > data/query_fragmentation.csv

Format of QUERY_FILE: SMILES ID (space or comma separated)
See query.smi for an example input file
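
As an illustration of the input format, the sketch below splits one record
into its SMILES and ID. It is not taken from fraggle.py; the function name
parse_smiles_line is hypothetical, and it assumes that the first comma or run
of whitespace separates the two fields.

```python
import re


def parse_smiles_line(line):
    """Split one 'SMILES ID' record into (smiles, mol_id).

    Hypothetical helper: assumes the first comma or run of whitespace is the
    field separator, as described for QUERY_FILE.
    """
    fields = re.split(r"[,\s]+", line.strip(), maxsplit=1)
    if len(fields) != 2 or not fields[0] or not fields[1]:
        raise ValueError("expected 'SMILES ID', got: %r" % line)
    return fields[0], fields[1]


if __name__ == "__main__":
    # Both separators give the same (smiles, id) pair layout.
    print(parse_smiles_line("c1ccccc1O phenol"))
    print(parse_smiles_line("CCO,ethanol"))
```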

Format of FRAGGLE_FRAGMENTS: whole mol smiles,ID,fraggle split smiles
See query_fragmentation.csv for an example output file

The following help is available using the -h option:

Program to run the first part of Fraggle. Program splits the molecule
ready for the search

USAGE: ./fraggle.py <file_of_smiles
Format of smiles file: SMILES ID (space or comma separated)
Output: whole mol smiles,ID,fraggle split smiles

Step 2
------

The second step is to take the fragments generated in step 1 and run a Tversky
search against your database of molecules. This is the rate-determining step of
the algorithm, so it is recommended to do this against a database with an
appropriate chemistry cartridge. However, a Python script is provided which
utilises RDKit.

The script uses a default Tversky cut-off of 0.8 (alpha=0, beta=1), which seems
to work reasonably well for the RDK5 fingerprint.
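
For reference, the Tversky index of two feature sets A and B is
|A&B| / (|A&B| + alpha*|A-B| + beta*|B-A|). With alpha=0 and beta=1 it reduces
to the fraction of B's features that also occur in A. The sketch below
computes it on plain Python sets rather than RDKit fingerprints. Which of the
fragment and the database molecule plays the role of A or B in the script is
not stated here, so the argument order in the example is an assumption.

```python
def tversky(a_bits, b_bits, alpha, beta):
    """Tversky index of two collections of set bits (illustrative only)."""
    a_bits, b_bits = set(a_bits), set(b_bits)
    common = len(a_bits & b_bits)
    denom = (common
             + alpha * len(a_bits - b_bits)
             + beta * len(b_bits - a_bits))
    # Two empty sets share nothing, so report 0.0 instead of dividing by zero.
    return common / denom if denom else 0.0


if __name__ == "__main__":
    molecule = {1, 2, 3, 4, 5}   # made-up bit positions (assumed role: A)
    fragment = {1, 2, 3, 9}      # made-up bit positions (assumed role: B)
    # alpha=0, beta=1: 3 of the 4 fragment bits are present in the molecule.
    print(tversky(molecule, fragment, alpha=0, beta=1))
```

With alpha=1 and beta=1 the same function gives the Tanimoto coefficient.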

Command:
python rdkit_tversky.py -f FRAGGLE_FRAGMENTS <DB_SMILES_FILE >TVERSKY_OUTPUT

Example command:
python rdkit_tversky.py -f data/query_fragmentation.csv < data/ChEMBL_11265_actives.smi > data/fragmentation_tversky_out

Format of FRAGGLE_FRAGMENTS file is: whole mol smiles,ID,fraggle split smiles
See query_fragmentation.csv for an example file

Format of DB_SMILES_FILE: SMILES ID (space or comma separated)
See ChEMBL_11265_actives.smi for an example file

Format of TVERSKY_OUTPUT: query_frag_smiles,query_smiles,query_id,retrieved_smi,retrieved_id,tversky_sim
See fragmentation_tversky_out for an example file
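
If the Tversky output needs to be inspected before step 3, the six columns
can be read with the standard csv module. The sketch below is one way to do
that; the names TverskyHit and read_tversky_hits are hypothetical, the sample
rows are made up, and it assumes any header or malformed row can be skipped.

```python
import csv
import io
from collections import namedtuple

# Field names follow the TVERSKY_OUTPUT format described above.
TverskyHit = namedtuple(
    "TverskyHit",
    "query_frag_smiles query_smiles query_id "
    "retrieved_smi retrieved_id tversky_sim",
)


def read_tversky_hits(handle, cutoff=0.8):
    """Return the rows whose similarity is greater than the cutoff."""
    hits = []
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip blank or malformed rows
        try:
            sim = float(row[5])
        except ValueError:
            continue  # skip a header row, if one is present
        if sim > cutoff:
            hits.append(TverskyHit(*row[:5], sim))
    return hits


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = io.StringIO(
        "[*]c1ccccc1,Oc1ccccc1,q1,Cc1ccccc1,db1,0.95\n"
        "[*]O,Oc1ccccc1,q1,CCO,db2,0.5\n"
    )
    for hit in read_tversky_hits(sample):
        print(hit.retrieved_id, hit.tversky_sim)
```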

The following help is available using the -h option:
Usage: rdkit_tversky.py [options]

Program to Tversky search results as part of Fraggle

Options:
  -h, --help            show this help message and exit
  -f F_FILE, --frags=F_FILE
                        File containing the query fragmentations from Fraggle
  -c CUTOFF, --cutoff=CUTOFF
                        Cutoff for Tversy similarity. Only Tversky results
                        with similarity greater than the cutoff will be
                        output. DEFAULT = 0.8

Format of input file: whole mol smiles,ID,fraggle split smiles
Output:
query_frag_smiles,query_smiles,query_id,retrieved_smi,retrieved_id,tversky_sim

Step 3
------

The last step is to perform the post-processing to give you the final Fraggle
similarity.

Command:
python atomcontrib.py <TVERSKY_OUTPUT >FINAL_FRAGGLE_RESULTS

Example command:
python atomcontrib.py < data/fragmentation_tversky_out > data/final_fraggle_results.csv

Format of TVERSKY_OUTPUT file is:
query_frag_smiles,query_smiles,query_id,retrieved_smi,retrieved_id,tversky_sim
See fragmentation_tversky_out for an example file

Format of FINAL_FRAGGLE_RESULTS:
SMILES,ID,QuerySMI,QueryID,Fraggle_Similarity,RDK5_Similarity
See final_fraggle_results.csv for an example file
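
A common follow-up is to rank the final results by Fraggle similarity. The
sketch below does this with the standard library only. The function name
rank_results is hypothetical, the sample rows are made up, and whether the
real output file starts with a header row is not stated here, so the code
tolerates both cases.

```python
import csv
import io

# Column names follow the FINAL_FRAGGLE_RESULTS format described above.
COLUMNS = ["SMILES", "ID", "QuerySMI", "QueryID",
           "Fraggle_Similarity", "RDK5_Similarity"]


def rank_results(handle):
    """Return result rows as dicts, highest Fraggle similarity first."""
    rows = []
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        try:
            row["Fraggle_Similarity"] = float(row["Fraggle_Similarity"])
            row["RDK5_Similarity"] = float(row["RDK5_Similarity"])
        except (TypeError, ValueError):
            continue  # header row or malformed line
        rows.append(row)
    return sorted(rows, key=lambda r: r["Fraggle_Similarity"], reverse=True)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = io.StringIO(
        "SMILES,ID,QuerySMI,QueryID,Fraggle_Similarity,RDK5_Similarity\n"
        "Cc1ccccc1,db1,Oc1ccccc1,q1,0.75,0.6\n"
        "Nc1ccccc1,db2,Oc1ccccc1,q1,0.9,0.7\n"
    )
    for row in rank_results(sample):
        print(row["ID"], row["Fraggle_Similarity"], row["RDK5_Similarity"])
```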

This program has several options (see help from program below):

Usage: atomcontrib.py [options]

Program to post-process Tversky search results as part of Fraggle

Options:
  -h, --help            show this help message and exit
  -c CUTOFF, --cutoff=CUTOFF
                        Cutoff for fraggle similarity. Only results with
                        similarity greater than the cutoff will be output.
                        DEFAULT = 0.7
  -p PFP, --pfp=PFP     Cutoff for partial fp similarity. DEFAULT = 0.8

Format of input file:
query_frag_smiles,query_smiles,query_id,retrieved_smi,retrieved_id,tversky_sim
Output: SMILES,ID,QuerySMI,QueryID,Fraggle_Similarity,RDK5_Similarity
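
The three steps can also be chained from a small driver script. The sketch
below is one possible way to do that and is not part of the Fraggle scripts.
The function names and the intermediate file names are hypothetical, and it
assumes fraggle.py, rdkit_tversky.py and atomcontrib.py are in the current
working directory.

```python
import subprocess
import sys


def fraggle_steps(query, database, prefix):
    """Return (argv, stdin_path, stdout_path) for each of the three steps.

    The output file names built from 'prefix' are illustrative choices.
    """
    frags = prefix + "_fragmentation.csv"
    hits = prefix + "_tversky_out"
    final = prefix + "_final_fraggle_results.csv"
    python = sys.executable
    return [
        ([python, "fraggle.py"], query, frags),
        ([python, "rdkit_tversky.py", "-f", frags], database, hits),
        ([python, "atomcontrib.py"], hits, final),
    ]


def run_fraggle(query, database, prefix):
    """Run the steps in order, redirecting stdin and stdout as in the README."""
    for argv, stdin_path, stdout_path in fraggle_steps(query, database, prefix):
        with open(stdin_path) as src, open(stdout_path, "w") as dst:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(argv, stdin=src, stdout=dst, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 4:
        run_fraggle(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: driver.py QUERY_FILE DB_SMILES_FILE OUTPUT_PREFIX")
```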