Substructure searching

Written by Jiankun Lyu, 2017/09/13

The directory hierarchy:

substructure_searching
      |
      |------ working
      |          |------ ZINC-downloader-2D-smi.database_index
      |          |------ sub_pattern.smarts
      |
      |------ scripts
                 |------ submit.csh

1) Create the directories shown above.

mkdir substructure_searching
cd substructure_searching
mkdir working
mkdir scripts

2) Download the database index from ZINC

2.1) Go to ZINC

2.2) Choose the tranches you want to search for substructures

2.3) Download the database index file

2.4) Save the downloaded file as ZINC-downloader-2D-smi.database_index, then upload it to the working directory

3) Copy the scripts from my directory.

cd scripts
cp /mnt/nfs/home/jklyu/zzz.script/analogs_searching/ .
cp /mnt/nfs/home/jklyu/zzz.script/analogs_searching/multi_sub_searching/submit.csh .
cd ../

4) Put the SMARTS patterns you want to search for in the sub_pattern.smarts file, one per line, and give each pattern a unique number or name

Here is an example of the contents of sub_pattern.smarts:
NS(=O)(=O)c1cccc([F,Cl,Br,I])c1[OD1] 1
NS(=O)(=O)c1cc([F,Cl,Br,I])ccc1[OD1] 2
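Because each pattern needs a unique number or name, it can help to check the file before submitting jobs. The following is a minimal sketch and not one of the original scripts: the helper name read_patterns is hypothetical, and it assumes each non-empty line holds a SMARTS pattern followed by whitespace and its label, as in the example above.

```python
def read_patterns(lines):
    """Parse 'SMARTS label' lines and reject duplicate labels.

    Assumes the layout of the example above: a pattern, whitespace, a label.
    """
    patterns = {}
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 'SMARTS label', got {line!r}"
            )
        smarts, label = fields
        if label in patterns:
            raise ValueError(f"line {line_number}: duplicate label {label!r}")
        patterns[label] = smarts
    return patterns


example = [
    "NS(=O)(=O)c1cccc([F,Cl,Br,I])c1[OD1] 1",
    "NS(=O)(=O)c1cc([F,Cl,Br,I])ccc1[OD1] 2",
]
patterns = read_patterns(example)
print(sorted(patterns))
```

To check a real file, an open file handle can be passed instead of the list, for example read_patterns(open("sub_pattern.smarts")).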

5) Split the ZINC-downloader-2D-smi.database_index file into chunks

cd working
python ../scripts/ . sub_searching_  ZINC-downloader-2D-smi.database_index number_of_chunks(change it to real number) count

I suggest about 3 SMILES files per chunk, so set number_of_chunks to roughly the number of SMILES files in your chosen tranches divided by 3.
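As a rough illustration of that sizing rule, the sketch below computes a chunk count from the number of entries in the index file. It assumes the index lists one SMILES file per non-empty line, which is not confirmed here, and the function name suggested_chunks is hypothetical.

```python
import math


def suggested_chunks(index_lines, files_per_chunk=3):
    """Return how many chunks give about files_per_chunk entries each.

    Assumption: every non-empty line of the index refers to one SMILES file.
    """
    n_files = sum(1 for line in index_lines if line.strip())
    return max(1, math.ceil(n_files / files_per_chunk))


# Ten placeholder entries stand in for the real index contents.
fake_index = [f"entry_{i}\n" for i in range(10)]
print(suggested_chunks(fake_index))
```

For a real run, the lines of ZINC-downloader-2D-smi.database_index would be passed in place of the placeholder list, and the result used as number_of_chunks.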

6) Submit substructure searching jobs

csh ../scripts/submit.csh full_path_of_sub_pattern.smarts

7) Collect results

cat sub_searching_*/*.extract.output.smi > output.smi
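The same collection step can be sketched in Python, which also makes it easy to see how many chunk outputs were found. This is an illustrative equivalent of the cat command above, not one of the original scripts; the function name collect_results is hypothetical.

```python
import glob
import os


def collect_results(working_dir=".", output_name="output.smi"):
    """Concatenate every sub_searching_*/*.extract.output.smi into one file.

    Returns the sorted list of per-chunk files that were merged.
    """
    pattern = os.path.join(
        working_dir, "sub_searching_*", "*.extract.output.smi"
    )
    parts = sorted(glob.glob(pattern))
    with open(os.path.join(working_dir, output_name), "w") as merged:
        for part in parts:
            with open(part) as handle:
                merged.write(handle.read())
    return parts


if __name__ == "__main__":
    merged_files = collect_results()
    print(f"merged {len(merged_files)} chunk outputs into output.smi")
```

If the reported count is lower than the number of chunk directories, some jobs may not have written their output yet.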