How to do parallel search of smi files on the cluster


This tutorial shows how to run a parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel. Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster, but searching with qsub generally scales well.

Create a folder with the following files and scripts.
The submission script contains the bash code for qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function. An example is shown below.


# Arguments of the qsub-mr command, in order:
#   -l 5        The number of lines of the input file handled by each task (here 5)
#   -N test     The name of the job
#   input.txt   The file listing the input file names and directory
#   ./          The search function to be performed
#   -q "..."    Parameter for the search function: the input query for searching
/nfs/soft/tools/utils/qsub-slice/qsub-mr \
    -l 5 \
    -N test \
    input.txt \
    ./ \
    -q "CS(=O)(=O)CCNCc1ccccc1"


input.txt lists the input file names and their directory, one file per line. You can use ls *.smi > input.txt to generate this file.
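As a minimal sketch of this step, the snippet below builds input.txt in a scratch directory and estimates how many tasks a setting of -l 5 would produce. The sample file names (part1.smi and so on) and their contents are hypothetical, and the task count assumes that qsub-mr splits input.txt into consecutive slices of 5 lines, which this page does not spell out.

```shell
# Scratch directory with hypothetical sample files (names and contents are illustrative only)
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7; do
    printf 'CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280\n' > "part$i.smi"
done

# One smi file name per line
ls *.smi > input.txt

# Assumption: with -l 5, each task receives up to 5 lines of input.txt,
# so the number of tasks is the line count divided by 5, rounded up
lines_per_task=5
n_files=$(wc -l < input.txt)
n_tasks=$(( (n_files + lines_per_task - 1) / lines_per_task ))
echo "$n_files files -> $n_tasks tasks"
```

A smaller -l value gives more, shorter tasks; a larger value gives fewer, longer ones.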


The search function is the script that qsub runs on each task. Its core is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for the smi file to speed up searching. The search function requires an input query, passed with -q. An example is shown below.

-q "CS(=O)(=O)CCNCc1ccccc1"


Run the submission script to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of your jobs. For more information, please refer to qstat.

qstat                         # Check the status of jobs; an example is shown below.

-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4
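To see at a glance how many tasks of a job are still running, the qstat listing can be filtered with awk. The sketch below is one way to do it; count_running is a hypothetical helper, not part of qsub-mr, and it assumes the column layout shown above (job name in column 3, state in column 5).

```shell
# Count the tasks of a named job that are in the running state ("r").
# Reads qstat-style text on stdin; column 3 is the job name, column 5 the state.
count_running() {
    awk -v name="$1" '$3 == name && $5 == "r" { n++ } END { print n + 0 }'
}

# Sample copied from the qstat output above
sample='job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4'

running=$(printf '%s\n' "$sample" | count_running test-map)
echo "running tasks: $running"

# On the cluster, pipe the live listing instead:
#   qstat | count_running test-map
```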

When all jobs have completed, check the outputs. Sample outputs are shown below.

CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0
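Each output line holds a SMILES string, a ZINC ID, and a number after the | character. Assuming that number is a similarity score (the query itself scores 100.0 in the sample), the hits can be ranked with sort. rank_hits below is a hypothetical helper, and the exact layout of files inside outputs/ is an assumption.

```shell
# Sort hits by the numeric value after "|", highest first.
# Reads output lines on stdin.
rank_hits() {
    LC_ALL=C sort -t '|' -k2,2nr
}

# Sample copied from the outputs above
sample='CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0'

top=$(printf '%s\n' "$sample" | rank_hits | head -n 1)
echo "$top"

# On the cluster (assuming the result files sit directly in outputs/):
#   cat outputs/* | rank_hits | head
```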

Clean up

To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.

/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean