Fine Tranching with RDKit using Heavy Atom Count and LogP


Written by Jennifer Young on April 14, 2020


These scripts use RDKit to perform fine tranching. For each molecule they compute the heavy atom count and logP, then assign the molecule to a bucket named HxxPyyy when logP is positive (logP > 0) or HxxMyyy when logP is negative (logP < 0). The H field corresponds to the heavy atom count and the P/M field to logP.
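The labeling step can be sketched as follows. This is a minimal illustration, not the repository's code. The two-digit heavy atom field, the three-digit |logP| × 100 field, and the choice to put logP == 0 in P are all assumptions, because the text above does not define the exact digit encoding. The RDKit calls (`Chem.MolFromSmiles`, `Mol.GetNumHeavyAtoms`, `Crippen.MolLogP`) are standard RDKit API. The function names `tranche_label` and `compute_descriptors` are hypothetical.

```python
from __future__ import annotations


def tranche_label(heavy_atoms: int, logp: float) -> str:
    """Build an HxxPyyy / HxxMyyy label (hypothetical digit encoding)."""
    # Assumption: xx is the heavy atom count, zero-padded to two digits.
    # Assumption: yyy is |logP| * 100, rounded and zero-padded to three digits.
    sign = "M" if logp < 0 else "P"  # logP == 0 lands in P here (unspecified in the text)
    magnitude = round(abs(logp) * 100)
    return f"H{heavy_atoms:02d}{sign}{magnitude:03d}"


def compute_descriptors(smiles: str) -> tuple[int, float]:
    """Return (heavy atom count, Crippen logP) for one SMILES string using RDKit."""
    # Imported inside the function so tranche_label() works without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol.GetNumHeavyAtoms(), Crippen.MolLogP(mol)
```

Inside the RDKit environment, `tranche_label(*compute_descriptors("CCO"))` produces the label for ethanol under these assumptions. Rounding instead of truncating avoids floating-point artifacts such as 2.57 × 100 evaluating to slightly less than 257.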

See the GitHub repository.

How to run

Source the conda environment for RDKit (cluster users only)

If you are using our cluster, a conda environment with RDKit already exists. Activate it with the following command (this requires bash):

   source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env

If you need to create a conda environment, follow the instructions at

Read the section "How to install RDKit with Conda". Once you have created the environment, activate it and install tqdm:

   conda activate my-rdkit-env
   conda install -c conda-forge tqdm

You are ready to run the Python script.
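Optionally, you can confirm that the active environment can import both packages before running the script. Below is a minimal standard-library sketch. `rdkit` and `tqdm` are the import names of the packages installed above. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names of the packages the tranching script relies on (per the steps above).
REQUIRED_PACKAGES = ("rdkit", "tqdm")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("RDKit and tqdm are available.")
```

`importlib.util.find_spec` only locates each package and does not import it. The check is therefore fast, although it will not catch a package that is installed but broken.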

Run the Python script with the desired arguments

The input SMILES file should have the following two columns:

  • SMILES
  • ID

Run the script with the input SMILES file as its argument:

   python <smiles>
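Reading the two-column input could look like the sketch below. It assumes whitespace-separated columns with the SMILES first, the ID second, and no header line. The text above does not state the delimiter. The function name `parse_smiles_lines` is hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_smiles_lines(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (SMILES, ID) pairs from the lines of a two-column SMILES file."""
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: columns are separated by spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {line_number}: expected SMILES and ID, got {line!r}")
        yield fields[0], fields[1]
```

Because the function accepts any iterable of lines, an open file handle works directly, e.g. `with open("input.smi") as handle: records = list(parse_smiles_lines(handle))`, where `input.smi` is a placeholder name.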

The output file is named <smiles_file>_hlogp and has the following three columns:

  • original SMILES
  • original ID
  • tranche label (HxxPyyy or HxxMyyy)
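Writing the output described above can be sketched as follows. It assumes space-separated columns and assumes that "_hlogp" is appended to the full input file name, extension included. `label_for` stands for any function that maps a SMILES string to its tranche label. All function names here are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable


def output_filename(smiles_file: str) -> str:
    """Name the output file <smiles_file>_hlogp."""
    return f"{smiles_file}_hlogp"


def format_rows(
    records: Iterable[tuple[str, str]],
    label_for: Callable[[str], str],
) -> list[str]:
    """Build "SMILES ID label" output lines from (SMILES, ID) pairs."""
    # Assumption: output columns are space-separated, one molecule per line.
    return [f"{smiles} {mol_id} {label_for(smiles)}\n" for smiles, mol_id in records]


def write_tranche_file(
    smiles_file: str,
    records: Iterable[tuple[str, str]],
    label_for: Callable[[str], str],
) -> str:
    """Write the labeled rows next to the input file and return the output path."""
    path = output_filename(smiles_file)
    with open(path, "w") as handle:
        handle.writelines(format_rows(records, label_for))
    return path
```

Passing the labeling function in as a parameter keeps the file handling independent of RDKit, so this part can be tested without a chemistry toolkit installed.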