Flexibase Format

From DISI
[Figure: Hierarchy Generator]


Input: 1. multi-conformer mol2 file, 2. solvation file, and 3. inhier parameters

Output: 4. database in hierarchy format, 5. molecule summary (stdout)


Important notes:

  • The hierarchy generator does not know whether hydrogens have already been rotated. Turning on torque_hydrogens when the hydrogens have already been rotated will result in duplicate structures and an inaccurate count of conformations.
  • The hierarchy dock code will not read a database with hierarchy spacing.


1. Multi-conformer mol2 file

This file is a standard Tripos mol2 file. All conformations of the same molecule must have the same MFCD number. The file must be smaller than 2 GB, the UNIX file size limit.
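A minimal Python sketch of how the conformations in such a file could be grouped by molecule. It assumes the identifier sits on the name line that follows each `@<TRIPOS>MOLECULE` tag and that all conformations of a molecule are stored next to each other; neither detail is stated above, and `group_conformers` is a hypothetical helper, not part of the hierarchy generator.

```python
from itertools import groupby

MARKER = "@<TRIPOS>MOLECULE"


def split_mol2_records(text):
    """Split a multi-molecule mol2 string into one string per MOLECULE record."""
    parts = text.split(MARKER)
    # parts[0] holds anything before the first record (normally empty) and is dropped.
    return [MARKER + part for part in parts[1:]]


def record_id(record):
    """Return the molecule name line, i.e. the first line after the MOLECULE tag.

    Assumption: the MFCD (or other) identifier is stored on this name line.
    """
    return record.splitlines()[1].strip()


def group_conformers(text):
    """Group consecutive records that share an identifier.

    Assumption: all conformations of one molecule are contiguous in the file.
    """
    records = split_mol2_records(text)
    return [(mol_id, list(group)) for mol_id, group in groupby(records, key=record_id)]
```

Because `groupby` only merges neighbouring records, a molecule whose conformations are scattered through the file would show up as several groups; sorting the records by `record_id` first would avoid that at the cost of holding the whole file in memory.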


2. Solvation file

This file contains abbreviated output from AMSOL. Each molecule record begins with a header line containing the identification number, followed by the total number of atoms and the formal charge. Note that this identification number must match the identification number in the mol2 file! The remaining numbers on the header line are the total polar solvation energy, the total solvent-accessible surface area, the total apolar solvation energy, and the total solvation energy. After the header there is one line of data for each atom: the partial atomic charge, followed by the polar solvation energy, the solvent-accessible surface area, the apolar solvation energy, and finally the total solvation energy of the atom.
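The record layout above can be read with a short Python sketch. It assumes the fields are separated by whitespace and that blank lines can be ignored; the exact column widths of the real file are not given here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AtomSolvation:
    charge: float   # partial atomic charge
    polar: float    # polar solvation energy
    sasa: float     # solvent-accessible surface area
    apolar: float   # apolar solvation energy
    total: float    # total solvation energy of the atom


@dataclass
class SolvationRecord:
    mol_id: str         # must match the identification number in the mol2 file
    natoms: int
    formal_charge: int
    polar: float        # total polar solvation energy
    sasa: float         # total solvent-accessible surface area
    apolar: float       # total apolar solvation energy
    total: float        # total solvation energy
    atoms: list = field(default_factory=list)


def parse_solvation(lines):
    """Yield one SolvationRecord per molecule from an iterable of text lines.

    Assumption: whitespace-separated fields; blank lines are skipped.
    """
    it = (line for line in lines if line.strip())
    for header in it:
        f = header.split()
        rec = SolvationRecord(f[0], int(f[1]), int(float(f[2])), *map(float, f[3:7]))
        for _ in range(rec.natoms):
            try:
                atom_line = next(it)
            except StopIteration:
                raise ValueError(f"truncated record for {rec.mol_id}") from None
            rec.atoms.append(AtomSolvation(*map(float, atom_line.split()[:5])))
        yield rec
```

Checking each `mol_id` against the identifiers collected from the mol2 file before running the generator is a cheap way to catch the mismatch warned about above.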


3. Hierarchy generator input parameters (inhier)

The hierarchy generator is run by typing the path to the executable followed by the name of the input parameter file (the name can be anything; by default it is inhier). Logical keywords (procedures) accept yes/no or true/false.

protein Is the input file a list of protein side chains or small molecules? Protein side chains do not carry solvation values, and their input residue and atom names are preserved. If this option is true, solvation correction should be false; comment out the solvation_table line.

equalize_charges This option adjusts formal charges and equalizes charges on equivalent groups. Set this value to No, because charges are corrected outside of the hierarchy generator. The code for this routine was modified from mol2db in the dock suite.

solvation-correction Should the hierarchy generator look for solvation data? If yes, molecules with solvation data will have their charges replaced by those from the solvation data table and will have solvation data added. Molecules not listed in the solvation data table retain their original charges, get zeros for the solvation values, and have -3 as the sixth value of the branch header line. Dock skips the molecules marked with -3.

color_atoms Should the generator assign a color to each atom? There is no reason to turn this option off. The code for this routine was copied from mol2db in the dock suite.

output_color_table Should the database header be printed? When generating a small database (a few hundred molecules) I leave this on. For large databases that will be joined together I leave it off; in that case the header must be added to the final joined file after database generation.

translate_coordinates Should the coordinates be translated to the origin? Mol2db translated everything to the origin to keep the coordinates small (database spacing). This should no longer be a problem; the ACD can be generated with this set to No.

hierarchy_spacing The hierarchy spacing option is only designed for visual inspection of the hierarchy. Dock will not read files with hierarchy spacing. For database generation, set this value to No.

torque_hydrogens If the input file does not already contain multiple conformations for the hydrogens, this option generates them. Groups rotated include =NH, -SH, and -OH, which are rotated in 180°, 30°, and 30° increments, respectively. If the -SH or -OH is attached to an aromatic system, it is rotated in 60° increments.
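To make the increments concrete, here is a minimal Python sketch that spins one hydrogen about the bond to its heavy atom in fixed steps, using Rodrigues' rotation formula. The choice of axis (from the heavy atom's other neighbour to the heavy atom) and all function names are assumptions for illustration; this is not the generator's own code.

```python
import math

# Increments (degrees) listed above for torque_hydrogens.
INCREMENTS = {"=NH": 180, "-SH": 30, "-OH": 30, "aromatic -SH/-OH": 60}


def n_positions(increment_deg):
    """Number of distinct hydrogen positions for a full 360-degree turn."""
    return 360 // increment_deg


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` about the line axis_start -> axis_end (Rodrigues' formula)."""
    axis = [e - s for e, s in zip(axis_end, axis_start)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                    # unit rotation axis
    v = [p - s for p, s in zip(point, axis_start)]  # point relative to the axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1 - cos_t)
               for i in range(3)]
    return tuple(r + s for r, s in zip(rotated, axis_start))


def hydrogen_positions(h, heavy, neighbor, increment_deg):
    """All positions of hydrogen `h` on `heavy`, spun about the neighbor -> heavy bond."""
    return [rotate_about_axis(h, neighbor, heavy, i * increment_deg)
            for i in range(n_positions(increment_deg))]
```

With these increments an =NH gets 2 positions, an aliphatic -SH or -OH 12, and an aromatic -SH or -OH 6. Because the heavy atom lies on the rotation axis, every generated position keeps the original bond length.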

mol2_file_list The hierarchy generator can process multiple files (gzipped or not). I recommend commenting this line out and calling the generator for each file from a shell script. This option is not compatible with mol2_file.

mol2_file This is the multi-conformer mol2 file used in database generation. This option is not compatible with mol2_file_list.

db_file This is the output file to which the generator writes the hierarchy.

solvation_table The file from which the generator reads solvation and charge data.

color_table This keyword marks the beginning of the color table. The color table follows the syntax rules for the mol2db (dock 3.5) color table. The end of the color table is marked by the keyword default_color and the value neutral.

Here are some additional notes on the color table. Note that the atom names are SYBYL atom types (http://tripos.com/mol2/atom_types.html):

Rules are "last match wins": even though all Ns are positive, the later
rule N.ar matches acceptor, so N.ar is an acceptor.
The first rules map the beginning of the SYBYL atom type text to a type.
Later rules have the form Atom NotBondedTo Atom -> type, like
               O. -1 N.2 -> negative
Other rules have the form Atom BondsAwayFrom Atom -> type, like
               C.2 1 N. -> positive, or
               O.2 2 N.3 -> amide_o
Again, rules are read in order and the last matching rule is the one used.
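The last-match behaviour can be illustrated with a small Python sketch. Several details are assumptions not fixed by the notes: atom types are matched by prefix, NotBondedTo (-1) means "no directly bonded partner of that type", and BondsAwayFrom n means "some partner exactly n bonds away" by shortest path. `RULES` is a hypothetical list built from the examples above, not a real color table.

```python
from collections import deque


def bond_distances(n_atoms, bonds, start):
    """Shortest path length, in bonds, from `start` to every reachable atom (BFS)."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def rule_matches(rule, idx, types, bonds):
    """rule = (atom_prefix, distance, other_prefix, color).

    distance None -> plain prefix rule; -1 -> NotBondedTo; n > 0 -> BondsAwayFrom.
    """
    atom_prefix, distance, other_prefix, _color = rule
    if not types[idx].startswith(atom_prefix):
        return False
    if distance is None:
        return True
    dist = bond_distances(len(types), bonds, idx)
    partners = [j for j, t in enumerate(types) if j != idx and t.startswith(other_prefix)]
    if distance == -1:
        return not any(dist.get(j) == 1 for j in partners)
    return any(dist.get(j) == distance for j in partners)


def assign_colors(types, bonds, rules, default="neutral"):
    """Apply every rule in order; the last matching rule decides the color."""
    colors = []
    for idx in range(len(types)):
        color = default
        for rule in rules:
            if rule_matches(rule, idx, types, bonds):
                color = rule[3]
        colors.append(color)
    return colors


# Hypothetical rule list built from the examples in the notes above.
RULES = [
    ("N.", None, None, "positive"),
    ("N.ar", None, None, "acceptor"),
    ("O.", -1, "N.2", "negative"),
    ("C.2", 1, "N.", "positive"),
    ("O.2", 2, "N.3", "amide_o"),
]
```

For an N.3-C.2-O.2 fragment plus a separate N.ar, the N.ar first matches "positive" and then "acceptor", and keeps the latter. The O.2 first matches the NotBondedTo rule and is then overridden by amide_o.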


4. Hierarchy database format

The hierarchy database format is based on the dock 3.5 database format. The file begins with a header (line #1) followed by the colors used in the database (lines #2-8). For the purposes of this discussion, the lines have been numbered (#1, #2, etc.) and the hierarchy has been indented to help distinguish the levels.


Family header: Line #9 is designed for future use. The word Family is followed by a chemical family number, which is hard coded to 1 (first number). The second number is the number of molecules in this family (it matches the number of occurrences of line #10), the third is the number of branches that get attached to each molecule, and the last is the number of branches in the family. Line #10 has the first 50 characters of the molecule name, followed by the last letter of the identification code prefix (e.g., MFCD, SPEC, ZINC) and its number, as in D00000000. After a space, up to 10 branches can be listed (Fortran format 10i2). The rigid fragment is considered branch one, so the example molecule has branches two and three. By listing which branches make up which molecules, I can later rebuild each molecule and recombine side chains. Any branch in the first position can be combined with any branch in the second position, and so on.
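Here is a small Python sketch of reading lines #9 and #10 and applying the recombination rule. The parsers split on whitespace instead of honouring the fixed-width 10i2 columns. `recombine` is a hypothetical helper that assumes every molecule line in a family lists its branches in the same positional order and shares the rigid fragment (branch 1).

```python
from itertools import product


def parse_family_header(line):
    """Return (family, n_molecules, branches_per_molecule, n_branches)."""
    fields = line.split()
    if fields[0] != "Family":
        raise ValueError("not a family header: " + line)
    return tuple(int(x) for x in fields[1:5])


def parse_molecule_line(line):
    """Return (name, mol_id, branch numbers) from a molecule line.

    Assumption: whitespace-separated fields; the real format packs the branch
    numbers into fixed-width i2 columns.
    """
    fields = line.split()
    return fields[0], fields[1], [int(x) for x in fields[2:]]


def recombine(branch_lists):
    """Every molecule obtainable by mixing branches position by position.

    Branch 1 (the rigid fragment) is prepended to each combination.
    """
    positions = {}
    for branches in branch_lists:
        for pos, branch in enumerate(branches):
            positions.setdefault(pos, set()).add(branch)
    columns = [sorted(positions[p]) for p in sorted(positions)]
    return [(1, *combo) for combo in product(*columns)]
```

For the single example molecule this gives just (1, 2, 3). With a second molecule listing branches 4 and 5, the same rule yields four molecules.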


Branches: Lines #11, #36, and #44 are branch identifiers. The first number on the first line of a branch is the number of coordinate lines listed for the branch. The second number is the number of atoms in the branch (single conformer). This is followed by the number of heavy atoms and the number of hydrogens in the branch. Next is the sum of the polar solvation energy for the entire molecule (not just the branch). Next is the number 1, or -3 if the molecule lacks solvation energy; a value of -3 causes dock to skip the molecule. Next is the apolar component of desolvation, again for the entire molecule. The zero that follows was meant to record the number of explicit conformations but is now an open field for future use. The last number on the branch identifier line is the number of conformations for that branch (including recombination within the branch). Multiplying these conformation counts together over all branches of a molecule gives the number of conformations for the molecule (1 × 6 × 24 = 144 in the example).
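A minimal Python sketch of the branch identifier line, assuming whitespace-separated fields in the order described above. The field names are descriptive labels chosen here, not names used by the generator or by dock.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class BranchHeader:
    n_coord_lines: int       # coordinate lines listed for the branch
    n_atoms: int             # atoms in the branch (single conformer)
    n_heavy: int
    n_hydrogens: int
    polar_solvation: float   # whole molecule, not just this branch
    solvation_flag: int      # 1, or -3 when solvation data is missing
    apolar_solvation: float  # whole molecule
    reserved: int            # formerly explicit conformations; now unused
    n_confs: int             # conformations of this branch


def parse_branch_header(line):
    f = line.split()
    return BranchHeader(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                        float(f[4]), int(f[5]), float(f[6]), int(f[7]), int(f[8]))


def molecule_conformations(headers):
    """Conformations of a molecule: product of its branch conformation counts."""
    return prod(h.n_confs for h in headers)


def skipped_by_dock(headers):
    """A -3 solvation flag makes dock skip the molecule."""
    return any(h.solvation_flag == -3 for h in headers)
```

Fed the three identifier lines of the example (#11, #36, #44), this gives branch counts 1, 6, and 24 and a molecule total of 144.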


Atom information: The atom information is divided into two parts: first, all atom information except for the coordinates, followed by multiple sets of coordinates. Both parts of the atom information start with the hierarchy level.

  • Hierarchy level: The rigid fragment is 9. All groups attached to the rigid fragment are numbered in the 10s; groups attached to those are numbered in the 20s. The tens place (and, if need be, the hundreds place) denotes the distance from the rigid fragment. The ones place differentiates independent groups within the branch. Branches from the rigid fragment can be numbered 19, 18, etc., or all 19. Originally each needed a different number, but as the code evolved this was no longer required.
  • Atom information: This block of information (lines #12-23, #37, and #45-52) contains the information we consider to be the same for all conformations. It also describes all of the parts (the number of atoms in each group) required to complete the branch. After the hierarchy level comes the van der Waals type, as described in the Dock 3.5 manual. In the example, the level and type sometimes run together (line #19 reads 912, that is, level 9 and type 12). Next is the partial atomic charge multiplied by 10000. This is historical; the multiplication should probably be removed at some point. At the end of the charge field is a column that is set to zero; this value is the flag (see dock 2.5). Currently, neither the hierarchy generator nor dock uses flagging. Next is the color, corresponding to the color table at the top of the database file. The last two columns are the polar and apolar partial atomic desolvation values.
  • Atom coordinates: The first number is the hierarchy level, followed by the x, y, and z coordinates of the atom (multiplied by 1000). Lines #38-43 list six different sets of coordinates for the hydrogen described in line #37. Lines #53-55 are the single set of coordinates for the two carbons and the hydrogen in group 18. For each position of group 18 (lines #53-55, #69-71, #85-87, and #101-103) there are three positions for group 29 (Cl and 2 H) and two positions for group 28, the carboxylic acid. Because groups 29 and 28 can move independently, they have different group numbers. Each of the four positions of group 18 therefore has six downstream combinations of 29 and 28, which leads to the reported 24 positions for the branch. Any of these positions can be combined with any of the six torsions of the hydrogen, giving 144 conformations for the molecule. Note that the number of coordinate sets is not encoded anywhere; you have to count the lines. When a group has one atom and there are six lines of coordinates (lines #38-43), there are six conformations for that atom. When a group has three atoms and there are nine lines of coordinates (lines #56-64), there are three conformations for that set of atoms. Note also that the tree is not explicitly encoded: if you are reading group 49, for instance, and then read a line of group 29, you have to move back up the tree of conformations to that level. The output order of the tree of conformations is 'infix'.
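Because the tree is implicit, a reader has to rebuild it from the level column. The Python sketch below does this for one branch under stated assumptions. It takes the number of atoms per group from the atom information block and treats `level // 10` as the distance from the rigid fragment. Independent groups multiply, and alternative positions of the same group add. Run on the level sequence of lines #53-116, it reproduces the 24 positions reported on line #44. All helper names are hypothetical.

```python
from collections import defaultdict


def depth(level):
    """Distance from the rigid fragment: 9 -> 0, 1x -> 1, 2x -> 2, 10x -> 10.

    None marks a virtual root above the rigid fragment.
    """
    return -1 if level is None else level // 10


def decode_charge(raw):
    return raw / 10000  # charges are stored multiplied by 10000


def decode_coord(raw):
    return raw / 1000   # coordinates are stored multiplied by 1000


class Node:
    """One position (coordinate set) of one group."""

    def __init__(self, level):
        self.level = level
        self.children = defaultdict(list)  # child group level -> its positions


def count_confs(node):
    """Independent child groups multiply; alternative positions of a group add."""
    total = 1
    for positions in node.children.values():
        total *= sum(count_confs(p) for p in positions)
    return total


def branch_conformations(atoms_per_group, coord_levels):
    """Rebuild the implicit tree from the level column of a branch's coordinate
    lines (a list of ints) and return the number of conformations it encodes."""
    root = Node(None)
    stack = [root]
    i = 0
    while i < len(coord_levels):
        level = coord_levels[i]
        n = atoms_per_group[level]
        if coord_levels[i:i + n] != [level] * n:
            raise ValueError(f"incomplete coordinate set for group {level} at line {i}")
        # Move back up the tree until the top of the stack is a shallower group.
        while depth(stack[-1].level) >= depth(level):
            stack.pop()
        position = Node(level)
        stack[-1].children[level].append(position)
        stack.append(position)
        i += n
    return count_confs(root)
```

The stack mirrors the "move back up the tree" step described above: reading a group 18 line after a group 28 line pops back past both, so the new position of 18 starts a fresh subtree.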


5. Molecule summary:

D MFCD rigid flex I_ats I_confs O_ats O_confs O_hconfs

D00000000 12 9 120 12 82 24 144

MFCD: ID number from the mol2 file (format MFCD12345678 → D12345678)

Rigid: Number of atoms in the same position in all conformations (12)

Flex: Total number of atoms in the molecule minus the rigid atoms (9)

I_ats: Input atoms. The number of atoms required to represent the molecule in ensemble format, i.e., the rigid atoms plus the number of conformations times the number of flexible atoms (12 + 12 × 9 = 120)

I_confs: Number of conformations read in for the molecule (12)

O_ats: Number of atoms written to the hierarchy. This is usually less than I_ats unless many hydrogen coordinates were added. (82)

O_confs: Number of conformations, including recombination. This number does not include hydrogen conformations added by the hierarchy generator. This number will frequently be larger than I_confs (recombination), but can also be less than I_confs if the input conformations are degenerate. (24)

O_hconfs: Number of conformations including hydrogens rotated by the hierarchy generator. This is the number of conformations dock will see. (144)
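The summary arithmetic can be checked with a short Python sketch using the example numbers. Counting every coordinate line in the example file (12 + 6 + 64) reproduces O_ats = 82. `molecule_summary` and its `h_only` flag are hypothetical. The sketch assumes the generator-added hydrogen positions form branches of their own, as the six hydrogen positions do in the example. That need not hold in general.

```python
from math import prod


def molecule_summary(rigid, flex, i_confs, branches):
    """Recompute the summary columns.

    branches: list of (n_coord_lines, n_confs, h_only) tuples, one per branch,
    where h_only (hypothetical) marks a branch whose positions were all
    generated by torque_hydrogens.
    """
    return {
        "I_ats": rigid + i_confs * flex,
        "O_ats": sum(n for n, _, _ in branches),
        "O_confs": prod(c for _, c, h_only in branches if not h_only),
        "O_hconfs": prod(c for _, c, _ in branches),
    }
```

With the example values (12 rigid atoms, 9 flexible atoms, 12 input conformations, and branches of 12, 6, and 64 coordinate lines), this returns I_ats 120, O_ats 82, O_confs 24, and O_hconfs 144.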


Error messages

no common atoms D00000000

For the specified molecule ID, no atoms had the same coordinates in all conformations. Nothing is written to the database file.

Macrocycle error on D0000000

The recursion routine used to generate the hierarchy has problems with macrocycles that are not part of the rigid fragment. Nothing is written to the database file.
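The "no common atoms" case can be illustrated with a minimal Python sketch. It finds the atoms whose coordinates agree across all input conformations, which the molecule summary calls the rigid atoms. The 0.001 Å tolerance and the function names are assumptions; the generator's actual comparison is not described here.

```python
import math


def common_atoms(conformers, tol=0.001):
    """Indices of atoms whose coordinates agree within `tol` in every conformer.

    `conformers` is a list of conformations, each a list of (x, y, z) tuples in
    the same atom order. The tolerance value is an assumption.
    """
    first, rest = conformers[0], conformers[1:]
    return [i for i, xyz in enumerate(first)
            if all(math.dist(xyz, conf[i]) <= tol for conf in rest)]


def rigid_atoms_or_error(mol_id, conformers):
    """Refuse a molecule with no shared atoms, as in the 'no common atoms' message."""
    rigid = common_atoms(conformers)
    if not rigid:
        raise ValueError(f"no common atoms {mol_id}")
    return rigid
```

An empty result means there is no rigid fragment to build the hierarchy from, which is why nothing can be written for such a molecule.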


Here is the example file for the molecule shown above. The line numbers are shown at left; the column spacing is not accurate because of formatting changes.

 
#  1    DOCK 5.1 ligand_atoms
#  2    positive                       (1)
#  3    negative                       (2)
#  4    acceptor                       (3)
#  5    donor                          (4)
#  6    ester_o                        (5)
#  7    amide_o                        (6)
#  8    neutral                        (7)
#  9    Family      1   1   2   2
# 10    Molecule_to_describe_the_hierarchy_format      D00000000  2 3
# 11       12   12   8   4    -49.0700     1    -1.59    0          1
# 12      9 1  1380 7    1.570    0.160
# 13      9 1 -2410 7   -1.530    0.220
# 14      9 1  -970 7   -0.510    0.230
# 15      9 1 -1370 7   -2.080    0.260
# 16      9 1  -380 7   -1.610   -0.070
# 17      9 1 -1350 7   -4.130    0.220
# 18      9 5  -560 7   -0.640    0.010
# 19      912 -4710 4   -6.840   -1.350
# 20      9 7  1190 7    1.100   -0.030
# 21      9 7  1190 7    1.210   -0.030
# 22      9 7  1180 7    1.790   -0.030
# 23      9 7  1270 7    1.260   -0.030
# 24      9 -1204  1354     0
# 25      9     0   659     0
# 26      9     0  -732     0
# 27      9 -1204 -1427     0
# 28      9 -2408  -732     0
# 29      9 -2408   658     0
# 30      9 -3866 -1528   -49
# 31      9 -1237  2718     6
# 32      9   919  1190     0
# 33      9   919 -1263     0
# 34      9 -1204 -2489     0
# 35      9 -3327  1189     0
# 36        6    1   0   1    -49.0700     1    -1.59    0          6
# 37     19 6  3880 7    2.390   -0.620
# 38      19  -328  3071  -166
# 39      19  -937  3059  -874
# 40      19 -1854  3036  -699
# 41      19 -2161  3025   181
# 42      19 -1552  3036   889
# 43      19  -635  3059   714
# 44       64    8   5   3    -49.0700     1    -1.59    0         24
# 45     18 5  -100 7    0.290    0.460
# 46     18 1  4520 7   14.160    0.650
# 47     18 7   820 7    2.350   -0.010
# 48     2916 -2080 7   -1.680   -0.080
# 49     29 7   940 7    2.110   -0.020
# 50     29 7  1210 7    1.780   -0.010
# 51     2811 -6070 2  -26.310    0.240
# 52     2811 -7570 2  -33.750   -1.760
# 53      18 -3882 -3141   -97
# 54      18 -4926 -1167  1143
# 55      18 -4160 -1125 -1029
# 56       29 -3131 -4036  1247
# 57       29 -4935 -3454  -148
# 58       29 -3261 -3392  -970
# 59       29 -3084 -3955 -1465
# 60       29 -3396 -3499   823
# 61       29 -4946 -3397  -213
# 62       29 -5457 -3970  -149
# 63       29 -3325 -3445  -995
# 64       29 -3448 -3445   867
# 65       28 -5017   -86  1728
# 66       28 -5869 -2094  1551
# 67       28 -5397   -53  1380
# 68       28 -5422 -2160  1970
# 69      18 -5233  -672   -47
# 70      18 -4072 -2582 -1283
# 71      18 -3736 -2021   926
# 72       29 -5511   482 -1375
# 73       29 -6066 -1390   -59
# 74       29 -5154   -39   849
# 75       29 -5558   401  1337
# 76       29 -5225   -44  -950
# 77       29 -6027 -1431     7
# 78       29 -6782 -1549  -100
# 79       29 -5234   -64   870
# 80       29 -5205  -115  -995
# 81       28 -3182 -3226 -1841
# 82       28 -5339 -2863 -1765
# 83       28 -3377 -3574 -1512
# 84       28 -5132 -2438 -2162
# 85      18 -4582 -1948  1335
# 86      18 -5060  -798  -895
# 87      18 -3466 -2424  -546
# 88       29 -5016  -663  2488
# 89       29 -5510 -2477  1073
# 90       29 -3821 -2541  1864
# 91       29 -3714 -3043  2439
# 92       29 -4786 -1021  1890
# 93       29 -5463 -2526  1019
# 94       29 -6143 -2801  1255
# 95       29 -3882 -2601  1878
# 96       29 -4821  -989  1817
# 97       28 -4908   -64 -1874
# 98       28 -6387 -1016  -571
# 99       28 -5028  -515 -2095
#100       28 -6262  -483  -287
#101      18 -4533 -1865 -1479
#102      18 -3939 -2951   755
#103      18 -4429  -722   443
#104       29 -3627 -2892 -2616
#105       29 -5492 -2366 -1280
#106       29 -4594  -890 -1985
#107       29 -4929  -512 -2567
#108       29 -3835 -2522 -2017
#109       29 -5510 -2302 -1225
#110       29 -6096 -2719 -1504
#111       29 -4677  -908 -2003
#112       29 -3831 -2571 -1946
#113       28 -3290 -3248  1761
#114       28 -4821 -3940   357
#115       28 -3745 -3111  1962
#116       28 -4291 -4115    96