ZINC is big. Currently, ZINC has over 20M molecules, about 14M of which are commercially available. For most applications, most users of ZINC will only want or need to download a fraction of ZINC: a subset. This article describes subsets.
- Standard subsets, numbers 1-10. These are our approximations of popular subsets that appear commonly in the literature.
- Clean subsets, numbers 11-20. Here we have removed compounds that are known to cause problems in some assays. For information about the additional filtering used, please see 
- Immediate availability subsets, numbers 21-30. Normal subsets include compounds that are available from stock, and also compounds that can be made in 6-10 weeks. Some people only want compounds that can be obtained immediately, say in less than 2 weeks. If so, these subsets are for you.
Subsets of ZINC by one dimensional physical property (molecular weight, calculated logP) are the single most popular way to acquire ZINC. Of these, the first two subsets, "lead-like" (subset #1) and "fragment-like" (subset #2) are by far the most popular. There are good reasons for this.
Lead-like compounds are large enough to be detected in high throughput spectrophotometric or other cheap assays, yet smaller than most drugs, which have been highly optimized for a specific application. Lead-like compounds will be more soluble, in general, than their bigger "drug like" cousins, and thus more likely to actually be assayed.
Fragment-like compounds are even smaller than leads. The good news is that they sample chemical space more thoroughly than is possible with leads. The bad news is that they are often too small to be detected in a cheap assay, and so require direct biophysical measurement, such as SPR, NMR, or X-ray crystallography.
Together, leads and fragments represent the dominant thinking in the field for screening. The remaining subsets can also be interesting. Here we give a brief explanation of why you might want each one.
Drug-like (#3) captures the famous rule of five, which is itself just a guideline with many exceptions. There may be times you want to screen the "drug-like" subset of ZINC, but this would probably be later in the project, after you have already had a good look at the leads, or in some unusual circumstance.
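The rule-of-five guideline mentioned above can be sketched in a few lines. This is an illustration only: the cutoffs are the commonly quoted ones (molecular weight at most 500, calculated logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors), and this article does not say whether ZINC's drug-like subset uses exactly these values. The `Properties` class, the function names, and the one-violation tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight, g/mol
    logp: float             # calculated logP
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four commonly quoted cutoffs are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(p: Properties, max_violations: int = 1) -> bool:
    """Tolerate up to max_violations; the default of 1 is an assumption."""
    return rule_of_five_violations(p) <= max_violations
```

Since the rule is only a guideline, a count of violations is usually more useful than a hard yes/no, which is why the sketch exposes both.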
All purchasable (#6) comes in third place for popularity. Advantage: you can buy these compounds. Disadvantage: for target-based virtual screening, many of these compounds will be a waste of time, because they are too big, too specific, and too greasy (insoluble).
Everything (#10) comes in fourth place for popularity, since it is, well, everything we can let you have. We frankly don't think you really want this, but people keep asking for it, so, here it is.
Subsets 7,8,9 will return soon...
Subsets 11-16 will return soon....
Neutral-fragments (#17) are what the name suggests: uncharged fragments. Why would you want this? Charged compounds often have a hard time getting into cells. Docking programs can have trouble weighting among charged and neutral compounds. Wham - put those ideas together and you see why neutral fragments can be interesting.
Greasy-leads (#4) and Big-n-greasy (#5) are deprecated. Frankly, these compounds are nothing but trouble, since they often do not dissolve. If you really want them back, write me, but otherwise, they are gone.
CNS permeable (#29) compounds are of interest for projects where getting through the blood-brain barrier (BBB) is important. We have used well known criteria for this subset.
Monoanions (#31) and monocations (#32): we don't know why you would want these. We created them for a particular project.
Goldilocks (#33) is yet another set that tries to "shoot for the middle" of the chemical space problem and balance the competing advantages and disadvantages of bigger vs smaller molecules.
Piotr (#38), kerim-like (#42), abram (#49) - were all created for specific projects - we do not know why you might want these, but they are available should that be the case.
stiff-soluble (#50) and stiffs (#51) are for testing ideas about entropy loss of the ligand on binding. So they are for research, but you might want them too...
We offer subsets by vendor.
User-created subsets or mini subsets
We offer the capability to create small subsets. You can do this via the ZINC results browser page after a search by clicking on "create subset". Alternatively, if you have the SMILES, you can generate a custom subset by uploading the molecules.
How to create a subset for docking based on SMILES
- 1. Browse to the upload page, http://zinc.docking.org/upload.shtml
- 2. Select your files. Each line must contain one SMILES string and, optionally, an identifier, separated by whitespace.
- 3. Check box "click here if private (UCSF only)". This gives you the right to upload 5000 instead of 1000 molecules per transaction. If you are not inside UCSF or otherwise "special", you are stuck at 1000.
- 4. Click "upload and build"
- 5. Click on the link where it says "browse results here" (should be a number)
- 6. Wait about 10 min per 1000 molecules (refresh the page) until you see "e_0.0.db.gz". This is the pH 7 representation of your compounds. e_1.0.db.gz is the "additional forms pH 5.75 - 8.25" file. You need to download or copy these files and dock them.
- 7. OR, you can do: md4db.csh uploads 44249 (or whatever your number was) and it will copy them and set up for docking. (UCSF only). md4db.csh == make database for dock blaster
- 8. Now do cd run.u44249
- 9. Now do startdockbks3 `pwd` on sgehead to start docking.
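Before uploading, it can help to check the input file and split it to fit the per-transaction limit. The sketch below assumes only what the steps above state: one SMILES and an optional identifier per line, separated by whitespace; 1000 molecules per transaction (5000 for private UCSF uploads); and roughly 10 minutes of processing per 1000 molecules. The function names are hypothetical, skipping blank lines is an assumption about what the upload page tolerates, and no chemical validation of the SMILES is attempted.

```python
from typing import Iterable, List, Optional, Tuple

PUBLIC_LIMIT = 1000      # molecules per upload transaction (from the steps above)
UCSF_LIMIT = 5000        # limit when the "private (UCSF only)" box applies
MINUTES_PER_1000 = 10    # rough processing time quoted above

Record = Tuple[str, Optional[str]]


def parse_smiles_lines(lines: Iterable[str]) -> List[Record]:
    """Return (smiles, identifier-or-None) pairs; blank lines are skipped."""
    records: List[Record] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are harmless, so drop them
        if len(fields) > 2:
            raise ValueError(
                f"line {number}: expected 'SMILES [identifier]', "
                f"got {len(fields)} fields"
            )
        identifier = fields[1] if len(fields) == 2 else None
        records.append((fields[0], identifier))
    return records


def batches(records: List[Record], limit: int = PUBLIC_LIMIT) -> List[List[Record]]:
    """Split records into chunks no larger than the per-transaction limit."""
    return [records[i:i + limit] for i in range(0, len(records), limit)]


def estimated_wait_minutes(n_molecules: int) -> float:
    """Rough wait time implied by 'about 10 min per 1000 molecules'."""
    return n_molecules * MINUTES_PER_1000 / 1000
```

For example, a file of 2500 molecules would go up as three transactions under the public limit, and the quoted rate suggests a total wait of about 25 minutes.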
We offer the capability to upload compounds for processing.
We offer compounds by annotation. More soon.
How to dock ZINC subsets under Unix
- 1. Go to the project directory. This is where "sph" and "grids" live.
- 2. md4db.csh - this will give you options for what kind of subset you want to dock.
- to list the currently available subset types, run: md4db.csh bogus
- current values: byvendor, bysubset, usersubet, uploads, virtual (i.e. make-on-demand)
- 3. Say you want bysubset. Then run: md4db.csh bysubset
- you are given currently available choices. "fragment like" is #2, for instance.
- to see the names of the subsets, go to the "by property" page under ZINC.
- thus to set up a directory to dock fragments, you would do: md4db.csh bysubset 2
- 4. to start docking, cd run.2 ; then startdockbks3 `pwd`; you must be on sgehead so you can see the cluster.
- do not ask us why it has this bizarre syntax.
Synthesis on Request
Some vendors offer compounds that they will make if asked, usually within about 10 weeks. We like these compounds, because they greatly expand the region of chemical space one can sample without performing synthesis oneself.
-- John Irwin