Decoy Theory

What are decoys?

Or, more importantly, why do we care about decoys, and why do we keep hearing about them all the time?

"Decoys" is essentially a codename for a way of evaluating how well your docking program has done on a target (or a set of targets). Decoys refers to a set of molecules that (probably) won't bind to your target. Here are some terms:

  • Ligands: A set of known ligands that bind to your protein target, often taken from papers or a database like ChEMBL.
  • Known Decoys / Known Non-binders: A set of molecules that have been tested against your protein target and found not to bind. ChEMBL and the literature are also sources of these.
  • Property-Matched Decoys: A set of molecules, typically from ZINC, that resemble your ligands in their chemical and physical properties, but are not similar to the ligands by Tanimoto coefficient of a 2D fingerprint. Making these is explained at Automated_Database_Preparation#Automatic_Decoy_Generation, and more information can be found in the Huang et al. [1] or Verdonk et al. [2] papers.
  • Random Decoys: A set of random molecules, usually chosen from ZINC, most of which won't be binders simply by chance.
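
To make the "not similar by Tanimoto of a 2D fingerprint" criterion concrete, here is a minimal sketch in Python. It assumes fingerprints are already available as sets of "on" bit indices. The function names and the 0.35 cutoff are hypothetical, chosen only for illustration, and this is not the procedure used by the automatic decoy generation scripts.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def dissimilar_to_all(candidate_fp, ligand_fps, max_tc=0.35):
    """True if the candidate's Tanimoto to every ligand is below max_tc.

    max_tc is a hypothetical cutoff used only for this illustration.
    """
    return all(tanimoto(candidate_fp, lig) < max_tc for lig in ligand_fps)


# Toy fingerprints (made-up bit indices, not real molecules).
ligand_fps = [{1, 4, 9, 16, 25}, {2, 4, 8, 16, 32}]
candidates = {
    "cand_a": {1, 4, 9, 16, 36},  # shares 4 bits with the first ligand
    "cand_b": {3, 5, 7, 11, 13},  # shares no bits with either ligand
}

# Keep only candidates that are topologically dissimilar to every ligand.
kept = [name for name, fp in candidates.items()
        if dissimilar_to_all(fp, ligand_fps)]
print(kept)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the candidates would also have to be matched to the ligands on physical properties. That step is not shown here.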

Once you have these sets, you can measure how well docking enriches one set over another, usually by looking at a ROC curve, a log ROC curve, or the LogAUC. The most common comparison is ligands over property-matched decoys. If your docking setup does well at this, the general attitude is that it will also do well in a prospective virtual screen. If you have many known decoys or known non-binders, you can examine the enrichment of ligands over those, which also tends to indicate how well you are doing. Sometimes it is also illustrative to test the enrichment of ligands over random decoys (usually something like the leadlike ZINC subset, trimmed at 60% Tanimoto overlap [3]). Finally, if you have known decoys, you can check their enrichment over random decoys, where you should expect random performance.
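
As a rough illustration of the enrichment measures mentioned above, the following Python sketch computes a ROC curve, its AUC, and a LogAUC from two lists of docking scores. It assumes that lower (more negative) scores are better and that the LogAUC is the area under the semilog ROC curve between a false positive rate of 0.001 and 1, normalized by the width of that interval. The function names and the toy scores are made up for this example, and the step-wise integration is a simplification that may differ in detail from the scripts distributed with DOCK.

```python
import math


def roc_points(ligand_scores, decoy_scores):
    """ROC points (fpr, tpr) from ranking all molecules best score first.

    Assumes lower (more negative) scores are better. On tied scores, decoys
    are ranked ahead of ligands, which is the pessimistic choice.
    """
    if not ligand_scores or not decoy_scores:
        raise ValueError("need at least one ligand and one decoy")
    ranked = sorted(
        [(s, 1) for s in ligand_scores] + [(s, 0) for s in decoy_scores]
    )
    n_lig, n_dec = len(ligand_scores), len(decoy_scores)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_ligand in ranked:
        if is_ligand:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_dec, tp / n_lig))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def log_auc(points, lam=0.001):
    """Area under the ROC curve with log10(fpr) on the x axis, from lam to 1,
    divided by log10(1/lam) so that a perfect ranking gives 1.0."""
    area = 0.0
    for (x0, y0), (x1, _) in zip(points, points[1:]):
        # Only horizontal steps (a decoy was found) add area; clip at lam.
        if x1 > x0 and x1 > lam:
            area += y0 * (math.log10(x1) - math.log10(max(x0, lam)))
    return area / math.log10(1.0 / lam)


def random_log_auc(lam=0.001):
    """LogAUC of the diagonal (random) ROC curve, for use as a baseline."""
    return (1.0 - lam) / math.log(10) / math.log10(1.0 / lam)


# Toy scores, invented for illustration only.
ligands = [-50.0, -45.0, -30.0]
decoys = [-48.0, -40.0, -35.0, -20.0, -10.0]
pts = roc_points(ligands, decoys)
print(f"AUC              = {roc_auc(pts):.3f}")
print(f"LogAUC           = {log_auc(pts):.3f}")
print(f"LogAUC (random)  = {random_log_auc():.3f}")
```

The same functions apply to any pair of sets: pass known non-binders or random decoys as `decoy_scores` to compare ligands against them, or pass known decoys as `ligand_scores` and random decoys as `decoy_scores` to check for the expected random performance.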


More relevant pages

Decoys, DUD, LogAUC