MUD - Michael's Utilities for Docking

Revision as of 09:13, 29 October 2008

What's in MUD?

  • Tools to start, check, and restart dock jobs
  • Tools to combine, enrich, plot, and view docking results

Setting up MUD

  • For convenience, point a shell variable to the base mud directory to save typing
set mud=~mysinger/code/mud/trunk
  • If you use MUD a lot, you can add this to your ~/.login
  • Then simply run commands like this:
$mud/submit.csh
$mud/check.py -h
  • Use -h or --help to get full help information for the .py (python) scripts
  • The .csh scripts will automatically print usage information if misused
  • The scripts automatically use their invocation path to find other scripts and libraries they depend on.
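As an illustration of how a script can use its invocation path to find its neighbours, here is a minimal Python sketch. It is not taken from MUD itself; the function names script_dir and sibling are hypothetical, and the real scripts may resolve paths differently (for example, by following symlinks).

```python
import os
import sys


def script_dir(argv0=None):
    """Return the absolute directory of the script as it was invoked.

    argv0 defaults to sys.argv[0], the path used to start the script.
    """
    if argv0 is None:
        argv0 = sys.argv[0]
    return os.path.dirname(os.path.abspath(argv0))


def sibling(name, argv0=None):
    """Build the path to another file that lives next to the script."""
    return os.path.join(script_dir(argv0), name)


if __name__ == "__main__":
    # Hypothetical usage: locate check.py next to whichever script is running.
    print(sibling("check.py"))
```

Because the lookup is relative to the invoked script, a call such as $mud/check.py would work from any working directory without extra configuration.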

Job Control

Main Workflow

  • Submit a parallel job to the cluster
$mud/submit.csh

Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.

  • Check parallel job status
$mud/check.py

Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.

  • Restart all failed subjobs
$mud/restart.py

This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.
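The submission marker described for submit.csh can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the actual MUD code: it assumes that dirlist holds one subjob directory per line and that the marker file is literally named submitted. The helper names are hypothetical.

```python
import os

MARKER = "submitted"  # assumed name of the marker file


def read_dirlist(dirlist_path):
    """Return the non-empty lines of dirlist, assumed one directory per line."""
    with open(dirlist_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def mark_submitted(directory):
    """Create or refresh an empty marker file, like the shell 'touch' command."""
    path = os.path.join(directory, MARKER)
    with open(path, "a"):
        os.utime(path, None)


def unsubmitted(directories):
    """Return the directories that do not yet contain the marker file."""
    return [d for d in directories
            if not os.path.exists(os.path.join(d, MARKER))]
```

A marker file of this kind lets later tools tell a directory that was never submitted apart from one whose job is still pending, without querying the queueing system.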

More specialized commands

  • Submit a single directory to the cluster
$mud/subsge_single.csh
  • Submit a single directory to the local machine
$mud/sublocal.csh
  • Remove docking output leaving only input - will DELETE even completed jobs
$mud/clean.py
  • Restart single directory
$mud/restartdir.py

Job Analysis

Combining parallel results

Merge all parallel jobs into a unique set of scores.

$mud/combine.py
  • Options:

Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.

  • Creates:
  1. combine.scores - fully processed scores, taking the best one for each id
  2. combine.raw - contains all scores as scraped from DOCK output
  3. combine.broken - broken molecules and the reason they failed
  4. combine.zeroes - important sanity check

The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results.
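The two derived outputs, best score per id and zero counts per column, can be sketched in Python as follows. This is a simplified illustration, not the logic of combine.py: it assumes the scores have already been parsed into (id, list of scores) pairs, that a lower score is better, and that the last column is the one used for ranking. The function names and the sample rows are made up for the example.

```python
def best_per_id(rows, key_column=-1):
    """Keep, for each id, the score row with the lowest value in key_column."""
    best = {}
    for mol_id, scores in rows:
        if mol_id not in best or scores[key_column] < best[mol_id][key_column]:
            best[mol_id] = scores
    return best


def count_zeroes(rows):
    """Return (number of rows, list of zero counts, one per scoring column).

    Assumes every row has the same number of scoring columns.
    """
    rows = list(rows)
    if not rows:
        return 0, []
    zeroes = [0] * len(rows[0][1])
    for _, scores in rows:
        for column, value in enumerate(scores):
            if value == 0.0:
                zeroes[column] += 1
    return len(rows), zeroes


# Made-up rows: mol1 was scored twice, mol2 has suspicious all-zero scores.
raw = [
    ("mol1", [-10.0, 0.0, -10.0]),
    ("mol1", [-12.5, -1.0, -13.5]),
    ("mol2", [0.0, 0.0, 0.0]),
]

best = best_per_id(raw)
total, zero_counts = count_zeroes(best.items())
print(total, zero_counts)
```

A zero count that approaches the number of molecules in any column is the pattern the sanity check is meant to expose.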

  • Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script accounts for all docked molecules by detecting bumped-out, no-match, and timed-out molecules.

To achieve consistency, you have two options:

  • 1. Write coordinates for all molecules (what I use)

In INDOCK, set number_save to 100000 or something high enough to capture all docked hierarchies. DOCK output is now gzipped so this is cheaper than it used to be.

  • 2. Do not check for broken molecules

Use the -b option when running combine.py