MUD - Michael's Utilities for Docking

From DISI

Revision as of 09:19, 29 October 2008

What's in MUD?

  • Tools to start, check, and restart dock jobs
  • Tools to combine, enrich, plot, and view docking results

Setting up MUD

  • For convenience, point a shell variable to the base mud directory to save typing
set mud=~mysinger/code/mud/trunk
  • If you use MUD a lot, you can add this to your ~/.login
  • Then simply run commands like this:
$mud/submit.csh
$mud/check.py -h
  • Use -h or --help to get full help information for the .py (python) scripts
  • The .csh scripts automatically print usage information if misused
  • The scripts automatically use their invocation path to find other scripts and libraries they depend on.
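
As an illustration of the last point, here is a minimal Python sketch of how a script can locate its neighbours from its own invocation path. It is not taken from the MUD sources: `script_dir` and `sibling` are hypothetical helper names, and the real scripts may do this differently.

```python
import os
import sys


def script_dir(invoked_as=None):
    """Return the directory holding the invoked script (symlinks resolved)."""
    # Assumption: sys.argv[0] is the path the script was launched with.
    path = invoked_as if invoked_as is not None else sys.argv[0]
    return os.path.dirname(os.path.realpath(path))


def sibling(name, invoked_as=None):
    """Build the path of another script that lives next to this one."""
    return os.path.join(script_dir(invoked_as), name)


if __name__ == "__main__":
    # Hypothetical example: find combine.py next to whichever script was run.
    print(sibling("combine.py"))
```

With this approach, running a script as $mud/check.py is enough for it to find its helpers; no extra environment variable is needed.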

Job Control

Main Workflow

  • Submit a parallel job to the cluster
$mud/submit.csh

Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.

  • Check parallel job status
$mud/check.py

Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.

  • Restart all failed subjobs
$mud/restart.py

This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.
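
The bookkeeping described for submit.csh (a marker file touched in each directory named in 'dirlist') can be sketched in Python as follows. This is an illustration only, not the code of check.py: it assumes that dirlist holds one directory per line, that the paths are relative to the location of dirlist, and that the marker file is literally named `submitted`. The function names are hypothetical.

```python
import os


def read_dirlist(dirlist_path):
    """Read directory names, one per line, skipping blank lines."""
    with open(dirlist_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def unsubmitted(dirlist_path, marker="submitted"):
    """Return the directories from dirlist that lack the marker file."""
    # Assumption: entries in dirlist are relative to the file's own location.
    base = os.path.dirname(os.path.abspath(dirlist_path))
    return [
        d for d in read_dirlist(dirlist_path)
        if not os.path.exists(os.path.join(base, d, marker))
    ]
```

Calling `unsubmitted("dirlist")` would list the directories that no submission script has touched yet.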

More specialized commands

  • Submit a single directory to the cluster
$mud/subsge_single.csh
  • Submit a single directory to the local machine
$mud/sublocal.csh
  • Remove docking output leaving only input - will DELETE even completed jobs
$mud/clean.py
  • Restart single directory
$mud/restartdir.py

Job Analysis

  • Enrichment plots are only reliable when every docked molecule is treated consistently and accounted for. The combine script accounts for all docked molecules by detecting those that were bumped out, found no match, or timed out.

To achieve consistency, you have two options:

  • 1. Write coordinates for all molecules (what I use)

In INDOCK, set number_save to 100000, or to any value high enough to capture all docked hierarchies. DOCK output is now gzipped, so this is cheaper than it used to be.

  • 2. Do not check for broken molecules

Use the -b option when running combine.py

Combining Parallel Jobs

Merge all parallel jobs into a unique set of scores.

$mud/combine.py

The combine script carefully accounts for all docked molecules, which makes the resulting enrichment plots more informative.

  • Options:

Use -b or --broken to skip finding broken molecules.

Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script.

Use -p or --prefix if your output files are named something other than test.

Use --box if your box file is not at ../../grids/box relative to your subjob directories.

  • Creates:
  1. combine.scores - fully processed scores, taking the best one for each id
  2. combine.raw - contains all scores as scraped from DOCK output
  3. combine.broken - broken molecules and the reason they failed
  4. combine.zeroes - important sanity check
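
The step from combine.raw to combine.scores ("taking the best one for each id") can be pictured with the short Python sketch below. It is not the combine.py implementation: `best_per_id` is a hypothetical name, the input is simplified to (id, score) pairs, and it assumes that a lower (more negative) score is the better one.

```python
def best_per_id(records):
    """Keep the best score seen for each id.

    Assumption: lower (more negative) scores are better.
    """
    best = {}
    for mol_id, score in records:
        if mol_id not in best or score < best[mol_id]:
            best[mol_id] = score
    return best


if __name__ == "__main__":
    # Made-up ids and scores, for illustration only.
    raw = [("A", -10.0), ("B", -3.5), ("A", -12.5)]
    print(best_per_id(raw))
```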

format of combine.scores: <mono> <id> <contact score> <

The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results.
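
The kind of count reported in the .zeroes file can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration, not the code of combine.py: `count_zeroes` is a hypothetical name, and it assumes whitespace-separated lines whose first two columns are identifiers and whose remaining columns are all numeric scores.

```python
def count_zeroes(lines, first_score_column=2):
    """Return (number of molecules, list of zero counts per score column)."""
    n_molecules = 0
    zeroes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: every column from first_score_column on is a number.
        scores = [float(value) for value in fields[first_score_column:]]
        if len(zeroes) < len(scores):
            zeroes.extend([0] * (len(scores) - len(zeroes)))
        for column, score in enumerate(scores):
            if score == 0.0:
                zeroes[column] += 1
        n_molecules += 1
    return n_molecules, zeroes


if __name__ == "__main__":
    # Made-up lines, for illustration only.
    sample = ["m1 C001 -12.0 0.0", "m2 C002 0.0 0.0", "m3 C003 -5.0 -1.0"]
    print(count_zeroes(sample))
```

A zero count close to the number of molecules in any one column is the pattern that suggests a silent DOCK failure.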

Computing Enrichments