DOCK on AWS

This page describes how to run DOCK at scale on AWS, using AWS Batch, EC2 Spot instances, and S3.

= First time set up =

* 1. create an AWS account
Define the IAM roles AmazonEC2SpotFleetRole, AWSServiceRoleForEC2Spot, and AWSServiceRoleForEC2SpotFleet, following https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html#spot-fleet-roles-cli
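Following the linked guide, a minimal sketch of the role setup from the AWS CLI (role and policy names as in the guide):

 # service-linked roles for Spot and Spot Fleet (created once per account)
 aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
 aws iam create-service-linked-role --aws-service-name spotfleet.amazonaws.com
 # Spot Fleet role with the AWS-managed policy attached
 aws iam create-role --role-name AmazonEC2SpotFleetRole \
     --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"spotfleet.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
 aws iam attach-role-policy --role-name AmazonEC2SpotFleetRole \
     --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole
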
* 2. create an IAM user "awsuser" (optional)
* 3. create an S3 bucket named "results2021"
Within the bucket, create dockfiles, database.txt, and output1.
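A minimal sketch of the bucket setup from the AWS CLI; note that S3 creates prefixes ("folders") implicitly when you upload under them:

 aws s3 mb s3://results2021 --region us-east-2
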
* 4. set up AWS CLI access
Set up the ability to upload: on your client computer, install awscli and configure your credentials, then verify that you can both upload to and download from the bucket.
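A sketch of the client-side setup; the test file name is only illustrative:

 pip install awscli
 aws configure   # enter your access key, secret key, and region (us-east-2)
 # round-trip test
 echo hello > test.txt
 aws s3 cp test.txt s3://results2021/test.txt
 aws s3 cp s3://results2021/test.txt test-download.txt
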
* 5. go to AWS Batch and choose the new Batch experience.
Set up a compute environment (env1): managed, enabled, AWSBatchServiceRole, Spot, maximum price 100% of on-demand, minimum vCPUs 0, maximum vCPUs 256, desired vCPUs 0, allocation strategy BEST_FIT_PROGRESSIVE.
Set up a job queue (queue1).
Set up a job definition (jobdef4): EC2, retry attempts 1, execution timeout 14400 seconds, image btingle/dockaws:latest, command bash, vCPUs 1, memory 2048 MiB. Enable privileged mode (runs as root) and set the log driver to awslogs. Set these environment variables in the job definition:

 S3_DOCKFILES_LOCATION   s3://results2021/dockfiles
 SHRTCACHE               /tmp
 AWS_ACCESS_KEY_ID       xxxxx
 AWS_SECRET_ACCESS_KEY   xxxxx
 S3_INPUT_LOCATION       s3://btingletestbucket/input
 S3_OUTPUT_LOCATION      s3://btingletestbucket/output1
 AWS_DEFAULT_REGION      us-east-2
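For reference, a sketch of the same job definition created from the CLI (keys masked as xxxxx, as above):

 aws batch register-job-definition --job-definition-name jobdef4 --type container \
     --retry-strategy attempts=1 --timeout attemptDurationSeconds=14400 \
     --container-properties '{
         "image": "btingle/dockaws:latest", "vcpus": 1, "memory": 2048,
         "command": ["bash"], "privileged": true,
         "logConfiguration": {"logDriver": "awslogs"},
         "environment": [
             {"name": "S3_DOCKFILES_LOCATION", "value": "s3://results2021/dockfiles"},
             {"name": "SHRTCACHE", "value": "/tmp"},
             {"name": "S3_INPUT_LOCATION", "value": "s3://btingletestbucket/input"},
             {"name": "S3_OUTPUT_LOCATION", "value": "s3://btingletestbucket/output1"},
             {"name": "AWS_DEFAULT_REGION", "value": "us-east-2"},
             {"name": "AWS_ACCESS_KEY_ID", "value": "xxxxx"},
             {"name": "AWS_SECRET_ACCESS_KEY", "value": "xxxxx"}
         ]}'
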
= Set up docker image =

Either you or a colleague must set up the docker image:

[[Build new dock64 docker image]]

You will then invoke the docker image on AWS. You are welcome to use ours. Be very careful about exposing AWS credentials.
= Set up database list to run =

* 1. make sure zinc-22/sets is current, i.e. generated within the last 7 days (ask JJI if in doubt)
* 2. decide on the range you want to dock; here we choose lead-like
* 3. decide on the platform you want to dock on; here we choose AWS (S3)

 cd zinc-22/sets
 cat *.lead-like.*.s3 > ~/.aws/my-todock-list.s3

See [[Selecting tranches in ZINC22]]
* 4. get your dockfiles. Use JK-coloring; see [[Coloring and Subcluster Matching]]
* 5. upload the dockfiles and database selection to AWS S3 (you already set up ~/.aws/credentials above):

 aws s3 cp myjob-dockfiles.tgz s3://results2021/dockfiles/
 aws s3 cp my-todock-list.s3 s3://results2021/databases/

We recommend uploading a small test set first to make sure everything works. Note that aws s3 cp takes one source at a time, so upload the two files separately:

 grep xaa my-todock-list.s3 > my-to-dock-sample.s3
 grep -v xaa my-todock-list.s3 > my-to-dock-balance.s3
 aws s3 cp my-to-dock-sample.s3 s3://results2021/databases/
 aws s3 cp my-to-dock-balance.s3 s3://results2021/databases/
* 6. go to aws.amazon.com, open S3, and confirm your files landed correctly.
* 7. go to Batch and review your Compute Environments, Job Queues, and Job Definitions; refine them if needed.
* 8. go to Jobs and set up a new job using your sample. You are testing that you can do a round trip with your data before you launch at scale.
* 9. verify that the test job ran correctly. Also check the run time, nmatch, and sampling, and look for errors in the logs indicating problems.
* 10. launch at scale (a CLI sketch follows this list).
* 11. monitor job progress.
* 12. when done, process the results.
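Steps 8, 10, and 11 can also be done from the CLI. A sketch, assuming the queue1 and jobdef4 created above; the job names are illustrative:

 # submit the sample round-trip test (step 8)
 aws batch submit-job --job-name sample-test --job-queue queue1 --job-definition jobdef4
 # launch at scale (step 10), e.g. as an array job of 1000 tasks
 aws batch submit-job --job-name full-run --job-queue queue1 --job-definition jobdef4 \
     --array-properties size=1000
 # monitor progress (step 11)
 aws batch list-jobs --job-queue queue1 --job-status RUNNING
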
= Each job =

* 1. upload someproject.dockfiles.tgz into dockfiles.
* 2. reference the database set up above.
* 3. set up an output directory in S3.
* 4. set up the job.
* 5. run the job.
= After job completes =

* 1. check that all jobs completed and restart any that did not (see the sketch below).
* 2. combine blazing fast.
* 3. extract mol2 files.
* 4. download the data for processing and review.
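A sketch of the completeness check and download from the CLI, assuming your S3_OUTPUT_LOCATION is s3://results2021/output1; the expected file count depends on your database list:

 # count result files against the number of packages you submitted
 aws s3 ls s3://results2021/output1/ --recursive | wc -l
 # download everything for local processing
 aws s3 sync s3://results2021/output1/ ./output1/
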
= Job maintenance =

* 1. move finished results to Glacier to reduce storage costs (see the sketch below).
* 2. run a variation of the job.
* 3. harvest a variation of the job.
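One way to move results to Glacier is to copy them in place with a different storage class; a sketch, using the bucket and prefix from above:

 aws s3 cp s3://results2021/output1/ s3://results2021/output1/ \
     --recursive --storage-class GLACIER
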
= Background reading =

* https://aws.amazon.com/getting-started/hands-on/run-batch-jobs-at-scale-with-ec2-spot/
* troubleshooting: https://aws.amazon.com/premiumsupport/knowledge-center/batch-invalid-compute-environment
* watch out for spending too much money; consider setting up a billing alarm.
* more debugging: https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/
  
[[Category:DOCK3.8]]
[[Category:ZINC22]]
[[Category:AWS]]
