Job Arrays
Contents
- 1 Job Array Introduction
- 2 array.sh (Example submission script)
- 3 job.pl (Example Perl script)
- 4 job.conf (Example configuration file)
- 5 Default vs. High Throughput Queue
- 6 Effectively Using job_array_index
- 7 Putting it all together (Example Qsub Submission with submission script, configuration file, and experiment script)
Job Array Introduction
Making a new copy of a script and submitting it separately for every input data file is time consuming. An alternative is to make a job array using the -t option in your qsub submission script. The -t option allows many copies of the same script to be queued all at once. You can use the $PBS_ARRAYID environment variable to differentiate between the different jobs in the array. The resources you request in the qsub submission script are the resources each job in the array gets each time the script is called.
In this tutorial, we will be using three files:
array.sh
job.pl
job.conf
Let's say you want to run 16 jobs. Instead of submitting 16 different jobs, you can submit one job but use the '-t' parameter and the PBS_ARRAYID variable. You can read more about the '-t' parameter at http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm
#PBS -t 0-15
The -t parameter sets the range of the PBS_ARRAYID variable. So setting it to
#PBS -t 0-4
will cause the script to be called 5 times, each time with a different PBS_ARRAYID value from 0 to 4, which results in
( perl job.pl $PBS_ARRAYID )
perl job.pl 0
perl job.pl 1
perl job.pl 2
perl job.pl 3
perl job.pl 4
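The array range can also be supplied on the qsub command line instead of inside the script. A minimal sketch, assuming the submission script from the next section is saved as array.sh:

qsub -t 0-15 array.sh    # equivalent to putting "#PBS -t 0-15" inside array.sh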
array.sh (Example submission script)
This submission script changes to the directory the job was submitted from, creates a 16-element job array, and reserves 2 processors and 1 GB of RAM for each job.
It redirects stderr and stdout into one file and emails the job owner on completion or abort.
For each job, it passes the value of $PBS_ARRAYID (in this case 0 to 15) to the job.pl script.
#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -q default
#PBS -l nodes=1:ppn=2,mem=1000mb
#PBS -M youremail@illinois.edu
#PBS -m abe
#PBS -N array_of_perl_jobs
#PBS -t 0-15
#PBS -j oe
# ----------------Load Modules-------------------- #
module load perl/5.16.1
# ----------------Your Commands------------------- #
cd $PBS_O_WORKDIR
perl job.pl $PBS_ARRAYID
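A minimal sketch of submitting the array and checking on its elements, assuming the script above is saved as array.sh (Torque's qstat -t flag lists each array sub-job separately):

qsub array.sh            # returns a single array job id, such as 12345[]
qstat -t -u $USER        # show the state of each element of the array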
job.pl (Example Perl script)
#!/usr/bin/env perl
# This script echoes the job array element that has been passed in
use strict;
my $pbs_array_id = shift @ARGV;
# PBS_ARRAYID counts from 0, but head -n counts lines from 1
my $experimentID = $pbs_array_id + 1;
my $experimentName = `head -n $experimentID job.conf | tail -n 1`;
chomp $experimentName;
print "This is job number $pbs_array_id\n";
print "About to perform experimentID: $experimentID experimentName: $experimentName\n";
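You can sanity-check job.pl outside the scheduler by passing an array index by hand. A quick test, assuming the job.conf from the next section is in the current directory, should print something like:

perl job.pl 3
# This is job number 3
# About to perform experimentID: 4 experimentName: dataset3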
job.conf (Example configuration file)
dataset0
dataset1
dataset2
dataset3
dataset4
dataset5
..
dataset650
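If your inputs really are named in this pattern, a file like this can be generated with a short shell loop rather than by hand; a sketch (adjust the name pattern and range to match your actual datasets):

for i in $(seq 0 650); do echo "dataset$i"; done > job.conf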
Default vs. High Throughput Queue
The default queue only allows you to submit 80 jobs at a time, but jobs in it have no walltime limit.
- This queue is most appropriate for many jobs that may run a long time.
The high throughput queue allows you to submit 500 jobs at a time, but each job has a walltime limit.
- This queue is most appropriate for many jobs that you want to run in parallel and finish quickly.
Effectively Using job_array_index
You have 650 datasets you want to analyze, but you can only submit 80 jobs at a time. Instead of submitting 80 jobs, waiting for them to finish, and repeating, submit a single 80-element array job that can handle all of the datasets.
A simple formula for dividing your datasets among the array elements is as follows:
datasets per job = ceiling( number of datasets / number of job elements )
datasets per job = ceiling( 650 / 80 ) = ceiling( 8.125 ) = 9
That means each of your 80 jobs is responsible for handling 9 datasets. Each time your job script is called, you need to pass it its position in the list of datasets, which is $PBS_ARRAYID, and the number of datasets per job (N). With those two values, the job can determine which datasets from the list it needs to process.
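The ceiling can also be computed directly in the submission script with integer arithmetic instead of being worked out by hand; a sketch using the numbers above (650 datasets, 80 array elements):

numDatasets=650
numJobs=80
# integer ceiling: (a + b - 1) / b
itemsToProcess=$(( (numDatasets + numJobs - 1) / numJobs ))
echo $itemsToProcess     # prints 9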
Here is some simple pseudocode for this situation:
data_sets_per_job = N
startLineNumber = $PBS_ARRAYID * data_sets_per_job
endLineNumber = startLineNumber + data_sets_per_job
open the list of datasets:
    go to startLineNumber
    get dataset
    do work with dataset
    if lineNumber < endLineNumber
        go to next line
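A minimal bash translation of that pseudocode, assuming the dataset list is in job.conf; it uses sed to pull out just this element's slice of the file, as an alternative to the head/tail approach in the full example below. process_dataset is a placeholder for whatever work you actually do:

itemsToProcess=9
startLine=$(( PBS_ARRAYID * itemsToProcess + 1 ))
endLine=$(( startLine + itemsToProcess - 1 ))
# print only this element's lines of job.conf, one dataset per line
for dataset in $( sed -n "${startLine},${endLine}p" job.conf )
do
    process_dataset "$dataset"    # placeholder command
done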
Putting it all together (Example Qsub Submission with submission script, configuration file, and experiment script)
#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -S /bin/bash
#PBS -q default
#PBS -l nodes=1:ppn=1,mem=1000mb
#PBS -M netid@Illinois.edu
#PBS -m abe
#PBS -N Blast_100_in_chunks_of_10
#PBS -d /home/n-z/yournetid/experimentA/
#PBS -t 0-9
# ----------------Load Modules-------------------- #
module load blast/2.2.26
# ----------------Your Commands------------------- #
# For testing outside the scheduler, uncomment the next line:
# PBS_ARRAYID=0
taskID=$PBS_ARRAYID
itemsToProcess=10
jobList="job.conf"
startLineNumber=$(( $taskID * $itemsToProcess ))
endLineNumber=$(( $startLineNumber + $itemsToProcess ))
startLineNumber=$(( $startLineNumber + 1 ))
# Grab an experiment from the job.conf file for each line in this element's chunk
for line in `seq $startLineNumber $endLineNumber`
do
    experiment=$( head -n $line $jobList | tail -n 1 )
    # echo makes this a dry run; remove "echo" to actually run blastall
    echo blastall -i $experiment -o $experiment.blast
done
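Assuming the script above is saved as blast_array.sh (a hypothetical name) in the experiment directory, a typical workflow might look like the sketch below. Because of the echo, the first submission is a dry run: the blastall commands are printed to each element's output file rather than executed.

qsub blast_array.sh      # submit the whole array; -t is already set inside the script
qstat -t -u $USER        # watch the individual array elements
# Once the printed commands look right, remove "echo" from the script and resubmit.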