Job Arrays

Job Array Introduction

Making a new copy of the script and submitting it separately for every input data file is time-consuming. An alternative is to create a job array using the -t option in your QSUB submission script. The -t option allows many copies of the same script to be queued all at once. You can use the $PBS_ARRAYID environment variable to differentiate between the different jobs in the array. The amount of resources you specify in the QSUB submission script is the amount of resources the job script gets each time it is called.

In this tutorial, we will be using three files:

array.sh
job.pl
job.conf

Let's say you want to run 16 jobs. Instead of submitting 16 different jobs, you can submit a single array job using the '-t' parameter and the PBS_ARRAYID variable. You can read more about the '-t' parameter at http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm

#PBS -t 1-16

The -t parameter sets the range of the PBS_ARRAYID variable. So setting it to

#PBS -t 1-5

will cause qsub to run your job script 5 times, each time with PBS_ARRAYID set to the next value from 1 to 5, which results in

( perl job.pl $PBS_ARRAYID )

perl job.pl 1
perl job.pl 2
perl job.pl 3
perl job.pl 4
perl job.pl 5

array.sh (Example submission script)

This submission script changes to the current working directory, submits 16 jobs, and reserves 2 processors and 1 GB of RAM for each job.

It redirects stderr and stdout into one file, and emails the job owner on completion or abort.

For each job, it passes the value of PBS_ARRAYID to the job.pl script, which in this case ranges from 1 to 16.

#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -q default
#PBS -l nodes=1:ppn=2,mem=1000mb
#PBS -M youremail@illinois.edu
#PBS -m abe
#PBS -N array_of_perl_jobs
#PBS -t 1-16
#PBS -j oe
# ----------------Load Modules-------------------- #
module load perl/5.16.1
# ----------------Your Commands------------------- #
cd $PBS_O_WORKDIR
perl job.pl $PBS_ARRAYID
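
The whole array is submitted with a single qsub call; qsub returns one job ID for the array, and each element runs with its own PBS_ARRAYID value (1 through 16 here):

 qsub array.sh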

job.pl (Example Perl script)

#!/usr/bin/env perl
# This script echoes the job array element that has been passed in
# and looks up the matching line in job.conf.

use strict;
use warnings;

my $pbs_array_id = shift @ARGV;
my $experimentID = $pbs_array_id;
# Read line number $pbs_array_id from job.conf
my $experimentName = `head -n $pbs_array_id job.conf | tail -n1`;
chomp $experimentName;

print "This is job number $pbs_array_id\n";
print "About to perform experimentID: $experimentID experimentName: $experimentName\n";

job.conf (example configuration file)

dataset0
dataset1
dataset2
dataset3
dataset4
dataset5
..
dataset650
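
For example, running the script by hand with an array index of 3 reads line 3 of this job.conf, so the output (assuming the files above) would be:

 perl job.pl 3
 This is job number 3
 About to perform experimentID: 3 experimentName: dataset2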

Default vs Highthroughput Queue

The default queue allows you to queue up to 150 jobs, but it does not impose a walltime limit.

  • This queue is most appropriate for many jobs that may each run for a long time.

The highthroughput queue allows you to queue up to 550 jobs, but it imposes a walltime limit.

  • This queue is most appropriate for many jobs that you want to run in parallel and finish quickly (see the sketch below for selecting each queue).
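
Which queue a job goes to is set with the '-q' directive in your submission script, and a walltime can be requested with '-l walltime'. A minimal sketch (the exact highthroughput queue name and its limits should be confirmed with 'qstat -q' on the cluster):

 # default queue: no walltime limit, up to 150 queued jobs
 #PBS -q default

 # highthroughput queue (name assumed): walltime limit applies, up to 550 queued jobs
 #PBS -q highthroughput
 #PBS -l walltime=04:00:00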

Effectively Using the PBS_ARRAYID

Suppose you have 650 datasets to analyze, but you can only submit 80 jobs at a time. Instead of submitting 80 jobs at a time and waiting for each batch to finish, submit a single 80-element array job in which each element handles several datasets.

A simple formula for dividing your datasets among the job array elements is as follows:

 datasets per job = ceiling ( number of datasets / number of job elements )
 datasets per job = ceiling ( 650 / 80 ) = ceiling ( 8.125 ) = 9

So each of your 80 array elements is responsible for handling 9 datasets. Each time your job script is called, you pass it its position in the list of datasets (the $PBS_ARRAYID) and the number of datasets per job (N). From those two values, the job can determine which datasets from the list it needs to process.
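
In a bash submission script the ceiling can be computed with integer arithmetic. A minimal sketch (the variable names are only illustrative):

 nDatasets=650
 nJobElements=80
 # integer ceiling division: (a + b - 1) / b
 datasetsPerJob=$(( (nDatasets + nJobElements - 1) / nJobElements ))
 echo $datasetsPerJob   # prints 9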


Here is some simple pseudocode for this situation:

datasets_per_job = N
startLineNumber  = ( $PBS_ARRAYID - 1 ) * datasets_per_job + 1
endLineNumber    = startLineNumber + datasets_per_job - 1

open the list of datasets:
      go to startLineNumber
      while lineNumber <= endLineNumber:
            get the dataset on the current line
            do work with the dataset
            go to the next line

Putting it all together (Example QSUB submission script, configuration file, and experiment script)

In order to use the following script, you will need to set the following; the places to change are marked with '# --EDIT HERE' comments:

  • '-t' (the number of array elements you want)
  • itemsToProcess (the number of items from the job.conf list that each array element should process)
  • your own commands, modules, and custom settings
#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -S /bin/bash
#PBS -q default
#PBS -l nodes=1:ppn=1,mem=1000mb
#PBS -M netid@illinois.edu
#PBS -m abe
#PBS -N Blast_100_in_chunks_of_10
#PBS -d /home/n-z/yournetid/experimentA/
# --EDIT HERE
#PBS -t 1-10
# ----------------Load Modules-------------------- #
module load blast/2.2.26
# ----------------Your Commands------------------- #
# --EDIT HERE
itemsToProcess=10
jobList="job.conf"

# No need to edit this
taskID=$PBS_ARRAYID
# Array IDs start at 1, so task 1 handles lines 1..itemsToProcess of the list,
# task 2 handles the next itemsToProcess lines, and so on.
startLineNumber=$(( ($taskID - 1) * $itemsToProcess ))
endLineNumber=$(( $startLineNumber + $itemsToProcess ))
startLineNumber=$(( $startLineNumber + 1 ))

# Grab an experiment from the job.conf file
for line in `seq $startLineNumber $endLineNumber`
do
    experiment=$( head -n $line $jobList | tail -n 1 )
    # --EDIT HERE
    # echo prints the command for checking; remove 'echo' to actually run blastall
    echo blastall -i $experiment -o $experiment.blast
done
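
To use this script, submit it once with qsub and let Torque expand it into the array elements. The blastall line above is wrapped in echo as a dry run; remove the echo once the printed commands look correct. A brief sketch of submitting and monitoring (the -t flag to qstat, which lists array elements individually, is assumed to be available in your Torque version):

 qsub blast_array.sh   # file name is illustrative
 qstat -t -u $USER     # show each array element separately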