Difference between revisions of "Job Arrays"

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign

Revision as of 15:29, 13 March 2014

Job Array Introduction

Copying a script and submitting each copy separately for every input data file is time consuming. An alternative is to create a job array with the -t option in your qsub submission script. The -t option queues many copies of the same script at once. Use the $PBS_ARRAYID environment variable to tell the jobs in the array apart. The resources you request in the submission script are the resources the job script gets each time it is called, so every job in the array receives the full amount.

In this tutorial, we will be using three files:

array.sh
job.pl
job.conf

Let's say you want to run 16 jobs. Instead of submitting 16 separate jobs, you can submit one job that uses the -t parameter and the PBS_ARRAYID variable.

#PBS -t 0-15

The -t parameter sets the range of the PBS_ARRAYID variable. So setting it to

#PBS -t 0-4

causes the scheduler to run the script 5 times, with PBS_ARRAYID taking each value from 0 to 4 in turn. The line

perl job.pl $PBS_ARRAYID

therefore runs as:

perl job.pl 0 
perl job.pl 1
perl job.pl 2
perl job.pl 3
perl job.pl 4
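
To preview this expansion without submitting anything, the loop below mimics the scheduler by setting PBS_ARRAYID by hand. It is only a local sketch: run_array and show_command are hypothetical helper names, not scheduler commands, and a real array runs its elements as independent jobs rather than one after another.

```shell
#!/bin/bash
# Local dry run: mimic what "#PBS -t FIRST-LAST" does by setting
# PBS_ARRAYID by hand. run_array is a hypothetical helper.
run_array() {
    first=$1
    last=$2
    shift 2
    id=$first
    while [ "$id" -le "$last" ]; do
        # Each iteration stands in for one independent array element.
        PBS_ARRAYID=$id
        export PBS_ARRAYID
        "$@"
        id=$((id + 1))
    done
}

# Print the command that one array element would run.
show_command() {
    echo "perl job.pl $PBS_ARRAYID"
}

run_array 0 4 show_command
```

Replacing show_command with the real command (for example perl job.pl "$PBS_ARRAYID") turns the preview into a sequential local test run.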

array.sh (Example submission script)

This submission script changes to the directory the job was submitted from, runs an array of 16 jobs, and reserves 2 processors and 1000 MB (about 1 GB) of RAM for each job.

It merges stderr and stdout into one file, and emails the job owner when a job begins, ends, or aborts.

For each job, it passes the value of PBS_ARRAYID, which in this case ranges from 0 to 15, to the job.pl script.

#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -q default
#PBS -l nodes=1:ppn=2,mem=1000mb
#PBS -M youremail@illinois.edu
#PBS -m abe
#PBS -N array_of_perl_jobs
#PBS -t 0-15
#PBS -j oe
# ----------------Load Modules-------------------- #
module load perl/5.16.1
# ----------------Your Commands------------------- #
cd $PBS_O_WORKDIR
perl job.pl $PBS_ARRAYID
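
If array.sh is ever run without -t (or directly from a shell), PBS_ARRAYID is empty and job.pl receives no argument. One possible safeguard, sketched here as an assumption rather than as part of the original script, is a small check placed before the perl line; require_array_id is a hypothetical helper name.

```shell
#!/bin/bash
# require_array_id is a hypothetical helper, not part of PBS.
# It succeeds only when PBS_ARRAYID holds a non-negative integer.
require_array_id() {
    case "${PBS_ARRAYID:-}" in
        '' | *[!0-9]*)
            echo "PBS_ARRAYID is unset or not a non-negative integer" >&2
            return 1
            ;;
    esac
    return 0
}

# Demonstration with a hand-set index; in a real array job the
# scheduler sets PBS_ARRAYID and this assignment is not needed.
PBS_ARRAYID=3
if require_array_id; then
    echo "array element $PBS_ARRAYID: would run perl job.pl $PBS_ARRAYID"
fi
```

In a submission script the call would typically read require_array_id || exit 1, so that a job started without an array index stops before doing any work.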

job.pl (Example Perl script)

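#!/usr/bin/env perl
# This script echoes the job array element that has been passed in

use strict;
use warnings;

# Zero-based array index passed in by array.sh (the value of PBS_ARRAYID)
my $argument = shift @ARGV;
# Lines in job.conf are numbered from 1, so index 0 maps to line 1
my $experimentID = $argument + 1;
# Read line number $experimentID of job.conf and drop its trailing newline
my $experimentName = `head -n $experimentID job.conf | tail -n1`;
chomp $experimentName;

print "This is job number $argument\n";
print "About to perform experimentID: $experimentID experimentName: $experimentName\n";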

Effectively using the Job Array

You will need an additional script or configuration file to use PBS_ARRAYID effectively. Otherwise you are simply passing an integer into your tool, which may not mean much on its own. Below is an example configuration file that lists one experiment per line for job.pl. As the PBS_ARRAYID variable increments, the script performs its action on the next experiment in the file. The file needs one line for every index in the array; the nine-line sample below covers indices 0 to 8 only.

job.conf (example configuration file)

experimentA
experimentB
experimentC
experimentD
experimentE
experimentF
experimentG
experimentN
experimentZ
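
The same lookup can be done in the shell before the tool is called. The sketch below assumes a zero-based array index and one experiment name per line, as in job.conf above. experiment_for_index is a hypothetical helper, and the demonstration uses a temporary three-line file rather than the real job.conf.

```shell
#!/bin/bash
# experiment_for_index is a hypothetical helper.
# It prints the experiment name for a zero-based array index, or fails
# when the configuration file has no entry for that index.
experiment_for_index() {
    local index=$1
    local conf=$2
    local line=$((index + 1))   # array IDs start at 0, file lines at 1
    local name
    name=$(sed -n "${line}p" "$conf")
    if [ -z "$name" ]; then
        echo "no experiment on line $line of $conf" >&2
        return 1
    fi
    printf '%s\n' "$name"
}

# Demonstration against a throwaway three-line configuration file.
demo_conf=$(mktemp)
trap 'rm -f "$demo_conf"' EXIT
printf 'experimentA\nexperimentB\nexperimentC\n' > "$demo_conf"

experiment_for_index 0 "$demo_conf"
experiment_for_index 7 "$demo_conf" || echo "index 7 has no experiment"
```

Failing on a missing line, instead of silently reusing the last entry, makes a mismatch between the -t range and the length of the configuration file visible in the job output.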