Difference between revisions of "Job Arrays"

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign

Revision as of 13:09, 13 March 2014

Job Array Introduction

Making a new copy of a script for every input data file and submitting each copy separately is time consuming. An alternative is to make a job array using the -t option in your submission script. The -t option queues many copies of the same script at once. Each copy can read the $PBS_ARRAYID environment variable to tell itself apart from the other jobs in the array. The resources you request in the qsub script are allocated to each job in the array, not shared across the whole array.

Let's say you want to run 16 jobs. Instead of submitting 16 different jobs, you can submit one job that uses the -t parameter and the PBS_ARRAYID variable.

#PBS -t 0-15

The -t parameter sets the range of the PBS_ARRAYID variable. So setting it to

#PBS -t 0-4

will cause the script to run 5 times, with PBS_ARRAYID set to each value from 0 to 4 in turn. A command line such as

perl job.pl $PBS_ARRAYID

therefore runs as

perl job.pl 0 
perl job.pl 1
perl job.pl 2
perl job.pl 3
perl job.pl 4
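
Before submitting a large array, it can help to dry-run the expansion on a login node. The following is a minimal sketch that mimics the scheduler by looping over the index range and setting PBS_ARRAYID for each run. The simulate_array function is a hypothetical helper written for this page, not a PBS command, and it runs the commands one after another instead of queuing them.

```shell
#!/bin/bash
# Local dry run of "#PBS -t FIRST-LAST".
# simulate_array is a hypothetical helper, not part of PBS.
simulate_array() {
    local first=$1 last=$2 i
    shift 2
    for (( i = first; i <= last; i++ )); do
        # Set PBS_ARRAYID only in the environment of the command being run,
        # the same way each array element sees its own value.
        PBS_ARRAYID=$i "$@"
    done
}

# Print the command line each array element would run for "-t 0-4".
simulate_array 0 4 bash -c 'echo "perl job.pl $PBS_ARRAYID"'
```

The last line prints the same five command lines listed above, from perl job.pl 0 to perl job.pl 4.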

Effectively using the Job Array

You will need an additional script or configuration file to make use of the index. The job.conf file shown at the end of this page is an example of a configuration file that specifies which experiment job.pl should run, sorted by array index. Common usages include a list of elements such as

Example Qsub Array Script

This submission script submits 16 jobs and reserves 2 processors and 1000 MB (about 1 GB) of RAM for each job. It passes the array index to job.pl as a command-line argument.

#!/bin/bash
# ----------------QSUB Parameters----------------- #
#PBS -q default
#PBS -l nodes=1:ppn=2,mem=1000mb
#PBS -M youremail@illinois.edu
#PBS -m abe
#PBS -N array_of_perl_jobs
#PBS -t 0-15
#PBS -j oe
# ----------------Load Modules-------------------- #
module load perl/5.16.1
# ----------------Your Commands------------------- #
# Quote the variables so a working directory containing spaces does not break the script
cd "$PBS_O_WORKDIR"
perl job.pl "$PBS_ARRAYID"
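
A common reason for using an array is to process one input data file per array element. The following is a minimal sketch of one way to map the index to a file, assuming all inputs sit in a single directory. The input_for_index function and the inputs directory name are hypothetical, and passing a file name to job.pl instead of the index is a variation on the script above, not what that script does.

```shell
#!/bin/bash
# Sketch: choose one input file per array element.
# input_for_index and the "inputs" directory are hypothetical names.
input_for_index() {
    local index=$1 dir=$2
    local files=() f
    # The shell expands the glob in sorted order, so every array element
    # builds the same list and picks a different entry from it.
    for f in "$dir"/*; do
        [ -f "$f" ] && files+=("$f")
    done
    if (( index < 0 || index >= ${#files[@]} )); then
        echo "index $index is out of range (${#files[@]} files)" >&2
        return 1
    fi
    printf '%s\n' "${files[index]}"
}

# Inside a submission script this might be used as:
#   input=$(input_for_index "$PBS_ARRAYID" inputs) && perl job.pl "$input"
```

With this approach the -t range has to match the number of files, for example -t 0-15 for 16 input files.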

Example Perl Script job.pl

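#!/usr/bin/env perl
# This script echoes the job array index that has been passed in

use strict;
use warnings;

# The array index arrives as the first command-line argument
my $argument = shift @ARGV;
die "Usage: $0 <array index>\n" unless defined $argument;
print "This is job number $argument\n";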

job.conf

0 experimentA
1 experimentB
2 experimentC
3 experimentD
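
One way to use a file like job.conf is to look up the line whose first column equals the array index. The following is a minimal sketch that assumes each line holds an index and a name separated by whitespace. The experiment_for_index function is a hypothetical helper, not something the example scripts on this page already provide.

```shell
#!/bin/bash
# Sketch: find the experiment name for one array index in a
# job.conf-style file ("index name" on each line).
# experiment_for_index is a hypothetical helper.
experiment_for_index() {
    local index=$1 conf=$2
    # Match on the first column, not the line number, so gaps or
    # reordering in the file do not shift the mapping.
    # The exit status is non-zero when no line has that index.
    awk -v idx="$index" '
        $1 == idx { print $2; found = 1; exit }
        END { exit !found }
    ' "$conf"
}

# Inside a submission script this might be used as:
#   experiment=$(experiment_for_index "$PBS_ARRAYID" job.conf) || exit 1
```

Checking the exit status lets an array element stop early when its index has no entry in the file.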