Job Arrays

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
Revision as of 09:17, 14 March 2018 by Dslater (talk | contribs) (Job Array Introduction)

Job Array Introduction

Making a new copy of a script and submitting it separately for every input data file is time consuming. An alternative is to make a job array using the --array option (short form -a) in your SBATCH script. The --array option allows many copies of the same script to be queued all at once. You can use the $SLURM_ARRAY_TASK_ID environment variable to differentiate between the tasks in the array. The resources you specify in the SBATCH script are the resources each task gets every time the script is run, not a total shared across the array.
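As a minimal sketch of how the task ID tells the copies apart, the snippet below maps a task ID to an input file name. The helper name input_for_task and the sample_<N>.fastq naming pattern are hypothetical, and the loop only simulates locally what Slurm does when it sets $SLURM_ARRAY_TASK_ID for each task.

```shell
#!/bin/bash
# Hypothetical helper: map an array task ID to an input file name.
# The sample_<N>.fastq pattern is an assumption for illustration only.
input_for_task() {
    local task_id=$1
    echo "sample_${task_id}.fastq"
}

# In a real array job Slurm sets SLURM_ARRAY_TASK_ID once per task.
# Here four tasks are simulated locally by setting it in a loop.
for id in 1 2 3 4; do
    SLURM_ARRAY_TASK_ID=$id
    echo "task $SLURM_ARRAY_TASK_ID would process $(input_for_task "$SLURM_ARRAY_TASK_ID")"
done
```

In an actual submission script the loop would not be needed; the script body would simply read $SLURM_ARRAY_TASK_ID once.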

In this tutorial, we will be using three files: a submission script, a Perl script, and the job.conf configuration file.

Let's say you want to run 10 jobs. Instead of submitting 10 different jobs, you can submit one job that uses the '--array' parameter and the $SLURM_ARRAY_TASK_ID variable. You can read more about the --array parameter at

#SBATCH --array 1-10

The --array parameter sets the range of the $SLURM_ARRAY_TASK_ID variable. So setting it to

#SBATCH --array 1-4

will cause sbatch to run the script 4 times, each time with $SLURM_ARRAY_TASK_ID set to the next value from 1 to 4, which results in


perl 1
perl 2
perl 3
perl 4

Example submission script

This submission script submits an array of 10 tasks to the normal partition and reserves 1 processor for each task.

Because no output options are set, Slurm writes the stdout and stderr of each task into one file, and the script emails the job owner when the job begins, ends, or fails.

For each task, it passes $SLURM_ARRAY_TASK_ID to the Perl script, which in this case ranges from 1 to 10.

#!/bin/bash
# ----------------SBATCH Parameters----------------- #
#SBATCH -p normal
#SBATCH -n 1
#SBATCH --mail-user
#SBATCH --mail-type BEGIN,END,FAIL
#SBATCH -J array_of_jobs
#SBATCH --array 1-10
# ----------------Load Modules-------------------- #
module load perl/5.24.1-IGB-gcc-4.9.4
# ----------------Your Commands------------------- #

perl $SLURM_ARRAY_TASK_ID

Example Perl script

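#!/usr/bin/env perl
# This script outputs the job array element that has been passed in

use strict;
use warnings;

# The array task ID is passed in as the first command-line argument
my $array_task_id = shift @ARGV;
my $experimentID = $array_task_id;
# Read line number $array_task_id of job.conf and strip the trailing newline
my $experimentName = `head -n $array_task_id job.conf | tail -n1`;
chomp $experimentName;

print "This is job number $array_task_id\n";
print "About to perform experimentID: $experimentID experimentName: $experimentName\n";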
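The line lookup that the Perl script performs with head and tail can be tried outside the scheduler. The following is a sketch only: the helper name nth_line and the experiment names written to the temporary file are hypothetical stand-ins, not the contents of the real job.conf.

```shell
#!/bin/bash
# Return line N (1-based) of a file; prints nothing if N is past the end.
nth_line() {
    sed -n "${1}p" "$2"
}

# Hypothetical job.conf contents, one experiment name per line.
conf=$(mktemp)
printf '%s\n' experiment_A experiment_B experiment_C > "$conf"

# Set by Slurm in a real array job; hard-coded here for a local trial.
SLURM_ARRAY_TASK_ID=2
echo "task $SLURM_ARRAY_TASK_ID -> $(nth_line "$SLURM_ARRAY_TASK_ID" "$conf")"
rm -f "$conf"
```

Unlike head piped into tail, sed prints nothing when the requested line is past the end of the file, which makes an out-of-range task ID easy to detect.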

job.conf (example configuration file)


Effectively Using job_array_index

You have 650 datasets you want to analyze, but you can only submit 80 jobs at a time. Instead of submitting 80 jobs and waiting for them to finish before submitting more, submit a single 80-element array job that can handle all of the datasets.

A simple formula for dividing and sending your datasets to your script is as follows:

datasets per job = ceiling(number of datasets / number of job elements)
datasets per job = ceiling(650 / 80) = ceiling(8.125) = 9
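The same ceiling can be computed with integer arithmetic in the shell, which has no floating-point division. This is a small sketch; the function name ceil_div and the variable names are illustrative assumptions.

```shell
#!/bin/bash
# Ceiling division with integers only: ceil(a / b) = (a + b - 1) / b
ceil_div() {
    local total=$1
    local parts=$2
    echo $(( (total + parts - 1) / parts ))
}

datasets=650
array_elements=80
per_job=$(ceil_div "$datasets" "$array_elements")
echo "datasets per job: $per_job"
```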

That means each of your 80 jobs is responsible for handling up to 9 datasets. Each time your job script runs, it needs two values: its position in the list of datasets, which comes from $SLURM_ARRAY_TASK_ID, and the number of datasets per job (N). With these, the job can determine which datasets from the list it needs to process.

Here is some simple pseudocode for this situation, assuming the array indices start at 1:

datasets per job = N
startLineNumber = ($SLURM_ARRAY_TASK_ID - 1) * N + 1
endLineNumber = startLineNumber + N - 1

open list of data:
      go to startLineNumber
      while lineNumber <= endLineNumber:
                get dataset
                do work with dataset
                go to next line
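The line-range arithmetic can be sketched as a small shell function. The name task_range is hypothetical, and the sketch assumes task IDs start at 1 and that line numbers in the dataset list are 1-based; the end of the range is clamped to the length of the list.

```shell
#!/bin/bash
# Print "start end" (1-based, inclusive line numbers) for one array task.
# Assumes task IDs start at 1; end is clamped to the total number of lines.
task_range() {
    local task_id=$1
    local per_job=$2
    local total=$3
    local start=$(( (task_id - 1) * per_job + 1 ))
    local end=$(( start + per_job - 1 ))
    if [ "$end" -gt "$total" ]; then
        end=$total
    fi
    echo "$start $end"
}

task_range 1 9 650    # first task
task_range 73 9 650   # last task that still has datasets
task_range 74 9 650   # start is greater than end: nothing left for this task
```

With 650 datasets and 9 per job, task 73 receives only the last two lines and tasks 74 to 80 receive none, so a script using this scheme should treat a start greater than the end as "no work".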

Putting it all together (Example SBATCH Submission with submission script, configuration file, and experiment script)

In order to use the following script, you will need to properly set:

  • '--array' (the number of array elements you want)
  • itemsToProcess (the number of items in the job.conf list to pass into your script)
  • your script, modules, and custom settings

#!/bin/bash
# ----------------SBATCH Parameters----------------- #
#SBATCH -p normal
#SBATCH -n 1
#SBATCH --mail-user
#SBATCH --mail-type BEGIN,END,FAIL
#SBATCH -J array_of_jobs
#SBATCH --array 1-10

# ----------------Load Modules-------------------- #
module load BLAST+/2.6.0-IGB-gcc-4.9.4
# ----------------Your Commands------------------- #

# Edit these: items per array task (10 is only an example value) and the list file
itemsToProcess=10
jobList=job.conf

#No need to edit this
taskID=$SLURM_ARRAY_TASK_ID
# The array starts at 1, so task 1 handles lines 1..itemsToProcess
startLineNumber=$(( (taskID - 1) * itemsToProcess + 1 ))
endLineNumber=$(( startLineNumber + itemsToProcess - 1 ))
#Grab an experiment from the job.conf file
for line in $(seq "$startLineNumber" "$endLineNumber")
do
    experiment=$( sed -n "${line}p" "$jobList" )
    # Skip line numbers past the end of the list
    [ -z "$experiment" ] && continue
    # echo prints the command that would be run for this experiment
    echo blastall -i "$experiment" -o "$experiment.blast"
done