Difference between revisions of "Biocluster2"

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
(Transferring using SFTP/SCP)
Line 1: Line 1:
 
__TOC__
 
__TOC__
== Quick Links ==
+
==Quick Links==
  
*Main Site - [http://biocluster2.igb.illinois.edu http://biocluster2.igb.illinois.edu]
+
* Main Site - [http://biocluster2.igb.illinois.edu http://biocluster2.igb.illinois.edu]
*Request Account - [http://www.igb.illinois.edu/content/biocluster-account-form http://www.igb.illinois.edu/content/biocluster-account-form]
+
* Request Account - [http://www.igb.illinois.edu/content/biocluster-account-form http://www.igb.illinois.edu/content/biocluster-account-form]
*Cluster Accounting - [https://biocluster2.igb.illinois.edu/accounting/ https://biocluster2.igb.illinois.edu/accounting/]
+
* Cluster Accounting - [https://biocluster2.igb.illinois.edu/accounting/ https://biocluster2.igb.illinois.edu/accounting/]
*Cluster Monitoring - [http://biocluster2.igb.illinois.edu/ganglia/ http://biocluster2.igb.illinois.edu/ganglia/]
+
* Cluster Monitoring - [http://biocluster2.igb.illinois.edu/ganglia/ http://biocluster2.igb.illinois.edu/ganglia/]
*SLURM Script Generator - [http://www-app2.igb.illinois.edu/tools/slurm/ http://www-app2.igb.illinois.edu/tools/slurm/]
+
* SLURM Script Generator - [http://www-app2.igb.illinois.edu/tools/slurm/ http://www-app2.igb.illinois.edu/tools/slurm/]
  
== Description ==
+
==Description==
 +
Biocluster is the High Performance Computing (HPC) resource for the Carl R. Woese Institute for Genomic Biology (IGB) at the University of Illinois at Urbana-Champaign (UIUC). Containing 2824 cores and over 27.7 TB of RAM, Biocluster has a mix of RAM and CPU configurations on its nodes to best serve the various computation needs of the IGB and the Bioinformatics community at UIUC. For storage, Biocluster has 2619 TB of storage on its Ceph filesystem for reliable high-speed data transfers within the cluster. Networking in Biocluster is either 1 or 10 Gigabit Ethernet depending on the class of node and its data transfer needs.
  
Biocluster is the High Performance Computing (HPC) resource for the Carl R. Woese Institute for Genomic Biology (IGB) at the University of Illinois at Urbana-Champaign (UIUC).  Containing 2824 cores and over 27.7 TB of RAM, Biocluster has a mix of RAM and CPU configurations on its nodes to best serve the various computation needs of the IGB and the Bioinformatics community at UIUC.  For storage, Biocluster has 2619 TB of storage on its Ceph filesystem for reliable high-speed data transfers within the cluster.   Networking in Biocluster is either 1 or 10 Gigabit Ethernet depending on the class of node and its data transfer needs.
+
==Cluster Specifications==
  
== Cluster Specifications ==
+
{| border="1" width="1200" cellspacing="1" cellpadding="0" align="center"
 
 
{| class="wikitable" width="1200" align="center" border="1" cellpadding="0" cellspacing="1"
 
 
|-
 
|-
! Queue Name
+
!|Queue Name
! Nodes
+
!|Nodes
! CPUs
+
!|CPUs
! Memory
+
!|Memory
! Networking
+
!|Networking
! Scratch Space /scratch
+
!|Scratch Space /scratch
! GPUs
+
!|GPUs
 
|-
 
|-
| normal (default)
+
||normal (default)
| 20 Poweredge R620
+
||20 Poweredge R620
| 24 Intel Xeon E5-2697 @ 2.7GHz
+
||24 Intel Xeon E5-2697 @ 2.7GHz
| 384GB
+
||384GB
| 10GB Ethernet
+
||10GB Ethernet
| 1TB
+
||1TB
|
+
|
 +
 
 
|-
 
|-
| himemeory
+
||himemeory
| 2 Supermicro
+
||2 Supermicro
| 28 Intel Xeon E5-2690 v4 @ 2.6GHz
+
||28 Intel Xeon E5-2690 v4 @ 2.6GHz
| 768GB
+
||768GB
| 10GB Ethernet
+
||10GB Ethernet
| 4TB SSD
+
||4TB SSD
|
+
|
 +
 
 
|-
 
|-
| largememory
+
||largememory
| 1 Poweredge R910
+
||1 Poweredge R910
| 24 Intel Xeon E7540 @ 2.0GHz
+
||24 Intel Xeon E7540 @ 2.0GHz
| 1TB
+
||1TB
| 10GB Ethernet
+
||10GB Ethernet
| 1TB
+
||1TB
|
+
|
 +
 
 
|-
 
|-
| budget
+
||budget
| 10 Dell Poweredge R410  
+
||10 Dell Poweredge R410
| 8 Intel Xeon E5530 @ 2.4Ghz
+
||8 Intel Xeon E5530 @ 2.4Ghz
| 32GB or 48GB
+
||32GB or 48GB
| 1GB Ethernet
+
||1GB Ethernet
| 1TB
+
||1TB
|  
+
|
 +
 
 
|-
 
|-
| gpu
+
||gpu
| 1 Supermicro
+
||1 Supermicro
| 28 Intel Xeon E5-2680 @ 2.4Ghz
+
||28 Intel Xeon E5-2680 @ 2.4Ghz
| 128GB  
+
||128GB
| 1GB Ethernet
+
||1GB Ethernet
| 1TB
+
||1TB
| 4 NVIDIA GeForce GTX 1080 Ti  
+
||4 NVIDIA GeForce GTX 1080 Ti
 
|-
 
|-
| classroom
+
||classroom
| 22 Dell Poweredge PE1950
+
||22 Dell Poweredge PE1950
| 8 Intel Xeon E5440 @ 2.83GHz
+
||8 Intel Xeon E5440 @ 2.83GHz
| 16GB
+
||16GB
| 1GB Ethernet
+
||1GB Ethernet
| 500GB
+
||500GB
|  
+
|
 +
 
 
|}
 
|}
  
== Storage ==
+
==Storage==
 +
===Information===
 +
 
 +
* The storage system is a CEPH filesystem with 2.4 Petabytes of disk space. This data is '''NOT''' backed up.
 +
* The data is spread across 15 CEPH storage nodes.
  
=== Information ===
+
===Cost===
*The storage system is a CEPH filesystem with 2.4 Petabytes of disk space.  This data is '''NOT''' backed up.
 
*The data is spread across 15 CEPH storage nodes.
 
  
=== Cost ===
+
{| border="1" width="624" cellspacing="1" cellpadding="0" align="center"
{| class="wikitable" width="624" align="center" border="1" cellpadding="0" cellspacing="1"
 
 
|-
 
|-
! Cost (Per Terabyte Per Month)
+
!|Cost (Per Terabyte Per Month)
 
|-
 
|-
| $10
+
||$10
 
|}
 
|}
  
== Queue Costs ==
+
==Queue Costs==
 
 
 
The cost for each job is dependent on which queue it is submitted to. Listed below are the different queues on the cluster with their cost.
 
The cost for each job is dependent on which queue it is submitted to. Listed below are the different queues on the cluster with their cost.
  
Usage is charged by the second.  The CPU cost and memory cost are compared, and the larger of the two is billed.
+
Usage is charged by the second. The CPU cost and memory cost are compared, and the larger of the two is billed.
 
+
{| border="1" width="624" cellspacing="1" cellpadding="0" align="center"
{| class="wikitable" width="624" align="center" border="1" cellpadding="0" cellspacing="1"
 
 
|-
 
|-
! Queue Name
+
!|Queue Name
! CPU Cost ($ per CPU per day)
+
!|CPU Cost ($ per CPU per day)
! Memory Cost ($ per GB per day)
+
!|Memory Cost ($ per GB per day)
 
|-
 
|-
| normal (default)
+
||normal (default)
| $1.00
+
||$1.00
| $0.07
+
||$0.07
 
|-
 
|-
| himemeory
+
||himemeory
| $2.00
+
||$2.00
| $0.07
+
||$0.07
 
|-
 
|-
| largememory
+
||largememory
| $8.50
+
||$8.50
| $0.20<br/>
+
||$0.20
 +
 
 
|-
 
|-
| budget<br/>
+
||budget
| $0.50
+
 
| $0.13
+
||$0.50
 +
||$0.13
 
|}
 
|}
  
== Gaining Access ==
+
==Gaining Access==
  
*Please fill out the form at [http://www.igb.illinois.edu/content/biocluster-account-form http://www.igb.illinois.edu/content/biocluster-account-form] to request access to the Biocluster.
+
* Please fill out the form at [http://www.igb.illinois.edu/content/biocluster-account-form http://www.igb.illinois.edu/content/biocluster-account-form] to request access to the Biocluster.
  
== Cluster Rules ==
+
==Cluster Rules==
  
*'''Running jobs on the head node or login nodes is strictly prohibited.''' Running jobs on the head node could cause the entire cluster to crash and affect everyone's jobs on the cluster. Any program found to be running on the head node will be stopped immediately and your account could be locked. You can start an interactive session to log in to a node and run programs manually.
+
* '''Running jobs on the head node or login nodes is strictly prohibited.''' Running jobs on the head node could cause the entire cluster to crash and affect everyone's jobs on the cluster. Any program found to be running on the head node will be stopped immediately and your account could be locked. You can start an interactive session to log in to a node and run programs manually.
*'''Installing Software:''' Please email help@igb.illinois.edu for any software requests. Compiled software will be installed in /home/apps. If it is a standard RedHat package (rpm), it will be installed in its default location on the nodes.
+
* '''Installing Software:''' Please email help@igb.illinois.edu for any software requests. Compiled software will be installed in /home/apps. If it is a standard RedHat package (rpm), it will be installed in its default location on the nodes.
*'''Creating or Moving over Programs:''' Programs you create or move to the cluster should first be tested by you outside the cluster for stability. Once your program is stable, it can be moved over to the cluster for use. Unstable programs that cause problems with the cluster can result in your account being locked. Programs should only be added by CNRG personnel and not compiled in your home directory.
+
* '''Creating or Moving over Programs:''' Programs you create or move to the cluster should first be tested by you outside the cluster for stability. Once your program is stable, it can be moved over to the cluster for use. Unstable programs that cause problems with the cluster can result in your account being locked. Programs should only be added by CNRG personnel and not compiled in your home directory.
*'''Reserving Memory:''' SLURM allows the user to specify the amount of memory they want their program to use. If your job tries to use more memory than you have reserved, the job will run out of memory and die. Make sure to specify the correct amount of memory.
+
* '''Reserving Memory:''' SLURM allows the user to specify the amount of memory they want their program to use. If your job tries to use more memory than you have reserved, the job will run out of memory and die. Make sure to specify the correct amount of memory.
*'''Reserving Nodes and Processors:''' For each job, you must reserve the correct number of nodes and processors. By default you are reserved 1 processor on 1 node. If you are running a multiple processor job or an MPI job you need to reserve the appropriate amount. If you do not reserve the correct amount, the cluster will confine your job to that limit, increasing its runtime.
+
* '''Reserving Nodes and Processors:''' For each job, you must reserve the correct number of nodes and processors. By default you are reserved 1 processor on 1 node. If you are running a multiple processor job or an MPI job you need to reserve the appropriate amount. If you do not reserve the correct amount, the cluster will confine your job to that limit, increasing its runtime.
  
== How To Log Into The Cluster ==
+
==How To Log Into The Cluster==
  
*You will need to use an SSH client to connect.
+
* You will need to use an SSH client to connect.
  
=== On Windows ===
+
===On Windows===
  
*You can download Putty from [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html]
+
* You can download Putty from [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html]
*Install Putty and run it, in the Host Name input box enter '''biologin.igb.illinois.edu'''
+
* Install Putty and run it, in the Host Name input box enter '''biologin.igb.illinois.edu'''
  
 
[[File:PUTTYbiologin.PNG]]
 
[[File:PUTTYbiologin.PNG]]
  
*Hit Open and login using your IGB account credentials.
+
* Hit Open and login using your IGB account credentials.
  
=== On Mac OS X ===
+
===On Mac OS X===
  
*Simply open the terminal under Go >> Utilities >> Terminal
+
* Simply open the terminal under Go >> Utilities >> Terminal
*Type in '''ssh username@biologin.igb.illinois.edu''' where username is your NetID.
+
* Type in '''ssh username@biologin.igb.illinois.edu''' where username is your NetID.
*Hit the Enter key and type in your IGB password.
+
* Hit the Enter key and type in your IGB password.
  
== How To Submit A Cluster Job ==
+
==How To Submit A Cluster Job==
  
*The cluster runs the '''SLURM''' queuing and resource management program.
+
* The cluster runs the '''SLURM''' queuing and resource management program.
*All jobs are submitted to SLURM which distributes them automatically to the Nodes.
+
* All jobs are submitted to SLURM which distributes them automatically to the Nodes.
*You can find all of the parameters that SLURM uses at [https://slurm.schedmd.com/quickstart.html https://slurm.schedmd.com/quickstart.html]
+
* You can find all of the parameters that SLURM uses at [https://slurm.schedmd.com/quickstart.html https://slurm.schedmd.com/quickstart.html]
*You can use our SLURM Generation Utility to help you learn to generate job scripts [http://www-app2.igb.illinois.edu/tools/slurm/ http://www-app2.igb.illinois.edu/tools/slurm/]
+
* You can use our SLURM Generation Utility to help you learn to generate job scripts [http://www-app2.igb.illinois.edu/tools/slurm/ http://www-app2.igb.illinois.edu/tools/slurm/]
  
=== Create a Job Script ===
+
===Create a Job Script===
  
*You must first create a SLURM job script file in order to tell SLURM how and what to execute on the nodes.
+
* You must first create a SLURM job script file in order to tell SLURM how and what to execute on the nodes.
*Type the following into a text editor and save the file '''test.sh''' ( [[Linux text editing]] )
+
* Type the following into a text editor and save the file '''test.sh''' ( [[Linux text editing]] )
<pre>#!/bin/bash
 
#SBATCH -p normal
 
#SBATCH --mem=1g
 
#SBATCH -N 1
 
#SBATCH -n 1
 
  
sleep 20
+
<pre>#!/bin/bash
#SBATCH -p normal
#SBATCH --mem=1g
#SBATCH -N 1
#SBATCH -n 1

sleep 20
 
echo "Test Script"  
 
echo "Test Script"  
 
</pre>
 
</pre>
*You just created a simple SLURM Job Script.
+
* You just created a simple SLURM Job Script.
*To submit the script to the cluster, you will use the sbatch command.
+
* To submit the script to the cluster, you will use the sbatch command.
 +
 
 
<pre>sbatch test.sh</pre>
 
<pre>sbatch test.sh</pre>
*Line by line explanation
+
* Line by line explanation
**'''#!/bin/bash''' - tells linux this is a bash program and it should use a bash interpreter to execute it.
+
** '''#!/bin/bash''' - tells linux this is a bash program and it should use a bash interpreter to execute it.
**'''#SBATCH''' - are SLURM parameters, for explanations of these please scroll down to SLURM Parameters Explanations section.
+
** '''#SBATCH''' - are SLURM parameters, for explanations of these please scroll down to SLURM Parameters Explanations section.
**'''sleep 20''' - Sleep 20 seconds (only used to simulate processing time for this example)
+
** '''sleep 20''' - Sleep 20 seconds (only used to simulate processing time for this example)
**'''echo "Test Script"''' - Output some text to the screen when job completes ( simulate output for this example)
+
** '''echo "Test Script"''' - Output some text to the screen when job completes ( simulate output for this example)
*For example, if you would like to run a BLAST job, you may simply replace the last two lines with the following
+
* For example, if you would like to run a BLAST job, you may simply replace the last two lines with the following
 +
 
 
<pre>module load BLAST
 
<pre>module load BLAST
 
blastall -p blastn -d nt -i input.fasta -e 10 -o output.result -v 10 -b 5 -a 5
 
blastall -p blastn -d nt -i input.fasta -e 10 -o output.result -v 10 -b 5 -a 5
 
</pre>
 
</pre>
*Note: the module commands are explained under the '''Environment Modules''' section.
+
* Note: the module commands are explained under the '''Environment Modules''' section.
  
==== SLURM Parameters Explanations: ====
+
====SLURM Parameters Explanations:====
  
*To view all possible parameters
+
* To view all possible parameters
**Run '''man sbatch''' at the command line
+
** Run '''man sbatch''' at the command line
**Go to [https://slurm.schedmd.com/sbatch.html https://slurm.schedmd.com/sbatch.html] to view the man page online
+
** Go to [https://slurm.schedmd.com/sbatch.html https://slurm.schedmd.com/sbatch.html] to view the man page online
{| class="wikitable" border="1"
+
 
 +
{| border="1"
 
|-
 
|-
! Command
+
!|Command
! Description
+
!|Description
 
|-
 
|-
| #SBATCH -p PARTITION
+
||#SBATCH -p PARTITION
| Run the job on a specific queue/partition. This defaults to the "normal" queue
+
||Run the job on a specific queue/partition. This defaults to the "normal" queue
 
|-
 
|-
| #SBATCH -D /tmp/working_dir
+
||#SBATCH -D /tmp/working_dir
| Run the script from the /tmp/working_dir directory. This defaults to the current directory you are in.
+
||Run the script from the /tmp/working_dir directory. This defaults to the current directory you are in.
 
|-
 
|-
| #SBATCH -J ExampleJobName
+
||#SBATCH -J ExampleJobName
| Name of the job will be ExampleJobName
+
||Name of the job will be ExampleJobName
 
|-
 
|-
| #SBATCH -e /path/to/errorfile
+
||#SBATCH -e /path/to/errorfile
| Split off the error stream to this file. By default output and error streams are placed in the same file.
+
||Split off the error stream to this file. By default output and error streams are placed in the same file.
 
|-
 
|-
| #SBATCH -o /path/to/outputfile
+
||#SBATCH -o /path/to/outputfile
| Split off the output stream to this file. By default output and error streams are placed in the same file.
+
||Split off the output stream to this file. By default output and error streams are placed in the same file.
 
|-
 
|-
| #SBATCH --mail-user username@illinois.edu
+
||#SBATCH --mail-user username@illinois.edu
| Send an e-mail to specified email to receive job information.
+
||Send an e-mail to specified email to receive job information.
 
|-
 
|-
| #SBATCH --mail-type=BEGIN,END,FAIL
+
||#SBATCH --mail-type=BEGIN,END,FAIL
| Specifies when to send a message to email. You can select multiple of these with a comma separated list. Many other options exist.
+
||Specifies when to send a message to email. You can select multiple of these with a comma separated list. Many other options exist.
 
|-
 
|-
| #SBATCH -N X
+
||#SBATCH -N X
| Reserve X number of nodes.
+
||Reserve X number of nodes.
 
|-
 
|-
| #SBATCH -n X
+
||#SBATCH -n X
| Reserve X number of cpus.
+
||Reserve X number of cpus.
 
|-
 
|-
| #SBATCH --mem=XG  
+
||#SBATCH --mem=XG
| Reserve X gigabytes of RAM for the job.
+
||Reserve X gigabytes of RAM for the job.
|-
 
| #SBATCH --gres=gpu:X
 
| Reserve X NVIDIA GPUs. (Only on GPU queues)
 
 
|-
 
|-
 +
||#SBATCH --gres=gpu:X
 +
||Reserve X NVIDIA GPUs. (Only on GPU queues)
 
|}
 
|}
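As an illustration of how several of the parameters above combine, here is a minimal sketch of a job script. The job name, the output and error file names, and the e-mail address are placeholders rather than values required by Biocluster; <code>%j</code> is the SLURM filename pattern that expands to the job ID.

```shell
#!/bin/bash
#SBATCH -p normal
#SBATCH -J ExampleJobName
#SBATCH -o ExampleJobName.%j.out
#SBATCH -e ExampleJobName.%j.err
#SBATCH --mail-user=username@illinois.edu
#SBATCH --mail-type=END,FAIL
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=1G

# SLURM sets SLURM_JOB_ID inside a running job. Fall back to a label when the
# script is run by hand outside of SLURM (assumption made for this sketch).
job_id="${SLURM_JOB_ID:-not-under-slurm}"
echo "Job ${job_id} started on $(uname -n)"
```

Writing the error stream to its own file keeps program output and diagnostics apart, which can make failed jobs easier to inspect.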
  
=== Create a Job Array Script ===
+
===Create a Job Array Script===
 +
Making a new copy of the script and submitting it for every input data file is time consuming. An alternative is to make a job array using the --array option in your submission script. The --array option allows many copies of the same script to be queued all at once. You can use the $SLURM_ARRAY_TASK_ID environment variable to differentiate between the different jobs in the array. A brief example on how to do this is available at [[Job Arrays]]
  
Making a new copy of the script and submitting it for every input data file is time consuming. An alternative is to make a job array using the --array option in your submission script. The --array option allows many copies of the same script to be queued all at once. You can use the $SLURM_ARRAY_TASK_ID environment variable to differentiate between the different jobs in the array. A brief example on how to do this is available at [[Job Arrays]]
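A minimal sketch of a job array script is shown below. It assumes a hypothetical file named inputs.txt that lists one input file per line, and a helper function pick_input that exists only in this example. Each array task reads the line matching its $SLURM_ARRAY_TASK_ID.

```shell
#!/bin/bash
#SBATCH -p normal
#SBATCH --mem=1g
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --array=1-3

# Print line number $1 of the list file $2 (one input file name per line).
pick_input() {
    sed -n "${1}p" "$2"
}

# SLURM sets SLURM_ARRAY_TASK_ID to 1, 2 or 3 for the three array tasks.
# Default to 1 so the script can be dry-run outside of SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# inputs.txt is a hypothetical list of input files for this sketch.
if [ -f inputs.txt ]; then
    input_file="$(pick_input "$task_id" inputs.txt)"
    echo "Array task ${task_id} processing ${input_file}"
else
    echo "inputs.txt not found; nothing to do for task ${task_id}"
fi
```

Submitting this script once with sbatch queues three tasks, so the range given to --array should match the number of lines in the list file.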
+
===Start An Interactive Login Session On A Compute Node===
  
=== Start An Interactive Login Session On A Compute Node ===
+
* Use interactive srun if you would like to run a job interactively such as running a quick perl script or run a quick test interactively on your data.
  
*Use interactive srun if you would like to run a job interactively such as running a quick perl script or run a quick test interactively on your data.
 
 
<pre>srun --pty /bin/bash
 
<pre>srun --pty /bin/bash
 
</pre>
 
</pre>
*This will automatically reserve you a slot on one of the compute nodes and will start a terminal session on it.
+
* This will automatically reserve you a slot on one of the compute nodes and will start a terminal session on it.
*Closing your terminal window will also kill the processes running in your interactive srun session; therefore, it is better to submit large jobs via non-interactive sbatch.
+
* Closing your terminal window will also kill the processes running in your interactive srun session; therefore, it is better to submit large jobs via non-interactive sbatch.
*To run an application with a user interface you will need to set up an X server on your computer [[Xserver Setup]]
+
* To run an application with a user interface you will need to set up an X server on your computer [[Xserver Setup]]
  
== View/Delete Submitted Jobs ==
+
==View/Delete Submitted Jobs==
 +
===Viewing Job Status===
  
=== Viewing Job Status ===
+
* To get a simple view of your current running jobs you may type:
  
*To get a simple view of your current running jobs you may type:
 
 
<pre>squeue -u userid
 
<pre>squeue -u userid
 
</pre>
 
</pre>
*This command brings up a list of your current running jobs.
+
* This command brings up a list of your current running jobs.
*The first number represents the job's ID number.
+
* The first number represents the job's ID number.
*Jobs may have different status flags:
+
* Jobs may have different status flags:
**'''R''' = job is currently running
+
** '''R''' = job is currently running
 +
 
  
 +
* For more detailed view type:
  
*For more detailed view type:
 
 
<pre>squeue -l</pre>
 
<pre>squeue -l</pre>
*This will return a long-format list of jobs, including each job's state, time limit, and the nodes assigned to it.
+
* This will return a long-format list of jobs, including each job's state, time limit, and the nodes assigned to it.
  
=== List Queues ===
+
===List Queues===
 +
 
 +
* Simple view
 +
 
 +
<pre>sinfo</pre>This will show all queues as well as which nodes in those queues are fully used (alloc), partially used (mix), unused (idle), or unavailable (down).
  
*Simple view
 
<pre>sinfo
 
</pre>
 
This will show all queues as well as which nodes in those queues are fully used (alloc), partially used (mix), unused (idle), or unavailable (down).
 
</pre>
 
  
=== List All Jobs on Cluster With Nodes ===
+
===List All Jobs on Cluster With Nodes===
<pre>squeue
+
<pre>squeue</pre>
</pre>
+
===Deleting Jobs===
  
=== Deleting Jobs ===
+
* Note: You can only delete jobs which are owned by you.
 +
* To delete a job by job-ID number:
 +
* You will need to use '''scancel''', for example to delete a job with ID number 5523 you would type:
  
*Note: You can only delete jobs which are owned by you.
 
*To delete a job by job-ID number:
 
*You will need to use '''scancel''', for example to delete a job with ID number 5523 you would type:
 
 
<pre>scancel 5523
 
<pre>scancel 5523
 
</pre>
 
</pre>
*Delete all of your jobs
+
* Delete all of your jobs
 +
 
 
<pre>scancel -u userid
 
<pre>scancel -u userid
 
</pre>
 
</pre>
 +
===Troubleshooting job errors===
  
=== Troubleshooting job errors ===
+
* To view the details of a job that failed or shows an unexpected state in the status column, use '''scontrol show job''', for example if job # 23451 failed you would type:
  
*To view the details of a job that failed or shows an unexpected state in the status column, use '''scontrol show job''', for example if job # 23451 failed you would type:
 
 
<pre>scontrol show job 23451
 
<pre>scontrol show job 23451
 
</pre>
 
</pre>
 +
==Applications==
 +
===Application Lists===
  
== Applications ==
+
* To list the currently installed applications from the command line, run '''module avail'''
 
 
=== Application Lists ===
 
  
*To list the currently installed applications from the command line, run '''module avail'''
+
===Application Installation===
  
=== Application Installation ===
+
* Please email '''help@igb.illinois.edu''' to request new application or version upgrades
  
*Please email '''help@igb.illinois.edu''' to request new application or version upgrades
+
===Environment Modules===
  
=== Environment Modules ===
+
* To use an application, you need to use the '''module''' command to load the settings for an application
 +
* To load a particular environment for example QIIME/1.9.1, simply run this command:
  
*To use an application, you need to use the '''module''' command to load the settings for an application
 
*To load a particular environment for example QIIME/1.9.1, simply run this command:
 
 
<pre>module load QIIME/1.9.1
 
<pre>module load QIIME/1.9.1
 
</pre>
 
</pre>
*If you would like to simply load the latest version, run the command without the /1.9.1 (version number):
+
* If you would like to simply load the latest version, run the command without the /1.9.1 (version number):
 +
 
 
<pre>module load QIIME
 
<pre>module load QIIME
 
</pre>
 
</pre>
*To view which environments you have loaded simply run '''module list''':
+
* To view which environments you have loaded simply run '''module list''':
<pre>bash-4.1$ module list
 
  
Currently Loaded Modules:
+
<pre>bash-4.1$ module list

Currently Loaded Modules:
 
   1) BLAST/2.2.26-Linux_x86_64  2) QIIME/1.9.1
 
   1) BLAST/2.2.26-Linux_x86_64  2) QIIME/1.9.1
 
</pre>
 
</pre>
*When submitting a job using an sbatch script, you will have to add the '''module load QIIME/1.9.1''' line before running QIIME in the script.
+
* When submitting a job using an sbatch script, you will have to add the '''module load QIIME/1.9.1''' line before running QIIME in the script.
*To unload a module simply run '''module unload''':
+
* To unload a module simply run '''module unload''':
 +
 
 
<pre>module unload QIIME
 
<pre>module unload QIIME
 
</pre>
 
</pre>
*Unload all modules
+
* Unload all modules
 +
 
 
<pre>module purge
 
<pre>module purge
 
</pre>
 
</pre>
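The module commands above can be combined inside a job script. The following is a sketch only: the guard around the module command and the modules_ready variable are additions for this example, so that the script can also be dry-run on a machine where the module command is not available.

```shell
#!/bin/bash
#SBATCH -p normal
#SBATCH --mem=1g
#SBATCH -N 1
#SBATCH -n 1

# Start from a clean environment, then load the exact version the job needs.
if command -v module >/dev/null 2>&1; then
    module purge
    module load QIIME/1.9.1
    module list
    modules_ready=yes
else
    # Not on the cluster (or modules not initialised): skip the module setup.
    echo "module command not found; skipping module setup" >&2
    modules_ready=no
fi
```

Loading a specific version rather than the latest one keeps results reproducible when newer versions are installed later.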
 +
==Transferring data files==
 +
===Transferring using SFTP/SCP===
 +
====Using WinSCP====
 +
====Using CyberDuck====
 +
===Transferring using Globus===
  
== Transferring data files ==
+
* The biocluster has a Globus endpoint set up. The endpoint name is '''igb#biocluster.igb.illinois.edu'''
 
+
* Globus allows the transferring of very large files reliably.
=== Transferring using SFTP/SCP ===
+
* A guide on how to use Globus is [[Globus|here]]
  
=== Transferring using Globus ===
+
===Transferring from Biotech FTP Server===
*The biocluster has a Globus endpoint set up. The endpoint name is '''igb#biocluster.igb.illinois.edu'''
 
*Globus allows the transferring of very large files reliably.
 
*A guide on how to use Globus is [[Globus|here]]
 
  
=== Transferring from Biotech FTP Server ===
+
* One option to transfer data from the Biotech FTP server is to use a program called wget.
*One option to transfer data from the Biotech FTP server is to use a program called wget.
+
* It can download 1 file or an entire directory.
*It can download 1 file or an entire directory.
+
* Replace USERNAME with the username provided by the Biotech Center.
*Replace USERNAME with the username provided by the Biotech Center.
 
  
 
The below example will download the file '''test_file.tar.gz'''
 
The below example will download the file '''test_file.tar.gz'''
 
<pre>wget --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_file.tar.gz</pre>
 
<pre>wget --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_file.tar.gz</pre>
*If you want to download an entire directory, you need to have the '''-r''' parameter set. This will recursively go through an entire directory and download it.
+
* If you want to download an entire directory, you need to have the '''-r''' parameter set. This will recursively go through an entire directory and download it.
<pre>wget -r --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_dir/</pre>
 
  
== References ==
+
<pre>wget -r --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_dir/</pre>
*OpenHPC - [https://openhpc.community/ https://openhpc.community/]
+
==References==
* OpenHPC - [https://openhpc.community/ https://openhpc.community/]
* SLURM Job Scheduler Documentation - [https://slurm.schedmd.com/ https://slurm.schedmd.com/]
* Rosetta Stone of Schedulers - [https://slurm.schedmd.com/rosetta.pdf https://slurm.schedmd.com/rosetta.pdf]
* SLURM Quick Reference - [https://slurm.schedmd.com/pdfs/summary.pdf https://slurm.schedmd.com/pdfs/summary.pdf]
* CEPH Filesystem - [http://ceph.com/ceph-storage/file-system/ http://ceph.com/ceph-storage/file-system/]
* Lmod Module Homepage - [https://www.tacc.utexas.edu/research-development/tacc-projects/lmod https://www.tacc.utexas.edu/research-development/tacc-projects/lmod]
* Lmod Documentation - [https://lmod.readthedocs.io/en/latest/ https://lmod.readthedocs.io/en/latest/]
*SLURM Job Scheduler Documentation - [https://slurm.schedmd.com/ https://slurm.schedmd.com/]
 
*Rosetta Stone of Schedulers - [https://slurm.schedmd.com/rosetta.pdf https://slurm.schedmd.com/rosetta.pdf]
 
*SLURM Quick Reference - [https://slurm.schedmd.com/pdfs/summary.pdf https://slurm.schedmd.com/pdfs/summary.pdf]
 
*CEPH Filesystem - [http://ceph.com/ceph-storage/file-system/ http://ceph.com/ceph-storage/file-system/]
 
*Lmod Module Homepage - [https://www.tacc.utexas.edu/research-development/tacc-projects/lmod https://www.tacc.utexas.edu/research-development/tacc-projects/lmod]
 
*Lmod Documentation - [https://lmod.readthedocs.io/en/latest/ https://lmod.readthedocs.io/en/latest/]
 

Revision as of 13:25, 22 June 2018

Quick Links

Description

Biocluster is the High Performance Computing (HPC) resource for the Carl R. Woese Institute for Genomic Biology (IGB) at the University of Illinois at Urbana-Champaign (UIUC). Containing 2824 cores and over 27.7 TB of RAM, Biocluster has a mix of RAM and CPU configurations on its nodes to best serve the various computation needs of the IGB and the Bioinformatics community at UIUC. For storage, Biocluster has 2619 TB of storage on its Ceph filesystem for reliable high-speed data transfers within the cluster. Networking in Biocluster is either 1 or 10 Gigabit Ethernet depending on the class of node and its data transfer needs.

Cluster Specifications

Queue Name | Nodes | CPUs | Memory | Networking | Scratch Space /scratch | GPUs
normal (default) | 20 Poweredge R620 | 24 Intel Xeon E5-2697 @ 2.7GHz | 384GB | 10GB Ethernet | 1TB |
himemeory | 2 Supermicro | 28 Intel Xeon E5-2690 v4 @ 2.6GHz | 768GB | 10GB Ethernet | 4TB SSD |
largememory | 1 Poweredge R910 | 24 Intel Xeon E7540 @ 2.0GHz | 1TB | 10GB Ethernet | 1TB |
budget | 10 Dell Poweredge R410 | 8 Intel Xeon E5530 @ 2.4GHz | 32GB or 48GB | 1GB Ethernet | 1TB |
gpu | 1 Supermicro | 28 Intel Xeon E5-2680 @ 2.4GHz | 128GB | 1GB Ethernet | 1TB | 4 NVIDIA GeForce GTX 1080 Ti
classroom | 22 Dell Poweredge PE1950 | 8 Intel Xeon E5440 @ 2.83GHz | 16GB | 1GB Ethernet | 500GB |

Storage

Information

  • The storage system is a CEPH filesystem with 2.4 Petabytes of disk space. This data is NOT backed up.
  • The data is spread across 15 CEPH storage nodes.

Cost

Cost (Per Terabyte Per Month)
$10

Queue Costs

The cost for each job depends on the queue it is submitted to. Listed below are the different queues on the cluster with their cost.

Usage is charged by the second. The CPU cost and memory cost are compared, and the larger of the two is billed.

Queue Name | CPU Cost ($ per CPU per day) | Memory Cost ($ per GB per day)
normal (default) | $1.00 | $0.07
himemeory | $2.00 | $0.07
largememory | $8.50 | $0.20
budget | $0.50 | $0.13
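
The billing rule can be illustrated with a small worked calculation. This is a sketch under stated assumptions, not the cluster's actual accounting code: it assumes the daily rates are prorated linearly by the second, and the function name job_cost is made up for this example. The rates used are those of the normal queue from the table.

```shell
#!/bin/bash
# job_cost CPU_RATE MEM_RATE CPUS MEM_GB SECONDS
# Rates are in dollars per CPU per day and dollars per GB per day.
# Assumption: cost scales linearly with runtime in seconds.
job_cost() {
    awk -v cr="$1" -v mr="$2" -v c="$3" -v m="$4" -v s="$5" 'BEGIN {
        days = s / 86400
        cpu = cr * c * days      # cost of the reserved CPUs
        mem = mr * m * days      # cost of the reserved memory
        printf "%.2f\n", (cpu > mem ? cpu : mem)   # the larger one is billed
    }'
}

# 4 CPUs and 64 GB for one day on the normal queue:
# CPU cost is 4 x $1.00 = $4.00, memory cost is 64 x $0.07 = $4.48.
job_cost 1.00 0.07 4 64 86400    # prints 4.48
```

In this example the memory reservation, not the CPU count, determines the charge, which is why reserving only the memory a job needs keeps costs down.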

Gaining Access[edit]

Cluster Rules[edit]

  • Running jobs on the head node or login nodes is strictly prohibited. Running jobs on the head node could cause the entire cluster to crash and affect everyone's jobs on the cluster. Any program found running on the head node will be stopped immediately and your account could be locked. You can start an interactive session to log in to a compute node and run programs manually.
  • Installing Software: Please email help@igb.illinois.edu for any software requests. Compiled software will be installed in /home/apps. If it's a standard RedHat package (rpm), it will be installed in its default location on the nodes.
  • Creating or Moving over Programs: Programs you create or move to the cluster should first be tested by you outside the cluster for stability. Once your program is stable, it can be moved over to the cluster for use. Unstable programs that cause problems with the cluster can result in your account being locked. Programs should only be added by CNRG personnel and not compiled in your home directory.
  • Reserving Memory: SLURM allows you to specify the amount of memory your program may use. If your job tries to use more memory than you have reserved, it will run out of memory and be killed. Make sure to specify the correct amount of memory.
  • Reserving Nodes and Processors: For each job, you must reserve the correct number of nodes and processors. By default you are reserved 1 processor on 1 node. If you are running a multiple-processor job or an MPI job, you need to reserve the appropriate amount. If you do not reserve the correct amount, the cluster will confine your job to that limit, increasing its runtime.
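
The two reservation rules can be sketched as a job script for a multi-threaded program on a single node. This is only an illustration: the 4 CPU and 8 GB figures are arbitrary, my_program and its --threads option are placeholders for your own application, and the helper name reserved_cpus is made up. SLURM sets SLURM_NTASKS when -n is given, so the script reads the thread count from it instead of hard-coding a number that could drift from the reservation.

```shell
#!/bin/bash
#SBATCH -p normal
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=8G

# Report the number of CPUs reserved with -n; fall back to 1 outside SLURM.
reserved_cpus() {
    echo "${SLURM_NTASKS:-1}"
}

THREADS=$(reserved_cpus)

# Placeholder command: substitute your own program and its thread option.
echo "Would run: my_program --threads ${THREADS} input.dat"
```

Keeping the program's thread count tied to the reservation avoids the slowdown described above, where a job that starts more threads than it reserved is confined to the reserved CPUs.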

How To Log Into The Cluster

  • You will need to use an SSH client to connect.

On Windows

  • In PuTTY, enter biologin.igb.illinois.edu as the host name (screenshot: PUTTYbiologin.PNG).
  • Click Open and log in using your IGB account credentials.

On Mac OS X

  • Simply open the terminal under Go >> Utilities >> Terminal
  • Type in ssh username@biologin.igb.illinois.edu where username is your NetID.
  • Hit the Enter key and type in your IGB password.

How To Submit A Cluster Job

Create a Job Script

  • You must first create a SLURM job script file in order to tell SLURM how and what to execute on the nodes.
  • Type the following into a text editor and save the file as test.sh (see Linux text editing)
#!/bin/bash
#SBATCH -p normal
#SBATCH --mem=1g
#SBATCH -N 1
#SBATCH -n 1

sleep 20
echo "Test Script"
  • You just created a simple SLURM Job Script.
  • To submit the script to the cluster, you will use the sbatch command.
sbatch test.sh
  • Line by line explanation
    • #!/bin/bash - tells Linux this is a bash program and it should use a bash interpreter to execute it.
    • #SBATCH - these lines are SLURM parameters; for explanations of them, see the SLURM Parameters Explanations section below.
    • sleep 20 - Sleep 20 seconds (only used to simulate processing time for this example)
    • echo "Test Script" - Output some text when the job completes (simulates output for this example). For a batch job this text goes to the job's output file (by default slurm-JOBID.out in the directory you submitted from) rather than to the screen.
  • For example, if you would like to run a BLAST job, you may simply replace the last two lines with the following:
module load BLAST
blastall -p blastn -d nt -i input.fasta -e 10 -o output.result -v 10 -b 5 -a 5
  • Note: the module commands are explained under the Environment Modules section.

SLURM Parameters Explanations

| Command | Description |
|---|---|
| #SBATCH -p PARTITION | Run the job on a specific queue/partition. This defaults to the "normal" queue. |
| #SBATCH -D /tmp/working_dir | Run the script from the /tmp/working_dir directory. This defaults to the current directory you are in. |
| #SBATCH -J ExampleJobName | Name of the job will be ExampleJobName. |
| #SBATCH -e /path/to/errorfile | Split off the error stream to this file. By default output and error streams are placed in the same file. |
| #SBATCH -o /path/to/outputfile | Split off the output stream to this file. By default output and error streams are placed in the same file. |
| #SBATCH --mail-user username@illinois.edu | Send job information to the specified email address. |
| #SBATCH --mail-type=BEGIN,END,FAIL | Specifies when to send an email. You can select several of these with a comma-separated list (no spaces). Many other options exist. |
| #SBATCH -N X | Reserve X number of nodes. |
| #SBATCH -n X | Reserve X number of CPUs. |
| #SBATCH --mem=XG | Reserve X gigabytes of RAM for the job. |
| #SBATCH --gres=gpu:X | Reserve X NVIDIA GPUs. (Only on GPU queues) |

Create a Job Array Script

Making a new copy of the script and submitting one for every input data file is time consuming. An alternative is to make a job array using the -a (--array) option in your submission script. The --array option allows many copies of the same script to be queued all at once. You can use the $SLURM_ARRAY_TASK_ID environment variable to differentiate between the different jobs in the array. A brief example of how to do this is available at Job Arrays.
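
A minimal sketch of a job array script follows. It assumes a hypothetical file inputs.txt that lists one input path per line, with array indices 1 to 10 matching its line numbers; the helper name nth_input is also made up for this example. Each array task reads its own line and processes only that file.

```shell
#!/bin/bash
#SBATCH -p normal
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mem=1G
#SBATCH --array=1-10

# Print line number $2 of list file $1 (one input path per line).
nth_input() {
    sed -n "${2}p" "$1"
}

# inputs.txt is a hypothetical list of input files; each array task handles one line.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && [ -f inputs.txt ]; then
    INPUT=$(nth_input inputs.txt "$SLURM_ARRAY_TASK_ID")
    echo "Task ${SLURM_ARRAY_TASK_ID} processing ${INPUT}"
fi
```

Submitting this script once with sbatch queues ten tasks, and the echo line is where the real command for each input file would go.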

Start An Interactive Login Session On A Compute Node

  • Use interactive srun if you would like to run a job interactively, such as running a quick Perl script or a quick test on your data.
srun --pty /bin/bash
  • This will automatically reserve you a slot on one of the compute nodes and start a terminal session on it.
  • Closing your terminal window will also kill the processes running in your interactive srun session; therefore it's better to submit large jobs via non-interactive sbatch.
  • To run an application with a graphical user interface, you will need to set up an X server on your computer (see Xserver Setup).

View/Delete Submitted Jobs

Viewing Job Status

  • To get a simple view of your current jobs, you may type:
squeue -u userid
  • This command brings up a list of your running and pending jobs.
  • The first number represents the job's ID number.
  • Jobs may have different status flags:
    • R = job is currently running
    • PD = job is pending (waiting in the queue)

  • For a more detailed view, type:
squeue -l
  • This will return a long-format list of jobs, including each job's state, elapsed time, time limit, and assigned nodes.
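
When you have many jobs, a per-state count can be easier to read than the full list. The sketch below is one way to get it, using squeue's -h option (no header) and -o "%T" (print only the full state name). The function name count_states is made up for this example, and the squeue call is skipped when the command is not available.

```shell
#!/bin/bash
# Count how many times each job state appears on standard input.
count_states() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# On the cluster: -h drops the header line, -o "%T" prints only the job state.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "$USER" -o "%T" | count_states
fi
```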

List Queues

  • Simple view
sinfo

This will show all queues as well as which nodes in those queues are fully used (alloc), partially used (mix), unused (idle), or unavailable (down).


List All Jobs on Cluster With Nodes

squeue

Deleting Jobs

  • Note: You can only delete jobs which are owned by you.
  • To delete a job by job-ID number, use scancel. For example, to delete a job with ID number 5523 you would type:
scancel 5523
  • Delete all of your jobs
scancel -u userid

Troubleshooting job errors

  • To view details for a job that shows an error or unexpected state in the status column, use scontrol show job. For example, if job 23451 failed you would type:
scontrol show job 23451

Applications

Application Lists

  • To list the currently installed applications from the command line, run module avail

Application Installation

  • Please email help@igb.illinois.edu to request new applications or version upgrades.

Environment Modules

  • To use an application, you need to use the module command to load the settings for that application.
  • To load a particular environment, for example QIIME/1.9.1, simply run this command:
module load QIIME/1.9.1
  • If you would like to simply load the latest version, run the command without the /1.9.1 (version number):
module load QIIME
  • To view which environments you have loaded simply run module list:
bash-4.1$ module list

Currently Loaded Modules:
  1) BLAST/2.2.26-Linux_x86_64   2) QIIME/1.9.1
  • When submitting a job using an sbatch script, you will have to add the module load QIIME/1.9.1 line before running QIIME in the script.
  • To unload a module simply run module unload:
module unload QIIME
  • Unload all modules
module purge

Transferring data files

Transferring using SFTP/SCP

Using WinSCP

Using CyberDuck

Transferring using Globus

  • The biocluster has a Globus endpoint set up. The endpoint name is igb#biocluster.igb.illinois.edu
  • Globus allows the transferring of very large files reliably.
  • A guide on how to use Globus is here

Transferring from Biotech FTP Server

  • One option to transfer data from the Biotech FTP server is to use a program called wget.
  • It can download 1 file or an entire directory.
  • Replace USERNAME with the username provided by the Biotech Center.

The example below downloads the file test_file.tar.gz:

wget --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_file.tar.gz
  • If you want to download an entire directory, you need to have the -r parameter set. This will recursively go through an entire directory and download it.
wget -r --user=USERNAME --ask-password ftp://ftp.biotech.illinois.edu/test_dir/
