Revision as of 08:08, 19 February 2024

About

AlphaFold is a highly accurate protein structure prediction program.
More information is available at https://github.com/deepmind/alphafold/

How to Run

Load the alphafold module. This loads AlphaFold, Singularity, and the AlphaFold databases.

module load alphafold/2.3.2

Create a scratch folder. This ensures that temporary data is written to the local scratch disk instead of the system's /tmp folder. /tmp has limited space, and if it fills up, the node becomes unresponsive and jobs fail.

mkdir /scratch/$SLURM_JOB_ID
export TMPDIR=/scratch/$SLURM_JOB_ID
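The commands above leave the scratch folder behind if the job script exits early, before it reaches its cleanup step (the rm -fr line in the example job script below). This can happen, for example, under set -e after a failed command. The sketch below shows one way to avoid that in a bash job script, using an EXIT trap. It is a minimal sketch under assumptions: setup_scratch and SCRATCH_DIR are hypothetical names, and the /scratch default comes from the commands above.

```bash
#!/bin/bash
# Sketch: create a per-job scratch folder, point TMPDIR at it, and remove it
# when the script exits. setup_scratch and SCRATCH_DIR are hypothetical names.
setup_scratch() {
    local base="${1:-/scratch}"          # scratch root used on this page
    local job_id="${2:-$SLURM_JOB_ID}"   # Slurm sets this inside a job
    if [ -z "$job_id" ]; then
        echo "setup_scratch: no job ID (is this running inside a Slurm job?)" >&2
        return 1
    fi
    SCRATCH_DIR="$base/$job_id"
    mkdir -p "$SCRATCH_DIR" || return 1
    export TMPDIR="$SCRATCH_DIR"
    # The EXIT trap runs when the calling script exits, including an early
    # exit after a failed command, so the folder is not left behind.
    trap 'rm -rf -- "$SCRATCH_DIR"' EXIT
}
```

In a job script you would call setup_scratch || exit 1 right after the module load line. The rm -fr line at the end then becomes optional.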

Run run_singularity.py to start AlphaFold. It is a wrapper script that makes the AlphaFold Singularity container easier to run.

run_singularity.py --data-dir $BIODB --cpus $SLURM_NTASKS --use-gpu --output-dir example_output --fasta-paths example.fasta

--data-dir should be set to $BIODB, which points to the location of the AlphaFold databases.
--cpus should be set to $SLURM_NTASKS, which equals the number of processors you reserved.
--use-gpu enables GPUs. Singularity automatically uses as many GPUs as you have reserved.
--output-dir specifies where the output files go. Change this to a folder in your home directory.
--fasta-paths specifies your input FASTA files. Only one sequence is allowed per file, so if you want to run multiple sequences, put each sequence in its own file and list all of the files, separated by commas:

--fasta-paths example.fasta,example2.fasta,example3.fasta
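If your sequences are all in one multi-sequence FASTA file, they first need to be split into one file per sequence. The helper below is a minimal sketch for doing that. split_fasta is a hypothetical name and is not part of the alphafold module. It assumes each record starts with a header line beginning with >, names each new file after the first word of that header, and prints the new files as a comma-separated list.

```bash
#!/bin/bash
# Sketch: split a multi-record FASTA file into one file per record and print
# the new file paths as a comma-separated list. split_fasta is hypothetical.
split_fasta() {
    local input="$1" outdir="$2"
    mkdir -p "$outdir" || return 1
    awk -v outdir="$outdir" '
        /^>/ {
            if (out != "") close(out)
            name = substr($1, 2)                  # first word of header, minus ">"
            gsub(/[^A-Za-z0-9._-]/, "_", name)    # keep the file name safe
            if (name == "") name = "seq" NR       # header without an ID
            out = outdir "/" name ".fasta"
            if (out in seen) {                    # basenames must be unique
                print "duplicate sequence name: " name > "/dev/stderr"
                err = 1
                exit 1
            }
            seen[out] = 1
            files = files (files != "" ? "," : "") out
        }
        out != "" { print > out }                 # copy header and sequence lines
        END {
            if (err) exit 1
            if (files != "") print files
        }
    ' "$input"
}
```

For example (all.fasta and fasta_inputs are placeholder names): run_singularity.py --data-dir $BIODB --cpus $SLURM_NTASKS --use-gpu --output-dir example_output --fasta-paths "$(split_fasta all.fasta fasta_inputs)". The help output under Parameters shows --fasta-paths as accepting several values. If your version rejects the comma-separated form, pass the paths separated by spaces instead.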

Example Job Script

#!/bin/bash
# ----------------SLURM Parameters----------------
#SBATCH -n 4
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --mem 70G

# ----------------Load Modules--------------------
module load alphafold/2.3.2
# ----------------Commands------------------------
mkdir /scratch/$SLURM_JOB_ID
export TMPDIR=/scratch/$SLURM_JOB_ID

run_singularity.py --data-dir $BIODB --cpus $SLURM_NTASKS --use-gpu --db-preset full_dbs --output-dir output \
--fasta-paths example.fasta

rm -fr /scratch/$SLURM_JOB_ID

Submit Job

Submit the job to the cluster:

sbatch example.sh
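You may want to refer to the job later, for example to check its status or memory use. sbatch --parsable prints only the job ID, optionally followed by ;clustername, instead of the usual "Submitted batch job" message. The following is a minimal sketch of a wrapper that keeps just the ID. submit_job is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: submit a job script and print only its numeric job ID.
# submit_job is a hypothetical helper name.
submit_job() {
    local script="$1" out
    # --parsable prints "jobid" or "jobid;cluster" instead of the usual message
    out=$(sbatch --parsable "$script") || return 1
    echo "${out%%;*}"    # drop the optional ";cluster" suffix
}
```

For example, jobid=$(submit_job example.sh) stores the ID. squeue -j "$jobid" then shows the job while it is queued or running. After it finishes, sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS,ReqMem reports its state and peak memory.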

Parameters

These are all the parameters for run_singularity.py. You can list them by running run_singularity.py --help:

  --fasta-paths FASTA_PATHS [FASTA_PATHS ...], -f FASTA_PATHS [FASTA_PATHS ...]
                        Paths to FASTA files, each containing one sequence.
                        All FASTA paths must have a unique basename as the
                        basename is used to name the output directories for
                        each prediction.
  --is-prokaryote-list IS_PROKARYOTE_LIST [IS_PROKARYOTE_LIST ...]
                        Optional for multimer system, not used by the single
                        chain system. This list should contain a boolean for
                        each fasta specifying true where the target complex is
                        from a prokaryote, and false where it is not, or where
                        the origin is unknown. These values determine the
                        pairing method for the MSA.
  --max-template-date MAX_TEMPLATE_DATE, -t MAX_TEMPLATE_DATE
                        Maximum template release date to consider (ISO-8601
                        format - i.e. YYYY-MM-DD). Important if folding
                        historical test sets.
  --db-preset {reduced_dbs,full_dbs}
                        Choose preset model configuration - no ensembling with
                        uniref90 + bfd + uniclust30 (full_dbs), or 8 model
                        ensemblings with uniref90 + bfd + uniclust30 (casp14).
  --model-preset {monomer,monomer_casp14,monomer_ptm,multimer}
                        Choose preset model configuration - the monomer model,
                        the monomer model with extra ensembling, monomer model
                        with pTM head, or multimer model
  --benchmark, -b       Run multiple JAX model evaluations to obtain a timing
                        that excludes the compilation time, which should be
                        more indicative of the time required for inferencing
                        many proteins.
  --use-precomputed-msas
                        Whether to read MSAs that have been written to disk.
                        WARNING: This will not check if the sequence, database
                        or configuration have changed.
  --data-dir DATA_DIR, -d DATA_DIR
                        Path to directory with supporting data: AlphaFold
                        parameters and genetic and template databases. Set to
                        the target of download_all_databases.sh.
  --docker-image DOCKER_IMAGE
                        Alphafold docker image.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory for results.
  --use-gpu             Enable NVIDIA runtime to run with GPUs.
  --gpu-devices GPU_DEVICES
                        Comma separated list of devices to pass to
                        NVIDIA_VISIBLE_DEVICES.
  --cpus CPUS, -c CPUS  Number of CPUs to use.
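Each FASTA file must contain exactly one sequence, and the basenames must be unique. Checking the inputs before submitting can therefore save a failed job. Below is a minimal sketch of such a check. check_fasta_inputs is a hypothetical helper. It counts header lines starting with >. As an assumption about how the output directories are named, it treats two files as clashing when their names match after the extension is removed.

```bash
#!/bin/bash
# Sketch: verify FASTA inputs before submitting. check_fasta_inputs is a
# hypothetical helper; it returns non-zero if any check fails.
check_fasta_inputs() {
    local f n status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2; status=1; continue
        fi
        n=$(grep -c '^>' "$f")    # one header line per sequence
        if [ "$n" -ne 1 ]; then
            echo "$f: expected 1 sequence, found $n" >&2; status=1
        fi
    done
    # Output directories are named after the basename, so compare names
    # with the last extension removed (e.g. a/x.fasta and b/x.fasta clash).
    local dups
    dups=$(for f in "$@"; do basename "$f" | sed 's/\.[^.]*$//'; done | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate basenames: $dups" >&2; status=1
    fi
    return "$status"
}
```

For example, check_fasta_inputs example.fasta example2.fasta example3.fasta && sbatch example.sh submits the job only if the check passes.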

Issues

If you receive the error

RuntimeError: HHSearch failed

you most likely need to increase the amount of memory reserved in your job script (the #SBATCH --mem line).
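To confirm that a finished job hit this error, you can search its output log. The sketch below assumes the log is Slurm's default output file, slurm-<jobid>.out in the directory you submitted from. That default applies when the job script sets no --output option. hhsearch_failed is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: report whether a job log contains the HHSearch error.
# hhsearch_failed is a hypothetical helper; returns 0 if the error is found,
# 1 if not, and 2 if the log cannot be read.
hhsearch_failed() {
    local log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    grep -q 'RuntimeError: HHSearch failed' "$log"
}
```

For example, hhsearch_failed slurm-12345.out && sacct -j 12345 --format=JobID,State,MaxRSS,ReqMem compares the peak memory the job used (MaxRSS) with what it requested (ReqMem). If the error is present, raise the #SBATCH --mem value in your job script and resubmit.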
