From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
About
How to Run
- Load the AlphaFold module. This loads AlphaFold, Singularity, and the AlphaFold databases.
module load alphafold/2.1.1
run_singularity.py
Example Job Script
#!/bin/bash
# ----------------SLURM Parameters----------------
#SBATCH -n 4
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --mem 80G
# ----------------Load Modules--------------------
module load alphafold/2.1.1
# ----------------Commands------------------------
run_singularity.py --data-dir $BIODB --cpus $SLURM_NTASKS --use-gpu --db-preset full_dbs --output-dir output \
--fasta-paths example.fasta
Parameters
- These are all the parameters for run_singularity.py. The same list can be printed by running run_singularity.py --help
--fasta-paths FASTA_PATHS [FASTA_PATHS ...], -f FASTA_PATHS [FASTA_PATHS ...]
Paths to FASTA files, each containing one sequence.
All FASTA paths must have a unique basename as the
basename is used to name the output directories for
each prediction.
--is-prokaryote-list IS_PROKARYOTE_LIST [IS_PROKARYOTE_LIST ...]
Optional for multimer system, not used by the single
chain system. This list should contain a boolean for
each fasta specifying true where the target complex is
from a prokaryote, and false where it is not, or where
the origin is unknown. These values determine the
pairing method for the MSA.
--max-template-date MAX_TEMPLATE_DATE, -t MAX_TEMPLATE_DATE
Maximum template release date to consider (ISO-8601
format - i.e. YYYY-MM-DD). Important if folding
historical test sets.
--db-preset {reduced_dbs,full_dbs}
Choose preset MSA database configuration - smaller
genetic database config (reduced_dbs) or full genetic
database config (full_dbs).
--model-preset {monomer,monomer_casp14,monomer_ptm,multimer}
Choose preset model configuration - the monomer model,
the monomer model with extra ensembling, monomer model
with pTM head, or multimer model
--benchmark, -b Run multiple JAX model evaluations to obtain a timing
that excludes the compilation time, which should be
more indicative of the time required for inferencing
many proteins.
--use-precomputed-msas
Whether to read MSAs that have been written to disk.
WARNING: This will not check if the sequence, database
or configuration have changed.
--data-dir DATA_DIR, -d DATA_DIR
Path to directory with supporting data: AlphaFold
parameters and genetic and template databases. Set to
the target of download_all_databases.sh.
--docker-image DOCKER_IMAGE
Alphafold docker image.
--output-dir OUTPUT_DIR, -o OUTPUT_DIR
Output directory for results.
--use-gpu Enable NVIDIA runtime to run with GPUs.
--gpu-devices GPU_DEVICES
Comma separated list of devices to pass to
NVIDIA_VISIBLE_DEVICES.
--cpus CPUS, -c CPUS Number of CPUs to use.
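Because --fasta-paths names each output directory after the FASTA basename, it can help to check for collisions before submitting a job. The following is a minimal sketch, not part of run_singularity.py: the function name check_unique_basenames and the file paths are hypothetical, and it assumes the basename is compared with its extension removed.

```shell
# Hypothetical helper (not part of run_singularity.py): print every basename
# that occurs more than once among the given FASTA paths.
check_unique_basenames() {
    for path in "$@"; do
        name=$(basename "$path")
        # Assumption: the extension is dropped when naming output directories.
        printf '%s\n' "${name%.*}"
    done | sort | uniq -d
}

# Usage with hypothetical paths: an empty result means all basenames are unique.
dupes=$(check_unique_basenames data/a/protein1.fasta data/b/protein1.fasta data/protein2.fasta)
if [ -n "$dupes" ]; then
    echo "Duplicate basenames: $dupes"
fi
```

If the check reports a duplicate, renaming one of the files before the run avoids two predictions sharing an output directory name.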
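For the multimer system, --is-prokaryote-list expects one boolean per FASTA file. The sketch below checks that the counts match before a job is submitted. The function name check_prokaryote_list is hypothetical, and the assumption that values are written as lowercase true and false is not confirmed by the help text above.

```shell
# Hypothetical helper: first argument is the number of FASTA files, the rest
# are the booleans intended for --is-prokaryote-list.
# Succeeds only when there is exactly one boolean per FASTA file.
check_prokaryote_list() {
    n_fastas=$1
    shift
    [ "$#" -eq "$n_fastas" ] || return 1
    for flag in "$@"; do
        case $flag in
            true|false) ;;      # assumption: lowercase literals
            *) return 1 ;;
        esac
    done
    return 0
}

# Usage inside a job script (file names are hypothetical):
#   check_prokaryote_list 2 true false || exit 1
#   run_singularity.py --data-dir $BIODB --cpus $SLURM_NTASKS --use-gpu \
#       --model-preset multimer --output-dir output \
#       --fasta-paths complex_a.fasta complex_b.fasta \
#       --is-prokaryote-list true false
```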
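The value passed to --max-template-date must be an ISO-8601 date (YYYY-MM-DD). A small check like the one below can catch a malformed date before the job reaches the queue. This is only a sketch: is_iso_date is a hypothetical name, and the calendar check assumes GNU date, as found on most Linux clusters.

```shell
# Hypothetical helper: succeeds only for a real calendar date in YYYY-MM-DD form.
is_iso_date() {
    # Check the shape first.
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    # Assumption: GNU date, which rejects impossible dates such as 2021-02-30.
    [ "$(date -d "$1" +%F 2>/dev/null)" = "$1" ]
}

# Usage with a hypothetical cutoff date.
template_date=2021-11-01
if is_iso_date "$template_date"; then
    echo "Using --max-template-date $template_date"
else
    echo "Not a valid YYYY-MM-DD date: $template_date" >&2
fi
```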
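Since --use-precomputed-msas does not check whether the sequence has changed, one possible safeguard is to record a checksum of the FASTA file and only reuse MSAs when it still matches. The sketch below illustrates the idea. The function name fasta_unchanged and the stamp file location are hypothetical, and the check does not cover changes to the databases or configuration.

```shell
# Hypothetical helper: returns 0 when the FASTA file matches the checksum
# stored in the stamp file by an earlier call. Otherwise it stores the
# current checksum in the stamp file and returns 1.
fasta_unchanged() {
    fasta=$1
    stamp=$2
    current=$(sha256sum "$fasta" | cut -d' ' -f1)
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
        return 0
    fi
    printf '%s\n' "$current" > "$stamp"
    return 1
}

# Usage inside a job script (the stamp path is hypothetical):
#   if fasta_unchanged example.fasta example.sha256; then
#       run_singularity.py ... --use-precomputed-msas --fasta-paths example.fasta
#   else
#       run_singularity.py ... --fasta-paths example.fasta
#   fi
```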