
Bundle

Name: protpred
Maintainer: LCC Support
Contact: support@lcc.ncbr.muni.cz

Default build

  • alphafold:2.3.2:auto:auto

alphafold

This package provides an implementation of the inference pipeline of AlphaFold v2.x. (Summary from the program web page.)

The official web page: https://github.com/deepmind/alphafold

Supplementary data: AlphaFold Workshop, 30th May - 2nd June, 2023

  • L08 AlphaFold
  • L15 AlphaFold on SOKAR
  • L17 AlphaFold on MetaCentrum
  • L18 AlphaFold Analysing Results
  • L22 Multimer Prediction
  • L23 Multimer Prediction: Analysing Results

Notes:

The package is available on the WOLF, SOKAR, and MetaCentrum clusters.

WOLF: The length of sequences is limited by the hardware of the dedicated computational node (wolf32: AMD EPYC 7402P 24-Core Processor, 128 GB RAM, 1x GeForce RTX 4070 (12 GB RAM), 700 GB SSD). Furthermore, the prediction cannot run longer than 24 hours. For bigger problems, use either SOKAR or MetaCentrum resources.

SOKAR: The length of sequences is limited by the hardware of the dedicated computational node (sokar11: AMD EPYC 7402P 24-Core Processor, 128 GB RAM, 2x GeForce RTX 3070 (8 GB RAM), 700 GB SSD). The prediction can run for up to one month. For bigger problems, use MetaCentrum resources. You can also try sokar10 via the fast_gpu queue (AMD EPYC 7402P 24-Core Processor, 128 GB RAM, 2x GeForce RTX 4070 (12 GB RAM), 700 GB SSD).

Usage:
Usage: alphafold <OPTIONS>

Required Parameters:
 -f <fasta_file>      single FASTA file containing one or more sequences

Optional Parameters:
 -m <model_preset>    Model preset: monomer (default), monomer_casp14, monomer_ptm, multimer
 -d <db_preset>       Control MSA speed/quality tradeoff: reduced_dbs, full_dbs (default)
 -t <max_temp_date>   Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD).
                      Important if folding historical test sets. (defaults to: 2020-05-14)
 -s <stage>           which AF stage to run: all|msa|inference
 -h                   Print usage.
 --help               Print usage including alphafold options.
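For example, a multimer prediction with the faster reduced databases, or a run split into separate MSA and inference stages, might look as follows (complex.fasta is only an illustrative input file):

# multimer prediction using the reduced databases
alphafold -f complex.fasta -m multimer -d reduced_dbs

# run the MSA stage first, then the inference stage in a separate job
alphafold -f complex.fasta -s msa
alphafold -f complex.fasta -s inference
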
Specific setup:

The following environment variables can influence the performance of an alphafold job.

 ** XLA_PYTHON_CLIENT_MEM_FRACTION - default 4.0; however, for bigger problems, it needs to be increased to > REQ_GPU_RAM / GPU_RAM
                                     [Example: GeForce RTX 3070 (8GB), 
                                               required GPU RAM is 50GB, then 50/8 = 6.25 -> 
                                               XLA_PYTHON_CLIENT_MEM_FRACTION=7.0 (rounded up)]
 ** OPENMM_RELAX_PLATFORM          - the platform for geometry optimization: CUDA or CPU. By default, it is CUDA if ngpus > 0. 
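For instance, following the example above (a prediction requiring about 50 GB of GPU memory on an 8 GB GeForce RTX 3070), the variables could be exported in the job script before calling alphafold; the values shown are only illustrative:

# allow allocation of up to 7x the physical GPU memory (50 GB / 8 GB = 6.25, rounded up)
export XLA_PYTHON_CLIENT_MEM_FRACTION=7.0
# run the final geometry optimization on the CPU instead of the GPU
export OPENMM_RELAX_PLATFORM=CPU

alphafold -f T1049.fasta
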
Typical workflow:
1. Create a new job directory 
2. Enter the job directory 
3. Copy an input FASTA file into the job directory
4. Create a job script in the job directory (see below)
5. Submit the job to the batch system (see below); you must specify the required resources!
6. Wait until the job completes (see the output of the pjobs and pinfo commands)
7. Resulting models are in a subdirectory with a name derived from the input FASTA file. The log from the alphafold program is in the *.stdout file.
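A minimal shell transcript of this workflow (with illustrative directory and file names) might look as follows; the my_job script and the psubmit command are described below:

$ mkdir T1049
$ cd T1049
$ cp /path/to/T1049.fasta .
# create the my_job script in this directory (see the example script below)
$ psubmit ai my_job ncpus=8 ngpus=1 mem=50gb
$ pjobs
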
Example script (my_job)
#!/usr/bin/env infinity-env

# activate the module
module add alphafold

# run the prediction with default setup (see alphafold -h for the list of options)
# T1049.fasta is an input file with the protein sequence
alphafold -f T1049.fasta 

Job submission on the WOLF cluster
$ psubmit ai my_job ncpus=8 ngpus=1 mem=50gb
Job submission on the SOKAR cluster
$ psubmit ai my_job ncpus=8 ngpus=1 mem=50gb walltime=2d
Job submission on MetaCentrum
$ psubmit gpu my_job ncpus=8 ngpus=1 mem=50gb
  • ai / gpu - the name of the batch queue (the first argument of psubmit); this queue is generally dedicated for structure prediction jobs
  • ncpus - number of requested CPUs; alphafold is not well optimized for a large number of CPUs, so keep this number low (about 8)
  • ngpus - number of requested GPUs; alphafold supports running on a single GPU only
  • mem - amount of system memory
  • walltime - maximum job duration in days (d) or hours (h)
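
For example, to target the sokar10 node mentioned in the notes above, the job can be submitted to the fast_gpu queue instead; the resource values are only illustrative and must be adjusted to the sequence length:
$ psubmit fast_gpu my_job ncpus=8 ngpus=1 mem=50gb walltime=2d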