
GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics simulation package.

...

updated: May 2023

...

Table of Contents

...

Available version

| Version | Module name | Thread MPI (single node or GPU) | MPI (multi-node) |
|---|---|---|---|
| 2022.5 | GROMACS/2022.5-GNU-11.2-CUDA-11.7 | gmx mdrun | gmx_mpi mdrun |
| 2023.2 | GROMACS/2023.2-cpeGNU-23.09-CUDA-12.0 | gmx mdrun | gmx_mpi mdrun (see note [a]) |

Note:

[a] The MPI build in GROMACS/2023.2-cpeGNU-23.09-CUDA-12.0 does NOT have the PME GPU decomposition feature. This does not affect normal usage of GROMACS unless you have a very large system (i.e. >10 M atoms). See "Massively Improved Multi-node NVIDIA GPU Scalability with GROMACS" (NVIDIA Technical Blog) for more details about the PME GPU decomposition feature.

1. Input file

The input to the GROMACS mdrun command is a TPR file (.tpr). For example TPR files, see https://www.mpinat.mpg.de/grubmueller/bench, where GROMACS benchmark sets are provided.
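As a small pre-flight sketch: the scripts in the next section start mdrun with -deffnm input, so mdrun looks for input.tpr in the working directory. The helper name require_tpr is hypothetical and is not part of GROMACS.

```shell
#!/bin/bash
# Hypothetical helper: check that the TPR file implied by -deffnm exists.
require_tpr() {
    local deffnm="$1"
    if [ -f "${deffnm}.tpr" ]; then
        echo "found ${deffnm}.tpr"
    else
        echo "missing ${deffnm}.tpr" >&2
        return 1
    fi
}

# Possible use in a job script (left commented out so the sketch runs anywhere):
# require_tpr input && gmx mdrun -deffnm input
```

Checking before the mdrun line lets the job stop with a clear message instead of failing at start-up.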

2. Job submission script

Create a script with the vi submit.sh command and specify the following details according to the computational resources you want to use.

2.1 using compute node (1 node)

Code Block

#!/bin/bash
#SBATCH -p compute          #specify partition
#SBATCH -N 1 -c 128         #specify number of nodes and cores per task
#SBATCH -t 5-00:00:00       #job time limit <day-hr:min:sec>
#SBATCH -A lt999999         #project name
#SBATCH -J GROMACS          #job name

##Module Load##
module restore
module load GROMACS/2022.5-GNU-11.2-CUDA-11.7

gmx mdrun -deffnm input

The script above uses the compute partition (-p compute) and 1 node (-N 1) with 128 cores per task (-c 128) for 1 task (the default). The wall-time limit is set to 5 days (-t 5-00:00:00), which is the maximum. The account is set to lt999999 (-A lt999999); change it to your own account. The job name is set to GROMACS (-J GROMACS).

Info
To change the amount of computing resources, adjust the number of cores with the `-c` option: full node (`-c 128`), half node (`-c 64`), quarter node (`-c 32`).

2.2 using compute node (>1 node)

Code Block

#!/bin/bash
#SBATCH -p compute                        #specify partition
#SBATCH -N 4 --ntasks-per-node=64 -c 2    #specify number of nodes, tasks per node, and cores per task
#SBATCH -t 5-00:00:00                     #job time limit <day-hr:min:sec>
#SBATCH -A lt999999                       #project name
#SBATCH -J GROMACS                        #job name

##Module Load##
module restore
module load GROMACS/2022.5-GNU-11.2-CUDA-11.7

srun -c $SLURM_CPUS_PER_TASK gmx_mpi mdrun -deffnm input -ntomp $SLURM_CPUS_PER_TASK

The script above uses the compute partition (-p compute) and 4 nodes (-N 4) with 2 cores per task (-c 2) and 64 tasks per node (--ntasks-per-node=64). This results in 2 x 64 = 128 cores per node and 4 x 128 = 512 cores in total. The wall-time limit is set to 5 days (-t 5-00:00:00), which is the maximum. The account is set to lt999999 (-A lt999999); change it to your own account. The job name is set to GROMACS (-J GROMACS).
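The core count can be checked with plain shell arithmetic before changing the #SBATCH values. This is only an illustrative sketch; the function name total_cores is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: total cores = nodes x tasks per node x cores per task.
total_cores() {
    local nodes="$1" tasks_per_node="$2" cpus_per_task="$3"
    echo $(( nodes * tasks_per_node * cpus_per_task ))
}

# Values from the script above: -N 4 --ntasks-per-node=64 -c 2
cores_per_node=$(( 64 * 2 ))
echo "cores per node: ${cores_per_node}"
echo "total cores: $(total_cores 4 64 2)"
```

Keeping tasks per node x cores per task equal to 128 uses each compute node fully, as in the single-node example.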

...

Technical note for advanced users

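GROMACS performance can be tuned by adjusting the number of MPI ranks (-ntmpi) and the number of cores per rank (-ntomp). -ntmpi corresponds to the total number of Slurm tasks (-n, or --ntasks-per-node multiplied by -N), and -ntomp corresponds to the Slurm cores per task (-c, --cpus-per-task). Note that -ntmpi applies to the thread-MPI build (gmx); with gmx_mpi the number of ranks is set by the launcher (srun).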
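The mapping can be sketched as a dry run that derives the values from the Slurm environment. It assumes Slurm exports SLURM_NTASKS and SLURM_CPUS_PER_TASK inside the job; the fallback values and the helper name mdrun_launch_line are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: build the multi-node launch line from the cores per task.
mdrun_launch_line() {
    local threads="$1"
    echo "srun -c ${threads} gmx_mpi mdrun -deffnm input -ntomp ${threads}"
}

# Fallbacks mirror section 2.2 (4 nodes x 64 tasks per node, 2 cores per task)
# so that the sketch also runs outside a Slurm job.
ranks="${SLURM_NTASKS:-256}"
threads="${SLURM_CPUS_PER_TASK:-2}"

echo "MPI ranks: ${ranks}, OpenMP threads per rank: ${threads}"
# Dry run: print the command instead of executing it.
mdrun_launch_line "${threads}"
```

Reading the values from the environment keeps the mdrun options consistent with the #SBATCH line when the resource request changes.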

2.3 using GPU node (1 card)

Code Block

#!/bin/bash
#SBATCH -p gpu                            #specify partition
#SBATCH -N 1 --ntasks-per-node=1 -c 16    #specify number of nodes, tasks per node, and cores per task
#SBATCH --gpus-per-task=1                 #specify number of GPUs per task
#SBATCH -t 5-00:00:00                     #job time limit <day-hr:min:sec>
#SBATCH -A lt999999                       #project name
#SBATCH -J GROMACS                        #job name

##Module Load##
module restore
module load GROMACS/2022.5-GNU-11.2-CUDA-11.7

gmx mdrun -deffnm input -update gpu

The script above uses the gpu partition (-p gpu) and 1 node (-N 1) with 16 cores per task (-c 16), 1 GPU card per task (--gpus-per-task=1), and 1 task per node (--ntasks-per-node=1). This results in 1 x 16 = 16 cores with 1 x 1 = 1 GPU card. The wall-time limit is set to 5 days (-t 5-00:00:00), which is the maximum. The account is set to lt999999 (-A lt999999); change it to your own account. The job name is set to GROMACS (-J GROMACS).

2.4 using GPU node (>1 cards)

Code Block

#!/bin/bash
#SBATCH -p gpu                            #specify partition
#SBATCH -N 1 --ntasks-per-node=4 -c 16    #specify number of nodes, tasks per node, and cores per task
#SBATCH --gpus-per-task=1                 #specify number of GPUs per task
#SBATCH -t 5-00:00:00                     #job time limit <day-hr:min:sec>
#SBATCH -A lt999999                       #project name
#SBATCH -J GROMACS                        #job name

##Module Load##
module restore
module load GROMACS/2022.5-GNU-11.2-CUDA-11.7

#export GMX_GPU_DD_COMMS=true
#export GMX_GPU_PME_PP_COMMS=true
export GMX_ENABLE_DIRECT_GPU_COMM=true

gmx mdrun -deffnm input -update gpu -nb gpu -bonded gpu -pme gpu -ntmpi 8 -ntomp 8 -npme 1

The script above uses the gpu partition (-p gpu) and 1 node (-N 1) with 16 cores per task (-c 16), 1 GPU card per task (--gpus-per-task=1), and 4 tasks per node (--ntasks-per-node=4). This results in 4 x 16 = 64 cores with 4 x 1 = 4 GPU cards. The wall-time limit is set to 5 days (-t 5-00:00:00), which is the maximum. The account is set to lt999999 (-A lt999999); change it to your own account. The job name is set to GROMACS (-J GROMACS).

Note: GPU direct communication is enabled through environment variables set before the mdrun command. See "Massively Improved Multi-node NVIDIA GPU Scalability with GROMACS" (NVIDIA Technical Blog) for more detail.

In GROMACS 2022 and 2023, GMX_GPU_DD_COMMS and GMX_GPU_PME_PP_COMMS have been removed; use GMX_ENABLE_DIRECT_GPU_COMM instead. See the "Environment Variables" pages of the GROMACS 2022 and GROMACS 2023 documentation for details.

Info
To change the amount of computing resources, set the number of tasks per node (`--ntasks-per-node`) to the number of GPU cards you want to use, and change `-ntmpi` and `-ntomp` to match the total number of CPU cores. The total number of CPU cores equals `--ntasks-per-node` multiplied by `-c`, e.g. 4 x 16 = 64 in this case; therefore `-ntmpi` is set to 8 and `-ntomp` is set to 8 (8 x 8 = 64).
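The relation between the Slurm request and the thread-MPI options can be sketched as follows. The helper name threads_per_rank is hypothetical, and the sketch only checks that -ntmpi x -ntomp matches the allocated cores; it does not choose the best-performing split.

```shell
#!/bin/bash
# Hypothetical helper: OpenMP threads per rank such that
# ntmpi x ntomp equals tasks per node x cores per task.
threads_per_rank() {
    local tasks_per_node="$1" cpus_per_task="$2" ntmpi="$3"
    local total=$(( tasks_per_node * cpus_per_task ))
    if [ $(( total % ntmpi )) -ne 0 ]; then
        echo "ntmpi=${ntmpi} does not divide ${total} cores" >&2
        return 1
    fi
    echo $(( total / ntmpi ))
}

# Values from section 2.4: --ntasks-per-node=4 -c 16, with -ntmpi 8
ntomp=$(threads_per_rank 4 16 8)
echo "-ntmpi 8 -ntomp ${ntomp}"
```

For a 2-card job with --ntasks-per-node=2 and -c 16, the same helper gives 4 threads per rank when -ntmpi is kept at 8.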

3. Job submission

Use the sbatch submit.sh command to submit the job to the queuing system.

...
