
...

Expand
title [Feb 2024] Current CPE toolchains

CPE toolchain     Note
cpeGNU/23.03      GCC 11.2.0
cpeCray/23.03     CCE 15.0.1
cpeIntel/23.03    Deprecated and hidden. It will be removed in the future.
cpeIntel/23.09    Intel Compiler 2023.1.0

...

For the most part, Slurm sbatch options only define and request the computing resources that can be used inside a job script. The resources actually used by a software/executable can differ depending on how it is invoked (see Stage 5). For GPU jobs, we recommend using either --gpus or --gpus-per-node to request GPUs at this stage; additionally, please also see GPU binding.
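
As a minimal sketch (the partition name, walltime, and resource counts below are illustrative placeholders, not site defaults), a GPU job script header combining these options might look like:

Code Block
languagebash
#!/bin/bash
#SBATCH --partition=gpu             # Placeholder partition name; use your site's GPU partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4           # Recommended way to request GPUs at this stage
#SBATCH --time=00:10:00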

2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build-time dependencies such as CMake, Autotools, and binutils can be omitted. Those modules should also be the same versions that were used when compiling the program.
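
As an illustrative sketch (the module names and versions below are assumptions; load whatever was actually used when the software was built):

Code Block
languagebash
module load cpeGNU/23.03       # Same CPE toolchain version used to compile the software
module load cray-fftw          # Example runtime dependency; replace with your own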

...

Expand
titleGPU Binding
  1. When using --gpus-per-node or no additional srun options, all tasks on the same node will see the same set of GPU IDs available on the node. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"     # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    squeue --me        # Check that no "GPU-ID" job still running 

    In this case, you can create a wrapper script that uses SLURM_LOCALID (or a similar variable) to set CUDA_VISIBLE_DEVICES individually for each task on the same node. For example, you could use a wrapper script as mentioned in HPE intro_mpi (Section 1), a minimal sketch of which is given after this list, or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0, but each task will be bound to a different GPU/UUID. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"    # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    squeue --me       # Check that no "GPU-ID" job still running

    This approach is not recommended when running the software across GPU nodes. Additionally, it is stated in HPE intro_mpi (Section 1) that using these options with Cray MPICH could introduce an intra-node MPI performance drawback.
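
  A minimal sketch of the GPU-selection wrapper script mentioned in item 1 above (the file names select_gpu and my_gpu_app are placeholders, not part of the system):

    Code Block
    languagebash
    #!/bin/bash
    # select_gpu: map each task on a node to its own GPU using the local task ID
    export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID
    exec "$@"

    After making it executable (chmod +x select_gpu), it can be used as, for example, srun --ntasks-per-node=4 --gpus-per-node=4 ./select_gpu ./my_gpu_app.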

Note

For multi-threaded applications, it is essential to specify the -c/--cpus-per-task option for srun to prevent a potential decrease in performance (~50%) due to improper CPU binding.
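
For example, an application using 8 OpenMP threads per task might be launched as follows (the executable name is a placeholder):

Code Block
languagebash
export OMP_NUM_THREADS=8
srun --ntasks=4 --cpus-per-task=8 ./my_threaded_app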

...