...
[Feb 2024] Current CPE toolchains

CPE toolchain  | Note
cpeGNU/23.03   | GCC 11.2.0
cpeCray/23.03  | CCE 15.0.1
cpeIntel/23.03 | Deprecated and hidden. It will be removed in the future.
cpeIntel/23.09 | Intel Compiler 2023.1.0
...
1. Slurm sbatch header
The #SBATCH macro can be used to specify sbatch options that mostly remain unchanged, such as the partition, time limit, and billing account. Optional settings such as the job name can instead be specified when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

Note that sbatch options mostly only define and request the computing resources that can be used inside a job script. The resources actually used by a software/executable can differ depending on how it is invoked (see Stage 5). For GPU jobs, we recommend using either --gpus or --gpus-per-node to request GPUs at this stage; see also GPU binding.
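A minimal sketch of such a header for a GPU job; the partition name, account, and resource values below are illustrative placeholders rather than site defaults:

    #!/bin/bash
    #SBATCH -p gpu                # partition (placeholder)
    #SBATCH -A project_0000000    # billing account (placeholder)
    #SBATCH -t 02:00:00           # time limit
    #SBATCH -N 1                  # number of nodes
    #SBATCH --gpus-per-node=4     # request GPUs here, as recommended above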
2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build dependencies such as CMake, Autotools, and binutils can be omitted. Those modules should also be the same versions that were used when compiling the program.
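A minimal sketch, assuming the software was built with the cpeGNU/23.03 toolchain from the table above; the cray-fftw dependency is purely illustrative:

    module purge               # optional: start from a clean environment
    module load cpeGNU/23.03   # same toolchain version used at build time
    module load cray-fftw      # illustrative runtime dependency of the software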
...
There is usually no need to add options to srun since, by default, Slurm will automatically derive them from sbatch. However, we recommend explicitly adding GPU binding options, as discussed below.
When using --gpus-per-node or no additional srun options, all tasks on the same node will see the same set of GPU IDs available on the node. Try:
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"   # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit
    squeue --me
In this case, you can create a wrapper script that uses SLURM_LOCALID to set CUDA_VISIBLE_DEVICES for each task within the same node (a sketch of such a wrapper follows below); see HPE intro_mpi (Section 1). On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU ID seen by each task will start from 0 but will be bound to a different GPU/UUID. Try:
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"   # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit
    squeue --me
This approach is not recommended when running the software across GPU nodes; see HPE intro_mpi (Section 1).
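As referenced above, a minimal sketch of such a wrapper script, assuming four GPUs per node; the script name select_gpu.sh and the application ./my_app are hypothetical:

    #!/bin/bash
    # select_gpu.sh - bind each task to one GPU based on its node-local task ID
    export CUDA_VISIBLE_DEVICES=$SLURM_LOCALID   # SLURM_LOCALID runs from 0 to (tasks per node - 1)
    exec "$@"                                    # replace the wrapper with the real program

It could then be launched as srun --ntasks-per-node=4 ./select_gpu.sh ./my_app, so that each task on a node sees a different GPU.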
Note: For multi-threaded applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential performance decrease (~50%) due to improper CPU binding.
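For example, a sketch for a hypothetical OpenMP application running 4 tasks with 8 threads each (all values are illustrative):

    export OMP_NUM_THREADS=8                            # threads per task (illustrative)
    srun --ntasks=4 --cpus-per-task=8 ./my_openmp_app   # my_openmp_app is a placeholder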
...
Info: You can test your initial script on the compute-devel or gpu-devel partitions, using #SBATCH -t 02:00:00, since they normally have a shorter queuing time.
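For instance, assuming a job script named myjob.sh (a placeholder), the partition and time limit can also be overridden on the command line without editing the script:

    sbatch -p gpu-devel -t 02:00:00 myjob.sh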
...
Example
Installation guide
...