
...

Expand
title [Feb 2024] Current CPE toolchains

CPE toolchain     Note
cpeGNU/23.03      GCC 11.2.0
cpeCray/23.03     CCE 15.0.1
cpeIntel/23.03    Deprecated and hidden. It will be removed in the future.
cpeIntel/23.09    Intel Compiler 2023.1.0

...

Expand
titleGPU Binding
  1. When using --gpus-per-node, or when running srun without any GPU-binding options, all tasks on the same node will see the same set of GPU IDs, starting from 0, for all GPUs available on the node. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    squeue --me        # Check that no "GPU-ID" job still running 

    In this case, you can use SLURM_LOCALID or a similar variable to set CUDA_VISIBLE_DEVICES for each task. For example, you could use a wrapper script as mentioned in HPE intro_mpi (Section 1), or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will still start from 0, but each task will be bound to a different GPU/UUID. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    squeue --me       # Check that no "GPU-ID" job still running 

    However, HPE intro_mpi (Section 1) states that using these options with Cray MPICH could introduce an intra-node MPI performance drawback.
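    The wrapper-script approach from case 1 above can be sketched as follows. This is a minimal example, not the exact script from HPE intro_mpi; the file name select_gpu.sh and the application name are placeholders, and it assumes one GPU per task with GPU IDs matching node-local ranks:

    Code Block
    languagebash
    #!/bin/bash
    # select_gpu.sh (hypothetical name): bind each task to one GPU based on
    # its node-local rank. srun sets SLURM_LOCALID for every task it launches.
    # Fall back to GPU 0 when run outside of a Slurm job step.
    export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID:-0}
    # Launch the real application with the restricted GPU visibility.
    exec "$@"

    With this in place, something like srun --ntasks-per-node=4 ./select_gpu.sh ./my_gpu_app (my_gpu_app being a placeholder) would give task i on each node exclusive visibility of GPU i.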

...