
...

Expand
title [Feb 2024] Current CPE toolchains

| CPE toolchain  | Note                                                      |
|----------------|-----------------------------------------------------------|
| cpeGNU/23.03   | GCC 11.2.0                                                |
| cpeCray/23.03  | CCE 15.0.1                                                |
| cpeIntel/23.03 | Deprecated and hidden. It will be removed in the future.  |
| cpeIntel/23.09 | Intel Compiler 2023.1.0                                   |

...

1. Slurm sbatch header


The #SBATCH macro directives can be used to specify sbatch options that mostly remain unchanged between runs, such as partition, time limit, and billing account. Options such as the job name can instead be specified when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

For the most part, Slurm sbatch options only define and request the computing resources available inside a job script. The resources actually used by a software/executable can differ depending on how it is invoked (see Stage 5), although these sbatch options are passed on and become its default options. For GPU jobs, we recommend using either --gpus or --gpus-per-node to request GPUs at this stage, as they provide the most flexibility for the next stage; additionally, please see GPU binding.

If your application software only supports parallelization by multi-threading, then it cannot utilize resources across nodes; in this case, -N, -n/--ntasks, and --ntasks-per-node should be set to 1.
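As an illustration, a minimal sbatch header might look like the following sketch. The partition name, account, and resource counts below are assumptions for illustration only, not recommendations; adjust them to your project and workload.

```shell
#!/bin/bash
#SBATCH -p compute               # partition (assumed name; check your system's partitions)
#SBATCH -N 1                     # number of nodes
#SBATCH --ntasks-per-node=8      # MPI processes per node (assumed value)
#SBATCH --cpus-per-task=4        # CPUs per MPI process (assumed value)
#SBATCH -t 02:00:00              # time limit (HH:MM:SS)
#SBATCH -A projXXXX              # billing account (placeholder)
#SBATCH -J myjob                 # job name; may also be given at submission time
```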

2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build-only dependencies such as CMake, Autotools, and binutils can be omitted. Additionally, those modules should be the same versions as were used to compile the program.

...

Expand
titleMore information
  • If some software dependencies were installed locally, their search paths should also be added.

  • We do NOT recommend specifying these search paths directly in ~/.bashrc, as this could lead to library conflicts when more than one main software package is used.

  • Some software provides a script to be sourced before use. In this case, sourcing it in your job script should be equivalent to adding its search paths manually yourself.


When executing your program, if you encounter one of the following errors:

  • If 'xxx' is not a typo you can use command-not-found to lookup ..., then your current PATH variable may be incorrect.

  • xxx: error while loading shared libraries: libXXX.so: cannot open shared object file, then:

    • If libXXX.so seems to be related to your software, then you may have set the LD_LIBRARY_PATH variable in Step 3 incorrectly.

    • If libXXX.so seems to be from a module you used to build your software, then loading that module should fix the problem.

  • ModuleNotFoundError: No module named 'xxx', then your current PYTHONPATH may be incorrect.


A preliminary check can be performed on a frontend node by doing something like:

Code Block
languagebash
bash   # Perform the checks in another bash shell to avoid polluting your login environment

module purge
module load <...>
module load <...>

export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}

<executable> --help
<executable> --version

exit

4. Setting environment variables
Some software requires additional environment variables to be set at runtime, for example, the path to a temporary directory. Output environment variables set by Slurm sbatch (see Slurm sbatch - output environment variables) can be used to set such software-specific parameters.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE, and ulimit -s unlimited are commonly set in a job script. An example is shown below.
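A minimal sketch of such settings is given below. The stack size value is an assumption; tune it to your application, and note that SLURM_CPUS_PER_TASK is only defined inside a job allocation.

```shell
# Derive the OpenMP thread count from the Slurm allocation (fall back to 1 outside a job)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Per-thread stack size; 32M is an assumed value, adjust for your application
export OMP_STACKSIZE=32M
# Remove the shell stack size limit for the master thread
ulimit -s unlimited
```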

...

Usually, one of srun, mpirun, mpiexec, or aprun is required to run MPI programs. On LANTA, the srun command MUST be used to launch MPI processes. The table below compares a few options of these commands.

| Command        | Total MPI processes | CPUs per MPI process  | MPI processes per node |
|----------------|---------------------|-----------------------|------------------------|
| srun           | -n, --ntasks        | -c, --cpus-per-task   | --ntasks-per-node      |
| mpirun/mpiexec | -n, -np             | --map-by socket:PE=N  | --map-by ppr:N:node    |
| aprun          | -n, --pes           | -d, --cpus-per-pe     | -N, --pes-per-node     |

There is usually no need to explicitly add options to srun since, by default, Slurm automatically derives them from sbatch, with the exception of --cpus-per-task.
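For instance, an equivalent hybrid launch of 8 MPI processes with 4 CPUs each would read as follows with each launcher (the process counts and executable name are hypothetical):

```shell
srun -n 8 -c 4 ./my_app                       # LANTA: the required launcher
mpirun -np 8 --map-by socket:PE=4 ./my_app    # generic Open MPI style launcher (not used on LANTA)
aprun -n 8 -d 4 ./my_app                      # legacy Cray ALPS launcher
```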


Expand
titleGPU Binding
  1. When using --gpus-per-node or no GPU-binding option at all, all tasks on the same node will see the same set of GPU IDs, starting from 0, available on that node. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    squeue --me        # Check that no "GPU-ID" job is still running

    In this case, you can use SLURM_LOCALID or other Slurm output variables to set CUDA_VISIBLE_DEVICES for each task. For example, you could use a wrapper script as mentioned in HPE intro_mpi (Section 1), or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0 (CUDA_VISIBLE_DEVICES), but each task will be bound to a different physical GPU/UUID. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    squeue --me       # Check that no "GPU-ID" job is still running

    However, it is stated in HPE intro_mpi (Section 1) that using these options with CrayMPICH could introduce an intra-node MPI performance drawback.
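The wrapper-script approach for case 1 above can be sketched as follows. The script name and the one-GPU-per-task mapping are assumptions for illustration; the idea is simply to derive CUDA_VISIBLE_DEVICES from each task's node-local rank before starting the real program.

```shell
# select_gpu.sh (hypothetical name): bind each task to the GPU matching its
# node-local rank, then replace this shell with the real program
export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID:-0}
exec "$@"
```

It would then be launched as, e.g., `srun --ntasks-per-node=4 ./select_gpu.sh ./my_app`, so that task i on each node uses GPU i.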

Note

For multi-threaded hybrid (MPI+multi-threading) applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance (>10%) due to improper CPU binding.

...

Info

You can test your initial script on the compute-devel or gpu-devel partitions, using, e.g., #SBATCH -t 02:00:00, since they normally have shorter queuing times.

Your entire job script will run only on the first requested node (${SLURMD_NODENAME}). Only the lines starting with srun can initiate processes on the other nodes.

...

Example

Installation guide

...