
...

Expand
title [Feb 2024] Current CPE toolchains

| CPE toolchain  | Note                                                      |
|----------------|-----------------------------------------------------------|
| cpeGNU/23.03   | GCC 11.2.0                                                |
| cpeCray/23.03  | CCE 15.0.1                                                |
| cpeIntel/23.03 | Deprecated and hidden. It will be removed in the future.  |
| cpeIntel/23.09 | Intel Compiler 2023.1.0                                   |

...

1. Slurm sbatch header


The #SBATCH macro directives can be used to specify sbatch options that mostly remain unchanged between runs, such as partition, time limit, and billing account. Options such as the job name can instead be specified when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

For the most part, Slurm sbatch options only define and request the computing resources available inside a job script. The resources actually used by a software/executable can differ depending on how it is invoked (see Stage 5), although these sbatch options are passed on and become its default options. For GPU jobs, we recommend using either --gpus or --gpus-per-node to request GPUs at this stage, as they provide the most flexibility for the next stage; additionally, please see GPU binding.

If your application software only supports parallelization by multi-threading, then it cannot utilize resources across nodes; in this case, -N, -n/--ntasks, and --ntasks-per-node should be set to 1.
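As an illustration, a minimal sbatch header might look like the following sketch. The partition name, account, and resource counts below are assumptions for illustration only, not recommendations; adjust them to your project and workload.

```shell
#!/bin/bash
#SBATCH -p compute               # partition (assumed name; check your system's partitions)
#SBATCH -N 1                     # number of nodes
#SBATCH --ntasks-per-node=8      # MPI processes per node (assumed value)
#SBATCH --cpus-per-task=4        # CPUs per MPI process (assumed value)
#SBATCH -t 02:00:00              # time limit (HH:MM:SS)
#SBATCH -A projXXXX              # billing account (placeholder)
#SBATCH -J myjob                 # job name; may also be given at submission time
```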

2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build-only dependencies such as CMake, Autotools, and binutils can be omitted. Additionally, those modules should be the same versions as were used to compile the program.

...

Expand
titleMore information
  • If some software dependencies were installed locally, their search paths should also be added.

  • We do NOT recommend specifying these search paths directly in ~/.bashrc, as this could lead to library conflicts when more than one main software package is used.

  • Some software provides a script to be sourced before use. In this case, sourcing it in your job script should be equivalent to adding its search paths manually yourself.


When executing your program, if you encounter one of the following errors:

  • If 'xxx' is not a typo you can use command-not-found to lookup ..., then your current PATH variable may be incorrect.

  • xxx: error while loading shared libraries: libXXX.so: cannot open shared object file, then:

    • If libXXX.so seems to be related to your software, then you may have set the LD_LIBRARY_PATH variable in Step 3 incorrectly.

    • If libXXX.so seems to be from a module you used to build your software, then loading that module should fix the problem.

  • ModuleNotFoundError: No module named 'xxx', then your current PYTHONPATH may be incorrect.


A preliminary check can be performed on a frontend node by doing something like:

Code Block
languagebash
bash   # Perform the checks in another bash shell to avoid polluting your login environment

module purge
module load <...>
module load <...>

export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}

<executable> --help
<executable> --version

exit

4. Setting environment variables
Some software requires additional environment variables to be set at runtime, for example, the path to a temporary directory. Output environment variables set by Slurm sbatch (see Slurm sbatch - output environment variables) can be used to set such software-specific parameters.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE, and ulimit -s unlimited are commonly set in a job script. An example is shown below.
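A minimal sketch of such settings is given below. The stack size value is an assumption; tune it to your application, and note that SLURM_CPUS_PER_TASK is only defined inside a job allocation.

```shell
# Derive the OpenMP thread count from the Slurm allocation (fall back to 1 outside a job)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Per-thread stack size; 32M is an assumed value, adjust for your application
export OMP_STACKSIZE=32M
# Remove the shell stack size limit for the master thread
ulimit -s unlimited
```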

...

Usually, one of srun, mpirun, mpiexec, or aprun is required to run MPI programs. On LANTA, the srun command MUST be used to launch MPI processes. The table below compares a few options of these commands.

| Command        | Total MPI processes | CPUs per MPI process  | MPI processes per node |
|----------------|---------------------|-----------------------|------------------------|
| srun           | -n, --ntasks        | -c, --cpus-per-task   | --ntasks-per-node      |
| mpirun/mpiexec | -n, -np             | --map-by socket:PE=N  | --map-by ppr:N:node    |
| aprun          | -n, --pes           | -d, --cpus-per-pe     | -N, --pes-per-node     |

There is usually no need to explicitly add options to srun since, by default, Slurm automatically derives them from sbatch, with the exception of --cpus-per-task.
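For instance, an equivalent hybrid launch of 8 MPI processes with 4 CPUs each would read as follows with each launcher (the process counts and executable name are hypothetical):

```shell
srun -n 8 -c 4 ./my_app                       # LANTA: the required launcher
mpirun -np 8 --map-by socket:PE=4 ./my_app    # generic Open MPI style launcher (not used on LANTA)
aprun -n 8 -d 4 ./my_app                      # legacy Cray ALPS launcher
```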


Expand
titleGPU Binding
  1. When using --gpus-per-node or no GPU-binding option at all, all tasks on the same node will see the same set of GPU IDs, starting from 0, available on that node. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    squeue --me        # Check that no "GPU-ID" job is still running

    In this case, you can use SLURM_LOCALID or other Slurm output variables to set CUDA_VISIBLE_DEVICES for each task. For example, you could use a wrapper script as mentioned in HPE intro_mpi (Section 1), or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0 (CUDA_VISIBLE_DEVICES), but each task will be bound to a different physical GPU/UUID. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    squeue --me       # Check that no "GPU-ID" job is still running

    However, it is stated in HPE intro_mpi (Section 1) that using these options with CrayMPICH could introduce an intra-node MPI performance drawback.
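The wrapper-script approach for case 1 above can be sketched as follows. The script name and the one-GPU-per-task mapping are assumptions for illustration; the idea is simply to derive CUDA_VISIBLE_DEVICES from each task's node-local rank before starting the real program.

```shell
# select_gpu.sh (hypothetical name): bind each task to the GPU matching its
# node-local rank, then replace this shell with the real program
export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID:-0}
exec "$@"
```

It would then be launched as, e.g., `srun --ntasks-per-node=4 ./select_gpu.sh ./my_app`, so that task i on each node uses GPU i.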

Note

For multi-threaded hybrid (MPI+multi-threading) applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance (>10%) due to improper CPU binding.

...

Info

You can test your initial script on the compute-devel or gpu-devel partitions, using, e.g., #SBATCH -t 02:00:00, since they normally have shorter queuing times.

Your entire job script will run only on the first requested node (${SLURMD_NODENAME}). Only the lines starting with srun can initiate processes on the other nodes.

...

Example

Installation guide

...