
1. Preparing software environment

This section offers guidelines on setting up an environment for building and running application software on LANTA.

There are three main approaches to preparing an environment on LANTA. Users should choose one of them and should not mix approaches, since library conflicts may occur.

1.1 HPE Cray Programming Environment

LANTA is an HPE Cray EX cluster. The HPE Cray Programming Environment (PrgEnv or CPE) is installed on the system by the vendor and is the preferred option. It provides a uniform interface across different compilers and libraries. Below are the modules for each available compiler suite.

| Module name | Description | Note |
|---|---|---|
| PrgEnv-gnu | GNU compiler suite | - |
| PrgEnv-intel | Intel compiler suite | Intel oneAPI (default), with MKL |
| PrgEnv-cray | Cray Compiling Environment (CCE) | Loaded by default upon login |
| PrgEnv-nvhpc | NVIDIA HPC SDK compiler suite | Inherently with CUDA |
| PrgEnv-aocc | AMD AOCC compiler suite | Without AOCL |

 More information
  • Execute cc --version, CC --version, or ftn --version to check which compiler is being used.

  • With PrgEnv-intel loaded, ${MKLROOT} is set to the corresponding Intel Math Kernel Library (MKL).

  • By default, with PrgEnv-intel, the C/C++ compilers are ICX/ICPX while the Fortran compiler is IFORT.

  • To use only Intel Classic, execute module swap intel intel-classic after loading PrgEnv-intel.

  • To use only Intel oneAPI, execute module swap intel intel-oneapi after loading PrgEnv-intel.

  • With PrgEnv-nvhpc loaded, ${NVIDIA_PATH} is set to the corresponding NVIDIA SDK location.

  • PrgEnv-nvidia also exists, but it will soon be deprecated and is not recommended.

  • With PrgEnv-aocc loaded, ${AOCC_PATH} is set to the corresponding AOCC location.
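The version check in the first bullet can be wrapped in a small loop. This is only a minimal sketch: the function name report_wrappers is illustrative, and it assumes nothing beyond the wrappers accepting --version as stated above.

```shell
# Print the first line of `--version` for each Cray compiler wrapper,
# or a notice when the wrapper is not on PATH (e.g. no PrgEnv loaded).
report_wrappers() {
    local w
    for w in cc CC ftn; do
        if command -v "$w" >/dev/null 2>&1; then
            printf '%s: %s\n' "$w" "$("$w" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found (load a PrgEnv-<compiler> module first)\n' "$w"
        fi
    done
}

report_wrappers
```

Running it before and after a module swap is a quick way to confirm that the compiler behind each wrapper actually changed.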


--------------------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------------------
PrgEnv-aocc   (2)   cce          (3)   cray-libpals (3)   craypkg-gen   (2)   nvhpc          (2)
PrgEnv-cray   (2)   cpe-cuda     (3)   cray-libsci  (3)   cudatoolkit   (6)   nvidia         (2)
PrgEnv-gnu    (2)   cpe          (3)   cray-mrnet   (2)   gcc           (3)   papi           (3)
PrgEnv-intel  (2)   cray-R       (2)   cray-pals    (3)   gdb4hpc       (3)   perftools-base (3)
PrgEnv-nvhpc  (2)   cray-ccdb    (2)   cray-pmi     (3)   intel-classic (2)   sanitizers4hpc (2)
PrgEnv-nvidia (2)   cray-cti     (5)   cray-python  (2)   intel-oneapi  (2)   valgrind4hpc   (3)
aocc          (2)   cray-dsmml   (1)   cray-stat    (2)   intel         (2)
atp           (3)   cray-dyninst (2)   craype       (3)   iobuf         (1)

--------------------------------- /opt/cray/pe/lmod/modulefiles/craype-targets/default ----------------------------------
craype-x86-milan        (1)     craype-accel-nvidia80   (1)      ... other modules ...

GPU acceleration

For building an application with GPU acceleration, users can use either PrgEnv-nvhpc, cudatoolkit/<version> or nvhpc-mixed. We recommend using PrgEnv-nvhpc for completeness.

 More information
  • [Feb 2024] According to nvidia-smi, the NVIDIA driver version on LANTA is currently 525.105.17, which supports CUDA versions up to 12.0.

  • Please be aware of the compatibility between CUDA and cray-mpich. For instance, CUDA 12 is supported only from cray-mpich/8.1.27 onward.

  • When cross-compiling, you may have to use nvcc -ccbin cc to compile .cu files.

  • You have to manually pass the respective OpenACC/OpenMP offloading flags of each compiler.

  • You may have to pass -lcudart (cudatoolkit), -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda (nvhpc-mixed) and possibly a few others at linking phase (or by setting them in LDFLAGS before configure).
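The link flags from the last bullet can be collected in one helper so that they are set consistently before configure. This is a sketch only: the function name cuda_link_flags is hypothetical, and the flags are exactly those listed above (other libraries may still be needed for a given application).

```shell
# Print the extra link flags for a given CUDA setup, following the bullet above.
#   cudatoolkit  -> -lcudart
#   nvhpc-mixed  -> -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda
cuda_link_flags() {
    case "$1" in
        cudatoolkit)
            echo "-lcudart"
            ;;
        nvhpc-mixed)
            if [ -z "${NVIDIA_PATH:-}" ]; then
                echo "NVIDIA_PATH is not set; load the NVIDIA module first" >&2
                return 1
            fi
            echo "-L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda"
            ;;
        *)
            echo "usage: cuda_link_flags cudatoolkit|nvhpc-mixed" >&2
            return 1
            ;;
    esac
}
```

For example, export LDFLAGS="$(cuda_link_flags cudatoolkit) ${LDFLAGS}" could be placed before running configure.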

Build target

To enable optimizations that depend on the hardware architecture of LANTA, the following modules should be loaded together with PrgEnv.

| Module name | Hardware target | Note |
|---|---|---|
| craype-x86-milan | AMD EPYC Milan (x86) | - |
| craype-accel-nvidia80 | NVIDIA A100 | Load after PrgEnv-nvhpc, cudatoolkit or nvhpc-mixed |

 More information
  • If craype-x86-milan is not loaded when using cc, CC or ftn, you may get a warning
    No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used. Load a valid targeting module or set CRAY_CPU_TARGET

  • With the target modules loaded, -march, -mtune, -hcpu, -haccel, -tp, -gpu, ... will be automatically filled in by cc, CC, ftn. To visualize this, try cc --version -craype-verbose.
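A build script can check for the CPU target up front instead of waiting for the compiler warning. The sketch below assumes that loading craype-x86-milan sets CRAY_CPU_TARGET (inferred from the warning text above, not verified here); the function name check_cpu_target is illustrative.

```shell
# Return 0 and print the target if a CPU target is set in the environment;
# otherwise print a hint and return 1.
check_cpu_target() {
    if [ -n "${CRAY_CPU_TARGET:-}" ]; then
        echo "CPU target: ${CRAY_CPU_TARGET}"
    else
        echo "No CPU target set; consider 'module load craype-x86-milan'" >&2
        return 1
    fi
}
```

Calling check_cpu_target || exit 1 at the top of a build script stops the build before any file is compiled for the generic x86-64 target.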

Cray optimized libraries

Most Cray optimized libraries become accessible only after loading a PrgEnv, ensuring compatibility with the selected compiler suite. Additionally, some libraries, such as NetCDF, require loading other specific libraries first. Below is the hierarchy of commonly used cray-* modules.

[Figure: hierarchy of commonly used cray-* modules (LANTA-cray-mod_highdef.png)]
 Example: PrgEnv-intel + NetCDF
module purge
module load craype-x86-milan
module load PrgEnv-intel
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
module list
# Without version specified, the latest version will be loaded.

CPE version

To ensure backward compatibility after a system upgrade, it is recommended to fix the Cray Programming Environment version using either cpe/<version> or cpe-cuda/<version>. Otherwise, the most recent version will be loaded by default.

 Note: unload cpe / cpe-cuda

When unloading cpe or cpe-cuda, you will see the following message:

Unloading the cpe module is insufficient to restore the system defaults.
Please run 'source /opt/cray/pe/cpe/<version>/restore_lmod_system_defaults.[csh|sh]'.

This simply means that the remaining modules are still of the cpe/<version> that you just unloaded. They are not automatically reverted back to the system default version. It is necessary to execute the above command to manually reload and restore them to the defaults.

On the other hand, if you use module purge, you can safely ignore the above message.
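The restore command in the message contains two placeholders, the CPE version and the shell flavour. A small helper can fill them in; this is a sketch with hypothetical function names, and the version passed to it must be the one of the cpe module you just unloaded.

```shell
# Build the path of the restore script named in the message above.
#   $1: CPE version of the module that was unloaded
#   $2: shell flavour, "sh" (default) or "csh"
cpe_restore_script() {
    local version=$1 flavour=${2:-sh}
    echo "/opt/cray/pe/cpe/${version}/restore_lmod_system_defaults.${flavour}"
}

# Source the sh variant only if it exists on this system.
restore_cpe_defaults() {
    local script
    script=$(cpe_restore_script "$1" sh)
    if [ -r "$script" ]; then
        source "$script"
    else
        echo "restore script not found: $script" >&2
        return 1
    fi
}
```

For instance, restore_cpe_defaults <version> would be run right after module unload cpe.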

1.2 ThaiSC pre-built modules

For user convenience, we provide several shared modules of some widely used software and libraries. These modules were built on top of the HPE Cray Programming Environment, using the CPE toolchain.

CPE toolchain

A CPE toolchain module is a bundle of craype-x86-milan, PrgEnv-<compiler> and cpe-cuda/<version>. The module is defined as a toolchain for convenience and for use with EasyBuild, the framework used for installing most ThaiSC modules.

  [Feb 2024] Current CPE toolchains

| CPE toolchain | Note |
|---|---|
| cpeGNU/23.03 | GCC 11.2.0 |
| cpeCray/23.03 | CCE 15.0.1 |
| cpeIntel/23.03 | Deprecated and hidden. It will be removed in the future. |
| cpeIntel/23.09 | Intel Compiler 2023.1.0 |

ThaiSC modules

All ThaiSC modules are located at the same module path, so there is no module hierarchy. Executing module avail on LANTA will display all available ThaiSC modules. For a more concise list, use module overview, and then use module spider <name> to learn more about a specific module.

Users can readily use ThaiSC modules and CPE toolchains to build their applications. Some popular application software is pre-installed as well; for more information, refer to Applications usage.

username@lanta-xname:~> module overview
--------------------------------- /lustrefs/disk/modules/easybuild/modules/all ---------------------------------
ADIOS2        (2)   GATK                  (1)   NASM            (1)   Tcl          (2)   groff         (2)
ATK           (2)   GDAL                  (2)   NLopt           (2)   Tk           (2)   hwloc         (1)
Amber         (1)   GEOS                  (2)   Nextflow        (2)   Trimmomatic  (1)   intltool      (1)
Apptainer     (1)   GLM                   (2)   Ninja           (1)   UDUNITS2     (1)   jbigkit       (2)
Armadillo     (2)   GLib                  (2)   OSPRay          (2)   VASP         (3)   libGLU        (2)
AutoDock-vina (1)   GMP                   (3)   OpenCASCADE     (2)   VCFtools     (1)   libaec        (2)
Autoconf      (1)   GObject-Introspection (2)   OpenEXR         (2)   WPS          (2)   libdeflate    (2)
Automake      (1)   GROMACS               (2)   OpenFOAM        (2)   WRF          (1)   libdrm        (2)
Autotools     (1)   GSL                   (3)   OpenJPEG        (2)   WRFchem      (2)   libepoxy      (2)
BCFtools      (1)   Gaussian              (1)   OpenMPI         (1)   Wayland      (2)   libffi        (1)
BEDTools      (1)   GenericIO             (2)   OpenSSL         (1)   X11          (2)   libgeotiff    (2)
BLAST+        (1)   Gmsh                  (1)   OpenTURNS       (2)   XZ           (3)   libglvnd      (2)
BLASTDB       (1)   Go                    (1)   PCRE            (1)   Xerces-C++   (1)   libiconv      (2)
BWA           (1)   HDF-EOS               (2)   PCRE2           (1)   Yasm         (1)   libjpeg-turbo (3)
BamTools      (1)   HDF                   (2)   PDAL            (1)   arpack-ng    (2)   libpciaccess  (2)
Beast         (1)   HTSlib                (1)   PETSc           (2)   assimp       (2)   libpng        (3)
Bison         (1)   HYPRE                 (2)   PROJ            (2)   at-spi2-atk  (2)   libreadline   (2)
Blosc         (2)   HarfBuzz              (2)   Pango           (2)   at-spi2-core (2)   libtirpc      (2)
Boost         (4)   ICU                   (3)   ParFlow         (1)   aws-ofi-nccl (1)   libtool       (1)
Bowtie        (1)   Imath                 (2)   ParMETIS        (2)   beagle-lib   (1)   libunwind     (2)
Bowtie2       (1)   JasPer                (3)   ParaView        (1)   binutils     (1)   libxml2       (3)
Brotli        (2)   Java                  (2)   ParallelIO      (1)   bzip2        (3)   lz4           (3)
C-Blosc2      (2)   KaHIP                 (2)   Perl            (2)   cURL         (1)   minimap2      (1)
CFITSIO       (2)   LAME                  (1)   PostgreSQL      (2)   cairo        (2)   nccl          (1)
CGAL          (2)   LLVM                  (1)   QuantumESPRESSO (2)   canu         (1)   ncurses       (2)
CMake         (2)   LMDB                  (1)   RAxML-NG        (1)   cpeCray      (1)   nlohmann_json (1)
CrayNVHPC     (1)   LibTIFF               (2)   SAMtools        (1)   cpeGNU       (1)   numactl       (1)
DB            (2)   M4                    (1)   SCOTCH          (2)   cpeIntel     (1)   pixman        (1)
DBus          (1)   MAFFT                 (1)   SDL2            (2)   ecCodes      (2)   pkgconf       (1)
ESMF          (2)   METIS                 (2)   SLEPc           (2)   expat        (2)   tbb           (1)
EasyBuild     (1)   MPC                   (2)   SPAdes          (1)   flex         (1)   termcap       (1)
Eigen         (1)   MPFR                  (2)   SQLite          (2)   fontconfig   (2)   x264          (1)
FDS           (1)   MUMPS                 (3)   SWIG            (3)   freetype     (2)   x265          (1)
FFmpeg        (2)   Mako                  (2)   SYCL            (1)   gettext      (1)   xorg-macros   (2)
FastQC        (1)   Mamba                 (1)   SZ              (2)   git-lfs      (1)   xprop         (2)
FortranGIS    (2)   Mesa                  (2)   SpectrA         (1)   googletest   (1)   zfp           (2)
FreeXL        (2)   Meson                 (2)   SuiteSparse     (2)   gperf        (1)   zlib          (2)
FriBidi       (2)   MrBayes               (1)   SuperLU_DIST    (2)   gperftools   (2)   zstd          (3)
 Example: Boost/1.81.0-cpeGNU-23.03
module purge
module load Boost/1.81.0-cpeGNU-23.03
echo ${CPATH}
echo ${LIBRARY_PATH}
echo ${LD_LIBRARY_PATH}

2. Building an application software

This section provides guidelines on how to build application software on LANTA once an appropriate environment has been loaded.

2.1 Compiler wrapper

| <wrapper> command | Description | Manual | In substitution for |
|---|---|---|---|
| cc | C compiler wrapper | man cc or cc --help | mpicc / mpiicc |
| CC | C++ compiler wrapper | man CC or CC --help | mpic++ / mpiicpc |
| ftn | Fortran compiler wrapper | man ftn or ftn --help | mpif90 / mpiifort |

The Cray compiler wrappers, namely, cc, CC and ftn, become available after loading any PrgEnv-<compiler> or CPE toolchain. Upon being invoked, the wrapper will pass relevant information about the cray-* libraries, loaded in the current environment, to the underlying <compiler> to compile source code. It is recommended to use these wrappers for building MPI applications with the native Cray MPICH library cray-mpich.

Adding -craype-verbose to the wrapper when compiling a source file will display the final command executed. To see what will be added before compiling, try <wrapper> --cray-print-opts=all.

 More information
module purge
module load GDAL/3.6.4-cpeIntel-23.09
cc --cray-print-opts=all

The output of the final command indicates that the include search paths (-I), library search paths (-L), and linking flags (-l<lib>, such as -lnetcdf) for the cray-* libraries are handled automatically by the wrapper; there is no need to pass them manually.

For ThaiSC libraries, their include and library search paths are stored in ${CPATH}, ${LIBRARY_PATH} and ${LD_LIBRARY_PATH}, and users have to manually add appropriate linking flags (-l<lib>). However, these steps should be covered automatically if using a build tool.
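When compiling by hand against a ThaiSC library, it can help to confirm that the library behind a -l<lib> flag is actually reachable through ${LIBRARY_PATH}. The helper below is a sketch; find_lib is a hypothetical name, and it only looks for the conventional lib<name>.so and lib<name>.a file names.

```shell
# Search the directories in LIBRARY_PATH for lib<name>.so or lib<name>.a
# and print the first match, i.e. the file that -l<name> would resolve to.
find_lib() {
    local name=$1 dir f
    local IFS=':'
    for dir in ${LIBRARY_PATH:-}; do
        for f in "$dir/lib${name}.so" "$dir/lib${name}.a"; do
            if [ -e "$f" ]; then
                echo "$f"
                return 0
            fi
        done
    done
    echo "lib${name} not found in LIBRARY_PATH" >&2
    return 1
}
```

With a Boost module loaded, for example, find_lib boost_filesystem && CC -o app app.cpp -lboost_filesystem would compile only if the library is found (app.cpp is a placeholder source file).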

 Example: HelloWorld.c
# Create the source code
cat << Eof > HelloWorld.c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
        printf("HelloWorld from rank %d thread %d \n", rank, thread);
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();
    
return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.c'
cc -craype-verbose -o hello.exe HelloWorld.c -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
 Example: HelloWorld.cpp
# Create the source code
cat << Eof > HelloWorld.cpp
#include <iostream>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(nullptr,nullptr);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
	    std::cout << "HelloWorld from rank " << rank << " thread " << thread << std::endl;
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();
    
return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.cpp'
CC -craype-verbose -o hello.exe HelloWorld.cpp -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
 Example: HelloWorld.f90
# Create the source code
cat << Eof > HelloWorld.f90
program helloworld_mpi_omp
    use mpi
    use omp_lib

    implicit none
    integer :: rank, thread, provided, ierr
    integer(kind=omp_lock_kind) :: io_lock

    call omp_init_lock(io_lock)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    !\$omp parallel private(thread)
    thread = omp_get_thread_num()

    call omp_set_lock(io_lock)
    print *, 'HelloWorld from rank ', rank, ' thread ', thread
    call omp_unset_lock(io_lock)
    !\$omp end parallel

    call omp_destroy_lock(io_lock)
    call MPI_Finalize(ierr)

end program helloworld_mpi_omp
Eof
# Note: change !\$omp to !$omp if you type the above code by hand.

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.f90'
ftn -craype-verbose -o hello.exe HelloWorld.f90 -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe

2.2 Build tools

Several tools exist to help us build large and complex programs. Among them, GNU make and CMake are commonly used. The developer team for each software chooses what build tool they will support. Therefore, it is important to thoroughly read the software documentation. For some software, users might need to additionally load the latest CMake or Autotools modules on the system (e.g., module load CMake/3.26.4).

There are three general stages in building a program using a build tool: configure, make and make install. For more information, see Basic Installation.

Build tools typically detect compilers through environment variables such as CC, CXX, and FC at the configure stage. Therefore, setting these variables before running configure should be sufficient to make the tool use the Cray compiler wrappers.

export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# ./configure --prefix=<your-install-location> ...
# or 
# cmake -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

Nevertheless, if the CMake cache is not clean, you might need to explicitly use:

cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

We encourage users to manually specify the installation path using --prefix= or -DCMAKE_INSTALL_PREFIX= as shown above. This path can be within your project home, such as /project/ltXXXXXX-YYYY/<software-name>, allowing you to manage permissions and share the installed software with your project members. By default on LANTA, your team will be able to read and execute your software but cannot make any changes inside the directory you own.

After these steps, you should be able to execute make and make install, then build your software as you would on any other system.

 Example: libjpeg-turbo/3.0.2-cpeCray-23.03
module purge
module load cpeCray/23.03
module load NASM/2.16.01
module load CMake/3.26.4

wget https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/3.0.2/libjpeg-turbo-3.0.2.tar.gz
tar xzf libjpeg-turbo-3.0.2.tar.gz
cd libjpeg-turbo-3.0.2
PARENT_DIR=$(pwd)
mkdir build ; cd build

export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# or, for verbosity,
#export CC="cc -craype-verbose" CXX="CC -craype-verbose" FC="ftn -craype-verbose" F77="ftn -craype-verbose" F90="ftn -craype-verbose"

cmake -DCMAKE_INSTALL_PREFIX=${PARENT_DIR}/cpeCray-23.03 -G"Unix Makefiles" -DWITH_JPEG8=1 ..
make
make install

cd ${PARENT_DIR}/cpeCray-23.03
ln -s ./lib64 ./lib               # Optional, but recommended

2.3 Related topics


3. Running the software

Every main application must run on compute/gpu/memory nodes. The recommended approach is to write a job script and submit it to the Slurm scheduler with the sbatch command.

Only use sbatch <job-script>. If you execute bash <job-script> or simply ./<job-script>, the script will not be sent to Slurm and will run on the frontend node instead!
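A job script can guard against being started outside Slurm. The sketch below relies on SLURM_JOB_ID, which Slurm sets inside a job; the function name require_slurm is illustrative.

```shell
# Return non-zero when the script is not running inside a Slurm job,
# e.g. when it was started with `bash <job-script>` on a frontend node.
require_slurm() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm job; submit with: sbatch <job-script>" >&2
        return 1
    fi
}
```

Placing require_slurm || exit 1 near the top of the job script makes an accidental bash <job-script> stop before any heavy work starts on the frontend node.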

3.1 Writing a job script

#!/bin/bash
#SBATCH -p gpu                 # Partition
#SBATCH -N 1                   # Number of nodes
#SBATCH --gpus=4               # Number of GPU cards
#SBATCH --ntasks=4             # Number of MPI processes
#SBATCH --cpus-per-task=16     # Number of OpenMP threads per MPI process
#SBATCH -t 5-00:00:00          # Job runtime limit
#SBATCH -A ltXXXXXX            # Billing account 
# #SBATCH -J <JobName>         # Job name

module purge
# --- Load necessary modules ---
module load <...>
module load <...>

# --- Add software to Linux search paths ---
export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
# export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}
# source <your-software-specific-script>

# --- (Optional) Set related environment variables ---
# export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}         # MUST specify --cpus-per-task above

# --- Run the software ---
# srun <srun-options> ./<software>
# or
# ./<software>

The above job script template consists of five sections:

1. Slurm sbatch header
The #SBATCH directive can be used to specify sbatch options that mostly remain unchanged, such as the partition, time limit, and billing account. Optional settings such as the job name can instead be specified when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

In most cases, sbatch options only define and request the computing resources that can be used inside a job script. The resources actually used by a program can differ depending on how it is invoked (see Step 5). For GPU jobs, we recommend using either --gpus or --gpus-per-node to request GPUs at this step; additionally, please see GPU binding.

2. Loading modules
In the job script, load every module that was used when installing the software; build-only dependencies such as CMake, Autotools, and binutils can be omitted. The modules should also be the same versions as those used to compile the program.

3. Adding software paths
The Linux OS will not be able to find your program if it is not in its search paths. The commonly used search-path variables are PATH (for executables), LD_LIBRARY_PATH (for shared libraries), and PYTHONPATH (for Python packages). Users MUST append or prepend to them using syntax such as export PATH=<software-bin-path>:${PATH}; otherwise, search paths previously added by module load and others will be lost.
If <your-install-location> is where your software is installed, then putting the below commands in your job script should be sufficient in most cases.

export PATH=<your-install-location>/bin:${PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib64:${LD_LIBRARY_PATH}

Any of these lines can be omitted if the corresponding sub-directory does not exist (check with ls <your-install-location>).
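The "only if the sub-directory exists" rule can be automated. The helper below is a sketch with a hypothetical name, prepend_if_dir; it also avoids leaving a trailing colon when the variable was previously empty.

```shell
# Prepend DIR to the colon-separated variable VAR, but only if DIR exists.
# Usage: prepend_if_dir VAR DIR
prepend_if_dir() {
    local var=$1 dir=$2
    [ -d "$dir" ] || return 0            # silently skip missing directories
    local current=${!var:-}              # current value of the variable named by VAR
    export "$var=${dir}${current:+:${current}}"
}

# Example, with prefix set to your install location:
#   prepend_if_dir PATH            "${prefix}/bin"
#   prepend_if_dir LD_LIBRARY_PATH "${prefix}/lib"
#   prepend_if_dir LD_LIBRARY_PATH "${prefix}/lib64"
```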

 PYTHONPATH

If the software also generates Python packages, you may have to check where those packages are installed. Usually, they are in <your-install-location>/lib/python<Version>/site-packages, <your-install-location>/lib64/python<Version>/site-packages, or a similar path. A line such as the one below then tells Python where the packages are.

export PYTHONPATH=<your-install-location>/lib/python<Version>/site-packages:${PYTHONPATH}
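If you are unsure which of these paths applies, the installation directory can be searched directly. This is a minimal sketch using the standard find command; the function name find_site_packages is illustrative.

```shell
# List every site-packages directory below an installation prefix.
# Usage: find_site_packages <your-install-location>
find_site_packages() {
    find "$1" -type d -name site-packages 2>/dev/null
}
```

Each directory it prints is a candidate to prepend to PYTHONPATH.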
 More information
  • If some software dependencies were installed locally, their search paths should also be added.

  • We do NOT recommend specifying these search paths in ~/.bashrc directly, as it could lead to library conflicts when having more than one main software.

  • Some software provides a script to be sourced before using. In this case, sourcing it in your job script should be equivalent to adding its search paths manually by yourself.


When executing your program, you may encounter one of the following messages:

  • If 'xxx' is not a typo you can use command-not-found to lookup ...
    Your current PATH variable may be incorrect.

  • xxx: error while loading shared libraries: libXXX.so: cannot open shared object file

    • If libXXX.so seems to be related to your software, the LD_LIBRARY_PATH variable may have been set incorrectly in Step 3.

    • If libXXX.so seems to come from a module you used to build your software, loading that module should fix the problem.

  • ModuleNotFoundError: No module named 'xxx'
    Your current PYTHONPATH may be incorrect.
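For the shared-library case, the standard Linux tool ldd can list unresolved libraries before the job is submitted. The sketch below wraps it in a hypothetical helper, missing_libs, which prints only the library names that the loader reports as "not found".

```shell
# Print the shared libraries that the dynamic loader cannot resolve for a binary.
# Prints nothing when every dependency is found.
# Usage: missing_libs <path-to-executable>
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

Run it after loading your modules and setting LD_LIBRARY_PATH; any name it prints points to a module that still needs loading or a path that still needs adding.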


A preliminary check can be performed on a frontend node, for example:

bash   # You should check them in another bash shell

module purge
module load <...>
module load <...>

export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}

<executable> --help
<executable> --version

exit

4. Setting environment variables
Some software requires additional environment variables to be set at runtime; for example, the path to the temporary directory. Parameters set by Slurm sbatch (see Slurm sbatch - output environment variables) could be utilized in setting up software-specific environment variables.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE, and ulimit -s unlimited are commonly set in a job script. An example is shown below.

export XXX_TMPDIR=/scratch/ltXXXXXX-YYYY/${SLURM_JOBID}
export OMP_STACKSIZE="32M"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
ulimit -s unlimited 
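Two details of the example above are easy to miss: SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested, and a job-specific temporary directory may not exist yet. The sketch below handles both; make_job_tmpdir is a hypothetical helper, and the base directory is whatever scratch path your project uses.

```shell
# Fall back to 1 thread when --cpus-per-task was not given,
# in which case SLURM_CPUS_PER_TASK is unset.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Create a per-job temporary directory under BASE and print its path.
# Outside Slurm, the shell's process ID is used in place of the job ID.
# Usage: make_job_tmpdir BASE
make_job_tmpdir() {
    local base=$1
    local dir="${base}/${SLURM_JOBID:-manual-$$}"
    mkdir -p "$dir" && echo "$dir"
}
```

In a job script this could be used as export XXX_TMPDIR=$(make_job_tmpdir /scratch/ltXXXXXX-YYYY).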

5. Running your software
Each software has its own command to be issued. Please read the software documentation and forum. Special attention should be paid to how the software recognizes and maps computing resources (CPU-MPI-GPU); occasionally, users may need to insert additional input arguments at runtime. The total resources concurrently utilized in this step should be less than or equal to the resources previously requested in Step 1. Oversubscribing resources can reduce overall performance and could cause permanent damage to the hardware.

Usually, one of srun, mpirun, mpiexec, or aprun is required to run MPI programs. On LANTA, the srun command MUST be used. The table below compares a few options of these commands.

| Command | Total MPI processes | CPUs per MPI process | MPI processes per node |
|---|---|---|---|
| srun | -n, --ntasks | -c, --cpus-per-task | --ntasks-per-node |
| mpirun/mpiexec | -n, -np | --map-by socket:PE=N | --map-by ppr:N:node |
| aprun | -n, --pes | -d, --cpus-per-pe | -N, --pes-per-node |

There is usually no need to add options to srun since, by default, Slurm will automatically derive them from sbatch.

 GPU Binding
  1. When using --gpus-per-node or without any options, all tasks on the same node will see the same set of GPU IDs, starting from 0, available on the node. Try

    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    squeue --me        # Check that no "GPU-ID" job still running 

    In this case, you can use SLURM_LOCALID or others to set CUDA_VISIBLE_DEVICES of each task. For example, you could use a wrapper script mentioned in HPE intro_mpi (Section 1) or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0 but will be bound to a different GPU/UUID. Try

    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    squeue --me       # Check that no "GPU-ID" job still running 

    However, it is stated in HPE intro_mpi (Section 1) that using these options with CrayMPICH could introduce an intra-node MPI performance drawback.
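For case 1, where every task sees all GPUs on the node, one way to give each task its own GPU is to derive CUDA_VISIBLE_DEVICES from SLURM_LOCALID. The following is a minimal sketch of that idea, not the wrapper from HPE intro_mpi; the function name run_on_local_gpu and the GPU count argument are assumptions for illustration.

```shell
# Restrict the calling task to one GPU chosen from its node-local rank,
# then run the given command with that restriction in place.
# Usage: run_on_local_gpu <gpus-per-node> <command> [args...]
run_on_local_gpu() {
    local ngpus=$1
    shift
    local local_id=${SLURM_LOCALID:-0}   # node-local task ID set by Slurm
    CUDA_VISIBLE_DEVICES=$(( local_id % ngpus )) "$@"
}
```

Saved in a wrapper script that ends with run_on_local_gpu 4 "$@", it could be launched as srun --ntasks-per-node=4 ./<wrapper-script> ./<software>. The modulo keeps the ID valid when there are more tasks than GPUs on a node, in which case several tasks share a GPU.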

For multi-threaded applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance (~50%) due to improper CPU binding.

3.2 Submitting a job

To submit your job script (e.g., job-script.sh) to Slurm, execute sbatch [options] job-script.sh [arguments]. For example,

username@lanta-xname:~> sbatch job-script.sh                 # Simplest
Submitted batch job XXXXXX1
username@lanta-xname:~> sbatch -J <jobname> job-script.sh    # Same as having '#SBATCH -J <jobname>' in job-script.sh
Submitted batch job XXXXXX2
username@lanta-xname:~> sbatch -D <your-case-directory> job-script.sh
Submitted batch job XXXXXX3

If your software asks for a case directory where all inputs must be in, you may need to submit your job inside that directory, or use -D, --chdir option.

You can test your initial script on compute-devel or gpu-devel partitions, using #SBATCH -t 02:00:00, since they normally have a shorter queuing time.
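The job ID printed by sbatch can be captured for later use, for example to query the queue for that job only. This is a sketch that parses the "Submitted batch job" line shown above; the function name job_id_from_sbatch is illustrative.

```shell
# Extract the job ID from the "Submitted batch job <id>" line printed by sbatch.
# Reads sbatch output on standard input.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example:
#   jobid=$(sbatch job-script.sh | job_id_from_sbatch)
#   squeue -j "${jobid}"
```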


Example

Installation guide

Reference
