
There are mainly three approaches to preparing an environment on LANTA:

Users should select only one approach. Approaches should not be mixed, since library conflicts may occur.
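For example, a clean session that commits to a single approach might look like the sketch below; PrgEnv-gnu is used purely as an illustration.

```shell
# Start from a clean module environment, then commit to exactly one approach
module purge
module load craype-x86-milan
module load PrgEnv-gnu     # pick one PrgEnv only; do not mix with other approaches
module list                # verify what is loaded
```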


Expand
titleMore information
  • Execute cc --version, CC --version, or ftn --version to check which compiler is being used.

  • With PrgEnv-intel loaded, ${MKLROOT} is set to the corresponding Intel Math Kernel Library (MKL).

  • By default with PrgEnv-intel, the C/C++ compilers are ICX/ICPX, while the Fortran compiler is IFORT.

  • To use only Intel Classic, execute module swap intel intel-classic after loading PrgEnv-intel.

  • To use only Intel oneAPI, execute module swap intel intel-oneapi after loading PrgEnv-intel.

  • With PrgEnv-nvhpc loaded, ${NVIDIA_PATH} is set to the corresponding NVIDIA SDK location.

  • PrgEnv-nvidia also exists, but it will soon be deprecated, so it is not recommended.

  • With PrgEnv-aocc loaded, ${AOCC_PATH} is set to the corresponding AOCC location.
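The points above can be combined into a short sanity-check session. This is a sketch; the swap targets assume PrgEnv-intel is already loaded.

```shell
module load PrgEnv-intel
cc --version && CC --version && ftn --version    # confirm the underlying compilers
echo ${MKLROOT}                                  # MKL location under PrgEnv-intel

module swap intel intel-classic                  # use only Intel Classic
# module swap intel intel-oneapi                 # or use only Intel oneAPI
```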

GPU acceleration

For building an application with GPU acceleration, users can use PrgEnv-nvhpc, cudatoolkit/<version>, or nvhpc-mixed. We recommend using PrgEnv-nvhpc for completeness.

Expand
titleMore information
  • [Feb 2024] According to nvidia-smi, the NVIDIA driver version on LANTA is currently 525.105.17, which supports CUDA versions up to 12.0.

  • Please be aware of the compatibility between CUDA and cray-mpich. For instance, CUDA 12 is supported only since cray-mpich/8.1.27.

  • For cross-compilation, you may have to use nvcc -ccbin cc to compile .cu files.

  • You have to manually pass the respective OpenACC/OpenMP offloading flags of each compiler.

  • You may have to pass -lcudart (cudatoolkit), or -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda (nvhpc-mixed), and possibly a few others at the linking phase (or set them in LDFLAGS before configure).
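A hypothetical compile-and-link sequence following the notes above might look like this; kernel.cu and main.c are placeholder sources, not files from this guide.

```shell
module load PrgEnv-nvhpc
module load craype-accel-nvidia80

nvcc -ccbin cc -c kernel.cu -o kernel.o    # cross-compile CUDA source with the wrapper as host compiler
cc -c main.c -o main.o
cc main.o kernel.o -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda -o app.exe
```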

Build target

To enable optimizations that depend on the hardware architecture of LANTA, the following modules should be loaded together with PrgEnv.

Module name             Hardware target         Note
craype-x86-milan        AMD EPYC Milan (x86)    -
craype-accel-nvidia80   NVIDIA A100             Load after PrgEnv-nvhpc, cudatoolkit or nvhpc-mixed

Expand
titleMore information
  • If craype-x86-milan is not loaded when using cc, CC or ftn, you may get a warning
    No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used. Load a valid targeting module or set CRAY_CPU_TARGET

  • With the target modules loaded, -march, -mtune, -hcpu, -haccel, -tp, -gpu, ... will be automatically filled in by cc, CC, ftn. To visualize this, try cc --version -craype-verbose.
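To see the effect of the target modules, compare the wrapper's verbose output with and without them. A sketch (PrgEnv-gnu is illustrative; the exact flags shown depend on the compiler):

```shell
module load PrgEnv-gnu
cc --version -craype-verbose    # without a target module, expect the CRAY_CPU_TARGET warning

module load craype-x86-milan
cc --version -craype-verbose    # now the architecture flags for Milan are filled in automatically
```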

Cray optimized libraries

Most Cray optimized libraries become accessible only after loading a PrgEnv, ensuring compatibility with the selected compiler suite. Additionally, some libraries, such as NetCDF, require loading other specific libraries first. Below is the hierarchy of commonly used cray-* modules.


Expand
titleExample: PrgEnv-intel + NetCDF
Code Block
languagebash
module purge
module load craype-x86-milan
module load PrgEnv-intel
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
module list
# Without version specified, the latest version will be loaded.

CPE version

To ensure backward compatibility after a system upgrade, it is recommended to fix the Cray Programming Environment version using either cpe/<version> or cpe-cuda/<version>. Otherwise, the most recent version will be loaded by default.

Expand
titleNote: unload cpe / cpe-cuda

When unloading cpe or cpe-cuda, you will encounter the following message:

Unloading the cpe module is insufficient to restore the system defaults.
Please run 'source /opt/cray/pe/cpe/<version>/restore_lmod_system_defaults.[csh|sh]'.

This simply means that the remaining modules still belong to the cpe/<version> that you just unloaded; they are not automatically reverted to the system default versions. It is necessary to execute the suggested command to manually reload and restore them to the defaults.

On the other hand, if you use module purge, you can safely ignore the above message.
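A sketch of pinning the CPE version before loading a PrgEnv; the version number here is illustrative.

```shell
module purge
module load cpe/23.09          # pin the CPE release (illustrative version)
module load craype-x86-milan
module load PrgEnv-gnu
module list                    # the cray-* modules now follow the pinned cpe version
```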

1.2 ThaiSC pre-built modules

For user convenience, we provide several shared modules of some widely used software and libraries. These modules were built on top of the HPE Cray Programming Environment, using the CPE toolchain.

CPE toolchain

A CPE toolchain module is a bundle of craype-x86-milan, PrgEnv-<compiler> and cpe-cuda/<version>. The module is defined as a toolchain for convenience and for use with EasyBuild, the framework used for installing most ThaiSC modules.

Expand
title [Feb 2024] Current CPE toolchains

CPE toolchain    Note
cpeGNU/23.03     GCC 11.2.0
cpeCray/23.03    CCE 15.0.1
cpeIntel/23.03   Deprecated and hidden. It will be removed in the future.
cpeIntel/23.09   Intel Compiler 2023.1.0

ThaiSC modules

All ThaiSC modules are located at the same module path, so there is no module hierarchy. Executing module avail on LANTA will display all available ThaiSC modules. For a more concise list, you can use module overview; then use module whatis <name> or module help <name> to learn more about a specific module.

Users can readily use ThaiSC modules and CPE toolchains to build their applications. Some popular application software is pre-installed as well; for more information, refer to Applications usage.

Code Block
languagebash
username@lanta-xname:~> module overview
------------------------------ /lustrefs/disk/modules/easybuild/modules/all ------------------------------
ADIOS2        (2)   EasyBuild  (1)   MAFFT   (1)   Trimmomatic (1)   libaec        (1)
Amber         (1)   Eigen      (1)   METIS   (2)   VASP        (3)   libdeflate    (1)
Apptainer     (1)   FDS        (1)   MPC     (1)   VCFtools    (1)   libgeotiff    (1)
Armadillo     (1)   FastQC     (1)   MPFR    (1)   WPS         (2)   libiconv      (1)
AutoDock-vina (1)   FortranGIS (1)   Mamba   (1)   WRF         (1)   libjpeg-turbo (2)
Autoconf      (1)   FreeXL     (1)   MrBayes (1)   WRFchem     (1)   libpciaccess  (1)
Automake      (1)   GATK       (1)   ... other modules ...

Code Block
languagebash
--------------------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------------------
PrgEnv-aocc   (2)   cce          (3)   cray-libpals (3)   craypkg-gen   (2)   nvhpc          (2)
PrgEnv-cray   (2)   cpe-cuda     (3)   cray-libsci  (3)   cudatoolkit   (6)   nvidia         (2)
PrgEnv-gnu    (2)   cpe          (3)   cray-mrnet   (2)   gcc           (3)   papi           (3)
PrgEnv-intel  (2)   cray-R       (2)   cray-pals    (3)   gdb4hpc       (3)   perftools-base (3)
PrgEnv-nvhpc  (2)   cray-ccdb    (2)   cray-pmi     (3)   intel-classic (2)   sanitizers4hpc (2)
PrgEnv-nvidia (2)   cray-cti     (5)   cray-python  (2)   intel-oneapi  (2)   valgrind4hpc   (3)
aocc          (2)   cray-dsmml   (1)   cray-stat    (2)   intel         (2)
atp           (3)   cray-dyninst (2)   craype       (3)   iobuf         (1)

--------------------------------- /opt/cray/pe/lmod/modulefiles/craype-targets/default ----------------------------------
craype-x86-milan        (1)     craype-accel-nvidia80   (1)      ... other modules ...
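For instance, to inspect and load one of the listed modules (Boost is used as an example; check module avail Boost for the exact versions available):

```shell
module whatis Boost     # one-line description of the module
module help Boost       # longer usage notes for the module
module load Boost       # without a version specified, the default (latest) is loaded
```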

Expand
titleExample: Boost/1.81.0-cpeGNU-23.03
Code Block
languagebash
module purge
module load Boost/1.81.0-cpeGNU-23.03
echo ${CPATH}
echo ${LIBRARY_PATH}
echo ${LD_LIBRARY_PATH}

2. Building an application software

After an appropriate environment is loaded, this section provides guidelines on using it to build application software on LANTA.

2.1 Compiler wrapper

<wrapper> command   Description                Manual                   In substitution for
cc                  C compiler wrapper         man cc or cc --help      mpicc / mpiicc
CC                  C++ compiler wrapper       man CC or CC --help      mpic++ / mpiicpc
ftn                 Fortran compiler wrapper   man ftn or ftn --help    mpif90 / mpiifort

The Cray compiler wrappers, namely cc, CC and ftn, become available after loading any PrgEnv-<compiler> or CPE toolchain. When invoked, a wrapper passes relevant information about the cray-* libraries loaded in the current environment to the underlying <compiler> that compiles the source code. It is recommended to use these wrappers for building MPI applications with the native Cray MPICH library cray-mpich.

Adding -craype-verbose to the wrapper when compiling a source file will display the final command executed. To see what will be added before compiling, try <wrapper> --cray-print-opts=all.

Expand
titleMore information
Code Block
languagebash
module purge
module load GDAL/3.6.4-cpeIntel-23.09
cc --cray-print-opts=all

The output indicates that the include search paths (-I), library search paths (-L), and linking flags (-l<lib>, such as -lnetcdf) of cray-* libraries are taken care of automatically by the wrapper; there is no need to pass them manually.

For ThaiSC libraries, their include and library search paths are stored in ${CPATH}, ${LIBRARY_PATH} and ${LD_LIBRARY_PATH}, and users have to manually add appropriate linking flags (-l<lib>). However, these steps should be covered automatically if using a build tool.
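As an illustration of what those variables contain, each colon-separated search path can be expanded into an explicit compiler flag. The paths below are hypothetical; real values are set by module load.

```shell
# Hypothetical CPATH-style value (real values come from loaded modules)
CPATH="/opt/sw/foo/include:/opt/sw/bar/include"

# Expand each colon-separated path into an explicit -I flag
INC_FLAGS=""
IFS=':' read -ra dirs <<< "${CPATH}"
for d in "${dirs[@]}"; do
    INC_FLAGS+=" -I${d}"
done

echo "${INC_FLAGS}"    # the include flags a build tool would effectively use
```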

Expand
titleExample: HelloWorld.c
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
        printf("HelloWorld from rank %d thread %d \n", rank, thread);
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();
    
return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.c'
cc -craype-verbose -o hello.exe HelloWorld.c -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
Expand
titleExample: HelloWorld.cpp
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.cpp
#include <iostream>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(nullptr, nullptr);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
        std::cout << "HelloWorld from rank " << rank << " thread " << thread << std::endl;
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();

return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.cpp'
CC -craype-verbose -o hello.exe HelloWorld.cpp -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
Expand
titleExample: HelloWorld.f90
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.f90
program helloworld_mpi_omp
    use mpi
    use omp_lib

    implicit none
    integer :: rank, thread, provided, ierr
    integer(kind=omp_lock_kind) :: io_lock

    call omp_init_lock(io_lock)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    !\$omp parallel private(thread)
    thread = omp_get_thread_num()

    call omp_set_lock(io_lock)
    print *, 'HelloWorld from rank ', rank, ' thread ', thread
    call omp_unset_lock(io_lock)
    !\$omp end parallel

    call omp_destroy_lock(io_lock)
    call MPI_Finalize(ierr)

end program helloworld_mpi_omp
Eof
# Note: change !\$omp to !$omp if you type the above code by hand.

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.f90'
ftn -craype-verbose -o hello.exe HelloWorld.f90 -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe

2.2 Build tools

Several tools exist to help build large and complex programs; among them, GNU Make and CMake are commonly used. The developers of each software package choose which build tools to support, so it is important to read the software documentation thoroughly. For some software, users might need to additionally load the latest CMake or Autotools modules on the system (e.g., module load CMake/3.26.4).

There are three general stages in building a program using a build tool: configure, make and make install. For more information, see Basic Installation.

Build tools typically detect compilers through environment variables such as CC, CXX, and FC at the configure stage. Therefore, setting these variables before running configure should be sufficient to make the tool use the Cray compiler wrappers.

Code Block
languagebash
export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# ./configure --prefix=<your-install-location> ...
# or 
# cmake -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

Nevertheless, if the CMake cache is not clean, you might need to explicitly use:

Code Block
languagebash
cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

We encourage users to manually specify the installation path using --prefix= or -DCMAKE_INSTALL_PREFIX= as shown above. This path can be within your project home, such as /project/ltXXXXXX-YYYY/<software-name>, allowing you to manage permissions and share the installed software with your project members. By default on LANTA, your team will be able to read and execute your software but cannot make any changes inside the directory you own.

After these steps, you should be able to execute make and make install, then build your software as you would on any other system.

Expand
titleExample: libjpeg-turbo/3.0.2-cpeCray-23.03
Code Block
languagebash
module purge
module load cpeCray/23.03
module load NASM/2.16.01
module load CMake/3.26.4

wget https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/3.0.2/libjpeg-turbo-3.0.2.tar.gz
tar xzf libjpeg-turbo-3.0.2.tar.gz
cd libjpeg-turbo-3.0.2
PARENT_DIR=$(pwd)
mkdir build ; cd build

export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# or, for verbosity,
#export CC="cc -craype-verbose" CXX="CC -craype-verbose" FC="ftn -craype-verbose" F77="ftn -craype-verbose" F90="ftn -craype-verbose"

cmake -DCMAKE_INSTALL_PREFIX=${PARENT_DIR}/cpeCray-23.03 -G"Unix Makefiles" -DWITH_JPEG8=1 ..
make
make install

cd ${PARENT_DIR}/cpeCray-23.03
ln -s ./lib64 ./lib               # Optional, but recommended
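To use the locally installed library afterwards, its paths can be added to the same search variables described in Section 2.1; myprog.c is a placeholder source that uses libjpeg.

```shell
# Add the freshly installed libjpeg-turbo to the search paths (paths follow the example above)
export CPATH=${PARENT_DIR}/cpeCray-23.03/include:${CPATH}
export LIBRARY_PATH=${PARENT_DIR}/cpeCray-23.03/lib:${LIBRARY_PATH}
export LD_LIBRARY_PATH=${PARENT_DIR}/cpeCray-23.03/lib:${LD_LIBRARY_PATH}

cc myprog.c -ljpeg -o myprog.exe   # link against the installed library
```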

2.3 Related topics

Local module & EasyBuild

A separate page is dedicated to explaining how to manage and install local modules in the user's home/project paths using EasyBuild → Local module & EasyBuild (In progress)

Useful compiler flags

3. Running the software

Every main application must run on compute/gpu/memory nodes. The recommended approach is to write a job script and submit it to the Slurm scheduler through the sbatch command.

Note

Only use sbatch <job-script>. If users execute bash <job-script> or simply ./<job-script>, the script will not be sent to Slurm and will run on the frontend node instead!

3.1 Writing a job script


Code Block
languagebash
#!/bin/bash
#SBATCH -p gpu                 # Partition
#SBATCH -N 1                   # Number of nodes
#SBATCH --gpus=4               # Number of GPU cards
#SBATCH --ntasks=4             # Number of MPI processes
#SBATCH --cpus-per-task=16     # Number of OpenMP threads per MPI process
#SBATCH -t 5-00:00:00          # Job runtime limit
#SBATCH -A ltXXXXXX            # Billing account 
# #SBATCH -J <JobName>         # Job name

module purge
# --- Load necessary modules ---
module load <...>
module load <...>

# --- Add software to Linux search paths ---
export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
# export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}
# source <your-software-specific-script>

# --- (Optional) Set related environment variables ---
# export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}         # MUST specify --cpus-per-task above

# --- Run the software ---
# srun <srun-options> ./<software>
# or
# ./<software>

The above job script template consists of five sections:

1. Slurm sbatch header

Anchor
SbatchHeader
SbatchHeader

The #SBATCH directives can be used to specify sbatch options that mostly remain unchanged, such as partition, time limit, billing account, and so on. For optional settings like the job name, users can specify them when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

Mostly, Slurm sbatch options only define and request the computing resources that can be used inside a job script. The actual resources used by a software/executable can differ depending on how it is invoked (see Stage 5), although these sbatch options are passed on and become its defaults. For GPU jobs, using either --gpus or --gpus-per-node to request GPUs at this stage will provide the most flexibility for the next stage, GPU binding.

If your application software only supports parallelization by multi-threading, then it cannot utilize resources across nodes; in this case, -N, -n/--ntasks, and --ntasks-per-node should all be set to 1.
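As a sketch, a minimal header for such a multi-threaded-only application might look like the following; the partition name, CPU count, and module names are placeholders, not a prescription:

```shell
#!/bin/bash
#SBATCH -p compute             # placeholder partition
#SBATCH -N 1                   # multi-threading cannot span nodes
#SBATCH --ntasks=1             # a single process
#SBATCH --cpus-per-task=32     # all parallelism comes from threads
#SBATCH -t 01:00:00            # job runtime limit
#SBATCH -A ltXXXXXX            # billing account

module purge
module load <...>

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./<software>
```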

2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build dependencies such as CMake, Autotools, and binutils can be omitted. Additionally, those modules should be the same versions as those used to compile the program.

3. Adding software paths
The Linux OS will not be able to find your program if it is not in its search paths. The commonly used ones are PATH (for executables/binaries), LD_LIBRARY_PATH (for shared libraries), and PYTHONPATH (for Python packages). Users MUST append or prepend to them using syntax such as export PATH=<software-bin-path>:${PATH}; otherwise, search paths previously added by module load and other tools will be lost.
If <your-install-location> is where your software is installed, then putting the commands below in your job script should be sufficient in most cases.

Code Block
languagebash
export PATH=<your-install-location>/bin:${PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib64:${LD_LIBRARY_PATH}

Some of them can be omitted if there is no such sub-directory when checking with ls <your-install-location>.

Expand
titlePYTHONPATH

If the software also generates Python packages, then you may have to check where those packages are installed. Usually, they are in <your-install-location>/lib/python<Version>/site-packages, <your-install-location>/lib64/python<Version>/site-packages or other similar paths. Then, adding a line such as below would inform Python where the packages are.

Code Block
languagebash
export PYTHONPATH=<your-install-location>/lib/python<Version>/site-packages:${PYTHONPATH}
Expand
titleMore information
  • If some software dependencies were installed locally, their search paths should also be added.

  • We do NOT recommend specifying these search paths in ~/.bashrc directly, as it could lead to internal conflicts when having more than one main software.

  • Some software provides a script to be sourced before using. In this case, sourcing it in your job script should be equivalent to adding its search paths manually yourself.


When executing your program, if you encounter

  • If 'xxx' is not a typo you can use command-not-found to lookup ..., then your current PATH variable may be incorrect.

  • xxx: error while loading shared libraries: libXXX.so: cannot open shared object file, then,

    • If libXXX.so seems to be related to your software, then you may have set the LD_LIBRARY_PATH variable in Step 3 incorrectly.

    • If libXXX.so seems to be from a module you used to build your software, then loading that module should fix the problem.

  • ModuleNotFoundError: No module named 'xxx', then your current PYTHONPATH may be incorrect.


A preliminary check can be performed on the frontend node by doing something like

Code Block
languagebash
bash   # You should check them in another bash shell

module purge
module load <...>
module load <...>

export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}

<executable> --help
<executable> --version

exit
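To dig into the shared-library error specifically, ldd lists which libraries an executable resolves; any line marked "not found" points at a missing LD_LIBRARY_PATH entry. A sketch, using /bin/ls as a stand-in for your own executable:

```shell
# Check whether every shared-library dependency of a binary resolves.
# Replace /bin/ls with your own <executable>.
if ldd /bin/ls | grep -q "not found"; then
    echo "some libraries are missing"
else
    echo "all libraries resolved"
fi
```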



4. Setting environment variables
Some software requires additional environment variables to be set at runtime, for example, the path to a temporary directory. Output environment variables set by Slurm sbatch (see Slurm sbatch - output environment variables) can be utilized in setting up software-specific environment variables.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE, and ulimit -s unlimited are commonly set in a job script. An example is shown below.

Code Block
languagebash
export XXX_TMPDIR=/scratch/ltXXXXXX-YYYY/${SLURM_JOBID}
export OMP_STACKSIZE="32M"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
ulimit -s unlimited 


5. Running your software

Anchor
RunningTheSoftware
RunningTheSoftware

Each software has its own command to be issued. Please read the software documentation and forums. Special attention should be paid to how the software recognizes and maps computing resources (CPU-MPI-GPU); occasionally, users may need to insert additional input arguments at runtime. The total resources concurrently utilized at this stage should not exceed the resources requested in Stage 1. Oversubscribing resources can reduce overall performance and could cause permanent damage to the hardware.

Usually, either srun, mpirun, mpiexec, or aprun is required to run MPI programs. On LANTA, the srun command MUST be used to launch MPI processes. The table below compares a few options of these commands.

Command

Total MPI processes

CPU per MPI process

MPI processes per node

srun

-n, --ntasks

-c, --cpus-per-task

--ntasks-per-node

mpirun/mpiexec

-n, -np

--map-by socket:PE=N

--map-by ppr:N:node

aprun

-n, --pes

-d, --cpus-per-pe

-N, --pes-per-node

There is usually no need to explicitly add options to srun since, by default, Slurm automatically derives them from sbatch, with the exception of --cpus-per-task. For GPU jobs, we recommend explicitly adding GPU binding options such as --gpus-per-task or --ntasks-per-gpu to srun according to your software specification. Please visit Slurm srun for more details.

Anchor
SrunGPUBinding
SrunGPUBinding

Expand
titleGPU Binding
  1. When using --gpus-per-node or no GPU-binding options at all, all tasks on the same node will see the same set of GPU IDs available on the node, starting from 0. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    myqueue            # Check that no "GPU-ID" job still running 

    In this case, you can use SLURM_LOCALID or others to set CUDA_VISIBLE_DEVICES of each task. For example, you could use a wrapper script mentioned in HPE intro_mpi (Section 1) or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0 (CUDA_VISIBLE_DEVICES) but will be bound to different GPUs/UUIDs. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    myqueue           # Check that no "GPU-ID" job still running 

    However, it is stated in HPE intro_mpi (Section 1) that using these options with CrayMPICH could introduce an intra-node MPI performance drawback.
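The wrapper-script idea from item 1 can be sketched as below. The file name select_gpu.sh and the one-GPU-per-task mapping are assumptions for illustration, not an official LANTA utility:

```shell
# select_gpu.sh: hypothetical per-task GPU selection wrapper.
# Each task exports its node-local rank (SLURM_LOCALID) as its only
# visible GPU, then replaces itself with the real program.
cat > select_gpu.sh <<'EOF'
#!/bin/bash
export CUDA_VISIBLE_DEVICES=${SLURM_LOCALID}
exec "$@"
EOF
chmod +x select_gpu.sh

# Usage inside a job script (4 tasks per node -> GPUs 0..3 on each node):
# srun --ntasks-per-node=4 ./select_gpu.sh ./<software>
```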

Note

For hybrid (MPI+Multi-threading) applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance (>10%) due to improper CPU binding.
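A sketch of such a hybrid launch line follows; the executable name is a placeholder, and -c is passed explicitly because srun does not inherit --cpus-per-task from sbatch:

```shell
# Assumes #SBATCH --ntasks and --cpus-per-task were set in the job header.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun -n ${SLURM_NTASKS} -c ${SLURM_CPUS_PER_TASK} ./<software>
```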

...

Info

You can test your initial script on the compute-devel or gpu-devel partition, using #SBATCH -t 02:00:00, since these normally have a shorter queuing time.

Your entire job script will only run on the first requested node (${SLURMD_NODENAME}). Only the lines starting with srun can launch processes on the other nodes.
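For example, inside a multi-node allocation the following two lines behave differently (sketch; requires an active Slurm allocation to try):

```shell
hostname                              # runs once, on the first node only
srun --ntasks-per-node=1 hostname     # runs once on every allocated node
```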

...

Example

Installation guide

...