
This section offers guidelines on setting up an environment for building and running application software on LANTA.

There are three main approaches to preparing an environment on LANTA:

Users should select one approach and stay with it. The approaches should not be mixed, since library conflicts may occur.

1.1 HPE Cray Programming Environment

...

Expand
titleMore information
  • Execute cc --version, CC --version, or ftn --version to check which compiler is being used.

  • With PrgEnv-intel loaded, ${MKLROOT} is set to the corresponding Intel Math Kernel Library (MKL).

  • By default with PrgEnv-intel, the C/C++ compiler is ICX/ICPX while the Fortran compiler is IFORT.

  • To use only Intel Classic, execute module swap intel intel-classic after loading PrgEnv-intel

  • To use only Intel oneAPI, execute module swap intel intel-oneapi after loading PrgEnv-intel

  • With PrgEnv-nvhpc loaded, ${NVIDIA_PATH} is set to the corresponding NVIDIA SDK location.

  • There is PrgEnv-nvidia, but it will be deprecated soon, so it is not recommended.

  • With PrgEnv-aocc loaded, ${AOCC_PATH} is set to the corresponding AOCC location.

GPU acceleration

For building an application with GPU acceleration, users can use one of PrgEnv-nvhpc, cudatoolkit/<version>, or nvhpc-mixed. We recommend using PrgEnv-nvhpc for completeness.

Expand
titleMore information
  • [Feb 2024] According to nvidia-smi, the NVIDIA driver version on LANTA is currently 525.105.17, which supports CUDA versions up to 12.0.

  • Please be aware of the compatibility between CUDA and cray-mpich. For instance, CUDA 12 is supported only since cray-mpich/8.1.27.

  • For cross-compilation, you may have to use nvcc -ccbin cc to compile .cu files.

  • You have to manually pass the respective OpenACC/OpenMP offloading flags of each compiler.

  • You may have to pass -lcudart (cudatoolkit), or -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda (nvhpc-mixed), and possibly a few other flags at the linking phase (or set them in LDFLAGS before configure); see the sketch below.
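As a rough illustration of the points above, the sketch below compiles a CUDA source file with the Cray C wrapper as the host compiler and links the CUDA runtime explicitly. The file names are placeholders, and whether the extra -L path and -lcuda are needed depends on whether cudatoolkit, nvhpc-mixed or PrgEnv-nvhpc is used.

Code Block
languagebash
# Hedged sketch: cross-compiling CUDA code with the Cray wrapper as host compiler.
# File names are placeholders; module versions should match what is installed on LANTA.
module load craype-x86-milan
module load PrgEnv-nvhpc
module load craype-accel-nvidia80

# Compile the device code, using the C wrapper 'cc' as the host compiler
nvcc -ccbin cc -c kernel.cu -o kernel.o

# Link host and device code; -lcudart (and, with nvhpc-mixed, -L${NVIDIA_PATH}/cuda/lib64 -lcuda)
# may have to be added here or placed in LDFLAGS before configure
cc -o app.exe main.c kernel.o -lcudart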

Build target

To enable optimizations that depend on the hardware architecture of LANTA, the following modules should be loaded together with PrgEnv.

Module name             Hardware target         Note
craype-x86-milan        AMD EPYC Milan (x86)    -
craype-accel-nvidia80   NVIDIA A100             Load after PrgEnv-nvhpc, cudatoolkit or nvhpc-mixed

Expand
titleMore information
  • If craype-x86-milan is not loaded when using cc, CC or ftn, you may get a warning
    No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used. Load a valid targeting module or set CRAY_CPU_TARGET

  • With the target modules loaded, -march, -mtune, -hcpu, -haccel, -tp, -gpu, ... will be automatically filled in by cc, CC, ftn. To visualize this, try cc --version -craype-verbose.
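A minimal sketch to see this in action (any PrgEnv-<compiler> works; PrgEnv-gnu is used here only as an example):

Code Block
languagebash
module purge
module load craype-x86-milan
module load PrgEnv-gnu              # any PrgEnv-<compiler>; gnu is just an example
cc --version -craype-verbose        # shows the architecture flags the wrapper inserts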

Cray optimized libraries

Most Cray optimized libraries become accessible only after loading a PrgEnv, ensuring compatibility with the selected compiler suite. Additionally, some libraries, such as NetCDF, require loading other specific libraries first. Below is the hierarchy of commonly used cray-* modules.

...

Expand
titleExample: PrgEnv-intel + NetCDF
Code Block
languagebash
module purge
module load craype-x86-milan
module load PrgEnv-intel
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
module list
# Without version specified, the latest version will be loaded.

CPE version

To ensure backward compatibility after a system upgrade, it is recommended to pin the Cray Programming Environment (CPE) version using either cpe/<version> or cpe-cuda/<version>. Otherwise, the most recent version available will be loaded by default.
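For example, a minimal sketch of pinning the CPE release (the version number is illustrative; check module avail cpe for the versions installed):

Code Block
languagebash
module purge
module load craype-x86-milan
module load PrgEnv-gnu
module load cpe/23.03        # version shown is illustrative; check 'module avail cpe'
module list                  # the cray-* modules should now match the pinned release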

Expand
titleNote: unload cpe / cpe-cuda

When unloading cpe or cpe-cuda, you will encounter,

Unloading the cpe module is insufficient to restore the system defaults.
Please run 'source /opt/cray/pe/cpe/<version>/restore_lmod_system_defaults.[csh|sh]'.

This simply means that the remaining modules still belong to the cpe/<version> that you just unloaded; they are not automatically reverted to the system default versions. It is necessary to execute the above command to manually reload and restore them to the defaults.

On the other hand, if you use module purge, you can safely ignore the above message.

1.2 ThaiSC pre-built modules

For user convenience, we provide shared modules for several widely used software packages and libraries. These modules were built on top of the HPE Cray Programming Environment using CPE toolchains.

CPE toolchain

A CPE toolchain module is a bundle of craype-x86-milan, PrgEnv-<compiler> and cpe-cuda/<version>. The module is defined as a toolchain for convenience and for use with EasyBuild, the framework used for installing most ThaiSC modules.

Expand
title [Feb 2024] Current CPE toolchains

CPE toolchain     Note
cpeGNU/23.03      GCC 11.2.0
cpeCray/23.03     CCE 15.0.1
cpeIntel/23.03    Deprecated and hidden. It will be removed in the future.
cpeIntel/23.09    Intel Compiler 2023.1.0
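A minimal sketch of building with a CPE toolchain module instead of loading its components individually (cpeGNU/23.03 is used as the example; the source file name is a placeholder):

Code Block
languagebash
module purge
module load cpeGNU/23.03     # bundles craype-x86-milan, PrgEnv-gnu and cpe-cuda/23.03
ftn --version                # the Cray wrappers now use GCC 11.2.0 underneath
cc -O2 -o hello.exe hello.c  # 'hello.c' is a placeholder for your own source file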

ThaiSC modules

All ThaiSC modules are located at the same module path, so there is no module hierarchy. Executing module avail on LANTA will display all available ThaiSC modules. For a more concise list, you can use module overview; then use module whatis <name>, module help <name>, or module spider <name> to learn more about a specific module.

Users can readily use ThaiSC modules and CPE toolchains to build their applications. Some popular application software is pre-installed as well; for more information, refer to Applications usage.

Code Block
languagebash
username@lanta-xname:~> module avail

Code Block
languagebash
--------------------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------------------
PrgEnv-aocc   (2)   cce          (3)   cray-libpals (3)   craypkg-gen   (2)   nvhpc          (2)
PrgEnv-cray   (2)   cpe-cuda     (3)   cray-libsci  (3)   cudatoolkit   (6)   nvidia         (2)
PrgEnv-gnu    (2)   cpe          (3)   cray-mrnet   (2)   gcc           (3)   papi           (3)
PrgEnv-intel  (2)   cray-R       (2)   cray-pals    (3)   gdb4hpc       (3)   perftools-base (3)
PrgEnv-nvhpc  (2)   cray-ccdb    (2)   cray-pmi     (3)   intel-classic (2)   sanitizers4hpc (2)
PrgEnv-nvidia (2)   cray-cti     (5)   cray-python  (2)   intel-oneapi  (2)   valgrind4hpc   (3)
aocc          (2)   cray-dsmml   (1)   cray-stat    (2)   intel         (2)
atp           (3)   cray-dyninst (2)   craype       (3)   iobuf         (1)

--------------------------------- /opt/cray/pe/lmod/modulefiles/craype-targets/default ---------------------------------
craype-x86-milan   craype-accel-nvidia80

--------------------------------- /lustrefs/disk/modules/easybuild/modules/all ---------------------------------
ADIOS2/2.9.1-cpeIntel-23.09 (D)    Amber/2022-cpeGNU-CUDA-11.7     Apptainer/1.1.6
Armadillo/12.4.0-cpeIntel-23.09    AutoDock-vina/1.2.5             Autoconf/2.71
... (remaining ThaiSC modules omitted for brevity) ...


Code Block
languagebash
username@lanta-xname:~> module overview
--------------------------------- /lustrefs/disk/modules/easybuild/modules/all ---------------------------------
ADIOS2        (2)   Amber        (1)   Apptainer    (1)   Armadillo    (2)   AutoDock-vina (1)
... (remaining ThaiSC module names and version counts omitted for brevity) ...
Expand
titleExample: Boost/1.81.0-cpeGNU-23.03
Code Block
languagebash
module purge
module load Boost/1.81.0-cpeGNU-23.03
echo ${CPATH}
echo ${LIBRARY_PATH}
echo ${LD_LIBRARY_PATH}


2. Building an application software

After an appropriate environment is loaded, this section provides guidelines on how to use it to build an application software on LANTA.

2.1 Compiler wrapper

<wrapper> command   Description                Manual                   In substitution for
cc                  C compiler wrapper         man cc or cc --help      mpicc / mpiicc
CC                  C++ compiler wrapper       man CC or CC --help      mpic++ / mpiicpc
ftn                 Fortran compiler wrapper   man ftn or ftn --help    mpif90 / mpiifort

The Cray compiler wrappers, namely cc, CC and ftn, become available after loading any PrgEnv-<compiler> or CPE toolchain. When invoked, a wrapper passes relevant information about the cray-* libraries loaded in the current environment to the underlying <compiler> used to compile the source code. It is recommended to use these wrappers for building MPI applications with the native Cray MPICH library (cray-mpich).

Adding -craype-verbose to the wrapper when compiling a source file will display the final command executed. To see what will be added before compiling, try <wrapper> --cray-print-opts=all.

Expand
titleMore information
Code Block
languagebash
module purge
module load GDAL/3.6.4-cpeIntel-23.09
cc --cray-print-opts=all

The output of the final command indicates that the include search paths (-I), library search paths (-L), and linking flags (-l<lib>, such as -lnetcdf) of cray-* libraries are taken care of automatically by the wrapper; there is no need to pass them manually.

For ThaiSC libraries, their include and library search paths are stored in ${CPATH}, ${LIBRARY_PATH} and ${LD_LIBRARY_PATH}, and users have to manually add appropriate linking flags (-l<lib>). However, these steps should be covered automatically if using a build tool.
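As a hedged illustration of linking a ThaiSC library by hand, the sketch below uses the ThaiSC Boost module; the linking flag (-lboost_program_options) and the source file name are illustrative only.

Code Block
languagebash
module purge
module load cpeGNU/23.03
module load Boost/1.81.0-cpeGNU-23.03

# Header and library search paths come from ${CPATH} and ${LIBRARY_PATH};
# only the linking flag itself has to be supplied by hand.
CC -o app.exe main.cpp -lboost_program_options   # flag and file name are illustrative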

Expand
titleExample: HelloWorld.c
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
        printf("HelloWorld from rank %d thread %d \n", rank, thread);
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();
    
return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.c'
cc -craype-verbose -o hello.exe HelloWorld.c -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
Expand
titleExample: HelloWorld.cpp
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.cpp
#include <iostream>
#include <mpi.h>
#include <omp.h>

int main(){
    int rank, thread ;
    omp_lock_t io_lock ;
    omp_init_lock(&io_lock);

    MPI_Init(nullptr,nullptr);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        omp_set_lock(&io_lock);
	    std::cout << "HelloWorld from rank " << rank << " thread " << thread << std::endl;
        omp_unset_lock(&io_lock);
    }

    omp_destroy_lock(&io_lock);
    MPI_Finalize();
    
return 0; }
Eof

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.cpp'
CC -craype-verbose -o hello.exe HelloWorld.cpp -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe
Expand
titleExample: HelloWorld.f90
Code Block
languagebash
# Create the source code
cat << Eof > HelloWorld.f90
program helloworld_mpi_omp
    use mpi
    use omp_lib

    implicit none
    integer :: rank, thread, provided, ierr
    integer(kind=omp_lock_kind) :: io_lock

    call omp_init_lock(io_lock)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    !\$omp parallel private(thread)
    thread = omp_get_thread_num()

    call omp_set_lock(io_lock)
    print *, 'HelloWorld from rank ', rank, ' thread ', thread
    call omp_unset_lock(io_lock)
    !\$omp end parallel

    call omp_destroy_lock(io_lock)
    call MPI_Finalize(ierr)

end program helloworld_mpi_omp
Eof
# Note: change !\$omp to !$omp if you type the above code by hand.

module purge
module load craype-x86-milan
module load PrgEnv-intel
# or
# module load cpeIntel/23.09

# Compile the source file 'HelloWorld.f90'
ftn -craype-verbose -o hello.exe HelloWorld.f90 -fopenmp

# Run the program 
srun -p compute-devel -N1 -n4 -c2 hello.exe

2.2 Build tools

Several tools exist to help build large and complex programs; among them, GNU make and CMake are commonly used. The developers of each software choose which build tool they support, so it is important to read the software documentation thoroughly. For some software, users might need to additionally load the latest CMake or Autotools modules on the system (e.g., module load CMake/3.26.4).

There are three general stages in building a program using a build tool: configure, make and make install. For more information, see Basic Installation.

Build tools typically detect compilers through environment variables such as CC, CXX, and FC at the configure stage. Therefore, setting these variables before running configure should be sufficient to make the tool use the Cray compiler wrappers.

Code Block
languagebash
export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# ./configure --prefix=<your-install-location> ...
# or 
# cmake -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

Nevertheless, if the CMake cache is not clean, you might need to explicitly use:

Code Block
languagebash
cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn -DCMAKE_INSTALL_PREFIX=<your-install-location> ...

We encourage users to manually specify the installation path using --prefix= or -DCMAKE_INSTALL_PREFIX= as shown above. This path can be within your project home, such as /project/ltXXXXXX-YYYY/<software-name>, allowing you to manage permissions and share the installed software with your project members. By default on LANTA, your team will be able to read and execute your software but cannot make any changes inside the directory you own.

After these steps, you should be able to execute make and make install, then build your software as you would on any other system.

Expand
titleExample: libjpeg-turbo/3.0.2-cpeCray-23.03
Code Block
languagebash
module purge
module load cpeCray/23.03
module load NASM/2.16.01
module load CMake/3.26.4

wget https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/3.0.2/libjpeg-turbo-3.0.2.tar.gz
tar xzf libjpeg-turbo-3.0.2.tar.gz
cd libjpeg-turbo-3.0.2
PARENT_DIR=$(pwd)
mkdir build ; cd build

export CC=cc CXX=CC FC=ftn F77=ftn F90=ftn
# or, for verbosity,
#export CC="cc -craype-verbose" CXX="CC -craype-verbose" FC="ftn -craype-verbose" F77="ftn -craype-verbose" F90="ftn -craype-verbose"

cmake -DCMAKE_INSTALL_PREFIX=${PARENT_DIR}/cpeCray-23.03 -G"Unix Makefiles" -DWITH_JPEG8=1 ..
make
make install

cd ${PARENT_DIR}/cpeCray-23.03
ln -s ./lib64 ./lib               # Optional, but recommended

2.3 Related topics

Local module & EasyBuild

A separate page is dedicated to explaining how to manage and install local modules in the user’s home/project paths using EasyBuild → Local module & EasyBuild (In progress)

Useful compiler flags

Other approach

3. Running the software

Every main application software must run on the compute/gpu/memory nodes. The recommended approach is to write a job script and submit it to the Slurm scheduler using the sbatch command.

Note

Only use sbatch <job-script>. If users execute bash <job-script> or simply ./<job-script>, the script will not be sent to Slurm and will run on the frontend node instead!

3.1 Writing a job script

Code Block
languagebash
#!/bin/bash
#SBATCH -p gpu                 # Partition
#SBATCH -N 1                   # Number of nodes
#SBATCH --gpus=4               # Number of GPU cards
#SBATCH --ntasks=4             # Number of MPI processes
#SBATCH --cpus-per-task=16     # Number of OpenMP threads per MPI process
#SBATCH -t 5-00:00:00          # Job runtime limit
#SBATCH -A ltXXXXXX            # Billing account 
# #SBATCH -J <JobName>         # Job name

module purge
# --- Load necessary modules ---
module load <...>
module load <...>

# --- Add software to Linux search paths ---
export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
# export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}
# source <your-software-specific-script>

# --- (Optional) Set related environment variables ---
# export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}         # MUST specify --cpus-per-task above

# --- Run the software ---
# srun <srun-options> ./<software>
# or
# ./<software>

The above job script template consists of five sections:

1. Slurm sbatch header


The #SBATCH directives specify sbatch options that mostly remain unchanged, such as partition, time limit, billing account, and so on. Optional settings like the job name can be specified when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

For the most part, Slurm sbatch options only define and request the computing resources that can be used inside a job script. The actual resources used by a software/executable can differ depending on how it is invoked (see Stage 5), although the sbatch options are passed on and become its defaults. For GPU jobs, requesting GPUs with either --gpus or --gpus-per-node at this stage provides the most flexibility for the next stage, GPU binding.

If your application software only supports parallelization by multi-threading, then it cannot utilize resources across nodes; in this case, -N, -n/--ntasks and --ntasks-per-node should be set to 1 (see the sketch below).
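For instance, a sketch of the sbatch header for a purely multi-threaded (non-MPI) job might look like the following; the partition name and numbers are illustrative.

Code Block
languagebash
#!/bin/bash
#SBATCH -p compute             # Partition (illustrative)
#SBATCH -N 1                   # Multi-threaded software cannot span nodes
#SBATCH --ntasks=1             # A single process
#SBATCH --cpus-per-task=32     # Number of threads (illustrative)
#SBATCH -t 01:00:00            # Job runtime limit (illustrative)
#SBATCH -A ltXXXXXX            # Billing account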

2. Loading modules
It is advised to load, in the job script, every module that was used when installing the software, although build dependencies such as CMake, Autotools, and binutils can be omitted. Additionally, those modules should be of the same versions as the ones used to compile the program.

3. Adding software paths
The Linux OS will not be able to find your program if it is not in its search paths. The commonly used ones are PATH (for executables/binaries), LD_LIBRARY_PATH (for shared libraries), and PYTHONPATH (for Python packages). Users MUST append or prepend to them using syntax such as export PATH=<software-bin-path>:${PATH}; otherwise, search paths previously added by module load and others will be lost.
If <your-install-location> is where your software is installed, then putting the below commands in your job script should be sufficient in most cases.

Code Block
languagebash
export PATH=<your-install-location>/bin:${PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=<your-install-location>/lib64:${LD_LIBRARY_PATH}

Some of these lines can be omitted if there is no such sub-directory when checking with ls <your-install-location>.

Expand
titlePYTHONPATH

If the software also generates Python packages, then you may have to check where those packages are installed. Usually, they are in <your-install-location>/lib/python<Version>/site-packages, <your-install-location>/lib64/python<Version>/site-packages or other similar paths. Then, adding a line such as below would inform Python where the packages are.

Code Block
languagebash
export PYTHONPATH=<your-install-location>/lib/python<Version>/site-packages:${PYTHONPATH}
Expand
titleMore information
  • If some software dependencies were installed locally, their search paths should also be added.

  • We do NOT recommend specifying these search paths in ~/.bashrc directly, as it could lead to library conflicts when you have more than one main software.

  • Some software provides a script to be sourced before use. In this case, sourcing it in your job script should be equivalent to adding its search paths manually yourself.


When executing your program, if you encounter:

  • If 'xxx' is not a typo you can use command-not-found to lookup ..., then your current PATH variable may be incorrect.

  • xxx: error while loading shared libraries: libXXX.so: cannot open shared object file, then,

    • If libXXX.so seems to be related to your software, then you may have set the LD_LIBRARY_PATH variable in Step 3 incorrectly.

    • If libXXX.so seems to be from a module you used to build your software, then loading that module should fix the problem.

  • ModuleNotFoundError: No module named 'xxx', then your current PYTHONPATH may be incorrect.


A preliminary check can be performed on the frontend node by doing something like:

Code Block
languagebash
bash   # You should check them in another bash shell

module purge
module load <...>
module load <...>

export PATH=<software-bin-path>:${PATH}
export LD_LIBRARY_PATH=<software-lib/lib64-path>:${LD_LIBRARY_PATH}
export PYTHONPATH=<software-python-site-packages>:${PYTHONPATH}

<executable> --help
<executable> --version

exit


4. Setting environment variables
Some software requires additional environment variables to be set at runtime; for example, the path to the temporary directory. Output environment variables set by Slurm sbatch (see Slurm sbatch - output environment variables) could be used to set software-specific parameters.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE, and ulimit -s unlimited are commonly set in a job script. An example is shown below.

Code Block
languagebash
export XXX_TMPDIR=/scratch/ltXXXXXX-YYYY/${SLURM_JOBID}
export OMP_STACKSIZE="32M"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
ulimit -s unlimited 

5. Running your software


Each software has its own command to be issued; please read the software documentation and forum. Special attention should be paid to how the software recognizes and maps computing resources (CPU-MPI-GPU); occasionally, users may need to insert additional input arguments at runtime. The total resources concurrently utilized at this stage should be less than or equal to the resources requested in Stage 1. Oversubscribing resources can reduce overall performance and could cause permanent damage to the hardware.

Usually, one of srun, mpirun, mpiexec, or aprun is required to run MPI programs. On LANTA, the srun command MUST be used to launch MPI processes. The table below compares a few options of those commands.

Command          Total MPI processes   CPU per MPI process    MPI processes per node
srun             -n, --ntasks          -c, --cpus-per-task    --ntasks-per-node
mpirun/mpiexec   -n, -np               --map-by socket:PE=N   --map-by ppr:N:node
aprun            -n, --pes             -d, --cpus-per-pe      -N, --pes-per-node

There is usually no need to explicitly add options to srun since, by default, Slurm will automatically derive them from sbatch, with the exception of --cpus-per-task. Please visit Slurm srun for more details.


Expand
titleGPU Binding
  1. When using --gpus-per-node, or without any GPU-binding options, all tasks on the same node will see the same set of GPU IDs (starting from 0) available on that node. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N2 --gpus-per-node=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun nvidia-smi -L
    srun --ntasks-per-node=4 nvidia-smi -L
    srun --ntasks-per-node=2 --gpus-per-node=3 nvidia-smi -L
    exit               # Release salloc
    myqueue            # Check that no "GPU-ID" job still running 

    In this case, you can use SLURM_LOCALID or others to set CUDA_VISIBLE_DEVICES of each task. For example, you could use a wrapper script mentioned in HPE intro_mpi (Section 1) or you could devise an algorithm and use torch.cuda.set_device in PyTorch as demonstrated here.

  2. On the other hand, when using --gpus-per-task or --ntasks-per-gpu to bind resources, the GPU IDs seen by each task will start from 0 (CUDA_VISIBLE_DEVICES) but will be bound to different GPUs/UUIDs. Try

    Code Block
    languagebash
    salloc -p gpu-devel -N1 --gpus=4 -t 00:05:00 -J "GPU-ID"  # Note: using default --ntasks-per-node=1
    srun --ntasks=4 --gpus-per-task=1 nvidia-smi -L
    srun --ntasks-per-gpu=4 nvidia-smi -L
    exit              # Release salloc
    myqueue           # Check that no "GPU-ID" job still running 

    However, it is stated in HPE intro_mpi (Section 1) that using these options with Cray MPICH could introduce an intra-node MPI performance drawback.

Note

For hybrid (MPI + multi-threading) applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance due to improper CPU binding.
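For example, a minimal sketch of launching a hybrid executable that matches the job script template in Section 3.1 (4 MPI processes with 16 threads each); the executable name is a placeholder:

Code Block
languagebash
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun -n 4 -c 16 ./<software>       # -c/--cpus-per-task must be given to srun explicitly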

...

Info

You can test your initial script on compute-devel or gpu-devel partitions, using #SBATCH -t 02:00:00, since they normally have a shorter queuing time.

Your entire job script will only run on the first requested node (${SLURMD_NODENAME}); only the lines starting with srun can launch processes on the other nodes.

...

Example

Installation guide

...