Software installation guideline


1. Preparing software environment

This section offers guidelines on setting up an environment for building and running application software on LANTA.

There are mainly three approaches to preparing an environment on LANTA.

Users should choose one of them and not mix them, since mixing can cause library conflicts.

1.1 HPE Cray Programming Environment

LANTA is an HPE Cray EX cluster. The HPE Cray Programming Environment (PrgEnv or CPE) is installed on the system by the vendor and is the preferred environment. It provides a uniform interface across different sets of compilers and libraries. The table below lists the module for each available compiler suite.

Module name    | Description                       | Note
PrgEnv-gnu     | GNU compiler suite                | -
PrgEnv-intel   | Intel compiler suite              | Intel oneAPI (default), with MKL
PrgEnv-cray    | Cray Compiling Environment (CCE)  | Loaded by default upon login
PrgEnv-nvhpc   | NVIDIA HPC SDK compiler suite     | Includes CUDA
PrgEnv-aocc    | AMD AOCC compiler suite           | Without AOCL

  • Execute cc --version, CC --version, or ftn --version to check which compiler is being used.

  • With PrgEnv-intel loaded, ${MKLROOT} is set to the corresponding Intel Math Kernel Library (MKL).

  • By default, PrgEnv-intel uses ICX/ICPX as the C/C++ compilers and IFORT as the Fortran compiler.

  • To use only Intel Classic compilers, execute module swap intel intel-classic after loading PrgEnv-intel.

  • To use only Intel oneAPI compilers, execute module swap intel intel-oneapi after loading PrgEnv-intel.

  • With PrgEnv-nvhpc loaded, ${NVIDIA_PATH} is set to the corresponding NVIDIA HPC SDK location.

  • PrgEnv-nvidia is also available, but it will soon be deprecated and is therefore not recommended.

  • With PrgEnv-aocc loaded, ${AOCC_PATH} is set to the corresponding AOCC location.


--------------------------------- /opt/cray/pe/lmod/modulefiles/core ---------------------------------
PrgEnv-aocc (2)  cce (3)  cray-libpals (3)  craypkg-gen (2)  nvhpc (2)
PrgEnv-cray (2)  cpe-cuda (3)  cray-libsci (3)  cudatoolkit (6)  nvidia (2)
PrgEnv-gnu (2)  cpe (3)  cray-mrnet (2)  gcc (3)  papi (3)
PrgEnv-intel (2)  cray-R (2)  cray-pals (3)  gdb4hpc (3)  perftools-base (3)
PrgEnv-nvhpc (2)  cray-ccdb (2)  cray-pmi (3)  intel-classic (2)  sanitizers4hpc (2)
PrgEnv-nvidia (2)  cray-cti (5)  cray-python (2)  intel-oneapi (2)  valgrind4hpc (3)
aocc (2)  cray-dsmml (1)  cray-stat (2)  intel (2)
atp (3)  cray-dyninst (2)  craype (3)  iobuf (1)

--------------------------------- /opt/cray/pe/lmod/modulefiles/craype-targets/default ----------------------------------
craype-x86-milan (1)  craype-accel-nvidia80 (1)  ... other modules ...

GPU acceleration

For building an application with GPU acceleration, users can use PrgEnv-nvhpc, cudatoolkit/<version>, or nvhpc-mixed. We recommend PrgEnv-nvhpc because it provides the most complete environment.

  • [Feb 2024] According to nvidia-smi, the NVIDIA driver version on LANTA is currently 525.105.17, which supports CUDA versions up to 12.0.

  • Please be aware of the compatibility between CUDA and cray-mpich. For instance, CUDA 12 is supported only from cray-mpich/8.1.27 onward.

  • When compiling .cu files together with code built by the Cray wrappers, you may have to use nvcc -ccbin cc, so that nvcc uses the Cray C wrapper as its host compiler.

  • OpenACC/OpenMP offloading flags differ between compilers and must be passed manually.

  • At the linking stage, you may have to pass -lcudart (cudatoolkit), or -L${NVIDIA_PATH}/cuda/lib64 -lcudart -lcuda (nvhpc-mixed), and possibly a few other libraries (or set them in LDFLAGS before running configure).
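A minimal sketch of these points is shown below. It assumes the cudatoolkit route combined with PrgEnv-gnu (one of several valid combinations) and hypothetical source files kernel.cu and main.cpp; whether -lcudart is actually needed depends on your build.

```shell
# Assumed combination (one of several): GNU compilers + CUDA toolkit,
# targeting AMD Milan CPUs and NVIDIA A100 GPUs.
module purge
module load PrgEnv-gnu
module load cudatoolkit
module load craype-x86-milan
module load craype-accel-nvidia80   # load after cudatoolkit

# Compile CUDA code with the Cray C wrapper as nvcc's host compiler.
# kernel.cu and main.cpp are hypothetical source files.
nvcc -ccbin cc -c kernel.cu -o kernel.o

# Link with the C++ wrapper, adding the CUDA runtime explicitly.
CC main.cpp kernel.o -o app -lcudart

# For configure-based builds, put the same flag in LDFLAGS beforehand.
export LDFLAGS="-lcudart ${LDFLAGS}"
```

Linking through CC rather than cc lets the C++ runtime that nvcc-generated objects may rely on be pulled in automatically.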

Build target

To enable optimizations that depend on the hardware architecture of LANTA, the following modules should be loaded together with PrgEnv.

Module name            | Hardware target        | Note
craype-x86-milan       | AMD EPYC Milan (x86)   | -
craype-accel-nvidia80  | NVIDIA A100            | Load after PrgEnv-nvhpc, cudatoolkit or nvhpc-mixed

  • If craype-x86-milan is not loaded when using cc, CC or ftn, you may get the following warning:
    No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used. Load a valid targeting module or set CRAY_CPU_TARGET

  • With the target modules loaded, cc, CC and ftn automatically add flags such as -march, -mtune, -hcpu, -haccel, -tp and -gpu for the underlying compiler. To see this, try cc --version -craype-verbose.

Cray optimized libraries

Most Cray optimized libraries become accessible only after a PrgEnv is loaded, which ensures compatibility with the selected compiler suite. Additionally, some libraries, such as NetCDF, require other specific libraries to be loaded first. Figure 1 shows the hierarchy of commonly used cray-* modules, and the commands below load parallel NetCDF with the Intel suite as an example.

Figure 1: A hierarchy of HPE Cray modules
# Without a version specified, the default (normally the latest) version of each module is loaded.
module purge
module load craype-x86-milan
module load PrgEnv-intel
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
module list

CPE version

To ensure backward compatibility after a system upgrade, it is recommended to fix the Cray Programming Environment version using either cpe/<version> or cpe-cuda/<version>. Otherwise, the most recent version will be loaded by default.

When unloading cpe or cpe-cuda, you will see the following message:

Unloading the cpe module is insufficient to restore the system defaults.
Please run 'source /opt/cray/pe/cpe/<version>/restore_lmod_system_defaults.[csh|sh]'.

This message means that the remaining modules are still the versions from the cpe/<version> you just unloaded; they are not automatically reverted to the system default versions. To reload them at the defaults, execute the command above.

On the other hand, if you use module purge, you can safely ignore the above message.

1.2 ThaiSC pre-built modules

For user convenience, we provide several shared modules of some widely used software and libraries. These modules were built on top of the HPE Cray Programming Environment, using the CPE toolchain.

CPE toolchain

A CPE toolchain module is a bundle of craype-x86-milan, PrgEnv-<compiler> and cpe-cuda/<version>. The module is defined as a toolchain for convenience and for use with EasyBuild, the framework used for installing most ThaiSC modules.

ThaiSC modules

All ThaiSC modules are located in the same module path, so there is no module hierarchy. Executing module avail on LANTA displays all available ThaiSC modules. For a more concise list, use module overview, and then use module spider <name> to learn more about a specific module.

Users can readily use ThaiSC modules and CPE toolchains to build their applications. Some popular application software is also pre-installed; for more information, refer to Applications usage.

username@lanta-xname:~> module overview
--------------------------------- /lustrefs/disk/modules/easybuild/modules/all ---------------------------------
ADIOS2 (2)  GATK (1)  NASM (1)  Tcl (2)  groff (2)
ATK (2)  GDAL (2)  NLopt (2)  Tk (2)  hwloc (1)
Amber (1)  GEOS (2)  Nextflow (2)  Trimmomatic (1)  intltool (1)
Apptainer (1)  GLM (2)  Ninja (1)  UDUNITS2 (1)  jbigkit (2)
Armadillo (2)  GLib (2)  OSPRay (2)  VASP (3)  libGLU (2)
AutoDock-vina (1)  GMP (3)  OpenCASCADE (2)  VCFtools (1)  libaec (2)
Autoconf (1)  GObject-Introspection (2)  OpenEXR (2)  WPS (2)  libdeflate (2)
Automake (1)  GROMACS (2)  OpenFOAM (2)  WRF (1)  libdrm (2)
Autotools (1)  GSL (3)  OpenJPEG (2)  WRFchem (2)  libepoxy (2)
BCFtools (1)  Gaussian (1)  OpenMPI (1)  Wayland (2)  libffi (1)
BEDTools (1)  GenericIO (2)  OpenSSL (1)  X11 (2)  libgeotiff (2)
BLAST+ (1)  Gmsh (1)  OpenTURNS (2)  XZ (3)  libglvnd (2)
BLASTDB (1)  Go (1)  PCRE (1)  Xerces-C++ (1)  libiconv (2)
BWA (1)  HDF-EOS (2)  PCRE2 (1)  Yasm (1)  libjpeg-turbo (3)
BamTools (1)  HDF (2)  PDAL (1)  arpack-ng (2)  libpciaccess (2)
Beast (1)  HTSlib (1)  PETSc (2)  assimp (2)  libpng (3)
Bison (1)  HYPRE (2)  PROJ (2)  at-spi2-atk (2)  libreadline (2)
Blosc (2)  HarfBuzz (2)  Pango (2)  at-spi2-core (2)  libtirpc (2)
Boost (4)  ICU (3)  ParFlow (1)  aws-ofi-nccl (1)  libtool (1)
Bowtie (1)  Imath (2)  ParMETIS (2)  beagle-lib (1)  libunwind (2)
Bowtie2 (1)  JasPer (3)  ParaView (1)  binutils (1)  libxml2 (3)
Brotli (2)  Java (2)  ParallelIO (1)  bzip2 (3)  lz4 (3)
C-Blosc2 (2)  KaHIP (2)  Perl (2)  cURL (1)  minimap2 (1)
CFITSIO (2)  LAME (1)  PostgreSQL (2)  cairo (2)  nccl (1)
CGAL (2)  LLVM (1)  QuantumESPRESSO (2)  canu (1)  ncurses (2)
CMake (2)  LMDB (1)  RAxML-NG (1)  cpeCray (1)  nlohmann_json (1)
CrayNVHPC (1)  LibTIFF (2)  SAMtools (1)  cpeGNU (1)  numactl (1)
DB (2)  M4 (1)  SCOTCH (2)  cpeIntel (1)  pixman (1)
DBus (1)  MAFFT (1)  SDL2 (2)  ecCodes (2)  pkgconf (1)
ESMF (2)  METIS (2)  SLEPc (2)  expat (2)  tbb (1)
EasyBuild (1)  MPC (2)  SPAdes (1)  flex (1)  termcap (1)
Eigen (1)  MPFR (2)  SQLite (2)  fontconfig (2)  x264 (1)
FDS (1)  MUMPS (3)  SWIG (3)  freetype (2)  x265 (1)
FFmpeg (2)  Mako (2)  SYCL (1)  gettext (1)  xorg-macros (2)
FastQC (1)  Mamba (1)  SZ (2)  git-lfs (1)  xprop (2)
FortranGIS (2)  Mesa (2)  SpectrA (1)  googletest (1)  zfp (2)
FreeXL (2)  Meson (2)  SuiteSparse (2)  gperf (1)  zlib (2)
FriBidi (2)  MrBayes (1)  SuperLU_DIST (2)  gperftools (2)  zstd (3)

2. Building application software

After an appropriate environment is loaded, this section describes how to use it to build application software on LANTA.

2.1 Compiler wrapper

<wrapper> command | Description              | Manual                 | Use instead of
cc                | C compiler wrapper       | man cc or cc --help    | mpicc / mpiicc
CC                | C++ compiler wrapper     | man CC or CC --help    | mpic++ / mpiicpc
ftn               | Fortran compiler wrapper | man ftn or ftn --help  | mpif90 / mpiifort

The Cray compiler wrappers, namely, cc, CC and ftn, become available after loading any PrgEnv-<compiler> or CPE toolchain. Upon being invoked, the wrapper will pass relevant information about the cray-* libraries, loaded in the current environment, to the underlying <compiler> to compile source code. It is recommended to use these wrappers for building MPI applications with the native Cray MPICH library cray-mpich.

Adding -craype-verbose to the wrapper when compiling a source file will display the final command executed. To see what will be added before compiling, try <wrapper> --cray-print-opts=all.

2.2 Build tools

Several tools exist to help build large and complex programs; GNU make and CMake are among the most commonly used. Each software project chooses which build tools it supports, so it is important to read the software documentation thoroughly. For some software, users might additionally need to load a recent CMake or Autotools module on the system (e.g., module load CMake/3.26.4).

There are three general stages in building a program using a build tool: configure, make and make install. For more information, see Basic Installation.

Build tools typically detect compilers through environment variables such as CC, CXX, and FC at the configure stage. Therefore, setting these variables before running configure should be sufficient to make the tool use the Cray compiler wrappers.
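A minimal sketch of this for an Autotools-based package, assuming it ships a configure script and using a hypothetical install prefix under the project home:

```shell
# Make configure detect the Cray compiler wrappers.
export CC=cc CXX=CC FC=ftn

# Hypothetical install location inside the project home.
PREFIX=/project/ltXXXXXX-YYYY/mysoftware

./configure --prefix="${PREFIX}"
make
make install
```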

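Nevertheless, if a CMake cache from an earlier configuration is present, CMake keeps the compilers recorded there and ignores CC, CXX and FC. In that case, start from a fresh build directory or pass the Cray wrappers to CMake explicitly.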
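A sketch of the explicit route for a CMake project is given below. It assumes CMake 3.15 or newer (for example, the CMake/3.26.4 module), a hypothetical install prefix under the project home, and a build directory named build; the Fortran entry is only needed if the project uses Fortran.

```shell
# Use a fresh build directory so no stale CMakeCache.txt is reused
# (this deletes any previous build tree in ./build).
BUILD_DIR=build
rm -rf "${BUILD_DIR}"

# Cray wrappers and a hypothetical install prefix, passed as cache variables.
cmake_opts=(
    -DCMAKE_C_COMPILER=cc
    -DCMAKE_CXX_COMPILER=CC
    -DCMAKE_Fortran_COMPILER=ftn
    -DCMAKE_INSTALL_PREFIX=/project/ltXXXXXX-YYYY/mysoftware
)

cmake -S . -B "${BUILD_DIR}" "${cmake_opts[@]}"
cmake --build "${BUILD_DIR}"
cmake --install "${BUILD_DIR}"
```

Keeping the options in a bash array makes it easy to add project-specific -D settings without breaking the quoting.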

We encourage users to specify the installation path explicitly with the --prefix= option of configure or the -DCMAKE_INSTALL_PREFIX= option of CMake. This path can be within your project home, such as /project/ltXXXXXX-YYYY/<software-name>, allowing you to manage permissions and share the installed software with your project members. By default on LANTA, your team members can read and execute your software but cannot make changes inside directories you own.

After configuring, you should be able to run make and make install to build and install your software as you would on any other system.

2.3 Related topics


3. Running the software

All main application software must run on the compute, gpu or memory nodes. The recommended approach is to write a job script and submit it to the Slurm scheduler with the sbatch command.

Only use sbatch <job-script>. If you execute bash <job-script> or simply ./<job-script>, the script will not be sent to Slurm and will run on the frontend node instead!

3.1 Writing a job script
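A minimal template along these lines is sketched below. It writes job-script.sh through a here-document so it can be edited and submitted; the partition name compute, the account ltXXXXXX, the resource numbers, the install path and the program my_app with its input.dat are placeholders or assumptions to be replaced with your own.

```shell
# Write a job-script template to job-script.sh; the quoted 'EOF' keeps
# ${...} unexpanded so Slurm variables are resolved when the job runs.
cat > job-script.sh <<'EOF'
#!/bin/bash
# --- 1. Slurm sbatch header (example values; partition and account are placeholders)
#SBATCH -p compute
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH -c 32
#SBATCH -t 01:00:00
#SBATCH -A ltXXXXXX

# --- 2. Loading modules: the same ones (and versions) used to build the software
module purge
module load craype-x86-milan
module load PrgEnv-gnu

# --- 3. Adding software paths (hypothetical install location)
export PATH=/project/ltXXXXXX-YYYY/mysoftware/bin:${PATH}
export LD_LIBRARY_PATH=/project/ltXXXXXX-YYYY/mysoftware/lib:${LD_LIBRARY_PATH}

# --- 4. Setting environment variables
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# --- 5. Running the software (my_app and input.dat are hypothetical)
srun -c ${SLURM_CPUS_PER_TASK} my_app input.dat
EOF
```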

A job script typically consists of five sections:

1. Slurm sbatch header
The #SBATCH directives can be used to specify sbatch options that rarely change, such as the partition, time limit and billing account. Options that vary between runs, such as the job name, can instead be given when submitting the script (see Submitting a job). For more details regarding sbatch options, please visit Slurm sbatch.

Slurm sbatch options mostly define and request the computing resources available inside a job script. The resources actually used by a software/executable can differ, depending on how it is invoked (see Stage 5), although these sbatch options are passed to it as defaults. For GPU jobs, requesting GPUs with either --gpus or --gpus-per-node at this stage gives the most flexibility for the next step, GPU binding.

If your application software only supports parallelization by multi-threading, it cannot utilize resources across nodes; in this case, -N, -n/--ntasks and --ntasks-per-node should be set to 1.

2. Loading modules
In the job script, it is advised to load every module that was used when installing the software, although build-only dependencies such as CMake, Autotools and binutils can be omitted. These modules should be the same versions as those used to compile the program.

3. Adding software paths
The Linux OS will not find your program unless it is in its search paths. The most commonly used ones are PATH (for executables/binaries), LD_LIBRARY_PATH (for shared libraries) and PYTHONPATH (for Python packages). Users MUST append or prepend to them using syntax such as export PATH=<software-bin-path>:${PATH}; otherwise, search paths added earlier by module load and others will be lost.
If <your-install-location> is where your software is installed, then prepending <your-install-location>/bin to PATH and <your-install-location>/lib (and/or lib64) to LD_LIBRARY_PATH in your job script should be sufficient in most cases.

Entries for sub-directories that do not exist (check with ls <your-install-location>) can be omitted.
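The same idea can be made slightly more defensive. In the sketch below, a small helper (prepend_path, a hypothetical name) prepends a directory to a search-path variable only if that directory exists, so missing sub-directories are skipped automatically and no empty entry is created when the variable was unset. The install location is a placeholder; a directory of Python packages can be added to PYTHONPATH the same way.

```shell
#!/bin/bash
# Placeholder: replace with the --prefix used when installing the software.
INSTALL_DIR="/project/ltXXXXXX-YYYY/mysoftware"

# prepend_path VAR DIR (hypothetical helper): put DIR in front of the
# colon-separated variable VAR if DIR exists, keeping earlier entries
# such as those added by "module load".
prepend_path() {
    local var="$1" dir="$2"
    [ -d "$dir" ] || return 0
    local current="${!var}"
    # ${current:+:${current}} adds ":old-value" only when VAR was non-empty.
    export "${var}=${dir}${current:+:${current}}"
}

prepend_path PATH            "${INSTALL_DIR}/bin"
prepend_path LD_LIBRARY_PATH "${INSTALL_DIR}/lib"
prepend_path LD_LIBRARY_PATH "${INSTALL_DIR}/lib64"
# prepend_path PYTHONPATH "<directory containing your Python packages>"
```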

4. Setting environment variables
Some software requires additional environment variables to be set at runtime, for example, the path to a temporary directory. Output environment variables set by Slurm sbatch (see Slurm sbatch - output environment variables) can be used to set software-specific parameters.
For applications with OpenMP threading, OMP_NUM_THREADS, OMP_STACKSIZE and ulimit -s unlimited are commonly set in a job script.
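For example, the lines below could go into the environment-variable section of a job script. Taking OMP_NUM_THREADS from SLURM_CPUS_PER_TASK keeps the thread count in step with -c/--cpus-per-task; the fallback of 1 thread and the 1G value for OMP_STACKSIZE are illustrative assumptions, not recommendations for any particular code.

```shell
# One OpenMP thread per CPU granted to each task (-c/--cpus-per-task);
# falls back to 1 when --cpus-per-task was not requested.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Per-thread stack size for OpenMP worker threads (example value).
export OMP_STACKSIZE=1G

# Remove the stack-size limit of the main thread; warn if not permitted.
ulimit -s unlimited || echo "warning: could not set unlimited stack size" >&2
```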

5. Running your software
Each software has its own command to be issued, so please read the software documentation and forums. Pay special attention to how the software recognizes and maps computing resources (CPU-MPI-GPU); occasionally, you may need to add input arguments at runtime. The total resources concurrently used in this stage should be less than or equal to the resources requested in Stage 1. Oversubscribing resources can reduce overall performance and could cause permanent damage to the hardware.

MPI programs are usually launched with srun, mpirun, mpiexec or aprun. On LANTA, the srun command MUST be used to launch MPI processes. The table below compares the corresponding options of these commands.

Command         | Total MPI processes | CPUs per MPI process  | MPI processes per node
srun            | -n, --ntasks        | -c, --cpus-per-task   | --ntasks-per-node
mpirun/mpiexec  | -n, -np             | --map-by socket:PE=N  | --map-by ppr:N:node
aprun           | -n, --pes           | -d, --cpus-per-pe     | -N, --pes-per-node

There is usually no need to pass these options to srun explicitly since, by default, Slurm derives them from the sbatch options, with the exception of --cpus-per-task.

For hybrid (MPI + multi-threading) applications, it is essential to specify the -c or --cpus-per-task option for srun to prevent a potential decrease in performance (>10%) due to improper CPU binding.
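Inside a job script, the launch line for a hybrid program might look like the sketch below. It assumes the #SBATCH header already sets the task layout and -c, that OMP_NUM_THREADS is set as described in Stage 4, and a hypothetical executable ./my_app; the fallback to 1 CPU per task is an assumption for headers without -c.

```shell
# The task layout (-N, -n, --ntasks-per-node) is inherited from the #SBATCH
# header, but --cpus-per-task is not, so repeat it explicitly for srun.
CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-1}"
srun -c "${CPUS_PER_TASK}" ./my_app
```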

3.2 Submitting a job

To submit your job script (e.g., job-script.sh) to Slurm, execute sbatch [options] job-script.sh [arguments]; the optional arguments are passed on to the job script.

If your software expects all its inputs to be in a case directory, you may need to submit your job from inside that directory or use the -D, --chdir option.
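For example, a submission that sets the job name, runs the job from a case directory and passes one argument to the script might look like this; the directory, the job name case01 and input.dat are hypothetical.

```shell
# Hypothetical case directory that holds all input files of this run.
CASE_DIR="/project/ltXXXXXX-YYYY/cases/case01"

# -J sets the job name, -D (--chdir) makes CASE_DIR the job's working
# directory, and input.dat is passed to job-script.sh as its first argument ($1).
sbatch -J case01 -D "${CASE_DIR}" job-script.sh input.dat
```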


Example

Installation guide

Reference