Introduction#
Grex comes with a sizable software stack that contains most of the software development environment for typical HPC applications. This section of the documentation covers best practices for compiling and building your own software on Grex.
On Grex, login nodes can be used to compile software and to perform short interactive and/or test runs. All other jobs must be submitted to the batch system. User sessions on the login nodes are limited by cgroups to prevent resource congestion. Thus, it sometimes makes sense to perform some of the code development in interactive jobs, in cases such as (but not limited to):
- (a) the build process and/or tests require heavy, many-core computations, or
- (b) you need access to specific hardware that is not present on the login nodes, such as GPUs or newer/different CPUs.
Most of the software on Grex is available through environment modules. It is almost always necessary to use modules to load the current C, C++ and Fortran compilers and Python interpreters. To find a software development tool or a library to build your code against, the module spider command is a good start. Application software is usually installed by us from source into subdirectories under /global/software.
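For example, to locate a build tool and see how to load a particular version (the version shown is illustrative; use whatever module spider actually reports):
module spider cmake
module spider cmake/3.27.7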
It is almost always better to use the communication (MPI) libraries provided on Grex rather than building your own, because ensuring tight integration of these libraries with our SLURM scheduler and with the low-level, interconnect-specific libraries can be tricky.
General Linux Base OS notes#
The base operating system on Grex is a RedHat-type Linux. For many years it was CentOS Linux; since 2024 we have switched to AlmaLinux, a community-owned and -governed, RedHat-style distribution. The current OS is AlmaLinux 8.
AlmaLinux comes with its own set of development tools, and the RedHat environment does provide various developer toolsets and software channels. However, because RedHat's philosophy is to be a stable, server-focused distribution, these tools are usually rather old. For example, cmake, git, gcc and python are always a couple of years behind the current versions. Therefore, even for these basic tools you would more likely than not want to load a module with a newer version:
module load git
module load cmake
AlmaLinux also has its system versions of Python, Perl, and the GCC compilers. When no modules are loaded, their binaries are available in the PATH. Their purpose is to support system scripts and the compilation of OS packages, drivers and so on. We suggest using the versions provided through Modules and one of our Software Stacks instead.
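A quick way to check which toolchain you are currently picking up (the paths in the comments are illustrative):
which gcc          # /usr/bin/gcc when no modules are loaded
gcc --version      # the AlmaLinux 8 system GCC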
We do not install many packages for the dynamic languages (such as python-something) at the base OS level, because doing so makes maintaining different versions of them complicated. Use the module spider command to find a version of Perl, Python, R, etc. that suits your needs. The same applies to compiler suites like GCC and Intel.
We do install those AlmaLinux OS packages that are:
- base OS packages necessary for the system to function,
- graphical libraries that have many dependencies,
- packages that never change versions and are not critical for performance and/or security.
Some examples are FLTK, libjpeg, PCRE, Qt and Gtk. Login nodes of Grex have many -devel packages installed, while compute nodes do not, because we want them lean and quickly re-installable. Therefore, compiling code that requires -devel base OS packages might fail on compute nodes. Contact us if something like that happens when compiling or running your applications.
Finally, because HPC machines are shared systems and users do not have sudo access, following instructions from a Web page that ask you to apt-get install this or yum install that will fail. Instead, use module spider to see if the package you want is already installed and available as a module. If it is not, you can always contact support and ask for help to install the program either under your account or, when possible, as a module.
Compilers and Toolchains#
Due to the hierarchical nature of our Lmod modules system, compilers and certain core libraries (MPI and CUDA) form toolchains. Normally, you would need to choose a compiler suite (GCC, Intel or AOCC) and, for parallel applications, an MPI library (OpenMPI or IntelMPI). These come in different versions. You would also want to decide whether you need CUDA, should your applications be able to utilize GPUs. A combination of compiler/version, MPI/version and possibly CUDA makes a toolchain. Toolchains are mutually exclusive; you cannot mix software items compiled with different toolchains!
See the Using Modules page for more information.
There is no module loaded by default! There will be only the system's GCC-8 and no MPI whatsoever. To get started, load an architecture module, then a compiler/version, then, if necessary, an MPI (openmpi or intelmpi). If GPUs are required, a CUDA module needs to be loaded first, because it forms the root of the GPU-enabled toolchains (see the example after the GCC one below).
A typical sequence of commands to get an environment with the new Intel OneAPI compilers and OpenMPI, for the AVX512 compute nodes, is as follows:
module load arch/avx512
module load intel-one/2024.1
module load openmpi/4.1.6
The example below is for GCC 13 and openmpi:
module load arch/avx512
module load gcc/13.2.0
module load openmpi/4.1.6
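If GPUs are to be used, the CUDA module is loaded before the compiler, as noted above. An illustrative sequence follows; the CUDA module name and the versions here are assumptions, so check module spider for what is actually available:
module load arch/avx512
module load cuda
module load gcc/13.2.0
module load openmpi/4.1.6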
Compiler modules set the standard environment variables ($CC, $FC and $CXX) to the compiler names. The MPI wrappers (mpicc, mpicxx, mpif90 or mpifort, etc.) are set correctly by the MPI modules to point to the right compilers.
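A quick sanity check after loading one of the toolchains above (the output in the comments is illustrative for a GCC toolchain):
echo $CC $CXX $FC    # e.g. gcc g++ gfortran
mpicc --version      # the MPI wrapper reports the underlying compiler it invokes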
Intel compilers suite#
Intel has been providing an optimizing compiler suite for the Intel x86_64 CPU architecture for many years. Since 2023, the venerable “classic” Intel compilers have been gradually discontinued, and in 2024 they were replaced by the new Intel OneAPI compilers suite based on the open-source LLVM/Clang codebase. The “classic” compilers (icc, icpc, ifort) are replaced in the OneAPI suite with the new icx, icpx and ifx, respectively.
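A minimal sketch of invoking the OneAPI compilers in place of the classic ones (the file names are hypothetical):
icx -O2 hello.c -o hello      # replaces icc
icpx -O2 hello.cpp -o hello   # replaces icpc
ifx -O2 hello.f90 -o hello    # replaces ifort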
As a result of our CentOS to AlmaLinux upgrade, all the older Intel compiler versions were obsoleted and removed from the local Grex software stack.
The name for the Intel “classic” suite modules is intel; module spider intel is the command to find the available versions. The latest Intel Classic releases did include both the “classic” and the LLVM-based compilers.
On the Grex local software stack, the name for the new Intel OneAPI compilers suite modules is intel-one; module spider intel-one is the command to find the available OneAPI versions. These no longer contain old compilers like icc.
The Intel compilers suite also provides tools and libraries such as MKL (linear algebra, FFT, etc.), Intel Integrated Performance Primitives (IPP), Intel Threading Building Blocks (TBB), and VTune. Intel MPI, as well as MKL for the GCC compilers, is available as separate modules should it be needed on its own. Both the classic and the OneAPI compiler suites come with, and can use, the optimized performance libraries: Intel MKL, TBB and IPP.
GCC compilers suite#
GCC stands for “GNU Compiler Collection” and includes the C, C++ and Fortran languages (as well as many others, optionally).
The module name for GCC is gcc, as in module spider gcc.
Multi-lib GCC is not supported; thus, all the GCC modules are strictly 64-bit and unable to compile legacy 32-bit programs.
Recent GCC versions have good support for AVX512 CPU instructions on both Intel and AMD CPUs. However, care must be taken with -march=native, because the subsets of AVX512 implemented on Intel Cascade Lake and AMD Genoa CPUs differ, so it does matter on which host a given code has been compiled.
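If a code must run on both CPU types, one option is to build per-architecture binaries with explicit -march flags instead of -march=native. A sketch with a sufficiently recent GCC (the gcc/13 module, for example; file names are illustrative):
gcc -O2 -march=cascadelake -o prog prog.c   # Intel Cascade Lake nodes
gcc -O2 -march=znver4 -o prog prog.c        # AMD Zen4 (Genoa) nodes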
AOCC compilers suite#
AMD AOCC is an optimized compiler collection for C, C++ and Fortran. It generates code optimized for AMD CPUs, such as the newest Zen4 and Zen5 architectures.
The module name for the AOCC compiler bundle is aocc, as in module spider aocc. The compilers are called clang, clang++ and flang.
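A minimal sketch of compiling with AOCC for the AMD nodes (the file names and the -march value are illustrative):
clang -O3 -march=znver4 -o prog prog.c
flang -O3 -march=znver4 -o progf prog.f90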
MPI and Interconnect libraries#
The standard MPI distribution on Grex is OpenMPI; we build most of the software with it. To keep compatibility with the old Grex software stack, we name the modules openmpi. MPI modules depend on the compiler they were built with, which means that a compiler module should be loaded first; the dependent MPI modules then become available as well. Changing the compiler module triggers an automatic MPI module reload. This is how the Lmod hierarchy works now.
For a long time, Grex used the interconnect drivers and ibverbs packages from the InfiniBand hardware vendor, Mellanox. This is no longer the case: starting with CentOS-7, we switched to the vanilla Linux InfiniBand drivers, the open-source RDMA-core package, and the OpenUCX libraries. The current version of UCX on Grex is 1.6.1. Recent versions of OpenMPI (3.1.x and 4.0.x) do support UCX. Also, our OpenMPI is built with the process management interfaces PMI1, PMI2 and PMIx4, for tight integration with the SLURM scheduler.
The current default and recommended version of MPI is OpenMPI 4.1.
#load a compiler module first!
module load openmpi/4.1.6
All MPI modules, be it OpenMPI or IntelMPI, set the MPI compiler wrappers (mpicc, mpicxx, mpif90) to the compiler suite they were built with. The typical workflow for building parallel programs with MPI is to first load a compiler module, then an MPI module, and then use the C, C++ or Fortran wrapper in your makefile or build script.
In case a build or configure script does not want to use the wrappers and needs explicit compiler and link options for MPI, the OpenMPI wrappers provide the --show option that lists the required command line options. Try, for example:
mpicc --show
to print the include and library flags the C compiler needs to link against the currently loaded OpenMPI version.
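Putting this together, a minimal sketch of building and test-running an MPI program with a GCC toolchain (versions and file names are illustrative):
module load arch/avx512 gcc/13.2.0 openmpi/4.1.6
mpicc -O2 -o hello_mpi hello_mpi.c
srun -n 4 ./hello_mpi      # inside a SLURM job or interactive allocation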
There is also IntelMPI, for which the modules are named intelmpi. See the notes on running MPI applications under SLURM here.
Linear Algebra BLAS/LAPACK#
Linear algebra packages are used in most STEM research software. A very popular suite of libraries is BLAS and LAPACK from NetLib, written in Fortran and C. However, modern CPU architectures, with their complex instruction sets and memory hierarchies, are too complex for compilers to generate optimal code for automatically. Various optimizations can improve BLAS and LAPACK performance at least tenfold compared to the reference NetLib versions. Thus, it is always a good idea to use linear algebra libraries that are optimized for a given CPU architecture, such as the vendor-optimized Intel MKL and AMD AOCL, OpenBLAS, or similar. These libraries are provided as modules on HPC systems.
It is worth noting that the linear algebra libraries may come in two versions: one with 32-bit array indices and another with full 64-bit indices. Users must pay attention and link against the proper version for their software (for example, a Fortran code compiled with -i8 or -fdefault-integer-8 should link against the 64-bit-integer BLAS).
Intel MKL#
The fastest BLAS/LAPACK implementation from Intel, for Intel CPUs. With the Intel compilers, it can be enabled with a convenient compiler flag: -qmkl (the older classic compilers used -mkl), or -qmkl=sequential if the threaded version is not needed.
With both the Intel and GCC compilers, the MKL libraries can also be linked explicitly with compiler/linker options. The base path for the MKL includes and libraries is defined in the MKLROOT environment variable. For the GCC compilers, module load mkl is needed to add MKLROOT to the environment. There is a command line advisor website to pick the correct link order and libraries. Libraries with the _ilp64 suffix are for 64-bit indices, while _lp64 libraries are for the default 32-bit indices.
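For instance, a sketch of an explicit link line for a C code with GCC against the sequential, 32-bit-index (_lp64) MKL; verify the exact library set with the advisor:
gcc -O2 main.c -I$MKLROOT/include -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl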
Note that when the multi-threaded MKL is used, the number of threads is controlled with the MKL_NUM_THREADS environment variable. In the Grex software stack, the MKL module sets it to 1 to prevent accidental CPU oversubscription. Redefine it in your SLURM job scripts if you really need threaded MKL execution, as follows:
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
We use MKL's BLAS and LAPACK for compiling R and Python's NumPy package on Grex; that is one example where threaded MKL can speed up computations through SMP, if the code spends significant time in linear algebra routines.
MKL distributions also include ScaLAPACK and FFTW libraries.
OpenBLAS#
OpenBLAS is the successor and continuation of the famous GotoBLAS2 library. It contains both BLAS and LAPACK in a single library, libopenblas.a. Only the BLAS portion of the library is CPU-optimized, though, so the performance of its LAPACK would lag behind Intel MKL, while the performance of its BLAS is close to that of MKL. Use module spider openblas to find the available versions for a given compiler suite. We provide versions with both 32-bit and 64-bit indices (and reflect that in the version names, like openblas/0.3.7-i32).
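A minimal sketch of linking against it, assuming the openblas module adds its library directory to the compiler's search paths (the version name is an example):
module load gcc/13.2.0 openblas/0.3.7-i32
gcc -O2 -o prog prog.c -lopenblas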
OpenBLAS does not contain ScaLAPACK, which would have to be loaded as a separate module.
AMD AOCL#
AMD provides its vendor-optimized versions of the BLIS and libFLAME libraries, which are modern implementations of BLAS and LAPACK, respectively. Use module spider aocl to see how to load it. AOCL also includes ScaLAPACK for OpenMPI.
ScaLAPACK#
ScaLAPACK is “a library of high-performance linear algebra routines for parallel distributed memory machines.” It is almost always used together with an optimized BLAS/LAPACK implementation. Being a parallel library, ScaLAPACK depends on BLACS, which in turn depends on an MPI library.
Intel MKL includes ScaLAPACK with support for both IntelMPI and OpenMPI in the BLACS layer. MKL also provides ScaLAPACK with both 32-bit and 64-bit integer interfaces. It is thus necessary to pick the right libraries to link against; the command line advisor is helpful for that.
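As an illustration, a possible link line against the sequential, 32-bit-index MKL ScaLAPACK with the OpenMPI BLACS, for an Intel Fortran code (with gfortran, the mkl_gf_lp64 interface library is used instead; check the advisor for your exact case, and note that the source file name is hypothetical):
mpifort -O2 solver.f90 -L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -lm -ldl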
AMD AOCL includes a single ScaLAPACK library compatible with OpenMPI 4.1.
OpenBLAS does not come with ScaLAPACK and needs a separate ScaLAPACK module to be loaded for the latter.
Fast Fourier Transform (FFTW)#
FFTW3 is the standard, well-performing implementation of FFT; module spider fftw should find it. There is a parallel version of FFTW3 that depends on the MPI it uses; thus, to load the fftw module, compiler and MPI modules have to be loaded first. MKL also provides FFTW bindings, which can be used as described below.
Both the Intel and the GCC MKL modules set the MKLROOT environment variable and add the necessary directories to LD_LIBRARY_PATH. MKLROOT is handy when linking explicitly against the libraries. It is useful if you want to select a particular compiler interface (Intel or GCC), pointer width (the corresponding libraries have the suffix _lp64 for 32-bit indices and _ilp64 for 64-bit ones; the latter is needed, for example, for Fortran codes with INTEGER*8 array indices, either explicit or set by the -i8 compiler option), and the kind of MPI library to be used in BLACS (OpenMPI or IntelMPI, both of which are available on Grex). An example of the linker options to link an Intel Fortran code against the sequential, 64-bit-index version of BLAS and LAPACK is:
ifort -O2 -i8 main.f -L$MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm
The MKL FFTW bindings must be enabled separately from the general Intel compilers installation; therefore, the details of their usage might differ between clusters. On Grex, these libraries are present in two versions: 32-bit indices (libfftw3xf_intel_lp64) and 64-bit indices (libfftw3xf_intel_ilp64). To link against these FFT libraries, the following include and library options can be passed to the compilers (for the _lp64 case):
-I$MKLROOT/include/fftw -I$MKLROOT/interfaces/fftw3xf -L$MKLROOT/interfaces/fftw3xf -lfftw3xf_intel_lp64
The above line is, admittedly, rather elaborate, but it gives the benefit of compiling and building all the code with MKL, without the need to maintain a separate library such as FFTW3.
AOCL provides an optimized FFTW dynamic library included in the aocl module.
HDF5 and NetCDF#
These are popular hierarchical data formats. Two versions exist in the Grex software stack: a serial one and an MPI-dependent one. Which one you load depends on whether an MPI module is loaded.
To see the available versions, use:
module spider hdf5
and/or:
module spider netcdf
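With the Lmod hierarchy, the MPI-dependent builds only become loadable once a compiler and an MPI module are loaded; an illustrative check (versions are examples):
module load gcc/13.2.0
module avail hdf5          # serial builds for this compiler
module load openmpi/4.1.6
module avail hdf5          # MPI-dependent builds now listed as well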
Python#
There are modules for Python versions that we build from source using optimizations specific to our HPC hardware.
Note that the base OS python should in most cases not be used; rather find and use a module!
module spider python
We do install some of the most popular Python packages centrally; pip list will show the installed packages.
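For example (the version below is illustrative; pick one that module spider python actually lists):
module load python/3.11
python --version
pip list | head -n 20      # a sample of the centrally installed packages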
R#
We build R from source and link it against MKL. We find that some packages only work with GCC-compiled versions of R, so R requires using one of the GCC toolchains.
module spider "r"
Several of the most popular R packages are installed with the R modules on Grex. Note that R packages are often bindings for other software (JAGS, GEOS, GSL, PROJ, etc.) and require that software or its dynamic libraries to be available at runtime. This means the modules for these dependencies (JAGS, GEOS, GSL, PROJ) must also be loaded when R is loaded.
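A sketch of such a session; the module names and versions for R and its dependencies below are assumptions, so check module spider for each of them:
module load gcc/13.2.0 openmpi/4.1.6   # one of the GCC toolchains, as R requires
module load r                          # pick a version reported by module spider "r"
module load gsl geos proj jags         # hypothetical names for the dependency modules
Rscript -e 'sessionInfo()'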