CVMFS and the Alliance software stack

CC CernVMFS on Grex#


CVMFS, or CernVM-FS, stands for CernVM File System. It provides a scalable, reliable and low-maintenance software distribution service. CVMFS was originally developed to help High Energy Physics (HEP) collaborations deploy software on the worldwide-distributed computing infrastructure used to run their data processing applications. Since then it has become a generic way of distributing software. Presently, we use CVMFS to provide the Alliance’s (formerly Compute Canada’s) software stack. Through the Alliance CVMFS servers, several other publicly available CVMFS software repositories are accessible as well: examples are a Singularity/Apptainer repository from OpenScienceGrid , the Extreme-Scale Scientific Software Stack E4S , and a genomics software collection (GenPipes/MUGQIC) from C3G . Note that we can only “pull” software from these repositories. To add or change software, datasets, etc., or to receive support, the respective organizations controlling these CVMFS repositories should be contacted directly.

Access to the software and data distributed via CVMFS should be transparent to Grex users: no action is needed other than loading a software module or setting a path. However, to access the Compute Canada software stack, a module must first be loaded to switch between software environments (see below).

Grex does not have a local CVMFS “stratum” (that is, a replica server); we only cache software items as they are requested. Thus, there can be a delay when a software item is pulled for the first time from the Alliance’s Stratum 1 replica servers located at the national HPC sites. This usually does not matter for serial programs, but parallel codes that rely on simultaneous process spawning across many nodes can hit timeout errors. It can therefore be useful to first access the codes in a small interactive job to warm up Grex’s local CVMFS cache.
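For example, the cache can be warmed from a short interactive job before submitting a large parallel run. Below is a minimal sketch: the resource values are illustrative, the module commands are explained in the next section, and the final module load lists hypothetical versions that should be replaced with the modules your job actually uses.

salloc --ntasks=1 --mem=4000M --time=0:30:00
module load CCEnv
module load arch/avx512
module load StdEnv/2023
module load gcc/12.3 openmpi/4.1.5   # illustrative modules; loading them populates the local cache
exit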

The Alliance’s software stack#


The main reason for supporting CVMFS on Grex is to give Grex users a software environment as similar as possible to the one on the Alliance’s national HPC machines. On Grex, the module tree of the Compute Canada software stack is not the default; it has to be loaded with the following commands:

module purge
module load CCEnv
module load arch/avx512
module load StdEnv/2023

After the above commands, use module spider to search for any software that might be available in the CC software stack.

Note that “default” environments (the StdEnv and arch modules of the CC stack) are not loaded automatically, unlike on CC / Alliance general purpose (GP) HPC machines. Therefore, it is good practice to load these modules right after the CCEnv module.

There is more than one StdEnv version to choose from; the example above is for the current StdEnv/2023 . Each “Standard Environment” of the Compute Canada software stack provides an “OS compatibility layer” in the form of gentoo or nixpkgs base OS packages, plus fixed versions of the core GCC compilers and of the GCC and Intel toolchains.
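The list of available Standard Environments can be checked after loading CCEnv; the output changes over time as new environments are added.

module load CCEnv
module spider StdEnv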

There are several CPU architectures in the CC software stack; they differ in the CPU instruction set the compilers use to generate binary code. The default for legacy systems like Grex used to be the lowest one, arch/sse3, which ensured that nothing would fail on the legacy Grex nodes (Nehalem, SSE4.2 architecture) due to more recent instructions such as AVX, AVX2 and AVX512 that Intel added later. However, the current StdEnv/2023 no longer supports the old CPUs, so CCEnv must only be used on the newer Grex partitions that support arch/avx2 and arch/avx512 (that is, on every partition other than “compute”).
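To see which instruction sets a particular node supports, the CPU flags can be inspected from within a job on that node; for example:

grep -o -E 'avx512f|avx2|sse4_2' /proc/cpuinfo | sort -u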

Some of the software items in the CC software stack may assume environment variables that are not set on Grex; one example is SLURM_TMPDIR. If your script fails for this reason, the following line can be added to the job script:

export SLURM_TMPDIR=$TMPDIR

While the majority of the CC software stack is built using OpenMPI, some items may be based on IntelMPI. These require the following additional environment variables in order to integrate with SLURM on Grex:

export I_MPI_PMI_LIBRARY=/opt/slurm/lib/libpmi.so
export I_MPI_FABRICS_LIST=shm:dapl

If a script assumes or relies on the mpiexec.hydra launcher, the latter might have to be given the -bootstrap slurm option.
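Putting it together, a job-script fragment for an IntelMPI-based code might look like the sketch below, where ./my_app is a placeholder for the actual program:

# Sketch for an IntelMPI-based program; ./my_app is a placeholder
export I_MPI_PMI_LIBRARY=/opt/slurm/lib/libpmi.so
export I_MPI_FABRICS_LIST=shm:dapl
srun ./my_app
# or, if the workflow insists on the Hydra launcher:
# mpiexec.hydra -bootstrap slurm ./my_app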

How to find software on CC CVMFS#


Compute Canada’s software building system automatically generates documentation for each item, which is published on the Available Software page. So the first place to look for a software item is probably that page. Note that it covers the default CPU architectures (AVX2, AVX512) of the national systems; legacy architectures (SSE3, AVX) do not necessarily have every software version and item compiled for them.

The module spider command can be used on Grex to search for modules that are actually available. Note that the CCEnv software stack is not loaded by default; you have to load it first so that the spider command can search through the CC software stack. The example below is for the Amber MM software:

module purge
module load CCEnv
module load arch/avx512 
module load StdEnv/2023
module spider amber

One of the available versions of Amber returned by the commands above would be amber/22.5-23.5 . A subsequent module spider amber/22.5-23.5 command then lists its dependencies. Once the available software versions and their dependencies are known, module load commands can be used, as described here or here
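Once the dependencies are known, the corresponding modules can be loaded; in the sketch below the prerequisite versions are illustrative and should be taken from the actual module spider output:

module load StdEnv/2023 gcc/12.3 openmpi/4.1.5   # illustrative prerequisites
module load amber/22.5-23.5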

How to request software added to CC CVMFS#


The Alliance (formerly Compute Canada) maintains and distributes the software stack as part of its mandate to operate the national HPC systems. To request that a software item be installed, the requestor should have an account in CCDB , which is also a prerequisite for access to Grex. Any CCDB user can submit such a request to support@tech.alliancecan.ca .

An example, R code with dependencies from CC CVMFS stack#

Below is a real-world example of using R on Grex, with several dependencies required by the R packages.

For dynamic languages like R and Python, the Alliance (formerly known as Compute Canada) does not, in general, provide or manage pre-installed packages. Rather, users are expected to load the base R (Python, Perl, Julia) module and then install the required R (or Python, Perl, Julia, etc.) packages locally in their home directories. Check the R documentation and Python documentation .
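For instance, an R package can be installed into the home directory from a login node or a short interactive job before submitting a job script like the one below. This is a minimal sketch; the R module version and the package name are illustrative.

module load CCEnv
module load arch/avx512
module load StdEnv/2023
module load r/4.3.1                      # illustrative version; check "module spider r"
mkdir -p ~/R/library
export R_LIBS=~/R/library
R -e 'install.packages("remotes", repos="https://cloud.r-project.org")'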

Script example for running R using the Alliance's software stack (CC cvmfs)
run-r-cc-cvmfs.sh
#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=0-72:00:00
#SBATCH --job-name="R-gdal-jags-bench"

# Load the modules:

module load CCEnv
module load nixpkgs/16.09 gcc/5.4.0
module load r/3.5.2 jags/4.3.0 geos/3.6.1 gdal/2.2.1

export MKL_NUM_THREADS=1

echo "Starting run at: `date`"

R --vanilla < Benchmark.R &> benchmark.${SLURM_JOBID}.txt

echo "Program finished with exit code $? at: `date`"

Notes on MPI-based software from CC Stack#

We recommend using a recent environment/toolchain that provides OpenMPI 3.1.x or later, which has a recent PMIx process management interface and supports the UCX interconnect libraries used on Grex. Earlier versions of OpenMPI might or might not work. With OpenMPI 3.1.x or 4.0.x, the srun command should be used in SLURM job scripts on Grex.

Below is an example of an MPI job (Intel benchmark) using the StdEnv/2018.3 toolchain (Intel 2018 / GCC 7.3.0 and OpenMPI 3.1.2).

Script example for running MPI program using the Alliance's software stack
run-mpi-cc-cvmfs.sh
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=0-1:00:00
#SBATCH --job-name="IMB-MPI1-4"

# Load the modules:

module load CCEnv
module load StdEnv/2018.3
module load imb/2019.3

module list

echo "Starting run at: `date`"

srun IMB-MPI1 > imb-ompi312-2x2.txt

echo "Program finished with exit code $? at: `date`"

If the script above is saved into imb.slurm, it can be submitted as follows:

sbatch imb.slurm
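After submission, the job can be monitored with squeue and, once it finishes, the benchmark output inspected:

squeue -u $USER
cat imb-ompi312-2x2.txt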

Notes on Restricted/Commercial software on CC Stack#


The Alliance (formerly Compute Canada) software stack is distributed in two ways: the open-source part of the stack goes to all non-CC systems, while the full stack goes to systems that honour the CCDB groups and ACL permissions controlling access to licensed, commercial software. Grex is presently a CCDB-based system and has full access to the CC software stack.

However, each proprietary item in the CC software stack comes with its own license and/or access conditions that we abide by. Thus, to request access to a particular piece of commercial software, the procedure must be looked up on the Alliance documentation site and followed up via support@tech.alliancecan.ca .

Many commercial items there are also BYOL (bring-your-own-license). An example is Matlab, where our users would provide UManitoba’s Matlab license even when using the code from CC CVMFS.
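For Matlab, a BYOL setup typically means pointing the MathWorks license-manager environment variable at the institutional license server before starting Matlab. The sketch below is hypothetical: the module version and the port@hostname value are placeholders, and the actual UManitoba license server address should be obtained from local support.

module load CCEnv
module load arch/avx512
module load StdEnv/2023
module load matlab/2023b.2                                    # illustrative version
export MLM_LICENSE_FILE=27000@license.example.umanitoba.ca    # placeholder value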

As of now, older Intel compiler modules on the CC CVMFS software stack do not match the license available on Grex. Thus, while all GCC compilers and GCC-based toolchains from the CC stack can be used for local code development on Grex, for Intel it depends on the version. The newest Intel oneAPI compilers (2023.x and later) are free to use and will work.
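For example, a recent oneAPI compiler module can be loaded and checked as follows; the version string is illustrative, and module spider intel shows what is actually available:

module load CCEnv
module load arch/avx512
module load StdEnv/2023
module load intel/2023.2.1     # illustrative version
icx --version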

Other software repositories available through CC CVMFS#

OpenScienceGrid repository for Singularity/Apptainer OSG software#


On Grex, we mount OSG repositories, mainly for Singularity/Apptainer containers provided through OSG . Pointing Singularity/Apptainer at the desired path under /cvmfs/singularity.opensciencegrid.org/ will automatically mount and fetch the required software items. Discovering them is up to the users; one way is simply to explore the directories under /cvmfs/singularity.opensciencegrid.org/ using the ls and cd commands.
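For example, a container found under that path can be explored and run directly with Apptainer; in the sketch below the image directory name is purely illustrative and should be replaced with one found by browsing the tree:

ls /cvmfs/singularity.opensciencegrid.org/
# the image path below is a hypothetical example; browse the tree for real ones
apptainer exec /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-ubuntu-20.04:latest/ cat /etc/os-release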

See more about using Singularity in our Containers documentation page.

E4S containers in the OSG repository of Singularity/Apptainer software#


In particular, the path /cvmfs/singularity.opensciencegrid.org/ecpe4s provides access to the containerized E4S software stack for HPC and AI applications .
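The available E4S images can be listed directly from that path (the contents change over time):

ls /cvmfs/singularity.opensciencegrid.org/ecpe4s/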

C3G repository for GenPipes/MUGQIC genomes and modules#


On Grex, the GenPipes/MUGQIC repositories should also be available through CC CVMFS. Please refer to the GenPipes/MUGQIC Documentation provided by C3G on how to use them.


AlphaFold data repository from ComputeCanada CVMFS#


On Grex, several genomics data repositories are available thanks to the efforts of the Alliance’s national Biomolecular teams. One of them is AlphaFold. At the time of writing, the current version of its databases can be seen as follows:

ls /cvmfs/bio.data.computecanada.ca/content/databases/Core/alphafold2_dbs/2024_01/

Thus, AlphaFold can be used on Grex with the CC software stack as described here .

A few other databases also seem to be available under:

/cvmfs/bio.data.computecanada.ca/content/databases/Core/