Containers for Software

Introduction


Linux containers are a means of isolating software dependencies from the base Linux operating system. Several Linux container engines exist, most notably Docker, which was the first to emerge as the most popular tool in the DevOps community.

Since then, major Linux players such as Google, Red Hat and others have done a lot of work to develop OCI, an open standard for container runtimes that grew out of Docker.

There are HPC-specific container engines/runtimes that offer similar or equivalent functionality but allow for easier integration with shared Linux HPC systems. At the time of writing, the most widely used of them is the Singularity container system, developed by a company called Sylabs, and its fork, a Linux Foundation project called Apptainer. They are compatible with each other. Singularity/Apptainer can run most Docker images by converting them to the Singularity Image Format (SIF). However, the native Singularity/Apptainer format is not completely OCI-compatible, so some Docker images will not work properly.

Finally, recent developments in Linux kernel namespaces have enabled projects such as “rootless Docker” and “rootless Podman”, which are more suitable for HPC systems than the original Docker implementation, which requires privileged access to the Linux system.

On Grex, Sylabs Singularity-CE is supported on the local SBEnv software stack, while Apptainer is supported as part of the ComputeCanada/Alliance CCEnv stack. At the time of writing, these engines can be used largely interchangeably.

New: There is also support for rootless Podman on Grex, for the use cases that require full OCI-compatibility.

Using Singularity from SBEnv on Grex


This section gives a brief introduction to getting started with Singularity. You do not need to install Singularity on Grex since it is already provided as a module.

Start with module spider singularity; it will list the current version. Due to the nature of container runtime environments, we update Singularity regularly, so the installed version is usually the latest one. Load the module (in the default Grex environment) by the following command:

module load singularity

With the singularity command, one can list the available Singularity commands and their options:

singularity help

To execute an application within the container, do it in the usual way for that application, but prefix the command with singularity exec image_name.sif or, if the container has a valid entry point, execute it with singularity run image_name.sif.

singularity run docker://ghcr.io/apptainer/lolcow

In the example above, Singularity downloads a Docker image from a registry and runs it immediately. Massive downloads from a single HPC system can get that system banned by container registries. To avoid this, it is advisable to “pull” or “build” containers first as local images, and then “run” and “exec” them locally.

singularity pull lolcow_local.sif docker://ghcr.io/apptainer/lolcow
# The above should create a local image lolcow_local.sif
# Let's run it with singularity
singularity run lolcow_local.sif
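The pull-once, run-locally pattern can be sketched as a small script. This is only an illustration: it reuses the image name and registry URI from the example above, and assumes the image contains /etc/os-release (true for most Linux base images, but not guaranteed).

```shell
# Pull the image once, then reuse the local SIF file for every run.
IMAGE="lolcow_local.sif"                      # local file name, as in the example above
SOURCE="docker://ghcr.io/apptainer/lolcow"    # registry URI, as in the example above

if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity not found; run 'module load singularity' first" >&2
else
    # Download only if the local image does not exist yet.
    if [ ! -f "$IMAGE" ]; then
        singularity pull "$IMAGE" "$SOURCE"
    fi
    # 'run' executes the container's entry point ...
    singularity run "$IMAGE"
    # ... while 'exec' runs an arbitrary command inside the container.
    singularity exec "$IMAGE" cat /etc/os-release
fi
```

Checking for the file first means that repeated jobs do not contact the registry again.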

For another example, to run R on an R script, using an existing container image named R-INLA.sif (INLA is a popular R library installed in the container):

singularity exec ./R-INLA.sif R --vanilla < myscript.R

Quite often, it is necessary to provide the containerized application with data residing outside of the container image. For running HPC jobs, the data usually resides on a shared filesystem such as /home or /project. This is done via bind mounts. Normally, the container bind-mounts $HOME, /tmp and the current working directory. It is possible to mount a subdirectory, such as $PWD/workdir, or an entire filesystem such as /project. The example below bind-mounts a ./workdir folder relative to the current path. The folder must exist before being bind-mounted.

singularity exec -B `pwd`/workdir:/workdir ./R-INLA.sif R --vanilla < myscript.R
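Because the bind source must exist beforehand, it can help to create it and to check what the container actually sees. The sketch below assumes the R-INLA.sif image from the example above is in the current directory; if it is not, or if Singularity is not loaded, the script only prints the command it would have run.

```shell
# Bind-mount a host directory into the container, creating it first if needed.
WORKDIR="$PWD/workdir"        # host directory to expose inside the container
IMAGE="./R-INLA.sif"          # image from the example above
mkdir -p "$WORKDIR"           # the bind source must exist before the container starts

# Host path and container path are separated by a colon: host:container
BIND_SPEC="$WORKDIR:/workdir"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # List the directory as seen from inside the container.
    singularity exec -B "$BIND_SPEC" "$IMAGE" ls -la /workdir
else
    echo "Would run: singularity exec -B $BIND_SPEC $IMAGE ls -la /workdir"
fi
```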

If you do not want to mount anything, so that the container’s environment is protected from overlapping data or code in, say, $HOME, use the --containall flag.

Some attention should be paid to Singularity’s local cache and temporary directories. Singularity caches the container images it pulls, along with Docker layers, under $HOME/.singularity. Containers can be large, up to tens of gigabytes, so they can easily accumulate and exhaust the user’s storage quota on $HOME. Users might therefore want to set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR variables to a location under their /global/scratch space.

For example, to change the location of SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR, before building the singularity image, one might run:

mkdir -p /global/scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/global/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/global/scratch/$USER/singularity/tmp"
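To see how much space the cache currently takes, something along the following lines may be useful. It is a sketch that assumes the default location mentioned above unless SINGULARITY_CACHEDIR is set; `singularity cache list` and `singularity cache clean` are the built-in commands for inspecting and emptying the cache.

```shell
# Inspect how much space the Singularity cache uses.
# Falls back to the default location when SINGULARITY_CACHEDIR is not set.
CACHE_PARENT="${SINGULARITY_CACHEDIR:-$HOME/.singularity}"

if [ -d "$CACHE_PARENT" ]; then
    du -sh "$CACHE_PARENT"       # total size on disk
else
    echo "No cache directory at $CACHE_PARENT yet"
fi

if command -v singularity >/dev/null 2>&1; then
    singularity cache list       # summary of cached images and layers
    # To free the space, run: singularity cache clean
fi
```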

Getting and building Singularity images


The commands singularity build and singularity pull fetch pre-built Singularity images from DockerHub, SingularityHub, or SyLabsCloud. “Pulling” images does not require elevated permissions (that is, sudo). There are several kinds of container repositories from which containers can be pulled. These repositories are distinguished by the URI scheme (library://, docker://, oras://, etc.).

module load singularity
# Building Ubuntu image using Sylabs Library
singularity build ubuntu_latest0.sif library://ubuntu
# or using DockerHub, this will create ubuntu_latest.sif
singularity pull docker://ubuntu:latest

Singularity (SIF) images can also be built from other local images, local “sandbox” directories, and from recipes. A Singularity recipe or definition file is a text file that specifies the base image and post-install commands to be performed on it. However, Singularity-CE requires sudo (privileged) access to build images from recipes, which is not available to users of HPC machines. There are two solutions to this problem: using a remote build service, or building with the --fakeroot option.

Make sure you understand licensing and intellectual property implications before using remote build services! The second (fakeroot) method appears to be easier and does not require an external account.

Singularity with GPUs


Add the --nv flag to the singularity run/exec/shell commands. Naturally, you should be on a node that has a GPU, in an interactive job. NVIDIA provides many pre-built Docker and Singularity container images on their GPU cloud, together with instructions on how to pull and run them. NVIDIA’s NGC Docker images should, as a rule, work on HPC machines with Singularity without any changes.
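A quick way to confirm that the GPU is visible inside a container is to run nvidia-smi through it. In the sketch below, my_gpu_image.sif is a hypothetical local image name. The command only runs when Singularity, a host nvidia-smi, and the image are all present; otherwise it is just printed.

```shell
IMAGE="my_gpu_image.sif"   # hypothetical local image; replace with your own

if command -v singularity >/dev/null 2>&1 \
   && command -v nvidia-smi >/dev/null 2>&1 \
   && [ -f "$IMAGE" ]; then
    # --nv makes the host NVIDIA driver libraries and GPU devices
    # available inside the container.
    singularity exec --nv "$IMAGE" nvidia-smi
else
    echo "Would run: singularity exec --nv $IMAGE nvidia-smi"
fi
```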

Singularity with OpenScienceGrid CVMFS


We can run Singularity containers distributed through the OSG CVMFS repository, which is currently mounted on Grex. The containers are distributed via CVMFS as unpacked directory images, so the way to access them is to find a directory of interest and point the singularity runtime to it. The directories will then be mounted and fetched automatically. The repository starts at /cvmfs/singularity.opensciencegrid.org/. You will need to know in advance what you are looking for in the subdirectories of that path. The example below explores, via the singularity shell command, the Remoll software distributed through OSG CVMFS by jeffersonlab:

module load singularity
singularity shell /cvmfs/singularity.opensciencegrid.org/jeffersonlab/remoll\:develop

The list of images present on the OSG CVMFS appears to be maintained on GitHub: OSG GitHub docker images.
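The repository can also be browsed directly on the filesystem. The sketch below lists the top level and the jeffersonlab subdirectory used in the example above; it makes no assumption about what else is published there, and simply reports when CVMFS is not mounted on the current host.

```shell
OSG_ROOT="/cvmfs/singularity.opensciencegrid.org"

if [ -d "$OSG_ROOT" ]; then
    ls "$OSG_ROOT"                    # top-level directories
    ls "$OSG_ROOT/jeffersonlab"       # images under the directory used above
else
    echo "$OSG_ROOT is not mounted on this host"
fi
```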

Using Apptainer from CCEnv on Grex


The Alliance’s (formerly ComputeCanada) software stack now provides Apptainer modules in the two latest Standard Environments, StdEnv/2020 and StdEnv/2023. Recent Apptainer versions (1.2.4 and newer) do not require “suexec” and thus can be used off the CVMFS as usual. The only caveat is to first unload any “singularity” or “apptainer” modules from other software stacks with module purge. Apptainer on the CCEnv stack is installed in suid-less mode.

The following commands show how to load Apptainer from CCEnv and run the lolcow image from the earlier example:

module purge
module load CCEnv
module load arch/avx512 
module load StdEnv/2023
module load apptainer
# testing if apptainer command works
apptainer version
# running the basic example
apptainer run docker://ghcr.io/apptainer/lolcow

As with Singularity, you will need to bind-mount the required data directories to access data outside the container. The best practices mentioned above for Singularity (pulling containers beforehand, controlling the cache location) apply equally to Apptainer. The environment variables for Apptainer use the APPTAINER_ prefix instead of SINGULARITY_.
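For instance, the cache and temporary directories could be redirected as follows. This is a sketch: the apptainer subdirectory name is an arbitrary choice, and the variables are only exported when the directories can actually be created (that is, on a system where /global/scratch exists).

```shell
# Keep Apptainer's cache and temporary files out of $HOME.
SCRATCH_DIR="/global/scratch/${USER:-$(id -un)}/apptainer"   # subdirectory name is arbitrary

if mkdir -p "$SCRATCH_DIR/cache" "$SCRATCH_DIR/tmp" 2>/dev/null; then
    export APPTAINER_CACHEDIR="$SCRATCH_DIR/cache"
    export APPTAINER_TMPDIR="$SCRATCH_DIR/tmp"
    echo "Apptainer cache: $APPTAINER_CACHEDIR"
else
    echo "Cannot create $SCRATCH_DIR; leaving Apptainer defaults in place" >&2
fi
```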

Building Apptainer images with “fakeroot”

Apptainer supports building SIF images from recipes without sudo access by using the --fakeroot option, where available.

On Grex, it can be used as in the following example:

module purge
module load CCEnv
module load StdEnv/2023
module load apptainer
# testing if apptainer command works
apptainer version
# build an image, INLA, from a local Singularity.def recipe
# --fakeroot makes the build possible without elevated access
apptainer build --fakeroot INLA.sif Singularity.def

The resulting SIF image should be compatible with both the Singularity-CE and Apptainer runtimes.
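The contents of the definition file are not shown above. As an illustration only, the sketch below writes a minimal, hypothetical recipe (an Ubuntu base image with R installed from the distribution packages; it does not install INLA) to example.def and builds it when Apptainer is available. The file name, base image tag, and package choice are all assumptions.

```shell
# Write a minimal definition file (hypothetical contents, for illustration only).
cat > example.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%post
    # Commands run inside the container at build time
    apt-get update
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends r-base
    rm -rf /var/lib/apt/lists/*

%environment
    export LC_ALL=C

%runscript
    # Entry point used by "apptainer run"
    exec R "$@"
EOF

# Build only where apptainer is available (for example after loading the module).
if command -v apptainer >/dev/null 2>&1; then
    apptainer build --fakeroot r-example.sif example.def
fi
```

The %post section holds the post-install commands mentioned earlier, and %runscript defines what `apptainer run` executes.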

Using Podman from SBEnv on Grex


Podman modules are now provided under the default Grex SBEnv environment. On Grex, Podman is configured as rootless. Podman is meant to be used by experienced users for jobs that cannot be executed as regular binaries, or through Singularity, because of OCI requirements. Grex is an HPC system, so users are expected to use Podman to run compute jobs rather than persistent services (including but not limited to databases and network services). Podman jobs and running Podman containers that are deemed inappropriate for HPC may therefore be terminated without notice.

It is forbidden to run Podman on login nodes; it must be run only on compute nodes (e.g. using sbatch or salloc).

The only allowed use on login nodes is to pull images before actually starting a job.

Access to the Podman runtime is through a module. Due to the nature of the container runtime environment, we strive to update Podman regularly, so in most cases the latest installed version should be used:

module load podman
podman version
podman run --rm docker.io/godlovedc/lolcow

The above command would pull a container image and run it.
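Since Podman must run on compute nodes, the same command would normally be wrapped in a batch job. The sketch below writes a small Slurm job script to podman_job.sh. The job name and resource requests are placeholders, not recommended values, and no partition is specified.

```shell
# Write a small Slurm job script; resource values are placeholders.
cat > podman_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=podman-test
#SBATCH --time=0-00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G

module load podman
podman run --rm docker.io/godlovedc/lolcow
EOF

# Submit it from a login node with:
#   sbatch podman_job.sh
echo "Job script written to podman_job.sh"
```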

Getting and Managing Podman images


When using Podman to run a job, we suggest manually pre-downloading the required image to avoid wasting time during the job. Grex hosts a local Docker Registry proxy/cache to improve download performance and to avoid the rate limits that various container registries can impose.

module load podman
podman pull _required_image_

The command podman pull image_name fetches Podman images from a container registry. Images can also be built from other images, or from containerfiles (e.g. Dockerfiles), using a command such as podman build -t image_name -f Containerfile context_dir, where context_dir is the directory holding the files used by the build. A containerfile is a text “recipe” that specifies the base image and commands to be run on it. Podman’s recipes are compatible with Dockerfiles.
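As an illustration, the sketch below writes a minimal, hypothetical Containerfile into a podman_example directory and builds it. The directory name, image tag, base image, and installed package are assumptions. The build and run steps only execute where podman is available, which on Grex means inside a job on a compute node.

```shell
# Create a build context directory with a minimal Containerfile (hypothetical contents).
mkdir -p podman_example
cat > podman_example/Containerfile <<'EOF'
FROM docker.io/library/ubuntu:22.04
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
CMD ["python3", "--version"]
EOF

if command -v podman >/dev/null 2>&1; then
    # -t names the resulting image, -f selects the Containerfile,
    # and the last argument is the build context directory.
    podman build -t localhost/python-example -f podman_example/Containerfile podman_example
    podman run --rm localhost/python-example
fi
```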

Podman, as configured on Grex, stores all pulled and locally built images inside the user’s HOME directory by default. Depending on the size of the images, the disk quota on HOME can be exhausted quickly. It is the user’s responsibility to manage their Podman images (delete the old/unused ones).

To manage pulled images, users can take advantage of the following commands:

# list images
podman image ls
# delete an unnecessary image
podman image rm <IMAGE_ID>

Podman with GPUs


Use the --device=nvidia.com/gpu=all flag when running a Podman container. Naturally, you should be on a node that has a GPU. NVIDIA provides many pre-built Docker container images on their NGC Cloud, together with instructions on how to pull and run them. Podman usually runs Docker containers without changes to the command-line parameters.
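A minimal check that the GPU is visible inside a Podman container might look like the sketch below. The image reference is an assumption (a CUDA base image tag that may need to be replaced with a current one from the registry). The command only runs when both podman and a host nvidia-smi are found; otherwise it is just printed.

```shell
# Assumed image tag; check the registry for a currently available one.
IMAGE="docker.io/nvidia/cuda:12.4.1-base-ubuntu22.04"

if command -v podman >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    # Expose all GPUs of the node to the container and list them from inside it.
    podman run --rm --device=nvidia.com/gpu=all "$IMAGE" nvidia-smi
else
    echo "Would run: podman run --rm --device=nvidia.com/gpu=all $IMAGE nvidia-smi"
fi
```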