Running jobs on Grex

Introduction


Jobs on the cluster are run through a scheduler (Slurm). This section presents job scripts for different types of applications, each with its own directives: serial, OpenMP, MPI, hybrid and GPU.

Interactive job via salloc


In this example, we test LAMMPS interactively via salloc. All the scripts and instructions are under the directory interactive.

First, request an interactive job using salloc:

salloc --nodes=1 --ntasks=2 --cpus-per-task=1 --mem-per-cpu=500M --time=15:00

If needed, add other options to the salloc command: salloc {+other options}

Once the job is granted, run the commands:

hostname
env | grep SLURM
sq

The above will show the name of the node, all slurm environment variables and the current jobs under your name.
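The full output of env | grep SLURM can be long. As a minimal sketch, the helper below prints only a few commonly used variables. The function name show_slurm_vars is hypothetical, and some of these variables are only set when the corresponding option was requested (for example --cpus-per-task or --mem-per-cpu).

```shell
# Print selected Slurm variables; "unset" is shown for any variable
# that is not defined (for example when run outside a job).
show_slurm_vars() {
    for name in SLURM_JOB_ID SLURM_JOB_NODELIST SLURM_NTASKS \
                SLURM_CPUS_PER_TASK SLURM_MEM_PER_CPU; do
        eval "value=\${$name:-unset}"
        printf '%-20s %s\n' "$name" "$value"
    done
}

show_slurm_vars
```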

Before starting the test, load the LAMMPS module:

HINT: use module spider lammps

module load arch/avx512  gcc/13.2.0  openmpi/4.1.6 lammps/2024-08-29p1

After loading the modules, run the commands:

module list
module show lammps
ls $MODULE_LAMMPS_PREFIX/bin

Make sure that the binary lmp is available in your environment:

which lmp
/global/software/alma8/sb/opt/arch-avx512-gcc-13.2.0-openmpi-4.1.6/lammps/2024-08-29p1/bin/lmp

Now, run the lammps test using:

srun lmp -in lammps-input.in

or run it via a script:

sh ./grex-runlmp-interactive.sh

The above shows how to run an interactive job for testing and debugging. If you have many commands, you can bundle them in a script, as done above.

In summary, here are the steps to follow when running interactive jobs to test and debug your programs and scripts before submitting jobs via sbatch:

  • salloc {+options} to ask for a compute node.
  • load the modules that are needed for your workflow.
  • run tests and debug …
  • exit to end the interactive job and go back to the login node.
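The "run tests" step can be wrapped in a small helper that fails early when the environment is not ready. This is only a sketch: the function name run_lammps_test is hypothetical and it is not the content of grex-runlmp-interactive.sh. It assumes the modules have already been loaded inside the salloc session.

```shell
# Hypothetical helper: run a short LAMMPS test inside an salloc session.
run_lammps_test() {
    input=${1:-lammps-input.in}
    # Fail early if the lammps module has not been loaded
    if ! command -v lmp >/dev/null 2>&1; then
        echo "lmp not found: load the lammps module first" >&2
        return 1
    fi
    # Fail early if the input file is missing
    if [ ! -f "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi
    srun lmp -in "$input"
}
```

Inside the interactive job it would be called as run_lammps_test lammps-input.in.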

Sleep job


In this example, we test a sleep job submitted via sbatch. All the scripts and instructions are under the directory sleep-job. In addition to running the sleep command, the job also prints the Slurm environment variables.

Under the directory sleep-job, use the cat command to see the content of the script:

cd sleep-job
cat grex-sleep-job.sh

Now, submit the job using:

sbatch grex-sleep-job.sh

  • If there are errors, fix them and/or add the options via the command line, like:

sbatch --partition=genoa grex-sleep-job.sh

  • What is the job ID of your job?

  • See if your job is in the queue by running the command: sq

  • Once the job is done, inspect the output: slurm-<JOB ID>.out
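On success, sbatch normally prints a line of the form "Submitted batch job" followed by the job ID. As a small sketch, the ID can be captured for later use. The function name extract_job_id is hypothetical and the job number in the demonstration is made up.

```shell
# Extract the job ID from the line that sbatch prints on success.
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Illustrative input only: the job number below is made up.
echo "Submitted batch job 123456" | extract_job_id
```

On the cluster this could be used as jobid=$(sbatch grex-sleep-job.sh | extract_job_id). The option sbatch --parsable is an alternative that prints the job ID directly. Unless the job script sets --output, the job output goes to slurm-${jobid}.out.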

Serial job


In this example, we test LAMMPS using a serial job via sbatch. All the scripts and instructions are under the directory serial-job.

Under the directory serial-job, use the cat command to see the content of the script:

cd serial-job
cat grex-runlmp-1cpu-serial.sh

Now, submit the job using:

sbatch grex-runlmp-1cpu-serial.sh

OpenMP job


In this example, we test LAMMPS using OpenMP jobs via sbatch. All the scripts and instructions are under the directory openmp-job.

Under the directory openmp-job, use the cat command to see the content of the scripts:

cd openmp-job
cat grex-runlmp-2cpu-openmp.sh
cat grex-runlmp-4cpu-openmp.sh
cat grex-runlmp-8cpu-openmp.sh
cat grex-runlmp-16cpu-openmp.sh

Now, submit the jobs using:

sbatch grex-runlmp-2cpu-openmp.sh
sbatch grex-runlmp-4cpu-openmp.sh
sbatch grex-runlmp-8cpu-openmp.sh
sbatch grex-runlmp-16cpu-openmp.sh

Monitor your jobs and inspect the output files:

sq
sq -j <JOB ID>
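The scripts in openmp-job are not reproduced here. As a rough sketch of what a 4-CPU OpenMP job script might look like, the example below sets the thread count from the CPUs granted by Slurm. The resource values, the input file name and the commented launch line are assumptions, not the content of the actual scripts.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=0-1:00:00

# Match the OpenMP thread count to the CPUs granted by Slurm.
# Outside a job SLURM_CPUS_PER_TASK is unset, so fall back to 1 thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Using ${OMP_NUM_THREADS} OpenMP thread(s)"

# Assumed launch line; compare with the actual scripts in openmp-job:
# lmp -sf omp -pk omp "${OMP_NUM_THREADS}" -in lammps-input.in
```

Deriving OMP_NUM_THREADS from SLURM_CPUS_PER_TASK keeps the thread count and the requested CPUs in sync when only the #SBATCH line is edited.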

MPI job


In this example, we test LAMMPS using MPI jobs via sbatch. All the scripts and instructions are under the directory mpi-job.

Under the directory mpi-job, use the cat command to see the content of the scripts:

cd mpi-job
cat grex-runlmp-16cpu-mpi.sh
cat grex-runlmp-2cpu-mpi.sh
cat grex-runlmp-4cpu-mpi.sh
cat grex-runlmp-8cpu-mpi.sh
cat grex-runlmp-32cpu-mpi-4nodes.sh
cat grex-runlmp-32cpu-mpi.sh
cat grex-runlmp-64cpu-mpi.sh

Now, submit the jobs using:

sbatch grex-runlmp-16cpu-mpi.sh
sbatch grex-runlmp-2cpu-mpi.sh
sbatch grex-runlmp-4cpu-mpi.sh
sbatch grex-runlmp-8cpu-mpi.sh
sbatch grex-runlmp-32cpu-mpi-4nodes.sh
sbatch grex-runlmp-32cpu-mpi.sh
sbatch grex-runlmp-64cpu-mpi.sh
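The single-node-style scripts follow one naming pattern, so they can also be submitted in a loop. The sketch below defaults to a dry run that only prints the commands. The variable SUBMIT and the function name submit_mpi_scaling are hypothetical, and the 4-node script is left out because its name does not follow the pattern.

```shell
# SUBMIT defaults to a dry run that only prints each command;
# set SUBMIT=sbatch on the cluster to really submit.
SUBMIT=${SUBMIT:-"echo sbatch"}

submit_mpi_scaling() {
    for n in 2 4 8 16 32 64; do
        $SUBMIT "grex-runlmp-${n}cpu-mpi.sh"
    done
}

submit_mpi_scaling
```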

Hybrid job (MPI+OpenMP)


In this example, we test LAMMPS using hybrid jobs (MPI and OpenMP) via sbatch. All the scripts and instructions are under the directory hybrid-job.

Under the directory hybrid-job, use the cat command to see the content of the scripts:

cd hybrid-job
cat grex-runlmp-16tasks-2threads.sh
cat grex-runlmp-32cpu-mpi.sh
cat grex-runlmp-8tasks-4threads.sh

Now, submit the jobs using:

sbatch grex-runlmp-16tasks-2threads.sh
sbatch grex-runlmp-32cpu-mpi.sh
sbatch grex-runlmp-8tasks-4threads.sh

The above will submit LAMMPS jobs that each use a total of 32 cores:

  • grex-runlmp-16tasks-2threads.sh: 16 MPI Tasks and 2 Threads/Task ==> A total of 32 cores.
  • grex-runlmp-32cpu-mpi.sh: 32 MPI Tasks and 1 Thread/Task ==> A total of 32 cores.
  • grex-runlmp-8tasks-4threads.sh: 8 MPI Tasks and 4 Threads/Task ==> A total of 32 cores.
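In each case the number of MPI tasks times the number of threads per task gives the total core count. As an illustration, the sketch below lists the power-of-two task/thread splits for a given total, written as the Slurm options --ntasks and --cpus-per-task. The function name hybrid_layouts is hypothetical, and it is an assumption that the scripts in hybrid-job request resources with exactly these options.

```shell
# List tasks x threads combinations whose product equals the total core count.
hybrid_layouts() {
    total=$1
    threads=1
    while [ "$threads" -le "$total" ]; do
        if [ $((total % threads)) -eq 0 ]; then
            echo "--ntasks=$((total / threads)) --cpus-per-task=${threads}"
        fi
        threads=$((threads * 2))
    done
}

hybrid_layouts 32
```

Which split performs best depends on the application and the node architecture, so it is worth timing a few of them on a small test case.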

GPU job


In this directory, there are two examples: the first runs LAMMPS on a GPU using singularity and the second uses apptainer.

First, pull the container:

On Grex, we use singularity:

module load singularity
singularity pull docker://nvcr.io/hpc/lammps:patch_3Nov2022

On CC/MC, we use apptainer:

module load apptainer
apptainer pull docker://nvcr.io/hpc/lammps:patch_3Nov2022

The above will build the image lammps_patch_3Nov2022.sif (about 560M) that can be used to run lammps.

Here is an example of a script to run LAMMPS with this container on Grex:

Example of a job script to run LAMMPS using singularity.
#!/bin/bash

#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=12000M
#SBATCH --time=0-1:00:00
#SBATCH --partition=livi-b
#SBATCH --job-name=GPU

#Load the modules:

module load singularity
module load cuda/12.9.1

echo "Starting run at: `date`"

singularity run --nv -B $PWD:/host_pwd --pwd /host_pwd ./lammps_patch_3Nov2022.sif ./run_lammps.sh

echo "Program finished with exit code $? at: `date`"

The job script above calls a bash script, run_lammps.sh, which contains the command line for running LAMMPS:

#!/bin/bash

gpu_count=1
input="in.lj"

echo "Running Lennard Jones 8x4x8 example on ${gpu_count} GPUS..."

mpirun -n ${gpu_count} lmp -k on g ${gpu_count} -sf kk -pk kokkos cuda/aware on neigh full comm device binsize 2.8 -var x 8 -var y 4 -var z 8 -in ${input} -log output_lammps-gpu-${SLURM_JOBID}.txt

The job requires an input file in.lj:

Example of input file used to run LAMMPS.
# 3d Lennard-Jones melt

units		lj
atom_style	atomic

lattice		fcc 0.8442
region		box block 0 200 0 200 0 200
create_box	1 box
create_atoms	1 box
mass		1 1.0

velocity	all create 3.0 87287

pair_style	lj/cut 2.5
pair_coeff	1 1 1.0 1.0 2.5

neighbor	0.3 bin
neigh_modify	every 20 delay 0 check no

fix		1 all nve

thermo		25

run		20000

#write_data     config.end_melt

# End of the Input file.

The same scripts can be adapted and used to run LAMMPS using apptainer.
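One way to keep a single job script for both systems is to detect which runtime is available. The sketch below is an assumption-laden illustration: the function name pick_container_runtime is hypothetical, and it relies on apptainer accepting the same run options (--nv, -B, --pwd) as the singularity command shown earlier. It only prints the command it would run.

```shell
# Print the name of the first container runtime found in PATH.
pick_container_runtime() {
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container runtime found: load the singularity or apptainer module" >&2
    return 1
}

if runtime=$(pick_container_runtime); then
    echo "Would run: $runtime run --nv -B \$PWD:/host_pwd --pwd /host_pwd ./lammps_patch_3Nov2022.sif ./run_lammps.sh"
fi
```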

Note: On Grex, it is also possible to use podman and pyxis.

Example of time-out job


In this example, we test LAMMPS using a serial job via sbatch while asking for a very short time. All the scripts and instructions are under the directory time-out-job. The goal is to reproduce the TIMEOUT message for the job.

Under the directory time-out-job, use the cat command to see the content of the script:

cd time-out-job
cat grex-runlmp-1cpu-serial.sh

Now, submit the job using:

sbatch grex-runlmp-1cpu-serial.sh

Once the job is done, run the commands:

squeue -j <JOB ID>
cat slurm-<JOB ID>.out

Ask for more time and resubmit the job:

sbatch --time=1:00:00 grex-runlmp-1cpu-serial.sh

Alternatively, edit the job script and increase the time:

#SBATCH --time=0-1:00:00

and re-submit the job.
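Slurm accepts several time formats: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds. A value such as 15:00 therefore means 15 minutes, not 15 hours. The sketch below converts a time specification into seconds to make this explicit. The function name slurm_time_to_seconds is hypothetical.

```shell
# Convert a Slurm time specification into seconds.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; h = 0; m = 0; s = 0
        spec = $0
        if (index(spec, "-") > 0) {
            # days-hours[:minutes[:seconds]]
            split(spec, parts, "-")
            days = parts[1]
            n = split(parts[2], f, ":")
            h = f[1]
            if (n >= 2) m = f[2]
            if (n >= 3) s = f[3]
        } else {
            n = split(spec, f, ":")
            if (n == 1)      { m = f[1] }                      # minutes
            else if (n == 2) { m = f[1]; s = f[2] }            # minutes:seconds
            else             { h = f[1]; m = f[2]; s = f[3] }  # hours:minutes:seconds
        }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

slurm_time_to_seconds 15:00       # 15 minutes -> 900
slurm_time_to_seconds 1:00:00     # 1 hour     -> 3600
slurm_time_to_seconds 0-1:00:00   # 0 days and 1 hour -> 3600
```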

Example of out of memory kill job


In this example, we test LAMMPS using a serial job via sbatch while asking for too little memory. All the scripts and instructions are under the directory oom-kill-job. The goal is to reproduce the oom-kill event for the job.

Under the directory oom-kill-job, use the cat command to see the content of the script:

cd oom-kill-job
cat grex-runlmp-1cpu-serial.sh

Now, submit the job using:

sbatch grex-runlmp-1cpu-serial.sh

Once the job is done, run the commands:

squeue -j <JOB ID>
cat slurm-<JOB ID>.out

Ask for more memory and re-submit the job:

sbatch --mem=3000M grex-runlmp-1cpu-serial.sh

Alternatively, edit the job script and increase the memory:

#SBATCH --mem=3000M

and re-submit the job.
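Both failure modes usually leave a recognizable message in the job output file. As a sketch, the helper below searches a Slurm output file for these messages. The function name classify_slurm_log is hypothetical, and the exact wording of the messages varies between Slurm versions, so the patterns are assumptions that may need adjusting.

```shell
# Report whether a Slurm output file mentions a time limit or an oom-kill event.
classify_slurm_log() {
    log=$1
    if grep -q 'DUE TO TIME LIMIT' "$log"; then
        echo "timeout"
    elif grep -Eiq 'oom[-_]kill' "$log"; then
        echo "out-of-memory"
    else
        echo "no known failure message"
    fi
}
```

It would be called as classify_slurm_log followed by the name of the job output file. The job state, elapsed time and memory usage can also be checked with sacct -j <JOB ID> --format=JobID,State,Elapsed,MaxRSS,ReqMem.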

Performance of MPI and OpenMP jobs


In this example, we test LAMMPS using MPI and/or OpenMP to compare the performance and see how the program scales with the number of CPUs. All the scripts and instructions are under the directory performance.

When using parallel programs (OpenMP and/or MPI based applications), it is highly recommended to test how a program scales with the number of CPUs. Increasing the number of CPUs does not always increase the performance of the code, especially for OpenMP based programs. The idea is to take a small test case and run it using different numbers of CPUs: 1, 2, 4, 8, … etc. While MPI programs can run across nodes and use multiple CPUs, OpenMP codes run only on one node. Therefore, the maximum number of threads should not exceed the number of physical cores available on the node.

The command seff can be used to see the CPU efficiency.

For this example, we used LAMMPS and ran it on Grex using OpenMP and MPI (OpenMPI). At the end of the run, this code prints the performance of the simulation in terms of Tau/day or ns/day and/or TimeStep/Second.

Tests using OpenMP:

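| Job | CPUs | Tau/day    | TimeStep/Second | CPU efficiency | Wall-clock time |
|-----|------|------------|-----------------|----------------|-----------------|
| 01  | 1    | 32688.292  | 75.667          | 99.43%         | 00:44:05        |
| 02  | 2    | 68614.701  | 158.830         | 99.37%         | 00:21:03        |
| 03  | 4    | 132718.066 | 307.218         | 99.05%         | 00:10:55        |
| 04  | 8    | 158637.644 | 367.217         | 98.93%         | 00:09:08        |
| 05  | 16   | 278099.343 | 643.748         | 96.96%         | 00:05:19        |
| 06  | 32   | 274380.425 | 635.140         | 98.26%         | 00:05:19        |
| 07  | 64   | 195418.938 | 452.359         | 98.69%         | 00:07:23        |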

Tests using MPI:

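| Job | CPUs | Tau/day     | TimeStep/Second | CPU efficiency | Wall-clock time |
|-----|------|-------------|-----------------|----------------|-----------------|
| 01  | 1    | 29112.557   | 67.390          | 99.33%         | 00:49:32        |
| 02  | 2    | 56738.337   | 131.339         | 99.25%         | 00:25:27        |
| 03  | 4    | 111917.318  | 259.068         | 99.03%         | 00:12:56        |
| 04  | 8    | 194286.703  | 449.738         | 98.58%         | 00:07:29        |
| 05  | 16   | 440557.932  | 1019.810        | 97.23%         | 00:03:21        |
| 06  | 32   | 730610.193  | 1691.227        | 97.03%         | 00:02:02        |
| 07  | 64   | 1464824.869 | 3390.798        | 95.19%         | 00:01:03        |
| 08  | 72   | 2214538.501 | 5126.247        | 96.03%         | 00:00:42        |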
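To compare runs, it helps to turn the raw TimeStep/Second values into a speedup (relative to the 1-CPU run) and a parallel efficiency (speedup divided by the number of CPUs). The sketch below does this for the MPI results above. The function name scaling_report is hypothetical, and it assumes that the first input line is the 1-CPU run.

```shell
# Input: one "CPUs TimeStep/Second" pair per line, first line being the 1-CPU run.
# Output columns: CPUs, speedup, parallel efficiency.
scaling_report() {
    awk 'NR == 1 { base = $2 }
         {
             speedup = $2 / base
             printf "%4d %8.2f %7.1f%%\n", $1, speedup, 100 * speedup / $1
         }'
}

# MPI results from the table above (CPUs, TimeStep/Second)
scaling_report <<'EOF'
1 67.390
2 131.339
4 259.068
8 449.738
16 1019.810
32 1691.227
64 3390.798
72 5126.247
EOF
```

Feeding the OpenMP values through the same function gives a direct comparison of how the two approaches scale for this test case.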

Running jobs using job-arrays


In this example, we use LAMMPS to test a job array, which runs multiple copies of the job with different parameters. All the scripts and instructions are under the directory array-job.

Jobs from a single directory

In this example, we submit multiple jobs from the same directory to run LAMMPS with multiple input files: lammps-input-X.in where X=0,…,9

In this case, make sure that the output files do not overlap. That’s why we use output_lammps-array-${SLURM_ARRAY_TASK_ID}-${SLURM_JOBID}.txt as output. The environment variable SLURM_ARRAY_TASK_ID is used to name the output according to the array index.

The command line used in this case is:

lmp -in lammps-input-${SLURM_ARRAY_TASK_ID}.in -log output_lammps-array-${SLURM_ARRAY_TASK_ID}-${SLURM_JOBID}.txt

First, inspect the script using the cat command:

cat grex-runlmp-1cpu-jobarray.sh

Now, submit the job using:

sbatch grex-runlmp-1cpu-jobarray.sh

Alternatively, remove the directive #SBATCH --array=0-9 from the job script and use the following command to submit the job:

sbatch --array=0-9 grex-runlmp-1cpu-jobarray.sh

Other possibilities to submit array jobs:

sbatch --array=0-9%2 grex-runlmp-1cpu-jobarray.sh
sbatch --array=0,2,4-9 grex-runlmp-1cpu-jobarray.sh
sbatch --array=1,3 grex-runlmp-1cpu-jobarray.sh

  • The option --array=0-9%2 means that the script will submit an array job with indices 0-9 and run a maximum of 2 at a time.
  • The option --array=0,2,4-9 means that the script will submit an array job with indices 0, 2 and all indices between 4 and 9 (4, 5, 6, 7, 8, 9).
  • The option --array=1,3 means that the script will submit an array job with indices 1 and 3.
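To check which indices a given specification covers before submitting, the list can be expanded locally. This is a minimal sketch: the function name expand_array_spec is hypothetical, and it handles only comma-separated values, ranges and the %N suffix. The step syntax (for example 0-9:2) is not handled.

```shell
# Expand an --array specification into one index per line.
expand_array_spec() {
    # Drop the optional %N suffix, which limits concurrency but not the index list
    spec=$(printf '%s\n' "$1" | sed 's/%.*//')
    printf '%s\n' "$spec" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single value has no upper bound, so reuse it as the end of the range
        seq "$first" "${last:-$first}"
    done
}

expand_array_spec '0,2,4-9'
```

Each printed index corresponds to one value of SLURM_ARRAY_TASK_ID, so it also shows which input files lammps-input-X.in must exist.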

Jobs on multiple directories

To avoid overlapping data, it is possible to create sub-directories and stage the input files for running the job with different parameters. In this case, there is no need to rename the output files as they are generated in separate directories.

Here, we use directories with the name Test_X where X=0,…,9 and add a corresponding input file.
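If the directories do not exist yet, they can be staged with a short loop. The sketch below assumes that the per-index input files are named lammps-input-X.in, as in the single-directory example, and copies each one to Test_X/lammps-input.in. The function name stage_test_dirs is hypothetical.

```shell
# Create Test_0 ... Test_9 and copy the matching input file into each one.
stage_test_dirs() {
    for i in $(seq 0 9); do
        mkdir -p "Test_${i}"
        # Copy only when the per-index input file is present
        if [ -f "lammps-input-${i}.in" ]; then
            cp "lammps-input-${i}.in" "Test_${i}/lammps-input.in"
        fi
    done
}
```

Calling stage_test_dirs from the directory that holds the input files creates the layout expected by the job script.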

In the job script, make sure to change to the directory that corresponds to the given value of SLURM_ARRAY_TASK_ID before running the job:

cd Test_${SLURM_ARRAY_TASK_ID}
lmp -in lammps-input.in -log output_lammps-array-${SLURM_ARRAY_TASK_ID}-${SLURM_JOBID}.txt

First, inspect the script using the cat command:

cat grex-runlmp-1cpu-jobarray.sh

Now, submit the job using:

sbatch grex-runlmp-1cpu-jobarray.sh

Note:

It is also possible to use the alternative options discussed in the previous example:

sbatch --array=0-9%2 grex-runlmp-1cpu-jobarray.sh
sbatch --array=0,2,4-9 grex-runlmp-1cpu-jobarray.sh
sbatch --array=1,3 grex-runlmp-1cpu-jobarray.sh

Running jobs using GLOST


In this example, we show how to run multiple tasks using GLOST instead of job arrays. All the scripts and instructions are under the directory glost-job.

Similar to job arrays, GLOST is used to run multiple independent tasks. In this case, we use a list where we add all the tasks. GLOST uses MPI and assigns the first N lines of the list to the N CPUs asked for. Once one of these tasks is done, GLOST assigns the next available task in the list, until all the tasks are done or the job times out.
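A task list is a plain text file with one command per line. As a sketch of how such a list could be generated, the loop below prints ten LAMMPS commands. The function name make_glost_list and the log file names are hypothetical, and the lines are not taken from the list_glost_tasks.txt files provided in glost-job.

```shell
# Print one LAMMPS command per line, one task per input file.
make_glost_list() {
    for i in $(seq 0 9); do
        echo "lmp -in lammps-input-${i}.in -log output_lammps-glost-${i}.txt"
    done
}

make_glost_list
```

The output can be redirected to a new file and passed to the GLOST launcher inside the job script, typically with something like srun glost_launch followed by the file name. The launcher invocation is an assumption here; check grex-run-glost.sh for the exact command.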

First, inspect the list of tasks and the scripts under the directories multiple-dir and single-dir:

cat single-dir/list_glost_tasks.txt
cat single-dir/grex-run-glost.sh
cat multiple-dir/list_glost_tasks.txt
cat multiple-dir/grex-run-glost.sh

Now, submit the jobs using:

pushd single-dir && sbatch grex-run-glost.sh && popd
pushd multiple-dir && sbatch grex-run-glost.sh && popd