Introduction
Jobs on the cluster are run through a scheduler (Slurm). This section presents example job scripts, with specific directives for each type of application: serial, OpenMP, MPI, hybrid (MPI+OpenMP) and GPU.
Interactive job via salloc
First, request an interactive job using salloc:
salloc --nodes=1 --ntasks=2 --cpus-per-task=1 --mem-per-cpu=500M --time=15:00
If needed, add other options to the salloc command: salloc {+other options}
Once the job is granted, run the commands:
hostname
env | grep SLURM
sq
The above commands show the name of the node, all Slurm environment variables and the current jobs under your name.
Before starting the test, load the LAMMPS module:
HINT: use module spider lammps to find the available versions.
module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5 lammps-omp/20240829
After loading the modules, run the commands:
module list
module show lammps-omp
ls $EBROOTLAMMPS/bin
Make sure that the binary lmp is available in your environment:
which lmp
The output should be similar to:
/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v4/MPI/intel2023/openmpi4/lammps-omp/20240829/bin/lmp
Now, run the LAMMPS test using:
srun lmp -in lammps-input.in
or run it via a script:
sh ./mc-runlmp-interactive.sh
As a summary, here are the steps to follow for running interactive jobs to test and debug your programs and scripts before submitting jobs via sbatch:
- salloc {+options} to request a compute node.
- load the modules that are needed for your workflow.
- run tests and debug …
- exit to end the interactive job and go back to the login node.
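The interactive steps above can be collected in a small script. The following is a minimal sketch of what a script such as mc-runlmp-interactive.sh might contain (its real content may differ; the function name run_lammps_test is hypothetical). It refuses to call srun unless it runs inside an allocation, relying on the fact that salloc sets SLURM_JOB_ID in the shell it opens.

```bash
#!/bin/bash
# Hypothetical helper: run the LAMMPS test only inside a Slurm allocation.
# salloc sets SLURM_JOB_ID; on a login node the variable is unset.
run_lammps_test() {
  local input="${1:-lammps-input.in}"
  if [[ -z "${SLURM_JOB_ID:-}" ]]; then
    echo "No Slurm allocation found: request one with salloc first." >&2
    return 1
  fi
  # Same modules as in the steps above.
  module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5 lammps-omp/20240829
  srun lmp -in "$input"
}
```

On the compute node, source the file and call run_lammps_test, optionally passing another input file name as the first argument.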
Sleep job
Under the directory sleep-job, use the cat command to see the content of the script:
cd sleep-job
cat mc-sleep-job.sh
Now, submit the job using:
sbatch mc-sleep-job.sh
- If there are errors, fix them in the script and/or add the missing options on the command line, for example:
sbatch --account=def-sponsor2 mc-sleep-job.sh
What is the job ID of your job?
Check whether your job is in the queue by running the command: sq
Once the job is done, inspect the output file: slurm-<JOB ID>.out
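Instead of copying the job ID by hand, it can be captured at submission time: sbatch --parsable prints only the job ID (followed by ;cluster on multi-cluster setups). In the sketch below the helper names are hypothetical, and the cluster commands are left commented out so it runs anywhere. The slurm-<JOB ID>.out name is the default for a non-array job that does not set --output.

```bash
#!/bin/bash
# "sbatch --parsable" prints "jobid" or "jobid;cluster": keep the part before ";".
parse_jobid() {
  echo "${1%%;*}"
}
# Default output file (slurm-%j.out) of a non-array job without --output.
default_output_file() {
  echo "slurm-${1}.out"
}
# Usage on the cluster:
# jobid=$(parse_jobid "$(sbatch --parsable mc-sleep-job.sh)")
# sq -j "$jobid"
# cat "$(default_output_file "$jobid")"
```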
Serial job
Under the directory serial-job, use the cat command to see the content of the script:
cd serial-job
cat mc-runlmp-1cpu-serial.sh
Now, submit the job using:
sbatch mc-runlmp-1cpu-serial.sh
OpenMP job
Under the directory openmp-job, use the cat command to see the content of the scripts:
cd openmp-job
cat mc-runlmp-2cpu-openmp.sh
cat mc-runlmp-4cpu-openmp.sh
cat mc-runlmp-8cpu-openmp.sh
cat mc-runlmp-16cpu-openmp.sh
Now, submit the jobs using:
sbatch mc-runlmp-2cpu-openmp.sh
sbatch mc-runlmp-4cpu-openmp.sh
sbatch mc-runlmp-8cpu-openmp.sh
sbatch mc-runlmp-16cpu-openmp.sh
Monitor your jobs and inspect the output files:
sq
sq -j <JOB ID>
MPI job
Under the directory mpi-job, use the cat command to see the content of the scripts:
cd mpi-job
cat mc-runlmp-16cpu-mpi.sh
cat mc-runlmp-2cpu-mpi.sh
cat mc-runlmp-4cpu-mpi.sh
cat mc-runlmp-8cpu-mpi.sh
cat mc-runlmp-32cpu-mpi-4nodes.sh
cat mc-runlmp-32cpu-mpi.sh
cat mc-runlmp-64cpu-mpi.sh
Now, submit the jobs using:
sbatch mc-runlmp-16cpu-mpi.sh
sbatch mc-runlmp-2cpu-mpi.sh
sbatch mc-runlmp-4cpu-mpi.sh
sbatch mc-runlmp-8cpu-mpi.sh
sbatch mc-runlmp-32cpu-mpi-4nodes.sh
sbatch mc-runlmp-32cpu-mpi.sh
sbatch mc-runlmp-64cpu-mpi.sh
Hybrid job (MPI+OpenMP)
Under the directory hybrid-job, use the cat command to see the content of the scripts:
cd hybrid-job
cat mc-runlmp-16tasks-2threads.sh
cat mc-runlmp-32cpu-mpi.sh
cat mc-runlmp-8tasks-4threads.sh
Now, submit the jobs using:
sbatch mc-runlmp-16tasks-2threads.sh
sbatch mc-runlmp-32cpu-mpi.sh
sbatch mc-runlmp-8tasks-4threads.sh
Each of the above scripts runs LAMMPS on a total of 32 cores:
- mc-runlmp-16tasks-2threads.sh: 16 MPI tasks and 2 threads per task, 32 cores in total.
- mc-runlmp-32cpu-mpi.sh: 32 MPI tasks and 1 thread per task, 32 cores in total.
- mc-runlmp-8tasks-4threads.sh: 8 MPI tasks and 4 threads per task, 32 cores in total.
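The arithmetic behind the list above: the number of cores a job uses is the number of MPI tasks (--ntasks) times the number of threads per task (--cpus-per-task). A minimal sketch (the function name is hypothetical); the commented lines show how a hybrid script might pass the thread count to LAMMPS through the OMP package options -sf omp and -pk omp, assuming the lammps-omp build loaded earlier.

```bash
#!/bin/bash
# Cores used by a job = MPI tasks (--ntasks) x threads per task (--cpus-per-task).
total_cores() {
  echo $(( $1 * $2 ))
}
# The three layouts listed above, as "ntasks threads_per_task".
for layout in "16 2" "32 1" "8 4"; do
  read -r ntasks threads <<< "$layout"
  echo "${ntasks} tasks x ${threads} threads = $(total_cores "$ntasks" "$threads") cores"
done
# Sketch of the relevant lines of a hybrid job script (16 tasks x 2 threads):
# #SBATCH --ntasks=16
# #SBATCH --cpus-per-task=2
# export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
# srun lmp -sf omp -pk omp "${OMP_NUM_THREADS}" -in lammps-input.in
```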
GPU job
**First, pull the container:**
On Grex, we use singularity:
module load singularity
singularity pull docker://nvcr.io/hpc/lammps:patch_3Nov2022
On CC/MC, we use apptainer:
module load apptainer
apptainer pull docker://nvcr.io/hpc/lammps:patch_3Nov2022
The above command builds the image lammps_patch_3Nov2022.sif (about 560M), which can be used to run LAMMPS.
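Because the GPU job script runs ./lammps_patch_3Nov2022.sif from the submission directory, it can help to check that the image exists before submitting. The sketch below assumes the naming rule <name>_<tag>.sif (with latest as the tag when none is given), which matches the file name quoted above; the function names are hypothetical.

```bash
#!/bin/bash
# Assumed naming rule: docker://<registry>/<path>/<name>:<tag> -> <name>_<tag>.sif
sif_name() {
  local ref="${1#docker://}"                     # drop the transport prefix
  local last="${ref##*/}"                        # keep "<name>:<tag>"
  [[ "$last" == *:* ]] || last="${last}:latest"  # no tag: assume "latest"
  echo "${last/:/_}.sif"
}
# Fail early if the image has not been pulled into the current directory.
check_image() {
  local sif
  sif="$(sif_name "$1")"
  if [[ -f "$sif" ]]; then
    echo "Found ${sif}"
  else
    echo "Missing ${sif}: pull the image first." >&2
    return 1
  fi
}
```

For example, check_image docker://nvcr.io/hpc/lammps:patch_3Nov2022 && sbatch <your GPU job script>.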
Here is an example of a job script to run LAMMPS with this container on Grex:
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=12000M
#SBATCH --time=0-1:00:00
#SBATCH --job-name=GPU
#Load the modules:
module load apptainer
module load cuda/12.9
echo "Starting run at: `date`"
apptainer run --nv -B $PWD:/host_pwd --pwd /host_pwd ./lammps_patch_3Nov2022.sif ./run_lammps.sh
echo "Program finished with exit code $? at: `date`"
The job script above calls the bash script run_lammps.sh, which contains the command line for running LAMMPS. Since it is invoked as ./run_lammps.sh, make it executable first with chmod +x run_lammps.sh:
#!/bin/bash
gpu_count=1
input="in.lj"
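echo "Running Lennard Jones 8x4x8 example on ${gpu_count} GPU(s)..."
mpirun -n ${gpu_count} lmp -k on g ${gpu_count} -sf kk -pk kokkos cuda/aware on neigh full comm device binsize 2.8 -var x 8 -var y 4 -var z 8 -in ${input} -log output_lammps-gpu-${SLURM_JOBID}.txt
The job requires an input file in.lj. Note that this in.lj sets the box size directly (region box block 0 200 0 200 0 200), so the -var x, y and z values passed by run_lammps.sh are not used: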
# 3d Lennard-Jones melt
units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 200 0 200 0 200
create_box 1 box
create_atoms 1 box
mass 1 1.0
velocity all create 3.0 87287
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5
neighbor 0.3 bin
neigh_modify every 20 delay 0 check no
fix 1 all nve
thermo 25
run 20000
#write_data config.end_melt
# End of the input file.
Example of time-out job
Under the directory time-out-job, use the cat command to see the content of the script:
cd time-out-job
cat mc-runlmp-1cpu-serial.sh
Now, submit the job using:
sbatch mc-runlmp-1cpu-serial.sh
Once the job is done, run the commands:
sacct -j <JOB ID>
cat slurm-<JOB ID>.out
A job that ran out of time is reported by sacct with the state TIMEOUT. Ask for more time and resubmit the job:
sbatch --time=1:00:00 mc-runlmp-1cpu-serial.sh
Alternatively, edit the job script and increase the time:
#SBATCH --time=0-1:00:00
and re-submit the job.
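This section and the previous ones use several Slurm time formats (15:00 for salloc, 1:00:00 and 0-1:00:00 for sbatch). Slurm accepts minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds. The sketch below (the function name is hypothetical) converts these forms to seconds, which makes it easier to compare a requested limit with the Elapsed and Timelimit fields reported by sacct.

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm time limit to seconds.
# Accepted forms: MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS
slurm_time_to_seconds() {
  local t="$1" days=0 h=0 m=0 s=0
  local -a f
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"                      # days before the dash
    IFS=: read -r -a f <<< "${t#*-}"
    case "${#f[@]}" in
      1) h="${f[0]}" ;;
      2) h="${f[0]}"; m="${f[1]}" ;;
      3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;
      *) return 1 ;;
    esac
  else
    IFS=: read -r -a f <<< "$t"
    case "${#f[@]}" in
      1) m="${f[0]}" ;;
      2) m="${f[0]}"; s="${f[1]}" ;;
      3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;
      *) return 1 ;;
    esac
  fi
  # 10# forces base 10 so that values such as "08" are not read as octal.
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}
```

For instance, 15:00 is 15 minutes, while 1:00:00 and 0-1:00:00 both mean one hour.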
Example of an out-of-memory (OOM) job
Under the directory oom-kill-job, use the cat command to see the content of the script:
cd oom-kill-job
cat mc-runlmp-1cpu-serial.sh
Now, submit the job using:
sbatch mc-runlmp-1cpu-serial.sh
Once the job is done, run the commands:
sacct -j <JOB ID>
cat slurm-<JOB ID>.out
A job killed for exceeding its memory is reported by sacct with the state OUT_OF_MEMORY. Ask for more memory and resubmit the job:
sbatch --mem=3000M mc-runlmp-1cpu-serial.sh
Alternatively, edit the job script and increase the memory:
#SBATCH --mem=3000M
and re-submit the job.
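Memory requests in this tutorial appear as --mem=3000M, --mem=12000M and --mem-per-cpu=500M. Slurm's default unit is megabytes, and with --mem-per-cpu the total for the job is the per-CPU value times the number of CPUs (500M with --ntasks=2 in the salloc example gives 1000M). The sketch below normalizes these values to megabytes; the function names are hypothetical, and it assumes binary multiples (1G = 1024M).

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm memory value (K/M/G/T suffix or none)
# to megabytes, assuming binary multiples (1G = 1024M).
mem_to_mb() {
  local v="${1^^}" num unit
  num="${v%[KMGT]}"          # numeric part
  unit="${v:${#num}}"        # suffix, possibly empty (default unit: MB)
  [[ "$num" =~ ^[0-9]+$ ]] || return 1
  case "$unit" in
    K) echo $(( 10#$num / 1024 )) ;;
    ""|M) echo $(( 10#$num )) ;;
    G) echo $(( 10#$num * 1024 )) ;;
    T) echo $(( 10#$num * 1024 * 1024 )) ;;
  esac
}
# Total memory of a job using --mem-per-cpu: per-CPU value x number of CPUs.
total_mem_mb() {
  local per_cpu
  per_cpu="$(mem_to_mb "$1")" || return 1
  echo $(( per_cpu * $2 ))
}
```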
Performance of MPI and OpenMP jobs
When using parallel programs (OpenMP and/or MPI based applications), it is highly recommended to test how the program scales with the number of CPUs. Adding more CPUs does not always make a program faster, and OpenMP programs in particular often stop scaling beyond a certain number of threads. The idea is to take a small test case and run it with different numbers of CPUs: 1, 2, 4, 8, … etc. While MPI programs can run across nodes and use CPUs on several of them, OpenMP codes run on a single node. Therefore, the number of threads should not exceed the number of physical cores available on the node.
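A minimal sketch of such a scaling test for an OpenMP program, assuming a hypothetical job script mc-runlmp-omp.sh that requests a single task: set_omp_threads exports OMP_NUM_THREADS from the CPUs Slurm allocated per task, and the loop only prints (does not run) one sbatch command per CPU count. A --cpus-per-task option given on the sbatch command line overrides the value in the script. Set max_cpus to the number of physical cores per node.

```bash
#!/bin/bash
# Inside an OpenMP job script: one thread per CPU allocated to the task.
set_omp_threads() {
  export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"   # 1 if not set
}
# CPU counts for a scaling test: 1, 2, 4, ... up to max_cpus.
scaling_series() {
  local max="$1" n=1
  local -a out=()
  while (( n <= max )); do
    out+=("$n")
    n=$(( n * 2 ))
  done
  echo "${out[*]}"
}
# Dry run: print one submission per CPU count.
max_cpus=16
for n in $(scaling_series "$max_cpus"); do
  echo "sbatch --cpus-per-task=${n} mc-runlmp-omp.sh"
done
# In the (hypothetical) job script itself:
# set_omp_threads
# srun lmp -sf omp -pk omp "${OMP_NUM_THREADS}" -in lammps-input.in
```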
The command seff <JOB ID> can be used to see the CPU efficiency of a completed job.
For this example, we ran LAMMPS on Grex using OpenMP and MPI (OpenMPI). At the end of a run, LAMMPS prints the performance of the simulation in tau/day (or ns/day) and timesteps/s.
Tests using OpenMP:
| Job | CPUs | Tau/day | Timesteps/s | CPU efficiency | Wall-clock time |
|---|---|---|---|---|---|
| 01 | 1 | 32688.292 | 75.667 | 99.43% | 00:44:05 |
| 02 | 2 | 68614.701 | 158.830 | 99.37% | 00:21:03 |
| 03 | 4 | 132718.066 | 307.218 | 99.05% | 00:10:55 |
| 04 | 8 | 158637.644 | 367.217 | 98.93% | 00:09:08 |
| 05 | 16 | 278099.343 | 643.748 | 96.96% | 00:05:19 |
| 06 | 32 | 274380.425 | 635.140 | 98.26% | 00:05:19 |
| 07 | 64 | 195418.938 | 452.359 | 98.69% | 00:07:23 |
Tests using MPI:
| Job | CPUs | Tau/day | Timesteps/s | CPU efficiency | Wall-clock time |
|---|---|---|---|---|---|
| 01 | 1 | 29112.557 | 67.390 | 99.33% | 00:49:32 |
| 02 | 2 | 56738.337 | 131.339 | 99.25% | 00:25:27 |
| 03 | 4 | 111917.318 | 259.068 | 99.03% | 00:12:56 |
| 04 | 8 | 194286.703 | 449.738 | 98.58% | 00:07:29 |
| 05 | 16 | 440557.932 | 1019.810 | 97.23% | 00:03:21 |
| 06 | 32 | 730610.193 | 1691.227 | 97.03% | 00:02:02 |
| 07 | 64 | 1464824.869 | 3390.798 | 95.19% | 00:01:03 |
| 08 | 72 | 2214538.501 | 5126.247 | 96.03% | 00:00:42 |
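To quantify the scaling in the tables above, the speedup relative to one CPU is S(N) = perf(N)/perf(1) and the parallel efficiency is E(N) = S(N)/N. The sketch below computes both with awk from the timesteps-per-second column of each table; the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads "cpus timesteps_per_second" pairs on stdin.
# The first line must be the 1-CPU run, used as the baseline.
# Prints: cpus, speedup S(N) = perf(N)/perf(1), efficiency E(N) = S(N)/N
speedup_table() {
  awk 'NR == 1 { base = $2 }
       { s = $2 / base; printf "%d %.2f %.2f\n", $1, s, s / $1 }'
}
echo "OpenMP (cpus speedup efficiency):"
speedup_table <<'EOF'
1 75.667
2 158.830
4 307.218
8 367.217
16 643.748
32 635.140
64 452.359
EOF
echo "MPI (cpus speedup efficiency):"
speedup_table <<'EOF'
1 67.390
2 131.339
4 259.068
8 449.738
16 1019.810
32 1691.227
64 3390.798
72 5126.247
EOF
```

With these numbers, the OpenMP runs reach their highest speedup (about 8.5) at 16 CPUs and slow down beyond that, while the MPI runs keep scaling up to 72 CPUs, where the efficiency even exceeds 1 (superlinear speedup).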
Running jobs using job arrays
Jobs from a single directory
In this example, we submit multiple jobs from the same directory to run LAMMPS with multiple input files: lammps-input-X.in, where X=0,…,9.
The command line used in this case is:
lmp -in lammps-input-${SLURM_ARRAY_TASK_ID}.in -log output_lammps-array-${SLURM_ARRAY_TASK_ID}-${SLURM_JOBID}.txt
First, inspect the script using the cat command:
cat mc-runlmp-1cpu-jobarray.sh
Now, submit the job using:
sbatch mc-runlmp-1cpu-jobarray.sh
Alternatively, remove the directive #SBATCH --array=0-9 from the job script and use the following command to submit the job (options given on the sbatch command line take precedence over #SBATCH directives in the script):
sbatch --array=0-9 mc-runlmp-1cpu-jobarray.sh
Other possibilities to submit array jobs:
sbatch --array=0-9%2 mc-runlmp-1cpu-jobarray.sh
sbatch --array=0,2,4-9 mc-runlmp-1cpu-jobarray.sh
sbatch --array=1,3 mc-runlmp-1cpu-jobarray.sh
- The option --array=0-9%2 submits an array job with indices 0 to 9 and runs at most 2 of them at a time.
- The option --array=0,2,4-9 submits an array job with indices 0, 2 and all indices from 4 to 9 (4, 5, 6, 7, 8, 9).
- The option --array=1,3 submits an array job with indices 1 and 3 only.
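To check which task IDs an --array specification produces before submitting, the sketch below expands the forms used above (single values, ranges and a %N throttle). It does not handle Slurm's :step range syntax, and the function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list the task IDs produced by an --array specification.
expand_array_spec() {
  local spec="${1%%[%]*}"        # drop the "%N" throttle suffix
  local -a parts out=()
  local p i
  IFS=, read -r -a parts <<< "$spec"
  for p in "${parts[@]}"; do
    if [[ "$p" == *-* ]]; then   # range "a-b"
      for (( i = ${p%-*}; i <= ${p#*-}; i++ )); do out+=("$i"); done
    else                         # single index
      out+=("$p")
    fi
  done
  echo "${out[*]}"
}
# Maximum number of tasks running at once ("all" if there is no throttle).
array_throttle() {
  if [[ "$1" == *%* ]]; then echo "${1##*%}"; else echo "all"; fi
}
# Input file read by each task of --array=0,2,4-9
for id in $(expand_array_spec 0,2,4-9); do
  echo "task ${id}: lammps-input-${id}.in"
done
```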
Jobs in multiple directories
To keep the runs from overwriting each other's files, you can create one sub-directory per run and stage its input file there, for example to run the job with different parameters. In this case, there is no need to give the output files unique names, as they are generated in separate directories.
Here, we use directories named Test_X, where X=0,…,9, each containing its own input file.
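One possible way to create and fill the Test_X directories, sketched under the assumption that a template input file contains a placeholder (@INDEX@, hypothetical) for the parameter that changes between runs; the function and template names are also hypothetical.

```bash
#!/bin/bash
# Hypothetical staging helper: create Test_0 ... Test_<n-1> and write one
# lammps-input.in per directory, replacing @INDEX@ with the directory index.
stage_dirs() {
  local n="$1" template="$2" i
  [[ -f "$template" ]] || { echo "Template ${template} not found." >&2; return 1; }
  for (( i = 0; i < n; i++ )); do
    mkdir -p "Test_${i}"
    sed "s/@INDEX@/${i}/g" "$template" > "Test_${i}/lammps-input.in"
  done
}
# Usage (not run here): stage_dirs 10 lammps-input-template.in
```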
In the job script, make sure to change to the directory that corresponds to the value of SLURM_ARRAY_TASK_ID:
cd Test_${SLURM_ARRAY_TASK_ID} || exit 1
lmp -in lammps-input.in -log output_lammps-array-${SLURM_ARRAY_TASK_ID}-${SLURM_JOBID}.txt
First, inspect the script using the cat command:
cat mc-runlmp-1cpu-jobarray.sh
Now, submit the job using:
sbatch mc-runlmp-1cpu-jobarray.sh
Note:
It is also possible to use the alternative options discussed in the previous example:
sbatch --array=0-9%2 mc-runlmp-1cpu-jobarray.sh
sbatch --array=0,2,4-9 mc-runlmp-1cpu-jobarray.sh
sbatch --array=1,3 mc-runlmp-1cpu-jobarray.sh
Running jobs using GLOST
Similar to job arrays, GLOST is used to run multiple independent tasks. In this case, all the tasks are written in a list. GLOST uses MPI and assigns the first N lines of the list to the N CPUs requested. Once one of these tasks is done, GLOST assigns the next available task in the list, until all the tasks are done or the job times out.
First, inspect the lists of tasks and the job scripts under the directories single-dir and multiple-dir:
cat single-dir/list_glost_tasks.txt
cat single-dir/mc-run-glost.sh
cat multiple-dir/list_glost_tasks.txt
cat multiple-dir/mc-run-glost.sh
Now, submit the jobs using:
pushd single-dir && sbatch mc-run-glost.sh && popd
pushd multiple-dir && sbatch mc-run-glost.sh && popd
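Task lists can also be generated instead of written by hand. A minimal sketch, assuming each line of a GLOST list is a shell command run as one task; the command lines mirror the job-array examples above, and the function and file names are hypothetical (the provided list_glost_tasks.txt files may differ).

```bash
#!/bin/bash
# Hypothetical generator for a GLOST task list: one shell command per line.
#   single   -> all inputs in one directory (lammps-input-X.in)
#   multiple -> one Test_X directory per task
make_glost_list() {
  local mode="$1" n="$2" i
  for (( i = 0; i < n; i++ )); do
    case "$mode" in
      single)   echo "lmp -in lammps-input-${i}.in -log output_lammps-glost-${i}.txt" ;;
      multiple) echo "cd Test_${i} && lmp -in lammps-input.in -log output_lammps-glost-${i}.txt" ;;
      *) echo "Unknown mode: ${mode}" >&2; return 1 ;;
    esac
  done
}
# Usage (writes a new file): make_glost_list single 10 > my_glost_tasks.txt
```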