Scheduling policies and running jobs on Contributed nodes

Scheduling policies for contributed systems#


Grex has a few user-contributed compute nodes. The owners of this hardware have preferential access to them. The current mechanism for providing this "preferential access" is preemption.

On the definition of preferential access to HPC systems#


Preferential access means non-exclusive access to your hardware, in the sense that other users can share its usage over sufficiently long periods. The technical options below rely on the HPC batch-queueing technology we use. HPC scheduling makes access to CPU cores, GPUs and memory exclusive per job for the duration of the job (as opposed to time-sharing). Priority is the factor that decides which job gets to start first (and thus excludes other jobs) when there is competition, that is, more jobs than free cores.

In what follows, the owner is the owner of the contributed hardware, the others are all other users, and a partition is a subset of the HPC system's compute nodes.

Preemption by partition: the contributed nodes carry a SLURM partition that allows the owner to use them normally, for batch or interactive jobs. This partition is the "preemptor". There is an overlapping partition, on the same set of nodes, for the others to use, which is "preemptible". Jobs in the preemptible partition can be killed after a set "grace period" (1 hour) once an owner's job enters the "preemptor" partition. Ideally, preempted jobs would be checkpointed rather than killed, but that is harder to set up and is currently not generally supported. If you have a code that supports checkpoint/restart at the application level, you can get the most out of the contributed nodes.
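
For application-level checkpointing, the general pattern is to catch the signal SLURM sends at preemption time, write a checkpoint within the grace period, and requeue the job. Below is a minimal bash sketch; it assumes SLURM delivers SIGTERM to the batch script when preemption starts (as it does when a grace period is configured), and the application my_app, its restart option and the file state.chk are hypothetical placeholders for your own code:

#!/bin/bash
#SBATCH --partition=stamps-b   # any of the preemptible partitions listed below
#SBATCH --ntasks=1
#SBATCH --time=0-12:00:00
#SBATCH --requeue

CHECKPOINT=state.chk   # hypothetical checkpoint file name

# On preemption, SLURM sends SIGTERM and then waits out the grace period
# before killing the job: trap the signal, checkpoint, and requeue this job.
save_and_requeue() {
    echo "Preemption signal received at $(date); checkpointing..."
    kill -USR1 "$APP_PID"       # assumes the application checkpoints on USR1
    wait "$APP_PID"
    scontrol requeue "$SLURM_JOB_ID"
}
trap save_and_requeue SIGTERM

# Run the (hypothetical) application in the background so the trap can fire;
# restart from the checkpoint if a previous run left one behind.
if [ -f "$CHECKPOINT" ]; then
    ./my_app --restart "$CHECKPOINT" &
else
    ./my_app &
fi
APP_PID=$!
wait "$APP_PID"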

On Grex, the "preemptor" partition is named after the owner PI, and the preemptible partition is named similarly but with an added -b suffix. Use the --partition= option of the sbatch and salloc commands to select the desired partition when submitting jobs.
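
To check which partitions exist and how preemption is configured on them, the standard SLURM query tools can be used; for example (stamps-b here is just an illustration):

# Nodes and state of a preemptible partition
sinfo --partition=stamps-b

# Partition settings, including PreemptMode and GraceTime
scontrol show partition stamps-b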

Contributed Nodes#


As of now, the preemptible partitions are:

| Partition      | Nodes | GPUs/Node | CPUs/Node | Mem/Node | Notes                 |
|----------------|-------|-----------|-----------|----------|-----------------------|
| stamps-b [1]   | 3     | 4         | 32        | 187 GB   | AVX512                |
| livi-b [2]     | 1     | 16        | 48        | 1500 GB  | NVSwitch server       |
| agro-b [3]     | 2     | 2         | 24        | 250 GB   | AMD                   |
| mcordcpu-b [4] | 5     | -         | 168       | 1500 GB  | AMD EPYC 9634 84-Core |
| mcordgpu-b [5] | 5     | -         | 168       | 1500 GB  | AMD EPYC 9634         |
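
The table above is a static snapshot; the resources SLURM currently advertises for these partitions (node counts, CPUs, memory and GPU GRES) can be checked with an sinfo query along these lines:

sinfo --partition=stamps-b,livi-b,agro-b,mcordcpu-b,mcordgpu-b \
      --Format=partition,nodes,cpus,memory,gres,statecompact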

Partition “stamps-b”#


To submit a GPU job to the stamps-b partition, please include the directive:

#SBATCH --partition=stamps-b

in your job script or submit your jobs using:

sbatch --partition=stamps-b run-gpu-job.sh

assuming that run-gpu-job.sh is the name of your job script.

Here is an example of a script for running a LAMMPS job on this partition:

Script example for running LAMMPS on **stamps-b** partition
run-lmp-gpu.sh
#!/bin/bash

#SBATCH --gpus=1 
#SBATCH --partition=stamps-b
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=0-3:00:00

# Load the modules:

module load intel/2020.4  ompi/4.1.2 lammps-gpu/24Mar22

echo "Starting run at: `date`"

ngpus=1
ncpus=1

lmp_exec=lmp_gpu
lmp_input="in.metal"
lmp_output="log-${ngpus}-gpus-${ncpus}-cpus.txt"

mpirun -np ${ncpus} ${lmp_exec} -sf gpu -pk gpu ${ngpus} -log ${lmp_output} -in ${lmp_input}

echo "Program finished with exit code $? at: `date`"

Partition “livi-b”#


To submit a GPU job to the livi-b partition, please include the directive:

#SBATCH --partition=livi-b

in your job script or submit your jobs using:

sbatch --partition=livi-b run-gpu-job.sh

assuming that run-gpu-job.sh is the name of your job script.

Here is an example of a script for running a LAMMPS job on this partition:

Script example for running LAMMPS on **livi-b** partition
run-lmp-gpu.sh
#!/bin/bash

#SBATCH --gpus=1 
#SBATCH --partition=livi-b
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=0-3:00:00

# Load the modules:

module load intel/2020.4  ompi/4.1.2 lammps-gpu/24Mar22

echo "Starting run at: `date`"

ngpus=1
ncpus=1

lmp_exec=lmp_gpu
lmp_input="in.metal"
lmp_output="log-${ngpus}-gpus-${ncpus}-cpus.txt"

mpirun -np ${ncpus} ${lmp_exec} -sf gpu -pk gpu ${ngpus} -log ${lmp_output} -in ${lmp_input}

echo "Program finished with exit code $? at: `date`"

Partition “agro-b”#


To submit a GPU job to the agro-b partition, please include the directive:

#SBATCH --partition=agro-b

in your job script or submit your jobs using:

sbatch --partition=agro-b run-gpu-job.sh

assuming that run-gpu-job.sh is the name of your job script.

Here is an example of a script for running a LAMMPS job on this partition:

Script example for running LAMMPS on **agro-b** partition
run-lmp-gpu.sh
#!/bin/bash

#SBATCH --gpus=1 
#SBATCH --partition=agro-b
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=0-3:00:00

# Load the modules:

module load intel/2020.4  ompi/4.1.2 lammps-gpu/24Mar22

echo "Starting run at: `date`"

ngpus=1
ncpus=1

lmp_exec=lmp_gpu
lmp_input="in.metal"
lmp_output="log-${ngpus}-gpus-${ncpus}-cpus.txt"

mpirun -np ${ncpus} ${lmp_exec} -sf gpu -pk gpu ${ngpus} -log ${lmp_output} -in ${lmp_input}

echo "Program finished with exit code $? at: `date`"

  1. stamps-b: GPU [4 x V100/16GB] nodes contributed by Prof. R. Stamps.

  2. livi-b: GPU [HGX-2 16xGPU V100/32GB] node contributed by Prof. L. Livi.

  3. agro-b: GPU [AMD Zen] nodes contributed by the Faculty of Agriculture.

  4. mcordcpu-b: CPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture).

  5. mcordgpu-b: GPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture).