Grex: High Performance Computing Cluster at University of Manitoba

Introduction#


Grex is a UManitoba High Performance Computing (HPC) system, first put into production in early 2011 as part of the WestGrid consortium. “Grex” is a Latin name for “herd” (or maybe “flock”?). The names of the Grex login nodes (bison, yak) also refer to various kinds of bovine animals.

Please note that the older login nodes tatanka and zebu were decommissioned during and after the outage of August - September 2024. These login nodes are no longer available.

Since being defunded by WestGrid (on April 2, 2018), Grex has been available only to users affiliated with the University of Manitoba and their collaborators.

If you are a new Grex user, proceed to the quick start guide and documentation right away.

Hardware#


  • The original Grex was an SGI Altix machine, with 312 compute nodes (Xeon 5560, 12 CPU cores and 48 GB of RAM per node) and a QDR 40 Gb/s InfiniBand network.

    The SGI Altix machines were decommissioned in Sep 2024.

  • In 2017, a new Lustre filesystem based on Seagate Storage Building Blocks (SBB), with 418 TB of usable space, was added to Grex.

    The SBB that serves as /scratch is not available.

  • In 2020 and 2021, the University added 57 Intel CascadeLake CPU nodes, a few GPU nodes, new NVMe storage for home directories, and an EDR InfiniBand interconnect.

  • In March 2023, 1 PB of new storage was added to Grex as the /project filesystem.

  • In January 2024, /project was extended by another 1 PB.

  • In Sep 2024, new AMD Genoa nodes were added (30 nodes with a total of 5760 cores).

  • In April 2025, a new GPU node with 2 L40S GPUs was added to Grex.

The current computing hardware available for general use is as follows:

Login nodes#


Since Sep 14, 2022, Grex has been on the UManitoba network. The old WG and BCNET network, which was used for about 11 years, has been decommissioned. DNS names now use hpc.umanitoba.ca instead of the previous westgrid.ca.

On Grex, there are multiple login nodes:

  • Yak: yak.hpc.umanitoba.ca (please note that the architecture for this node is avx512)
  • Bison: bison.hpc.umanitoba.ca (a second login node, similar to Yak)
  • Grex: grex.hpc.umanitoba.ca is a DNS alias to the above Yak and Bison login nodes
  • OOD: ood.hpc.umanitoba.ca (only used for the OpenOnDemand Web interface; requires VPN when used from outside the campus network)

To log in to Grex in the text (bash) mode, connect to grex.hpc.umanitoba.ca using a secure shell (SSH) client.
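As a small illustration of the login step, the sketch below builds the SSH command line for the grex.hpc.umanitoba.ca alias. The helper name `ssh_command` and the username `jdoe` are placeholders introduced here for illustration; they are not part of the Grex documentation.

```python
import shlex

# DNS alias for the Yak and Bison login nodes (from the list above).
GREX_LOGIN_HOST = "grex.hpc.umanitoba.ca"


def ssh_command(username: str, host: str = GREX_LOGIN_HOST) -> list[str]:
    """Return the argument list for an interactive SSH login (hypothetical helper)."""
    if not username:
        raise ValueError("a username is required")
    return ["ssh", f"{username}@{host}"]


# "jdoe" is a placeholder username, not a real account.
print(shlex.join(ssh_command("jdoe")))  # prints: ssh jdoe@grex.hpc.umanitoba.ca
```

The printed command can be pasted into a terminal, or the argument list can be passed to `subprocess.run` to open the session from a script. A specific login node can be targeted by passing, for example, `host="yak.hpc.umanitoba.ca"`.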

Compute nodes#


Grex includes several researcher-contributed nodes (CPU and GPU), which makes it a “community cluster”. The researcher-contributed nodes are available to other users on an opportunistic basis: jobs from the owner groups will preempt other users’ workloads.

The current compute nodes available on Grex are listed in the following table:

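| CPU | Nodes | CPUs/Node | Mem/Node | GPU | GPUs/Node | VMem/GPU | Network (InfiniBand) |
|---|---|---|---|---|---|---|---|
| Intel Xeon 6248 | 12 | 40 | 384 GB | N/A | N/A | N/A | EDR 100 Gb/s |
| Intel Xeon 6230R | 43 | 52 | 188 GB | N/A | N/A | N/A | EDR 100 Gb/s |
| AMD EPYC 9654 [1] | 31 | 192 | 750 GB | N/A | N/A | N/A | HDR 200 Gb/s |
| AMD EPYC 9654 [1] | 4 | 192 | 1500 GB | N/A | N/A | N/A | HDR 200 Gb/s |
| AMD EPYC 9634 [2] | 5 | 168 | 1500 GB | N/A | N/A | N/A | HDR 100 Gb/s |
| Intel Xeon 5218 [3] | 2 | 32 | 180 GB | nVidia Tesla V100 | 4 | 32 GB | FDR 56 Gb/s |
| Intel Xeon 5218 [4] | 3 | 32 | 180 GB | nVidia Tesla V100 | 4 | 16 GB | FDR 56 Gb/s |
| Intel Xeon 6248R [5] | 1 | 48 | 1500 GB | nVidia Tesla V100 | 16 | 32 GB | EDR 100 Gb/s |
| AMD EPYC 7402P [6] | 2 | 24 | 240 GB | nVidia A30 | 2 | 24 GB | EDR 100 Gb/s |
| AMD EPYC 7543P [7] | 2 | 32 | 480 GB | nVidia A30 | 2 | 24 GB | EDR 100 Gb/s |
| AMD EPYC 9334 [3] | 1 | 64 | 370 GB | nVidia L40S | 2 | 48 GB | HDR 200 Gb/s |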
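To get a feel for the overall size of the cluster, the per-group figures in the table can be summed. The sketch below assumes the table is read as (nodes, CPUs per node, GPUs per node) for each row; the `NodeGroup` class and `totals` function are illustrative names, and the result includes researcher-contributed nodes that are only available opportunistically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    cpu: str
    nodes: int
    cores_per_node: int
    gpus_per_node: int = 0


# Figures transcribed from the compute node table (an assumption of this sketch).
NODE_GROUPS = [
    NodeGroup("Intel Xeon 6248", 12, 40),
    NodeGroup("Intel Xeon 6230R", 43, 52),
    NodeGroup("AMD EPYC 9654 (750 GB)", 31, 192),
    NodeGroup("AMD EPYC 9654 (1500 GB)", 4, 192),
    NodeGroup("AMD EPYC 9634", 5, 168),
    NodeGroup("Intel Xeon 5218 (V100 32 GB)", 2, 32, 4),
    NodeGroup("Intel Xeon 5218 (V100 16 GB)", 3, 32, 4),
    NodeGroup("Intel Xeon 6248R", 1, 48, 16),
    NodeGroup("AMD EPYC 7402P", 2, 24, 2),
    NodeGroup("AMD EPYC 7543P", 2, 32, 2),
    NodeGroup("AMD EPYC 9334", 1, 64, 2),
]


def totals(groups):
    """Return (total nodes, total CPU cores, total GPUs) across all groups."""
    nodes = sum(g.nodes for g in groups)
    cores = sum(g.nodes * g.cores_per_node for g in groups)
    gpus = sum(g.nodes * g.gpus_per_node for g in groups)
    return nodes, cores, gpus


nodes, cores, gpus = totals(NODE_GROUPS)
print(f"{nodes} nodes, {cores} CPU cores, {gpus} GPUs")
# prints: 106 nodes, 10660 CPU cores, 46 GPUs
```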

Storage#


Grex’s compute nodes have access to the following shared filesystems:

| File system | Type | Total space | Quota per user | Quota per group |
|---|---|---|---|---|
| /home | NFSv4/RDMA | 15 TB | 100 GB | N/A |
| /project | Lustre | 2 PB | N/A | 5 TB |

In addition to the shared filesystems, the compute nodes have their own local disks that can be used as temporary storage when running jobs.
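A job can use node-local disk for scratch data and clean up after itself. The sketch below is a minimal illustration, assuming that the `TMPDIR` environment variable points at node-local storage inside a job; that is a common convention but is not confirmed by this page, so check the Grex documentation on running jobs. The `job_scratch` helper is a hypothetical name.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def job_scratch(prefix="job_"):
    """Create a private scratch directory and remove it when the block exits."""
    # Assumption: TMPDIR, if set, points at node-local disk; otherwise fall
    # back to the system default temporary directory.
    base = os.environ.get("TMPDIR") or tempfile.gettempdir()
    path = tempfile.mkdtemp(prefix=prefix, dir=base)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


with job_scratch() as scratch:
    # Write intermediate data locally; copy results worth keeping to a
    # shared filesystem before leaving the block.
    with open(os.path.join(scratch, "intermediate.dat"), "w") as fh:
        fh.write("temporary data\n")
```

Because the directory is deleted on exit, anything that must survive the job has to be copied to /home or /project first.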

Software#


Grex is a traditional HPC machine, running Linux with the SLURM resource manager. Several software stacks are available on Grex.
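Since work on Grex is scheduled through SLURM, a batch job is typically submitted with `sbatch`. The sketch below assembles such a command line using standard `sbatch` options (`--ntasks`, `--cpus-per-task`, `--mem`, `--time`). The helper name, the script name `job.sh` and the resource values are placeholders chosen for illustration; Grex-specific partitions and accounts are not covered here.

```python
import shlex


def sbatch_command(script, *, ntasks=1, cpus_per_task=1, mem="4G",
                   time_limit="0-01:00:00"):
    """Return the argument list for submitting `script` with sbatch (hypothetical helper)."""
    return [
        "sbatch",
        f"--ntasks={ntasks}",
        f"--cpus-per-task={cpus_per_task}",
        f"--mem={mem}",            # memory per node
        f"--time={time_limit}",    # days-hours:minutes:seconds
        script,
    ]


# "job.sh" is a placeholder batch script name.
cmd = sbatch_command("job.sh", cpus_per_task=4, mem="16G")
print(shlex.join(cmd))
```

On a login node, the resulting list could be passed to `subprocess.run(cmd, check=True)` to submit the job.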

Web portals and GUI#


In addition to the traditional bash mode (connecting via SSH), users have access to:

  • OpenOnDemand: on Grex, it is possible to use OpenOnDemand (OOD for short) to log in to Grex and run batch or GUI applications (VNC Desktops, MATLAB, GaussView, Jupyter, …)


WestGrid ceased operations on April 1st, 2022. The former WestGrid institutions are now re-organized into two consortia: BC DRI group and Prairies DRI group.

  1. CPU nodes available for all users (of these, five are contributed by a group of CHRIM researchers).

  2. CPU nodes contributed by Prof. M. Cordeiro (Department of Agriculture).

  3. GPU nodes available for all users.

  4. GPU nodes contributed by Prof. R. Stamps (Department of Physics and Astronomy).

  5. GPU nodes contributed by Prof. L. Livi (Department of Computer Science).

  6. GPU nodes contributed by Faculty of Agriculture.

  7. GPU nodes contributed by Prof. M. Cordeiro (Department of Agriculture).