Grex: High Performance Computing Cluster at University of Manitoba

Introduction#


Grex is a UManitoba High Performance Computing (HPC) system, first put into production in early 2011 as part of the WestGrid consortium. “Grex” is Latin for “herd” (or perhaps “flock”?). The names of the Grex login nodes (bison, tatanka, zebu, yak) also refer to various kinds of bovine animals.

Since being defunded by WestGrid on April 2, 2018, Grex has been available only to users affiliated with the University of Manitoba and their collaborators.

If you are a new Grex user, proceed to the quick start guide and documentation right away.

Hardware#


  • The original Grex was an SGI Altix machine with 312 compute nodes (Xeon 5560, 12 CPU cores and 48 GB of RAM per node) and a QDR 40 Gb/s InfiniBand network.
  • In 2017, a Seagate Storage Building Blocks (SBB) based Lustre filesystem with 418 TB of usable space was added to Grex.
  • In 2020 and 2021, the University added several modern Intel CascadeLake CPU nodes, a few GPU nodes, new NVMe storage for home directories, and an EDR InfiniBand interconnect.
  • In March 2023, a new 1 PB storage system, the /project filesystem, was added to Grex.
  • In January 2024, the /project filesystem was extended by another 1 PB.

The computing hardware currently available for general use is as follows:

Login nodes#


As of September 14, 2022, Grex uses the UManitoba network. The old WestGrid/BCNET network, which had been in use for about 11 years, has been decommissioned. The DNS names now use hpc.umanitoba.ca instead of the previous westgrid.ca.

On Grex, there are multiple login nodes:

  • Bison: bison.hpc.umanitoba.ca
  • Tatanka: tatanka.hpc.umanitoba.ca
  • Grex: grex.hpc.umanitoba.ca
  • Yak: yak.hpc.umanitoba.ca (please note that this node’s CPU architecture is AVX-512 capable).
  • Zebu: https://zebu.hpc.umanitoba.ca (used only for OOD; requires VPN when accessed from outside the campus network).

To log in to Grex in text (bash) mode, connect to grex.hpc.umanitoba.ca using a secure shell (SSH) client.

The DNS name grex.hpc.umanitoba.ca serves as an alias for two login nodes: bison.hpc.umanitoba.ca and tatanka.hpc.umanitoba.ca.

It is also possible to connect via yak.hpc.umanitoba.ca.
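
For example, a minimal text-mode login session might look like the following sketch; the username myuser is a placeholder for your own UManitoba account.

```bash
# Connect to Grex; the grex alias resolves to bison or tatanka.
# Replace "myuser" with your own username.
ssh myuser@grex.hpc.umanitoba.ca

# Or pick a specific login node, e.g. yak:
ssh myuser@yak.hpc.umanitoba.ca

# Optionally enable X11 forwarding for lightweight GUI programs:
ssh -Y myuser@grex.hpc.umanitoba.ca
```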

CPU nodes#


In addition to the original nodes, newer Intel (CascadeLake) and AMD (EPYC) nodes have been added to Grex:

| Hardware | Number of nodes | CPUs/Node | Mem/Node | Network |
|---|---|---|---|---|
| Intel CPU | 12 | 40 | 384 GB | EDR 100 Gb/s IB interconnect |
| Intel 6230R | 42 | 52 | 188 GB | EDR 100 Gb/s IB interconnect |
| Intel Xeon 5560 [1] | 312 | 12 | 48 GB | QDR 40 Gb/s IB interconnect |
| AMD EPYC 9634 [2] | 5 | 168 | 1500 GB | HDR 100 Gb/s IB interconnect |
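
As a rough sketch of how these node sizes map onto job requests, the Slurm script below asks for one full 52-core node; the partition and account directives are deliberately omitted, and the application name is a placeholder, so consult the Grex scheduling documentation for the exact values to use.

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=52   # matches the 52-core Intel 6230R nodes above
#SBATCH --mem=0                # ask for all of the memory on the node
#SBATCH --time=0-03:00:00      # 3 hours of walltime

# Placeholder application; replace with your own program.
srun ./my_application
```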

GPU nodes#


There are also several researcher-contributed nodes (CPU and GPU) on Grex, which make it a “community cluster”. The researcher-contributed nodes are available to other users on an opportunistic basis; jobs from the owner groups will preempt other users’ workloads.

| Hardware | Number of nodes | GPUs/Node | CPUs/Node | Mem/Node |
|---|---|---|---|---|
| GPU | 2 | 4 | 32 | 192 GB |
| 4 x V100-32 GB [3] | 2 | 4 | 32 | 187 GB |
| 4 x V100-16 GB [4] | 3 | 4 | 32 | 187 GB |
| 16 x V100-32 GB [5] | 1 | 16 | 48 | 1500 GB |
| AMD [A30] [6] | 2 | 2 | 18 | 500 GB |
| NVIDIA AMD [A30] [7] | 2 | 4 | 32 | 500 GB |
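
A minimal sketch of a GPU job request using generic Slurm syntax is shown below; the GPU count, CPU and memory values are placeholders, and the exact --gres (or --partition) specifications for particular GPU types on Grex should be taken from the local scheduling documentation.

```bash
#!/bin/bash
#SBATCH --gres=gpu:1          # one GPU; the type specifier is site-dependent
#SBATCH --cpus-per-task=8     # placeholder CPU count
#SBATCH --mem=32G             # placeholder memory request
#SBATCH --time=0-02:00:00

# Report which GPU was allocated to the job (NVIDIA nodes).
nvidia-smi
```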

Storage#


Grex’s compute nodes have access to three filesystems:

| File system | Type | Total space | Quota per user |
|---|---|---|---|
| /home | NFSv4/RDMA | 15 TB | 100 GB |
| /global/scratch | Lustre | 418 TB | 4 TB |
| /project | Lustre | 2 PB | Allocated per group |
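
To check how much of these quotas you are using, standard POSIX and Lustre tools can be used; the commands below are a sketch, and the group name mygroup is a placeholder.

```bash
# Disk usage of your home directory.
du -sh ~

# Lustre quota for your user on /global/scratch.
lfs quota -u $USER /global/scratch

# Group quota on /project (replace "mygroup" with your own group).
lfs quota -g mygroup /project
```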

In addition to the shared filesystems, the compute nodes have their own local disks that can be used as temporary storage when running jobs.
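
For example, a job script can stage its work on the node-local disk and copy the results back to the shared filesystem before it finishes. The sketch below assumes the scheduler exports a per-job temporary directory as $SLURM_TMPDIR, and the program and file names are placeholders; check the Grex job documentation for the exact variable to use.

```bash
#!/bin/bash
#SBATCH --time=0-01:00:00
#SBATCH --mem=4G

# Work in the node-local temporary directory (assumed to be $SLURM_TMPDIR).
cd $SLURM_TMPDIR
cp $SLURM_SUBMIT_DIR/input.dat .

# Placeholder program and input/output files.
./my_program input.dat > output.dat

# Copy the results back to the submission directory before the job ends.
cp output.dat $SLURM_SUBMIT_DIR/
```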

Software#


Grex is a traditional HPC machine, running Linux and the Slurm workload manager. On Grex, we provide several different software stacks.
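
Software on Grex is typically accessed through environment modules; the sketch below assumes the Lmod module system, and the module names are illustrative placeholders, so use the module commands themselves to see what each software stack actually provides.

```bash
# List the software available in the currently active stack.
module avail

# Search the module hierarchy (Lmod) for a package, e.g. GROMACS.
module spider gromacs

# Load a compiler/MPI toolchain and an application (names are placeholders).
module load gcc openmpi
module load gromacs

# Show what is currently loaded.
module list
```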

Web portals and GUI#


In addition to the traditional bash mode (connecting via SSH), users have access to:

  • OpenOnDemand: on Grex, it is possible to use OpenOnDemand (OOD for short) to log in to Grex and run batch or GUI applications (VNC desktops, MATLAB, GaussView, Jupyter, …). For more information, please refer to the OpenOnDemand page.


WestGrid ceased operations on April 1st, 2022. The former WestGrid institutions are now re-organized into two consortia: BC DRI group and Prairies DRI group.

  1. Original Grex nodes: slated for decommissioning in the near future ↩︎

  2. CPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture). ↩︎

  3. GPU nodes available for all users (general purpose). ↩︎

  4. GPU nodes contributed by Prof. R. Stamps (Department of Physics and Astronomy). ↩︎

  5. NVSwitch server contributed by Prof. L. Livi (Department of Computer Science). ↩︎

  6. GPU nodes contributed by Faculty of Agriculture. ↩︎

  7. GPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture). ↩︎