## Introduction
Grex is a UManitoba High Performance Computing (HPC) system, first put into production in early 2011 as part of the WestGrid consortium. “Grex” is Latin for “herd” (or perhaps “flock”). The names of the Grex login nodes (bison, tatanka, zebu, yak) also refer to various kinds of bovine animals.
Since being defunded by WestGrid on April 2, 2018, Grex has been available only to users affiliated with the University of Manitoba and their collaborators.
If you are a new Grex user, proceed to the quick start guide and documentation right away.
## Hardware
The original Grex was an SGI Altix machine with 312 compute nodes (Xeon 5560 CPUs, 12 cores and 48 GB of RAM per node) and a QDR 40 Gb/s InfiniBand network.
The SGI Altix nodes were decommissioned in September 2024. In 2017, a new Seagate Storage Building Blocks (SBB) based Lustre filesystem with 418 TB of usable space was added to Grex.
In 2020 and 2021, the University added 57 Intel Cascade Lake CPU nodes, a few GPU nodes, new NVMe storage for home directories, and an EDR InfiniBand interconnect.
In March 2023, a new 1 PB storage system, the /project filesystem, was added to Grex.
In January 2024, /project was extended by another 1 PB.
In September 2024, 30 new AMD Genoa nodes were added.
The current computing hardware available for general use is as follows:
### Login nodes
As of September 14, 2022, Grex uses the UManitoba network; the old WestGrid and BCNET network, which had been in use for about 11 years, has been decommissioned. The DNS names now use hpc.umanitoba.ca instead of the previous westgrid.ca.
On Grex, there are multiple login nodes:
- Yak: yak.hpc.umanitoba.ca (note that this node's CPU architecture supports AVX-512).
- Grex: grex.hpc.umanitoba.ca is now an alias for the Yak login node above.
- Zebu: https://zebu.hpc.umanitoba.ca (used only for OpenOnDemand; requires VPN when accessed from outside the campus network).
To log in to Grex in text (bash) mode, connect to grex.hpc.umanitoba.ca or yak.hpc.umanitoba.ca using a secure shell (SSH) client.
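For example, from a terminal on a Linux, macOS, or Windows (OpenSSH) machine, where `myusername` is a placeholder for your actual Grex username:

```bash
# Log in to Grex; -Y enables trusted X11 forwarding for occasional GUI programs (optional)
ssh -Y myusername@grex.hpc.umanitoba.ca

# Or target the Yak login node directly
ssh myusername@yak.hpc.umanitoba.ca
```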
### CPU nodes
In addition to the original nodes, newer Intel (Skylake and Cascade Lake) and AMD EPYC nodes have been added to Grex; a sketch of a CPU job request follows the table below:
| Hardware | Number of nodes | CPUs/node | Memory/node | Network |
|---|---|---|---|---|
| Intel CPU | 12 | 40 | 384 GB | EDR 100 Gb/s IB interconnect |
| Intel 6230R | 42 | 52 | 188 GB | EDR 100 Gb/s IB interconnect |
| AMD EPYC 9654 | 27 | 192 | 750 GB | HDR 200 Gb/s IB interconnect |
| AMD EPYC 9654 | 3 | 192 | 1500 GB | HDR 200 Gb/s IB interconnect |
| AMD EPYC 9634[^1] | 5 | 168 | 1500 GB | HDR 100 Gb/s IB interconnect |
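As mentioned above, a batch job on the CPU nodes requests cores, memory, and walltime within the per-node limits listed in the table. The following is only a sketch using standard Slurm directives; `my_program` is a placeholder, and any account or partition options needed on Grex are omitted here.

```bash
#!/bin/bash
# Sketch of a CPU-only Slurm job script
#SBATCH --job-name=cpu-example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # stay within the cores-per-node limits above
#SBATCH --mem=32G             # stay within the memory-per-node limits above
#SBATCH --time=0-01:00:00     # walltime in D-HH:MM:SS

# Run a threaded application on the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program
```

Such a script would be submitted with `sbatch` and monitored with `squeue`.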
### GPU nodes
There are also several researcher-contributed nodes (CPU and GPU) on Grex, which make it a “community cluster”. The researcher-contributed nodes are available to other users on an opportunistic basis; jobs from the owner groups will preempt other users' workloads. An example GPU job request is sketched after the table below.
| Hardware | Number of nodes | GPUs/node | CPUs/node | Memory/node |
|---|---|---|---|---|
| GPU | 2 | 4 | 32 | 192 GB |
| 4 x V100-32 GB[^2] | 2 | 4 | 32 | 187 GB |
| 4 x V100-16 GB[^3] | 3 | 4 | 32 | 187 GB |
| 16 x V100-32 GB[^4] | 1 | 16 | 48 | 1500 GB |
| 2 x NVIDIA A30 (AMD host)[^5] | 2 | 2 | 18 | 500 GB |
| 4 x NVIDIA A30 (AMD host)[^6] | 2 | 4 | 32 | 500 GB |
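A single-GPU job can be requested with standard Slurm GRES syntax, as in the sketch below. Any GPU type strings or partition names required on Grex are not shown here, and `my_gpu_program` is a placeholder.

```bash
#!/bin/bash
# Sketch of a single-GPU Slurm job script
#SBATCH --job-name=gpu-example
#SBATCH --gres=gpu:1          # request one GPU
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=0-02:00:00

nvidia-smi                    # show the GPU assigned to this job
./my_gpu_program              # placeholder for the actual GPU application
```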
### Storage
Grex’s compute nodes have access to the following shared filesystems (an example of checking your quotas is given after the table):
| Filesystem | Type | Total space | Quota |
|---|---|---|---|
| /home | NFSv4/RDMA | 15 TB | 100 GB per user |
| /project | Lustre | 2 PB | Allocated per group |
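Quota usage can typically be checked with the standard tools for each filesystem type; the commands below are a sketch (the exact reporting set-up on Grex may differ, and `my_group` is a placeholder for your research group's name).

```bash
# Usage and quota on the NFS /home filesystem
quota -s

# Per-user and per-group usage on the Lustre /project filesystem
lfs quota -hu $USER /project
lfs quota -hg my_group /project   # replace my_group with your research group
```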
In addition to the shared filesystems, the compute nodes have their own local disks that can be used as temporary storage when running jobs.
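A common pattern is to stage input data onto the node-local disk at the start of a job and copy results back to shared storage at the end. The sketch below assumes the scheduler exposes a per-job local directory through an environment variable such as `$SLURM_TMPDIR` or `$TMPDIR`; the actual variable name and local disk capacity on Grex should be checked in the documentation on running jobs.

```bash
#!/bin/bash
#SBATCH --time=0-03:00:00
#SBATCH --mem=16G

# Assumed per-job local scratch location; verify the variable name on Grex
LOCAL_SCRATCH=${SLURM_TMPDIR:-$TMPDIR}

cp -r "$HOME/my_input_data" "$LOCAL_SCRATCH/"   # stage input onto the local disk
cd "$LOCAL_SCRATCH"
./my_program my_input_data                      # placeholder application doing local I/O
cp -r results "$HOME/"                          # copy results back to shared storage
```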
## Software
Grex is a traditional HPC machine, running Linux and the SLURM resource management system. On Grex, we provide several different software stacks.
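Software from these stacks is accessed through environment modules. The commands below are a sketch using Lmod-style module commands; the package names and versions are examples only, and the stacks actually available are described on the software stacks page.

```bash
module avail                   # list modules visible in the currently selected stack
module spider gcc              # search across stacks for a package (example name)
module load gcc/12 openmpi     # load a compiler and MPI library (example names/versions)
module list                    # show currently loaded modules
```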
## Web portals and GUI
In addition to the traditional text (bash) mode of connecting via SSH, users have access to:
- OpenOnDemand: on Grex, it is possible to use OpenOnDemand (OOD for short) to log in and run batch or GUI applications (VNC desktops, MATLAB, GaussView, Jupyter, …). For more information, please refer to the OpenOnDemand page.
## Useful links
- Digital Research Alliance of Canada (the Alliance), formerly known as Compute Canada.
- the Alliance documentation
- Grex Status page
- Grex documentation
- Local Resources at UManitoba
[^1]: CPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture).
[^2]: GPU nodes available to all users (general purpose).
[^3]: GPU nodes contributed by Prof. R. Stamps (Department of Physics and Astronomy).
[^4]: NVSwitch server contributed by Prof. L. Livi (Department of Computer Science).
[^5]: GPU nodes contributed by the Faculty of Agriculture.
[^6]: GPU nodes contributed by Prof. Marcos Cordeiro (Department of Agriculture).