Transferring data

Introduction#


OpenSSH tools: SCP, SFTP#


The OpenSSH package, available on Linux, macOS, and recent versions of Windows, provides not only the command-line SSH client but also two command-line file transfer tools: SFTP and SCP. Thus, on almost any system that allows SSH connections, data transfers can be performed using these tools. OpenSSH encrypts all traffic and provides several authentication options. A useful option for SCP and SFTP is to have a key pair, with the public key deposited in CCDB.
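As a hedged illustration of the key pair mentioned above, the sketch below generates an Ed25519 key with ssh-keygen and prints the public half, which is the part that would be deposited in CCDB. The file name a-private-key.key matches the placeholder used elsewhere on this page; the temporary directory, the key comment, and the empty passphrase are assumptions made only so that the example runs unattended.

```shell
# Throwaway directory for the demo; for real use, keys normally live in ~/.ssh.
keydir=$(mktemp -d)

# -t ed25519 selects the key type, -f the output file, -C a free-form comment.
# -N "" sets an empty passphrase (demo assumption); omit -N to be prompted for one.
ssh-keygen -q -t ed25519 -N "" -C "someuser@my-laptop" -f "$keydir/a-private-key.key"

# The .pub file is the public key; the file without .pub must stay private.
cat "$keydir/a-private-key.key.pub"
```

For everyday use, a passphrase-protected key is generally the safer choice.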

SFTP#


On macOS and Linux, where OpenSSH client packages are always available, the following command-line tools are present: scp and sftp. They work similarly to the UNIX cp and ftp commands, except that there is a remote target or source.

SFTP opens a session and then drops the user to a command line, which provides commands like ls, lls, get, put, cd, and lcd to navigate the local and remote directories, upload and download files, and so on.

sftp someuser@grex.hpc.umanitoba.ca
sftp> lls
sftp> put  myfile.fchk

Please replace someuser with your username.
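The same sftp commands can also be kept in a file and fed to sftp on standard input, which is convenient for repeated transfers. The sketch below is an illustration only: the file name sftp_commands.txt and the file names inside it are placeholders, and the actual connection line is left commented out because it needs a real account.

```shell
# Write the sftp commands to a file (all names are placeholders).
cat > sftp_commands.txt <<'EOF'
lls
put myfile.fchk
ls -l
get results.log
EOF

# Feed the file to sftp; authentication prompts should still appear on the terminal.
# Uncomment and replace someuser with your username:
# sftp someuser@grex.hpc.umanitoba.ca < sftp_commands.txt
```

Reading commands from standard input is used here instead of the sftp -b batch option, because batch mode expects non-interactive authentication.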

SCP#


SCP behaves like cp. It needs a “source” and a “destination” to be specified. Either of these can be local or remote. A remote location has the format user@host:/path.

To copy a file myfile.fchk from the current directory to their home directory on Grex, a user would run the following command:

scp ./myfile.fchk someuser@grex.hpc.umanitoba.ca:/home/someuser

The same example but using a key pair, assuming the corresponding public key is deposited in CCDB:

scp -i a-private-key.key ./myfile.fchk someuser@grex.hpc.umanitoba.ca:/home/someuser

The Home filesystem is limited in space and performance. For larger files, it might make sense to copy directly to the Project filesystem instead. A convenience symbolic link under /home/someuser/projects points to the Project filesystem.

scp ./myfile_bigdata.csv  someuser@grex.hpc.umanitoba.ca:/home/someuser/projects/def-somegroup/someuser/
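The examples above only upload single files. As a minimal sketch of the other common cases, the shell functions below wrap scp for downloading a file and for uploading a whole directory with -r. The function names grex_download and grex_upload_dir and the REMOTE_USER and REMOTE_HOST variables are hypothetical helpers introduced for this illustration; they are not part of OpenSSH.

```shell
# Placeholders (assumptions): replace with your own username.
REMOTE_USER="someuser"
REMOTE_HOST="grex.hpc.umanitoba.ca"
REMOTE="${REMOTE_USER}@${REMOTE_HOST}"

# Download one remote file; the second argument defaults to the current directory.
grex_download() {
    scp "${REMOTE}:$1" "${2:-.}"
}

# Upload a whole directory tree; -r makes scp copy recursively.
grex_upload_dir() {
    scp -r "$1" "${REMOTE}:$2"
}
```

For instance, grex_download /home/someuser/myfile.fchk would fetch the file into the current directory, and grex_upload_dir ./results /home/someuser/projects/def-somegroup/someuser/ would copy the local results directory to the Project filesystem.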

More information about the OpenSSH file transfer tools can be found on the OpenSSH SCP manpage. The Alliance/ComputeCanada documentation has a detailed Wiki entry on SCP.

File transfer SCP/SFTP clients with GUI#


There are many file transfer clients that provide a convenient graphical user interface (GUI) while using an implementation of SCP or SFTP under the hood.

Some examples of the popular file transfer clients are

Other GUI clients will work with Grex too if they provide SFTP protocol support.

To use such clients, one needs to tell them that SFTP is to be used, and to provide the address, which is the name of a Grex login node (yak.hpc.umanitoba.ca or grex.hpc.umanitoba.ca), and your Grex/Alliance username.

Note that we advise against saving your password in the clients: first, it is less secure, and second, it is easy to store a wrong password. File transfer clients may try to reconnect automatically, and a wrong stored password will then generate many failed connection attempts from your client machine, which in turn might temporarily block your IP address from accessing Grex.

Note that for GUI clients, care should be taken that the client supports Multi-Factor Authentication (MFA). MFA is enforced on both Grex and the Alliance HPC systems.

RSYNC over SSH#


rsync is a versatile local and remote copying tool. Because rsync “synchronizes” the source and destination, it allows for resuming interrupted data transfers without excessive data retransmission. rsync synchronizes single files and entire directory trees equally well.

For uploading and downloading files from HPC machines that allow only SSH access, rsync can encapsulate the data stream in an SSH channel.

An example of rsync over SSH is provided below:

rsync  -aAHSv -x --delete -e "ssh -i a-private-key.key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null " /home/$LOCAL_USER/somedir/  $REMOTE_USER@grex.hpc.umanitoba.ca:/home/$REMOTE_USER/destination/

In the example above,

  • -e is the option governing SSH use and behavior for rsync.
  • SSH tries to use a key pair (replace a-private-key.key with the name and location of your actual private key; for Grex, the corresponding public key can be uploaded to CCDB). If the key is not provided or not found, SSH will default to password authentication.
  • /home/$LOCAL_USER/somedir/ is a path on a local machine. An actual source directory must be supplied instead.
  • /home/$REMOTE_USER is a home directory on the Grex system, and $REMOTE_USER is the user name on Grex. The local and remote user names may or may not be the same.
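  • Note that the trailing slash / matters for rsync: a source path ending in / means "the contents of this directory", while the same path without it means "the directory itself".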
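The effect of the trailing slash can be checked safely on a local machine before running a real transfer. The sketch below is an illustration that uses only temporary directories (the names with_slash, without_slash, and stale.txt are made up for the demo); it also shows --dry-run, a standard rsync option that previews what --delete would remove without changing anything.

```shell
# Work in a throwaway directory so that nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/somedir" "$demo/with_slash" "$demo/without_slash"
echo "data" > "$demo/somedir/file.txt"

# Trailing slash on the source: the CONTENTS of somedir are copied.
rsync -a "$demo/somedir/" "$demo/with_slash/"

# No trailing slash: the directory ITSELF is copied into the destination.
rsync -a "$demo/somedir" "$demo/without_slash/"

ls "$demo/with_slash"      # file.txt
ls "$demo/without_slash"   # somedir

# Preview what --delete would remove, without changing anything (--dry-run).
touch "$demo/with_slash/stale.txt"
rsync -a --delete --dry-run --itemize-changes "$demo/somedir/" "$demo/with_slash/"
```

Since --delete removes destination files that are absent from the source, a dry run of this kind is a cheap safeguard before the real command.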

There are many useful documentation pages for rsync; this is just one example.

Globus Online file transfer#


GlobusOnline, or just Globus, is a specialized data transfer and data sharing tool for large transfers over WAN, possibly across different organizations. Globus transfers data efficiently and conveniently between any two so-called “Globus endpoints” or “data collections”.

To use Globus, a user needs at least two endpoints and an identity (account) for each endpoint. The identities have to be “linked” using the Globus Online portal. There are “server” endpoints and “personal” endpoints in Globus.

Grex users have a choice of using either the Alliance identity that comes with a CCDB account, or a UManitoba identity using UMNetID. The preliminary step of linking identities is done by logging in to www.globus.org and finding your organization in the drop-down menu there. This can be Digital Alliance or University of Manitoba. Both identities will likely have to be linked to the Globus online account to be able to transfer data between the Alliance and Grex.

At the time of writing, there is no Globus Server Endpoint on Grex. However, each user can use Globus Connect Personal to transfer data between any Server Endpoint and Grex. To do so, users first need to create a personal endpoint on Grex, under their account, as follows.

[~]$ module load globus
#
# Use an existing Globus identity to authenticate in the step below
#
[~]$ globus login --no-local-server
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?[...]
------------------------------------

Enter the resulting Authorization Code here: [...]

You have successfully logged in to the Globus CLI!

You can check your primary identity with
  globus whoami

For information on which of your identities are in session use
  globus session show

Logout of the Globus CLI with
  globus logout
[~]$ globus gcp create mapped <YOUR_NEW_ENDPOINT_NAME>
Message:     Endpoint created successfully
Endpoint ID: abcdef00-1234-0000-4321-000000fedcba
Setup Key:   12345678-aaaa-bbbb-cccc-87654321dddd
[~]$ globusconnectpersonal -setup 12345678-aaaa-bbbb-cccc-87654321dddd
[~]$ tmux new-session -d -s globus 'globusconnectpersonal -start'
### You can now start a transfer by navigating to https://globus.alliancecan.ca/
### and searching/choosing <YOUR_NEW_ENDPOINT_NAME> as the "Collection"

Once the endpoint has been created and the personal Globus server started, the endpoint will be visible and searchable in the GlobusOnline Web interface, and it can be used for data transfers. The module load globus command also provides the Globus command line interface (CLI), which can be used to move data as described here: Globus CLI examples
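As a hedged sketch of a CLI-driven transfer, the function below wraps globus transfer to copy a directory between two collections. The function name globus_copy_dir, the variable names, and the collection IDs are placeholders introduced for this illustration (the destination ID simply reuses the example Endpoint ID from the session above); real IDs can be looked up with globus endpoint search.

```shell
# Placeholders (assumptions): substitute real collection UUIDs.
SRC_COLLECTION="SOURCE-COLLECTION-UUID"
DST_COLLECTION="abcdef00-1234-0000-4321-000000fedcba"

# Copy a directory tree from the source to the destination collection.
# $1 = source path, $2 = destination path, $3 = label shown in the task list.
globus_copy_dir() {
    globus transfer --recursive --label "$3" \
        "${SRC_COLLECTION}:$1" "${DST_COLLECTION}:$2"
}
```

A call such as globus_copy_dir /data/ /home/someuser/data/ to-grex would submit the transfer as a background task; globus task list can then be used to follow its progress.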

It is good practice not to keep unnecessary processes running on Grex login nodes. Thus, when all data transfers are finished, users should stop the Globus server process running their personal endpoint as follows:

[~]$ tmux kill-session -t globus

Once an endpoint has been created, there is (usually) no need to repeat the above steps to create a new endpoint. To restart the same existing endpoint, as needed for new data transfer sessions, it is enough to run:

[~]$ tmux new-session -d -s globus 'globusconnectpersonal -start'
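Running the restart command twice would fail with a duplicate session error if the tmux session already exists. A minimal sketch of a guard against this is shown below, using tmux has-session to test for the session first. The function name start_globus_endpoint is a hypothetical helper for this illustration; the session name globus matches the one used above.

```shell
# Start the personal endpoint only if the "globus" tmux session is not running yet.
start_globus_endpoint() {
    if tmux has-session -t globus 2>/dev/null; then
        echo "tmux session 'globus' is already running"
    else
        tmux new-session -d -s globus 'globusconnectpersonal -start'
    fi
}
```

Calling start_globus_endpoint at the beginning of each data transfer session then either starts the endpoint or reports that it is already up.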

Another, more general but older guide on how to use a Globus personal endpoint on a Linux system can be found on the Frontenac “Data Transfers” page.

Check the ESNet website if you are curious about Globus, and why large data transfers over WAN might need specialized networks and software setups.

File transfers with OOD browser GUI#


NEW: It is now possible to use the OpenOnDemand (OOD) Web interface on Grex to download and upload data to and from Grex. Use the Files menu of the OOD dashboard to select a filesystem (currently the /home/$USER and /project filesystems are listed there), and then use the Upload and Download buttons. File transfers with OOD are limited to about 10 GB. As of now, the OOD interface is open to UManitoba IP addresses only (i.e., machines on campus and on the UM VPN will work). The OOD Files app also allows transferring data to and from MS OneDrive and NextCloud with the Rclone tool.

More information is available on our OOD pages.