Installation in EuroHPC systems

Warning

These instructions may be outdated and were last updated in 2023.

Here are instructions for accessing the EuroHPC system, setting up the Python environment and running jobs. Please follow the instructions for the HPC system that will be used during the workshop that you are attending.

Thanks to IZUM in Slovenia we will have an allocation on the petascale Vega EuroHPC system for the duration of the workshop. The sustained and peak performance of Vega is 6.9 petaflops and 10.1 petaflops, respectively.

Architecture:

Vega has both GPU and CPU partitions:

  • CPU partition: Each node has two AMD Epyc 7H12 CPUs with 64 cores each. 768 nodes have 256 GB and 192 nodes have 1 TB of DDR4-3200 RAM. Each node has a local 1.92 TB M.2 SSD.

  • GPU partition: Each node has four NVIDIA A100 GPUs with 40 GB of HBM2 memory and two AMD Epyc 7H12 CPUs. There are 60 nodes in total, each with 512 GB of DDR4-3200 RAM and a local 1.92 TB M.2 SSD.

Software on the cluster is available through a module system. First load the Anaconda module to get access to the conda package manager:

$ #check available Anaconda modules:
$ ml av Anaconda3
$ ml add Anaconda3/2020.11

To be able to create conda environments in your home directory, you first need to initialize conda for your shell. The following command adds the necessary configuration to your .bashrc file:

$ conda init bash

You now need to either log in to the cluster again or start a new shell session by typing bash:

$ bash

Now, either create a new environment with all required dependencies or activate a pre-existing environment created in a directory you have access to:

$ conda env create -f https://raw.githubusercontent.com/ENCCS/hpda-python/main/content/env/environment.yml

The installation can take several minutes. Once it has finished, activate the environment:

$ conda activate pyhpda

mpi4py

Additional steps are required to use mpi4py since the Python package needs to be linked with the system’s MPI libraries.

To use mpi4py you need to load a module which contains MPI libraries and then install mpi4py using pip:

$ ml add foss/2020b
$ CC=gcc MPICC=mpicc python3 -m pip install mpi4py --no-binary=mpi4py
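To confirm that the build picked up the system MPI, a quick check along the following lines may help. This is only a sketch: the function name check_mpi4py is illustrative, and it assumes that the pyhpda environment is active and the MPI module is loaded. It relies on mpi4py's MPI.Get_library_version(), which reports the MPI library that mpi4py is linked against.

```shell
# check_mpi4py: print the MPI library that mpi4py is linked against,
# or report a failure if mpi4py cannot be imported.
# (Illustrative helper, not part of mpi4py or the cluster setup.)
check_mpi4py() {
    if python3 -c 'from mpi4py import MPI; print(MPI.Get_library_version())'; then
        echo "mpi4py import OK"
    else
        echo "mpi4py is not usable in this environment" >&2
        return 1
    fi
}

# After the pip install above, run:
#   check_mpi4py
```

If the reported library is not the one provided by the loaded module, reinstalling mpi4py with the module loaded is the likely fix.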

Running jobs

Resources can be allocated both through batch jobs (submitting a script to the scheduler) and interactively. You will need to provide a project ID when asking for an allocation. To find out what projects you belong to on the cluster, type:

$ sacctmgr -p show associations user=$USER

The second column of the output contains the project ID.
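The lookup can also be scripted. The following is a minimal sketch, assuming the pipe-separated layout that sacctmgr -p prints, with a header line first and the project ID in the second field. The function name project_ids is illustrative and not part of SLURM.

```shell
# project_ids: read `sacctmgr -p show associations` output on stdin and
# print each distinct value of the second column (the project ID).
# Assumes '|' as the field separator and a header on the first line.
project_ids() {
    awk -F'|' 'NR > 1 && $2 != "" { print $2 }' | sort -u
}

# On the cluster, pipe the real command into it:
#   sacctmgr -p show associations user=$USER | project_ids
```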

Vega uses the SLURM scheduler. Use the following command to allocate one interactive node with 8 cores for 1 hour in the CPU partition. If there is a reservation on the cluster for the workshop, add --reservation=RESERVATIONNAME to the command.

$ salloc -N 1 --ntasks-per-node=8 --ntasks-per-core=1 -A <PROJECT-ID> --partition=cpu -t 01:00:00

To book a GPU node instead, type (again adding the reservation flag if relevant):

$ salloc -N 1 --ntasks-per-node=1 --ntasks-per-core=1 -A <PROJECT-ID> --partition=gpu --gres=gpu:1 --cpus-per-task=1 -t 01:00:00
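For the batch route, the same resources can be requested from a script. The sketch below writes a hypothetical batch script, job.sh, that mirrors the interactive CPU allocation. The job name, the file name job.sh, and my_script.py are placeholders, and the conda hook line is an assumption about how to make conda activate work in a non-interactive shell on this system.

```shell
# Write a hypothetical batch script mirroring the interactive CPU allocation.
# Replace <PROJECT-ID> and my_script.py with your own values.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pyhpda-test
#SBATCH --account=<PROJECT-ID>
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --ntasks-per-core=1
#SBATCH --time=01:00:00

# Make conda available in the non-interactive batch shell (assumed setup).
ml add Anaconda3/2020.11
eval "$(conda shell.bash hook)"
conda activate pyhpda

# Runs once on the first allocated node. Prefix with srun to launch one
# copy per task, for example for an mpi4py program.
python3 my_script.py
EOF

# Submit on the cluster with: sbatch job.sh
```

If a workshop reservation exists, an additional #SBATCH --reservation=RESERVATIONNAME line would play the same role as the flag on the salloc command.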

Running Jupyter

The following procedure starts a Jupyter-Lab server on a compute node, creates an SSH tunnel from your local machine to the compute node, and then connects to the remote Jupyter-Lab server from your browser.

First make sure to follow the above instructions to:

  • Allocate an interactive compute node for a sufficiently long time

  • Switch to the pyhpda conda environment.

After allocating an interactive node you will see the name of the node in the output, e.g. salloc: Nodes cn0709 are ready for job.

You now need to ssh to that node, switch to the pyhpda conda environment, and start the Jupyter-Lab server on a particular port (choose one between 8000 and 9000) and IP address (the name of the compute node). Also load a module containing OpenMPI to have access to MPI inside Jupyter:

$ ssh cn0709
$ conda activate pyhpda
$ ml add foss/2021b
$ jupyter-lab --no-browser --port=8123 --ip=cn0709
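If several people work on the same node, a fixed port such as 8123 may already be taken. One option is to pick the port at random from the suggested range, as in this small sketch. The variable name PORT is illustrative, and the snippet does not check whether the port is actually free.

```shell
# Pick a pseudo-random port in the range 8000-9000 suggested above.
# rand() is in [0, 1), so the result lies between 8000 and 9000 inclusive.
PORT=$(awk 'BEGIN { srand(); print 8000 + int(rand() * 1001) }')
echo "Using port ${PORT}"

# Then, on the compute node:
#   jupyter-lab --no-browser --port="${PORT}" --ip=<node name>
```

Whichever port is used here must also be used when setting up the SSH tunnel.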

Now create an SSH tunnel from a new terminal on your local machine to the correct port and IP:

$ ssh -TN -f YourUsername@login.vega.izum.si -L localhost:8123:cn0709:8123 -L localhost:8787:cn0709:8787
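Because the node name and port change from session to session, it can be convenient to assemble the tunnel command from those two values. The helper below is a sketch with an illustrative name, tunnel_cmd. It only prints the command for the Jupyter port, so it can be inspected before being run, and it leaves out the second forwarding for port 8787 shown above.

```shell
# tunnel_cmd NODE PORT [USER]: print the ssh command that forwards local PORT
# to the same PORT on compute node NODE through the Vega login node.
# (Illustrative helper; it prints the command instead of running it.)
tunnel_cmd() {
    node=$1
    port=$2
    user=${3:-YourUsername}
    echo "ssh -TN -f ${user}@login.vega.izum.si -L localhost:${port}:${node}:${port}"
}

# Example with the node and port used in this section:
tunnel_cmd cn0709 8123
```

Run the printed command in a terminal on your local machine, not on the cluster.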

Go back to the terminal running Jupyter-Lab on the compute node, and copy-paste the URL starting with 127.0.0.1 which contains a long token into your local browser. If that does not work, try replacing 127.0.0.1 with localhost.

If everything works, you can now create a new Jupyter notebook in your browser that is connected to the compute node and the pyhpda conda environment.