1M: Access to Leonardo

Note

You can find the general Leonardo documentation at https://docs.hpc.cineca.it/.

The project account on the Leonardo Booster partition is tra26_castiel2. Two shared areas are available to all project collaborators, $WORK and $FAST, reachable through the corresponding environment variables $WORK and $FAST. Compute nodes have no internet access, so any required models or data must be downloaded in advance using the lrd_all_serial partition. The $FAST area provides faster I/O and already contains two folders, data and models; we recommend storing preloaded datasets and models there so that students can access them directly.
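Since the compute nodes are offline, downloads have to happen on the lrd_all_serial partition. A minimal jobscript sketch, assuming the huggingface_hub CLI is available in your environment (the model name and target path are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=prefetch_models
#SBATCH --time=02:00:00
#SBATCH --account=tra26_castiel2
#SBATCH --partition=lrd_all_serial   # partition with internet access
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Illustrative example: download a model into the shared $FAST/models
# folder so students can load it directly from the compute nodes.
huggingface-cli download bert-base-uncased --local-dir "$FAST/models/bert-base-uncased"
```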

Slides

Jupyter

To run Jupyter notebooks on Leonardo, a double SSH tunnel is needed. We suggest using Jupyter notebooks only for single-node work, and normal scripting with job submission when more than one node is needed.

To open a notebook directly on a compute node, set up the double SSH tunnel following the steps below:

  1. On localhost (a local shell on your PC; Windows users can use PuTTY), open an SSH session to a Leonardo login node (e.g. login01-ext.leonardo.cineca.it) with the command:

     ssh USERNAME@login01-ext.leonardo.cineca.it
    
  2. Once you have logged into Leonardo, enter the folder where you have saved the desired notebooks you want to use, and submit the job (see below for the jobscript):

     sbatch start_jupyter.sh
    
  3. On a local shell, open the double SSH tunnel by copying the command printed in the output file (.out) associated with the job (string A).

  4. Open a browser and go to the URL to connect to the Jupyter server (string B).
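For reference, the two strings printed by the jobscript look like the following (all values here are illustrative; copy the real ones from your .out file):

```shell
# String A: run on your LOCAL machine to open the double tunnel
ssh -L 51234:10.12.3.45:51234 USERNAME@login02-ext.leonardo.cineca.it -N

# String B: open in your LOCAL browser
# http://127.0.0.1:51234/lab?token=USERNAME_51234
```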

The start_jupyter.sh jobscript:

#!/bin/bash
        
#SBATCH --job-name=jupyter_environment
#SBATCH --time=01:00:00 ### Change compute time based on your needs ###
#SBATCH --account=tra26_castiel2
#SBATCH --partition=boost_usr_prod
#SBATCH --reservation= ### If you have a reservation, write it here; otherwise remove or comment out this line ###
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1 
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --error jupyter-%j.err
#SBATCH --output jupyter-%j.out

# Load the cineca-ai module (module load cineca-ai) 
# Alternatively, load the python module you used to create your venv 
module load python/3.11.6--gcc--8.5.0

# Activate your venv
source .../bin/activate ### Write here the path to your venv ###


# Get the worker list associated to this slurm job
worker_list=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))

# Set the first worker as the head node and get its IP
head_node=${worker_list[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# From another shell on your local machine, open an SSH tunnel to the login node and then to the compute node by copying and pasting the instructions printed in the .out file.
# Print ssh tunnel instruction
jupyter_port=$(($RANDOM%(64511-50000+1)+50000))
jupyter_token=${USER}_${jupyter_port}
echo "==================================================="
echo "[INFO]: To access the Jupyter server, remember to open an ssh tunnel from your local machine with:"
echo "ssh -L $jupyter_port:$head_node_ip:$jupyter_port ${USER}@login02-ext.leonardo.cineca.it -N"
echo "then you can connect to the jupyter server at http://127.0.0.1:$jupyter_port/lab?token=$jupyter_token"
echo "==================================================="

# Start the Jupyter server on the head node
echo "[INFO]: Starting jupyter notebook server on $head_node"

# Note that the jupyter lab command is available only because we have activated the venv
command="jupyter lab --ip=0.0.0.0 --port=${jupyter_port} --NotebookApp.token=${jupyter_token}"
echo "[INFO]: $command"
$command &

echo "[INFO]: Your env is up and running."

sleep infinity
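After submitting, you can verify that the job is running and retrieve strings A and B from its output file; a sketch (the job id shown is hypothetical):

```shell
# Check the state of your jobs
squeue --me

# Once the job is running, print the tunnel command (string A)
# and the URL (string B) from its output file
cat jupyter-1234567.out
```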

Setting up Python

The cineca-ai Module

If you don’t require specific versions for your hands-on sessions, we suggest using the cineca-ai module available on Leonardo, which already contains the most commonly used Python libraries for AI applications. You can find more information about how to use it here: https://docs.hpc.cineca.it/hpc/hpc_cineca-ai-hpyc.html

Virtual environment

If you are creating a virtual environment for your scripts rather than using the CINECA modules, please make sure it can be loaded and, if the need arises, modified by students. You can also create a shared virtual environment in the $FAST area, and keep a file with the required packages ready in case installing them takes too long. To create a virtual environment, we recommend loading the Python module and using pip to install packages.
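The recommendation above can be sketched as follows (the venv path and requirements file are illustrative; make sure the directory permissions allow students to activate it):

```shell
# On a login node, load the Python module
module load python/3.11.6--gcc--8.5.0

# Create a shared venv in the $FAST area (path is illustrative)
python -m venv "$FAST/venvs/course_env"
source "$FAST/venvs/course_env/bin/activate"

# Install the packages listed in your requirements file
pip install -r requirements.txt
```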

Running jobs

If you need to run jobscripts that require fewer than 2 nodes and last less than 30 minutes, you can use the debug QoS by specifying the SLURM directive --qos boost_qos_dbg; it has a higher priority, so you obtain your resources earlier.
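For example, the relevant directives in a jobscript header could look like this (job resources shown here are illustrative, within the stated limits):

```shell
#SBATCH --account=tra26_castiel2
#SBATCH --partition=boost_usr_prod
#SBATCH --qos=boost_qos_dbg   # higher priority; only for small, short jobs
#SBATCH --nodes=1
#SBATCH --time=00:30:00
```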