Queueing jobs and running a Singularity container

Most supercomputers use a system called SLURM to manage jobs and maximise cluster utilisation. There are two main workflows:

  • Interactive mode, in which the user requests resources, logs into the compute nodes and runs whatever they need directly

  • Batch mode, in which the user prepares a submit script (usually Bash) that contains all the instructions to be executed. The script is then placed in the queue and runs without any further input.

Interactive resources can be requested with the salloc command in the following fashion:

$ salloc -n <n_cores> -t HH:MM:SS -A <allocation_number> -p <partition>

Most parameters are self-explanatory: -t is the wall time and -A is the allocation, which depends on the project. -p is required by some clusters that have different types of nodes, e.g. Dardel has a set of nodes reserved for interactive use on a partition called shared and a set of nodes with GPUs (gpu). There are many other options that are explained here. Once the requested resources are granted, an MPI job can be executed with srun -n <n_cores> my_command; alternatively, the user can also ssh directly into the allocated node.
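As a minimal sketch of such a session (the allocation and partition are placeholders, to be replaced with your own values):

$ salloc -n 4 -t 00:10:00 -A <allocation_number> -p shared
$ srun -n 4 hostname   # run a command on the allocated cores
$ exit                 # leaving the salloc shell releases the allocation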

Type-Along

Let’s book a node with a GPU to run our script:

$ salloc -n 1 -t 00:50:00 -A pdc-bus-2024-8 -p gpu --gpus=1

Once we get our node, we can train our model:

$ srun -n 1 singularity exec --rocm -B ./models:/models /cfs/klemming/projects/supr/bustestingshared/ENCCS/rocm_tensorflow/ python models/unet/main.py

The -B flag is used to bind a directory on the filesystem to a directory in the container, and the --rocm flag is used to expose the GPU to the container. The rocm_tensorflow folder contains a pre-built container with the necessary Python packages. If the ImageMagick module is loaded, we can inspect the accuracy and losses in the results/training folder with the display command.
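For example (the exact file names depend on what the training script writes, so treat these as illustrative):

$ display results/training/accuracy.png
$ display results/training/loss.png

Once the network is trained, we can perform inference: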

$ srun -n 1 singularity exec --rocm -B ./models:/models,./images:/images /cfs/klemming/projects/supr/bustestingshared/ENCCS/rocm_tensorflow python models/serving/main.py water_body_17.jpg

You can open another local terminal (i.e. on your computer) and copy the images back for local visualisation:

$ scp -r <yourusername>@dardel.pdc.kth.se:/cfs/klemming/projects/supr/bustestingshared/<yourname>/supercomputing4ai_demo/images /local/path

Bonus - Batch mode job

It is often easier to submit a job to the general queue than to wait for an interactive resource. As an example, we can book two nodes and print their host names. Let’s create the following submission script and call it submit_test.sh:

#!/bin/bash -l
#SBATCH -J test # Job name
#SBATCH -t 00:02:00 # Wall time
#SBATCH -A pdc-bus-2024-8 # Allocation number
#SBATCH -p main # Partition - in this case the normal (non-GPU) partition is fine
#SBATCH --nodes 2 # We want two different nodes
#SBATCH --ntasks-per-node=1 # We want to book only one core on each node - We need just the hostname after all :)
#SBATCH --mail-type=ALL # Get an email when the job starts, ends or fails

# Now for the actual instruction to be executed
srun hostname > out.txt 2>&1 # Black magic to redirect both stdout and stderr to the same file

We can now submit our short job to the queue:

$ sbatch submit_test.sh
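If the submission succeeds, sbatch replies with the assigned job ID (the number below is illustrative):

Submitted batch job 1234567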

We can check the status of our job in the queue with:

$ squeue -u <username>
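The output looks roughly like this (all values are illustrative):

  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
1234567      main  test <user> PD  0:00     2 (Priority)

Once the job has run, out.txt should contain the two host names, e.g.:

$ cat out.txt
nid001234
nid001335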