Running on a cluster
Questions
How should Julia be run on a cluster?
Instructor note
20 min teaching
20 min exercises
Julia on HPC systems
Despite rapid growth in recent years, Julia is still not as mainstream in the HPC world as C/C++ and Fortran, and even Python is more commonly used (and more widely available). Fortunately, even if Julia is not already available as an environment module on your favorite cluster, it is easy to install from scratch. Moreover, there is little reason to expect the official Julia binaries to perform any worse than a version a system administrator builds from source with architecture-specific optimizations.
An overview of the availability and documentation of Julia on a range of HPC systems around the world (including EuroHPC systems) can be found at https://github.com/hlrs-tasc/julia-on-hpc-systems.
Installing Julia yourself
If you want or need to install Julia yourself on an HPC system, keep the following points in mind:
Install Julia on the cluster’s high-performance parallel file system as this will improve performance of large parallel Julia jobs.
Installation of Julia packages can take up significant disk space and include a large number of files - make sure to use a file system with sufficiently high quotas for both disk space and number of files.
When in doubt, ask the support team of the cluster for guidance!
Install Julia on the cluster
Log in to the cluster used for the workshop or (if you’re browsing this material independently) some cluster you have access to.
Install Julia using Juliaup:
$ curl -fsSL https://install.julialang.org | sh
If your home directory is not the optimal location for installing Julia, answer “no” to the question “Do you want to install with these default configuration choices?” and enter the appropriate directory path.
After the installation your shell configuration file(s) will be updated (e.g. .bashrc). Source this file to update your PATH variable:
$ . $HOME/.bashrc
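If the installation succeeded, the julia and juliaup commands should now be available (a quick check; the exact output depends on the installed version):

```shell
$ julia --version
$ juliaup status
```

juliaup status lists the installed Julia channels and marks the default one.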
Installing packages
On HPC systems it is often recommended to install your own programs and packages in a directory other than the home directory ($HOME). The JULIA_DEPOT_PATH variable controls where Julia's package manager (as well as Julia's code loading mechanisms) looks for package registries, installed packages, named environments, repo clones, cached compiled package images, configuration files, and the default location of the REPL's history file.
Since the available file systems can differ significantly between HPC centers, it is hard to make a general statement about where the Julia depot folder should be placed. Generally speaking, the file system hosting the Julia depot should have
Good parallel I/O
No tight quotas on disk space or number of files
Read and write access by the user
No mechanism for the automatic deletion of unused files (or the depot should be excluded as an exception)
On some systems, it resides in the user’s home directory. On other systems, it is put on a parallel scratch file system.
To prepend the JULIA_DEPOT_PATH variable with a new directory (here with a subdirectory per Julia minor version, e.g. for Julia 1.10), type
export JULIA_DEPOT_PATH="/path_to_directory/v1.10:$JULIA_DEPOT_PATH"
(put this in the shell configuration file, e.g. .bashrc or .bash_profile).
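JULIA_DEPOT_PATH is a colon-separated list that is searched in order, and new packages are installed into the first entry. A minimal sketch, assuming a hypothetical project directory and Julia 1.10:

```shell
# Hypothetical depot location on a project/parallel file system
export JULIA_DEPOT_PATH="/proj/myproject/julia/v1.10:$JULIA_DEPOT_PATH"

# The first (leftmost) entry is where new packages will be installed
echo "${JULIA_DEPOT_PATH%%:*}"   # → /proj/myproject/julia/v1.10
```

Keeping the old value at the end of the list means Julia can still find previously installed packages in the default depot.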
MPI configuration
MPI.jl can use either a JLL-provided MPI library, which can be installed automatically together with MPI.jl, or a system-provided MPI library; on an HPC cluster the latter is normally the appropriate choice. Which MPI implementation to use is selected with the MPIPreferences.jl package, which builds on Preferences.jl to store package configuration switches in persistent TOML files.
To install and configure MPI.jl with a particular MPI backend on a cluster, first load the preferred MPI library, e.g.
$ module load OpenMPI
Then, in a Julia session:
using Pkg
Pkg.add("MPI")
Pkg.add("MPIPreferences")
using MPIPreferences
MPIPreferences.use_system_binary()
This will create a file LocalPreferences.toml
in the default Julia directory, e.g.
$HOME/.julia/environments/v1.8
, with content similar to the following:
[MPIPreferences]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
libmpi = "libmpi"
mpiexec = "mpiexec"
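To verify that MPI.jl is now bound to the system library rather than the JLL-provided one, MPI.versioninfo() prints the active configuration (output varies between clusters):

```shell
$ julia -e 'using MPI; MPI.versioninfo()'
```

The output should report binary = "system" along with the version string of the loaded MPI library.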
Running on GPUs
Julia packages for running code on GPUs (e.g. CUDA.jl and AMDGPU.jl) need both GPU drivers and development toolkits installed on the system you’re using. On a cluster these are normally available through environment modules which need to be loaded before importing and using the Julia GPU package.
On NVIDIA GPUs, the CUDA.jl package needs NVIDIA drivers and the CUDA toolkit. When installing and importing the CUDA.jl package, Julia will look for libraries in the location given by the CUDA_PATH (or CUDA_HOME) environment variable. If these are not found they will be installed automatically, but it is strongly recommended to instead use optimised pre-installed libraries. These are typically available through environment modules such as CUDA, cuDNN etc.
For example:
$ module load CUDA
$ julia
using Pkg
Pkg.add("CUDA")
using CUDA
CUDA.versioninfo()
ClusterManagers
ClusterManagers.jl is a package for interactive HPC work with all commonly used HPC scheduling systems, including SLURM, PBS, LSF, SGE, HTCondor, Kubernetes, etc.
To use ClusterManagers.jl we need access to Julia on the login node of a cluster. The following
script uses the SlurmManager
for HPC systems using the SLURM scheduler:
using Distributed, ClusterManagers
# request 4 tasks
addprocs(SlurmManager(4), partition="cpu", t="00:5:00", A="p200051", qos="short")
# if using reservation:
#addprocs(SlurmManager(4), partition="cpu", t="00:5:00", A="p200051", reservation="2022-11-enccs-julia-cpu")
# let workers do some work
for i in workers()
id, pid, host = fetch(@spawnat i (myid(), getpid(), gethostname()))
println(id, " " , pid, " ", host)
end
# The Slurm resource allocation is released when all the workers have exited
for i in workers()
rmprocs(i)
end
Use ClusterManagers.jl to launch a parallel job
Take the parallelised version of the estimate_pi()
function encountered in an
earlier exercise:
using Distributed
@everywhere function estimate_pi(num_points)
hits = 0
for _ in 1:num_points
x, y = rand(), rand()
if x^2 + y^2 < 1.0
hits += 1
end
end
fraction = hits / num_points
return 4 * fraction
end
Open a Julia REPL on the cluster login node. Import ClusterManagers, Distributed, BenchmarkTools and Statistics (the latter is needed for mean()).
Request one SLURM task with the addprocs() method (see cluster-specific info above).
Define the estimate_pi() function with the @everywhere macro.
Benchmark the serial version:
num_points = 10^9
num_jobs = 100
chunks = [num_points ÷ num_jobs for i in 1:num_jobs]
@btime mean(pmap(estimate_pi, $chunks))
Now add 7 more cores by repeating the addprocs() command and benchmark again. Note that you need to redefine estimate_pi() every time you add workers!
Add another 8 workers and benchmark one final time.
Finally, remove the workers to release the allocations.
Solution
Request 1 worker (core). Replace “PROJECT-ID” and “QOS” appropriately:
addprocs(SlurmManager(1), partition="cpu", t="00:5:00", A="PROJECT-ID", qos="QOS")
Then define the function on the worker:
using Distributed
@everywhere function estimate_pi(num_points)
hits = 0
for _ in 1:num_points
x, y = rand(), rand()
if x^2 + y^2 < 1.0
hits += 1
end
end
fraction = hits / num_points
return 4 * fraction
end
Run on all the cores and time it:
using BenchmarkTools, Statistics
num_points = 10^9
num_jobs = 100
chunks = [num_points ÷ num_jobs for i in 1:num_jobs]
@btime mean(pmap(estimate_pi, $chunks))
Repeat the process with 7 more cores:
addprocs(SlurmManager(7), partition="cpu", t="00:5:00", A="PROJECT-ID", qos="QOS")
using Distributed
@everywhere function estimate_pi(num_points)
hits = 0
for _ in 1:num_points
x, y = rand(), rand()
if x^2 + y^2 < 1.0
hits += 1
end
end
fraction = hits / num_points
return 4 * fraction
end
@btime mean(pmap(estimate_pi, $chunks))
Then redo the exact same thing with 8 more workers.
Run an MPI job
Take the MPI version of the estimate_pi()
code that we encountered in the MPI episode:
estimate_pi.jl
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
function estimate_pi(num_points)
hits = 0
for _ in 1:num_points
x, y = rand(), rand()
if x^2 + y^2 < 1.0
hits += 1
end
end
fraction = hits / num_points
return 4 * fraction
end
function main()
t1 = time()
num_points = 10^9
# divide work evenly between ranks
my_points = floor(Int, num_points / size)
remainder = num_points % size
if rank < remainder
my_points += 1
end
# each rank computes pi for their points
pi = estimate_pi(my_points)
# sum up all estimates and average on the root rank
pi_sum = MPI.Reduce(pi, +, comm, root=0)
if rank == 0
println("pi = $(pi_sum / size)")
end
t2 = time()
println("elapsed time = $(t2 - t1)")
end
main()
Use the following batch script to submit a Julia job to the queue (modify the SLURM options as needed):
#!/bin/bash -l
#SBATCH -A p200051
#SBATCH -t 00:10:00
#SBATCH -q short
#SBATCH -p cpu
#SBATCH -N 1
#SBATCH --ntasks-per-node=8
module load OpenMPI
module load Julia
n=$SLURM_NTASKS
srun -n $n julia estimate_pi.jl
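Assuming the batch script is saved as job.sh (the name is arbitrary), it can be submitted and monitored as follows:

```shell
$ sbatch job.sh
$ squeue -u $USER
$ cat slurm-*.out
```

By default SLURM writes the job output to a file named slurm-<jobid>.out in the submission directory.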
Try running it with different numbers of nodes and/or cores. Does it scale well up to a full node?
Keypoints
Julia can usually be installed and configured without too much hassle on HPC systems.
ClusterManagers is a useful package for working interactively on a cluster through the Julia REPL.
For non-interactive work, Julia jobs can also be submitted through the scheduler.