Access to Vega
You should have recieved your username from the Vega support with the help of our colleagues at ENCCS. One important step to login to the system is to generate an SSH key and upload your public to the server. You can read it in the Vega documentation.
Once the setup is completed, you can login to the system. While the Singularity
module is readily avaiable in the login node, we should book a compute node to run
our examples. You can book a CPU-node for 1 hour using this command
salloc -n 1 -t 1:00:00
And for a GPU-node with X number of GPUs per node (there are 4 GPUs avaiable per node in Vega)
salloc -n 1 --gres=gpu:X --partition=gpu -t 5:00:00
Once the allocation is granted you will receive a message similar to
salloc: Pending job allocation 24122556
salloc: job 24122556 queued and waiting for resources
salloc: job 24122556 has been allocated resources
salloc: Granted job allocation 24122556
salloc: Waiting for resource configuration
salloc: Nodes cn0381 are ready for job
The granted CPU compute node here is cn0381. The general form for the CPU compute node likes like cn0XXX for GPU compute node gnXXX. Now you should SSH to the compute node to run interactively our job using the command
ssh cn0381
You might get a warning regarding the authenticity of the host, similar to the output below.
The authenticity of host 'cn0381 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:0BOlvbjVPLytjYEium04uNTCACCQN/Rr7NMJhje30aw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'cn0381' (ECDSA) to the list of known hosts.
Please enter yes
to the quesiton and igonre it as it is a self-assigned useless check
that doen’t understand what is the purpose of our login. Now, we are able to run
our jobs interactively.
To run Jupyter Notebooks, we need to load Anaconda module.
Afterwards, you can run a Jupyter kernel by specifying the port number and ip address. The ip address here is the name of compute node, in the example given above is cn0381.
jupyter-notebook --no-browser --port=8888 --ip=cn0381
The result should look like
[I 13:21:26.105 NotebookApp] JupyterLab extension loaded from /cvmfs/sling.si/modules/el7/software/Anaconda3/2020.11/lib/python3.8/site-packages/jupyterlab
[I 13:21:26.105 NotebookApp] JupyterLab application directory is /cvmfs/sling.si/modules/el7/software/Anaconda3/2020.11/share/jupyter/lab
[I 13:21:26.107 NotebookApp] Serving notebooks from local directory: /ceph/hpc/home/euhosseine
[I 13:21:26.107 NotebookApp] Jupyter Notebook 6.1.4 is running at:
[I 13:21:26.107 NotebookApp] http://cn0381:8888/?token=80d695595aa333c6d97dc6f868f96b36f4812622a5008090
[I 13:21:26.107 NotebookApp] or http://127.0.0.1:8888/?token=80d695595aa333c6d97dc6f868f96b36f4812622a5008090
[I 13:21:26.107 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:21:26.122 NotebookApp]
To access the notebook, open this file in a browser:
file:///ceph/hpc/home/euhosseine/.local/share/jupyter/runtime/nbserver-339349-open.html
Or copy and paste one of these URLs:
http://cn0381:8888/?token=80d695595aa333c6d97dc6f868f96b36f4812622a5008090
or http://127.0.0.1:8888/?token=80d695595aa333c6d97dc6f868f96b36f4812622a5008090
In your local machin (PC/laptop), open a terminal and use this command to tunnel to the running kernel on Vega.
ssh -N -f -L 8888:cn0381:8888 euhosseine@vglogin0005.vega.izum.si
The first port number is for your local machine and the second port number is what
you specified above running a Jupyter Notebook. Open a brower, and enter http://localhost:8888
.
You should see a prompt to enter the passowrd or the token. The token in this run is the number after the token
.
Entering the token, you will be shown the tree of structure of home folder.
To use TensorFlow or Horovod in this course, we can simply load them through module system.
module load TensorFlow/2.5.0-fosscuda-2020b
Or
module load Horovod/0.22.1-fosscuda-2020b-TensorFlow-2.5.0