Remote manager tutorial

Here we will look at the engine that BigDFT employs to run calculations on a remote supercomputer. After this tutorial you will be able to trigger jobs on the remote nodes directly from your local workstation.

The remotemanager package has to be installed on your machine (or in the Colab session):

!pip install -U remotemanager
Collecting remotemanager
  Using cached remotemanager-0.10.10-py3-none-any.whl.metadata (1.9 kB)
Requirement already satisfied: pyyaml in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from remotemanager) (6.0.1)
Requirement already satisfied: requests in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from remotemanager) (2.31.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from requests->remotemanager) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from requests->remotemanager) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from requests->remotemanager) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages (from requests->remotemanager) (2023.11.17)
Using cached remotemanager-0.10.10-py3-none-any.whl (85 kB)
Installing collected packages: remotemanager
Successfully installed remotemanager-0.10.10

Begin with the common imports, Dataset and URL. As these two are always used, they are available from the remotemanager root.

from remotemanager import Dataset, URL

Define a simple function to run:

def basic_function(inp):
    import time
    
    time.sleep(1)
    
    return inp*inp

For the moment, we’re just running on our local machines:

url = URL()
#url = URL(host='vega') # switch to this after the first walkthrough of this notebook

The basic concept: Dataset

Here we see the main concept behind remotemanager: a function can be run multiple times on a remote machine with different values of its arguments.

This will be useful for controlling the different calculations of more complex data submissions.

Now that the basic setup is done, let's create the Dataset:

ds = Dataset(function=basic_function,
             url=url,
             # script='module load Python', # we need Python to execute a Python function; this script loads it on the remote
            )

The Dataset stores the function; the Runners store the arguments.

Right now all we have is a function, so we need to create the arguments:

values = [1, 3, 7, 50]

for val in values:
    ds.append_run(args={'inp': val})
appended run runner-0
appended run runner-1
appended run runner-2
appended run runner-3

Now we have all the material required:

  • Function

  • Connection

  • Arguments

Time to run:

Here we run the dataset.

WARNING: If you have timeout problems in this part, try increasing the timeout value in the URL class attributes.

#ds.url.timeout = 120
#ds.url.max_timeouts = 4
ds.run(force=True)
assessing run for runner dataset-0324eba7-runner-0... force running
assessing run for runner dataset-0324eba7-runner-1... force running
assessing run for runner dataset-0324eba7-runner-2... force running
assessing run for runner dataset-0324eba7-runner-3... force running

The cell below is useful for waiting on a running Dataset; its core is the following loop:

while not ds.all_finished: ...

This block waits for the dataset to finish: Dataset.all_finished only returns True once all the runners have completed.

Note: You can also use Dataset.is_finished to see the state on a per-runner basis.
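
For instance, a minimal sketch of a per-runner check (assuming is_finished returns one flag per runner, in the same order as Dataset.runners):

# per-runner completion check (assumes one boolean per runner, ordered like ds.runners)
for runner, done in zip(ds.runners, ds.is_finished):
    print(runner, 'finished' if done else 'still running')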

WARNING (for the exercise): If you get an error like ‘python not found’ here, define a script header in the Dataset (a sketch follows below).
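
For example, a minimal sketch of such a script header (the module name Python is an assumption; check what the remote machine actually provides):

ds = Dataset(function=basic_function,
             url=url,
             script='module load Python',  # assumed module name; adjust to your machine
            )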

import time

while not ds.all_finished:
    print('dataset not finished yet, sleeping for 1s')
    time.sleep(1)

If we’ve made it through the wait block, then we must have results. Let's fetch them:

ds.fetch_results()

Now that they’re fetched, we can access them via the results property without having to talk to the remote again:

ds.results
[1, 9, 49, 2500]
ds.runners[0].history
{'2022-11-16 19:22:00/0': 'created',
 '2022-11-16 19:22:02/0': 'submitted',
 '2022-11-16 19:22:03/0': 'resultfile created remotely',
 '2022-11-16 19:22:12/0': 'completed'}
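
As a quick sanity check (a sketch, assuming the values list defined earlier is still in scope), each result should simply be the square of the corresponding input:

assert ds.results == [v * v for v in values]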

Exercise

Run this simple function on vega. Change the URL to point to vega (it should be enough to pass to URL the same host you use for your ssh connection, e.g. ssh vega). Pay attention to the various steps of the procedure.

Hint: Is python loaded by default when you ssh into vega? Perhaps module spider python can help with this.

Important: contact us in case you are using a setup that would not work for the exercise. You can also run the following tutorials on Google Colab if there are difficulties installing a Jupyter notebook environment on your workstation.
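
For reference, a minimal end-to-end sketch of the exercise, assuming 'vega' is the host alias from your ssh config and that Python is provided by a module of that name on the machine (adjust both to your setup):

import time

from remotemanager import Dataset, URL

def basic_function(inp):
    import time
    time.sleep(1)
    return inp * inp

url = URL(host='vega')  # same host alias as `ssh vega` (assumption)

ds = Dataset(function=basic_function,
             url=url,
             script='module load Python',  # assumed module name; check with `module spider python`
            )

for val in [1, 3, 7, 50]:
    ds.append_run(args={'inp': val})

ds.run()

while not ds.all_finished:  # wait for the remote runners to complete
    time.sleep(1)

ds.fetch_results()
print(ds.results)  # expected: [1, 9, 49, 2500]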