General

Retrieval augmented generation is a technique introduced by researchers at Facebook AI Research (FAIR) in paper from 2020 titled “Retrieval-augmented Generation for Knowledge-Intensive NLP Tasks”.

The idea behind RAG is that we can add knowledge to an already trained LLM by injecting into the context useful information retrieved after receiving the user’s query. This is very useful because otherwise the only way to “add” new knowledge to an LLM would be to finetune the model over new data.

rag_sequence_diagram

Retrieval Augmented Generation systems are interesting for companies because it’s a way to have a model that:

their users can query about internal company procedures;
can be updated “on the fly” by indexing new documents/reindexing updated documents;
the company has total control over model knowledge base (i.e. a document is no longer relevant? Just remove it from the knowledge base);

Let’s see a brief demo of how RAG works.

Imports

from dotenv import load_dotenv

import os
import socket
from glob import glob
import re
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor
from functools import partial

from datetime import datetime
import time
import random
import math

from typing import List

import pandas as pd
import numpy as np

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter, SentenceTransformersTokenTextSplitter, CharacterTextSplitter, MarkdownTextSplitter
from langchain_chroma.vectorstores import Chroma 
from langchain_huggingface import HuggingFaceEmbeddings

import chromadb
from chromadb.errors import NotFoundError
from rank_bm25 import BM25Okapi

from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt

from pydantic import BaseModel

/leonardo/home/userinternal/rmioli00/git/cinecaxtpc25/session_5/venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Env config

t0 = datetime.now()
date = re.sub(r"[ :-]", "_", str(t0)[:19])
print(f"Last execution {t0}")

load_dotenv()

# Models
EMBEDDER = "BAAI/bge-m3"
RERANKER = "BAAI/bge-reranker-v2-m3"
LLM = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

# Endpoints
VLLM_OPENAI_ENDPOINT = os.environ["VLLM_OPENAI_ENDPOINT"]
VLLM_KEY = os.environ["VLLM_KEY"]

# Paths
PROMPT_PATH = "../data/prompts"
INPUT_PATH = "../data/input"
OUTPUT_PATH = "../data/output/chunking"

Path(OUTPUT_PATH).mkdir(exist_ok=True, parents=True)

# Use GPU 3 for this notebook. GPUs 0 and 1 are used to load the llm
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

llm = ChatOpenAI(base_url=VLLM_OPENAI_ENDPOINT, api_key=VLLM_KEY, model=LLM, temperature=0, max_completion_tokens=3000)

Last execution 2025-07-28 16:39:27.599879

Why do we need RAG systems?

Let us evaluate the responses of an off-the-shelf pretrained model to a set of questions.

test_questions = ["What GPUs are available on Leonardo?", "Is there any partition without gpus?",
                  "What GPUs are available on the Cloud?", "Can I associate a domain name to a vm?",
                  "What are the naming conventions I should follow when asking for a domain name for a vm machine?",
                  "What is Cineca AI and how do I enable it?", 
                  "What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?"]

for question in test_questions:
    print(f"[QUESTION]: {question}")
    answ = llm.stream([("system", "You are an helpful assistant, answer to the user's questions in a precise and concise manner."),
                       ("human", question)])
    print("[ANSWER]: ", end="")
    for chunk in answ:
        print(chunk.content, end="")
    print("\n")

[QUESTION]: What GPUs are available on Leonardo?
[ANSWER]: As of my last update in October 2023, Leonardo, which is part of the EuroHPC Joint Undertaking, is designed to be one of the most powerful supercomputers in Europe. It is equipped with advanced GPUs to handle its high-performance computing tasks. Specifically, Leonardo is known to feature NVIDIA A100 GPUs. These GPUs are part of the NVIDIA Ampere architecture and are widely used in supercomputing and AI applications due to their high performance and efficiency.

For the most current and detailed information, I recommend checking the official EuroHPC or Leonardo supercomputer documentation, as hardware specifications can be updated or expanded over time.

[QUESTION]: Is there any partition without gpus?
[ANSWER]: Yes, there are partitions without GPUs. In many high-performance computing (HPC) environments, clusters are often divided into different partitions to cater to various types of workloads. Some common types of partitions include:

1. **CPU-only partitions**: These partitions are dedicated to jobs that do not require GPU acceleration. They typically consist of nodes with multiple CPU cores but no GPUs.

2. **GPU partitions**: These partitions are equipped with nodes that have one or more GPUs, suitable for tasks that benefit from GPU acceleration, such as machine learning, deep learning, and certain scientific computations.

3. **Large memory partitions**: These partitions have nodes with a significant amount of RAM, ideal for memory-intensive tasks.

4. **High-throughput partitions**: These partitions are designed to handle a large number of smaller jobs efficiently.

5. **Debugging partitions**: These partitions are used for testing and debugging code, often with shorter time limits and fewer resources.

To determine the specific partitions available in a particular HPC system, you should refer to the documentation or help resources provided by the system administrators.

[QUESTION]: What GPUs are available on the Cloud?
[ANSWER]: The availability of GPUs on the cloud varies by provider, but here are some of the most commonly offered GPUs from major cloud service providers:

1. **Amazon Web Services (AWS)**:
   - NVIDIA Tesla V100
   - NVIDIA A10G
   - NVIDIA T4
   - NVIDIA K80
   - AMD Radeon Pro V520
   - NVIDIA A100 (80GB and 40GB versions)

2. **Microsoft Azure**:
   - NVIDIA Tesla V100
   - NVIDIA A100 (80GB and 40GB versions)
   - NVIDIA T4
   - NVIDIA K80
   - AMD Radeon Instinct MI25

3. **Google Cloud Platform (GCP)**:
   - NVIDIA Tesla V100
   - NVIDIA A100 (80GB and 40GB versions)
   - NVIDIA T4
   - NVIDIA K80

4. **IBM Cloud**:
   - NVIDIA Tesla V100
   - NVIDIA T4
   - NVIDIA K80

5. **Oracle Cloud Infrastructure (OCI)**:
   - NVIDIA Tesla V100
   - NVIDIA A100 (80GB and 40GB versions)
   - NVIDIA T4

6. **Alibaba Cloud**:
   - NVIDIA Tesla V100
   - NVIDIA T4
   - NVIDIA K80

These GPUs are typically available in various instance types and configurations to suit different workloads, such as machine learning, deep learning, high-performance computing, and graphics-intensive applications. Always check the latest offerings on the respective cloud provider's website, as new GPU options are frequently added.

[QUESTION]: Can I associate a domain name to a vm?
[ANSWER]: Yes, you can associate a domain name with a virtual machine (VM). This process typically involves several steps:

1. **Obtain a Domain Name**: Purchase a domain name from a domain registrar if you don't already have one.

2. **Set Up DNS Records**: Configure the DNS records for your domain. This usually involves setting up an A record or CNAME record to point to the IP address of your VM.

   - **A Record**: Maps a domain name directly to an IP address.
   - **CNAME Record**: Maps a domain name to another domain name.

3. **Configure Your VM**: Ensure that your VM is configured to handle incoming traffic on the appropriate ports (e.g., HTTP/HTTPS for web servers).

4. **Update Firewall Settings**: Make sure that any firewalls (both on the VM and in the network) allow traffic to the necessary ports.

5. **Test the Configuration**: Verify that the domain name resolves to your VM's IP address and that the VM is serving the expected content.

6. **SSL/TLS (Optional)**: If you're serving HTTPS, you may need to obtain and install an SSL/TLS certificate for your domain.

Here's a simple example of setting up an A record:

- **Domain Name**: example.com
- **A Record**: example.com -> 192.0.2.1 (IP address of your VM)

After configuring the DNS records, it may take some time for the changes to propagate across the internet.

[QUESTION]: What are the naming conventions I should follow when asking for a domain name for a vm machine?
[ANSWER]: When naming a domain for a virtual machine (VM), follow these conventions to ensure clarity, consistency, and ease of management:

1. **Descriptive and Unique**:
   - Use a name that describes the VM's purpose or function.
   - Ensure the name is unique within your domain namespace.

2. **Consistent Naming Scheme**:
   - Follow a consistent pattern, such as `environment-role-app.domain.com` (e.g., `prod-db-mysql.example.com`).

3. **Use Hyphens or Underscores**:
   - If using multi-word names, separate words with hyphens (e.g., `web-server`) or underscores (e.g., `web_server`), but be consistent in your choice.

4. **Avoid Special Characters**:
   - Stick to alphanumeric characters and hyphens or underscores. Avoid spaces, slashes, and other special characters.

5. **Include Environment**:
   - Indicate the environment (e.g., `dev`, `test`, `prod`) to avoid confusion (e.g., `dev-web-server.example.com`).

6. **Use Fully Qualified Domain Names (FQDN)**:
   - Include the full domain name, not just the hostname (e.g., `web-server.example.com` instead of just `web-server`).

7. **Follow Organizational Standards**:
   - Adhere to any specific naming conventions or standards established by your organization.

8. **Keep it Short but Meaningful**:
   - Aim for names that are short enough to be easily typed and remembered, but meaningful enough to convey the VM's purpose.

### Examples:
- `prod-db-mysql.example.com`
- `dev-web-app.example.com`
- `test-api-server.example.com`
- `staging-file-server.example.com`

By following these conventions, you can create a clear and manageable naming structure for your VMs.

[QUESTION]: What is Cineca AI and how do I enable it?
[ANSWER]: Cineca AI is a suite of artificial intelligence services provided by Cineca, a leading Italian supercomputing center. It offers tools and resources for AI research, development, and deployment. To enable Cineca AI, follow these general steps:

1. **Access the Cineca AI Portal**: Visit the official Cineca AI website and create an account if you don't have one.

2. **Request Access**: Submit a request to access the Cineca AI services. You may need to provide details about your project or research.

3. **Approval**: Wait for approval from Cineca. Once approved, you will receive instructions on how to access the services.

4. **Set Up Your Environment**: Follow the provided guidelines to set up your computing environment. This may include installing specific software or configuring your system.

5. **Start Using Cineca AI**: Once your environment is set up, you can start using the various AI tools and resources provided by Cineca.

For specific details, refer to the Cineca AI documentation or contact their support team.

[QUESTION]: What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?
[ANSWER]: As of my last update, the Leonardo supercomputer's BOOSTER partition typically supports several Quality of Service (QoS) queues to manage different types of jobs. While specific names can vary, common QoS queues on such high-performance computing systems often include:

1. **debug**: For short, quick tests and debugging.
2. **standard**: For standard batch jobs.
3. **express**: For jobs that need quicker turnaround.
4. **long**: For long-running jobs.
5. **gpu**: For jobs that require GPU resources.
6. **bigmem**: For jobs that require large amounts of memory.

For the most accurate and up-to-date information, you should refer to the official documentation or support resources provided by the Leonardo supercomputer's administrators.

Except for the first answer, all the following answers are incorrect.

# We know where to find data, so we read the correspondent file. In a while we are going to automate also this step.
with open(os.path.join(INPUT_PATH, "leonardo.rst.txt"),  mode = "r") as f:
    data = f.readlines()

answ = llm.stream([("system", "You are an helpful assistant, answer to the user questions in a precise and concise manner."),
                   ("human", f"Given the following data, what are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?\n\n{data}")])

for chunk in answ:
    print(chunk.content, end="")

The QOS queues available on the Leonardo supercomputer BOOSTER partition are:

- normal
- boost_qos_dbg
- boost_qos_bprod
- boost_qos_lprod
- boost_qos_fuabprod
- qos_fualowprio

In the following section, we will implement a Retrieval-Augmented Generation (RAG) system using the HPC documentation provided by Cineca to their users. As you will observe, the primary task in building RAG systems involves rigorously testing and evaluating different configurations of the information retrieval component to measure their impact on overall system performance. In a RAG pipeline, the generative AI component represents only the final stage of the process; the majority of the effort is dedicated to designing a robust retrieval system tailored to your data.

In our case, the documents are in .rst format, so we are not concerned with reading files in other formats such as .pdf or docx. However, in real-world scenarios, this step can be handled using libraries such as marker, docling, or similar tools, depending on your needs. In this tutorial, we will not focus on document parsing, as our files are already in raw text format. Additionally, the choice of parsing libraries is highly dependent on the structure of your data and your specific preferences.

RAG building blocks

The development and deployment of a RAG system typically involves five main steps:

loading documents from a data source;
chunking these documents in some (optimal) way;
storing the chunked data in a vector database;
retrieving relevant data to answer user queries;
passing the retrieved data to an LLM for generation.

Steps 1 to 3 are performed “offline”, while steps 4 and 5 are executed “online”.

Classical information retrieval approaches

Classical information retrieval systems operated by matching keywords in user queries with terms found in document collections, ranking documents based on term frequency and inverse document frequency (TF-IDF) without understanding the meaning of words. For example, bag-of-words models represented documents as unordered collections of terms, capturing term frequency but ignoring word order and context; search engines would then rank documents with the highest TF-IDF scores for the query terms.

Another classical approach was exact match retrieval using tags, where documents were labeled with controlled keywords (tags), and retrieval was based on finding documents with matching tags, as in early library catalogs or enterprise document management systems.

Semantic search

In contrast, semantic search goes beyond exact term matching by using vector representations (embeddings) of documents and queries, capturing the meaning of words and their relationships. This allows retrieval systems to surface relevant documents even when they do not contain the exact keywords used in the query but are semantically related, enabling systems to understand that “car” and “automobile” refer to the same concept or that a document discussing “heart attacks” may be relevant for the query “myocardial infarction.”

Image courtesy of: https://amitness.com/posts/universal-sentence-encoder.

The embedding phase is performed twice: over the documents, and over the query. Then, we search the document (chunk) with the lower distance with respect to the query.

Chunking

To compute document embeddings effectively, it is necessary to partition the text before processing. This requirement arises from the limited context length of embedding models. For example:

The BAAI/bge-m3 embedder supports a maximum sequence length of approximately 8,000 tokens.
The all-mpnet-base-v2 model has a context length limit of 384 tokens.
The all-MiniLM-L12-v2 model allows a maximum sequence length of 256 tokens.

These limitations define the upper bound for the sequence length we can embed at once.

We will now evaluate various chunking strategies on our document collection.

INPUT_DOCS = glob(os.path.join(INPUT_PATH, "*"))
INPUT_DOCS[:10]

['../data/input/generic_share_create.rst.txt',
 '../data/input/database.rst.txt',
 '../data/input/hpc_software.rst.txt',
 '../data/input/gaia.rst.txt',
 '../data/input/miniconda.rst.txt',
 '../data/input/secgroups_create.rst.txt',
 '../data/input/index_storage_ops.rst.txt',
 '../data/input/interactive_computing.rst.txt',
 '../data/input/security_guidelines.rst.txt',
 '../data/input/fip_association.rst.txt']

# Create a list of document, with some metadata associated to each document.
documents = []

for doc in INPUT_DOCS:
    with open(doc) as f:
        documents.append(Document(f.read(), metadata={"doc_name":doc.split("/")[-1], "scraped_on": "2025-07-22"}))
        
documents[0]

Document(metadata={'doc_name': 'generic_share_create.rst.txt', 'scraped_on': '2025-07-22'}, page_content='.. _shares_generic_create_card:\n\nCreate and use a GENERIC_TYPE share\n===================================\n\nThe following sections describe the steps needed to create a share and mount it on two VMs attached to a local network. \nNote that the user needs to configure the VMs in a way that allows logging in via ssh. \n\nRequest to be enabled to the service\n------------------------------------\n\nThe user willing to make use of the Manila service needs to send an email to superc@cineca.it, communicating \n\n- how many shares are needed.\n- their dimensions (GB).\n- the tenant\'s name.\n\nOnce the tenant is enabled to the service by the User Support Team, all users of the tenant will be able to use the service. \n\nCreate share network\n--------------------\n\nAs a first step, in the :ref:`cloud/os_overview/management_tools/dashboard:horizon dashboard` you need to create the share network \nby clicking on *"Create Share Network"* in *"Share â\x86\x92 Share Networks"* and set the value for the following attributes:\n\n- Share network name.\n- network: choose the desired network, in our example example_share_guide_net.\n- subnet: choose the desired subnet, in our example example_share_guide_subnet.\n- Click on the *"save"* button.\n\n.. image:: /cloud/_img/op_share_generic_img1.png\n\nCreate the share\n----------------\n\nCreate the share by clicking on *"Create Share"* in *"Share â\x86\x92 Shares"* and setting the following information:\n\n- share name\n- share protocol  == "NFS"\n- size (on the right side is visualized information about the actual available and used space within the tenant)\n- Type == "generic_type"\n- Leave blank the option "Make visible for all projects" because it is not enabled \n- In the end, click on the *"create"* button.\n\n.. image:: /cloud/_img/op_share_generic_img2.png\n\n\nSet the access rule(s) on the share just created. \n\n- On the OpenStack dashboard click on *"Share â\x86\x92 Shares"* \n- select the share just created\n- in the menu on the right select *"Manage Rules".*\n\n.. image:: /cloud/_img/op_share_generic_img3.png\n\nClick on *"Add rule"* and set:\n\n- access type: Choose "ip", the rest of options displayed are not available for NFS share\'s protocol.\n- access level: read-write or read-only (depending on your needs)\n- access to: write the IP with permission to access the share. Only one entry is allowed per rule, therefore, you will have to include a rule for the fixed-IP of each VM. \n- Finally, click on the "add" button.\n\n.. image:: /cloud/_img/op_share_generic_img4.png\n\nMount the share on the VMs\n--------------------------\n\nYou are now ready to mount the share on VMs. In the following example, we will consider two VM with Ubuntu 22.04 OS. **Please refer to the network guide of the operating system of your VM to be sure about the actions to be performed.**\n\n- Login into the first VM.\n- Upgrade the packages installed in the VM\n\n.. code-block:: bash\n    \n    sudo apt update\n    sudo apt upgrade\n\n- Install the client. The package name is *"nfs-common"*.\n\n.. code-block:: bash\n    \n    sudo apt install nfs-common\n\n- Identify or create the directory in which the share will be mounted (e.g., "/mnt/share_manila") \n\n.. code-block:: bash\n   \n   sudo mkdir <MOUNT_PATH>\n\n- To mount the share you will need the share <ACCESS_PATH> displayed on the *"Share Overview"* page on OpenStack dashboard under the keyword *"Export Location/Path"*. Gather this information and proceed. \n\n.. image:: /cloud/_img/op_share_generic_img5.png\n\n- Mount the share with the following command. Beware that different versions of nfs-common are available for different versions of Ubuntu and the syntax of the mount command could change.\n\n.. code-block:: bash\n   \n   sudo mount -t nfs -v <ACCESS_PATH> <MOUNT_PATH>\n\n- Then, repeat the same steps for the second VM. ')

These are some of the most basic chunking strategies:

Sentence length: Split documents so that each sequence has a fixed length based on character length.
Recursive Character Text Splitter: Split documents so that each sequence is at most a specified length and ends at a designated delimiter character (e.g., punctuation \n, etc).
Format-specific text splitters: Similar to the above, but designed specifically for certain formats such as Markdown, HTML, Python code, etc., to preserve structure and syntax (these splitters use specific delimiters based on the format you are splitting).

# Split at char level, 100 chars max, 0 overlap between chunks
cts = CharacterTextSplitter(separator="", chunk_size = 100, chunk_overlap = 0)

cts.split_text(documents[0].page_content)[:10]

['.. _shares_generic_create_card:\n\nCreate and use a GENERIC_TYPE share\n===============================',
 '====\n\nThe following sections describe the steps needed to create a share and mount it on two VMs att',
 'ached to a local network. \nNote that the user needs to configure the VMs in a way that allows loggin',
 'g in via ssh. \n\nRequest to be enabled to the service\n------------------------------------\n\nThe user',
 'willing to make use of the Manila service needs to send an email to superc@cineca.it, communicating',
 "- how many shares are needed.\n- their dimensions (GB).\n- the tenant's name.\n\nOnce the tenant is en",
 'abled to the service by the User Support Team, all users of the tenant will be able to use the servi',
 'ce. \n\nCreate share network\n--------------------\n\nAs a first step, in the :ref:`cloud/os_overview/man',
 'agement_tools/dashboard:horizon dashboard` you need to create the share network \nby clicking on *"Cr',
 'eate Share Network"* in *"Share â\x86\x92 Share Networks"* and set the value for the following attributes:']

# Here, chunk size is an upper limit. This splitter attempts to create 
# chunks close to this size, but will split earlier by following a hierarchy of separators.
rcts = RecursiveCharacterTextSplitter(chunk_size = 100, chunk_overlap = 0)

rcts.split_text(documents[0].page_content)[:10]

['.. _shares_generic_create_card:',
 'Create and use a GENERIC_TYPE share\n===================================',
 'The following sections describe the steps needed to create a share and mount it on two VMs attached',
 'to a local network.',
 'Note that the user needs to configure the VMs in a way that allows logging in via ssh.',
 'Request to be enabled to the service\n------------------------------------',
 'The user willing to make use of the Manila service needs to send an email to superc@cineca.it,',
 'communicating',
 "- how many shares are needed.\n- their dimensions (GB).\n- the tenant's name.",
 'Once the tenant is enabled to the service by the User Support Team, all users of the tenant will be']

Typically, we use token length rather than character length for chunking because when working with embedders, the maximum context length is defined in tokens, not characters.

We can count the number of tokens using the embedder’s tokenizer and then split the document with a strategy similar to the Recursive Character Text Splitter (RCTS), but based on the token count as the chunk size.

The embedder we are using for this tutorial has a maximum context length of 8k tokens, this will be our upper limit.

embedder = SentenceTransformer(EMBEDDER)
embedder.max_seq_length

stts = SentenceTransformersTokenTextSplitter(tokens_per_chunk=75, chunk_overlap=0, model_name=EMBEDDER)

print(documents[0].metadata)
for chunk in stts.split_text(documents[0].page_content)[:2]:
    print(f"[CHUNK_CONTENT]:\n{chunk}\n===========")

{'doc_name': 'generic_share_create.rst.txt', 'scraped_on': '2025-07-22'}
[CHUNK_CONTENT]:
.. _shares_generic_create_card: Create and use a GENERIC_TYPE share =================================== The following sections describe the steps needed to create a share and mount it on two VMs attached to a local network. Note that the user needs to configure the VMs in a way that allows logging in via ssh.
===========
[CHUNK_CONTENT]:
Request to be enabled to the service ------------------------------------ The user willing to make use of the Manila service needs to send an email to superc@cineca.it, communicating - how many shares are needed. - their dimensions (GB). - the tenant's name. Once the tenant is enabled to the service by the User Support Team,
===========

The problem of the aformentioned method is that SentenceTransformersTokenTextSplitter does not support the use of specific separators. But we can create a recursive character text splitter using an huggingface tokenizer.

rcts_hftokenizer = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(embedder.tokenizer, chunk_overlap = 0, chunk_size = embedder.max_seq_length)
rcts_hftokenizer._separators = [".", ",", "\n\n", "\n", " ", ""]

print(documents[0].metadata)
# Our embedder has a context length of 8k tokens, which is big... that's why you see only a single chunk here.
# What do you think it would happen if we use an embedder with a limited max sequence length?
for chunk in rcts_hftokenizer.split_text(documents[0].page_content)[:2]:
    print(f"[CHUNK_CONTENT]:\n{chunk}\n===========")

{'doc_name': 'generic_share_create.rst.txt', 'scraped_on': '2025-07-22'}
[CHUNK_CONTENT]:
.. _shares_generic_create_card:

Create and use a GENERIC_TYPE share
===================================

The following sections describe the steps needed to create a share and mount it on two VMs attached to a local network. 
Note that the user needs to configure the VMs in a way that allows logging in via ssh. 

Request to be enabled to the service
------------------------------------

The user willing to make use of the Manila service needs to send an email to superc@cineca.it, communicating 

- how many shares are needed.
- their dimensions (GB).
- the tenant's name.

Once the tenant is enabled to the service by the User Support Team, all users of the tenant will be able to use the service. 

Create share network
--------------------

As a first step, in the :ref:`cloud/os_overview/management_tools/dashboard:horizon dashboard` you need to create the share network 
by clicking on *"Create Share Network"* in *"Share â Share Networks"* and set the value for the following attributes:

- Share network name.
- network: choose the desired network, in our example example_share_guide_net.
- subnet: choose the desired subnet, in our example example_share_guide_subnet.
- Click on the *"save"* button.

.. image:: /cloud/_img/op_share_generic_img1.png

Create the share
----------------

Create the share by clicking on *"Create Share"* in *"Share â Shares"* and setting the following information:

- share name
- share protocol  == "NFS"
- size (on the right side is visualized information about the actual available and used space within the tenant)
- Type == "generic_type"
- Leave blank the option "Make visible for all projects" because it is not enabled 
- In the end, click on the *"create"* button.

.. image:: /cloud/_img/op_share_generic_img2.png


Set the access rule(s) on the share just created. 

- On the OpenStack dashboard click on *"Share â Shares"* 
- select the share just created
- in the menu on the right select *"Manage Rules".*

.. image:: /cloud/_img/op_share_generic_img3.png

Click on *"Add rule"* and set:

- access type: Choose "ip", the rest of options displayed are not available for NFS share's protocol.
- access level: read-write or read-only (depending on your needs)
- access to: write the IP with permission to access the share. Only one entry is allowed per rule, therefore, you will have to include a rule for the fixed-IP of each VM. 
- Finally, click on the "add" button.

.. image:: /cloud/_img/op_share_generic_img4.png

Mount the share on the VMs
--------------------------

You are now ready to mount the share on VMs. In the following example, we will consider two VM with Ubuntu 22.04 OS. **Please refer to the network guide of the operating system of your VM to be sure about the actions to be performed.**

- Login into the first VM.
- Upgrade the packages installed in the VM

.. code-block:: bash
    
    sudo apt update
    sudo apt upgrade

- Install the client. The package name is *"nfs-common"*.

.. code-block:: bash
    
    sudo apt install nfs-common

- Identify or create the directory in which the share will be mounted (e.g., "/mnt/share_manila") 

.. code-block:: bash
   
   sudo mkdir <MOUNT_PATH>

- To mount the share you will need the share <ACCESS_PATH> displayed on the *"Share Overview"* page on OpenStack dashboard under the keyword *"Export Location/Path"*. Gather this information and proceed. 

.. image:: /cloud/_img/op_share_generic_img5.png

- Mount the share with the following command. Beware that different versions of nfs-common are available for different versions of Ubuntu and the syntax of the mount command could change.

.. code-block:: bash
   
   sudo mount -t nfs -v <ACCESS_PATH> <MOUNT_PATH>

- Then, repeat the same steps for the second VM.
===========

Other chunking options we will not explore today:

Semantic chunking
LLM based chunking

Finding the right chunk size

It is true that the maximum sequence length of our embedder imposes an upper limit on the size of the embeddings. However, this does not imply that this upper limit corresponds to the optimal chunk size. Larger chunks often encompass multiple topics, which can result in embeddings that are less focused and “dilute” the thematic coherence.

How can we determine the optimal chunk size?

If a labeled dataset of questions and answers is available, various evaluation metrics can be used to measure how the relevance of retrieved resources changes with different chunk sizes.

%%time
class QA:
    def __init__(self, document_name:str, question:str, answer:str, start_index:int, end_index:int, chunk_size_tok:int, seed:float):
        self.document_name = document_name # The document name containing the answ
        self.question = question # The question created by an llm
        self.answer = answer # The answer 
        self.start_index = start_index # Character index where the question starts
        self.end_index = end_index # Character index where the question ends
        self.chunk_size_tok = chunk_size_tok # Total token size
        self.seed = seed # Seed used to create the answ

# Let's generate a set of syntetic questions and answers
def generate_qa_pairs(document:Document, embedder:SentenceTransformer, 
                      llm:ChatOpenAI, seed:float|None = None) -> QA:
    """
    Extracts a list of questions and answers from the given document.

    Args:
        document (langchain.Document): The document from which to extract an answer chunk for question generation.
        embedder: The embedder to be used in the process.
        llm (langchain.LLM): An instance of a Langchain LLM used for generating questions.
        seed (float, optional): A seed value to ensure reproducibility. If None (default), the current Unix timestamp will be used.

    Returns:
        QA Object: An object containing the question-answer data with the following fields:
            - `document_name`: The name of the source document.
            - `question`: The generated question corresponding to the extracted chunk.
            - `answer`: The answer text extracted from the document.
            - `start_index`: The starting index of the answer within the document.
            - `end_index`: The ending index of the answer within the document.
            - `chunk_size_tok`: The size of the chunk in tokens.
            - `seed`: The seed value used to select the answer for question generation.
    """
    try:
        # Set a seed for replicability
        if seed is None:
            seed = time.time()
        random.seed(seed)
        
        # Here we don't use always the same sequence length, otherwise we are
        # influencing our average chunks length to be always that long
        rand_seq_len = random.randint(100, 1024)

        chunker = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(embedder.tokenizer, 
                                                                            chunk_overlap = 0, 
                                                                            chunk_size = rand_seq_len,
                                                                            add_start_index = True)
        chunker._separators = ["..", ",", "\n\n", "\n", " ", ""]

        # Extract the answer to be used for question generation.
        document_chunks = chunker.split_documents([document])
        # Add end index to the document metadata
        for i in range(len(document_chunks)):
            document_chunks[i].metadata["end_index"] = document_chunks[i].metadata["start_index"] + len(document_chunks[i].page_content) - 1 
            
        # Select a random chunk so that we don't always generate the same
        # set of Q&A pairs
        answer = document_chunks[random.randint(0, len(document_chunks)-1)]

        # Use an llm to generate a question about this piece of text
        class AnswerFormat(BaseModel):
            question:str

        with open(os.path.join(PROMPT_PATH, "qa_syntetic_testset_prompt.txt"), mode = "r") as f:
            # Read the question prompt and append the content of the document to the end
            question_template = f.read() + "\n" + answer.page_content
        
        item = llm.with_structured_output(AnswerFormat).invoke(question_template)
        return QA(answer.metadata["doc_name"], item.question, 
                  answer.page_content, answer.metadata["start_index"], answer.metadata["end_index"], 
                  len(embedder.tokenizer.tokenize(answer.page_content)), seed)
    except Exception as e:
        print(f"Encountered an exception while processing doc {document.metadata['doc_name']}", e)
        return QA(document.metadata["doc_name"], "Error while parsing doc", e, 0, 0, 0, seed)

with ThreadPoolExecutor(max_workers=12) as e:
    results = [*e.map(partial(generate_qa_pairs, embedder = embedder, llm = llm, seed = 42), documents)]

CPU times: user 1.73 s, sys: 22 ms, total: 1.75 s
Wall time: 9.15 s

qa_set = pd.DataFrame({"doc": [qa.document_name for qa in results], "seed":[qa.seed for qa in results],
                       "chunk_size_tok": [qa.chunk_size_tok for qa in results],  
                       "start_index": [qa.start_index for qa in results], "end_index": [qa.end_index for qa in results],
                       "question": [qa.question for qa in results], "answer": [qa.answer for qa in results]})
display(qa_set.head())

# Save to csv for further inspection
qa_set.to_csv(os.path.join(OUTPUT_PATH, f"qa_questions_{date}.csv"), index = False)

	doc	seed	chunk_size_tok	end_index	question	answer
0	generic_share_create.rst.txt	42	644	2429	What specific details must a user include in t...	.. _shares_generic_create_card:\n\nCreate and ...
1	database.rst.txt	42	220	1019	How does the Trove component of OpenStack faci...	.. _database_card:\n\nDatabase\n========\n\n`T...
2	hpc_software.rst.txt	42	734	3131	What steps must a user take to gain access to ...	Software\n========\n\n\| On CINECA clusters, se...
3	gaia.rst.txt	42	471	1611	What are the specific models of Nvidia GPUs th...	.. _gaia_card:\n\nGAIA\n====\n\n**PAGE UNDER C...
4	miniconda.rst.txt	42	737	3062	What are the recommended steps to clean up pre...	.. _miniconda_card:\n\nMiniconda \n=========\n...

# Note: Questions and answers may vary due to multithreading even if we set a seed, that's because multithreading may change the order of execution.
# So we load this Q&A set to ensure replicability of the next steps
qa_set = pd.read_csv("../data/output/chunking/qa_questions_2025_07_23_14_54_35.csv")

Mean Reciprocal Rank

$ MeanReciprocalRank = \frac{1}{|Q|} \sum_{i=1}^{|Q|}\frac{1}{rank_i} $

Given a set of questions $ Q $, we rank all available chunks by their similarity to each question and examine the position of the relevant chunk within this ranking. The average of the reciprocal ranks across all questions is known as the $ Mean Reciprocal Rank $ (MRR).

To identify the optimal chunk size, the following procedure can be employed:

Define a test set with questions and their related answers (we can do it manually or… with an LLM). We already done this step;
Chunk data testing various configurations of chunk sizes (e.g. 100 tokens, 200 tokens, etc..);
For each chunk config, calculate the mean reciprocal rank;
Choose the chunking configuration which maximizes the MRR (the MRR is constrained between 0 and 1; so, the higher the better) and minimizes the chunk size.

Test various chunking config

We experiment with various chunking setups as described earlier. We start with an initial chunk size of 100 tokens and increase it by 100 tokens each time.

Although our embedder supports a context window of around 8000 tokens, we limit the maximum chunk size to 1500 tokens. This is because beyond 1500 tokens, the text chunks become quite long, and we want to avoid making them overly broad in terms of topics.

def find_optimal_chunk_size(initial_chunk_size:int, step_size:int, max_chunk_size:int, 
                            documents:List[Document], qa_set:pd.DataFrame) -> pd.DataFrame:
    # Save the mean reciprocal rank for each config tested
    tested_chunk_size = []
    mean_reciprocal_rank = []

    for chunk_size in range(initial_chunk_size, max_chunk_size + 1, step_size):
        print(f"{datetime.now()} - Testing chunk size of: {chunk_size}")
        rcts_chunker = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(embedder.tokenizer, 
                                                                                chunk_overlap = 0, 
                                                                                chunk_size = chunk_size,
                                                                                add_start_index = True)
        # The . was removed as in rst documents it has a very specific meaning 
        # and this would cause document "oversplitting"
        rcts_chunker._separators = ["..", ",", "\n\n", "\n", " ", ""]

        # The split_documents method creates one document for each chunk. Each document has the same metadata 
        # as the original document. The list is "flat", we don't have nested lists
        splitted_docs = rcts_chunker.split_documents(documents)
        # Add end position for each chunk
        for chunk in splitted_docs:
            chunk.metadata["end_index"] = chunk.metadata["start_index"] + len(chunk.page_content) - 1 

        # Calculate the embedding for each document chunk
        documents_embeddings = embedder.encode([chunk.page_content for chunk in splitted_docs])
        embedded_data = pd.DataFrame({"content":[chunk.page_content for chunk in splitted_docs],
                                    "metadata": [chunk.metadata for chunk in splitted_docs],
                                    "embeddings": [embeddings for embeddings in documents_embeddings]})

        # Unpack metadata columns
        embedded_data[[item for item in embedded_data["metadata"].iloc[0].keys()]] = embedded_data.apply(lambda x: [item for item in x["metadata"].values()], axis = 1, result_type="expand")
        embedded_data.drop("metadata", axis = 1, inplace = True)

        # Embed each question and calc similarity with respect to the chunks
        # created with the chunk size configuration we are testing
        embedded_questions = embedder.encode(qa_set["question"])
        similarity = cosine_similarity(embedded_questions, np.stack(embedded_data["embeddings"].to_numpy()))

        # Create a table with question_id, chunk_id, similarity score
        similarity_abt = pd.DataFrame(similarity).stack().reset_index()
        similarity_abt.columns = ["question_id", "chunk_id", "similarity"]

        # Add chunk informations
        similarity_abt = similarity_abt.merge(embedded_data.reset_index(), how="inner", left_on = "chunk_id", right_on= "index")
        similarity_abt.drop(["index", "embeddings", "scraped_on"], axis=1, inplace=True)
        similarity_abt.rename({"start_index": "chunk_start_index", "end_index":"chunk_end_index", "doc_name":"chunk_doc_provenance", "content":"chunk_content"}, axis = 1, inplace=True)

        # Join qa set information
        similarity_abt = similarity_abt.merge(qa_set.reset_index(), how="inner", left_on = "question_id", right_on= "index")
        similarity_abt.drop(["index", "seed", "chunk_size_tok"], axis=1, inplace=True)
        similarity_abt.rename({"start_index": "answer_start_index", "end_index":"answer_end_index", "doc":"qa_doc_provenance"}, axis = 1, inplace=True)    
        #display(similarity_abt[["question_id", "chunk_id", "similarity", "chunk_doc_provenance", "chunk_start_index", "chunk_end_index", 
        #                        "qa_doc_provenance", "chunk_start_index","chunk_end_index", "question", "answer", "chunk_content"]])
        #break
        
        def calc_rank(df:pd.DataFrame):
            # Sort all the values and use the index as rank
            df = df.sort_values(by = "similarity", ascending = False).reset_index(drop = True)
            # Keep only rows where qa question matches chunk document provenance 
            # and where chunk contains a piece of the answer
            relevant_docs = df["chunk_doc_provenance"] == df["qa_doc_provenance"]
            chunk_contains_beginning_answ = df["answer_start_index"].between(df["chunk_start_index"], df["chunk_end_index"])
            chunk_contains_end_answ = df["answer_end_index"].between(df["chunk_start_index"], df["chunk_end_index"])
            chunk_contains_mid_answ = (df["chunk_start_index"] > df["answer_start_index"]) & (df["chunk_end_index"] < df["answer_end_index"])
            df = df[relevant_docs & ( chunk_contains_beginning_answ | chunk_contains_end_answ | chunk_contains_mid_answ)]
            df = df.reset_index(names = "rank")
            return df["rank"].mean()
            
        # For each question of the qa, calc the (avg) rank
        rank = similarity_abt.groupby("question_id").apply(lambda X: calc_rank(X), include_groups = False)
        reciprocal_rank = rank.map(lambda x: 1/(x + 1)) # Rank starts from zero, so we remap from 1 adding a + 1
        mrr = (1/rank.shape[0]) * sum(reciprocal_rank)
        
        # Save results for plotting
        mean_reciprocal_rank.append(mrr)
        tested_chunk_size.append(chunk_size)
    
    # Check mrr trend and plot against the chunk size
    mrr_tests_data = pd.DataFrame({"chunk_size": tested_chunk_size, "mrr": mean_reciprocal_rank})
    mrr_tests_data["improvement"] = mrr_tests_data["mrr"].diff(1)
    return mrr_tests_data

%%time
mrr_test_data = find_optimal_chunk_size(initial_chunk_size = 100, step_size = 100, max_chunk_size = 1500, documents = documents, qa_set = qa_set)

2025-07-28 16:40:24.248507 - Testing chunk size of: 100
2025-07-28 16:40:32.288890 - Testing chunk size of: 200
2025-07-28 16:40:38.834541 - Testing chunk size of: 300
2025-07-28 16:40:45.030032 - Testing chunk size of: 400
2025-07-28 16:40:51.192913 - Testing chunk size of: 500
2025-07-28 16:40:57.272516 - Testing chunk size of: 600
2025-07-28 16:41:03.475425 - Testing chunk size of: 700
2025-07-28 16:41:09.648438 - Testing chunk size of: 800
2025-07-28 16:41:15.897295 - Testing chunk size of: 900
2025-07-28 16:41:22.109958 - Testing chunk size of: 1000
2025-07-28 16:41:28.252650 - Testing chunk size of: 1100
2025-07-28 16:41:34.423317 - Testing chunk size of: 1200
2025-07-28 16:41:40.952498 - Testing chunk size of: 1300
2025-07-28 16:41:47.153977 - Testing chunk size of: 1400
2025-07-28 16:41:53.681691 - Testing chunk size of: 1500
CPU times: user 1min 50s, sys: 174 ms, total: 1min 50s
Wall time: 1min 36s

display(mrr_test_data)

plt.figure(figsize=(10, 4))
plt.vlines(mrr_test_data["chunk_size"].iloc[-1], ymin=0, ymax=1, colors = "b", label = "Max tested chunk size")
plt.vlines(128, ymin=0, ymax=1, colors = "g", label = "0-128 - Tiny chunks")
plt.vlines(256, ymin=0, ymax=1, colors = "y", label = "128-256 - Small chunks")
plt.vlines(512, ymin=0, ymax=1, colors = "orangered", label = "256-512 - Medium chunks")
plt.vlines(1024, ymin=0, ymax=1, colors = "r", label = "512-1024 - Large chunks")
plt.plot(mrr_test_data["chunk_size"], mrr_test_data["mrr"], label = "MRR")
plt.ylabel("MRR")
plt.xlabel("chunk_size")
plt.title(f"Mean reciprocal rank tests - step size {int(mrr_test_data['chunk_size'].diff().iloc[1])} tok")
plt.legend()
plt.show()

	chunk_size	mrr	improvement
0	100	0.058761	NaN
1	200	0.204699	0.145937
2	300	0.366927	0.162228
3	400	0.482120	0.115193
4	500	0.602926	0.120806
5	600	0.670944	0.068018
6	700	0.780495	0.109551
7	800	0.929104	0.148610
8	900	0.935323	0.006219
9	1000	0.907046	-0.028278
10	1100	0.884655	-0.022391
11	1200	0.936816	0.052161
12	1300	0.933547	-0.003269
13	1400	0.925821	-0.007726
14	1500	0.944527	0.018706

../../../../_images/0ca7decfcca5a7a2148a24472340e5eae8cf48fcbf1c0ed3cc858aab933fb474.png

Chunk and index all documents

Once we found a reasonable chunk size, we are ready to split all our docs with the chosen lenght. Then we need to save our collection of splitted docs in a vector database.

Vector databases

A vector database is a specialized type of database designed to efficiently store and search data represented as high-dimensional vectors. These databases are optimized to perform similarity searches efficiently. Given a query vector, the database quickly finds vectors that are closest to it in terms of distance metrics like cosine similarity, Euclidean distance, etc.

The choice of a vector database depends on the needs of the project, for a comparison see: 1, 2, 3. More “traditional” dbs have added support for vectors (e.g. Postgres, DuckDB).

For this tutorial we are going to use ChromaDB.

# Instantiate our splitter with the chunk size we identified during the previous step
rcts_chunker = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(embedder.tokenizer, 
                                                                         chunk_overlap = 0, 
                                                                         chunk_size = 800,
                                                                         add_start_index = True)
rcts_chunker._separators = ["..", ",", "\n\n", "\n", " ", ""]

chunks = rcts_chunker.split_documents(documents)
chunks[:2]

[Document(metadata={'doc_name': 'generic_share_create.rst.txt', 'scraped_on': '2025-07-22', 'start_index': 0}, page_content='.. _shares_generic_create_card:\n\nCreate and use a GENERIC_TYPE share\n===================================\n\nThe following sections describe the steps needed to create a share and mount it on two VMs attached to a local network. \nNote that the user needs to configure the VMs in a way that allows logging in via ssh. \n\nRequest to be enabled to the service\n------------------------------------\n\nThe user willing to make use of the Manila service needs to send an email to superc@cineca.it, communicating \n\n- how many shares are needed.\n- their dimensions (GB).\n- the tenant\'s name.\n\nOnce the tenant is enabled to the service by the User Support Team, all users of the tenant will be able to use the service. \n\nCreate share network\n--------------------\n\nAs a first step, in the :ref:`cloud/os_overview/management_tools/dashboard:horizon dashboard` you need to create the share network \nby clicking on *"Create Share Network"* in *"Share â\x86\x92 Share Networks"* and set the value for the following attributes:\n\n- Share network name.\n- network: choose the desired network, in our example example_share_guide_net.\n- subnet: choose the desired subnet, in our example example_share_guide_subnet.\n- Click on the *"save"* button.\n\n.. image:: /cloud/_img/op_share_generic_img1.png\n\nCreate the share\n----------------\n\nCreate the share by clicking on *"Create Share"* in *"Share â\x86\x92 Shares"* and setting the following information:\n\n- share name\n- share protocol  == "NFS"\n- size (on the right side is visualized information about the actual available and used space within the tenant)\n- Type == "generic_type"\n- Leave blank the option "Make visible for all projects" because it is not enabled \n- In the end, click on the *"create"* button.\n\n.. image:: /cloud/_img/op_share_generic_img2.png\n\n\nSet the access rule(s) on the share just created. \n\n- On the OpenStack dashboard click on *"Share â\x86\x92 Shares"* \n- select the share just created\n- in the menu on the right select *"Manage Rules".*\n\n.. image:: /cloud/_img/op_share_generic_img3.png\n\nClick on *"Add rule"* and set:\n\n- access type: Choose "ip", the rest of options displayed are not available for NFS share\'s protocol.\n- access level: read-write or read-only (depending on your needs)\n- access to: write the IP with permission to access the share. Only one entry is allowed per rule, therefore, you will have to include a rule for the fixed-IP of each VM. \n- Finally, click on the "add" button.\n\n.. image:: /cloud/_img/op_share_generic_img4.png\n\nMount the share on the VMs\n--------------------------\n\nYou are now ready to mount the share on VMs. In the following example, we will consider two VM with Ubuntu 22.04 OS. **Please refer to the network guide of the operating system of your VM to be sure about the actions to be performed.**\n\n- Login into the first VM.\n- Upgrade the packages installed in the VM\n\n.. code-block:: bash\n    \n    sudo apt update\n    sudo apt upgrade\n\n- Install the client. The package name is *"nfs-common"*.'),
 Document(metadata={'doc_name': 'generic_share_create.rst.txt', 'scraped_on': '2025-07-22', 'start_index': 2972}, page_content='.. code-block:: bash\n    \n    sudo apt install nfs-common\n\n- Identify or create the directory in which the share will be mounted (e.g., "/mnt/share_manila") \n\n.. code-block:: bash\n   \n   sudo mkdir <MOUNT_PATH>\n\n- To mount the share you will need the share <ACCESS_PATH> displayed on the *"Share Overview"* page on OpenStack dashboard under the keyword *"Export Location/Path"*. Gather this information and proceed. \n\n.. image:: /cloud/_img/op_share_generic_img5.png\n\n- Mount the share with the following command. Beware that different versions of nfs-common are available for different versions of Ubuntu and the syntax of the mount command could change.\n\n.. code-block:: bash\n   \n   sudo mount -t nfs -v <ACCESS_PATH> <MOUNT_PATH>\n\n- Then, repeat the same steps for the second VM.')]

# Let's check the chunk distribution
chunk_count = pd.DataFrame([chunk.metadata for chunk in chunks]).groupby("doc_name").count().iloc[:, 1]
chunk_count.name = "chunk_count"

plt.hist(chunk_count, bins=30)
plt.title("N-Chunks per document distribution")
plt.xlabel("N-Chunks")
plt.ylabel("Freq")

Text(0, 0.5, 'Freq')

../../../../_images/e979f756386c59474017930695b9f689d9fcf3d8e73d63b2016c3ec529972482.png

# Util function to create embeddings and add them to a chromadb collection
def create_vector_store(documents:List[Document], embedder, 
                        vector_store_name:str, writing_path:str, is_incremental:bool = True):
    embeddings = embedder.encode([doc.page_content for doc in documents])

    chroma_client = chromadb.PersistentClient(path = writing_path)
    
    if not is_incremental:
        # Drop collection if exists, we want to start from a fresh state
        try:
            chroma_client.delete_collection(vector_store_name)
        except NotFoundError as e:
            pass
    collection = chroma_client.create_collection(name = vector_store_name, get_or_create=True)

    # Add all docs to the collection
    collection.add(documents = [doc.page_content for doc in chunks],
                metadatas  = [doc.metadata for doc in chunks],
                ids = [doc.metadata["doc_name"] + "__" + \
                        str(doc.metadata["start_index"]) for doc in chunks],
                embeddings = embeddings)

# Initialize our vector store
chroma_path = os.path.join(OUTPUT_PATH, "chroma")
Path(chroma_path).mkdir(exist_ok=True, parents=True)

create_vector_store(documents = chunks, embedder = embedder, 
                    vector_store_name="hpc_wiki", writing_path = chroma_path, is_incremental = False)

# Create a client for the db and check the top 3 retrieved docs for a question
lc_embedder = HuggingFaceEmbeddings(model_name = EMBEDDER)
hpc_store = Chroma(collection_name = "hpc_wiki", embedding_function = lc_embedder, 
                   persist_directory= chroma_path)
hpc_store.similarity_search_with_score("What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?", k=3)

[(Document(id='leonardo.rst.txt__0', metadata={'start_index': 0, 'scraped_on': '2025-07-22', 'doc_name': 'leonardo.rst.txt'}, page_content='.. _leonardo_card:\n\nLeonardo\n========\n\nLeonardo is the *pre-exascale* Tier-0 supercomputer of the EuroHPC Joint Undertaking (JU), hosted by **CINECA** and currently located at the Bologna DAMA-Technopole in Italy.\nThis guide provides specific information about the **Leonardo** cluster, including details that differ from the general behavior described in the broader HPC Clusters section.\n\n.. |ico2| image:: img/leonardo_logo.png\n   :height: 55px\n   :class: no-scaled-link\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: **login.leonardo.cineca.it**. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to **Leonardo** using one the specific login hostname points:\n\n * login01-ext.leonardo.cineca.it\n * login02-ext.leonardo.cineca.it\n * login05-ext.leonardo.cineca.it\n * login07-ext.leonardo.cineca.it\n\n.. warning::\n    \n    **The mandatory access to Leonardo si the two-factor authetication (2FA)**. Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe cluster, supplied by EVIDEN ATOS, is based on two new specifically-designed compute blades, which are available throught two distinc Slurm partitios on the Cluster:\n\n* X2135 **GPU** blade based on NVIDIA Ampere A100-64 accelerators - **Booster** partition.\n* X2140 **CPU**-only blade based on Intel Sapphire Rapids processors - **Data Centric General Purpose (DCGP)** partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\nThe **Booster** partition entered pre-production in May 2023 and moved to **full production in July 2023**.\nThe **DCGP** partition followed, starting pre-production in January 2024 and reaching **full production in February 2024**.\n\nHardware Details\n^^^^^^^^^^^^^^^^\n\n.. tab-set::\n\n    .. tab-item:: Booster'),
  0.7387087941169739),
 (Document(id='specific_users.rst.txt__2544', metadata={'doc_name': 'specific_users.rst.txt', 'start_index': 2544, 'scraped_on': '2025-07-22'}, page_content='../files/2FA_EF.pdf>`\n\n\nSLURM Partitions\n----------------\n\nOn :ref:`hpc/leonardo:Leonardo` and :ref:`hpc/pitagora:Pitagora` *Job Managing and SLURM Partitions* sections, you can find the description of SLURM partitions and QOS to submit your jobs. Notice that EUROfusion users have dedicated partitions and QOS: you are allowed to use the **"_fua_"** partitions and the related QOS, besides the **"_all_serial"** partition which is shared among all users.\n\n\nLow-priority jobs\n^^^^^^^^^^^^^^^^^\n\n1) **If all the budget assigned to your Project Account has been consumed**, you can keep running on Leonardo boost_fua_prod and dcgp_fua_prod partitions at low priority by requesting in your submission script the **qos_fualowprio** QOS:\n\n.. code-block:: bash\n\n  #SBATCH --account=<YOUR Project Account>\n  #SBATCH --qos=qos_fualowprio\n\nThe QOS is *automatically* added to your Project Account upon budget exhaustion.\n\n2) You can also request to run low priority jobs, **without having consumed all the budget of yout active Project Account**, by association to the **FUAL8_LOWPRIO** account on Booster and **FUA38_LOWPRIO_0** account on DCGP (write a mail to superc@cineca.it). You always need to specify also the **qos_fualowprio** QOS in your submission script.\n\n.. code-block:: bash\n  \n   #SBATCH --account=<LOWPRIO Project Account>\n   #SBATCH --qos=qos_fualowprio'),
  0.7474587559700012),
 (Document(id='matlab.rst.txt__7416', metadata={'start_index': 7416, 'doc_name': 'matlab.rst.txt', 'scraped_on': '2025-07-22'}, page_content=".. code-block:: matlabsession\n        \n        >> % Specify QoS\n        >> c.AdditionalProperties.QoS = 'name-of-qos';\n\n        >> % Specify processor cores per node.  Default is 32 for Leonardo GPU nodes and 112 on Leonardo CPU nodes; 18 for Marconi and 48 for Galileo100.\n        >> c.AdditionalProperties.ProcsPerNode = 18;\n\n        >> % specify the number of GPUsPerNode. Valid only on Leonardo GPU partition\n        >> c.AdditionalProperties.GPUsPerNode = 1;\n\n        >> % Specify memory to use for MATLAB jobs, per core (default: 4gb)\n        >> c.AdditionalProperties.MemUsage = '6gb';\n\n        >> % Require node exclusivity\n        >> c.AdditionalProperties.RequireExclusiveNode = true;\n\n        >> % Request to use a reservation\n        >> c.AdditionalProperties.Reservation = 'name-of-reservation';\n\n        >> % Specify e-mail address to receive notifications about your job\n        >> c.AdditionalProperties.EmailAddress = â\x80\x98test@foo.comâ\x80\x99;\n\n        >> % Turn onthe Debug Message.  Default is off (logical boolean true/false).\n        >> c.AdditionalProperties.DebugMessagesTurnedOn = true;\n\n\nTo check for the values of the current configuration options, call the AdditionalProperties without semicolon\n\n.. code-block:: matlabsession\n\n        >> % To view current configurations\n        >> c.AdditionalProperties\n\nTo clear a value, assign the property an empty value (â\x80\x98â\x80\x99, [], or false).\n\n.. code-block:: matlabsession\n\n        >> % To clear a configuration that takes a string as input \n        >> c.AdditionalProperties.EmailAddress = â\x80\x98 â\x80\x99;\n\nTo save a profile, with your configuration so you will find it in future sessions\n\n.. code-block:: matlabsession\n\n        >> c.saveProfile;\n\nSerial Jobs\n^^^^^^^^^^^\n\nUse the batch command to submit asynchronous jobs to the cluster.  The batch command will return a job object which is used to access the output of the submitted job.  See the MATLAB documentation for more help on batch.\n\n.. code-block:: matlabsession\n\n        >> % Get a handle to the cluster\n        >> c = parcluster;\n\nSubmit job to query where MATLAB is running on the cluster\n\n.. code-block:: matlabsession\n\n        >> j = c.batch(@pwd, 1, {});\n\nQuery job for state:  queued | running | finished\n\n.. code-block:: matlabsession\n\n        >> j.State\n\nIf state is finished, fetch results\n\n.. code-block:: matlabsession\n\n        >> j.fetchOutputs{:}\n\nor\n\n.. code-block:: matlabsession\n\n        >> fetchOutputs(j)\n\nDisplay the diary\n\n.. code-block:: matlabsession\n\n        >> diary(j)\n\nDelete the job after results are no longer needed\n\n.. code-block:: matlabsession\n\n        >> j.delete;\n\nTo retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run. This allows us to fetch the results of completed jobs. Retrieve and view the list of jobs as shown below."),
  0.8535576462745667)]

Maybe the answer can improve a bit if we add more retrieved documents…

hpc_store.similarity_search_with_score("What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?", k=20)

[(Document(id='leonardo.rst.txt__0', metadata={'scraped_on': '2025-07-22', 'start_index': 0, 'doc_name': 'leonardo.rst.txt'}, page_content='.. _leonardo_card:\n\nLeonardo\n========\n\nLeonardo is the *pre-exascale* Tier-0 supercomputer of the EuroHPC Joint Undertaking (JU), hosted by **CINECA** and currently located at the Bologna DAMA-Technopole in Italy.\nThis guide provides specific information about the **Leonardo** cluster, including details that differ from the general behavior described in the broader HPC Clusters section.\n\n.. |ico2| image:: img/leonardo_logo.png\n   :height: 55px\n   :class: no-scaled-link\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: **login.leonardo.cineca.it**. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to **Leonardo** using one the specific login hostname points:\n\n * login01-ext.leonardo.cineca.it\n * login02-ext.leonardo.cineca.it\n * login05-ext.leonardo.cineca.it\n * login07-ext.leonardo.cineca.it\n\n.. warning::\n    \n    **The mandatory access to Leonardo si the two-factor authetication (2FA)**. Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe cluster, supplied by EVIDEN ATOS, is based on two new specifically-designed compute blades, which are available throught two distinc Slurm partitios on the Cluster:\n\n* X2135 **GPU** blade based on NVIDIA Ampere A100-64 accelerators - **Booster** partition.\n* X2140 **CPU**-only blade based on Intel Sapphire Rapids processors - **Data Centric General Purpose (DCGP)** partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\nThe **Booster** partition entered pre-production in May 2023 and moved to **full production in July 2023**.\nThe **DCGP** partition followed, starting pre-production in January 2024 and reaching **full production in February 2024**.\n\nHardware Details\n^^^^^^^^^^^^^^^^\n\n.. tab-set::\n\n    .. tab-item:: Booster'),
  0.7387087941169739),
 (Document(id='specific_users.rst.txt__2544', metadata={'doc_name': 'specific_users.rst.txt', 'start_index': 2544, 'scraped_on': '2025-07-22'}, page_content='../files/2FA_EF.pdf>`\n\n\nSLURM Partitions\n----------------\n\nOn :ref:`hpc/leonardo:Leonardo` and :ref:`hpc/pitagora:Pitagora` *Job Managing and SLURM Partitions* sections, you can find the description of SLURM partitions and QOS to submit your jobs. Notice that EUROfusion users have dedicated partitions and QOS: you are allowed to use the **"_fua_"** partitions and the related QOS, besides the **"_all_serial"** partition which is shared among all users.\n\n\nLow-priority jobs\n^^^^^^^^^^^^^^^^^\n\n1) **If all the budget assigned to your Project Account has been consumed**, you can keep running on Leonardo boost_fua_prod and dcgp_fua_prod partitions at low priority by requesting in your submission script the **qos_fualowprio** QOS:\n\n.. code-block:: bash\n\n  #SBATCH --account=<YOUR Project Account>\n  #SBATCH --qos=qos_fualowprio\n\nThe QOS is *automatically* added to your Project Account upon budget exhaustion.\n\n2) You can also request to run low priority jobs, **without having consumed all the budget of yout active Project Account**, by association to the **FUAL8_LOWPRIO** account on Booster and **FUA38_LOWPRIO_0** account on DCGP (write a mail to superc@cineca.it). You always need to specify also the **qos_fualowprio** QOS in your submission script.\n\n.. code-block:: bash\n  \n   #SBATCH --account=<LOWPRIO Project Account>\n   #SBATCH --qos=qos_fualowprio'),
  0.7474587559700012),
 (Document(id='matlab.rst.txt__7416', metadata={'start_index': 7416, 'doc_name': 'matlab.rst.txt', 'scraped_on': '2025-07-22'}, page_content=".. code-block:: matlabsession\n        \n        >> % Specify QoS\n        >> c.AdditionalProperties.QoS = 'name-of-qos';\n\n        >> % Specify processor cores per node.  Default is 32 for Leonardo GPU nodes and 112 on Leonardo CPU nodes; 18 for Marconi and 48 for Galileo100.\n        >> c.AdditionalProperties.ProcsPerNode = 18;\n\n        >> % specify the number of GPUsPerNode. Valid only on Leonardo GPU partition\n        >> c.AdditionalProperties.GPUsPerNode = 1;\n\n        >> % Specify memory to use for MATLAB jobs, per core (default: 4gb)\n        >> c.AdditionalProperties.MemUsage = '6gb';\n\n        >> % Require node exclusivity\n        >> c.AdditionalProperties.RequireExclusiveNode = true;\n\n        >> % Request to use a reservation\n        >> c.AdditionalProperties.Reservation = 'name-of-reservation';\n\n        >> % Specify e-mail address to receive notifications about your job\n        >> c.AdditionalProperties.EmailAddress = â\x80\x98test@foo.comâ\x80\x99;\n\n        >> % Turn onthe Debug Message.  Default is off (logical boolean true/false).\n        >> c.AdditionalProperties.DebugMessagesTurnedOn = true;\n\n\nTo check for the values of the current configuration options, call the AdditionalProperties without semicolon\n\n.. code-block:: matlabsession\n\n        >> % To view current configurations\n        >> c.AdditionalProperties\n\nTo clear a value, assign the property an empty value (â\x80\x98â\x80\x99, [], or false).\n\n.. code-block:: matlabsession\n\n        >> % To clear a configuration that takes a string as input \n        >> c.AdditionalProperties.EmailAddress = â\x80\x98 â\x80\x99;\n\nTo save a profile, with your configuration so you will find it in future sessions\n\n.. code-block:: matlabsession\n\n        >> c.saveProfile;\n\nSerial Jobs\n^^^^^^^^^^^\n\nUse the batch command to submit asynchronous jobs to the cluster.  The batch command will return a job object which is used to access the output of the submitted job.  See the MATLAB documentation for more help on batch.\n\n.. code-block:: matlabsession\n\n        >> % Get a handle to the cluster\n        >> c = parcluster;\n\nSubmit job to query where MATLAB is running on the cluster\n\n.. code-block:: matlabsession\n\n        >> j = c.batch(@pwd, 1, {});\n\nQuery job for state:  queued | running | finished\n\n.. code-block:: matlabsession\n\n        >> j.State\n\nIf state is finished, fetch results\n\n.. code-block:: matlabsession\n\n        >> j.fetchOutputs{:}\n\nor\n\n.. code-block:: matlabsession\n\n        >> fetchOutputs(j)\n\nDisplay the diary\n\n.. code-block:: matlabsession\n\n        >> diary(j)\n\nDelete the job after results are no longer needed\n\n.. code-block:: matlabsession\n\n        >> j.delete;\n\nTo retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run. This allows us to fetch the results of completed jobs. Retrieve and view the list of jobs as shown below."),
  0.8535576462745667),
 (Document(id='hpc_intro.rst.txt__6244', metadata={'start_index': 6244, 'scraped_on': '2025-07-22', 'doc_name': 'hpc_intro.rst.txt'}, page_content=".. dropdown:: Example\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n   A user requests 1 node, 4 CPUs, 4 GPUs, and 3 hours of walltime on the Booster partition of Leonardo. However, the job runs for only 2 hours.\n\n   From this information, we have:\n\n   * T = 2 h (elapsed time)\n\n   * N = 1 node\n\n   * C = 32 CPUs (number of CPUs available on a Leonardo Booster compute node â\x80\x94 see :ref:`hpc/leonardo:Hardware Details`)\n\n   and, since:\n\n   .. math::\n\n      \\frac{\\text{Allocated}(\\text{CPU})}{\\text{Total}(\\text{CPU})} = \\frac{4}{32} = 0.125\n\n   .. math::\n\n      \\frac{\\text{Allocated}(\\text{GPU})}{\\text{Total}(\\text{GPU})} = \\frac{4}{4} = 1.0\n\n   the maximum of the resources requested per node is determined by the GPUs, therefore *R* = 1.0, and the billed hours are then calculated as:\n\n   .. math::\n\n      B_{H} = T \\cdot N \\cdot R \\cdot C = 2 \\cdot 1 \\cdot 1.0 \\cdot 32 = 64 \\text{CPUh}\n\n   This means the job consumes 64 effective CPU hours from the project's budget.\n\n----\n\n.. note::\n\n   * The **serial partition** is available for limited post-production data analysis and can be used even after a Project Account has expired. Usage of this partition is excluded from STDH billing (**free of charge**).\n\n   * By default, the amount of memory allocated per node is proportional to the number of CPUs requested.\n\n   * When nodes are requested in **exclusive mode** (see :ref:`hpc/hpc_scheduler:Scheduler and Job Submission` section), the entire node is reserved for the job, regardless of the specific resources requested. In such cases, the allocated resources may exceed the explicitly requested ones.\n\n   * The **resources per node** are listed in the **Hardware Details** section for each cluster. Refer to the :ref:`hpc/hpc_clusters:Cluster Specifics` section for the complete list of Cineca's HPC systems.\n\nBudget Linearization\n^^^^^^^^^^^^^^^^^^^^\n\nA linearization policy governs the priority of scheduled jobs across Cineca clusters. To each Project Account is assigned a monthly quota (MQ) calculated as: \n\n.. math::\n\n   MQ = TB/NM\n\nTB = total assigned budget\n\nNM = total number of months\n\nBeginning on the first day of each month, any User Accounts belonging a Project Account may utilize their quota at full priority. \nAs the budget is consumed, submitted jobs progressively lose priority until the monthly quota is exhausted. \nSubsequently, these jobs are still considered for execution but with reduced priority compared to accounts with remaining quota. \nThis policy aligns with practices at other prominent HPC centers globally, aiming to enhance response times by aligning CPU hour usage with budget sizes."),
  0.8583566546440125),
 (Document(id='galileo.rst.txt__2334', metadata={'start_index': 2334, 'scraped_on': '2025-07-22', 'doc_name': 'galileo.rst.txt'}, page_content='.. list-table:: \n\t:widths: 10 10 20 10 10 10 10 20\n\t:header-rows: 1\n\t:class: tight-table\n\n\t* - **Partition**\n\t  - **QOS**\n\t  - **#Cores per job**\n\t  - **Walltime**\n\t  - **Max jobs/resources per user**\n\t  - **Max memory per node (MB)**\n\t  - **Priority**\n\t  - **Notes**\n\t* - g100_all_serial\n\n\t    (default)\n\t  - noQOS\n\t  - 4 cores\n\t  - 04:00:00\n\t  - 4 cores\n\n\t    120 submitted jobs\n\t  - 31,200\n\t  \n\t    (30 GB)\n\t  - 40\n\t  - on two login nodes\n\n\t    **budget free**\n\t* - g100_all_serial\n\n\t    (default)\n\t  - qos_install\n\t  - 16 cores\n\t  - 04:00:00\n\t  - 16 cores\n\n\t    1 running job\n\t  - 100 GB\n\t  - 40\n\t  - request to superc@cineca.it\n\t* - g100_usr_dbg\n\t  - noQOS\n\t  - 2 nodes\n\t  - 01:00:00\n\t  -\n\t  - 375,300\n\n\t    (366 GB)\n\t  - 40\n\t  -\n\t* - g100_usr_dbg\n\t  - qos_ind\n\t  - Depending on the specific agreement\n\t  - Depending on the specific agreement\n\t  -\n\t  - 375,300\n\n\t    (366 GB)\n\t  - 90\n\t  - Partition dedicated to specific kinds of users.\n\t* - g100_usr_prod\n\t\n\t    *g100_usr_smem*\n\t  \n\t    **g100_usr_pmem**\n\t  - noQOS\n\t  - min = 1\n\t  \n\t    max =  32 nodes\n\t  - 24:00:00\n\t  - 100 running jobs\n\t  \n\t    120 submitted jobs\n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 40\n\t  - runs on thin and persistent memory nodes\n\t\n\t    *runs only on thin nodes*\n\t  \n\t    **runs only on persistent memory nodes**\n\t* - g100_usr_prod\n\t\n\t    *g100_usr_smem*\n\t  \n\t    **g100_usr_pmem**\n\t  - g100_qos_bprod\n\t  - min = 1537 (33 nodes)\n\t  \n\t    max =  3072 (64 nodes)\n\t  - 24:00:00\n\t  - 100 running jobs\n\t  \n\t    120 submitted jobs\n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 60\n\t  - runs on thin and persistent memory nodes\n\t\n\t    *runs only on thin nodes*\n\t  \n\t    **runs only on persistent memory nodes**\n\t* - g100_usr_prod\n\t\n\t    *g100_usr_smem*\n\t  \n\t    **g100_usr_pmem**\n\t  - g100_qos_lprod\n\t  - min = 1\n\t  \n\t    max =  2 nodes\n\t  - 4-00:00:00\n\t  - 2 nodes\n\t  \n\t    100 running jobs\n\t  \n\t    120 submitted jobs\n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 40\n\t  - runs on thin and persistent memory nodes\n\t\n\t    *runs only on thin nodes*\n\t  \n\t    **runs only on persistent memory nodes**\n\t* - g100_usr_prod\n\t\n\t    *g100_usr_smem*\n\t  \n\t    **g100_usr_pmem**\n\t  - qos_special\n\t  - > 32 nodes\n\t  - > 24:00:00\n\t  -\n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 40\n\t  - request to superc@cineca.it\n\t* - g100_usr_bmem\n\t  - noQOS\n\t  - 25 nodes\n\t  - 24:00:00\n\t  - 100 running jobs\n\t  \n\t    120 submitted jobs\n\t  - 3,036,000\n\t  \n\t    (3 TB)\n\t  - 40\n\t  - runs on fat nodes\n\t* - g100_usr_interactive\n\t  - noQOS\n\t  - max = 0.5 node\n\t  - 8:00:00\n\t  - 100 running jobs\n\t  \n\t    120 submitted jobs\n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 40\n\t  - on nodes with GPUs\n\n\t    --gres=gpu:N (N=1)\n\t* - g100_meteo_prod\n\t  - qos_meteo\n\t  - \n\t  - 24:00:00\n\t  - \n\t  - 375,300\n\t  \n\t    (366 GB)\n\t  - 40\n\t  - Partition reserved to meteo services'),
  0.8675727844238281),
 (Document(id='qe.rst.txt__0', metadata={'scraped_on': '2025-07-22', 'doc_name': 'qe.rst.txt', 'start_index': 0}, page_content=".. _quantum_espresso_card:\n\nQuantumESPRESSO\n===============\n\nThe following guide describes how to load, configure and use QuantumESPRESSO @ CINECA's cluster.\nQuantumESPRESSO is available on :ref:`hpc/leonardo:Leonardo` and :ref:`hpc/galileo:Galileo100` clusters.\n\nRelevant links\n^^^^^^^^^^^^^^\n\n- QE repository: https://gitlab.com/QEF/q-e.git\n- MaX benchmarks: https://gitlab.com/max-centre/benchmarks-max3.git\n- JUBE xmls: https://gitlab.com/max-centre/JUBE4MaX.git\n- spack recipe: https://gitlab.com/spack/spack/-/blob/develop/var/spack/repos/builtin.mock/packages/quantum-espresso/package.py\n\nModules\n^^^^^^^\n\nCPU-based and GPU-based machines deploy QuantumESPRESSO with different software stacks, to fully exploit the underlying hardware. In particular:\n\n- **Intel/Oneapi** compiler and MPI implementation on G100 and Leonardo DCGP, plus **MKL** for FFT, BLAS/LAPACK and SCALAPACK\n- **NVHPC** compiler and **OpenMPI/HPCX-MPI** on Leonardo Booster, plus **OpenBLAS** and **FFTW** libraries.\n\nInstallations based on gcc compiler do not provide performance, and are provided for postprocessing executables.\n\nAlternative Installations\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you wish installing your own version of QuantumESPRESSO, we suggesting using CMake and the options provided in the `Wiki of the official repository <https://gitlab.com/QEF/q-e/-/wikis/Developers/CMake-build-system>`_ for the CINECA cluster in use. \n\nParallelization strategies\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nQuantumESPRESSO supports different parallelization strategies. \n\n- R&G (`-npw` or no options) processes to distribute real/reciprocal spaces\n- pools (`-nk`) to distribute k-points\n- images (`-ni`) to distribute irreducible representations or q-points in a dispersion\n- band processes (`-nbnd`) to distribute the Kohn-Sham states\n- linear algebra processes (auto) to distribute diagonalization, via scalapack or custom algorithm. For GPU installations, the diagonalization is done on a single GPU (scalapack are not used\n\nWe suggest the following for optimal performance on Leonardo Booster:\n\n- prioritize pools over R&G , in particular for workloads with hundreds of planes or less in the z-direction, also for intra-node distribution. \n- The minimum number of k-points per pool (kunit) in PWSCF is the number of k-points (`kunit=1`), while in phonon is usually `kunit=2`"),
  0.8792144060134888),
 (Document(id='leonardo.rst.txt__20840', metadata={'doc_name': 'leonardo.rst.txt', 'start_index': 20840, 'scraped_on': '2025-07-22'}, page_content='.. raw:: html\n\n          <p>Each Booster cell is composed of:</p>\n          <ul>\n            <li><strong>6 Ã\x97 Atos BullSequana XH2000 racks</strong>, each containing:\n              <ul>\n                <li>3 Ã\x97 Level 2 (L2) switches</li>\n                <li>3 Ã\x97 Level 1 (L1) switches</li>\n                <li>30 compute nodes â\x80\x94 each equipped with 4 GPUs, each connected via a dedicated 100 Gbps port</li>\n              </ul>\n            </li>\n          </ul>\n\n          <p><strong>Total per Booster cell:</strong> 18 L2 switches, 18 L1 switches, and 180 compute nodes.</p>\n\n          <h4>Connectivity Overview</h4>\n\n          <p><strong>Level 2 (L2) Switches:</strong></p>\n          <ul>\n            <li><strong>UP:</strong> 22 Ã\x97 200 Gbps ports connecting to L2 switches in other cells</li>\n            <li><strong>DOWN:</strong> 18 Ã\x97 200 Gbps ports connecting to L1 switches within the cell</li>\n            <li><strong>Oversubscription:</strong> 0.8:1</li>\n          </ul>\n\n          <p><strong>Level 1 (L1) Switches:</strong></p>\n          <ul>\n            <li><strong>UP:</strong> 18 Ã\x97 200 Gbps ports connected to all L2 switches in the cell</li>\n            <li><strong>DOWN:</strong> 40 Ã\x97 100 Gbps ports connected to GPUs across 10 compute nodes</li>\n            <li><strong>Oversubscription:</strong> 1.11:1</li>\n          </ul>\n\n        .. figure:: img/leo-net-booster_cell.png\n          :height: 750px\n          :align: center\n      \n      .. tab-item:: DCGP\n        \n        .. raw:: html'),
  0.8816628456115723),
 (Document(id='leonardo.rst.txt__7317', metadata={'scraped_on': '2025-07-22', 'doc_name': 'leonardo.rst.txt', 'start_index': 7317}, page_content='.. tab-item:: Booster\n\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        | **Partition**  | **QOS**            | **#Cores/#GPU per job** | **Walltime** | **Max Nodes/cores/GPUs/user**   | **Priority** | **Notes**                           |\n        +================+====================+=========================+==============+=================================+==============+=====================================+\n        | lrd_all_serial | normal             | 4 cores                 | 04:00:00     | 1 node / 4 cores                | 40           | No GPUs'),
  0.8856500387191772),
 (Document(id='singularity.rst.txt__21115', metadata={'doc_name': 'singularity.rst.txt', 'start_index': 21115, 'scraped_on': '2025-07-22'}, page_content='.. code-block:: bash\n\n            salloc -t 03:00:00 --nodes=6 --ntasks-per-node=4 --ntasks=24 --gres=gpu:4 -p boost_usr_prod -A <Account_name>\n            <load the necessary modules and/or export necessary variables>\n            export OMP_NUM_THREADS=8\n            srun --nodes=6 --ntasks-per-node=4 --ntasks=24 singularity exec --nv <container_img> <container_cmd>\n\n        \n\n    .. tab-item:: Galileo100\n\n        On Galileo100, Singularity version 3.8.0 is available on the login nodes and on the partitions. \n        Beware that, for the Galileo00 cluster, nodes with GPU are available only under \n        a reservation (send an email to superc@cineca.it) and through the interactive computing service; moreover, there one \n        can at most request one node with **2 GPUs** so no internode communication will actually be performed. \n        The necessary MPI, Singularity and CUDA modules are the following:\n\n            * ``module load profile/advanced`` (profile with additional modules)\n            * ``module load autoload singularity/3.8.0--bind--openmpi--4.1.1``\n            * ``module load cuda/11.5.0``\n\n        .. note::\n            \n            The ``module load autoload singularity/3.8.0--bind--openmpi--4.1.1`` command automatically loads the following modules:\n\n            * ``singularity/3.8.0--bindâ\x80\x93openmpiâ\x80\x934.1.1``\n            * ``zlib/1.2.11--gccâ\x80\x9310.2.0``\n            * ``openmpi/4.1.1--gcc--10.2.0-cudaâ\x80\x9311.1.0``\n\n        The following code snippet is an example of a Slurm job script for running MPI parallel containerized applications on the GALILEO100 cluster. \n        Notice that the ``--cpus-per-task`` option has been set to **48** to fully exploit the CPUs in the ``g100_usr_prod`` partition.\n\n        .. code-block:: bash\n\n            #!/bin/bash\n \n            #SBATCH --nodes=6\n            #SBATCH --tasks-per-node=1\n            #SBATCH --cpu-per-task=48\n            #SBATCH --mem=30GB\n            #SBATCH --time=00:10:00\n            #SBATCH --out=slurm.%j.out\n            #SBATCH --err=slurm.%j.err\n            #SBATCH --account=<Account_name>\n            #SBATCH --partition=g100_usr_prod\n  \n            module purge\n            module load profile/advanced\n            module load autoload singularity/3.8.0--bind--openmpi--4.1.1\n            module load cuda/11.5.0\n \n            mpirun -np 6 singularity exec <container_img> <container_cmd>'),
  0.9354047179222107),
 (Document(id='leonardo.rst.txt__18311', metadata={'scraped_on': '2025-07-22', 'doc_name': 'leonardo.rst.txt', 'start_index': 18311}, page_content='.. note::\n\n          The partitions: **dcgp_fua_dbg, dcgp_fua_prod** can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.\n\nNetwork Architecture\n--------------------\n\n.. raw:: html\n\n  <p><strong>Leonardo</strong> features a state-of-the-art interconnect system tailored for high-performance computing (HPC). It delivers <em>low latency</em> and <em>high bandwidth</em> by leveraging <strong>NVIDIA Mellanox InfiniBand HDR</strong> (High Data Rate) technology, powered by <a href="https://nvdam.widen.net/s/zmbw7rdjml/infiniband-qm8700-datasheet-us-nvidia-1746790-r12-web">NVIDIA QUANTUM QM8700 Smart Switches</a>, and a <strong><a href="https://ieeexplore.ieee.org/document/7885210">Dragonfly+ topology</a></strong>. Below is an overview of its architecture and key features:</p>\n\n  <ul>\n    <li><strong>Hierarchical Cell Structure:</strong> The system is structured into multiple <em>cells</em>, each comprising a group of interconnected compute nodes.</li>\n\n    <li><strong>Inter-cell Connectivity:</strong> As illustrated in the figure below, cells are connected via an all-to-all topology. Each pair of distinct cells is linked by 18 independent connections, each passing through a dedicated Layer 2 (L2) switch. This design ensures high availability and reduces congestion.</li>\n\n    <li><strong>Intra-cell Topology:</strong> Inside each cell, a non-blocking two-layer fat-tree topology is used, allowing scalable and efficient intra-cell communication.</li>\n\n    <li><strong>System Composition:</strong>\n      <ul>\n        <li>19 cells dedicated to the <em>Booster</em> partition.</li>\n        <li>2 cells for the <em>DCGP</em> (Data-Centric General Purpose) partition.</li>\n        <li>1 hybrid cell with both accelerated (36 Booster nodes) and conventional (288 DCGP nodes) compute resources.</li>\n        <li>1 cell allocated for management, storage, and login services.</li>\n      </ul>\n    </li>\n\n    <li><strong>Adaptive Routing:</strong> The network employs adaptive routing, dynamically optimizing data paths to alleviate congestion and maintain performance under load.</li>\n  </ul>\n  \n.. figure:: img/leo-net-all2all.png\n   :height: 350px\n   :align: center\n   :class: no-scaled-link\n\n.. image:: img/spacer.png\n   :align: center\n   :class: no-scaled-link\n   \n.. dropdown:: Cell Configuration and Intra-cell Connectivity\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n   .. tab-set::\n\n      .. tab-item:: Booster'),
  0.940382719039917),
 (Document(id='pitagora.rst.txt__0', metadata={'start_index': 0, 'doc_name': 'pitagora.rst.txt', 'scraped_on': '2025-07-22'}, page_content=".. _pitagora_card:\n\nPitagora\n========\n\n.. figure:: ../img/warning3.png\n   :align: center\n   :class: no-scaled-link\n   :height: 150px\n\n.. figure:: ../img/spacer.png\n   :align: center\n   :class: no-scaled-link\n   :height: 20px\n\nPitagora is the new EUROfusion supercomputer hosted by **CINECA** and currently built in the CINECA's headquarter in Casalecchio di Reno, Bologna, Italy. The cluster is supplied by Lenovo corp. and is composed of two partitions: A general purpose partition cpu-based named **DCPG** and an accelerated partition based on NVIDIA H100 accelerators named **Booster**.\n\nThe specific guide for the **Pitagora** cluster contains unique information that deviates from the general behavior described in the HPC Clusters sections.\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: **login.pitagora.cineca.it**. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to **Pitagora** using one the specific login hostname points:\n\n * login01-ext.pitagora.cineca.it\n * login02-ext.pitagora.cineca.it\n * login03-ext.pitagora.cineca.it\n * login04-ext.pitagora.cineca.it\n * login05-ext.pitagora.cineca.it\n * login06-ext.pitagora.cineca.it\n\n.. warning::\n    \n    **The mandatory access to Pitagora is the two-factor authetication (2FA)**. Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe system, supplied by Lenovo, is based on two new specifically-designed compute blades, which are available throught two distinct SLURM partitios \non the Cluster:\n\n* **GPU** blade based on NVIDIA NVIDIA H100 accelerators - **Booster** partition.\n* **CPU**-only blade based on  AMD Turin 128c processors - **Data Centric General Purpose (DCGP)** partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\n\nHardware Details\n^^^^^^^^^^^^^^^^\n.. tab-set::\n\n    .. tab-item:: Booster"),
  0.9409757852554321),
 (Document(id='matlab.rst.txt__5663', metadata={'start_index': 5663, 'scraped_on': '2025-07-22', 'doc_name': 'matlab.rst.txt'}, page_content=".. code-block:: matlabsession\n        \n        >> DEFAULTPROFILE='galileo100 R2024b'\n\non Galileo100 and similarly on Leonardo.\n\nConfiguring Jobs\n^^^^^^^^^^^^^^^^\n\nPrior to submitting the job, various parameters have to be specified in order to be passed to jobs, such as queue, username, e-mail, etc. \n\n.. note::\n Any parameters specified using the below workflow will be persistent between MATLAB sessions if saved at the end of the configuration.\n\nBefore specifying any parameters, you will need to obtain a handle to the cluster object.\n\n.. code-block:: matlabsession\n\n        >> % Get a handle to the cluster\n        >> c = parcluster;\n\nYou are now **required** to specify an Account Name, a Queue Name and the Wall Time (visit :ref:`hpc/hpc_intro:Budget and Accounting` to see how to retrieve your Budget Account Name using the saldo command)\n\n.. code-block:: matlabsession\n\n        >> % Specify an Account to use for MATLAB jobs\n        >> c.AdditionalProperties.AccountName = 'account_name';\n\n        >> % Specify a queue to use for MATLAB jobs\n        >> c.AdditionalProperties.Partition = 'partition-name';\n\n        >> % Specify the walltime (e.g. 5 hours)\n        >> c.AdditionalProperties.WallTime = '05:00:00';\n\nOn Leonardo cluster there are two partitions: 'boost_usr_prod' to access GPU nodes and 'dcgp_usr_prod' to access CPU nodes. You can find additional info on the :ref:`hpc/leonardo:Leonardo` dedicated pages.\nFor Galileo100 cluster the main partition is 'g100_usr_prod'. In :ref:`hpc/galileo:Galileo100` dedicated page you can find other possible Partitions and QOS available allowing for different combinations of nodes, walltime and priority.\n\nYou can specify other **additional** (not-mandatory) parameters along with your job."),
  0.975282609462738),
 (Document(id='hpc_data_storage.rst.txt__40289', metadata={'scraped_on': '2025-07-22', 'doc_name': 'hpc_data_storage.rst.txt', 'start_index': 40289}, page_content='.. code-block:: bash\n\n                    ssh -xt <username>@data.<cluster_name>.cineca.it wget http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz -P /absolute/path/to/\n\n                Please note that is mandatory to use the flag -P with the absolute path of the destination folder, because of the fake /home directory.\n\n\n            - **curl**\n\n                Sometimes, the 10-minute CPU time limit or the 4-hour wall time limit on the serial queue is not enough to download a large dataset for ML. In this case, you can use curl from the datamover. Here you can find a simple example\n\n                .. code-block:: bash\n\n                    ssh -xt <username>@data.<cluster_name>.cineca.it curl https://curl.se/download/curl-8.2.1.tar.gz --output /absolute/path/to/curl-8.2.1.tar.gz\n\n                Please note that is mandatory to use the flag --output with the absolute path of the destination file, because of the fake /home directory.\n\n\n            - **rclone**\n\n                Rclone is a powerful tool that supports different transfer protocols, and a lot of data [providers](https://rclone.org/#providers). At the moment is available only on Leonardo datamovers. It needs a configuration file. If you are able, you car write the configuration file using your favourite editor (VIM) or you can rely on the rclone config command:\n\n                .. code-block:: bash\n\n                    ssh -xt <username>@data.leonardo.cineca.it rclone --config /leonardo/home/userexternal/<username>/.rclone.conf config\n\n                When your configuration is ready you can use rclone to manage data between Leonardo filesystem and the remote host you have configures. For example:'),
  0.9759920239448547),
 (Document(id='specific_users.rst.txt__0', metadata={'scraped_on': '2025-07-22', 'doc_name': 'specific_users.rst.txt', 'start_index': 0}, page_content='.. _spec_users_card:\n\nEUROfusion\n==========\n\n.. figure:: ../img/warning3.png\n   :align: center\n   :class: no-scaled-link\n   :height: 150px\n\n.. figure:: ../img/spacer.png\n   :align: center\n   :class: no-scaled-link\n   :height: 20px\n\n.. |ico1| image:: img/EUROfusion.png\n   :height: 35px\n   :class: no-scaled-link\n\nThe |ico1| community has access to the following CINECA HPC systems:\n\n * Leonardo\n\n    - Booster partition\n    - DCGP partition\n\n * Pitagora\n\n.. important::\n   The general environment defined on our clusters is the same for all the users, so EUROfusion users are invited to refer to the general documentation.\n   Essential links below.\n\nFor general information regarding the access to the HPC clusters:\n\n* :ref:`general/getting_started:Getting Started`\n* :ref:`general/users_account:Users and Accounts`\n* :ref:`general/access:Access to the Systems`\n\nFor general information regarding the environment on the HPC clusters:\n\n* :ref:`hpc/hpc_intro:Introduction HPC Resources`\n* :ref:`hpc/hpc_data_storage:File Systems and Data Management`\n* :ref:`hpc/hpc_scheduler:Scheduler and Job Submission`\n* :ref:`hpc/hpc_enviroment:Environment and Customization`\n\nFor specific information regarding the HPC clusters used by the EUROfusion community:\n\n* :ref:`hpc/leonardo:Leonardo`\n* :ref:`hpc/pitagora:Pitagora`\n\n\nDedicated tutorials\n-------------------\n\n.. |tutorial| image:: /specific_users/img/tutorial_icon.png\n   :width: 35px\n   :class: no-scaled-link\n\nA presentation of each supercomputer, and of the access method via two-factor authentication (2FA), have been dedicated to the EUROfusion community. You can find the slides, including a report of the final Q&A session, and the recording at the following links (you should log in through the button :bdg-black-line:`Access as a guest`).\n\n.. card:: |tutorial| Leonardo Booster: *Introduction to Leonardo supercomputer for Eurofusion*\n\n   June 6th, 2023\n\n   `Leonardo Booster webinar page <https://learn.cineca.it/course/view.php?id=1461>`_ with slides and recording.\n\n   :download:`Leonardo Booster slides <../files/Leonardo_Booster_EF.pdf>`\n\n.. card:: |tutorial| Leonardo DCGP: *Introduction to Leonardo DCGP for Eurofusion*\n   \n   February 18th, 2025\n   \n   `Leonardo DCGP webinar page <https://learn.cineca.it/course/view.php?id=2025>`_ with slides and recording.\n\n   :download:`Leonardo DCGP slides <../files/Leonardo_DCGP_EF.pdf>`\n\n.. card:: |tutorial| 2FA: *Introduction to two-factor authentication (2FA) on CINECA HPC clusters*\n   \n   June 7th, 2023\n   \n   :download:`2FA slides <'),
  0.9863142967224121),
 (Document(id='leonardo.rst.txt__12699', metadata={'scraped_on': '2025-07-22', 'doc_name': 'leonardo.rst.txt', 'start_index': 12699}, page_content='+----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | **Partition**  | **QOS**            | **#Cores/#GPU per job** | **Walltime** | **Max Nodes/cores/GPUs/user**        | **Priority** | **Notes**                           |\n        +================+====================+=========================+==============+======================================+==============+=====================================+\n        | lrd_all_serial | normal             | max = 4 cores           | 04:00:00     | 1 node / 4 cores                     | 40           | Hyperthreading x 2                  |\n        |                |                    |                         |              |                                      |              |                                     |\n        | (**default**)  |                    | (8 logical cores)       |              | (30800 MB RAM)                       |              | **Budget Free**                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_usr_prod  | normal             | 16 nodes                | 24:00:00     | 512 nodes per prj. account           | 40           |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_dbg       | 2 nodes                 | 00:30:00     | 2 nodes / 224 cores per user account | 80           |                                     |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    |                         |              | 512 nodes per prj. account           |              |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_bprod     | min = 17 nodes          | 24:00:00     | 128 nodes per user account           | 60           | GrpTRES = 1536 nodes                |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    | max = 128 nodes         |              | 512 nodes per prj. account           |              | Min is 17 FULL nodes                |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_lprod     | 3 nodes                 | 4-00:00:00   | 3 nodes / 336 cores per user account | 40           |                                     |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    |                         |              | 512 nodes per prj. account           |              |                                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_fua_dbg   | normal             | 2 nodes                 | 00:10:00     | 2 nodes / 224 cores                  | 40           | Runs on 2 nodes                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_fua_prod  | normal             | 16 nodes                | 24:00:00     |                                      | 40           |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+'),
  0.9868746399879456),
 (Document(id='pitagora.rst.txt__4589', metadata={'start_index': 4589, 'doc_name': 'pitagora.rst.txt', 'scraped_on': '2025-07-22'}, page_content='.. tab-item:: Booster\n\n        +------------------+------------------------+---------------------------+--------------+----------------------------+--------------+-------------------------------------+\n        | **Partition**    | **QOS**                | **#Nodes/#per job**       | **Walltime** | **#Max Nodes/#per user**   | **Priority** | **Notes**                           |\n        +==================+========================+===========================+==============+============================+==============+=====================================+\n        | boost_fua_prod   | normal                 | max = 16                  | 24:00:00     | 32                         | 40           |                                     |\n        +                  +------------------------+---------------------------+--------------+----------------------------+--------------+-------------------------------------+\n        |                  | boost_qos_fuadbg       | max = 2                   | 00:10:00     |  2                         | 80           |                                     |\n        +                  +------------------------+---------------------------+--------------+----------------------------+--------------+-------------------------------------+\n        |                  | boost_qos_fuaprod      | min = 17 (full nodes)     | 24:00:00     | 32                         | 60           | runs on 96 nodes (GrpTRES)          |\n        |                  |                        | max = 32                  |              |                            |              |                                     |\n        +                  +------------------------+---------------------------+--------------+----------------------------+--------------+-------------------------------------+\n        |                  | boost_qos_fualprod     | max = 3                   | 4-00:00:00   |  3                         | 40           |                                     |\n        +------------------+------------------------+---------------------------+--------------+----------------------------+--------------+-------------------------------------+'),
  1.0001477003097534),
 (Document(id='leonardo.rst.txt__12453', metadata={'scraped_on': '2025-07-22', 'start_index': 12453, 'doc_name': 'leonardo.rst.txt'}, page_content='.. note::\n\n          The partitions: **boost_fua_dbg, boost_fua_prod** can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.'),
  1.0093889236450195),
 (Document(id='galileo.rst.txt__0', metadata={'doc_name': 'galileo.rst.txt', 'scraped_on': '2025-07-22', 'start_index': 0}, page_content='.. _galileo_card:\n\nGalileo100\n==========\n\nGalileo100 is a new infrastructure co-funded by the European ICEI (Interactive Computing e-Infrastructure) project and engineered by DELL. It is the national Tier-1 system for scientific research and is available to the Italian public and industrial researchers since September 2021. It also features 77 cloud computing servers and was expanded in November 2022 with 82 additional nodes. **Galileo100** is used for high-end technical and industrial HPC projects, as well as meteorology and environmental studies.\n\nThe specific guide for the **Galileo100** cluster contains unique information that deviates from the general behavior described in the HPC Clusters sections.\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: **login.g100.cineca.it**.\n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to **Galileo100** using one the specific login hostname points:\n\n* login01-ext.g100.cineca.it\n* login02-ext.g100.cineca.it \n* login03-ext.g100.cineca.it \n\n.. warning::\n    \n    **The mandatory access to Galileo100 is the two-factor authetication (2FA)**. Get more information at section :ref:`general/access:Access to the Systems`.\n\n\nSystem Architecture\n-------------------\n\n\n\nHardware Details\n^^^^^^^^^^^^^^^^\n.. list-table:: \n    :widths: 30 50\n    :header-rows: 1\n\n    * - **Type**\n      - **Specific**\n    * - Models\n      - Dual-soket Dell PowerEdge\n    * - Nodes\n      - 630\n    * - Processors/node\n      - 2xCPU x86 Intel Xeon Platinum 8276/L 2.4GHz\n    * - CPU/node\n      - 48 \n    * - Accelerators/node\n      - 2xGPU Nvidia V100 PCIe3 with 32 GB Ram on 36 Viz Nodes\n    * - RAM/node\n      - 384 GiB (+ 3.0 TiB Optane on 180 fat nodes)\n    * - Peak Performance\n      - 2 PFlop/s (3.53 TFlop/s in single node)\n    * - Internal Network\n      - Mellanox Infiniband 100GbE\n\n\nDisks and Filesystems\n---------------------\n\nThe storage organization conforms to **CINECA** infrastructure. General information are reported in :ref:`hpc/hpc_data_storage:File Systems and Data Management` section. In the following, only differences with respect to general behavior are listed and explained.\n\n\nJob Managing and SLURM Partitions \n---------------------------------'),
  1.0247918367385864),
 (Document(id='singularity.rst.txt__13791', metadata={'doc_name': 'singularity.rst.txt', 'scraped_on': '2025-07-22', 'start_index': 13791}, page_content='.. list-table:: \n     :widths: 30 30 30 30\n     :header-rows: 1\n     \n     * - \n       - **Driver Version**\n       - **CUDA Version**\n       - **GPU Model**\n     * - Leonardo\n       - 530.30.20\n       - 12.1\n       - NVIDIA A100 SXM6 64 GB HBM2\n     * - Galileo100\n       - 470.42.01\n       - 11.4\n       - NVIDIA V100 PCIe3 32 GB\n\nwhile the `CUDA compatibility <https://docs.nvidia.com/deploy/cuda-compatibility/>`_ table is:\n\n.. list-table:: \n    :widths: 50 50 \n    :header-rows: 1\n\n    * - **CUDA Version**\n      - **Required Drivers**\n    * - CUDA 12.x\n      - from 525.60.13\n    * - CUDA 11.x\n      - from 450.80.02\n\nOne can surely install a working version of CUDA on his own, for example via Spack. However, a simple and effective way to obtain a container image provided with a CUDA installation is to bootstrap from an NVIDIA HPC SDK docker container, which already comes equipped with CUDA, OpenMPI and the NVHPC compilers. Such containers are available at the `NVIDIA catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nvhpc/tags>`_. Their tag follows a simple structure, ``$NVHPC_VERSION-$BUILD_TYPE-cuda$CUDA_VERSION-$OS``,  where:\n\n1. ``$BUILD_TYPE``: can either take the value devel or runtime. The first ones are usually heavier and employed to compile and install applications. The second ones are lightweight containers for deployment, stripped of all the compilers and applications not needed at runtime execution.\n2. ``$CUDA_VERSION``: an either take a specific value (e.g. ) or be a ``multi``.  The multi flavors hold up to three different CUDA version, and as such are much heavier. However, they can be useful to deploy the same base container on HPC with different CUDA specifics or to try out the performance of the various versions.\n\nIn the following we provide a minimal Singularity definition file following the above principles, namely: bootstrap from a develop NVIDIA HPC SDK container, install the needed applications, copy the necessary binaries and files for runtime, pass to a lightweight container. This technique is called multistage build, more information available `here <https://docs.sylabs.io/guides/3.7/user-guide/definition_files.html#multi-stage-builds>`_.'),
  1.028136968612671),
 (Document(id='leonardo.rst.txt__2145', metadata={'start_index': 2145, 'scraped_on': '2025-07-22', 'doc_name': 'leonardo.rst.txt'}, page_content='.. list-table:: \n            :widths: 30 50\n            :header-rows: 1\n\n            * - **Type**\n              - **Specific**\n            * - Models\n              - Atos BullSequana X2135, Da Vinci single-node GPU\n            * - Racks\n              - 116\n            * - Nodes\n              - 3456\n            * - Processors/node\n              - 1x `Intel Ice Lake Intel Xeon Platinum 8358 <https://www.intel.com/content/www/us/en/products/sku/212282/intel-xeon-platinum-8358-processor-48m-cache-2-60-ghz/specifications.html>`_\n            * - CPU/node\n              - 32\n            * - Accelerators/node\n              - 4x `NVIDIA Ampere100 custom <https://doi.org/10.17815/jlsrf-8-186>`_, 64GiB HBM2e NVLink 3.0 (200 GB/s)\n            * - Local Storage/node (tmfs)\n              - (none)\n            * - RAM/node \n              - 512 GiB DDR4 3200 MHz\n            * - Rmax\n              - 241.2 PFlop/s (`top500 <https://www.top500.org/system/180128/>`_)\n            * - Internal Network\n              - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology \n            * - Storage (raw capacity)\n              - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n              \n                5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n    .. tab-item:: DCGP\n\n        .. list-table::\n            :widths: 30 50\n            :header-rows: 1\n            \n            * - **Type**\n              - **Specific**\n            * - Models\n              - Atos BullSequana X2140 three-node CPU blade\n            * - Racks\n              - 22\n            * - Nodes\n              - 1536\n            * - Processors/node\n              - 2x `Intel Sapphire Rapids Intel Xeon Platinum 8480+ <https://www.intel.com/content/www/us/en/products/sku/231746/intel-xeon-platinum-8480-processor-105m-cache-2-00-ghz/specifications.html>`_\n            * - CPU/node\n              - 112 cores/node\n            * - Accelerators\n              - (none)\n            * - Local Storage/node (tmfs)\n              - 3 TiB\n            * - RAM/node\n              - 512(8x64) GiB DDR5 4800 MHz\n            * - Rmax\n              - 7.84 PFlop/s (`top500 <https://www.top500.org/system/180204/>`_)\n            * - Internal Network\n              - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology\n            * - Storage (raw capacity)\n              - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n              \n                5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n\nFile Systems and Data Managment\n-------------------------------\n\nThe storage organization conforms to **CINECA** infrastructure. General information are reported in :ref:`hpc/hpc_data_storage:File Systems and Data Management` section. In the following, only differences with respect to general behavior are listed and explained.'),
  1.0423932075500488)]

Evaluating k-size in the retrieval step

We can use the previously created Q&A set to tune the parameter k, which indicates the number of documents returned by the vector search. We are interested in observing how the number of relevant chunks returned for each question changes when we increment k.

In this case, we consider a chunk to be “relevant” if it comes from the document that answers the question we are evaluating.

Precision and recall

We can use the precision@k and recall@k metric, which are defined as:

$Precision@k = \frac{\#Relevant Documents Retrieved}{K}$

Precision@k indicates, for a given query, the proportion of retrieved documents that are relevant out of the total number of documents retrieved.

$Recall@k = \frac{\#Relevant Documents Retrieved}{\#Relevant Documents Total}$

The Recall@k indicates, for a given query, how many relevant documents were left out.

def optimize_retriever(vector_store, qa_set:pd.DataFrame, max_k:int, reranker_name:str = None):

    k = [*range(1, max_k+1)]
    precision_k = []
    recall_k = []
    f1_k = []

    if reranker_name:
        # The cross encoder we will use
        model = HuggingFaceCrossEncoder(model_name=reranker_name)
    
    # For each k calculate precision and recall
    for k_threshold in k:
        print(f"{datetime.now()} - Testing k_threshold {k_threshold}")
        # The precision and recall at k achieved for question i
        precision_q_i = []
        recall_q_i = []
        
        for question_i in range(qa_set.shape[0]):
            question = qa_set.iloc[question_i]["question"]
            doc_provenance = qa_set.iloc[question_i]["doc"]
            
            # Trigger search and get top 20 docs
            retriever = vector_store.as_retriever(search_kwargs = {"k":20})

            if reranker_name:
                # Use a cross-encoder to rank the documents with respect to the query and keep only the top k_threshold
                compressor = CrossEncoderReranker(model=model, top_n=k_threshold)
                compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)
                retrieved_chunks:List[Document] = compression_retriever.invoke(question)
            else:
                # Trigger search and retrieve the top k docs
                retrieved_chunks:List[Document] = vector_store.similarity_search(question, k=k_threshold)
    
            # Calculate precision @k and recall@k for this question
            n_relevant = [1 if chunks.metadata["doc_name"] == doc_provenance else 0 for chunks in retrieved_chunks]
            precision = sum(n_relevant)/k_threshold
            precision_q_i.append(precision)
            
            total_relevant_chunks = len(vector_store.get(where={"doc_name": doc_provenance}, include = ["documents"])["documents"])
            recall_q_i.append(sum(n_relevant) / total_relevant_chunks)
    
        # The precision achieved over all the set of questions for this level of k
        macro_precision_k = np.mean(precision_q_i)
        macro_recall_k = np.mean(recall_q_i)
        precision_k.append(macro_precision_k)
        recall_k.append(macro_recall_k)
        f1_k.append((2 * macro_precision_k * macro_recall_k) / (macro_recall_k + macro_precision_k))
        
    return pd.DataFrame({"k": k, "precision_k": precision_k, "recall_k": recall_k, "f1_k": f1_k})

%%time
k_thresh_tests = optimize_retriever(vector_store = hpc_store, qa_set = qa_set, max_k = 20, reranker_name = None)

2025-07-28 16:42:10.221331 - Testing k_threshold 1
2025-07-28 16:42:11.520177 - Testing k_threshold 2
2025-07-28 16:42:12.795102 - Testing k_threshold 3
2025-07-28 16:42:14.068262 - Testing k_threshold 4
2025-07-28 16:42:15.346388 - Testing k_threshold 5
2025-07-28 16:42:16.628977 - Testing k_threshold 6
2025-07-28 16:42:17.911770 - Testing k_threshold 7
2025-07-28 16:42:19.195639 - Testing k_threshold 8
2025-07-28 16:42:20.481491 - Testing k_threshold 9
2025-07-28 16:42:21.772858 - Testing k_threshold 10
2025-07-28 16:42:23.061653 - Testing k_threshold 11
2025-07-28 16:42:24.353821 - Testing k_threshold 12
2025-07-28 16:42:25.652520 - Testing k_threshold 13
2025-07-28 16:42:26.951166 - Testing k_threshold 14
2025-07-28 16:42:28.250966 - Testing k_threshold 15
2025-07-28 16:42:29.554705 - Testing k_threshold 16
2025-07-28 16:42:30.856730 - Testing k_threshold 17
2025-07-28 16:42:32.163973 - Testing k_threshold 18
2025-07-28 16:42:33.472672 - Testing k_threshold 19
2025-07-28 16:42:34.779222 - Testing k_threshold 20
CPU times: user 21.3 s, sys: 1.47 s, total: 22.8 s
Wall time: 25.9 s

display(k_thresh_tests)

plt.plot(k_thresh_tests["k"], k_thresh_tests["precision_k"], label = "precision_k")
plt.plot(k_thresh_tests["k"], k_thresh_tests["recall_k"], label = "recall_k")
plt.plot(k_thresh_tests["k"], k_thresh_tests["f1_k"], label = "f1_k")
plt.xlabel("k")
plt.title(f"K threshold tests")
plt.legend()
plt.show()

	k	precision_k	recall_k	f1_k
0	1	0.925373	0.562928	0.700017
1	2	0.619403	0.670902	0.644125
2	3	0.462687	0.701886	0.557721
3	4	0.380597	0.745307	0.503883
4	5	0.319403	0.757492	0.449339
5	6	0.281095	0.774453	0.412477
6	7	0.251599	0.783795	0.380922
7	8	0.225746	0.791868	0.351334
8	9	0.203980	0.794103	0.324585
9	10	0.188060	0.797382	0.304342
10	11	0.176391	0.807090	0.289509
11	12	0.165423	0.815928	0.275076
12	13	0.154994	0.817955	0.260606
13	14	0.143923	0.817955	0.244777
14	15	0.135323	0.820940	0.232347
15	16	0.131530	0.845449	0.227644
16	17	0.123793	0.845449	0.215964
17	18	0.119403	0.849974	0.209391
18	19	0.114690	0.851800	0.202160
19	20	0.108955	0.851800	0.193198

../../../../_images/5ab38eee8eeb67ac77694d569de7801d7e4cc101766e31ee19ba60eeb01b3b07.png

The higher the number of chunks, the lower the precision - this because the large majority of the docs have 1 chunk at most - but the recall raises because for docs with more chunks we leave out a smaller fraction of (possibly) relevant chunks.

We can try to use a reranker to see if our Precision@K and Recall@K improves.

Reranking models

A reranker is model specifically trained to rank the documents based on a query. In this notebook we are going to use a cross encoder. Cross encoders are models that process a query and a document jointly and output a relevance score for each document.

The standard semantic search approach uses bi-encoders, where the query and documents are encoded independently into fixed-size embeddings. The embeddings are then compared using some similarity measures. While bi-encoders are much faster at retrieval time (since document embeddings can be precomputed and stored in a vector database), they often sacrifice accuracy due to the lack of interaction between the query and document during encoding.

By joint encoding the query with each chunk, the model should capture fine-grained interactions between tokens in the query and the document, leading to accurate relevance judgments.

Let’s examine how our metrics change after applying reranking.

%%time
# Reranking with this token config requires 13minutes to complete all the tests, we can skip it and jump to the charts
k_thresh_tests_reranked = optimize_retriever(vector_store = hpc_store, qa_set = qa_set, max_k = 20, reranker_name = RERANKER)

2025-07-28 16:42:42.518987 - Testing k_threshold 1
2025-07-28 16:43:22.770098 - Testing k_threshold 2
2025-07-28 16:44:02.936232 - Testing k_threshold 3
2025-07-28 16:44:43.098456 - Testing k_threshold 4
2025-07-28 16:45:23.259636 - Testing k_threshold 5
2025-07-28 16:46:03.418688 - Testing k_threshold 6
2025-07-28 16:46:43.583112 - Testing k_threshold 7
2025-07-28 16:47:23.741345 - Testing k_threshold 8
2025-07-28 16:48:03.904720 - Testing k_threshold 9
2025-07-28 16:48:44.060776 - Testing k_threshold 10
2025-07-28 16:49:24.221430 - Testing k_threshold 11
2025-07-28 16:50:04.387229 - Testing k_threshold 12
2025-07-28 16:50:44.554619 - Testing k_threshold 13
2025-07-28 16:51:24.726631 - Testing k_threshold 14
2025-07-28 16:52:05.094475 - Testing k_threshold 15
2025-07-28 16:52:45.461225 - Testing k_threshold 16
2025-07-28 16:53:25.663023 - Testing k_threshold 17
2025-07-28 16:54:05.826152 - Testing k_threshold 18
2025-07-28 16:54:45.985537 - Testing k_threshold 19
2025-07-28 16:55:26.155932 - Testing k_threshold 20
CPU times: user 13min 35s, sys: 3.51 s, total: 13min 38s
Wall time: 13min 30s

display(k_thresh_tests_reranked)

plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["precision_k"], label = "precision_k")
plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["recall_k"], label = "recall_k")
plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["f1_k"], label = "f1_k")
plt.xlabel("k")
plt.title(f"K threshold tests")
plt.legend()
plt.show()

	k	precision_k	recall_k	f1_k
0	1	0.895522	0.519004	0.657153
1	2	0.574627	0.628471	0.600344
2	3	0.452736	0.692863	0.547633
3	4	0.373134	0.718776	0.491249
4	5	0.319403	0.742097	0.446591
5	6	0.281095	0.761745	0.410653
6	7	0.247335	0.765224	0.373838
7	8	0.222015	0.771698	0.344825
8	9	0.203980	0.782543	0.323608
9	10	0.186567	0.786033	0.301559
10	11	0.175034	0.796974	0.287030
11	12	0.162935	0.804934	0.271012
12	13	0.154994	0.815016	0.260457
13	14	0.148188	0.823261	0.251165
14	15	0.138308	0.823261	0.236829
15	16	0.131530	0.833709	0.227213
16	17	0.123793	0.833709	0.215576
17	18	0.118574	0.844903	0.207962
18	19	0.113119	0.846146	0.199559
19	20	0.108955	0.851800	0.193198

../../../../_images/90a5a05f22629f66a3c4f2837f66d68968c21ecc8b3a07c0fd825ba955eb2006.png

Let’s plot the difference in performance metrics between the ranked version and the non reranked version.

plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["precision_k"] - k_thresh_tests["precision_k"], label = "precision_k - diff")
plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["recall_k"] - k_thresh_tests["recall_k"], label = "recall_k - diff")
plt.plot(k_thresh_tests_reranked["k"], k_thresh_tests_reranked["f1_k"] - k_thresh_tests["f1_k"], label = "f1_k - diff")
plt.xlabel("k")
plt.title(f"K threshold tests (reranking improvement)")
plt.legend()
plt.show()

../../../../_images/3d93318f2e3e9ad85ecc224fc0c814e4c85686e0ed9b5e0c244a8f80f77134ea.png

Considering the lage majority of documents are short and many of them have 1 or two chunks at most, reranking does not appear to offer significant benefits for this particular document collection.
However, in scenarios involving multiple document sources and a larger set of relevant candidates, reranking becomes a highly effective technique for identifying and retaining only the most relevant chunks.

The final retriever

We are ready to test the complete solution.

class SemanticRetriever():
    def __init__(self, top_k:int, collection_name:str, chroma_path:str, embedder_name:str, reranker_name:str):
        self.reranker_name = reranker_name
        
        lc_embedder = HuggingFaceEmbeddings(model_name = embedder_name)
        vector_store = Chroma(collection_name = collection_name, embedding_function = lc_embedder, persist_directory = chroma_path)
        self.retriever = vector_store.as_retriever(search_kwargs = {"k":20})
        
        if self.reranker_name:
            reranker = HuggingFaceCrossEncoder(model_name=reranker_name)
            compressor = CrossEncoderReranker(model=reranker, top_n=top_k)
            self.compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=self.retriever)

    def generate_answ(self, query:str, llm:ChatOpenAI):
        if self.reranker_name:
            retrieved_chunks:List[Document] = self.compression_retriever.invoke(query)
        else:
            retrieved_chunks:List[Document] = self.retriever.invoke(query)

        query = "[USER_QUERY]:\n\n" + query + "\n\n[RETRIEVED_RESOURCES]:\n\n" 
        for chunk in retrieved_chunks:
            page_content = chunk.page_content
            doc_name = chunk.metadata["doc_name"]
            query += "[DOCUMENT_TITLE]: " + doc_name + "\n[DOCUMENT_CONTENT]:" + page_content + "\n\n"
        return llm.stream([("system", "You are an helpful assistant, answer to the user questions in a precise and concise manner."),("human", query)])

semantic_retriever = SemanticRetriever(top_k = 3, collection_name = "hpc_wiki", chroma_path = chroma_path, embedder_name = EMBEDDER, reranker_name = RERANKER)

Let’s test the system

for question in test_questions:
    print(f"[QUESTION]: {question}")
    answ = semantic_retriever.generate_answ(question, llm)

    print("[ANSWER]: ", end="")
    for chunk in answ:
        print(chunk.content, end="")
    print("\n")

[QUESTION]: What GPUs are available on Leonardo?
[ANSWER]: The GPUs available on the Leonardo supercomputer are NVIDIA Ampere A100-64 accelerators, which are part of the **Booster** partition.

[QUESTION]: Is there any partition without gpus?
[ANSWER]: Yes, there is a partition without GPUs. The "lrd_all_serial" partition on the Leonardo system does not have GPUs. It is designed for serial jobs with a maximum of 4 cores per job and a walltime limit of 4 hours.

[QUESTION]: What GPUs are available on the Cloud?
[ANSWER]: The available GPUs on the Cloud are:

- Nvidia A30
- Nvidia L40s
- Nvidia H100 NVL

[QUESTION]: Can I associate a domain name to a vm?
[ANSWER]: Yes, you can associate a domain name to a virtual machine (VM). To do this, you need to request a DNS name association from CINECA by sending an email to superc@cineca.it. Here are the key points to follow:

1. **Reverse DNS (PTR Record)**: The reverse of the Floating IP must be set to the hostname of the VM. The naming convention is:
   - For external users: `<VM-name>.ext.cineca.it`
   - For CINECA staff: `<VM-name>.cineca.it`

2. **A Record**: The A record in the DNS will be set according to the hostname specified.

3. **CNAME Record**: If you need the service to be exposed with a different name, you can request a CNAME record with the chosen name. If no other information is provided, only the A record will be set.

Additionally, you can set up a CNAME with your preferred DNS provider. However, note that if the A record is set on an external DNS, the PTR record cannot be set in CINECA DNS.

[QUESTION]: What are the naming conventions I should follow when asking for a domain name for a vm machine?
[ANSWER]: When asking for a domain name for a VM machine, follow these naming conventions:

1. **For external users**: Use the format `<VM-name>.ext.cineca.it`.
2. **For CINECA staff**: Use the format `<VM-name>.cineca.it`.

Replace `<VM-name>` with a descriptive name for your virtual machine.

Additionally, ensure that:
- The reverse of the Floating IP (PTR record) matches the hostname of the VM.
- The record A in the DNS is set accordingly.
- If you need a different name for the service, request a CNAME with the desired name.

[QUESTION]: What is Cineca AI and how do I enable it?
[ANSWER]: **Cineca AI** is a collection of artificial intelligence packages optimized for Cineca's clusters. It includes packages like TensorFlow, PyTorch, and XGBoost, among others.

To enable Cineca AI, follow these steps:

1. **Load the deeplrn profile:**
   ```bash
   module load profile/deeplrn
   ```

2. **Check available versions of Cineca AI:**
   ```bash
   module av cineca-ai
   ```

3. **Load the desired version of Cineca AI:**
   ```bash
   module load cineca-ai/<version>
   ```

4. **Verify the installed packages:**
   ```bash
   python -m pip list
   ```

5. **Use a specific package:**
   ```bash
   python -c "import <package>"
   ```

If you need to install additional packages, you can create a personal virtual environment:

1. **Create a virtual environment:**
   ```bash
   module load profile/deeplrn
   module load cineca-ai/<version>
   python -m venv <myvenv> --system-site-packages
   ```

2. **Activate the virtual environment:**
   ```bash
   source <myvenv>/bin/activate
   ```

3. **Install additional packages:**
   ```bash
   pip install <package>
   ```

4. **Deactivate the virtual environment when done:**
   ```bash
   deactivate
   ```

[QUESTION]: What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?
[ANSWER]: The QOS queues available on the Leonardo supercomputer BOOSTER partition are not explicitly listed in the provided documents. However, the documents mention the partitions and some general information about job configuration. For specific QOS queues, you would typically need to refer to the official documentation or support resources provided by CINECA, the organization hosting the Leonardo supercomputer.

Improving the system

Some of the answers above are good, but the one on Leonardo’s QOS is imprecise, let’s check the chunks.

hpc_leonardo_docs = hpc_store.get(where={"doc_name": "leonardo.rst.txt"}, include = ["documents"])
hpc_leonardo_docs

{'ids': ['leonardo.rst.txt__0',
  'leonardo.rst.txt__2145',
  'leonardo.rst.txt__5018',
  'leonardo.rst.txt__7317',
  'leonardo.rst.txt__8013',
  'leonardo.rst.txt__12453',
  'leonardo.rst.txt__12671',
  'leonardo.rst.txt__12699',
  'leonardo.rst.txt__17224',
  'leonardo.rst.txt__18311',
  'leonardo.rst.txt__20840',
  'leonardo.rst.txt__22356',
  'leonardo.rst.txt__24686'],
 'embeddings': None,
 'documents': ['.. _leonardo_card:\n\nLeonardo\n========\n\nLeonardo is the *pre-exascale* Tier-0 supercomputer of the EuroHPC Joint Undertaking (JU), hosted by **CINECA** and currently located at the Bologna DAMA-Technopole in Italy.\nThis guide provides specific information about the **Leonardo** cluster, including details that differ from the general behavior described in the broader HPC Clusters section.\n\n.. |ico2| image:: img/leonardo_logo.png\n   :height: 55px\n   :class: no-scaled-link\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: **login.leonardo.cineca.it**. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to **Leonardo** using one the specific login hostname points:\n\n * login01-ext.leonardo.cineca.it\n * login02-ext.leonardo.cineca.it\n * login05-ext.leonardo.cineca.it\n * login07-ext.leonardo.cineca.it\n\n.. warning::\n    \n    **The mandatory access to Leonardo si the two-factor authetication (2FA)**. Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe cluster, supplied by EVIDEN ATOS, is based on two new specifically-designed compute blades, which are available throught two distinc Slurm partitios on the Cluster:\n\n* X2135 **GPU** blade based on NVIDIA Ampere A100-64 accelerators - **Booster** partition.\n* X2140 **CPU**-only blade based on Intel Sapphire Rapids processors - **Data Centric General Purpose (DCGP)** partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\nThe **Booster** partition entered pre-production in May 2023 and moved to **full production in July 2023**.\nThe **DCGP** partition followed, starting pre-production in January 2024 and reaching **full production in February 2024**.\n\nHardware Details\n^^^^^^^^^^^^^^^^\n\n.. tab-set::\n\n    .. tab-item:: Booster',
  '.. list-table:: \n            :widths: 30 50\n            :header-rows: 1\n\n            * - **Type**\n              - **Specific**\n            * - Models\n              - Atos BullSequana X2135, Da Vinci single-node GPU\n            * - Racks\n              - 116\n            * - Nodes\n              - 3456\n            * - Processors/node\n              - 1x `Intel Ice Lake Intel Xeon Platinum 8358 <https://www.intel.com/content/www/us/en/products/sku/212282/intel-xeon-platinum-8358-processor-48m-cache-2-60-ghz/specifications.html>`_\n            * - CPU/node\n              - 32\n            * - Accelerators/node\n              - 4x `NVIDIA Ampere100 custom <https://doi.org/10.17815/jlsrf-8-186>`_, 64GiB HBM2e NVLink 3.0 (200 GB/s)\n            * - Local Storage/node (tmfs)\n              - (none)\n            * - RAM/node \n              - 512 GiB DDR4 3200 MHz\n            * - Rmax\n              - 241.2 PFlop/s (`top500 <https://www.top500.org/system/180128/>`_)\n            * - Internal Network\n              - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology \n            * - Storage (raw capacity)\n              - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n              \n                5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n    .. tab-item:: DCGP\n\n        .. list-table::\n            :widths: 30 50\n            :header-rows: 1\n            \n            * - **Type**\n              - **Specific**\n            * - Models\n              - Atos BullSequana X2140 three-node CPU blade\n            * - Racks\n              - 22\n            * - Nodes\n              - 1536\n            * - Processors/node\n              - 2x `Intel Sapphire Rapids Intel Xeon Platinum 8480+ <https://www.intel.com/content/www/us/en/products/sku/231746/intel-xeon-platinum-8480-processor-105m-cache-2-00-ghz/specifications.html>`_\n            * - CPU/node\n              - 112 cores/node\n            * - Accelerators\n              - (none)\n            * - Local Storage/node (tmfs)\n              - 3 TiB\n            * - RAM/node\n              - 512(8x64) GiB DDR5 4800 MHz\n            * - Rmax\n              - 7.84 PFlop/s (`top500 <https://www.top500.org/system/180204/>`_)\n            * - Internal Network\n              - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology\n            * - Storage (raw capacity)\n              - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n              \n                5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n\nFile Systems and Data Managment\n-------------------------------\n\nThe storage organization conforms to **CINECA** infrastructure. General information are reported in :ref:`hpc/hpc_data_storage:File Systems and Data Management` section. In the following, only differences with respect to general behavior are listed and explained.',
  ".. dropdown:: **$TMPDIR**\n\n * on the local SSD disks on login nodes (14 TB of capacity), mounted as ``/scratch_local`` (``TMPDIR=/scratch_local``). This is a shared area with no quota, remove all the files once they are not requested anymore. A cleaning procedure will be enforced in case of improper use of the area.   \n \n * on the local SSD disks on the serial node (``lrd_all_serial``, 14TB of capacity), managed via the Slurm ``job_container/tmpfs plugin``. This plugin provides a *job-specific*, private temporary file system space, with private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h``), removed at the end of the serial job. You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX`` (for instance: ``--gres=tmpfs:200G``), with a maximum of 1 TB for the serial jobs. If not explicitly requested, the ``/tmp`` has the default dimension of 10 GB.\n \n * on the local SSD disks on DCGP nodes (3 TB  of capacity). As for the serial node, the local ``/tmp`` and ``/dev/shm`` areas are managed via plugin, which at the start of the jobs mounts private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h /tmp``), and unmounts them at the end of the job (all data will be lost). You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX``, with a maximum of all the available 3 TB for DCGP nodes. As for the serial node, if not explicitly requested, the ``/tmp`` has the default dimension of 10 GB. Please note: for the DCGP jobs the requested amount of ``gres/tmpfs`` resource contributes to the consumed budget, changing the number of accounted equivalent core hours, see the dedicated section on the Accounting.\n \n * on RAM on the diskless booster nodes (with a fixed size of 10 GB, no increase is allowed, and the ``gres/tmpfs`` resource is disabled).\n\nJob Managing and Slurm Partitions \n---------------------------------\n\nIn the following table you can find informations about the Slurm partitions for **Booster** and **DCGP** partitions.  \n\n.. seealso:: \n  Further information about job submission are reported in the general section :ref:`hpc/hpc_scheduler:Scheduler and Job Submission`. \n\n.. tab-set::",
  '.. tab-item:: Booster\n\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        | **Partition**  | **QOS**            | **#Cores/#GPU per job** | **Walltime** | **Max Nodes/cores/GPUs/user**   | **Priority** | **Notes**                           |\n        +================+====================+=========================+==============+=================================+==============+=====================================+\n        | lrd_all_serial | normal             | 4 cores                 | 04:00:00     | 1 node / 4 cores                | 40           | No GPUs',
  ', Hyperthreading x 2         |\n        |                |                    |                         |              |                                 |              |                                     |\n        | (**default**)  |                    | (8 logical cores)       |              | (30800 MB RAM)                  |              | **Budget Free**                     |\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        | boost_usr_prod | normal             | 64 nodes                | 24:00:00     |                                 | 40           |                                     |\n        +                +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        |                | boost_qos_dbg      | 2 nodes                 | 00:30:00     | 2 nodes / 64 cores / 8 GPUs     | 80           |                                     |\n        +                +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        |                | boost_qos_bprod    | min = 65 nodes          | 24:00:00     | 256 nodes                       | 60           |                                     |\n        |                |                    |                         |              |                                 |              |                                     |\n        |                |                    | max = 256 nodes         |              |                                 |              |                                     |\n        +                +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        |                | boost_qos_lprod    | 3 nodes                 | 4-00:00:00   | 3 nodes / 12 GPUs               | 40           |                                     |\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        | boost_fua_dbg  | normal             | 2 nodes                 | 00:10:00     | 2 nodes / 64 cores / 8 GPUs     | 40           | Runs on 2 nodes                     |\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        | boost_fua_prod | normal             | 16 nodes                | 24:00:00     | 4 running jobs per user account | 40           |                                     |\n        |                |                    |                         |              |                                 |              |                                     |\n        |                |                    |                         |              | 32 nodes / 3584 cores           |              |                                     |\n        +                +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        |                | boost_qos_fuabprod | min = 17 nodes          | 24:00:00     | 32 nodes / 3584 cores           | 60           | Runs on 49 nodes                    |\n        |                |                    |                         |              |                                 |              |                                     |\n        |                |                    | max = 32 nodes          |              |                                 |              | Min is 17 FULL nodes                |\n        +                +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n        |                | qos_fualowprio     | 16 nodes                | 08:00:00     |                                 | 0            |                                     |\n        +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+',
  '.. note::\n\n          The partitions: **boost_fua_dbg, boost_fua_prod** can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.',
  '.. tab-item:: DCGP',
  '+----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | **Partition**  | **QOS**            | **#Cores/#GPU per job** | **Walltime** | **Max Nodes/cores/GPUs/user**        | **Priority** | **Notes**                           |\n        +================+====================+=========================+==============+======================================+==============+=====================================+\n        | lrd_all_serial | normal             | max = 4 cores           | 04:00:00     | 1 node / 4 cores                     | 40           | Hyperthreading x 2                  |\n        |                |                    |                         |              |                                      |              |                                     |\n        | (**default**)  |                    | (8 logical cores)       |              | (30800 MB RAM)                       |              | **Budget Free**                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_usr_prod  | normal             | 16 nodes                | 24:00:00     | 512 nodes per prj. account           | 40           |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_dbg       | 2 nodes                 | 00:30:00     | 2 nodes / 224 cores per user account | 80           |                                     |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    |                         |              | 512 nodes per prj. account           |              |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_bprod     | min = 17 nodes          | 24:00:00     | 128 nodes per user account           | 60           | GrpTRES = 1536 nodes                |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    | max = 128 nodes         |              | 512 nodes per prj. account           |              | Min is 17 FULL nodes                |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | dcgp_qos_lprod     | 3 nodes                 | 4-00:00:00   | 3 nodes / 336 cores per user account | 40           |                                     |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    |                         |              | 512 nodes per prj. account           |              |                                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_fua_dbg   | normal             | 2 nodes                 | 00:10:00     | 2 nodes / 224 cores                  | 40           | Runs on 2 nodes                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        | dcgp_fua_prod  | normal             | 16 nodes                | 24:00:00     |                                      | 40           |                                     |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+',
  '|                | dcgp_qos_fuabprod  | min = 17 nodes          | 24:00:00     | 64 nodes / 7168 cores                | 60           | Runs on 130 nodes                   |\n        |                |                    |                         |              |                                      |              |                                     |\n        |                |                    | max = 64 nodes          |              |                                      |              | Min is 17 FULL nodes                |\n        +                +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n        |                | qos_fualowprio     | 16 nodes                | 08:00:00     |                                      | 0            |                                     |\n        +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+',
  '.. note::\n\n          The partitions: **dcgp_fua_dbg, dcgp_fua_prod** can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.\n\nNetwork Architecture\n--------------------\n\n.. raw:: html\n\n  <p><strong>Leonardo</strong> features a state-of-the-art interconnect system tailored for high-performance computing (HPC). It delivers <em>low latency</em> and <em>high bandwidth</em> by leveraging <strong>NVIDIA Mellanox InfiniBand HDR</strong> (High Data Rate) technology, powered by <a href="https://nvdam.widen.net/s/zmbw7rdjml/infiniband-qm8700-datasheet-us-nvidia-1746790-r12-web">NVIDIA QUANTUM QM8700 Smart Switches</a>, and a <strong><a href="https://ieeexplore.ieee.org/document/7885210">Dragonfly+ topology</a></strong>. Below is an overview of its architecture and key features:</p>\n\n  <ul>\n    <li><strong>Hierarchical Cell Structure:</strong> The system is structured into multiple <em>cells</em>, each comprising a group of interconnected compute nodes.</li>\n\n    <li><strong>Inter-cell Connectivity:</strong> As illustrated in the figure below, cells are connected via an all-to-all topology. Each pair of distinct cells is linked by 18 independent connections, each passing through a dedicated Layer 2 (L2) switch. This design ensures high availability and reduces congestion.</li>\n\n    <li><strong>Intra-cell Topology:</strong> Inside each cell, a non-blocking two-layer fat-tree topology is used, allowing scalable and efficient intra-cell communication.</li>\n\n    <li><strong>System Composition:</strong>\n      <ul>\n        <li>19 cells dedicated to the <em>Booster</em> partition.</li>\n        <li>2 cells for the <em>DCGP</em> (Data-Centric General Purpose) partition.</li>\n        <li>1 hybrid cell with both accelerated (36 Booster nodes) and conventional (288 DCGP nodes) compute resources.</li>\n        <li>1 cell allocated for management, storage, and login services.</li>\n      </ul>\n    </li>\n\n    <li><strong>Adaptive Routing:</strong> The network employs adaptive routing, dynamically optimizing data paths to alleviate congestion and maintain performance under load.</li>\n  </ul>\n  \n.. figure:: img/leo-net-all2all.png\n   :height: 350px\n   :align: center\n   :class: no-scaled-link\n\n.. image:: img/spacer.png\n   :align: center\n   :class: no-scaled-link\n   \n.. dropdown:: Cell Configuration and Intra-cell Connectivity\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n   .. tab-set::\n\n      .. tab-item:: Booster',
  '.. raw:: html\n\n          <p>Each Booster cell is composed of:</p>\n          <ul>\n            <li><strong>6 Ã\x97 Atos BullSequana XH2000 racks</strong>, each containing:\n              <ul>\n                <li>3 Ã\x97 Level 2 (L2) switches</li>\n                <li>3 Ã\x97 Level 1 (L1) switches</li>\n                <li>30 compute nodes â\x80\x94 each equipped with 4 GPUs, each connected via a dedicated 100 Gbps port</li>\n              </ul>\n            </li>\n          </ul>\n\n          <p><strong>Total per Booster cell:</strong> 18 L2 switches, 18 L1 switches, and 180 compute nodes.</p>\n\n          <h4>Connectivity Overview</h4>\n\n          <p><strong>Level 2 (L2) Switches:</strong></p>\n          <ul>\n            <li><strong>UP:</strong> 22 Ã\x97 200 Gbps ports connecting to L2 switches in other cells</li>\n            <li><strong>DOWN:</strong> 18 Ã\x97 200 Gbps ports connecting to L1 switches within the cell</li>\n            <li><strong>Oversubscription:</strong> 0.8:1</li>\n          </ul>\n\n          <p><strong>Level 1 (L1) Switches:</strong></p>\n          <ul>\n            <li><strong>UP:</strong> 18 Ã\x97 200 Gbps ports connected to all L2 switches in the cell</li>\n            <li><strong>DOWN:</strong> 40 Ã\x97 100 Gbps ports connected to GPUs across 10 compute nodes</li>\n            <li><strong>Oversubscription:</strong> 1.11:1</li>\n          </ul>\n\n        .. figure:: img/leo-net-booster_cell.png\n          :height: 750px\n          :align: center\n      \n      .. tab-item:: DCGP\n        \n        .. raw:: html',
  '.. raw:: html\n\n          <p>Each DCGP cell is composed of:</p>\n          <ul>\n            <li><strong>8 Ã\x97 Atos BullSequana XH2000 racks</strong>, each containing:\n              <ul>\n                <li>3 or 0 Level 2 (L2) switches</li>\n                <li>2 Ã\x97 Level 1 (L1) switches</li>\n                <li>78 compute nodes â\x80\x94 each connected via a dedicated 100 Gbps port</li>\n              </ul>\n            </li>\n          </ul>\n\n          <p><strong>Total per DCGP cell:</strong> 18 L2 switches, 16 L1 switches, and 624 compute nodes.</p>\n\n          <h4>Connectivity Overview</h4>\n\n          <p><strong>Level 2 (L2) Switches:</strong></p>\n          <ul>\n            <li><strong>UP:</strong> 22 Ã\x97 200 Gbps ports connecting to L2 switches in other cells</li>\n            <li><strong>DOWN:</strong> 18 Ã\x97 200 Gbps ports connecting to L1 switches within the same cell</li>\n            <li><strong>Oversubscription ratio:</strong> 0.8:1</li>\n          </ul>\n\n          <p><strong>Level 1 (L1) Switches:</strong> (divided into two groups):</p>\n          <ul>\n            <li><strong>9 switches with 40 downlinks:</strong>\n              <ul>\n                <li>UP: 18 Ã\x97 200 Gbps ports connected to all L2 switches in the cell</li>\n                <li>DOWN: 40 Ã\x97 100 Gbps ports connect</strong>ed to compute nodes</li>\n                <li>Oversubscription ratio: 1.11:1</li>\n              </ul>\n            </li>\n            <li><strong>9 switches with 38 downlinks:</strong>\n              <ul>\n                <li>UP: 18 Ã\x97 200 Gbps ports connected to all L2 switches in the cell</li>\n                <li>DOWN: 38 Ã\x97 100 Gbps ports connected to compute nodes</li>\n                <li>Oversubscription ratio: 1.05:1</li>\n              </ul>\n            </li>\n          </ul>\n\n        .. figure:: img/leo-net-dcgp_cell.png\n          :height: 750px\n          :align: center\n\nAdvanced Information\n^^^^^^^^^^^^^^^^^^^^\n\n.. dropdown:: Network Topology - Map\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n    The topology is presented in a table format, where each row corresponds to a compute node. For each node, the table specifies the associated L1 switch and cell, providing a clear overview of the physical and logical network layout within the cluster.\n\n    :download:`Network Topology - Map <../files/ntopology.dat>`',
  '.. dropdown:: Network Topology - Distance Matrix\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n    The attached compressed CSV file contains the distance matrix of all compute nodes in the cluster. The matrix uses the following metric to represent the network distance between any two nodes:\n\n    * **0** â\x80\x93 Same nodes\n    * **1** â\x80\x93 Same L1 switch, same cell.\n    * **2** â\x80\x93 Different L1 switch, same cell.\n    * **3** â\x80\x93 Different L1 switch and different cell.\n    \n    This matrix can be used to analyze communication locality and optimize node selection for distributed workloads.\n\n    :download:`Distance Matrix <../files/ntopology-dst_mtx.tar.bz2>`\n\n.. dropdown:: Switch Naming Format\n   :animate: fade-in-slide-down\n   :chevron: down-up\n\n    .. code-block::\n      \n      isw<RRrrSS>\n\n    where ``<RRrrSS>`` is a 5- or 6-digits number varies based on the location and type of the switch.\n\n    Specifically:\n\n    * ``RR`` = region number (1 or 2 digits)\n    * ``rr`` = rack number (2 digits)\n    * ``SS`` = switch id (2 digits)\n\n    .. note::\n      If ``SS`` is an even number, it refers to an L1 switch; if it is an odd number, it refers to an L2 switch.\n\nDocuments\n---------\n\n* Article on Leonardo architecture and the technologies adopted for its GPU-accelerated partition: CINECA Supercomputing Centre, SuperComputing Applications and Innovation Department. (2024). â\x80\x9cLEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI applications.â\x80\x9d, Journal of large-scale research facilities, 8, A186. https://doi.org/10.17815/jlsrf-8-186\n* Details about new technologies included in the Witley platform with Intel Xeon Icelake contained in the Leonardo pre-exascale system (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/third-generation-xeon-scalable-family-overview.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPCIrpPm3$>`_)\n* Additional documents (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/xeon-performance-tuning-and-solution-guides.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPKZ5awkS$>`_)\n\nSome tuning guides for dedicated enviroments (ML/DL or HPC Clusters):\n\n* :download:`Tuning Guide <../files/Tuning_guide.pdf>`\n\n* :download:`Deep Learning <../files/Deep_learning.pdf>`'],
 'uris': None,
 'included': ['documents'],
 'data': None,
 'metadatas': None}

The problem is that for some specific questions we don’t have text associated to a chunk, but we only have “decontestualized” tables, which makes retrieving relevant chunks particularly difficult. Look at this section of our website:

Leonardo QOS

In fact if we tune our question a bit…

answ = semantic_retriever.generate_answ("What are the partitions available in Leonardo BOOST and their associated QOS?", llm)

for chunk in answ:
    print(chunk.content, end="")

### Partitions and Associated QOS in Leonardo BOOST

1. **Booster Partition (GPU)**
   - **Standard QOS**: `boost_fua_prod` (for EUROfusion users) or `boost_usr_prod` (for all users)
   - **Low-priority QOS**: `qos_fualowprio` (for EUROfusion users when budget is exhausted or when using `FUAL8_LOWPRIO` account)

2. **Data Centric General Purpose (DCGP) Partition (CPU)**
   - **Standard QOS**: `dcgp_fua_prod` (for EUROfusion users) or `dcgp_usr_prod` (for all users)
   - **Low-priority QOS**: `qos_fualowprio` (for EUROfusion users when budget is exhausted or when using `FUA38_LOWPRIO_0` account)

### Low-Priority Jobs
- **Budget Exhaustion**: Use `qos_fualowprio` QOS in `boost_fua_prod` and `dcgp_fua_prod` partitions.
- **Without Budget Exhaustion**: Request `FUAL8_LOWPRIO` (Booster) or `FUA38_LOWPRIO_0` (DCGP) accounts and use `qos_fualowprio` QOS.

### Example Submission Script for Low-Priority Jobs
```bash
#SBATCH --account=<YOUR Project Account or LOWPRIO Project Account>
#SBATCH --qos=qos_fualowprio
```

### Additional Information
- **Booster Partition**: Based on NVIDIA Ampere A100-64 accelerators.
- **DCGP Partition**: Based on Intel Sapphire Rapids processors.
- **Network**: NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity.

A bit better, but the best relevant chunks still miss relevant context and they are not extracted… In fact, as we were saying before, they have NO RELEVANT context at all sorrounding them.

A combined approach

The simplest solution would be to combine semantic search with an exact or approximate matching approach. In this case, we aim to retrieve chunks that explicitly mention content related to QOS.

Let’s try a classic bag-of-words method. Here, we experiment with using BM25 to retrieve useful content. BM25 assumes that the more frequently a term appears in a document, the more relevant the document is to the query. The k in the formula ensures that this effect saturates, so that additional occurrences have diminishing impact. The score is normalized by document length to prevent longer documents from being favored. Additionally, rare terms are given more importance through the use of the IDF component.

$ BM25(Q, D) = \sum_{i=1}^{n} IDF(q_i) * \frac{f(q_i, D) * (k_1 + 1)}{f(q_i, D) + k_1 * (1 - b + b * \frac{| D |}{avgdl})}$

Where:

$Q$ is our query with keywords $q_1, ..., q_n$;
$D$ is a document in our document base;
$f(q_i, D)$ is the frequency of $q_i$ in document $D$;
$|D|$ is the document length;
$ avgdl $ is the average document length in the document collection;
$k_1$ and $b$ are hyperparams with values usually set as $ b = 2 $ and $ k_1 \in [1.2, 2]$. $k$ controls the therm frequency component, $b$ the document length component;
$IDF(q_i)$ is the inverse document frequency;

$IDF(q_i) = ln(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1)$

With:

$N$ total number of documents;
$n(q_i)$ total number of documents containing the term $q_i$

embedder_tokenizer = AutoTokenizer.from_pretrained(EMBEDDER)
tokenized_chunks = [embedder_tokenizer.tokenize(page_chunk, add_special_tokens = False) for page_chunk in hpc_leonardo_docs["documents"]]

bm_25_retriever = BM25Okapi(tokenized_chunks)

pd.options.display.max_colwidth = 5000

query = "What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?"
tokenized_query = embedder_tokenizer.tokenize(query, add_special_tokens = False)

scores = bm_25_retriever.get_scores(tokenized_query)
pd.DataFrame({"scores": scores, "documents": hpc_leonardo_docs["documents"]}).sort_values(by = "scores", ascending = False)

	scores	documents
0	13.961787	.. _leonardo_card:\n\nLeonardo\n========\n\nLeonardo is the pre-exascale Tier-0 supercomputer of the EuroHPC Joint Undertaking (JU), hosted by CINECA and currently located at the Bologna DAMA-Technopole in Italy.\nThis guide provides specific information about the Leonardo cluster, including details that differ from the general behavior described in the broader HPC Clusters section.\n\n.. \|ico2\| image:: img/leonardo_logo.png\n :height: 55px\n :class: no-scaled-link\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: login.leonardo.cineca.it. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to Leonardo using one the specific login hostname points:\n\n * login01-ext.leonardo.cineca.it\n * login02-ext.leonardo.cineca.it\n * login05-ext.leonardo.cineca.it\n * login07-ext.leonardo.cineca.it\n\n.. warning::\n \n The mandatory access to Leonardo si the two-factor authetication (2FA). Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe cluster, supplied by EVIDEN ATOS, is based on two new specifically-designed compute blades, which are available throught two distinc Slurm partitios on the Cluster:\n\n* X2135 GPU blade based on NVIDIA Ampere A100-64 accelerators - Booster partition.\n* X2140 CPU-only blade based on Intel Sapphire Rapids processors - Data Centric General Purpose (DCGP) partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\nThe Booster partition entered pre-production in May 2023 and moved to full production in July 2023.\nThe DCGP partition followed, starting pre-production in January 2024 and reaching full production in February 2024.\n\nHardware Details\n^^^^^^^^^^^^^^^^\n\n.. tab-set::\n\n .. tab-item:: Booster
12	10.287257	.. dropdown:: Network Topology - Distance Matrix\n :animate: fade-in-slide-down\n :chevron: down-up\n\n The attached compressed CSV file contains the distance matrix of all compute nodes in the cluster. The matrix uses the following metric to represent the network distance between any two nodes:\n\n * 0 â Same nodes\n * 1 â Same L1 switch, same cell.\n * 2 â Different L1 switch, same cell.\n * 3 â Different L1 switch and different cell.\n \n This matrix can be used to analyze communication locality and optimize node selection for distributed workloads.\n\n :download:`Distance Matrix <../files/ntopology-dst_mtx.tar.bz2>`\n\n.. dropdown:: Switch Naming Format\n :animate: fade-in-slide-down\n :chevron: down-up\n\n .. code-block::\n \n isw<RRrrSS>\n\n where ``<RRrrSS>`` is a 5- or 6-digits number varies based on the location and type of the switch.\n\n Specifically:\n\n * ``RR`` = region number (1 or 2 digits)\n * ``rr`` = rack number (2 digits)\n * ``SS`` = switch id (2 digits)\n\n .. note::\n If ``SS`` is an even number, it refers to an L1 switch; if it is an odd number, it refers to an L2 switch.\n\nDocuments\n---------\n\n* Article on Leonardo architecture and the technologies adopted for its GPU-accelerated partition: CINECA Supercomputing Centre, SuperComputing Applications and Innovation Department. (2024). âLEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI applications.â, Journal of large-scale research facilities, 8, A186. https://doi.org/10.17815/jlsrf-8-186\n* Details about new technologies included in the Witley platform with Intel Xeon Icelake contained in the Leonardo pre-exascale system (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/third-generation-xeon-scalable-family-overview.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPCIrpPm3$>`_)\n* Additional documents (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/xeon-performance-tuning-and-solution-guides.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPKZ5awkS$>`_)\n\nSome tuning guides for dedicated enviroments (ML/DL or HPC Clusters):\n\n* :download:`Tuning Guide <../files/Tuning_guide.pdf>`\n\n* :download:`Deep Learning <../files/Deep_learning.pdf>`
2	7.239758	.. dropdown:: $TMPDIR\n\n * on the local SSD disks on login nodes (14 TB of capacity), mounted as ``/scratch_local`` (``TMPDIR=/scratch_local``). This is a shared area with no quota, remove all the files once they are not requested anymore. A cleaning procedure will be enforced in case of improper use of the area. \n \n * on the local SSD disks on the serial node (``lrd_all_serial``, 14TB of capacity), managed via the Slurm ``job_container/tmpfs plugin``. This plugin provides a job-specific, private temporary file system space, with private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h``), removed at the end of the serial job. You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX`` (for instance: ``--gres=tmpfs:200G``), with a maximum of 1 TB for the serial jobs. If not explicitly requested, the ``/tmp`` has the default dimension of 10 GB.\n \n * on the local SSD disks on DCGP nodes (3 TB of capacity). As for the serial node, the local ``/tmp`` and ``/dev/shm`` areas are managed via plugin, which at the start of the jobs mounts private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h /tmp``), and unmounts them at the end of the job (all data will be lost). You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX``, with a maximum of all the available 3 TB for DCGP nodes. As for the serial node, if not explicitly requested, the ``/tmp`` has the default dimension of 10 GB. Please note: for the DCGP jobs the requested amount of ``gres/tmpfs`` resource contributes to the consumed budget, changing the number of accounted equivalent core hours, see the dedicated section on the Accounting.\n \n * on RAM on the diskless booster nodes (with a fixed size of 10 GB, no increase is allowed, and the ``gres/tmpfs`` resource is disabled).\n\nJob Managing and Slurm Partitions \n---------------------------------\n\nIn the following table you can find informations about the Slurm partitions for Booster and DCGP partitions. \n\n.. seealso:: \n Further information about job submission are reported in the general section :ref:`hpc/hpc_scheduler:Scheduler and Job Submission`. \n\n.. tab-set::
9	4.020692	.. note::\n\n The partitions: dcgp_fua_dbg, dcgp_fua_prod can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.\n\nNetwork Architecture\n--------------------\n\n.. raw:: html\n\n <p><strong>Leonardo</strong> features a state-of-the-art interconnect system tailored for high-performance computing (HPC). It delivers <em>low latency</em> and <em>high bandwidth</em> by leveraging <strong>NVIDIA Mellanox InfiniBand HDR</strong> (High Data Rate) technology, powered by <a href="https://nvdam.widen.net/s/zmbw7rdjml/infiniband-qm8700-datasheet-us-nvidia-1746790-r12-web">NVIDIA QUANTUM QM8700 Smart Switches</a>, and a <strong><a href="https://ieeexplore.ieee.org/document/7885210">Dragonfly+ topology</a></strong>. Below is an overview of its architecture and key features:</p>\n\n <ul>\n <li><strong>Hierarchical Cell Structure:</strong> The system is structured into multiple <em>cells</em>, each comprising a group of interconnected compute nodes.</li>\n\n <li><strong>Inter-cell Connectivity:</strong> As illustrated in the figure below, cells are connected via an all-to-all topology. Each pair of distinct cells is linked by 18 independent connections, each passing through a dedicated Layer 2 (L2) switch. This design ensures high availability and reduces congestion.</li>\n\n <li><strong>Intra-cell Topology:</strong> Inside each cell, a non-blocking two-layer fat-tree topology is used, allowing scalable and efficient intra-cell communication.</li>\n\n <li><strong>System Composition:</strong>\n <ul>\n <li>19 cells dedicated to the <em>Booster</em> partition.</li>\n <li>2 cells for the <em>DCGP</em> (Data-Centric General Purpose) partition.</li>\n <li>1 hybrid cell with both accelerated (36 Booster nodes) and conventional (288 DCGP nodes) compute resources.</li>\n <li>1 cell allocated for management, storage, and login services.</li>\n </ul>\n </li>\n\n <li><strong>Adaptive Routing:</strong> The network employs adaptive routing, dynamically optimizing data paths to alleviate congestion and maintain performance under load.</li>\n </ul>\n \n.. figure:: img/leo-net-all2all.png\n :height: 350px\n :align: center\n :class: no-scaled-link\n\n.. image:: img/spacer.png\n :align: center\n :class: no-scaled-link\n \n.. dropdown:: Cell Configuration and Intra-cell Connectivity\n :animate: fade-in-slide-down\n :chevron: down-up\n\n .. tab-set::\n\n .. tab-item:: Booster
5	2.786747	.. note::\n\n The partitions: boost_fua_dbg, boost_fua_prod can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.
3	2.784774	.. tab-item:: Booster\n\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| Partition \| QOS \| #Cores/#GPU per job \| Walltime \| Max Nodes/cores/GPUs/user \| Priority \| Notes \|\n +================+====================+=========================+==============+=================================+==============+=====================================+\n \| lrd_all_serial \| normal \| 4 cores \| 04:00:00 \| 1 node / 4 cores \| 40 \| No GPUs
1	2.641071	.. list-table:: \n :widths: 30 50\n :header-rows: 1\n\n * - Type\n - Specific\n * - Models\n - Atos BullSequana X2135, Da Vinci single-node GPU\n * - Racks\n - 116\n * - Nodes\n - 3456\n * - Processors/node\n - 1x `Intel Ice Lake Intel Xeon Platinum 8358 <https://www.intel.com/content/www/us/en/products/sku/212282/intel-xeon-platinum-8358-processor-48m-cache-2-60-ghz/specifications.html>`_\n * - CPU/node\n - 32\n * - Accelerators/node\n - 4x `NVIDIA Ampere100 custom <https://doi.org/10.17815/jlsrf-8-186>`_, 64GiB HBM2e NVLink 3.0 (200 GB/s)\n * - Local Storage/node (tmfs)\n - (none)\n * - RAM/node \n - 512 GiB DDR4 3200 MHz\n * - Rmax\n - 241.2 PFlop/s (`top500 <https://www.top500.org/system/180128/>`_)\n * - Internal Network\n - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology \n * - Storage (raw capacity)\n - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n \n 5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n .. tab-item:: DCGP\n\n .. list-table::\n :widths: 30 50\n :header-rows: 1\n \n * - Type\n - Specific\n * - Models\n - Atos BullSequana X2140 three-node CPU blade\n * - Racks\n - 22\n * - Nodes\n - 1536\n * - Processors/node\n - 2x `Intel Sapphire Rapids Intel Xeon Platinum 8480+ <https://www.intel.com/content/www/us/en/products/sku/231746/intel-xeon-platinum-8480-processor-105m-cache-2-00-ghz/specifications.html>`_\n * - CPU/node\n - 112 cores/node\n * - Accelerators\n - (none)\n * - Local Storage/node (tmfs)\n - 3 TiB\n * - RAM/node\n - 512(8x64) GiB DDR5 4800 MHz\n * - Rmax\n - 7.84 PFlop/s (`top500 <https://www.top500.org/system/180204/>`_)\n * - Internal Network\n - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology\n * - Storage (raw capacity)\n - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n \n 5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n\nFile Systems and Data Managment\n-------------------------------\n\nThe storage organization conforms to CINECA infrastructure. General information are reported in :ref:`hpc/hpc_data_storage:File Systems and Data Management` section. In the following, only differences with respect to general behavior are listed and explained.
11	2.631267	.. raw:: html\n\n <p>Each DCGP cell is composed of:</p>\n <ul>\n <li><strong>8 Ã Atos BullSequana XH2000 racks</strong>, each containing:\n <ul>\n <li>3 or 0 Level 2 (L2) switches</li>\n <li>2 Ã Level 1 (L1) switches</li>\n <li>78 compute nodes â each connected via a dedicated 100 Gbps port</li>\n </ul>\n </li>\n </ul>\n\n <p><strong>Total per DCGP cell:</strong> 18 L2 switches, 16 L1 switches, and 624 compute nodes.</p>\n\n <h4>Connectivity Overview</h4>\n\n <p><strong>Level 2 (L2) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 22 Ã 200 Gbps ports connecting to L2 switches in other cells</li>\n <li><strong>DOWN:</strong> 18 Ã 200 Gbps ports connecting to L1 switches within the same cell</li>\n <li><strong>Oversubscription ratio:</strong> 0.8:1</li>\n </ul>\n\n <p><strong>Level 1 (L1) Switches:</strong> (divided into two groups):</p>\n <ul>\n <li><strong>9 switches with 40 downlinks:</strong>\n <ul>\n <li>UP: 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li>DOWN: 40 Ã 100 Gbps ports connect</strong>ed to compute nodes</li>\n <li>Oversubscription ratio: 1.11:1</li>\n </ul>\n </li>\n <li><strong>9 switches with 38 downlinks:</strong>\n <ul>\n <li>UP: 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li>DOWN: 38 Ã 100 Gbps ports connected to compute nodes</li>\n <li>Oversubscription ratio: 1.05:1</li>\n </ul>\n </li>\n </ul>\n\n .. figure:: img/leo-net-dcgp_cell.png\n :height: 750px\n :align: center\n\nAdvanced Information\n^^^^^^^^^^^^^^^^^^^^\n\n.. dropdown:: Network Topology - Map\n :animate: fade-in-slide-down\n :chevron: down-up\n\n The topology is presented in a table format, where each row corresponds to a compute node. For each node, the table specifies the associated L1 switch and cell, providing a clear overview of the physical and logical network layout within the cluster.\n\n :download:`Network Topology - Map <../files/ntopology.dat>`
10	1.973369	.. raw:: html\n\n <p>Each Booster cell is composed of:</p>\n <ul>\n <li><strong>6 Ã Atos BullSequana XH2000 racks</strong>, each containing:\n <ul>\n <li>3 Ã Level 2 (L2) switches</li>\n <li>3 Ã Level 1 (L1) switches</li>\n <li>30 compute nodes â each equipped with 4 GPUs, each connected via a dedicated 100 Gbps port</li>\n </ul>\n </li>\n </ul>\n\n <p><strong>Total per Booster cell:</strong> 18 L2 switches, 18 L1 switches, and 180 compute nodes.</p>\n\n <h4>Connectivity Overview</h4>\n\n <p><strong>Level 2 (L2) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 22 Ã 200 Gbps ports connecting to L2 switches in other cells</li>\n <li><strong>DOWN:</strong> 18 Ã 200 Gbps ports connecting to L1 switches within the cell</li>\n <li><strong>Oversubscription:</strong> 0.8:1</li>\n </ul>\n\n <p><strong>Level 1 (L1) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li><strong>DOWN:</strong> 40 Ã 100 Gbps ports connected to GPUs across 10 compute nodes</li>\n <li><strong>Oversubscription:</strong> 1.11:1</li>\n </ul>\n\n .. figure:: img/leo-net-booster_cell.png\n :height: 750px\n :align: center\n \n .. tab-item:: DCGP\n \n .. raw:: html
7	1.916602	+----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| Partition \| QOS \| #Cores/#GPU per job \| Walltime \| Max Nodes/cores/GPUs/user \| Priority \| Notes \|\n +================+====================+=========================+==============+======================================+==============+=====================================+\n \| lrd_all_serial \| normal \| max = 4 cores \| 04:00:00 \| 1 node / 4 cores \| 40 \| Hyperthreading x 2 \|\n \| \| \| \| \| \| \| \|\n \| (default) \| \| (8 logical cores) \| \| (30800 MB RAM) \| \| Budget Free \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_usr_prod \| normal \| 16 nodes \| 24:00:00 \| 512 nodes per prj. account \| 40 \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_dbg \| 2 nodes \| 00:30:00 \| 2 nodes / 224 cores per user account \| 80 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 512 nodes per prj. account \| \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_bprod \| min = 17 nodes \| 24:00:00 \| 128 nodes per user account \| 60 \| GrpTRES = 1536 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 128 nodes \| \| 512 nodes per prj. account \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_lprod \| 3 nodes \| 4-00:00:00 \| 3 nodes / 336 cores per user account \| 40 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 512 nodes per prj. account \| \| \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_fua_dbg \| normal \| 2 nodes \| 00:10:00 \| 2 nodes / 224 cores \| 40 \| Runs on 2 nodes \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_fua_prod \| normal \| 16 nodes \| 24:00:00 \| \| 40 \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+
8	0.592530	\| \| dcgp_qos_fuabprod \| min = 17 nodes \| 24:00:00 \| 64 nodes / 7168 cores \| 60 \| Runs on 130 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 64 nodes \| \| \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| qos_fualowprio \| 16 nodes \| 08:00:00 \| \| 0 \| \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+
4	0.527483	, Hyperthreading x 2 \|\n \| \| \| \| \| \| \| \|\n \| (default) \| \| (8 logical cores) \| \| (30800 MB RAM) \| \| Budget Free \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_usr_prod \| normal \| 64 nodes \| 24:00:00 \| \| 40 \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_dbg \| 2 nodes \| 00:30:00 \| 2 nodes / 64 cores / 8 GPUs \| 80 \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_bprod \| min = 65 nodes \| 24:00:00 \| 256 nodes \| 60 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 256 nodes \| \| \| \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_lprod \| 3 nodes \| 4-00:00:00 \| 3 nodes / 12 GPUs \| 40 \| \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_fua_dbg \| normal \| 2 nodes \| 00:10:00 \| 2 nodes / 64 cores / 8 GPUs \| 40 \| Runs on 2 nodes \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_fua_prod \| normal \| 16 nodes \| 24:00:00 \| 4 running jobs per user account \| 40 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 32 nodes / 3584 cores \| \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_fuabprod \| min = 17 nodes \| 24:00:00 \| 32 nodes / 3584 cores \| 60 \| Runs on 49 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 32 nodes \| \| \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| qos_fualowprio \| 16 nodes \| 08:00:00 \| \| 0 \| \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+
6	0.000000	.. tab-item:: DCGP

Still not good for our use case.

Contextual retrieval

Contextual retrieval is a technique introduced by Anthropic which consist in enriching a text chunk with context before generating its embedding.

This context is produced by a large language model (LLM), which takes two inputs: the full document and the specific chunks that need contextualization. The model then returns a description of each chunk’s role within the overall document. The LLM generated description is prepended to the chunk before embedding. This will cause the embedding to capture not only local content but also higher-level contextual meaning.

# Straight from https://www.anthropic.com/news/contextual-retrieval
def generate_context(chunk_content:Document, full_document:Document, llm) -> str:
    """
        Returns a contextualized chunk.

        Arguments:
            - chunk_content: The chunk that must be contextualized;
            - full_document: The entire langchain document to be used for contextualization;
            - llm: The llm client used to create the context;


        Returns:
            str: A string containing the context of the chunk
    """
    contextual_retrieval_prompt = f"""[DOCUMENT]:
    {full_document.page_content}
    
    [DOCUMENT_CHUNK]:
    {chunk_content.page_content}

    [TASK]:
    Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else. 
    """

    chunk_context = llm.invoke([("system", "You are an helpful assistant, answer to the user questions in a precise and concise manner."),
                                ("human", contextual_retrieval_prompt)]).content
    return chunk_context

%%time
leonardo_doc = [document for document in documents if document.metadata["doc_name"] == "leonardo.rst.txt"][0]
leonardo_chunks = [document for document in chunks if document.metadata["doc_name"] == "leonardo.rst.txt"]

# Send up to 12 queries in parallel
with ThreadPoolExecutor(max_workers=12) as e:
    chunk_contexts = [*e.map(partial(generate_context, llm = llm, full_document = leonardo_doc), leonardo_chunks)]

chunk_contexts

CPU times: user 32.1 ms, sys: 3.95 ms, total: 36.1 ms
Wall time: 3.43 s

['This chunk is part of the "Hardware Details" section under "System Architecture" in the Leonardo supercomputer documentation. It provides specific details about the Booster partition, including the types of compute blades, processors, accelerators, and other hardware specifications.',
 'This chunk provides detailed hardware specifications for the two main partitions of the Leonardo supercomputer: the Booster partition, which is GPU-based, and the DCGP partition, which is CPU-based. It includes information on the models, number of racks and nodes, processors, accelerators, local storage, RAM, peak performance (Rmax), internal network, and storage capacity for each partition.',
 'This chunk is part of the "File Systems and Data Management" section of the Leonardo supercomputer documentation, specifically detailing the usage and characteristics of the `$TMPDIR` environment variable across different node types.',
 'This chunk is part of the "Job Managing and Slurm Partitions" section of the Leonardo supercomputer documentation. It specifically details the Slurm partitions available for the Booster partition, including information on Quality of Service (QOS), core/GPU allocation per job, walltime limits, maximum resources per user, priority, and additional notes.',
 'This chunk is part of the "Job Managing and Slurm Partitions" section of the Leonardo supercomputer documentation. It details the various Slurm partitions available for the Booster partition, including information on Quality of Service (QOS), core/GPU allocation per job, walltime limits, maximum nodes/cores/GPUs per user, priority, and specific notes for each partition.',
 'This chunk is part of the "Job Managing and Slurm Partitions" section of the Leonardo supercomputer documentation, specifically detailing the partitions available for the Booster partition.',
 'This chunk is part of the "Hardware Details" section under the "System Architecture" heading in the Leonardo supercomputer documentation. It provides specific details about the Data Centric General Purpose (DCGP) partition of the Leonardo cluster.',
 'This chunk is part of the "Job Managing and Slurm Partitions" section of the Leonardo supercomputer documentation, specifically detailing the Slurm partitions for the DCGP (Data Centric General Purpose) partition.',
 'This chunk is part of the "Job Managing and Slurm Partitions" section, specifically detailing the Slurm partitions for the DCGP (Data Centric General Purpose) partition of the Leonardo supercomputer. It provides information on the quality of service (QOS) settings, including the minimum and maximum number of nodes, walltime, and other constraints for different job types within the DCGP partition.',
 'This chunk is part of the "Network Architecture" section of the Leonardo supercomputer documentation. It describes the advanced interconnect system designed for high-performance computing (HPC), including details on the hierarchical cell structure, inter-cell connectivity, intra-cell topology, system composition, and adaptive routing.',
 'This chunk is part of the "Network Architecture" section of the Leonardo supercomputer documentation. It details the specific configuration and connectivity of the Booster partition\'s network topology, including the structure of each Booster cell, the number of switches, and the connectivity overview for both Level 2 (L2) and Level 1 (L1) switches.',
 'This chunk is part of the "Network Architecture" section of the Leonardo supercomputer documentation. It specifically details the configuration and connectivity of the Data Centric General Purpose (DCGP) partition cells, including the number of switches and compute nodes, as well as the oversubscription ratios for the Level 1 and Level 2 switches.',
 'This chunk is part of the "Network Architecture" section of the Leonardo supercomputer documentation, specifically detailing advanced information about the network topology and switch naming conventions.']

Let’s embed the chunks again and check the new similarity.

contextualized_chunks = []

for i in range(len(leonardo_chunks)):
    chunk_content = f"""[CONTEXT]:
    {chunk_contexts[i]}

    [CHUNK_CONTENT]:
    {leonardo_chunks[i].page_content}
    """

    contextualized_chunks.append(chunk_content)

contextualized_embeddings = embedder.encode(contextualized_chunks)

pd.options.display.max_colwidth = 5000
scores = cosine_similarity(embedder.encode(["What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?"]), contextualized_embeddings).tolist()[0]

pd.DataFrame({"scores": scores, "documents": [document.page_content for document in leonardo_chunks]}).sort_values(by = "scores", ascending = False)

	scores	documents
3	0.666583	.. tab-item:: Booster\n\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| Partition \| QOS \| #Cores/#GPU per job \| Walltime \| Max Nodes/cores/GPUs/user \| Priority \| Notes \|\n +================+====================+=========================+==============+=================================+==============+=====================================+\n \| lrd_all_serial \| normal \| 4 cores \| 04:00:00 \| 1 node / 4 cores \| 40 \| No GPUs
4	0.648405	, Hyperthreading x 2 \|\n \| \| \| \| \| \| \| \|\n \| (default) \| \| (8 logical cores) \| \| (30800 MB RAM) \| \| Budget Free \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_usr_prod \| normal \| 64 nodes \| 24:00:00 \| \| 40 \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_dbg \| 2 nodes \| 00:30:00 \| 2 nodes / 64 cores / 8 GPUs \| 80 \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_bprod \| min = 65 nodes \| 24:00:00 \| 256 nodes \| 60 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 256 nodes \| \| \| \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_lprod \| 3 nodes \| 4-00:00:00 \| 3 nodes / 12 GPUs \| 40 \| \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_fua_dbg \| normal \| 2 nodes \| 00:10:00 \| 2 nodes / 64 cores / 8 GPUs \| 40 \| Runs on 2 nodes \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| boost_fua_prod \| normal \| 16 nodes \| 24:00:00 \| 4 running jobs per user account \| 40 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 32 nodes / 3584 cores \| \| \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| boost_qos_fuabprod \| min = 17 nodes \| 24:00:00 \| 32 nodes / 3584 cores \| 60 \| Runs on 49 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 32 nodes \| \| \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+\n \| \| qos_fualowprio \| 16 nodes \| 08:00:00 \| \| 0 \| \|\n +----------------+--------------------+-------------------------+--------------+---------------------------------+--------------+-------------------------------------+
0	0.630339	.. _leonardo_card:\n\nLeonardo\n========\n\nLeonardo is the pre-exascale Tier-0 supercomputer of the EuroHPC Joint Undertaking (JU), hosted by CINECA and currently located at the Bologna DAMA-Technopole in Italy.\nThis guide provides specific information about the Leonardo cluster, including details that differ from the general behavior described in the broader HPC Clusters section.\n\n.. \|ico2\| image:: img/leonardo_logo.png\n :height: 55px\n :class: no-scaled-link\n\nAccess to the System\n--------------------\n\nThe machine is reachable via ``ssh`` (secure Shell) protocol at hostname point: login.leonardo.cineca.it. \n\nThe connection is established, automatically, to one of the available login nodes. It is possible to connect to Leonardo using one the specific login hostname points:\n\n * login01-ext.leonardo.cineca.it\n * login02-ext.leonardo.cineca.it\n * login05-ext.leonardo.cineca.it\n * login07-ext.leonardo.cineca.it\n\n.. warning::\n \n The mandatory access to Leonardo si the two-factor authetication (2FA). Get more information at section :ref:`general/access:Access to the Systems`.\n\nSystem Architecture\n-------------------\n\nThe cluster, supplied by EVIDEN ATOS, is based on two new specifically-designed compute blades, which are available throught two distinc Slurm partitios on the Cluster:\n\n* X2135 GPU blade based on NVIDIA Ampere A100-64 accelerators - Booster partition.\n* X2140 CPU-only blade based on Intel Sapphire Rapids processors - Data Centric General Purpose (DCGP) partition.\n\nThe overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability. \n\nThe Booster partition entered pre-production in May 2023 and moved to full production in July 2023.\nThe DCGP partition followed, starting pre-production in January 2024 and reaching full production in February 2024.\n\nHardware Details\n^^^^^^^^^^^^^^^^\n\n.. tab-set::\n\n .. tab-item:: Booster
1	0.621514	.. list-table:: \n :widths: 30 50\n :header-rows: 1\n\n * - Type\n - Specific\n * - Models\n - Atos BullSequana X2135, Da Vinci single-node GPU\n * - Racks\n - 116\n * - Nodes\n - 3456\n * - Processors/node\n - 1x `Intel Ice Lake Intel Xeon Platinum 8358 <https://www.intel.com/content/www/us/en/products/sku/212282/intel-xeon-platinum-8358-processor-48m-cache-2-60-ghz/specifications.html>`_\n * - CPU/node\n - 32\n * - Accelerators/node\n - 4x `NVIDIA Ampere100 custom <https://doi.org/10.17815/jlsrf-8-186>`_, 64GiB HBM2e NVLink 3.0 (200 GB/s)\n * - Local Storage/node (tmfs)\n - (none)\n * - RAM/node \n - 512 GiB DDR4 3200 MHz\n * - Rmax\n - 241.2 PFlop/s (`top500 <https://www.top500.org/system/180128/>`_)\n * - Internal Network\n - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology \n * - Storage (raw capacity)\n - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n \n 5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n .. tab-item:: DCGP\n\n .. list-table::\n :widths: 30 50\n :header-rows: 1\n \n * - Type\n - Specific\n * - Models\n - Atos BullSequana X2140 three-node CPU blade\n * - Racks\n - 22\n * - Nodes\n - 1536\n * - Processors/node\n - 2x `Intel Sapphire Rapids Intel Xeon Platinum 8480+ <https://www.intel.com/content/www/us/en/products/sku/231746/intel-xeon-platinum-8480-processor-105m-cache-2-00-ghz/specifications.html>`_\n * - CPU/node\n - 112 cores/node\n * - Accelerators\n - (none)\n * - Local Storage/node (tmfs)\n - 3 TiB\n * - RAM/node\n - 512(8x64) GiB DDR5 4800 MHz\n * - Rmax\n - 7.84 PFlop/s (`top500 <https://www.top500.org/system/180204/>`_)\n * - Internal Network\n - 200 Gbps NVIDIA Mellanox HDR InfiniBand - Dragonfly+ Topology\n * - Storage (raw capacity)\n - 106 PiB based on DDN ES7990X and Hard Drive Disks (Capacity Tier) \n \n 5.7 PiB based on DDN ES400NVX2 and Solid State Drives (Fast Tier)\n\n\nFile Systems and Data Managment\n-------------------------------\n\nThe storage organization conforms to CINECA infrastructure. General information are reported in :ref:`hpc/hpc_data_storage:File Systems and Data Management` section. In the following, only differences with respect to general behavior are listed and explained.
5	0.618120	.. note::\n\n The partitions: boost_fua_dbg, boost_fua_prod can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.
10	0.616491	.. raw:: html\n\n <p>Each Booster cell is composed of:</p>\n <ul>\n <li><strong>6 Ã Atos BullSequana XH2000 racks</strong>, each containing:\n <ul>\n <li>3 Ã Level 2 (L2) switches</li>\n <li>3 Ã Level 1 (L1) switches</li>\n <li>30 compute nodes â each equipped with 4 GPUs, each connected via a dedicated 100 Gbps port</li>\n </ul>\n </li>\n </ul>\n\n <p><strong>Total per Booster cell:</strong> 18 L2 switches, 18 L1 switches, and 180 compute nodes.</p>\n\n <h4>Connectivity Overview</h4>\n\n <p><strong>Level 2 (L2) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 22 Ã 200 Gbps ports connecting to L2 switches in other cells</li>\n <li><strong>DOWN:</strong> 18 Ã 200 Gbps ports connecting to L1 switches within the cell</li>\n <li><strong>Oversubscription:</strong> 0.8:1</li>\n </ul>\n\n <p><strong>Level 1 (L1) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li><strong>DOWN:</strong> 40 Ã 100 Gbps ports connected to GPUs across 10 compute nodes</li>\n <li><strong>Oversubscription:</strong> 1.11:1</li>\n </ul>\n\n .. figure:: img/leo-net-booster_cell.png\n :height: 750px\n :align: center\n \n .. tab-item:: DCGP\n \n .. raw:: html
8	0.601556	\| \| dcgp_qos_fuabprod \| min = 17 nodes \| 24:00:00 \| 64 nodes / 7168 cores \| 60 \| Runs on 130 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 64 nodes \| \| \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| qos_fualowprio \| 16 nodes \| 08:00:00 \| \| 0 \| \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+
7	0.555121	+----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| Partition \| QOS \| #Cores/#GPU per job \| Walltime \| Max Nodes/cores/GPUs/user \| Priority \| Notes \|\n +================+====================+=========================+==============+======================================+==============+=====================================+\n \| lrd_all_serial \| normal \| max = 4 cores \| 04:00:00 \| 1 node / 4 cores \| 40 \| Hyperthreading x 2 \|\n \| \| \| \| \| \| \| \|\n \| (default) \| \| (8 logical cores) \| \| (30800 MB RAM) \| \| Budget Free \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_usr_prod \| normal \| 16 nodes \| 24:00:00 \| 512 nodes per prj. account \| 40 \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_dbg \| 2 nodes \| 00:30:00 \| 2 nodes / 224 cores per user account \| 80 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 512 nodes per prj. account \| \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_bprod \| min = 17 nodes \| 24:00:00 \| 128 nodes per user account \| 60 \| GrpTRES = 1536 nodes \|\n \| \| \| \| \| \| \| \|\n \| \| \| max = 128 nodes \| \| 512 nodes per prj. account \| \| Min is 17 FULL nodes \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| \| dcgp_qos_lprod \| 3 nodes \| 4-00:00:00 \| 3 nodes / 336 cores per user account \| 40 \| \|\n \| \| \| \| \| \| \| \|\n \| \| \| \| \| 512 nodes per prj. account \| \| \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_fua_dbg \| normal \| 2 nodes \| 00:10:00 \| 2 nodes / 224 cores \| 40 \| Runs on 2 nodes \|\n +----------------+--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+\n \| dcgp_fua_prod \| normal \| 16 nodes \| 24:00:00 \| \| 40 \| \|\n + +--------------------+-------------------------+--------------+--------------------------------------+--------------+-------------------------------------+
11	0.522800	.. raw:: html\n\n <p>Each DCGP cell is composed of:</p>\n <ul>\n <li><strong>8 Ã Atos BullSequana XH2000 racks</strong>, each containing:\n <ul>\n <li>3 or 0 Level 2 (L2) switches</li>\n <li>2 Ã Level 1 (L1) switches</li>\n <li>78 compute nodes â each connected via a dedicated 100 Gbps port</li>\n </ul>\n </li>\n </ul>\n\n <p><strong>Total per DCGP cell:</strong> 18 L2 switches, 16 L1 switches, and 624 compute nodes.</p>\n\n <h4>Connectivity Overview</h4>\n\n <p><strong>Level 2 (L2) Switches:</strong></p>\n <ul>\n <li><strong>UP:</strong> 22 Ã 200 Gbps ports connecting to L2 switches in other cells</li>\n <li><strong>DOWN:</strong> 18 Ã 200 Gbps ports connecting to L1 switches within the same cell</li>\n <li><strong>Oversubscription ratio:</strong> 0.8:1</li>\n </ul>\n\n <p><strong>Level 1 (L1) Switches:</strong> (divided into two groups):</p>\n <ul>\n <li><strong>9 switches with 40 downlinks:</strong>\n <ul>\n <li>UP: 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li>DOWN: 40 Ã 100 Gbps ports connect</strong>ed to compute nodes</li>\n <li>Oversubscription ratio: 1.11:1</li>\n </ul>\n </li>\n <li><strong>9 switches with 38 downlinks:</strong>\n <ul>\n <li>UP: 18 Ã 200 Gbps ports connected to all L2 switches in the cell</li>\n <li>DOWN: 38 Ã 100 Gbps ports connected to compute nodes</li>\n <li>Oversubscription ratio: 1.05:1</li>\n </ul>\n </li>\n </ul>\n\n .. figure:: img/leo-net-dcgp_cell.png\n :height: 750px\n :align: center\n\nAdvanced Information\n^^^^^^^^^^^^^^^^^^^^\n\n.. dropdown:: Network Topology - Map\n :animate: fade-in-slide-down\n :chevron: down-up\n\n The topology is presented in a table format, where each row corresponds to a compute node. For each node, the table specifies the associated L1 switch and cell, providing a clear overview of the physical and logical network layout within the cluster.\n\n :download:`Network Topology - Map <../files/ntopology.dat>`
6	0.519628	.. tab-item:: DCGP
2	0.505805	.. dropdown:: $TMPDIR\n\n * on the local SSD disks on login nodes (14 TB of capacity), mounted as ``/scratch_local`` (``TMPDIR=/scratch_local``). This is a shared area with no quota, remove all the files once they are not requested anymore. A cleaning procedure will be enforced in case of improper use of the area. \n \n * on the local SSD disks on the serial node (``lrd_all_serial``, 14TB of capacity), managed via the Slurm ``job_container/tmpfs plugin``. This plugin provides a job-specific, private temporary file system space, with private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h``), removed at the end of the serial job. You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX`` (for instance: ``--gres=tmpfs:200G``), with a maximum of 1 TB for the serial jobs. If not explicitly requested, the ``/tmp`` has the default dimension of 10 GB.\n \n * on the local SSD disks on DCGP nodes (3 TB of capacity). As for the serial node, the local ``/tmp`` and ``/dev/shm`` areas are managed via plugin, which at the start of the jobs mounts private instances of ``/tmp`` and ``/dev/shm`` in the job's user space (``TMPDIR=/tmp``, visible via the command ``df -h /tmp``), and unmounts them at the end of the job (all data will be lost). You can request the resource via sbatch directive or srun option ``--gres=tmpfs:XX``, with a maximum of all the available 3 TB for DCGP nodes. As for the serial node, if not explicitly requested, the ``/tmp`` has the default dimension of 10 GB. Please note: for the DCGP jobs the requested amount of ``gres/tmpfs`` resource contributes to the consumed budget, changing the number of accounted equivalent core hours, see the dedicated section on the Accounting.\n \n * on RAM on the diskless booster nodes (with a fixed size of 10 GB, no increase is allowed, and the ``gres/tmpfs`` resource is disabled).\n\nJob Managing and Slurm Partitions \n---------------------------------\n\nIn the following table you can find informations about the Slurm partitions for Booster and DCGP partitions. \n\n.. seealso:: \n Further information about job submission are reported in the general section :ref:`hpc/hpc_scheduler:Scheduler and Job Submission`. \n\n.. tab-set::
9	0.499677	.. note::\n\n The partitions: dcgp_fua_dbg, dcgp_fua_prod can be exclusively used by Eurofusion users. For more information see the dedicated :ref:`specific_users/specific_users:Eurofusion` section.\n\nNetwork Architecture\n--------------------\n\n.. raw:: html\n\n <p><strong>Leonardo</strong> features a state-of-the-art interconnect system tailored for high-performance computing (HPC). It delivers <em>low latency</em> and <em>high bandwidth</em> by leveraging <strong>NVIDIA Mellanox InfiniBand HDR</strong> (High Data Rate) technology, powered by <a href="https://nvdam.widen.net/s/zmbw7rdjml/infiniband-qm8700-datasheet-us-nvidia-1746790-r12-web">NVIDIA QUANTUM QM8700 Smart Switches</a>, and a <strong><a href="https://ieeexplore.ieee.org/document/7885210">Dragonfly+ topology</a></strong>. Below is an overview of its architecture and key features:</p>\n\n <ul>\n <li><strong>Hierarchical Cell Structure:</strong> The system is structured into multiple <em>cells</em>, each comprising a group of interconnected compute nodes.</li>\n\n <li><strong>Inter-cell Connectivity:</strong> As illustrated in the figure below, cells are connected via an all-to-all topology. Each pair of distinct cells is linked by 18 independent connections, each passing through a dedicated Layer 2 (L2) switch. This design ensures high availability and reduces congestion.</li>\n\n <li><strong>Intra-cell Topology:</strong> Inside each cell, a non-blocking two-layer fat-tree topology is used, allowing scalable and efficient intra-cell communication.</li>\n\n <li><strong>System Composition:</strong>\n <ul>\n <li>19 cells dedicated to the <em>Booster</em> partition.</li>\n <li>2 cells for the <em>DCGP</em> (Data-Centric General Purpose) partition.</li>\n <li>1 hybrid cell with both accelerated (36 Booster nodes) and conventional (288 DCGP nodes) compute resources.</li>\n <li>1 cell allocated for management, storage, and login services.</li>\n </ul>\n </li>\n\n <li><strong>Adaptive Routing:</strong> The network employs adaptive routing, dynamically optimizing data paths to alleviate congestion and maintain performance under load.</li>\n </ul>\n \n.. figure:: img/leo-net-all2all.png\n :height: 350px\n :align: center\n :class: no-scaled-link\n\n.. image:: img/spacer.png\n :align: center\n :class: no-scaled-link\n \n.. dropdown:: Cell Configuration and Intra-cell Connectivity\n :animate: fade-in-slide-down\n :chevron: down-up\n\n .. tab-set::\n\n .. tab-item:: Booster
12	0.498936	.. dropdown:: Network Topology - Distance Matrix\n :animate: fade-in-slide-down\n :chevron: down-up\n\n The attached compressed CSV file contains the distance matrix of all compute nodes in the cluster. The matrix uses the following metric to represent the network distance between any two nodes:\n\n * 0 â Same nodes\n * 1 â Same L1 switch, same cell.\n * 2 â Different L1 switch, same cell.\n * 3 â Different L1 switch and different cell.\n \n This matrix can be used to analyze communication locality and optimize node selection for distributed workloads.\n\n :download:`Distance Matrix <../files/ntopology-dst_mtx.tar.bz2>`\n\n.. dropdown:: Switch Naming Format\n :animate: fade-in-slide-down\n :chevron: down-up\n\n .. code-block::\n \n isw<RRrrSS>\n\n where ``<RRrrSS>`` is a 5- or 6-digits number varies based on the location and type of the switch.\n\n Specifically:\n\n * ``RR`` = region number (1 or 2 digits)\n * ``rr`` = rack number (2 digits)\n * ``SS`` = switch id (2 digits)\n\n .. note::\n If ``SS`` is an even number, it refers to an L1 switch; if it is an odd number, it refers to an L2 switch.\n\nDocuments\n---------\n\n* Article on Leonardo architecture and the technologies adopted for its GPU-accelerated partition: CINECA Supercomputing Centre, SuperComputing Applications and Innovation Department. (2024). âLEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI applications.â, Journal of large-scale research facilities, 8, A186. https://doi.org/10.17815/jlsrf-8-186\n* Details about new technologies included in the Witley platform with Intel Xeon Icelake contained in the Leonardo pre-exascale system (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/third-generation-xeon-scalable-family-overview.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPCIrpPm3$>`_)\n* Additional documents (`link <https://urldefense.com/v3/__https://software.intel.com/content/www/us/en/develop/articles/xeon-performance-tuning-and-solution-guides.html__;!!P1tgJ-3e!TrmMus5wzdLQ963vkc3yfy0BlhC1Hu8vOoce4SgltsTbkSSDrX2p1zTXPKZ5awkS$>`_)\n\nSome tuning guides for dedicated enviroments (ML/DL or HPC Clusters):\n\n* :download:`Tuning Guide <../files/Tuning_guide.pdf>`\n\n* :download:`Deep Learning <../files/Deep_learning.pdf>`

The first chunks are now more relevant. Let’s re-embed our entire document base using context.

%%time
# Drop this db if exists
chroma_client = chromadb.PersistentClient(path = chroma_path)
try:
    chroma_client.delete_collection("hpc_contextualized_wiki")
except NotFoundError as e:
    pass

collection = chroma_client.create_collection(name = "hpc_contextualized_wiki")

# For each document, split it and generate contextualized chunks
for document in documents:
    document_contextualized_chunks = []
    
    document_chunks = rcts_chunker.split_documents([document])

    # You can save also the contexts as metadata in the vector db, it could be interesting...
    with ThreadPoolExecutor(max_workers=12) as e:
        chunk_contexts = [*e.map(partial(generate_context, llm = llm, full_document = document), document_chunks)]

    for i in range(len(document_chunks)):
        chunk_content = f"""[CONTEXT]:
        {chunk_contexts[i]}
    
        [CHUNK_CONTENT]:
        {document_chunks[i].page_content}
        """
    
        document_contextualized_chunks.append(chunk_content)
    # Embed the contextualized chunk content
    embeddings = embedder.encode(document_contextualized_chunks)
    
    # Add all chunks to the collection
    collection.add(documents = [doc.page_content for doc in document_chunks],
                   metadatas  = [doc.metadata for doc in document_chunks],
                   ids = [doc.metadata["doc_name"] + "__" + \
                          str(doc.metadata["start_index"]) for doc in document_chunks],
                   embeddings = embeddings)

CPU times: user 9.61 s, sys: 429 ms, total: 10 s
Wall time: 2min 2s

Let’s test whether the new database performs better than the old one using our test suite.

hpc_contextualized_store = Chroma(collection_name = "hpc_contextualized_wiki", embedding_function = lc_embedder, persist_directory= chroma_path)

k_thresh_tests_contextualized = optimize_retriever(vector_store = hpc_contextualized_store, qa_set = qa_set, max_k = 20, reranker_name = None)

2025-07-28 16:58:55.500175 - Testing k_threshold 1
2025-07-28 16:58:56.800131 - Testing k_threshold 2
2025-07-28 16:58:58.089317 - Testing k_threshold 3
2025-07-28 16:58:59.393307 - Testing k_threshold 4
2025-07-28 16:59:00.691047 - Testing k_threshold 5
2025-07-28 16:59:01.987567 - Testing k_threshold 6
2025-07-28 16:59:03.284972 - Testing k_threshold 7
2025-07-28 16:59:04.584109 - Testing k_threshold 8
2025-07-28 16:59:05.884367 - Testing k_threshold 9
2025-07-28 16:59:07.187672 - Testing k_threshold 10
2025-07-28 16:59:08.493321 - Testing k_threshold 11
2025-07-28 16:59:09.828080 - Testing k_threshold 12
2025-07-28 16:59:11.134806 - Testing k_threshold 13
2025-07-28 16:59:12.442254 - Testing k_threshold 14
2025-07-28 16:59:13.751521 - Testing k_threshold 15
2025-07-28 16:59:15.066636 - Testing k_threshold 16
2025-07-28 16:59:16.386172 - Testing k_threshold 17
2025-07-28 16:59:17.701877 - Testing k_threshold 18
2025-07-28 16:59:19.020208 - Testing k_threshold 19
2025-07-28 16:59:20.341073 - Testing k_threshold 20

display(k_thresh_tests_contextualized)

plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["precision_k"], label = "precision_k")
plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["recall_k"], label = "recall_k")
plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["f1_k"], label = "f1_k")
plt.xlabel("k")
plt.title(f"K threshold tests")
plt.legend()
plt.show()

	k	precision_k	recall_k	f1_k
0	1	0.970149	0.580838	0.726634
1	2	0.671642	0.723887	0.696786
2	3	0.512438	0.772284	0.616083
3	4	0.421642	0.809344	0.554439
4	5	0.361194	0.830378	0.503415
5	6	0.308458	0.833857	0.450331
6	7	0.275053	0.843199	0.414798
7	8	0.248134	0.850351	0.384168
8	9	0.227197	0.857803	0.359245
9	10	0.210448	0.865013	0.338534
10	11	0.192673	0.866369	0.315239
11	12	0.179104	0.870598	0.297090
12	13	0.166475	0.874330	0.279696
13	14	0.154584	0.874330	0.262719
14	15	0.147264	0.878587	0.252247
15	16	0.138060	0.878587	0.238623
16	17	0.130817	0.879266	0.227749
17	18	0.126036	0.883157	0.220592
18	19	0.120974	0.887020	0.212911
19	20	0.115672	0.887699	0.204673

../../../../_images/ad6bc3480493cedeea95038f091e658fdc84e0a224efcb8c961d776f52d74cca.png

plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["precision_k"] - k_thresh_tests["precision_k"], label = "precision_k - diff")
plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["recall_k"] - k_thresh_tests["recall_k"], label = "recall_k - diff")
plt.plot(k_thresh_tests_contextualized["k"], k_thresh_tests_contextualized["f1_k"] - k_thresh_tests["f1_k"], label = "f1_k - diff")
plt.xlabel("k")
plt.title(f"K threshold tests (Contextualization improvement)")
plt.legend()
plt.show()

../../../../_images/a18b65de6d27955e1c96f89fe23565e819df440bf0afd84dcefaffc196e8b9fd.png

semantic_retriever_context = SemanticRetriever(top_k = 3, collection_name = "hpc_contextualized_wiki", chroma_path = chroma_path, embedder_name = EMBEDDER, reranker_name = None)

answ = semantic_retriever_context.generate_answ("What are the partitions available in Leonardo BOOST and their associated QOS?", llm)

for chunk in answ:
    print(chunk.content, end="")

The partitions available in Leonardo BOOST and their associated QOS are as follows:

1. **boost_usr_prod**
   - **QOS**: normal
     - **#Cores/#GPU per job**: 64 nodes
     - **Walltime**: 24:00:00
     - **Max Nodes/cores/GPUs/user**: Not specified
     - **Priority**: 40
   - **QOS**: boost_qos_dbg
     - **#Cores/#GPU per job**: 2 nodes
     - **Walltime**: 00:30:00
     - **Max Nodes/cores/GPUs/user**: 2 nodes / 64 cores / 8 GPUs
     - **Priority**: 80
   - **QOS**: boost_qos_bprod
     - **#Cores/#GPU per job**: min = 65 nodes, max = 256 nodes
     - **Walltime**: 24:00:00
     - **Max Nodes/cores/GPUs/user**: 256 nodes
     - **Priority**: 60
   - **QOS**: boost_qos_lprod
     - **#Cores/#GPU per job**: 3 nodes
     - **Walltime**: 4-00:00:00
     - **Max Nodes/cores/GPUs/user**: 3 nodes / 12 GPUs
     - **Priority**: 40

2. **boost_fua_dbg**
   - **QOS**: normal
     - **#Cores/#GPU per job**: 2 nodes
     - **Walltime**: 00:10:00
     - **Max Nodes/cores/GPUs/user**: 2 nodes / 64 cores / 8 GPUs
     - **Priority**: 40
     - **Notes**: Runs on 2 nodes

3. **boost_fua_prod**
   - **QOS**: normal
     - **#Cores/#GPU per job**: 16 nodes
     - **Walltime**: 24:00:00
     - **Max Nodes/cores/GPUs/user**: 4 running jobs per user account, 32 nodes / 3584 cores
     - **Priority**: 40
   - **QOS**: boost_qos_fuabprod
     - **#Cores/#GPU per job**: min = 17 nodes, max = 32 nodes
     - **Walltime**: 24:00:00
     - **Max Nodes/cores/GPUs/user**: 32 nodes / 3584 cores
     - **Priority**: 60
     - **Notes**: Runs on 49 nodes, Min is 17 FULL nodes
   - **QOS**: qos_fualowprio
     - **#Cores/#GPU per job**: 16 nodes
     - **Walltime**: 08:00:00
     - **Max Nodes/cores/GPUs/user**: Not specified
     - **Priority**: 0

Now, that’s an answer…

for question in test_questions:
    print(f"[QUESTION]: {question}")
    answ = semantic_retriever_context.generate_answ(question, llm)

    print("[ANSWER]: ", end="")
    for chunk in answ:
        print(chunk.content, end="")
    print("\n")

[QUESTION]: What GPUs are available on Leonardo?
[ANSWER]: The Leonardo supercomputer features the following GPUs:

- **NVIDIA Ampere A100-64 accelerators** in the Booster partition. Each node in this partition has 4 NVIDIA Ampere A100 custom GPUs, each with 64 GiB HBM2e memory and NVLink 3.0 (200 GB/s) connectivity.

[QUESTION]: Is there any partition without gpus?
[ANSWER]: Yes, there are partitions without GPUs. Here are some examples:

1. **Galileo**:
   - `g100_all_serial` (default)
   - `g100_usr_dbg`
   - `g100_usr_prod`
   - `g100_usr_bmem`

2. **Leonardo**:
   - `lrd_all_serial` (default)
   - `dcgp_usr_prod`
   - `dcgp_fua_dbg`
   - `dcgp_fua_prod`

3. **Pitagora**:
   - `dcgp_usr_prod`
   - `dcgp_fua_dbg`
   - `dcgp_fua_prod`

These partitions are designed for CPU-only computations and do not include GPU resources.

[QUESTION]: What GPUs are available on the Cloud?
[ANSWER]: The available GPUs on the Cloud are:

- Nvidia A30
- Nvidia L40s
- Nvidia H100 NVL

[QUESTION]: Can I associate a domain name to a vm?
[ANSWER]: Yes, you can associate a domain name with a virtual machine (VM). To do this, you need to request a DNS name association from CINECA by sending an email to superc@cineca.it. Here are the key points to follow:

1. **Reverse DNS (PTR Record)**: The reverse of the Floating IP must be set to the hostname of the VM. The naming convention is:
   - For external users: `<VM-name>.ext.cineca.it`
   - For CINECA staff: `<VM-name>.cineca.it`

2. **Record A**: The DNS record A will be set accordingly.

3. **CNAME**: If you need the service to be exposed with a different name, you can request a CNAME with the chosen different name. If no other information is provided, only the record A will be set.

4. **External DNS**: You can also set up a CNAME with your DNS provider of choice.

5. **Restrictions**: It is not possible to set the PTR record in CINECA DNS if the record A has been set on an external DNS.

For detailed guidelines, refer to the DNS guidelines provided by CINECA.

[QUESTION]: What are the naming conventions I should follow when asking for a domain name for a vm machine?
[ANSWER]: When asking for a domain name for a VM machine, follow these naming conventions:

1. **Reverse of the Floating IP (PTR record)**:
   - For external users: `<VM-name>.ext.cineca.it`
   - For CINECA staff: `<VM-name>.cineca.it`

2. **Record A**: This will be set accordingly to the PTR record.

3. **CNAME**: If the service should be exposed with a different name, you can request a CNAME with the chosen different name. If no other information is provided, only the record A will be set.

4. **PTR Record**: It is not possible to set the PTR record in CINECA DNS if the record A has been set on an external DNS.

To request a DNS name, send an email to `superc@cineca.it` with the desired VM name and specify whether you are an external user or CINECA staff.

[QUESTION]: What is Cineca AI and how do I enable it?
[ANSWER]: Cineca AI is a collection of artificial intelligence packages optimized for Cineca's high-performance computing (HPC) clusters. These packages include popular AI frameworks like TensorFlow, PyTorch, and XGBoost, among others.

### How to Enable Cineca AI

To enable Cineca AI, follow these steps:

1. **Load the Cineca AI Module:**
   First, you need to load the Cineca AI module. This can be done using the `module load` command. The specific version of the module may vary, so you might need to check the available versions.

   ```bash
   module load profile/deeplrn
   module av cineca-ai
   module load cineca-ai/<version>
   ```

   Replace `<version>` with the specific version number you want to use.

2. **Verify the Installation:**
   After loading the module, you can verify the installation by listing the available Python packages.

   ```bash
   python -m pip list
   ```

3. **Use a Specific Package:**
   To use a specific package from the Cineca AI suite, you can import it in your Python script or interactive session.

   ```bash
   python -c "import <package>"
   ```

   Replace `<package>` with the name of the package you want to use.

4. **Create a Virtual Environment (Optional):**
   If you need to install additional packages or want to create an isolated environment, you can create a virtual environment.

   ```bash
   module load profile/deeplrn
   module av cineca-ai
   module load cineca-ai/<version>
   python -m venv my_env --system-site-packages
   source my_env/bin/activate
   pip install <package>
   ```

   Replace `<package>` with the name of the package you want to install.

5. **Deactivate the Virtual Environment:**
   Once you are done working in the virtual environment, you can deactivate it.

   ```bash
   deactivate
   ```

By following these steps, you should be able to enable and use Cineca AI on Cineca's HPC clusters.

[QUESTION]: What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?
[ANSWER]: The QOS queues available on the Leonardo supercomputer's BOOSTER partition are:

1. **normal**
2. **boost_qos_dbg**
3. **boost_qos_bprod**
4. **boost_qos_lprod**
5. **boost_qos_fuabprod**
6. **qos_fualowprio**

A few considerations about contextual retrieval

Production pipelines with many calls to LLMs become costly and slow, especially if you need to re-index your data frequently. An interesting approach here would be to generate context only for chunks containing tables. You can identify chunks containing only tables by using well-written regular expressions or by checking the entropy of the chunk.
Simple approaches (e.g., BM25) are better if they work well with your data.

Final remarks

When you create a RAG system, you don’t chunk just for the sake of chunking. You want to have a testing set to evaluate the performance of various approaches. Everything must be measured, and improvements must be quantified.

You also want a reference set of expert-made questions to quantify the performance of your system and identify which techniques produce the best results.

print(f"Total execution time {datetime.now() - t0}")

Total execution time 0:20:49.053584