Ray Tune

In this notebook we are going to explore the following topics:

  • What is hyperparameter tuning?

  • Core concepts to understand Ray Tune

  • Getting started with Ray Tune

  • Hyperparameter tuning using Ray (DL framework)

  • Hyperparameter tuning using schedulers (ASHA)

  • Bonus: Hyperparameter tuning using Ray (ML framework)

Reference: https://docs.ray.io/en/latest/tune/index.html

What is hyperparameter tuning?

First, it’s important to understand what hyperparameter tuning is and why it matters.

Hyperparameter tuning is the process of finding the best configuration for your Machine Learning model, like choosing the right “settings” to get the best performance.

Hyperparameters are the parameters that are not learned from the data but are set before the training process begins. These parameters control the behavior of the learning algorithm and the structure of the model.

Examples of hyperparameters:

  • Learning rate: determines how quickly the model adjusts its parameters during training.

  • Number of layers/neurons: specifies the architecture of a neural network.

  • Batch size: controls the number of samples processed before the model updates its parameters.

  • Number of trees: for ensemble models like Random Forests, this indicates the number of trees in the forest.

  • Regularization parameters: control overfitting by adding penalties (e.g., L1 or L2 regularization).

How does it work?

You train a model with different combinations of learning rate, number of layers, batch size, regularization parameters, … and you measure performance (accuracy, loss) each time. The goal is to find the combination that gives the best result.

However, trying every combination manually or with loops is slow and inefficient, especially for big models or large datasets.
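To make the cost of the manual approach concrete, here is a minimal sketch of an exhaustive loop over a small grid. The `evaluate` function is a hypothetical stand-in for "train the model and return a validation loss"; its formula and the grid values are assumptions chosen only for illustration.

```python
import itertools
import math


def evaluate(learning_rate, num_layers, batch_size):
    """Hypothetical stand-in for training a model and returning its loss."""
    return (
        (math.log10(learning_rate) + 2) ** 2
        + (num_layers - 3) ** 2
        + abs(batch_size - 64) / 64
    )


# Candidate values for each hyperparameter (illustrative only)
grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "num_layers": [2, 3, 4],
    "batch_size": [32, 64, 128],
}

names = list(grid)
results = []
# Every combination is evaluated one after the other, on a single process
for values in itertools.product(*grid.values()):
    config = dict(zip(names, values))
    results.append((evaluate(**config), config))

best_loss, best_config = min(results, key=lambda item: item[0])
print(len(results), "combinations evaluated")
print("Best configuration:", best_config, "loss:", best_loss)
```

Even this tiny grid needs 3 × 3 × 3 = 27 sequential evaluations, and the count grows multiplicatively with every additional hyperparameter.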

What is Ray Tune?

Ray Tune is a scalable library for hyperparameter tuning and experiment execution. It is part of the Ray ecosystem and it allows users to optimize machine learning models by efficiently searching through hyperparameter spaces using various algorithms.

Ray Tune automates and accelerates hyperparameter tuning using parallel execution (running many trials simultaneously) and smart search strategies (search algorithms such as Bayesian optimization and HyperOpt, and schedulers such as ASHA, to avoid wasting time on poor configurations).

Why use Ray Tune?

  • Scalability: easily scales from a single machine to a distributed cluster.

  • Flexibility: supports various search algorithms (e.g., Grid Search, Random Search, Bayesian Optimization, Hyperband).

  • Integration: works seamlessly with popular machine learning frameworks like TensorFlow, PyTorch, and scikit-learn.

Core concepts

Main concepts to understand Ray Tune:

  1. Search Space: defines the hyperparameters and their possible values that Ray Tune will explore during optimization. This could include parameters like learning rates, the number of layers in a neural network, or regularization strengths. The values can be given as ranges, discrete options, or complex combinations.

  2. Trainable: the object that Ray Tune uses to run a trial. It defines how to evaluate an objective function by combining hyperparameters and the computational process to calculate the objective value to be optimized. It essentially encapsulates the logic for training a model and reporting intermediate results to Ray Tune. This could be a function or a class.

  3. Search Algorithm: the strategy for exploring the search space (e.g., random search, Bayesian optimization).

  4. Scheduler: algorithms that help to optimize resource usage during hyperparameter tuning, often by stopping poor-performing trials early.

  5. Trials: individual runs of the tuning process.

  6. ResultGrid: collects the results of your experiment and provides utilities to inspect and analyze them. You can retrieve the best trial, view metrics, and even fetch configurations.
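The concepts above can be sketched without Ray at all. The following plain-Python loop is only an analogy, assuming a toy objective and hypothetical names (`toy_search_space`, `toy_trainable`, `toy_trials`); it runs sequentially, whereas Ray Tune would distribute the trials and could apply a scheduler.

```python
import random

# Search space: one sampler per hyperparameter (illustrative ranges)
toy_search_space = {
    "a": lambda rng: rng.uniform(-5, 5),
    "b": lambda rng: rng.choice([1, 2, 4, 8]),
}


def toy_trainable(config):
    """Trainable: evaluates one configuration and returns its metrics."""
    return {"loss": (config["a"] - 1) ** 2 + abs(config["b"] - 4)}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_trials = []

# Search algorithm: plain random search; each loop iteration is one trial
for trial_id in range(15):
    config = {name: sample(rng) for name, sample in toy_search_space.items()}
    toy_trials.append(
        {"trial_id": trial_id, "config": config, "metrics": toy_trainable(config)}
    )

# The list of trials plays the role of the ResultGrid
toy_best = min(toy_trials, key=lambda trial: trial["metrics"]["loss"])
print("Best trial:", toy_best["trial_id"], toy_best["config"], toy_best["metrics"])
```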

Fig1: Core components of Ray Tune

Now that we understand what hyperparameter tuning is and how Ray Tune works, let’s use Ray Tune to find the best settings for a simple model. You’ll see how easy it is to go from slow manual tuning to distributed, optimized search.

Getting started with Ray Tune

# Environment configuration
from datetime import datetime
import ray

# Ray Tune
from ray import tune
from ray.air import session
/leonardo_work/tra26_castiel2/mviscia1/ray_rag_venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2026-01-29 12:42:58,384	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.
!which python
/leonardo_work/tra26_castiel2/mviscia1/ray_rag_venv/bin/python

Connecting to the Ray cluster started previously.

ray.init(log_to_driver = False, ignore_reinit_error = True)
2026-01-29 12:43:00,069	INFO worker.py:1520 -- Using address 10.1.0.82:27667 set in the environment variable RAY_ADDRESS
2026-01-29 12:43:00,070	INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.1.0.82:27667...
2026-01-29 12:43:00,079	INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at http://10.1.0.82:8265 

Check cluster resources

resources = ray.cluster_resources()

print(f"Cluster has {resources['CPU']} CPUs, {resources['GPU'] if 'GPU' in resources else 0} GPUs, execution memory {resources['memory'] * 1e-9} GBs, object storage memory {resources['object_store_memory'] * 1e-9} GBs")
Cluster has 16.0 CPUs, 2.0 GPUs, execution memory 359.45678684200004 GBs, object storage memory 154.05290864600002 GBs

Before we jump into tuning Machine Learning models, let’s start with something simple and visual.

Imagine we want to find the point closest to (3, 3) from a list of candidates. This has nothing to do with training models, but it will help us focus on how Ray Tune works:

How do we define the trainable function?

How do we define the search space?

How do we evaluate each candidate?

How does Ray Tune find the best one?

  • Define all the ingredients to tune the hyperparameters

1) Define the trainable function:

The trainable is a flexible wrapper that can orchestrate processes like simulation, ML model training, or real-world optimization problems. Therefore, we define the model training function that we want to run variations of.

The trainable function:

  • Takes as input a dictionary of parameter values chosen by Ray Tune. In this example, the dictionary contains the values of x and y that Ray Tune selects for that trial.

  • Evaluates the model based on the objective function using the given parameters.

  • Reports a dictionary of results back to Ray Tune. The trainable is executed in a separate Ray actor (process), so we need to communicate the performance of the model back to Tune (which runs in the main Python process).

# Example: finding the closest point to (3,3)

# Define the trainable function
def trainable(config): # The argument config is a dictionary containing the input parameters that Ray Tune will vary during the optimization process
    
    # Extract parameters from the config dict
    x = config["x"]
    y = config["y"]
    
    # Objective function: minimize the distance from (3, 3)
    loss = (x - 3)**2 + (y - 3)**2
    
    # Report the result to Ray Tune
    session.report({'loss':loss}) #train.report

2) Define a search space:

The search space defines the parameter ranges for Ray Tune to explore. Ray supports various sampling techniques such as:

  • tune.uniform: uniform distribution.

  • tune.grid_search: exhaustive search over a grid.

  • tune.loguniform: logarithmic distribution.

  • tune.choice: uniform random choice from a list of options.
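To build intuition for these distributions, here is a plain-Python sketch of what each sampler draws. It does not use Ray's implementation; the helper names (`sample_uniform`, `sample_loguniform`, `sample_choice`) are hypothetical. The point to notice is that a log-uniform sampler spreads samples evenly across orders of magnitude, which is why it suits parameters such as the learning rate.

```python
import math
import random


def sample_uniform(low, high, rng):
    """Every value in [low, high] is equally likely."""
    return rng.uniform(low, high)


def sample_loguniform(low, high, rng):
    """Uniform in log space: each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_choice(options, rng):
    """Pick one of the listed options with equal probability."""
    return rng.choice(options)


rng = random.Random(0)
learning_rates = [sample_loguniform(1e-4, 1e-1, rng) for _ in range(3000)]

# Count how many samples fall in each decade of [1e-4, 1e-1]
for low in (1e-4, 1e-3, 1e-2):
    count = sum(low <= lr < low * 10 for lr in learning_rates)
    print(f"[{low:g}, {low * 10:g}): {count} samples")
```

Each decade receives roughly one third of the samples, whereas a uniform sampler over the same range would place about 90% of them in the top decade.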

For this example, we’ll search for parameters x and y within a defined range.

# Define the search space
# This tells Ray Tune which values it should explore for each parameter.

search_space = {
    "x": tune.uniform(-10, 10),  # Ray will sample 'x' from a uniform distribution between -10 and 10
    "y": tune.uniform(-10, 10),  # Similarly, 'y' will be sampled from -10 to 10
}

3) Execute hyperparameter tuning and generate trials

Using Ray Tune, we explore the search space to find the best configuration.

  • The Tuner object coordinates the tuning process.

  • Specify the metric and mode to optimize (e.g., minimize loss).

  • Specify the number of trials using num_samples. Tune automatically determines how many trials will run in parallel.

# Run the tuning process
tuner = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space,  # Search space
    tune_config=tune.TuneConfig(
        metric="loss", 
        mode="min",
        # search_alg=search_algorithm, # Include search algorithm
        num_samples=15,  # Number of trials to run. Try different values: what happens if you set num_samples=50?
    )
)

results = tuner.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result = results.get_best_result(metric="loss", mode="min")
print("Best configuration:", best_result.config)
print("Minimum loss:", best_result.metrics["loss"])

Tune Status

Current time:2026-01-29 12:43:08
Running for: 00:00:05.37
Memory: 30.6/502.9 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 2.0/16 CPUs, 0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name             status      loc                x          y          iter  total time (s)  loss
trainable_ab2a1_00000  TERMINATED  10.1.0.82:2605381  -3.18218   -7.35987   1     0.00111055      145.546
trainable_ab2a1_00001  TERMINATED  10.1.0.82:2605397   7.80951   -2.88309   1     0.000756979     57.7421
trainable_ab2a1_00002  TERMINATED  10.1.0.82:2605390   4.15512   -7.6532    1     0.000977755     114.825
trainable_ab2a1_00003  TERMINATED  10.1.0.82:2605400  -6.63206    1.8459    1     0.000984907     94.1085
trainable_ab2a1_00004  TERMINATED  10.1.0.82:2605395   0.706565   9.76353   1     0.00207734      51.0051
trainable_ab2a1_00005  TERMINATED  10.1.0.82:2605396  -5.762     -6.43398   1     0.00127625      165.773
trainable_ab2a1_00006  TERMINATED  10.1.0.82:2605391  -7.1445    -6.92281   1     0.00159645      201.373
trainable_ab2a1_00007  TERMINATED  10.1.0.82:2605394  -3.61454    4.41909   1     0.00181293      45.766
trainable_ab2a1_00008  TERMINATED  10.1.0.82:2605398   3.14046   -4.96974   1     0.000910521     63.5365
trainable_ab2a1_00009  TERMINATED  10.1.0.82:2605392   4.04479    5.08997   1     0.00116181      5.45955
trainable_ab2a1_00010  TERMINATED  10.1.0.82:2605427   6.30511   -4.17758   1     0.000923157     62.4415
trainable_ab2a1_00011  TERMINATED  10.1.0.82:2605393  -6.90388    9.8905    1     0.00111842      145.566
trainable_ab2a1_00012  TERMINATED  10.1.0.82:2605409   9.64151    9.69836   1     0.000640392     88.9776
trainable_ab2a1_00013  TERMINATED  10.1.0.82:2605399  -5.33       0.332167  1     0.00115252      76.5063
trainable_ab2a1_00014  TERMINATED  10.1.0.82:2605426   0.533074   2.39244   1     0.000982285     6.45485
2026-01-29 12:43:08,053	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/leonardo/home/userinternal/mviscia1/ray_results/trainable_2026-01-29_12-43-02' in 0.0422s.
2026-01-29 12:43:08,061	INFO tune.py:1041 -- Total run time: 5.39 seconds (5.33 seconds for the tuning loop).
Best configuration: {'x': 4.044788577836298, 'y': 5.089966972308293}
Minimum loss: 5.459545117716688
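As a quick sanity check (a sketch that needs no Ray cluster), the reported minimum loss can be recomputed directly from the best configuration printed above, using the same objective as the trainable. The variable names here are hypothetical.

```python
import math

# Best configuration as printed in the output above
reported_config = {"x": 4.044788577836298, "y": 5.089966972308293}

# Same objective as the trainable: squared distance from (3, 3)
recomputed_loss = (reported_config["x"] - 3) ** 2 + (reported_config["y"] - 3) ** 2
distance = math.sqrt(recomputed_loss)

print("Recomputed loss:", recomputed_loss)   # approximately 5.4595
print("Distance from (3, 3):", distance)     # approximately 2.34
```

The true optimum has a loss of 0 at (3, 3), so with only 15 random samples the best trial is still about 2.3 units away from the target. More samples, or a smarter search algorithm, should get closer.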

By default, each trial requests 1 CPU, so Ray Tune runs up to N trials concurrently, where N is the number of available CPUs (cores). You can cap the number of concurrent trials with the max_concurrent_trials parameter of TuneConfig.

If you need to customize the resource allocation per trial, you can use tune.with_resources. This allows you to explicitly specify the resources (e.g., CPUs, GPUs, or memory) that each trial requires. You can provide these resource requests either as a dictionary or using a PlacementGroupFactory object.

For every trial, Ray Tune will attempt to create a placement group based on the specified resource requirements, ensuring that your trials run with the necessary resources.
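The number of trials that can run at once follows from simple arithmetic on the requested resources. The helper below, `max_parallel_trials`, is a hypothetical illustration and not part of Ray's API; it assumes a single pool of resources and ignores how they are spread across nodes.

```python
def max_parallel_trials(cluster_resources, resources_per_trial):
    """Upper bound on concurrent trials: the scarcest resource decides."""
    return min(
        int(cluster_resources.get(name, 0) // amount)
        for name, amount in resources_per_trial.items()
        if amount > 0
    )


# 8 CPUs, 2 CPUs per trial
print(max_parallel_trials({"cpu": 8}, {"cpu": 2}))  # 4

# 8 CPUs and 1 GPU, each trial needs 2 CPUs and 1 GPU: the GPU is the bottleneck
print(max_parallel_trials({"cpu": 8, "gpu": 1}, {"cpu": 2, "gpu": 1}))  # 1
```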

%%script false --no-raise-error

# Specify resources to be used for each trial (if you have 8 CPUs on your machine, this will run 4 concurrent trials at a time)
trainable_with_resources = tune.with_resources(trainable, {"cpu": 2})

# Run the tuning process
tuner = tune.Tuner(
    trainable_with_resources,
    param_space=search_space,  # Needed because the trainable reads x and y from config
    tune_config=tune.TuneConfig(
        num_samples=10
    )
)

results = tuner.fit()

%%script false --no-raise-error

# Specify resources to be used for each trial (if you have 8 CPUs and 1 GPU on your machine, this will run 1 trial at a time)
trainable_with_resources = tune.with_resources(trainable, {"cpu": 2, "gpu": 1})

# Run the tuning process
tuner = tune.Tuner(
    trainable_with_resources,
    param_space=search_space,  # Needed because the trainable reads x and y from config
    tune_config=tune.TuneConfig(
        num_samples=10
    )
)

results = tuner.fit()

We can also pass a search algorithm to the Tuner through TuneConfig (the default is random search). In this example we’re using HyperOptSearch, a search algorithm plugin for Ray Tune that wraps Hyperopt, a widely used library for Bayesian optimization. It helps Ray Tune choose parameter values intelligently over time by learning from previous trials, rather than just trying random configurations.

%%script false --no-raise-error

from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.hyperopt import HyperOptSearch 

# Define the search algorithm
search_algorithm = HyperOptSearch(
    metric="loss",
    mode="min",
    n_initial_points=16
)

search_algorithm = ConcurrencyLimiter(search_algorithm, max_concurrent=4)


# Run the tuning process
tuner = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space,  # Search space
    tune_config=tune.TuneConfig(
        search_alg=search_algorithm, # Include search algorithm
        num_samples=10,  # Number of trials to run,
    )
)

results = tuner.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result = results.get_best_result(metric="loss", mode="min")
print("Best configuration:", best_result.config)
print("Minimum loss:", best_result.metrics["loss"])

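Note that ConcurrencyLimiter caps the number of trials that run at the same time (4 in this example). Model-based search algorithms such as HyperOptSearch learn from completed trials, so limiting concurrency gives the algorithm more results to learn from before it suggests new configurations.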

Hyperparameter tuning using Ray (DL framework)

In the following example, we’ll explore the use of Ray Tune for hyperparameter tuning in a Deep Learning framework. In particular, we’ll see how to implement a tuning process for a PyTorch Lightning classifier on the CIFAR-10 dataset.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes (6,000 images per class). It is widely used in computer vision research. The 10 classes include airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

The neural network aims to classify the images into one of the 10 categories. It takes a 32x32 image as input, processes it through a sequence of layers, and outputs a probability distribution over the 10 classes. The network’s performance is evaluated using cross-entropy loss, a common metric for classification tasks.

# Ray Tune on ML framework
import time
import xgboost as xgb
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import root_mean_squared_error
from ray.tune.search.hyperopt import HyperOptSearch
import numpy as np

# Ray Tune on DL framework
import os
from ray.train.torch import TorchTrainer
from ray.tune.integration.pytorch_lightning import TuneReportCallback
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, random_split
import pytorch_lightning as pl
import ray.train.lightning
import torch
import time
from torch import nn
from ray.tune.schedulers import ASHAScheduler
# Define the PyTorch Lightning model
class CIFAR10Classifier(pl.LightningModule):
    def __init__(self, hidden_size, lr):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10),
        )
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss, on_step=False, on_epoch=True, prog_bar=True) #, sync_dist=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Define the training function for Ray Tune
def train_func(config):  # config: the hyperparameters sampled by Ray Tune for this trial
    # Data preparation
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    data_dir = "/leonardo_scratch/fast/tra26_castiel2/data/data_cifar10"
    dataset = datasets.CIFAR10(root=data_dir, train=True, transform=transform, download=False)
    train_dataset, val_dataset = random_split(dataset, [45000, 5000])
    train_loader = DataLoader(train_dataset, batch_size=int(config["batch_size"]), shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=int(config["batch_size"]))

    # Define the model
    model = CIFAR10Classifier(hidden_size=config["hidden_size"], lr=config["lr"])
  
    # Define the PyTorch Lightning trainer
    trainer = pl.Trainer(
        devices="auto",
        accelerator="auto",
        strategy=ray.train.lightning.RayDDPStrategy(),
        max_epochs=10,
        plugins=[ray.train.lightning.RayLightningEnvironment()],
        callbacks=[ray.train.lightning.RayTrainReportCallback()],
        # [1a] Optionally, disable the default checkpointing behavior
        # in favor of the `RayTrainReportCallback` above.
        enable_checkpointing=False,
        logger=False,
        enable_progress_bar=True,
    )

    # Train the model
    trainer = ray.train.lightning.prepare_trainer(trainer)
    trainer.fit(model, train_loader, val_loader)

start_time = time.time()

checkpoint_dir = "/leonardo_scratch/large/userinternal/mviscia1/ray_checkpoint" ### Change to '/leonardo_scratch/large/userexternal/<your HPC username>'

scaling_config = ray.train.ScalingConfig(
    num_workers=2,              
    use_gpu=True,               
    resources_per_worker={      
        "CPU": 6,               
        "GPU": 1
    }
)

# Define a TorchTrainer without hyper-parameters for Tuner
ray_trainer = TorchTrainer(
    train_loop_per_worker=train_func, #train_func
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path=checkpoint_dir)
)

# Define the hyperparameter search space
search_space = {
    "hidden_size": tune.choice([128, 256, 512]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

# Trigger hyperparameter tuning with Ray Tune
tuner = tune.Tuner(
    ray_trainer, #tune.with_parameters(train_func),
    param_space={"train_loop_config": search_space}, #param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="train_loss",
        mode="min",
        num_samples=4,  # Number of trials
        # max_concurrent_trials=2
    ),
)

results = tuner.fit()

end_time = time.time()
best_result = results.get_best_result(metric="train_loss", mode="min")
comp_time = end_time-start_time

Tune Status

Current time:2026-01-29 12:48:36
Running for: 00:05:18.23
Memory: 31.4/502.9 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 13.0/16 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name                status      loc                train_loop_config/batch_size  train_loop_config/hidden_size  train_loop_config/lr  iter  total time (s)  train_loss  epoch  step
TorchTrainer_b452e_00000  TERMINATED  10.1.0.82:2606482  32                            256                            0.0159635             10    65.4758         4.85645     9      7040
TorchTrainer_b452e_00001  TERMINATED  10.1.0.82:2606872  32                            256                            0.000937633           10    68.9653         0.971001    9      7040
TorchTrainer_b452e_00002  TERMINATED  10.1.0.82:2607201  32                            256                            0.0493233             10    70.5082         15.02       9      7040
TorchTrainer_b452e_00003  TERMINATED  10.1.0.82:2607526  128                           512                            0.000523825           10    51.814          0.94413     9      1760
2026-01-29 12:48:36,276	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/leonardo_scratch/large/userinternal/mviscia1/ray_checkpoint/TorchTrainer_2026-01-29_12-43-18' in 0.0083s.
2026-01-29 12:48:36,281	INFO tune.py:1041 -- Total run time: 318.24 seconds (318.22 seconds for the tuning loop).
import pandas as pd
import matplotlib.pyplot as plt

all_metrics = []

for i, result in enumerate(results):
    df = result.metrics_dataframe.copy()
    df["trial_id"] = f"Trial_{i+1}"
    df["config"] = str(result.config)
    all_metrics.append(df)

all_results_df = pd.concat(all_metrics, ignore_index=True)

plt.figure(figsize=(10, 6))

# Group by trial and plot
for trial_id, trial_df in all_results_df.groupby("trial_id"):
    plt.plot(trial_df["training_iteration"], trial_df["train_loss"], label=trial_id)

plt.xlabel("Epoch")
plt.ylabel("Train Loss")
plt.title("Loss curves for all trials")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
[Figure: Loss curves for all trials (train loss vs. epoch)]
df.head()
train_loss epoch step timestamp checkpoint_dir_name should_checkpoint done training_iteration trial_id date ... time_total_s pid hostname node_ip time_since_restore iterations_since_restore config/train_loop_config/hidden_size config/train_loop_config/lr config/train_loop_config/batch_size config
0 1.640035 0 176 1769687279 checkpoint_000000 True False 1 Trial_4 2026-01-29_12-47-59 ... 16.496177 2607526 lrdn0082.leonardo.local 10.1.0.82 16.496177 1 512 0.000524 128 {'train_loop_config': {'hidden_size': 512, 'lr...
1 1.433246 1 352 1769687283 checkpoint_000001 True False 2 Trial_4 2026-01-29_12-48-03 ... 20.433736 2607526 lrdn0082.leonardo.local 10.1.0.82 20.433736 2 512 0.000524 128 {'train_loop_config': {'hidden_size': 512, 'lr...
2 1.339727 2 528 1769687287 checkpoint_000002 True False 3 Trial_4 2026-01-29_12-48-07 ... 24.356834 2607526 lrdn0082.leonardo.local 10.1.0.82 24.356834 3 512 0.000524 128 {'train_loop_config': {'hidden_size': 512, 'lr...
3 1.261192 3 704 1769687290 checkpoint_000003 True False 4 Trial_4 2026-01-29_12-48-11 ... 28.264959 2607526 lrdn0082.leonardo.local 10.1.0.82 28.264959 4 512 0.000524 128 {'train_loop_config': {'hidden_size': 512, 'lr...
4 1.191230 4 880 1769687294 checkpoint_000004 True False 5 Trial_4 2026-01-29_12-48-14 ... 32.200040 2607526 lrdn0082.leonardo.local 10.1.0.82 32.200040 5 512 0.000524 128 {'train_loop_config': {'hidden_size': 512, 'lr...

5 rows × 21 columns

print(f"Best Configuration: ", best_result.config)
print(f"Minimum loss: ", best_result.metrics["train_loss"])
print(f"Time Taken: {comp_time} seconds")
Best Configuration:  {'train_loop_config': {'hidden_size': 512, 'lr': 0.000523825235557825, 'batch_size': 128}}
Minimum loss:  0.9441303610801697
Time Taken: 318.2735798358917 seconds

Exercise

Change the scaling configuration and check how the training scales. You can try the following scaling configurations:

  • 2 workers with 6 CPUs and 1 GPU each

  • 1 worker with 6 CPUs and 1 GPU

How many trials will run in parallel with each of the two configurations?

Hyperparameter tuning using schedulers (ASHA)

Schedulers help manage trial execution by optimizing resource usage and terminating poorly performing trials early.

Common schedulers include:

  • ASHAScheduler: Asynchronous Successive Halving Algorithm, which prunes underperforming trials early.

  • MedianStoppingRule: stops a trial if its performance falls below the median of the other trials at a comparable point in training.

  • PopulationBasedTraining: mutates hyperparameters dynamically during training.

%%script false --no-raise-error

from ray.tune.schedulers import ASHAScheduler

# Define a scheduler
scheduler = ASHAScheduler(
    metric="loss",  # Metric to optimize
    mode="min",     # Minimizing the loss
    max_t=10,       # Maximum iterations
    grace_period=1, # Minimum iterations before stopping
    reduction_factor=2  # Factor for halving trials
)

# Define the search space
search_space = {
    "x": tune.uniform(-10, 10),  # Range for parameter x
    "y": tune.uniform(-10, 10),  # Range for parameter y
}

# Run tuning process with the scheduler
tuner_with_scheduler = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space, # Search space
    tune_config=tune.TuneConfig(
        num_samples=50, # Number of trials to run
        scheduler=scheduler
    )
)

results_with_scheduler = tuner_with_scheduler.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result_with_scheduler = results_with_scheduler.get_best_result(metric="loss", mode="min")
print("Best Configuration with Scheduler:", best_result_with_scheduler.config)
print("Best Loss with Scheduler:", best_result_with_scheduler.metrics["loss"])

The ASHA (Asynchronous Successive Halving Algorithm) scheduler improves hyperparameter tuning efficiency by stopping underperforming trials early, so that computational resources are focused on the trials most likely to succeed. ASHA operates asynchronously: it decides whether to stop or continue each trial as soon as that trial reaches a checkpoint iteration (a rung), without waiting for the other trials to reach the same point.
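The halving idea can be sketched in a few lines of plain Python. This is a simplified, synchronous version written for illustration, not Ray's ASHA implementation: all trials are compared together at each rung, and the loss curves are simulated with a made-up formula. The rung iterations are assumed to start at grace_period and grow by reduction_factor, which agrees with the "Bracket" line in the status output of the CIFAR-10 run with grace_period=1, reduction_factor=2 and max_t=20.

```python
def rung_milestones(grace_period, max_t, reduction_factor):
    """Iterations at which trials are compared (assumed geometric spacing)."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def successive_halving(loss_curves, milestones, reduction_factor):
    """Keep the best 1/reduction_factor of the trials at every rung.

    loss_curves maps a trial name to a function t -> loss after t iterations.
    """
    alive = list(loss_curves)
    for t in milestones:
        ranked = sorted(alive, key=lambda name: loss_curves[name](t))
        keep = max(1, len(ranked) // reduction_factor)
        alive = ranked[:keep]  # the remaining trials are stopped early
    return alive


milestones = rung_milestones(grace_period=1, max_t=20, reduction_factor=2)
print(milestones)  # [1, 2, 4, 8, 16]

# Simulated trials: each converges to a different floor (made-up curves)
curves = {
    f"trial_{i}": (lambda t, floor=0.1 * i: floor + 1.0 / t) for i in range(8)
}
survivors = successive_halving(curves, milestones, reduction_factor=2)
print(survivors)  # ['trial_0']
```

With 8 simulated trials, only 4 continue past iteration 1, 2 past iteration 2, and 1 past iteration 4, so most of the compute budget goes to the most promising configuration.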

# Define the PyTorch Lightning model
class CIFAR10Classifier(pl.LightningModule):
    def __init__(self, hidden_size, lr):
        super().__init__()
        # Convolutional layers for feature extraction
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        # Fully connected layers for classification
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10),
        )
        self.lr = lr

    def forward(self, x):
        features = self.feature_extractor(x)
        logits = self.classifier(features)
        return logits

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Define the training function for Ray Tune
def train_func(config):
    
    # Data preparation
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    data_dir = "/leonardo_scratch/fast/tra26_castiel2/data/data_cifar10"
    dataset = datasets.CIFAR10(root=data_dir, train=True, transform=transform, download=False)
    train_dataset, val_dataset = random_split(dataset, [45000, 5000])
    train_loader = DataLoader(train_dataset, batch_size=int(config["batch_size"]), shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=int(config["batch_size"]))

    # Define the model
    model = CIFAR10Classifier(hidden_size=config["hidden_size"], lr=config["lr"])
    
    # Define the PyTorch Lightning trainer
    trainer = pl.Trainer(
        devices="auto",
        accelerator="auto",
        strategy=ray.train.lightning.RayDDPStrategy(),
        max_epochs=20,
        plugins=[ray.train.lightning.RayLightningEnvironment()],
        callbacks=[ray.train.lightning.RayTrainReportCallback()],
        # [1a] Optionally, disable the default checkpointing behavior
        # in favor of the `RayTrainReportCallback` above.
        enable_checkpointing=False,
        logger=False,
        enable_progress_bar=True,
    )

    # Train the model
    trainer = ray.train.lightning.prepare_trainer(trainer)
    trainer.fit(model, train_loader, val_loader)

start_time = time.time()

scaling_config = ray.train.ScalingConfig(
    num_workers=2,
    use_gpu=True,
    resources_per_worker={"CPU": 6, "GPU": 1},
)

# Define a TorchTrainer without hyper-parameters for Tuner
ray_trainer = TorchTrainer(
    train_func,
    scaling_config=scaling_config
)

# Define the hyperparameter search space
search_space = {
    "hidden_size": tune.choice([128, 256, 512]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

# The maximum training epochs
num_epochs = 20

scheduler = ASHAScheduler(max_t=num_epochs, grace_period=1, reduction_factor=2)

# Trigger hyperparameter tuning with Ray Tune
tuner = tune.Tuner(
    ray_trainer, #tune.with_parameters(train_func),
    param_space={"train_loop_config": search_space}, #param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="train_loss",
        mode="min",
        num_samples=15,  # Number of trials
        scheduler=scheduler,
        max_concurrent_trials=2
    ),
)
results = tuner.fit()

end_time = time.time()
best_result = results.get_best_result(metric="train_loss", mode="min")
comp_time = end_time-start_time

Tune Status

Current time:2026-01-29 12:56:33
Running for: 00:06:43.91
Memory: 31.7/502.9 GiB

System Info

Using AsyncHyperBand: num_stopped=5
Bracket: Iter 16.000: -0.3668258339166641 | Iter 8.000: -0.6192266345024109 | Iter 4.000: -0.8897021412849426 | Iter 2.000: -0.9870593547821045 | Iter 1.000: -1.6667675375938416
Logical resource usage: 13.0/16 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name                status      loc                train_loop_config/batch_size  train_loop_config/hidden_size  train_loop_config/lr  iter  total time (s)  train_loss  train_loss_step  train_loss_epoch
TorchTrainer_9dac4_00005  RUNNING     10.1.0.82:2609608  64                            512                            0.00346566            3     27.4821         0.816097    0.761981         0.816097
TorchTrainer_9dac4_00006  PENDING                        64                            512                            0.00046246
TorchTrainer_9dac4_00000  TERMINATED  10.1.0.82:2607915  128                           512                            0.000260623           20    98.3711         0.482704    0.594142         0.482704
TorchTrainer_9dac4_00001  TERMINATED  10.1.0.82:2608275  64                            256                            0.000205668           1     20.7985         1.66818     1.51793          1.66818
TorchTrainer_9dac4_00002  TERMINATED  10.1.0.82:2608579  128                           128                            0.0271474             1     16.1293         2.45891     2.30317          2.45891
TorchTrainer_9dac4_00003  TERMINATED  10.1.0.82:2608900  128                           128                            0.0450408             1     16.8405         3.40087     2.3093           3.40087
TorchTrainer_9dac4_00004  TERMINATED  10.1.0.82:2609227  32                            128                            0.00235161            20    134.631         0.117854    7.29805e-05      0.117854

print("Best Configuration:", best_result.config)
print("Minimum loss: ", best_result.metrics["train_loss"])
print(f"Time Taken: {comp_time} seconds")

# Collect the metrics dataframe of every trial returned by `tuner.fit()`
dfs = {result.path: result.metrics_dataframe for result in results}

# Plot the training loss of each trial by epoch
ax = None
for trial_path, d in dfs.items():
    # `label` names the curve in the legend; the trial directory name serves as a short identifier
    ax = d.plot(x="training_iteration", y="train_loss", ax=ax, label=os.path.basename(trial_path))
    #ax.set_ylim(0,2)
plt.xlabel("Epoch")
plt.ylabel("Train Loss")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Exercise

  • Try to set different resources per trial

  • Try to select different hyperparameters to tune

Bonus: Hyperparameter tuning using Ray (ML framework)

Let’s see an example of using Ray Tune to optimize hyperparameters for a machine learning model using XGBoost. XGBoost is a popular and efficient gradient boosting framework, and Ray Tune can help find the best hyperparameters for it.

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is a powerful and widely used machine learning algorithm, particularly effective on structured (tabular) data for both classification and regression problems. It’s an implementation of gradient boosting that focuses on both performance and computational efficiency.

Gradient Boosting is an ensemble learning technique that builds a series of decision trees in a sequential manner, where each tree corrects the errors of its predecessor. In simple terms:

  • First Tree: The algorithm builds an initial tree based on the training data.

  • Subsequent Trees: Each following tree is trained to predict the residual errors of the previous trees, effectively correcting the mistakes made by the prior trees.

In the case of regression, the goal is to minimize the difference between predicted and actual values, typically using a loss function such as Mean Squared Error (MSE).
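
The residual-fitting idea described above can be sketched in plain Python. This is only a toy illustration with depth-1 trees (stumps) on a single feature and made-up data; the helper names (fit_stump, fit_boosted, and so on) are hypothetical, and XGBoost itself uses a far more elaborate algorithm.

```python
# Toy gradient boosting for squared-error regression with depth-1 trees.
# Teaching sketch only: this is not how XGBoost is implemented internally.

def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on one feature (needs >= 2 distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up data, roughly y = x with some noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

errors = {n: mse(fit_boosted(xs, ys, n_rounds=n), xs, ys) for n in (0, 1, 20)}
for n, err in errors.items():
    print(f"{n:>2} trees -> training MSE {err:.4f}")
```

The training error shrinks as trees are added because each new stump is fitted to what the previous ones got wrong. The learning rate and the number of rounds in this sketch play the same role as the learning_rate and num_boost_round settings in the XGBoost cells below.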

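In the following example we’ll use scikit-learn’s built-in Wine dataset (load_wine, originally from UCI), which contains 178 samples with 13 physicochemical attributes and a cultivar label (0, 1 or 2). To keep the example small, we treat that numeric label as the regression target and try to predict it from the attributes.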

We are going to look at three variations of the same exercise:

  1. Ray Tune specifying the search algorithm

  2. Ray Tune using a random search

  3. How does Ray Tune compare to Ray Core?
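
The first two variations differ only in how candidate configurations are proposed. As a rough illustration of the random-search case, the plain-Python sketch below draws configurations from distributions analogous to the tune.randint, tune.loguniform and tune.uniform entries used in the cells that follow, and keeps the best one. The toy_objective function is a hypothetical stand-in for a validation RMSE, not a real model, and none of this is Ray Tune's actual implementation.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from a space shaped like the one used below."""
    return {
        # Integer in [3, 10): upper bound excluded, like tune.randint(3, 10)
        "max_depth": rng.randrange(3, 10),
        # Uniform in log space, analogous to tune.loguniform(0.01, 0.3)
        "learning_rate": math.exp(rng.uniform(math.log(0.01), math.log(0.3))),
        # Uniform in [0.5, 1.0], analogous to tune.uniform(0.5, 1.0)
        "subsample": rng.uniform(0.5, 1.0),
        "colsample_bytree": rng.uniform(0.5, 1.0),
    }


def toy_objective(config):
    """Hypothetical score to minimize; stands in for a validation RMSE."""
    return (
        abs(math.log10(config["learning_rate"]) + 1.0)  # smallest near lr = 0.1
        + 0.05 * abs(config["max_depth"] - 6)
        + (1.0 - config["subsample"])
    )


rng = random.Random(1234)  # fixed seed so the draw is repeatable
trials = [sample_config(rng) for _ in range(50)]
best = min(trials, key=toy_objective)

print("Best sampled config:", best)
print("Toy score:", round(toy_objective(best), 4))
```

Random search draws every configuration independently of the others. A search algorithm such as HyperOpt instead uses the scores of earlier trials to decide where to sample next, which is the difference between variations 1 and 2.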

  1. Ray Tune specifying the search algorithm

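# Imports used in this section (harmless to repeat if already imported above)
import time
import numpy as np
import ray.train
import xgboost as xgb
from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch
from sklearn.datasets import load_wine
from sklearn.metrics import root_mean_squared_error
from sklearn.model_selection import train_test_split

# Load the dataset and hold out 20% as a test set
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)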

# Define the training function (trainable)
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Report the validation RMSE to Ray Tune
    ray.train.report({'rmse':val_rmse})

np.random.seed(1234)
# Define the hyperparameter search space
search_space = {
    "max_depth": tune.randint(3, 10),
    "learning_rate": tune.loguniform(0.01, 0.3),
    "subsample": tune.uniform(0.5, 1.0),
    "colsample_bytree": tune.uniform(0.5, 1.0),
}

# Measure execution time for Ray Tune
start_time = time.time()

# Define the search algorithm
search_algorithm = HyperOptSearch(  
    metric="rmse",
    mode="min",
    random_state_seed=1234
)
# Run the tuning process
tuner = tune.Tuner(
    train_xgboost,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        search_alg=search_algorithm,
        num_samples=50,
    )
)

results = tuner.fit()

end_time = time.time()

# Extract the best configuration
best_result = results.get_best_result(metric="rmse", mode="min")
best_config = best_result.config
best_rmse = best_result.metrics["rmse"]
comp_time = end_time-start_time
print("=== Ray Tune search algorithm ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {comp_time:.2f} seconds")

  2. Ray Tune using a random search

# Load the dataset
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define the training function
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Report the validation RMSE to Ray Tune
    ray.train.report({'rmse':val_rmse})

np.random.seed(1234)
# Define the hyperparameter search space
search_space = {
    "max_depth": tune.randint(3, 10),
    "learning_rate": tune.loguniform(0.01, 0.3),
    "subsample": tune.uniform(0.5, 1.0),
    "colsample_bytree": tune.uniform(0.5, 1.0),
}

# Measure execution time for Ray Tune
start_time = time.time()
 
np.random.seed(1234)
# Run Ray Tune
tuner = tune.Tuner(
    train_xgboost,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="rmse",
        mode="min",
        num_samples=50,
    ),
)

results = tuner.fit()

end_time = time.time()

# Extract the best configuration
best_result = results.get_best_result(metric="rmse", mode="min")
best_config = best_result.config
best_rmse = best_result.metrics["rmse"]
comp_time = end_time - start_time
print("=== Ray Tune random search ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {comp_time:.2f} seconds")

  3. How does Ray Tune compare to Ray Core?

# Load the dataset
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define the training function
@ray.remote
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Return the validation RMSE
    return {"config": config, "rmse": val_rmse}

# Define the hyperparameter configurations manually
configs = [
    {"max_depth": d, "learning_rate": lr, "subsample": ss, "colsample_bytree": cs}
    for d in range(3, 10)
    for lr in [0.01, 0.05, 0.1, 0.2]
    for ss in [0.6, 0.8, 1.0]
    for cs in [0.6, 0.8, 1.0]
]

# Measure execution time for Ray Core
start_time = time.time()

# Submit tasks to Ray
futures = [train_xgboost.remote(config) for config in configs]
results = ray.get(futures)

end_time = time.time()

# Find the best result
best_result = min(results, key=lambda x: x["rmse"])
best_config = best_result["config"]
best_rmse = best_result["rmse"]

print("=== Ray Core ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {end_time - start_time:.2f} seconds")
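
When comparing the timings, keep in mind that the Ray Core variant evaluates a full grid while the two Ray Tune runs evaluate 50 sampled configurations each. The short sketch below counts the grid defined above (7 × 4 × 3 × 3 combinations); the variable names are illustrative only.

```python
from itertools import product

# Same value lists as the manual grid in the Ray Core cell
max_depths = range(3, 10)                # 7 values
learning_rates = [0.01, 0.05, 0.1, 0.2]  # 4 values
subsamples = [0.6, 0.8, 1.0]             # 3 values
colsamples = [0.6, 0.8, 1.0]             # 3 values

grid = list(product(max_depths, learning_rates, subsamples, colsamples))
n_grid = len(grid)
n_tune_samples = 50  # num_samples used in the two Ray Tune runs

print(f"Grid configurations: {n_grid}")
print(f"Ray Tune samples:    {n_tune_samples}")
print(f"Ratio: {n_grid / n_tune_samples:.2f}x")
```

The grid holds 252 configurations, about five times as many trainings as either Ray Tune run, so the wall-clock times and best RMSE values are not a like-for-like comparison. The grid also restricts each hyperparameter to a few fixed values, whereas Ray Tune samples from continuous ranges.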

Release resources

# Disconnect the worker and terminate the processes started by ray.init()
ray.shutdown()