Ray Tune

In this notebook we are going to explore the following topics:

What is hyperparameter tuning?
Core concepts to understand Ray Tune
Getting started with Ray Tune
Hyperparameter tuning using Ray (DL framework)
Hyperparameter tuning using schedulers (ASHA)
Bonus: Hyperparameter tuning using Ray (ML framework)

Reference: https://docs.ray.io/en/latest/tune/index.html

What is hyperparameter tuning?

First, it is important to understand what hyperparameter tuning is and why it matters.

Hyperparameter tuning is the process of finding the best configuration for your Machine Learning model, like choosing the right “settings” to get the best performance.

Hyperparameters are the parameters that are not learned from the data but are set before the training process begins. These parameters control the behavior of the learning algorithm and the structure of the model.

Examples of hyperparameters:

Learning rate: determines how quickly the model adjusts its parameters during training.
Number of layers/neurons: specifies the architecture of a neural network.
Batch size: controls the number of samples processed before the model updates its parameters.
Number of trees: for ensemble models like Random Forests, this indicates the number of trees in the forest.
Regularization parameters: control overfitting by adding penalties (e.g., L1 or L2 regularization).

How does it work?

You train a model with different combinations of learning rate, number of layers, batch size, regularization parameters, … and you measure performance (accuracy, loss) each time. The goal is to find the combination that gives the best result.

However, trying every combination manually or with loops is slow and inefficient, especially for big models or large datasets.
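To make the cost of manual tuning concrete, here is a minimal sketch of an exhaustive search written with plain loops. The candidate values and the `toy_score` function are hypothetical stand-ins for a real training run; the point is only that the number of runs is the product of the number of options per hyperparameter, and that the runs execute one after another.

```python
import itertools

# Hypothetical candidate values for three hyperparameters.
grid = {
    "lr": [1e-3, 1e-2, 1e-1],
    "batch_size": [32, 64, 128],
    "hidden_size": [128, 256, 512],
}

def toy_score(lr, batch_size, hidden_size):
    """Stand-in for 'train the model and return its loss' (lower is better)."""
    return (
        (lr - 1e-2) ** 2
        + (batch_size - 64) ** 2 * 1e-6
        + (hidden_size - 256) ** 2 * 1e-8
    )

# Exhaustive search: one "training run" per combination, executed sequentially.
combinations = list(itertools.product(*grid.values()))
results = [(toy_score(*combo), combo) for combo in combinations]
best_loss, best_combo = min(results)

print(f"{len(combinations)} runs needed")
print("best:", dict(zip(grid, best_combo)))
```

Adding a fourth hyperparameter with three options would triple the number of runs, which is why parallel execution and smarter search strategies matter.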

What is Ray Tune?

Ray Tune is a scalable library for hyperparameter tuning and experiment execution. It is part of the Ray ecosystem and it allows users to optimize machine learning models by efficiently searching through hyperparameter spaces using various algorithms.

Ray Tune automates and accelerates hyperparameter tuning through parallel execution (running many trials simultaneously) and smart search (using techniques such as Bayesian optimization, HyperOpt and ASHA to avoid wasting time on poor configurations).

Why use Ray Tune?

Scalability: easily scales from a single machine to a distributed cluster.
Flexibility: supports various search algorithms (e.g., Grid Search, Random Search, Bayesian Optimization, Hyperband).
Integration: works seamlessly with popular machine learning frameworks like TensorFlow, PyTorch, and scikit-learn.

Core concepts

Main concepts to understand Ray Tune:

Search Space: defines the hyperparameters and the values that Ray Tune will explore during optimization, for example learning rates, the number of layers in a neural network, or regularization strengths. Values can be given as ranges, discrete options, or more complex combinations.
Trainable: the object that Ray Tune uses to run a trial. It defines how to evaluate an objective function by combining hyperparameters and the computational process to calculate the objective value to be optimized. It essentially encapsulates the logic for training a model and reporting intermediate results to Ray Tune. This could be a function or a class.
Search Algorithm: the strategy for exploring the search space (e.g., random search, Bayesian optimization).
Scheduler: algorithms that help to optimize resource usage during hyperparameter tuning, often by stopping poor-performing trials early.
Trials: individual runs of the tuning process.
ResultGrid: collects the results of your experiment and provides utilities to inspect and analyze them. You can retrieve the best trial, view metrics, and even fetch configurations.

Fig1: Core components of Ray Tune

Now that we understand what hyperparameter tuning is and how Ray Tune works, let’s use Ray Tune to find the best settings for a simple model. You’ll see how easy it is to go from slow manual tuning to distributed, optimized search.

Getting started with Ray Tune

# Environment configuration
from datetime import datetime
import ray

# Ray Tune
from ray import tune
from ray.air import session

!which python

/leonardo_work/tra26_castiel2/mviscia1/ray_rag_venv/bin/python

Connecting to the Ray cluster started previously.

ray.init(log_to_driver=False, ignore_reinit_error=True)

2026-01-29 12:43:00,069	INFO worker.py:1520 -- Using address 10.1.0.82:27667 set in the environment variable RAY_ADDRESS
2026-01-29 12:43:00,070	INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.1.0.82:27667...
2026-01-29 12:43:00,079	INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at http://10.1.0.82:8265 

Check cluster resources

resources = ray.cluster_resources()

print(f"Cluster has {resources['CPU']} CPUs, {resources['GPU'] if 'GPU' in resources else 0} GPUs, execution memory {resources['memory'] * 1e-9} GBs, object storage memory {resources['object_store_memory'] * 1e-9} GBs")

Cluster has 16.0 CPUs, 2.0 GPUs, execution memory 359.45678684200004 GBs, object storage memory 154.05290864600002 GBs

Before we jump into tuning Machine Learning models, let’s start with something simple and visual.

Imagine we want to find the point closest to (3, 3) from a list of candidates. This has nothing to do with training models, but it will help us focus on how Ray Tune works:

How do we define the trainable function?

How do we define the search space?

How do we evaluate each candidate?

How does Ray Tune find the best one?

Define all the ingredients to tune the hyperparameters

1) Define the trainable function:

The trainable is a flexible wrapper that can orchestrate processes like simulation, ML model training, or real-world optimization problems. Therefore, we define the model training function that we want to run variations of.

The trainable function:

Takes as input a dictionary with the parameter configuration chosen by Ray Tune. In this example, the dictionary contains the values of x and y that Ray Tune selects for that trial.
Evaluates the objective function using the given parameters.
Reports a dictionary of results back to Ray Tune. The trainable is executed on a separate Ray Actor (process), so we need to communicate the performance of the model back to Tune (which runs in the main Python process).

# Example: finding the closest point to (3,3)

# Define the trainable function
def trainable(config): # The argument config is a dictionary containing the input parameters that Ray Tune will vary during the optimization process
    
    # Extract parameters from the config dict
    x = config["x"]
    y = config["y"]
    
    # Objective function: minimize the distance from (3, 3)
    loss = (x - 3)**2 + (y - 3)**2
    
    # Report the result to Ray Tune
    session.report({'loss':loss}) #train.report
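Before handing the trainable to Ray Tune, it can help to check the objective on its own. The sketch below uses a hypothetical helper, `squared_distance_loss`, that computes the same quantity as the trainable above but returns it instead of reporting it, so it runs without a Ray session.

```python
def squared_distance_loss(config):
    """Same objective as the trainable, returned instead of reported."""
    x, y = config["x"], config["y"]
    return (x - 3) ** 2 + (y - 3) ** 2

print(squared_distance_loss({"x": 3.0, "y": 3.0}))  # 0.0, the optimum
print(squared_distance_loss({"x": 0.0, "y": 0.0}))  # 18.0
```

The loss is the squared Euclidean distance to (3, 3), so it is zero only at the target point and grows as the candidate moves away from it.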

2) Define a search space:

The search space defines the parameter ranges for Ray Tune to explore. Ray supports various sampling techniques such as:

tune.uniform: uniform distribution.
tune.grid_search: exhaustive search over a grid.
tune.loguniform: logarithmic distribution.
tune.choice: choose one of these options uniformly
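The difference between a uniform and a log-uniform distribution is easy to see with the standard library. The sketch below does not call Ray. It assumes the usual definition of log-uniform sampling (draw uniformly in log-space, then exponentiate), and the helper names are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_uniform(low, high):
    return rng.uniform(low, high)

def sample_loguniform(low, high):
    # Uniform in log-space, then mapped back to the original scale
    return math.exp(rng.uniform(math.log(low), math.log(high)))

n = 10_000
uniform_draws = [sample_uniform(1e-4, 1e-1) for _ in range(n)]
loguniform_draws = [sample_loguniform(1e-4, 1e-1) for _ in range(n)]

# Share of draws that fall below 1e-2 in each case
share_small_uniform = sum(v < 1e-2 for v in uniform_draws) / n
share_small_loguniform = sum(v < 1e-2 for v in loguniform_draws) / n
print(share_small_uniform, share_small_loguniform)
```

With a uniform distribution roughly one draw in ten falls below 1e-2, while the log-uniform distribution places about two thirds of its draws there. This is why log-uniform sampling is a common choice for parameters such as the learning rate, whose useful values span several orders of magnitude.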

For this example, we’ll search for parameters x and y within a defined range.

# Define the search space
# This tells Ray Tune which values it should explore for each parameter.

search_space = {
    "x": tune.uniform(-10, 10),  # Ray will sample 'x' from a uniform distribution between -10 and 10
    "y": tune.uniform(-10, 10),  # Similarly, 'y' will be sampled from -10 to 10
}

3) Execute hyperparameter tuning and generate trials

Using Ray Tune, we explore the search space to find the best configuration.

The Tuner object coordinates the tuning process.
Specify the metric and mode to optimize (e.g., minimize loss).
Specify the number of trials using num_samples. Tune automatically determines how many trials will run in parallel.

# Run the tuning process
tuner = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space,  # Search space
    tune_config=tune.TuneConfig(
        metric="loss", 
        mode="min",
        # search_alg=search_algorithm, # Include search algorithm
        num_samples=15,  # Number of trials to run ## Try different values: what happens if you set num_samples=50?
    )
)

results = tuner.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result = results.get_best_result(metric="loss", mode="min")
print("Best configuration:", best_result.config)
print("Minimum loss:", best_result.metrics["loss"])

Tune Status

Current time:	2026-01-29 12:43:08
Running for:	00:00:05.37
Memory:	30.6/502.9 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 2.0/16 CPUs, 0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name	status	loc	x	y	iter	total time (s)	loss
trainable_ab2a1_00000	TERMINATED	10.1.0.82:2605381	-3.18218	-7.35987	1	0.00111055	145.546
trainable_ab2a1_00001	TERMINATED	10.1.0.82:2605397	7.80951	-2.88309	1	0.000756979	57.7421
trainable_ab2a1_00002	TERMINATED	10.1.0.82:2605390	4.15512	-7.6532	1	0.000977755	114.825
trainable_ab2a1_00003	TERMINATED	10.1.0.82:2605400	-6.63206	1.8459	1	0.000984907	94.1085
trainable_ab2a1_00004	TERMINATED	10.1.0.82:2605395	0.706565	9.76353	1	0.00207734	51.0051
trainable_ab2a1_00005	TERMINATED	10.1.0.82:2605396	-5.762	-6.43398	1	0.00127625	165.773
trainable_ab2a1_00006	TERMINATED	10.1.0.82:2605391	-7.1445	-6.92281	1	0.00159645	201.373
trainable_ab2a1_00007	TERMINATED	10.1.0.82:2605394	-3.61454	4.41909	1	0.00181293	45.766
trainable_ab2a1_00008	TERMINATED	10.1.0.82:2605398	3.14046	-4.96974	1	0.000910521	63.5365
trainable_ab2a1_00009	TERMINATED	10.1.0.82:2605392	4.04479	5.08997	1	0.00116181	5.45955
trainable_ab2a1_00010	TERMINATED	10.1.0.82:2605427	6.30511	-4.17758	1	0.000923157	62.4415
trainable_ab2a1_00011	TERMINATED	10.1.0.82:2605393	-6.90388	9.8905	1	0.00111842	145.566
trainable_ab2a1_00012	TERMINATED	10.1.0.82:2605409	9.64151	9.69836	1	0.000640392	88.9776
trainable_ab2a1_00013	TERMINATED	10.1.0.82:2605399	-5.33	0.332167	1	0.00115252	76.5063
trainable_ab2a1_00014	TERMINATED	10.1.0.82:2605426	0.533074	2.39244	1	0.000982285	6.45485

2026-01-29 12:43:08,053	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/leonardo/home/userinternal/mviscia1/ray_results/trainable_2026-01-29_12-43-02' in 0.0422s.
2026-01-29 12:43:08,061	INFO tune.py:1041 -- Total run time: 5.39 seconds (5.33 seconds for the tuning loop).

Best configuration: {'x': 4.044788577836298, 'y': 5.089966972308293}
Minimum loss: 5.459545117716688
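The run above can be mimicked in plain Python to see what `num_samples` changes. This is only a sketch of random search with the standard library, not what Ray Tune executes internally; `loss_fn` and `random_search` are hypothetical helpers, and the trials run sequentially rather than in parallel.

```python
import random

def loss_fn(x, y):
    return (x - 3) ** 2 + (y - 3) ** 2

def random_search(num_samples, seed=0):
    """Sample configurations uniformly in [-10, 10]^2 and keep the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        config = {"x": rng.uniform(-10, 10), "y": rng.uniform(-10, 10)}
        trials.append((loss_fn(**config), config))
    return min(trials, key=lambda trial: trial[0])

best_15 = random_search(15)
best_50 = random_search(50)  # same seed: the first 15 samples are identical
print("best of 15:", best_15)
print("best of 50:", best_50)

# The best configuration reported in the output above gives the same loss
reported_loss = loss_fn(4.044788577836298, 5.089966972308293)
print(reported_loss)
```

Because both searches share a seed, the 50-sample search contains the 15-sample search, so its best loss can only be equal or lower. More samples cover the space more densely at the price of more trials to run.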

By default, Ray Tune runs N concurrent trials, where N equals the number of available CPUs (cores) on your machine. You can cap the number of concurrent trials with the max_concurrent_trials parameter of tune.TuneConfig, which is passed to the Tuner.

If you need to customize the resource allocation per trial, you can use tune.with_resources. This allows you to explicitly specify the resources (e.g., CPUs, GPUs, or memory) that each trial requires. You can provide these resource requests either as a dictionary or using a PlacementGroupFactory object.

For every trial, Ray Tune will attempt to create a placement group based on the specified resource requirements, ensuring that your trials run with the necessary resources.
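The effect of a per-trial resource request on parallelism is simple arithmetic. The sketch below is a simplified model: the helper `max_concurrent_trials` is hypothetical (it is not a Ray function) and it ignores other limits such as memory or an explicit cap on concurrent trials.

```python
def max_concurrent_trials(cluster, per_trial):
    """Upper bound on trials that fit at once for a given per-trial request."""
    return min(
        int(cluster.get(name, 0) // amount)
        for name, amount in per_trial.items()
        if amount > 0
    )

# 8 CPUs, 2 CPUs per trial
print(max_concurrent_trials({"cpu": 8}, {"cpu": 2}))  # 4

# 8 CPUs and 1 GPU, each trial asks for 2 CPUs and 1 GPU
print(max_concurrent_trials({"cpu": 8, "gpu": 1}, {"cpu": 2, "gpu": 1}))  # 1
```

The scarcest resource sets the limit: in the second case the single GPU allows one trial at a time even though the CPUs alone would allow four.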

%%script false --no-raise-error

# Specify resources to be used for each trial (if you have 8 CPUs on your machine, this will run 4 concurrent trials at a time)
trainable_with_resources = tune.with_resources(trainable, {"cpu": 2})

# Run the tuning process
tuner = tune.Tuner(
    trainable_with_resources,
    param_space=search_space,  # the trainable reads config["x"] and config["y"]
    tune_config=tune.TuneConfig(
        num_samples=10
    )
)

results = tuner.fit()

%%script false --no-raise-error

# Specify resources to be used for each trial (if you have 8 CPUs and 1 GPU on your machine, this will run 1 trial at a time)
trainable_with_resources = tune.with_resources(trainable, {"cpu": 2, "gpu":1})

# Run the tuning process
tuner = tune.Tuner(
    trainable_with_resources,
    param_space=search_space,  # the trainable reads config["x"] and config["y"]
    tune_config=tune.TuneConfig(
        num_samples=10
    )
)

results = tuner.fit()

We can also pass a search algorithm to the Tuner through TuneConfig (the default is random search). In this example we’re using HyperOptSearch, a search algorithm plugin for Ray Tune that wraps Hyperopt, a widely used library for Bayesian optimization. It helps Ray Tune choose parameter values intelligently over time by learning from previous trials, rather than just trying random configurations.

%%script false --no-raise-error

from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.hyperopt import HyperOptSearch 

# Define the search algorithm
search_algorithm = HyperOptSearch(
    metric="loss",
    mode="min",
    n_initial_points=16
)

search_algorithm = ConcurrencyLimiter(search_algorithm, max_concurrent=4)


# Run the tuning process
tuner = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space,  # Search space
    tune_config=tune.TuneConfig(
        search_alg=search_algorithm, # Include search algorithm
        num_samples=10,  # Number of trials to run,
    )
)

results = tuner.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result = results.get_best_result(metric="loss", mode="min")
print("Best configuration:", best_result.config)
print("Minimum loss:", best_result.metrics["loss"])

Note that HyperOptSearch has an internal parallelism constraint.

Hyperparameter tuning using Ray (DL framework)

In the following example, we’ll explore the use of Ray Tune for hyperparameter tuning in a Deep Learning framework. In particular, we’ll see how to implement a tuning process for a PyTorch Lightning classifier on the CIFAR-10 dataset.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes (6,000 images per class). It is widely used in computer vision research. The 10 classes include airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

The neural network aims to classify the images into one of the 10 categories. It takes a 32x32 image as input, processes it through a sequence of layers, and outputs a probability distribution over the 10 classes. The network’s performance is evaluated using cross-entropy loss, a common metric for classification tasks.
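To get a feeling for the scale of cross-entropy values, the loss of a single sample can be computed by hand. The sketch below uses only the standard library; `cross_entropy` is a hypothetical helper that follows the usual definition, the negative log of the softmax probability assigned to the true class.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 10      # the model has no preference among 10 classes
confident_logits = [0.0] * 10
confident_logits[3] = 5.0        # the model strongly favours class 3

print(cross_entropy(uniform_logits, target=3))    # ln(10), about 2.3026
print(cross_entropy(confident_logits, target=3))  # small: confident and right
print(cross_entropy(confident_logits, target=0))  # large: confident and wrong
```

A 10-class classifier that guesses uniformly has a loss of ln(10) ≈ 2.30, which is a useful baseline: a training loss that stays near or above this value suggests the model has not learned anything useful with that configuration.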

# Ray Tune on ML framework
import time
import xgboost as xgb
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import root_mean_squared_error
from ray.tune.search.hyperopt import HyperOptSearch
import numpy as np

# Ray Tune on DL framework
import os
from ray.train.torch import TorchTrainer
from ray.tune.integration.pytorch_lightning import TuneReportCallback
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, random_split
import pytorch_lightning as pl
import ray.train.lightning
import torch
from torch import nn
from ray.tune.schedulers import ASHAScheduler

# Define the PyTorch Lightning model
class CIFAR10Classifier(pl.LightningModule):
    def __init__(self, hidden_size, lr):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10),
        )
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss, on_step=False, on_epoch=True, prog_bar=True) #, sync_dist=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Define the training function for Ray Tune
def train_func(train_loop_config):
    config = train_loop_config  # alias; can be removed if the argument is renamed to config
    # Data preparation
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    data_dir = "/leonardo_scratch/fast/tra26_castiel2/data/data_cifar10"
    dataset = datasets.CIFAR10(root=data_dir, train=True, transform=transform, download=False)
    train_dataset, val_dataset = random_split(dataset, [45000, 5000])
    train_loader = DataLoader(train_dataset, batch_size=int(config["batch_size"]), shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=int(config["batch_size"]))

    # Define the model
    model = CIFAR10Classifier(hidden_size=config["hidden_size"], lr=config["lr"])
  
    # Define the PyTorch Lightning trainer
    trainer = pl.Trainer(
        devices="auto",
        accelerator="auto",
        strategy=ray.train.lightning.RayDDPStrategy(),
        max_epochs=10,
        plugins=[ray.train.lightning.RayLightningEnvironment()],
        callbacks=[ray.train.lightning.RayTrainReportCallback()],
        # [1a] Optionally, disable the default checkpointing behavior
        # in favor of the `RayTrainReportCallback` above.
        enable_checkpointing=False,
        logger=False,
        enable_progress_bar=True,
    )

    # Train the model
    trainer = ray.train.lightning.prepare_trainer(trainer)
    trainer.fit(model, train_loader, val_loader)

start_time = time.time()

checkpoint_dir = "/leonardo_scratch/large/userinternal/mviscia1/ray_checkpoint" ### Change to '/leonardo_scratch/large/userexternal/<your HPC username>'

scaling_config = ray.train.ScalingConfig(
    num_workers=2,              
    use_gpu=True,               
    resources_per_worker={      
        "CPU": 6,               
        "GPU": 1
    }
)

# Define a TorchTrainer without hyper-parameters for Tuner
ray_trainer = TorchTrainer(
    train_loop_per_worker=train_func, #train_func
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path=checkpoint_dir)
)

# Define the hyperparameter search space
search_space = {
    "hidden_size": tune.choice([128, 256, 512]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

# Trigger hyperparameter tuning with Ray Tune
tuner = tune.Tuner(
    ray_trainer, #tune.with_parameters(train_func),
    param_space={"train_loop_config": search_space}, #param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="train_loss",
        mode="min",
        num_samples=4,  # Number of trials
        # max_concurrent_trials=2
    ),
)

results = tuner.fit()

end_time = time.time()
best_result = results.get_best_result(metric="train_loss", mode="min")
comp_time = end_time-start_time

Tune Status

Current time:	2026-01-29 12:48:36
Running for:	00:05:18.23
Memory:	31.4/502.9 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 13.0/16 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name	status	loc	train_loop_config/batch_size	train_loop_config/hidden_size	train_loop_config/lr	iter	total time (s)	train_loss	epoch	step
TorchTrainer_b452e_00000	TERMINATED	10.1.0.82:2606482	32	256	0.0159635	10	65.4758	4.85645	9	7040
TorchTrainer_b452e_00001	TERMINATED	10.1.0.82:2606872	32	256	0.000937633	10	68.9653	0.971001	9	7040
TorchTrainer_b452e_00002	TERMINATED	10.1.0.82:2607201	32	256	0.0493233	10	70.5082	15.02	9	7040
TorchTrainer_b452e_00003	TERMINATED	10.1.0.82:2607526	128	512	0.000523825	10	51.814	0.94413	9	1760

2026-01-29 12:48:36,276	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/leonardo_scratch/large/userinternal/mviscia1/ray_checkpoint/TorchTrainer_2026-01-29_12-43-18' in 0.0083s.
2026-01-29 12:48:36,281	INFO tune.py:1041 -- Total run time: 318.24 seconds (318.22 seconds for the tuning loop).
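The step column in the trial table can be reproduced with a little arithmetic. The sketch below rests on two assumptions that are not stated in the notebook: the 45,000 training samples are split evenly across the 2 distributed workers, and the last partial batch of each epoch is kept. The helper `optimizer_steps` is hypothetical.

```python
import math

def optimizer_steps(num_samples, num_workers, batch_size, epochs):
    per_worker = math.ceil(num_samples / num_workers)     # samples seen by each worker
    steps_per_epoch = math.ceil(per_worker / batch_size)  # last partial batch is kept
    return steps_per_epoch * epochs

print(optimizer_steps(45_000, 2, 32, 10))   # 7040
print(optimizer_steps(45_000, 2, 128, 10))  # 1760
```

Under these assumptions a batch size of 32 takes four times as many optimizer steps as a batch size of 128 for the same number of epochs, which is consistent with the longer total time of the batch-size-32 trials.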

import pandas as pd
import matplotlib.pyplot as plt

all_metrics = []

for i, result in enumerate(results):
    df = result.metrics_dataframe.copy()
    df["trial_id"] = f"Trial_{i+1}"
    df["config"] = str(result.config)
    all_metrics.append(df)

all_results_df = pd.concat(all_metrics, ignore_index=True)

plt.figure(figsize=(10, 6))

# Group by trial and plot
for trial_id, trial_df in all_results_df.groupby("trial_id"):
    plt.plot(trial_df["training_iteration"], trial_df["train_loss"], label=trial_id)

plt.xlabel("Epoch")
plt.ylabel("Train Loss")
plt.title("Loss curves for all trials")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

[Figure: loss curves for all trials (x-axis: Epoch, y-axis: Train Loss)]

df.head()

	train_loss	epoch	step	timestamp	checkpoint_dir_name	should_checkpoint	done	training_iteration	trial_id	date	...	time_total_s	pid	hostname	node_ip	time_since_restore	iterations_since_restore	config/train_loop_config/hidden_size	config/train_loop_config/lr	config/train_loop_config/batch_size	config
0	1.640035	0	176	1769687279	checkpoint_000000	True	False	1	Trial_4	2026-01-29_12-47-59	...	16.496177	2607526	lrdn0082.leonardo.local	10.1.0.82	16.496177	1	512	0.000524	128	{'train_loop_config': {'hidden_size': 512, 'lr...
1	1.433246	1	352	1769687283	checkpoint_000001	True	False	2	Trial_4	2026-01-29_12-48-03	...	20.433736	2607526	lrdn0082.leonardo.local	10.1.0.82	20.433736	2	512	0.000524	128	{'train_loop_config': {'hidden_size': 512, 'lr...
2	1.339727	2	528	1769687287	checkpoint_000002	True	False	3	Trial_4	2026-01-29_12-48-07	...	24.356834	2607526	lrdn0082.leonardo.local	10.1.0.82	24.356834	3	512	0.000524	128	{'train_loop_config': {'hidden_size': 512, 'lr...
3	1.261192	3	704	1769687290	checkpoint_000003	True	False	4	Trial_4	2026-01-29_12-48-11	...	28.264959	2607526	lrdn0082.leonardo.local	10.1.0.82	28.264959	4	512	0.000524	128	{'train_loop_config': {'hidden_size': 512, 'lr...
4	1.191230	4	880	1769687294	checkpoint_000004	True	False	5	Trial_4	2026-01-29_12-48-14	...	32.200040	2607526	lrdn0082.leonardo.local	10.1.0.82	32.200040	5	512	0.000524	128	{'train_loop_config': {'hidden_size': 512, 'lr...

5 rows × 21 columns

print(f"Best Configuration: ", best_result.config)
print(f"Minimum loss: ", best_result.metrics["train_loss"])
print(f"Time Taken: {comp_time} seconds")

Best Configuration:  {'train_loop_config': {'hidden_size': 512, 'lr': 0.000523825235557825, 'batch_size': 128}}
Minimum loss:  0.9441303610801697
Time Taken: 318.2735798358917 seconds

Exercise

Change the scaling configuration and check how the training scales. You can try the following scaling configurations:

2 workers with 6 CPUs and 1 GPU each
1 worker with 6 CPUs and 1 GPU

How many trials will run in parallel with each of the two configurations?

Hyperparameter tuning using schedulers (ASHA)

Schedulers help manage trial execution by optimizing resource usage and terminating poorly performing trials early.

Common schedulers include:

ASHAScheduler: Asynchronous Successive Halving Algorithm, which prunes poorly performing trials.
MedianStoppingRule: stops a trial if its performance falls below the median of the other trials at a comparable point in training.
PopulationBasedTraining: mutates hyperparameters dynamically during training.

%%script false --no-raise-error

from ray.tune.schedulers import ASHAScheduler

# Define a scheduler
scheduler = ASHAScheduler(
    metric="loss",  # Metric to optimize
    mode="min",     # Minimizing the loss
    max_t=10,       # Maximum iterations
    grace_period=1, # Minimum iterations before stopping
    reduction_factor=2  # Factor for halving trials
)

# Define the search space
search_space = {
    "x": tune.uniform(-10, 10),  # Range for parameter x
    "y": tune.uniform(-10, 10),  # Range for parameter y
}

# Run tuning process with the scheduler
tuner_with_scheduler = tune.Tuner(
    trainable, # Trainable function
    param_space=search_space, # Search space
    tune_config=tune.TuneConfig(
        num_samples=50, # Number of trials to run
        scheduler=scheduler
    )
)

results_with_scheduler = tuner_with_scheduler.fit() # Execute and manage hyperparameter tuning and generate your trials

# Retrieve the best trial
best_result_with_scheduler = results_with_scheduler.get_best_result(metric="loss", mode="min")
print("Best Configuration with Scheduler:", best_result_with_scheduler.config)
print("Best Loss with Scheduler:", best_result_with_scheduler.metrics["loss"])

The ASHA (Asynchronous Successive Halving Algorithm) scheduler is a powerful tool to improve hyperparameter tuning efficiency. It stops poorly performing trials early, so computational resources are focused on the trials that are likely to succeed. ASHA operates asynchronously, meaning it doesn’t block other trials from running while it evaluates the progress of ongoing trials.
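The idea of successive halving can be sketched in a few lines. This is a simplified illustration, not Ray's implementation: `asha_rungs` and `survives` are hypothetical helpers, and the cutoff rule shown here (keep the best 1/reduction_factor of the results seen at a rung) only approximates what the scheduler does.

```python
def asha_rungs(grace_period, reduction_factor, max_t):
    """Iterations at which trials are compared: grace_period * rf**k, up to max_t."""
    rungs = []
    t = grace_period
    while t <= max_t:
        rungs.append(t)
        t *= reduction_factor
    return rungs

def survives(trial_loss, losses_at_rung, reduction_factor):
    """True if the trial is among the best 1/reduction_factor at this rung."""
    ranked = sorted(losses_at_rung + [trial_loss])
    keep = max(1, len(ranked) // reduction_factor)
    return trial_loss <= ranked[keep - 1]

print(asha_rungs(grace_period=1, reduction_factor=2, max_t=20))  # [1, 2, 4, 8, 16]

# Losses already recorded by other trials at the first rung (hypothetical values)
others = [0.9, 1.2, 2.4]
print(survives(1.67, others, reduction_factor=2))  # False: stopped early
print(survives(0.80, others, reduction_factor=2))  # True: allowed to continue
```

With grace_period=1, reduction_factor=2 and max_t=20, trials are compared after 1, 2, 4, 8 and 16 iterations. Each comparison discards roughly half of the trials that reach it, so only a few configurations are trained for the full budget.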

# Define the PyTorch Lightning model
class CIFAR10Classifier(pl.LightningModule):
    def __init__(self, hidden_size, lr):
        super().__init__()
        # Convolutional layers for feature extraction
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        # Fully connected layers for classification
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10),
        )
        self.lr = lr

    def forward(self, x):
        features = self.feature_extractor(x)
        logits = self.classifier(features)
        return logits

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.CrossEntropyLoss()(logits, y)
        self.log("train_loss", loss, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# Define the training function for Ray Tune
def train_func(config):
    
    # Data preparation
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    data_dir = "/leonardo_scratch/fast/tra26_castiel2/data/data_cifar10"
    dataset = datasets.CIFAR10(root=data_dir, train=True, transform=transform, download=False)
    train_dataset, val_dataset = random_split(dataset, [45000, 5000])
    train_loader = DataLoader(train_dataset, batch_size=int(config["batch_size"]), shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=int(config["batch_size"]))

    # Define the model
    model = CIFAR10Classifier(hidden_size=config["hidden_size"], lr=config["lr"])
    
    # Define the PyTorch Lightning trainer
    trainer = pl.Trainer(
        devices="auto",
        accelerator="auto",
        strategy=ray.train.lightning.RayDDPStrategy(),
        max_epochs=20,
        plugins=[ray.train.lightning.RayLightningEnvironment()],
        callbacks=[ray.train.lightning.RayTrainReportCallback()],
        # [1a] Optionally, disable the default checkpointing behavior
        # in favor of the `RayTrainReportCallback` above.
        enable_checkpointing=False,
        logger=False,
        enable_progress_bar=True,
    )

    # Train the model
    trainer = ray.train.lightning.prepare_trainer(trainer)
    trainer.fit(model, train_loader, val_loader)

start_time = time.time()

scaling_config = ray.train.ScalingConfig(
    num_workers=2, use_gpu=True, resources_per_worker={"CPU": 6, "GPU": 1}
)

# Define a TorchTrainer without hyper-parameters for Tuner
ray_trainer = TorchTrainer(
    train_func,
    scaling_config=scaling_config
)

# Define the hyperparameter search space
search_space = {
    "hidden_size": tune.choice([128, 256, 512]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

# The maximum training epochs
num_epochs = 20

scheduler = ASHAScheduler(max_t=num_epochs, grace_period=1, reduction_factor=2)

# Trigger hyperparameter tuning with Ray Tune
tuner = tune.Tuner(
    ray_trainer, #tune.with_parameters(train_func),
    param_space={"train_loop_config": search_space}, #param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="train_loss",
        mode="min",
        num_samples=15,  # Number of trials
        scheduler=scheduler,
        max_concurrent_trials=2
    ),
)
results = tuner.fit()

end_time = time.time()
best_result = results.get_best_result(metric="train_loss", mode="min")
comp_time = end_time-start_time

Tune Status

Current time:	2026-01-29 12:56:33
Running for:	00:06:43.91
Memory:	31.7/502.9 GiB

System Info

Using AsyncHyperBand: num_stopped=5
Bracket: Iter 16.000: -0.3668258339166641 | Iter 8.000: -0.6192266345024109 | Iter 4.000: -0.8897021412849426 | Iter 2.000: -0.9870593547821045 | Iter 1.000: -1.6667675375938416
Logical resource usage: 13.0/16 CPUs, 2.0/2 GPUs (0.0/1.0 accelerator_type:A100)

Trial Status

Trial name	status	loc	train_loop_config/batch_size	train_loop_config/hidden_size	train_loop_config/lr	iter	total time (s)	train_loss	train_loss_step	train_loss_epoch
TorchTrainer_9dac4_00005	RUNNING	10.1.0.82:2609608	64	512	0.00346566	3	27.4821	0.816097	0.761981	0.816097
TorchTrainer_9dac4_00006	PENDING		64	512	0.00046246
TorchTrainer_9dac4_00000	TERMINATED	10.1.0.82:2607915	128	512	0.000260623	20	98.3711	0.482704	0.594142	0.482704
TorchTrainer_9dac4_00001	TERMINATED	10.1.0.82:2608275	64	256	0.000205668	1	20.7985	1.66818	1.51793	1.66818
TorchTrainer_9dac4_00002	TERMINATED	10.1.0.82:2608579	128	128	0.0271474	1	16.1293	2.45891	2.30317	2.45891
TorchTrainer_9dac4_00003	TERMINATED	10.1.0.82:2608900	128	128	0.0450408	1	16.8405	3.40087	2.3093	3.40087
TorchTrainer_9dac4_00004	TERMINATED	10.1.0.82:2609227	32	128	0.00235161	20	134.631	0.117854	7.29805e-05	0.117854

print(f"Best Configuration:", best_result.config)
print(f"Minimum loss: ", best_result.metrics["train_loss"])
print(f"Time Taken: {comp_time} seconds")

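# Collect one metrics dataframe per trial from the ResultGrid returned by `tuner.fit()`.
dfs = {result.path: result.metrics_dataframe for result in results}

# Plot the training loss of each trial, one point per reported epoch
ax = None
for trial_path, d in dfs.items():
    # `label` names the curve in the legend; the trial directory name is used here
    ax = d.train_loss.plot(ax=ax, label=os.path.basename(trial_path))
    #ax.set_ylim(0,2)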
plt.xlabel("Epoch")
plt.ylabel("Train Loss")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Exercise

Try to set different resources per trial
Try to select different hyperparameters to tune

Bonus: Hyperparameter tuning using Ray (ML framework)

Let’s see an example of using Ray Tune to optimize hyperparameters for a machine learning model using XGBoost. XGBoost is a popular and efficient gradient boosting framework, and Ray Tune can help find the best hyperparameters for it.

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is a powerful and widely used machine learning algorithm, particularly effective on structured (tabular) data for both classification and regression problems. It’s an implementation of gradient boosting that focuses on both performance and computational efficiency.

Gradient Boosting is an ensemble learning technique that builds a series of decision trees in a sequential manner, where each tree corrects the errors of its predecessor. In simple terms:

First Tree: The algorithm builds an initial tree based on the training data.
Subsequent Trees: Each following tree is trained to predict the residual errors of the previous trees, effectively correcting the mistakes made by the prior trees.

In the case of regression, the goal is to minimize the difference between predicted and actual values, typically using a loss function such as Mean Squared Error (MSE).
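
The residual-fitting loop described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not how XGBoost is implemented: each "tree" is a one-split stump on a single feature, the loss is squared error, and the names `fit_stump`, `gradient_boost` and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def gradient_boost(x, y, n_trees=20, learning_rate=0.3):
    """Start from the mean, then let each stump correct the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    history = [mse(y, preds)]  # training MSE after 0, 1, 2, ... stumps
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [
            pi + learning_rate * predict_stump(stump, xi)
            for xi, pi in zip(x, preds)
        ]
        history.append(mse(y, preds))
    return base, stumps, history


# Toy data (made up for illustration): y grows roughly linearly with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]
base, stumps, history = gradient_boost(x, y)
print(f"MSE before boosting: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The `learning_rate` argument plays the same role as the XGBoost hyperparameter of that name: it shrinks each tree's correction, which is why it interacts with the number of boosting rounds and is worth tuning.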

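In the following example we’ll use the Wine dataset bundled with scikit-learn (`load_wine`, originally from the UCI repository). Its target is the cultivar class (0, 1 or 2) rather than a quality score, so for illustration we treat that label as a numeric value and frame the task as a regression on the wine’s physicochemical attributes.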

We are going to look at three variations of the same exercise:

Ray Tune specifying the search algorithm
Ray Tune using a random search
How does Ray Tune compare to Ray Core?

Ray Tune specifying the search algorithm

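# Imports used in this section (harmless if already imported earlier in the notebook)
import time

import numpy as np
import ray
import xgboost as xgb
from ray import train, tune
from ray.tune.search.hyperopt import HyperOptSearch
from sklearn.datasets import load_wine
from sklearn.metrics import root_mean_squared_error
from sklearn.model_selection import train_test_split

# Load the dataset
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)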

# Define the training function (trainable)
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Report the validation RMSE to Ray Tune
    ray.train.report({'rmse':val_rmse})

np.random.seed(1234)
# Define the hyperparameter search space
search_space = {
    "max_depth": tune.randint(3, 10),
    "learning_rate": tune.loguniform(0.01, 0.3),
    "subsample": tune.uniform(0.5, 1.0),
    "colsample_bytree": tune.uniform(0.5, 1.0),
}

# Measure execution time for Ray Tune
start_time = time.time()

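# Define the search algorithm: HyperOptSearch wraps HyperOpt's
# Tree-structured Parzen Estimator (requires the `hyperopt` package)
search_algorithm = HyperOptSearch(
    metric="rmse",
    mode="min",
    random_state_seed=1234,
)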
# Run the tuning process
tuner = tune.Tuner(
    train_xgboost,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        search_alg=search_algorithm,
        num_samples=50,
    )
)

results = tuner.fit()

end_time = time.time()

# Extract the best configuration
best_result = results.get_best_result(metric="rmse", mode="min")
best_config = best_result.config
best_rmse = best_result.metrics["rmse"]
comp_time = end_time-start_time

print("=== Ray Tune search algorithm ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {comp_time:.2f} seconds")

Ray Tune using a random search

# Load the dataset
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define the training function
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Report the validation RMSE to Ray Tune
    ray.train.report({'rmse':val_rmse})

np.random.seed(1234)
# Define the hyperparameter search space
search_space = {
    "max_depth": tune.randint(3, 10),
    "learning_rate": tune.loguniform(0.01, 0.3),
    "subsample": tune.uniform(0.5, 1.0),
    "colsample_bytree": tune.uniform(0.5, 1.0),
}

# Measure execution time for Ray Tune
start_time = time.time()

# Run Ray Tune (no search_alg is given, so configurations are sampled at random)
tuner = tune.Tuner(
    train_xgboost,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        metric="rmse",
        mode="min",
        num_samples=50,
    ),
)

results = tuner.fit()

end_time = time.time()

# Extract the best configuration
best_result = results.get_best_result(metric="rmse", mode="min")
best_config = best_result.config
best_rmse = best_result.metrics["rmse"]
comp_time = end_time - start_time

print("=== Ray Tune random search ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {comp_time:.2f} seconds")
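
To make the random search concrete, here is a minimal sketch of what drawing 50 independent configurations from the search space above amounts to, using only the standard library. It is not Ray Tune's implementation: `sample_config`, `random_search` and especially `toy_objective` (a made-up stand-in for the validation RMSE) are hypothetical names used for illustration.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration; mirrors the search space above in plain Python."""
    return {
        # tune.randint(3, 10): integer, lower bound inclusive, upper bound exclusive
        "max_depth": rng.randrange(3, 10),
        # tune.loguniform(0.01, 0.3): uniform in log space, so small values
        # are sampled as densely as large ones
        "learning_rate": math.exp(rng.uniform(math.log(0.01), math.log(0.3))),
        # tune.uniform(0.5, 1.0): uniform on the interval
        "subsample": rng.uniform(0.5, 1.0),
        "colsample_bytree": rng.uniform(0.5, 1.0),
    }


def toy_objective(config):
    """Hypothetical score to minimize; stands in for training a real model."""
    return (
        abs(math.log10(config["learning_rate"]) + 1)
        + abs(config["max_depth"] - 6) / 10
    )


def random_search(num_samples, seed):
    """Sample configurations independently and keep the one with the lowest score."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(num_samples)]
    return min(trials, key=toy_objective), trials


best, trials = random_search(num_samples=50, seed=1234)
print("Best configuration:", best)
```

Each draw ignores the results of earlier trials. That is the difference from a search algorithm such as HyperOptSearch, which uses past results to propose the next configuration.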

How does Ray Tune compare to Ray Core?

# Load the dataset
data = load_wine()
X, X_test, y, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Define the training function
@ray.remote
def train_xgboost(config):
    # Split training data into train/validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

    # Create DMatrix for XGBoost
    train_data = xgb.DMatrix(X_train, label=y_train)
    val_data = xgb.DMatrix(X_val, label=y_val)

    # Train the model
    params = {
        "objective": "reg:squarederror",
        "max_depth": int(config["max_depth"]),
        "learning_rate": config["learning_rate"],
        "subsample": config["subsample"],
        "colsample_bytree": config["colsample_bytree"],
        "eval_metric": "rmse",
    }
    model = xgb.train(
        params,
        train_data,
        evals=[(val_data, "validation")],
        num_boost_round=100,
        early_stopping_rounds=10,
        verbose_eval=False,
    )
    
    # Predict on validation set
    val_preds = model.predict(val_data)
    val_rmse = root_mean_squared_error(y_val, val_preds)
    
    # Return the validation RMSE
    return {"config": config, "rmse": val_rmse}

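# Define the hyperparameter configurations manually as a full grid:
# 7 depths x 4 learning rates x 3 subsample x 3 colsample values = 252 configurations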
configs = [
    {"max_depth": d, "learning_rate": lr, "subsample": ss, "colsample_bytree": cs}
    for d in range(3, 10)
    for lr in [0.01, 0.05, 0.1, 0.2]
    for ss in [0.6, 0.8, 1.0]
    for cs in [0.6, 0.8, 1.0]
]

# Measure execution time for Ray Core
start_time = time.time()

# Submit tasks to Ray
futures = [train_xgboost.remote(config) for config in configs]
results = ray.get(futures)

end_time = time.time()

# Find the best result
best_result = min(results, key=lambda x: x["rmse"])
best_config = best_result["config"]
best_rmse = best_result["rmse"]

print("=== Ray Core ===")
print("Best Configuration:", best_config)
print(f"Best RMSE: {best_rmse:.4f}")
print(f"Time Taken: {end_time - start_time:.2f} seconds")
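
When comparing the timings, keep in mind that the Ray Core variant evaluates an exhaustive grid rather than 50 sampled trials, so the two runs do not do the same amount of work. The sketch below rebuilds the same grid with `itertools.product` to make its size explicit; the name `grid_configs` is only illustrative.

```python
from itertools import product

# Same candidate values as the nested list comprehension above
grid = {
    "max_depth": list(range(3, 10)),
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

# One dict per combination; product() iterates the last key fastest,
# which matches the order of the nested loops
grid_configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(grid_configs))  # 7 * 4 * 3 * 3 = 252
```

With Ray Core every one of these configurations becomes its own remote task, and choosing, scheduling and comparing them is left to your code. Ray Tune handles those steps for you.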

Release resources

# Disconnect the worker and terminate processes started by ray.init()
ray.shutdown()

Python version:	3.11.7
Ray version:	2.44.0
Dashboard:	http://10.1.0.82:8265