Optuna Samplers

Optimizing ML models using Optuna Samplers

By Niels van der Velden in reticulate R Python Machine Learning

February 7, 2022

Introduction

When building Machine Learning (ML) models, an important part of optimizing the model is searching for the right set of parameters such that the model achieves the highest precision and accuracy. The most advanced ML models (XGBoost, LGBM, CatBoost, TabNet) have many parameters to optimize, and finding the best-performing set can take a lot of time. If we, for instance, would like to tune 6 parameters and try 10 different values for each, there are a total of 10^6 = 1,000,000 different combinations we could try. Some models can take hours to train, so it would be impossible to try them all. How can we best navigate this vast search space to find the optimal set of values?

One of the best packages (in my opinion) to tune hyperparameters and navigate large search spaces is the Python package Optuna. After reading many tutorials on using Optuna for hyperparameter tuning, I always wondered what makes Optuna so efficient and whether there would be a simple way to visualize what is happening underneath.

In this article I will try to answer this question by approximating the maximum of a single-objective function and the minimum of a multi-objective function using a random and a grid search, and comparing the results to those obtained with the more advanced TPESampler and CmaEsSampler of Optuna.

Using Optuna to find the maximum of a function

In the code below we ask Optuna to suggest values for x between 0 and 1000 and to find the value of x that maximizes y for the function y = sin(-0.4*pi*x / 20) + sin(-0.5*pi*x / 200) + sin(0.5*pi*x / 100). In this example the values for x are suggested using the TPESampler of Optuna. This sampler uses Bayesian optimization methods to select each x value (see link for a more detailed explanation). In total we run 100 trials, and we set the seed of the sampler to 42 to get reproducible results.

import optuna
from optuna.samplers import TPESampler
import math

def objective(trial):
    x = trial.suggest_float('x', 0, 1000)
    y = (
        math.sin(-0.4*math.pi*x / 20)
        + math.sin(-0.5*math.pi*x / 200)
        + math.sin(0.5*math.pi*x / 100)
    )
    return y

study = optuna.create_study(
  direction="maximize", 
  sampler=TPESampler(seed=42)
  )
study.optimize(objective, n_trials=100)

print("TPESampler: Best y: {} for x: {} found at trial: {}"
.format(
  round(study.best_value, 2), 
  round(study.best_params["x"],2), 
  study.best_trial.number
  )
)
## TPESampler: Best y: 2.5 for x: 478.02 found at trial: 61

You can see that the TPESampler already found the maximum at trial 61. How would these results compare to taking just 100 random values for x? We can check this by running the same code as above, but instead of using the TPESampler we import and use the RandomSampler of Optuna.
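A minimal sketch of that change, reusing the objective defined above and assuming the same seed of 42 for reproducibility:

from optuna.samplers import RandomSampler

# Same objective and number of trials as before; only the sampler is swapped.
study = optuna.create_study(
  direction="maximize",
  sampler=RandomSampler(seed=42)  # seed assumed, matching the TPESampler run
  )
study.optimize(objective, n_trials=100)

print("RandomSampler: Best y: {} for x: {} found at trial: {}".format(
    round(study.best_value, 2),
    round(study.best_params["x"], 2),
    study.best_trial.number
))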

## RandomSampler: Best y: 2.43 for x: 472.21 found at trial: 89

Using the random sampling approach, the best value was only found at trial 89, and it was not able to find the absolute maximum of 2.5. The TPESampler is clearly the winner here.

We can plot the x values that were sampled in each trial onto the function for which we are trying to find the maximum. Using the timeline slider at the bottom of the graphs, you can see that the TPESampler starts by selecting random values for x, but after a while it switches to a “best guess” approach, using Bayesian optimization to select the most promising value of x to sample next.
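As a rough sketch of how such a plot can be built, the code below overlays the sampled x value and resulting y of every trial on the objective function. It assumes matplotlib is used for plotting; the interactive graphs with a timeline slider in this post build on the same idea.

import math
import matplotlib.pyplot as plt

def f(x):
    return (
        math.sin(-0.4*math.pi*x / 20)
        + math.sin(-0.5*math.pi*x / 200)
        + math.sin(0.5*math.pi*x / 100)
    )

# Draw the objective function over the full search range.
xs = [x / 10 for x in range(0, 10001)]
plt.plot(xs, [f(x) for x in xs], color="lightgrey", label="objective")

# Overlay the x value suggested in each trial, coloured by trial number,
# so you can see the sampler move from random exploration to the most
# promising regions.
trial_x = [t.params["x"] for t in study.trials]
trial_y = [t.value for t in study.trials]
points = plt.scatter(trial_x, trial_y, c=list(range(len(study.trials))), cmap="viridis", s=20)
plt.colorbar(points, label="trial number")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()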