Optuna Samplers
Optimizing ML models using Optuna Samplers
By Niels van der Velden in reticulate R Python Machine Learning
February 7, 2022
Introduction
When building machine learning (ML) models, an important part of optimizing the model is searching for the set of parameters with which it achieves the highest precision and accuracy. The most advanced ML models (XGBoost, LGBM, CatBoost, TabNet) have many parameters to tune, and finding the best performing set can take a lot of time. If, for instance, we would like to tune 6 parameters and try 10 different values for each, there are a total of 10^6 = 1,000,000 different combinations to try. Some models take hours to train, so it would be impossible to try them all. How can we best navigate this vast search space to find the optimal set of values?
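To make the size of that search space concrete, the arithmetic can be written out in a few lines of Python. The one minute per model fit is a purely hypothetical figure chosen for illustration; real training times vary widely.

```python
# Size of an exhaustive grid: every value of every parameter combined.
n_parameters = 6
values_per_parameter = 10
n_combinations = values_per_parameter ** n_parameters

# Hypothetical training time per combination (an assumption, not a measurement).
minutes_per_fit = 1
total_days = n_combinations * minutes_per_fit / 60 / 24

print(f"{n_combinations:,} combinations, about {total_days:.0f} days at {minutes_per_fit} min per fit")
```

Even under this optimistic assumption an exhaustive search would run for close to two years, which is why a smarter way of choosing which combinations to try matters.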
One of the best packages (in my opinion) for tuning hyperparameters and navigating large search spaces is the Python package Optuna. After reading many tutorials on using Optuna for hyperparameter tuning, I kept wondering what makes Optuna so efficient and whether there is a simple way to visualize what is happening underneath.
In this article I will try to answer this question by approximating the maximum of a single-objective function and the minimum of a multi-objective function using random and grid search, and comparing the results with those obtained using Optuna's more advanced TPESampler and CmaEsSampler.
Using Optuna to find the maximum of a function
In the code below we ask Optuna to suggest values for x between 0 and 1000 and try to find the value of x that maximizes y for the function y = sin(-0.4*pi*x / 20) + sin(-0.5*pi*x / 200) + sin(0.5*pi*x / 100). In this example the values for x are suggested by Optuna's TPESampler. This sampler uses Bayesian optimization methods to select each x value (see link for a more detailed explanation). In total we run 100 trials, and we set the seed of the sampler to 42 to get reproducible results.
import math

import optuna
from optuna.samplers import TPESampler


def objective(trial):
    # Ask the sampler for the next x to evaluate, within [0, 1000].
    x = trial.suggest_float("x", 0, 1000)
    y = (
        math.sin(-0.4 * math.pi * x / 20)
        + math.sin(-0.5 * math.pi * x / 200)
        + math.sin(0.5 * math.pi * x / 100)
    )
    return y


# Maximize y; the fixed seed makes the sampled x values reproducible.
study = optuna.create_study(
    direction="maximize",
    sampler=TPESampler(seed=42),
)
study.optimize(objective, n_trials=100)

print(
    "TPESampler: Best y: {} for x: {} found at trial: {}".format(
        round(study.best_value, 2),
        round(study.best_params["x"], 2),
        study.best_trial.number,
    )
)
## TPESampler: Best y: 2.5 for x: 478.02 found at trial: 61
You can see that the TPESampler found the maximum at trial 61. How would these results compare to simply taking 100 random values for x? We can find out by running the same code as above, but importing and using Optuna's RandomSampler instead of the TPESampler.
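The listing for this step is not shown, so here is a minimal sketch of what the swap could look like. It assumes the same objective and 100 trials, and it assumes a seed of 42 because the seed used for the random run is not stated. The helper names `f` and `run_random_study` are my own and not part of Optuna.

```python
import math


def f(x):
    """The function whose maximum we are looking for."""
    return (
        math.sin(-0.4 * math.pi * x / 20)
        + math.sin(-0.5 * math.pi * x / 200)
        + math.sin(0.5 * math.pi * x / 100)
    )


def objective(trial):
    # Same search range as in the TPESampler example.
    x = trial.suggest_float("x", 0, 1000)
    return f(x)


def run_random_study(n_trials=100, seed=42):
    """Run the study with RandomSampler (seed=42 is an assumption)."""
    # Imported here so that f() stays usable without Optuna installed.
    import optuna
    from optuna.samplers import RandomSampler

    study = optuna.create_study(
        direction="maximize",
        sampler=RandomSampler(seed=seed),
    )
    study.optimize(objective, n_trials=n_trials)
    return study
```

Calling `run_random_study()` returns the finished study, whose `best_value`, `best_params["x"]` and `best_trial.number` can be printed in the same way as before. The exact numbers may differ with another seed or Optuna version.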
## RandomSampler: Best y: 2.43 for x: 472.21 found at trial: 89
With random sampling the best value was found at trial 89, and it did not reach the absolute maximum of 2.5. The TPESampler is clearly the winner here.
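Because this function is cheap to evaluate, the maximum that both samplers are chasing can be checked independently with a brute-force scan. The sketch below uses plain Python rather than Optuna, and the step size of 0.01 is an arbitrary choice; `dense_scan` is a hypothetical helper name.

```python
import math


def f(x):
    """The function whose maximum we are looking for."""
    return (
        math.sin(-0.4 * math.pi * x / 20)
        + math.sin(-0.5 * math.pi * x / 200)
        + math.sin(0.5 * math.pi * x / 100)
    )


def dense_scan(lower=0.0, upper=1000.0, step=0.01):
    """Evaluate f on an evenly spaced grid and return the best (x, y) pair."""
    n_steps = int(round((upper - lower) / step))
    best_x, best_y = lower, f(lower)
    for i in range(1, n_steps + 1):
        x = lower + i * step
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


best_x, best_y = dense_scan()
print("Dense scan: best y: {} for x: {}".format(round(best_y, 2), round(best_x, 2)))
```

This takes about 100,000 evaluations where the samplers got 100, so it only works as a reference for a toy function like this one. It should land close to the x and y values the TPESampler reported.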
We can plot the x values that were taken for each trial onto the function for which we try to find the maximum. What you can see using the timeline slider at the bottom of the graphs is that at the beginning the TPESampler starts with the selection of random values for x, but after a while it starts to use a “best guess” approach using Bayesian optimization to select the most promising value of x to sample next.
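The data behind such a plot can be pulled from the finished study. The sketch below is one possible way to do it, not the plotting code used for the graphs: it relies on the `trials` list of a study and the `number`, `params` and `value` attributes of each trial. The split at 10 trials mirrors the default `n_startup_trials=10` of the TPESampler, which is the number of initial trials it samples at random. The helper names are hypothetical.

```python
def trial_history(study):
    """Return (trial number, x, y) tuples for all trials that produced a value."""
    return [
        (t.number, t.params["x"], t.value)
        for t in study.trials
        if t.value is not None
    ]


def split_startup(history, n_startup_trials=10):
    """Split the history into the random startup phase and the model-guided phase."""
    startup = [row for row in history if row[0] < n_startup_trials]
    guided = [row for row in history if row[0] >= n_startup_trials]
    return startup, guided
```

Passing the study from the TPESampler example to `trial_history` and then to `split_startup` gives two lists that can be plotted in different colours on top of the function, which makes the switch from random exploration to guided sampling visible.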