4 - Grid Search via Racing

Advanced tidymodels

Previously - Setup

library(tidymodels)
library(textrecipes)
library(bonsai)

# Max's usual settings: 
tidymodels_prefer()
theme_set(theme_bw())
options(
  pillar.advice = FALSE, 
  pillar.min_title_chars = Inf
)

reg_metrics <- metric_set(mae, rsq)
data(hotel_rates)
set.seed(295)
hotel_rates <- 
  hotel_rates %>% 
  sample_n(5000) %>% 
  arrange(arrival_date) %>% 
  select(-arrival_date) %>% 
  mutate(
    company = factor(as.character(company)),
    country = factor(as.character(country)),
    agent = factor(as.character(agent))
  )

Previously - Data Usage

set.seed(4028)
hotel_split <-
  initial_split(hotel_rates, strata = avg_price_per_room)

hotel_train <- training(hotel_split)
hotel_test <- testing(hotel_split)

set.seed(472)
hotel_rs <- vfold_cv(hotel_train, strata = avg_price_per_room)

Previously - Boosting Model

hotel_rec <-
  recipe(avg_price_per_room ~ ., data = hotel_train) %>%
  step_YeoJohnson(lead_time) %>%
  step_dummy_hash(agent,   num_terms = tune("agent hash")) %>%
  step_dummy_hash(company, num_terms = tune("company hash")) %>%
  step_zv(all_predictors())

lgbm_spec <- 
  boost_tree(trees = tune(), learn_rate = tune(), min_n = tune()) %>% 
  set_mode("regression") %>% 
  set_engine("lightgbm", num_threads = 1)

lgbm_wflow <- workflow(hotel_rec, lgbm_spec)

lgbm_param <-
  lgbm_wflow %>%
  extract_parameter_set_dials() %>%
  update(`agent hash`   = num_hash(c(3, 8)),
         `company hash` = num_hash(c(3, 8)))

First, a shameless promotion

Making Grid Search More Efficient

In the last section, we evaluated 250 models (25 candidates times 10 resamples).

We can make this go faster using parallel processing.

Also, for some models, we can fit far fewer models than the number being evaluated.

  • For boosting, a model with X trees can often make predictions for candidates with fewer than X trees.

Both of these methods can lead to enormous speed-ups.
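
For example, with recent versions of the tune package, parallel processing can be enabled by registering a future plan before calling the tuning functions (a minimal sketch; the backend and worker count are assumptions to adjust for your machine). The submodel trick is exploited automatically for engines that support it.

library(future)

# Register a parallel backend; tuning work is spread across the workers
plan(multisession, workers = 4)

# ... call tune_grid() / tune_race_anova() as usual ...

# Return to sequential processing when finished
plan(sequential)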

Model Racing

Racing is an old tool that we can use to go even faster.

  1. Evaluate all of the candidate models but only for a few resamples.
  2. Determine which candidates have a low probability of being selected.
  3. Eliminate poor candidates.
  4. Repeat with the next resample (until no resamples remain).

This can result in fitting a small number of models.
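
The logic can be sketched with a small simulation (a conceptual sketch only, not the finetune implementation; the simulated errors, the burn-in of three resamples, and the simple one-sided paired t-test rule are all assumptions):

set.seed(1)
n_candidates <- 20
n_resamples  <- 10

# Simulated MAE values: rows = resamples, columns = candidates
true_mae <- seq(9, 12, length.out = n_candidates)
errors   <- sapply(true_mae, function(m) rnorm(n_resamples, mean = m, sd = 0.5))
errors   <- errors + rnorm(n_resamples, sd = 1)   # resample-to-resample effect

in_race <- rep(TRUE, n_candidates)

for (b in 3:n_resamples) {                        # burn-in: wait for a few resamples
  current <- errors[1:b, in_race, drop = FALSE]
  best    <- which.min(colMeans(current))
  # Keep a candidate unless it is demonstrably worse than the current best
  keep <- vapply(seq_len(ncol(current)), function(j) {
    if (j == best) return(TRUE)
    diffs <- current[, j] - current[, best]
    t.test(diffs, alternative = "greater")$conf.int[1] <= 0
  }, logical(1))
  in_race[in_race] <- keep
}

sum(in_race)   # candidates that survived to the end of the race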

Discarding Candidates

How do we eliminate tuning parameter combinations?

There are a few methods to do so. We’ll use one based on analysis of variance (ANOVA).

However… there is typically a large resample-to-resample difference in the results.

Resampling Results (Non-Racing)

Here are some realistic (but simulated) examples of two candidate models.

An error estimate is measured for each of 10 resamples.

  • Each line connects the two estimates from the same resample.

There is usually a significant resample-to-resample effect (rank corr: 0.83).

Are Candidates Different?

One way to evaluate these models is to do a paired t-test

  • or, equivalently, a t-test on their differences, matched by resample

With n = 10 resamples, the confidence interval for the difference in RMSE is (0.99, 2.8), indicating that candidate number 2 has smaller error.
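
In code, this is just a one-sample t-test on the per-resample differences (the values below are simulated to mimic the plot, not the actual results):

set.seed(1)
resample_effect <- rnorm(10, sd = 2)                 # shared resample effect
rmse_1 <- 12 + resample_effect + rnorm(10, sd = 0.5)
rmse_2 <- 10 + resample_effect + rnorm(10, sd = 0.5)

# A paired t-test is a one-sample t-test on the within-resample differences
t.test(rmse_1 - rmse_2)$conf.int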

Evaluating Differences in Candidates

What if we had compared the candidates while sequentially evaluating each resample?


One candidate shows superiority when 4 resamples have been evaluated.

Interim Analysis of Results

One version of racing uses a mixed model ANOVA to construct one-sided confidence intervals for each candidate versus the current best.

Any candidates whose bound does not include zero are discarded. Here is an animation.

The resamples are analyzed in a random order (so set the seed).
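
A rough illustration of one interim analysis, using lme4 (a sketch of the idea only; finetune's internal implementation may differ, and the simulated data and 95% one-sided bound are assumptions):

library(lme4)

# Simulated interim results: 4 resamples evaluated so far for 5 candidates;
# candidate 1 is the current best and is the reference level below
set.seed(1)
interim <- expand.grid(resample = factor(1:4), candidate = factor(1:5))
interim$mae <- 10 +
  c(0, 0.2, 1.5, 2, 3)[interim$candidate] +   # candidate effects
  rnorm(4, sd = 1)[interim$resample] +        # resample-to-resample effect
  rnorm(nrow(interim), sd = 0.3)              # noise

# Mixed model: fixed candidate effects, random intercept per resample
fit <- lmer(mae ~ candidate + (1 | resample), data = interim)

# One-sided 95% lower bounds for each candidate's difference from the best
est   <- fixef(fit)[-1]
se    <- sqrt(diag(vcov(fit)))[-1]
lower <- est - qnorm(0.95) * se
lower   # a lower bound above zero means the candidate is discarded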


Kuhn (2014) has examples and simulations to show that the method works.

The finetune package has functions tune_race_anova() and tune_race_win_loss().

Racing

# Let's use a larger grid
set.seed(8945)
lgbm_grid <- 
  lgbm_param %>% 
  grid_space_filling(size = 50)

library(finetune)

set.seed(9)
lgbm_race_res <-
  lgbm_wflow %>%
  tune_race_anova(
    resamples = hotel_rs,
    grid = lgbm_grid, 
    metrics = reg_metrics
  )

The syntax and helper functions are extremely similar to those shown for tune_grid().

Racing Results

show_best(lgbm_race_res, metric = "mae")
#> # A tibble: 2 × 11
#>   trees min_n learn_rate `agent hash` `company hash` .metric .estimator  mean     n std_err .config              
#>   <int> <int>      <dbl>        <int>          <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
#> 1  1347     5     0.0655           66             26 mae     standard    9.64    10   0.173 Preprocessor34_Model1
#> 2   980     8     0.0429           17            135 mae     standard    9.76    10   0.164 Preprocessor25_Model1

Racing Results

Only 171 models were fit, instead of the 500 (50 candidates × 10 resamples) that a full grid search would require.

select_best() never considers candidate models that did not get to the end of the race.
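
Selecting and finalizing works the same way as after tune_grid() (a sketch using the objects defined earlier):

best_param <- select_best(lgbm_race_res, metric = "mae")

lgbm_final_fit <-
  lgbm_wflow %>%
  finalize_workflow(best_param) %>%
  last_fit(hotel_split, metrics = reg_metrics)

collect_metrics(lgbm_final_fit)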

There is a helper function to see how candidate models were removed from consideration.

plot_race(lgbm_race_res) + 
  scale_x_continuous(breaks = pretty_breaks())

Your turn

  • Run tune_race_anova() with a different seed.
  • Did you get the same or similar results?
10:00
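
One possible starting point (the new seed value is arbitrary):

set.seed(501)
lgbm_race_rerun <-
  lgbm_wflow %>%
  tune_race_anova(
    resamples = hotel_rs,
    grid = lgbm_grid,
    metrics = reg_metrics
  )

show_best(lgbm_race_rerun, metric = "mae")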