3 - Grid Search via Racing

Advanced tidymodels

Previously - Setup


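library(tidymodels)
library(textrecipes) # step_dummy_hash()
library(bonsai)      # lightgbm engine for boost_tree()

# Max's usual settings: 
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)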

reg_metrics <- metric_set(mae, rsq)
hotel_rates <- 
  hotel_rates %>% 
  sample_n(5000) %>% 
  arrange(arrival_date) %>% 
  select(-arrival_date) %>% 
  mutate(
    company = factor(as.character(company)),
    country = factor(as.character(country)),
    agent = factor(as.character(agent))
  )

Previously - Data Usage

hotel_split <-
  initial_split(hotel_rates, strata = avg_price_per_room)

hotel_train <- training(hotel_split)
hotel_test <- testing(hotel_split)

hotel_rs <- vfold_cv(hotel_train, strata = avg_price_per_room)

Previously - Boosting Model

hotel_rec <-
  recipe(avg_price_per_room ~ ., data = hotel_train) %>%
  step_YeoJohnson(lead_time) %>%
  step_dummy_hash(agent,   num_terms = tune("agent hash")) %>%
  step_dummy_hash(company, num_terms = tune("company hash")) %>%
  step_zv(all_predictors())

lgbm_spec <- 
  boost_tree(trees = tune(), learn_rate = tune(), min_n = tune()) %>% 
  set_mode("regression") %>% 
  set_engine("lightgbm", num_threads = 1)

lgbm_wflow <- workflow(hotel_rec, lgbm_spec)

lgbm_param <-
  lgbm_wflow %>%
  extract_parameter_set_dials() %>%
  update(`agent hash`   = num_hash(c(3, 8)),
         `company hash` = num_hash(c(3, 8)))

Making Grid Search More Efficient

In the last section, we evaluated 250 models (25 candidates times 10 resamples).

We can make this go faster using parallel processing.
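The resample fits are independent of one another, which is what makes them easy to run in parallel. Here is a minimal base R sketch of that idea, using simulated data and a hypothetical helper `fit_one_fold()` (neither comes from the workshop). It illustrates the principle only, not how tune schedules its own workers.

```r
library(parallel)

# Simulated data and a 10-fold assignment (illustrative only)
set.seed(1)
dat <- data.frame(x = rnorm(200))
dat$y <- 1 + 2 * dat$x + rnorm(200)
fold_id <- sample(rep(1:10, length.out = nrow(dat)))

# Hypothetical helper: fit on nine folds, return the MAE on the held-out fold
fit_one_fold <- function(k, dat, fold_id) {
  fit <- lm(y ~ x, data = dat[fold_id != k, ])
  held_out <- dat[fold_id == k, ]
  mean(abs(held_out$y - predict(fit, held_out)))
}

# Sequential: one fold after another
seq_mae <- vapply(1:10, fit_one_fold, numeric(1), dat = dat, fold_id = fold_id)

# Parallel: the same ten fits spread over two worker processes
cl <- makeCluster(2)
par_mae <- parSapply(cl, 1:10, fit_one_fold, dat = dat, fold_id = fold_id)
stopCluster(cl)

# Both routes give the same per-fold errors
same <- all.equal(seq_mae, par_mae)
```

The speed-up depends on how long each fit takes relative to the cost of sending data to the workers.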

Also, for some models, we can fit far fewer models than the number that are being evaluated.

  • For boosting, a model with X trees can often predict on candidates with fewer than X trees.
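To see why one fit can serve many candidates, here is a toy sketch. It assumes a simplified boosting routine in which each "tree" is a shrunken one-predictor least-squares fit to the current residuals. `boost_fit()` and `boost_predict()` are hypothetical helpers, not lightgbm or tidymodels functions. Because stages are added one at a time, the first 50 stages of a 200-stage model are the 50-stage model.

```r
# Toy componentwise boosting (a stand-in for tree boosting, for illustration)
boost_fit <- function(x, y, n_stages, learn_rate = 0.1) {
  centers <- colMeans(x)
  xc <- sweep(x, 2, centers)           # center each predictor
  ss <- colSums(xc^2)
  resid <- y - mean(y)
  col <- integer(n_stages)
  coef <- numeric(n_stages)
  for (m in seq_len(n_stages)) {
    slopes <- drop(crossprod(xc, resid)) / ss
    j <- which.max(slopes^2 * ss)      # predictor that best explains the residuals
    col[m] <- j
    coef[m] <- learn_rate * slopes[j]  # shrink the update
    resid <- resid - coef[m] * xc[, j]
  }
  list(intercept = mean(y), centers = centers, col = col, coef = coef)
}

# Predict using only the first `n_stages` stages of a fitted model
boost_predict <- function(fit, new_x, n_stages) {
  keep <- seq_len(n_stages)
  xc <- sweep(new_x, 2, fit$centers)
  fit$intercept + drop(xc[, fit$col[keep], drop = FALSE] %*% fit$coef[keep])
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- 2 * x[, 1] - x[, 2] + rnorm(200, sd = 0.5)

big <- boost_fit(x, y, n_stages = 200)
small <- boost_fit(x, y, n_stages = 50)

# Truncating the big model reproduces the separately fitted small one
same <- all.equal(boost_predict(big, x, 50), boost_predict(small, x, 50))
```

With this property, a grid that varies only the number of stages needs a single fit at the largest value.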

Both of these methods can lead to enormous speed-ups.

Model Racing

Racing is an old tool that we can use to go even faster.

  1. Evaluate all of the candidate models but only for a few resamples.
  2. Determine which candidates have a low probability of being selected.
  3. Eliminate poor candidates.
  4. Repeat with the next resample until no resamples remain.

This can result in fitting far fewer models than a full grid search, since eliminated candidates are never fit on the remaining resamples.
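The four steps can be sketched in base R. This is an illustration on simulated errors. It uses a one-sided paired t-test against the current best as a simple stand-in for the elimination rule. The names `burn_in` and `alive` and the 0.05 cutoff are assumptions chosen for the example.

```r
set.seed(2014)
n_resamples <- 10
n_candidates <- 20

# Simulated errors: each candidate has its own true error, and each resample
# shifts every candidate up or down by the same amount
true_error <- seq(9.5, 13, length.out = n_candidates)
resample_shift <- rnorm(n_resamples, sd = 1)
errors <- outer(resample_shift, true_error, `+`) +
  matrix(rnorm(n_resamples * n_candidates, sd = 0.3), nrow = n_resamples)

burn_in <- 3                    # resamples evaluated before any elimination
alive <- seq_len(n_candidates)  # candidates still in the race
models_fit <- 0

for (r in seq_len(n_resamples)) {
  # Only the surviving candidates are fit on this resample
  models_fit <- models_fit + length(alive)
  if (r < burn_in || length(alive) == 1) next

  # Compare each survivor to the current best on the resamples seen so far
  seen <- errors[seq_len(r), alive, drop = FALSE]
  best <- which.min(colMeans(seen))
  p_worse <- vapply(seq_along(alive), function(j) {
    if (j == best) return(1)
    t.test(seen[, j], seen[, best], paired = TRUE,
           alternative = "greater")$p.value
  }, numeric(1))

  # Drop candidates that look clearly worse than the current best
  alive <- alive[p_worse > 0.05]
}

models_fit  # compare with n_resamples * n_candidates for a full grid search
alive       # candidates that finished the race
```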

Discarding Candidates

How do we eliminate tuning parameter combinations?

There are a few methods to do so. We’ll use one based on analysis of variance (ANOVA).

However… there is typically a large difference between resamples in the results.

Resampling Results (Non-Racing)

Here are some realistic (but simulated) examples of two candidate models.

An error estimate is measured for each of 10 resamples.

  • The lines connect resamples.

There is usually a significant resample-to-resample effect (rank corr: 0.83).

Are Candidates Different?

One way to compare these models is a paired t-test,

  • or, equivalently, a one-sample t-test on their differences, matched by resample.

With \(n = 10\) resamples, the confidence interval for the difference in error is (0.99, 2.8). The interval excludes zero, indicating that candidate number 2 has the smaller error.
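A small base R sketch of the comparison, on simulated errors (the values and the names `candidate_1` and `candidate_2` are made up and will not reproduce the interval quoted here). It shows that pairing by resample removes the shared resample effect, which an unpaired test leaves in the noise.

```r
set.seed(10)
n <- 10

# Simulated errors: a shared resample effect plus a true difference of 2
resample_effect <- rnorm(n, sd = 3)
candidate_1 <- 12 + resample_effect + rnorm(n, sd = 0.8)
candidate_2 <- 10 + resample_effect + rnorm(n, sd = 0.8)

# The shared effect makes the two candidates rise and fall together
rank_corr <- cor(candidate_1, candidate_2, method = "spearman")

# Paired t-test, and the same test written as a one-sample test on differences
paired <- t.test(candidate_1, candidate_2, paired = TRUE)
on_diffs <- t.test(candidate_1 - candidate_2)

# Ignoring the pairing leaves the resample effect in the error term
unpaired <- t.test(candidate_1, candidate_2)

paired$conf.int
unpaired$conf.int
```

The paired interval is much narrower because the resample-to-resample variation cancels in the differences.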

What if we were to compare each model candidate to the current best at each resample?

One candidate shows superiority after only 4 resamples have been evaluated.

Evaluating Differences in Candidates

Interim Analysis of Results

One version of racing uses a mixed model ANOVA to construct one-sided confidence intervals for each candidate versus the current best.

Any candidate whose interval does not include zero is discarded.
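A rough base R sketch of a single interim analysis. As a simplification, it treats the resample as a fixed blocking factor in `lm()` rather than fitting a mixed model. The data and all names are simulated for illustration.

```r
set.seed(110)
n_resamples <- 5
candidates <- paste0("cand_", 1:6)

# Simulated interim results after five resamples (lower error is better)
true_error <- c(9.6, 9.7, 10.4, 11.0, 12.0, 13.5)
res <- expand.grid(resample = factor(seq_len(n_resamples)),
                   candidate = candidates, stringsAsFactors = FALSE)
shift <- rnorm(n_resamples, sd = 1)
res$error <- true_error[match(res$candidate, candidates)] +
  shift[as.integer(res$resample)] + rnorm(nrow(res), sd = 0.3)

# Make the current best candidate the reference level
means <- tapply(res$error, res$candidate, mean)
best <- names(which.min(means))
res$candidate <- relevel(factor(res$candidate), ref = best)

# Candidate coefficients are now differences from the current best
fit <- lm(error ~ candidate + resample, data = res)

# A two-sided 90% interval supplies a one-sided 95% lower bound
ci <- confint(fit, level = 0.90)
cand_rows <- grep("^candidate", rownames(ci))
lower <- ci[cand_rows, 1]
names(lower) <- sub("^candidate", "", rownames(ci)[cand_rows])

# A lower bound above zero means the candidate is credibly worse than the best
discard <- names(lower)[lower > 0]
keep <- setdiff(candidates, discard)
```

In a race, this analysis is repeated after each new resample using only the candidates in `keep`.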

The resamples are analyzed in a random order.

Kuhn (2014) has examples and simulations to show that the method works.

The finetune package has functions tune_race_anova() and tune_race_win_loss().


library(finetune)

# Let's use a larger grid
lgbm_grid <- 
  lgbm_param %>% 
  grid_latin_hypercube(size = 50)

# Race the 50 candidates across the resamples
lgbm_race_res <-
  lgbm_wflow %>%
  tune_race_anova(
    resamples = hotel_rs,
    grid = lgbm_grid, 
    metrics = reg_metrics
  )

The syntax and helper functions are extremely similar to those shown for tune_grid().
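As with the other tuning functions, the race can be adjusted through a control object. The sketch below uses `control_race()` from finetune. The values shown are illustrative choices rather than the settings used for the results in these slides, and the comments paraphrase each argument's role.

```r
library(finetune)

race_ctrl <- control_race(
  burn_in = 3,         # resamples evaluated before the first elimination
  alpha = 0.05,        # level of the one-sided interval for each candidate
  randomize = TRUE,    # analyze the resamples in a random order
  verbose_elim = TRUE  # report eliminations as the race proceeds
)
```

The object would be passed to `tune_race_anova()` through its `control` argument.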

Racing Results

show_best(lgbm_race_res, metric = "mae")
#> # A tibble: 2 × 11
#>   trees min_n learn_rate `agent hash` `company hash` .metric .estimator  mean     n std_err .config              
#>   <int> <int>      <dbl>        <int>          <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
#> 1  1516     7     0.0421          176             12 mae     standard    9.60    10   0.181 Preprocessor42_Model1
#> 2  1014     5     0.0791           35            181 mae     standard    9.61    10   0.179 Preprocessor06_Model1

Racing Results

Only 171 models were fit, out of the 500 (50 candidates times 10 resamples) that a full grid search would require.

select_best() never considers candidate models that did not get to the end of the race.

There is a helper function to see how candidate models were removed from consideration.

plot_race(lgbm_race_res) + 
  scale_x_continuous(breaks = pretty_breaks())

Your turn

  • Run tune_race_anova() with a different seed.
  • Did you get the same or similar results?