Advanced tidymodels

```
library(tidymodels)
library(textrecipes) # step_dummy_hash()
library(bonsai)      # "lightgbm" engine for boost_tree()

hotel_rec <-
  recipe(avg_price_per_room ~ ., data = hotel_train) %>%
  step_YeoJohnson(lead_time) %>%
  # Hash the high-cardinality factors; the number of hash columns is tuned
  step_dummy_hash(agent, num_terms = tune("agent hash")) %>%
  step_dummy_hash(company, num_terms = tune("company hash")) %>%
  step_zv(all_predictors())

lgbm_spec <-
  boost_tree(trees = tune(), learn_rate = tune(), min_n = tune()) %>%
  set_mode("regression") %>%
  set_engine("lightgbm", num_threads = 1)

lgbm_wflow <- workflow(hotel_rec, lgbm_spec)

# num_hash() ranges are on the log2 scale: 2^3 = 8 to 2^8 = 256 hash terms
lgbm_param <-
  lgbm_wflow %>%
  extract_parameter_set_dials() %>%
  update(`agent hash` = num_hash(c(3, 8)),
         `company hash` = num_hash(c(3, 8)))
```

In the last section, we evaluated 250 models (25 candidates times 10 resamples).

We can make this go faster using parallel processing.
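A minimal sketch of registering a parallel backend with the doParallel package, assuming your version of tune still picks up a registered foreach backend (newer tune releases may recommend the future framework instead). With several workers already fitting different resamples, one thread per model (as `num_threads = 1` sets in the LightGBM spec) avoids oversubscribing the cores.

```r
library(doParallel) # also attaches foreach and parallel

# Use all but one physical core (an assumption; adjust for your machine)
n_cores <- max(1, parallel::detectCores(logical = FALSE) - 1, na.rm = TRUE)

cl <- makePSOCKcluster(n_cores)
registerDoParallel(cl)

# ... tune_grid() / tune_race_anova() calls made here can run in parallel ...

stopCluster(cl)
```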

Also, for some models, we can *fit* far fewer models than the number being evaluated.

- For boosting, a model with `X` trees can often predict on candidates with fewer than `X` trees.
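To make the boosting point concrete, here is a small sketch using parsnip's `multi_predict()`: one ensemble is fit with 100 trees and then used to predict as if it had 10, 50, or 100 trees. It assumes the xgboost package is installed and uses the xgboost engine on `mtcars` as a stand-in for the LightGBM/hotel setup above.

```r
library(tidymodels)

# Fit a single boosted ensemble with 100 trees (xgboost used as a stand-in)
set.seed(1)
xgb_fit <-
  boost_tree(trees = 100) %>%
  set_engine("xgboost") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Predictions for several "submodels" from the one fitted object
sub_preds <- multi_predict(xgb_fit, new_data = mtcars[1:3, ], trees = c(10, 50, 100))

# One row per observation; each `.pred` element holds a prediction per `trees` value
sub_preds$.pred[[1]]
```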

Both of these methods can lead to enormous speed-ups.

*Racing* is an old tool that we can use to go even faster.

- Evaluate all of the candidate models but only for a few resamples.
- Determine which candidates have a low probability of being selected.
- Eliminate poor candidates.
- Repeat with the next resample (until no resamples remain).

This can result in fitting a small number of models.
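In finetune, these steps map onto arguments of `control_race()`. The argument names below are finetune's; the values are only illustrative.

```r
library(finetune)

race_ctrl <- control_race(
  burn_in = 3,         # resamples every candidate sees before any elimination
  alpha = 0.05,        # significance level used by the elimination test
  num_ties = 10,       # extra resamples used to break a tie between two final candidates
  randomize = TRUE,    # evaluate resamples in a random order (so set the seed)
  verbose_elim = TRUE  # log which candidates are eliminated at each step
)
```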

How do we eliminate tuning parameter combinations?

There are a few methods to do so. We’ll use one based on analysis of variance (ANOVA).

*However*… there is typically a large difference between resamples in the results.

Here are some realistic (but simulated) examples of two candidate models.

An error estimate is measured for each of 10 resamples.

- The lines connect resamples.

There is usually a strong resample-to-resample effect (rank correlation: 0.83).

One way to compare these models is a paired t-test, i.e., a t-test on their differences matched by resample.

With \(n = 10\) resamples, the confidence interval for the difference in RMSE is (0.99, 2.8), indicating that candidate number 2 has smaller error.
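A sketch of the paired comparison on simulated data (these are not the values behind the interval quoted above). Each candidate's RMSE shares a common resample effect, which the paired test cancels out and the unpaired test does not.

```r
# Simulated (not real) RMSE values for two candidates on the same 10 resamples
set.seed(2)
n <- 10
resample_effect <- rnorm(n, sd = 3)  # shift shared by both candidates
cand_1 <- 20 + resample_effect + rnorm(n, sd = 0.8)
cand_2 <- 18 + resample_effect + rnorm(n, sd = 0.8)

# Strong rank correlation across resamples
cor(cand_1, cand_2, method = "spearman")

# A paired t-test is the same as a one-sample t-test on the matched differences
paired   <- t.test(cand_1, cand_2, paired = TRUE)
on_diffs <- t.test(cand_1 - cand_2)
paired$conf.int

# Ignoring the pairing inflates the interval with resample-to-resample noise
unpaired <- t.test(cand_1, cand_2)
unpaired$conf.int
```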

What if we had compared the candidates while we sequentially evaluated each resample?


One candidate shows superiority when 4 resamples have been evaluated.
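A sketch of that sequential view on simulated differences: the t-test is re-run each time another resample is added, and we record when the lower bound first clears zero. Peeking repeatedly like this ignores multiple testing; the sketch only illustrates the idea.

```r
# Simulated (not real) RMSE differences (candidate 1 minus candidate 2), one per resample
set.seed(3)
n <- 10
diffs <- rnorm(n, mean = 2, sd = 1.1)

# Re-run the t-test after each new resample (needs at least 2 values)
lower_bounds <- sapply(2:n, function(i) t.test(diffs[1:i])$conf.int[1])
names(lower_bounds) <- 2:n
round(lower_bounds, 2)

# First number of resamples at which the interval excludes zero (NA if never)
first_decisive <- as.integer(names(lower_bounds)[lower_bounds > 0][1])
first_decisive
```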

One version of racing uses a *mixed model ANOVA* to construct one-sided confidence intervals for each candidate versus the current best.

Any candidates whose bound does not include zero are discarded. Here is an animation.
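A rough sketch of one elimination step on simulated data. Note the swap: resample enters as a *fixed* blocking factor in `lm()` rather than as a random effect in a mixed model, so this is an approximation of the idea, not finetune's implementation. Each candidate's difference from the current best gets a one-sided 95% lower bound; a bound above zero means "clearly worse", so that candidate is dropped.

```r
# Simulated (not real) RMSE: 4 candidates evaluated on the same 5 resamples
set.seed(4)
n_res <- 5
sim <- expand.grid(
  resample  = factor(paste0("Fold", 1:n_res)),
  candidate = factor(paste0("cand_", 1:4))
)
resample_effect  <- rnorm(n_res, sd = 2)  # shift shared by all candidates
candidate_effect <- c(0, 0.3, 1.5, 3)     # true RMSE penalty of each candidate
sim$rmse <- 10 +
  resample_effect[sim$resample] +
  candidate_effect[sim$candidate] +
  rnorm(nrow(sim), sd = 0.3)

# Current best = lowest mean RMSE; make it the reference level
best <- names(which.min(tapply(sim$rmse, sim$candidate, mean)))
sim$candidate <- relevel(sim$candidate, ref = best)

# Blocked ANOVA with resample as a (fixed) block
fit   <- lm(rmse ~ candidate + resample, data = sim)
coefs <- summary(fit)$coefficients
rows  <- grep("^candidate", rownames(coefs))
diff_est <- coefs[rows, "Estimate"]    # candidate minus best
diff_se  <- coefs[rows, "Std. Error"]

# One-sided lower bound; keep candidates whose bound still includes zero
lower <- diff_est - qt(0.95, df = fit$df.residual) * diff_se
keep  <- c(best, sub("^candidate", "", names(lower)[lower <= 0]))
keep
```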

The resamples are analyzed in a random order (so set the seed).

Kuhn (2014) has examples and simulations to show that the method works.

The finetune package has functions `tune_race_anova()` and `tune_race_win_loss()`.

The syntax and helper functions are extremely similar to those shown for `tune_grid()`.
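A self-contained sketch of the call pattern, using a CART model on `mtcars` as a stand-in for the LightGBM/hotel workflow (the rpart engine and `mtcars` are assumptions chosen so it runs anywhere). With the objects above, the analogous call would pass `lgbm_wflow`, the hotel resamples, and `param_info = lgbm_param`.

```r
library(tidymodels)
library(finetune)

set.seed(10)
folds <- vfold_cv(mtcars, v = 10)

cart_wflow <- workflow(
  mpg ~ .,
  decision_tree(cost_complexity = tune(), min_n = tune()) %>%
    set_engine("rpart") %>%
    set_mode("regression")
)

# Resamples are evaluated in a random order, so set the seed
set.seed(11)
race_res <- tune_race_anova(
  cart_wflow,
  resamples = folds,
  grid = 15,
  metrics = metric_set(mae),
  control = control_race(verbose_elim = TRUE)
)

show_best(race_res, metric = "mae")
```

`plot_race(race_res)` shows how many resamples each candidate survived before being eliminated.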

```
show_best(lgbm_race_res, metric = "mae")
#> # A tibble: 2 × 11
#> trees min_n learn_rate `agent hash` `company hash` .metric .estimator mean n std_err .config
#> <int> <int> <dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 1347 5 0.0655 66 26 mae standard 9.64 10 0.173 Preprocessor34_Model1
#> 2 980 8 0.0429 17 135 mae standard 9.76 10 0.164 Preprocessor25_Model1
```

*Run `tune_race_anova()` with a different seed. Did you get the same or similar results?*
