5 - Tuning models

Introduction to tidymodels

Tuning parameters

Some model or preprocessing parameters cannot be estimated directly from the data.

Some examples:

  • Tree depth in decision trees
  • Number of neighbors in a K-nearest neighbor model
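In tidymodels, such a parameter is marked with tune() in the model specification instead of being given a fixed value. A minimal sketch for the two examples above, assuming the tidymodels meta-package is installed; tree_spec and knn_spec are illustrative names, and the kknn engine package should only be needed later, when the k-NN model is actually fit:

```r
library(tidymodels)

# Mark each parameter that cannot be estimated from the data with tune()
tree_spec <- decision_tree(tree_depth = tune()) %>%
  set_mode("classification")

knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_mode("classification")

# List the parameters flagged for tuning in each specification
extract_parameter_set_dials(tree_spec)
extract_parameter_set_dials(knn_spec)
```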

Optimize tuning parameters

  • Try different values and measure their performance.
  • Find good values for these parameters.
  • Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.
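To make these steps concrete, here is a hand-rolled sketch of grid search with cross-validation in base R, using knn() from the class package (a recommended package shipped with R). The iris data standing in for a training set, the 5 folds, and the candidate values of k are all illustrative assumptions; tidymodels' tune_grid() automates this kind of loop.

```r
library(class)  # provides knn()

set.seed(1)
# Assumption: iris plays the role of the training set
train_x <- iris[, 1:4]
train_y <- iris$Species

# Randomly assign each row to one of 5 folds
folds <- sample(rep(1:5, length.out = nrow(iris)))
candidates <- c(1, 5, 9, 15)

# Try each candidate k and measure its average held-out accuracy
cv_accuracy <- sapply(candidates, function(k) {
  fold_acc <- sapply(1:5, function(f) {
    held_out <- folds == f
    pred <- knn(train_x[!held_out, ], train_x[held_out, ],
                train_y[!held_out], k = k)
    mean(pred == train_y[held_out])
  })
  mean(fold_acc)
})

results <- data.frame(k = candidates, accuracy = cv_accuracy)
best_k <- results$k[which.max(results$accuracy)]

# k-NN has no separate fitting step: the finalized model is the
# entire training set combined with best_k
new_obs <- data.frame(Sepal.Length = 6.0, Sepal.Width = 3.0,
                      Petal.Length = 4.5, Petal.Width = 1.5)
final_pred <- knn(train_x, new_obs, train_y, k = best_k)
```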

Optimize tuning parameters

The two main strategies for optimization are:

  • Grid search 💠, which tests a pre-defined set of candidate values

  • Iterative search 🌀, which suggests/estimates new values of candidate parameters to evaluate
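A pre-defined grid can be built with functions from the dials package, which tidymodels attaches. The sketch below assumes min_n() and trees() as the parameters of interest and builds a regular grid and a random grid over their default ranges. For iterative search, the tune package offers tune_bayes() (Bayesian optimization), which is not shown here.

```r
library(tidymodels)

# Regular grid: evenly spaced values over each parameter's default range,
# crossed with each other (3 levels x 3 levels)
regular <- grid_regular(min_n(), trees(), levels = 3)

# Random grid: 5 combinations sampled from the same ranges
set.seed(1)
random <- grid_random(min_n(), trees(), size = 5)

regular
random
```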

Specifying tuning parameters

Let’s take our previous random forest workflow and tag min_n, the minimum number of data points a node must contain to be split further, for tuning:

rf_spec <- rand_forest(min_n = tune()) %>% 
  set_mode("classification")

rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   min_n = tune()
#> 
#> Computational engine: ranger
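Several parameters can be tagged at once, and extract_parameter_set_dials() lists what a workflow will tune. A sketch assuming a hypothetical variant (rf_spec_2, rf_wflow_2) that also tunes the number of trees; the resulting parameter set can be passed straight to a grid function:

```r
library(tidymodels)

# Hypothetical variant of rf_spec that tags two parameters for tuning
rf_spec_2 <- rand_forest(min_n = tune(), trees = tune()) %>%
  set_mode("classification")

# The formula only names columns; no data are needed to build the workflow
rf_wflow_2 <- workflow(forested ~ ., rf_spec_2)

# One row per tagged parameter, each with its dials parameter object
rf_params <- extract_parameter_set_dials(rf_wflow_2)
rf_params

# A 3 x 3 regular grid built from that parameter set
rf_grid <- grid_regular(rf_params, levels = 3)
```

Such a grid could then be handed to tune_grid() through its grid argument.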

Try out multiple values

tune_grid() works similarly to fit_resamples(), but it fits and evaluates the workflow across the resamples for each of several candidate parameter values:

set.seed(22)
rf_res <- tune_grid(
  rf_wflow,
  forested_folds,
  grid = 5  # let tune_grid() generate 5 candidate values of min_n
)
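With grid = 5, tune_grid() generates five candidate values itself. To evaluate specific values instead, pass a data frame whose column names match the tuned parameters. The sketch below swaps in a decision tree (rpart engine) and the two_class_dat data from the modeldata package so that it runs without the forested data or the ranger package; those substitutions, and names such as tree_folds and depth_grid, are assumptions rather than part of the workflow above.

```r
library(tidymodels)

# Stand-in data shipped with the modeldata package
data(two_class_dat, package = "modeldata")

set.seed(22)
tree_folds <- vfold_cv(two_class_dat, v = 5)

# Decision tree (rpart engine by default) with its depth tagged for tuning
tree_wflow <- workflow(
  Class ~ .,
  decision_tree(tree_depth = tune()) %>% set_mode("classification")
)

# Explicit candidate values instead of an integer grid size
depth_grid <- tibble(tree_depth = c(1, 2, 4, 8))

set.seed(22)
tree_res <- tune_grid(tree_wflow, tree_folds, grid = depth_grid)
tree_res
```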

Compare results

Inspecting results and selecting the best-performing hyperparameter(s):

show_best(rf_res)
#> # A tibble: 5 × 7
#>   min_n .metric .estimator  mean     n std_err .config             
#>   <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1    21 roc_auc binary     0.972    10 0.00295 Preprocessor1_Model4
#> 2     6 roc_auc binary     0.972    10 0.00303 Preprocessor1_Model2
#> 3    31 roc_auc binary     0.972    10 0.00317 Preprocessor1_Model3
#> 4    13 roc_auc binary     0.972    10 0.00311 Preprocessor1_Model5
#> 5    33 roc_auc binary     0.972    10 0.00322 Preprocessor1_Model1

best_parameter <- select_best(rf_res)
best_parameter
#> # A tibble: 1 × 2
#>   min_n .config             
#>   <int> <chr>               
#> 1    21 Preprocessor1_Model4

collect_metrics() returns the metrics for every candidate, and autoplot() plots performance against the parameter values.
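A sketch of these helpers, reusing the same assumed decision-tree setup on two_class_dat (repeated so the block runs on its own):

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")

set.seed(22)
tree_folds <- vfold_cv(two_class_dat, v = 5)
tree_wflow <- workflow(
  Class ~ .,
  decision_tree(tree_depth = tune()) %>% set_mode("classification")
)
set.seed(22)
tree_res <- tune_grid(tree_wflow, tree_folds,
                      grid = tibble(tree_depth = c(1, 2, 4, 8)))

# One row per candidate and metric, averaged over the 5 folds
all_metrics <- collect_metrics(tree_res)

# The individual per-fold estimates instead of averages
fold_metrics <- collect_metrics(tree_res, summarize = FALSE)

# Name the metric explicitly instead of relying on the default
top_two <- show_best(tree_res, metric = "roc_auc", n = 2)

# Performance versus tree_depth, one panel per metric (a ggplot object)
p <- autoplot(tree_res)
```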

The final fit

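# Replace tune() in the workflow with the selected value of min_n
rf_wflow <- finalize_workflow(rf_wflow, best_parameter)

# Fit on the entire training set, then evaluate once on the test set
final_fit <- last_fit(rf_wflow, forested_split)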

collect_metrics(final_fit)
#> # A tibble: 3 × 4
#>   .metric     .estimator .estimate .config             
#>   <chr>       <chr>          <dbl> <chr>               
#> 1 accuracy    binary        0.906  Preprocessor1_Model1
#> 2 roc_auc     binary        0.970  Preprocessor1_Model1
#> 3 brier_class binary        0.0656 Preprocessor1_Model1
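After last_fit(), the test-set predictions and the fitted workflow can be pulled out of the result. A self-contained sketch under the same assumptions as before (decision tree, two_class_dat, and hypothetical names such as tree_split and tree_final), going from tuning to a finalized fit:

```r
library(tidymodels)
data(two_class_dat, package = "modeldata")

set.seed(22)
tree_split <- initial_split(two_class_dat, strata = Class)
tree_folds <- vfold_cv(training(tree_split), v = 5)

tree_wflow <- workflow(
  Class ~ .,
  decision_tree(tree_depth = tune()) %>% set_mode("classification")
)
set.seed(22)
tree_res <- tune_grid(tree_wflow, tree_folds,
                      grid = tibble(tree_depth = c(1, 2, 4, 8)))

# Plug in the best depth, fit on the training set, evaluate on the test set
tree_final <- tree_wflow %>%
  finalize_workflow(select_best(tree_res, metric = "roc_auc")) %>%
  last_fit(tree_split)

# Test-set predictions, one row per test-set observation
test_preds <- collect_predictions(tree_final)

# The workflow fitted on the training set, usable with predict()
fitted_wflow <- extract_workflow(tree_final)
new_preds <- predict(fitted_wflow, new_data = testing(tree_split)[1:3, ])
```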

Your turn

Modify your model workflow to tune one or more parameters.

Use grid search to find the best parameter(s).
