5 - Tuning models

Introduction to Machine Learning in R with tidymodels

Tuning parameters

Some model or preprocessing parameters cannot be estimated directly from the data.

Some examples:

  • Tree depth in decision trees
  • Number of neighbors in a K-nearest neighbor model
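As a minimal sketch of how the two examples above are expressed in tidymodels, each parameter can be given the placeholder `tune()` instead of a fixed value. The object names `tree_spec` and `knn_spec` are illustrative and not part of the original slides.

```r
library(tidymodels)

# Decision tree with the tree depth left open for tuning
tree_spec <- decision_tree(tree_depth = tune()) |>
  set_mode("classification")

# K-nearest neighbors with the number of neighbors left open for tuning
knn_spec <- nearest_neighbor(neighbors = tune()) |>
  set_mode("classification")

# Printing a specification shows which arguments are tagged with tune()
tree_spec
knn_spec
```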

Optimize tuning parameters

  • Try different values and measure their performance.
  • Find good values for these parameters.
  • Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.

The two main strategies for optimization are:

  • Grid search 💠, which tests a pre-defined set of candidate values

  • Iterative search 🌀, which suggests/estimates new values of candidate parameters to evaluate
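To make "a pre-defined set of candidate values" concrete, the sketch below builds two small grids for the minimum node size with the dials package. The choice of `min_n()` and of five candidates is an assumption made for illustration. Iterative search is not shown here; `tune_bayes()` in the tune package is one function of that kind.

```r
library(dials)

# Grid search: candidate values are fixed before any model is fit.
# min_n() is the dials parameter object for the minimum node size.
regular_grid <- grid_regular(min_n(), levels = 5)
regular_grid

# A random grid samples candidates from the same default range instead
set.seed(1)
random_grid <- grid_random(min_n(), size = 5)
random_grid
```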

Specifying tuning parameters

Let’s take our previous random forest workflow and mark the minimum number of data points in each node (min_n) for tuning:

rf_spec <- rand_forest(min_n = tune()) |> 
  set_mode("classification")

rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   min_n = tune()
#> 
#> Computational engine: ranger
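One way to check which arguments have been tagged, sketched here on the model specification alone so that it runs without the forested data, is `extract_parameter_set_dials()`. The name `rf_params` is illustrative.

```r
library(tidymodels)

rf_spec <- rand_forest(min_n = tune()) |>
  set_mode("classification")

# Lists every argument tagged with tune(); here that is only min_n
rf_params <- extract_parameter_set_dials(rf_spec)
rf_params

# The default range that would be used when a grid is built for min_n
extract_parameter_dials(rf_params, "min_n")
```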

Try out multiple values

tune_grid() works similarly to fit_resamples() but covers multiple parameter values:

set.seed(22)
rf_res <- tune_grid(
  rf_wflow,
  forested_folds,
  grid = 5
)
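Instead of an integer, `grid` can also be a data frame of explicit candidate values. The sketch below assumes the ranger package is installed and uses `two_class_dat` from modeldata as a stand-in, so the data, the object names (`demo_folds`, `demo_wflow`, `min_n_grid`, `demo_res`), the reduced number of trees, and the candidate values are all illustrative rather than taken from the slides.

```r
library(tidymodels)

# Stand-in data: two_class_dat from modeldata (not the forested data)
data(two_class_dat, package = "modeldata")

set.seed(22)
demo_folds <- vfold_cv(two_class_dat, v = 5)

demo_wflow <- workflow(
  Class ~ .,
  rand_forest(min_n = tune(), trees = 100) |>
    set_mode("classification")
)

# Explicit candidate values instead of grid = 5
min_n_grid <- tibble(min_n = c(2, 10, 20, 40))

# Fit every candidate on every fold and record only ROC AUC
demo_res <- tune_grid(
  demo_wflow,
  demo_folds,
  grid = min_n_grid,
  metrics = metric_set(roc_auc)
)
demo_res
```

Passing the grid explicitly is useful when specific values need to be compared; `grid = 5` lets tune choose the candidates.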

Compare results

Inspecting results and selecting the best-performing hyperparameter(s):

show_best(rf_res)
#> # A tibble: 5 × 7
#>   min_n .metric .estimator  mean     n std_err .config        
#>   <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>          
#> 1    40 roc_auc binary     0.762    10 0.0103  pre0_mod5_post0
#> 2    31 roc_auc binary     0.761    10 0.0100  pre0_mod4_post0
#> 3    22 roc_auc binary     0.760    10 0.00997 pre0_mod3_post0
#> 4    12 roc_auc binary     0.758    10 0.0102  pre0_mod2_post0
#> 5     3 roc_auc binary     0.755    10 0.00980 pre0_mod1_post0

best_parameter <- select_best(rf_res)
best_parameter
#> # A tibble: 1 × 2
#>   min_n .config        
#>   <int> <chr>          
#> 1    40 pre0_mod5_post0

collect_metrics() and autoplot() are also available.
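A short sketch of those two functions follows. It is self-contained, so it tunes a small random forest on the stand-in data `two_class_dat` from modeldata (an assumption, as are the names `demo_folds`, `demo_res`, and `demo_metrics`) and requires the ranger package.

```r
library(tidymodels)

# Stand-in data and resamples (assumptions, not the forested data)
data(two_class_dat, package = "modeldata")
set.seed(22)
demo_folds <- vfold_cv(two_class_dat, v = 5)

demo_res <- tune_grid(
  workflow(
    Class ~ .,
    rand_forest(min_n = tune(), trees = 100) |>
      set_mode("classification")
  ),
  demo_folds,
  grid = 3
)

# One row per candidate value and metric, averaged over the folds
demo_metrics <- collect_metrics(demo_res)
demo_metrics

# The same metrics, but with one row per fold
collect_metrics(demo_res, summarize = FALSE)

# Performance plotted against min_n, one panel per metric
autoplot(demo_res)
```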

The final fit

rf_wflow <- finalize_workflow(rf_wflow, best_parameter)

final_fit <- last_fit(rf_wflow, forested_split) 

collect_metrics(final_fit)
#> # A tibble: 3 × 4
#>   .metric     .estimator .estimate .config        
#>   <chr>       <chr>          <dbl> <chr>          
#> 1 accuracy    binary         0.764 pre0_mod0_post0
#> 2 roc_auc     binary         0.764 pre0_mod0_post0
#> 3 brier_class binary         0.161 pre0_mod0_post0
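The object returned by `last_fit()` holds more than the metrics: it also contains the test set predictions and the workflow fitted on the training set. The sketch below assumes the ranger package is installed, uses `two_class_dat` from modeldata as stand-in data, and replaces the output of `select_best()` with a hand-picked `min_n = 40`, so all object names and values here are illustrative.

```r
library(tidymodels)

# Stand-in data and split (assumptions, not the forested data)
data(two_class_dat, package = "modeldata")
set.seed(123)
demo_split <- initial_split(two_class_dat)

demo_wflow <- workflow(
  Class ~ .,
  rand_forest(min_n = tune(), trees = 100) |>
    set_mode("classification")
)

# Hand-picked value standing in for the result of select_best()
chosen <- tibble(min_n = 40)
final_wflow <- finalize_workflow(demo_wflow, chosen)

# Fits on the training set and evaluates once on the test set
demo_final_fit <- last_fit(final_wflow, demo_split)

# Predictions for the test set
collect_predictions(demo_final_fit)

# The fitted workflow, ready to predict on new data
fitted_wflow <- extract_workflow(demo_final_fit)
new_preds <- predict(fitted_wflow, new_data = head(two_class_dat))
new_preds
```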

Your turn

Modify your model workflow to tune one or more parameters.

Use grid search to find the best parameter(s).
