5 - Tuning models
  Introduction to tidymodels
Tuning parameters
Some model or preprocessing parameters cannot be estimated directly from the data.
Some examples:
- Tree depth in decision trees
 
- Number of neighbors in a K-nearest neighbor model
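As a minimal sketch of how these two examples look in parsnip, each parameter can be tagged with `tune()` as a placeholder instead of a fixed value. The object names `tree_spec` and `knn_spec` are illustrative and not part of the original slides.

```r
library(tidymodels)

# Tree depth in a decision tree, tagged for tuning
tree_spec <- decision_tree(tree_depth = tune()) |>
  set_mode("classification")

# Number of neighbors in a K-nearest neighbor model, tagged for tuning
knn_spec <- nearest_neighbor(neighbors = tune()) |>
  set_mode("classification")

# Printing a specification shows which arguments are still placeholders
tree_spec
knn_spec
```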
 
 
Optimize tuning parameters
- Try different values and measure their performance.
 
- Find good values for these parameters.
 
 
- Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.
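The three steps above can be sketched by hand in base R. This is only an illustration under stated assumptions: the data are simulated, the polynomial degree stands in for a tuning parameter, and a single hold-out set replaces the resamples used elsewhere in these slides.

```r
set.seed(1)

# Simulated training data (hypothetical): a noisy cubic relationship
n <- 200
train <- data.frame(x = runif(n, -2, 2))
train$y <- train$x^3 - train$x + rnorm(n, sd = 0.5)

# Hold out part of the training set to measure performance
in_analysis <- sample(n, size = 150)
analysis   <- train[in_analysis, ]
assessment <- train[-in_analysis, ]

# Candidate values for the tuning parameter (polynomial degree)
degrees <- 1:6

# Step 1: try each value and record the RMSE on the held-out rows
rmse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = analysis)
  pred <- predict(fit, newdata = assessment)
  sqrt(mean((assessment$y - pred)^2))
})

# Step 2: find the value with the best (lowest) RMSE
results <- data.frame(degree = degrees, rmse = rmse)
best_degree <- results$degree[which.min(results$rmse)]

# Step 3: finalize by refitting on the entire training set
final_model <- lm(y ~ poly(x, best_degree), data = train)
```

`tune_grid()` and `finalize_workflow()` automate the same loop, with resampling in place of the single hold-out set.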
 
 
Optimize tuning parameters
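The two main strategies for optimization are grid search, which tests a pre-defined set of candidate values, and iterative search, which suggests new candidate values based on previous results.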
Specifying tuning parameters
Let's take our previous random forest workflow and tag the minimum number of data points in each node for tuning:
rf_spec <- rand_forest(min_n = tune()) |> 
  set_mode("classification")
rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   min_n = tune()
#> 
#> Computational engine: ranger
 
 
Try out multiple values
tune_grid() works similarly to fit_resamples() but evaluates multiple parameter values on each resample. Here, grid = 5 requests five candidate values:
set.seed(22)
rf_res <- tune_grid(
  rf_wflow,
  forested_folds,
  grid = 5
)
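Instead of an integer, `grid` can also be a data frame of candidate values. The sketch below builds two such grids. The names `min_n_grid` and `min_n_custom` are illustrative, and the final call is left commented out because it assumes `rf_wflow` and `forested_folds` already exist in the session.

```r
library(tidymodels)

# A regular grid of five min_n values spanning the default dials range
min_n_grid <- grid_regular(min_n(), levels = 5)
min_n_grid

# Or a hand-picked set of candidate values (hypothetical choices)
min_n_custom <- tibble(min_n = c(2L, 10L, 20L, 30L, 40L))

# Either data frame can be passed as `grid`:
# rf_res <- tune_grid(rf_wflow, forested_folds, grid = min_n_grid)
```

An explicit grid is useful when particular values need to be compared or when the default range is wider than necessary.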
 
 
Compare results
Inspecting results and selecting the best-performing hyperparameter(s):
show_best(rf_res)
#> # A tibble: 5 × 7
#>   min_n .metric .estimator  mean     n std_err .config             
#>   <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1    40 roc_auc binary     0.762    10 0.0107  Preprocessor1_Model1
#> 2    31 roc_auc binary     0.761    10 0.0101  Preprocessor1_Model2
#> 3    22 roc_auc binary     0.760    10 0.0103  Preprocessor1_Model5
#> 4    12 roc_auc binary     0.758    10 0.0101  Preprocessor1_Model4
#> 5     3 roc_auc binary     0.755    10 0.00969 Preprocessor1_Model3
best_parameter <- select_best(rf_res)
best_parameter
#> # A tibble: 1 × 2
#>   min_n .config             
#>   <int> <chr>               
#> 1    40 Preprocessor1_Model1
 
 
collect_metrics() and autoplot() are also available.
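A short sketch of those helpers follows. It continues the session above and assumes `rf_res` is the `tune_grid()` result created earlier; `all_metrics` is an illustrative name.

```r
library(tidymodels)

# Assumes `rf_res` is the tune_grid() result created earlier in this section

# One row per candidate value and metric, averaged over the resamples
all_metrics <- collect_metrics(rf_res)
all_metrics

# Plot performance against min_n, one panel per metric
autoplot(rf_res)

# Rank or select candidates on an explicitly named metric
show_best(rf_res, metric = "accuracy", n = 3)
select_best(rf_res, metric = "accuracy")
```

Naming the metric explicitly avoids relying on whichever metric is used by default.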
The final fit
rf_wflow <- finalize_workflow(rf_wflow, best_parameter)
final_fit <- last_fit(rf_wflow, forested_split) 
collect_metrics(final_fit)
#> # A tibble: 3 × 4
#>   .metric     .estimator .estimate .config             
#>   <chr>       <chr>          <dbl> <chr>               
#> 1 accuracy    binary         0.764 Preprocessor1_Model1
#> 2 roc_auc     binary         0.764 Preprocessor1_Model1
#> 3 brier_class binary         0.161 Preprocessor1_Model1
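`last_fit()` fits the finalized workflow on the training set and evaluates it on the test set, so the metrics above are test-set estimates. As a sketch, assuming `final_fit` and `forested_split` from the code above, the fitted workflow and the test-set predictions can be pulled out for later use; `final_wflow` and `test_preds` are illustrative names.

```r
library(tidymodels)

# Assumes `final_fit` and `forested_split` exist from the code above

# The workflow fitted on the entire training set
final_wflow <- extract_workflow(final_fit)

# Predictions on the test set, as used for the metrics above
test_preds <- collect_predictions(final_fit)
test_preds

# The extracted workflow can predict on new data;
# a few test rows are used here purely for illustration
predict(final_wflow, new_data = head(testing(forested_split)))
```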
 
 
Your turn
Modify your model workflow to tune one or more parameters.
Use grid search to find the best parameter(s).