5 - Tuning models

Introduction to tidymodels

Tuning parameters

Some model or preprocessing parameters cannot be estimated directly from the data.

Some examples:

  • Tree depth in decision trees
  • Number of neighbors in a K-nearest neighbor model

Optimize tuning parameters

  • Try different values and measure their performance.
  • Find good values for these parameters.
  • Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.
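The steps above can be sketched by hand, without any tuning helpers. This is an illustration only: it uses the built-in iris data and knn() from the class package rather than the data and models used in this workshop, and the candidate values and object names are arbitrary.

```r
library(class)  # provides knn(); a recommended package that ships with R

set.seed(123)
n <- nrow(iris)
assess_idx <- sample(n, size = 50)   # rows held out to measure performance
analysis   <- iris[-assess_idx, ]
assessment <- iris[assess_idx, ]

candidates <- c(1, 3, 5, 7, 9)       # candidate numbers of neighbors

# Try each candidate value and record accuracy on the held-out rows
accuracy <- sapply(candidates, function(k) {
  pred <- knn(analysis[, 1:4], assessment[, 1:4], cl = analysis$Species, k = k)
  mean(pred == assessment$Species)
})

results <- data.frame(neighbors = candidates, accuracy = accuracy)

# Keep the value with the highest held-out accuracy
best_k <- results$neighbors[which.max(results$accuracy)]
results
best_k
```

With best_k chosen, the model would then be fit once more on all of the training data.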

Optimize tuning parameters

The two main strategies for optimization are:

  • Grid search 💠 which tests a pre-defined set of candidate values

  • Iterative search 🌀 which suggests/estimates new values of candidate parameters to evaluate
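For grid search, the set of candidate values can be built ahead of time. A minimal sketch using the dials package, assuming the default range of min_n(); the object names are illustrative. Iterative search is handled by functions such as tune_bayes() in the tune package and is not shown here.

```r
library(dials)

# Regular grid: evenly spaced candidates across the default range of min_n()
regular_grid <- grid_regular(min_n(), levels = 5)

# Random grid: candidates drawn at random from the same range
# (may contain fewer than 5 rows if duplicate draws are dropped)
set.seed(1)
random_grid <- grid_random(min_n(), size = 5)

regular_grid
random_grid
```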

Specifying tuning parameters

Let’s take our previous random forest workflow and tag the minimum number of data points in each node (min_n) for tuning:

rf_spec <- rand_forest(min_n = tune()) %>% 
  set_mode("classification")

rf_wflow <- workflow(tip ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> tip ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   min_n = tune()
#> 
#> Computational engine: ranger
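The same tune() tag applies to other models and parameters, such as the tree depth and number of neighbors mentioned earlier. A minimal sketch, assuming the tidymodels packages are installed; tree_spec and knn_spec are illustrative names, not objects from this workshop.

```r
library(tidymodels)

# Tree depth is tagged for tuning instead of being given a fixed value
tree_spec <- decision_tree(tree_depth = tune()) %>%
  set_mode("classification")

# Number of neighbors is tagged the same way
knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_mode("classification")

# List the parameters that have been marked for tuning
tree_params <- extract_parameter_set_dials(tree_spec)
knn_params  <- extract_parameter_set_dials(knn_spec)
tree_params
knn_params
```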

Try out multiple values

tune_grid() works similarly to fit_resamples() but evaluates multiple parameter values:

set.seed(22)
rf_res <- tune_grid(
  rf_wflow,
  taxi_folds,
  grid = 5
)
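The grid argument can also be a data frame of explicit candidate values rather than a number. A minimal sketch on a small example dataset (two_class_dat from the modeldata package, not the taxi data), with a decision tree swapped in for the random forest so that only the rpart engine is needed; the candidate values and object names are assumptions for illustration.

```r
library(tidymodels)

data(two_class_dat, package = "modeldata")

set.seed(123)
folds <- vfold_cv(two_class_dat, v = 5)

# Decision tree (default engine: rpart) with min_n tagged for tuning
tree_spec <- decision_tree(min_n = tune()) %>%
  set_mode("classification")

tree_wflow <- workflow(Class ~ ., tree_spec)

# Explicit candidate values: one row per value to evaluate
candidate_grid <- tibble(min_n = c(2, 10, 20, 40))

tree_res <- tune_grid(
  tree_wflow,
  resamples = folds,
  grid = candidate_grid,
  metrics = metric_set(roc_auc)
)

# One row per candidate value, averaged over the folds
tree_metrics <- collect_metrics(tree_res)
tree_metrics
```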

Compare results

Inspecting results and selecting the best-performing hyperparameter(s):

show_best(rf_res)
#> # A tibble: 5 × 7
#>   min_n .metric .estimator  mean     n std_err .config             
#>   <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1    33 roc_auc binary     0.623    10  0.0149 Preprocessor1_Model1
#> 2    31 roc_auc binary     0.622    10  0.0154 Preprocessor1_Model3
#> 3    21 roc_auc binary     0.620    10  0.0149 Preprocessor1_Model4
#> 4    13 roc_auc binary     0.617    10  0.0137 Preprocessor1_Model5
#> 5     6 roc_auc binary     0.611    10  0.0156 Preprocessor1_Model2

best_parameter <- select_best(rf_res)
best_parameter
#> # A tibble: 1 × 2
#>   min_n .config             
#>   <int> <chr>               
#> 1    33 Preprocessor1_Model1

collect_metrics() and autoplot() are also available.

The final fit

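# Replace the tune() placeholder with the selected value of min_n
rf_wflow <- finalize_workflow(rf_wflow, best_parameter)

# Fit on the training set and evaluate on the test set
final_fit <- last_fit(rf_wflow, taxi_split)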

collect_metrics(final_fit)
#> # A tibble: 2 × 4
#>   .metric  .estimator .estimate .config             
#>   <chr>    <chr>          <dbl> <chr>               
#> 1 accuracy binary         0.913 Preprocessor1_Model1
#> 2 roc_auc  binary         0.648 Preprocessor1_Model1
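To make the two steps concrete, the sketch below compares last_fit() with fitting on the training set and predicting on the test set by hand. It uses two_class_dat from the modeldata package and a decision tree instead of the taxi data and random forest, and the value min_n = 20 is arbitrary; all object names are illustrative.

```r
library(tidymodels)

data(two_class_dat, package = "modeldata")

set.seed(123)
split <- initial_split(two_class_dat)

tree_wflow <- workflow(
  Class ~ .,
  decision_tree(min_n = tune()) %>% set_mode("classification")
)

# Plug a chosen value into the tune() placeholder (20 is arbitrary here)
final_wflow <- finalize_workflow(tree_wflow, tibble(min_n = 20))

# last_fit(): fit on the training set, evaluate on the test set
final_fit <- last_fit(final_wflow, split)
last_fit_acc <- collect_metrics(final_fit) %>%
  filter(.metric == "accuracy") %>%
  pull(.estimate)

# The same two steps done by hand
manual_fit <- fit(final_wflow, training(split))
manual_acc <- augment(manual_fit, testing(split)) %>%
  accuracy(truth = Class, estimate = .pred_class) %>%
  pull(.estimate)

c(last_fit = last_fit_acc, manual = manual_acc)
```

With a deterministic engine such as rpart, the two accuracy values should agree.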

Your turn

Modify your model workflow to tune one or more parameters.

Use grid search to find the best parameter(s).
