5 - Tuning models
Introduction to tidymodels
Tuning parameters
Some model or preprocessing parameters cannot be estimated directly from the data.
Some examples:
- Tree depth in decision trees
- Number of neighbors in a K-nearest neighbor model
Optimize tuning parameters
- Try different values and measure their performance.
- Find good values for these parameters.
- Once the value(s) of the parameter(s) are determined, a model can be finalized by fitting the model to the entire training set.
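The three steps above can be sketched by hand before any tidymodels tooling is involved. This is only an illustration under stated assumptions: it uses the built-in iris data and the rpart package as stand-ins, and the row counts, the candidate depths, and the names analysis, assessment, and best_depth are all hypothetical.

```r
library(rpart)

set.seed(1)
# Hypothetical training set: 120 shuffled rows of iris
train <- iris[sample(nrow(iris), 120), ]

# Hold out part of the training set to measure performance
analysis   <- train[1:90, ]
assessment <- train[91:120, ]

# Candidate values for the tuning parameter (tree depth)
depths <- c(1, 2, 4)

# Try each value and record accuracy on the held-out rows
acc <- sapply(depths, function(d) {
  fit <- rpart(Species ~ ., data = analysis,
               control = rpart.control(maxdepth = d))
  pred <- predict(fit, assessment, type = "class")
  mean(pred == assessment$Species)
})

# Keep the value that performed best
best_depth <- depths[which.max(acc)]

# Finalize: refit with the chosen value on the entire training set
final_model <- rpart(Species ~ ., data = train,
                     control = rpart.control(maxdepth = best_depth))
```

tune_grid() automates this loop and replaces the single hold-out with resamples.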
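The two main strategies for optimization are grid search, which tests a pre-defined set of candidate values, and iterative search, which suggests new candidate values based on the results so far.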
Specifying tuning parameters
Let's take our previous random forest workflow and tag the minimum number of data points in each node for tuning:
rf_spec <- rand_forest(min_n = tune()) %>%
  set_mode("classification")
rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#>
#> Main Arguments:
#> min_n = tune()
#>
#> Computational engine: ranger
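To double-check which arguments are tagged, the parameter set can be pulled out of the workflow. A minimal sketch, assuming a recent tidymodels installation; the workflow is rebuilt so the block stands alone, and no data is needed for this step. The name tagged is just for illustration.

```r
library(tidymodels)

rf_spec <- rand_forest(min_n = tune()) %>%
  set_mode("classification")
rf_wflow <- workflow(forested ~ ., rf_spec)

# List every argument in the workflow that is marked with tune()
tagged <- extract_parameter_set_dials(rf_wflow)
tagged
```

Only min_n should be listed here, since it is the only argument set to tune().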
Try out multiple values
tune_grid() works similarly to fit_resamples() but covers multiple parameter values:
set.seed(22)
rf_res <- tune_grid(
  rf_wflow,
  forested_folds,
  grid = 5
)
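With grid = 5, tune_grid() picks five candidate values for us. The grid argument also accepts a data frame of candidates. A sketch of two ways to build one: the range of 2 to 40 and the hand-picked values are assumptions for illustration, not values from this analysis.

```r
library(tidymodels)

# Regular grid: five evenly spaced min_n values over an assumed range
min_n_grid <- grid_regular(min_n(range = c(2L, 40L)), levels = 5)
min_n_grid

# Hand-picked grid: any data frame with a column named after the parameter
manual_grid <- tibble(min_n = c(2L, 10L, 20L, 40L))

# Either one can replace `grid = 5` in the call above, for example:
# rf_res <- tune_grid(rf_wflow, forested_folds, grid = min_n_grid)
```

An explicit grid is useful when specific values need to be compared.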
Compare results
Inspecting results and selecting the best-performing hyperparameter(s):
show_best(rf_res)
#> # A tibble: 5 × 7
#>   min_n .metric .estimator  mean     n std_err .config
#>   <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>
#> 1    21 roc_auc binary     0.972    10 0.00295 Preprocessor1_Model4
#> 2     6 roc_auc binary     0.972    10 0.00303 Preprocessor1_Model2
#> 3    31 roc_auc binary     0.972    10 0.00317 Preprocessor1_Model3
#> 4    13 roc_auc binary     0.972    10 0.00311 Preprocessor1_Model5
#> 5    33 roc_auc binary     0.972    10 0.00322 Preprocessor1_Model1
best_parameter <- select_best(rf_res)
best_parameter
#> # A tibble: 1 × 2
#>   min_n .config
#>   <int> <chr>
#> 1    21 Preprocessor1_Model4
collect_metrics() and autoplot() are also available.
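A small end-to-end sketch of those two helpers. So the block runs on its own, it uses modeldata::two_class_dat as a stand-in for the forested data (an assumption), with fewer folds, trees, and candidates to keep it quick. All demo_ names are hypothetical, and the ranger package must be installed.

```r
library(tidymodels)

set.seed(22)
# Stand-in data (assumption): two_class_dat, not the forested data
demo_folds <- vfold_cv(modeldata::two_class_dat, v = 5)

demo_spec <- rand_forest(min_n = tune(), trees = 100) %>%
  set_mode("classification")
demo_wflow <- workflow(Class ~ ., demo_spec)

demo_res <- tune_grid(demo_wflow, demo_folds, grid = 3)

# One row per candidate value and metric, averaged over the folds
demo_metrics <- collect_metrics(demo_res)
demo_metrics

# Naming the metric makes the ranking criterion explicit
show_best(demo_res, metric = "roc_auc")

# Performance plotted against the tuning parameter
demo_plot <- autoplot(demo_res)
demo_plot
```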
The final fit
rf_wflow <- finalize_workflow(rf_wflow, best_parameter)
final_fit <- last_fit(rf_wflow, forested_split)
collect_metrics(final_fit)
#> # A tibble: 3 × 4
#>   .metric     .estimator .estimate .config
#>   <chr>       <chr>          <dbl> <chr>
#> 1 accuracy    binary        0.906  Preprocessor1_Model1
#> 2 roc_auc     binary        0.970  Preprocessor1_Model1
#> 3 brier_class binary        0.0656 Preprocessor1_Model1
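To see what finalize_workflow() does on its own, here is a sketch that needs no data: it swaps the tune() placeholder for a concrete value. The value 21 mirrors the selected min_n above; the names tunable_wflow and finalized_wflow are hypothetical.

```r
library(tidymodels)

rf_spec <- rand_forest(min_n = tune()) %>%
  set_mode("classification")
tunable_wflow <- workflow(forested ~ ., rf_spec)

# Replace the tune() placeholder with a concrete value
finalized_wflow <- finalize_workflow(tunable_wflow, tibble(min_n = 21L))

# The model specification now shows min_n = 21 instead of tune()
final_spec <- extract_spec_parsnip(finalized_wflow)
final_spec
```

last_fit() then fits this finalized workflow on the training set and evaluates it on the test set.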
Your turn
Modify your model workflow to tune one or more parameters.
Use grid search to find the best parameter(s).