4 - Evaluating models

Introduction to Machine Learning in R with tidymodels

Looking at predictions

augment(forested_fit, new_data = forested_train)
#> # A tibble: 8,749 × 22
#>    .pred_class .pred_Yes .pred_No forested  year elevation eastness roughness
#>    <fct>           <dbl>    <dbl> <fct>    <dbl>     <dbl>    <dbl>     <dbl>
#>  1 Yes             0.931   0.0690 Yes       1997        66       82        10
#>  2 Yes             0.983   0.0172 No        1997       284      -99        58
#>  3 Yes             0.960   0.0401 Yes       2022       130       86        15
#>  4 Yes             0.870   0.130  Yes       2021       202      -55         3
#>  5 Yes             0.823   0.177  Yes       1995        75      -89         1
#>  6 Yes             0.758   0.242  No        1995       110      -53         5
#>  7 Yes             0.823   0.177  Yes       2022       111       73        12
#>  8 No              0.467   0.533  Yes       1997       230       96        14
#>  9 Yes             0.983   0.0172 Yes       2002       160      -88        13
#> 10 Yes             0.871   0.129  Yes       2020        39        9         6
#> # ℹ 8,739 more rows
#> # ℹ 14 more variables: tree_no_tree <fct>, dew_temp <dbl>, precip_annual <dbl>,
#> #   temp_annual_mean <dbl>, temp_annual_min <dbl>, temp_annual_max <dbl>,
#> #   temp_january_min <dbl>, vapor_min <dbl>, vapor_max <dbl>,
#> #   canopy_cover <dbl>, lon <dbl>, lat <dbl>, land_type <fct>, county <fct>

Confusion matrix

augment(forested_fit, new_data = forested_train) |>
  conf_mat(truth = forested, estimate = .pred_class)
#>           Truth
#> Prediction  Yes   No
#>        Yes 5884  869
#>        No   438 1558

Confusion matrix

augment(forested_fit, new_data = forested_train) |>
  conf_mat(truth = forested, estimate = .pred_class) |>
  autoplot(type = "heatmap")

Your turn


How would you combine TP, TN, FP, and FN into one number to describe how often the model is correct?

05:00

Metrics for model performance

augment(forested_fit, new_data = forested_train) |>
  accuracy(truth = forested, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.851
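
As a check, accuracy is simply (TP + TN) divided by the total number of predictions; plugging in the confusion matrix counts from above reproduces the same estimate:

# (TP + TN) / (TP + TN + FP + FN), using the counts shown earlier
(5884 + 1558) / 8749
#> [1] 0.8506115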

Dangers of accuracy

We need to be careful when using accuracy(), since a model that only ever predicts one class can still score "well" when the classes are imbalanced:

augment(forested_fit, new_data = forested_train) |>
  mutate(.pred_class = factor("Yes", levels = c("Yes", "No"))) |>
  accuracy(truth = forested, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.723

Metrics for model performance

augment(forested_fit, new_data = forested_train) |>
  sensitivity(truth = forested, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 sensitivity binary         0.931

augment(forested_fit, new_data = forested_train) |>
  specificity(truth = forested, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 specificity binary         0.642

Metrics for model performance

We can use metric_set() to combine multiple metric functions into a single function:

forested_metrics <- metric_set(accuracy, specificity, sensitivity)

augment(forested_fit, new_data = forested_train) |>
  forested_metrics(truth = forested, estimate = .pred_class)
#> # A tibble: 3 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 accuracy    binary         0.851
#> 2 specificity binary         0.642
#> 3 sensitivity binary         0.931

Metrics for model performance

Metrics and metric sets work with grouped data frames!

augment(forested_fit, new_data = forested_train) |>
  group_by(tree_no_tree) |>
  accuracy(truth = forested, estimate = .pred_class)
#> # A tibble: 2 × 4
#>   tree_no_tree .metric  .estimator .estimate
#>   <fct>        <chr>    <chr>          <dbl>
#> 1 Tree         accuracy binary         0.875
#> 2 No tree      accuracy binary         0.807

Your turn

Apply the forested_metrics metric set to augment() output grouped by tree_no_tree.

Do any metrics differ substantially between groups?

05:00
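
One possible approach, reusing the forested_metrics set defined above:

augment(forested_fit, new_data = forested_train) |>
  group_by(tree_no_tree) |>
  forested_metrics(truth = forested, estimate = .pred_class)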

Two class data

These metrics assume that we know the threshold for converting “soft” probability predictions into “hard” class predictions.

Is a 50% threshold good?

What happens if we say that we need to be 80% sure to declare an event?

  • sensitivity ⬇️, specificity ⬆️

What happens for a 20% threshold?

  • sensitivity ⬆️, specificity ⬇️
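
A minimal sketch of checking this, re-deriving hard class predictions from .pred_Yes at an 80% threshold and then computing the class metrics (specificity can be computed the same way):

# Recompute class predictions with an 80% probability threshold
augment(forested_fit, new_data = forested_train) |>
  mutate(
    .pred_class_80 = factor(if_else(.pred_Yes >= 0.8, "Yes", "No"),
                            levels = levels(forested))
  ) |>
  sensitivity(truth = forested, estimate = .pred_class_80)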

Varying the threshold

ROC curves

For an ROC (receiver operating characteristic) curve, we plot

  • the false positive rate (1 - specificity) on the x-axis
  • the true positive rate (sensitivity) on the y-axis

with sensitivity and specificity calculated at all possible thresholds.

ROC curves

We can use the area under the ROC curve as a classification metric:

  • ROC AUC = 1 💯 (perfect separation)
  • ROC AUC = 1/2 😢 (no better than random guessing)

ROC curves

# Assumes _first_ factor level is event; there are options to change that
augment(forested_fit, new_data = forested_train) |> 
  roc_curve(truth = forested, .pred_Yes) |>
  slice(1, 20, 50)
#> # A tibble: 3 × 3
#>   .threshold specificity sensitivity
#>        <dbl>       <dbl>       <dbl>
#> 1   -Inf           0           1    
#> 2      0.143       0.267       0.990
#> 3      0.385       0.571       0.951

augment(forested_fit, new_data = forested_train) |> 
  roc_auc(truth = forested, .pred_Yes)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc binary         0.881

ROC curve plot

augment(forested_fit, 
        new_data = forested_train) |> 
  roc_curve(truth = forested, 
            .pred_Yes) |>
  autoplot()

Your turn

Compute and plot an ROC curve for your current model.

What data are being used for this ROC curve plot?

05:00

Brier score formula

What if we don’t turn predicted probabilities into class predictions?

The Brier score is analogous to the mean squared error in regression models:

\[ Brier_{class} = \frac{1}{NC}\sum_{i=1}^N\sum_{k=1}^C (y_{ik} - \hat{p}_{ik})^2 \]

Brier score

augment(forested_fit, new_data = forested_train) |> 
  brier_class(truth = forested, .pred_Yes) 
#> # A tibble: 1 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 brier_class binary         0.113

Smaller values are better; for binary classification, the "bad model threshold" is about 0.25 (the score of a model that always predicts a 50% probability).
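
Following the formula above, with two classes the score reduces to the mean squared difference between the 0/1 event indicator and the predicted event probability. A minimal sketch of that check, which should reproduce the brier_class() value:

# Mean squared difference between the observed 0/1 outcome and .pred_Yes
augment(forested_fit, new_data = forested_train) |>
  mutate(observed = if_else(forested == "Yes", 1, 0)) |>
  summarize(brier = mean((observed - .pred_Yes)^2))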

Separation vs calibration

The ROC curve captures separation: how well the predicted probabilities distinguish the two classes.

The Brier score captures calibration: how well the predicted probabilities match the observed event rates.
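
One way to look at calibration directly is a calibration plot of observed event rates against binned predicted probabilities. A sketch, assuming the probably package is installed:

library(probably)

# Observed event rate vs. binned predicted probability of the event
augment(forested_fit, new_data = forested_train) |>
  cal_plot_breaks(truth = forested, estimate = .pred_Yes)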

⚠️ DANGERS OF OVERFITTING ⚠️

Dangers of overfitting ⚠️

forested_fit |>
  augment(forested_train)
#> # A tibble: 8,749 × 22
#>    .pred_class .pred_Yes .pred_No forested  year elevation eastness roughness
#>    <fct>           <dbl>    <dbl> <fct>    <dbl>     <dbl>    <dbl>     <dbl>
#>  1 Yes             0.931   0.0690 Yes       1997        66       82        10
#>  2 Yes             0.983   0.0172 No        1997       284      -99        58
#>  3 Yes             0.960   0.0401 Yes       2022       130       86        15
#>  4 Yes             0.870   0.130  Yes       2021       202      -55         3
#>  5 Yes             0.823   0.177  Yes       1995        75      -89         1
#>  6 Yes             0.758   0.242  No        1995       110      -53         5
#>  7 Yes             0.823   0.177  Yes       2022       111       73        12
#>  8 No              0.467   0.533  Yes       1997       230       96        14
#>  9 Yes             0.983   0.0172 Yes       2002       160      -88        13
#> 10 Yes             0.871   0.129  Yes       2020        39        9         6
#> # ℹ 8,739 more rows
#> # ℹ 14 more variables: tree_no_tree <fct>, dew_temp <dbl>, precip_annual <dbl>,
#> #   temp_annual_mean <dbl>, temp_annual_min <dbl>, temp_annual_max <dbl>,
#> #   temp_january_min <dbl>, vapor_min <dbl>, vapor_max <dbl>,
#> #   canopy_cover <dbl>, lon <dbl>, lat <dbl>, land_type <fct>, county <fct>

We call this “resubstitution” or “repredicting the training set”

Dangers of overfitting ⚠️

forested_fit |>
  augment(forested_train) |>
  accuracy(forested, .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.851

We call this a “resubstitution estimate”

Dangers of overfitting ⚠️

forested_fit |>
  augment(forested_train) |>
  accuracy(forested, .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.851
forested_fit |>
  augment(forested_test) |>
  accuracy(forested, .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.711

⚠️ Remember that we’re demonstrating overfitting

⚠️ Don’t use the test set until the end of your modeling analysis

Your turn

Use augment() and a metric function to compute a classification metric like roc_auc().

Compute the metrics for both training and testing data to demonstrate overfitting!

Notice the evidence of overfitting! ⚠️

05:00

Dangers of overfitting ⚠️

forested_fit |>
  augment(forested_train) |>
  roc_auc(forested, .pred_Yes)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc binary         0.881
forested_fit |>
  augment(forested_test) |>
  roc_auc(forested, .pred_Yes)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc binary         0.703

What if we want to compare more models?

And/or more model configurations?

And we want to understand if these are important differences?

The testing data are precious 💎

How can we use the training data to compare and evaluate different models? 🤔

Cross-validation

Your turn

If we use 10 folds, what percent of the training data

  • ends up in the analysis set
  • ends up in the assessment set

for each fold?

03:00

Cross-validation

vfold_cv(forested_train) # v = 10 is default
#> #  10-fold cross-validation 
#> # A tibble: 10 × 2
#>    splits             id    
#>    <list>             <chr> 
#>  1 <split [7874/875]> Fold01
#>  2 <split [7874/875]> Fold02
#>  3 <split [7874/875]> Fold03
#>  4 <split [7874/875]> Fold04
#>  5 <split [7874/875]> Fold05
#>  6 <split [7874/875]> Fold06
#>  7 <split [7874/875]> Fold07
#>  8 <split [7874/875]> Fold08
#>  9 <split [7874/875]> Fold09
#> 10 <split [7875/874]> Fold10

Cross-validation

What is in this?

forested_folds <- vfold_cv(forested_train)
forested_folds$splits[1:3]
#> [[1]]
#> <Analysis/Assess/Total>
#> <7874/875/8749>
#> 
#> [[2]]
#> <Analysis/Assess/Total>
#> <7874/875/8749>
#> 
#> [[3]]
#> <Analysis/Assess/Total>
#> <7874/875/8749>

Cross-validation

vfold_cv(forested_train, v = 5)
#> #  5-fold cross-validation 
#> # A tibble: 5 × 2
#>   splits              id   
#>   <list>              <chr>
#> 1 <split [6999/1750]> Fold1
#> 2 <split [6999/1750]> Fold2
#> 3 <split [6999/1750]> Fold3
#> 4 <split [6999/1750]> Fold4
#> 5 <split [7000/1749]> Fold5

Cross-validation

We’ll use this setup:

set.seed(123)
forested_folds <- vfold_cv(forested_train, v = 10)
forested_folds
#> #  10-fold cross-validation 
#> # A tibble: 10 × 2
#>    splits             id    
#>    <list>             <chr> 
#>  1 <split [7874/875]> Fold01
#>  2 <split [7874/875]> Fold02
#>  3 <split [7874/875]> Fold03
#>  4 <split [7874/875]> Fold04
#>  5 <split [7874/875]> Fold05
#>  6 <split [7874/875]> Fold06
#>  7 <split [7874/875]> Fold07
#>  8 <split [7874/875]> Fold08
#>  9 <split [7874/875]> Fold09
#> 10 <split [7875/874]> Fold10

Set the seed when creating resamples

We are equipped with metrics and resamples!

Fit our model to the resamples

forested_res <- fit_resamples(forested_wflow, forested_folds)
forested_res
#> # Resampling results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 4
#>    splits             id     .metrics         .notes          
#>    <list>             <chr>  <list>           <list>          
#>  1 <split [7874/875]> Fold01 <tibble [3 × 4]> <tibble [0 × 4]>
#>  2 <split [7874/875]> Fold02 <tibble [3 × 4]> <tibble [0 × 4]>
#>  3 <split [7874/875]> Fold03 <tibble [3 × 4]> <tibble [0 × 4]>
#>  4 <split [7874/875]> Fold04 <tibble [3 × 4]> <tibble [0 × 4]>
#>  5 <split [7874/875]> Fold05 <tibble [3 × 4]> <tibble [0 × 4]>
#>  6 <split [7874/875]> Fold06 <tibble [3 × 4]> <tibble [0 × 4]>
#>  7 <split [7874/875]> Fold07 <tibble [3 × 4]> <tibble [0 × 4]>
#>  8 <split [7874/875]> Fold08 <tibble [3 × 4]> <tibble [0 × 4]>
#>  9 <split [7874/875]> Fold09 <tibble [3 × 4]> <tibble [0 × 4]>
#> 10 <split [7875/874]> Fold10 <tibble [3 × 4]> <tibble [0 × 4]>

Evaluating model performance

forested_res |>
  collect_metrics()
#> # A tibble: 3 × 6
#>   .metric     .estimator  mean     n std_err .config        
#>   <chr>       <chr>      <dbl> <int>   <dbl> <chr>          
#> 1 accuracy    binary     0.704    10 0.00653 pre0_mod0_post0
#> 2 brier_class binary     0.214    10 0.00349 pre0_mod0_post0
#> 3 roc_auc     binary     0.692    10 0.00496 pre0_mod0_post0

We can reliably measure performance using only the training data 🎉

Comparing metrics

How do the metrics from resampling compare to the metrics from training and testing?

forested_res |>
  collect_metrics() |> 
  select(.metric, mean, n)
#> # A tibble: 3 × 3
#>   .metric      mean     n
#>   <chr>       <dbl> <int>
#> 1 accuracy    0.704    10
#> 2 brier_class 0.214    10
#> 3 roc_auc     0.692    10

The ROC AUC previously was

  • 0.88 for the training set
  • 0.70 for the test set

Remember that:

⚠️ the training set gives you overly optimistic metrics

⚠️ the test set is precious

Evaluating model performance

# Save the assessment set results
ctrl_forested <- control_resamples(save_pred = TRUE)
forested_res <- fit_resamples(forested_wflow, forested_folds, control = ctrl_forested)

forested_res
#> # Resampling results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 5
#>    splits             id     .metrics         .notes           .predictions
#>    <list>             <chr>  <list>           <list>           <list>      
#>  1 <split [7874/875]> Fold01 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  2 <split [7874/875]> Fold02 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  3 <split [7874/875]> Fold03 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  4 <split [7874/875]> Fold04 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  5 <split [7874/875]> Fold05 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  6 <split [7874/875]> Fold06 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  7 <split [7874/875]> Fold07 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  8 <split [7874/875]> Fold08 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  9 <split [7874/875]> Fold09 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#> 10 <split [7875/874]> Fold10 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>

Evaluating model performance

# Collect the saved assessment set predictions
forested_preds <- collect_predictions(forested_res)
forested_preds
#> # A tibble: 8,749 × 7
#>    .pred_class .pred_Yes .pred_No id     forested  .row .config        
#>    <fct>           <dbl>    <dbl> <chr>  <fct>    <int> <chr>          
#>  1 Yes             0.953   0.0473 Fold01 No           2 pre0_mod0_post0
#>  2 Yes             0.6     0.4    Fold01 No           6 pre0_mod0_post0
#>  3 Yes             0.848   0.152  Fold01 Yes          7 pre0_mod0_post0
#>  4 Yes             0.941   0.0588 Fold01 Yes         36 pre0_mod0_post0
#>  5 Yes             0.895   0.105  Fold01 No          38 pre0_mod0_post0
#>  6 Yes             1       0      Fold01 Yes         58 pre0_mod0_post0
#>  7 No              0.187   0.812  Fold01 No          69 pre0_mod0_post0
#>  8 Yes             0.905   0.0952 Fold01 Yes         71 pre0_mod0_post0
#>  9 Yes             0.907   0.0930 Fold01 Yes         74 pre0_mod0_post0
#> 10 Yes             0.904   0.0962 Fold01 Yes         80 pre0_mod0_post0
#> # ℹ 8,739 more rows

Evaluating model performance

forested_preds |> 
  group_by(id) |>
  forested_metrics(truth = forested, estimate = .pred_class)
#> # A tibble: 30 × 4
#>    id     .metric  .estimator .estimate
#>    <chr>  <chr>    <chr>          <dbl>
#>  1 Fold01 accuracy binary         0.729
#>  2 Fold02 accuracy binary         0.685
#>  3 Fold03 accuracy binary         0.687
#>  4 Fold04 accuracy binary         0.711
#>  5 Fold05 accuracy binary         0.736
#>  6 Fold06 accuracy binary         0.670
#>  7 Fold07 accuracy binary         0.699
#>  8 Fold08 accuracy binary         0.710
#>  9 Fold09 accuracy binary         0.699
#> 10 Fold10 accuracy binary         0.719
#> # ℹ 20 more rows

Where are the fitted models?

forested_res
#> # Resampling results
#> # 10-fold cross-validation 
#> # A tibble: 10 × 5
#>    splits             id     .metrics         .notes           .predictions
#>    <list>             <chr>  <list>           <list>           <list>      
#>  1 <split [7874/875]> Fold01 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  2 <split [7874/875]> Fold02 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  3 <split [7874/875]> Fold03 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  4 <split [7874/875]> Fold04 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  5 <split [7874/875]> Fold05 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  6 <split [7874/875]> Fold06 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  7 <split [7874/875]> Fold07 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  8 <split [7874/875]> Fold08 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#>  9 <split [7874/875]> Fold09 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>    
#> 10 <split [7875/874]> Fold10 <tibble [3 × 4]> <tibble [0 × 4]> <tibble>

🗑️
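
The fitted models themselves are discarded once the metrics and predictions are computed. If you do need to keep something from each fit, a sketch using the extract option of control_resamples() (reusing the objects defined earlier):

# Keep the fitted workflow from each resample in an .extracts column
ctrl_extract <- control_resamples(extract = function(x) x)

forested_res_ext <- fit_resamples(forested_wflow, forested_folds, control = ctrl_extract)
forested_res_ext$.extracts[[1]]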

Alternate resampling schemes

Bootstrapping

set.seed(3214)
bootstraps(forested_train)
#> # Bootstrap sampling 
#> # A tibble: 25 × 2
#>    splits              id         
#>    <list>              <chr>      
#>  1 <split [8749/3218]> Bootstrap01
#>  2 <split [8749/3264]> Bootstrap02
#>  3 <split [8749/3220]> Bootstrap03
#>  4 <split [8749/3208]> Bootstrap04
#>  5 <split [8749/3230]> Bootstrap05
#>  6 <split [8749/3197]> Bootstrap06
#>  7 <split [8749/3193]> Bootstrap07
#>  8 <split [8749/3226]> Bootstrap08
#>  9 <split [8749/3243]> Bootstrap09
#> 10 <split [8749/3233]> Bootstrap10
#> # ℹ 15 more rows

The whole game - status update

Your turn

Create:

  • Monte Carlo Cross-Validation sets
  • validation set

(use the reference guide to find the functions)

Don’t forget to set a seed when you resample!

05:00

Monte Carlo Cross-Validation

set.seed(322)
mc_cv(forested_train, times = 10)
#> # Monte Carlo cross-validation (0.75/0.25) with 10 resamples 
#> # A tibble: 10 × 2
#>    splits              id        
#>    <list>              <chr>     
#>  1 <split [6561/2188]> Resample01
#>  2 <split [6561/2188]> Resample02
#>  3 <split [6561/2188]> Resample03
#>  4 <split [6561/2188]> Resample04
#>  5 <split [6561/2188]> Resample05
#>  6 <split [6561/2188]> Resample06
#>  7 <split [6561/2188]> Resample07
#>  8 <split [6561/2188]> Resample08
#>  9 <split [6561/2188]> Resample09
#> 10 <split [6561/2188]> Resample10

Validation set

set.seed(853)
forested_val_split <- initial_validation_split(forested)
validation_set(forested_val_split)
#> # A tibble: 1 × 2
#>   splits              id        
#>   <list>              <chr>     
#> 1 <split [4264/1421]> validation

A validation set is just another type of resample
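
Because of that, it can be passed to fit_resamples() just like cross-validation folds. A minimal sketch, assuming the forested_wflow workflow from earlier:

forested_val <- validation_set(forested_val_split)

# Fit on the training portion, assess on the validation set
val_res <- fit_resamples(forested_wflow, forested_val)
collect_metrics(val_res)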

Decision tree 🌳

Random forest 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳

Random forest 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵

  • Ensemble many decision tree models

  • All the trees vote! 🗳️

  • Bootstrap aggregating + random predictor sampling

  • Often works well without tuning hyperparameters (more on this later!), as long as there are enough trees

Create a random forest model

rf_spec <- rand_forest(trees = 1000, mode = "classification")
rf_spec
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger

Create a random forest model

rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger

Your turn

Use fit_resamples() and rf_wflow to:

  • keep predictions
  • compute metrics
08:00

Evaluating model performance

ctrl_forested <- control_resamples(save_pred = TRUE)

# Random forest uses random numbers so set the seed first

set.seed(2)
rf_res <- fit_resamples(rf_wflow, forested_folds, control = ctrl_forested)
collect_metrics(rf_res)
#> # A tibble: 3 × 6
#>   .metric     .estimator  mean     n std_err .config        
#>   <chr>       <chr>      <dbl> <int>   <dbl> <chr>          
#> 1 accuracy    binary     0.755    10 0.00482 pre0_mod0_post0
#> 2 brier_class binary     0.167    10 0.00321 pre0_mod0_post0
#> 3 roc_auc     binary     0.757    10 0.0103  pre0_mod0_post0

The whole game - status update

The final fit

Suppose that we are happy with our random forest model.

Let’s fit the model on the training set and verify our performance using the test set.

We’ve shown you fit() and predict() (+ augment()) but there is a shortcut:

# forested_split has train + test info
final_fit <- last_fit(rf_wflow, forested_split) 

final_fit
#> # Resampling results
#> # Manual resampling 
#> # A tibble: 1 × 6
#>   splits              id               .metrics .notes   .predictions .workflow 
#>   <list>              <chr>            <list>   <list>   <list>       <list>    
#> 1 <split [8749/2188]> train/test split <tibble> <tibble> <tibble>     <workflow>

What is in final_fit?

collect_metrics(final_fit)
#> # A tibble: 3 × 4
#>   .metric     .estimator .estimate .config        
#>   <chr>       <chr>          <dbl> <chr>          
#> 1 accuracy    binary         0.761 pre0_mod0_post0
#> 2 roc_auc     binary         0.761 pre0_mod0_post0
#> 3 brier_class binary         0.162 pre0_mod0_post0

These are metrics computed with the test set

What is in final_fit?

collect_predictions(final_fit)
#> # A tibble: 2,188 × 7
#>    .pred_class .pred_Yes .pred_No id               forested  .row .config       
#>    <fct>           <dbl>    <dbl> <chr>            <fct>    <int> <chr>         
#>  1 Yes             0.957   0.0429 train/test split Yes          4 pre0_mod0_pos…
#>  2 Yes             0.810   0.190  train/test split Yes          8 pre0_mod0_pos…
#>  3 Yes             0.826   0.174  train/test split Yes         10 pre0_mod0_pos…
#>  4 Yes             0.842   0.158  train/test split Yes         19 pre0_mod0_pos…
#>  5 Yes             0.863   0.137  train/test split Yes         23 pre0_mod0_pos…
#>  6 Yes             0.585   0.415  train/test split No          28 pre0_mod0_pos…
#>  7 Yes             0.847   0.153  train/test split Yes         34 pre0_mod0_pos…
#>  8 Yes             0.605   0.395  train/test split No          35 pre0_mod0_pos…
#>  9 Yes             0.964   0.0362 train/test split Yes         38 pre0_mod0_pos…
#> 10 Yes             0.926   0.0744 train/test split Yes         40 pre0_mod0_pos…
#> # ℹ 2,178 more rows

What is in final_fit?

extract_workflow(final_fit)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~1000,      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1), probability = TRUE) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  1000 
#> Sample size:                      8749 
#> Number of independent variables:  18 
#> Mtry:                             4 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error (Brier s.):  0.1671606

Use this fitted workflow to predict on new data, for example when deploying the model.
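
A sketch of what that might look like (new_locations is a hypothetical data frame with the same predictor columns as forested):

forested_final_wflow <- extract_workflow(final_fit)

# Class predictions and augmented predictions for hypothetical new data
predict(forested_final_wflow, new_data = new_locations)
augment(forested_final_wflow, new_data = new_locations)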

The whole game