set.seed(853)
forested_val_split <- initial_validation_split(forested)
validation_set(forested_val_split)
#> # A tibble: 1 × 2
#>   splits              id        
#>   <list>              <chr>     
#> 1 <split [4264/1421]> validation
A validation set is just another type of resample
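Because a validation set is a resample, it can be passed anywhere a set of resamples is expected. The following is a minimal sketch, assuming the tidymodels and forested packages are installed; the decision tree workflow and the names `val_rset`, `tree_wflow`, `val_res`, and `val_metrics` are illustrative and not taken from the slides.

```r
library(tidymodels)
library(forested)

set.seed(853)
forested_val_split <- initial_validation_split(forested)

# validation_set() wraps the training/validation pair as a one-row rset
val_rset <- validation_set(forested_val_split)

# Illustrative workflow (an assumption): a default decision tree
tree_wflow <- workflow(forested ~ ., decision_tree(mode = "classification"))

# The validation set is used exactly like any other set of resamples
val_res <- fit_resamples(tree_wflow, resamples = val_rset)
val_metrics <- collect_metrics(val_res)
val_metrics
```

With a single resample, each metric is estimated from one value (`n` is 1), so no standard error is available.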
Decision tree 🌳
Random forest 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳
Random forest 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵
Ensemble many decision tree models
All the trees vote! 🗳️
Bootstrap aggregating + random predictor sampling
Often works well without tuning hyperparameters (more on this later!), as long as there are enough trees
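The ideas above (bootstrap samples, random predictor subsets, and voting) can be sketched by hand. This is a simplified illustration, not how ranger works internally: it uses the built-in `iris` data and the rpart package as stand-ins (both are assumptions, not part of the slides), and it samples predictors once per tree, whereas a real random forest samples predictors at every split.

```r
library(rpart)  # recommended package that ships with R

set.seed(1)
n_trees <- 25
predictors <- setdiff(names(iris), "Species")

# Fit each tree on a bootstrap sample with a random subset of predictors
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(iris), replace = TRUE)  # bootstrap: sample rows with replacement
  vars <- sample(predictors, 2)               # random predictor sampling (per tree here)
  rpart(reformulate(vars, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Every tree predicts a class for every row: one column of votes per tree
votes <- sapply(trees, function(tree) {
  as.character(predict(tree, newdata = iris, type = "class"))
})

# Majority vote across trees (ties go to the first class in table order)
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Accuracy on the data used for fitting, so this is an optimistic estimate
ensemble_accuracy <- mean(ensemble_pred == iris$Species)
ensemble_accuracy
```

Increasing `n_trees` should make the vote more stable from one seed to the next, which is the intuition behind "enough trees".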
Create a random forest model
rf_spec <- rand_forest(trees = 1000, mode = "classification")
rf_spec
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger
Create a random forest model
rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger
Your turn
Use fit_resamples() and rf_wflow to:
keep predictions
compute metrics
Evaluating model performance
ctrl_forested <- control_resamples(save_pred = TRUE)

# Random forest uses random numbers so set the seed first
set.seed(2)
rf_res <- fit_resamples(rf_wflow, forested_folds, control = ctrl_forested)
collect_metrics(rf_res)
#> # A tibble: 3 × 6
#>   .metric     .estimator  mean     n std_err .config        
#>   <chr>       <chr>      <dbl> <int>   <dbl> <chr>          
#> 1 accuracy    binary     0.755    10 0.00482 pre0_mod0_post0
#> 2 brier_class binary     0.167    10 0.00321 pre0_mod0_post0
#> 3 roc_auc     binary     0.757    10 0.0103  pre0_mod0_post0
The whole game - status update
The final fit
Suppose that we are happy with our random forest model.
Let’s fit the model on the training set and verify our performance using the test set.
We’ve shown you fit() and predict() (+ augment()) but there is a shortcut: