set.seed(853)
forested_val_split <- initial_validation_split(forested)
validation_set(forested_val_split)
#> # A tibble: 1 × 2
#>   splits              id
#>   <list>              <chr>
#> 1 <split [4264/1421]> validation
A validation set is just another type of resample
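Because the validation set is stored as a one-row resampling object, it can be passed to fit_resamples() like any other set of resamples. A minimal sketch, assuming tidymodels is installed: two_class_dat (from modeldata) and a plain logistic regression stand in for the deck's data and model, and the names val_split, val_rset, lr_wflow, and val_res are made up for illustration.

```r
library(tidymodels)

set.seed(853)
# Stand-in data: two_class_dat ships with modeldata (attached by tidymodels)
val_split <- initial_validation_split(two_class_dat)

# validation_set() wraps the split as an rset with a single resample
val_rset <- validation_set(val_split)

# Stand-in model: logistic regression with the default glm engine
lr_wflow <- workflow(Class ~ ., logistic_reg())

# The validation set goes wherever a set of resamples would go
val_res <- fit_resamples(lr_wflow, val_rset)
collect_metrics(val_res)
```

With a single resample, collect_metrics() reports n = 1, so no standard error can be estimated.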
Decision tree 🌳
Random forest 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳
Ensemble many decision tree models
All the trees vote! 🗳️
Bootstrap aggregating + random predictor sampling
Often works well without tuning hyperparameters (more on this later!), as long as there are enough trees
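The ideas above (bootstrap samples, random predictor subsets, and voting) can be sketched by hand. This is an illustration only, not how ranger works internally: it uses rpart and the built-in iris data as assumptions, and it samples predictors once per tree, whereas a real random forest resamples predictors at every split.

```r
library(rpart)  # recommended package that ships with R

set.seed(1)
n_trees <- 25
predictors <- setdiff(names(iris), "Species")

# Fit each tree on a bootstrap sample using a random subset of predictors
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(iris), replace = TRUE)  # bootstrap sample of rows
  vars <- sample(predictors, 2)               # random predictor subset
  rpart(
    reformulate(vars, response = "Species"),
    data = iris[rows, ],
    method = "class"
  )
})

# Every tree votes for a class on each row (rows x trees matrix)
votes <- sapply(trees, function(tree) {
  as.character(predict(tree, newdata = iris, type = "class"))
})

# The majority class across trees is the ensemble prediction
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Resubstitution accuracy: optimistic, shown only to check the mechanics
mean(ensemble_pred == iris$Species)
```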
Create a random forest model
rf_spec <- rand_forest(trees = 1000, mode = "classification")
rf_spec
#> Random Forest Model Specification (classification)
#>
#> Main Arguments:
#>   trees = 1000
#>
#> Computational engine: ranger
Create a random forest model
rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#>
#> Main Arguments:
#>   trees = 1000
#>
#> Computational engine: ranger
Your turn
Use fit_resamples() and rf_wflow to:
keep predictions
compute metrics
Evaluating model performance
ctrl_forested <- control_resamples(save_pred = TRUE)

# Random forest uses random numbers so set the seed first
set.seed(2)
rf_res <- fit_resamples(rf_wflow, forested_folds, control = ctrl_forested)
collect_metrics(rf_res)
#> # A tibble: 3 × 6
#>   .metric     .estimator   mean     n std_err .config
#>   <chr>       <chr>       <dbl> <int>   <dbl> <chr>
#> 1 accuracy    binary     0.918     10 0.00585 Preprocessor1_Model1
#> 2 brier_class binary     0.0618    10 0.00337 Preprocessor1_Model1
#> 3 roc_auc     binary     0.972     10 0.00309 Preprocessor1_Model1
The whole game - status update
The final fit
Suppose that we are happy with our random forest model.
Let’s fit the model on the training set and verify our performance using the test set.
We’ve shown you fit() and predict() (+ augment()) but there is a shortcut:
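One shortcut of this kind is last_fit() from tune, which fits a workflow on the training set and evaluates it on the test set in a single call. A minimal sketch, assuming tidymodels and ranger are installed: two_class_dat stands in for the deck's data, and data_split, rf_wflow_demo, and final_fit are hypothetical names used only here.

```r
library(tidymodels)

set.seed(123)
# Stand-in data: two_class_dat ships with modeldata (attached by tidymodels)
data_split <- initial_split(two_class_dat)

# Stand-in workflow: a smaller forest than the deck's to keep the sketch quick
rf_wflow_demo <- workflow(
  Class ~ .,
  rand_forest(trees = 100, mode = "classification")
)

# Random forest uses random numbers so set the seed first
set.seed(2)
# Fit on the training set, then predict and compute metrics on the test set
final_fit <- last_fit(rf_wflow_demo, data_split)

collect_metrics(final_fit)      # test set metrics
collect_predictions(final_fit)  # test set predictions
extract_workflow(final_fit)     # fitted workflow, usable with predict()
```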