Introduction to tidymodels
How do you fit a linear model in R?
How many different ways can you think of?
03:00
lm for linear model
glmnet for regularized regression
keras for regression using TensorFlow
stan for Bayesian regression
spark for large data sets
brulee for regression using torch
All available models are listed at https://www.tidymodels.org/find/parsnip/
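As a minimal sketch of how these engines plug into one interface, the parsnip specification below keeps the model type (linear_reg()) fixed and swaps only the engine. The built-in mtcars data, the formula, and the penalty value are illustrative assumptions, not part of the slides.

```r
library(tidymodels)

# One model type, two engines: only set_engine() changes.
lm_spec <- linear_reg() %>%
  set_engine("lm")

# penalty = 0.1 is an arbitrary illustrative value; fitting this
# specification would additionally require the glmnet package.
glmnet_spec <- linear_reg(penalty = 0.1) %>%
  set_engine("glmnet")

# Fit the lm-based specification on a built-in data set (assumed example data).
lm_fit <- lm_spec %>%
  fit(mpg ~ wt + hp, data = mtcars)

# One row per coefficient: intercept, wt, hp.
tidy(lm_fit)
```

The same fit() call would work for glmnet_spec once glmnet is installed; the calling code does not change with the engine.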
Run the tree_spec chunk in your .qmd.
Edit this code to use a logistic regression model.
Extension/Challenge: Edit this code to use a different model. For example, try using a conditional inference tree as implemented in the partykit package by changing the engine - or try an entirely different model type!
05:00
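Logistic regression models the logit (log odds) of the event probability p as a linear function of the predictor A:
\(\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{A}\)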
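The logit equation above can be inverted to recover the probability p from the linear predictor. The sketch below does this in base R; the coefficient values and the values of A are hypothetical, chosen only for illustration.

```r
# Hypothetical coefficients and predictor values (illustration only).
beta_0 <- -1
beta_1 <- 0.5
A <- c(0, 2, 4)

# Linear predictor: the log odds of the event.
log_odds <- beta_0 + beta_1 * A

# Inverse logit: p = 1 / (1 + exp(-log_odds)).
p <- 1 / (1 + exp(-log_odds))

# Base R's plogis() computes the same inverse logit.
all.equal(p, plogis(log_odds))

# Applying the logit to p recovers the linear predictor.
all.equal(log(p / (1 - p)), log_odds)
```

With these values, A = 2 gives log odds of 0, which corresponds to p = 0.5.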
A decision tree is a series of splits or if/then statements based on predictors.
First, the tree grows until some stopping condition is met (maximum depth reached, no more data to split).
Then the tree is pruned to reduce its complexity.
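The grow-then-prune behaviour described above maps onto the main arguments of parsnip's decision_tree(). In this sketch, tree_depth and min_n act as stopping conditions for growth and cost_complexity controls pruning. The argument values and the use of modeldata::two_class_dat as stand-in data are assumptions for illustration, not the settings used in the slides.

```r
library(tidymodels)

# Stand-in data with a two-level outcome (Class) and predictors A and B.
dat <- modeldata::two_class_dat

small_tree_spec <-
  decision_tree(
    tree_depth = 3,          # stop growing beyond this depth
    min_n = 20,              # do not split nodes with fewer rows than this
    cost_complexity = 0.01   # pruning penalty (rpart's cp)
  ) %>%
  set_engine("rpart") %>%
  set_mode("classification")

small_tree_fit <- small_tree_spec %>%
  fit(Class ~ ., data = dat)

small_tree_fit
```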
With a workflow(), fit() and predict() apply to the preprocessing steps in addition to the actual model fit.
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
tree_spec %>%
  fit(forested ~ ., data = forested_train)
#> parsnip model object
#>
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
workflow() %>%
  add_formula(forested ~ .) %>%
  add_model(tree_spec) %>%
  fit(data = forested_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
workflow(forested ~ ., tree_spec) %>%
  fit(data = forested_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
Run the tree_wflow chunk in your .qmd.
Edit this code to make a workflow with your own model of choice.
Extension/Challenge: Other than formulas, what kinds of preprocessors are supported?
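As one possible starting point for the extension question, the sketch below builds workflows with two preprocessors other than a formula: a recipe (add_recipe()) and a plain variable selection (add_variables()). The stand-in data modeldata::two_class_dat and the normalization step are assumptions for illustration only.

```r
library(tidymodels)

# Stand-in data: outcome Class, numeric predictors A and B.
dat <- modeldata::two_class_dat

tree_spec <-
  decision_tree() %>%
  set_mode("classification")

# Preprocessor 1: a recipe with an illustrative normalization step.
rec <- recipe(Class ~ ., data = dat) %>%
  step_normalize(all_numeric_predictors())

wflow_recipe <- workflow() %>%
  add_recipe(rec) %>%
  add_model(tree_spec)

# Preprocessor 2: select outcome and predictor columns directly.
wflow_vars <- workflow() %>%
  add_variables(outcomes = Class, predictors = c(A, B)) %>%
  add_model(tree_spec)

fit_recipe <- fit(wflow_recipe, data = dat)
fit_vars <- fit(wflow_vars, data = dat)
```

In both cases the preprocessing is stored inside the fitted workflow, so it is reapplied automatically to new data at prediction time.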
05:00
How do you use your new tree_fit model?
Run:
predict(tree_fit, new_data = forested_test)
What do you notice about the structure of the result?
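To make the structure of the result concrete, here is a small self-contained sketch. It uses modeldata::two_class_dat as stand-in data and a hypothetical fitted workflow named demo_fit, since the forested objects come from earlier in the workshop.

```r
library(tidymodels)

# Stand-in data and split (assumed for illustration).
set.seed(123)
dat_split <- initial_split(modeldata::two_class_dat)
dat_train <- training(dat_split)
dat_test <- testing(dat_split)

tree_spec <-
  decision_tree() %>%
  set_mode("classification")

# demo_fit is a hypothetical name standing in for tree_fit.
demo_fit <- workflow(Class ~ ., tree_spec) %>%
  fit(data = dat_train)

# Hard class predictions: a tibble with a single .pred_class column.
class_preds <- predict(demo_fit, new_data = dat_test)

# Class probabilities: one .pred_<level> column per outcome level.
prob_preds <- predict(demo_fit, new_data = dat_test, type = "prob")

class_preds
prob_preds
```

Both results are tibbles with one row per row of new_data, which makes them easy to bind back onto the original data.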
03:00
Run:
augment(tree_fit, new_data = forested_test)
How does the output compare to the output from predict()?
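A short sketch of the comparison, again on stand-in data (modeldata::two_class_dat) with a hypothetical fitted workflow demo_fit rather than the workshop's tree_fit: augment() returns the columns of new_data together with the prediction columns, while predict() returns only the predictions.

```r
library(tidymodels)

# Stand-in data and split (assumed for illustration).
set.seed(123)
dat_split <- initial_split(modeldata::two_class_dat)
dat_train <- training(dat_split)
dat_test <- testing(dat_split)

# demo_fit is a hypothetical name standing in for tree_fit.
demo_fit <-
  workflow(Class ~ ., decision_tree() %>% set_mode("classification")) %>%
  fit(data = dat_train)

preds <- predict(demo_fit, new_data = dat_test)
augmented <- augment(demo_fit, new_data = dat_test)

# predict(): prediction columns only.
names(preds)

# augment(): original columns plus class and probability predictions.
names(augmented)
```

The position of the prediction columns within the augmented tibble can differ between package versions, so it is safer to refer to them by name.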
03:00
The number of rows in new_data and in the output are the same.
How do you understand your new tree_fit model?
You can extract_*() several components of your fitted workflow.
⚠️ Never predict() with any extracted components!
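The extract_*() helpers can be sketched as follows, using stand-in data (modeldata::two_class_dat) and a hypothetical fitted workflow demo_fit in place of tree_fit. The extracted objects are meant for inspection: predicting from them would skip the preprocessing stored in the workflow.

```r
library(tidymodels)

# demo_fit is a hypothetical name standing in for tree_fit.
demo_fit <-
  workflow(Class ~ ., decision_tree() %>% set_mode("classification")) %>%
  fit(data = modeldata::two_class_dat)

# The underlying engine object (an rpart tree for the default engine).
engine_fit <- extract_fit_engine(demo_fit)

# The parsnip model fit that wraps the engine object.
parsnip_fit <- extract_fit_parsnip(demo_fit)

# The preprocessor stored in the workflow (here, the formula).
preproc <- extract_preprocessor(demo_fit)

class(engine_fit)
preproc

# For predictions, keep using the whole workflow:
# predict(demo_fit, new_data = ...)
```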
You can use your fitted workflow for model and/or prediction explanations.
Learn more at https://www.tmwr.org/explain.html
Extract the model engine object from your fitted workflow and check it out.
05:00