Introduction to tidymodels

*How do you fit a linear model in R?*

*How many different ways can you think of?*

`03:00`

- `lm` for linear model
- `glmnet` for regularized regression
- `keras` for regression using TensorFlow
- `stan` for Bayesian regression
- `spark` for large data sets
- `brulee` for regression using torch
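As a rough illustration of why a unified interface helps, the sketch below fits the same linear model twice: once with base R's `lm()` and once through parsnip with the `"lm"` engine. The `mtcars` data and the formula `mpg ~ wt + hp` are assumptions chosen only so the example runs without extra data.

```r
library(tidymodels)

# Base R: fit a linear model directly with lm()
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)

# parsnip: the same model, described by a specification plus an engine
parsnip_fit <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# Both routes should estimate the same coefficients
coef(lm_fit)
coef(extract_fit_engine(parsnip_fit))
```

Swapping `"lm"` for another engine changes the fitting backend while the rest of the code stays the same, provided the package behind that engine is installed.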

- Choose a __model__
- Specify an __engine__
- Set the __mode__
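The three steps can be sketched as one pipeline. This is a minimal example, assuming a decision tree with the `"rpart"` engine (which is also parsnip's default engine for `decision_tree()`); the comments map each call to a step.

```r
library(tidymodels)

tree_spec <-
  decision_tree() %>%           # 1. choose a model
  set_engine("rpart") %>%       # 2. specify an engine
  set_mode("classification")    # 3. set the mode

# Printing the specification shows the model type, engine, and mode
tree_spec

# show_engines() lists the engines parsnip knows about for a model type
show_engines("decision_tree")
```

Nothing is fitted at this point: a specification only describes the model, and data enter later through `fit()`.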

All available models are listed at https://www.tidymodels.org/find/parsnip/

*Run the tree_spec chunk in your .qmd.*

*Edit this code to use a logistic regression model.*

*Extension/Challenge: Edit this code to use a different model. For example, try using a conditional inference tree as implemented in the partykit package by changing the engine - or try an entirely different model type!*

`05:00`

- Logistic regression
- Decision trees

- Logit of outcome probability modeled as linear combination of predictors:

\(\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{A}\)

- Find a sigmoid line that separates the two classes
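To make the logit relationship concrete, here is a small base R sketch. The coefficient values and the predictor values for `A` are hypothetical, chosen only for illustration; `plogis()` is the inverse logit (the sigmoid) and `qlogis()` is the logit.

```r
# Hypothetical coefficients and predictor values, for illustration only
beta_0 <- -1
beta_1 <- 0.5
A <- c(0, 2, 4)

# The model is linear on the log-odds (logit) scale
log_odds <- beta_0 + beta_1 * A

# The inverse logit maps log-odds to probabilities between 0 and 1
p <- plogis(log_odds)
p

# Taking the logit of p recovers the linear predictor
all.equal(qlogis(p), log_odds)
```

A probability of 0.5 corresponds to log-odds of 0, which is where the predicted class switches under the usual 0.5 threshold.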

- Series of splits or if/then statements based on predictors

- First the tree *grows* until some condition is met (maximum depth, no more data)

- Then the tree is *pruned* to reduce its complexity
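The growing and pruning behaviour can be controlled through the arguments of `decision_tree()`. The sketch below is one possible setup, assuming the `"rpart"` engine and the built-in `iris` data; the specific values are arbitrary and not recommendations.

```r
library(tidymodels)

tree_spec <-
  decision_tree(
    tree_depth = 3,          # growing stops once the tree reaches this depth
    min_n = 10,              # minimum observations in a node to attempt a split
    cost_complexity = 0.01   # cost-complexity parameter used when pruning
  ) %>%
  set_engine("rpart") %>%
  set_mode("classification")

# iris is used here only so the example runs without extra data
tree_fit <- fit(tree_spec, Species ~ ., data = iris)
tree_fit
```

With the `"rpart"` engine, parsnip translates these arguments to rpart's `maxdepth`, `minsplit`, and `cp`.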

Why a `workflow()`?

- Workflows handle new data better than base R tools in terms of new factor levels

- You can use other preprocessors besides formulas (more on feature engineering in Advanced tidymodels!)

- They can help organize your work when working with multiple models

- __Most importantly__, a workflow captures the entire modeling process: `fit()` and `predict()` apply to the preprocessing steps in addition to the actual model fit

```
tree_spec <-
  decision_tree() %>%
  set_mode("classification")

tree_spec %>%
  fit(forested ~ ., data = forested_train)
#> parsnip model object
#>
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
```

```
tree_spec <-
  decision_tree() %>%
  set_mode("classification")

workflow() %>%
  add_formula(forested ~ .) %>%
  add_model(tree_spec) %>%
  fit(data = forested_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
```

```
tree_spec <-
  decision_tree() %>%
  set_mode("classification")

workflow(forested ~ ., tree_spec) %>%
  fit(data = forested_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> forested ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 5685
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 5685 2550 Yes (0.55145119 0.44854881)
#> 2) land_type=Tree 3064 300 Yes (0.90208877 0.09791123) *
#> 3) land_type=Barren,Non-tree vegetation 2621 371 No (0.14154903 0.85845097)
#> 6) temp_annual_max< 13.395 347 153 Yes (0.55907781 0.44092219)
#> 12) tree_no_tree=Tree 92 6 Yes (0.93478261 0.06521739) *
#> 13) tree_no_tree=No tree 255 108 No (0.42352941 0.57647059) *
#> 7) temp_annual_max>=13.395 2274 177 No (0.07783641 0.92216359) *
```

*Run the tree_wflow chunk in your .qmd.*

*Edit this code to make a workflow with your own model of choice.*

*Extension/Challenge: Other than formulas, what kinds of preprocessors are supported?*

`05:00`

How do you use your new `tree_fit` model?

*Run:*

`predict(tree_fit, new_data = forested_test)`

*What do you notice about the structure of the result?*

`03:00`

*Run:*

`augment(tree_fit, new_data = forested_test)`

*How does the output compare to the output from predict()?*

`03:00`

- The predictions will always be inside a **tibble**
- The column names and types are **unsurprising** and **predictable**
- The number of rows in `new_data` and the output **are the same**
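These guarantees can be checked on a small example. The sketch below assumes the built-in `iris` data and an 80/20 split in place of the `forested` data, so object names such as `iris_train` and `iris_test` are illustrative only.

```r
library(tidymodels)

set.seed(123)
iris_split <- initial_split(iris, prop = 0.8)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

tree_spec <-
  decision_tree() %>%
  set_mode("classification")

tree_fit <-
  workflow(Species ~ ., tree_spec) %>%
  fit(data = iris_train)

# Hard class predictions: a tibble with a single .pred_class column
class_preds <- predict(tree_fit, new_data = iris_test)

# Class probabilities: one .pred_<level> column per outcome level
prob_preds <- predict(tree_fit, new_data = iris_test, type = "prob")

# augment() returns the new data together with the prediction columns
augmented <- augment(tree_fit, new_data = iris_test)

# Each result has one row per row of new_data
c(nrow(iris_test), nrow(class_preds), nrow(prob_preds), nrow(augmented))
```

Because the row counts always match, predictions can be bound to the original data without any join logic, which is what `augment()` does.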

How do you **understand** your new `tree_fit` model?

You can `extract_*()` several components of your fitted workflow.

⚠️ *Never predict() with any extracted components!*
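A few of the extractors are sketched below, assuming a workflow fitted on the built-in `iris` data rather than `forested`. The extracted objects are meant for inspection: predicting from them directly would skip the preprocessing that the workflow applies, which is the likely reason for the warning above.

```r
library(tidymodels)

# A small fitted workflow; iris is used only so the example is self-contained
tree_fit <-
  workflow(Species ~ ., decision_tree(mode = "classification")) %>%
  fit(data = iris)

# The preprocessor stored in the workflow (here, the formula)
extract_preprocessor(tree_fit)

# The parsnip model fit, without the preprocessing steps
parsnip_fit <- extract_fit_parsnip(tree_fit)

# The underlying engine object (here, an rpart tree)
engine_fit <- extract_fit_engine(tree_fit)
class(engine_fit)
```

For predictions, keep calling `predict()` or `augment()` on `tree_fit` itself.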

You can use your fitted workflow for model and/or prediction explanations:

- overall variable importance, such as with the vip package

- flexible model explainers, such as with the DALEXtra package

Learn more at https://www.tmwr.org/explain.html
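As a minimal sketch of overall variable importance, the example below reads the importance scores that rpart stores on its fitted object. This swaps in rpart's own `variable.importance` field instead of the vip package, purely to avoid an extra dependency, and it assumes the built-in `iris` data.

```r
library(tidymodels)

tree_fit <-
  workflow(Species ~ ., decision_tree(mode = "classification")) %>%
  fit(data = iris)

# rpart keeps a named numeric vector of importance scores on the fitted tree
engine_fit <- extract_fit_engine(tree_fit)
importance <- engine_fit$variable.importance

# Larger values mean the predictor contributed more to the tree's splits
sort(importance, decreasing = TRUE)
```

This field is specific to rpart; other engines store importance differently or not at all, which is one reason to prefer a dedicated package for real work.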

*Extract the model engine object from your fitted workflow and check it out.*

`05:00`