Introduction to tidymodels
How do you fit a linear model in R?
How many different ways can you think of?
03:00
lm for linear model
glm for generalized linear model (e.g. logistic regression)
glmnet for regularized regression
keras for regression using TensorFlow
stan for Bayesian regression
spark for large data sets
All available models are listed at https://www.tidymodels.org/find/parsnip/
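As a minimal sketch of how parsnip unifies these interfaces, the example below fits the same linear model with two engines and changes only the set_engine() call. The built-in mtcars data and the formula mpg ~ wt + hp are stand-ins chosen for illustration; they are not part of the workshop material.

```r
library(tidymodels)

# Same model type, two engines; only set_engine() changes.
lm_spec  <- linear_reg() %>% set_engine("lm")
glm_spec <- linear_reg() %>% set_engine("glm")

# Stand-in data and formula (assumption): built-in mtcars.
lm_fit  <- fit(lm_spec,  mpg ~ wt + hp, data = mtcars)
glm_fit <- fit(glm_spec, mpg ~ wt + hp, data = mtcars)

# Pull out the underlying engine objects and compare coefficients.
lm_coefs  <- coef(extract_fit_engine(lm_fit))
glm_coefs <- coef(extract_fit_engine(glm_fit))
all.equal(lm_coefs, glm_coefs)
```

Both engines estimate the same least squares coefficients here, so the comparison should agree up to numerical tolerance.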
Run the tree_spec chunk in your .qmd.
Edit this code to use a logistic regression model.
Extension/Challenge: Edit this code to use a different model. For example, try a conditional inference tree as implemented in the partykit package by changing the engine, or try an entirely different model type!
05:00
\(\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{A}\)
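To make the log-odds equation concrete, here is a small worked sketch in base R. The coefficient values and the values of A are hypothetical, chosen only so the arithmetic is easy to follow.

```r
# Hypothetical coefficients and predictor values, for illustration only.
beta_0 <- -2
beta_1 <- 0.5
A <- c(0, 2, 4, 6)

# Linear predictor on the log-odds scale.
log_odds <- beta_0 + beta_1 * A

# Inverse logit: p = 1 / (1 + exp(-log_odds)).
p <- plogis(log_odds)

# Applying the logit to p recovers the linear predictor.
all.equal(log(p / (1 - p)), log_odds)
```

With these values, the log-odds are zero at A = 4, which corresponds to a probability of exactly 0.5.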
Series of splits or if/then statements based on predictors
First the tree grows until some condition is met (maximum depth, no more data)
Then the tree is pruned to reduce its complexity
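The grow-then-prune idea can be sketched directly with the rpart package, which the printed trees in this deck appear to come from. The iris data and the specific cp values are assumptions made for illustration; in parsnip, the cost_complexity argument plays the role of rpart's cp.

```r
library(rpart)

# Grow a deliberately large tree: cp = 0 removes the complexity penalty,
# so splitting stops only when nodes become too small or too deep.
big_tree <- rpart(
  Species ~ ., data = iris, method = "class",
  control = rpart.control(cp = 0, minsplit = 2)
)

# Prune: drop splits that do not improve the fit by at least cp.
small_tree <- prune(big_tree, cp = 0.05)

# Count terminal nodes before and after pruning.
n_leaves_big   <- sum(big_tree$frame$var == "<leaf>")
n_leaves_small <- sum(small_tree$frame$var == "<leaf>")
c(before = n_leaves_big, after = n_leaves_small)
```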
Why use a workflow()?
fit() and predict() apply to the preprocessing steps in addition to the actual model fit.
tree_spec <-
  decision_tree(cost_complexity = 0.002) %>%
  set_mode("classification")
tree_spec %>%
  fit(tip ~ ., data = taxi_train)
#> parsnip model object
#>
#> n= 8000
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 8000 616 yes (0.92300000 0.07700000)
#> 2) distance>=14.12 2041 68 yes (0.96668300 0.03331700) *
#> 3) distance< 14.12 5959 548 yes (0.90803826 0.09196174)
#> 6) distance< 5.275 5419 450 yes (0.91695885 0.08304115) *
#> 7) distance>=5.275 540 98 yes (0.81851852 0.18148148)
#> 14) company=Chicago Independents,City Service,Sun Taxi,Taxi Affiliation Services,Taxicab Insurance Agency Llc,other 478 68 yes (0.85774059 0.14225941) *
#> 15) company=Flash Cab 62 30 yes (0.51612903 0.48387097)
#> 30) dow=Thu 12 2 yes (0.83333333 0.16666667) *
#> 31) dow=Sun,Mon,Tue,Wed,Fri,Sat 50 22 no (0.44000000 0.56000000)
#> 62) distance>=11.77 14 4 yes (0.71428571 0.28571429) *
#> 63) distance< 11.77 36 12 no (0.33333333 0.66666667) *
tree_spec <-
  decision_tree(cost_complexity = 0.002) %>%
  set_mode("classification")
workflow() %>%
  add_formula(tip ~ .) %>%
  add_model(tree_spec) %>%
  fit(data = taxi_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> tip ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 8000
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 8000 616 yes (0.92300000 0.07700000)
#> 2) distance>=14.12 2041 68 yes (0.96668300 0.03331700) *
#> 3) distance< 14.12 5959 548 yes (0.90803826 0.09196174)
#> 6) distance< 5.275 5419 450 yes (0.91695885 0.08304115) *
#> 7) distance>=5.275 540 98 yes (0.81851852 0.18148148)
#> 14) company=Chicago Independents,City Service,Sun Taxi,Taxi Affiliation Services,Taxicab Insurance Agency Llc,other 478 68 yes (0.85774059 0.14225941) *
#> 15) company=Flash Cab 62 30 yes (0.51612903 0.48387097)
#> 30) dow=Thu 12 2 yes (0.83333333 0.16666667) *
#> 31) dow=Sun,Mon,Tue,Wed,Fri,Sat 50 22 no (0.44000000 0.56000000)
#> 62) distance>=11.77 14 4 yes (0.71428571 0.28571429) *
#> 63) distance< 11.77 36 12 no (0.33333333 0.66666667) *
tree_spec <-
  decision_tree(cost_complexity = 0.002) %>%
  set_mode("classification")
workflow(tip ~ ., tree_spec) %>%
  fit(data = taxi_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> tip ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 8000
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 8000 616 yes (0.92300000 0.07700000)
#> 2) distance>=14.12 2041 68 yes (0.96668300 0.03331700) *
#> 3) distance< 14.12 5959 548 yes (0.90803826 0.09196174)
#> 6) distance< 5.275 5419 450 yes (0.91695885 0.08304115) *
#> 7) distance>=5.275 540 98 yes (0.81851852 0.18148148)
#> 14) company=Chicago Independents,City Service,Sun Taxi,Taxi Affiliation Services,Taxicab Insurance Agency Llc,other 478 68 yes (0.85774059 0.14225941) *
#> 15) company=Flash Cab 62 30 yes (0.51612903 0.48387097)
#> 30) dow=Thu 12 2 yes (0.83333333 0.16666667) *
#> 31) dow=Sun,Mon,Tue,Wed,Fri,Sat 50 22 no (0.44000000 0.56000000)
#> 62) distance>=11.77 14 4 yes (0.71428571 0.28571429) *
#> 63) distance< 11.77 36 12 no (0.33333333 0.66666667) *
Run the tree_wflow chunk in your .qmd.
Edit this code to make a workflow with your own model of choice.
Extension/Challenge: Other than formulas, what kinds of preprocessors are supported?
05:00
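As one possible answer to the extension question, the sketch below builds workflows with two preprocessors other than a formula: a recipe and a variable selection via add_variables(). The mtcars stand-in data, the choice of predictors, and the step_normalize() step are assumptions for illustration, not part of the workshop code.

```r
library(tidymodels)

# Stand-in data (assumption): mtcars with a factor outcome.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

tree_spec <- decision_tree() %>% set_mode("classification")

# Preprocessor 1: a recipe with one illustrative step.
rec <- recipe(am ~ wt + hp, data = cars) %>%
  step_normalize(all_numeric_predictors())

rec_wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(tree_spec)

# Preprocessor 2: select outcome and predictors without a formula.
var_wflow <- workflow() %>%
  add_variables(outcomes = am, predictors = c(wt, hp)) %>%
  add_model(tree_spec)

rec_fit <- fit(rec_wflow, data = cars)
var_fit <- fit(var_wflow, data = cars)
```

In both cases, predict() on the fitted workflow applies the preprocessing to new data before calling the model.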
How do you use your new tree_fit model?
Run:
predict(tree_fit, new_data = taxi_test)
What do you get?
03:00
Run:
augment(tree_fit, new_data = taxi_test)
What do you get?
03:00
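For readers without the taxi data at hand, this sketch shows the shape of what predict() and augment() return for a fitted classification workflow. The mtcars stand-in data and the formula are assumptions; only the function calls mirror the slides.

```r
library(tidymodels)

# Stand-in data (assumption): taxi_train and taxi_test are not available here.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

tree_spec <- decision_tree() %>% set_mode("classification")
tree_fit <- workflow(am ~ wt + hp, tree_spec) %>%
  fit(data = cars)

# predict() returns a tibble holding only the predictions.
preds <- predict(tree_fit, new_data = cars)

# augment() returns the supplied data with prediction columns added.
augmented <- augment(tree_fit, new_data = cars)

names(preds)
nrow(preds) == nrow(cars)
```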
The number of rows in new_data and the output are the same.
How do you understand your new tree_fit model?
You can extract_*() several components of your fitted workflow.
⚠️ Never predict() with any extracted components!
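A brief sketch of the extract_*() family on a fitted workflow follows. The data and formula are stand-ins (assumptions); the extracted objects are meant for inspection, while predictions should still go through the workflow itself.

```r
library(tidymodels)

# Stand-in data (assumption): mtcars with a factor outcome.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

tree_spec <- decision_tree() %>% set_mode("classification")
tree_fit <- workflow(am ~ wt + hp, tree_spec) %>%
  fit(data = cars)

# The underlying engine object (an rpart tree for the default engine).
engine_fit <- extract_fit_engine(tree_fit)

# The parsnip model fit that wraps the engine object.
parsnip_fit <- extract_fit_parsnip(tree_fit)

# The preprocessor stored in the workflow (here, the formula).
preproc <- extract_preprocessor(tree_fit)

class(engine_fit)
```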
You can use your fitted workflow for model and/or prediction explanations:
Learn more at https://www.tmwr.org/explain.html
Extract the model engine object from your fitted workflow and check it out.
05:00