Machine learning with tidymodels
How do you fit a linear model in R?
How many different ways can you think of?
lm for linear model
glm for generalized linear model (e.g. logistic regression)
glmnet for regularized regression
keras for regression using TensorFlow
stan for Bayesian regression
spark for large data sets
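As a small illustration of how the interfaces differ even within base R, the sketch below fits the same linear model with lm() and glm(), then a logistic regression with glm(). It uses the built-in mtcars data as a stand-in, not the workshop data, and the choice of predictors is arbitrary.

```r
# Base R only; mtcars ships with R and stands in for real data.

# Ordinary least squares with lm().
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# The same linear model with glm(): Gaussian family, identity link.
fit_glm <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())

# Logistic regression needs glm() with a binomial family.
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial())

# The two linear fits should agree on the coefficients.
all.equal(coef(fit_lm), coef(fit_glm))

# Prediction arguments differ by function: glm() needs type = "response"
# to return probabilities rather than log-odds.
head(predict(fit_lm, newdata = mtcars))
head(predict(fit_logit, newdata = mtcars, type = "response"))
```

Each function has its own arguments and return shapes, which is the kind of inconsistency a unified modeling interface tries to smooth over.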
All available models are listed at https://www.tidymodels.org/find/parsnip/
Run the tree_spec chunk in your .qmd.
Edit this code to use a different model.
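One possible edit, shown as a sketch rather than the expected answer: swap the decision tree for a logistic regression with the "glm" engine. The cars data frame below is a hypothetical stand-in built from mtcars so the example runs without the taxi data.

```r
library(tidymodels)

# Stand-in data: mtcars with a two-level factor outcome (not the taxi data).
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Same specification pattern as tree_spec, different model type and engine.
logistic_spec <-
  logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# fit() takes the same formula/data arguments regardless of the model.
logistic_fit <-
  logistic_spec %>%
  fit(am ~ wt + hp, data = cars)

logistic_fit
```

Only the specification changes; the fit() call keeps the same shape as for the tree.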
\(\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{distance}\)
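To make the equation concrete, the sketch below simulates a binary outcome from a single distance predictor, fits a logistic regression with glm(), and checks that the fitted log-odds equal the intercept plus slope times distance. The data and the "true" coefficients are invented for this illustration only.

```r
set.seed(123)

# Simulated data; beta0 and beta1 are hypothetical values for this sketch.
n <- 500
distance <- runif(n, min = 0, max = 20)
beta0 <- -1
beta1 <- 0.2
p <- plogis(beta0 + beta1 * distance)   # inverse of the logit
tip <- rbinom(n, size = 1, prob = p)

fit <- glm(tip ~ distance, family = binomial())
b <- coef(fit)

# Log-odds computed by hand from the fitted coefficients ...
new <- data.frame(distance = c(2, 10))
log_odds_manual <- b[["(Intercept)"]] + b[["distance"]] * new$distance

# ... match what predict() returns on the link scale.
log_odds_glm <- predict(fit, newdata = new, type = "link")
all.equal(unname(log_odds_glm), log_odds_manual)

# Applying plogis() to the log-odds recovers the predicted probability p.
p_hat <- predict(fit, newdata = new, type = "response")
all.equal(unname(p_hat), plogis(log_odds_manual))
```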
Series of splits or if/then statements based on predictors
First the tree grows until some condition is met (maximum depth, no more data)
Then the tree is pruned to reduce its complexity
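The grow-then-prune idea can be sketched directly with rpart, the engine that produces the tree printouts in this deck. The example uses the built-in iris data as a stand-in, and the control values are arbitrary choices for illustration.

```r
library(rpart)  # recommended package shipped with R

# Grow a deliberately large tree: cp = 0 disables the complexity penalty,
# so splitting continues until nodes are pure or minsplit/maxdepth is hit.
big_tree <- rpart(
  Species ~ .,
  data = iris,
  method = "class",
  control = rpart.control(cp = 0, minsplit = 2, maxdepth = 10)
)

# Prune back: remove splits whose complexity parameter is below cp.
small_tree <- prune(big_tree, cp = 0.05)

# Each row of $frame is one node, so fewer rows means a simpler tree.
nrow(big_tree$frame)
nrow(small_tree$frame)
```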
Why a workflow()?
fit() and predict() apply to the preprocessing steps in addition to the actual model fit.
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
tree_spec %>%
  fit(tip ~ ., data = taxi_train)
#> parsnip model object
#>
#> n= 7045
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 7045 2069 yes (0.70631654 0.29368346)
#> 2) company=Chicago Independents,City Service,Sun Taxi,Taxicab Insurance Agency Llc,other 4328 744 yes (0.82809612 0.17190388)
#> 4) distance< 4.615 2365 254 yes (0.89260042 0.10739958) *
#> 5) distance>=4.615 1963 490 yes (0.75038207 0.24961793)
#> 10) distance>=12.565 1069 81 yes (0.92422825 0.07577175) *
#> 11) distance< 12.565 894 409 yes (0.54250559 0.45749441)
#> 22) company=Chicago Independents,Sun Taxi,Taxicab Insurance Agency Llc 278 71 yes (0.74460432 0.25539568) *
#> 23) company=City Service,other 616 278 no (0.45129870 0.54870130)
#> 46) distance< 7.205 178 59 yes (0.66853933 0.33146067) *
#> 47) distance>=7.205 438 159 no (0.36301370 0.63698630) *
#> 3) company=Flash Cab,Taxi Affiliation Services 2717 1325 yes (0.51232978 0.48767022)
#> 6) distance< 3.235 1331 391 yes (0.70623591 0.29376409) *
#> 7) distance>=3.235 1386 452 no (0.32611833 0.67388167)
#> 14) distance>=12.39 344 90 yes (0.73837209 0.26162791) *
#> 15) distance< 12.39 1042 198 no (0.19001919 0.80998081) *
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
workflow() %>%
  add_formula(tip ~ .) %>%
  add_model(tree_spec) %>%
  fit(data = taxi_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> tip ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 7045
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 7045 2069 yes (0.70631654 0.29368346)
#> 2) company=Chicago Independents,City Service,Sun Taxi,Taxicab Insurance Agency Llc,other 4328 744 yes (0.82809612 0.17190388)
#> 4) distance< 4.615 2365 254 yes (0.89260042 0.10739958) *
#> 5) distance>=4.615 1963 490 yes (0.75038207 0.24961793)
#> 10) distance>=12.565 1069 81 yes (0.92422825 0.07577175) *
#> 11) distance< 12.565 894 409 yes (0.54250559 0.45749441)
#> 22) company=Chicago Independents,Sun Taxi,Taxicab Insurance Agency Llc 278 71 yes (0.74460432 0.25539568) *
#> 23) company=City Service,other 616 278 no (0.45129870 0.54870130)
#> 46) distance< 7.205 178 59 yes (0.66853933 0.33146067) *
#> 47) distance>=7.205 438 159 no (0.36301370 0.63698630) *
#> 3) company=Flash Cab,Taxi Affiliation Services 2717 1325 yes (0.51232978 0.48767022)
#> 6) distance< 3.235 1331 391 yes (0.70623591 0.29376409) *
#> 7) distance>=3.235 1386 452 no (0.32611833 0.67388167)
#> 14) distance>=12.39 344 90 yes (0.73837209 0.26162791) *
#> 15) distance< 12.39 1042 198 no (0.19001919 0.80998081) *
tree_spec <-
  decision_tree() %>%
  set_mode("classification")
workflow(tip ~ ., tree_spec) %>%
  fit(data = taxi_train)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: decision_tree()
#>
#> ── Preprocessor ──────────────────────────────────────────────────────
#> tip ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────────
#> n= 7045
#>
#> node), split, n, loss, yval, (yprob)
#> * denotes terminal node
#>
#> 1) root 7045 2069 yes (0.70631654 0.29368346)
#> 2) company=Chicago Independents,City Service,Sun Taxi,Taxicab Insurance Agency Llc,other 4328 744 yes (0.82809612 0.17190388)
#> 4) distance< 4.615 2365 254 yes (0.89260042 0.10739958) *
#> 5) distance>=4.615 1963 490 yes (0.75038207 0.24961793)
#> 10) distance>=12.565 1069 81 yes (0.92422825 0.07577175) *
#> 11) distance< 12.565 894 409 yes (0.54250559 0.45749441)
#> 22) company=Chicago Independents,Sun Taxi,Taxicab Insurance Agency Llc 278 71 yes (0.74460432 0.25539568) *
#> 23) company=City Service,other 616 278 no (0.45129870 0.54870130)
#> 46) distance< 7.205 178 59 yes (0.66853933 0.33146067) *
#> 47) distance>=7.205 438 159 no (0.36301370 0.63698630) *
#> 3) company=Flash Cab,Taxi Affiliation Services 2717 1325 yes (0.51232978 0.48767022)
#> 6) distance< 3.235 1331 391 yes (0.70623591 0.29376409) *
#> 7) distance>=3.235 1386 452 no (0.32611833 0.67388167)
#> 14) distance>=12.39 344 90 yes (0.73837209 0.26162791) *
#> 15) distance< 12.39 1042 198 no (0.19001919 0.80998081) *
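The workflows above use a formula as the preprocessor. To illustrate how fit() and predict() also cover preprocessing steps, here is a sketch that swaps the formula for a recipe. It uses the built-in iris data as a stand-in, and the normalization step is chosen only to have something to preprocess (a tree does not need it).

```r
library(tidymodels)

# Stand-in data: iris ships with R (the taxi data is not assumed here).
set.seed(1)
iris_split <- initial_split(iris, prop = 0.8)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

# Preprocessing: center and scale every numeric predictor.
rec <-
  recipe(Species ~ ., data = iris_train) %>%
  step_normalize(all_numeric_predictors())

tree_spec <-
  decision_tree() %>%
  set_mode("classification")

# fit() estimates the recipe on iris_train, then fits the tree.
wflow_fit <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(tree_spec) %>%
  fit(data = iris_train)

# predict() applies the trained normalization to iris_test before the tree.
preds <- predict(wflow_fit, new_data = iris_test)
preds
```

Because the recipe travels with the model, the test set never needs to be preprocessed by hand.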
Run the tree_wflow chunk in your .qmd.
Edit this code to make a workflow with your own model of choice.
How do you use your new tree_fit model?
Run:
predict(tree_fit, new_data = taxi_test)
What do you get?
Run:
augment(tree_fit, new_data = taxi_test)
What do you get?
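For comparison, here is a sketch of what predict() and augment() return for a fitted workflow. It uses a hypothetical stand-in data set built from mtcars, so the column names differ from the taxi example.

```r
library(tidymodels)

# Stand-in data: mtcars with a two-level factor outcome (not the taxi data).
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))
cars_train <- cars[1:24, ]
cars_test  <- cars[25:32, ]

cars_fit <-
  workflow(am ~ wt + hp, decision_tree(mode = "classification")) %>%
  fit(data = cars_train)

# predict() returns a tibble holding only predictions (.pred_class here).
preds <- predict(cars_fit, new_data = cars_test)

# type = "prob" returns one probability column per outcome level.
probs <- predict(cars_fit, new_data = cars_test, type = "prob")

# augment() returns new_data with the prediction columns attached.
aug <- augment(cars_fit, new_data = cars_test)

names(preds)
names(probs)
names(aug)
```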
The number of rows in new_data and in the output is the same.
How do you understand your new tree_fit model?
You can extract_*() several components of your fitted workflow.
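A short sketch of a few extract_*() functions from the workflows package, run on a stand-in workflow fitted to the built-in iris data rather than on tree_fit. The extracted objects are meant for inspection; predictions should still go through the workflow itself.

```r
library(tidymodels)

# Stand-in fitted workflow (iris ships with R).
wflow_fit <-
  workflow(Species ~ ., decision_tree(mode = "classification")) %>%
  fit(data = iris)

extract_preprocessor(wflow_fit)   # the formula used as preprocessor
extract_spec_parsnip(wflow_fit)   # the unfitted model specification
extract_fit_parsnip(wflow_fit)    # the fitted parsnip model object

# The underlying engine object, here an rpart tree, for inspection or plotting.
engine_fit <- extract_fit_engine(wflow_fit)
class(engine_fit)
```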
You can use your fitted workflow for model and/or prediction explanations:
Learn more at https://www.tmwr.org/explain.html
Extract the model engine object from your fitted workflow.
⚠️ Never predict() with any extracted components!
How do you use your new tree_fit model in production?
Learn more at https://vetiver.rstudio.com
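library(vetiver)
library(plumber)
# Assumes v is a vetiver model object, e.g. v <- vetiver_model(tree_fit, "taxi") (the name "taxi" is illustrative)
pr() %>%
  vetiver_api(v)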
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │ │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/ping (GET)
#> └──/predict (POST)
Run the vetiver chunk in your .qmd.
Check out the automated visual documentation.