Machine learning with tidymodels

Welcome!

Who are you?

- You can use the magrittr %>% or base R |> pipe
- You are familiar with functions from dplyr, tidyr, ggplot2
- You have exposure to basic statistical concepts
- You do not need intermediate or expert familiarity with modeling or ML
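
As a quick self-check on the first two points, here is a minimal sketch of the same dplyr summary written with each pipe. It assumes R 4.1 or later (needed for |>) and uses the built-in mtcars data purely for illustration; the object names are made up for this example.

```r
library(dplyr)

# magrittr pipe (re-exported by dplyr)
heavy_magrittr <- mtcars %>%
  filter(wt > 3.5) %>%
  summarise(mean_mpg = mean(mpg))

# base R pipe (available from R 4.1)
heavy_base <- mtcars |>
  filter(wt > 3.5) |>
  summarise(mean_mpg = mean(mpg))

# Both pipelines should describe the same computation
identical(heavy_magrittr, heavy_base)
```

If both versions read naturally to you, you have the background these materials assume.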

Who are tidymodels?

- Simon Couch
- Hannah Frick
- Emil Hvitfeldt
- Max Kuhn

Many thanks to Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon for their role in creating these materials!

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”

👀

Tentative plan for this workshop

Today:
- Your data budget
- What makes a model
- Evaluating models

Tomorrow:
- Feature engineering
- Tuning hyperparameters
- Racing methods
- Iterative search methods

Introduce yourself to your neighbors 👋

Check Slack (#ml-ws-2023) for an RStudio Cloud link.

What is machine learning?

Your turn

How are statistics and machine learning related?

How are they similar? Different?

What is tidymodels?

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 1.1.0 ──
#> ✔ broom        1.0.5          ✔ rsample      1.1.1.9000
#> ✔ dials        1.2.0          ✔ tibble       3.2.1     
#> ✔ dplyr        1.1.2          ✔ tidyr        1.3.0     
#> ✔ infer        1.0.4          ✔ tune         1.1.1.9001
#> ✔ modeldata    1.1.0          ✔ workflows    1.1.3     
#> ✔ parsnip      1.1.0.9003     ✔ workflowsets 1.0.1     
#> ✔ purrr        1.0.1          ✔ yardstick    1.2.0.9001
#> ✔ recipes      1.0.6
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Use tidymodels_prefer() to resolve common conflicts.
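
The last line of that startup message points to tidymodels_prefer(). A minimal sketch of how it might be used, assuming the tidymodels meta-package is installed; the mtcars filter is only an illustration:

```r
library(tidymodels)

# Declare that tidymodels (and tidyverse) functions win common name conflicts,
# e.g. dplyr::filter() over stats::filter()
tidymodels_prefer()

# An unqualified filter() now resolves to dplyr::filter()
four_cyl <- mtcars |>
  filter(cyl == 4)

# An explicit namespace is always unambiguous, with or without the call above
four_cyl_explicit <- dplyr::filter(mtcars, cyl == 4)
```

Calling tidymodels_prefer() once near the top of a script is usually enough; explicit pkg::fun() calls remain a reasonable alternative if you prefer not to set global preferences.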

The whole game

Any modelling process involves:

- Splitting your data into training and test sets
- Using a resampling scheme
- Fitting models
- Assessing performance
- Choosing a model
- Fitting and assessing the final model
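
The steps above can be sketched end to end in a few lines. This is only an illustrative outline, not the workshop's own analysis: it assumes the built-in mtcars data, two hypothetical candidate formulas, and a plain linear regression, all chosen for brevity.

```r
library(tidymodels)

set.seed(123)

# 1. Split the data into training and test sets
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- training(car_split)
car_test  <- testing(car_split)

# 2. Set up a resampling scheme on the training set only
car_folds <- vfold_cv(car_train, v = 5)

# 3. Define candidate models (two illustrative formulas, same linear model)
lm_spec <- linear_reg()

wflow_small <- workflow() |>
  add_formula(mpg ~ wt) |>
  add_model(lm_spec)

wflow_large <- workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(lm_spec)

# 4. Fit each candidate on the resamples and assess performance
res_small <- fit_resamples(wflow_small, resamples = car_folds)
res_large <- fit_resamples(wflow_large, resamples = car_folds)

metrics_small <- collect_metrics(res_small)
metrics_large <- collect_metrics(res_large)

# 5. Choose a model by comparing the resampled metrics (done by eye here),
# 6. then fit it on the full training set and assess it once on the test set
final_res <- last_fit(wflow_large, split = car_split)
final_metrics <- collect_metrics(final_res)
```

Note that the test set is touched only in the final step; every comparison between candidates relies on the resamples.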

Let’s install some packages

If you are using your own laptop instead of RStudio Cloud:

install.packages("pak")

pkgs <- c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4", 
          "parallelly", "plumber", "probably", "ranger", "rpart", "rpart.plot", 
          "stacks", "textrecipes", "tidymodels", "tidymodels/modeldatatoo", 
          "vetiver")
pak::pak(pkgs)
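
After installation, a quick check like the following can confirm that everything is available. It is a minimal sketch: the helper names pkg_names, installed, and missing_pkgs are made up for this example, and basename() is used only to strip the GitHub owner from "tidymodels/modeldatatoo".

```r
pkgs <- c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
          "parallelly", "plumber", "probably", "ranger", "rpart", "rpart.plot",
          "stacks", "textrecipes", "tidymodels", "tidymodels/modeldatatoo",
          "vetiver")

# Drop the "owner/" prefix so that GitHub references become package names
pkg_names <- basename(pkgs)

# TRUE for each package whose namespace can be loaded
installed <- vapply(pkg_names, requireNamespace, logical(1), quietly = TRUE)

# Packages that still need to be installed (empty if all is well)
missing_pkgs <- pkg_names[!installed]
missing_pkgs
```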

Our versions

bonsai (0.2.1.9000, GitHub: tidymodels/bonsai@aab79), broom (1.0.5, local), dials (1.2.0, CRAN), doParallel (1.0.17, CRAN), dplyr (1.1.2, CRAN), embed (1.0.0, CRAN), finetune (1.1.0.9000, GitHub: tidymodels/finetune@52d), ggplot2 (3.4.2, CRAN), lightgbm (3.3.5, CRAN), lme4 (1.1-33, CRAN), modeldata (1.1.0, CRAN), modeldatatoo (0.1.0.9000, GitHub: tidymodels/modeldatatoo), parallelly (1.36.0, CRAN), parsnip (1.1.0.9003, GitHub: tidymodels/parsnip@e627), plumber (1.2.1, CRAN), probably (1.0.2, CRAN), purrr (1.0.1, CRAN), ranger (0.15.1, CRAN), recipes (1.0.6, CRAN), rpart (4.1.19, CRAN), rpart.plot (3.1.1, CRAN), rsample (1.1.1.9000, GitHub: tidymodels/rsample@afc4), scales (1.2.1, CRAN), stacks (1.0.2.9000, local), textrecipes (1.0.2, CRAN), tibble (3.2.1, CRAN), tidymodels (1.1.0, CRAN), tidyr (1.3.0, CRAN), tune (1.1.1.9001, GitHub: tidymodels/tune@fea8b02), vetiver (0.2.0, CRAN), workflows (1.1.3, CRAN), workflowsets (1.0.1, CRAN), yardstick (1.2.0.9001, GitHub: tidymodels/yardstick@6c), and Quarto (1.3.433)

1 - Introduction