1 - Introduction

Introduction to Machine Learning in R with tidymodels

Welcome!

Wi-Fi network name

TODO-ADD-LATER

Wi-Fi password

TODO-ADD-LATER

Venue information

  • There are gender-neutral bathrooms located on floor LL2, next to Chicago A

  • A meditation/prayer room is located on floor LL2 in Chicago A

  • A lactation room is located on floor LL2 in Chicago B

Workshop policies

  • Please review the posit::conf code of conduct, which applies to all workshops: https://posit.co/code-of-conduct

  • The CoC site explains how to report a problem (in person, by email, or by phone)

  • Please do not photograph people wearing red lanyards

Who are you?

  • You can use the magrittr %>% or base R |> pipe

  • You are familiar with functions from dplyr, tidyr, ggplot2

  • You have some exposure to basic statistical concepts like linear models and residuals

  • You do not need intermediate or expert familiarity with modeling or ML
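As a quick check on the first two points, the sketch below runs the same dplyr summary with each pipe. It uses the built-in mtcars data as a stand-in (an assumption for illustration, not the workshop data), and the object names are made up for this example.

```r
library(dplyr)

# magrittr pipe (re-exported by dplyr)
by_cyl_magrittr <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# base R pipe (available since R 4.1)
by_cyl_base <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

# Both pipelines should produce the same summary table
all.equal(by_cyl_magrittr, by_cyl_base)
```

Either pipe works for everything in this workshop, so use whichever you are used to.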

Who are tidymodels?

  • Simon Couch
  • Hannah Frick
  • Emil Hvitfeldt
  • Max Kuhn

+ our TA today, Kristin Bott!

Many thanks to Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon for their role in creating these materials!

Introduce yourself to your neighbors 👋



Log in to Posit Cloud (free): TODO-ADD-LATER

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”

Discord

  • pos.it/conf-event-portal (login)
  • Click on “Join Discord, the virtual networking platform!”
  • Browse Channels -> #workshop-tidymodels

👀

Plan for this workshop

  • Your data budget
  • What makes a model
  • Evaluating models
  • Tuning models

What is machine learning?

What is machine learning? (2025 edition)

Your turn

How are statistics and machine learning related?

How are they similar? Different?

03:00

What is tidymodels?

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 1.3.0 ──
#> ✔ broom        1.0.9     ✔ rsample      1.3.1
#> ✔ dials        1.4.2     ✔ tibble       3.3.0
#> ✔ dplyr        1.1.4     ✔ tidyr        1.3.1
#> ✔ infer        1.0.9     ✔ tune         2.0.0
#> ✔ modeldata    1.5.1     ✔ workflows    1.3.0
#> ✔ parsnip      1.3.3     ✔ workflowsets 1.1.1
#> ✔ purrr        1.1.0     ✔ yardstick    1.3.2
#> ✔ recipes      1.3.1
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
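The conflicts listed in the startup message mean that, for example, filter() now refers to dplyr::filter() rather than stats::filter(). One way to make those choices explicit is tidymodels_prefer(), sketched below. The filtering call on the built-in mtcars data is only an illustration and is not part of the workshop materials.

```r
library(tidymodels)

# Register tidymodels/tidyverse functions as the preferred ones
# whenever a name clashes with another attached package.
tidymodels_prefer(quiet = TRUE)

# filter() now resolves to dplyr::filter() without ambiguity
four_cyl <- mtcars |>
  filter(cyl == 4)

nrow(four_cyl)
```

Calling a function with its namespace, such as dplyr::filter(), is an alternative when only one or two calls are affected.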

The whole game

  • Roadmap for today
  • Minimal version of predictive modeling process
  • Feature engineering and tuning as iterative extensions

Let’s install some packages

If you are using your own laptop instead of Posit Cloud:

# Install the packages for the workshop
pkgs <- 
  c("bonsai", "Cubist", "doParallel", "earth", "embed", "finetune", 
    "lightgbm", "lme4", "parallelly", "plumber", "probably", 
    "ranger", "rpart", "rpart.plot", "rules", "splines2", "stacks", 
    "text2vec", "textrecipes", "tidymodels", "vetiver")

install.packages(pkgs)
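If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The sketch below is one way to do that with base R. The helper name missing_pkgs() is hypothetical, and the short package vector is just a subset of the list above used for illustration.

```r
# Hypothetical helper: which of these packages are not installed yet?
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# A subset of the workshop packages, for illustration
some_pkgs <- c("tidymodels", "ranger", "rpart")

to_install <- missing_pkgs(some_pkgs)

# Only install in an interactive session, and only if something is missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Passing the full pkgs vector from above to missing_pkgs() works the same way.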



Our versions

R version 4.5.1 (2025-06-13), Quarto (1.7.32)

package       version
bonsai        0.4.0
broom         1.0.9
Cubist        0.5.0
dials         1.4.2
doParallel    1.0.17
dplyr         1.1.4
earth         5.3.4
embed         1.1.5
finetune      1.2.1
forested      0.2.0
Formula       1.2-5
ggplot2       3.5.2
lattice       0.22-7
lightgbm      4.6.0
lme4          1.1-37
modeldata     1.5.1
parallelly    1.45.1
parsnip       1.3.3
plotmo        3.6.4
plotrix       3.8-4
plumber       1.3.0
probably      1.1.1
purrr         1.1.0
ranger        0.17.0
recipes       1.3.1
rpart         4.1.24
rpart.plot    3.1.3
rsample       1.3.1
rules         1.0.2
scales        1.4.0
splines2      0.5.4
stacks        1.1.1
text2vec      0.6.4
textrecipes   1.1.0
tibble        3.3.0
tidymodels    1.3.0
tidyr         1.3.1
tune          2.0.0
vetiver       0.2.5
workflows     1.3.0
workflowsets  1.1.1
yardstick     1.3.2
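To compare your own setup against these versions, something like the sketch below may help. It checks only three packages from the table as an example. The helper installed_version() and the object names are hypothetical, and a mismatch does not necessarily mean the workshop code will fail.

```r
# Workshop versions for a few packages (taken from the table above)
expected <- c(tidymodels = "1.3.0", parsnip = "1.3.3", recipes = "1.3.1")

# Hypothetical helper: installed version as a string, or NA if not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

version_report <- data.frame(
  package   = names(expected),
  workshop  = unname(expected),
  installed = vapply(names(expected), installed_version, character(1)),
  row.names = NULL
)

# TRUE if the versions agree, FALSE if they differ, NA if not installed
version_report$matches <- version_report$installed == version_report$workshop
version_report
```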