1 - Introduction

Introduction to tidymodels

Welcome!

Wi-Fi network name

Posit Conf 2023

Wi-Fi password

conf2023

Workshop policies

  • Please do not photograph people wearing red lanyards

  • Gender-neutral bathrooms are located among the Grand Suite Bathrooms

  • There are two meditation/prayer rooms: Grand Suite 2A and 2B

  • A lactation room is located in Grand Suite 1

  • The meditation/prayer and lactation rooms are open
    Sun - Tue 7:30am - 7:00pm, Wed 8:00am - 6:00pm

  • Please review the code of conduct and COVID policies, which apply to all workshops: https://posit.co/code-of-conduct/.

  • The code of conduct site explains how to report a problem (in person, by email, or by phone)

Who are you?

  • You can use the magrittr %>% or base R |> pipe

  • You are familiar with functions from dplyr, tidyr, ggplot2

  • You have exposure to basic statistical concepts

  • You do not need intermediate or expert familiarity with modeling or ML
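
As a quick self-check on the pipe prerequisite, here is a minimal sketch (not from the original slides) showing that the two pipes give the same result for a simple dplyr chain. The use of the built-in mtcars data is an assumption made for illustration, and the base pipe requires R 4.1 or later.

```r
library(dplyr)  # also makes the magrittr pipe %>% available

# magrittr pipe
with_magrittr <- mtcars %>%
  filter(cyl == 4) %>%
  summarize(mean_mpg = mean(mpg))

# base R pipe (R >= 4.1)
with_base <- mtcars |>
  filter(cyl == 4) |>
  summarize(mean_mpg = mean(mpg))

# Both chains produce the same one-row data frame
identical(with_magrittr, with_base)
```

If both versions read naturally to you, you have the pipe background this workshop assumes.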

Who are tidymodels?

  • Simon Couch
  • Hannah Frick
  • Emil Hvitfeldt
  • Max Kuhn

Silvia Canelón is TAing today!

Many thanks to Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon for their role in creating these materials!

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”

👀

Plan for this workshop

  • Your data budget
  • What makes a model
  • Evaluating models
  • Tuning models

Introduce yourself to your neighbors 👋



Log in to Posit Cloud (free):

Check the workshop channel on Discord for the link!

What is machine learning?

Your turn

How are statistics and machine learning related?

How are they similar? Different?

What is tidymodels?

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 1.1.1 ──
#> ✔ broom        1.0.5     ✔ rsample      1.2.0
#> ✔ dials        1.2.0     ✔ tibble       3.2.1
#> ✔ dplyr        1.1.3     ✔ tidyr        1.3.0
#> ✔ infer        1.0.5     ✔ tune         1.1.2
#> ✔ modeldata    1.2.0     ✔ workflows    1.1.3
#> ✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
#> ✔ purrr        1.0.2     ✔ yardstick    1.2.0
#> ✔ recipes      1.0.8
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Learn how to get started at https://www.tidymodels.org/start/

The whole game

  • Roadmap for today
  • Minimal version of predictive modeling process
  • Feature engineering and tuning as iterative extensions
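
To make the "minimal version" concrete, here is a rough sketch of a split, fit, predict, evaluate loop. It is an illustration only, not the workshop's own example: the mtcars data, the formula, and the choice of linear regression and RMSE are all assumptions.

```r
library(tidymodels)

set.seed(123)  # make the random split reproducible

# 1. Spend the data budget: hold out a test set
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- training(car_split)
car_test  <- testing(car_split)

# 2. Specify and fit a model on the training set only
car_fit <- linear_reg() |>
  fit(mpg ~ wt + hp, data = car_train)

# 3. Predict on the held-out rows and evaluate
car_preds <- augment(car_fit, new_data = car_test)
rmse(car_preds, truth = mpg, estimate = .pred)
```

Feature engineering and tuning would slot in between steps 1 and 2 as iterative extensions of this loop.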

Let’s install some packages

If you are using your own laptop instead of Posit Cloud:

# Install the packages for the workshop
pkgs <- 
  c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
    "plumber", "probably", "ranger", "rpart", "rpart.plot", "rules",
    "splines2", "stacks", "text2vec", "textrecipes", "tidymodels", 
    "vetiver", "remotes")

install.packages(pkgs)
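
If some of these packages are already on your laptop, one possible variation is to install only the ones that are missing. The sketch below is an assumption-laden alternative, not part of the workshop instructions: `find_missing()` is a hypothetical helper name, and the `interactive()` guard is added so that nothing is installed when the script runs non-interactively.

```r
# Hypothetical helper: which of `pkgs` are not installed yet?
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

pkgs <-
  c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
    "plumber", "probably", "ranger", "rpart", "rpart.plot", "rules",
    "splines2", "stacks", "text2vec", "textrecipes", "tidymodels",
    "vetiver", "remotes")

missing_pkgs <- find_missing(pkgs)

# Only install from an interactive session, and only what is missing
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Note that this skips packages that are installed but outdated, so the plain `install.packages(pkgs)` call remains the simplest way to match the workshop setup.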



Or log in to Posit Cloud

Link in our Discord channel!

Our versions

R version 4.2.2 (2022-10-31), Quarto (1.4.104)

package       version
bonsai        0.2.1
broom         1.0.5
dials         1.2.0
doParallel    1.0.17
dplyr         1.1.3
embed         1.1.2
finetune      1.1.0
ggplot2       3.4.3
lightgbm      3.3.5
lme4          1.1-34
modeldata     1.2.0
parsnip       1.1.1
plumber       1.2.1
probably      1.0.2
purrr         1.0.2
ranger        0.15.1
recipes       1.0.8
remotes       2.4.2.1
rpart         4.1.19
rpart.plot    3.1.1
rsample       1.2.0
rules         1.0.2
scales        1.2.1
splines2      0.5.1
stacks        1.0.2
text2vec      0.6.3
textrecipes   1.0.4
tibble        3.2.1
tidymodels    1.1.1
tidyr         1.3.0
tune          1.1.2
vetiver       0.2.4
workflows     1.1.3
workflowsets  1.0.1
yardstick     1.2.0
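
To compare your own setup with the version list above, something like the following sketch may help. It checks only three of the listed packages for brevity, and `installed_version()` is a hypothetical helper name, not a function from any package.

```r
# A few versions copied from the list above (extend as needed)
expected <- c(tidymodels = "1.1.1", parsnip = "1.1.1", recipes = "1.0.8")

# Hypothetical helper: installed version as a string, or NA if not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed <- vapply(names(expected), installed_version, character(1))

# Side-by-side view of workshop versions and local versions
data.frame(
  package   = names(expected),
  workshop  = unname(expected),
  installed = unname(installed)
)
```

Newer versions than the ones listed will usually work, though printed output may differ slightly from the slides.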