1 - Introduction

Introduction to tidymodels


Workshop policies

Who are you?

  • You can use the magrittr %>% or base R |> pipe

  • You are familiar with functions from dplyr, tidyr, ggplot2

  • You have exposure to basic statistical concepts

  • You do not need intermediate or expert familiarity with modeling or ML

Who are tidymodels?

  • Simon Couch
  • Hannah Frick
  • Emil Hvitfeldt
  • Max Kuhn

Silvia Canelón is TAing today!

Many thanks to Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon for their role in creating these materials!

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”


Plan for this workshop

  • Your data budget
  • What makes a model
  • Evaluating models
  • Tuning models

Introduce yourself to your neighbors 👋

What is machine learning?

Your turn

How are statistics and machine learning related?

How are they similar? Different?


What is tidymodels?

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 1.1.1 ──
#> ✔ broom        1.0.5     ✔ rsample      1.2.0
#> ✔ dials        1.2.0     ✔ tibble       3.2.1
#> ✔ dplyr        1.1.3     ✔ tidyr        1.3.0
#> ✔ infer        1.0.5     ✔ tune         1.1.2
#> ✔ modeldata    1.2.0     ✔ workflows    1.1.3
#> ✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
#> ✔ purrr        1.0.2     ✔ yardstick    1.2.0
#> ✔ recipes      1.0.8
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Learn how to get started at https://www.tidymodels.org/start/
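The conflicts listed above mean that an unqualified `filter()` now calls dplyr's version, not the one from stats. A minimal sketch of two ways to handle this: call the function you want with an explicit `pkg::` prefix, or run `tidymodels_prefer()`, which (via the conflicted package) declares the tidymodels functions the winners. The `mtcars` data and the moving-average example are illustrative choices, not part of the workshop materials.

```r
library(tidymodels)

# dplyr::filter() masks stats::filter(); a namespace prefix picks one explicitly
four_cyl <- dplyr::filter(mtcars, cyl == 4)    # keep rows where cyl == 4
smoothed <- stats::filter(1:5, rep(1 / 3, 3))  # 3-point moving average (returns a ts object)

# Alternatively, make the tidymodels versions win every conflict for this session
tidymodels_prefer()
```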

The whole game

  • Roadmap for today
  • Minimal version of predictive modeling process
  • Feature engineering and tuning as iterative extensions
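As a rough preview of that minimal process, the sketch below splits data, fits a model on the training set, and measures performance on the test set. It assumes the built-in `mtcars` data and a linear regression of `mpg` on `wt` and `hp` purely for illustration; the workshop itself uses different data and models.

```r
library(tidymodels)

# Data budget: hold out 20% of the rows for testing (mtcars is a stand-in data set)
set.seed(123)
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Specify a linear regression with the "lm" engine and fit it to the training set
lm_fit <-
  linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = car_train)

# Predict on the held-out rows and pair the predictions with the true outcome
car_results <-
  predict(lm_fit, new_data = car_test) |>
  bind_cols(car_test |> select(mpg))

# Evaluate: root mean squared error on the test set
car_rmse <- rmse(car_results, truth = mpg, estimate = .pred)
car_rmse
```

Feature engineering (recipes) and tuning (tune) slot into this same loop later on.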

Let’s install some packages

If you are using your own laptop instead of Posit Cloud:

# Install the packages for the workshop
pkgs <-
  c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
    "plumber", "probably", "ranger", "rpart", "rpart.plot", "rules",
    "splines2", "stacks", "text2vec", "textrecipes", "tidymodels",
    "vetiver", "remotes")

install.packages(pkgs)


Our versions

R version 4.2.2 (2022-10-31), Quarto (1.4.104)
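To compare your setup with ours, a small sketch using base R: `R.version.string` reports the R version and `packageVersion()` reports an installed package's version. The three packages checked here are an arbitrary sample from the install list; the Quarto version can be checked from a terminal with `quarto --version`.

```r
# R version, e.g. "R version 4.2.2 (2022-10-31)"
r_version <- R.version.string
r_version

# Installed versions of a few workshop packages (any installed package name works)
pkg_versions <- sapply(
  c("parsnip", "recipes", "tune"),
  \(pkg) as.character(packageVersion(pkg))
)
pkg_versions
```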

