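# Install the CRAN packages used in the workshop
install.packages(c("Cubist", "DALEXtra", "doParallel", "earth", "embed",
                   "forcats", "lme4", "parallelly", "ranger", "remotes", "rpart",
                   "rpart.plot", "rules", "stacks", "tidymodels",
                   "vetiver", "xgboost"))
# Install the ongoal package from GitHub (uses the remotes package installed above)
remotes::install_github("topepo/ongoal@hockeyR")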
Welcome
These are the materials for workshops on tidymodels presented in Reykjavík, Iceland. This workshop provides an introduction to machine learning with R using the tidymodels framework, a collection of packages for modeling and machine learning using tidyverse principles. We will build, evaluate, compare, and tune predictive models. Along the way, we’ll learn about key concepts in machine learning including overfitting, resampling, and feature engineering. Learners will gain knowledge about good predictive modeling practices, as well as hands-on experience using tidymodels packages like parsnip, rsample, recipes, yardstick, tune, and workflows.
Is this workshop for me?
This course assumes intermediate R knowledge. This workshop is for you if:
- You can use the magrittr pipe %>% and/or the native pipe |>
- You are familiar with functions from dplyr, tidyr, and ggplot2
- You can read data into R, transform and reshape data, and make a wide variety of graphs
We expect participants to have some exposure to basic statistical concepts, but NOT intermediate or expert familiarity with modeling or machine learning.
Preparation
Please join the workshop with a computer that has the following installed (all available for free):
- A recent version of R, available at https://cran.r-project.org/
- A recent version of RStudio Desktop (RStudio Desktop Open Source License, at least v2022.02), available at https://www.rstudio.com/download
- The following R packages, which you can install from the R console:
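After running the install commands given on this page, it can help to confirm that everything is actually available before the workshop starts. The following is a minimal sketch, not part of the original materials: the variable names `pkgs`, `installed`, and `missing_pkgs` are illustrative, and the ongoal package installed from GitHub is not covered and would need a separate check.

```r
# Packages named in the install.packages() call on this page
pkgs <- c("Cubist", "DALEXtra", "doParallel", "earth", "embed",
          "forcats", "lme4", "parallelly", "ranger", "remotes", "rpart",
          "rpart.plot", "rules", "stacks", "tidymodels",
          "vetiver", "xgboost")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that still need to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

If any names are reported as missing, rerunning `install.packages()` on just those names is usually enough.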
Slides
These slides are designed for use with live teaching and are published for workshop participants’ convenience. They are not meant as standalone learning materials. For that, we recommend tidymodels.org and Tidy Modeling with R.
Day One
- 01: Introduction
- 02: Your data budget
- 03: What makes a model?
- 04: Evaluating models
Day Two
- 05: Feature engineering
- 06: Tuning hyperparameters
- 07: Transit Case Study
- 08: Wrapping up
There’s also a page for slide annotations; these are extra notes for selected slides.
Code
Quarto files (version 1.4.104) for working along are available on GitHub. (Don’t worry if you haven’t used Quarto before; it will feel familiar to R Markdown users.)
Past workshops
Acknowledgments
This website, including the slides, is made with Quarto. Please submit an issue on the GitHub repo for this workshop if you find something that could be fixed or improved.
Reuse and licensing
Unless otherwise noted (i.e., not an original creation and reused from another source), these educational materials are licensed under Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0).