Advanced tidymodels
Welcome!
You can use the magrittr %>% or base R |> pipe
You are familiar with functions from dplyr, tidyr, ggplot2
You have exposure to basic statistical concepts
You do not need intermediate or expert familiarity with modeling or ML
You have used some tidymodels packages
You have some experience with evaluating statistical models using resampling techniques
Many thanks to Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon for their role in creating these materials!
If you are using your own laptop instead of RStudio Cloud:
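One way to check a local setup, as a minimal sketch: the package names below are copied from the version table at the end of this section, and the object names `pkgs` and `missing_pkgs` are only illustrative. The check reports what is missing and leaves the actual installation to you.

```r
# Package names taken from the version table at the end of this section
pkgs <- c(
  "bonsai", "broom", "dials", "doParallel", "dplyr", "embed", "finetune",
  "ggplot2", "lightgbm", "lme4", "modeldata", "parsnip", "plumber",
  "probably", "purrr", "ranger", "recipes", "remotes", "rpart",
  "rpart.plot", "rsample", "rules", "scales", "splines2", "stacks",
  "text2vec", "textrecipes", "tibble", "tidymodels", "tidyr", "tune",
  "vetiver", "workflows", "workflowsets", "yardstick"
)

# Which of these are not yet in the local library?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All listed packages are installed.")
}
```

Installed versions may differ from the ones in the table; `packageVersion("tune")` and similar calls show what is present locally.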
We’ll use data on hotels to predict the cost of a room.
The data are in the modeldata package. We’ll sample down the data and refactor some columns:
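A rough sketch of that preparation step follows. The seed, the sample size of 5,000, and the choice of `company`, `country`, and `agent` as the columns to re-encode are assumptions made for illustration. It also assumes the packaged data has an `arrival_date` column, which is dropped because it does not appear in the column listing.

```r
library(dplyr)

# hotel_rates ships with the modeldata package
data(hotel_rates, package = "modeldata")

set.seed(295)  # assumed seed, only so the sketch is reproducible
hotel_rates <-
  hotel_rates %>%
  slice_sample(n = 5000) %>%   # assumed sample size
  arrange(arrival_date) %>%    # keep the rows in time order
  select(-arrival_date) %>%    # assumed: the raw date is not kept
  mutate(
    # Re-create the factors so levels that vanished during sampling are dropped
    across(c(company, country, agent), ~ factor(as.character(.x)))
  )
```

Rebuilding a factor from its character values is a simple way to discard levels that no longer occur after subsampling.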
names(hotel_rates)
#> [1] "avg_price_per_room" "lead_time"
#> [3] "stays_in_weekend_nights" "stays_in_week_nights"
#> [5] "adults" "children"
#> [7] "babies" "meal"
#> [9] "country" "market_segment"
#> [11] "distribution_channel" "is_repeated_guest"
#> [13] "previous_cancellations" "previous_bookings_not_canceled"
#> [15] "reserved_room_type" "assigned_room_type"
#> [17] "booking_changes" "agent"
#> [19] "company" "days_in_waiting_list"
#> [21] "customer_type" "required_car_parking_spaces"
#> [23] "total_of_special_requests" "arrival_date_num"
#> [25] "near_christmas" "near_new_years"
#> [27] "historical_adr"
Let’s split the data into a training set (75%) and testing set (25%) using stratification:
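A minimal sketch of a stratified 75/25 split with rsample is shown here. The seed and the object names (`hotel_split`, `hotel_train`, `hotel_test`) are assumptions, and to stay self-contained the block loads the full packaged data rather than a sampled-down copy.

```r
library(rsample)

# Load the packaged data so this block runs on its own
data(hotel_rates, package = "modeldata")

set.seed(4028)  # assumed seed, only so the sketch is reproducible

# Stratify on the numeric outcome; rsample bins it into quantile groups
hotel_split <- initial_split(
  hotel_rates,
  prop = 0.75,
  strata = avg_price_per_room
)

hotel_train <- training(hotel_split)
hotel_test  <- testing(hotel_split)

# Share of rows that ended up in the training set (should be close to 0.75)
nrow(hotel_train) / nrow(hotel_rates)
```

Stratifying on the outcome keeps the price distribution similar in the training and testing sets, which matters most when the outcome is skewed.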
Let’s take some time and investigate the training data. The outcome is avg_price_per_room.
Are there any interesting characteristics of the data?
R version 4.3.3 (2024-02-29), Quarto (1.5.30)
package | version |
---|---|
bonsai | 0.2.1 |
broom | 1.0.5 |
dials | 1.2.1 |
doParallel | 1.0.17 |
dplyr | 1.1.4 |
embed | 1.1.4 |
finetune | 1.2.0 |
ggplot2 | 3.5.0 |
lightgbm | 4.3.0 |
lme4 | 1.1-35.3 |
modeldata | 1.3.0 |
parsnip | 1.2.1 |
plumber | 1.2.2 |
probably | 1.0.3 |
purrr | 1.0.2 |
ranger | 0.16.0 |
recipes | 1.0.10 |
remotes | 2.5.0 |
rpart | 4.1.23 |
rpart.plot | 3.1.2 |
rsample | 1.2.1 |
rules | 1.0.2 |
scales | 1.3.0 |
splines2 | 0.5.1 |
stacks | 1.0.4 |
text2vec | 0.6.4 |
textrecipes | 1.0.6 |
tibble | 3.2.1 |
tidymodels | 1.2.0 |
tidyr | 1.3.1 |
tune | 1.2.1 |
vetiver | 0.2.5 |
workflows | 1.1.4 |
workflowsets | 1.1.0 |
yardstick | 1.3.1 |