Every preprocessing step in a recipe that involves calculations uses the training set. For example:
Levels of a factor
Determination of zero-variance predictors
Normalization
Feature extraction
Once a recipe is added to a workflow, this occurs when fit() is called.
Debugging a recipe
Typically, you will want to use a workflow to estimate and apply a recipe.
If you have an error and need to debug your recipe, the original recipe object (e.g. forested_rec) can be estimated manually with a function called prep(). It is analogous to fit(). See TMwR section 16.4.
Another function, bake(), is analogous to predict(), and gives you the processed data back.
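As a rough illustration of the prep()/bake() pair, here is a minimal sketch. It assumes the built-in mtcars data and an arbitrary row split as stand-ins for the forested training and testing sets, and the recipe is a simplified one rather than forested_rec.

```r
library(recipes)

# Assumption: mtcars and this row split stand in for the forested
# training and testing sets; they are not part of the original slides.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

rec <- recipe(mpg ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates each step (here, means and standard deviations)
# from the training set only
rec_prepped <- prep(rec, training = train)

# bake() applies the trained steps and returns the processed data
train_baked <- bake(rec_prepped, new_data = NULL)  # processed training set
test_baked  <- bake(rec_prepped, new_data = test)  # new data, training-set statistics
```

In this sketch the normalized predictors have mean 0 and standard deviation 1 in train_baked but not in test_baked, because the statistics are estimated from the training rows alone.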
Your turn
Take the recipe and prep() then bake() it to see what the resulting data set looks like.
Try removing steps to see how the result changes.
Printing a recipe
forested_rec
#> 
#> ── Recipe ────────────────────────────────────────────────────────────
#> 
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 18
#> 
#> ── Operations
#> • Dummy variables from: all_nominal_predictors()
#> • Zero variance filter on: all_predictors()
#> • Log transformation on: canopy_cover
#> • Centering and scaling for: all_numeric_predictors()
Prepping a recipe
prep(forested_rec)
#> 
#> ── Recipe ────────────────────────────────────────────────────────────
#> 
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 18
#> 
#> ── Training information
#> Training data contained 8749 data points and no incomplete rows.
#> 
#> ── Operations
#> • Dummy variables from: tree_no_tree, land_type, county | Trained
#> • Zero variance filter removed: <none> | Trained
#> • Log transformation on: canopy_cover | Trained
#> • Centering and scaling for: year elevation, ... | Trained

prep(forested_rec) %>%
  tidy(number = 1)
#> # A tibble: 161 × 3
#>    terms        columns             id
#>    <chr>        <chr>               <chr>
#>  1 tree_no_tree No tree             dummy_hIEnQ
#>  2 land_type    Non-tree vegetation dummy_hIEnQ
#>  3 land_type    Tree                dummy_hIEnQ
#>  4 county       Atkinson            dummy_hIEnQ
#>  5 county       Bacon               dummy_hIEnQ
#>  6 county       Baker               dummy_hIEnQ
#>  7 county       Baldwin             dummy_hIEnQ
#>  8 county       Banks               dummy_hIEnQ
#>  9 county       Barrow              dummy_hIEnQ
#> 10 county       Bartow              dummy_hIEnQ
#> # ℹ 151 more rows
Using a recipe in tidymodels
The recommended way to use a recipe in tidymodels is to use it as part of a workflow().
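A minimal sketch of that pattern follows. It assumes the built-in mtcars data and a linear regression model purely for illustration; the forested example has a categorical outcome and would need a classification model instead.

```r
library(tidymodels)

# Assumption: mtcars and linear_reg() are illustrative stand-ins,
# not the data or model used elsewhere in these slides.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# fit() estimates the recipe on the supplied data and then fits the model
wflow_fit <- fit(wflow, data = mtcars)

# predict() applies the trained recipe to new data before predicting
preds <- predict(wflow_fit, new_data = head(mtcars))
```

With a workflow there is no need to call prep() or bake() yourself; fit() and predict() handle both.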