1 - Introduction

Introducing Tidymodels

Welcome!

Wi-Fi network name

TODO-ADD-LATER

Wi-Fi password

TODO-ADD-LATER

Workshop policies

Code of conduct: TODO-ADD-LATER

Who are you?

You can use the magrittr (%>%) or base R (|>) pipe
You are familiar with functions from dplyr, tidyr, and ggplot2
You understand basic statistical concepts
You do not need to be an expert in modeling or machine learning
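As a quick self-check for the first prerequisite, here is a minimal sketch of the base R pipe. It assumes R 4.1 or later and uses the built-in `mtcars` data purely as a stand-in; the variable names are illustrative. With `library(magrittr)` loaded, `%>%` would behave the same way in this example.

```r
# Base R pipe: the value on the left becomes the first argument on the right.
four_cyl <- mtcars |>
  subset(cyl == 4) |>
  nrow()

# The same computation written as a nested call, for comparison.
four_cyl_nested <- nrow(subset(mtcars, cyl == 4))

four_cyl
```

Both forms give the same count; the piped version simply reads top to bottom in the order the steps happen.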

The Tidymodels team

Simon Couch
Hannah Frick
Emil Hvitfeldt
Max Kuhn

Special thanks to: Davis Vaughan, Julia Silge, David Robinson, Julie Jung, Alison Hill, and Desirée De Leon

👀

What we plan to do in this workshop

Your data budget
The parts of a model
Evaluating models
Tuning models

Say hello to your neighbors 👋

What is machine learning?

Is this your machine learning system?
Yes, we throw the data at this pile of linear algebra, then collect the answers that come out
What if the answers are wrong?
We stir the pile until something looks right

What is machine learning?

flowchart TB
  au[Classical\nMachine\nLearning]
  sp[Supervised]
  au--Labeled data, numeric or categorical-->sp
  us[Unsupervised]
  au--Data are not labeled-->us
  cl[Classification]
  sp--Predict a category-->cl
  rs[Regression]
  sp--Predict a number-->rs
  ag[Clustering]
  us--Divide by similarity-->ag
  rd[Dimension\nReduction]
  us--Find hidden\ndependencies-->rd
  as[Association]
  us--Identify sequences-->as
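To make the branches of this diagram concrete, the sketch below runs one small base R example for four of the leaves. It is only an illustration under stated assumptions: the built-in `mtcars` and `iris` data are stand-ins, the 0.5 probability cutoff and the choice of three clusters are arbitrary, and association (sequence) mining is left out because base R has no built-in function for it.

```r
set.seed(1)

# Supervised, regression: predict a number (mpg).
reg_fit  <- lm(mpg ~ wt + hp, data = mtcars)
reg_pred <- predict(reg_fit, newdata = mtcars)

# Supervised, classification: predict a category (am: 0 = automatic, 1 = manual).
cls_fit  <- glm(am ~ wt, data = mtcars, family = binomial())
cls_prob <- predict(cls_fit, newdata = mtcars, type = "response")
cls_pred <- ifelse(cls_prob > 0.5, "manual", "automatic")

# Unsupervised, clustering: group rows by similarity; no outcome is used.
num_cols <- scale(iris[, 1:4])
km <- kmeans(num_cols, centers = 3, nstart = 10)

# Unsupervised, dimension reduction: summarize four columns with principal components.
pca <- prcomp(num_cols)

table(cls_pred)
table(km$cluster)
```

The supervised examples need an outcome column (`mpg`, `am`), while the unsupervised ones only see the predictor columns.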

Your turn

How are statistics and machine learning related?

How are they similar? How do they differ?

03:00

What is Tidymodels?

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 1.1.1 ──
#> ✔ broom        1.0.5      ✔ rsample      1.2.0 
#> ✔ dials        1.2.1      ✔ tibble       3.2.1 
#> ✔ dplyr        1.1.4      ✔ tidyr        1.3.1 
#> ✔ infer        1.0.6      ✔ tune         1.1.2 
#> ✔ modeldata    1.3.0      ✔ workflows    1.1.4 
#> ✔ parsnip      1.2.0      ✔ workflowsets 1.0.1 
#> ✔ purrr        1.0.2      ✔ yardstick    1.3.0 
#> ✔ recipes      1.0.10
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Learn how to get started at https://www.tidymodels.org/start/

Workshop expectations

Today's roadmap
The basic process of predictive modeling
Feature engineering and tuning as iterative extensions of that process

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
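The first step in the diagram, splitting all the data into training and testing sets, could look like the sketch below. It assumes the rsample package (attached by tidymodels) and uses `mtcars` as a stand-in rather than the workshop data; the 75/25 proportion and the object names are illustrative choices.

```r
library(rsample)

set.seed(123)

# Reserve roughly 75% of the rows for training and the rest for testing.
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

nrow(car_train)
nrow(car_test)
```

Every row ends up in exactly one of the two sets, so the test set stays untouched until the final check.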

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  tr --> dt

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  lg[Logistic\nregression]
  style lg fill:#FDF4E3,stroke:#666,color:#000
  tr --> lg
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  tr --> dt
  rf[Random\nforest]
  style rf fill:#FDF4E3,stroke:#666,color:#000
  tr --> rf

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  rs[Resampling]
  style rs fill:#FDF4E3,stroke:#666,color:#000
  tr --> rs
  lg[Logistic\nregression]
  style lg fill:#FDF4E3,stroke:#666,color:#000
  rs --> lg
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  rs --> dt
  rf[Random\nforest]
  style rf fill:#FDF4E3,stroke:#666,color:#000
  rs --> rf

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  rs[Resampling]
  style rs fill:#FDF4E3,stroke:#666,color:#000
  tr --> rs
  lg[Logistic\nregression]
  style lg fill:#FDF4E3,stroke:#666,color:#000
  rs --> lg
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  rs --> dt
  rf[Random\nforest]
  style rf fill:#FDF4E3,stroke:#666,color:#000
  rs --> rf
  sm[Select\nmodel]
  style sm fill:#FDF4E3,stroke:#666,color:#000
  lg --> sm
  dt --> sm
  rf --> sm

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  rs[Resampling]
  style rs fill:#FDF4E3,stroke:#666,color:#000
  tr --> rs
  lg[Logistic\nregression]
  style lg fill:#FDF4E3,stroke:#666,color:#000
  rs --> lg
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  rs --> dt
  rf[Random\nforest]
  style rf fill:#FDF4E3,stroke:#666,color:#000
  rs --> rf
  sm[Select\nmodel]
  style sm fill:#FDF4E3,stroke:#666,color:#000
  lg --> sm
  dt --> sm
  rf --> sm
  fm[Fit selected\nmodel]
  style fm fill:#FBE9BF,stroke:#666,color:#000
  sm --> fm
  tr --> fm

Workshop expectations

flowchart LR
  ad[All\ndata]
  style ad fill:#fff,stroke:#666,color:#000
  tr[Training]
  style tr fill:#FBE9BF,stroke:#666,color:#000
  ts[Testing]
  style ts fill:#E5E7FD,stroke:#666,color:#000
  ad --> tr
  ad --> ts
  rs[Resampling]
  style rs fill:#FDF4E3,stroke:#666,color:#000
  tr --> rs
  lg[Logistic\nregression]
  style lg fill:#FDF4E3,stroke:#666,color:#000
  rs --> lg
  dt[Decision\ntree]
  style dt fill:#FDF4E3,stroke:#666,color:#000
  rs --> dt
  rf[Random\nforest]
  style rf fill:#FDF4E3,stroke:#666,color:#000
  rs --> rf
  sm[Select\nmodel]
  style sm fill:#FDF4E3,stroke:#666,color:#000
  lg --> sm
  dt --> sm
  rf --> sm
  fm[Fit selected\nmodel]
  style fm fill:#FBE9BF,stroke:#666,color:#000
  sm --> fm
  tr --> fm
  vm[Verify\nperformance]
  style vm fill:#E5E7FD,stroke:#666,color:#000
  fm --> vm
  ts --> vm
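The complete diagram can be walked through in code roughly as follows. This is a hedged sketch, not the workshop's own material: it assumes the `two_class_dat` example data from modeldata (predictors `A` and `B`, factor outcome `Class`), compares only two of the three candidate models to stay short (the random forest is omitted), and uses ROC AUC with 5-fold cross-validation as illustrative choices. All object names are hypothetical.

```r
library(tidymodels)

set.seed(2024)
data(two_class_dat, package = "modeldata")

# All data -> training and testing.
dat_split <- initial_split(two_class_dat, prop = 0.8, strata = Class)
dat_train <- training(dat_split)

# Training -> resampling.
folds <- vfold_cv(dat_train, v = 5, strata = Class)

# Candidate models.
lr_spec <- logistic_reg() |> set_engine("glm")
dt_spec <- decision_tree() |> set_engine("rpart") |> set_mode("classification")

lr_wf <- workflow() |> add_formula(Class ~ A + B) |> add_model(lr_spec)
dt_wf <- workflow() |> add_formula(Class ~ A + B) |> add_model(dt_spec)

# Resampling -> performance estimate for each candidate.
auc_only <- metric_set(roc_auc)
lr_res <- fit_resamples(lr_wf, resamples = folds, metrics = auc_only)
dt_res <- fit_resamples(dt_wf, resamples = folds, metrics = auc_only)

lr_auc <- collect_metrics(lr_res)$mean
dt_auc <- collect_metrics(dt_res)$mean

# Select model: keep the workflow with the higher resampled ROC AUC.
best_wf <- if (lr_auc >= dt_auc) lr_wf else dt_wf

# Fit the selected model on the training set, then verify on the test set.
final_res <- last_fit(best_wf, split = dat_split, metrics = auc_only)
collect_metrics(final_res)
```

Note that the test set is used exactly once, in `last_fit()`, after the model has already been chosen with resampling.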

Let's install some packages

pkgs <- 
  c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
    "plumber", "probably", "ranger", "rpart", "rpart.plot", "rules",
    "splines2", "stacks", "text2vec", "textrecipes", "tidymodels", 
    "vetiver", "remotes")

install.packages(pkgs)
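If some of these packages are already on your machine, one option is to check which ones are missing before installing. The sketch below repeats the package vector so it is self-contained; `missing_pkgs` is an illustrative name, and the install call is left commented out so that running the snippet only reports what is missing.

```r
pkgs <-
  c("bonsai", "doParallel", "embed", "finetune", "lightgbm", "lme4",
    "plumber", "probably", "ranger", "rpart", "rpart.plot", "rules",
    "splines2", "stacks", "text2vec", "textrecipes", "tidymodels",
    "vetiver", "remotes")

# Packages from the list that are not found in any library on this machine.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only these
} else {
  message("All workshop packages are already installed.")
}
```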

Our versions

R version 4.3.2 (2023-10-31), Quarto (1.4.550)

package	version
bonsai	0.2.1
broom	1.0.5
dials	1.2.1
doParallel	1.0.17
dplyr	1.1.4
embed	1.1.3
finetune	1.1.0
ggplot2	3.5.0
lightgbm	4.3.0

package	version
lme4	1.1-35.1
modeldata	1.3.0
parsnip	1.2.0
plumber	1.2.1
probably	1.0.3
purrr	1.0.2
ranger	0.16.0
recipes	1.0.10
remotes	2.4.2.1

package	version
rpart	4.1.23
rpart.plot	3.1.2
rsample	1.2.0
rules	1.0.2
scales	1.3.0
splines2	0.5.1
stacks	1.0.3
text2vec	0.6.4
textrecipes	1.0.6

package	version
tibble	3.2.1
tidymodels	1.1.1
tidyr	1.3.1
tune	1.1.2
vetiver	0.2.5
workflows	1.1.4
workflowsets	1.0.1
yardstick	1.3.0
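To compare your own setup against the versions listed above, something like the following sketch could be used. It checks only a few packages as an example; the `expected` vector copies version numbers from the tables, and the names `expected`, `installed`, and `status` are illustrative. A newer version than the one listed is reported as "ok" here, which is an assumption about what is acceptable.

```r
# A few of the versions from the tables above.
expected <- c(tidymodels = "1.1.1", parsnip = "1.2.0",
              recipes = "1.0.10", rsample = "1.2.0")

# Named character vector: package name -> installed version.
installed <- installed.packages()[, "Version"]

status <- vapply(names(expected), function(p) {
  if (!p %in% names(installed)) return("not installed")
  if (package_version(installed[[p]]) >= package_version(expected[[p]])) {
    "ok"
  } else {
    "older"
  }
}, character(1))

status
```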