# AMR Extensions for Tidymodels

This family of functions allows using AMR-specific data types such as `<mic>` and `<sir>` inside `tidymodels` pipelines.

## Usage

``` r
all_sir()

all_sir_predictors()

all_mic()

all_mic_predictors()

all_disk()

all_disk_predictors()

step_mic_log2(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
  id = recipes::rand_id("mic_log2")
)

step_sir_numeric(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
  id = recipes::rand_id("sir_numeric")
)
```

## Arguments

- recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
- ...: One or more selector functions to choose variables for this step. See [`selections()`](https://recipes.tidymodels.org/reference/selections.html) for more details.
- role: Not used by this step since no new variables are created.
- trained: A logical to indicate if the quantities for preprocessing have been estimated.
- skip: A logical. Should the step be skipped when the recipe is baked by [`bake()`](https://recipes.tidymodels.org/reference/bake.html)? While all operations are baked when [`prep()`](https://recipes.tidymodels.org/reference/prep.html) is run, some operations may not be able to be conducted on new data (e.g., processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.
- id: A character string that is unique to this step to identify it.

## Details

You can read more in our online [AMR with tidymodels introduction](https://amr-for-r.org/articles/AMR_with_tidymodels.html).

Tidyselect helpers include:

- `all_sir()` and `all_sir_predictors()` to select [`<sir>`](https://amr-for-r.org/reference/as.sir.md) columns
- `all_mic()` and `all_mic_predictors()` to select [`<mic>`](https://amr-for-r.org/reference/as.mic.md) columns
- `all_disk()` and `all_disk_predictors()` to select [`<disk>`](https://amr-for-r.org/reference/as.disk.md) columns

Pre-processing pipeline steps include:

- `step_sir_numeric()` to convert SIR columns to numeric (via [`as.numeric()`](https://rdrr.io/r/base/numeric.html)), to be used with `all_sir_predictors()`: `"S"` = 1, `"I"`/`"SDD"` = 2, `"R"` = 3. All other values are rendered `NA`. Keep this in mind for further processing, especially if the model does not allow for `NA` values.
- `step_mic_log2()` to convert MIC columns to numeric (via [`as.numeric()`](https://rdrr.io/r/base/numeric.html)) and apply a log2 transform, to be used with `all_mic_predictors()`.

These steps integrate with [`recipes::recipe()`](https://recipes.tidymodels.org/reference/recipe.html) and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.

## See also

[`recipes::recipe()`](https://recipes.tidymodels.org/reference/recipe.html), [`as.sir()`](https://amr-for-r.org/reference/as.sir.md), [`as.mic()`](https://amr-for-r.org/reference/as.mic.md), [`as.disk()`](https://amr-for-r.org/reference/as.disk.md)

## Examples

``` r
if (require("tidymodels")) {
  # The approach below formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.

  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%
    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%
    # And apply the preparation steps
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

  # Evaluate predictions using standard classification metrics
  our_metrics <- metric_set(accuracy, recall, precision, sensitivity, specificity, ppv, npv)
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance metrics
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
#> ✔ broom        1.0.11     ✔ rsample      1.3.1
#> ✔ dials        1.4.2      ✔ tailor       0.1.0
#> ✔ infer        1.1.0      ✔ tidyr        1.3.2
#> ✔ modeldata    1.5.1      ✔ tune         2.0.1
#> ✔ parsnip      1.4.0      ✔ workflows    1.3.0
#> ✔ purrr        1.2.0      ✔ workflowsets 1.1.1
#> ✔ recipes      1.3.1      ✔ yardstick    1.3.2
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 7 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 accuracy    binary         0.936
#> 2 recall      binary         0.954
#> 3 precision   binary         0.925
#> 4 sensitivity binary         0.954
#> 5 specificity binary         0.917
#> 6 ppv         binary         0.925
#> 7 npv         binary         0.948
```
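
The selection helpers pick columns by their AMR class, so a single recipe can route MIC and SIR predictors to their own steps while leaving ordinary numeric columns alone. The sketch below assumes recent versions of AMR and recipes are installed (and R >= 4.1 for the native `|>` pipe); the toy data frame and its columns (`outcome`, `AMC`, `CIP`, `age`) are hypothetical and only serve to illustrate the selectors.

``` r
library(AMR)
library(recipes)

# Hypothetical toy isolates: one <mic> column, one <sir> column, one plain numeric column
toy <- data.frame(
  outcome = factor(c("no", "no", "yes", "yes")),
  AMC = as.mic(c("0.25", "1", "4", ">=8")),
  CIP = as.sir(c("S", "I", "R", "S")),
  age = c(34, 61, 47, 80)
)

amr_recipe <- recipe(outcome ~ ., data = toy) |>
  # all_mic_predictors() should only select <mic> columns (here: AMC)
  step_mic_log2(all_mic_predictors()) |>
  # all_sir_predictors() should only select <sir> columns (here: CIP)
  step_sir_numeric(all_sir_predictors()) |>
  prep()

# Retrieve the processed training data; 'age' is expected to pass through unchanged
baked <- bake(amr_recipe, new_data = NULL)
baked
```

Because MICs are usually read from two-fold dilution series, the log2 transform turns each doubling into a step of exactly 1: here 0.25, 1, 4 and 8 become -2, 0, 2 and 3. The numeric conversion drops the `>=` operator, so the censored value `>=8` is treated as 8, which is worth keeping in mind when many results sit at the edge of the tested range.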
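
Since `step_sir_numeric()` renders any value other than `"S"`, `"I"`/`"SDD"` or `"R"` as `NA`, models that cannot handle missing values need an extra step afterwards. One possible approach, sketched below under the assumption that median imputation is acceptable for your data, is to follow it with `recipes::step_impute_median()`; the toy data and column names (`outcome`, `AMX`, `CIP`) are hypothetical, and other imputation steps from recipes may suit a given analysis better.

``` r
library(AMR)
library(recipes)

# Hypothetical toy data: two antibiotics interpreted as SIR, one with a missing result
toy <- data.frame(
  outcome = factor(c("yes", "no", "yes", "no")),
  AMX = as.sir(c("S", "R", "I", "S")),
  CIP = as.sir(c("R", "S", "S", NA))
)

sir_recipe <- recipe(outcome ~ ., data = toy) |>
  # Map "S" = 1, "I"/"SDD" = 2, "R" = 3; anything else becomes NA
  step_sir_numeric(all_sir_predictors()) |>
  # Replace remaining NAs with the training-set median of each numeric predictor
  step_impute_median(all_numeric_predictors()) |>
  prep()

baked <- bake(sir_recipe, new_data = NULL)
baked
```

Here the missing `CIP` result is filled with the median of the observed values 3, 1 and 1, which is 1.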