
This family of functions allows AMR-specific data types such as <mic> and <sir> to be used inside tidymodels pipelines.

Usage

all_mic()

all_mic_predictors()

all_sir()

all_sir_predictors()

step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("mic_log2"))

step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("sir_numeric"))

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

skip

A logical. Should the step be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

Details

You can read more in our online AMR with tidymodels introduction.

Tidyselect helpers include:

  • all_mic() and all_mic_predictors() to select <mic> columns

  • all_sir() and all_sir_predictors() to select <sir> columns
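Conceptually, these helpers pick out variables by their class. The base R sketch below illustrates only that idea on a plain data frame. It is not how the AMR package implements the helpers: the function `select_by_class()` is hypothetical, and the `mic`/`sir` classes are simulated by tagging toy columns rather than created with `AMR::as.mic()` or `AMR::as.sir()`. Following the recipes naming convention (as in `all_numeric_predictors()`), the `_predictors()` variants presumably also restrict the selection to variables with the predictor role, which this sketch does not model.

```r
# Hypothetical helper: return the names of columns that inherit from a class.
select_by_class <- function(df, cls) {
  keep <- vapply(df, inherits, logical(1), what = cls)
  names(df)[keep]
}

# Toy data: the "mic" and "sir" classes are simulated here by tagging columns
# with a class attribute; real <mic>/<sir> columns come from the AMR package.
df <- data.frame(genus = c("Escherichia", "Klebsiella", "Escherichia"))
df$AMX <- structure(c(2, 8, 32), class = "mic")
df$CTX <- structure(c(0.25, 4, 1), class = "mic")
df$AMX_sir <- structure(c("S", "R", "R"), class = "sir")

select_by_class(df, "mic")  # "AMX" "CTX"
select_by_class(df, "sir")  # "AMX_sir"
```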

Pre-processing pipeline steps include:

  • step_mic_log2() to convert MIC columns to numeric (via as.numeric()) and apply a log2 transform, to be used with all_mic_predictors()

  • step_sir_numeric() to convert SIR columns to numeric (via as.numeric()), to be used with all_sir_predictors(): "S" = 1, "I"/"SDD" = 2, "R" = 3. All other values are converted to NA. Keep this in mind for downstream processing, especially if the model cannot handle NA values.

These steps integrate with recipes::recipe() and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
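The two documented conversions can be approximated on plain vectors, as in the base R sketch below. It assumes the MIC values have already been converted to numeric, and `sir_to_numeric()` is a hypothetical helper that follows the coding listed above; it is not the package's internal code. The log2 scale is a natural fit for MICs because they are usually measured in a twofold dilution series, so each dilution step becomes one unit.

```r
# Sketch only: mimics the documented behaviour on plain vectors.

# MIC values from a twofold dilution series, already converted to numeric
mic_values <- c(0.25, 0.5, 1, 2, 4, 8, 16)
mic_log2 <- log2(mic_values)
mic_log2
# -2 -1 0 1 2 3 4: each dilution step becomes one unit on the log2 scale

# Hypothetical helper following the documented coding:
# "S" = 1, "I"/"SDD" = 2, "R" = 3, any other value (or NA) becomes NA
sir_to_numeric <- function(x) {
  codes <- c(S = 1, SDD = 2, I = 2, R = 3)
  unname(codes[as.character(x)])
}

sir_values <- c("S", "I", "SDD", "R", "NI", NA)
sir_num <- sir_to_numeric(sir_values)
sir_num              # 1 2 2 3 NA NA
sum(is.na(sir_num))  # 2: these need handling if the model does not accept NA
```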

Examples

if (require("tidymodels")) {

  # The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.


  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%

    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%

    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%

    # Estimate the preprocessing steps from the training data
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

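  # Evaluate predictions using standard classification metrics.
  # Note: yardstick treats the first factor level (here FALSE) as the event by
  # default (event_level = "first"), which affects ppv and npv.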
  our_metrics <- metric_set(accuracy, kap, ppv, npv)
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
#> ✔ broom        1.0.8     ✔ rsample      1.3.0
#> ✔ dials        1.4.0     ✔ tibble       3.3.0
#> ✔ infer        1.0.9     ✔ tidyr        1.3.1
#> ✔ modeldata    1.4.0     ✔ tune         1.3.0
#> ✔ parsnip      1.3.2     ✔ workflows    1.2.0
#> ✔ purrr        1.1.0     ✔ workflowsets 1.1.1
#> ✔ recipes      1.3.1     ✔ yardstick    1.3.2
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 4 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.936
#> 2 kap      binary         0.872
#> 3 ppv      binary         0.925
#> 4 npv      binary         0.948
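The metrics in the example follow the standard definitions for a 2 × 2 confusion matrix. The base R sketch below uses made-up counts (not the results shown above) and a hypothetical `binary_metrics()` helper to compute accuracy, Cohen's kappa, PPV and NPV. It also shows that PPV and NPV swap when the other class is treated as the event, which is relevant because yardstick uses the first factor level as the event by default, and the first level of `esbl` in the example is `FALSE`.

```r
# Hypothetical confusion-matrix counts, NOT the results of the example above.
# Rows: predicted class; columns: true class.
cm <- matrix(c(40, 3, 5, 52), nrow = 2,
             dimnames = list(predicted = c("TRUE", "FALSE"),
                             truth = c("TRUE", "FALSE")))

# Hypothetical helper: accuracy, Cohen's kappa, PPV and NPV for a chosen event
binary_metrics <- function(cm, event) {
  other <- setdiff(rownames(cm), event)
  tp <- cm[event, event]  # predicted event, truly event
  fp <- cm[event, other]  # predicted event, truly other
  fn <- cm[other, event]  # predicted other, truly event
  tn <- cm[other, other]  # predicted other, truly other
  n <- sum(cm)

  po <- (tp + tn) / n                         # observed agreement
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance

  c(accuracy = po,
    kap = (po - pe) / (1 - pe),
    ppv = tp / (tp + fp),
    npv = tn / (tn + fn))
}

binary_metrics(cm, event = "TRUE")   # ESBL-positive as the event
binary_metrics(cm, event = "FALSE")  # first factor level as the event
```

Accuracy and kappa do not depend on which class is the event; only PPV and NPV change.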