mirror of https://github.com/msberends/AMR.git synced 2026-05-31 09:41:47 +02:00
AMR/reference/amr-tidymodels.md
2026-05-02 13:06:13 +00:00


AMR Extensions for Tidymodels

This family of functions allows using AMR-specific data types such as <sir> and <mic> inside tidymodels pipelines.

Usage

all_sir()

all_sir_predictors()

all_mic()

all_mic_predictors()

all_disk()

all_disk_predictors()

step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("mic_log2"))

step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("sir_numeric"))

Details

You can read more in our online AMR with tidymodels introduction.

Tidyselect helpers include:

  • all_sir() and all_sir_predictors() to select <sir> columns

  • all_mic() and all_mic_predictors() to select <mic> columns

  • all_disk() and all_disk_predictors() to select <disk> columns
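As a minimal sketch of how these selectors might be used inside a recipe, the example below builds a small toy tibble and applies each step only to the columns matched by its selector. The column names (`outcome`, `CIP`, `AMX`, `age`) and all values are hypothetical and only serve as illustration; an AMR version that ships these tidymodels extensions is assumed.

```r
library(AMR)
library(tidymodels)

# Hypothetical toy data: one <mic> column, one <sir> column, one plain numeric
toy <- tibble(
  outcome = factor(c("no", "yes", "no", "yes")),
  CIP = as.mic(c(0.5, 1, 2, 4)),
  AMX = as.sir(c("S", "I", "R", "S")),
  age = c(34, 51, 67, 45)
)

rec <- recipe(outcome ~ ., data = toy) %>%
  # Only <mic> predictors are matched here (CIP in this toy data)
  step_mic_log2(all_mic_predictors()) %>%
  # Only <sir> predictors are matched here (AMX in this toy data)
  step_sir_numeric(all_sir_predictors()) %>%
  prep()

baked <- bake(rec, new_data = NULL)

# Inspect the column types after baking
str(baked)
```

Columns that match neither selector, such as `age` here, should pass through the recipe untouched.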

Pre-processing pipeline steps include:

  • step_sir_numeric() to convert <sir> columns to numeric (via as.numeric()), for use with all_sir_predictors(). The mapping is "S" = 1, "I"/"SDD" = 2, "R" = 3; all other values become NA. Keep this in mind for further processing, especially if the model does not accept NA values.

  • step_mic_log2() to convert MIC columns to numeric (via as.numeric()) and apply a log2 transform, to be used with all_mic_predictors()
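The NA caveat for step_sir_numeric() can be handled in several ways. The sketch below shows one possibility, median imputation with `recipes::step_impute_median()`, applied after the conversion. This is an illustrative choice, not something the package prescribes; `recipes::step_naomit()` or a model that tolerates NA are alternatives. The toy column `AMX` and its values are hypothetical, and a plain NA stands in for any value that the step would render NA.

```r
library(AMR)
library(tidymodels)

# Hypothetical toy data with one missing SIR result
toy <- tibble(
  outcome = factor(c("no", "yes", "no", "yes", "no", "yes")),
  AMX = as.sir(c("S", "I", "R", NA, "S", "R"))
)

rec <- recipe(outcome ~ ., data = toy) %>%
  # Convert <sir> to numeric; the missing value stays NA
  step_sir_numeric(all_sir_predictors()) %>%
  # Assumption for illustration: fill NA with the training-set median
  step_impute_median(all_numeric_predictors()) %>%
  prep()

baked <- bake(rec, new_data = NULL)
baked
```

Because the imputation step comes after the conversion, `all_numeric_predictors()` picks up the column only once it has become numeric.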

These steps integrate with recipes::recipe() and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
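Because the steps behave like ordinary recipe steps, an unprepped recipe can presumably also be bundled with a model in a `workflows::workflow()`, so that preprocessing is re-applied automatically at prediction time. The following is a sketch under that assumption, using the `esbl_isolates` data set with its `esbl` and `genus` columns. Dropping `genus` with `select()` and fitting on the full data set are simplifications for illustration only.

```r
library(AMR)
library(tidymodels)

# Outcome as a factor; drop the non-predictive genus column up front
isolates <- esbl_isolates %>%
  mutate(esbl = factor(esbl, levels = c(FALSE, TRUE))) %>%
  select(-genus)

# Unprepped recipe: the workflow takes care of prep() and bake()
mic_recipe <- recipe(esbl ~ ., data = isolates) %>%
  step_mic_log2(all_mic_predictors())

wf <- workflow() %>%
  add_recipe(mic_recipe) %>%
  add_model(logistic_reg(mode = "classification") %>% set_engine("glm"))

# Fit on the full data purely for illustration (no train/test split here)
wf_fit <- fit(wf, data = isolates)

# New data goes in raw; the recipe is applied internally
preds <- predict(wf_fit, new_data = isolates)
head(preds)
```

For a real evaluation, the workflow would be fitted on a training split only, as in the Examples section.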

See also

recipes::recipe(), as.sir(), as.mic(), as.disk()

Examples

if (require("tidymodels")) {
  # The approach below formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.


  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%
    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%
    # And apply the preparation steps
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

  # Evaluate predictions using standard classification metrics
  our_metrics <- metric_set(
    accuracy,
    recall,
    precision,
    sensitivity,
    specificity,
    ppv,
    npv
  )
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.5.0 ──
#> ✔ broom        1.0.12     ✔ rsample      1.3.2 
#> ✔ dials        1.4.3      ✔ tailor       0.1.0 
#> ✔ infer        1.1.0      ✔ tidyr        1.3.2 
#> ✔ modeldata    1.5.1      ✔ tune         2.1.0 
#> ✔ parsnip      1.5.0      ✔ workflows    1.3.0 
#> ✔ purrr        1.2.2      ✔ workflowsets 1.1.1 
#> ✔ recipes      1.3.2      ✔ yardstick    1.4.0 
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 7 × 3
#>   .metric     .estimator .estimate
#>   <chr>       <chr>          <dbl>
#> 1 accuracy    binary         0.912
#> 2 recall      binary         0.902
#> 3 precision   binary         0.917
#> 4 sensitivity binary         0.902
#> 5 specificity binary         0.922
#> 6 ppv         binary         0.917
#> 7 npv         binary         0.908