mirror of https://github.com/msberends/AMR.git synced 2025-12-25 10:30:18 +01:00
# AMR Extensions for Tidymodels
This family of functions allows using AMR-specific data types such as
`<sir>` and `<mic>` inside `tidymodels` pipelines.
## Usage
``` r
all_sir()
all_sir_predictors()
all_mic()
all_mic_predictors()
all_disk()
all_disk_predictors()
step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("mic_log2"))
step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
  skip = FALSE, id = recipes::rand_id("sir_numeric"))
```
## Arguments
- recipe:
A recipe object. The step will be added to the sequence of operations
for this recipe.
- ...:
One or more selector functions to choose variables for this step. See
[`selections()`](https://recipes.tidymodels.org/reference/selections.html)
for more details.
- role:
Not used by this step since no new variables are created.
- trained:
A logical to indicate if the quantities for preprocessing have been
estimated.
- columns:
A character vector of the selected column names. Following the usual
convention for recipes steps, this is a placeholder that is filled in
when [`prep()`](https://recipes.tidymodels.org/reference/prep.html) is
run.
- skip:
A logical. Should the step be skipped when the recipe is baked by
[`bake()`](https://recipes.tidymodels.org/reference/bake.html)? While
all operations are baked when
[`prep()`](https://recipes.tidymodels.org/reference/prep.html) is run,
some operations may not be able to be conducted on new data (e.g.
processing the outcome variable(s)). Care should be taken when using
`skip = TRUE` as it may affect the computations for subsequent
operations.
- id:
A character string that is unique to this step to identify it.
## Details
You can read more in our online [AMR with tidymodels
introduction](https://amr-for-r.org/articles/AMR_with_tidymodels.html).

Tidyselect helpers include:
- `all_sir()` and `all_sir_predictors()` to select
[`<sir>`](https://amr-for-r.org/reference/as.sir.md) columns
- `all_mic()` and `all_mic_predictors()` to select
[`<mic>`](https://amr-for-r.org/reference/as.mic.md) columns
- `all_disk()` and `all_disk_predictors()` to select
[`<disk>`](https://amr-for-r.org/reference/as.disk.md) columns
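
Conceptually, these helpers pick columns by their class. The base R sketch below illustrates that idea without requiring the AMR package. The column names, the values, and the `select_by_class()` helper are hypothetical, and the class attributes only mimic `<mic>` and `<sir>`; this is not how the selectors are implemented.

``` r
# Hypothetical columns; the class attributes mimic AMR's <mic> and <sir>
# types so that this sketch runs without the AMR package installed.
cols <- list(
  genus = c("Escherichia", "Klebsiella"),
  AMC   = structure(c(8, 32), class = "mic"),
  CIP   = structure(c(0.25, 4), class = "mic"),
  GEN   = structure(c(1L, 3L), class = "sir")
)

# Hypothetical helper: names of the columns that inherit from a given class
select_by_class <- function(columns, class_name) {
  is_match <- vapply(columns, inherits, logical(1), what = class_name)
  names(columns)[is_match]
}

mic_cols <- select_by_class(cols, "mic")  # "AMC" "CIP"
sir_cols <- select_by_class(cols, "sir")  # "GEN"
```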

Pre-processing pipeline steps include:
- `step_sir_numeric()` to convert SIR columns to numeric (via
[`as.numeric()`](https://rdrr.io/r/base/numeric.html)), to be used
with `all_sir_predictors()`: `"S"` = 1, `"I"`/`"SDD"` = 2, `"R"` = 3.
All other values are rendered `NA`. Keep this in mind for further
processing, especially if the model does not allow for `NA` values.
- `step_mic_log2()` to convert MIC columns to numeric (via
[`as.numeric()`](https://rdrr.io/r/base/numeric.html)) and apply a
log2 transform, to be used with `all_mic_predictors()`
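
The two conversions described above can be sketched in base R as follows. This only illustrates the documented mappings under simplifying assumptions (plain character and numeric vectors instead of `<sir>` and `<mic>` objects, with `"X"` standing in for any unrecognised value); it is not the package's implementation. Since MIC values usually come from twofold dilution series, a log2 transform places them on an evenly spaced scale.

``` r
# SIR to numeric: "S" = 1, "I"/"SDD" = 2, "R" = 3, anything else NA
sir_values  <- c("S", "I", "SDD", "R", "X")  # "X" is a hypothetical unknown value
sir_codes   <- c(S = 1, I = 2, SDD = 2, R = 3)
sir_numeric <- unname(sir_codes[sir_values])  # unmatched names yield NA
sir_numeric
# 1 2 2 3 NA

# MIC to log2: each doubling of the MIC adds 1 on the transformed scale
mic_values <- c(0.25, 1, 8, 64)
mic_log2   <- log2(mic_values)
mic_log2
# -2 0 3 6
```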

These steps integrate with
[`recipes::recipe()`](https://recipes.tidymodels.org/reference/recipe.html)
and work like standard preprocessing steps. They are useful for
preparing data for modelling, especially with classification models.
## See also
[`recipes::recipe()`](https://recipes.tidymodels.org/reference/recipe.html),
[`as.sir()`](https://amr-for-r.org/reference/as.sir.md),
[`as.mic()`](https://amr-for-r.org/reference/as.mic.md),
[`as.disk()`](https://amr-for-r.org/reference/as.disk.md)
## Examples
``` r
if (require("tidymodels")) {
  # The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
  # Presence of ESBL genes was predicted based on raw MIC values.

  # example data set in the AMR package
  esbl_isolates

  # Prepare a binary outcome and convert to ordered factor
  data <- esbl_isolates %>%
    mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))

  # Split into training and testing sets
  split <- initial_split(data)
  training_data <- training(split)
  testing_data <- testing(split)

  # Create and prep a recipe with MIC log2 transformation
  mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
    # Optionally remove non-predictive variables
    remove_role(genus, old_role = "predictor") %>%
    # Apply the log2 transformation to all MIC predictors
    step_mic_log2(all_mic_predictors()) %>%
    # And apply the preparation steps
    prep()

  # View prepped recipe
  mic_recipe

  # Apply the recipe to training and testing data
  out_training <- bake(mic_recipe, new_data = NULL)
  out_testing <- bake(mic_recipe, new_data = testing_data)

  # Fit a logistic regression model
  fitted <- logistic_reg(mode = "classification") %>%
    set_engine("glm") %>%
    fit(esbl ~ ., data = out_training)

  # Generate predictions on the test set
  predictions <- predict(fitted, out_testing) %>%
    bind_cols(out_testing)

  # Evaluate predictions using standard classification metrics
  our_metrics <- metric_set(accuracy,
                            recall,
                            precision,
                            sensitivity,
                            specificity,
                            ppv,
                            npv)
  metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)

  # Show performance
  metrics
}
#> Loading required package: tidymodels
#> ── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
#> ✔ broom 1.0.11 ✔ rsample 1.3.1
#> ✔ dials 1.4.2 ✔ tailor 0.1.0
#> ✔ infer 1.1.0 ✔ tidyr 1.3.2
#> ✔ modeldata 1.5.1 ✔ tune 2.0.1
#> ✔ parsnip 1.4.0 ✔ workflows 1.3.0
#> ✔ purrr 1.2.0 ✔ workflowsets 1.1.1
#> ✔ recipes 1.3.1 ✔ yardstick 1.3.2
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ recipes::step() masks stats::step()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 7 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.936
#> 2 recall binary 0.954
#> 3 precision binary 0.925
#> 4 sensitivity binary 0.954
#> 5 specificity binary 0.917
#> 6 ppv binary 0.925
#> 7 npv binary 0.948
```
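
In the output above, `recall` equals `sensitivity` and `precision` equals `ppv`. The base R sketch below shows why, using the standard confusion-matrix definitions. The counts are hypothetical and are not taken from the model above. The `ppv` and `npv` formulas assume that prevalence is estimated from the same data, which is why they reduce to simple ratios here.

``` r
# Hypothetical confusion-matrix counts (not derived from the example output)
tp <- 40  # true positives
fn <- 10  # false negatives
fp <- 5   # false positives
tn <- 45  # true negatives

sensitivity <- tp / (tp + fn)                    # share of actual positives found
specificity <- tn / (tn + fp)                    # share of actual negatives found
ppv         <- tp / (tp + fp)                    # share of predicted positives that are correct
npv         <- tn / (tn + fn)                    # share of predicted negatives that are correct
accuracy    <- (tp + tn) / (tp + fn + fp + tn)   # share of all predictions that are correct

# Recall and precision are alternative names for sensitivity and PPV
recall    <- sensitivity
precision <- ppv

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv), 3)
```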