(v2.1.1.9182) fix AMR selectors for tidymodels, add unit tests

2026-02-09 20:32:55 +01:00 · 2025-03-03 12:59:27 +01:00
parent b85890449d
commit 9a9468fa84
16 changed files with 84 additions and 33 deletions
--- a/vignettes/AMR_with_tidymodels.Rmd
+++ b/vignettes/AMR_with_tidymodels.Rmd
@@ -42,8 +42,8 @@ We begin by loading the required libraries and preparing the `example_isolates`

 ```{r}
 # Load required libraries
-library(tidymodels)   # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
 library(AMR)          # For AMR data analysis
+library(tidymodels)   # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)

 # Select relevant columns for prediction
 data <- example_isolates %>%
@@ -81,12 +81,18 @@ resistance_recipe <- recipe(mo ~ ., data = data) %>%
 resistance_recipe
 ```

+For a recipe that includes at least one preprocessing step, such as the `step_corr()` used here, the required parameters are estimated from a training set with `prep()`:
+
+```{r}
+prep(resistance_recipe)
+```
+
 **Explanation:**

 - `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
 - `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.

-Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically.
+Notice that the recipe contains only the antibiotic selector functions; there is no need to list the columns explicitly. The prepared recipe (returned by `prep()`) shows that the columns `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed because they correlate too strongly with other predictors.

 #### 2. Specifying the Model

@@ -113,6 +119,7 @@ We bundle the recipe and model together into a `workflow`, which organizes the e
 resistance_workflow <- workflow() %>%
  add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
  add_model(logistic_model) # Add the logistic regression model
+resistance_workflow
 ```

 ### **Training and Evaluating the Model**
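The inline R code in the vignette reads the removed columns from the internal `$steps[[1]]$removals` field of the prepped recipe. recipes also provides `tidy()` on a prepped recipe, which returns the same information as a tibble. The following is a minimal sketch on a small simulated data set, not on the vignette's `example_isolates` data. The column names `GEN`, `AMK` and `TOB` are borrowed from antibiotic codes for illustration only. `AMK` is constructed as an exact linear copy of `GEN`, so `step_corr()` is guaranteed to drop one of the two. The generic `all_numeric_predictors()` selector stands in for the AMR antibiotic selectors.

```r
library(tidymodels)   # attaches recipes (recipe(), step_corr(), prep(), bake(), tidy()) and the %>% pipe

# Simulated toy data (hypothetical, NOT example_isolates): AMK is an exact
# linear transformation of GEN, so their correlation is 1; TOB is independent noise
set.seed(1)
toy <- data.frame(
  mo  = factor(sample(c("Gram-negative", "Gram-positive"), 50, replace = TRUE)),
  GEN = rnorm(50)
)
toy$AMK <- 2 * toy$GEN + 1
toy$TOB <- rnorm(50)

# Same kind of correlation filter as in the vignette, using a generic
# recipes selector instead of the AMR antibiotic selectors
toy_recipe <- recipe(mo ~ ., data = toy) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

# Estimate the step's parameters (here: which columns to drop)
toy_prepped <- prep(toy_recipe)

# tidy() on step 1 of the prepped recipe lists the removed columns in 'terms'
removed <- tidy(toy_prepped, number = 1)
removed$terms

# bake() with new_data = NULL returns the processed training data;
# the removed column is no longer present
baked <- bake(toy_prepped, new_data = NULL)
setdiff(names(toy), names(baked))
```

Which of `GEN` and `AMK` gets dropped depends on how the correlation filter breaks ties, so the sketch does not assume a particular one.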