mirror of https://github.com/msberends/AMR.git synced 2025-07-10 07:02:01 +02:00

(v2.1.1.9230) deprecated resistance_predict(), data set folder name without space

2025-03-28 16:48:56 +01:00
parent bd873ac1bc
commit b972bbb96f
25 changed files with 410 additions and 312 deletions


@ -24,7 +24,11 @@ knitr::opts_chunk$set(
> This page was entirely written by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a ChatGPT manually-trained model able to answer any question about the AMR package.
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
---
## Example 1: Using Antimicrobial Selectors
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antimicrobial selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
By leveraging the power of `tidymodels` and the `AMR` package, we'll build a reproducible machine learning workflow to predict the Gram stain of the microorganism from its resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
@ -45,6 +49,9 @@ We begin by loading the required libraries and preparing the `example_isolates`
library(AMR) # For AMR data analysis
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
# Your data could look like this:
example_isolates
# Select relevant columns for prediction
data <- example_isolates %>%
# select AB results dynamically
@ -92,7 +99,7 @@ prep(resistance_recipe)
- `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
- `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.
Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
Notice how the recipe contains just the antimicrobial selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
#### 2. Specifying the Model
@ -101,7 +108,7 @@ We define a logistic regression model since resistance prediction is a binary cl
```{r}
# Specify a logistic regression model
logistic_model <- logistic_reg() %>%
set_engine("glm") # Use the Generalized Linear Model engine
set_engine("glm") # Use the Generalised Linear Model engine
logistic_model
```
@ -112,7 +119,7 @@ logistic_model
#### 3. Building the Workflow
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
We bundle the recipe and model together into a `workflow`, which organises the entire modeling process.
```{r}
# Combine the recipe and model into a workflow
@ -143,7 +150,7 @@ fitted_workflow <- resistance_workflow %>%
- `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set.
Notice how in `fit()`, the antibiotic selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
Notice how in `fit()`, the antimicrobial selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
Next, we evaluate the model on the testing data.
@ -165,6 +172,14 @@ metrics <- predictions %>%
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
metrics
# To assess some other model properties, you can make your own `metrics()` function
our_metrics <- metric_set(accuracy, kap, ppv, npv) # add Positive Predictive Value and Negative Predictive Value
metrics2 <- predictions %>%
our_metrics(truth = mo, estimate = .pred_class) # run again on our `our_metrics()` function
metrics2
```
**Explanation:**
@ -172,7 +187,7 @@ metrics
- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and kappa.
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
It appears we can predict the Gram stain with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of only aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
```{r}
predictions %>%
@ -184,4 +199,170 @@ predictions %>%
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
This workflow is extensible to other antimicrobial classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
---
## Example 2: Predicting AMR Over Time
In this second example, we aim to predict antimicrobial resistance (AMR) trends over time using `tidymodels`. We aggregate resistance to three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`, and ciprofloxacin `CIP`) by year and Gram stain, and then model amoxicillin resistance as the outcome.
### **Objective**
Our goal is to:
1. Prepare the dataset by aggregating resistance data over time.
2. Define a regression model to predict AMR trends.
3. Use `tidymodels` to preprocess, train, and evaluate the model.
### **Data Preparation**
We start by transforming the `example_isolates` dataset into a structured time-series format.
```{r}
# Load required libraries
library(AMR)
library(tidymodels)
# Transform dataset
data_time <- example_isolates %>%
top_n_microorganisms(n = 10) %>% # Filter on the top 10 species
mutate(year = as.integer(format(date, "%Y")), # Extract year from date
gramstain = mo_gramstain(mo)) %>% # Get the Gram stain of each microorganism
group_by(year, gramstain) %>%
summarise(across(c(AMX, AMC, CIP),
function(x) resistance(x, minimum = 0),
.names = "res_{.col}"),
.groups = "drop") %>%
filter(!is.na(res_AMX) & !is.na(res_AMC) & !is.na(res_CIP)) # Drop missing values
data_time
```
**Explanation:**
- `mo_gramstain(mo)`: Converts microbial codes into the Gram stain of the microorganism.
- `resistance()`: Converts AMR results into numeric values (proportion of resistant isolates). Setting `minimum = 0` keeps groups with only a few tested isolates.
- `group_by(year, gramstain)`: Aggregates resistance rates by year and Gram stain.
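To make the aggregation step more concrete, here is a minimal sketch of what `resistance()` returns for a single group. The vector `x` is made-up toy data, not taken from `example_isolates`.

```r
library(AMR)

# Hypothetical test results for one group: 2 resistant, 1 susceptible, 1 intermediate
x <- as.sir(c("R", "R", "S", "I"))

# Proportion of resistant isolates; only "R" counts as resistant
prop_r <- resistance(x, minimum = 0)
prop_r

# With the default `minimum = 30`, a group this small yields NA (with a warning),
# which is why the pipeline above sets `minimum = 0`
prop_default <- suppressWarnings(resistance(x))
prop_default
```

Whether `minimum = 0` is appropriate depends on the data: proportions based on a handful of isolates are noisy, so a higher threshold may be preferable for real analyses.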
### **Defining the Workflow**
We now define the modeling workflow, which consists of a preprocessing step, a model specification, and the fitting process.
#### 1. Preprocessing with a Recipe
```{r}
# Define the recipe
resistance_recipe_time <- recipe(res_AMX ~ year + gramstain, data = data_time) %>%
step_dummy(gramstain, one_hot = TRUE) %>% # Convert categorical to numerical
step_normalize(year) %>% # Normalise year for better model performance
step_nzv(all_predictors()) # Remove near-zero variance predictors
resistance_recipe_time
```
**Explanation:**
- `step_dummy()`: Encodes the categorical variable `gramstain` as numerical indicators.
- `step_normalize()`: Normalises the `year` variable.
- `step_nzv()`: Removes near-zero variance predictors.
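To see what these steps do to the data, a recipe can be prepared and applied with `prep()` and `bake()`. The sketch below uses a small made-up data frame (`toy`) with the same column names as `data_time`; the values are illustrative only.

```r
library(tidymodels)

# Hypothetical toy data with the same structure as `data_time`
toy <- data.frame(
  year = rep(2010:2013, times = 2),
  gramstain = factor(rep(c("Gram-negative", "Gram-positive"), each = 4)),
  res_AMX = c(0.60, 0.62, 0.65, 0.70, 0.30, 0.28, 0.33, 0.35)
)

toy_recipe <- recipe(res_AMX ~ year + gramstain, data = toy) %>%
  step_dummy(gramstain, one_hot = TRUE) %>% # one indicator column per level
  step_normalize(year)                      # centre and scale `year`

# Estimate the preprocessing parameters, then return the processed training data
baked <- bake(prep(toy_recipe), new_data = NULL)
baked
```

Note that with `one_hot = TRUE` every level gets its own indicator column. In a linear model with an intercept one of these columns is redundant, so `lm` reports one coefficient as `NA`; the default `one_hot = FALSE` avoids this.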
#### 2. Specifying the Model
We use a linear regression model to predict resistance trends.
```{r}
# Define the linear regression model
lm_model <- linear_reg() %>%
set_engine("lm") # Use linear regression
lm_model
```
**Explanation:**
- `linear_reg()`: Defines a linear regression model.
- `set_engine("lm")`: Uses R's built-in linear regression engine.
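A model specification does not fit anything by itself. As a small sketch, `translate()` from `parsnip` can be used to inspect which underlying function call the specification maps to. The object name `lm_spec` is only used here for illustration.

```r
library(tidymodels)

# Rebuild the specification so this block is self-contained
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Print the call template that will be used once the model is fitted
translate(lm_spec)
```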
#### 3. Building the Workflow
We combine the preprocessing recipe and model into a workflow.
```{r}
# Create workflow
resistance_workflow_time <- workflow() %>%
add_recipe(resistance_recipe_time) %>%
add_model(lm_model)
resistance_workflow_time
```
### **Training and Evaluating the Model**
We split the data into training and testing sets, fit the model, and evaluate performance.
```{r}
# Split the data
set.seed(123)
data_split_time <- initial_split(data_time, prop = 0.8)
train_time <- training(data_split_time)
test_time <- testing(data_split_time)
# Train the model
fitted_workflow_time <- resistance_workflow_time %>%
fit(train_time)
# Make predictions
predictions_time <- fitted_workflow_time %>%
predict(test_time) %>%
bind_cols(test_time)
# Evaluate model
metrics_time <- predictions_time %>%
metrics(truth = res_AMX, estimate = .pred)
metrics_time
```
**Explanation:**
- `initial_split()`: Splits data into training and testing sets.
- `fit()`: Trains the workflow.
- `predict()`: Generates resistance predictions.
- `metrics()`: Evaluates model performance.
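For a numeric outcome, `metrics()` reports regression metrics (RMSE, R squared and MAE) rather than accuracy and kappa. The sketch below illustrates this on a hypothetical table of observed and predicted values; the numbers are invented.

```r
library(tidymodels)

# Hypothetical observed (`res_AMX`) and predicted (`.pred`) resistance proportions
toy_pred <- tibble(
  res_AMX = c(0.60, 0.30, 0.45, 0.70),
  .pred   = c(0.55, 0.35, 0.50, 0.65)
)

toy_metrics <- toy_pred %>%
  metrics(truth = res_AMX, estimate = .pred)

# Every prediction is off by 0.05, so both RMSE and MAE equal 0.05
toy_metrics
```

Since resistance is expressed as a proportion, RMSE and MAE can be read directly as the typical error in proportion points.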
### **Visualising Predictions**
We plot resistance trends over time for amoxicillin.
```{r}
library(ggplot2)
# Plot actual vs predicted resistance over time
ggplot(predictions_time, aes(x = year)) +
geom_point(aes(y = res_AMX, color = "Actual")) +
geom_line(aes(y = .pred, color = "Predicted")) +
labs(title = "Predicted vs Actual AMX Resistance Over Time",
x = "Year",
y = "Resistance Proportion") +
theme_minimal()
```
Additionally, we can visualise resistance trends in `ggplot2` and add linear models directly to the plot:
```{r}
ggplot(data_time, aes(x = year, y = res_AMX, color = gramstain)) +
geom_line() +
labs(title = "AMX Resistance Trends",
x = "Year",
y = "Resistance Proportion") +
# add a linear model directly in ggplot2:
geom_smooth(method = "lm",
formula = y ~ x,
alpha = 0.25) +
theme_minimal()
```
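Because `color = gramstain` defines groups, `geom_smooth(method = "lm")` fits a separate simple linear regression for each Gram stain. As a rough sketch of the same idea outside the plot, the yearly change per group can be extracted with base R's `lm()`. The data frame `toy_trend` is made-up and only mirrors the column names of `data_time`.

```r
# Hypothetical yearly resistance proportions for two groups
toy_trend <- data.frame(
  year = rep(2010:2013, times = 2),
  gramstain = rep(c("Gram-negative", "Gram-positive"), each = 4),
  res_AMX = c(0.60, 0.62, 0.64, 0.66, 0.30, 0.29, 0.28, 0.27)
)

# Fit one linear model per group and keep the slope for `year`
slopes <- sapply(split(toy_trend, toy_trend$gramstain), function(d) {
  coef(lm(res_AMX ~ year, data = d))[["year"]]
})

# Change in resistance proportion per year, per Gram stain
slopes
```

In this toy data the slope is positive for one group and negative for the other, which is the kind of diverging trend the coloured lines in the plot would show.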
### **Conclusion**
In this example, we demonstrated how to analyse AMR trends over time using `tidymodels`. By aggregating resistance rates by year and Gram stain, we built a predictive model to track changes in resistance to amoxicillin (`AMX`). The same approach applies to the other aggregated antibiotics, amoxicillin-clavulanic acid (`AMC`) and ciprofloxacin (`CIP`), by changing the outcome in the recipe.
This method can be extended to other antibiotics and resistance patterns, providing valuable insights into AMR dynamics in healthcare settings.