# AMR with tidymodels

> This page was entirely written by our [AMR for R Assistant](https://chat.amr-for-r.org), a ChatGPT manually-trained model able to answer any question about the `AMR` package.

Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for guiding effective treatment. The `AMR` R package provides robust tools for analysing AMR data, including convenient antimicrobial selector functions like [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) and [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md).

In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset, in three examples.

This post contains the following examples:

1. Using Antimicrobial Selectors
2. Predicting ESBL Presence Using Raw MICs
3. Predicting AMR Over Time

## Example 1: Using Antimicrobial Selectors

By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict the Gram stain of a microorganism from its results for two important antibiotic classes: aminoglycosides and beta-lactams.

### **Objective**

Our goal is to build a predictive model using the `tidymodels` framework to determine the Gram stain of a microorganism based on microbial data. We will:

1. Preprocess data using the selector functions [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) and [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md).
2. Define a logistic regression model for prediction.
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.

### **Data Preparation**

We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.

``` r
# Load required libraries
library(AMR)        # For AMR data analysis
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
```

Prepare the data:

``` r
# Your data could look like this:
example_isolates
#> # A tibble: 2,000 × 46
#>    date       patient   age gender ward     mo           PEN   OXA   FLC   AMX  
#>    <date>     <chr>   <dbl> <chr>  <chr>    <mo>         <sir> <sir> <sir> <sir>
#>  1 2002-01-02 A77334     65 F      Clinical B_ESCHR_COLI R     NA    NA    NA   
#>  2 2002-01-03 A77334     65 F      Clinical B_ESCHR_COLI R     NA    NA    NA   
#>  3 2002-01-07 067927     45 F      ICU      B_STPHY_EPDR R     NA    R     NA   
#>  4 2002-01-07 067927     45 F      ICU      B_STPHY_EPDR R     NA    R     NA   
#>  5 2002-01-13 067927     45 F      ICU      B_STPHY_EPDR R     NA    R     NA   
#>  6 2002-01-13 067927     45 F      ICU      B_STPHY_EPDR R     NA    R     NA   
#>  7 2002-01-14 462729     78 M      Clinical B_STPHY_AURS R     NA    S     R    
#>  8 2002-01-14 462729     78 M      Clinical B_STPHY_AURS R     NA    S     R    
#>  9 2002-01-16 067927     45 F      ICU      B_STPHY_EPDR R     NA    R     NA   
#> 10 2002-01-17 858515     79 F      ICU      B_STPHY_EPDR R     NA    S     NA   
#> # ℹ 1,990 more rows
#> # ℹ 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> #   CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> #   TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> #   FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> #   TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> #   IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …

# Select relevant columns for prediction
data <- example_isolates %>%
  # select AB results dynamically
  select(mo, aminoglycosides(), betalactams()) %>%
  # replace NAs with NI (not-interpretable)
  mutate(across(where(is.sir),
                ~replace_na(.x, "NI")),
         # convert SIR columns to integer codes
         across(where(is.sir),
                as.integer),
         # get the Gram stain of the microorganisms
         mo = as.factor(mo_gramstain(mo))) %>%
  # drop NAs - the ones without a Gram stain (fungi, etc.)
  drop_na()
#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
#>   (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
#> ℹ For `betalactams()` using columns 'PEN' (benzylpenicillin), 'OXA'
#>   (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'
#>   (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'
#>   (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'
#>   (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),
#>   'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)
```

**Explanation:**

- [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) and [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) dynamically select columns for antimicrobials in these classes.
- `drop_na()` ensures the model receives complete cases for training.
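
To see for yourself which columns these selectors resolve to, they can also be used outside a modelling context; a minimal sketch:

``` r
# Peek at the columns the antimicrobial selectors pick up in example_isolates
example_isolates %>%
  select(aminoglycosides(), betalactams()) %>%
  colnames()
```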

### **Defining the Workflow**

We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.

#### 1. Preprocessing with a Recipe

We create a recipe to preprocess the data for modelling.

``` r
# Define the recipe for data preprocessing
resistance_recipe <- recipe(mo ~ ., data = data) %>%
  step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
resistance_recipe
#> 
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#> 
#> ── Inputs
#> Number of variables by role
#> outcome:    1
#> predictor: 20
#> 
#> ── Operations
#> • Correlation filter on: c(aminoglycosides(), betalactams())
```

For a recipe that includes at least one preprocessing operation, like we have with `step_corr()`, the necessary parameters can be estimated from a training set using `prep()`:

``` r
prep(resistance_recipe)
#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
#>   (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
#> ℹ For `betalactams()` using columns 'PEN' (benzylpenicillin), 'OXA'
#>   (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'
#>   (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'
#>   (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'
#>   (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),
#>   'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)
#> 
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#> 
#> ── Inputs
#> Number of variables by role
#> outcome:    1
#> predictor: 20
#> 
#> ── Training information
#> Training data contained 1968 data points and no incomplete rows.
#> 
#> ── Operations
#> • Correlation filter on: AMX CTX | Trained
```

**Explanation:**

- `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
- `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.

Notice how the recipe contains just the antimicrobial selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`), we can see that the variables 'AMX' and 'CTX' were removed, as they correlate too strongly with other existing variables.
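
To verify which predictors actually survive the correlation filter, the prepped recipe can be baked on its own training data; a minimal sketch using `prep()` and `bake()` from the recipes package:

``` r
# List the predictor columns that remain after the correlation filter
resistance_recipe %>%
  prep() %>%
  bake(new_data = NULL) %>%
  colnames()
```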

#### 2. Specifying the Model

We define a logistic regression model, since predicting the Gram stain (positive or negative) is a binary classification task.

``` r
# Specify a logistic regression model
logistic_model <- logistic_reg() %>%
  set_engine("glm") # Use the Generalised Linear Model engine
logistic_model
#> Logistic Regression Model Specification (classification)
#> 
#> Computational engine: glm
```

**Explanation:**

- `logistic_reg()` sets up a logistic regression model.
- `set_engine("glm")` specifies the use of R’s built-in GLM engine.

#### 3. Building the Workflow

We bundle the recipe and model together into a `workflow`, which organises the entire modelling process.

``` r
# Combine the recipe and model into a workflow
resistance_workflow <- workflow() %>%
  add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
  add_model(logistic_model)         # Add the logistic regression model
resistance_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: logistic_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#> 
#> • step_corr()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Logistic Regression Model Specification (classification)
#> 
#> Computational engine: glm
```

### **Training and Evaluating the Model**

To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.

``` r
# Split data into training and testing sets
set.seed(123) # For reproducibility
data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
training_data <- training(data_split) # Training set
testing_data  <- testing(data_split)  # Testing set

# Fit the workflow to the training data
fitted_workflow <- resistance_workflow %>%
  fit(training_data) # Train the model
```

**Explanation:**

- `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set.

Notice how `fit()` internally calls the antimicrobial selector functions again: because they are stored in the recipe, they are re-evaluated at training time.
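
To inspect the trained model itself, the underlying parsnip fit can be extracted from the workflow; a minimal sketch (`tidy()` comes from broom, which tidymodels attaches):

``` r
# Inspect the fitted logistic regression coefficients
fitted_workflow %>%
  extract_fit_parsnip() %>%
  tidy()
```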

Next, we evaluate the model on the testing data.

``` r
# Make predictions on the testing set
predictions <- fitted_workflow %>%
  predict(testing_data) # Generate predictions
probabilities <- fitted_workflow %>%
  predict(testing_data, type = "prob") # Generate probabilities

predictions <- predictions %>%
  bind_cols(probabilities) %>%
  bind_cols(testing_data) # Combine with true labels

predictions
#> # A tibble: 394 × 24
#>    .pred_class   `.pred_Gram-negative` `.pred_Gram-positive` mo        GEN   TOB
#>    <fct>                         <dbl>                 <dbl> <fct>   <int> <int>
#>  1 Gram-positive              1.07e- 1             8.93 e- 1 Gram-p…     5     5
#>  2 Gram-positive              3.17e- 8             1.000e+ 0 Gram-p…     5     1
#>  3 Gram-negative              9.99e- 1             1.42 e- 3 Gram-n…     5     5
#>  4 Gram-positive              2.22e-16             1    e+ 0 Gram-p…     5     5
#>  5 Gram-negative              9.46e- 1             5.42 e- 2 Gram-n…     5     5
#>  6 Gram-positive              1.07e- 1             8.93 e- 1 Gram-p…     5     5
#>  7 Gram-positive              2.22e-16             1    e+ 0 Gram-p…     1     5
#>  8 Gram-positive              2.22e-16             1    e+ 0 Gram-p…     4     4
#>  9 Gram-negative              1   e+ 0             2.22 e-16 Gram-n…     1     1
#> 10 Gram-positive              6.05e-11             1.000e+ 0 Gram-p…     4     4
#> # ℹ 384 more rows
#> # ℹ 18 more variables: AMK <int>, KAN <int>, PEN <int>, OXA <int>, FLC <int>,
#> #   AMX <int>, AMC <int>, AMP <int>, TZP <int>, CZO <int>, FEP <int>,
#> #   CXM <int>, FOX <int>, CTX <int>, CAZ <int>, CRO <int>, IPM <int>, MEM <int>

# Evaluate model performance
metrics <- predictions %>%
  metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics

metrics
#> # A tibble: 2 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.995
#> 2 kap      binary         0.989


# To assess some other model properties, you can make your own metric set
our_metrics <- metric_set(accuracy, kap, ppv, npv) # add Positive Predictive Value and Negative Predictive Value
metrics2 <- predictions %>%
  our_metrics(truth = mo, estimate = .pred_class) # run again with our `our_metrics()` metric set

metrics2
#> # A tibble: 4 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.995
#> 2 kap      binary         0.989
#> 3 ppv      binary         0.987
#> 4 npv      binary         1
```

**Explanation:**

- [`predict()`](https://rdrr.io/r/stats/predict.html) generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and kappa.

It appears we can predict the Gram stain with 99.5% accuracy based on AMR results of only aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:

``` r
predictions %>%
  roc_curve(mo, `.pred_Gram-negative`) %>%
  autoplot()
```

![](AMR_with_tidymodels_files/figure-gfm/unnamed-chunk-7-1.png)

### **Conclusion**

In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) and [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md) with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.

This workflow is extensible to other antimicrobial classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
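
For instance, the same preparation pipeline could be pointed at other classes simply by swapping the selectors; a minimal sketch using the `fluoroquinolones()` and `macrolides()` selectors from the same selector family (not shown in the original analysis):

``` r
# Same preparation, different antimicrobial classes
data_other <- example_isolates %>%
  select(mo, fluoroquinolones(), macrolides()) %>%
  mutate(across(where(is.sir), ~replace_na(.x, "NI")),
         across(where(is.sir), as.integer),
         mo = as.factor(mo_gramstain(mo))) %>%
  drop_na()
```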

------------------------------------------------------------------------

## Example 2: Predicting ESBL Presence Using Raw MICs

In this second example, we demonstrate how to use `<mic>` columns directly in `tidymodels` workflows using AMR-specific recipe steps. This includes a transformation to `log2` scale using `step_mic_log2()`, which prepares MIC values for use in classification models.

This approach and idea formed the basis for the publication [DOI: 10.3389/fmicb.2025.1582703](https://doi.org/10.3389/fmicb.2025.1582703) to model the presence of extended-spectrum beta-lactamases (ESBL).

> NOTE: THIS EXAMPLE WILL BE AVAILABLE IN A FUTURE VERSION (#TODO)
>
> The new AMR package version will contain new tidymodels recipe steps such as `step_mic_log2()`.
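
Until `step_mic_log2()` is available, the idea behind it can be illustrated manually: `<mic>` values can be coerced to numeric and then log2-transformed. A minimal sketch with made-up MIC values (assuming `as.numeric()` drops the `<=`/`>=` operators, as documented for the `mic` class):

``` r
# Hypothetical MIC values, purely to illustrate the log2 transformation
mic_values <- as.mic(c("<=0.25", "1", "8", ">=32"))
log2(as.numeric(mic_values)) # should give -2, 0, 3, 5
```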

------------------------------------------------------------------------

## Example 3: Predicting AMR Over Time

In this third example, we aim to predict antimicrobial resistance (AMR) trends over time using `tidymodels`. We will model resistance to three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`, and ciprofloxacin `CIP`), based on historical data grouped by year and Gram stain.

### **Objective**

Our goal is to:

1. Prepare the dataset by aggregating resistance data over time.
2. Define a regression model to predict AMR trends.
3. Use `tidymodels` to preprocess, train, and evaluate the model.

### **Data Preparation**

We start by transforming the `example_isolates` dataset into a structured time-series format.

``` r
# Load required libraries
library(AMR)
library(tidymodels)

# Transform dataset
data_time <- example_isolates %>%
  top_n_microorganisms(n = 10) %>%              # Filter on the top 10 species
  mutate(year = as.integer(format(date, "%Y")), # Extract year from date
         gramstain = mo_gramstain(mo)) %>%      # Get the Gram stain
  group_by(year, gramstain) %>%
  summarise(across(c(AMX, AMC, CIP),
                   function(x) resistance(x, minimum = 0),
                   .names = "res_{.col}"),
            .groups = "drop") %>%
  filter(!is.na(res_AMX) & !is.na(res_AMC) & !is.na(res_CIP)) # Drop missing values
#> ℹ Using column 'mo' as input for `col_mo`.

data_time
#> # A tibble: 32 × 5
#>     year gramstain     res_AMX res_AMC res_CIP
#>    <int> <chr>           <dbl>   <dbl>   <dbl>
#>  1  2002 Gram-negative   1      0.105   0.0606
#>  2  2002 Gram-positive   0.838  0.182   0.162 
#>  3  2003 Gram-negative   1      0.0714  0     
#>  4  2003 Gram-positive   0.714  0.244   0.154 
#>  5  2004 Gram-negative   0.464  0.0938  0     
#>  6  2004 Gram-positive   0.849  0.299   0.244 
#>  7  2005 Gram-negative   0.412  0.132   0.0588
#>  8  2005 Gram-positive   0.882  0.382   0.154 
#>  9  2006 Gram-negative   0.379  0       0.1   
#> 10  2006 Gram-positive   0.778  0.333   0.353 
#> # ℹ 22 more rows
```

**Explanation:**

- `mo_gramstain(mo)`: Converts microbial codes into their Gram stain (Gram-positive or Gram-negative).
- [`resistance()`](https://amr-for-r.org/reference/proportion.md): Converts AMR results into numeric values (proportion of resistant isolates).
- `group_by(year, gramstain)`: Aggregates resistance rates by year and Gram stain.

### **Defining the Workflow**

We now define the modelling workflow, which consists of a preprocessing step, a model specification, and the fitting process.

#### 1. Preprocessing with a Recipe

``` r
# Define the recipe
resistance_recipe_time <- recipe(res_AMX ~ year + gramstain, data = data_time) %>%
  step_dummy(gramstain, one_hot = TRUE) %>% # Convert categorical to numerical
  step_normalize(year) %>%                  # Normalise year for better model performance
  step_nzv(all_predictors())                # Remove near-zero variance predictors

resistance_recipe_time
#> 
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#> 
#> ── Inputs
#> Number of variables by role
#> outcome:   1
#> predictor: 2
#> 
#> ── Operations
#> • Dummy variables from: gramstain
#> • Centering and scaling for: year
#> • Sparse, unbalanced variable filter on: all_predictors()
```

**Explanation:**

- `step_dummy()`: Encodes the categorical variable (`gramstain`) as numerical indicators.
- `step_normalize()`: Normalises the `year` variable.
- `step_nzv()`: Removes near-zero variance predictors.
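
As in the first example, the effect of these steps can be checked by prepping and baking the recipe on its own training data; a minimal sketch:

``` r
# Inspect the preprocessed design matrix
resistance_recipe_time %>%
  prep() %>%
  bake(new_data = NULL) %>%
  head()
```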

#### 2. Specifying the Model

We use a linear regression model to predict resistance trends.

``` r
# Define the linear regression model
lm_model <- linear_reg() %>%
  set_engine("lm") # Use linear regression

lm_model
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm
```

**Explanation:**

- `linear_reg()`: Defines a linear regression model.
- `set_engine("lm")`: Uses R’s built-in linear regression engine.

#### 3. Building the Workflow

We combine the preprocessing recipe and model into a workflow.

``` r
# Create workflow
resistance_workflow_time <- workflow() %>%
  add_recipe(resistance_recipe_time) %>%
  add_model(lm_model)

resistance_workflow_time
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 3 Recipe Steps
#> 
#> • step_dummy()
#> • step_normalize()
#> • step_nzv()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm
```

### **Training and Evaluating the Model**

We split the data into training and testing sets, fit the model, and evaluate performance.

``` r
# Split the data
set.seed(123)
data_split_time <- initial_split(data_time, prop = 0.8)
train_time <- training(data_split_time)
test_time  <- testing(data_split_time)

# Train the model
fitted_workflow_time <- resistance_workflow_time %>%
  fit(train_time)

# Make predictions
predictions_time <- fitted_workflow_time %>%
  predict(test_time) %>%
  bind_cols(test_time)

# Evaluate model
metrics_time <- predictions_time %>%
  metrics(truth = res_AMX, estimate = .pred)

metrics_time
#> # A tibble: 3 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 rmse    standard      0.0774
#> 2 rsq     standard      0.711 
#> 3 mae     standard      0.0704
```

**Explanation:**

- `initial_split()`: Splits data into training and testing sets.
- `fit()`: Trains the workflow.
- [`predict()`](https://rdrr.io/r/stats/predict.html): Generates resistance predictions.
- `metrics()`: Evaluates model performance.

### **Visualising Predictions**

We plot resistance trends over time for amoxicillin.

``` r
library(ggplot2)

# Plot actual vs predicted resistance over time
ggplot(predictions_time, aes(x = year)) +
  geom_point(aes(y = res_AMX, color = "Actual")) +
  geom_line(aes(y = .pred, color = "Predicted")) +
  labs(title = "Predicted vs Actual AMX Resistance Over Time",
       x = "Year",
       y = "Resistance Proportion") +
  theme_minimal()
```

![](AMR_with_tidymodels_files/figure-gfm/unnamed-chunk-15-1.png)

Additionally, we can visualise resistance trends in `ggplot2` and directly add linear models there:

``` r
ggplot(data_time, aes(x = year, y = res_AMX, color = gramstain)) +
  geom_line() +
  labs(title = "AMX Resistance Trends",
       x = "Year",
       y = "Resistance Proportion") +
  # add a linear model directly in ggplot2:
  geom_smooth(method = "lm",
              formula = y ~ x,
              alpha = 0.25) +
  theme_minimal()
```

![](AMR_with_tidymodels_files/figure-gfm/unnamed-chunk-16-1.png)

### **Conclusion**

In this example, we demonstrated how to analyse AMR trends over time using `tidymodels`. By aggregating resistance rates by year and Gram stain, we built a predictive model to track changes in resistance to amoxicillin (`AMX`), amoxicillin-clavulanic acid (`AMC`), and ciprofloxacin (`CIP`).

This method can be extended to other antibiotics and resistance patterns, providing valuable insights into AMR dynamics in healthcare settings.
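
As a sketch of such an extension (not part of the original analysis), the same preprocessing and model could be re-fitted with ciprofloxacin as the outcome, re-using `lm_model` and the split defined above:

``` r
# Sketch: model ciprofloxacin resistance with the same preprocessing steps
resistance_recipe_cip <- recipe(res_CIP ~ year + gramstain, data = data_time) %>%
  step_dummy(gramstain, one_hot = TRUE) %>%
  step_normalize(year) %>%
  step_nzv(all_predictors())

workflow() %>%
  add_recipe(resistance_recipe_cip) %>%
  add_model(lm_model) %>%
  fit(train_time)
```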