mirror of
https://github.com/msberends/AMR.git
synced 2025-04-15 13:10:32 +02:00
(v2.1.1.9230) deprecated resistance_predict()
, data set folder name without space
This commit is contained in:
parent
bd873ac1bc
commit
b972bbb96f
@ -1,5 +1,5 @@
|
||||
Package: AMR
|
||||
Version: 2.1.1.9229
|
||||
Version: 2.1.1.9230
|
||||
Date: 2025-03-28
|
||||
Title: Antimicrobial Resistance Data Analysis
|
||||
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
||||
|
3
NEWS.md
3
NEWS.md
@ -1,4 +1,4 @@
|
||||
# AMR 2.1.1.9229
|
||||
# AMR 2.1.1.9230
|
||||
|
||||
*(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)*
|
||||
|
||||
@ -8,6 +8,7 @@ This package now supports not only tools for AMR data analysis in clinical setti
|
||||
## Breaking
|
||||
* Dataset `antibiotics` has been renamed to `antimicrobials` as the data set contains more than just antibiotics. Using `antibiotics` will still work, but now returns a warning.
|
||||
* Removed all functions and references that used the deprecated `rsi` class, which were all replaced with their `sir` equivalents over two years ago.
|
||||
* Function `resistance_predict()` is now deprecated and will be removed in a future version. Use the `tidymodels` framework instead, for which we [wrote a basic introduction](https://msberends.github.io/AMR/articles/AMR_with_tidymodels.html).
|
||||
|
||||
## New
|
||||
* **One Health implementation**
|
||||
|
@ -29,7 +29,7 @@
|
||||
|
||||
# ------------------------------------------------
|
||||
# THIS FILE WAS CREATED AUTOMATICALLY!
|
||||
# Source file: data-raw/reproduction scripts/reproduction_of_poorman.R
|
||||
# Source file: data-raw/_reproduction_scripts/reproduction_of_poorman.R
|
||||
# ------------------------------------------------
|
||||
|
||||
# poorman: a package to replace all dplyr functions with base R so we can lose dependency on dplyr.
|
||||
|
6
R/data.R
6
R/data.R
@ -85,7 +85,7 @@
|
||||
#' @description
|
||||
#' A data set containing the full microbial taxonomy (**last updated: `r documentation_date(max(TAXONOMY_VERSION$GBIF$accessed_date, TAXONOMY_VERSION$LPSN$accessed_date, TAXONOMY_VERSION$MycoBank$accessed_date))`**) of `r nr2char(length(unique(microorganisms$kingdom[!microorganisms$kingdom %like% "unknown"])))` kingdoms. This data set is the backbone of this `AMR` package. MO codes can be looked up using [as.mo()] and microorganism properties can be looked up using any of the [`mo_*`][mo_property()] functions.
|
||||
#'
|
||||
#' This data set is carefully crafted, yet made 100% reproducible from public and authoritative taxonomic sources (using [this script](https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R)), namely: *`r TAXONOMY_VERSION$LPSN$name`* for bacteria, *`r TAXONOMY_VERSION$MycoBank$name`* for fungi, and *`r TAXONOMY_VERSION$GBIF$name`* for all others taxons.
|
||||
#' This data set is carefully crafted, yet made 100% reproducible from public and authoritative taxonomic sources (using [this script](https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R)), namely: *`r TAXONOMY_VERSION$LPSN$name`* for bacteria, *`r TAXONOMY_VERSION$MycoBank$name`* for fungi, and *`r TAXONOMY_VERSION$GBIF$name`* for all others taxons.
|
||||
#' @format A [tibble][tibble::tibble] with `r format(nrow(microorganisms), big.mark = " ")` observations and `r ncol(microorganisms)` variables:
|
||||
#' - `mo`\cr ID of microorganism as used by this package. ***This is a unique identifier.***
|
||||
#' - `fullname`\cr Full name, like `"Escherichia coli"`. For the taxonomic ranks genus, species and subspecies, this is the 'pasted' text of genus, species, and subspecies. For all taxonomic ranks higher than genus, this is the name of the taxon. ***This is a unique identifier.***
|
||||
@ -131,7 +131,7 @@
|
||||
#' - 1 entry of *Moraxella* (*M. catarrhalis*), which was formally named *Branhamella catarrhalis* (Catlin, 1970) though this change was never accepted within the field of clinical microbiology
|
||||
#' - 8 other 'undefined' entries (unknown, unknown Gram-negatives, unknown Gram-positives, unknown yeast, unknown fungus, and unknown anaerobic Gram-pos/Gram-neg bacteria)
|
||||
#'
|
||||
#' The syntax used to transform the original data to a cleansed \R format, can be [found here](https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R).
|
||||
#' The syntax used to transform the original data to a cleansed \R format, can be [found here](https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R).
|
||||
#' @inheritSection AMR Download Our Reference Data
|
||||
#' @source
|
||||
#' Taxonomic entries were imported in this order of importance:
|
||||
@ -297,7 +297,7 @@
|
||||
#' ### Imported From WHONET
|
||||
#' Clinical breakpoints in this package were validated through and imported from [WHONET](https://whonet.org), a free desktop Windows application developed and supported by the WHO Collaborating Centre for Surveillance of Antimicrobial Resistance. More can be read on [their website](https://whonet.org). The developers of WHONET and this `AMR` package have been in contact about sharing their work. We highly appreciate their great development on the WHONET software.
|
||||
#'
|
||||
#' Our import and reproduction script can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_clinical_breakpoints.R>.
|
||||
#' Our import and reproduction script can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_clinical_breakpoints.R>.
|
||||
#'
|
||||
#' ### Response From CLSI and EUCAST
|
||||
#' The CEO of CLSI and the chairman of EUCAST have endorsed the work and public use of this `AMR` package (and consequently the use of their breakpoints) in June 2023, when future development of distributing clinical breakpoints was discussed in a meeting between CLSI, EUCAST, WHO, developers of WHONET software, and developers of this `AMR` package.
|
||||
|
@ -131,6 +131,11 @@ resistance_predict <- function(x,
|
||||
meet_criteria(preserve_measurements, allow_class = "logical", has_length = 1)
|
||||
meet_criteria(info, allow_class = "logical", has_length = 1)
|
||||
|
||||
deprecation_warning(
|
||||
old = "resistance_predict", is_function = TRUE,
|
||||
extra_msg = paste0("Use the tidymodels framework instead, for which we have written a basic and short introduction on our website: ", font_url("https://msberends.github.io/AMR/articles/AMR_with_tidymodels.html", txt = font_bold("AMR with tidymodels")))
|
||||
)
|
||||
|
||||
stop_if(is.null(model), 'choose a regression model with the `model` argument, e.g. resistance_predict(..., model = "binomial")')
|
||||
|
||||
x.bak <- x
|
||||
|
@ -68,7 +68,9 @@ deprecation_warning <- function(old = NULL, new = NULL, fn = NULL, extra_msg = N
|
||||
AMR_env[[paste0("deprecated_", old)]] <- 1
|
||||
if (isTRUE(is_function)) {
|
||||
old <- paste0(old, "()")
|
||||
new <- paste0(new, "()")
|
||||
if (!is.null(new)) {
|
||||
new <- paste0(new, "()")
|
||||
}
|
||||
type <- "function"
|
||||
} else if (isTRUE(is_dataset)) {
|
||||
type <- "dataset"
|
||||
@ -82,7 +84,7 @@ deprecation_warning <- function(old = NULL, new = NULL, fn = NULL, extra_msg = N
|
||||
}
|
||||
warning_(
|
||||
ifelse(is.null(new),
|
||||
paste0("The `", old, "` ", type, " is no longer in use"),
|
||||
paste0("The `", old, "` ", type, " is deprecated"),
|
||||
ifelse(type == "dataset",
|
||||
paste0("The `", old, "` ", type, " has been renamed to `", new, "`"),
|
||||
ifelse(type == "argument",
|
||||
|
2
R/zzz.R
2
R/zzz.R
@ -134,7 +134,7 @@ AMR_env$cross_icon <- if (isTRUE(base::l10n_info()$`UTF-8`)) "\u00d7" else "x"
|
||||
s3_register("knitr::knit_print", "antibiogram")
|
||||
s3_register("knitr::knit_print", "formatted_bug_drug_combinations")
|
||||
# Support vctrs package for use in e.g. dplyr verbs
|
||||
# NOTE 2024-02-22 this is the right way - it should be 2 S3 classes in the second argument
|
||||
# (NOTE 2024-02-22 this is the right way - it should be 2 '.'-separated S3 classes in the second argument)
|
||||
# S3: amr_selector
|
||||
s3_register("vctrs::vec_ptype2", "character.amr_selector")
|
||||
s3_register("vctrs::vec_ptype2", "amr_selector.character")
|
||||
|
@ -76,15 +76,12 @@ navbar:
|
||||
- text: "Generate Antibiogram (Trad./Syndromic/WISCA)"
|
||||
icon: "fa-file-prescription"
|
||||
href: "reference/antibiogram.html" # reference instead of an article
|
||||
- text: "Predict Antimicrobial Resistance"
|
||||
icon: "fa-dice"
|
||||
href: "articles/resistance_predict.html"
|
||||
- text: "Download Data Sets for Own Use"
|
||||
icon: "fa-database"
|
||||
href: "articles/datasets.html"
|
||||
- text: "Use AMR for Predictive Modelling (tidymodels)"
|
||||
icon: "fa-square-root-variable"
|
||||
href: "articles/AMR_with_tidymodels.html"
|
||||
- text: "Download Data Sets for Own Use"
|
||||
icon: "fa-database"
|
||||
href: "articles/datasets.html"
|
||||
- text: "Set User- Or Team-specific Package Settings"
|
||||
icon: "fa-gear"
|
||||
href: "reference/AMR-options.html"
|
||||
|
@ -79,7 +79,7 @@ antivirals <- antivirals %>%
|
||||
as.data.frame(stringsAsFactors = FALSE)
|
||||
|
||||
# add PubChem Compound ID (cid) and their trade names
|
||||
# see `data-raw/reproduction scripts/reproduction_of_antimicrobials.R` for get_CID() and get_synonyms()
|
||||
# see `data-raw/_reproduction_scripts/reproduction_of_antimicrobials.R` for get_CID() and get_synonyms()
|
||||
CIDs <- get_CID(antivirals$name)
|
||||
# these could not be found:
|
||||
antivirals[is.na(CIDs), ] %>% View()
|
@ -28,7 +28,7 @@
|
||||
# ==================================================================== #
|
||||
|
||||
# This script runs in 20-30 minutes and renews all guidelines of CLSI and EUCAST!
|
||||
# Run it with source("data-raw/reproduction scripts/reproduction_of_clinical_breakpoints.R")
|
||||
# Run it with source("data-raw/_reproduction_scripts/reproduction_of_clinical_breakpoints.R")
|
||||
|
||||
library(dplyr)
|
||||
library(readr)
|
||||
@ -39,7 +39,7 @@ devtools::load_all()
|
||||
# and copy the folder C:\WHONET\Resources to the data-raw/WHONET/ folder
|
||||
# (for ASIARS-Net update, also copy C:\WHONET\Codes to the data-raw/WHONET/ folder)
|
||||
|
||||
# BE SURE TO RUN data-raw/reproduction scripts/reproduction_of_microorganisms.groups.R FIRST TO GET THE GROUPS!
|
||||
# BE SURE TO RUN data-raw/_reproduction_scripts/reproduction_of_microorganisms.groups.R FIRST TO GET THE GROUPS!
|
||||
|
||||
# READ DATA ----
|
||||
|
@ -1,6 +1,6 @@
|
||||
This knowledge base contains all context you must know about the AMR package for R. You are a GPT trained to be an assistant for the AMR package in R. You are an incredible R specialist, especially trained in this package and in the tidyverse.
|
||||
|
||||
First and foremost, you are trained on version 2.1.1.9229. Remember this whenever someone asks which AMR package version you’re at.
|
||||
First and foremost, you are trained on version 2.1.1.9230. Remember this whenever someone asks which AMR package version you’re at.
|
||||
|
||||
Below are the contents of the NAMESPACE file, the index.md file, and all the man/*.Rd files (documentation) in the package. Every file content is split using 100 hypens.
|
||||
----------------------------------------------------------------------------------------------------
|
||||
@ -637,7 +637,7 @@ This package was intended as a comprehensive toolbox for integrated AMR data ana
|
||||
* Calculating antimicrobial resistance ([tutorial](./articles/AMR.html))
|
||||
* Determining multi-drug resistance (MDR) / multi-drug resistant organisms (MDRO) ([tutorial](./articles/MDR.html))
|
||||
* Calculating (empirical) susceptibility of both mono therapy and combination therapies ([tutorial](./articles/AMR.html))
|
||||
* Predicting future antimicrobial resistance using regression models ([tutorial](./articles/resistance_predict.html))
|
||||
* Apply AMR function in predictive modelling ([tutorial](./articles/AMR_with_tidymodels.html))
|
||||
* Getting properties for any microorganism (like Gram stain, species, genus or family) ([manual](./reference/mo_property.html))
|
||||
* Getting properties for any antimicrobial (like name, code of EARS-Net/ATC/LOINC/PubChem, defined daily dose or trade name) ([manual](./reference/ab_property.html))
|
||||
* Plotting antimicrobial resistance ([tutorial](./articles/AMR.html))
|
||||
@ -4291,7 +4291,7 @@ The default is \code{"human"}, which can also be set with the package option \co
|
||||
|
||||
Clinical breakpoints in this package were validated through and imported from \href{https://whonet.org}{WHONET}, a free desktop Windows application developed and supported by the WHO Collaborating Centre for Surveillance of Antimicrobial Resistance. More can be read on \href{https://whonet.org}{their website}. The developers of WHONET and this \code{AMR} package have been in contact about sharing their work. We highly appreciate their great development on the WHONET software.
|
||||
|
||||
Our import and reproduction script can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_clinical_breakpoints.R>.
|
||||
Our import and reproduction script can be found here: \url{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_clinical_breakpoints.R}.
|
||||
}
|
||||
|
||||
\subsection{Response From CLSI and EUCAST}{
|
||||
@ -6760,7 +6760,7 @@ microorganisms
|
||||
\description{
|
||||
A data set containing the full microbial taxonomy (\strong{last updated: June 24th, 2024}) of six kingdoms. This data set is the backbone of this \code{AMR} package. MO codes can be looked up using \code{\link[=as.mo]{as.mo()}} and microorganism properties can be looked up using any of the \code{\link[=mo_property]{mo_*}} functions.
|
||||
|
||||
This data set is carefully crafted, yet made 100\% reproducible from public and authoritative taxonomic sources (using \link{this script}(https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R)), namely: \emph{List of Prokaryotic names with Standing in Nomenclature (LPSN)} for bacteria, \emph{MycoBank} for fungi, and \emph{Global Biodiversity Information Facility (GBIF)} for all others taxons.
|
||||
This data set is carefully crafted, yet made 100\% reproducible from public and authoritative taxonomic sources (using \href{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R}{this script}), namely: \emph{List of Prokaryotic names with Standing in Nomenclature (LPSN)} for bacteria, \emph{MycoBank} for fungi, and \emph{Global Biodiversity Information Facility (GBIF)} for all others taxons.
|
||||
}
|
||||
\details{
|
||||
Please note that entries are only based on LPSN, MycoBank, and GBIF (see below). Since these sources incorporate entries based on (recent) publications in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), it can happen that the year of publication is sometimes later than one might expect.
|
||||
@ -6791,7 +6791,7 @@ For convenience, some entries were added manually:
|
||||
\item 8 other 'undefined' entries (unknown, unknown Gram-negatives, unknown Gram-positives, unknown yeast, unknown fungus, and unknown anaerobic Gram-pos/Gram-neg bacteria)
|
||||
}
|
||||
|
||||
The syntax used to transform the original data to a cleansed \R format, can be \link{found here}(https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R).
|
||||
The syntax used to transform the original data to a cleansed \R format, can be \href{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R}{found here}.
|
||||
}
|
||||
}
|
||||
|
||||
@ -9277,7 +9277,11 @@ knitr::opts_chunk$set(
|
||||
|
||||
> This page was entirely written by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a ChatGPT manually-trained model able to answer any question about the AMR package.
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
---
|
||||
|
||||
## Example 1: Using Antimicrobial Selectors
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antimicrobial selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
|
||||
By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict the Gramstain of the microorganism to two important antibiotic classes: aminoglycosides and beta-lactams.
|
||||
|
||||
@ -9298,6 +9302,9 @@ We begin by loading the required libraries and preparing the `example_isolates`
|
||||
library(AMR) # For AMR data analysis
|
||||
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
|
||||
|
||||
# Your data could look like this:
|
||||
example_isolates
|
||||
|
||||
# Select relevant columns for prediction
|
||||
data <- example_isolates %>%
|
||||
# select AB results dynamically
|
||||
@ -9345,7 +9352,7 @@ prep(resistance_recipe)
|
||||
- `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
|
||||
- `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.
|
||||
|
||||
Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
|
||||
Notice how the recipe contains just the antimicrobial selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
@ -9354,7 +9361,7 @@ We define a logistic regression model since resistance prediction is a binary cl
|
||||
```{r}
|
||||
# Specify a logistic regression model
|
||||
logistic_model <- logistic_reg() %>%
|
||||
set_engine("glm") # Use the Generalized Linear Model engine
|
||||
set_engine("glm") # Use the Generalised Linear Model engine
|
||||
logistic_model
|
||||
```
|
||||
|
||||
@ -9365,7 +9372,7 @@ logistic_model
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
|
||||
We bundle the recipe and model together into a `workflow`, which organises the entire modeling process.
|
||||
|
||||
```{r}
|
||||
# Combine the recipe and model into a workflow
|
||||
@ -9396,7 +9403,7 @@ fitted_workflow <- resistance_workflow %>%
|
||||
- `initial_split()` splits the data into training and testing sets.
|
||||
- `fit()` trains the workflow on the training set.
|
||||
|
||||
Notice how in `fit()`, the antibiotic selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
|
||||
Notice how in `fit()`, the antimicrobial selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
|
||||
|
||||
Next, we evaluate the model on the testing data.
|
||||
|
||||
@ -9418,6 +9425,14 @@ metrics <- predictions %>%
|
||||
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
|
||||
|
||||
metrics
|
||||
|
||||
|
||||
# To assess some other model properties, you can make our own `metrics()` function
|
||||
our_metrics <- metric_set(accuracy, kap, ppv, npv) # add Positive Predictive Value and Negative Predictive Value
|
||||
metrics2 <- predictions %>%
|
||||
our_metrics(truth = mo, estimate = .pred_class) # run again on our `our_metrics()` function
|
||||
|
||||
metrics2
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
@ -9425,7 +9440,7 @@ metrics
|
||||
- `predict()` generates predictions on the testing set.
|
||||
- `metrics()` computes evaluation metrics like accuracy and kappa.
|
||||
|
||||
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
|
||||
It appears we can predict the Gram stain with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of only aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
|
||||
|
||||
```{r}
|
||||
predictions %>%
|
||||
@ -9437,7 +9452,173 @@ predictions %>%
|
||||
|
||||
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
|
||||
|
||||
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
This workflow is extensible to other antimicrobial classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
|
||||
|
||||
---
|
||||
|
||||
|
||||
## Example 2: Predicting AMR Over Time
|
||||
|
||||
In this second example, we aim to predict antimicrobial resistance (AMR) trends over time using `tidymodels`. We will model resistance to three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`, and ciprofloxacin `CIP`), based on historical data grouped by year and hospital ward.
|
||||
|
||||
### **Objective**
|
||||
|
||||
Our goal is to:
|
||||
|
||||
1. Prepare the dataset by aggregating resistance data over time.
|
||||
2. Define a regression model to predict AMR trends.
|
||||
3. Use `tidymodels` to preprocess, train, and evaluate the model.
|
||||
|
||||
### **Data Preparation**
|
||||
|
||||
We start by transforming the `example_isolates` dataset into a structured time-series format.
|
||||
|
||||
```{r}
|
||||
# Load required libraries
|
||||
library(AMR)
|
||||
library(tidymodels)
|
||||
|
||||
# Transform dataset
|
||||
data_time <- example_isolates %>%
|
||||
top_n_microorganisms(n = 10) %>% # Filter on the top #10 species
|
||||
mutate(year = as.integer(format(date, "%Y")), # Extract year from date
|
||||
gramstain = mo_gramstain(mo)) %>% # Get taxonomic names
|
||||
group_by(year, gramstain) %>%
|
||||
summarise(across(c(AMX, AMC, CIP),
|
||||
function(x) resistance(x, minimum = 0),
|
||||
.names = "res_{.col}"),
|
||||
.groups = "drop") %>%
|
||||
filter(!is.na(res_AMX) & !is.na(res_AMC) & !is.na(res_CIP)) # Drop missing values
|
||||
|
||||
data_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `mo_name(mo)`: Converts microbial codes into proper species names.
|
||||
- `resistance()`: Converts AMR results into numeric values (proportion of resistant isolates).
|
||||
- `group_by(year, ward, species)`: Aggregates resistance rates by year and ward.
|
||||
|
||||
### **Defining the Workflow**
|
||||
|
||||
We now define the modeling workflow, which consists of a preprocessing step, a model specification, and the fitting process.
|
||||
|
||||
#### 1. Preprocessing with a Recipe
|
||||
|
||||
```{r}
|
||||
# Define the recipe
|
||||
resistance_recipe_time <- recipe(res_AMX ~ year + gramstain, data = data_time) %>%
|
||||
step_dummy(gramstain, one_hot = TRUE) %>% # Convert categorical to numerical
|
||||
step_normalize(year) %>% # Normalise year for better model performance
|
||||
step_nzv(all_predictors()) # Remove near-zero variance predictors
|
||||
|
||||
resistance_recipe_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `step_dummy()`: Encodes categorical variables (`ward`, `species`) as numerical indicators.
|
||||
- `step_normalize()`: Normalises the `year` variable.
|
||||
- `step_nzv()`: Removes near-zero variance predictors.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
We use a linear regression model to predict resistance trends.
|
||||
|
||||
```{r}
|
||||
# Define the linear regression model
|
||||
lm_model <- linear_reg() %>%
|
||||
set_engine("lm") # Use linear regression
|
||||
|
||||
lm_model
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `linear_reg()`: Defines a linear regression model.
|
||||
- `set_engine("lm")`: Uses R’s built-in linear regression engine.
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We combine the preprocessing recipe and model into a workflow.
|
||||
|
||||
```{r}
|
||||
# Create workflow
|
||||
resistance_workflow_time <- workflow() %>%
|
||||
add_recipe(resistance_recipe_time) %>%
|
||||
add_model(lm_model)
|
||||
|
||||
resistance_workflow_time
|
||||
```
|
||||
|
||||
### **Training and Evaluating the Model**
|
||||
|
||||
We split the data into training and testing sets, fit the model, and evaluate performance.
|
||||
|
||||
```{r}
|
||||
# Split the data
|
||||
set.seed(123)
|
||||
data_split_time <- initial_split(data_time, prop = 0.8)
|
||||
train_time <- training(data_split_time)
|
||||
test_time <- testing(data_split_time)
|
||||
|
||||
# Train the model
|
||||
fitted_workflow_time <- resistance_workflow_time %>%
|
||||
fit(train_time)
|
||||
|
||||
# Make predictions
|
||||
predictions_time <- fitted_workflow_time %>%
|
||||
predict(test_time) %>%
|
||||
bind_cols(test_time)
|
||||
|
||||
# Evaluate model
|
||||
metrics_time <- predictions_time %>%
|
||||
metrics(truth = res_AMX, estimate = .pred)
|
||||
|
||||
metrics_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `initial_split()`: Splits data into training and testing sets.
|
||||
- `fit()`: Trains the workflow.
|
||||
- `predict()`: Generates resistance predictions.
|
||||
- `metrics()`: Evaluates model performance.
|
||||
|
||||
### **Visualizing Predictions**
|
||||
|
||||
We plot resistance trends over time for Amoxicillin.
|
||||
|
||||
```{r}
|
||||
library(ggplot2)
|
||||
|
||||
# Plot actual vs predicted resistance over time
|
||||
ggplot(predictions_time, aes(x = year)) +
|
||||
geom_point(aes(y = res_AMX, color = "Actual")) +
|
||||
geom_line(aes(y = .pred, color = "Predicted")) +
|
||||
labs(title = "Predicted vs Actual AMX Resistance Over Time",
|
||||
x = "Year",
|
||||
y = "Resistance Proportion") +
|
||||
theme_minimal()
|
||||
```
|
||||
|
||||
Additionally, we can visualise resistance trends in `ggplot2` and directly adding linear models there:
|
||||
|
||||
```{r}
|
||||
ggplot(data_time, aes(x = year, y = res_AMX, color = gramstain)) +
|
||||
geom_line() +
|
||||
labs(title = "AMX Resistance Trends",
|
||||
x = "Year",
|
||||
y = "Resistance Proportion") +
|
||||
# add a linear model directly in ggplot2:
|
||||
geom_smooth(method = "lm",
|
||||
formula = y ~ x,
|
||||
alpha = 0.25) +
|
||||
theme_minimal()
|
||||
```
|
||||
|
||||
### **Conclusion**
|
||||
|
||||
In this example, we demonstrated how to analyze AMR trends over time using `tidymodels`. By aggregating resistance rates by year and hospital ward, we built a predictive model to track changes in resistance to amoxicillin (`AMX`), amoxicillin-clavulanic acid (`AMC`), and ciprofloxacin (`CIP`).
|
||||
|
||||
This method can be extended to other antibiotics and resistance patterns, providing valuable insights into AMR dynamics in healthcare settings.
|
||||
|
||||
|
||||
|
||||
@ -10234,144 +10415,6 @@ antivirals %>%
|
||||
|
||||
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/resistance_predict.Rmd':
|
||||
|
||||
|
||||
---
|
||||
title: "How to predict antimicrobial resistance"
|
||||
output:
|
||||
rmarkdown::html_vignette:
|
||||
toc: true
|
||||
vignette: >
|
||||
%\VignetteIndexEntry{How to predict antimicrobial resistance}
|
||||
%\VignetteEncoding{UTF-8}
|
||||
%\VignetteEngine{knitr::rmarkdown}
|
||||
editor_options:
|
||||
chunk_output_type: console
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 4.75
|
||||
)
|
||||
```
|
||||
|
||||
## Needed R packages
|
||||
As with many uses in R, we need some additional packages for AMR data analysis. Our package works closely together with the [tidyverse packages](https://www.tidyverse.org) [`dplyr`](https://dplyr.tidyverse.org/) and [`ggplot2`](https://ggplot2.tidyverse.org). The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
|
||||
|
||||
Our `AMR` package depends on these packages and even extends their use and functions.
|
||||
|
||||
```{r lib packages, message = FALSE}
|
||||
library(dplyr)
|
||||
library(ggplot2)
|
||||
library(AMR)
|
||||
|
||||
# (if not yet installed, install with:)
|
||||
# install.packages(c("tidyverse", "AMR"))
|
||||
```
|
||||
|
||||
## Prediction analysis
|
||||
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR data analysis](./AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
|
||||
|
||||
It is basically as easy as:
|
||||
```{r, eval = FALSE}
|
||||
# resistance prediction of piperacillin/tazobactam (TZP):
|
||||
resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
|
||||
|
||||
# or:
|
||||
example_isolates %>%
|
||||
resistance_predict(
|
||||
col_ab = "TZP",
|
||||
model = "binomial"
|
||||
)
|
||||
|
||||
# to bind it to object 'predict_TZP' for example:
|
||||
predict_TZP <- example_isolates %>%
|
||||
resistance_predict(
|
||||
col_ab = "TZP",
|
||||
model = "binomial"
|
||||
)
|
||||
```
|
||||
|
||||
The function will look for a date column itself if `col_date` is not set.
|
||||
|
||||
When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.
|
||||
|
||||
```{r, echo = FALSE, message = FALSE}
|
||||
predict_TZP <- example_isolates %>%
|
||||
resistance_predict(col_ab = "TZP", model = "binomial")
|
||||
```
|
||||
|
||||
This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
|
||||
|
||||
```{r}
|
||||
predict_TZP
|
||||
```
|
||||
|
||||
The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
|
||||
|
||||
```{r, fig.height = 5.5}
|
||||
plot(predict_TZP)
|
||||
```
|
||||
|
||||
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
|
||||
|
||||
We also support the `ggplot2` package with our custom function `ggplot_sir_predict()` to create more appealing plots:
|
||||
|
||||
```{r}
|
||||
ggplot_sir_predict(predict_TZP)
|
||||
|
||||
# choose for error bars instead of a ribbon
|
||||
ggplot_sir_predict(predict_TZP, ribbon = FALSE)
|
||||
```
|
||||
|
||||
### Choosing the right model
|
||||
|
||||
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
|
||||
|
||||
```{r}
|
||||
example_isolates %>%
|
||||
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
|
||||
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
|
||||
ggplot_sir_predict()
|
||||
```
|
||||
|
||||
Vancomycin resistance could be 100% in ten years, but might remain very low.
|
||||
|
||||
You can define the model with the `model` parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
|
||||
|
||||
Valid values are:
|
||||
|
||||
| Input values | Function used by R | Type of model |
|
||||
|----------------------------------------|-------------------------------|-----------------------------------------------------|
|
||||
| `"binomial"` or `"binom"` or `"logit"` | `glm(..., family = binomial)` | Generalised linear model with binomial distribution |
|
||||
| `"loglin"` or `"poisson"` | `glm(..., family = poisson)` | Generalised linear model with poisson distribution |
|
||||
| `"lin"` or `"linear"` | `lm()` | Linear model |
|
||||
|
||||
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate:
|
||||
|
||||
```{r}
|
||||
example_isolates %>%
|
||||
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
|
||||
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
|
||||
ggplot_sir_predict()
|
||||
```
|
||||
|
||||
The model itself is also available from the object, as an `attribute`:
|
||||
```{r}
|
||||
model <- attributes(predict_TZP)$model
|
||||
|
||||
summary(model)$family
|
||||
|
||||
summary(model)$coefficients
|
||||
```
|
||||
|
||||
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/welcome_to_AMR.Rmd':
|
||||
|
@ -29,7 +29,7 @@
|
||||
|
||||
# ------------------------------------------------
|
||||
# THIS FILE WAS CREATED AUTOMATICALLY!
|
||||
# Source file: data-raw/reproduction scripts/reproduction_of_poorman.R
|
||||
# Source file: data-raw/_reproduction_scripts/reproduction_of_poorman.R
|
||||
# ------------------------------------------------
|
||||
|
||||
# {poorman}: a package to replace all dplyr functions with base R so we can lose dependency on {dplyr}.
|
||||
|
2
index.md
2
index.md
@ -240,7 +240,7 @@ This package was intended as a comprehensive toolbox for integrated AMR data ana
|
||||
* Calculating antimicrobial resistance ([tutorial](./articles/AMR.html))
|
||||
* Determining multi-drug resistance (MDR) / multi-drug resistant organisms (MDRO) ([tutorial](./articles/MDR.html))
|
||||
* Calculating (empirical) susceptibility of both mono therapy and combination therapies ([tutorial](./articles/AMR.html))
|
||||
* Predicting future antimicrobial resistance using regression models ([tutorial](./articles/resistance_predict.html))
|
||||
* Apply AMR function in predictive modelling ([tutorial](./articles/AMR_with_tidymodels.html))
|
||||
* Getting properties for any microorganism (like Gram stain, species, genus or family) ([manual](./reference/mo_property.html))
|
||||
* Getting properties for any antimicrobial (like name, code of EARS-Net/ATC/LOINC/PubChem, defined daily dose or trade name) ([manual](./reference/ab_property.html))
|
||||
* Plotting antimicrobial resistance ([tutorial](./articles/AMR.html))
|
||||
|
@ -50,7 +50,7 @@ The default is \code{"human"}, which can also be set with the package option \co
|
||||
|
||||
Clinical breakpoints in this package were validated through and imported from \href{https://whonet.org}{WHONET}, a free desktop Windows application developed and supported by the WHO Collaborating Centre for Surveillance of Antimicrobial Resistance. More can be read on \href{https://whonet.org}{their website}. The developers of WHONET and this \code{AMR} package have been in contact about sharing their work. We highly appreciate their great development on the WHONET software.
|
||||
|
||||
Our import and reproduction script can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_clinical_breakpoints.R>.
|
||||
Our import and reproduction script can be found here: \url{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_clinical_breakpoints.R}.
|
||||
}
|
||||
|
||||
\subsection{Response From CLSI and EUCAST}{
|
||||
|
@ -55,7 +55,7 @@ microorganisms
|
||||
\description{
|
||||
A data set containing the full microbial taxonomy (\strong{last updated: June 24th, 2024}) of six kingdoms. This data set is the backbone of this \code{AMR} package. MO codes can be looked up using \code{\link[=as.mo]{as.mo()}} and microorganism properties can be looked up using any of the \code{\link[=mo_property]{mo_*}} functions.
|
||||
|
||||
This data set is carefully crafted, yet made 100\% reproducible from public and authoritative taxonomic sources (using \link{this script}(https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R)), namely: \emph{List of Prokaryotic names with Standing in Nomenclature (LPSN)} for bacteria, \emph{MycoBank} for fungi, and \emph{Global Biodiversity Information Facility (GBIF)} for all others taxons.
|
||||
This data set is carefully crafted, yet made 100\% reproducible from public and authoritative taxonomic sources (using \href{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R}{this script}), namely: \emph{List of Prokaryotic names with Standing in Nomenclature (LPSN)} for bacteria, \emph{MycoBank} for fungi, and \emph{Global Biodiversity Information Facility (GBIF)} for all others taxons.
|
||||
}
|
||||
\details{
|
||||
Please note that entries are only based on LPSN, MycoBank, and GBIF (see below). Since these sources incorporate entries based on (recent) publications in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), it can happen that the year of publication is sometimes later than one might expect.
|
||||
@ -86,7 +86,7 @@ For convenience, some entries were added manually:
|
||||
\item 8 other 'undefined' entries (unknown, unknown Gram-negatives, unknown Gram-positives, unknown yeast, unknown fungus, and unknown anaerobic Gram-pos/Gram-neg bacteria)
|
||||
}
|
||||
|
||||
The syntax used to transform the original data to a cleansed \R format, can be \link{found here}(https://github.com/msberends/AMR/blob/main/data-raw/reproduction scripts/reproduction_of_microorganisms.R).
|
||||
The syntax used to transform the original data to a cleansed \R format, can be \href{https://github.com/msberends/AMR/blob/main/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R}{found here}.
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -63,7 +63,7 @@ test_that("test-misc.R", {
|
||||
is_right <<- FALSE
|
||||
for (env in sys.frames()) {
|
||||
if (!is.null(env[[check_element]]) && is.data.frame(env[[check_element]])) {
|
||||
is_right <<- TRUE
|
||||
is_right <<- all(colnames(example_isolates) %in% colnames(env[[check_element]]))
|
||||
}
|
||||
}
|
||||
return_val
|
||||
|
@ -24,7 +24,11 @@ knitr::opts_chunk$set(
|
||||
|
||||
> This page was entirely written by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a ChatGPT manually-trained model able to answer any question about the AMR package.
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
---
|
||||
|
||||
## Example 1: Using Antimicrobial Selectors
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antimicrobial selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
|
||||
By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict the Gramstain of the microorganism to two important antibiotic classes: aminoglycosides and beta-lactams.
|
||||
|
||||
@ -45,6 +49,9 @@ We begin by loading the required libraries and preparing the `example_isolates`
|
||||
library(AMR) # For AMR data analysis
|
||||
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
|
||||
|
||||
# Your data could look like this:
|
||||
example_isolates
|
||||
|
||||
# Select relevant columns for prediction
|
||||
data <- example_isolates %>%
|
||||
# select AB results dynamically
|
||||
@ -92,7 +99,7 @@ prep(resistance_recipe)
|
||||
- `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
|
||||
- `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.
|
||||
|
||||
Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
|
||||
Notice how the recipe contains just the antimicrobial selector functions - no need to define the columns specifically. In the preparation (retrieved with `prep()`) we can see that the columns or variables `r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")` were removed as they correlate too much with existing, other variables.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
@ -101,7 +108,7 @@ We define a logistic regression model since resistance prediction is a binary cl
|
||||
```{r}
|
||||
# Specify a logistic regression model
|
||||
logistic_model <- logistic_reg() %>%
|
||||
set_engine("glm") # Use the Generalized Linear Model engine
|
||||
set_engine("glm") # Use the Generalised Linear Model engine
|
||||
logistic_model
|
||||
```
|
||||
|
||||
@ -112,7 +119,7 @@ logistic_model
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
|
||||
We bundle the recipe and model together into a `workflow`, which organises the entire modeling process.
|
||||
|
||||
```{r}
|
||||
# Combine the recipe and model into a workflow
|
||||
@ -143,7 +150,7 @@ fitted_workflow <- resistance_workflow %>%
|
||||
- `initial_split()` splits the data into training and testing sets.
|
||||
- `fit()` trains the workflow on the training set.
|
||||
|
||||
Notice how in `fit()`, the antibiotic selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
|
||||
Notice how in `fit()`, the antimicrobial selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
|
||||
|
||||
Next, we evaluate the model on the testing data.
|
||||
|
||||
@ -165,6 +172,14 @@ metrics <- predictions %>%
|
||||
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
|
||||
|
||||
metrics
|
||||
|
||||
|
||||
# To assess some other model properties, you can make our own `metrics()` function
|
||||
our_metrics <- metric_set(accuracy, kap, ppv, npv) # add Positive Predictive Value and Negative Predictive Value
|
||||
metrics2 <- predictions %>%
|
||||
our_metrics(truth = mo, estimate = .pred_class) # run again on our `our_metrics()` function
|
||||
|
||||
metrics2
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
@ -172,7 +187,7 @@ metrics
|
||||
- `predict()` generates predictions on the testing set.
|
||||
- `metrics()` computes evaluation metrics like accuracy and kappa.
|
||||
|
||||
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
|
||||
It appears we can predict the Gram stain with a `r round(metrics$.estimate[1], 3) * 100`% accuracy based on AMR results of only aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
|
||||
|
||||
```{r}
|
||||
predictions %>%
|
||||
@ -184,4 +199,170 @@ predictions %>%
|
||||
|
||||
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
|
||||
|
||||
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
This workflow is extensible to other antimicrobial classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
|
||||
|
||||
---
|
||||
|
||||
|
||||
## Example 2: Predicting AMR Over Time
|
||||
|
||||
In this second example, we aim to predict antimicrobial resistance (AMR) trends over time using `tidymodels`. We will model resistance to three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`, and ciprofloxacin `CIP`), based on historical data grouped by year and hospital ward.
|
||||
|
||||
### **Objective**
|
||||
|
||||
Our goal is to:
|
||||
|
||||
1. Prepare the dataset by aggregating resistance data over time.
|
||||
2. Define a regression model to predict AMR trends.
|
||||
3. Use `tidymodels` to preprocess, train, and evaluate the model.
|
||||
|
||||
### **Data Preparation**
|
||||
|
||||
We start by transforming the `example_isolates` dataset into a structured time-series format.
|
||||
|
||||
```{r}
|
||||
# Load required libraries
|
||||
library(AMR)
|
||||
library(tidymodels)
|
||||
|
||||
# Transform dataset
|
||||
data_time <- example_isolates %>%
|
||||
top_n_microorganisms(n = 10) %>% # Filter on the top #10 species
|
||||
mutate(year = as.integer(format(date, "%Y")), # Extract year from date
|
||||
gramstain = mo_gramstain(mo)) %>% # Get taxonomic names
|
||||
group_by(year, gramstain) %>%
|
||||
summarise(across(c(AMX, AMC, CIP),
|
||||
function(x) resistance(x, minimum = 0),
|
||||
.names = "res_{.col}"),
|
||||
.groups = "drop") %>%
|
||||
filter(!is.na(res_AMX) & !is.na(res_AMC) & !is.na(res_CIP)) # Drop missing values
|
||||
|
||||
data_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `mo_name(mo)`: Converts microbial codes into proper species names.
|
||||
- `resistance()`: Converts AMR results into numeric values (proportion of resistant isolates).
|
||||
- `group_by(year, ward, species)`: Aggregates resistance rates by year and ward.
|
||||
|
||||
### **Defining the Workflow**
|
||||
|
||||
We now define the modeling workflow, which consists of a preprocessing step, a model specification, and the fitting process.
|
||||
|
||||
#### 1. Preprocessing with a Recipe
|
||||
|
||||
```{r}
|
||||
# Define the recipe
|
||||
resistance_recipe_time <- recipe(res_AMX ~ year + gramstain, data = data_time) %>%
|
||||
step_dummy(gramstain, one_hot = TRUE) %>% # Convert categorical to numerical
|
||||
step_normalize(year) %>% # Normalise year for better model performance
|
||||
step_nzv(all_predictors()) # Remove near-zero variance predictors
|
||||
|
||||
resistance_recipe_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `step_dummy()`: Encodes categorical variables (`ward`, `species`) as numerical indicators.
|
||||
- `step_normalize()`: Normalises the `year` variable.
|
||||
- `step_nzv()`: Removes near-zero variance predictors.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
We use a linear regression model to predict resistance trends.
|
||||
|
||||
```{r}
|
||||
# Define the linear regression model
|
||||
lm_model <- linear_reg() %>%
|
||||
set_engine("lm") # Use linear regression
|
||||
|
||||
lm_model
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `linear_reg()`: Defines a linear regression model.
|
||||
- `set_engine("lm")`: Uses R’s built-in linear regression engine.
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We combine the preprocessing recipe and model into a workflow.
|
||||
|
||||
```{r}
|
||||
# Create workflow
|
||||
resistance_workflow_time <- workflow() %>%
|
||||
add_recipe(resistance_recipe_time) %>%
|
||||
add_model(lm_model)
|
||||
|
||||
resistance_workflow_time
|
||||
```
|
||||
|
||||
### **Training and Evaluating the Model**
|
||||
|
||||
We split the data into training and testing sets, fit the model, and evaluate performance.
|
||||
|
||||
```{r}
|
||||
# Split the data
|
||||
set.seed(123)
|
||||
data_split_time <- initial_split(data_time, prop = 0.8)
|
||||
train_time <- training(data_split_time)
|
||||
test_time <- testing(data_split_time)
|
||||
|
||||
# Train the model
|
||||
fitted_workflow_time <- resistance_workflow_time %>%
|
||||
fit(train_time)
|
||||
|
||||
# Make predictions
|
||||
predictions_time <- fitted_workflow_time %>%
|
||||
predict(test_time) %>%
|
||||
bind_cols(test_time)
|
||||
|
||||
# Evaluate model
|
||||
metrics_time <- predictions_time %>%
|
||||
metrics(truth = res_AMX, estimate = .pred)
|
||||
|
||||
metrics_time
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `initial_split()`: Splits data into training and testing sets.
|
||||
- `fit()`: Trains the workflow.
|
||||
- `predict()`: Generates resistance predictions.
|
||||
- `metrics()`: Evaluates model performance.
|
||||
|
||||
### **Visualizing Predictions**
|
||||
|
||||
We plot resistance trends over time for Amoxicillin.
|
||||
|
||||
```{r}
|
||||
library(ggplot2)
|
||||
|
||||
# Plot actual vs predicted resistance over time
|
||||
ggplot(predictions_time, aes(x = year)) +
|
||||
geom_point(aes(y = res_AMX, color = "Actual")) +
|
||||
geom_line(aes(y = .pred, color = "Predicted")) +
|
||||
labs(title = "Predicted vs Actual AMX Resistance Over Time",
|
||||
x = "Year",
|
||||
y = "Resistance Proportion") +
|
||||
theme_minimal()
|
||||
```
|
||||
|
||||
Additionally, we can visualise resistance trends in `ggplot2` and directly adding linear models there:
|
||||
|
||||
```{r}
|
||||
ggplot(data_time, aes(x = year, y = res_AMX, color = gramstain)) +
|
||||
geom_line() +
|
||||
labs(title = "AMX Resistance Trends",
|
||||
x = "Year",
|
||||
y = "Resistance Proportion") +
|
||||
# add a linear model directly in ggplot2:
|
||||
geom_smooth(method = "lm",
|
||||
formula = y ~ x,
|
||||
alpha = 0.25) +
|
||||
theme_minimal()
|
||||
```
|
||||
|
||||
### **Conclusion**
|
||||
|
||||
In this example, we demonstrated how to analyze AMR trends over time using `tidymodels`. By aggregating resistance rates by year and hospital ward, we built a predictive model to track changes in resistance to amoxicillin (`AMX`), amoxicillin-clavulanic acid (`AMC`), and ciprofloxacin (`CIP`).
|
||||
|
||||
This method can be extended to other antibiotics and resistance patterns, providing valuable insights into AMR dynamics in healthcare settings.
|
||||
|
@ -1,131 +0,0 @@
|
||||
---
|
||||
title: "How to predict antimicrobial resistance"
|
||||
output:
|
||||
rmarkdown::html_vignette:
|
||||
toc: true
|
||||
vignette: >
|
||||
%\VignetteIndexEntry{How to predict antimicrobial resistance}
|
||||
%\VignetteEncoding{UTF-8}
|
||||
%\VignetteEngine{knitr::rmarkdown}
|
||||
editor_options:
|
||||
chunk_output_type: console
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 4.75
|
||||
)
|
||||
```
|
||||
|
||||
## Needed R packages
|
||||
As with many uses in R, we need some additional packages for AMR data analysis. Our package works closely together with the [tidyverse packages](https://www.tidyverse.org) [`dplyr`](https://dplyr.tidyverse.org/) and [`ggplot2`](https://ggplot2.tidyverse.org). The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
|
||||
|
||||
Our `AMR` package depends on these packages and even extends their use and functions.
|
||||
|
||||
```{r lib packages, message = FALSE}
|
||||
library(dplyr)
|
||||
library(ggplot2)
|
||||
library(AMR)
|
||||
|
||||
# (if not yet installed, install with:)
|
||||
# install.packages(c("tidyverse", "AMR"))
|
||||
```
|
||||
|
||||
## Prediction analysis
|
||||
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR data analysis](./AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
|
||||
|
||||
It is basically as easy as:
|
||||
```{r, eval = FALSE}
|
||||
# resistance prediction of piperacillin/tazobactam (TZP):
|
||||
resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
|
||||
|
||||
# or:
|
||||
example_isolates %>%
|
||||
resistance_predict(
|
||||
col_ab = "TZP",
|
||||
model = "binomial"
|
||||
)
|
||||
|
||||
# to bind it to object 'predict_TZP' for example:
|
||||
predict_TZP <- example_isolates %>%
|
||||
resistance_predict(
|
||||
col_ab = "TZP",
|
||||
model = "binomial"
|
||||
)
|
||||
```
|
||||
|
||||
The function will look for a date column itself if `col_date` is not set.
|
||||
|
||||
When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.
|
||||
|
||||
```{r, echo = FALSE, message = FALSE}
|
||||
predict_TZP <- example_isolates %>%
|
||||
resistance_predict(col_ab = "TZP", model = "binomial")
|
||||
```
|
||||
|
||||
This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
|
||||
|
||||
```{r}
|
||||
predict_TZP
|
||||
```
|
||||
|
||||
The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
|
||||
|
||||
```{r, fig.height = 5.5}
|
||||
plot(predict_TZP)
|
||||
```
|
||||
|
||||
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
|
||||
|
||||
We also support the `ggplot2` package with our custom function `ggplot_sir_predict()` to create more appealing plots:
|
||||
|
||||
```{r}
|
||||
ggplot_sir_predict(predict_TZP)
|
||||
|
||||
# choose for error bars instead of a ribbon
|
||||
ggplot_sir_predict(predict_TZP, ribbon = FALSE)
|
||||
```
|
||||
|
||||
### Choosing the right model
|
||||
|
||||
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
|
||||
|
||||
```{r}
|
||||
example_isolates %>%
|
||||
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
|
||||
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
|
||||
ggplot_sir_predict()
|
||||
```
|
||||
|
||||
Vancomycin resistance could be 100% in ten years, but might remain very low.
|
||||
|
||||
You can define the model with the `model` parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
|
||||
|
||||
Valid values are:
|
||||
|
||||
| Input values | Function used by R | Type of model |
|
||||
|----------------------------------------|-------------------------------|-----------------------------------------------------|
|
||||
| `"binomial"` or `"binom"` or `"logit"` | `glm(..., family = binomial)` | Generalised linear model with binomial distribution |
|
||||
| `"loglin"` or `"poisson"` | `glm(..., family = poisson)` | Generalised linear model with poisson distribution |
|
||||
| `"lin"` or `"linear"` | `lm()` | Linear model |
|
||||
|
||||
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate:
|
||||
|
||||
```{r}
|
||||
example_isolates %>%
|
||||
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
|
||||
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
|
||||
ggplot_sir_predict()
|
||||
```
|
||||
|
||||
The model itself is also available from the object, as an `attribute`:
|
||||
```{r}
|
||||
model <- attributes(predict_TZP)$model
|
||||
|
||||
summary(model)$family
|
||||
|
||||
summary(model)$coefficients
|
||||
```
|
Loading…
x
Reference in New Issue
Block a user