mirror of
https://github.com/msberends/AMR.git
synced 2025-12-22 00:20:26 +01:00
(v3.0.1.9005) re-add tidymodels implementation
This commit is contained in:
10
.github/ISSUE_TEMPLATE/1-bug-report.yml
vendored
10
.github/ISSUE_TEMPLATE/1-bug-report.yml
vendored
@@ -22,9 +22,9 @@ body:
|
||||
label: Minimal Reproducible Example (optional)
|
||||
description: Please include a short R code snippet that reproduces the problem, if possible.
|
||||
placeholder:
|
||||
e.g.
|
||||
```r
|
||||
ab_name("amoxicillin/clavulanic acid", language = "es")
|
||||
e.g.\n\n
|
||||
```r<br>
|
||||
ab_name("amoxicillin/clavulanic acid", language = "es")\n
|
||||
```
|
||||
validations:
|
||||
required: false
|
||||
@@ -42,7 +42,7 @@ body:
|
||||
multiple: false
|
||||
options:
|
||||
- ''
|
||||
- Latest CRAN version (3.0.0)
|
||||
- One of the latest GitHub versions (3.0.0.9xxx)
|
||||
- Latest CRAN version (3.0.1)
|
||||
- One of the latest GitHub versions (3.0.1.9xxx)
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
Package: AMR
|
||||
Version: 3.0.1.9004
|
||||
Date: 2025-12-15
|
||||
Version: 3.0.1.9005
|
||||
Date: 2025-12-21
|
||||
Title: Antimicrobial Resistance Data Analysis
|
||||
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
||||
data analysis and to work with microbial and antimicrobial properties by
|
||||
|
||||
16
NAMESPACE
16
NAMESPACE
@@ -106,6 +106,8 @@ S3method(print,mo_uncertainties)
|
||||
S3method(print,pca)
|
||||
S3method(print,sir)
|
||||
S3method(print,sir_log)
|
||||
S3method(print,step_mic_log2)
|
||||
S3method(print,step_sir_numeric)
|
||||
S3method(quantile,mic)
|
||||
S3method(rep,ab)
|
||||
S3method(rep,av)
|
||||
@@ -159,6 +161,12 @@ export(administrable_per_os)
|
||||
export(age)
|
||||
export(age_groups)
|
||||
export(all_antimicrobials)
|
||||
export(all_disk)
|
||||
export(all_disk_predictors)
|
||||
export(all_mic)
|
||||
export(all_mic_predictors)
|
||||
export(all_sir)
|
||||
export(all_sir_predictors)
|
||||
export(aminoglycosides)
|
||||
export(aminopenicillins)
|
||||
export(amr_class)
|
||||
@@ -352,6 +360,8 @@ export(sir_df)
|
||||
export(sir_interpretation_history)
|
||||
export(sir_predict)
|
||||
export(skewness)
|
||||
export(step_mic_log2)
|
||||
export(step_sir_numeric)
|
||||
export(streptogramins)
|
||||
export(sulfonamides)
|
||||
export(susceptibility)
|
||||
@@ -390,6 +400,12 @@ if(getRversion() >= "3.0.0") S3method(pillar::type_sum, av)
|
||||
if(getRversion() >= "3.0.0") S3method(pillar::type_sum, mic)
|
||||
if(getRversion() >= "3.0.0") S3method(pillar::type_sum, mo)
|
||||
if(getRversion() >= "3.0.0") S3method(pillar::type_sum, sir)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::bake, step_mic_log2)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::bake, step_sir_numeric)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::prep, step_mic_log2)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::prep, step_sir_numeric)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::tidy, step_mic_log2)
|
||||
if(getRversion() >= "3.0.0") S3method(recipes::tidy, step_sir_numeric)
|
||||
if(getRversion() >= "3.0.0") S3method(skimr::get_skimmers, ab)
|
||||
if(getRversion() >= "3.0.0") S3method(skimr::get_skimmers, disk)
|
||||
if(getRversion() >= "3.0.0") S3method(skimr::get_skimmers, mic)
|
||||
|
||||
12
NEWS.md
12
NEWS.md
@@ -1,4 +1,10 @@
|
||||
# AMR 3.0.1.9004
|
||||
# AMR 3.0.1.9005
|
||||
|
||||
### New
|
||||
* Integration with the **tidymodels** framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes`
|
||||
- `step_mic_log2()` to transform `<mic>` columns with log2, and `step_sir_numeric()` to convert `<sir>` columns to numeric
|
||||
- New `tidyselect` helpers: `all_sir()`, `all_sir_predictors()`, `all_mic()`, `all_mic_predictors()`, `all_disk()`, `all_disk_predictors()`
|
||||
* Data set `esbl_isolates` to practise with AMR modelling
|
||||
|
||||
### Changed
|
||||
* Fixed a bug in `antibiogram()` for when no antimicrobials are set
|
||||
@@ -48,7 +54,7 @@ This is a bugfix release following the release of v3.0.0 in June 2025.
|
||||
This package now supports not only tools for AMR data analysis in clinical settings, but also for veterinary and environmental microbiology. This was made possible through a collaboration with the [University of Prince Edward Island's Atlantic Veterinary College](https://www.upei.ca/avc), Canada. To celebrate this great improvement of the package, we also updated the package logo to reflect this change.
|
||||
|
||||
### Breaking
|
||||
* Dataset `antibiotics` has been renamed to `antimicrobials` as the data set contains more than just antibiotics. Using `antibiotics` will still work, but now returns a warning.
|
||||
* Data set `antibiotics` has been renamed to `antimicrobials` as the data set contains more than just antibiotics. Using `antibiotics` will still work, but now returns a warning.
|
||||
* Removed all functions and references that used the deprecated `rsi` class, which were all replaced with their `sir` equivalents over two years ago.
|
||||
* Functions `resistance_predict()` and `sir_predict()` are now deprecated and will be removed in a future version. Use the `tidymodels` framework instead, for which we [wrote a basic introduction](https://amr-for-r.org/articles/AMR_with_tidymodels.html).
|
||||
|
||||
@@ -60,7 +66,7 @@ This package now supports not only tools for AMR data analysis in clinical setti
|
||||
* `ab_atc()` now supports ATC codes of veterinary antimicrobials (that all start with "Q")
|
||||
* `ab_url()` now supports retrieving the WHOCC url of their ATCvet pages
|
||||
* **Support for WISCA antibiograms**
|
||||
* The `antibiogram()` function now supports creating true Weighted-Incidence Syndromic Combination Antibiograms (WISCA), a powerful Bayesian method for estimating regimen coverage probabilities using pathogen incidence and antimicrobial susceptibility data. WISCA offers improved precision for syndrome-specific treatment, even in datasets with sparse data. A dedicated `wisca()` function is also available for easy usage.
|
||||
* The `antibiogram()` function now supports creating true Weighted-Incidence Syndromic Combination Antibiograms (WISCA), a powerful Bayesian method for estimating regimen coverage probabilities using pathogen incidence and antimicrobial susceptibility data. WISCA offers improved precision for syndrome-specific treatment, even in data sets with sparse data. A dedicated `wisca()` function is also available for easy usage.
|
||||
* **More global coverage of languages**
|
||||
* Added full support for 8 new languages: Arabic, Bengali, Hindi, Indonesian, Korean, Swahili, Urdu, and Vietnamese. The `AMR` package is now available in 28 languages.
|
||||
* **Major update to fungal taxonomy and tools for mycologists**
|
||||
|
||||
@@ -966,8 +966,13 @@ get_current_data <- function(arg_name, call) {
|
||||
# an element `.data` will be in the environment when using dplyr::select()
|
||||
return(env$`.data`)
|
||||
} else if (valid_df(env$training)) {
|
||||
# an element `training` will be in the environment when using some tidymodels functions such as `prep()`
|
||||
if (!is.null(env$x) && valid_df(env$x$template)) {
|
||||
# an element `x$template` will be in the environment when using some tidymodels functions such as `prep()`
|
||||
return(env$x$template)
|
||||
} else {
|
||||
# this is a fallback for some tidymodels functions such as `prep()`
|
||||
return(env$training)
|
||||
}
|
||||
} else if (valid_df(env$data)) {
|
||||
# an element `data` will be in the environment when using older dplyr versions, or some tidymodels functions such as `fit()`
|
||||
return(env$data)
|
||||
|
||||
@@ -163,7 +163,7 @@
|
||||
#' antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"))
|
||||
#' ```
|
||||
#'
|
||||
#' WISCA uses a sophisticated Bayesian decision model to combine both local and pooled antimicrobial resistance data. This approach not only evaluates local patterns but can also draw on multi-centre datasets to improve regimen accuracy, even in low-incidence infections like paediatric bloodstream infections (BSIs).
|
||||
#' WISCA uses a sophisticated Bayesian decision model to combine both local and pooled antimicrobial resistance data. This approach not only evaluates local patterns but can also draw on multi-centre data sets to improve regimen accuracy, even in low-incidence infections like paediatric bloodstream infections (BSIs).
|
||||
#'
|
||||
#' ### Grouped tibbles
|
||||
#'
|
||||
|
||||
@@ -99,7 +99,8 @@ atc_online_property <- function(atc_code,
|
||||
read_html <- import_fn("read_html", "xml2")
|
||||
|
||||
if (!all(atc_code %in% unlist(AMR::antimicrobials$atc))) {
|
||||
atc_code <- as.character(ab_atc(atc_code, only_first = TRUE))
|
||||
missing <- atc_code %unlike% "[A-Z][0-9][0-9][A-Z][A-Z][0-9][0-9]"
|
||||
atc_code[missing] <- as.character(ab_atc(atc_code[missing], only_first = TRUE))
|
||||
}
|
||||
|
||||
if (!has_internet()) {
|
||||
|
||||
24
R/data.R
24
R/data.R
@@ -282,7 +282,7 @@
|
||||
|
||||
#' Data Set with Clinical Breakpoints for SIR Interpretation
|
||||
#'
|
||||
#' @description Data set containing clinical breakpoints to interpret MIC and disk diffusion to SIR values, according to international guidelines. This dataset contain breakpoints for humans, `r length(unique(clinical_breakpoints$host[!clinical_breakpoints$host %in% clinical_breakpoints$type]))` different animal groups, and ECOFFs.
|
||||
#' @description Data set containing clinical breakpoints to interpret MIC and disk diffusion to SIR values, according to international guidelines. This data set contains breakpoints for humans, `r length(unique(clinical_breakpoints$host[!clinical_breakpoints$host %in% clinical_breakpoints$type]))` different animal groups, and ECOFFs.
|
||||
#'
|
||||
#' These breakpoints are currently implemented:
|
||||
#' - For **clinical microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`;
|
||||
@@ -362,14 +362,14 @@
|
||||
#' dosage
|
||||
"dosage"
|
||||
|
||||
# TODO #' Data Set with `r format(nrow(esbl_isolates), big.mark = " ")` ESBL Isolates
|
||||
# TODO #'
|
||||
# TODO #' A data set containing `r format(nrow(esbl_isolates), big.mark = " ")` microbial isolates with MIC values of common antibiotics and a binary `esbl` column for extended-spectrum beta-lactamase (ESBL) production. This data set contains randomised fictitious data but reflects reality and can be used to practise AMR-related machine learning, e.g., classification modelling with [tidymodels](https://amr-for-r.org/articles/AMR_with_tidymodels.html).
|
||||
# TODO #' @format A [tibble][tibble::tibble] with `r format(nrow(esbl_isolates), big.mark = " ")` observations and `r ncol(esbl_isolates)` variables:
|
||||
# TODO #' - `esbl`\cr Logical indicator if the isolate is ESBL-producing
|
||||
# TODO #' - `genus`\cr Genus of the microorganism
|
||||
# TODO #' - `AMC:COL`\cr MIC values for 17 antimicrobial agents, transformed to class [`mic`] (see [as.mic()])
|
||||
# TODO #' @details See our [tidymodels integration][amr-tidymodels] for an example using this data set.
|
||||
# TODO #' @examples
|
||||
# TODO #' esbl_isolates
|
||||
# TODO "esbl_isolates"
|
||||
#' Data Set with `r format(nrow(esbl_isolates), big.mark = " ")` ESBL Isolates
|
||||
#'
|
||||
#' A data set containing `r format(nrow(esbl_isolates), big.mark = " ")` microbial isolates with MIC values of common antibiotics and a binary `esbl` column for extended-spectrum beta-lactamase (ESBL) production. This data set contains randomised fictitious data but reflects reality and can be used to practise AMR-related machine learning, e.g., classification modelling with [tidymodels](https://amr-for-r.org/articles/AMR_with_tidymodels.html).
|
||||
#' @format A [tibble][tibble::tibble] with `r format(nrow(esbl_isolates), big.mark = " ")` observations and `r ncol(esbl_isolates)` variables:
|
||||
#' - `esbl`\cr Logical indicator if the isolate is ESBL-producing
|
||||
#' - `genus`\cr Genus of the microorganism
|
||||
#' - `AMC:COL`\cr MIC values for 17 antimicrobial agents, transformed to class [`mic`] (see [as.mic()])
|
||||
#' @details See our [tidymodels integration][amr-tidymodels] for an example using this data set.
|
||||
#' @examples
|
||||
#' esbl_isolates
|
||||
"esbl_isolates"
|
||||
|
||||
@@ -1,20 +1,21 @@
|
||||
#' AMR Extensions for Tidymodels
|
||||
#'
|
||||
#' This family of functions allows using AMR-specific data types such as `<mic>` and `<sir>` inside `tidymodels` pipelines.
|
||||
#' This family of functions allows using AMR-specific data types such as `<sir>` and `<mic>` inside `tidymodels` pipelines.
|
||||
#' @inheritParams recipes::step_center
|
||||
#' @details
|
||||
#' You can read more in our online [AMR with tidymodels introduction](https://amr-for-r.org/articles/AMR_with_tidymodels.html).
|
||||
#'
|
||||
#' Tidyselect helpers include:
|
||||
#' - [all_mic()] and [all_mic_predictors()] to select `<mic>` columns
|
||||
#' - [all_sir()] and [all_sir_predictors()] to select `<sir>` columns
|
||||
#' - [all_sir()] and [all_sir_predictors()] to select [`<sir>`][as.sir()] columns
|
||||
#' - [all_mic()] and [all_mic_predictors()] to select [`<mic>`][as.mic()] columns
|
||||
#' - [all_disk()] and [all_disk_predictors()] to select [`<disk>`][as.disk()] columns
|
||||
#'
|
||||
#' Pre-processing pipeline steps include:
|
||||
#' - [step_mic_log2()] to convert MIC columns to numeric (via `as.numeric()`) and apply a log2 transform, to be used with [all_mic_predictors()]
|
||||
#' - [step_sir_numeric()] to convert SIR columns to numeric (via `as.numeric()`), to be used with [all_sir_predictors()]: `"S"` = 1, `"I"`/`"SDD"` = 2, `"R"` = 3. All other values are rendered `NA`. Keep this in mind for further processing, especially if the model does not allow for `NA` values.
|
||||
#' - [step_mic_log2()] to convert MIC columns to numeric (via `as.numeric()`) and apply a log2 transform, to be used with [all_mic_predictors()]
|
||||
#'
|
||||
#' These steps integrate with `recipes::recipe()` and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
|
||||
#' @seealso [recipes::recipe()], [as.mic()], [as.sir()]
|
||||
#' @seealso [recipes::recipe()], [as.sir()], [as.mic()], [as.disk()]
|
||||
#' @name amr-tidymodels
|
||||
#' @keywords internal
|
||||
#' @export
|
||||
@@ -66,35 +67,55 @@
|
||||
#' bind_cols(out_testing)
|
||||
#'
|
||||
#' # Evaluate predictions using standard classification metrics
|
||||
#' our_metrics <- metric_set(accuracy, kap, ppv, npv)
|
||||
#' our_metrics <- metric_set(accuracy,
|
||||
#' recall,
|
||||
#' precision,
|
||||
#' sensitivity,
|
||||
#' specificity,
|
||||
#' ppv,
|
||||
#' npv)
|
||||
#' metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)
|
||||
#'
|
||||
#' # Show performance
|
||||
#' metrics
|
||||
#' }
|
||||
all_mic <- function() {
|
||||
x <- tidymodels_amr_select(levels(NA_mic_))
|
||||
names(x)
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_mic_predictors <- function() {
|
||||
x <- tidymodels_amr_select(levels(NA_mic_))
|
||||
intersect(x, recipes::has_role("predictor"))
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_sir <- function() {
|
||||
x <- tidymodels_amr_select(levels(NA_sir_))
|
||||
x <- tidymodels_amr_select(class = "sir")
|
||||
names(x)
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_sir_predictors <- function() {
|
||||
x <- tidymodels_amr_select(levels(NA_sir_))
|
||||
x <- tidymodels_amr_select(class = "sir")
|
||||
intersect(x, recipes::has_role("predictor"))
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_mic <- function() {
|
||||
x <- tidymodels_amr_select(class = "mic")
|
||||
names(x)
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_mic_predictors <- function() {
|
||||
x <- tidymodels_amr_select(class = "mic")
|
||||
intersect(x, recipes::has_role("predictor"))
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_disk <- function() {
|
||||
x <- tidymodels_amr_select(class = "disk")
|
||||
names(x)
|
||||
}
|
||||
|
||||
#' @rdname amr-tidymodels
|
||||
#' @export
|
||||
all_disk_predictors <- function() {
|
||||
x <- tidymodels_amr_select(class = "disk")
|
||||
intersect(x, recipes::has_role("predictor"))
|
||||
}
|
||||
|
||||
@@ -160,7 +181,6 @@ bake.step_mic_log2 <- function(object, new_data, ...) {
|
||||
print.step_mic_log2 <- function(x, width = max(20, options()$width - 35), ...) {
|
||||
title <- "Log2 transformation of MIC columns"
|
||||
recipes::print_step(x$columns, x$terms, x$trained, title, width)
|
||||
invisible(x)
|
||||
}
|
||||
|
||||
#' @rawNamespace if(getRversion() >= "3.0.0") S3method(recipes::tidy, step_mic_log2)
|
||||
@@ -236,7 +256,6 @@ bake.step_sir_numeric <- function(object, new_data, ...) {
|
||||
print.step_sir_numeric <- function(x, width = max(20, options()$width - 35), ...) {
|
||||
title <- "Numeric transformation of SIR columns"
|
||||
recipes::print_step(x$columns, x$terms, x$trained, title, width)
|
||||
invisible(x)
|
||||
}
|
||||
|
||||
#' @rawNamespace if(getRversion() >= "3.0.0") S3method(recipes::tidy, step_sir_numeric)
|
||||
@@ -250,13 +269,13 @@ tidy.step_sir_numeric <- function(x, ...) {
|
||||
res
|
||||
}
|
||||
|
||||
tidymodels_amr_select <- function(check_vector) {
|
||||
tidymodels_amr_select <- function(class) {
|
||||
df <- get_current_data()
|
||||
ind <- which(
|
||||
vapply(
|
||||
FUN.VALUE = logical(1),
|
||||
df,
|
||||
function(x) all(x %in% c(check_vector, NA), na.rm = TRUE) & any(x %in% check_vector),
|
||||
function(x) inherits(x, class),
|
||||
USE.NAMES = TRUE
|
||||
),
|
||||
useNames = TRUE
|
||||
138
man/amr-tidymodels.Rd
Normal file
138
man/amr-tidymodels.Rd
Normal file
@@ -0,0 +1,138 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/tidymodels.R
|
||||
\name{amr-tidymodels}
|
||||
\alias{amr-tidymodels}
|
||||
\alias{all_sir}
|
||||
\alias{all_sir_predictors}
|
||||
\alias{all_mic}
|
||||
\alias{all_mic_predictors}
|
||||
\alias{all_disk}
|
||||
\alias{all_disk_predictors}
|
||||
\alias{step_mic_log2}
|
||||
\alias{step_sir_numeric}
|
||||
\title{AMR Extensions for Tidymodels}
|
||||
\usage{
|
||||
all_sir()
|
||||
|
||||
all_sir_predictors()
|
||||
|
||||
all_mic()
|
||||
|
||||
all_mic_predictors()
|
||||
|
||||
all_disk()
|
||||
|
||||
all_disk_predictors()
|
||||
|
||||
step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
|
||||
skip = FALSE, id = recipes::rand_id("mic_log2"))
|
||||
|
||||
step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
|
||||
skip = FALSE, id = recipes::rand_id("sir_numeric"))
|
||||
}
|
||||
\arguments{
|
||||
\item{recipe}{A recipe object. The step will be added to the sequence of
|
||||
operations for this recipe.}
|
||||
|
||||
\item{...}{One or more selector functions to choose variables for this step.
|
||||
See \code{\link[recipes:selections]{selections()}} for more details.}
|
||||
|
||||
\item{role}{Not used by this step since no new variables are created.}
|
||||
|
||||
\item{trained}{A logical to indicate if the quantities for preprocessing have
|
||||
been estimated.}
|
||||
|
||||
\item{skip}{A logical. Should the step be skipped when the recipe is baked by
|
||||
\code{\link[recipes:bake]{bake()}}? While all operations are baked when \code{\link[recipes:prep]{prep()}} is run, some
|
||||
operations may not be able to be conducted on new data (e.g. processing the
|
||||
outcome variable(s)). Care should be taken when using \code{skip = TRUE} as it
|
||||
may affect the computations for subsequent operations.}
|
||||
|
||||
\item{id}{A character string that is unique to this step to identify it.}
|
||||
}
|
||||
\description{
|
||||
This family of functions allows using AMR-specific data types such as \verb{<sir>} and \verb{<mic>} inside \code{tidymodels} pipelines.
|
||||
}
|
||||
\details{
|
||||
You can read more in our online \href{https://amr-for-r.org/articles/AMR_with_tidymodels.html}{AMR with tidymodels introduction}.
|
||||
|
||||
Tidyselect helpers include:
|
||||
\itemize{
|
||||
\item \code{\link[=all_sir]{all_sir()}} and \code{\link[=all_sir_predictors]{all_sir_predictors()}} to select \code{\link[=as.sir]{<sir>}} columns
|
||||
\item \code{\link[=all_mic]{all_mic()}} and \code{\link[=all_mic_predictors]{all_mic_predictors()}} to select \code{\link[=as.mic]{<mic>}} columns
|
||||
\item \code{\link[=all_disk]{all_disk()}} and \code{\link[=all_disk_predictors]{all_disk_predictors()}} to select \code{\link[=as.disk]{<disk>}} columns
|
||||
}
|
||||
|
||||
Pre-processing pipeline steps include:
|
||||
\itemize{
|
||||
\item \code{\link[=step_sir_numeric]{step_sir_numeric()}} to convert SIR columns to numeric (via \code{as.numeric()}), to be used with \code{\link[=all_sir_predictors]{all_sir_predictors()}}: \code{"S"} = 1, \code{"I"}/\code{"SDD"} = 2, \code{"R"} = 3. All other values are rendered \code{NA}. Keep this in mind for further processing, especially if the model does not allow for \code{NA} values.
|
||||
\item \code{\link[=step_mic_log2]{step_mic_log2()}} to convert MIC columns to numeric (via \code{as.numeric()}) and apply a log2 transform, to be used with \code{\link[=all_mic_predictors]{all_mic_predictors()}}
|
||||
}
|
||||
|
||||
These steps integrate with \code{recipes::recipe()} and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
|
||||
}
|
||||
\examples{
|
||||
if (require("tidymodels")) {
|
||||
|
||||
# The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
|
||||
# Presence of ESBL genes was predicted based on raw MIC values.
|
||||
|
||||
|
||||
# example data set in the AMR package
|
||||
esbl_isolates
|
||||
|
||||
# Prepare a binary outcome and convert to ordered factor
|
||||
data <- esbl_isolates \%>\%
|
||||
mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))
|
||||
|
||||
# Split into training and testing sets
|
||||
split <- initial_split(data)
|
||||
training_data <- training(split)
|
||||
testing_data <- testing(split)
|
||||
|
||||
# Create and prep a recipe with MIC log2 transformation
|
||||
mic_recipe <- recipe(esbl ~ ., data = training_data) \%>\%
|
||||
|
||||
# Optionally remove non-predictive variables
|
||||
remove_role(genus, old_role = "predictor") \%>\%
|
||||
|
||||
# Apply the log2 transformation to all MIC predictors
|
||||
step_mic_log2(all_mic_predictors()) \%>\%
|
||||
|
||||
# And apply the preparation steps
|
||||
prep()
|
||||
|
||||
# View prepped recipe
|
||||
mic_recipe
|
||||
|
||||
# Apply the recipe to training and testing data
|
||||
out_training <- bake(mic_recipe, new_data = NULL)
|
||||
out_testing <- bake(mic_recipe, new_data = testing_data)
|
||||
|
||||
# Fit a logistic regression model
|
||||
fitted <- logistic_reg(mode = "classification") \%>\%
|
||||
set_engine("glm") \%>\%
|
||||
fit(esbl ~ ., data = out_training)
|
||||
|
||||
# Generate predictions on the test set
|
||||
predictions <- predict(fitted, out_testing) \%>\%
|
||||
bind_cols(out_testing)
|
||||
|
||||
# Evaluate predictions using standard classification metrics
|
||||
our_metrics <- metric_set(accuracy,
|
||||
recall,
|
||||
precision,
|
||||
sensitivity,
|
||||
specificity,
|
||||
ppv,
|
||||
npv)
|
||||
metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)
|
||||
|
||||
# Show performance
|
||||
metrics
|
||||
}
|
||||
}
|
||||
\seealso{
|
||||
\code{\link[recipes:recipe]{recipes::recipe()}}, \code{\link[=as.sir]{as.sir()}}, \code{\link[=as.mic]{as.mic()}}, \code{\link[=as.disk]{as.disk()}}
|
||||
}
|
||||
\keyword{internal}
|
||||
@@ -209,7 +209,7 @@ wisca(your_data,
|
||||
antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"))
|
||||
}\if{html}{\out{</div>}}
|
||||
|
||||
WISCA uses a sophisticated Bayesian decision model to combine both local and pooled antimicrobial resistance data. This approach not only evaluates local patterns but can also draw on multi-centre datasets to improve regimen accuracy, even in low-incidence infections like paediatric bloodstream infections (BSIs).
|
||||
WISCA uses a sophisticated Bayesian decision model to combine both local and pooled antimicrobial resistance data. This approach not only evaluates local patterns but can also draw on multi-centre data sets to improve regimen accuracy, even in low-incidence infections like paediatric bloodstream infections (BSIs).
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -27,7 +27,7 @@ A \link[tibble:tibble]{tibble} with 40 217 observations and 14 variables:
|
||||
clinical_breakpoints
|
||||
}
|
||||
\description{
|
||||
Data set containing clinical breakpoints to interpret MIC and disk diffusion to SIR values, according to international guidelines. This dataset contain breakpoints for humans, 7 different animal groups, and ECOFFs.
|
||||
Data set containing clinical breakpoints to interpret MIC and disk diffusion to SIR values, according to international guidelines. This data set contains breakpoints for humans, 7 different animal groups, and ECOFFs.
|
||||
|
||||
These breakpoints are currently implemented:
|
||||
\itemize{
|
||||
|
||||
27
man/esbl_isolates.Rd
Normal file
27
man/esbl_isolates.Rd
Normal file
@@ -0,0 +1,27 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/data.R
|
||||
\docType{data}
|
||||
\name{esbl_isolates}
|
||||
\alias{esbl_isolates}
|
||||
\title{Data Set with 500 ESBL Isolates}
|
||||
\format{
|
||||
A \link[tibble:tibble]{tibble} with 500 observations and 19 variables:
|
||||
\itemize{
|
||||
\item \code{esbl}\cr Logical indicator if the isolate is ESBL-producing
|
||||
\item \code{genus}\cr Genus of the microorganism
|
||||
\item \code{AMC:COL}\cr MIC values for 17 antimicrobial agents, transformed to class \code{\link{mic}} (see \code{\link[=as.mic]{as.mic()}})
|
||||
}
|
||||
}
|
||||
\usage{
|
||||
esbl_isolates
|
||||
}
|
||||
\description{
|
||||
A data set containing 500 microbial isolates with MIC values of common antibiotics and a binary \code{esbl} column for extended-spectrum beta-lactamase (ESBL) production. This data set contains randomised fictitious data but reflects reality and can be used to practise AMR-related machine learning, e.g., classification modelling with \href{https://amr-for-r.org/articles/AMR_with_tidymodels.html}{tidymodels}.
|
||||
}
|
||||
\details{
|
||||
See our \link[=amr-tidymodels]{tidymodels integration} for an example using this data set.
|
||||
}
|
||||
\examples{
|
||||
esbl_isolates
|
||||
}
|
||||
\keyword{datasets}
|
||||
84
tests/testthat/test-tidymodels.R
Executable file
84
tests/testthat/test-tidymodels.R
Executable file
@@ -0,0 +1,84 @@
|
||||
# ==================================================================== #
|
||||
# TITLE: #
|
||||
# AMR: An R Package for Working with Antimicrobial Resistance Data #
|
||||
# #
|
||||
# SOURCE CODE: #
|
||||
# https://github.com/msberends/AMR #
|
||||
# #
|
||||
# PLEASE CITE THIS SOFTWARE AS: #
|
||||
# Berends MS, Luz CF, Friedrich AW, et al. (2022). #
|
||||
# AMR: An R Package for Working with Antimicrobial Resistance Data. #
|
||||
# Journal of Statistical Software, 104(3), 1-31. #
|
||||
# https://doi.org/10.18637/jss.v104.i03 #
|
||||
# #
|
||||
# Developed at the University of Groningen and the University Medical #
|
||||
# Center Groningen in The Netherlands, in collaboration with many #
|
||||
# colleagues from around the world, see our website. #
|
||||
# #
|
||||
# This R package is free software; you can freely use and distribute #
|
||||
# it for both personal and commercial purposes under the terms of the #
|
||||
# GNU General Public License version 2.0 (GNU GPL-2), as published by #
|
||||
# the Free Software Foundation. #
|
||||
# We created this package for both routine data analysis and academic #
|
||||
# research and it was publicly released in the hope that it will be #
|
||||
# useful, but it comes WITHOUT ANY WARRANTY OR LIABILITY. #
|
||||
# #
|
||||
# Visit our website for the full manual and a complete tutorial about #
|
||||
# how to conduct AMR data analysis: https://amr-for-r.org #
|
||||
# ==================================================================== #
|
||||
|
||||
test_that("tidymodels.R", {
|
||||
skip_on_cran()
|
||||
|
||||
if (AMR:::pkg_is_available("recipes", also_load = TRUE) && AMR:::pkg_is_available("dplyr", also_load = TRUE)) {
|
||||
# SIR
|
||||
df <- tibble(
|
||||
sir1 = as.sir(c("S", "I", "R", "S", "R")),
|
||||
sir2 = as.sir(c("I", "R", "S", "R", "I")),
|
||||
not_sir = c("S", "R", "R", "S", "I")
|
||||
)
|
||||
rec <- recipe(~., data = df) %>% step_sir_numeric(all_sir())
|
||||
prepped <- prep(rec)
|
||||
baked <- bake(prepped, new_data = df)
|
||||
expect_inherits(baked$sir1, "numeric")
|
||||
expect_inherits(baked$sir2, "numeric")
|
||||
expect_equal(baked$not_sir, as.factor(df$not_sir))
|
||||
|
||||
# MIC
|
||||
df <- tibble(
|
||||
mic_col1 = as.mic(c("<=0.002", "0.002", "0.004", "0.016", "32")),
|
||||
mic_col2 = as.mic(c("0.5", "1", "2", "4", "8")),
|
||||
non_mic = c(1, 2, 3, 4, 5)
|
||||
)
|
||||
rec <- recipe(~., data = df) %>% step_mic_log2(all_mic())
|
||||
prepped <- prep(rec)
|
||||
baked <- bake(prepped, new_data = df)
|
||||
expect_inherits(baked$mic_col1, "numeric")
|
||||
expect_inherits(baked$mic_col2, "numeric")
|
||||
expect_equal(baked$non_mic, df$non_mic)
|
||||
expect_equal(baked$mic_col2, log2(as.numeric(df$mic_col2)))
|
||||
|
||||
# disk
|
||||
df <- tibble(
|
||||
disk_col = as.disk(c(21, 22, 23, 24, 25)),
|
||||
non_disk = c(21, 22, 23, 24, 25)
|
||||
)
|
||||
rec <- recipe(~., data = df) %>% step_rm(!all_disk())
|
||||
prepped <- prep(rec)
|
||||
baked <- bake(prepped, new_data = df)
|
||||
expect_inherits(baked$disk_col, "disk")
|
||||
|
||||
# steps check
|
||||
df <- tibble(x = as.mic(c("1", "2", "4")))
|
||||
rec <- recipe(~x, data = df) %>% step_mic_log2(all_mic())
|
||||
prepped <- prep(rec)
|
||||
tidy_df <- tidy(prepped, number = 1)
|
||||
expect_equal(unname(tidy_df$terms), "x")
|
||||
|
||||
df <- tibble(x = as.sir(c("S", "I", "R")))
|
||||
rec <- recipe(~x, data = df) %>% step_sir_numeric(all_sir())
|
||||
prepped <- prep(rec)
|
||||
tidy_df <- tidy(prepped, number = 1)
|
||||
expect_equal(unname(tidy_df$terms), "x")
|
||||
}
|
||||
})
|
||||
@@ -22,7 +22,7 @@ knitr::opts_chunk$set(
|
||||
)
|
||||
```
|
||||
|
||||
> This page was entirely written by our [AMR for R Assistant](https://chat.amr-for-r.org), a ChatGPT manually-trained model able to answer any question about the `AMR` package.
|
||||
> This page was almost entirely written by our [AMR for R Assistant](https://chat.amr-for-r.org), a ChatGPT manually-trained model able to answer any question about the `AMR` package.
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antimicrobial selector functions like `aminoglycosides()` and `betalactams()`.
|
||||
|
||||
@@ -219,18 +219,163 @@ This workflow is extensible to other antimicrobial classes and resistance patter
|
||||
|
||||
In this second example, we demonstrate how to use `<mic>` columns directly in `tidymodels` workflows using AMR-specific recipe steps. This includes a transformation to `log2` scale using `step_mic_log2()`, which prepares MIC values for use in classification models.
|
||||
|
||||
This approach and idea formed the basis for the publication [DOI: 10.3389/fmicb.2025.1582703](https://doi.org/10.3389/fmicb.2025.1582703) to model the presence of extended-spectrum beta-lactamases (ESBL).
|
||||
This approach and idea formed the basis for the publication [DOI: 10.3389/fmicb.2025.1582703](https://doi.org/10.3389/fmicb.2025.1582703) to model the presence of extended-spectrum beta-lactamases (ESBL) based on MIC values.
|
||||
|
||||
> NOTE: THIS EXAMPLE WILL BE AVAILABLE IN A NEXT VERSION (#TODO)
|
||||
>
|
||||
> The new AMR package version will contain new tidymodels selectors such as `step_mic_log2()`.
|
||||
### **Objective**
|
||||
|
||||
<!-- TODO for AMR v3.1.0: add info from here: https://github.com/msberends/AMR/blob/2461631bcefa78ebdb37bdfad359be74cdd9165a/vignettes/AMR_with_tidymodels.Rmd#L212-L291 -->
|
||||
Our goal is to:
|
||||
|
||||
1. Use raw MIC values to predict whether a bacterial isolate produces ESBL.
|
||||
2. Apply AMR-aware preprocessing in a `tidymodels` recipe.
|
||||
3. Train a classification model and evaluate its predictive performance.
|
||||
|
||||
### **Data Preparation**
|
||||
|
||||
We use the `esbl_isolates` dataset that comes with the AMR package.
|
||||
|
||||
```{r}
|
||||
# Load required libraries
|
||||
library(AMR)
|
||||
library(tidymodels)
|
||||
|
||||
# View the esbl_isolates data set
|
||||
esbl_isolates
|
||||
|
||||
# Prepare a binary outcome and convert to ordered factor
|
||||
data <- esbl_isolates %>%
|
||||
mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
|
||||
- `esbl_isolates`: Contains MIC test results and ESBL status for each isolate.
|
||||
- `mutate(esbl = ...)`: Converts the target column to an ordered factor for classification.
|
||||
|
||||
### **Defining the Workflow**
|
||||
|
||||
#### 1. Preprocessing with a Recipe
|
||||
|
||||
We use our `step_mic_log2()` function to log2-transform MIC values, ensuring that MICs are numeric and properly scaled. All MIC predictors can easily and agnostically selected using the new `all_mic_predictors()`:
|
||||
|
||||
```{r}
|
||||
# Split into training and testing sets
|
||||
set.seed(123)
|
||||
split <- initial_split(data)
|
||||
training_data <- training(split)
|
||||
testing_data <- testing(split)
|
||||
|
||||
# Define the recipe
|
||||
mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
|
||||
remove_role(genus, old_role = "predictor") %>% # Remove non-informative variable
|
||||
step_mic_log2(all_mic_predictors()) #%>% # Log2 transform all MIC predictors
|
||||
# prep()
|
||||
|
||||
mic_recipe
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
|
||||
- `remove_role()`: Removes irrelevant variables like genus.
|
||||
- `step_mic_log2()`: Applies `log2(as.numeric(...))` to all MIC predictors in one go.
|
||||
- `prep()`: Finalises the recipe based on training data.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
We use a simple logistic regression to model ESBL presence, though recent models such as xgboost ([link to `parsnip` manual](https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html)) could be much more precise.
|
||||
|
||||
```{r}
|
||||
# Define the model
|
||||
model <- logistic_reg(mode = "classification") %>%
|
||||
set_engine("glm")
|
||||
|
||||
model
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
|
||||
- `logistic_reg()`: Specifies a binary classification model.
|
||||
- `set_engine("glm")`: Uses the base R GLM engine.
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
```{r}
|
||||
# Create workflow
|
||||
workflow_model <- workflow() %>%
|
||||
add_recipe(mic_recipe) %>%
|
||||
add_model(model)
|
||||
|
||||
workflow_model
|
||||
```
|
||||
|
||||
### **Training and Evaluating the Model**
|
||||
|
||||
```{r}
|
||||
# Fit the model
|
||||
fitted <- fit(workflow_model, training_data)
|
||||
|
||||
# Generate predictions
|
||||
predictions <- predict(fitted, testing_data) %>%
|
||||
bind_cols(predict(fitted, out_testing, type = "prob")) %>% # add probabilities
|
||||
bind_cols(testing_data)
|
||||
|
||||
# Evaluate model performance
|
||||
our_metrics <- metric_set(accuracy, recall, precision, sensitivity, specificity, ppv, npv)
|
||||
metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)
|
||||
|
||||
metrics
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
|
||||
- `fit()`: Trains the model on the processed training data.
|
||||
- `predict()`: Produces predictions for unseen test data.
|
||||
- `metric_set()`: Allows evaluating multiple classification metrics. This will make `our_metrics` to become a function that we can use to check the predictions with.
|
||||
|
||||
It appears we can predict ESBL gene presence with a positive predictive value (PPV) of `r round(metrics[metrics$.metric == "ppv", ]$.estimate, 3) * 100`% and a negative predictive value (NPV) of `r round(metrics[metrics$.metric == "npv", ]$.estimate, 3) * 100` using a simplistic logistic regression model.
|
||||
|
||||
### **Visualising Predictions**
|
||||
|
||||
We can visualise predictions by comparing predicted and actual ESBL status.
|
||||
|
||||
```{r}
|
||||
library(ggplot2)
|
||||
|
||||
ggplot(predictions, aes(x = esbl, fill = .pred_class)) +
|
||||
geom_bar(position = "stack") +
|
||||
labs(title = "Predicted vs Actual ESBL Status",
|
||||
x = "Actual ESBL",
|
||||
y = "Count") +
|
||||
theme_minimal()
|
||||
```
|
||||
|
||||
And plot the certainties too - how certain were the actual predictions?
|
||||
|
||||
```{r}
|
||||
predictions %>%
|
||||
mutate(certainty = ifelse(.pred_class == "FALSE",
|
||||
.pred_FALSE,
|
||||
.pred_TRUE),
|
||||
correct = ifelse(esbl == .pred_class, "Right", "Wrong")) %>%
|
||||
ggplot(aes(x = seq_len(nrow(predictions)),
|
||||
y = certainty,
|
||||
colour = correct)) +
|
||||
scale_colour_manual(values = c(Right = "green3", Wrong = "red2"),
|
||||
name = "Correct?") +
|
||||
geom_point() +
|
||||
scale_y_continuous(labels = function(x) paste0(x * 100, "%"),
|
||||
limits = c(0.5, 1)) +
|
||||
theme_minimal()
|
||||
```
|
||||
### **Conclusion**
|
||||
|
||||
In this example, we showcased how the new `AMR`-specific recipe steps simplify working with `<mic>` columns in `tidymodels`. The `step_mic_log2()` transformation converts ordered MICs to log2-transformed numerics, improving compatibility with classification models.
|
||||
|
||||
This pipeline enables realistic, reproducible, and interpretable modelling of antimicrobial resistance data.
|
||||
|
||||
---
|
||||
|
||||
|
||||
## Example 2: Predicting AMR Over Time
|
||||
## Example 3: Predicting AMR Over Time
|
||||
|
||||
In this third example, we aim to predict antimicrobial resistance (AMR) trends over time using `tidymodels`. We will model resistance to three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`, and ciprofloxacin `CIP`), based on historical data grouped by year and hospital ward.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user