1
0
mirror of https://github.com/msberends/AMR.git synced 2025-01-24 10:24:34 +01:00

(v2.1.1.9121) support tidymodels

This commit is contained in:
dr. M.S. (Matthijs) Berends 2024-12-19 20:17:15 +01:00
parent 8249cfda46
commit 15fc72fc66
No known key found for this signature in database
16 changed files with 638 additions and 89 deletions

View File

@ -25,6 +25,7 @@
^tests/testthat/_snaps$
^vignettes/AMR\.Rmd$
^vignettes/AMR_intro\.png$
^vignettes/AMR_with_tidymodels\.Rmd$
^vignettes/benchmarks\.Rmd$
^vignettes/benchmarks\.Rmd\.not$
^vignettes/datasets\.Rmd$

View File

@ -1,6 +1,6 @@
Package: AMR
Version: 2.1.1.9120
Date: 2024-12-15
Version: 2.1.1.9121
Date: 2024-12-19
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by
@ -47,6 +47,7 @@ Suggests:
rvest,
skimr,
tibble,
tidymodels,
tidyselect,
tinytest,
vctrs,

View File

@ -1,4 +1,4 @@
# AMR 2.1.1.9120
# AMR 2.1.1.9121
*(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)*
@ -30,6 +30,8 @@ This package now supports not only tools for AMR data analysis in clinical setti
* New function `rescale_mic()`, which allows users to rescale MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but it can be used independently to, for instance, compare equality in MIC distributions by rescaling them to the same range first.
* **Support for Python**
* While using R for the heavy lifting, [our 'AMR' Python Package](https://pypi.org/project/AMR/) was developed to run the AMR R package natively in Python. The Python package will always have the same version number as the R package, as it is built automatically with every code change.
* **Support for `tidymodels`**
* All antimicrobial selectors (such as `aminoglycosides()` and `betalactams()`) are now supported in `tidymodels` packages such as `recipe` and `parsnip`. See for more info [our tutorial](https://msberends.github.io/AMR/articles/AMR_with_tidymodels.html) on using AMR function for predictive modelling.
* **Other**
* New function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that belong to that group.
@ -73,7 +75,9 @@ This package now supports not only tools for AMR data analysis in clinical setti
* Updated the prevalence calculation to include genera from the World Health Organization's (WHO) Priority Pathogen List
* Improved algorithm of `first_isolate()` when using the phenotype-based method, to prioritise records with the highest availability of SIR values
* `scale_y_percent()` can now cope with ranges outside the 0-100% range
* Support for new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024)
* MDRO determination (using `mdro()`)
* Implemented the new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024)
* Added arguments `esbl`, `carbapenemase`, `mecA`, `mecC`, `vanA`, `vanB` to denote column names or logical values indicating presence of these genes (or production of their proteins)
## Other
* Greatly improved `vctrs` integration, a Tidyverse package working in the background for many Tidyverse functions. For users, this means that functions such as `dplyr`'s `bind_rows()`, `rowwise()` and `c_across()` are now supported for e.g. columns of class `mic`. Despite this, this `AMR` package is still zero-dependent on any other package, including `dplyr` and `vctrs`.

View File

@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: AMR
Version: 2.1.1.9120
Version: 2.1.1.9121
Summary: A Python wrapper for the AMR R package
Home-page: https://github.com/msberends/AMR
Author: Matthijs Berends

View File

@ -441,9 +441,9 @@ def mdr_tb(x = None, *args, **kwargs):
def mdr_cmi2012(x = None, *args, **kwargs):
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
return convert_to_python(amr_r.mdr_cmi2012(x = None, *args, **kwargs))
def eucast_exceptional_phenotypes(x = None, *args, **kwargs):
def eucast_exceptional_phenotypes(*args, **kwargs):
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
return convert_to_python(amr_r.eucast_exceptional_phenotypes(x = None, *args, **kwargs))
return convert_to_python(amr_r.eucast_exceptional_phenotypes(*args, **kwargs))
def mean_amr_distance(x, *args, **kwargs):
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
return convert_to_python(amr_r.mean_amr_distance(x, *args, **kwargs))

Binary file not shown.

Binary file not shown.

View File

@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup(
name='AMR',
version='2.1.1.9120',
version='2.1.1.9121',
packages=find_packages(),
install_requires=[
'rpy2',

View File

@ -988,7 +988,7 @@ get_current_data <- function(arg_name, call) {
for (env in frms[which(with_mask)]) {
if (is.function(env$mask$current_rows) && (valid_df(env$data) || valid_df(env$`.data`))) {
# an element `.data` or `data` (containing all data) and `mask` (containing functions) will be in the environment when using dplyr verbs
# we use their mask$current_rows() to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future
# we use their mask$current_rows() below to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future
# e.g. for `example_isolates %>% group_by(ward) %>% mutate(first = first_isolate(.))`
if (valid_df(env$data)) {
# support for dplyr 1.1.x
@ -1008,6 +1008,9 @@ get_current_data <- function(arg_name, call) {
if (valid_df(env$`.data`)) {
# an element `.data` will be in the environment when using dplyr::select()
return(env$`.data`)
} else if (valid_df(env$data)) {
# an element `data` will be in the environment when using older dplyr versions, or tidymodels
return(env$data)
} else if (valid_df(env$xx)) {
# an element `xx` will be in the environment for rows + cols in base R, e.g. `example_isolates[c(1:3), carbapenems()]`
return(env$xx)

243
R/mdro.R
View File

@ -32,6 +32,12 @@
#' Determine which isolates are multidrug-resistant organisms (MDRO) according to international, national, or custom guidelines.
#' @param x a [data.frame] with antibiotics columns, like `AMX` or `amox`. Can be left blank for automatic determination.
#' @param guideline a specific guideline to follow, see sections *Supported international / national guidelines* and *Using Custom Guidelines* below. When left empty, the publication by Magiorakos *et al.* (see below) will be followed.
#' @param esbl [logical] values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)
#' @param carbapenemase [logical] values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)
#' @param mecA [logical] values, or a column name containing logical values, indicating the presence of a *mecA* gene (or production of its proteins)
#' @param mecC [logical] values, or a column name containing logical values, indicating the presence of a *mecC* gene (or production of its proteins)
#' @param vanA [logical] values, or a column name containing logical values, indicating the presence of a *vanA* gene (or production of its proteins)
#' @param vanB [logical] values, or a column name containing logical values, indicating the presence of a *vanB* gene (or production of its proteins)
#' @param ... in case of [custom_mdro_guideline()]: a set of rules, see section *Using Custom Guidelines* below. Otherwise: column name of an antibiotic, see section *Antibiotics* below.
#' @param as_factor a [logical] to indicate whether the returned value should be an ordered [factor] (`TRUE`, default), or otherwise a [character] vector
#' @inheritParams eucast_rules
@ -177,6 +183,12 @@
mdro <- function(x = NULL,
guideline = "CMI2012",
col_mo = NULL,
esbl = NA,
carbapenemase = NA,
mecA = NA,
mecC = NA,
vanA = NA,
vanB = NA,
info = interactive(),
pct_required_classes = 0.5,
combine_SI = TRUE,
@ -190,9 +202,13 @@ mdro <- function(x = NULL,
}
meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
meet_criteria(guideline, allow_class = c("list", "character"), allow_NULL = TRUE)
if (!is.list(guideline)) {
meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE)
}
if (!is.list(guideline)) meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE)
meet_criteria(esbl, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(carbapenemase, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(mecA, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(mecC, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(vanA, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(vanB, allow_class = c("logical", "character"), allow_NA = TRUE)
meet_criteria(col_mo, allow_class = "character", has_length = 1, is_in = colnames(x), allow_NULL = TRUE)
meet_criteria(info, allow_class = "logical", has_length = 1)
meet_criteria(pct_required_classes, allow_class = "numeric", has_length = 1)
@ -203,7 +219,51 @@ mdro <- function(x = NULL,
if (!any(is_sir_eligible(x))) {
stop_("There were no possible SIR columns found in the data set. Transform columns with `as.sir()` for valid antimicrobial interpretations.")
}
# get gene values as TRUE/FALSE
if (is.character(esbl)) {
meet_criteria(esbl, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
esbl <- x[[esbl]]
meet_criteria(esbl, allow_class = "logical", allow_NA = TRUE)
} else if (length(esbl) == 1) {
esbl <- rep(esbl, NROW(x))
}
if (is.character(carbapenemase)) {
meet_criteria(carbapenemase, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
carbapenemase <- x[[carbapenemase]]
meet_criteria(carbapenemase, allow_class = "logical", allow_NA = TRUE)
} else if (length(carbapenemase) == 1) {
carbapenemase <- rep(carbapenemase, NROW(x))
}
if (is.character(mecA)) {
meet_criteria(mecA, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
mecA <- x[[mecA]]
meet_criteria(mecA, allow_class = "logical", allow_NA = TRUE)
} else if (length(mecA) == 1) {
mecA <- rep(mecA, NROW(x))
}
if (is.character(mecC)) {
meet_criteria(mecC, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
mecC <- x[[mecC]]
meet_criteria(mecC, allow_class = "logical", allow_NA = TRUE)
} else if (length(mecC) == 1) {
mecC <- rep(mecC, NROW(x))
}
if (is.character(vanA)) {
meet_criteria(vanA, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
vanA <- x[[vanA]]
meet_criteria(vanA, allow_class = "logical", allow_NA = TRUE)
} else if (length(vanA) == 1) {
vanA <- rep(vanA, NROW(x))
}
if (is.character(vanB)) {
meet_criteria(vanB, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
vanB <- x[[vanB]]
meet_criteria(vanB, allow_class = "logical", allow_NA = TRUE)
} else if (length(vanB) == 1) {
vanB <- rep(vanB, NROW(x))
}
info.bak <- info
# don't throw info's more than once per call
if (isTRUE(info)) {
@ -476,7 +536,7 @@ mdro <- function(x = NULL,
if (!"AMP" %in% names(cols_ab) && "AMX" %in% names(cols_ab)) {
# ampicillin column is missing, but amoxicillin is available
if (isTRUE(info)) {
message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many EUCAST rules depend on it.")
message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many MDRO rules depend on it.")
}
cols_ab <- c(cols_ab, c(AMP = unname(cols_ab[names(cols_ab) == "AMX"])))
}
@ -663,6 +723,17 @@ mdro <- function(x = NULL,
out[is.na(out)] <- FALSE
out
}
col_values <- function(df, col, return_if_lacking = "") {
if (col %in% colnames(df)) {
df[[col]]
} else {
rep(return_if_lacking, NROW(df))
}
}
NA_as_FALSE <- function(x) {
x[is.na(x)] <- FALSE
x
}
# antibiotic classes
# nolint start
@ -677,6 +748,10 @@ mdro <- function(x = NULL,
# helper function for editing the table
trans_tbl <- function(to, rows, cols, any_all, reason = NULL) {
cols.bak <- cols
if (identical(cols, "any")) {
cols <- unique(cols_ab)
}
cols <- cols[!ab_missing(cols)]
cols <- cols[!is.na(cols)]
if (length(rows) > 0 && length(cols) > 0) {
@ -690,7 +765,7 @@ mdro <- function(x = NULL,
x[rows, "columns_nonsusceptible"] <<- vapply(
FUN.VALUE = character(1),
rows,
function(row, group_vct = cols) {
function(row, group_vct = cols_ab) {
cols_nonsus <- vapply(
FUN.VALUE = logical(1),
x[row, group_vct, drop = FALSE],
@ -717,7 +792,7 @@ mdro <- function(x = NULL,
rows_affected <- vapply(
FUN.VALUE = logical(1),
x_transposed,
function(y) search_function(y %in% search_result, na.rm = TRUE)
function(y) search_function(y %in% search_result, na.rm = TRUE) | identical(cols.bak, "any")
)
rows_affected <- x[which(rows_affected), "row_number", drop = TRUE]
rows_to_change <- rows[rows %in% rows_affected]
@ -1449,62 +1524,83 @@ mdro <- function(x = NULL,
if (length(ESBLs) > 0) {
trans_tbl(
2, # positive, unconfirmed
which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R"),
c(AMX %or% AMP, cephalosporins_3rd),
"all",
reason = "Enterobacterales: ESBL"
rows = which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R" & is.na(esbl)),
cols = c(AMX %or% AMP, cephalosporins_3rd),
any_all = "all",
reason = "Enterobacterales: potential ESBL"
)
}
trans_tbl(
3, # positive
which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
carbapenems_without_imipenem,
"any",
reason = "Enterobacterales: carbapenem or carbapenemase"
rows = which(x$order == "Enterobacterales" & esbl == TRUE),
cols = "any",
any_all = "any",
reason = "Enterobacterales: ESBL"
)
trans_tbl(
3,
which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
carbapenems,
"any",
reason = "Enterobacterales: carbapenem or carbapenemase"
rows = which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
cols = carbapenems_without_imipenem,
any_all = "any",
reason = "Enterobacterales: carbapenem resistance"
)
trans_tbl(
3,
which(x[[SXT]] == "R" &
rows = which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
cols = carbapenems,
any_all = "any",
reason = "Enterobacterales: carbapenem resistance"
)
trans_tbl(
3,
rows = which(x$order == "Enterobacterales" & carbapenemase == TRUE),
cols = "any",
any_all = "any",
reason = "Enterobacterales: carbapenemase"
)
trans_tbl(
3,
rows = which(x[[SXT]] == "R" &
(x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
(x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") &
(x$genus %in% c("Enterobacter", "Providencia") | paste(x$genus, x$species) %in% c("Citrobacter freundii", "Klebsiella aerogenes", "Hafnia alvei", "Morganella morganii"))),
c(SXT, aminoglycosides, fluoroquinolones),
"any",
cols = c(SXT, aminoglycosides, fluoroquinolones),
any_all = "any",
reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol"
)
trans_tbl(
3,
which(x[[SXT]] == "R" &
rows = which(x[[SXT]] == "R" &
x[[GEN]] == "R" &
(x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") &
paste(x$genus, x$species) == "Serratia marcescens"),
c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones),
"any",
cols = c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones),
any_all = "any",
reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol"
)
# Acinetobacter baumannii-calcoaceticus complex
trans_tbl(
3,
which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
rows = which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
(x[[CIP]] == "R" | x[[LVX]] == "R") &
x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]),
c(aminoglycosides, CIP, LVX),
"any",
cols = c(aminoglycosides, CIP, LVX),
any_all = "any",
reason = "A. baumannii-calcoaceticus complex: aminoglycoside + ciprofloxacin or levofloxacin"
)
trans_tbl(
2, # unconfirmed
which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]),
carbapenems,
"any",
rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & is.na(carbapenemase)),
cols = carbapenems,
any_all = "any",
reason = "A. baumannii-calcoaceticus complex: potential carbapenemase"
)
trans_tbl(
3,
rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & carbapenemase == TRUE),
cols = carbapenems,
any_all = "any",
reason = "A. baumannii-calcoaceticus complex: carbapenemase"
)
@ -1513,59 +1609,65 @@ mdro <- function(x = NULL,
# take pip/tazo if just pip is not available - many labs only test for pip/tazo because of availability on a Vitek card
PIP <- TZP
}
if (!ab_missing(MEM) && !ab_missing(IPM) &&
!ab_missing(GEN) && !ab_missing(TOB) &&
!ab_missing(CIP) &&
!ab_missing(CAZ) &&
!ab_missing(PIP)) {
x$psae <- 0
x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"]
x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"]
x[which(x[, CIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CIP, drop = TRUE] == "R"), "psae"]
x[which(x[, CAZ, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CAZ, drop = TRUE] == "R"), "psae"]
x[which(x[, PIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, PIP, drop = TRUE] == "R"), "psae"]
} else {
x$psae <- 0
}
x$psae <- 0
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, TOB) == "R" | col_values(x, AMK) == "R"), 1, 0)
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, IPM) == "R" | col_values(x, MEM) == "R"), 1, 0)
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, PIP) == "R"), 1, 0)
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CAZ) == "R"), 1, 0)
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CIP) == "R" | col_values(x, NOR) == "R" | col_values(x, LVX) == "R"), 1, 0)
trans_tbl(
3,
which(x$genus == "Pseudomonas" & x$species == "aeruginosa"),
c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
"all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline"
rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa"),
cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
any_all = "all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline"
reason = "P. aeruginosa: at least 3 classes contain R"
)
trans_tbl(
3,
which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3),
c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
"any", # this is the actual one, changing the ones with x$psae >= 3
rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3),
cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
any_all = "any", # this is the actual one, changing the ones with x$psae >= 3
reason = "P. aeruginosa: at least 3 classes contain R"
)
# Enterococcus faecium
trans_tbl(
3,
which(x$genus == "Enterococcus" & x$species == "faecium"),
c(PEN %or% AMX %or% AMP, VAN),
"all",
reason = "E. faecium: vancomycin or vanA/vanB gene + penicillin group"
rows = which(x$genus == "Enterococcus" & x$species == "faecium"),
cols = c(PEN %or% AMX %or% AMP, VAN),
any_all = "all",
reason = "E. faecium: vancomycin + penicillin group"
)
trans_tbl(
3,
rows = which(x$genus == "Enterococcus" & x$species == "faecium" & (vanA == TRUE | vanB == TRUE)),
cols = c(PEN, AMX, AMP, VAN),
any_all = "any",
reason = "E. faecium: vanA/vanB gene + penicillin group"
)
# Staphylococcus aureus
trans_tbl(
2,
which(x$genus == "Staphylococcus" & x$species == "aureus"),
c(PEN, AMX, AMP, FLC, OXA, FOX, FOX1),
"any",
reason = "S. aureus: MRSA"
rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (is.na(mecA) | is.na(mecC))),
cols = c(AMC, TZP, FLC, OXA, FOX, FOX1),
any_all = "any",
reason = "S. aureus: potential MRSA"
)
trans_tbl(
3,
rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (mecA == TRUE | mecC == TRUE)),
cols = "any",
any_all = "any",
reason = "S. aureus: mecA/mecC gene"
)
# Candida auris
trans_tbl(
3,
which(x$genus == "Candida" & x$species == "auris"),
character(0),
"any",
rows = which(x$genus == "Candida" & x$species == "auris"),
cols = "any",
any_all = "any",
reason = "C. auris: regardless of resistance"
)
}
@ -2040,50 +2142,51 @@ brmo <- function(x = NULL, only_sir_columns = FALSE, ...) {
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "BRMO", ...)
}
#' @rdname mdro
#' @export
mrgn <- function(x = NULL, only_sir_columns = FALSE, ...) {
mrgn <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
stop_if(
"guideline" %in% names(list(...)),
"argument `guideline` must not be set since this is a guideline-specific function"
)
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "MRGN", ...)
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "MRGN", ...)
}
#' @rdname mdro
#' @export
mdr_tb <- function(x = NULL, only_sir_columns = FALSE, ...) {
mdr_tb <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
stop_if(
"guideline" %in% names(list(...)),
"argument `guideline` must not be set since this is a guideline-specific function"
)
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "TB", ...)
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "TB", ...)
}
#' @rdname mdro
#' @export
mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, ...) {
mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
stop_if(
"guideline" %in% names(list(...)),
"argument `guideline` must not be set since this is a guideline-specific function"
)
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "CMI2012", ...)
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "CMI2012", ...)
}
#' @rdname mdro
#' @export
eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, ...) {
eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
stop_if(
"guideline" %in% names(list(...)),
"argument `guideline` must not be set since this is a guideline-specific function"
)
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "EUCAST", ...)
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "EUCAST", ...)
}

View File

@ -80,6 +80,9 @@ navbar:
- text: "Download Data Sets for Own Use"
icon: "fa-database"
href: "articles/datasets.html"
- text: "Use AMR for Predictive Modelling (tidymodels)"
icon: "fa-square-root-variable"
href: "articles/AMR_with_tidymodels.html"
- text: "Set User- Or Team-specific Package Settings"
icon: "fa-gear"
href: "reference/AMR-options.html"

View File

@ -1,5 +1,5 @@
This files contains all context you must know about the AMR package for R.
First and foremost, you are trained on version 2.1.1.9120. Remember this whenever someone asks which AMR package version youre at.
First and foremost, you are trained on version 2.1.1.9121. Remember this whenever someone asks which AMR package version youre at.
--------------------------------
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'NAMESPACE':
@ -6086,6 +6086,12 @@ mdro(
x = NULL,
guideline = "CMI2012",
col_mo = NULL,
esbl = NA,
carbapenemase = NA,
mecA = NA,
mecC = NA,
vanA = NA,
vanB = NA,
info = interactive(),
pct_required_classes = 0.5,
combine_SI = TRUE,
@ -6098,13 +6104,18 @@ custom_mdro_guideline(..., as_factor = TRUE)
brmo(x = NULL, only_sir_columns = FALSE, ...)
mrgn(x = NULL, only_sir_columns = FALSE, ...)
mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
mdr_tb(x = NULL, only_sir_columns = FALSE, ...)
mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...)
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
eucast_exceptional_phenotypes(
x = NULL,
only_sir_columns = FALSE,
verbose = FALSE,
...
)
}
\arguments{
\item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.}
@ -6113,6 +6124,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
\item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)}
\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)}
\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)}
\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)}
\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)}
\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)}
\item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions}
\item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.}
@ -8885,6 +8908,203 @@ Whether you're cleaning data or analysing resistance patterns, the `AMR` Python
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/AMR_with_tidymodels.Rmd':
---
title: "`AMR` with `tidymodels`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 3
vignette: >
%\VignetteIndexEntry{`AMR` with `tidymodels`}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
warning = FALSE,
collapse = TRUE,
comment = "#>",
fig.width = 7.5,
fig.height = 5
)
```
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
By leveraging the power of `tidymodels` and the `AMR` package, well build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
---
### **Objective**
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will:
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
2. Define a logistic regression model for prediction.
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
---
### **Data Preparation**
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
```{r}
# Load required libraries
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
library(AMR) # For AMR data analysis
# Load the example_isolates dataset
data("example_isolates") # Preloaded dataset with AMR results
# Select relevant columns for prediction
data <- example_isolates %>%
# select AB results dynamically
select(mo, aminoglycosides(), betalactams()) %>%
# replace NAs with NI (not-interpretable)
mutate(across(where(is.sir),
~replace_na(.x, "NI")),
# make factors of SIR columns
across(where(is.sir),
as.integer),
# get Gramstain of microorganisms
mo = as.factor(mo_gramstain(mo))) %>%
# drop NAs - the ones without a Gramstain (fungi, etc.)
drop_na() # %>%
# Cefepime is not reliable
#select(-FEP)
```
**Explanation:**
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
- `drop_na()` ensures the model receives complete cases for training.
---
### **Defining the Workflow**
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
#### 1. Preprocessing with a Recipe
We create a recipe to preprocess the data for modelling. This includes:
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
```{r}
# Define the recipe for data preprocessing
resistance_recipe <- recipe(mo ~ ., data = data) %>%
step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
resistance_recipe
```
**Explanation:**
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model.
#### 2. Specifying the Model
We define a logistic regression model since resistance prediction is a binary classification task.
```{r}
# Specify a logistic regression model
logistic_model <- logistic_reg() %>%
set_engine("glm") # Use the Generalized Linear Model engine
logistic_model
```
**Explanation:**
- `logistic_reg()` sets up a logistic regression model.
- `set_engine("glm")` specifies the use of R's built-in GLM engine.
#### 3. Building the Workflow
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
```{r}
# Combine the recipe and model into a workflow
resistance_workflow <- workflow() %>%
add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
add_model(logistic_model) # Add the logistic regression model
resistance_workflow
```
---
### **Training and Evaluating the Model**
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
```{r}
# Split data into training and testing sets
set.seed(123) # For reproducibility
data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
training_data <- training(data_split) # Training set
testing_data <- testing(data_split) # Testing set
# Fit the workflow to the training data
fitted_workflow <- resistance_workflow %>%
fit(training_data) # Train the model
fitted_workflow
```
**Explanation:**
- `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set.
Next, we evaluate the model on the testing data.
```{r}
# Make predictions on the testing set
predictions <- fitted_workflow %>%
predict(testing_data) # Generate predictions
probabilities <- fitted_workflow %>%
predict(testing_data, type = "prob") # Generate probabilities
predictions <- predictions %>%
bind_cols(probabilities) %>%
bind_cols(testing_data) # Combine with true labels
predictions
# Evaluate model performance
metrics <- predictions %>%
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
metrics
```
**Explanation:**
- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and AUC.
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like:
```{r}
predictions %>%
roc_curve(mo, `.pred_Gram-negative`) %>%
autoplot()
```
---
### **Conclusion**
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
---
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/EUCAST.Rmd':

View File

@ -45,7 +45,7 @@ expect_identical(class(outcome), c("ordered", "factor"))
# example_isolates should have these finding using Dutch guidelines
expect_equal(
as.double(table(outcome)),
c(1994, 0, 6)
c(1977, 23, 0)
)
expect_equal(

View File

@ -23,6 +23,12 @@ mdro(
x = NULL,
guideline = "CMI2012",
col_mo = NULL,
esbl = NA,
carbapenemase = NA,
mecA = NA,
mecC = NA,
vanA = NA,
vanB = NA,
info = interactive(),
pct_required_classes = 0.5,
combine_SI = TRUE,
@ -35,13 +41,18 @@ custom_mdro_guideline(..., as_factor = TRUE)
brmo(x = NULL, only_sir_columns = FALSE, ...)
mrgn(x = NULL, only_sir_columns = FALSE, ...)
mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
mdr_tb(x = NULL, only_sir_columns = FALSE, ...)
mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...)
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
eucast_exceptional_phenotypes(
x = NULL,
only_sir_columns = FALSE,
verbose = FALSE,
...
)
}
\arguments{
\item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.}
@ -50,6 +61,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
\item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)}
\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)}
\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)}
\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)}
\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)}
\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)}
\item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions}
\item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.}

View File

@ -0,0 +1,191 @@
---
title: "`AMR` with `tidymodels`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 3
vignette: >
%\VignetteIndexEntry{`AMR` with `tidymodels`}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
warning = FALSE,
collapse = TRUE,
comment = "#>",
fig.width = 7.5,
fig.height = 5
)
```
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
By leveraging the power of `tidymodels` and the `AMR` package, well build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
---
### **Objective**
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will:
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
2. Define a logistic regression model for prediction.
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
---
### **Data Preparation**
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
```{r}
# Load required libraries
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
library(AMR) # For AMR data analysis
# Load the example_isolates dataset
data("example_isolates") # Preloaded dataset with AMR results
# Select relevant columns for prediction
data <- example_isolates %>%
# select AB results dynamically
select(mo, aminoglycosides(), betalactams()) %>%
# replace NAs with NI (not-interpretable)
mutate(across(where(is.sir),
~replace_na(.x, "NI")),
# make factors of SIR columns
across(where(is.sir),
as.integer),
# get Gramstain of microorganisms
mo = as.factor(mo_gramstain(mo))) %>%
# drop NAs - the ones without a Gramstain (fungi, etc.)
drop_na() # %>%
# Cefepime is not reliable
#select(-FEP)
```
**Explanation:**
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
- `drop_na()` ensures the model receives complete cases for training.
---
### **Defining the Workflow**
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
#### 1. Preprocessing with a Recipe
We create a recipe to preprocess the data for modelling. This includes:
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
```{r}
# Define the recipe for data preprocessing
resistance_recipe <- recipe(mo ~ ., data = data) %>%
step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
resistance_recipe
```
**Explanation:**
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model.
#### 2. Specifying the Model
We define a logistic regression model since resistance prediction is a binary classification task.
```{r}
# Specify a logistic regression model
logistic_model <- logistic_reg() %>%
set_engine("glm") # Use the Generalized Linear Model engine
logistic_model
```
**Explanation:**
- `logistic_reg()` sets up a logistic regression model.
- `set_engine("glm")` specifies the use of R's built-in GLM engine.
#### 3. Building the Workflow
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
```{r}
# Combine the recipe and model into a workflow
resistance_workflow <- workflow() %>%
add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
add_model(logistic_model) # Add the logistic regression model
resistance_workflow
```
---
### **Training and Evaluating the Model**
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
```{r}
# Split data into training and testing sets
set.seed(123) # For reproducibility
data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
training_data <- training(data_split) # Training set
testing_data <- testing(data_split) # Testing set
# Fit the workflow to the training data
fitted_workflow <- resistance_workflow %>%
fit(training_data) # Train the model
fitted_workflow
```
**Explanation:**
- `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set.
Next, we evaluate the model on the testing data.
```{r}
# Make predictions on the testing set
predictions <- fitted_workflow %>%
predict(testing_data) # Generate predictions
probabilities <- fitted_workflow %>%
predict(testing_data, type = "prob") # Generate probabilities
predictions <- predictions %>%
bind_cols(probabilities) %>%
bind_cols(testing_data) # Combine with true labels
predictions
# Evaluate model performance
metrics <- predictions %>%
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
metrics
```
**Explanation:**
- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and AUC.
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like:
```{r}
predictions %>%
roc_curve(mo, `.pred_Gram-negative`) %>%
autoplot()
```
---
### **Conclusion**
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
---