mirror of
https://github.com/msberends/AMR.git
synced 2025-01-12 00:11:50 +01:00
(v2.1.1.9121) support tidymodels
This commit is contained in:
parent
8249cfda46
commit
15fc72fc66
@ -25,6 +25,7 @@
|
||||
^tests/testthat/_snaps$
|
||||
^vignettes/AMR\.Rmd$
|
||||
^vignettes/AMR_intro\.png$
|
||||
^vignettes/AMR_with_tidymodels\.Rmd$
|
||||
^vignettes/benchmarks\.Rmd$
|
||||
^vignettes/benchmarks\.Rmd\.not$
|
||||
^vignettes/datasets\.Rmd$
|
||||
|
@ -1,6 +1,6 @@
|
||||
Package: AMR
|
||||
Version: 2.1.1.9120
|
||||
Date: 2024-12-15
|
||||
Version: 2.1.1.9121
|
||||
Date: 2024-12-19
|
||||
Title: Antimicrobial Resistance Data Analysis
|
||||
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
||||
data analysis and to work with microbial and antimicrobial properties by
|
||||
@ -47,6 +47,7 @@ Suggests:
|
||||
rvest,
|
||||
skimr,
|
||||
tibble,
|
||||
tidymodels,
|
||||
tidyselect,
|
||||
tinytest,
|
||||
vctrs,
|
||||
|
8
NEWS.md
8
NEWS.md
@ -1,4 +1,4 @@
|
||||
# AMR 2.1.1.9120
|
||||
# AMR 2.1.1.9121
|
||||
|
||||
*(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)*
|
||||
|
||||
@ -30,6 +30,8 @@ This package now supports not only tools for AMR data analysis in clinical setti
|
||||
* New function `rescale_mic()`, which allows users to rescale MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but it can be used independently to, for instance, compare equality in MIC distributions by rescaling them to the same range first.
|
||||
* **Support for Python**
|
||||
* While using R for the heavy lifting, [our 'AMR' Python Package](https://pypi.org/project/AMR/) was developed to run the AMR R package natively in Python. The Python package will always have the same version number as the R package, as it is built automatically with every code change.
|
||||
* **Support for `tidymodels`**
|
||||
* All antimicrobial selectors (such as `aminoglycosides()` and `betalactams()`) are now supported in `tidymodels` packages such as `recipe` and `parsnip`. See for more info [our tutorial](https://msberends.github.io/AMR/articles/AMR_with_tidymodels.html) on using AMR function for predictive modelling.
|
||||
* **Other**
|
||||
* New function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that belong to that group.
|
||||
|
||||
@ -73,7 +75,9 @@ This package now supports not only tools for AMR data analysis in clinical setti
|
||||
* Updated the prevalence calculation to include genera from the World Health Organization's (WHO) Priority Pathogen List
|
||||
* Improved algorithm of `first_isolate()` when using the phenotype-based method, to prioritise records with the highest availability of SIR values
|
||||
* `scale_y_percent()` can now cope with ranges outside the 0-100% range
|
||||
* Support for new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024)
|
||||
* MDRO determination (using `mdro()`)
|
||||
* Implemented the new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024)
|
||||
* Added arguments `esbl`, `carbapenemase`, `mecA`, `mecC`, `vanA`, `vanB` to denote column names or logical values indicating presence of these genes (or production of their proteins)
|
||||
|
||||
## Other
|
||||
* Greatly improved `vctrs` integration, a Tidyverse package working in the background for many Tidyverse functions. For users, this means that functions such as `dplyr`'s `bind_rows()`, `rowwise()` and `c_across()` are now supported for e.g. columns of class `mic`. Despite this, this `AMR` package is still zero-dependent on any other package, including `dplyr` and `vctrs`.
|
||||
|
@ -1,6 +1,6 @@
|
||||
Metadata-Version: 2.1
|
||||
Name: AMR
|
||||
Version: 2.1.1.9120
|
||||
Version: 2.1.1.9121
|
||||
Summary: A Python wrapper for the AMR R package
|
||||
Home-page: https://github.com/msberends/AMR
|
||||
Author: Matthijs Berends
|
||||
|
@ -441,9 +441,9 @@ def mdr_tb(x = None, *args, **kwargs):
|
||||
def mdr_cmi2012(x = None, *args, **kwargs):
|
||||
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
|
||||
return convert_to_python(amr_r.mdr_cmi2012(x = None, *args, **kwargs))
|
||||
def eucast_exceptional_phenotypes(x = None, *args, **kwargs):
|
||||
def eucast_exceptional_phenotypes(*args, **kwargs):
|
||||
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
|
||||
return convert_to_python(amr_r.eucast_exceptional_phenotypes(x = None, *args, **kwargs))
|
||||
return convert_to_python(amr_r.eucast_exceptional_phenotypes(*args, **kwargs))
|
||||
def mean_amr_distance(x, *args, **kwargs):
|
||||
"""See our website of the R package for the manual: https://msberends.github.io/AMR/index.html"""
|
||||
return convert_to_python(amr_r.mean_amr_distance(x, *args, **kwargs))
|
||||
|
Binary file not shown.
BIN
PythonPackage/AMR/dist/amr-2.1.1.9120.tar.gz
vendored
BIN
PythonPackage/AMR/dist/amr-2.1.1.9120.tar.gz
vendored
Binary file not shown.
BIN
PythonPackage/AMR/dist/amr-2.1.1.9121.tar.gz
vendored
Normal file
BIN
PythonPackage/AMR/dist/amr-2.1.1.9121.tar.gz
vendored
Normal file
Binary file not shown.
@ -2,7 +2,7 @@ from setuptools import setup, find_packages
|
||||
|
||||
setup(
|
||||
name='AMR',
|
||||
version='2.1.1.9120',
|
||||
version='2.1.1.9121',
|
||||
packages=find_packages(),
|
||||
install_requires=[
|
||||
'rpy2',
|
||||
|
@ -988,7 +988,7 @@ get_current_data <- function(arg_name, call) {
|
||||
for (env in frms[which(with_mask)]) {
|
||||
if (is.function(env$mask$current_rows) && (valid_df(env$data) || valid_df(env$`.data`))) {
|
||||
# an element `.data` or `data` (containing all data) and `mask` (containing functions) will be in the environment when using dplyr verbs
|
||||
# we use their mask$current_rows() to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future
|
||||
# we use their mask$current_rows() below to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future
|
||||
# e.g. for `example_isolates %>% group_by(ward) %>% mutate(first = first_isolate(.))`
|
||||
if (valid_df(env$data)) {
|
||||
# support for dplyr 1.1.x
|
||||
@ -1008,6 +1008,9 @@ get_current_data <- function(arg_name, call) {
|
||||
if (valid_df(env$`.data`)) {
|
||||
# an element `.data` will be in the environment when using dplyr::select()
|
||||
return(env$`.data`)
|
||||
} else if (valid_df(env$data)) {
|
||||
# an element `data` will be in the environment when using older dplyr versions, or tidymodels
|
||||
return(env$data)
|
||||
} else if (valid_df(env$xx)) {
|
||||
# an element `xx` will be in the environment for rows + cols in base R, e.g. `example_isolates[c(1:3), carbapenems()]`
|
||||
return(env$xx)
|
||||
|
243
R/mdro.R
243
R/mdro.R
@ -32,6 +32,12 @@
|
||||
#' Determine which isolates are multidrug-resistant organisms (MDRO) according to international, national, or custom guidelines.
|
||||
#' @param x a [data.frame] with antibiotics columns, like `AMX` or `amox`. Can be left blank for automatic determination.
|
||||
#' @param guideline a specific guideline to follow, see sections *Supported international / national guidelines* and *Using Custom Guidelines* below. When left empty, the publication by Magiorakos *et al.* (see below) will be followed.
|
||||
#' @param esbl [logical] values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)
|
||||
#' @param carbapenemase [logical] values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)
|
||||
#' @param mecA [logical] values, or a column name containing logical values, indicating the presence of a *mecA* gene (or production of its proteins)
|
||||
#' @param mecC [logical] values, or a column name containing logical values, indicating the presence of a *mecC* gene (or production of its proteins)
|
||||
#' @param vanA [logical] values, or a column name containing logical values, indicating the presence of a *vanA* gene (or production of its proteins)
|
||||
#' @param vanB [logical] values, or a column name containing logical values, indicating the presence of a *vanB* gene (or production of its proteins)
|
||||
#' @param ... in case of [custom_mdro_guideline()]: a set of rules, see section *Using Custom Guidelines* below. Otherwise: column name of an antibiotic, see section *Antibiotics* below.
|
||||
#' @param as_factor a [logical] to indicate whether the returned value should be an ordered [factor] (`TRUE`, default), or otherwise a [character] vector
|
||||
#' @inheritParams eucast_rules
|
||||
@ -177,6 +183,12 @@
|
||||
mdro <- function(x = NULL,
|
||||
guideline = "CMI2012",
|
||||
col_mo = NULL,
|
||||
esbl = NA,
|
||||
carbapenemase = NA,
|
||||
mecA = NA,
|
||||
mecC = NA,
|
||||
vanA = NA,
|
||||
vanB = NA,
|
||||
info = interactive(),
|
||||
pct_required_classes = 0.5,
|
||||
combine_SI = TRUE,
|
||||
@ -190,9 +202,13 @@ mdro <- function(x = NULL,
|
||||
}
|
||||
meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
|
||||
meet_criteria(guideline, allow_class = c("list", "character"), allow_NULL = TRUE)
|
||||
if (!is.list(guideline)) {
|
||||
meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE)
|
||||
}
|
||||
if (!is.list(guideline)) meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE)
|
||||
meet_criteria(esbl, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(carbapenemase, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(mecA, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(mecC, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(vanA, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(vanB, allow_class = c("logical", "character"), allow_NA = TRUE)
|
||||
meet_criteria(col_mo, allow_class = "character", has_length = 1, is_in = colnames(x), allow_NULL = TRUE)
|
||||
meet_criteria(info, allow_class = "logical", has_length = 1)
|
||||
meet_criteria(pct_required_classes, allow_class = "numeric", has_length = 1)
|
||||
@ -203,7 +219,51 @@ mdro <- function(x = NULL,
|
||||
if (!any(is_sir_eligible(x))) {
|
||||
stop_("There were no possible SIR columns found in the data set. Transform columns with `as.sir()` for valid antimicrobial interpretations.")
|
||||
}
|
||||
|
||||
|
||||
# get gene values as TRUE/FALSE
|
||||
if (is.character(esbl)) {
|
||||
meet_criteria(esbl, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
esbl <- x[[esbl]]
|
||||
meet_criteria(esbl, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(esbl) == 1) {
|
||||
esbl <- rep(esbl, NROW(x))
|
||||
}
|
||||
if (is.character(carbapenemase)) {
|
||||
meet_criteria(carbapenemase, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
carbapenemase <- x[[carbapenemase]]
|
||||
meet_criteria(carbapenemase, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(carbapenemase) == 1) {
|
||||
carbapenemase <- rep(carbapenemase, NROW(x))
|
||||
}
|
||||
if (is.character(mecA)) {
|
||||
meet_criteria(mecA, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
mecA <- x[[mecA]]
|
||||
meet_criteria(mecA, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(mecA) == 1) {
|
||||
mecA <- rep(mecA, NROW(x))
|
||||
}
|
||||
if (is.character(mecC)) {
|
||||
meet_criteria(mecC, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
mecC <- x[[mecC]]
|
||||
meet_criteria(mecC, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(mecC) == 1) {
|
||||
mecC <- rep(mecC, NROW(x))
|
||||
}
|
||||
if (is.character(vanA)) {
|
||||
meet_criteria(vanA, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
vanA <- x[[vanA]]
|
||||
meet_criteria(vanA, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(vanA) == 1) {
|
||||
vanA <- rep(vanA, NROW(x))
|
||||
}
|
||||
if (is.character(vanB)) {
|
||||
meet_criteria(vanB, is_in = colnames(x), allow_NA = FALSE, has_length = 1)
|
||||
vanB <- x[[vanB]]
|
||||
meet_criteria(vanB, allow_class = "logical", allow_NA = TRUE)
|
||||
} else if (length(vanB) == 1) {
|
||||
vanB <- rep(vanB, NROW(x))
|
||||
}
|
||||
|
||||
info.bak <- info
|
||||
# don't throw info's more than once per call
|
||||
if (isTRUE(info)) {
|
||||
@ -476,7 +536,7 @@ mdro <- function(x = NULL,
|
||||
if (!"AMP" %in% names(cols_ab) && "AMX" %in% names(cols_ab)) {
|
||||
# ampicillin column is missing, but amoxicillin is available
|
||||
if (isTRUE(info)) {
|
||||
message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many EUCAST rules depend on it.")
|
||||
message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many MDRO rules depend on it.")
|
||||
}
|
||||
cols_ab <- c(cols_ab, c(AMP = unname(cols_ab[names(cols_ab) == "AMX"])))
|
||||
}
|
||||
@ -663,6 +723,17 @@ mdro <- function(x = NULL,
|
||||
out[is.na(out)] <- FALSE
|
||||
out
|
||||
}
|
||||
col_values <- function(df, col, return_if_lacking = "") {
|
||||
if (col %in% colnames(df)) {
|
||||
df[[col]]
|
||||
} else {
|
||||
rep(return_if_lacking, NROW(df))
|
||||
}
|
||||
}
|
||||
NA_as_FALSE <- function(x) {
|
||||
x[is.na(x)] <- FALSE
|
||||
x
|
||||
}
|
||||
|
||||
# antibiotic classes
|
||||
# nolint start
|
||||
@ -677,6 +748,10 @@ mdro <- function(x = NULL,
|
||||
|
||||
# helper function for editing the table
|
||||
trans_tbl <- function(to, rows, cols, any_all, reason = NULL) {
|
||||
cols.bak <- cols
|
||||
if (identical(cols, "any")) {
|
||||
cols <- unique(cols_ab)
|
||||
}
|
||||
cols <- cols[!ab_missing(cols)]
|
||||
cols <- cols[!is.na(cols)]
|
||||
if (length(rows) > 0 && length(cols) > 0) {
|
||||
@ -690,7 +765,7 @@ mdro <- function(x = NULL,
|
||||
x[rows, "columns_nonsusceptible"] <<- vapply(
|
||||
FUN.VALUE = character(1),
|
||||
rows,
|
||||
function(row, group_vct = cols) {
|
||||
function(row, group_vct = cols_ab) {
|
||||
cols_nonsus <- vapply(
|
||||
FUN.VALUE = logical(1),
|
||||
x[row, group_vct, drop = FALSE],
|
||||
@ -717,7 +792,7 @@ mdro <- function(x = NULL,
|
||||
rows_affected <- vapply(
|
||||
FUN.VALUE = logical(1),
|
||||
x_transposed,
|
||||
function(y) search_function(y %in% search_result, na.rm = TRUE)
|
||||
function(y) search_function(y %in% search_result, na.rm = TRUE) | identical(cols.bak, "any")
|
||||
)
|
||||
rows_affected <- x[which(rows_affected), "row_number", drop = TRUE]
|
||||
rows_to_change <- rows[rows %in% rows_affected]
|
||||
@ -1449,62 +1524,83 @@ mdro <- function(x = NULL,
|
||||
if (length(ESBLs) > 0) {
|
||||
trans_tbl(
|
||||
2, # positive, unconfirmed
|
||||
which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R"),
|
||||
c(AMX %or% AMP, cephalosporins_3rd),
|
||||
"all",
|
||||
reason = "Enterobacterales: ESBL"
|
||||
rows = which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R" & is.na(esbl)),
|
||||
cols = c(AMX %or% AMP, cephalosporins_3rd),
|
||||
any_all = "all",
|
||||
reason = "Enterobacterales: potential ESBL"
|
||||
)
|
||||
}
|
||||
trans_tbl(
|
||||
3, # positive
|
||||
which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
|
||||
carbapenems_without_imipenem,
|
||||
"any",
|
||||
reason = "Enterobacterales: carbapenem or carbapenemase"
|
||||
rows = which(x$order == "Enterobacterales" & esbl == TRUE),
|
||||
cols = "any",
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales: ESBL"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
|
||||
carbapenems,
|
||||
"any",
|
||||
reason = "Enterobacterales: carbapenem or carbapenemase"
|
||||
rows = which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
|
||||
cols = carbapenems_without_imipenem,
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales: carbapenem resistance"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x[[SXT]] == "R" &
|
||||
rows = which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))),
|
||||
cols = carbapenems,
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales: carbapenem resistance"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
rows = which(x$order == "Enterobacterales" & carbapenemase == TRUE),
|
||||
cols = "any",
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales: carbapenemase"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
rows = which(x[[SXT]] == "R" &
|
||||
(x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
|
||||
(x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") &
|
||||
(x$genus %in% c("Enterobacter", "Providencia") | paste(x$genus, x$species) %in% c("Citrobacter freundii", "Klebsiella aerogenes", "Hafnia alvei", "Morganella morganii"))),
|
||||
c(SXT, aminoglycosides, fluoroquinolones),
|
||||
"any",
|
||||
cols = c(SXT, aminoglycosides, fluoroquinolones),
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x[[SXT]] == "R" &
|
||||
rows = which(x[[SXT]] == "R" &
|
||||
x[[GEN]] == "R" &
|
||||
(x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") &
|
||||
paste(x$genus, x$species) == "Serratia marcescens"),
|
||||
c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones),
|
||||
"any",
|
||||
cols = c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones),
|
||||
any_all = "any",
|
||||
reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol"
|
||||
)
|
||||
|
||||
# Acinetobacter baumannii-calcoaceticus complex
|
||||
trans_tbl(
|
||||
3,
|
||||
which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
|
||||
rows = which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") &
|
||||
(x[[CIP]] == "R" | x[[LVX]] == "R") &
|
||||
x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]),
|
||||
c(aminoglycosides, CIP, LVX),
|
||||
"any",
|
||||
cols = c(aminoglycosides, CIP, LVX),
|
||||
any_all = "any",
|
||||
reason = "A. baumannii-calcoaceticus complex: aminoglycoside + ciprofloxacin or levofloxacin"
|
||||
)
|
||||
trans_tbl(
|
||||
2, # unconfirmed
|
||||
which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]),
|
||||
carbapenems,
|
||||
"any",
|
||||
rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & is.na(carbapenemase)),
|
||||
cols = carbapenems,
|
||||
any_all = "any",
|
||||
reason = "A. baumannii-calcoaceticus complex: potential carbapenemase"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & carbapenemase == TRUE),
|
||||
cols = carbapenems,
|
||||
any_all = "any",
|
||||
reason = "A. baumannii-calcoaceticus complex: carbapenemase"
|
||||
)
|
||||
|
||||
@ -1513,59 +1609,65 @@ mdro <- function(x = NULL,
|
||||
# take pip/tazo if just pip is not available - many labs only test for pip/tazo because of availability on a Vitek card
|
||||
PIP <- TZP
|
||||
}
|
||||
if (!ab_missing(MEM) && !ab_missing(IPM) &&
|
||||
!ab_missing(GEN) && !ab_missing(TOB) &&
|
||||
!ab_missing(CIP) &&
|
||||
!ab_missing(CAZ) &&
|
||||
!ab_missing(PIP)) {
|
||||
x$psae <- 0
|
||||
x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"]
|
||||
x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"]
|
||||
x[which(x[, CIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CIP, drop = TRUE] == "R"), "psae"]
|
||||
x[which(x[, CAZ, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CAZ, drop = TRUE] == "R"), "psae"]
|
||||
x[which(x[, PIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, PIP, drop = TRUE] == "R"), "psae"]
|
||||
} else {
|
||||
x$psae <- 0
|
||||
}
|
||||
x$psae <- 0
|
||||
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, TOB) == "R" | col_values(x, AMK) == "R"), 1, 0)
|
||||
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, IPM) == "R" | col_values(x, MEM) == "R"), 1, 0)
|
||||
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, PIP) == "R"), 1, 0)
|
||||
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CAZ) == "R"), 1, 0)
|
||||
x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CIP) == "R" | col_values(x, NOR) == "R" | col_values(x, LVX) == "R"), 1, 0)
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x$genus == "Pseudomonas" & x$species == "aeruginosa"),
|
||||
c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
|
||||
"all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline"
|
||||
rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa"),
|
||||
cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
|
||||
any_all = "all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline"
|
||||
reason = "P. aeruginosa: at least 3 classes contain R"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3),
|
||||
c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
|
||||
"any", # this is the actual one, changing the ones with x$psae >= 3
|
||||
rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3),
|
||||
cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP),
|
||||
any_all = "any", # this is the actual one, changing the ones with x$psae >= 3
|
||||
reason = "P. aeruginosa: at least 3 classes contain R"
|
||||
)
|
||||
|
||||
# Enterococcus faecium
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x$genus == "Enterococcus" & x$species == "faecium"),
|
||||
c(PEN %or% AMX %or% AMP, VAN),
|
||||
"all",
|
||||
reason = "E. faecium: vancomycin or vanA/vanB gene + penicillin group"
|
||||
rows = which(x$genus == "Enterococcus" & x$species == "faecium"),
|
||||
cols = c(PEN %or% AMX %or% AMP, VAN),
|
||||
any_all = "all",
|
||||
reason = "E. faecium: vancomycin + penicillin group"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
rows = which(x$genus == "Enterococcus" & x$species == "faecium" & (vanA == TRUE | vanB == TRUE)),
|
||||
cols = c(PEN, AMX, AMP, VAN),
|
||||
any_all = "any",
|
||||
reason = "E. faecium: vanA/vanB gene + penicillin group"
|
||||
)
|
||||
|
||||
# Staphylococcus aureus
|
||||
trans_tbl(
|
||||
2,
|
||||
which(x$genus == "Staphylococcus" & x$species == "aureus"),
|
||||
c(PEN, AMX, AMP, FLC, OXA, FOX, FOX1),
|
||||
"any",
|
||||
reason = "S. aureus: MRSA"
|
||||
rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (is.na(mecA) | is.na(mecC))),
|
||||
cols = c(AMC, TZP, FLC, OXA, FOX, FOX1),
|
||||
any_all = "any",
|
||||
reason = "S. aureus: potential MRSA"
|
||||
)
|
||||
trans_tbl(
|
||||
3,
|
||||
rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (mecA == TRUE | mecC == TRUE)),
|
||||
cols = "any",
|
||||
any_all = "any",
|
||||
reason = "S. aureus: mecA/mecC gene"
|
||||
)
|
||||
|
||||
# Candida auris
|
||||
trans_tbl(
|
||||
3,
|
||||
which(x$genus == "Candida" & x$species == "auris"),
|
||||
character(0),
|
||||
"any",
|
||||
rows = which(x$genus == "Candida" & x$species == "auris"),
|
||||
cols = "any",
|
||||
any_all = "any",
|
||||
reason = "C. auris: regardless of resistance"
|
||||
)
|
||||
}
|
||||
@ -2040,50 +2142,51 @@ brmo <- function(x = NULL, only_sir_columns = FALSE, ...) {
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "BRMO", ...)
|
||||
}
|
||||
|
||||
|
||||
#' @rdname mdro
|
||||
#' @export
|
||||
mrgn <- function(x = NULL, only_sir_columns = FALSE, ...) {
|
||||
mrgn <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
|
||||
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
|
||||
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
|
||||
stop_if(
|
||||
"guideline" %in% names(list(...)),
|
||||
"argument `guideline` must not be set since this is a guideline-specific function"
|
||||
)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "MRGN", ...)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "MRGN", ...)
|
||||
}
|
||||
|
||||
#' @rdname mdro
|
||||
#' @export
|
||||
mdr_tb <- function(x = NULL, only_sir_columns = FALSE, ...) {
|
||||
mdr_tb <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
|
||||
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
|
||||
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
|
||||
stop_if(
|
||||
"guideline" %in% names(list(...)),
|
||||
"argument `guideline` must not be set since this is a guideline-specific function"
|
||||
)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "TB", ...)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "TB", ...)
|
||||
}
|
||||
|
||||
#' @rdname mdro
|
||||
#' @export
|
||||
mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, ...) {
|
||||
mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
|
||||
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
|
||||
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
|
||||
stop_if(
|
||||
"guideline" %in% names(list(...)),
|
||||
"argument `guideline` must not be set since this is a guideline-specific function"
|
||||
)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "CMI2012", ...)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "CMI2012", ...)
|
||||
}
|
||||
|
||||
#' @rdname mdro
|
||||
#' @export
|
||||
eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, ...) {
|
||||
eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) {
|
||||
meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE)
|
||||
meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
|
||||
stop_if(
|
||||
"guideline" %in% names(list(...)),
|
||||
"argument `guideline` must not be set since this is a guideline-specific function"
|
||||
)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, guideline = "EUCAST", ...)
|
||||
mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "EUCAST", ...)
|
||||
}
|
||||
|
@ -80,6 +80,9 @@ navbar:
|
||||
- text: "Download Data Sets for Own Use"
|
||||
icon: "fa-database"
|
||||
href: "articles/datasets.html"
|
||||
- text: "Use AMR for Predictive Modelling (tidymodels)"
|
||||
icon: "fa-square-root-variable"
|
||||
href: "articles/AMR_with_tidymodels.html"
|
||||
- text: "Set User- Or Team-specific Package Settings"
|
||||
icon: "fa-gear"
|
||||
href: "reference/AMR-options.html"
|
||||
|
@ -1,5 +1,5 @@
|
||||
This files contains all context you must know about the AMR package for R.
|
||||
First and foremost, you are trained on version 2.1.1.9120. Remember this whenever someone asks which AMR package version you’re at.
|
||||
First and foremost, you are trained on version 2.1.1.9121. Remember this whenever someone asks which AMR package version you’re at.
|
||||
--------------------------------
|
||||
|
||||
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'NAMESPACE':
|
||||
@ -6086,6 +6086,12 @@ mdro(
|
||||
x = NULL,
|
||||
guideline = "CMI2012",
|
||||
col_mo = NULL,
|
||||
esbl = NA,
|
||||
carbapenemase = NA,
|
||||
mecA = NA,
|
||||
mecC = NA,
|
||||
vanA = NA,
|
||||
vanB = NA,
|
||||
info = interactive(),
|
||||
pct_required_classes = 0.5,
|
||||
combine_SI = TRUE,
|
||||
@ -6098,13 +6104,18 @@ custom_mdro_guideline(..., as_factor = TRUE)
|
||||
|
||||
brmo(x = NULL, only_sir_columns = FALSE, ...)
|
||||
|
||||
mrgn(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
mdr_tb(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
|
||||
eucast_exceptional_phenotypes(
|
||||
x = NULL,
|
||||
only_sir_columns = FALSE,
|
||||
verbose = FALSE,
|
||||
...
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.}
|
||||
@ -6113,6 +6124,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
|
||||
|
||||
\item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
|
||||
|
||||
\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)}
|
||||
|
||||
\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)}
|
||||
|
||||
\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)}
|
||||
|
||||
\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)}
|
||||
|
||||
\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)}
|
||||
|
||||
\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)}
|
||||
|
||||
\item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions}
|
||||
|
||||
\item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.}
|
||||
@ -8885,6 +8908,203 @@ Whether you're cleaning data or analysing resistance patterns, the `AMR` Python
|
||||
|
||||
|
||||
|
||||
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/AMR_with_tidymodels.Rmd':
|
||||
|
||||
|
||||
---
|
||||
title: "`AMR` with `tidymodels`"
|
||||
output:
|
||||
rmarkdown::html_vignette:
|
||||
toc: true
|
||||
toc_depth: 3
|
||||
vignette: >
|
||||
%\VignetteIndexEntry{`AMR` with `tidymodels`}
|
||||
%\VignetteEncoding{UTF-8}
|
||||
%\VignetteEngine{knitr::rmarkdown}
|
||||
editor_options:
|
||||
chunk_output_type: console
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
warning = FALSE,
|
||||
collapse = TRUE,
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 5
|
||||
)
|
||||
```
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
|
||||
By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
|
||||
|
||||
---
|
||||
|
||||
### **Objective**
|
||||
|
||||
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will:
|
||||
|
||||
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
|
||||
2. Define a logistic regression model for prediction.
|
||||
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
|
||||
|
||||
---
|
||||
|
||||
### **Data Preparation**
|
||||
|
||||
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
|
||||
|
||||
```{r}
|
||||
# Load required libraries
|
||||
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
|
||||
library(AMR) # For AMR data analysis
|
||||
|
||||
# Load the example_isolates dataset
|
||||
data("example_isolates") # Preloaded dataset with AMR results
|
||||
|
||||
# Select relevant columns for prediction
|
||||
data <- example_isolates %>%
|
||||
# select AB results dynamically
|
||||
select(mo, aminoglycosides(), betalactams()) %>%
|
||||
# replace NAs with NI (not-interpretable)
|
||||
mutate(across(where(is.sir),
|
||||
~replace_na(.x, "NI")),
|
||||
# make factors of SIR columns
|
||||
across(where(is.sir),
|
||||
as.integer),
|
||||
# get Gramstain of microorganisms
|
||||
mo = as.factor(mo_gramstain(mo))) %>%
|
||||
# drop NAs - the ones without a Gramstain (fungi, etc.)
|
||||
drop_na() # %>%
|
||||
# Cefepime is not reliable
|
||||
#select(-FEP)
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
|
||||
- `drop_na()` ensures the model receives complete cases for training.
|
||||
|
||||
---
|
||||
|
||||
### **Defining the Workflow**
|
||||
|
||||
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
|
||||
|
||||
#### 1. Preprocessing with a Recipe
|
||||
|
||||
We create a recipe to preprocess the data for modelling. This includes:
|
||||
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
|
||||
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
|
||||
|
||||
```{r}
|
||||
# Define the recipe for data preprocessing
|
||||
resistance_recipe <- recipe(mo ~ ., data = data) %>%
|
||||
step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
|
||||
resistance_recipe
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
|
||||
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
We define a logistic regression model since resistance prediction is a binary classification task.
|
||||
|
||||
```{r}
|
||||
# Specify a logistic regression model
|
||||
logistic_model <- logistic_reg() %>%
|
||||
set_engine("glm") # Use the Generalized Linear Model engine
|
||||
logistic_model
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `logistic_reg()` sets up a logistic regression model.
|
||||
- `set_engine("glm")` specifies the use of R's built-in GLM engine.
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
|
||||
|
||||
```{r}
|
||||
# Combine the recipe and model into a workflow
|
||||
resistance_workflow <- workflow() %>%
|
||||
add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
|
||||
add_model(logistic_model) # Add the logistic regression model
|
||||
resistance_workflow
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### **Training and Evaluating the Model**
|
||||
|
||||
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
|
||||
|
||||
```{r}
|
||||
# Split data into training and testing sets
|
||||
set.seed(123) # For reproducibility
|
||||
data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
|
||||
training_data <- training(data_split) # Training set
|
||||
testing_data <- testing(data_split) # Testing set
|
||||
|
||||
# Fit the workflow to the training data
|
||||
fitted_workflow <- resistance_workflow %>%
|
||||
fit(training_data) # Train the model
|
||||
|
||||
fitted_workflow
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `initial_split()` splits the data into training and testing sets.
|
||||
- `fit()` trains the workflow on the training set.
|
||||
|
||||
Next, we evaluate the model on the testing data.
|
||||
|
||||
```{r}
|
||||
# Make predictions on the testing set
|
||||
predictions <- fitted_workflow %>%
|
||||
predict(testing_data) # Generate predictions
|
||||
probabilities <- fitted_workflow %>%
|
||||
predict(testing_data, type = "prob") # Generate probabilities
|
||||
|
||||
predictions <- predictions %>%
|
||||
bind_cols(probabilities) %>%
|
||||
bind_cols(testing_data) # Combine with true labels
|
||||
|
||||
predictions
|
||||
|
||||
# Evaluate model performance
|
||||
metrics <- predictions %>%
|
||||
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
|
||||
|
||||
metrics
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `predict()` generates predictions on the testing set.
|
||||
- `metrics()` computes evaluation metrics like accuracy and AUC.
|
||||
|
||||
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like:
|
||||
|
||||
```{r}
|
||||
predictions %>%
|
||||
roc_curve(mo, `.pred_Gram-negative`) %>%
|
||||
autoplot()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### **Conclusion**
|
||||
|
||||
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
|
||||
|
||||
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/EUCAST.Rmd':
|
||||
|
||||
|
@ -45,7 +45,7 @@ expect_identical(class(outcome), c("ordered", "factor"))
|
||||
# example_isolates should have these finding using Dutch guidelines
|
||||
expect_equal(
|
||||
as.double(table(outcome)),
|
||||
c(1994, 0, 6)
|
||||
c(1977, 23, 0)
|
||||
)
|
||||
|
||||
expect_equal(
|
||||
|
31
man/mdro.Rd
31
man/mdro.Rd
@ -23,6 +23,12 @@ mdro(
|
||||
x = NULL,
|
||||
guideline = "CMI2012",
|
||||
col_mo = NULL,
|
||||
esbl = NA,
|
||||
carbapenemase = NA,
|
||||
mecA = NA,
|
||||
mecC = NA,
|
||||
vanA = NA,
|
||||
vanB = NA,
|
||||
info = interactive(),
|
||||
pct_required_classes = 0.5,
|
||||
combine_SI = TRUE,
|
||||
@ -35,13 +41,18 @@ custom_mdro_guideline(..., as_factor = TRUE)
|
||||
|
||||
brmo(x = NULL, only_sir_columns = FALSE, ...)
|
||||
|
||||
mrgn(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
mdr_tb(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...)
|
||||
mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...)
|
||||
|
||||
eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
|
||||
eucast_exceptional_phenotypes(
|
||||
x = NULL,
|
||||
only_sir_columns = FALSE,
|
||||
verbose = FALSE,
|
||||
...
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.}
|
||||
@ -50,6 +61,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...)
|
||||
|
||||
\item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
|
||||
|
||||
\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)}
|
||||
|
||||
\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)}
|
||||
|
||||
\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)}
|
||||
|
||||
\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)}
|
||||
|
||||
\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)}
|
||||
|
||||
\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)}
|
||||
|
||||
\item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions}
|
||||
|
||||
\item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.}
|
||||
|
191
vignettes/AMR_with_tidymodels.Rmd
Normal file
191
vignettes/AMR_with_tidymodels.Rmd
Normal file
@ -0,0 +1,191 @@
|
||||
---
|
||||
title: "`AMR` with `tidymodels`"
|
||||
output:
|
||||
rmarkdown::html_vignette:
|
||||
toc: true
|
||||
toc_depth: 3
|
||||
vignette: >
|
||||
%\VignetteIndexEntry{`AMR` with `tidymodels`}
|
||||
%\VignetteEncoding{UTF-8}
|
||||
%\VignetteEngine{knitr::rmarkdown}
|
||||
editor_options:
|
||||
chunk_output_type: console
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
warning = FALSE,
|
||||
collapse = TRUE,
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 5
|
||||
)
|
||||
```
|
||||
|
||||
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
|
||||
|
||||
By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
|
||||
|
||||
---
|
||||
|
||||
### **Objective**
|
||||
|
||||
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will:
|
||||
|
||||
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
|
||||
2. Define a logistic regression model for prediction.
|
||||
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
|
||||
|
||||
---
|
||||
|
||||
### **Data Preparation**
|
||||
|
||||
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
|
||||
|
||||
```{r}
|
||||
# Load required libraries
|
||||
library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
|
||||
library(AMR) # For AMR data analysis
|
||||
|
||||
# Load the example_isolates dataset
|
||||
data("example_isolates") # Preloaded dataset with AMR results
|
||||
|
||||
# Select relevant columns for prediction
|
||||
data <- example_isolates %>%
|
||||
# select AB results dynamically
|
||||
select(mo, aminoglycosides(), betalactams()) %>%
|
||||
# replace NAs with NI (not-interpretable)
|
||||
mutate(across(where(is.sir),
|
||||
~replace_na(.x, "NI")),
|
||||
# make factors of SIR columns
|
||||
across(where(is.sir),
|
||||
as.integer),
|
||||
# get Gramstain of microorganisms
|
||||
mo = as.factor(mo_gramstain(mo))) %>%
|
||||
# drop NAs - the ones without a Gramstain (fungi, etc.)
|
||||
drop_na() # %>%
|
||||
# Cefepime is not reliable
|
||||
#select(-FEP)
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
|
||||
- `drop_na()` ensures the model receives complete cases for training.
|
||||
|
||||
---
|
||||
|
||||
### **Defining the Workflow**
|
||||
|
||||
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
|
||||
|
||||
#### 1. Preprocessing with a Recipe
|
||||
|
||||
We create a recipe to preprocess the data for modelling. This includes:
|
||||
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
|
||||
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
|
||||
|
||||
```{r}
|
||||
# Define the recipe for data preprocessing
|
||||
resistance_recipe <- recipe(mo ~ ., data = data) %>%
|
||||
step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
|
||||
resistance_recipe
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
|
||||
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model.
|
||||
|
||||
#### 2. Specifying the Model
|
||||
|
||||
We define a logistic regression model since resistance prediction is a binary classification task.
|
||||
|
||||
```{r}
|
||||
# Specify a logistic regression model
|
||||
logistic_model <- logistic_reg() %>%
|
||||
set_engine("glm") # Use the Generalized Linear Model engine
|
||||
logistic_model
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `logistic_reg()` sets up a logistic regression model.
|
||||
- `set_engine("glm")` specifies the use of R's built-in GLM engine.
|
||||
|
||||
#### 3. Building the Workflow
|
||||
|
||||
We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process.
|
||||
|
||||
```{r}
|
||||
# Combine the recipe and model into a workflow
|
||||
resistance_workflow <- workflow() %>%
|
||||
add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
|
||||
add_model(logistic_model) # Add the logistic regression model
|
||||
resistance_workflow
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### **Training and Evaluating the Model**
|
||||
|
||||
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
|
||||
|
||||
```{r}
|
||||
# Split data into training and testing sets
|
||||
set.seed(123) # For reproducibility
|
||||
data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
|
||||
training_data <- training(data_split) # Training set
|
||||
testing_data <- testing(data_split) # Testing set
|
||||
|
||||
# Fit the workflow to the training data
|
||||
fitted_workflow <- resistance_workflow %>%
|
||||
fit(training_data) # Train the model
|
||||
|
||||
fitted_workflow
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `initial_split()` splits the data into training and testing sets.
|
||||
- `fit()` trains the workflow on the training set.
|
||||
|
||||
Next, we evaluate the model on the testing data.
|
||||
|
||||
```{r}
|
||||
# Make predictions on the testing set
|
||||
predictions <- fitted_workflow %>%
|
||||
predict(testing_data) # Generate predictions
|
||||
probabilities <- fitted_workflow %>%
|
||||
predict(testing_data, type = "prob") # Generate probabilities
|
||||
|
||||
predictions <- predictions %>%
|
||||
bind_cols(probabilities) %>%
|
||||
bind_cols(testing_data) # Combine with true labels
|
||||
|
||||
predictions
|
||||
|
||||
# Evaluate model performance
|
||||
metrics <- predictions %>%
|
||||
metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
|
||||
|
||||
metrics
|
||||
```
|
||||
|
||||
**Explanation:**
|
||||
- `predict()` generates predictions on the testing set.
|
||||
- `metrics()` computes evaluation metrics like accuracy and AUC.
|
||||
|
||||
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like:
|
||||
|
||||
```{r}
|
||||
predictions %>%
|
||||
roc_curve(mo, `.pred_Gram-negative`) %>%
|
||||
autoplot()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### **Conclusion**
|
||||
|
||||
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
|
||||
|
||||
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
|
||||
|
||||
---
|
Loading…
Reference in New Issue
Block a user