diff --git a/.Rbuildignore b/.Rbuildignore index 19b952f7..6fd7fa1c 100755 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -25,6 +25,7 @@ ^tests/testthat/_snaps$ ^vignettes/AMR\.Rmd$ ^vignettes/AMR_intro\.png$ +^vignettes/AMR_with_tidymodels\.Rmd$ ^vignettes/benchmarks\.Rmd$ ^vignettes/benchmarks\.Rmd\.not$ ^vignettes/datasets\.Rmd$ diff --git a/DESCRIPTION b/DESCRIPTION index 0a43bb80..696a98db 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 2.1.1.9120 -Date: 2024-12-15 +Version: 2.1.1.9121 +Date: 2024-12-19 Title: Antimicrobial Resistance Data Analysis Description: Functions to simplify and standardise antimicrobial resistance (AMR) data analysis and to work with microbial and antimicrobial properties by @@ -47,6 +47,7 @@ Suggests: rvest, skimr, tibble, + tidymodels, tidyselect, tinytest, vctrs, diff --git a/NEWS.md b/NEWS.md index ae6ba381..0d8ce2f6 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# AMR 2.1.1.9120 +# AMR 2.1.1.9121 *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)* @@ -30,6 +30,8 @@ This package now supports not only tools for AMR data analysis in clinical setti * New function `rescale_mic()`, which allows users to rescale MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but it can be used independently to, for instance, compare equality in MIC distributions by rescaling them to the same range first. * **Support for Python** * While using R for the heavy lifting, [our 'AMR' Python Package](https://pypi.org/project/AMR/) was developed to run the AMR R package natively in Python. The Python package will always have the same version number as the R package, as it is built automatically with every code change. +* **Support for `tidymodels`** + * All antimicrobial selectors (such as `aminoglycosides()` and `betalactams()`) are now supported in `tidymodels` packages such as `recipe` and `parsnip`. See for more info [our tutorial](https://msberends.github.io/AMR/articles/AMR_with_tidymodels.html) on using AMR function for predictive modelling. * **Other** * New function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that belong to that group. @@ -73,7 +75,9 @@ This package now supports not only tools for AMR data analysis in clinical setti * Updated the prevalence calculation to include genera from the World Health Organization's (WHO) Priority Pathogen List * Improved algorithm of `first_isolate()` when using the phenotype-based method, to prioritise records with the highest availability of SIR values * `scale_y_percent()` can now cope with ranges outside the 0-100% range -* Support for new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024) +* MDRO determination (using `mdro()`) + * Implemented the new Dutch national MDRO guideline (SRI-richtlijn BRMO, Nov 2024) + * Added arguments `esbl`, `carbapenemase`, `mecA`, `mecC`, `vanA`, `vanB` to denote column names or logical values indicating presence of these genes (or production of their proteins) ## Other * Greatly improved `vctrs` integration, a Tidyverse package working in the background for many Tidyverse functions. For users, this means that functions such as `dplyr`'s `bind_rows()`, `rowwise()` and `c_across()` are now supported for e.g. columns of class `mic`. Despite this, this `AMR` package is still zero-dependent on any other package, including `dplyr` and `vctrs`. diff --git a/PythonPackage/AMR/AMR.egg-info/PKG-INFO b/PythonPackage/AMR/AMR.egg-info/PKG-INFO index d41c9396..347e5180 100644 --- a/PythonPackage/AMR/AMR.egg-info/PKG-INFO +++ b/PythonPackage/AMR/AMR.egg-info/PKG-INFO @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: AMR -Version: 2.1.1.9120 +Version: 2.1.1.9121 Summary: A Python wrapper for the AMR R package Home-page: https://github.com/msberends/AMR Author: Matthijs Berends diff --git a/PythonPackage/AMR/AMR/functions.py b/PythonPackage/AMR/AMR/functions.py index a2f07d44..25a4e528 100644 --- a/PythonPackage/AMR/AMR/functions.py +++ b/PythonPackage/AMR/AMR/functions.py @@ -441,9 +441,9 @@ def mdr_tb(x = None, *args, **kwargs): def mdr_cmi2012(x = None, *args, **kwargs): """See our website of the R package for the manual: https://msberends.github.io/AMR/index.html""" return convert_to_python(amr_r.mdr_cmi2012(x = None, *args, **kwargs)) -def eucast_exceptional_phenotypes(x = None, *args, **kwargs): +def eucast_exceptional_phenotypes(*args, **kwargs): """See our website of the R package for the manual: https://msberends.github.io/AMR/index.html""" - return convert_to_python(amr_r.eucast_exceptional_phenotypes(x = None, *args, **kwargs)) + return convert_to_python(amr_r.eucast_exceptional_phenotypes(*args, **kwargs)) def mean_amr_distance(x, *args, **kwargs): """See our website of the R package for the manual: https://msberends.github.io/AMR/index.html""" return convert_to_python(amr_r.mean_amr_distance(x, *args, **kwargs)) diff --git a/PythonPackage/AMR/dist/AMR-2.1.1.9120-py3-none-any.whl b/PythonPackage/AMR/dist/AMR-2.1.1.9121-py3-none-any.whl similarity index 75% rename from PythonPackage/AMR/dist/AMR-2.1.1.9120-py3-none-any.whl rename to PythonPackage/AMR/dist/AMR-2.1.1.9121-py3-none-any.whl index 65c72e0d..29d89122 100644 Binary files a/PythonPackage/AMR/dist/AMR-2.1.1.9120-py3-none-any.whl and b/PythonPackage/AMR/dist/AMR-2.1.1.9121-py3-none-any.whl differ diff --git a/PythonPackage/AMR/dist/amr-2.1.1.9120.tar.gz b/PythonPackage/AMR/dist/amr-2.1.1.9120.tar.gz deleted file mode 100644 index fc8f95c6..00000000 Binary files a/PythonPackage/AMR/dist/amr-2.1.1.9120.tar.gz and /dev/null differ diff --git a/PythonPackage/AMR/dist/amr-2.1.1.9121.tar.gz b/PythonPackage/AMR/dist/amr-2.1.1.9121.tar.gz new file mode 100644 index 00000000..e6ad65a7 Binary files /dev/null and b/PythonPackage/AMR/dist/amr-2.1.1.9121.tar.gz differ diff --git a/PythonPackage/AMR/setup.py b/PythonPackage/AMR/setup.py index 1854d100..8ea2c7a8 100644 --- a/PythonPackage/AMR/setup.py +++ b/PythonPackage/AMR/setup.py @@ -2,7 +2,7 @@ from setuptools import setup, find_packages setup( name='AMR', - version='2.1.1.9120', + version='2.1.1.9121', packages=find_packages(), install_requires=[ 'rpy2', diff --git a/R/aa_helper_functions.R b/R/aa_helper_functions.R index 7cea3406..36008387 100644 --- a/R/aa_helper_functions.R +++ b/R/aa_helper_functions.R @@ -988,7 +988,7 @@ get_current_data <- function(arg_name, call) { for (env in frms[which(with_mask)]) { if (is.function(env$mask$current_rows) && (valid_df(env$data) || valid_df(env$`.data`))) { # an element `.data` or `data` (containing all data) and `mask` (containing functions) will be in the environment when using dplyr verbs - # we use their mask$current_rows() to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future + # we use their mask$current_rows() below to get the group rows, since dplyr::cur_data_all() is deprecated and will be removed in the future # e.g. for `example_isolates %>% group_by(ward) %>% mutate(first = first_isolate(.))` if (valid_df(env$data)) { # support for dplyr 1.1.x @@ -1008,6 +1008,9 @@ get_current_data <- function(arg_name, call) { if (valid_df(env$`.data`)) { # an element `.data` will be in the environment when using dplyr::select() return(env$`.data`) + } else if (valid_df(env$data)) { + # an element `data` will be in the environment when using older dplyr versions, or tidymodels + return(env$data) } else if (valid_df(env$xx)) { # an element `xx` will be in the environment for rows + cols in base R, e.g. `example_isolates[c(1:3), carbapenems()]` return(env$xx) diff --git a/R/mdro.R b/R/mdro.R index b7afff1e..eeddb29c 100755 --- a/R/mdro.R +++ b/R/mdro.R @@ -32,6 +32,12 @@ #' Determine which isolates are multidrug-resistant organisms (MDRO) according to international, national, or custom guidelines. #' @param x a [data.frame] with antibiotics columns, like `AMX` or `amox`. Can be left blank for automatic determination. #' @param guideline a specific guideline to follow, see sections *Supported international / national guidelines* and *Using Custom Guidelines* below. When left empty, the publication by Magiorakos *et al.* (see below) will be followed. +#' @param esbl [logical] values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins) +#' @param carbapenemase [logical] values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins) +#' @param mecA [logical] values, or a column name containing logical values, indicating the presence of a *mecA* gene (or production of its proteins) +#' @param mecC [logical] values, or a column name containing logical values, indicating the presence of a *mecC* gene (or production of its proteins) +#' @param vanA [logical] values, or a column name containing logical values, indicating the presence of a *vanA* gene (or production of its proteins) +#' @param vanB [logical] values, or a column name containing logical values, indicating the presence of a *vanB* gene (or production of its proteins) #' @param ... in case of [custom_mdro_guideline()]: a set of rules, see section *Using Custom Guidelines* below. Otherwise: column name of an antibiotic, see section *Antibiotics* below. #' @param as_factor a [logical] to indicate whether the returned value should be an ordered [factor] (`TRUE`, default), or otherwise a [character] vector #' @inheritParams eucast_rules @@ -177,6 +183,12 @@ mdro <- function(x = NULL, guideline = "CMI2012", col_mo = NULL, + esbl = NA, + carbapenemase = NA, + mecA = NA, + mecC = NA, + vanA = NA, + vanB = NA, info = interactive(), pct_required_classes = 0.5, combine_SI = TRUE, @@ -190,9 +202,13 @@ mdro <- function(x = NULL, } meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0 meet_criteria(guideline, allow_class = c("list", "character"), allow_NULL = TRUE) - if (!is.list(guideline)) { - meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE) - } + if (!is.list(guideline)) meet_criteria(guideline, allow_class = "character", has_length = 1, allow_NULL = TRUE) + meet_criteria(esbl, allow_class = c("logical", "character"), allow_NA = TRUE) + meet_criteria(carbapenemase, allow_class = c("logical", "character"), allow_NA = TRUE) + meet_criteria(mecA, allow_class = c("logical", "character"), allow_NA = TRUE) + meet_criteria(mecC, allow_class = c("logical", "character"), allow_NA = TRUE) + meet_criteria(vanA, allow_class = c("logical", "character"), allow_NA = TRUE) + meet_criteria(vanB, allow_class = c("logical", "character"), allow_NA = TRUE) meet_criteria(col_mo, allow_class = "character", has_length = 1, is_in = colnames(x), allow_NULL = TRUE) meet_criteria(info, allow_class = "logical", has_length = 1) meet_criteria(pct_required_classes, allow_class = "numeric", has_length = 1) @@ -203,7 +219,51 @@ mdro <- function(x = NULL, if (!any(is_sir_eligible(x))) { stop_("There were no possible SIR columns found in the data set. Transform columns with `as.sir()` for valid antimicrobial interpretations.") } - + + # get gene values as TRUE/FALSE + if (is.character(esbl)) { + meet_criteria(esbl, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + esbl <- x[[esbl]] + meet_criteria(esbl, allow_class = "logical", allow_NA = TRUE) + } else if (length(esbl) == 1) { + esbl <- rep(esbl, NROW(x)) + } + if (is.character(carbapenemase)) { + meet_criteria(carbapenemase, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + carbapenemase <- x[[carbapenemase]] + meet_criteria(carbapenemase, allow_class = "logical", allow_NA = TRUE) + } else if (length(carbapenemase) == 1) { + carbapenemase <- rep(carbapenemase, NROW(x)) + } + if (is.character(mecA)) { + meet_criteria(mecA, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + mecA <- x[[mecA]] + meet_criteria(mecA, allow_class = "logical", allow_NA = TRUE) + } else if (length(mecA) == 1) { + mecA <- rep(mecA, NROW(x)) + } + if (is.character(mecC)) { + meet_criteria(mecC, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + mecC <- x[[mecC]] + meet_criteria(mecC, allow_class = "logical", allow_NA = TRUE) + } else if (length(mecC) == 1) { + mecC <- rep(mecC, NROW(x)) + } + if (is.character(vanA)) { + meet_criteria(vanA, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + vanA <- x[[vanA]] + meet_criteria(vanA, allow_class = "logical", allow_NA = TRUE) + } else if (length(vanA) == 1) { + vanA <- rep(vanA, NROW(x)) + } + if (is.character(vanB)) { + meet_criteria(vanB, is_in = colnames(x), allow_NA = FALSE, has_length = 1) + vanB <- x[[vanB]] + meet_criteria(vanB, allow_class = "logical", allow_NA = TRUE) + } else if (length(vanB) == 1) { + vanB <- rep(vanB, NROW(x)) + } + info.bak <- info # don't throw info's more than once per call if (isTRUE(info)) { @@ -476,7 +536,7 @@ mdro <- function(x = NULL, if (!"AMP" %in% names(cols_ab) && "AMX" %in% names(cols_ab)) { # ampicillin column is missing, but amoxicillin is available if (isTRUE(info)) { - message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many EUCAST rules depend on it.") + message_("Using column '", cols_ab[names(cols_ab) == "AMX"], "' as input for ampicillin since many MDRO rules depend on it.") } cols_ab <- c(cols_ab, c(AMP = unname(cols_ab[names(cols_ab) == "AMX"]))) } @@ -663,6 +723,17 @@ mdro <- function(x = NULL, out[is.na(out)] <- FALSE out } + col_values <- function(df, col, return_if_lacking = "") { + if (col %in% colnames(df)) { + df[[col]] + } else { + rep(return_if_lacking, NROW(df)) + } + } + NA_as_FALSE <- function(x) { + x[is.na(x)] <- FALSE + x + } # antibiotic classes # nolint start @@ -677,6 +748,10 @@ mdro <- function(x = NULL, # helper function for editing the table trans_tbl <- function(to, rows, cols, any_all, reason = NULL) { + cols.bak <- cols + if (identical(cols, "any")) { + cols <- unique(cols_ab) + } cols <- cols[!ab_missing(cols)] cols <- cols[!is.na(cols)] if (length(rows) > 0 && length(cols) > 0) { @@ -690,7 +765,7 @@ mdro <- function(x = NULL, x[rows, "columns_nonsusceptible"] <<- vapply( FUN.VALUE = character(1), rows, - function(row, group_vct = cols) { + function(row, group_vct = cols_ab) { cols_nonsus <- vapply( FUN.VALUE = logical(1), x[row, group_vct, drop = FALSE], @@ -717,7 +792,7 @@ mdro <- function(x = NULL, rows_affected <- vapply( FUN.VALUE = logical(1), x_transposed, - function(y) search_function(y %in% search_result, na.rm = TRUE) + function(y) search_function(y %in% search_result, na.rm = TRUE) | identical(cols.bak, "any") ) rows_affected <- x[which(rows_affected), "row_number", drop = TRUE] rows_to_change <- rows[rows %in% rows_affected] @@ -1449,62 +1524,83 @@ mdro <- function(x = NULL, if (length(ESBLs) > 0) { trans_tbl( 2, # positive, unconfirmed - which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R"), - c(AMX %or% AMP, cephalosporins_3rd), - "all", - reason = "Enterobacterales: ESBL" + rows = which(x$order == "Enterobacterales" & x[[ESBLs[1]]] == "R" & x[[ESBLs[2]]] == "R" & is.na(esbl)), + cols = c(AMX %or% AMP, cephalosporins_3rd), + any_all = "all", + reason = "Enterobacterales: potential ESBL" ) } trans_tbl( 3, # positive - which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))), - carbapenems_without_imipenem, - "any", - reason = "Enterobacterales: carbapenem or carbapenemase" + rows = which(x$order == "Enterobacterales" & esbl == TRUE), + cols = "any", + any_all = "any", + reason = "Enterobacterales: ESBL" ) trans_tbl( 3, - which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))), - carbapenems, - "any", - reason = "Enterobacterales: carbapenem or carbapenemase" + rows = which(x$order == "Enterobacterales" & (x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))), + cols = carbapenems_without_imipenem, + any_all = "any", + reason = "Enterobacterales: carbapenem resistance" ) trans_tbl( 3, - which(x[[SXT]] == "R" & + rows = which(x$order == "Enterobacterales" & !(x$genus %in% c("Proteus", "Providencia") | paste(x$genus, x$species) %in% c("Serratia marcescens", "Morganella morganii"))), + cols = carbapenems, + any_all = "any", + reason = "Enterobacterales: carbapenem resistance" + ) + trans_tbl( + 3, + rows = which(x$order == "Enterobacterales" & carbapenemase == TRUE), + cols = "any", + any_all = "any", + reason = "Enterobacterales: carbapenemase" + ) + trans_tbl( + 3, + rows = which(x[[SXT]] == "R" & (x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") & (x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") & (x$genus %in% c("Enterobacter", "Providencia") | paste(x$genus, x$species) %in% c("Citrobacter freundii", "Klebsiella aerogenes", "Hafnia alvei", "Morganella morganii"))), - c(SXT, aminoglycosides, fluoroquinolones), - "any", + cols = c(SXT, aminoglycosides, fluoroquinolones), + any_all = "any", reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol" ) trans_tbl( 3, - which(x[[SXT]] == "R" & + rows = which(x[[SXT]] == "R" & x[[GEN]] == "R" & (x[[CIP]] == "R" | x[[NOR]] == "R" | x[[LVX]] == "R") & paste(x$genus, x$species) == "Serratia marcescens"), - c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones), - "any", + cols = c(SXT, aminoglycosides_serratia_marcescens, fluoroquinolones), + any_all = "any", reason = "Enterobacterales group II: aminoglycoside + fluoroquinolone + cotrimoxazol" ) # Acinetobacter baumannii-calcoaceticus complex trans_tbl( 3, - which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") & + rows = which((x[[GEN]] == "R" | x[[TOB]] == "R" | x[[AMK]] == "R") & (x[[CIP]] == "R" | x[[LVX]] == "R") & x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]), - c(aminoglycosides, CIP, LVX), - "any", + cols = c(aminoglycosides, CIP, LVX), + any_all = "any", reason = "A. baumannii-calcoaceticus complex: aminoglycoside + ciprofloxacin or levofloxacin" ) trans_tbl( 2, # unconfirmed - which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"]), - carbapenems, - "any", + rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & is.na(carbapenemase)), + cols = carbapenems, + any_all = "any", + reason = "A. baumannii-calcoaceticus complex: potential carbapenemase" + ) + trans_tbl( + 3, + rows = which(x[[col_mo]] %in% AMR::microorganisms.groups$mo[AMR::microorganisms.groups$mo_group_name == "Acinetobacter baumannii complex"] & carbapenemase == TRUE), + cols = carbapenems, + any_all = "any", reason = "A. baumannii-calcoaceticus complex: carbapenemase" ) @@ -1513,59 +1609,65 @@ mdro <- function(x = NULL, # take pip/tazo if just pip is not available - many labs only test for pip/tazo because of availability on a Vitek card PIP <- TZP } - if (!ab_missing(MEM) && !ab_missing(IPM) && - !ab_missing(GEN) && !ab_missing(TOB) && - !ab_missing(CIP) && - !ab_missing(CAZ) && - !ab_missing(PIP)) { - x$psae <- 0 - x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, MEM, drop = TRUE] == "R" | x[, IPM, drop = TRUE] == "R"), "psae"] - x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, GEN, drop = TRUE] == "R" & x[, TOB, drop = TRUE] == "R"), "psae"] - x[which(x[, CIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CIP, drop = TRUE] == "R"), "psae"] - x[which(x[, CAZ, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, CAZ, drop = TRUE] == "R"), "psae"] - x[which(x[, PIP, drop = TRUE] == "R"), "psae"] <- 1 + x[which(x[, PIP, drop = TRUE] == "R"), "psae"] - } else { - x$psae <- 0 - } + x$psae <- 0 + x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, TOB) == "R" | col_values(x, AMK) == "R"), 1, 0) + x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, IPM) == "R" | col_values(x, MEM) == "R"), 1, 0) + x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, PIP) == "R"), 1, 0) + x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CAZ) == "R"), 1, 0) + x$psae <- x$psae + ifelse(NA_as_FALSE(col_values(x, CIP) == "R" | col_values(x, NOR) == "R" | col_values(x, LVX) == "R"), 1, 0) trans_tbl( 3, - which(x$genus == "Pseudomonas" & x$species == "aeruginosa"), - c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP), - "all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline" + rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa"), + cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP), + any_all = "all", # this will set all negatives to "guideline criteria not met" instead of "not covered by guideline" reason = "P. aeruginosa: at least 3 classes contain R" ) trans_tbl( 3, - which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3), - c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP), - "any", # this is the actual one, changing the ones with x$psae >= 3 + rows = which(x$genus == "Pseudomonas" & x$species == "aeruginosa" & x$psae >= 3), + cols = c(CAZ, CIP, GEN, IPM, MEM, TOB, PIP), + any_all = "any", # this is the actual one, changing the ones with x$psae >= 3 reason = "P. aeruginosa: at least 3 classes contain R" ) # Enterococcus faecium trans_tbl( 3, - which(x$genus == "Enterococcus" & x$species == "faecium"), - c(PEN %or% AMX %or% AMP, VAN), - "all", - reason = "E. faecium: vancomycin or vanA/vanB gene + penicillin group" + rows = which(x$genus == "Enterococcus" & x$species == "faecium"), + cols = c(PEN %or% AMX %or% AMP, VAN), + any_all = "all", + reason = "E. faecium: vancomycin + penicillin group" + ) + trans_tbl( + 3, + rows = which(x$genus == "Enterococcus" & x$species == "faecium" & (vanA == TRUE | vanB == TRUE)), + cols = c(PEN, AMX, AMP, VAN), + any_all = "any", + reason = "E. faecium: vanA/vanB gene + penicillin group" ) # Staphylococcus aureus trans_tbl( 2, - which(x$genus == "Staphylococcus" & x$species == "aureus"), - c(PEN, AMX, AMP, FLC, OXA, FOX, FOX1), - "any", - reason = "S. aureus: MRSA" + rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (is.na(mecA) | is.na(mecC))), + cols = c(AMC, TZP, FLC, OXA, FOX, FOX1), + any_all = "any", + reason = "S. aureus: potential MRSA" + ) + trans_tbl( + 3, + rows = which(x$genus == "Staphylococcus" & x$species == "aureus" & (mecA == TRUE | mecC == TRUE)), + cols = "any", + any_all = "any", + reason = "S. aureus: mecA/mecC gene" ) # Candida auris trans_tbl( 3, - which(x$genus == "Candida" & x$species == "auris"), - character(0), - "any", + rows = which(x$genus == "Candida" & x$species == "auris"), + cols = "any", + any_all = "any", reason = "C. auris: regardless of resistance" ) } @@ -2040,50 +2142,51 @@ brmo <- function(x = NULL, only_sir_columns = FALSE, ...) { mdro(x = x, only_sir_columns = only_sir_columns, guideline = "BRMO", ...) } + #' @rdname mdro #' @export -mrgn <- function(x = NULL, only_sir_columns = FALSE, ...) { +mrgn <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) { meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE) meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1) stop_if( "guideline" %in% names(list(...)), "argument `guideline` must not be set since this is a guideline-specific function" ) - mdro(x = x, only_sir_columns = only_sir_columns, guideline = "MRGN", ...) + mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "MRGN", ...) } #' @rdname mdro #' @export -mdr_tb <- function(x = NULL, only_sir_columns = FALSE, ...) { +mdr_tb <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) { meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE) meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1) stop_if( "guideline" %in% names(list(...)), "argument `guideline` must not be set since this is a guideline-specific function" ) - mdro(x = x, only_sir_columns = only_sir_columns, guideline = "TB", ...) + mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "TB", ...) } #' @rdname mdro #' @export -mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, ...) { +mdr_cmi2012 <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) { meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE) meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1) stop_if( "guideline" %in% names(list(...)), "argument `guideline` must not be set since this is a guideline-specific function" ) - mdro(x = x, only_sir_columns = only_sir_columns, guideline = "CMI2012", ...) + mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "CMI2012", ...) } #' @rdname mdro #' @export -eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, ...) { +eucast_exceptional_phenotypes <- function(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) { meet_criteria(x, allow_class = "data.frame", allow_NULL = TRUE) meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1) stop_if( "guideline" %in% names(list(...)), "argument `guideline` must not be set since this is a guideline-specific function" ) - mdro(x = x, only_sir_columns = only_sir_columns, guideline = "EUCAST", ...) + mdro(x = x, only_sir_columns = only_sir_columns, verbose = verbose, guideline = "EUCAST", ...) } diff --git a/_pkgdown.yml b/_pkgdown.yml index 47ef7619..7bb3f42f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -80,6 +80,9 @@ navbar: - text: "Download Data Sets for Own Use" icon: "fa-database" href: "articles/datasets.html" + - text: "Use AMR for Predictive Modelling (tidymodels)" + icon: "fa-square-root-variable" + href: "articles/AMR_with_tidymodels.html" - text: "Set User- Or Team-specific Package Settings" icon: "fa-gear" href: "reference/AMR-options.html" diff --git a/data-raw/gpt_training_text_v2.1.1.9120.txt b/data-raw/gpt_training_text_v2.1.1.9121.txt similarity index 98% rename from data-raw/gpt_training_text_v2.1.1.9120.txt rename to data-raw/gpt_training_text_v2.1.1.9121.txt index 6ae8f2a0..17f9b3e3 100644 --- a/data-raw/gpt_training_text_v2.1.1.9120.txt +++ b/data-raw/gpt_training_text_v2.1.1.9121.txt @@ -1,5 +1,5 @@ This files contains all context you must know about the AMR package for R. -First and foremost, you are trained on version 2.1.1.9120. Remember this whenever someone asks which AMR package version you’re at. +First and foremost, you are trained on version 2.1.1.9121. Remember this whenever someone asks which AMR package version you’re at. -------------------------------- THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'NAMESPACE': @@ -6086,6 +6086,12 @@ mdro( x = NULL, guideline = "CMI2012", col_mo = NULL, + esbl = NA, + carbapenemase = NA, + mecA = NA, + mecC = NA, + vanA = NA, + vanB = NA, info = interactive(), pct_required_classes = 0.5, combine_SI = TRUE, @@ -6098,13 +6104,18 @@ custom_mdro_guideline(..., as_factor = TRUE) brmo(x = NULL, only_sir_columns = FALSE, ...) -mrgn(x = NULL, only_sir_columns = FALSE, ...) +mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -mdr_tb(x = NULL, only_sir_columns = FALSE, ...) +mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...) +mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...) +eucast_exceptional_phenotypes( + x = NULL, + only_sir_columns = FALSE, + verbose = FALSE, + ... +) } \arguments{ \item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.} @@ -6113,6 +6124,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...) \item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.} +\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)} + +\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)} + +\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)} + +\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)} + +\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)} + +\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)} + \item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions} \item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.} @@ -8885,6 +8908,203 @@ Whether you're cleaning data or analysing resistance patterns, the `AMR` Python +THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/AMR_with_tidymodels.Rmd': + + +--- +title: "`AMR` with `tidymodels`" +output: + rmarkdown::html_vignette: + toc: true + toc_depth: 3 +vignette: > + %\VignetteIndexEntry{`AMR` with `tidymodels`} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r setup, include = FALSE, results = 'markup'} +knitr::opts_chunk$set( + warning = FALSE, + collapse = TRUE, + comment = "#>", + fig.width = 7.5, + fig.height = 5 +) +``` + +Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset. + +By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams. + +--- + +### **Objective** + +Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will: + +1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`. +2. Define a logistic regression model for prediction. +3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model. + +--- + +### **Data Preparation** + +We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package. + +```{r} +# Load required libraries +library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...) +library(AMR) # For AMR data analysis + +# Load the example_isolates dataset +data("example_isolates") # Preloaded dataset with AMR results + +# Select relevant columns for prediction +data <- example_isolates %>% + # select AB results dynamically + select(mo, aminoglycosides(), betalactams()) %>% + # replace NAs with NI (not-interpretable) + mutate(across(where(is.sir), + ~replace_na(.x, "NI")), + # make factors of SIR columns + across(where(is.sir), + as.integer), + # get Gramstain of microorganisms + mo = as.factor(mo_gramstain(mo))) %>% + # drop NAs - the ones without a Gramstain (fungi, etc.) + drop_na() # %>% + # Cefepime is not reliable + #select(-FEP) +``` + +**Explanation:** +- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes. +- `drop_na()` ensures the model receives complete cases for training. + +--- + +### **Defining the Workflow** + +We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting. + +#### 1. Preprocessing with a Recipe + +We create a recipe to preprocess the data for modelling. This includes: +- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant). +- Converting microbial organism names (`mo`) into numerical features using one-hot encoding. + +```{r} +# Define the recipe for data preprocessing +resistance_recipe <- recipe(mo ~ ., data = data) %>% + step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9) +resistance_recipe +``` + +**Explanation:** +- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE). +- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model. + +#### 2. Specifying the Model + +We define a logistic regression model since resistance prediction is a binary classification task. + +```{r} +# Specify a logistic regression model +logistic_model <- logistic_reg() %>% + set_engine("glm") # Use the Generalized Linear Model engine +logistic_model +``` + +**Explanation:** +- `logistic_reg()` sets up a logistic regression model. +- `set_engine("glm")` specifies the use of R's built-in GLM engine. + +#### 3. Building the Workflow + +We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process. + +```{r} +# Combine the recipe and model into a workflow +resistance_workflow <- workflow() %>% + add_recipe(resistance_recipe) %>% # Add the preprocessing recipe + add_model(logistic_model) # Add the logistic regression model +resistance_workflow +``` + +--- + +### **Training and Evaluating the Model** + +To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance. + +```{r} +# Split data into training and testing sets +set.seed(123) # For reproducibility +data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing +training_data <- training(data_split) # Training set +testing_data <- testing(data_split) # Testing set + +# Fit the workflow to the training data +fitted_workflow <- resistance_workflow %>% + fit(training_data) # Train the model + +fitted_workflow +``` + +**Explanation:** +- `initial_split()` splits the data into training and testing sets. +- `fit()` trains the workflow on the training set. + +Next, we evaluate the model on the testing data. + +```{r} +# Make predictions on the testing set +predictions <- fitted_workflow %>% + predict(testing_data) # Generate predictions +probabilities <- fitted_workflow %>% + predict(testing_data, type = "prob") # Generate probabilities + +predictions <- predictions %>% + bind_cols(probabilities) %>% + bind_cols(testing_data) # Combine with true labels + +predictions + +# Evaluate model performance +metrics <- predictions %>% + metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics + +metrics +``` + +**Explanation:** +- `predict()` generates predictions on the testing set. +- `metrics()` computes evaluation metrics like accuracy and AUC. + +It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like: + +```{r} +predictions %>% + roc_curve(mo, `.pred_Gram-negative`) %>% + autoplot() +``` + +--- + +### **Conclusion** + +In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance. + +This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly. + +--- + + + THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/EUCAST.Rmd': diff --git a/inst/tinytest/test-mdro.R b/inst/tinytest/test-mdro.R index 8dda697a..8abaaae3 100755 --- a/inst/tinytest/test-mdro.R +++ b/inst/tinytest/test-mdro.R @@ -45,7 +45,7 @@ expect_identical(class(outcome), c("ordered", "factor")) # example_isolates should have these finding using Dutch guidelines expect_equal( as.double(table(outcome)), - c(1994, 0, 6) + c(1977, 23, 0) ) expect_equal( diff --git a/man/mdro.Rd b/man/mdro.Rd index 7a9b8357..5e112647 100644 --- a/man/mdro.Rd +++ b/man/mdro.Rd @@ -23,6 +23,12 @@ mdro( x = NULL, guideline = "CMI2012", col_mo = NULL, + esbl = NA, + carbapenemase = NA, + mecA = NA, + mecC = NA, + vanA = NA, + vanB = NA, info = interactive(), pct_required_classes = 0.5, combine_SI = TRUE, @@ -35,13 +41,18 @@ custom_mdro_guideline(..., as_factor = TRUE) brmo(x = NULL, only_sir_columns = FALSE, ...) -mrgn(x = NULL, only_sir_columns = FALSE, ...) +mrgn(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -mdr_tb(x = NULL, only_sir_columns = FALSE, ...) +mdr_tb(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -mdr_cmi2012(x = NULL, only_sir_columns = FALSE, ...) +mdr_cmi2012(x = NULL, only_sir_columns = FALSE, verbose = FALSE, ...) -eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...) +eucast_exceptional_phenotypes( + x = NULL, + only_sir_columns = FALSE, + verbose = FALSE, + ... +) } \arguments{ \item{x}{a \link{data.frame} with antibiotics columns, like \code{AMX} or \code{amox}. Can be left blank for automatic determination.} @@ -50,6 +61,18 @@ eucast_exceptional_phenotypes(x = NULL, only_sir_columns = FALSE, ...) \item{col_mo}{column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.} +\item{esbl}{\link{logical} values, or a column name containing logical values, indicating the presence of an ESBL gene (or production of its proteins)} + +\item{carbapenemase}{\link{logical} values, or a column name containing logical values, indicating the presence of a carbapenemase gene (or production of its proteins)} + +\item{mecA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecA} gene (or production of its proteins)} + +\item{mecC}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{mecC} gene (or production of its proteins)} + +\item{vanA}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanA} gene (or production of its proteins)} + +\item{vanB}{\link{logical} values, or a column name containing logical values, indicating the presence of a \emph{vanB} gene (or production of its proteins)} + \item{info}{a \link{logical} to indicate whether progress should be printed to the console - the default is only print while in interactive sessions} \item{pct_required_classes}{minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for \emph{S. aureus}. Setting this \code{pct_required_classes} argument to \code{0.5} (default) means that for every \emph{S. aureus} isolate at least 8 different classes must be available. Any lower number of available classes will return \code{NA} for that isolate.} diff --git a/vignettes/AMR_with_tidymodels.Rmd b/vignettes/AMR_with_tidymodels.Rmd new file mode 100644 index 00000000..e3edd17e --- /dev/null +++ b/vignettes/AMR_with_tidymodels.Rmd @@ -0,0 +1,191 @@ +--- +title: "`AMR` with `tidymodels`" +output: + rmarkdown::html_vignette: + toc: true + toc_depth: 3 +vignette: > + %\VignetteIndexEntry{`AMR` with `tidymodels`} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r setup, include = FALSE, results = 'markup'} +knitr::opts_chunk$set( + warning = FALSE, + collapse = TRUE, + comment = "#>", + fig.width = 7.5, + fig.height = 5 +) +``` + +Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset. + +By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams. + +--- + +### **Objective** + +Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will: + +1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`. +2. Define a logistic regression model for prediction. +3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model. + +--- + +### **Data Preparation** + +We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package. + +```{r} +# Load required libraries +library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...) +library(AMR) # For AMR data analysis + +# Load the example_isolates dataset +data("example_isolates") # Preloaded dataset with AMR results + +# Select relevant columns for prediction +data <- example_isolates %>% + # select AB results dynamically + select(mo, aminoglycosides(), betalactams()) %>% + # replace NAs with NI (not-interpretable) + mutate(across(where(is.sir), + ~replace_na(.x, "NI")), + # make factors of SIR columns + across(where(is.sir), + as.integer), + # get Gramstain of microorganisms + mo = as.factor(mo_gramstain(mo))) %>% + # drop NAs - the ones without a Gramstain (fungi, etc.) + drop_na() # %>% + # Cefepime is not reliable + #select(-FEP) +``` + +**Explanation:** +- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes. +- `drop_na()` ensures the model receives complete cases for training. + +--- + +### **Defining the Workflow** + +We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting. + +#### 1. Preprocessing with a Recipe + +We create a recipe to preprocess the data for modelling. This includes: +- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant). +- Converting microbial organism names (`mo`) into numerical features using one-hot encoding. + +```{r} +# Define the recipe for data preprocessing +resistance_recipe <- recipe(mo ~ ., data = data) %>% + step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9) +resistance_recipe +``` + +**Explanation:** +- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE). +- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model. + +#### 2. Specifying the Model + +We define a logistic regression model since resistance prediction is a binary classification task. + +```{r} +# Specify a logistic regression model +logistic_model <- logistic_reg() %>% + set_engine("glm") # Use the Generalized Linear Model engine +logistic_model +``` + +**Explanation:** +- `logistic_reg()` sets up a logistic regression model. +- `set_engine("glm")` specifies the use of R's built-in GLM engine. + +#### 3. Building the Workflow + +We bundle the recipe and model together into a `workflow`, which organizes the entire modeling process. + +```{r} +# Combine the recipe and model into a workflow +resistance_workflow <- workflow() %>% + add_recipe(resistance_recipe) %>% # Add the preprocessing recipe + add_model(logistic_model) # Add the logistic regression model +resistance_workflow +``` + +--- + +### **Training and Evaluating the Model** + +To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance. + +```{r} +# Split data into training and testing sets +set.seed(123) # For reproducibility +data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing +training_data <- training(data_split) # Training set +testing_data <- testing(data_split) # Testing set + +# Fit the workflow to the training data +fitted_workflow <- resistance_workflow %>% + fit(training_data) # Train the model + +fitted_workflow +``` + +**Explanation:** +- `initial_split()` splits the data into training and testing sets. +- `fit()` trains the workflow on the training set. + +Next, we evaluate the model on the testing data. + +```{r} +# Make predictions on the testing set +predictions <- fitted_workflow %>% + predict(testing_data) # Generate predictions +probabilities <- fitted_workflow %>% + predict(testing_data, type = "prob") # Generate probabilities + +predictions <- predictions %>% + bind_cols(probabilities) %>% + bind_cols(testing_data) # Combine with true labels + +predictions + +# Evaluate model performance +metrics <- predictions %>% + metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics + +metrics +``` + +**Explanation:** +- `predict()` generates predictions on the testing set. +- `metrics()` computes evaluation metrics like accuracy and AUC. + +It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like: + +```{r} +predictions %>% + roc_curve(mo, `.pred_Gram-negative`) %>% + autoplot() +``` + +--- + +### **Conclusion** + +In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance. + +This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly. + +---