From 7c1b56464871ad6bd727d0e47e6d8654340dfab3 Mon Sep 17 00:00:00 2001 From: Matthijs Berends Date: Fri, 31 May 2024 21:24:35 +0200 Subject: [PATCH] fix SIR interpretation for uti --- DESCRIPTION | 2 +- NEWS.md | 4 +-- R/sir.R | 82 +++++++++++++++++++++++++++++++++++---------------- man/as.sir.Rd | 18 ++++++----- 4 files changed, 70 insertions(+), 36 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index fe34c84a..a3758fa2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: AMR -Version: 2.1.1.9036 +Version: 2.1.1.9037 Date: 2024-05-31 Title: Antimicrobial Resistance Data Analysis Description: Functions to simplify and standardise antimicrobial resistance (AMR) diff --git a/NEWS.md b/NEWS.md index bf4f1220..08b4f064 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# AMR 2.1.1.9036 +# AMR 2.1.1.9037 *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support!)* @@ -21,7 +21,7 @@ This package now supports not only tools for AMR data analysis in clinical setti * Function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that are in that group. ## Changed -* For SIR interpretation, it is now possible to use column names for argument `ab` and `mo`: `as.sir(..., ab = "column1", mo = "column2")`. This greatly improves the flexibility for users. +* For SIR interpretation, it is now possible to use column names for argument `ab`, `mo`, and `uti`: `as.sir(..., ab = "column1", mo = "column2", uti = "column3")`. This greatly improves the flexibility for users. * For MICs: * Added as valid levels: 4096, 6 powers of 0.0625, and 5 powers of 192 (192, 384, 576, 768, 960) * Added new argument `keep_operators` to `as.mic()`. This can be `"all"` (default), `"none"`, or `"edges"`. This argument is also available in the new `rescale_mic()` and `scale_*_mic()` functions. diff --git a/R/sir.R b/R/sir.R index 835c6c51..e9a34c3b 100755 --- a/R/sir.R +++ b/R/sir.R @@ -31,17 +31,17 @@ #' #' @description Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. [as.sir()] transforms the input to a new class [`sir`], which is an ordered [factor]. #' -#' These breakpoints are currently available: -#' - For **clinical microbiology** from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`; -#' - For **veterinary microbiology** from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`; -#' - ECOFFs (Epidemiological cut-off values) from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`. +#' These breakpoints are currently implemented: +#' - For **clinical microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`; +#' - For **veterinary microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`; +#' - ECOFFs (Epidemiological cut-off values): EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`. #' #' All breakpoints used for interpretation are available in our [clinical_breakpoints] data set. #' @rdname as.sir #' @param x vector of values (for class [`mic`]: MIC values in mg/L, for class [`disk`]: a disk diffusion radius in millimetres) #' @param mo a vector (or column name) with [character]s that can be coerced to valid microorganism codes with [as.mo()], can be left empty to determine it automatically #' @param ab a vector (or column name) with [character]s that can be coerced to a valid antimicrobial drug code with [as.ab()] -#' @param uti (Urinary Tract Infection) A vector with [logical]s (`TRUE` or `FALSE`) to specify whether a UTI specific interpretation from the guideline should be chosen. For using [as.sir()] on a [data.frame], this can also be a column containing [logical]s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See *Examples*. +#' @param uti (Urinary Tract Infection) a vector (or column name) with [logical]s (`TRUE` or `FALSE`) to specify whether a UTI specific interpretation from the guideline should be chosen. For using [as.sir()] on a [data.frame], this can also be a column containing [logical]s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See *Examples*. #' @inheritParams first_isolate #' @param guideline defaults to EUCAST `r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))` (the latest implemented EUCAST guideline in the [AMR::clinical_breakpoints] data set), but can be set with the [package option][AMR-options] [`AMR_guideline`][AMR-options]. Currently supports EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`) and CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI")$guideline)))`), see *Details*. #' @param conserve_capped_values a [logical] to indicate that MIC values starting with `">"` (but not `">="`) must always return "R" , and that MIC values starting with `"<"` (but not `"<="`) must always return "S" @@ -64,9 +64,11 @@ #' #' 2. For **interpreting minimum inhibitory concentration (MIC) values** according to EUCAST or CLSI. You must clean your MIC values first using [as.mic()], that also gives your columns the new data class [`mic`]. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the `mo` argument. #' * Using `dplyr`, SIR interpretation can be done very easily with either: -#' ``` +#' ```r #' your_data %>% mutate_if(is.mic, as.sir) #' your_data %>% mutate(across(where(is.mic), as.sir)) +#' your_data %>% mutate_if(is.mic, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms") +#' your_data %>% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...)) #' #' # for veterinary breakpoints, also set `host`: #' your_data %>% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", guideline = "CLSI") @@ -74,9 +76,11 @@ #' * Operators like "<=" will be stripped before interpretation. When using `conserve_capped_values = TRUE`, an MIC value of e.g. ">2" will always return "R", even if the breakpoint according to the chosen guideline is ">=4". This is to prevent that capped values from raw laboratory data would not be treated conservatively. The default behaviour (`conserve_capped_values = FALSE`) considers ">2" to be lower than ">=4" and might in this case return "S" or "I". #' 3. For **interpreting disk diffusion diameters** according to EUCAST or CLSI. You must clean your disk zones first using [as.disk()], that also gives your columns the new data class [`disk`]. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the `mo` argument. #' * Using `dplyr`, SIR interpretation can be done very easily with either: -#' ``` +#' ```r #' your_data %>% mutate_if(is.disk, as.sir) #' your_data %>% mutate(across(where(is.disk), as.sir)) +#' your_data %>% mutate_if(is.disk, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms") +#' your_data %>% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...)) #' #' # for veterinary breakpoints, also set `host`: #' your_data %>% mutate_if(is.disk, as.sir, host = "column_with_animal_hosts", guideline = "CLSI") @@ -543,7 +547,7 @@ as.sir.data.frame <- function(x, } if (!is.null(col_uti)) { if (is.logical(col_uti)) { - # already a [logical] vector as input + # already a logical vector as input if (length(col_uti) == 1) { uti <- rep(col_uti, NROW(x)) } else { @@ -763,10 +767,10 @@ as_sir_method <- function(method_short, host, ...) { meet_criteria(x, allow_NA = TRUE, .call_depth = -2) - meet_criteria(mo, allow_class = c("mo", "character"), allow_NULL = TRUE, .call_depth = -2) - meet_criteria(ab, allow_class = c("ab", "character"), .call_depth = -2) + meet_criteria(mo, allow_class = c("mo", "character"), has_length = c(1, length(x)), allow_NULL = TRUE, .call_depth = -2) + meet_criteria(ab, allow_class = c("ab", "character"), has_length = c(1, length(x)), .call_depth = -2) meet_criteria(guideline, allow_class = "character", has_length = 1, .call_depth = -2) - meet_criteria(uti, allow_class = "logical", has_length = c(1, length(x)), allow_NULL = TRUE, allow_NA = TRUE, .call_depth = -2) + meet_criteria(uti, allow_class = c("logical", "character"), has_length = c(1, length(x)), allow_NULL = TRUE, allow_NA = TRUE, .call_depth = -2) meet_criteria(conserve_capped_values, allow_class = "logical", has_length = 1, .call_depth = -2) meet_criteria(add_intrinsic_resistance, allow_class = "logical", has_length = 1, .call_depth = -2) meet_criteria(reference_data, allow_class = "data.frame", .call_depth = -2) @@ -816,7 +820,7 @@ as_sir_method <- function(method_short, # get ab if (!is.null(current_df) && length(ab) == 1 && ab %in% colnames(current_df) && any(current_df[[ab]] %like% "[A-Z]", na.rm = TRUE)) { ab <- current_df[[ab]] - } else { + } else if (length(ab) != length(x)) { # for dplyr's across() cur_column_dplyr <- import_fn("cur_column", "dplyr", error_on_fail = FALSE) if (!is.null(cur_column_dplyr) && is.data.frame(current_df)) { @@ -831,7 +835,7 @@ as_sir_method <- function(method_short, if (!is.null(current_df) && length(mo) == 1 && mo %in% colnames(current_df)) { mo_var_found <- paste0(" based on column '", font_bold(mo), "'") mo <- current_df[[mo]] - } else { + } else if (length(mo) != length(x)) { mo_var_found <- "" if (is.null(mo)) { tryCatch( @@ -840,7 +844,7 @@ as_sir_method <- function(method_short, mo <- NULL try( { - mo <- suppressMessages(search_type_in_df(df, "mo")) + mo <- suppressMessages(search_type_in_df(df, "mo", add_col_prefix = FALSE)) }, silent = TRUE ) @@ -862,6 +866,32 @@ as_sir_method <- function(method_short, call = FALSE ) } + + # get uti + if (!is.null(current_df) && length(uti) == 1 && uti %in% colnames(current_df)) { + uti <- current_df[[uti]] + } else if (length(uti) != length(x)) { + if (is.null(uti)) { + tryCatch( + { + df <- get_current_data(arg_name = "uti", call = -3) # will return an error if not found + uti <- NULL + try( + { + uti <- suppressMessages(search_type_in_df(df, "uti", add_col_prefix = FALSE)) + }, + silent = TRUE + ) + if (!is.null(df) && !is.null(uti) && is.data.frame(df)) { + uti <- df[, uti, drop = TRUE] + } + }, + error = function(e) { + uti <- NULL + } + ) + } + } if (length(ab) == 1 && ab %like% paste0("as.", method_short)) { stop_("No unambiguous name was supplied about the antibiotic (argument `ab`). See ?as.sir.", call = FALSE) @@ -899,6 +929,7 @@ as_sir_method <- function(method_short, if (length(uti) == 1) { uti <- rep(uti, length(x)) } + uti[is.na(uti)] <- FALSE if (isTRUE(add_intrinsic_resistance) && guideline_coerced %unlike% "EUCAST") { if (message_not_thrown_before("as.sir", "intrinsic")) { warning_("in `as.sir()`: using 'add_intrinsic_resistance' is only useful when using EUCAST guidelines, since the rules for intrinsic resistance are based on EUCAST.") @@ -918,7 +949,7 @@ as_sir_method <- function(method_short, ")") # this intro text will also be printed in the progress bar if the `progress` package is installed intro_txt <- paste0("Interpreting ", method_long, ": ", ifelse(isTRUE(list(...)$is_data.frame), "column ", ""), - ifelse(length(unique(agent_formatted)) == 1, unique(agent_formatted), paste0("for ", vector_and(ab, quotes = FALSE))), + ifelse(length(unique(agent_formatted)) == 1, unique(agent_formatted), paste0("for ", vector_and(ab, quotes = FALSE, sort = FALSE))), mo_var_found, ifelse(identical(reference_data, AMR::clinical_breakpoints), paste0(", ", font_bold(guideline_coerced)), @@ -1026,7 +1057,7 @@ as_sir_method <- function(method_short, ab_current <- df_unique[i, "ab", drop = TRUE] host_current <- df_unique[i, "host", drop = TRUE] uti_current <- df_unique[i, "uti", drop = TRUE] - if (is.na(uti_current)) { + if (isFALSE(uti_current)) { # no preference, so no filter on UTIs rows <- which(df$mo == mo_current & df$ab == ab_current & df$host == host_current) } else { @@ -1073,21 +1104,20 @@ as_sir_method <- function(method_short, breakpoints_current$host_index <- match(breakpoints_current$host, c("human", "ECOFF", AMR_env$host_preferred_order)) # sort on host and taxonomic rank - # (this will prefer species breakpoints over order breakpoints) - if (is.na(unique(uti_current))) { + # (this will e.g. prefer 'species' breakpoints over 'order' breakpoints) + if (all(uti_current == FALSE, na.rm = TRUE)) { breakpoints_current <- breakpoints_current %pm>% - # this will put UTI = FALSE first, then UTI = TRUE, then UTI = NA - pm_arrange(host_index, rank_index, uti) # 'uti' is a column in data set 'clinical_breakpoints' - } else if (unique(uti_current) == TRUE) { + # this will put UTI = FALSE first, then UTI = NA, then UTI = TRUE + pm_mutate(uti_index = ifelse(uti == FALSE, 1, + ifelse(is.na(uti), 2, + 3))) %pm>% + # be as specific as possible (i.e. prefer species over genus): + pm_arrange(host_index, rank_index, uti_index) + } else if (all(uti_current == TRUE, na.rm = TRUE)) { breakpoints_current <- breakpoints_current %pm>% subset(uti == TRUE) %pm>% # be as specific as possible (i.e. prefer species over genus): pm_arrange(host_index, rank_index) - } else if (unique(uti_current) == FALSE) { - breakpoints_current <- breakpoints_current %pm>% - subset(uti == FALSE) %pm>% - # be as specific as possible (i.e. prefer species over genus): - pm_arrange(host_index, rank_index) } if (NROW(breakpoints_current) == 0) { diff --git a/man/as.sir.Rd b/man/as.sir.Rd index 1a02d1b5..39270a3e 100644 --- a/man/as.sir.Rd +++ b/man/as.sir.Rd @@ -104,7 +104,7 @@ sir_interpretation_history(clean = FALSE) \item{guideline}{defaults to EUCAST 2023 (the latest implemented EUCAST guideline in the \link{clinical_breakpoints} data set), but can be set with the \link[=AMR-options]{package option} \code{\link[=AMR-options]{AMR_guideline}}. Currently supports EUCAST (2011-2023) and CLSI (2011-2023), see \emph{Details}.} -\item{uti}{(Urinary Tract Infection) A vector with \link{logical}s (\code{TRUE} or \code{FALSE}) to specify whether a UTI specific interpretation from the guideline should be chosen. For using \code{\link[=as.sir]{as.sir()}} on a \link{data.frame}, this can also be a column containing \link{logical}s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See \emph{Examples}.} +\item{uti}{(Urinary Tract Infection) a vector (or column name) with \link{logical}s (\code{TRUE} or \code{FALSE}) to specify whether a UTI specific interpretation from the guideline should be chosen. For using \code{\link[=as.sir]{as.sir()}} on a \link{data.frame}, this can also be a column containing \link{logical}s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See \emph{Examples}.} \item{conserve_capped_values}{a \link{logical} to indicate that MIC values starting with \code{">"} (but not \code{">="}) must always return "R" , and that MIC values starting with \code{"<"} (but not \code{"<="}) must always return "S"} @@ -130,11 +130,11 @@ Ordered \link{factor} with new class \code{sir} \description{ Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. \code{\link[=as.sir]{as.sir()}} transforms the input to a new class \code{\link{sir}}, which is an ordered \link{factor}. -These breakpoints are currently available: +These breakpoints are currently implemented: \itemize{ -\item For \strong{clinical microbiology} from EUCAST 2011-2023 and CLSI 2011-2023; -\item For \strong{veterinary microbiology} from EUCAST 2021-2023 and CLSI 2019-2023; -\item ECOFFs (Epidemiological cut-off values) from EUCAST 2020-2023 and CLSI 2022-2023. +\item For \strong{clinical microbiology}: EUCAST 2011-2023 and CLSI 2011-2023; +\item For \strong{veterinary microbiology}: EUCAST 2021-2023 and CLSI 2019-2023; +\item ECOFFs (Epidemiological cut-off values): EUCAST 2020-2023 and CLSI 2022-2023. } All breakpoints used for interpretation are available in our \link{clinical_breakpoints} data set. @@ -150,8 +150,10 @@ The \code{\link[=as.sir]{as.sir()}} function can work in four ways: \itemize{ \item Using \code{dplyr}, SIR interpretation can be done very easily with either: -\if{html}{\out{
}}\preformatted{your_data \%>\% mutate_if(is.mic, as.sir) +\if{html}{\out{
}}\preformatted{your_data \%>\% mutate_if(is.mic, as.sir) your_data \%>\% mutate(across(where(is.mic), as.sir)) +your_data \%>\% mutate_if(is.mic, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms") +your_data \%>\% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...)) # for veterinary breakpoints, also set `host`: your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", guideline = "CLSI") @@ -162,8 +164,10 @@ your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", gui \itemize{ \item Using \code{dplyr}, SIR interpretation can be done very easily with either: -\if{html}{\out{
}}\preformatted{your_data \%>\% mutate_if(is.disk, as.sir) +\if{html}{\out{
}}\preformatted{your_data \%>\% mutate_if(is.disk, as.sir) your_data \%>\% mutate(across(where(is.disk), as.sir)) +your_data \%>\% mutate_if(is.disk, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms") +your_data \%>\% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...)) # for veterinary breakpoints, also set `host`: your_data \%>\% mutate_if(is.disk, as.sir, host = "column_with_animal_hosts", guideline = "CLSI")