fix SIR interpretation for uti

2025-12-06 18:39:20 +01:00 · 2024-05-31 21:24:35 +02:00
parent 60c6c21e45
commit 7c1b564648
4 changed files with 70 additions and 36 deletions
--- a/2
+++ b/2
@@ -1,5 +1,5 @@
 Package: AMR
-Version: 2.1.1.9036
+Version: 2.1.1.9037
 Date: 2024-05-31
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 2.1.1.9036
+# AMR 2.1.1.9037

 *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support!)*

@@ -21,7 +21,7 @@ This package now supports not only tools for AMR data analysis in clinical setti
 * Function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that are in that group.

 ## Changed
-* For SIR interpretation, it is now possible to use column names for argument `ab` and `mo`: `as.sir(..., ab = "column1", mo = "column2")`. This greatly improves the flexibility for users.
+* For SIR interpretation, it is now possible to use column names for argument `ab`, `mo`, and `uti`: `as.sir(..., ab = "column1", mo = "column2", uti = "column3")`. This greatly improves the flexibility for users.
 * For MICs:
  * Added as valid levels: 4096, 6 powers of 0.0625, and 5 powers of 192 (192, 384, 576, 768, 960)
  * Added new argument `keep_operators` to `as.mic()`. This can be `"all"` (default), `"none"`, or `"edges"`. This argument is also available in the new `rescale_mic()` and `scale_*_mic()` functions.
--- a/R/sir.R
+++ b/R/sir.R
@@ -31,17 +31,17 @@
 #'
 #' @description Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. [as.sir()] transforms the input to a new class [`sir`], which is an ordered [factor].
 #' 
-#' These breakpoints are currently available:
-#' - For **clinical microbiology** from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`;
-#' - For **veterinary microbiology** from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`;
-#' - ECOFFs (Epidemiological cut-off values) from EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`.
+#' These breakpoints are currently implemented:
+#' - For **clinical microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`;
+#' - For **veterinary microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "animal")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "animal")$guideline)))`;
+#' - ECOFFs (Epidemiological cut-off values): EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "ECOFF")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "ECOFF")$guideline)))`.
 #' 
 #' All breakpoints used for interpretation are available in our [clinical_breakpoints] data set.
 #' @rdname as.sir
 #' @param x vector of values (for class [`mic`]: MIC values in mg/L, for class [`disk`]: a disk diffusion radius in millimetres)
 #' @param mo a vector (or column name) with [character]s that can be coerced to valid microorganism codes with [as.mo()], can be left empty to determine it automatically
 #' @param ab a vector (or column name) with [character]s that can be coerced to a valid antimicrobial drug code with [as.ab()]
-#' @param uti (Urinary Tract Infection) A vector with [logical]s (`TRUE` or `FALSE`) to specify whether a UTI specific interpretation from the guideline should be chosen. For using [as.sir()] on a [data.frame], this can also be a column containing [logical]s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See *Examples*.
+#' @param uti (Urinary Tract Infection) a vector (or column name) with [logical]s (`TRUE` or `FALSE`) to specify whether a UTI specific interpretation from the guideline should be chosen. For using [as.sir()] on a [data.frame], this can also be a column containing [logical]s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See *Examples*.
 #' @inheritParams first_isolate
 #' @param guideline defaults to EUCAST `r  max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))` (the latest implemented EUCAST guideline in the [AMR::clinical_breakpoints] data set), but can be set with the [package option][AMR-options] [`AMR_guideline`][AMR-options]. Currently supports EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`) and CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI")$guideline)))`), see *Details*.
 #' @param conserve_capped_values a [logical] to indicate that MIC values starting with `">"` (but not `">="`) must always return "R" , and that MIC values starting with `"<"` (but not `"<="`) must always return "S"
@@ -64,9 +64,11 @@
 #'
 #' 2. For **interpreting minimum inhibitory concentration (MIC) values** according to EUCAST or CLSI. You must clean your MIC values first using [as.mic()], that also gives your columns the new data class [`mic`]. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the `mo` argument.
 #'    * Using `dplyr`, SIR interpretation can be done very easily with either:
-#'      ```
+#'      ```r
 #'      your_data %>% mutate_if(is.mic, as.sir)
 #'      your_data %>% mutate(across(where(is.mic), as.sir))
+#'      your_data %>% mutate_if(is.mic, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms")
+#'      your_data %>% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...))
 #'      
 #'      # for veterinary breakpoints, also set `host`:
 #'      your_data %>% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", guideline = "CLSI")
@@ -74,9 +76,11 @@
 #'    * Operators like "<=" will be stripped before interpretation. When using `conserve_capped_values = TRUE`, an MIC value of e.g. ">2" will always return "R", even if the breakpoint according to the chosen guideline is ">=4". This is to prevent that capped values from raw laboratory data would not be treated conservatively. The default behaviour (`conserve_capped_values = FALSE`) considers ">2" to be lower than ">=4" and might in this case return "S" or "I".
 #' 3. For **interpreting disk diffusion diameters** according to EUCAST or CLSI. You must clean your disk zones first using [as.disk()], that also gives your columns the new data class [`disk`]. Also, be sure to have a column with microorganism names or codes. It will be found automatically, but can be set manually using the `mo` argument.
 #'    * Using `dplyr`, SIR interpretation can be done very easily with either:
-#'      ```
+#'      ```r
 #'      your_data %>% mutate_if(is.disk, as.sir)
 #'      your_data %>% mutate(across(where(is.disk), as.sir))
+#'      your_data %>% mutate_if(is.disk, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms")
+#'      your_data %>% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...))
 #'      
 #'      # for veterinary breakpoints, also set `host`:
 #'      your_data %>% mutate_if(is.disk, as.sir, host = "column_with_animal_hosts", guideline = "CLSI")
@@ -543,7 +547,7 @@ as.sir.data.frame <- function(x,
  }
  if (!is.null(col_uti)) {
    if (is.logical(col_uti)) {
-      # already a [logical] vector as input
+      # already a logical vector as input
      if (length(col_uti) == 1) {
        uti <- rep(col_uti, NROW(x))
      } else {
@@ -763,10 +767,10 @@ as_sir_method <- function(method_short,
                          host,
                          ...) {
  meet_criteria(x, allow_NA = TRUE, .call_depth = -2)
-  meet_criteria(mo, allow_class = c("mo", "character"), allow_NULL = TRUE, .call_depth = -2)
-  meet_criteria(ab, allow_class = c("ab", "character"), .call_depth = -2)
+  meet_criteria(mo, allow_class = c("mo", "character"), has_length = c(1, length(x)), allow_NULL = TRUE, .call_depth = -2)
+  meet_criteria(ab, allow_class = c("ab", "character"), has_length = c(1, length(x)), .call_depth = -2)
  meet_criteria(guideline, allow_class = "character", has_length = 1, .call_depth = -2)
-  meet_criteria(uti, allow_class = "logical", has_length = c(1, length(x)), allow_NULL = TRUE, allow_NA = TRUE, .call_depth = -2)
+  meet_criteria(uti, allow_class = c("logical", "character"), has_length = c(1, length(x)), allow_NULL = TRUE, allow_NA = TRUE, .call_depth = -2)
  meet_criteria(conserve_capped_values, allow_class = "logical", has_length = 1, .call_depth = -2)
  meet_criteria(add_intrinsic_resistance, allow_class = "logical", has_length = 1, .call_depth = -2)
  meet_criteria(reference_data, allow_class = "data.frame", .call_depth = -2)
@@ -816,7 +820,7 @@ as_sir_method <- function(method_short,
  # get ab
  if (!is.null(current_df) && length(ab) == 1 && ab %in% colnames(current_df) && any(current_df[[ab]] %like% "[A-Z]", na.rm = TRUE)) {
    ab <- current_df[[ab]]
-  } else {
+  } else if (length(ab) != length(x)) {
    # for dplyr's across()
    cur_column_dplyr <- import_fn("cur_column", "dplyr", error_on_fail = FALSE)
    if (!is.null(cur_column_dplyr) && is.data.frame(current_df)) {
@@ -831,7 +835,7 @@ as_sir_method <- function(method_short,
  if (!is.null(current_df) && length(mo) == 1 && mo %in% colnames(current_df)) {
    mo_var_found <- paste0(" based on column '", font_bold(mo), "'")
    mo <- current_df[[mo]]
-  } else {
+  } else if (length(mo) != length(x)) {
    mo_var_found <- ""
    if (is.null(mo)) {
      tryCatch(
@@ -840,7 +844,7 @@ as_sir_method <- function(method_short,
          mo <- NULL
          try(
            {
-              mo <- suppressMessages(search_type_in_df(df, "mo"))
+              mo <- suppressMessages(search_type_in_df(df, "mo", add_col_prefix = FALSE))
            },
            silent = TRUE
          )
@@ -862,6 +866,32 @@ as_sir_method <- function(method_short,
      call = FALSE
    )
  }
+  
+  # get uti
+  if (!is.null(current_df) && length(uti) == 1 && uti %in% colnames(current_df)) {
+    uti <- current_df[[uti]]
+  } else if (length(uti) != length(x)) {
+    if (is.null(uti)) {
+      tryCatch(
+        {
+          df <- get_current_data(arg_name = "uti", call = -3) # will return an error if not found
+          uti <- NULL
+          try(
+            {
+              uti <- suppressMessages(search_type_in_df(df, "uti", add_col_prefix = FALSE))
+            },
+            silent = TRUE
+          )
+          if (!is.null(df) && !is.null(uti) && is.data.frame(df)) {
+            uti <- df[, uti, drop = TRUE]
+          }
+        },
+        error = function(e) {
+          uti <- NULL
+        }
+      )
+    }
+  }

  if (length(ab) == 1 && ab %like% paste0("as.", method_short)) {
    stop_("No unambiguous name was supplied about the antibiotic (argument `ab`). See ?as.sir.", call = FALSE)
@@ -899,6 +929,7 @@ as_sir_method <- function(method_short,
  if (length(uti) == 1) {
    uti <- rep(uti, length(x))
  }
+  uti[is.na(uti)] <- FALSE
  if (isTRUE(add_intrinsic_resistance) && guideline_coerced %unlike% "EUCAST") {
    if (message_not_thrown_before("as.sir", "intrinsic")) {
      warning_("in `as.sir()`: using 'add_intrinsic_resistance' is only useful when using EUCAST guidelines, since the rules for intrinsic resistance are based on EUCAST.")
@@ -918,7 +949,7 @@ as_sir_method <- function(method_short,
                                                    ")")
  # this intro text will also be printed in the progress bar if the `progress` package is installed
  intro_txt <- paste0("Interpreting ", method_long, ": ", ifelse(isTRUE(list(...)$is_data.frame), "column ", ""),
-                      ifelse(length(unique(agent_formatted)) == 1, unique(agent_formatted), paste0("for ", vector_and(ab, quotes = FALSE))),
+                      ifelse(length(unique(agent_formatted)) == 1, unique(agent_formatted), paste0("for ", vector_and(ab, quotes = FALSE, sort = FALSE))),
                      mo_var_found,
                      ifelse(identical(reference_data, AMR::clinical_breakpoints),
                             paste0(", ", font_bold(guideline_coerced)),
@@ -1026,7 +1057,7 @@ as_sir_method <- function(method_short,
    ab_current <- df_unique[i, "ab", drop = TRUE]
    host_current <- df_unique[i, "host", drop = TRUE]
    uti_current <- df_unique[i, "uti", drop = TRUE]
-    if (is.na(uti_current)) {
+    if (isFALSE(uti_current)) {
      # no preference, so no filter on UTIs
      rows <- which(df$mo == mo_current & df$ab == ab_current & df$host == host_current)
    } else {
@@ -1073,21 +1104,20 @@ as_sir_method <- function(method_short,
    breakpoints_current$host_index <- match(breakpoints_current$host, c("human", "ECOFF", AMR_env$host_preferred_order))
    
    # sort on host and taxonomic rank
-    # (this will prefer species breakpoints over order breakpoints)
-    if (is.na(unique(uti_current))) {
+    # (this will e.g. prefer 'species' breakpoints over 'order' breakpoints)
+    if (all(uti_current == FALSE, na.rm = TRUE)) {
      breakpoints_current <- breakpoints_current %pm>%
-        # this will put UTI = FALSE first, then UTI = TRUE, then UTI = NA
-        pm_arrange(host_index, rank_index, uti) # 'uti' is a column in data set 'clinical_breakpoints'  
-    } else if (unique(uti_current) == TRUE) {
+        # this will put UTI = FALSE first, then UTI = NA, then UTI = TRUE
+        pm_mutate(uti_index = ifelse(uti == FALSE, 1, 
+                                     ifelse(is.na(uti), 2,
+                                            3))) %pm>%
+        # be as specific as possible (i.e. prefer species over genus):
+        pm_arrange(host_index, rank_index, uti_index)
+    } else if (all(uti_current == TRUE, na.rm = TRUE)) {
      breakpoints_current <- breakpoints_current %pm>%
        subset(uti == TRUE) %pm>%
        # be as specific as possible (i.e. prefer species over genus):
        pm_arrange(host_index, rank_index)
-    } else if (unique(uti_current) == FALSE) {
-      breakpoints_current <- breakpoints_current %pm>%
-        subset(uti == FALSE) %pm>%
-        # be as specific as possible (i.e. prefer species over genus):
-        pm_arrange(host_index, rank_index)
    }
    
    if (NROW(breakpoints_current) == 0) {
--- a/man/as.sir.Rd
+++ b/man/as.sir.Rd
@@ -104,7 +104,7 @@ sir_interpretation_history(clean = FALSE)

 \item{guideline}{defaults to EUCAST 2023 (the latest implemented EUCAST guideline in the \link{clinical_breakpoints} data set), but can be set with the \link[=AMR-options]{package option} \code{\link[=AMR-options]{AMR_guideline}}. Currently supports EUCAST (2011-2023) and CLSI (2011-2023), see \emph{Details}.}

-\item{uti}{(Urinary Tract Infection) A vector with \link{logical}s (\code{TRUE} or \code{FALSE}) to specify whether a UTI specific interpretation from the guideline should be chosen. For using \code{\link[=as.sir]{as.sir()}} on a \link{data.frame}, this can also be a column containing \link{logical}s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See \emph{Examples}.}
+\item{uti}{(Urinary Tract Infection) a vector (or column name) with \link{logical}s (\code{TRUE} or \code{FALSE}) to specify whether a UTI specific interpretation from the guideline should be chosen. For using \code{\link[=as.sir]{as.sir()}} on a \link{data.frame}, this can also be a column containing \link{logical}s or when left blank, the data set will be searched for a column 'specimen', and rows within this column containing 'urin' (such as 'urine', 'urina') will be regarded isolates from a UTI. See \emph{Examples}.}

 \item{conserve_capped_values}{a \link{logical} to indicate that MIC values starting with \code{">"} (but not \code{">="}) must always return "R" , and that MIC values starting with \code{"<"} (but not \code{"<="}) must always return "S"}

@@ -130,11 +130,11 @@ Ordered \link{factor} with new class \code{sir}
 \description{
 Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. \code{\link[=as.sir]{as.sir()}} transforms the input to a new class \code{\link{sir}}, which is an ordered \link{factor}.

-These breakpoints are currently available:
+These breakpoints are currently implemented:
 \itemize{
-\item For \strong{clinical microbiology} from EUCAST 2011-2023 and CLSI 2011-2023;
-\item For \strong{veterinary microbiology} from EUCAST 2021-2023 and CLSI 2019-2023;
-\item ECOFFs (Epidemiological cut-off values) from EUCAST 2020-2023 and CLSI 2022-2023.
+\item For \strong{clinical microbiology}: EUCAST 2011-2023 and CLSI 2011-2023;
+\item For \strong{veterinary microbiology}: EUCAST 2021-2023 and CLSI 2019-2023;
+\item ECOFFs (Epidemiological cut-off values): EUCAST 2020-2023 and CLSI 2022-2023.
 }

 All breakpoints used for interpretation are available in our \link{clinical_breakpoints} data set.
@@ -150,8 +150,10 @@ The \code{\link[=as.sir]{as.sir()}} function can work in four ways:
 \itemize{
 \item Using \code{dplyr}, SIR interpretation can be done very easily with either:

-\if{html}{\out{<div class="sourceCode">}}\preformatted{your_data \%>\% mutate_if(is.mic, as.sir)
+\if{html}{\out{<div class="sourceCode r">}}\preformatted{your_data \%>\% mutate_if(is.mic, as.sir)
 your_data \%>\% mutate(across(where(is.mic), as.sir))
+your_data \%>\% mutate_if(is.mic, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms")
+your_data \%>\% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...))

 # for veterinary breakpoints, also set `host`:
 your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", guideline = "CLSI")
@@ -162,8 +164,10 @@ your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_hosts", gui
 \itemize{
 \item Using \code{dplyr}, SIR interpretation can be done very easily with either:

-\if{html}{\out{<div class="sourceCode">}}\preformatted{your_data \%>\% mutate_if(is.disk, as.sir)
+\if{html}{\out{<div class="sourceCode r">}}\preformatted{your_data \%>\% mutate_if(is.disk, as.sir)
 your_data \%>\% mutate(across(where(is.disk), as.sir))
+your_data \%>\% mutate_if(is.disk, as.sir, ab = "column_with_antibiotics", mo = "column_with_microorganisms")
+your_data \%>\% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), mo = c("E. coli", "K. pneumoniae", ...))

 # for veterinary breakpoints, also set `host`:
 your_data \%>\% mutate_if(is.disk, as.sir, host = "column_with_animal_hosts", guideline = "CLSI")