(v2.1.1.9043) fix sir translation with as.double

2025-08-19 07:33:16 +02:00 · 2024-06-10 10:34:45 +02:00
parent a3071cf58b
commit 31207952d3
10 changed files with 68 additions and 38 deletions
--- a/4
+++ b/4
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 2.1.1.9042
-Date: 2024-06-09
+Version: 2.1.1.9043
+Date: 2024-06-10
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
  data analysis and to work with microbial and antimicrobial properties by
--- a/1
+++ b/1
@@ -39,6 +39,7 @@ S3method(as.data.frame,av)
 S3method(as.data.frame,mic)
 S3method(as.data.frame,mo)
 S3method(as.double,mic)
+S3method(as.double,sir)
 S3method(as.list,custom_eucast_rules)
 S3method(as.list,custom_mdro_guideline)
 S3method(as.list,mic)
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 2.1.1.9042
+# AMR 2.1.1.9043

 *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support!)*

@@ -15,7 +15,7 @@ This package now supports not only tools for AMR data analysis in clinical setti
  * The `antibiotics` data set contains all veterinary antibiotics, such as pradofloxacin and enrofloxacin. All WHOCC codes for veterinary use have been added as well.
  * `ab_atc()` now supports ATC codes of veterinary antibiotics (that all start with "Q")
  * `ab_url()` now supports retrieving the WHOCC url of their ATCvet pages
-* `as.sir()` now returns additional factor levels "N" for non-interpretable and "SDD" for susceptible dose-dependent. Users can now set their own criteria (using regular expressions) as to what should be considered S, I, R, SDD, and N.
+* `as.sir()` now brings additional factor levels: "N" for non-interpretable and "SDD" for susceptible dose-dependent. Users can now set their own criteria (using regular expressions) as to what should be considered S, I, R, SDD, and N. Also, to get quantitative values, `as.double()` or a `sir` object will return 1 for S, 2 for SDD/I, and 3 for R (N will become `NA`). Other functions using `sir` classes (e.g., `summary()`) are updated to reflect the change to contain N and SDD.
 * The function group `scale_*_mic()`, namely: `scale_x_mic()`, `scale_y_mic()`, `scale_colour_mic()` and `scale_fill_mic()`. They are advanced ggplot2 extensions to allow easy plotting of MIC values. They allow for manual range definition and plotting missing intermediate log2 levels.
 * Function `rescale_mic()`, which allows to rescale MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but it can be used by users directly to e.g. compare equality in MIC distributions by rescaling them to the same range first.
 * Function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that are in that group.
@@ -39,6 +39,7 @@ This package now supports not only tools for AMR data analysis in clinical setti
 * Fix for mapping 'high level' antibiotics in `as.ab()` (amphotericin B-high, gentamicin-high, kanamycin-high, streptomycin-high, tobramycin-high)
 * Improved overall algorithm of `as.ab()` for better performance and accuracy
 * When using antibiotic selectors such as `aminoglycosides()` that exclude non-treatable drugs like gentamicin-high, the function now always returns a warning that these can be included using `only_treatable = FALSE`
+* Intermediate log2 levels used for MIC plotting are now more common values instead of following a strict dilution range

 ## Other
 * Added Jordan Stull, Matthew Saab, and Javier Sanchez as contributors, to thank them for their valuable input
--- a/R/export_biosample.R
+++ b/R/export_biosample.R
@@ -33,12 +33,12 @@
 #' @param x a data set 
 #' @param filename a character string specifying the file name
 #' @param type a character string specifying the type of data set, either "pathogen MIC" or "beta-lactamase MIC", see <https://www.ncbi.nlm.nih.gov/biosample/docs/>
-#' @param save_as_xlsx
-export_biosample <- function(x,
-                             filename = paste0("biosample_", format(Sys.time(), "%Y-%m-%d-%H%M%S"), ".xlsx"),
-                             type = "pathogen MIC",
-                             columns = where(is.mic),
-                             save_as_xlsx = TRUE) {
+#' @keywords internal
+export_ncbi_biosample <- function(x,
+                                  filename = paste0("biosample_", format(Sys.time(), "%Y-%m-%d-%H%M%S"), ".xlsx"),
+                                  type = "pathogen MIC",
+                                  columns = where(is.mic),
+                                  save_as_xlsx = TRUE) {
  meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
  meet_criteria(filename, allow_class = "character", has_length = 1)
  meet_criteria(type, allow_class = "character", has_length = 1, is_in = c("pathogen MIC", "beta-lactamase MIC"))
--- a/R/sir.R
+++ b/R/sir.R
@@ -29,7 +29,7 @@

 #' Translate MIC and Disk Diffusion to SIR, or Clean Existing SIR Data
 #'
-#' @description Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. [as.sir()] transforms the input to a new class [`sir`], which is an ordered [factor].
+#' @description Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. [as.sir()] transforms the input to a new class [`sir`], which is an ordered [factor] containing the levels `S`, `SDD`, `I`, `R`, `N`.
 #' 
 #' These breakpoints are currently implemented:
 #' - For **clinical microbiology**: EUCAST `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "EUCAST" & type == "human")$guideline)))` and CLSI `r min(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(AMR::clinical_breakpoints, guideline %like% "CLSI" & type == "human")$guideline)))`;
@@ -123,8 +123,10 @@
 #' ### Other
 #'
 #' The function [is.sir()] detects if the input contains class `sir`. If the input is a [data.frame], it iterates over all columns and returns a [logical] vector.
+#' 
+#' The base R function [as.double()] can be used to retrieve quantitative values from a `sir` object: `"S"` = 1, `"I"`/`"SDD"` = 2, `"R"` = 3. All other values are rendered `NA` . **Note:** Do not use `as.integer()`, since that (because of how R works internally) will return the factor level indices, and not these aforementioned quantitative values.
 #'
-#' The function [is_sir_eligible()] returns `TRUE` when a columns contains at most 5% invalid antimicrobial interpretations (not S and/or I and/or R and/or N and/or SDD), and `FALSE` otherwise. The threshold of 5% can be set with the `threshold` argument. If the input is a [data.frame], it iterates over all columns and returns a [logical] vector.
+#' The function [is_sir_eligible()] returns `TRUE` when a column contains at most 5% invalid antimicrobial interpretations (not S and/or I and/or R and/or N and/or SDD), and `FALSE` otherwise. The threshold of 5% can be set with the `threshold` argument. If the input is a [data.frame], it iterates over all columns and returns a [logical] vector.
 #' @section Interpretation of SIR:
 #' In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) has decided to change the definitions of susceptibility testing categories S, I, and R as shown below (<https://www.eucast.org/newsiandr>):
 #'
@@ -224,7 +226,12 @@
 #' is.sir(sir_data)
 #' plot(sir_data) # for percentages
 #' barplot(sir_data) # for frequencies
-#'
+#' 
+#' # as common in R, you can use as.integer() to return factor indices:
+#' as.integer(as.sir(c("S", "SDD", "I", "R", "N", NA)))
+#' # but for computational use, as.double() will return 1 for S, 2 for I/SDD, and 3 for R:
+#' as.double(as.sir(c("S", "SDD", "I", "R", "N", NA)))
+#' 
 #' # the dplyr way
 #' if (require("dplyr")) {
 #'   example_isolates %>%
@@ -1394,6 +1401,17 @@ print.sir <- function(x, ...) {
  print(as.character(x), quote = FALSE)
 }

+
+#' @method as.double sir
+#' @export
+as.double.sir <- function(x, ...) {
+  dbls <- rep(NA_real_, length(x))
+  dbls[x == "S"] <- 1
+  dbls[x %in% c("SDD", "I")] <- 2
+  dbls[x == "R"] <- 3
+  dbls
+}
+
 #' @method droplevels sir
 #' @export
 #' @noRd
@@ -1410,8 +1428,10 @@ summary.sir <- function(object, ...) {
  x <- object
  n <- sum(!is.na(x))
  S <- sum(x == "S", na.rm = TRUE)
+  SDD <- sum(x == "SDD", na.rm = TRUE)
  I <- sum(x == "I", na.rm = TRUE)
  R <- sum(x == "R", na.rm = TRUE)
+  N <- sum(x == "N", na.rm = TRUE)
  pad <- function(x) {
    if (is.na(x)) {
      return("??")
@@ -1426,10 +1446,11 @@ summary.sir <- function(object, ...) {
  }
  value <- c(
    "Class" = "sir",
+    "%S" = paste0(pad(percentage(S / n, digits = 1)), " (n=", S, ")"),
+    "%SDD" = paste0(pad(percentage(SDD / n, digits = 1)), " (n=", SDD, ")"),
+    "%I" = paste0(pad(percentage(I / n, digits = 1)), " (n=", I, ")"),
    "%R" = paste0(pad(percentage(R / n, digits = 1)), " (n=", R, ")"),
-    "%SI" = paste0(pad(percentage((S + I) / n, digits = 1)), " (n=", S + I, ")"),
-    "- %S" = paste0(pad(percentage(S / n, digits = 1)), " (n=", S, ")"),
-    "- %I" = paste0(pad(percentage(I / n, digits = 1)), " (n=", I, ")")
+    "%N" = paste0(pad(percentage(N / n, digits = 1)), " (n=", N, ")")
  )
  class(value) <- c("summaryDefault", "table")
  value
--- a/R/sir_calc.R
+++ b/R/sir_calc.R
@@ -135,20 +135,13 @@ sir_calc <- function(...,

    x_transposed <- as.list(as.data.frame(t(x), stringsAsFactors = FALSE))
    if (isTRUE(only_all_tested)) {
-      get_integers <- function(x) {
-        ints <- rep(NA_integer_, length(x))
-        ints[x == "S"] <- 1L
-        ints[x %in% c("SDD", "I")] <- 2L
-        ints[x == "R"] <- 3L
-        ints
-      }
      # no NAs in any column
      y <- apply(
-        X = as.data.frame(lapply(x, get_integers), stringsAsFactors = FALSE),
+        X = as.data.frame(lapply(x, as.double), stringsAsFactors = FALSE),
        MARGIN = 1,
        FUN = min
      )
-      numerator <- sum(!is.na(y) & y %in% get_integers(ab_result), na.rm = TRUE)
+      numerator <- sum(!is.na(y) & y %in% as.double(ab_result), na.rm = TRUE)
      denominator <- sum(vapply(FUN.VALUE = logical(1), x_transposed, function(y) !(anyNA(y))))
    } else {
      # may contain NAs in any column
@@ -364,7 +357,11 @@ sir_calc_df <- function(type, # "proportion", "count" or "both"
  } else {
    # don't use as.sir() here, as it would add the class 'sir' and we would like
    # the same data structure as output, regardless of input
-    out$interpretation <- factor(out$interpretation, levels = c("S", "SDD", "I", "R", "N"), ordered = TRUE)
+    if (out$value[out$interpretation == "SDD"] > 0) {
+      out$interpretation <- factor(out$interpretation, levels = c("S", "SDD", "I", "R"), ordered = TRUE)  
+    } else {
+      out$interpretation <- factor(out$interpretation, levels = c("S", "I", "R"), ordered = TRUE)
+    }
  }
  
  out <- out[!is.na(out$interpretation), , drop = FALSE]
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -170,7 +170,7 @@ reference:
    - "`atc_online_property`"
    - "`add_custom_antimicrobials`"

-  - title: "Preparing data"
+  - title: "Preparing data: antimicrobial results"
    desc: >
      With `as.mic()` and `as.disk()` you can transform your raw input to valid MIC or disk diffusion values.
      Use `as.sir()` for cleaning raw data to let it only contain "R", "I" and "S", or to interpret MIC or disk diffusion values as SIR based on the lastest EUCAST and CLSI guidelines.
@@ -217,7 +217,7 @@ reference:
    - "`ggplot_sir`"
    - "`ggplot_pca`"
    
-  - title: "Other: AMR-specific options"
+  - title: "AMR-specific options"
    desc: >
      The AMR package is customisable, by providing settings that can be set per user or per team. For
      example, the default interpretation guideline can be changed from EUCAST to CLSI, or a supported
@@ -262,6 +262,7 @@ reference:
    contents:
    - "`age_groups`"
    - "`age`"
+    - "`export_ncbi_biosample`"
    - "`availability`"
    - "`get_AMR_locale`"
    - "`italicise_taxonomy`"
--- a/inst/tinytest/test-sir.R
+++ b/inst/tinytest/test-sir.R
@@ -51,17 +51,18 @@ if (AMR:::pkg_is_available("ggplot2")) {
  expect_inherits(ggplot2::autoplot(as.sir(c("S", "SDD", "I", "R", "N"))), "gg")
 }
 expect_stdout(print(as.sir(c("S", "SDD", "I", "R", "N"))))
-expect_equal(as.character(as.sir(c(1:3))), c("S", "SDD", "I", "R", "N"))
-expect_equal(as.character(as.sir(c(1:3))), c("S", "SDD", "I", "R", "N"))
+expect_equal(as.character(as.sir(c(1:3))), c("S", "I", "R"))
+expect_equal(as.character(as.sir(c(1:3))), c("S", "I", "R"))
 expect_equal(suppressWarnings(as.logical(as.sir("INVALID VALUE"))), NA)
 expect_equal(
  summary(as.sir(c("S", "R"))),
  structure(c(
    "Class" = "sir",
+    "%S" = "50.0% (n=1)",
+    "%SDD" = " 0.0% (n=0)",
+    "%I" = " 0.0% (n=0)",
    "%R" = "50.0% (n=1)",
-    "%SI" = "50.0% (n=1)",
-    "- %S" = "50.0% (n=1)",
-    "- %I" = " 0.0% (n=0)"
+    "%N" = " 0.0% (n=0)"
  ), class = c("summaryDefault", "table"))
 )
 expect_identical(
--- a/man/as.sir.Rd
+++ b/man/as.sir.Rd
@@ -128,7 +128,7 @@ sir_interpretation_history(clean = FALSE)
 Ordered \link{factor} with new class \code{sir}
 }
 \description{
-Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. \code{\link[=as.sir]{as.sir()}} transforms the input to a new class \code{\link{sir}}, which is an ordered \link{factor}.
+Clean up existing SIR values, or interpret minimum inhibitory concentration (MIC) values and disk diffusion diameters according to EUCAST or CLSI. \code{\link[=as.sir]{as.sir()}} transforms the input to a new class \code{\link{sir}}, which is an ordered \link{factor} containing the levels \code{S}, \code{SDD}, \code{I}, \code{R}, \code{N}.

 These breakpoints are currently implemented:
 \itemize{
@@ -215,7 +215,9 @@ The repository of this package \href{https://github.com/msberends/AMR/blob/main/

 The function \code{\link[=is.sir]{is.sir()}} detects if the input contains class \code{sir}. If the input is a \link{data.frame}, it iterates over all columns and returns a \link{logical} vector.

-The function \code{\link[=is_sir_eligible]{is_sir_eligible()}} returns \code{TRUE} when a columns contains at most 5\% invalid antimicrobial interpretations (not S and/or I and/or R and/or N and/or SDD), and \code{FALSE} otherwise. The threshold of 5\% can be set with the \code{threshold} argument. If the input is a \link{data.frame}, it iterates over all columns and returns a \link{logical} vector.
+The base R function \code{\link[=as.double]{as.double()}} can be used to retrieve quantitative values from a \code{sir} object: \code{"S"} = 1, \code{"I"}/\code{"SDD"} = 2, \code{"R"} = 3. All other values are rendered \code{NA} . \strong{Note:} Do not use \code{as.integer()}, since that (because of how R works internally) will return the factor level indices, and not these aforementioned quantitative values.
+
+The function \code{\link[=is_sir_eligible]{is_sir_eligible()}} returns \code{TRUE} when a column contains at most 5\% invalid antimicrobial interpretations (not S and/or I and/or R and/or N and/or SDD), and \code{FALSE} otherwise. The threshold of 5\% can be set with the \code{threshold} argument. If the input is a \link{data.frame}, it iterates over all columns and returns a \link{logical} vector.
 }

 \code{NA_sir_} is a missing value of the new \code{sir} class, analogous to e.g. base \R's \code{\link[base:NA]{NA_character_}}.
@@ -318,6 +320,11 @@ is.sir(sir_data)
 plot(sir_data) # for percentages
 barplot(sir_data) # for frequencies

+# as common in R, you can use as.integer() to return factor indices:
+as.integer(as.sir(c("S", "SDD", "I", "R", "N", NA)))
+# but for computational use, as.double() will return 1 for S, 2 for I/SDD, and 3 for R:
+as.double(as.sir(c("S", "SDD", "I", "R", "N", NA)))
+
 # the dplyr way
 if (require("dplyr")) {
  example_isolates \%>\%
--- a/man/export_ncbi_biosample.Rd
+++ b/man/export_ncbi_biosample.Rd
@@ -1,10 +1,10 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/export_biosample.R
-\name{export_biosample}
-\alias{export_biosample}
+\name{export_ncbi_biosample}
+\alias{export_ncbi_biosample}
 \title{Export Data Set as NCBI BioSample Antibiogram}
 \usage{
-export_biosample(
+export_ncbi_biosample(
  x,
  filename = paste0("biosample_", format(Sys.time(), "\%Y-\%m-\%d-\%H\%M\%S"), ".xlsx"),
  type = "pathogen MIC",
@@ -22,3 +22,4 @@ export_biosample(
 \description{
 Export Data Set as NCBI BioSample Antibiogram
 }
+\keyword{internal}