(v2.1.1.9060) SDD results now in as.sir()

2025-07-26 03:15:56 +02:00 · 2024-06-19 15:08:23 +02:00
parent 0c3d81f32e
commit c67d003e9e
17 changed files with 34127 additions and 34129 deletions
--- a/4
+++ b/4
@ -1,6 +1,6 @@
 Package: AMR
-Version: 2.1.1.9059
-Date: 2024-06-17
+Version: 2.1.1.9060
+Date: 2024-06-19
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
  data analysis and to work with microbial and antimicrobial properties by
--- a/NEWS.md
+++ b/NEWS.md
@ -1,4 +1,4 @@
-# AMR 2.1.1.9059
+# AMR 2.1.1.9060

 *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)*

@ -15,19 +15,27 @@ This package now supports not only tools for AMR data analysis in clinical setti
  * The `antibiotics` data set contains all veterinary antibiotics, such as pradofloxacin and enrofloxacin. All WHOCC codes for veterinary use have been added as well.
  * `ab_atc()` now supports ATC codes of veterinary antibiotics (that all start with "Q")
  * `ab_url()` now supports retrieving the WHOCC url of their ATCvet pages
-* EUCAST 2024 and CLSI 2024 are now supported, by adding all of their over 4,000 new clinical breakpoints to the `clinical_breakpoints` data set for usage in `as.sir()`. EUCAST 2024 (v14.0) is now the new default guideline for all MIC and disks diffusion interpretations.
-* `as.sir()` now brings additional factor levels: "NI" for non-interpretable and "SDD" for susceptible dose-dependent. Users can now set their own criteria (using regular expressions) as to what should be considered S, I, R, SDD, and NI. Also, to get quantitative values, `as.double()` on a `sir` object will return 1 for S, 2 for SDD/I, and 3 for R (NI will become `NA`). Other functions using `sir` classes (e.g., `summary()`) are updated to reflect the change to contain NI and SDD.
-* The function group `scale_*_mic()`, namely: `scale_x_mic()`, `scale_y_mic()`, `scale_colour_mic()` and `scale_fill_mic()`. They are advanced ggplot2 extensions to allow easy plotting of MIC values. They allow for manual range definition and plotting missing intermediate log2 levels.
-* Function `rescale_mic()`, which allows to rescale MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but it can be used by users directly to e.g. compare equality in MIC distributions by rescaling them to the same range first.
-* Function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that are in that group.
+* Clinical breakpoints
+  * EUCAST 2024 and CLSI 2024 are now supported: their more than 4,000 new clinical breakpoints have been added to the `clinical_breakpoints` data set for use in `as.sir()`. EUCAST 2024 is now the default guideline for all MIC and disk diffusion interpretations.
+  * `as.sir()` now brings additional factor levels: "NI" for non-interpretable and "SDD" for susceptible dose-dependent. Currently, the `clinical_breakpoints` data set contains 24 breakpoints that can return the value "SDD" instead of "I".
+* MIC plotting and transforming
+  * The function group `scale_*_mic()`, namely `scale_x_mic()`, `scale_y_mic()`, `scale_colour_mic()` and `scale_fill_mic()`. These are advanced ggplot2 extensions for easy plotting of MIC values, supporting manual range definition and plotting of missing intermediate log2 levels.
+  * Function `rescale_mic()`, which rescales MIC values to a manually set range. This is the powerhouse behind the `scale_*_mic()` functions, but users can also call it directly, e.g., to check whether MIC distributions are equal by first rescaling them to the same range.
+* Other
+  * Function `mo_group_members()` to retrieve the member microorganisms of a microorganism group. For example, `mo_group_members("Strep group C")` returns a vector of all microorganisms that are in that group.

 ## Changed
-* For SIR interpretation, it is now possible to use column names for argument `ab`, `mo`, and `uti`: `as.sir(..., ab = "column1", mo = "column2", uti = "column3")`. This greatly improves the flexibility for users.
-* Extended the antibiotic selectors with `nitrofurans()` and `rifamycins()`
-* `antibiotics` data set:
+* SIR interpretation
+  * It is now possible to use column names for the arguments `ab`, `mo`, and `uti`: `as.sir(..., ab = "column1", mo = "column2", uti = "column3")`. This greatly improves the flexibility for users.
+  * Users can now set their own criteria (using regular expressions) as to what should be considered S, I, R, SDD, and NI.
+  * To get quantitative values, `as.double()` on a `sir` object will return 1 for S, 2 for SDD/I, and 3 for R (NI will become `NA`). Other functions using `sir` classes (e.g., `summary()`) are updated to reflect the change to contain NI and SDD.
+* `antibiotics` data set
  * Added "clindamycin inducible screening" as `CLI1`. Since clindamycin is a lincosamide, the antibiotic selector `lincosamides()` now contains the argument `only_treatable = TRUE` (similar to other antibiotic selectors that contain non-treatable drugs)
  * Added Amorolfine (`AMO`, D01AE16), which is now also part of the `antifungals()` selector
-* For MICs:
+* Antibiotic selectors
+  * Added selectors `nitrofurans()` and `rifamycins()`
+  * When using antibiotic selectors such as `aminoglycosides()` that exclude non-treatable drugs like gentamicin-high, the function now always returns a warning that these can be included using `only_treatable = FALSE`
+* MICs
  * Added as valid levels: 4096, 6 powers of 0.0625, and 5 powers of 192 (192, 384, 576, 768, 960)
  * Added new argument `keep_operators` to `as.mic()`. This can be `"all"` (default), `"none"`, or `"edges"`. This argument is also available in the new `rescale_mic()` and `scale_*_mic()` functions.
  * Comparisons of MIC values are now more strict. For example, `>32` is higher than (and never equal to) `32`. Thus, `as.mic(">32") == as.mic(32)` now returns `FALSE`, and `as.mic(">32") > as.mic(32)` now returns `TRUE`.
@ -42,7 +50,6 @@ This package now supports not only tools for AMR data analysis in clinical setti
 * Fix for mapping 'high level' antibiotics in `as.ab()` (amphotericin B-high, gentamicin-high, kanamycin-high, streptomycin-high, tobramycin-high)
 * Improved overall algorithm of `as.ab()` for better performance and accuracy
 * Improved overall algorithm of `as.mo()` for better performance and accuracy. Specifically, more weight is given to genus and species combinations in cases where the subspecies is miswritten, so that the result will be the correct genus and species.
-* When using antibiotic selectors such as `aminoglycosides()` that exclude non-treatable drugs like gentamicin-high, the function now always returns a warning that these can be included using `only_treatable = FALSE`
 * Intermediate log2 levels used for MIC plotting are now more common values instead of following a strict dilution range

 ## Other
--- a/R/data.R
+++ b/R/data.R
@ -283,6 +283,7 @@
 #' - `breakpoint_S`\cr Lowest MIC value or highest number of millimetres that leads to "S"
 #' - `breakpoint_R`\cr Highest MIC value or lowest number of millimetres that leads to "R"
 #' - `uti`\cr A [logical] value (`TRUE`/`FALSE`) to indicate whether the rule applies to a urinary tract infection (UTI)
+#' - `is_SDD`\cr A [logical] value (`TRUE`/`FALSE`) to indicate whether the intermediate range between "S" and "R" should be interpreted as "SDD", instead of "I". This currently applies to `r sum(clinical_breakpoints$is_SDD)` breakpoints.
 #' @details
 #' ### Different types of breakpoints
 #' Supported types of breakpoints are `r vector_and(clinical_breakpoints$type, quote = FALSE)`. ECOFF (Epidemiological cut-off) values are used in antimicrobial susceptibility testing to differentiate between wild-type and non-wild-type strains of bacteria or fungi.
--- a/R/sir.R
+++ b/R/sir.R
@ -1033,13 +1033,14 @@ as_sir_method <- function(method_short,
    stop_("No unambiguous name was supplied about the antibiotic (argument `ab`). See ?as.sir.", call = FALSE)
  }

-  ab.bak <- ab
+  ab.bak <- trimws2(ab)
  ab <- suppressWarnings(as.ab(ab))
  if (!is.null(list(...)$mo.bak)) {
    mo.bak <- list(...)$mo.bak
  } else {
    mo.bak <- mo
  }
+  mo.bak <- trimws2(mo.bak)
  # be sure to take current taxonomy, as the 'clinical_breakpoints' data set only contains current taxonomy
  mo <- suppressWarnings(suppressMessages(as.mo(mo, keep_synonyms = FALSE, info = FALSE)))
  if (all(is.na(ab))) {
@ -1352,8 +1353,9 @@ as_sir_method <- function(method_short,
          values <= breakpoints_current$breakpoint_S ~ as.sir("S"),
          guideline_coerced %like% "EUCAST" & values > breakpoints_current$breakpoint_R ~ as.sir("R"),
          guideline_coerced %like% "CLSI" & values >= breakpoints_current$breakpoint_R ~ as.sir("R"),
-          # return "I" when breakpoints are in the middle
-          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) ~ as.sir("I"),
+          # return "I" or "SDD" when breakpoints are in the middle
+          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"),
+          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
          # and NA otherwise
          TRUE ~ NA_sir_
        )
@ -1363,8 +1365,9 @@ as_sir_method <- function(method_short,
          as.double(values) >= as.double(breakpoints_current$breakpoint_S) ~ as.sir("S"),
          guideline_coerced %like% "EUCAST" & as.double(values) < as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
          guideline_coerced %like% "CLSI" & as.double(values) <= as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
-          # return "I" when breakpoints are in the middle
-          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) ~ as.sir("I"),
+          # return "I" or "SDD" when breakpoints are in the middle
+          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"),
+          !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
          # and NA otherwise
          TRUE ~ NA_sir_
        )
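To make the branch order of the two `case_when()` calls above easier to follow, here is a minimal base-R sketch for MIC values only. `interpret_mic_sketch()` is a hypothetical helper, not an AMR function: it takes scalar breakpoints and one guideline string, whereas the package evaluates `breakpoints_current` row by row. The breakpoints in the usage lines (S = 1, R = 4) are assumed values, chosen so the first result matches the ciprofloxacin expectation in the new tinytest.

```r
# Hypothetical helper (not part of AMR): mirrors the order of the MIC branches
# in as_sir_method(). The first matching rule wins, as in dplyr::case_when().
interpret_mic_sketch <- function(values, breakpoint_S, breakpoint_R,
                                 is_SDD = FALSE, guideline = "EUCAST") {
  is_eucast <- grepl("EUCAST", guideline)
  both_present <- !is.na(breakpoint_S) && !is.na(breakpoint_R)
  vapply(values, function(v) {
    # assumption for this sketch: a missing MIC yields NA
    if (is.na(v)) return(NA_character_)
    # at or below the S breakpoint: susceptible
    if (isTRUE(v <= breakpoint_S)) return("S")
    # EUCAST: R when strictly above the R breakpoint
    if (is_eucast && isTRUE(v > breakpoint_R)) return("R")
    # CLSI: R when at or above the R breakpoint
    if (!is_eucast && isTRUE(v >= breakpoint_R)) return("R")
    # in between: "SDD" if the breakpoint is flagged, otherwise "I"
    if (both_present) return(if (is_SDD) "SDD" else "I")
    # no usable breakpoints: NA
    NA_character_
  }, character(1), USE.NAMES = FALSE)
}

interpret_mic_sketch(2^(-2:2), breakpoint_S = 1, breakpoint_R = 4,
                     guideline = "CLSI 2024")
# returns c("S", "S", "S", "I", "R")
interpret_mic_sketch(2^(-2:2), breakpoint_S = 1, breakpoint_R = 4,
                     is_SDD = TRUE, guideline = "CLSI 2024")
# returns c("S", "S", "S", "SDD", "R")
```

The only difference between the two calls is the `is_SDD` flag, which, like the new column in `clinical_breakpoints`, changes only the label of the intermediate range.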
--- a/data-raw/clin_break.md5
+++ b/data-raw/clin_break.md5
@ -1 +1 @@
-5d90ad7fe89682bfc58700682c562207
+91ebec60b7f84f55ec0b756964d7c5b6
--- a/data-raw/clinical_breakpoints.dta
+++ b/data-raw/clinical_breakpoints.dta
--- a/data-raw/clinical_breakpoints.feather
+++ b/data-raw/clinical_breakpoints.feather
--- a/data-raw/clinical_breakpoints.rds
+++ b/data-raw/clinical_breakpoints.rds
--- a/data-raw/clinical_breakpoints.sav
+++ b/data-raw/clinical_breakpoints.sav
--- a/data-raw/clinical_breakpoints.txt
+++ b/data-raw/clinical_breakpoints.txt
--- a/data-raw/clinical_breakpoints.xlsx
+++ b/data-raw/clinical_breakpoints.xlsx
--- a/data-raw/clinical_breakpoints.xpt
+++ b/data-raw/clinical_breakpoints.xpt
--- a/data-raw/reproduction_of_clinical_breakpoints.R
+++ b/data-raw/reproduction_of_clinical_breakpoints.R
@ -201,11 +201,11 @@ whonet_breakpoints %>%
  pivot_wider(names_from = BREAKPOINT_TYPE, values_from = n) %>% 
  janitor::adorn_totals(where = c("row", "col"))
 # compared to current
-AMR::clinical_breakpoints |>
-  count(GUIDELINES = gsub("[^a-zA-Z]", "", guideline), type) |>
-  arrange(tolower(type)) |>
+AMR::clinical_breakpoints %>%
+  count(GUIDELINES = gsub("[^a-zA-Z]", "", guideline), type) %>%
+  arrange(tolower(type)) %>%
  pivot_wider(names_from = type, values_from = n) %>% 
-  as.data.frame() |>
+  as.data.frame() %>%
  janitor::adorn_totals(where = c("row", "col"))

 breakpoints <- whonet_breakpoints %>%
@ -264,7 +264,8 @@ breakpoints_new <- breakpoints %>%
    disk_dose = POTENCY,
    breakpoint_S = ifelse(type == "ECOFF" & is.na(S) & !is.na(ECV_ECOFF), ECV_ECOFF, S),
    breakpoint_R = ifelse(type == "ECOFF" & is.na(R) & !is.na(ECV_ECOFF), ECV_ECOFF, R),
-    uti = ifelse(is.na(site), FALSE, gsub(".*(UTI|urinary|urine).*", "UTI", site) == "UTI")
+    uti = ifelse(is.na(site), FALSE, gsub(".*(UTI|urinary|urine).*", "UTI", site) == "UTI"),
+    is_SDD = !is.na(SDD)
  ) %>%
  # Greek symbols and EM dash symbols are not allowed by CRAN, so replace them with ASCII:
  mutate(disk_dose = disk_dose %>%
@ -275,15 +276,6 @@ breakpoints_new <- breakpoints %>%
  filter(!(is.na(breakpoint_S) & is.na(breakpoint_R)) & !is.na(mo) & !is.na(ab)) %>%
  distinct(guideline, type, host, ab, mo, method, site, breakpoint_S, .keep_all = TRUE)

-# check the strange duplicates
-breakpoints_new %>% 
-  mutate(id = paste(guideline, type, host, ab, mo, method, site)) %>% 
-  filter(id %in% .$id[which(duplicated(id))]) |> 
-  arrange(desc(guideline))
-# remove duplicates
-breakpoints_new <- breakpoints_new %>% 
-  distinct(guideline, type, host, ab, mo, method, site, .keep_all = TRUE)
-
 # fix reference table names
 breakpoints_new %>% filter(guideline %like% "EUCAST", is.na(ref_tbl)) %>% View()
 breakpoints_new <- breakpoints_new %>% 
@ -295,12 +287,12 @@ breakpoints_new <- breakpoints_new %>%
 breakpoints_new[which(breakpoints_new$method == "DISK"), "breakpoint_S"] <- as.double(as.disk(breakpoints_new[which(breakpoints_new$method == "DISK"), "breakpoint_S", drop = TRUE]))
 breakpoints_new[which(breakpoints_new$method == "DISK"), "breakpoint_R"] <- as.double(as.disk(breakpoints_new[which(breakpoints_new$method == "DISK"), "breakpoint_R", drop = TRUE]))

-# regarding animal breakpoints, CLSI has adults and foals for horses, but only for amikacin - remove them
-breakpoints_new |> 
-  filter(host %like% "foal") |>
+# regarding animal breakpoints, CLSI has adults and foals for horses, but only for amikacin - only keep adult horses
+breakpoints_new %>% 
+  filter(host %like% "foal") %>%
  View()
-breakpoints_new <- breakpoints_new |> 
-  filter(host %unlike% "foal") |> 
+breakpoints_new <- breakpoints_new %>% 
+  filter(host %unlike% "foal") %>% 
  mutate(host = ifelse(host %like% "horse", "horse", host))

 # FIXES FOR WHONET ERRORS ----
@ -372,9 +364,17 @@ breakpoints_new <- breakpoints_new %>%
 # fill missing R breakpoint where there is an S breakpoint
 breakpoints_new[which(is.na(breakpoints_new$breakpoint_R)), "breakpoint_R"] <- breakpoints_new[which(is.na(breakpoints_new$breakpoint_R)), "breakpoint_S"]

-# keep distinct rows
-breakpoints_new <- breakpoints_new |>
-  distinct()
+
+# check the strange duplicates
+breakpoints_new %>% 
+  mutate(id = paste(guideline, type, host, method, site, mo, ab, uti)) %>% 
+  filter(id %in% .$id[which(duplicated(id))]) %>% 
+  arrange(desc(guideline))
+# 2024-06-19: these are mostly ECOFFs, and the whonet_breakpoints file does not explain them, so we have to remove the duplicates
+# remove duplicates
+breakpoints_new <- breakpoints_new %>% 
+  distinct(guideline, type, host, method, site, mo, ab, uti, .keep_all = TRUE)
+

 # CHECKS AND SAVE TO PACKAGE ----

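The duplicate handling now runs later in the script and includes `uti` in its key. It follows a two-step pattern: inspect every row whose key occurs more than once, then keep only the first row per key. A minimal base-R equivalent on an invented three-row table (toy values, not real breakpoints) could look like this, with `duplicated()` on the key columns playing the role of `distinct(..., .keep_all = TRUE)`.

```r
# Invented toy data; column names follow clinical_breakpoints, values do not
bp <- data.frame(
  guideline = c("CLSI 2024", "CLSI 2024", "EUCAST 2024"),
  type = "ECOFF", host = "human", method = "MIC", site = NA_character_,
  mo = "B_ESCHR_COLI", ab = "AMX", uti = FALSE,
  breakpoint_S = c(8, 16, 8), breakpoint_R = c(8, 16, 8),
  stringsAsFactors = FALSE
)
key_cols <- c("guideline", "type", "host", "method", "site", "mo", "ab", "uti")

# Step 1: flag every row whose key occurs more than once (both copies)
dup_key <- duplicated(bp[key_cols]) | duplicated(bp[key_cols], fromLast = TRUE)
bp[dup_key, ]

# Step 2: keep the first row per key, dropping later duplicates
bp_dedup <- bp[!duplicated(bp[key_cols]), ]
```

Note that, as in the script, the row that survives is simply the first one encountered; any difference in `breakpoint_S` or `breakpoint_R` between the duplicates is discarded.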
--- a/data/clinical_breakpoints.rda
+++ b/data/clinical_breakpoints.rda
--- a/inst/tinytest/test-sir.R
+++ b/inst/tinytest/test-sir.R
@ -35,6 +35,8 @@ expect_identical(
  unique(gsub("[^A-Z]", "", AMR::clinical_breakpoints$guideline)),
  c("EUCAST", "CLSI")
 )
+# no missing SDDs
+expect_identical(sum(is.na(AMR::clinical_breakpoints$is_SDD)), 0)

 expect_true(as.sir("S") < as.sir("I"))
 expect_true(as.sir("I") < as.sir("R"))
@ -292,6 +294,12 @@ expect_message(as.sir(data.frame(
  specimen = c("urine", "blood")
 )))

+# SDD vs I in CLSI 2024
+expect_identical(as.sir(as.mic(2 ^ c(-2:4)), mo = "Enterococcus faecium", ab = "Dapto", guideline = "CLSI 2024"),
+                 as.sir(c("SDD", "SDD", "SDD", "SDD", "SDD", "R", "R")))
+expect_identical(as.sir(as.mic(2 ^ c(-2:2)), mo = "Enterococcus faecium", ab = "Cipro",
+                        guideline = "CLSI 2024"),
+                 as.sir(c("S", "S", "S", "I", "R")))


 # Veterinary --------------------------------------------------------------
--- a/man/as.sir.Rd
+++ b/man/as.sir.Rd
@ -214,7 +214,7 @@ After using \code{\link[=as.sir]{as.sir()}}, you can use the \code{\link[=eucast

 \subsection{Machine-Readable Clinical Breakpoints}{

-The repository of this package \href{https://github.com/msberends/AMR/blob/main/data-raw/clinical_breakpoints.txt}{contains a machine-readable version} of all guidelines. This is a CSV file consisting of 34 085 rows and 13 columns. This file is machine-readable, since it contains one row for every unique combination of the test method (MIC or disk diffusion), the antimicrobial drug and the microorganism. \strong{This allows for easy implementation of these rules in laboratory information systems (LIS)}. Note that it only contains interpretation guidelines for humans - interpretation guidelines from CLSI for animals were removed.
+The repository of this package \href{https://github.com/msberends/AMR/blob/main/data-raw/clinical_breakpoints.txt}{contains a machine-readable version} of all guidelines. This is a CSV file consisting of 34 063 rows and 14 columns. This file is machine-readable, since it contains one row for every unique combination of the test method (MIC or disk diffusion), the antimicrobial drug and the microorganism. \strong{This allows for easy implementation of these rules in laboratory information systems (LIS)}. Note that it only contains interpretation guidelines for humans - interpretation guidelines from CLSI for animals were removed.
 }

 \subsection{Other}{
--- a/man/clinical_breakpoints.Rd
+++ b/man/clinical_breakpoints.Rd
@ -5,7 +5,7 @@
 \alias{clinical_breakpoints}
 \title{Data Set with Clinical Breakpoints for SIR Interpretation}
 \format{
-A \link[tibble:tibble]{tibble} with 34 085 observations and 13 variables:
+A \link[tibble:tibble]{tibble} with 34 063 observations and 14 variables:
 \itemize{
 \item \code{guideline}\cr Name of the guideline
 \item \code{type}\cr Breakpoint type, either "ECOFF", "animal", or "human"
@ -20,6 +20,7 @@ A \link[tibble:tibble]{tibble} with 34 085 observations and 13 variables:
 \item \code{breakpoint_S}\cr Lowest MIC value or highest number of millimetres that leads to "S"
 \item \code{breakpoint_R}\cr Highest MIC value or lowest number of millimetres that leads to "R"
 \item \code{uti}\cr A \link{logical} value (\code{TRUE}/\code{FALSE}) to indicate whether the rule applies to a urinary tract infection (UTI)
+\item \code{is_SDD}\cr A \link{logical} value (\code{TRUE}/\code{FALSE}) to indicate whether the intermediate range between "S" and "R" should be interpreted as "SDD", instead of "I". This currently applies to 24 breakpoints.
 }
 }
 \usage{