1
0
mirror of https://github.com/msberends/AMR.git synced 2026-05-31 21:01:45 +02:00

Fix custom reference_data support in as.sir() (fixes #239)

- custom guideline names now correctly classify values as R: CLSI convention
  (>= breakpoint_R for MIC, <= for disk) applies only when guideline contains
  "CLSI"; all other guidelines including custom ones use the EUCAST convention
  (> breakpoint_R for MIC, < for disk)
- guideline argument is now optional when reference_data is manually set: if
  omitted or if its value does not match any row in the custom data, all rows
  in reference_data are used; if set to a value present in the data, only
  matching rows are filtered — useful for multi-guideline custom tables
- host = NA in custom reference_data now acts as a host-agnostic fallback
  when no host-specific breakpoint row exists for the current animal species
- updated reference_data argument documentation to explain these conventions

https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U
This commit is contained in:
Claude
2026-04-25 08:20:40 +00:00
parent 19157ce718
commit 6ef7441d51
4 changed files with 70 additions and 7 deletions

View File

@@ -1,6 +1,6 @@
Package: AMR Package: AMR
Version: 3.0.1.9050 Version: 3.0.1.9050
Date: 2026-04-24 Date: 2026-04-25
Title: Antimicrobial Resistance Data Analysis Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR) Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by data analysis and to work with microbial and antimicrobial properties by

View File

@@ -21,6 +21,7 @@
* Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values * Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values
### Fixes ### Fixes
* `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (fixes #239)
* Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames: (1) PSOCK workers (Windows / R < 4.0) now correctly load the AMR package before processing, with a graceful fallback to sequential mode when the package cannot be loaded; (2) resolved stale-environment issue where the PSOCK path read a frozen copy of `AMR_env` instead of the live one, causing the wrong log entries to be captured; (3) fixed log-entry duplication in the fork-based path (`mclapply`) where pre-existing `sir_interpretation_history` rows were included in every worker's captured log; (4) removed use of non-exported internal functions (`%pm>%`, `pm_pull`, `as.sir.default`) from the worker closure, which made PSOCK workers fail; (5) suppressed per-column progress messages inside workers to prevent interleaved console output; (6) fixed a malformed Unicode escape `\u00a` (3 digits) in the "DONE" status message * Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames: (1) PSOCK workers (Windows / R < 4.0) now correctly load the AMR package before processing, with a graceful fallback to sequential mode when the package cannot be loaded; (2) resolved stale-environment issue where the PSOCK path read a frozen copy of `AMR_env` instead of the live one, causing the wrong log entries to be captured; (3) fixed log-entry duplication in the fork-based path (`mclapply`) where pre-existing `sir_interpretation_history` rows were included in every worker's captured log; (4) removed use of non-exported internal functions (`%pm>%`, `pm_pull`, `as.sir.default`) from the worker closure, which made PSOCK workers fail; (5) suppressed per-column progress messages inside workers to prevent interleaved console output; (6) fixed a malformed Unicode escape `\u00a` (3 digits) in the "DONE" status message
* Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter * Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
* Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved * Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved

26
R/sir.R
View File

@@ -69,7 +69,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
#' @param host A vector (or column name) with [character]s to indicate the host. Only useful for veterinary breakpoints, as it requires `breakpoint_type = "animal"`. The values can be any text resembling the animal species, even in any of the `r length(LANGUAGES_SUPPORTED)` supported languages of this package. For foreign languages, be sure to set the language with [set_AMR_locale()] (though it will be automatically guessed based on the system language). #' @param host A vector (or column name) with [character]s to indicate the host. Only useful for veterinary breakpoints, as it requires `breakpoint_type = "animal"`. The values can be any text resembling the animal species, even in any of the `r length(LANGUAGES_SUPPORTED)` supported languages of this package. For foreign languages, be sure to set the language with [set_AMR_locale()] (though it will be automatically guessed based on the system language).
#' @param language Language to convert values set in `host` when using animal breakpoints. Use one of these supported language names or [ISO 639-1 codes](https://en.wikipedia.org/wiki/ISO_639-1): `r vector_or(paste0(sapply(LANGUAGES_SUPPORTED_NAMES, function(x) x[[1]]), " (" , LANGUAGES_SUPPORTED, ")"), quotes = FALSE, sort = FALSE)`. #' @param language Language to convert values set in `host` when using animal breakpoints. Use one of these supported language names or [ISO 639-1 codes](https://en.wikipedia.org/wiki/ISO_639-1): `r vector_or(paste0(sapply(LANGUAGES_SUPPORTED_NAMES, function(x) x[[1]]), " (" , LANGUAGES_SUPPORTED, ")"), quotes = FALSE, sort = FALSE)`.
#' @param verbose A [logical] to indicate that all notes should be printed during interpretation of MIC values or disk diffusion values. #' @param verbose A [logical] to indicate that all notes should be printed during interpretation of MIC values or disk diffusion values.
#' @param reference_data A [data.frame] to be used for interpretation, which defaults to the [clinical_breakpoints] data set. Changing this argument allows for using own interpretation guidelines. This argument must contain a data set that is equal in structure to the [clinical_breakpoints] data set (same column names and column types). Please note that the `guideline` argument will be ignored when `reference_data` is manually set. #' @param reference_data A [data.frame] to be used for interpretation, which defaults to the [clinical_breakpoints] data set. Changing this argument allows for using own interpretation guidelines. This argument must contain a data set that is equal in structure to the [clinical_breakpoints] data set (same column names and column types). When `reference_data` is manually set, the `guideline` argument is optional: if omitted (or if its value does not match any row in the custom data), all rows in `reference_data` are considered. If `guideline` is set to a value that exists in the `guideline` column of the custom data, only matching rows are used — useful when a single custom table contains multiple guidelines. For the R classification, the EUCAST convention is used by default: MIC values `> breakpoint_R` and disk diffusion values `< breakpoint_R` are classified as R, with values between `breakpoint_S` and `breakpoint_R` classified as I (or SDD). Only when `guideline` contains `"CLSI"` are the closed-interval rules (`>= breakpoint_R` for MIC, `<= breakpoint_R` for disk) applied instead.
#' @param threshold Maximum fraction of invalid antimicrobial interpretations of `x`, see *Examples*. #' @param threshold Maximum fraction of invalid antimicrobial interpretations of `x`, see *Examples*.
#' @param conserve_capped_values Deprecated, use `capped_mic_handling` instead. #' @param conserve_capped_values Deprecated, use `capped_mic_handling` instead.
#' @param ... For using on a [data.frame]: selection of columns to apply `as.sir()` to. Supports [tidyselect language][tidyselect::starts_with()] such as `where(is.mic)`, `starts_with(...)`, or `column1:column4`, and can thus also be [antimicrobial selectors][amr_selector()], e.g. `as.sir(df, penicillins())`. #' @param ... For using on a [data.frame]: selection of columns to apply `as.sir()` to. Supports [tidyselect language][tidyselect::starts_with()] such as `where(is.mic)`, `starts_with(...)`, or `column1:column4`, and can thus also be [antimicrobial selectors][amr_selector()], e.g. `as.sir(df, penicillins())`.
@@ -1690,8 +1690,15 @@ as_sir_method <- function(method_short,
# gather all available breakpoints for current MO # gather all available breakpoints for current MO
# TODO for VET09 do not filter out E. coli and such # TODO for VET09 do not filter out E. coli and such
# For custom reference_data: skip guideline filter when guideline_current is not in the data (#239)
guideline_filter_current <- if (!identical(reference_data, AMR::clinical_breakpoints) &&
!guideline_current %in% breakpoints$guideline) {
unique(breakpoints$guideline)
} else {
guideline_current
}
breakpoints_current <- breakpoints %pm>% breakpoints_current <- breakpoints %pm>%
subset(ab == ab_current & guideline == guideline_current) %pm>% subset(ab == ab_current & guideline %in% guideline_filter_current) %pm>%
subset(mo %in% c( subset(mo %in% c(
mo_current, mo_current_genus, mo_current_family, mo_current, mo_current_genus, mo_current_family,
mo_current_order, mo_current_class, mo_current_order, mo_current_class,
@@ -1701,8 +1708,13 @@ as_sir_method <- function(method_short,
)) ))
if (breakpoint_type == "animal") { if (breakpoint_type == "animal") {
# 2025-03-13/ for now, only strictly follow guideline for current host, no extrapolation host_matched <- breakpoints_current[which(breakpoints_current$host == host_current), , drop = FALSE]
breakpoints_current <- breakpoints_current[which(breakpoints_current$host == host_current), , drop = FALSE] if (nrow(host_matched) > 0) {
breakpoints_current <- host_matched
} else {
# fall back to host-agnostic rows (host = NA) for custom breakpoint tables (#239)
breakpoints_current <- breakpoints_current[which(is.na(breakpoints_current$host)), , drop = FALSE]
}
} }
## fall-back methods for veterinary guidelines ---- ## fall-back methods for veterinary guidelines ----
@@ -1978,8 +1990,9 @@ as_sir_method <- function(method_short,
# otherwise: the normal (uncapped or ignored) interpretation # otherwise: the normal (uncapped or ignored) interpretation
input_clean <= breakpoints_current$breakpoint_S ~ as.sir("S"), input_clean <= breakpoints_current$breakpoint_S ~ as.sir("S"),
guideline_current %like% "EUCAST" & input_clean > breakpoints_current$breakpoint_R ~ as.sir("R"), # CLSI uses closed interval (>=); EUCAST and all custom guidelines use open interval (>)
guideline_current %like% "CLSI" & input_clean >= breakpoints_current$breakpoint_R ~ as.sir("R"), guideline_current %like% "CLSI" & input_clean >= breakpoints_current$breakpoint_R ~ as.sir("R"),
!guideline_current %like% "CLSI" & input_clean > breakpoints_current$breakpoint_R ~ as.sir("R"),
# return "I" or "SDD" when breakpoints are in the middle # return "I" or "SDD" when breakpoints are in the middle
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"), !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
@@ -1992,8 +2005,9 @@ as_sir_method <- function(method_short,
new_sir <- case_when_AMR( new_sir <- case_when_AMR(
is.na(input_clean) ~ NA_sir_, is.na(input_clean) ~ NA_sir_,
as.double(input_clean) >= as.double(breakpoints_current$breakpoint_S) ~ as.sir("S"), as.double(input_clean) >= as.double(breakpoints_current$breakpoint_S) ~ as.sir("S"),
guideline_current %like% "EUCAST" & as.double(input_clean) < as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"), # CLSI uses closed interval (<=); EUCAST and all custom guidelines use open interval (<)
guideline_current %like% "CLSI" & as.double(input_clean) <= as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"), guideline_current %like% "CLSI" & as.double(input_clean) <= as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
!guideline_current %like% "CLSI" & as.double(input_clean) < as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
# return "I" or "SDD" when breakpoints are in the middle # return "I" or "SDD" when breakpoints are in the middle
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"), !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"), !is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"),

View File

@@ -529,3 +529,51 @@ test_that("test-sir.R", {
expect_lte(n_mentions, 1L) expect_lte(n_mentions, 1L)
} }
}) })
# issue #239 — custom reference_data support
test_that("custom reference_data: non-EUCAST/CLSI guideline produces R", {
# use any MIC/human row as structural template, then override mo/ab/guideline/breakpoints
my_bp <- clinical_breakpoints[clinical_breakpoints$method == "MIC" &
clinical_breakpoints$type == "human", ][1, ]
my_bp$guideline <- "MyLab 2025"
my_bp$mo <- "B_ESCHR_COLI"
my_bp$ab <- "AMC"
my_bp$breakpoint_S <- 8
my_bp$breakpoint_R <- 32
# guideline omitted: all rows in reference_data are used
expect_equal(as.character(suppressMessages(
as.sir(as.mic(64), mo = "E. coli", ab = "AMC", reference_data = my_bp)
)), "R")
expect_equal(as.character(suppressMessages(
as.sir(as.mic(16), mo = "E. coli", ab = "AMC", reference_data = my_bp)
)), "I")
# at R breakpoint value must be I (open interval: > not >=)
expect_equal(as.character(suppressMessages(
as.sir(as.mic(32), mo = "E. coli", ab = "AMC", reference_data = my_bp)
)), "I")
# guideline explicitly set: same result when it matches the data
expect_equal(as.character(suppressMessages(
as.sir(as.mic(64), mo = "E. coli", ab = "AMC",
guideline = "MyLab 2025", reference_data = my_bp)
)), "R")
})
test_that("custom reference_data: host = NA acts as host-agnostic fallback", {
my_bp <- clinical_breakpoints[clinical_breakpoints$method == "MIC" &
clinical_breakpoints$type == "human", ][1, ]
my_bp$guideline <- "MyLab 2025"
my_bp$mo <- "B_ESCHR_COLI"
my_bp$ab <- "AMC"
my_bp$type <- "animal"
my_bp$host <- NA
my_bp$breakpoint_S <- 8
my_bp$breakpoint_R <- 32
# NA host should match when no species-specific row exists; guideline omitted
result <- suppressMessages(
as.sir(as.mic(64), mo = "E. coli", ab = "AMC", host = "dogs", reference_data = my_bp)
)
expect_equal(as.character(result), "R")
})