mirror of
https://github.com/msberends/AMR.git
synced 2026-05-31 21:01:45 +02:00
Fix custom reference_data support in as.sir() (fixes #239)
- custom guideline names now correctly classify values as R: CLSI convention (>= breakpoint_R for MIC, <= for disk) applies only when guideline contains "CLSI"; all other guidelines including custom ones use the EUCAST convention (> breakpoint_R for MIC, < for disk) - guideline argument is now optional when reference_data is manually set: if omitted or if its value does not match any row in the custom data, all rows in reference_data are used; if set to a value present in the data, only matching rows are filtered — useful for multi-guideline custom tables - host = NA in custom reference_data now acts as a host-agnostic fallback when no host-specific breakpoint row exists for the current animal species - updated reference_data argument documentation to explain these conventions https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U
This commit is contained in:
@@ -1,6 +1,6 @@
|
|||||||
Package: AMR
|
Package: AMR
|
||||||
Version: 3.0.1.9050
|
Version: 3.0.1.9050
|
||||||
Date: 2026-04-24
|
Date: 2026-04-25
|
||||||
Title: Antimicrobial Resistance Data Analysis
|
Title: Antimicrobial Resistance Data Analysis
|
||||||
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
||||||
data analysis and to work with microbial and antimicrobial properties by
|
data analysis and to work with microbial and antimicrobial properties by
|
||||||
|
|||||||
1
NEWS.md
1
NEWS.md
@@ -21,6 +21,7 @@
|
|||||||
* Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values
|
* Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values
|
||||||
|
|
||||||
### Fixes
|
### Fixes
|
||||||
|
* `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (fixes #239)
|
||||||
* Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames: (1) PSOCK workers (Windows / R < 4.0) now correctly load the AMR package before processing, with a graceful fallback to sequential mode when the package cannot be loaded; (2) resolved stale-environment issue where the PSOCK path read a frozen copy of `AMR_env` instead of the live one, causing the wrong log entries to be captured; (3) fixed log-entry duplication in the fork-based path (`mclapply`) where pre-existing `sir_interpretation_history` rows were included in every worker's captured log; (4) removed use of non-exported internal functions (`%pm>%`, `pm_pull`, `as.sir.default`) from the worker closure, which made PSOCK workers fail; (5) suppressed per-column progress messages inside workers to prevent interleaved console output; (6) fixed a malformed Unicode escape `\u00a` (3 digits) in the "DONE" status message
|
* Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames: (1) PSOCK workers (Windows / R < 4.0) now correctly load the AMR package before processing, with a graceful fallback to sequential mode when the package cannot be loaded; (2) resolved stale-environment issue where the PSOCK path read a frozen copy of `AMR_env` instead of the live one, causing the wrong log entries to be captured; (3) fixed log-entry duplication in the fork-based path (`mclapply`) where pre-existing `sir_interpretation_history` rows were included in every worker's captured log; (4) removed use of non-exported internal functions (`%pm>%`, `pm_pull`, `as.sir.default`) from the worker closure, which made PSOCK workers fail; (5) suppressed per-column progress messages inside workers to prevent interleaved console output; (6) fixed a malformed Unicode escape `\u00a` (3 digits) in the "DONE" status message
|
||||||
* Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
|
* Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
|
||||||
* Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved
|
* Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved
|
||||||
|
|||||||
26
R/sir.R
26
R/sir.R
@@ -69,7 +69,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
|
|||||||
#' @param host A vector (or column name) with [character]s to indicate the host. Only useful for veterinary breakpoints, as it requires `breakpoint_type = "animal"`. The values can be any text resembling the animal species, even in any of the `r length(LANGUAGES_SUPPORTED)` supported languages of this package. For foreign languages, be sure to set the language with [set_AMR_locale()] (though it will be automatically guessed based on the system language).
|
#' @param host A vector (or column name) with [character]s to indicate the host. Only useful for veterinary breakpoints, as it requires `breakpoint_type = "animal"`. The values can be any text resembling the animal species, even in any of the `r length(LANGUAGES_SUPPORTED)` supported languages of this package. For foreign languages, be sure to set the language with [set_AMR_locale()] (though it will be automatically guessed based on the system language).
|
||||||
#' @param language Language to convert values set in `host` when using animal breakpoints. Use one of these supported language names or [ISO 639-1 codes](https://en.wikipedia.org/wiki/ISO_639-1): `r vector_or(paste0(sapply(LANGUAGES_SUPPORTED_NAMES, function(x) x[[1]]), " (" , LANGUAGES_SUPPORTED, ")"), quotes = FALSE, sort = FALSE)`.
|
#' @param language Language to convert values set in `host` when using animal breakpoints. Use one of these supported language names or [ISO 639-1 codes](https://en.wikipedia.org/wiki/ISO_639-1): `r vector_or(paste0(sapply(LANGUAGES_SUPPORTED_NAMES, function(x) x[[1]]), " (" , LANGUAGES_SUPPORTED, ")"), quotes = FALSE, sort = FALSE)`.
|
||||||
#' @param verbose A [logical] to indicate that all notes should be printed during interpretation of MIC values or disk diffusion values.
|
#' @param verbose A [logical] to indicate that all notes should be printed during interpretation of MIC values or disk diffusion values.
|
||||||
#' @param reference_data A [data.frame] to be used for interpretation, which defaults to the [clinical_breakpoints] data set. Changing this argument allows for using own interpretation guidelines. This argument must contain a data set that is equal in structure to the [clinical_breakpoints] data set (same column names and column types). Please note that the `guideline` argument will be ignored when `reference_data` is manually set.
|
#' @param reference_data A [data.frame] to be used for interpretation, which defaults to the [clinical_breakpoints] data set. Changing this argument allows for using own interpretation guidelines. This argument must contain a data set that is equal in structure to the [clinical_breakpoints] data set (same column names and column types). When `reference_data` is manually set, the `guideline` argument is optional: if omitted (or if its value does not match any row in the custom data), all rows in `reference_data` are considered. If `guideline` is set to a value that exists in the `guideline` column of the custom data, only matching rows are used — useful when a single custom table contains multiple guidelines. For the R classification, the EUCAST convention is used by default: MIC values `> breakpoint_R` and disk diffusion values `< breakpoint_R` are classified as R, with values between `breakpoint_S` and `breakpoint_R` classified as I (or SDD). Only when `guideline` contains `"CLSI"` are the closed-interval rules (`>= breakpoint_R` for MIC, `<= breakpoint_R` for disk) applied instead.
|
||||||
#' @param threshold Maximum fraction of invalid antimicrobial interpretations of `x`, see *Examples*.
|
#' @param threshold Maximum fraction of invalid antimicrobial interpretations of `x`, see *Examples*.
|
||||||
#' @param conserve_capped_values Deprecated, use `capped_mic_handling` instead.
|
#' @param conserve_capped_values Deprecated, use `capped_mic_handling` instead.
|
||||||
#' @param ... For using on a [data.frame]: selection of columns to apply `as.sir()` to. Supports [tidyselect language][tidyselect::starts_with()] such as `where(is.mic)`, `starts_with(...)`, or `column1:column4`, and can thus also be [antimicrobial selectors][amr_selector()], e.g. `as.sir(df, penicillins())`.
|
#' @param ... For using on a [data.frame]: selection of columns to apply `as.sir()` to. Supports [tidyselect language][tidyselect::starts_with()] such as `where(is.mic)`, `starts_with(...)`, or `column1:column4`, and can thus also be [antimicrobial selectors][amr_selector()], e.g. `as.sir(df, penicillins())`.
|
||||||
@@ -1690,8 +1690,15 @@ as_sir_method <- function(method_short,
|
|||||||
|
|
||||||
# gather all available breakpoints for current MO
|
# gather all available breakpoints for current MO
|
||||||
# TODO for VET09 do not filter out E. coli and such
|
# TODO for VET09 do not filter out E. coli and such
|
||||||
|
# For custom reference_data: skip guideline filter when guideline_current is not in the data (#239)
|
||||||
|
guideline_filter_current <- if (!identical(reference_data, AMR::clinical_breakpoints) &&
|
||||||
|
!guideline_current %in% breakpoints$guideline) {
|
||||||
|
unique(breakpoints$guideline)
|
||||||
|
} else {
|
||||||
|
guideline_current
|
||||||
|
}
|
||||||
breakpoints_current <- breakpoints %pm>%
|
breakpoints_current <- breakpoints %pm>%
|
||||||
subset(ab == ab_current & guideline == guideline_current) %pm>%
|
subset(ab == ab_current & guideline %in% guideline_filter_current) %pm>%
|
||||||
subset(mo %in% c(
|
subset(mo %in% c(
|
||||||
mo_current, mo_current_genus, mo_current_family,
|
mo_current, mo_current_genus, mo_current_family,
|
||||||
mo_current_order, mo_current_class,
|
mo_current_order, mo_current_class,
|
||||||
@@ -1701,8 +1708,13 @@ as_sir_method <- function(method_short,
|
|||||||
))
|
))
|
||||||
|
|
||||||
if (breakpoint_type == "animal") {
|
if (breakpoint_type == "animal") {
|
||||||
# 2025-03-13/ for now, only strictly follow guideline for current host, no extrapolation
|
host_matched <- breakpoints_current[which(breakpoints_current$host == host_current), , drop = FALSE]
|
||||||
breakpoints_current <- breakpoints_current[which(breakpoints_current$host == host_current), , drop = FALSE]
|
if (nrow(host_matched) > 0) {
|
||||||
|
breakpoints_current <- host_matched
|
||||||
|
} else {
|
||||||
|
# fall back to host-agnostic rows (host = NA) for custom breakpoint tables (#239)
|
||||||
|
breakpoints_current <- breakpoints_current[which(is.na(breakpoints_current$host)), , drop = FALSE]
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
## fall-back methods for veterinary guidelines ----
|
## fall-back methods for veterinary guidelines ----
|
||||||
@@ -1978,8 +1990,9 @@ as_sir_method <- function(method_short,
|
|||||||
|
|
||||||
# otherwise: the normal (uncapped or ignored) interpretation
|
# otherwise: the normal (uncapped or ignored) interpretation
|
||||||
input_clean <= breakpoints_current$breakpoint_S ~ as.sir("S"),
|
input_clean <= breakpoints_current$breakpoint_S ~ as.sir("S"),
|
||||||
guideline_current %like% "EUCAST" & input_clean > breakpoints_current$breakpoint_R ~ as.sir("R"),
|
# CLSI uses closed interval (>=); EUCAST and all custom guidelines use open interval (>)
|
||||||
guideline_current %like% "CLSI" & input_clean >= breakpoints_current$breakpoint_R ~ as.sir("R"),
|
guideline_current %like% "CLSI" & input_clean >= breakpoints_current$breakpoint_R ~ as.sir("R"),
|
||||||
|
!guideline_current %like% "CLSI" & input_clean > breakpoints_current$breakpoint_R ~ as.sir("R"),
|
||||||
|
|
||||||
# return "I" or "SDD" when breakpoints are in the middle
|
# return "I" or "SDD" when breakpoints are in the middle
|
||||||
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
|
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
|
||||||
@@ -1992,8 +2005,9 @@ as_sir_method <- function(method_short,
|
|||||||
new_sir <- case_when_AMR(
|
new_sir <- case_when_AMR(
|
||||||
is.na(input_clean) ~ NA_sir_,
|
is.na(input_clean) ~ NA_sir_,
|
||||||
as.double(input_clean) >= as.double(breakpoints_current$breakpoint_S) ~ as.sir("S"),
|
as.double(input_clean) >= as.double(breakpoints_current$breakpoint_S) ~ as.sir("S"),
|
||||||
guideline_current %like% "EUCAST" & as.double(input_clean) < as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
|
# CLSI uses closed interval (<=); EUCAST and all custom guidelines use open interval (<)
|
||||||
guideline_current %like% "CLSI" & as.double(input_clean) <= as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
|
guideline_current %like% "CLSI" & as.double(input_clean) <= as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
|
||||||
|
!guideline_current %like% "CLSI" & as.double(input_clean) < as.double(breakpoints_current$breakpoint_R) ~ as.sir("R"),
|
||||||
# return "I" or "SDD" when breakpoints are in the middle
|
# return "I" or "SDD" when breakpoints are in the middle
|
||||||
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
|
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == TRUE ~ as.sir("SDD"),
|
||||||
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"),
|
!is.na(breakpoints_current$breakpoint_S) & !is.na(breakpoints_current$breakpoint_R) & breakpoints_current$is_SDD == FALSE ~ as.sir("I"),
|
||||||
|
|||||||
@@ -529,3 +529,51 @@ test_that("test-sir.R", {
|
|||||||
expect_lte(n_mentions, 1L)
|
expect_lte(n_mentions, 1L)
|
||||||
}
|
}
|
||||||
})
|
})
|
||||||
|
|
||||||
|
# issue #239 — custom reference_data support
|
||||||
|
test_that("custom reference_data: non-EUCAST/CLSI guideline produces R", {
|
||||||
|
# use any MIC/human row as structural template, then override mo/ab/guideline/breakpoints
|
||||||
|
my_bp <- clinical_breakpoints[clinical_breakpoints$method == "MIC" &
|
||||||
|
clinical_breakpoints$type == "human", ][1, ]
|
||||||
|
my_bp$guideline <- "MyLab 2025"
|
||||||
|
my_bp$mo <- "B_ESCHR_COLI"
|
||||||
|
my_bp$ab <- "AMC"
|
||||||
|
my_bp$breakpoint_S <- 8
|
||||||
|
my_bp$breakpoint_R <- 32
|
||||||
|
|
||||||
|
# guideline omitted: all rows in reference_data are used
|
||||||
|
expect_equal(as.character(suppressMessages(
|
||||||
|
as.sir(as.mic(64), mo = "E. coli", ab = "AMC", reference_data = my_bp)
|
||||||
|
)), "R")
|
||||||
|
expect_equal(as.character(suppressMessages(
|
||||||
|
as.sir(as.mic(16), mo = "E. coli", ab = "AMC", reference_data = my_bp)
|
||||||
|
)), "I")
|
||||||
|
# at R breakpoint value must be I (open interval: > not >=)
|
||||||
|
expect_equal(as.character(suppressMessages(
|
||||||
|
as.sir(as.mic(32), mo = "E. coli", ab = "AMC", reference_data = my_bp)
|
||||||
|
)), "I")
|
||||||
|
|
||||||
|
# guideline explicitly set: same result when it matches the data
|
||||||
|
expect_equal(as.character(suppressMessages(
|
||||||
|
as.sir(as.mic(64), mo = "E. coli", ab = "AMC",
|
||||||
|
guideline = "MyLab 2025", reference_data = my_bp)
|
||||||
|
)), "R")
|
||||||
|
})
|
||||||
|
|
||||||
|
test_that("custom reference_data: host = NA acts as host-agnostic fallback", {
|
||||||
|
my_bp <- clinical_breakpoints[clinical_breakpoints$method == "MIC" &
|
||||||
|
clinical_breakpoints$type == "human", ][1, ]
|
||||||
|
my_bp$guideline <- "MyLab 2025"
|
||||||
|
my_bp$mo <- "B_ESCHR_COLI"
|
||||||
|
my_bp$ab <- "AMC"
|
||||||
|
my_bp$type <- "animal"
|
||||||
|
my_bp$host <- NA
|
||||||
|
my_bp$breakpoint_S <- 8
|
||||||
|
my_bp$breakpoint_R <- 32
|
||||||
|
|
||||||
|
# NA host should match when no species-specific row exists; guideline omitted
|
||||||
|
result <- suppressMessages(
|
||||||
|
as.sir(as.mic(64), mo = "E. coli", ab = "AMC", host = "dogs", reference_data = my_bp)
|
||||||
|
)
|
||||||
|
expect_equal(as.character(result), "R")
|
||||||
|
})
|
||||||
|
|||||||
Reference in New Issue
Block a user