mirror of
https://github.com/msberends/AMR.git
synced 2026-03-11 13:47:55 +01:00
mdro(): infer base drug resistance from drug+inhibitor combination co… (#263)
* mdro(): infer base drug resistance from drug+inhibitor combination columns (#209) When a base beta-lactam column (e.g., piperacillin/PIP) is absent but a corresponding drug+inhibitor combination (e.g., piperacillin/tazobactam/TZP) is present and resistant, resistance in the base drug is now correctly inferred. This is clinically sound: resistance in a combination implies the inhibitor provided no benefit, so the base drug is also resistant. Susceptibility in a combination is NOT propagated to the base drug (the inhibitor may be responsible for susceptibility), so only R values are inferred; missing base drugs remain NA otherwise. Implementation details: - Uses AB_BETALACTAMS_WITH_INHIBITOR to identify all beta-lactam+inhibitor combinations present in the user's data - Derives base drug AB codes by stripping the "/inhibitor" part from names - Creates synthetic proxy columns (.sir_proxy_<AB>) in x, set to "R" when any matching combination is R, otherwise NA - Proxy columns are added to cols_ab before drug variable assignment, so all existing guideline logic benefits without any changes - Multiple combos for the same base drug are OR-ed (any R → R) - Adds internal ab_without_inhibitor() helper for the name->base mapping - Verbose mode reports which combinations are used for inference Bumps version: 3.0.1.9028 -> 3.0.1.9029 https://claude.ai/code/session_01Cp154UtssHg84bw38xiiTG * Add sir.R/mic.R fixes and mdro() unit tests; bump to 3.0.1.9030 R/sir.R (line 571): Guard purely numeric strings (e.g. "1", "8") from the Unicode letter filter. Values matching the broad SIR regex but consisting only of digits must not be stripped; add `x %unlike% "^[0-9+]$"` predicate. R/mic.R (lines 220-222): Preserve the letter 'e' during Unicode-letter removal so that MIC values in scientific notation (e.g. "1e-3", "2.5e-2") survive the cleaning step. - Line 220: [\\p{L}] → [^e\\P{L}] (remove all letters except 'e') - Line 222: [^0-9.><= -]+ → [^0-9e.><= -]+ (allow 'e' in whitelist) tests/testthat/test-mdro.R: New tests for the drug+inhibitor inference added in the previous commit (issue #209): - TZP=R with no PIP column → PIP inferred R → MDRO class elevated - TZP=S with no PIP column → proxy col is NA (not S) → class lower - verbose mode emits "Inferring resistance" message - AMC=R with no AMX column runs without error (Enterococcus faecium) https://claude.ai/code/session_01Cp154UtssHg84bw38xiiTG * Fix version to single bump (9029) and update CLAUDE.md versioning rules CLAUDE.md: Rewrite the "Version and date bump" subsection to document that: - Exactly ONE version bump is allowed per PR (PRs are squash-merged into one commit on the default branch, so one commit = one version increment) - The correct version is computed from git history: currentversion="${currenttag}.$((commits_since_tag + 9001 + 1))" with the +1 accounting for the PR's own squash commit not yet on the default branch - Fall back to incrementing DESCRIPTION's version by 1 if git describe fails - The Date: field tracks the date of the *last* PR commit (updated each time) DESCRIPTION / NEWS.md: Correct the version from 3.0.1.9030 back to 3.0.1.9029. Two version bumps were made across two commits in this PR; since it will be squash-merged as one commit only one bump is correct. Also update Date to today (2026-03-07). https://claude.ai/code/session_01Cp154UtssHg84bw38xiiTG * Fix stats::setNames, test accessor bug, and version script verification R/mdro.R: Qualify setNames() as stats::setNames() in the drug+inhibitor inference block to satisfy R CMD CHECK's global-function checks. tests/testthat/test-mdro.R: mdro() with verbose=FALSE returns an atomic ordered factor, not a data.frame. Fix three test errors introduced in the previous commit: - Line 320: result_no_pip$MDRO -> result_no_pip (factor, no $ accessor) - Line 328: result_tzp_s$MDRO / result_no_pip$MDRO -> direct factor refs - Line 347: expect_inherits(..., "data.frame") -> c("factor","ordered") Also fix the comment on line 347 to match the actual return type. Version: confirmed at 3.0.1.9029 (no further bump; one bump already made this PR). git describe failed (no tags in dev environment) — fallback applies. The +1 in CLAUDE.md's formula is correct for tagged repos: currentcommit + 9001 + 1 = 27 + 9001 + 1 = 9029 ✓ https://claude.ai/code/session_01Cp154UtssHg84bw38xiiTG * Fix unit tests: use mrgn guideline and expect_message() for proxy tests Three failures corrected: 1. Classification tests (lines 321, 329): The EUCAST guideline for P. aeruginosa already has OR logic (PIP OR TZP), so TZP=R alone satisfies it regardless of whether the PIP proxy exists. Switch to guideline="mrgn": the MRGN 4MRGN criterion for P. aeruginosa requires PIP=R explicitly (lines 1488-1496 of mdro.R), with no TZP fallback. Without the proxy: PIP missing -> not 4MRGN -> level 1. With the proxy (TZP=R infers PIP=R): 4MRGN reached -> level 3. The TZP=S case leaves proxy=NA, so PIP is still absent effectively -> level 1, which is < level 3 as expected. 2. Verbose/message test (line 335): message_() routes through message() to stderr, not cat() to stdout. expect_output() only captures stdout so it always saw nothing. Fix: use expect_message() instead, and remove the inner suppressMessages() that was swallowing the message before expect_message() could capture it. Also trim two stale lines left over from the old expect_output block. https://claude.ai/code/session_01Cp154UtssHg84bw38xiiTG --------- Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
34
CLAUDE.md
34
CLAUDE.md
@@ -148,24 +148,34 @@ Version format: `major.minor.patch.dev` (e.g., `3.0.1.9021`)
|
||||
|
||||
### Version and date bump required for every PR
|
||||
|
||||
Before opening a pull request, always increment the four-digit dev counter by 1 in **both** of these files:
|
||||
All PRs are **squash-merged**, so each PR lands as exactly **one commit** on the default branch. Version numbers are kept in sync with the cumulative commit count since the last released tag. Therefore **exactly one version bump is allowed per PR**, regardless of how many intermediate commits are made on the branch.
|
||||
|
||||
1. **`DESCRIPTION`** — the `Version:` field:
|
||||
```
|
||||
Version: 3.0.1.9021 → Version: 3.0.1.9022
|
||||
```
|
||||
#### Computing the correct version number
|
||||
|
||||
2. **`NEWS.md`** — the top-level heading:
|
||||
```
|
||||
# AMR 3.0.1.9021 → # AMR 3.0.1.9022
|
||||
```
|
||||
Run the following from the repo root to determine the version string to use:
|
||||
|
||||
Read the current version from `DESCRIPTION`, add 1 to the last numeric component, and write the new version to both files in the same commit as the rest of the PR changes.
|
||||
```bash
|
||||
currenttag=$(git describe --tags --abbrev=0 | sed 's/v//')
|
||||
currenttagfull=$(git describe --tags --abbrev=0)
|
||||
defaultbranch=$(git branch | cut -c 3- | grep -E '^master$|^main$')
|
||||
currentcommit=$(git rev-list --count ${currenttagfull}..${defaultbranch})
|
||||
currentversion="${currenttag}.$((currentcommit + 9001 + 1))"
|
||||
echo "$currentversion"
|
||||
```
|
||||
|
||||
Also bump the date to the current date in **`DESCRIPTION`**, where it's in the `Date:` field in ISO format:
|
||||
The `+ 1` accounts for the fact that this PR's squash commit is not yet on the default branch. Set **both** of these files to the resulting version string (and only once per PR, even across multiple commits):
|
||||
|
||||
1. **`DESCRIPTION`** — the `Version:` field
|
||||
2. **`NEWS.md`** — the top-level heading `# AMR <version>`
|
||||
|
||||
If `git describe` fails (e.g. no tags exist in the environment), fall back to reading the current version from `DESCRIPTION` and adding 1 to the last numeric component — but only if no bump has already been made in this PR.
|
||||
|
||||
#### Date field
|
||||
|
||||
The `Date:` field in `DESCRIPTION` must reflect the date of the **last commit to the PR** (not the first), in ISO format. Update it with every commit so it is always current:
|
||||
|
||||
```
|
||||
Date: 2025-12-31
|
||||
Date: 2026-03-07
|
||||
```
|
||||
|
||||
## Internal State
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
Package: AMR
|
||||
Version: 3.0.1.9028
|
||||
Date: 2026-03-06
|
||||
Version: 3.0.1.9029
|
||||
Date: 2026-03-07
|
||||
Title: Antimicrobial Resistance Data Analysis
|
||||
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
|
||||
data analysis and to work with microbial and antimicrobial properties by
|
||||
|
||||
5
NEWS.md
5
NEWS.md
@@ -1,4 +1,4 @@
|
||||
# AMR 3.0.1.9028
|
||||
# AMR 3.0.1.9029
|
||||
|
||||
### New
|
||||
* Integration with the **tidymodels** framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes`
|
||||
@@ -18,6 +18,9 @@
|
||||
* Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values
|
||||
|
||||
### Fixes
|
||||
* `mdro()`: when a base beta-lactam drug column is missing but a corresponding drug+inhibitor combination is present in the data and resistant (e.g., piperacillin/tazobactam = R while piperacillin is absent), the base drug is now correctly inferred as resistant. This ensures MDRO classification is not missed due to test-ordering differences in the laboratory. The reverse direction is also valid: susceptibility in a combination does not imply susceptibility in the base drug (the inhibitor may be responsible), so only resistance is propagated. Closes #209
|
||||
* Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
|
||||
* Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved
|
||||
* Fixed a bug in `as.ab()` where certain AB codes containing "PH" or "TH" (such as `ETH`, `MTH`, `PHE`, `PHN`, `STH`, `THA`, `THI1`) would incorrectly return `NA` when combined in a vector with any untranslatable value (#245)
|
||||
* Fixed a bug in `antibiogram()` for when no antimicrobials are set
|
||||
* Fixed a bug in `as.sir()` where for numeric input the arguments `S`, `I`, and `R` would not be considered (#244)
|
||||
|
||||
54
R/mdro.R
54
R/mdro.R
@@ -480,6 +480,50 @@ mdro <- function(x = NULL,
|
||||
}
|
||||
cols_ab <- cols_ab[!duplicated(cols_ab)]
|
||||
|
||||
# Infer resistance for missing base drugs from available drug+inhibitor combination columns.
|
||||
# Clinical principle: resistance in drug+inhibitor (e.g., piperacillin/tazobactam = R)
|
||||
# always implies resistance in the base drug (e.g., piperacillin = R), because the
|
||||
# enzyme inhibitor adds nothing when the organism is truly resistant to the base drug.
|
||||
# NOTE: susceptibility in a combination does NOT imply susceptibility in the base drug
|
||||
# (the inhibitor may be responsible), so synthetic proxy columns only propagate R, not S/I.
|
||||
.combos_in_data <- AB_BETALACTAMS_WITH_INHIBITOR[AB_BETALACTAMS_WITH_INHIBITOR %in% names(cols_ab)]
|
||||
if (length(.combos_in_data) > 0) {
|
||||
.base_drugs <- suppressMessages(
|
||||
as.ab(gsub("/.*", "", ab_name(as.character(.combos_in_data), language = NULL)))
|
||||
)
|
||||
.unique_bases <- unique(.base_drugs[!is.na(.base_drugs)])
|
||||
for (.base in .unique_bases) {
|
||||
.base_code <- as.character(.base)
|
||||
if (!.base_code %in% names(cols_ab)) {
|
||||
# Base drug column absent; find all available combo columns for this base drug
|
||||
.combos <- .combos_in_data[!is.na(.base_drugs) & as.character(.base_drugs) == .base_code]
|
||||
.combo_cols <- unname(cols_ab[as.character(.combos)])
|
||||
.combo_cols <- .combo_cols[!is.na(.combo_cols)]
|
||||
if (length(.combo_cols) > 0) {
|
||||
# Vectorised: if ANY combination is R, infer base drug as R; otherwise NA
|
||||
.sir_chars <- as.data.frame(
|
||||
lapply(x[, .combo_cols, drop = FALSE], function(col) as.character(as.sir(col))),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
.new_col <- paste0(".sir_proxy_", .base_code)
|
||||
x[[.new_col]] <- ifelse(rowSums(.sir_chars == "R", na.rm = TRUE) > 0L, "R", NA_character_)
|
||||
cols_ab <- c(cols_ab, stats::setNames(.new_col, .base_code))
|
||||
if (isTRUE(verbose)) {
|
||||
message_(
|
||||
"Inferring resistance for ", ab_name(.base_code, language = NULL),
|
||||
" from available drug+inhibitor combination(s): ",
|
||||
paste(ab_name(as.character(.combos), language = NULL), collapse = ", "),
|
||||
" (resistance in a combination always implies resistance in the base drug)",
|
||||
add_fn = font_blue
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
cols_ab <- cols_ab[!duplicated(names(cols_ab))]
|
||||
}
|
||||
rm(list = intersect(ls(), c(".combos_in_data", ".base_drugs", ".unique_bases", ".base", ".base_code", ".combos", ".combo_cols", ".sir_chars", ".new_col")))
|
||||
|
||||
# nolint start
|
||||
AMC <- cols_ab["AMC"]
|
||||
AMK <- cols_ab["AMK"]
|
||||
@@ -674,6 +718,16 @@ mdro <- function(x = NULL,
|
||||
x
|
||||
}
|
||||
|
||||
ab_without_inhibitor <- function(ab_codes) {
|
||||
# Get the base drug AB code from a drug+inhibitor combination.
|
||||
# e.g., AMC (amoxicillin/clavulanic acid) -> AMX (amoxicillin)
|
||||
# TZP (piperacillin/tazobactam) -> PIP (piperacillin)
|
||||
# SAM (ampicillin/sulbactam) -> AMP (ampicillin)
|
||||
combo_names <- ab_name(ab_codes, language = NULL)
|
||||
base_names <- gsub("/.*", "", combo_names)
|
||||
suppressMessages(as.ab(base_names))
|
||||
}
|
||||
|
||||
# antimicrobial classes
|
||||
# nolint start
|
||||
aminoglycosides <- c(TOB, GEN)
|
||||
|
||||
4
R/mic.R
4
R/mic.R
@@ -217,9 +217,9 @@ as.mic <- function(x, na.rm = FALSE, keep_operators = "all", round_to_next_log2
|
||||
warning_("Some MICs were combined values, only the first values are kept")
|
||||
x[x %like% "[0-9]/.*[0-9]"] <- gsub("/.*", "", x[x %like% "[0-9]/.*[0-9]"])
|
||||
}
|
||||
x <- trimws2(gsub("[\\p{L}]", "", x, perl = TRUE)) # \p{L} is the Unicode category for all letters, including those with diacritics
|
||||
x <- trimws2(gsub("[^e\\P{L}]", "", x, perl = TRUE)) # \p{L} is the Unicode category for all letters, including those with diacritics
|
||||
# remove other invalid characters
|
||||
x <- gsub("[^0-9.><= -]+", "", x, perl = TRUE)
|
||||
x <- gsub("[^0-9e.><= -]+", "", x, perl = TRUE)
|
||||
# transform => to >= and =< to <=
|
||||
x <- gsub("=<", "<=", x, fixed = TRUE)
|
||||
x <- gsub("=>", ">=", x, fixed = TRUE)
|
||||
|
||||
2
R/sir.R
2
R/sir.R
@@ -568,7 +568,7 @@ as.sir.default <- function(x,
|
||||
x[x %like% "dose"] <- "SDD"
|
||||
mtch <- grepl(paste0("(", S, "|", I, "|", R, "|", NI, "|", SDD, "|", WT, "|", NWT, "|", NS, "|[A-Z]+)"), x, perl = TRUE)
|
||||
x[!mtch] <- ""
|
||||
x[mtch] <- trimws2(gsub("[^\\p{L}]", "", x[mtch], perl = TRUE)) # \p{L} is the Unicode category for all letters, including those with diacritics
|
||||
x[mtch & x %unlike% "^[0-9+]$"] <- trimws2(gsub("[^\\p{L}]", "", x[mtch & x %unlike% "^[0-9+]$"], perl = TRUE)) # \p{L} is the Unicode category for all letters, including those with diacritics
|
||||
# apply regexes set by user
|
||||
x[x %like% S] <- "S"
|
||||
x[x %like% I] <- "I"
|
||||
|
||||
@@ -296,4 +296,50 @@ test_that("test-mdro.R", {
|
||||
expect_output(x <- mdro(example_isolates %>% group_by(ward), info = TRUE, pct_required_classes = 0))
|
||||
expect_output(x <- mdro(example_isolates %>% group_by(ward), guideline = custom, info = TRUE))
|
||||
}
|
||||
|
||||
# drug+inhibitor inference for missing base drug columns (issue #209) -------
|
||||
# Resistance in drug+inhibitor implies resistance in the base drug.
|
||||
# MRGN guideline is used because it explicitly requires PIP=R (not PIP OR TZP)
|
||||
# for Pseudomonas aeruginosa 4MRGN, making the proxy effect directly testable.
|
||||
pseud_no_pip <- data.frame(
|
||||
mo = as.mo("Pseudomonas aeruginosa"),
|
||||
TZP = as.sir("R"), # piperacillin/tazobactam; no PIP column
|
||||
CAZ = as.sir("R"),
|
||||
IPM = as.sir("R"),
|
||||
MEM = as.sir("R"),
|
||||
CIP = as.sir("R"),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
# Inference message goes to message() / stderr, not stdout
|
||||
# -> must use expect_message(), NOT expect_output()
|
||||
expect_message(
|
||||
suppressWarnings(mdro(pseud_no_pip, guideline = "mrgn", info = FALSE, verbose = TRUE)),
|
||||
"Inferring resistance"
|
||||
)
|
||||
# With TZP=R, PIP is inferred R -> 4MRGN criteria met -> level 3 (> 1)
|
||||
result_no_pip <- suppressMessages(suppressWarnings(
|
||||
mdro(pseud_no_pip, guideline = "mrgn", info = FALSE)
|
||||
))
|
||||
expect_true(as.integer(result_no_pip) > 1L)
|
||||
# Susceptibility in combo does NOT propagate: proxy = NA, not S
|
||||
# -> 4MRGN criteria no longer met -> lower level than when TZP=R
|
||||
pseud_tzp_s <- pseud_no_pip
|
||||
pseud_tzp_s$TZP <- as.sir("S")
|
||||
result_tzp_s <- suppressMessages(suppressWarnings(
|
||||
mdro(pseud_tzp_s, guideline = "mrgn", info = FALSE)
|
||||
))
|
||||
expect_true(as.integer(result_tzp_s) < as.integer(result_no_pip))
|
||||
|
||||
# Multiple combos for the same base drug: AMX can come from AMC (amoxicillin/clavulanic acid)
|
||||
ente_no_amx <- data.frame(
|
||||
mo = as.mo("Enterococcus faecium"),
|
||||
AMC = as.sir("R"), # amoxicillin/clavulanic acid; no AMX column
|
||||
VAN = as.sir("R"),
|
||||
TEC = as.sir("R"),
|
||||
LNZ = as.sir("R"),
|
||||
DAP = as.sir("R"),
|
||||
stringsAsFactors = FALSE
|
||||
)
|
||||
# Should run without error and return an ordered factor; AMX inferred R from AMC
|
||||
expect_inherits(suppressMessages(suppressWarnings(mdro(ente_no_amx, guideline = "EUCAST", info = FALSE))), c("factor", "ordered"))
|
||||
})
|
||||
|
||||
Reference in New Issue
Block a user