Fix custom reference_data support in as.sir() (#239) (PR #279)

* Fix custom reference_data support in as.sir() (#239)

- custom guideline names now correctly classify values as R: CLSI convention (>= breakpoint_R for MIC, <= for disk) applies only when guideline contains "CLSI"; all other guidelines including custom ones use the EUCAST convention (> breakpoint_R for MIC, < for disk)
- guideline argument is now optional when reference_data is manually set: if omitted or if its value does not match any row in the custom data, all rows in reference_data are used; if set to a value present in the data, only matching rows are filtered; useful for multi-guideline custom tables
- host = NA in custom reference_data now acts as a host-agnostic fallback when no host-specific breakpoint row exists for the current animal species
- updated reference_data argument documentation to explain these conventions

https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U

* Refactor R-classification logic using custom_breakpoints_set flag

Introduce custom_breakpoints_set <- !identical(reference_data, AMR::clinical_breakpoints) at the top of as_sir_method() and replace all identical() calls inside that function with this variable.

In the case_when_AMR interpretation blocks (MIC and disk), the R-classification now has three explicit arms:

- !custom_breakpoints_set & EUCAST guideline -> open interval (> / <)
- !custom_breakpoints_set & CLSI guideline -> closed interval (>= / <=)
- custom_breakpoints_set -> open interval (> / <), always, regardless of the guideline name in the custom data (e.g. "CLSI_custom" must not accidentally trigger CLSI convention)

https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U

* Fix unit tests for custom reference_data (#239)

- Do not override my_bp$mo / my_bp$ab in tests: assigning plain character strips the <mo>/<ab> class, which check_reference_data() rejects. Use the mo/ab values already present in the source row instead.
- Use NA_character_ instead of NA for my_bp$host so the host column keeps its character class.
- Pass breakpoint_type = "animal" explicitly in the host-fallback test since the custom reference_data only contains animal-type breakpoints.

https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U

* Add coerce_reference_data_columns() for lenient reference_data validation

check_reference_data() now returns the (possibly coerced) reference_data and the call site captures the result so downstream code sees the fixed columns.

A new coerce_reference_data_columns() helper is called before the strict class check inside check_reference_data(). It coerces columns to the expected types:

- mo -> as.mo() if not already <mo> class
- ab -> as.ab() if not already <ab> class
- character columns -> as.character() (e.g. host = NA becomes NA_character_)
- numeric columns -> as.double()
- logical columns -> as.logical()

This allows users to build a custom reference_data from a plain data.frame without having to pre-apply as.mo()/as.ab() or worry about NA column types.

Updated the reference_data roxygen argument to document the auto-coercion and restored the tests to the simpler form that uses plain character assignments, relying on the new coercion instead of workarounds.

https://claude.ai/code/session_01Q8KtFFGG9qrjAgLJBbxG2U

---------

Co-authored-by: Claude <noreply@anthropic.com>
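
For readers skimming the message, the three R-classification arms from the refactor commit can be sketched in base R as follows. This is an illustration only, not the code in as_sir_method(): the helper name `is_resistant()` and its arguments are hypothetical, and detecting CLSI with `grepl()` is an assumption about how the guideline name is inspected.

```r
# Hypothetical helper, NOT part of the AMR package: returns TRUE where a value
# would be classified as R under the three arms described in the commit message.
is_resistant <- function(value, breakpoint_R, guideline,
                         method = c("mic", "disk"),
                         custom_breakpoints_set = FALSE) {
  method <- match.arg(method)
  # Closed interval (>= / <=) only for built-in CLSI breakpoints;
  # custom reference data always uses the open interval (> / <),
  # even if its guideline name happens to contain "CLSI".
  closed <- !custom_breakpoints_set && grepl("CLSI", guideline, fixed = TRUE)
  if (method == "mic") {
    if (closed) value >= breakpoint_R else value > breakpoint_R
  } else {
    if (closed) value <= breakpoint_R else value < breakpoint_R
  }
}

is_resistant(8, 8, "CLSI 2024")                                   # TRUE
is_resistant(8, 8, "EUCAST 2024")                                 # FALSE
is_resistant(8, 8, "CLSI_custom", custom_breakpoints_set = TRUE)  # FALSE
```

The last call shows the case the refactor targets: a value exactly at the breakpoint is not R for custom data, whatever the guideline is called.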
2026-06-25 02:16:21 +02:00 · 2026-04-25 14:38:01 +02:00
parent 19157ce718
commit 8261b91b24
4 changed files with 109 additions and 15 deletions
--- a/NEWS.md
+++ b/NEWS.md
@@ -21,6 +21,7 @@
 * Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values

 ### Fixes
+* `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (fixes #239)
 * Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames: (1) PSOCK workers (Windows / R < 4.0) now correctly load the AMR package before processing, with a graceful fallback to sequential mode when the package cannot be loaded; (2) resolved stale-environment issue where the PSOCK path read a frozen copy of `AMR_env` instead of the live one, causing the wrong log entries to be captured; (3) fixed log-entry duplication in the fork-based path (`mclapply`) where pre-existing `sir_interpretation_history` rows were included in every worker's captured log; (4) removed use of non-exported internal functions (`%pm>%`, `pm_pull`, `as.sir.default`) from the worker closure, which made PSOCK workers fail; (5) suppressed per-column progress messages inside workers to prevent interleaved console output; (6) fixed a malformed Unicode escape `\u00a` (3 digits) in the "DONE" status message
 * Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
 * Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved
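
The `host = NA` fallback described in the new NEWS entry can be pictured with a small base R sketch. It is a simplified assumption about the row selection, not the package's implementation: `select_host_rows()` and the toy `bp` table are hypothetical, and real `reference_data` has many more columns.

```r
# Hypothetical helper, NOT part of the AMR package: prefer host-specific rows,
# otherwise fall back to rows where host is NA (host-agnostic breakpoints).
select_host_rows <- function(reference_data, host) {
  specific <- reference_data[!is.na(reference_data$host) &
                               reference_data$host == host, , drop = FALSE]
  if (nrow(specific) > 0) {
    return(specific)
  }
  reference_data[is.na(reference_data$host), , drop = FALSE]
}

# Toy custom table: one dog-specific row and one host-agnostic row.
# NA_character_ keeps the host column of type character.
bp <- data.frame(
  host = c("dogs", NA_character_),
  breakpoint_R = c(4, 8),
  stringsAsFactors = FALSE
)

select_host_rows(bp, "dogs")$breakpoint_R  # 4 (host-specific row wins)
select_host_rows(bp, "cats")$breakpoint_R  # 8 (falls back to host = NA)
```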