* Fix parallel computing in as.sir.data.frame
Six bugs in parallel = TRUE mode:
1. PSOCK workers (Windows / R < 4.0) never had AMR loaded, so every
exported/AMR function call failed. Added clusterEvalQ(cl, library(AMR))
with a graceful fallback to sequential when the package cannot be loaded
(e.g. dev-only load_all() environments).
2. clusterExport'd AMR_env was a frozen serialised copy; as.sir() on the
worker wrote to AMR:::AMR_env while run_as_sir_column read from the stale
copy, so the captured log was always wrong. Fixed by resolving AMR_env
dynamically via get("AMR_env", envir = asNamespace("AMR")) inside the
worker function, and removing AMR_env from clusterExport.
3. In the fork-based (mclapply) path each worker inherited the parent's full
sir_interpretation_history. Capturing the whole log then combining across
workers duplicated every pre-existing entry. Fixed by recording the log
row count before the as.sir() call and slicing only the new rows
afterwards.
4. run_as_sir_column used non-exported internals (%pm>%, pm_pull,
as.sir.default) that are inaccessible on PSOCK workers after library(AMR).
Replaced pipe chains with direct as.mic(as.character(x[, col, drop=TRUE]))
and as.disk(...) calls, and changed as.sir.default() to as.sir() which
dispatches correctly via S3.
5. With info = TRUE, worker forks printed per-column progress messages
simultaneously, producing garbled interleaved console output. Per-column
messages are now suppressed inside workers (effective_info = FALSE) while
the outer "Running in parallel" / "DONE" messages still appear.
6. Malformed Unicode escape \u00a (3 hex digits) in the "DONE" banner was
parsed by R as U+00AD (soft hyphen) + "ONE"; corrected so that "DONE" is
printed intact.
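Bug 6 can be reproduced in plain R. This is a minimal sketch: the string literal is an assumed reconstruction of the banner text, not copied from the package source.

```r
# R reads up to four hex digits after \u, so the "D" of "DONE" is consumed
# as the fourth digit of the escape.
malformed <- "\u00aDONE"

n_chars    <- nchar(malformed)                    # 4 characters, not 5
first_code <- utf8ToInt(substr(malformed, 1, 1))  # 173, i.e. U+00AD (soft hyphen)
rest       <- substr(malformed, 2, 4)             # "ONE"
```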
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Add parallel computing tests to test-sir.R
Eight targeted tests verify correctness of the parallel as.sir() path:
identical SIR output vs sequential, matching log row counts, no
pre-existing history duplication, reproducibility across runs, results
consistency across max_cores values, single-column fallback, and no
per-column worker messages leaking when info = TRUE. All pass when only
1 core is available (parallel silently falls back to sequential).
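The "no pre-existing history duplication" check guards the log-slicing fix. The idea can be sketched in base R with a toy log; log_env, process_column() and run_worker() are hypothetical names for illustration, not the package's own code.

```r
# Toy stand-in for a shared interpretation log (hypothetical, not AMR's own).
log_env <- new.env()
log_env$history <- data.frame(column = c("old_1", "old_2"),
                              stringsAsFactors = FALSE)

# Hypothetical worker step: appends one log row for the processed column,
# then returns only the rows it added itself.
process_column <- function(col, env) {
  n_before <- nrow(env$history)                 # row count before the call
  env$history <- rbind(env$history,
                       data.frame(column = col, stringsAsFactors = FALSE))
  new_rows <- seq_len(nrow(env$history) - n_before) + n_before
  env$history[new_rows, , drop = FALSE]         # slice the new rows only
}

# Each "worker" starts from a copy of the parent's log, as a fork would.
run_worker <- function(col) {
  worker_env <- new.env()
  worker_env$history <- log_env$history
  process_column(col, worker_env)
}

combined <- do.call(rbind, lapply(c("AMX", "CIP"), run_worker))
```

Because each worker returns only its own slice, the combined log holds one row per processed column and none of the inherited entries.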
* Fix as.sir() data.frame: preserve already-<sir> columns, exclude metadata
Issue #278: two related bugs in the column-detection / type-assignment pipeline.
Bug 1 – already-<sir> columns deleted on re-run
Line 886 excluded already-sir columns from the type assignment (they
stayed type ""), so the result loop ran x[, col] <- NULL and deleted
them. Fix: drop the !is.sir() guard so all untyped columns fall through
to type "sir" and are re-processed correctly.
Bug 2 – metadata columns treated as antibiotics
as.ab("patient") -> OXY, as.ab("ward") -> PRU. The column detector
accepted any column whose name matched an antibiotic code, regardless of
content. Fix: for name-matched columns that do not already carry an AMR
class, also verify that the content looks like AMR data (all_valid_mics,
all-numeric, or any SIR-like string). all_valid_disks() is intentionally
avoided here because it strips letters from strings (as.disk("Pt_1") == 1).
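A rough base-R sketch of such a content check follows. looks_like_amr_data() is a hypothetical helper, the MIC validity check is left out because it needs AMR internals, and treating an all-NA column as non-AMR data is an assumption made here.

```r
# Hypothetical helper: does a name-matched column also *look* like AMR data?
looks_like_amr_data <- function(x) {
  if (is.numeric(x)) return(TRUE)
  values <- trimws(as.character(x[!is.na(x)]))
  if (length(values) == 0) return(FALSE)        # assumption: empty is not AMR data
  # All values parse as numbers (e.g. disk zones stored as text)
  all_numeric <- !anyNA(suppressWarnings(as.numeric(values)))
  # At least one value is an S/I/R-like string
  any_sir <- any(toupper(values) %in% c("S", "I", "R"))
  all_numeric || any_sir
}

patient_ok <- looks_like_amr_data(c("Pt_1", "Pt_2"))   # FALSE
sir_ok     <- looks_like_amr_data(c("S", "R", "I"))    # TRUE
disk_ok    <- looks_like_amr_data(c("16", "8"))        # TRUE
```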
Also adds tools/benchmark_parallel.R: a standalone script that times
sequential vs parallel as.sir() across n=20/200/2000/20000 rows and
saves a ggplot2 PNG to tools/benchmark_parallel.png.
* Update benchmark: two-panel script with warm-up and column-count sweep
Previous single-panel benchmark was misleading: the first sequential run
paid one-time cache-warm-up cost (skewing n=20), and only 6 columns were
used so only 6 cores were ever active on a 16-core machine.
New two-panel design:
Left – vary rows with 16 fixed AB columns (shows memory-bandwidth
saturation for large n)
Right – vary columns with fixed rows (shows the real speedup profile:
parallel wins when n_cols >> 1)
Also adds a warm-up pass before measurements to eliminate first-call bias.
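The warm-up idea can be sketched as follows. time_with_warmup() is a hypothetical helper and the timed workload is a placeholder, not the benchmark script itself.

```r
# Hypothetical helper: time `f` after one untimed warm-up call, so that
# one-time costs (caches, lazy loading) do not skew the first measurement.
time_with_warmup <- function(f, reps = 3) {
  f()                                           # warm-up pass, result discarded
  vapply(seq_len(reps),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

# Placeholder workload standing in for an as.sir() call
timings <- time_with_warmup(function() sum(sqrt(seq_len(1e5))))
median_time <- median(timings)
```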
* Optimise parallel as.sir(): row-batch mode when n_cols < n_cores
Previously parallel dispatch only parallelised by column, so a 6-column
dataset on a 16-core machine used at most 6 cores with the other 10 idle.
For large n this also caused memory-bandwidth saturation (each worker did
a full n-row scan of clinical_breakpoints simultaneously).
New row-batch mode (fork path, R >= 4.0, non-Windows):
pieces_per_col = ceiling(n_cores / n_cols)
Jobs = n_cols × pieces_per_col (≈ n_cores jobs total)
Each job: one column × one row slice
Benefits:
- All cores stay busy regardless of column count
- Per-worker memory footprint shrinks by a factor of pieces_per_col
- Breakpoints lookup cache pressure reduced per worker
PSOCK path (Windows / R < 4.0) is unchanged: per-job serialisation
overhead makes row batching unprofitable there.
run_as_sir_column() gains an optional `rows` parameter (NULL = all rows,
backward-compatible). Results are reassembled via as.sir(c(as.character(.)))
which is safe for already-clean SIR values.
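The job layout described above can be sketched in base R. plan_row_batches() is a hypothetical planner written for illustration; the contiguous slicing scheme is an assumption, not necessarily how the package splits rows.

```r
# Hypothetical planner: split each column into row slices so that the number
# of jobs is roughly the number of cores.
plan_row_batches <- function(n_rows, cols, n_cores) {
  pieces_per_col <- ceiling(n_cores / length(cols))
  # Assign each row to one of `pieces_per_col` contiguous slices
  slice_id <- ceiling(seq_len(n_rows) * pieces_per_col / n_rows)
  row_slices <- unname(split(seq_len(n_rows), slice_id))
  jobs <- list()
  for (col in cols) {
    for (rows in row_slices) {
      # One job = one column x one row slice
      jobs[[length(jobs) + 1L]] <- list(col = col, rows = rows)
    }
  }
  jobs
}

jobs <- plan_row_batches(n_rows = 1000, cols = paste0("ab", 1:6), n_cores = 16)
n_jobs <- length(jobs)  # 6 columns x ceiling(16 / 6) = 3 slices each = 18 jobs
```

With 6 columns and 16 cores this yields 18 jobs, close to the core count, and each job touches about a third of the rows.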
* Fix info=FALSE ignored when no breakpoints found in as_sir_method
Operator-precedence bug at line 1601:
if (isTRUE(info) && nrow(df_unique) < 10 || nrow(breakpoints) == 0)
R evaluates && before ||, so this was equivalent to:
(isTRUE(info) && nrow(df_unique) < 10) || (nrow(breakpoints) == 0)
When nrow(breakpoints) == 0 (e.g. cefoxitin / flucloxacillin / mupirocin
against E. coli in EUCAST) the intro message was always printed regardless
of info. Fix: add parentheses so info gates both conditions:
isTRUE(info) && (nrow(df_unique) < 10 || nrow(breakpoints) == 0)
Also pass print = isTRUE(info) to progress_ticker so the progress bar
(which prints intro_txt as its title) is suppressed when info = FALSE.
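The precedence difference is easy to check in isolation. In this sketch plain scalars stand in for nrow(df_unique) and nrow(breakpoints); the values are made up for illustration.

```r
info <- FALSE
n_unique <- 50       # stands in for nrow(df_unique)
n_breakpoints <- 0   # stands in for nrow(breakpoints)

# Original condition: && binds tighter than ||
old_cond <- isTRUE(info) && n_unique < 10 || n_breakpoints == 0
# Fixed condition: info gates both checks
new_cond <- isTRUE(info) && (n_unique < 10 || n_breakpoints == 0)

old_cond  # TRUE: the message is shown although info = FALSE
new_cond  # FALSE
```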
* Fix cli formatting in as.sir() messages
- stop_if for empty ab_cols: wrap as.mic() and as.disk() in
{.help [{.fun ...}](...)} for clickable links in cli output
- Parallel mode message: use {.field col} formatting for column names
and quotes = FALSE in vector_and(), consistent with the rest of the
codebase (avoids double-quoting from both font_bold and quotes="'")
* Use font_bold() inside {.field} for column names in parallel message
Convention: paste0("{.field ", font_bold(col), "}") gives bold green
column names without quotation marks, consistent with the rest of the
codebase (e.g. the 'Cleaning values' message in run_as_sir_column).
* Add collapse = NULL to font_bold() for column name vectors
font_bold() without collapse = NULL joins a vector with "" into a single
string, breaking paste0() element-wise formatting for length > 1 vectors.
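The effect can be illustrated with a stand-in. bold() below is hypothetical and uses <b> tags instead of ANSI codes; the real font_bold() is internal to AMR, and only its default of collapsing with "" is taken from the description above.

```r
# Hypothetical stand-in for a bold-formatting helper with the same
# `collapse` default as described for font_bold().
bold <- function(x, collapse = "") {
  paste0("<b>", x, "</b>", collapse = collapse)
}

cols <- c("AMX", "CIP")

# Default: the vector is joined first, so paste0() sees a single string
joined <- paste0("{.field ", bold(cols), "}")
# "{.field <b>AMX</b><b>CIP</b>}"

# collapse = NULL keeps one element per column name
per_element <- paste0("{.field ", bold(cols, collapse = NULL), "}")
# "{.field <b>AMX</b>}" "{.field <b>CIP</b>}"
```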
* Add tools/ to .Rbuildignore
Keeps the benchmark script out of the built package tarball.
---------
Co-authored-by: Claude <noreply@anthropic.com>
The AMR Package for R
Please visit our comprehensive package website https://amr-for-r.org to read more about this package, including many examples and tutorials.
Overview:
- Provides an all-in-one solution for antimicrobial resistance (AMR) data analysis in a One Health approach
- Peer-reviewed, used in over 175 countries, available in 28 languages
- Generates antibiograms - traditional, combined, syndromic, and even WISCA
- Provides the full microbiological taxonomy of ~79 000 distinct species and extensive info of ~620 antimicrobial drugs
- Applies CLSI 2011-2026 and EUCAST 2011-2026 clinical and veterinary breakpoints, and ECOFFs, for MIC and disk zone interpretation
- Corrects for duplicate isolates, calculates and predicts AMR per antimicrobial class
- Integrates with WHONET, ATC, EARS-Net, PubChem, LOINC, SNOMED CT, and NCBI
- 100% free of costs and dependencies, highly suitable for places with limited resources
The AMR package is a peer-reviewed, free and open-source R package
with zero dependencies to simplify the analysis and prediction of
Antimicrobial Resistance (AMR) and to work with microbial and
antimicrobial data and properties, by using evidence-based methods.
Our aim is to provide a standard for clean and reproducible AMR data
analysis that empowers epidemiological analyses and enables continuous
surveillance and treatment evaluation in any setting.
The AMR package supports and can read any data format, including
WHONET data. This package works on Windows, macOS and Linux with all
versions of R since R-3.0 (April 2013). It was designed to work in any
setting, including those with very limited resources. It was created
for both routine data analysis and academic research at the Faculty of
Medical Sciences of the University of Groningen
and the University Medical Center Groningen.
How to get this package
To install the latest ‘release’ version from CRAN:
install.packages("AMR")
To install the latest ‘beta’ version:
install.packages("AMR", repos = "beta.amr-for-r.org")
If this does not work, try to install directly from GitHub using the
remotes package:
remotes::install_github("msberends/AMR")
This AMR package for R is free, open-source software and licensed under the GNU General Public License v2.0 (GPL-2). These requirements are consequently legally binding: modifications must be released under the same license when distributing the package, changes made to the code must be documented, source code must be made available when the package is distributed, and a copy of the license and copyright notice must be included with the package.