Issue #278: two related bugs in the column-detection / type-assignment pipeline.
Bug 1 – already-<sir> columns deleted on re-run
Line 886 excluded already-sir columns from the type assignment (they
stayed type "") causing the result loop to do x[,col] <- NULL, deleting
them. Fix: drop the !is.sir() guard so all untyped columns fall through
to type "sir" and are re-processed correctly.
Bug 2 – metadata columns treated as antibiotics
as.ab("patient") -> OXY, as.ab("ward") -> PRU. The column detector
accepted any column whose name matched an antibiotic code, regardless of
content. Fix: for name-matched columns that do not already carry an AMR
class, also verify content looks like AMR data (all_valid_mics, all-
numeric, or any SIR-like string). all_valid_disks() is intentionally
avoided here because it strips letters from strings (as.disk("Pt_1")==1).
Also adds tools/benchmark_parallel.R: a standalone script that times
sequential vs parallel as.sir() across n=20/200/2000/20000 rows and
saves a ggplot2 PNG to tools/benchmark_parallel.png.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
Eight targeted tests verify correctness of the parallel as.sir() path:
identical SIR output vs sequential, matching log row counts, no
pre-existing history duplication, reproducibility across runs, results
consistency across max_cores values, single-column fallback, and no
per-column worker messages leaking when info = TRUE. All pass when only
1 core is available (parallel silently falls back to sequential).
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR