Migrate parallel computing in as.sir() from parallel:: to future/future.apply (#280)

* Migrate parallel computing in as.sir() from parallel:: to future/future.apply Replace parallel::mclapply() and parallel::parLapply() with future.apply::future_lapply(), enabling transparent support for any future backend (multisession, multicore, mirai_multisession, cluster) on all platforms including Windows. When parallel = TRUE the function now: (1) respects an active future::plan() set by the user without overriding it on exit, or (2) sets a temporary multisession plan with parallelly::availableCores() and tears it down on exit. The max_cores argument controls worker count only when no user plan is active. future and future.apply are added to Suggests in DESCRIPTION. https://claude.ai/code/session_01M1Jvf2Miu6JL4TQrEh1wS8 * Require user plan() for parallel=TRUE; fix as_wt_nwt false-positive warnings - parallel = TRUE now errors with a cli-styled message if no non-sequential future::plan() is active; users must call e.g. future::plan(future::multisession) before using parallel = TRUE (breaking change) - Removed auto-setup/teardown of multisession plan inside as.sir(), which was slow and caused version-mismatch issues with load_all() workflows - Added as_wt_nwt to the exclusion list in as_sir_method() to suppress false-positive "no longer used" warnings during parallel runs - Fixed pieces_per_col row-batch calculation to use n_workers (total available workers from the active plan) instead of n_cores (workers clipped to n_cols), so row-batch mode activates correctly when n_cols < n_workers - Updated @param parallel and @param max_cores roxygen docs; regenerated man/as.sir.Rd - Updated sequential-mode hint to instruct users to set plan() first https://claude.ai/code/session_01M1Jvf2Miu6JL4TQrEh1wS8 * fix parallel * fix parallel * unit tests * unit tedts --------- Co-authored-by: Claude <noreply@anthropic.com>
2026-06-29 22:56:30 +02:00 · 2026-04-30 08:57:19 +01:00
parent 3f1b20c304
commit 23beebc6c3
11 changed files with 277 additions and 294 deletions
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,8 +1,13 @@
-# AMR 3.0.1.9052
+# AMR 3.0.1.9053
+
+This will become release v3.1.0, intended for launch end of May.

 ### New
 * Support for clinical breakpoints of 2026 of both CLSI and EUCAST, by adding all of their over 5,700 new clinical breakpoints to the `clinical_breakpoints` data set for usage in `as.sir()`. EUCAST 2026 is now the new default guideline for all MIC and disk diffusion interpretations.
-* Integration with the **tidymodels** framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes`
+* Support for the [`future`](https://future.futureverse.org) package and its framework, as the previous implementation of parallel computing was slow
+  - **Breaking change**: `as.sir()` with `parallel = TRUE` now requires a non-sequential `future::plan()` to be active before the call — e.g., `future::plan(future::multisession)` — and throws an informative error if none is set.
+  - New all-core usage setup: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure and processing time)
+* Integration with the *tidymodels* framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes`
  - `step_mic_log2()` to transform `<mic>` columns with log2, and `step_sir_numeric()` to convert `<sir>` columns to numeric
  - New `tidyselect` helpers:
    - `all_sir()`, `all_sir_predictors()`
@@ -21,7 +26,6 @@
 * Two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values

 ### Fixes
-* Fixed multiple bugs in the `parallel = TRUE` mode of `as.sir()` for data frames
 * Fixed a bug in `as.sir()` where values that were purely numeric (e.g., `"1"`) and matched the broad SIR-matching regex would be incorrectly stripped of all content by the Unicode letter filter
 * Fixed a bug in `as.mic()` where MIC values in scientific notation (e.g., `"1e-3"`) were incorrectly handled because the letter `e` was removed along with other Unicode letters; scientific notation `e` is now preserved
 * Fixed a bug in `as.ab()` where certain AB codes containing "PH" or "TH" (such as `ETH`, `MTH`, `PHE`, `PHN`, `STH`, `THA`, `THI1`) would incorrectly return `NA` when combined in a vector with any untranslatable value (#245)
@@ -37,8 +41,7 @@
 * Fixed BRMO classification by including bacterial complexes (#275)
 * Fixed `as.sir()` for data frames silently deleting columns whose AB class was already `<sir>` when called a second time (re-running on already-converted data) (#278)
 * Fixed `as.sir()` for data frames incorrectly treating metadata columns (e.g. `patient`, `ward`) as antibiotic columns when their names coincidentally matched an antibiotic code; column content is now validated against AMR data patterns before inclusion
-* Improved parallel computing in `as.sir()`: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure)
-* Fixed `as.sir()` ignoring `info = FALSE` for columns with no breakpoints (e.g. cefoxitin against *E. coli*): an operator-precedence bug (`&&`/`||`) caused the "Interpreting MIC values" intro message to fire unconditionally when `nrow(breakpoints) == 0`, regardless of `info`; the progress bar title was also not gated by `info`
+* Fixed `as.sir()` ignoring `info = FALSE` for columns with no breakpoints (e.g. cefoxitin against *E. coli*)

 ### Updates
 * `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (#239)
@@ -56,7 +59,6 @@
  * This results in more reliable behaviour compared to previous versions for capped MIC values
  * Removed the `"inverse"` option, which has now become redundant
 * `ab_group()` now returns values consist with the AMR selectors (#246)
-* Added two new `NA` objects, `NA_ab_` and `NA_mo_`, analogous to base R's `NA_character_` and `NA_integer_`, for use in pipelines that require typed missing values


 # AMR 3.0.1