1
0
mirror of https://github.com/msberends/AMR.git synced 2026-04-28 10:23:53 +02:00

Fix parallel computing in as.sir.data.frame (#276)

* Fix parallel computing in as.sir.data.frame

Six bugs in parallel = TRUE mode:

1. PSOCK workers (Windows / R < 4.0) never had AMR loaded, so every
   exported/AMR function call failed. Added clusterEvalQ(cl, library(AMR))
   with a graceful fallback to sequential when the package cannot be loaded
   (e.g. dev-only load_all() environments).

2. clusterExport'd AMR_env was a frozen serialised copy; as.sir() on the
   worker wrote to AMR:::AMR_env while run_as_sir_column read from the stale
   copy, so the captured log was always wrong. Fixed by resolving AMR_env
   dynamically via get("AMR_env", envir = asNamespace("AMR")) inside the
   worker function, and removing AMR_env from clusterExport.

3. In the fork-based (mclapply) path each worker inherited the parent's full
   sir_interpretation_history. Capturing the whole log then combining across
   workers duplicated every pre-existing entry. Fixed by recording the log
   row count before the as.sir() call and slicing only the new rows
   afterwards.

4. run_as_sir_column used non-exported internals (%pm>%, pm_pull,
   as.sir.default) that are inaccessible on PSOCK workers after library(AMR).
   Replaced pipe chains with direct as.mic(as.character(x[, col, drop=TRUE]))
   and as.disk(...) calls, and changed as.sir.default() to as.sir() which
   dispatches correctly via S3.

5. With info = TRUE, worker forks printed per-column progress messages
   simultaneously, producing garbled interleaved console output. Per-column
   messages are now suppressed inside workers (effective_info = FALSE) while
   the outer "Running in parallel" / "DONE" messages still appear.

6. Malformed Unicode escape \u00a (3 hex digits) in the "DONE" banner was
   parsed by R as U+00AD (soft hyphen) + "ONE"; corrected to  .

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Add parallel computing tests to test-sir.R

Eight targeted tests verify correctness of the parallel as.sir() path:
identical SIR output vs sequential, matching log row counts, no
pre-existing history duplication, reproducibility across runs, results
consistency across max_cores values, single-column fallback, and no
per-column worker messages leaking when info = TRUE. All pass when only
1 core is available (parallel silently falls back to sequential).

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Fix as.sir() data.frame: preserve already-<sir> columns, exclude metadata

Issue #278: two related bugs in the column-detection / type-assignment pipeline.

Bug 1 – already-<sir> columns deleted on re-run
  Line 886 excluded already-sir columns from the type assignment (they
  stayed type "") causing the result loop to do x[,col] <- NULL, deleting
  them.  Fix: drop the !is.sir() guard so all untyped columns fall through
  to type "sir" and are re-processed correctly.

Bug 2 – metadata columns treated as antibiotics
  as.ab("patient") -> OXY, as.ab("ward") -> PRU.  The column detector
  accepted any column whose name matched an antibiotic code, regardless of
  content.  Fix: for name-matched columns that do not already carry an AMR
  class, also verify content looks like AMR data (all_valid_mics, all-
  numeric, or any SIR-like string).  all_valid_disks() is intentionally
  avoided here because it strips letters from strings (as.disk("Pt_1")==1).

Also adds tools/benchmark_parallel.R: a standalone script that times
sequential vs parallel as.sir() across n=20/200/2000/20000 rows and
saves a ggplot2 PNG to tools/benchmark_parallel.png.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Update benchmark: two-panel script with warm-up and column-count sweep

Previous single-panel benchmark was misleading: the first sequential run
paid one-time cache-warm-up cost (skewing n=20), and only 6 columns were
used so only 6 cores were ever active on a 16-core machine.

New two-panel design:
  Left  – vary rows with 16 fixed AB columns (shows memory-bandwidth
          saturation for large n)
  Right – vary columns with fixed rows (shows the real speedup profile:
          parallel wins when n_cols >> 1)

Also adds a warm-up pass before measurements to eliminate first-call bias.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Optimise parallel as.sir(): row-batch mode when n_cols < n_cores

Previously parallel dispatch only parallelised by column, so a 6-column
dataset on a 16-core machine used at most 6 cores with the other 10 idle.
For large n this also caused memory-bandwidth saturation (each worker did
a full n-row scan of clinical_breakpoints simultaneously).

New row-batch mode (fork path, R >= 4.0, non-Windows):
  pieces_per_col = ceil(n_cores / n_cols)
  Jobs = n_cols × pieces_per_col  (≈ n_cores jobs total)
  Each job: one column × one row slice

Benefits:
  - All cores stay busy regardless of column count
  - Per-worker memory footprint shrinks by pieces_per_col ×
  - Breakpoints lookup cache pressure reduced per worker

PSOCK path (Windows / R < 4.0) is unchanged: per-job serialisation
overhead makes row batching unprofitable there.

run_as_sir_column() gains an optional `rows` parameter (NULL = all rows,
backward-compatible). Results are reassembled via as.sir(c(as.character(.)))
which is safe for already-clean SIR values.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Fix info=FALSE ignored when no breakpoints found in as_sir_method

Operator-precedence bug at line 1601:

  if (isTRUE(info) && nrow(df_unique) < 10 || nrow(breakpoints) == 0)

R evaluates && before ||, so this was equivalent to:

  (isTRUE(info) && nrow(df_unique) < 10) || (nrow(breakpoints) == 0)

When nrow(breakpoints) == 0 (e.g. cefoxitin / flucloxacillin / mupirocin
against E. coli in EUCAST) the intro message was always printed regardless
of info. Fix: add parentheses so info gates both conditions:

  isTRUE(info) && (nrow(df_unique) < 10 || nrow(breakpoints) == 0)

Also pass print = isTRUE(info) to progress_ticker so the progress bar
(which prints intro_txt as its title) is suppressed when info = FALSE.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Fix cli formatting in as.sir() messages

- stop_if for empty ab_cols: wrap as.mic() and as.disk() in
  {.help [{.fun ...}](...)} for clickable links in cli output
- Parallel mode message: use {.field col} formatting for column names
  and quotes = FALSE in vector_and(), consistent with the rest of the
  codebase (avoids double-quoting from both font_bold and quotes="'")

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Use font_bold() inside {.field} for column names in parallel message

Convention: paste0("{.field ", font_bold(col), "}") gives bold green
column names without quotation marks, consistent with the rest of the
codebase (e.g. the 'Cleaning values' message in run_as_sir_column).

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Add collapse = NULL to font_bold() for column name vectors

font_bold() without collapse = NULL joins a vector with "" into a single
string, breaking paste0() element-wise formatting for length > 1 vectors.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

* Add tools/ to .Rbuildignore

Keeps the benchmark script out of the built package tarball.

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

---------

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Matthijs Berends
2026-04-25 00:34:38 +02:00
committed by GitHub
parent e7780b6d5f
commit 19157ce718
6 changed files with 420 additions and 107 deletions

254
R/sir.R
View File

@@ -716,7 +716,7 @@ as.sir.disk <- function(x,
}
#' @rdname as.sir
#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. This requires no additional packages, as the used `parallel` package is part of base \R. On Windows and on \R < 4.0.0 [parallel::parLapply()] will be used, in all other cases the more efficient [parallel::mclapply()] will be used.
#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. The `parallel` package is part of base \R and no additional packages are required. On Unix/macOS with \R >= 4.0.0, [parallel::mclapply()] (fork-based) is used; on Windows and \R < 4.0.0, [parallel::parLapply()] with a PSOCK cluster is used (requires the AMR package to be installed, not just loaded via `devtools::load_all()`). Parallelism distributes columns across cores; it is most beneficial when there are many antibiotic columns and a large number of rows.
#' @param max_cores Maximum number of cores to use if `parallel = TRUE`. Use a negative value to subtract that number from the available number of cores, e.g. a value of `-2` on an 8-core machine means that at most 6 cores will be used. Defaults to `-1`. There will never be used more cores than variables to analyse. The available number of cores are detected using [parallelly::availableCores()] if that package is installed, and base \R's [parallel::detectCores()] otherwise.
#' @export
as.sir.data.frame <- function(x,
@@ -852,7 +852,6 @@ as.sir.data.frame <- function(x,
i <- 0
ab_cols <- colnames(x)[vapply(FUN.VALUE = logical(1), x, function(y) {
i <<- i + 1
check <- is.mic(y) | is.disk(y)
ab <- colnames(x)[i]
if (!is.null(col_mo) && ab == col_mo) {
return(FALSE)
@@ -861,13 +860,30 @@ as.sir.data.frame <- function(x,
return(FALSE)
}
if (length(sel) == 0 || (length(sel) > 0 && ab %in% sel)) {
# columns already carrying an AMR class are always included
y_bak <- x.bak[, ab, drop = TRUE]
if (is.mic(y_bak) || is.disk(y_bak) || is.sir(y_bak)) {
return(TRUE)
}
ab_coerced <- suppressWarnings(as.ab(ab, info = FALSE))
if (is.na(ab_coerced) || (length(sel) > 0 & !ab %in% sel)) {
# not even a valid AB code
return(FALSE)
} else {
return(TRUE)
}
# Name matches an antibiotic; also verify column content resembles AMR
# data. This prevents false positives on metadata columns whose names
# happen to match a drug code (e.g. 'patient' -> OXY, 'ward' -> PRU).
# Note: all_valid_disks() is intentionally avoided here because it strips
# non-numeric characters (as.disk("Pt_1") == 1), accepting patient IDs.
y_char <- tryCatch(as.character(y), error = function(e) character(0))
y_valid <- y_char[!is.na(y_char) & nzchar(trimws(y_char))]
if (length(y_valid) == 0L) {
return(FALSE)
}
y_numeric <- suppressWarnings(as.numeric(y_valid))
all_valid_mics(y) ||
all(!is.na(y_numeric)) ||
any(y_valid %in% c("S", "SDD", "I", "R", "NI"))
} else {
return(FALSE)
}
@@ -875,7 +891,7 @@ as.sir.data.frame <- function(x,
stop_if(
length(ab_cols) == 0,
"no columns with MIC values, disk zones or antibiotic column names found in this data set. Use as.mic() or as.disk() to transform antimicrobial columns."
"no columns with MIC values, disk zones or antibiotic column names found in this data set. Use {.help [{.fun as.mic}](AMR::as.mic)} or {.help [{.fun as.disk}](AMR::as.disk)} to transform antimicrobial columns."
)
# set type per column
types <- character(length(ab_cols))
@@ -883,7 +899,7 @@ as.sir.data.frame <- function(x,
types[vapply(FUN.VALUE = logical(1), x.bak[, ab_cols, drop = FALSE], is.mic)] <- "mic"
types[types == "" & vapply(FUN.VALUE = logical(1), x[, ab_cols, drop = FALSE], all_valid_disks)] <- "disk"
types[types == "" & vapply(FUN.VALUE = logical(1), x[, ab_cols, drop = FALSE], all_valid_mics)] <- "mic"
types[types == "" & !vapply(FUN.VALUE = logical(1), x.bak[, ab_cols, drop = FALSE], is.sir)] <- "sir"
types[types == ""] <- "sir"
if (any(types %in% c("mic", "disk"), na.rm = TRUE)) {
# now we need an mo column
stop_if(is.null(col_mo), "{.arg col_mo} must be set")
@@ -906,6 +922,21 @@ as.sir.data.frame <- function(x,
return(NULL)
}
)
if (!is.null(cl)) {
# Each PSOCK worker is a fresh R session — the AMR package must be loaded there
# so all exported functions (as.sir, as.mic, as.disk, ...) are available.
amr_loaded_on_workers <- tryCatch({
parallel::clusterEvalQ(cl, library(AMR, quietly = TRUE))
TRUE
}, error = function(e) FALSE)
if (!amr_loaded_on_workers) {
if (isTRUE(info)) {
message_("Could not load AMR on parallel workers (package may not be installed); falling back to single-core computation.")
}
parallel::stopCluster(cl)
cl <- NULL
}
}
if (is.null(cl)) {
n_cores <- 1
}
@@ -916,93 +947,139 @@ as.sir.data.frame <- function(x,
message_("Processing columns:", as_note = FALSE)
}
run_as_sir_column <- function(i) {
# In parallel mode suppress per-column messages: workers print simultaneously and
# their output would be interleaved on the console.
is_parallel_run <- isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1
effective_info <- if (is_parallel_run) FALSE else info
# Row-batch mode: when n_cols < n_cores we would leave cores idle under plain
# column-parallel dispatch. Instead we split rows into pieces so every core
# gets work. pieces_per_col = ceil(n_cores / n_cols) gives ~n_cores jobs
# total; each job processes one column on one row slice, which also reduces
# per-worker memory pressure (smaller breakpoints search space).
# Only used for the fork path (R >= 4.0, non-Windows); PSOCK clusters already
# incur high per-job serialisation overhead so we keep column-mode there.
use_fork <- is_parallel_run &&
!(.Platform$OS.type == "windows" || getRversion() < "4.0.0")
pieces_per_col <- if (use_fork && length(ab_cols) < n_cores) {
ceiling(n_cores / length(ab_cols))
} else {
1L
}
run_as_sir_column <- function(i, rows = NULL) {
# Always resolve AMR_env from the package namespace. This is essential for
# PSOCK workers (where the closure-captured AMR_env is a stale serialised copy
# while as.sir() writes to the live AMR:::AMR_env) and also avoids capturing
# pre-existing log entries from earlier in the session when forking.
.amr_env <- get("AMR_env", envir = asNamespace("AMR"), inherits = FALSE)
# In parallel mode each worker (fork or PSOCK) has its own copy of the
# history; record the current length so we capture only the new rows added
# by the as.sir() call below, not any pre-existing entries inherited at fork
# time or carried over from earlier as.sir() calls.
if (is_parallel_run) pre_log_n <- NROW(.amr_env$sir_interpretation_history)
ab_col <- ab_cols[i]
out <- list(result = NULL, log = NULL)
# row subsetting: NULL means all rows (column-mode), otherwise row-batch mode
row_idx <- if (is.null(rows)) seq_len(nrow(x)) else rows
if (types[i] == "mic") {
result <- x %pm>%
pm_pull(ab_col) %pm>%
as.character() %pm>%
as.mic() %pm>%
as.sir(
mo = x_mo,
mo.bak = x[, col_mo, drop = TRUE],
ab = ab_col,
guideline = guideline,
uti = uti,
capped_mic_handling = capped_mic_handling,
as_wt_nwt = as_wt_nwt,
add_intrinsic_resistance = add_intrinsic_resistance,
reference_data = reference_data,
substitute_missing_r_breakpoint = substitute_missing_r_breakpoint,
include_screening = include_screening,
include_PKPD = include_PKPD,
breakpoint_type = breakpoint_type,
host = host,
verbose = verbose,
info = info,
conserve_capped_values = conserve_capped_values,
is_data.frame = TRUE
)
result <- as.sir(
as.mic(as.character(x[row_idx, ab_col, drop = TRUE])),
mo = x_mo[row_idx],
mo.bak = x[row_idx, col_mo, drop = TRUE],
ab = ab_col,
guideline = guideline,
uti = if (length(uti) > 1L) uti[row_idx] else uti,
capped_mic_handling = capped_mic_handling,
as_wt_nwt = as_wt_nwt,
add_intrinsic_resistance = add_intrinsic_resistance,
reference_data = reference_data,
substitute_missing_r_breakpoint = substitute_missing_r_breakpoint,
include_screening = include_screening,
include_PKPD = include_PKPD,
breakpoint_type = breakpoint_type,
host = if (length(host) > 1L) host[row_idx] else host,
verbose = verbose,
info = effective_info,
conserve_capped_values = conserve_capped_values,
is_data.frame = TRUE
)
out$result <- result
out$log <- AMR_env$sir_interpretation_history
AMR_env$sir_interpretation_history <- AMR_env$sir_interpretation_history[0, , drop = FALSE] # reset log
if (is_parallel_run) {
full_log <- .amr_env$sir_interpretation_history
out$log <- if (pre_log_n < NROW(full_log)) {
full_log[seq.int(pre_log_n + 1L, NROW(full_log)), , drop = FALSE]
} else {
full_log[0L, , drop = FALSE]
}
} else {
out$log <- .amr_env$sir_interpretation_history
.amr_env$sir_interpretation_history <- .amr_env$sir_interpretation_history[0L, , drop = FALSE]
}
return(out)
} else if (types[i] == "disk") {
result <- x %pm>%
pm_pull(ab_col) %pm>%
as.character() %pm>%
as.disk() %pm>%
as.sir(
mo = x_mo,
mo.bak = x[, col_mo, drop = TRUE],
ab = ab_col,
guideline = guideline,
uti = uti,
as_wt_nwt = as_wt_nwt,
add_intrinsic_resistance = add_intrinsic_resistance,
reference_data = reference_data,
substitute_missing_r_breakpoint = substitute_missing_r_breakpoint,
include_screening = include_screening,
include_PKPD = include_PKPD,
breakpoint_type = breakpoint_type,
host = host,
verbose = verbose,
info = info,
is_data.frame = TRUE
)
result <- as.sir(
as.disk(as.character(x[row_idx, ab_col, drop = TRUE])),
mo = x_mo[row_idx],
mo.bak = x[row_idx, col_mo, drop = TRUE],
ab = ab_col,
guideline = guideline,
uti = if (length(uti) > 1L) uti[row_idx] else uti,
as_wt_nwt = as_wt_nwt,
add_intrinsic_resistance = add_intrinsic_resistance,
reference_data = reference_data,
substitute_missing_r_breakpoint = substitute_missing_r_breakpoint,
include_screening = include_screening,
include_PKPD = include_PKPD,
breakpoint_type = breakpoint_type,
host = if (length(host) > 1L) host[row_idx] else host,
verbose = verbose,
info = effective_info,
is_data.frame = TRUE
)
out$result <- result
out$log <- AMR_env$sir_interpretation_history
AMR_env$sir_interpretation_history <- AMR_env$sir_interpretation_history[0, , drop = FALSE]
if (is_parallel_run) {
full_log <- .amr_env$sir_interpretation_history
out$log <- if (pre_log_n < NROW(full_log)) {
full_log[seq.int(pre_log_n + 1L, NROW(full_log)), , drop = FALSE]
} else {
full_log[0L, , drop = FALSE]
}
} else {
out$log <- .amr_env$sir_interpretation_history
.amr_env$sir_interpretation_history <- .amr_env$sir_interpretation_history[0L, , drop = FALSE]
}
return(out)
} else if (types[i] == "sir") {
ab <- ab_col
ab_coerced <- suppressWarnings(as.ab(ab, info = FALSE))
show_message <- FALSE
if (!all(x[, ab, drop = TRUE] %in% c("S", "SDD", "I", "R", "NI", NA), na.rm = TRUE)) {
if (!all(x[row_idx, ab, drop = TRUE] %in% c("S", "SDD", "I", "R", "NI", NA), na.rm = TRUE)) {
show_message <- TRUE
if (isTRUE(info)) {
message_("\u00a0\u00a0", AMR_env$bullet_icon, " Cleaning values in column ", paste0("{.field ", font_bold(ab), "}"), " (",
if (isTRUE(effective_info)) {
message_("\u00a0\u00a0", .amr_env$bullet_icon, " Cleaning values in column ", paste0("{.field ", font_bold(ab), "}"), " (",
ifelse(ab_coerced != toupper(ab), paste0(ab_coerced, ", "), ""),
ab_name(ab_coerced, tolower = TRUE, info = info), ")... ",
ab_name(ab_coerced, tolower = TRUE, info = effective_info), ")... ",
appendLF = FALSE,
as_note = FALSE
)
}
} else if (!is.sir(x.bak[, ab, drop = TRUE])) {
show_message <- TRUE
if (isTRUE(info)) {
message_("\u00a0\u00a0", AMR_env$bullet_icon, " Assigning class {.cls sir} to already clean column ", paste0("{.field ", font_bold(ab), "}"), " (",
if (isTRUE(effective_info)) {
message_("\u00a0\u00a0", .amr_env$bullet_icon, " Assigning class {.cls sir} to already clean column ", paste0("{.field ", font_bold(ab), "}"), " (",
ifelse(ab_coerced != toupper(ab), paste0(ab_coerced, ", "), ""),
ab_name(ab_coerced, tolower = TRUE, language = NULL, info = info), ")... ",
ab_name(ab_coerced, tolower = TRUE, language = NULL, info = effective_info), ")... ",
appendLF = FALSE,
as_note = FALSE
)
}
}
result <- as.sir.default(x = as.character(x[, ab, drop = TRUE]))
if (show_message == TRUE && isTRUE(info)) {
result <- as.sir(as.character(x[row_idx, ab, drop = TRUE]))
if (show_message == TRUE && isTRUE(effective_info)) {
message_(font_green_bg("\u00a0OK\u00a0"), as_note = FALSE)
}
out$result <- result
@@ -1016,26 +1093,55 @@ as.sir.data.frame <- function(x,
if (isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1) {
if (isTRUE(info)) {
message_(as_note = FALSE)
message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(font_bold(ab_cols, collapse = NULL), quotes = "'", sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
if (pieces_per_col > 1L) {
message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), " (", pieces_per_col, " row slices per column)...", as_note = FALSE, appendLF = FALSE)
} else {
message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
}
}
if (.Platform$OS.type == "windows" || getRversion() < "4.0.0") {
# `cl` has been created in the part above before the `run_as_sir_column` function
# PSOCK cluster: column-mode only (row-batch serialisation overhead not worth it)
on.exit(parallel::stopCluster(cl), add = TRUE)
parallel::clusterExport(cl, varlist = c(
"x", "x.bak", "x_mo", "ab_cols", "types",
"capped_mic_handling", "as_wt_nwt", "add_intrinsic_resistance",
"reference_data", "substitute_missing_r_breakpoint", "include_screening", "include_PKPD",
"breakpoint_type", "guideline", "host", "uti", "info", "verbose",
"col_mo", "AMR_env", "conserve_capped_values",
"breakpoint_type", "guideline", "host", "uti", "verbose",
"col_mo", "conserve_capped_values",
"effective_info", "is_parallel_run",
"run_as_sir_column"
), envir = environment())
result_list <- parallel::parLapply(cl, seq_along(ab_cols), run_as_sir_column)
} else if (pieces_per_col > 1L) {
# Row-batch mode (R >= 4.0, non-Windows, n_cols < n_cores):
# build (col, row_slice) job pairs so all cores stay active
row_cuts <- unique(round(seq(0, nrow(x), length.out = pieces_per_col + 1L)))
row_ranges <- lapply(seq_len(length(row_cuts) - 1L), function(p) {
seq.int(row_cuts[p] + 1L, row_cuts[p + 1L])
})
jobs <- do.call(c, lapply(seq_along(ab_cols), function(ci) {
lapply(seq_along(row_ranges), function(p) list(col = ci, rows = row_ranges[[p]]))
}))
flat <- parallel::mclapply(jobs, function(job) {
run_as_sir_column(job$col, job$rows)
}, mc.cores = n_cores)
# Reassemble: for each column concatenate row pieces in order
result_list <- lapply(seq_along(ab_cols), function(ci) {
pieces <- flat[vapply(jobs, function(j) j$col == ci, logical(1L))]
list(
result = as.sir(do.call(c, lapply(pieces, function(p) as.character(p$result)))),
log = {
logs <- Filter(Negate(is.null), lapply(pieces, function(p) p$log))
if (length(logs) > 0L) do.call(rbind_AMR, logs) else NULL
}
)
})
} else {
# R>=4.0 on unix
# Column-parallel mode (R >= 4.0, non-Windows, n_cols >= n_cores)
result_list <- parallel::mclapply(seq_along(ab_cols), run_as_sir_column, mc.cores = n_cores)
}
if (isTRUE(info)) {
message_(font_green_bg("\u00aDONE\u00a"), as_note = FALSE)
message_(font_green_bg("\u00a0DONE\u00a0"), as_note = FALSE)
message_(as_note = FALSE)
message_("Run {.help [{.fun sir_interpretation_history}](AMR::sir_interpretation_history)} to retrieve a logbook with all details of the breakpoint interpretations.")
}
@@ -1492,11 +1598,11 @@ as_sir_method <- function(method_short,
add_intrinsic_resistance_to_AMR_env()
}
if (isTRUE(info) && nrow(df_unique) < 10 || nrow(breakpoints) == 0) {
if (isTRUE(info) && (nrow(df_unique) < 10 || nrow(breakpoints) == 0)) {
# only print intro under 10 items, otherwise progressbar will print this and then it will be printed double
message_(intro_txt, appendLF = FALSE, as_note = FALSE)
}
p <- progress_ticker(n = nrow(df_unique), n_min = 10, title = intro_txt, only_bar_percent = TRUE)
p <- progress_ticker(n = nrow(df_unique), n_min = 10, print = isTRUE(info), title = intro_txt, only_bar_percent = TRUE)
has_progress_bar <- !is.null(import_fn("progress_bar", "progress", error_on_fail = FALSE)) && nrow(df_unique) >= 10
on.exit(close(p))