1
0
mirror of https://github.com/msberends/AMR.git synced 2026-05-14 01:10:45 +02:00

Add parallel computing support to antibiogram() and wisca() (#281) (#282)

* Add parallel computing support to antibiogram() and wisca() (#281)

For WISCA: simulations are distributed across (group, chunk) job pairs
via future.apply::future_lapply(), keeping all workers active even when
the regimen count is smaller than nbrOfWorkers(). Sequential fallback
with progress ticker is preserved when parallel = FALSE or workers = 1.

For grouped antibiograms: each group is processed by a separate worker,
mirroring the row-batch approach in as.sir().

Same gate pattern as as.sir() (PR #280): requires a non-sequential
future::plan() to be active; auto-upgrades to parallel = TRUE when a
parallel plan is detected; throws an informative error otherwise.

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

* Fix version to 3.0.1.9055 and update CLAUDE.md version formula

Uses origin/${defaultbranch} (with a fetch) instead of the local
branch ref so the commit count is never stale after a merge.

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

* Fix non-ASCII characters in antibiogram.R

Replace en/em dashes and non-breaking spaces with ASCII equivalents
to satisfy R CMD check portability requirement.

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

* Update auto-generated Rd files after documentation rebuild

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

* Move parallel gate to top of antibiogram.default() like sir.R

The gate was inside the wisca==TRUE block, so parallel=TRUE with a
sequential plan was silently ignored for non-WISCA antibiograms.
Now the gate runs unconditionally at the top of the function,
identical to the as.sir() pattern: error on explicit parallel=TRUE
with sequential plan, auto-upgrade when a non-sequential plan is
already active.

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

* Fix parallel WISCA returning all NA; strengthen tests; add sequential hint

Bug: lapply() over a factor yields length-1 factor elements (integer
codes), while for() over a factor yields character strings.  The job
list stored j\$group as a factor integer, but the reassembly loop
compared it with identical(j\$group, g) where g was character -- always
FALSE, so no simulation chunks were ever assembled and coverage stayed
NA throughout.

Fix: convert unique_groups to character before building jobs so both
the job list and the reassembly loop use the same type.

Tests: replaced na.rm = TRUE guards with explicit anyNA() checks so the
test suite would have caught the all-NA result immediately.

Also adds a sequential-mode performance hint (analogous to sir.R
lines 1116-1127) when simulations >= 500 and >= 3 regimens.

https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF

---------

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Matthijs Berends
2026-04-30 18:41:56 +01:00
committed by GitHub
parent 23beebc6c3
commit f7e9294bea
8 changed files with 279 additions and 79 deletions

View File

@@ -167,7 +167,8 @@ Then run the following from the repo root to determine the version string to use
currenttag=$(git describe --tags --abbrev=0 | sed 's/v//') currenttag=$(git describe --tags --abbrev=0 | sed 's/v//')
currenttagfull=$(git describe --tags --abbrev=0) currenttagfull=$(git describe --tags --abbrev=0)
defaultbranch=$(git branch | cut -c 3- | grep -E '^master$|^main$') defaultbranch=$(git branch | cut -c 3- | grep -E '^master$|^main$')
currentcommit=$(git rev-list --count ${currenttagfull}..${defaultbranch}) git fetch origin ${defaultbranch} --quiet
currentcommit=$(git rev-list --count ${currenttagfull}..origin/${defaultbranch})
currentversion="${currenttag}.$((currentcommit + 9001 + 1))" currentversion="${currenttag}.$((currentcommit + 9001 + 1))"
echo "$currentversion" echo "$currentversion"
``` ```

View File

@@ -1,6 +1,6 @@
Package: AMR Package: AMR
Version: 3.0.1.9053 Version: 3.0.1.9055
Date: 2026-04-27 Date: 2026-04-30
Title: Antimicrobial Resistance Data Analysis Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR) Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by data analysis and to work with microbial and antimicrobial properties by

View File

@@ -1,4 +1,4 @@
# AMR 3.0.1.9053 # AMR 3.0.1.9055
This will become release v3.1.0, intended for launch end of May. This will become release v3.1.0, intended for launch end of May.
@@ -7,6 +7,7 @@ This will become release v3.1.0, intended for launch end of May.
* Support for the [`future`](https://future.futureverse.org) package and its framework, as the previous implementation of parallel computing was slow * Support for the [`future`](https://future.futureverse.org) package and its framework, as the previous implementation of parallel computing was slow
- **Breaking change**: `as.sir()` with `parallel = TRUE` now requires a non-sequential `future::plan()` to be active before the call — e.g., `future::plan(future::multisession)` — and throws an informative error if none is set. - **Breaking change**: `as.sir()` with `parallel = TRUE` now requires a non-sequential `future::plan()` to be active before the call — e.g., `future::plan(future::multisession)` — and throws an informative error if none is set.
- New all-core usage setup: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure and processing time) - New all-core usage setup: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure and processing time)
- `antibiogram()` and `wisca()` gained a `parallel` argument using the same `future`/`future.apply` pattern: for WISCA, Monte Carlo simulations are split into `(group, chunk)` job pairs distributed across workers; for grouped antibiograms, each group is processed by a separate worker (#281)
* Integration with the *tidymodels* framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes` * Integration with the *tidymodels* framework to allow seamless use of SIR, MIC and disk data in modelling pipelines via `recipes`
- `step_mic_log2()` to transform `<mic>` columns with log2, and `step_sir_numeric()` to convert `<sir>` columns to numeric - `step_mic_log2()` to transform `<mic>` columns with log2, and `step_sir_numeric()` to convert `<sir>` columns to numeric
- New `tidyselect` helpers: - New `tidyselect` helpers:

View File

@@ -54,7 +54,7 @@
#' @param add_total_n *(deprecated in favour of `formatting_type`)* A [logical] to indicate whether `n_tested` available numbers per pathogen should be added to the table (default is `TRUE`). This will add the lowest and highest number of available isolates per antimicrobial (e.g, if for *E. coli* 200 isolates are available for ciprofloxacin and 150 for amoxicillin, the returned number will be "150-200"). This option is unavailable when `wisca = TRUE`; in that case, use [retrieve_wisca_parameters()] to get the parameters used for WISCA. #' @param add_total_n *(deprecated in favour of `formatting_type`)* A [logical] to indicate whether `n_tested` available numbers per pathogen should be added to the table (default is `TRUE`). This will add the lowest and highest number of available isolates per antimicrobial (e.g, if for *E. coli* 200 isolates are available for ciprofloxacin and 150 for amoxicillin, the returned number will be "150-200"). This option is unavailable when `wisca = TRUE`; in that case, use [retrieve_wisca_parameters()] to get the parameters used for WISCA.
#' @param only_all_tested (for combination antibiograms): a [logical] to indicate that isolates must be tested for all antimicrobials, see *Details*. #' @param only_all_tested (for combination antibiograms): a [logical] to indicate that isolates must be tested for all antimicrobials, see *Details*.
#' @param digits Number of digits to use for rounding the antimicrobial coverage, defaults to 1 for WISCA and 0 otherwise. #' @param digits Number of digits to use for rounding the antimicrobial coverage, defaults to 1 for WISCA and 0 otherwise.
#' @param formatting_type Numeric value (122 for WISCA, 1-12 for non-WISCA) indicating how the 'cells' of the antibiogram table should be formatted. See *Details* > *Formatting Type* for a list of options. #' @param formatting_type Numeric value (1-22 for WISCA, 1-12 for non-WISCA) indicating how the 'cells' of the antibiogram table should be formatted. See *Details* > *Formatting Type* for a list of options.
#' @param col_mo Column name of the names or codes of the microorganisms (see [as.mo()]) - the default is the first column of class [`mo`]. Values will be coerced using [as.mo()]. #' @param col_mo Column name of the names or codes of the microorganisms (see [as.mo()]) - the default is the first column of class [`mo`]. Values will be coerced using [as.mo()].
#' @param language Language to translate text, which defaults to the system language (see [get_AMR_locale()]). #' @param language Language to translate text, which defaults to the system language (see [get_AMR_locale()]).
#' @param minimum The minimum allowed number of available (tested) isolates. Any isolate count lower than `minimum` will return `NA` with a warning. The default number of `30` isolates is advised by the Clinical and Laboratory Standards Institute (CLSI) as best practice, see *Source*. #' @param minimum The minimum allowed number of available (tested) isolates. Any isolate count lower than `minimum` will return `NA` with a warning. The default number of `30` isolates is advised by the Clinical and Laboratory Standards Institute (CLSI) as best practice, see *Source*.
@@ -65,6 +65,7 @@
#' @param simulations (for WISCA) a numerical value to set the number of Monte Carlo simulations. #' @param simulations (for WISCA) a numerical value to set the number of Monte Carlo simulations.
#' @param conf_interval A numerical value to set confidence interval (default is `0.95`). #' @param conf_interval A numerical value to set confidence interval (default is `0.95`).
#' @param interval_side The side of the confidence interval, either `"two-tailed"` (default), `"left"` or `"right"`. #' @param interval_side The side of the confidence interval, either `"two-tailed"` (default), `"left"` or `"right"`.
#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. Requires the [`future.apply`][future.apply::future_lapply()] package. For WISCA, Monte Carlo simulations are distributed across workers; for grouped antibiograms, each group is processed by a separate worker. **A non-sequential [future::plan()] must already be active before setting `parallel = TRUE`** -- for example, `future::plan(future::multisession)`. An error is thrown if `parallel = TRUE` is used without a plan set by the user.
#' @param info A [logical] to indicate info should be printed - the default is `TRUE` only in interactive mode. #' @param info A [logical] to indicate info should be printed - the default is `TRUE` only in interactive mode.
#' @param object An [antibiogram()] object. #' @param object An [antibiogram()] object.
#' @param ... When used in [R Markdown or Quarto][knitr::kable()]: arguments passed on to [knitr::kable()] (otherwise, has no use). #' @param ... When used in [R Markdown or Quarto][knitr::kable()]: arguments passed on to [knitr::kable()] (otherwise, has no use).
@@ -413,6 +414,7 @@ antibiogram <- function(x,
conf_interval = 0.95, conf_interval = 0.95,
interval_side = "two-tailed", interval_side = "two-tailed",
info = interactive(), info = interactive(),
parallel = FALSE,
...) { ...) {
UseMethod("antibiogram") UseMethod("antibiogram")
} }
@@ -439,6 +441,7 @@ antibiogram.default <- function(x,
conf_interval = 0.95, conf_interval = 0.95,
interval_side = "two-tailed", interval_side = "two-tailed",
info = interactive(), info = interactive(),
parallel = FALSE,
...) { ...) {
meet_criteria(x, allow_class = "data.frame") meet_criteria(x, allow_class = "data.frame")
x <- ascertain_sir_classes(x, "x") x <- ascertain_sir_classes(x, "x")
@@ -478,6 +481,35 @@ antibiogram.default <- function(x,
meet_criteria(conf_interval, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE) meet_criteria(conf_interval, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE)
meet_criteria(interval_side, allow_class = "character", has_length = 1, is_in = c("two-tailed", "left", "right")) meet_criteria(interval_side, allow_class = "character", has_length = 1, is_in = c("two-tailed", "left", "right"))
meet_criteria(info, allow_class = "logical", has_length = 1) meet_criteria(info, allow_class = "logical", has_length = 1)
meet_criteria(parallel, allow_class = "logical", has_length = 1)
# parallel gate - identical pattern to as.sir()
if (requireNamespace("future.apply", quietly = TRUE) && !inherits(future::plan(), "sequential")) {
if (isFALSE(parallel)) {
message_("Assuming {.code parallel = TRUE} since parallel computing has been set up using the {.pkg future} package before. Set {.help [{.fun plan}](future::plan)} to sequential to prevent this.")
}
parallel <- TRUE
}
if (isTRUE(parallel)) {
stop_ifnot(
requireNamespace("future.apply", quietly = TRUE),
"Setting {.code parallel = TRUE} requires the {.pkg future.apply} package.\n",
"Install it with {.code install.packages(\"future.apply\")}."
)
stop_if(inherits(future::plan(), "sequential"),
"Setting {.code parallel = TRUE} requires a non-sequential {.help [{.fun future::plan}](future::plan)} to be active.\n",
"For your system, you could first run: {.code library(future); ",
ifelse(.Platform$OS.type == "windows" || in_rstudio(),
"plan(multisession)",
"plan(multicore)"
),
"}",
call = FALSE
)
n_workers <- future::nbrOfWorkers()
} else {
n_workers <- 1L
}
# try to find columns based on type # try to find columns based on type
if (is.null(col_mo)) { if (is.null(col_mo)) {
@@ -705,52 +737,96 @@ antibiogram.default <- function(x,
wisca_parameters <- out wisca_parameters <- out
progress <- progress_ticker( # quantile probabilities are constant across all groups
n = length(unique(wisca_parameters$group)) * simulations, probs <- if (interval_side == "two-tailed") {
n_min = 25, c((1 - conf_interval) / 2, 1 - (1 - conf_interval) / 2)
print = info, } else if (interval_side == "left") {
title = paste("Calculating WISCA for", length(unique(wisca_parameters$group)), "regimens") c(0, conf_interval)
) } else {
on.exit(close(progress)) c(1 - conf_interval, 1)
# run WISCA per group
for (group in unique(wisca_parameters$group)) {
params_current <- wisca_parameters[wisca_parameters$group == group, , drop = FALSE]
if (sum(params_current$n_tested, na.rm = TRUE) == 0) {
next
}
# prepare priors
priors_current <- create_wisca_priors(params_current)
# Monte Carlo simulations
coverage_simulations <- vapply(
FUN.VALUE = double(1),
seq_len(simulations), function(i) {
progress$tick()
simulate_coverage(priors_current)
}
)
# summarise results
coverage_mean <- mean(coverage_simulations)
if (interval_side == "two-tailed") {
probs <- c((1 - conf_interval) / 2, 1 - (1 - conf_interval) / 2)
} else if (interval_side == "left") {
probs <- c(0, conf_interval)
} else if (interval_side == "right") {
probs <- c(1 - conf_interval, 1)
}
coverage_ci <- unname(stats::quantile(coverage_simulations, probs = probs))
out_wisca$coverage[out_wisca$group == group] <- coverage_mean
out_wisca$lower_ci[out_wisca$group == group] <- coverage_ci[1]
out_wisca$upper_ci[out_wisca$group == group] <- coverage_ci[2]
} }
close(progress) unique_groups <- as.character(unique(wisca_parameters$group))
use_parallel_wisca <- isTRUE(parallel) && n_workers > 1L && length(unique_groups) > 0L
if (use_parallel_wisca) {
if (isTRUE(info)) {
message_("Running WISCA in parallel mode using ", n_workers, " workers...", as_note = FALSE, appendLF = FALSE)
}
# chunks_per_group gives ~n_workers total jobs so all workers stay busy
# even when the number of regimens is smaller than n_workers
chunks_per_group <- max(1L, ceiling(n_workers / length(unique_groups)))
chunk_sizes <- diff(c(0L, round(seq_len(chunks_per_group) * simulations / chunks_per_group)))
# precompute priors per group and build (group, chunk) job list
jobs <- unlist(lapply(unique_groups, function(g) {
params_g <- wisca_parameters[wisca_parameters$group == g, , drop = FALSE]
if (sum(params_g$n_tested, na.rm = TRUE) == 0L) return(NULL)
priors_g <- create_wisca_priors(params_g)
lapply(seq_along(chunk_sizes), function(ch) {
list(group = g, priors = priors_g, n_sims = chunk_sizes[ch])
})
}), recursive = FALSE)
jobs <- Filter(Negate(is.null), jobs)
flat <- future.apply::future_lapply(jobs, function(job) {
vapply(FUN.VALUE = double(1), seq_len(job$n_sims), function(i) {
simulate_coverage(job$priors)
})
}, future.seed = TRUE)
# reassemble per group: concatenate chunks, then summarise
for (g in unique_groups) {
g_idx <- vapply(jobs, function(j) identical(j$group, g), logical(1))
if (!any(g_idx)) next
sims <- unlist(flat[g_idx], use.names = FALSE)
out_wisca$coverage[out_wisca$group == g] <- mean(sims)
ci_vals <- unname(stats::quantile(sims, probs = probs))
out_wisca$lower_ci[out_wisca$group == g] <- ci_vals[1]
out_wisca$upper_ci[out_wisca$group == g] <- ci_vals[2]
}
if (isTRUE(info)) message_(font_green_bg(" DONE "), as_note = FALSE)
} else {
progress <- progress_ticker(
n = length(unique_groups) * simulations,
n_min = 25,
print = info,
title = paste("Calculating WISCA for", length(unique_groups), "regimens")
)
on.exit(close(progress), add = TRUE)
for (group in unique_groups) {
params_current <- wisca_parameters[wisca_parameters$group == group, , drop = FALSE]
if (sum(params_current$n_tested, na.rm = TRUE) == 0) next
priors_current <- create_wisca_priors(params_current)
coverage_simulations <- vapply(
FUN.VALUE = double(1),
seq_len(simulations), function(i) {
progress$tick()
simulate_coverage(priors_current)
}
)
out_wisca$coverage[out_wisca$group == group] <- mean(coverage_simulations)
ci_vals <- unname(stats::quantile(coverage_simulations, probs = probs))
out_wisca$lower_ci[out_wisca$group == group] <- ci_vals[1]
out_wisca$upper_ci[out_wisca$group == group] <- ci_vals[2]
}
close(progress)
if (isTRUE(info) && simulations >= 500 && length(unique_groups) >= 3) {
suggest <- ifelse(.Platform$OS.type == "windows" || in_rstudio(),
"plan(multisession)",
"plan(multicore)"
)
if (requireNamespace("future.apply", quietly = TRUE)) {
message_("Running in sequential mode. To speed up WISCA, set a parallel {.help [{.fun future::plan}](future::plan)} such as {.code ", suggest, "} and use {.code parallel = TRUE}.")
} else {
message_("Running in sequential mode. To speed up WISCA, install the {.pkg future.apply} package and then set {.code parallel = TRUE}.")
}
}
}
# final output preparation # final output preparation
out <- out_wisca out <- out_wisca
@@ -997,30 +1073,50 @@ antibiogram.grouped_df <- function(x,
conf_interval = 0.95, conf_interval = 0.95,
interval_side = "two-tailed", interval_side = "two-tailed",
info = interactive(), info = interactive(),
parallel = FALSE,
...) { ...) {
stop_ifnot(is.null(mo_transform), "{.arg mo_transform} must not be set if creating an antibiogram using a grouped tibble. The groups will become the variables over which the antimicrobials are calculated, which could include the pathogen information (though not necessary). Nonetheless, this makes {.arg mo_transform} redundant.", call = FALSE) stop_ifnot(is.null(mo_transform), "{.arg mo_transform} must not be set if creating an antibiogram using a grouped tibble. The groups will become the variables over which the antimicrobials are calculated, which could include the pathogen information (though not necessary). Nonetheless, this makes {.arg mo_transform} redundant.", call = FALSE)
stop_ifnot(is.null(syndromic_group), "{.arg syndromic_group} must not be set if creating an antibiogram using a grouped tibble. The groups will become the variables over which the antimicrobials are calculated, making {.arg syndromic_group} redundant.", call = FALSE) stop_ifnot(is.null(syndromic_group), "{.arg syndromic_group} must not be set if creating an antibiogram using a grouped tibble. The groups will become the variables over which the antimicrobials are calculated, making {.arg syndromic_group} redundant.", call = FALSE)
meet_criteria(parallel, allow_class = "logical", has_length = 1)
groups <- attributes(x)$groups groups <- attributes(x)$groups
n_groups <- NROW(groups) n_groups <- NROW(groups)
progress <- progress_ticker(
n = n_groups,
n_min = 5,
print = info,
title = paste("Calculating AMR for", n_groups, "groups")
)
on.exit(close(progress))
out <- NULL # parallel gate - identical pattern to as.sir()
wisca_parameters <- NULL if (requireNamespace("future.apply", quietly = TRUE) && !inherits(future::plan(), "sequential")) {
long_numeric <- NULL if (isFALSE(parallel)) {
message_("Assuming {.code parallel = TRUE} since parallel computing has been set up using the {.pkg future} package before. Set {.help [{.fun plan}](future::plan)} to sequential to prevent this.")
for (i in seq_len(n_groups)) {
progress$tick()
rows <- unlist(groups[i, ]$.rows)
if (length(rows) == 0) {
next
} }
new_out <- antibiogram(as.data.frame(x)[rows, , drop = FALSE], parallel <- TRUE
}
if (isTRUE(parallel)) {
stop_ifnot(
requireNamespace("future.apply", quietly = TRUE),
"Setting {.code parallel = TRUE} requires the {.pkg future.apply} package.\n",
"Install it with {.code install.packages(\"future.apply\")}."
)
stop_if(inherits(future::plan(), "sequential"),
"Setting {.code parallel = TRUE} requires a non-sequential {.help [{.fun future::plan}](future::plan)} to be active.\n",
"For your system, you could first run: {.code library(future); ",
ifelse(.Platform$OS.type == "windows" || in_rstudio(),
"plan(multisession)",
"plan(multicore)"
),
"}",
call = FALSE
)
n_workers <- future::nbrOfWorkers()
} else {
n_workers <- 1L
}
use_parallel <- isTRUE(parallel) && n_workers > 1L && n_groups > 1L
x_df <- as.data.frame(x)
run_group <- function(i) {
rows <- unlist(groups[i, ]$.rows)
if (length(rows) == 0L) return(NULL)
antibiogram(x_df[rows, , drop = FALSE],
antimicrobials = antimicrobials, antimicrobials = antimicrobials,
mo_transform = NULL, mo_transform = NULL,
ab_transform = ab_transform, ab_transform = ab_transform,
@@ -1040,12 +1136,42 @@ antibiogram.grouped_df <- function(x,
conf_interval = conf_interval, conf_interval = conf_interval,
interval_side = interval_side, interval_side = interval_side,
info = FALSE, info = FALSE,
... parallel = FALSE # never nest parallelism in workers
) )
}
if (use_parallel) {
if (isTRUE(info)) {
message_("Running antibiogram for ", n_groups, " groups in parallel using ", n_workers, " workers...", as_note = FALSE, appendLF = FALSE)
}
results_raw <- future.apply::future_lapply(seq_len(n_groups), run_group, future.seed = TRUE)
if (isTRUE(info)) message_(font_green_bg(" DONE "), as_note = FALSE)
} else {
progress <- progress_ticker(
n = n_groups,
n_min = 5,
print = info,
title = paste("Calculating AMR for", n_groups, "groups")
)
on.exit(close(progress), add = TRUE)
results_raw <- vector("list", n_groups)
for (i in seq_len(n_groups)) {
progress$tick()
results_raw[[i]] <- run_group(i)
}
close(progress)
}
out <- NULL
wisca_parameters <- NULL
long_numeric <- NULL
for (i in seq_len(n_groups)) {
new_out <- results_raw[[i]]
new_wisca_parameters <- attributes(new_out)$wisca_parameters new_wisca_parameters <- attributes(new_out)$wisca_parameters
new_long_numeric <- attributes(new_out)$long_numeric new_long_numeric <- attributes(new_out)$long_numeric
if (NROW(new_out) == 0) { if (is.null(new_out) || NROW(new_out) == 0) {
next next
} }
@@ -1071,8 +1197,7 @@ antibiogram.grouped_df <- function(x,
new_long_numeric <- new_long_numeric[, c(col_name, setdiff(names(new_long_numeric), col_name))] # set place to 1st col new_long_numeric <- new_long_numeric[, c(col_name, setdiff(names(new_long_numeric), col_name))] # set place to 1st col
} }
if (i == 1) { if (is.null(out)) {
# the first go
out <- new_out out <- new_out
wisca_parameters <- new_wisca_parameters wisca_parameters <- new_wisca_parameters
long_numeric <- new_long_numeric long_numeric <- new_long_numeric
@@ -1083,8 +1208,6 @@ antibiogram.grouped_df <- function(x,
} }
} }
close(progress)
out <- structure(as_original_data_class(out, class(x), extra_class = "antibiogram"), out <- structure(as_original_data_class(out, class(x), extra_class = "antibiogram"),
has_syndromic_group = FALSE, has_syndromic_group = FALSE,
combine_SI = isTRUE(combine_SI), combine_SI = isTRUE(combine_SI),
@@ -1116,6 +1239,7 @@ wisca <- function(x,
conf_interval = 0.95, conf_interval = 0.95,
interval_side = "two-tailed", interval_side = "two-tailed",
info = interactive(), info = interactive(),
parallel = FALSE,
...) { ...) {
antibiogram( antibiogram(
x = x, x = x,
@@ -1137,6 +1261,7 @@ wisca <- function(x,
conf_interval = conf_interval, conf_interval = conf_interval,
interval_side = interval_side, interval_side = interval_side,
info = info, info = info,
parallel = parallel,
... ...
) )
} }

View File

@@ -25,7 +25,8 @@ antibiogram(x, antimicrobials = where(is.sir), mo_transform = "shortname",
ifelse(wisca, 14, 18)), col_mo = NULL, language = get_AMR_locale(), ifelse(wisca, 14, 18)), col_mo = NULL, language = get_AMR_locale(),
minimum = 30, combine_SI = TRUE, sep = " + ", sort_columns = TRUE, minimum = 30, combine_SI = TRUE, sep = " + ", sort_columns = TRUE,
wisca = FALSE, simulations = 1000, conf_interval = 0.95, wisca = FALSE, simulations = 1000, conf_interval = 0.95,
interval_side = "two-tailed", info = interactive(), ...) interval_side = "two-tailed", info = interactive(), parallel = FALSE,
...)
wisca(x, antimicrobials = where(is.sir), ab_transform = "name", wisca(x, antimicrobials = where(is.sir), ab_transform = "name",
syndromic_group = NULL, only_all_tested = FALSE, digits = 1, syndromic_group = NULL, only_all_tested = FALSE, digits = 1,
@@ -33,7 +34,7 @@ wisca(x, antimicrobials = where(is.sir), ab_transform = "name",
col_mo = NULL, language = get_AMR_locale(), combine_SI = TRUE, col_mo = NULL, language = get_AMR_locale(), combine_SI = TRUE,
sep = " + ", sort_columns = TRUE, simulations = 1000, sep = " + ", sort_columns = TRUE, simulations = 1000,
conf_interval = 0.95, interval_side = "two-tailed", conf_interval = 0.95, interval_side = "two-tailed",
info = interactive(), ...) info = interactive(), parallel = FALSE, ...)
retrieve_wisca_parameters(wisca_model, ...) retrieve_wisca_parameters(wisca_model, ...)
@@ -80,7 +81,7 @@ retrieve_wisca_parameters(wisca_model, ...)
\item{digits}{Number of digits to use for rounding the antimicrobial coverage, defaults to 1 for WISCA and 0 otherwise.} \item{digits}{Number of digits to use for rounding the antimicrobial coverage, defaults to 1 for WISCA and 0 otherwise.}
\item{formatting_type}{Numeric value (122 for WISCA, 1-12 for non-WISCA) indicating how the 'cells' of the antibiogram table should be formatted. See \emph{Details} > \emph{Formatting Type} for a list of options.} \item{formatting_type}{Numeric value (1-22 for WISCA, 1-12 for non-WISCA) indicating how the 'cells' of the antibiogram table should be formatted. See \emph{Details} > \emph{Formatting Type} for a list of options.}
\item{col_mo}{Column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.} \item{col_mo}{Column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
@@ -104,6 +105,8 @@ retrieve_wisca_parameters(wisca_model, ...)
\item{info}{A \link{logical} to indicate info should be printed - the default is \code{TRUE} only in interactive mode.} \item{info}{A \link{logical} to indicate info should be printed - the default is \code{TRUE} only in interactive mode.}
\item{parallel}{A \link{logical} to indicate if parallel computing must be used, defaults to \code{FALSE}. Requires the \code{\link[future.apply:future_lapply]{future.apply}} package. For WISCA, Monte Carlo simulations are distributed across workers; for grouped antibiograms, each group is processed by a separate worker. \strong{A non-sequential \code{\link[future:plan]{future::plan()}} must already be active before setting \code{parallel = TRUE}} -- for example, \code{future::plan(future::multisession)}. An error is thrown if \code{parallel = TRUE} is used without a plan set by the user.}
\item{...}{When used in \link[knitr:kable]{R Markdown or Quarto}: arguments passed on to \code{\link[knitr:kable]{knitr::kable()}} (otherwise, has no use).} \item{...}{When used in \link[knitr:kable]{R Markdown or Quarto}: arguments passed on to \code{\link[knitr:kable]{knitr::kable()}} (otherwise, has no use).}
\item{wisca_model}{The outcome of \code{\link[=wisca]{wisca()}} or \code{\link[=antibiogram]{antibiogram(..., wisca = TRUE)}}.} \item{wisca_model}{The outcome of \code{\link[=wisca]{wisca()}} or \code{\link[=antibiogram]{antibiogram(..., wisca = TRUE)}}.}

View File

@@ -45,9 +45,8 @@ A list with class \code{"htest"} containing the following
\item{residuals}{the Pearson residuals, \item{residuals}{the Pearson residuals,
\code{(observed - expected) / sqrt(expected)}.} \code{(observed - expected) / sqrt(expected)}.}
\item{stdres}{standardized residuals, \item{stdres}{standardized residuals,
\code{(observed - expected) / sqrt(V)}, where \code{V} is the \code{(observed - expected) / sqrt(V)}, where \code{V} is the residual cell variance (Agresti, 2007,
residual cell variance (Agresti, 2007, section 2.4.5 section 2.4.5 for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
} }
\description{ \description{
\code{\link[=g.test]{g.test()}} performs chi-squared contingency table tests and goodness-of-fit tests, just like \code{\link[=chisq.test]{chisq.test()}} but is more reliable (1). A \emph{G}-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a \strong{\emph{G}-test of goodness-of-fit}), or to see whether the proportions of one variable are different for different values of the other variable (called a \strong{\emph{G}-test of independence}). \code{\link[=g.test]{g.test()}} performs chi-squared contingency table tests and goodness-of-fit tests, just like \code{\link[=chisq.test]{chisq.test()}} but is more reliable (1). A \emph{G}-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a \strong{\emph{G}-test of goodness-of-fit}), or to see whether the proportions of one variable are different for different values of the other variable (called a \strong{\emph{G}-test of independence}).

View File

@@ -32,7 +32,7 @@ pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
standard deviations are less than or equal to \code{tol} times the standard deviations are less than or equal to \code{tol} times the
standard deviation of the first component.) With the default null standard deviation of the first component.) With the default null
setting, no components are omitted (unless \code{rank.} is specified setting, no components are omitted (unless \code{rank.} is specified
less than \code{min(dim(x))}.). Other settings for \code{tol} could be less than \code{min(dim(x))}.). Other settings for tol could be
\code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which \code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which
would omit essentially constant components.} would omit essentially constant components.}

View File

@@ -130,6 +130,77 @@ test_that("test-antibiogram.R", {
expect_equal(colnames(ab9), c("ward", "gender", "Piperacillin/tazobactam", "Piperacillin/tazobactam + Gentamicin", "Piperacillin/tazobactam + Tobramycin")) expect_equal(colnames(ab9), c("ward", "gender", "Piperacillin/tazobactam", "Piperacillin/tazobactam + Gentamicin", "Piperacillin/tazobactam + Tobramycin"))
} }
# Parallel computing ----------------------------------------------------
# Tests must pass even when only 1 core is available; parallel = TRUE then
# silently falls back to sequential, but results must still be correct.
if (AMR:::pkg_is_available("future.apply")) {
set.seed(42)
# sequential reference for WISCA
wisca_seq <- suppressWarnings(suppressMessages(
wisca(example_isolates, antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"), simulations = 100, info = FALSE)
))
future::plan(future::multicore)
# 1. parallel = TRUE produces the same antibiogram structure as sequential
wisca_par <- suppressWarnings(suppressMessages(
wisca(example_isolates, antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"), simulations = 100, parallel = TRUE, info = FALSE)
))
expect_inherits(wisca_par, "antibiogram")
expect_equal(colnames(wisca_par), colnames(wisca_seq))
expect_true(isTRUE(attributes(wisca_par)$wisca))
# 2. coverage values are non-NA and fall within [0, 1]
ln <- attributes(wisca_par)$long_numeric
expect_false(anyNA(ln$coverage))
expect_false(anyNA(ln$lower_ci))
expect_false(anyNA(ln$upper_ci))
expect_true(all(ln$coverage >= 0 & ln$coverage <= 1))
expect_true(all(ln$lower_ci <= ln$coverage))
expect_true(all(ln$upper_ci >= ln$coverage))
# 3. a second parallel run gives the same column names
wisca_par2 <- suppressWarnings(suppressMessages(
wisca(example_isolates, antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"), simulations = 100, parallel = TRUE, info = FALSE)
))
expect_equal(colnames(wisca_par), colnames(wisca_par2))
# 4. parallel with workers = 1 gives same structure as sequential
future::plan(future::multicore, workers = 1)
wisca_par1 <- suppressWarnings(suppressMessages(
wisca(example_isolates, antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"), simulations = 100, parallel = TRUE, info = FALSE)
))
expect_equal(colnames(wisca_seq), colnames(wisca_par1))
# 5. grouped antibiogram in parallel yields identical structure to sequential
if (AMR:::pkg_is_available("dplyr", min_version = "1.0.0", also_load = TRUE)) {
future::plan(future::sequential)
ab_grp_seq <- suppressWarnings(suppressMessages(
example_isolates %>%
group_by(ward) %>%
wisca(antimicrobials = c("TZP", "TZP+TOB"), simulations = 50, info = FALSE)
))
future::plan(future::multicore)
ab_grp_par <- suppressWarnings(suppressMessages(
example_isolates %>%
group_by(ward) %>%
wisca(antimicrobials = c("TZP", "TZP+TOB"), simulations = 50, parallel = TRUE, info = FALSE)
))
expect_equal(colnames(ab_grp_seq), colnames(ab_grp_par))
expect_equal(nrow(ab_grp_seq), nrow(ab_grp_par))
}
# 6. parallel = TRUE without a plan raises an informative error
future::plan(future::sequential)
expect_error(
suppressWarnings(wisca(example_isolates, antimicrobials = "TZP", parallel = TRUE, info = FALSE)),
"non-sequential"
)
future::plan(future::sequential)
}
# Generate plots with ggplot2 or base R -------------------------------- # Generate plots with ggplot2 or base R --------------------------------