mirror of https://github.com/msberends/AMR.git synced 2026-04-28 05:43:50 +02:00

2 Commits

Author SHA1 Message Date
Claude
20c9447096 Require user plan() for parallel=TRUE; fix as_wt_nwt false-positive warnings
- parallel = TRUE now errors with a cli-styled message if no non-sequential
  future::plan() is active; users must call e.g. future::plan(future::multisession)
  before using parallel = TRUE (breaking change)
- Removed auto-setup/teardown of multisession plan inside as.sir(), which was
  slow and caused version-mismatch issues with load_all() workflows
- Added as_wt_nwt to the exclusion list in as_sir_method() to suppress
  false-positive "no longer used" warnings during parallel runs
- Fixed pieces_per_col row-batch calculation to use n_workers (total available
  workers from the active plan) instead of n_cores (workers clipped to n_cols),
  so row-batch mode activates correctly when n_cols < n_workers
- Updated @param parallel and @param max_cores roxygen docs; regenerated man/as.sir.Rd
- Updated sequential-mode hint to instruct users to set plan() first
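The new contract can be sketched as follows (a usage sketch, not a test; `your_data` is a placeholder for a data frame with MIC/disk columns):

```r
library(AMR)
# A non-sequential plan must now be set explicitly before parallel = TRUE:
future::plan(future::multisession)
sir_data <- as.sir(your_data, parallel = TRUE)
# Without an active non-sequential plan, as.sir(..., parallel = TRUE) errors.
future::plan(future::sequential)  # restore the default when done
```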

https://claude.ai/code/session_01M1Jvf2Miu6JL4TQrEh1wS8
2026-04-27 14:20:41 +00:00
Claude
b1cf7a94ad Migrate parallel computing in as.sir() from parallel:: to future/future.apply
Replace parallel::mclapply() and parallel::parLapply() with
future.apply::future_lapply(), enabling transparent support for any
future backend (multisession, multicore, mirai_multisession, cluster)
on all platforms including Windows.
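The migration pattern can be sketched as follows; this is a minimal illustration of the backend-transparent call, not the actual as.sir() internals:

```r
library(future.apply)

# Before: platform-specific, fork-only on Unix
# result_list <- parallel::mclapply(seq_along(cols), run_one, mc.cores = n_cores)

# After: one call that honours whatever future::plan() is active,
# whether multisession, multicore, cluster, or mirai_multisession
future::plan(future::multisession, workers = 2)
result_list <- future.apply::future_lapply(1:4, function(i) i^2, future.seed = TRUE)
future::plan(future::sequential)
```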

When parallel = TRUE the function now: (1) respects an active
future::plan() set by the user without overriding it on exit, or
(2) sets a temporary multisession plan with parallelly::availableCores()
and tears it down on exit. The max_cores argument controls worker count
only when no user plan is active.

future and future.apply are added to Suggests in DESCRIPTION.
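The worker-count clipping that max_cores performs can be illustrated with plain arithmetic (a sketch; `resolve_workers()` is a hypothetical helper mirroring the logic in the sir.R diff, not an AMR function):

```r
# Sketch of the max_cores resolution logic (hypothetical helper)
resolve_workers <- function(max_cores, n_workers, n_cols) {
  n <- if (max_cores < 0L) max(1L, n_workers + max_cores) else min(max_cores, n_workers)
  min(n, n_cols)  # never more workers than antibiotic columns
}
resolve_workers(-1L, 8L, 20L)  # 7: all but one of 8 workers
resolve_workers(-2L, 8L, 20L)  # 6
resolve_workers(4L, 8L, 3L)    # 3: clipped to the number of columns
```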

https://claude.ai/code/session_01M1Jvf2Miu6JL4TQrEh1wS8
2026-04-27 12:21:48 +00:00
10 changed files with 92 additions and 104 deletions

@@ -1,6 +1,6 @@
Package: AMR
-Version: 3.0.1.9052
-Date: 2026-04-25
+Version: 3.0.1.9053
+Date: 2026-04-27
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by
@@ -37,13 +37,15 @@ Authors@R: c(
person(given = c("Casper", "J."), family = "Albers", role = "ths", comment = c(ORCID = "0000-0002-9213-6743")),
person(given = c("Corinna"), family = "Glasner", role = "ths", comment = c(ORCID = "0000-0003-1241-1328")))
Depends: R (>= 3.0.0)
-Suggests:
+Suggests:
cleaner,
cli,
crayon,
curl,
data.table,
dplyr,
+future,
+future.apply,
ggplot2,
knitr,
openxlsx,

@@ -1,4 +1,4 @@
-# AMR 3.0.1.9052
+# AMR 3.0.1.9053
### New
* Support for the 2026 clinical breakpoints of both CLSI and EUCAST, adding their more than 5,700 new clinical breakpoints to the `clinical_breakpoints` data set for use in `as.sir()`. EUCAST 2026 is now the default guideline for all MIC and disk diffusion interpretations.
@@ -38,9 +38,12 @@
* Fixed `as.sir()` for data frames silently deleting columns whose AB class was already `<sir>` when called a second time (re-running on already-converted data) (#278)
* Fixed `as.sir()` for data frames incorrectly treating metadata columns (e.g. `patient`, `ward`) as antibiotic columns when their names coincidentally matched an antibiotic code; column content is now validated against AMR data patterns before inclusion
* Improved parallel computing in `as.sir()`: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure)
+* Fixed false-positive `"as_wt_nwt is no longer used"` warnings that appeared during parallel `as.sir()` runs; `as_wt_nwt` is now excluded from the unused-argument check in `as_sir_method()`
+* **Breaking change**: `as.sir()` with `parallel = TRUE` now requires a non-sequential `future::plan()` to be active before the call — e.g., `future::plan(future::multisession)` — and throws an informative error if none is set; previously `as.sir()` would silently set up and tear down a `multisession` plan itself, which was slow and caused version-mismatch issues with `load_all()` workflows
* Fixed `as.sir()` ignoring `info = FALSE` for columns with no breakpoints (e.g. cefoxitin against *E. coli*): an operator-precedence bug (`&&`/`||`) caused the "Interpreting MIC values" intro message to fire unconditionally when `nrow(breakpoints) == 0`, regardless of `info`; the progress bar title was also not gated by `info`
### Updates
+* `as.sir()` with `parallel = TRUE` now uses `future.apply::future_lapply()` instead of `parallel::mclapply()`/`parallel::parLapply()`, enabling transparent support for any `future` backend (including `mirai_multisession`) on all platforms; `future` and `future.apply` are now listed under `Suggests`
* `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (#239)
* Extensive `cli` integration for better message handling and clickable links in messages and warnings (#191, #265)
* `mdro()` now infers resistance for a _missing_ base drug column from an _available_ corresponding drug+inhibitor combination showing resistance (e.g., piperacillin is absent but required, while piperacillin/tazobactam is available and resistant). Can be set with the new argument `infer_from_combinations`, which defaults to `TRUE` (#209). Note that this can yield a higher MDRO detection rate (which is desirable, as detection has become more reliable).
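The row-batch mode described above can be sketched as follows (a simplified illustration of the job-splitting arithmetic from the sir.R diff, not the actual `as.sir()` internals):

```r
# 6 antibiotic columns, 16 workers: plain column dispatch would idle 10 workers
n_cols <- 6L
n_workers <- 16L
pieces_per_col <- if (n_cols < n_workers) ceiling(n_workers / n_cols) else 1L  # 3

# split rows into pieces_per_col roughly equal slices
n_rows <- 1000L
row_cuts <- unique(round(seq(0, n_rows, length.out = pieces_per_col + 1L)))
row_ranges <- lapply(seq_len(length(row_cuts) - 1L), function(p) {
  seq.int(row_cuts[p] + 1L, row_cuts[p + 1L])
})

# one job per (column, row slice) pair: 6 * 3 = 18 jobs keep all 16 workers busy
jobs <- do.call(c, lapply(seq_len(n_cols), function(ci) {
  lapply(row_ranges, function(r) list(col = ci, rows = r))
}))
length(jobs)  # 18
```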

R/sir.R

@@ -95,7 +95,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
#' # for veterinary breakpoints, also set `host`:
#' your_data %>% mutate_if(is.mic, as.sir, host = "column_with_animal_species", guideline = "CLSI")
#'
-#' # fast processing with parallel computing:
+#' # fast processing with parallel computing (requires future.apply):
#' as.sir(your_data, ..., parallel = TRUE)
#' ```
#' * Operators like "<=" will be considered according to the `capped_mic_handling` setting. At default, an MIC value of e.g. ">2" will return "NI" (non-interpretable) if the breakpoint is 4-8; the *true* MIC could be at either side of the breakpoint. This ensures that capped values from raw laboratory data are treated conservatively.
@@ -112,7 +112,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
#' # for veterinary breakpoints, also set `host`:
#' your_data %>% mutate_if(is.disk, as.sir, host = "column_with_animal_species", guideline = "CLSI")
#'
-#' # fast processing with parallel computing:
+#' # fast processing with parallel computing (requires future.apply):
#' as.sir(your_data, ..., parallel = TRUE)
#' ```
#'
@@ -220,7 +220,8 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
#' sir_interpretation_history()
#'
#' \donttest{
-#' # using parallel computing, which is available in base R:
+#' # using parallel computing (requires the future.apply package):
+#' # future::plan(future::multisession) # set a parallel plan first (required)
#' as.sir(df_wide, parallel = TRUE, info = TRUE)
#'
#'
@@ -716,8 +717,8 @@ as.sir.disk <- function(x,
}
#' @rdname as.sir
-#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. The `parallel` package is part of base \R and no additional packages are required. On Unix/macOS with \R >= 4.0.0, [parallel::mclapply()] (fork-based) is used; on Windows and \R < 4.0.0, [parallel::parLapply()] with a PSOCK cluster is used (requires the AMR package to be installed, not just loaded via `devtools::load_all()`). Parallelism distributes columns across cores; it is most beneficial when there are many antibiotic columns and a large number of rows.
-#' @param max_cores Maximum number of cores to use if `parallel = TRUE`. Use a negative value to subtract that number from the available number of cores, e.g. a value of `-2` on an 8-core machine means that at most 6 cores will be used. Defaults to `-1`. There will never be used more cores than variables to analyse. The available number of cores are detected using [parallelly::availableCores()] if that package is installed, and base \R's [parallel::detectCores()] otherwise.
+#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. Requires the [`future.apply`][future.apply::future_lapply()] package. **A non-sequential [future::plan()] must already be active before setting `parallel = TRUE`** — for example, `future::plan(future::multisession)`. An error is thrown if `parallel = TRUE` is used without a plan set by the user. Parallelism distributes columns (and optionally row batches) across workers; it is most beneficial when there are many antibiotic columns and a large number of rows.
+#' @param max_cores Maximum number of workers to use when `parallel = TRUE`. Use a negative value to subtract that number from the available workers, e.g. a value of `-2` means at most `nbrOfWorkers() - 2` workers will be used. Defaults to `-1` (all but one worker). There will never be more workers used than there are antibiotic columns to analyse.
#' @export
as.sir.data.frame <- function(x,
...,
@@ -911,35 +912,34 @@ as.sir.data.frame <- function(x,
}
# set up parallel computing
-n_cores <- get_n_cores(max_cores = max_cores)
-n_cores <- min(n_cores, length(ab_cols)) # never more cores than variables required
-if (isTRUE(parallel) && (.Platform$OS.type == "windows" || getRversion() < "4.0.0")) {
-cl <- tryCatch(parallel::makeCluster(n_cores, type = "PSOCK"),
-error = function(e) {
-if (isTRUE(info)) {
-message_("Could not create parallel cluster, using single-core computation. Error message: ", conditionMessage(e))
-}
-return(NULL)
-}
-)
-if (!is.null(cl)) {
-# Each PSOCK worker is a fresh R session — the AMR package must be loaded there
-# so all exported functions (as.sir, as.mic, as.disk, ...) are available.
-amr_loaded_on_workers <- tryCatch({
-parallel::clusterEvalQ(cl, library(AMR, quietly = TRUE))
-TRUE
-}, error = function(e) FALSE)
-if (!amr_loaded_on_workers) {
-if (isTRUE(info)) {
-message_("Could not load AMR on parallel workers (package may not be installed); falling back to single-core computation.")
-}
-parallel::stopCluster(cl)
-cl <- NULL
-}
+if (isTRUE(parallel)) {
+if (!requireNamespace("future.apply", quietly = TRUE)) {
+stop_(
+"Setting {.arg parallel} to {.code TRUE} requires the {.pkg future.apply} package.\n",
+"Install it with: ", highlight_code('install.packages("future.apply")'), "."
+)
+}
-if (is.null(cl)) {
-n_cores <- 1
+if (inherits(future::plan(), "sequential")) {
+stop_(
+"Setting {.arg parallel} to {.code TRUE} requires a non-sequential {.help [future::plan](future::plan)} to be active.\n",
+"Set a parallel plan before calling {.help [{.fun as.sir}](AMR::as.sir)}, for example:\n",
+highlight_code("future::plan(future::multisession)"), "\n",
+"Or on Linux/macOS for fork-based workers:\n",
+highlight_code("future::plan(future::multicore)"), "\n",
+"See {.help [future::plan](future::plan)} for all available strategies.",
+call = FALSE
+)
+}
+n_workers <- future::nbrOfWorkers()
+n_cores <- if (max_cores < 0L) {
+max(1L, n_workers + max_cores)
+} else {
+min(max_cores, n_workers)
+}
+n_cores <- min(n_cores, length(ab_cols))
+} else {
+n_workers <- 1L
+n_cores <- 1L
}
if (isTRUE(info)) {
@@ -952,31 +952,23 @@ as.sir.data.frame <- function(x,
is_parallel_run <- isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1
effective_info <- if (is_parallel_run) FALSE else info
-# Row-batch mode: when n_cols < n_cores we would leave cores idle under plain
-# column-parallel dispatch. Instead we split rows into pieces so every core
-# gets work. pieces_per_col = ceil(n_cores / n_cols) gives ~n_cores jobs
+# Row-batch mode: when n_cols < n_workers we would leave workers idle under plain
+# column-parallel dispatch. Instead we split rows into pieces so every worker
+# gets work. pieces_per_col = ceil(n_workers / n_cols) gives ~n_workers jobs
# total; each job processes one column on one row slice, which also reduces
# per-worker memory pressure (smaller breakpoints search space).
-# Only used for the fork path (R >= 4.0, non-Windows); PSOCK clusters already
-# incur high per-job serialisation overhead so we keep column-mode there.
-use_fork <- is_parallel_run &&
-!(.Platform$OS.type == "windows" || getRversion() < "4.0.0")
-pieces_per_col <- if (use_fork && length(ab_cols) < n_cores) {
-ceiling(n_cores / length(ab_cols))
+pieces_per_col <- if (is_parallel_run && length(ab_cols) < n_workers) {
+ceiling(n_workers / length(ab_cols))
} else {
1L
}
run_as_sir_column <- function(i, rows = NULL) {
-# Always resolve AMR_env from the package namespace. This is essential for
-# PSOCK workers (where the closure-captured AMR_env is a stale serialised copy
-# while as.sir() writes to the live AMR:::AMR_env) and also avoids capturing
-# pre-existing log entries from earlier in the session when forking.
+# Always resolve AMR_env from the package namespace so workers get the live
+# environment rather than a stale serialised copy from the closure.
.amr_env <- get("AMR_env", envir = asNamespace("AMR"), inherits = FALSE)
-# In parallel mode each worker (fork or PSOCK) has its own copy of the
-# history; record the current length so we capture only the new rows added
-# by the as.sir() call below, not any pre-existing entries inherited at fork
-# time or carried over from earlier as.sir() calls.
+# In parallel mode each worker has its own copy of the history; record the
+# current length so we capture only the rows added by this as.sir() call.
if (is_parallel_run) pre_log_n <- NROW(.amr_env$sir_interpretation_history)
ab_col <- ab_cols[i]
@@ -1090,31 +1082,17 @@ as.sir.data.frame <- function(x,
return(out)
}
-if (isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1) {
+if (is_parallel_run) {
if (isTRUE(info)) {
message_(as_note = FALSE)
if (pieces_per_col > 1L) {
-message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), " (", pieces_per_col, " row slices per column)...", as_note = FALSE, appendLF = FALSE)
+message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " workers, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), " (", pieces_per_col, " row slices per column)...", as_note = FALSE, appendLF = FALSE)
} else {
-message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
+message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " workers, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
}
}
-if (.Platform$OS.type == "windows" || getRversion() < "4.0.0") {
-# PSOCK cluster: column-mode only (row-batch serialisation overhead not worth it)
-on.exit(parallel::stopCluster(cl), add = TRUE)
-parallel::clusterExport(cl, varlist = c(
-"x", "x.bak", "x_mo", "ab_cols", "types",
-"capped_mic_handling", "as_wt_nwt", "add_intrinsic_resistance",
-"reference_data", "substitute_missing_r_breakpoint", "include_screening", "include_PKPD",
-"breakpoint_type", "guideline", "host", "uti", "verbose",
-"col_mo", "conserve_capped_values",
-"effective_info", "is_parallel_run",
-"run_as_sir_column"
-), envir = environment())
-result_list <- parallel::parLapply(cl, seq_along(ab_cols), run_as_sir_column)
-} else if (pieces_per_col > 1L) {
-# Row-batch mode (R >= 4.0, non-Windows, n_cols < n_cores):
-# build (col, row_slice) job pairs so all cores stay active
+if (pieces_per_col > 1L) {
+# Row-batch mode: build (col, row_slice) job pairs so all workers stay active
row_cuts <- unique(round(seq(0, nrow(x), length.out = pieces_per_col + 1L)))
row_ranges <- lapply(seq_len(length(row_cuts) - 1L), function(p) {
seq.int(row_cuts[p] + 1L, row_cuts[p + 1L])
@@ -1122,9 +1100,9 @@ as.sir.data.frame <- function(x,
jobs <- do.call(c, lapply(seq_along(ab_cols), function(ci) {
lapply(seq_along(row_ranges), function(p) list(col = ci, rows = row_ranges[[p]]))
}))
-flat <- parallel::mclapply(jobs, function(job) {
+flat <- future.apply::future_lapply(jobs, function(job) {
run_as_sir_column(job$col, job$rows)
-}, mc.cores = n_cores)
+}, future.seed = TRUE)
# Reassemble: for each column concatenate row pieces in order
result_list <- lapply(seq_along(ab_cols), function(ci) {
pieces <- flat[vapply(jobs, function(j) j$col == ci, logical(1L))]
@@ -1137,8 +1115,8 @@ as.sir.data.frame <- function(x,
)
})
} else {
-# Column-parallel mode (R >= 4.0, non-Windows, n_cols >= n_cores)
-result_list <- parallel::mclapply(seq_along(ab_cols), run_as_sir_column, mc.cores = n_cores)
+# Column-parallel mode: one job per antibiotic column
+result_list <- future.apply::future_lapply(seq_along(ab_cols), run_as_sir_column, future.seed = TRUE)
}
if (isTRUE(info)) {
message_(font_green_bg("\u00a0DONE\u00a0"), as_note = FALSE)
@@ -1147,10 +1125,13 @@ as.sir.data.frame <- function(x,
}
} else {
# sequential mode (non-parallel)
-if (isTRUE(info) && n_cores > 1 && NROW(x) * NCOL(x) > 10000) {
-# give a note that parallel mode might be better
+if (isTRUE(info) && get_n_cores(Inf) > 1 && NROW(x) * NCOL(x) > 10000) {
message_(as_note = FALSE)
-message_("Running in sequential mode. Consider setting {.arg parallel} to {.code TRUE} to speed up processing on multiple cores.\n")
+if (requireNamespace("future.apply", quietly = TRUE)) {
+message_("Running in sequential mode. To speed up processing, set a parallel plan first (e.g., ", highlight_code("future::plan(future::multisession)"), ") and then set {.arg parallel} to {.code TRUE}.\n")
+} else {
+message_("Running in sequential mode. To speed up processing, install the {.pkg future.apply} package, set a parallel plan first (e.g., ", highlight_code("future::plan(future::multisession)"), ") and then set {.arg parallel} to {.code TRUE}.\n")
+}
}
# this will contain a progress bar already
result_list <- lapply(seq_along(ab_cols), run_as_sir_column)
@@ -1280,7 +1261,7 @@ as_sir_method <- function(method_short,
# backward compatibility
dots <- list(...)
-dots <- dots[which(!names(dots) %in% c("warn", "mo.bak", "is_data.frame"))]
+dots <- dots[which(!names(dots) %in% c("warn", "mo.bak", "is_data.frame", "as_wt_nwt"))]
if (length(dots) != 0) {
warning_("These arguments in {.help [{.fun as.sir}](AMR::as.sir)} are no longer used: ", vector_and(names(dots), quotes = "`"), ".", call = FALSE)
}

@@ -31,22 +31,24 @@ step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
skip = FALSE, id = recipes::rand_id("sir_numeric"))
}
\arguments{
-\item{recipe}{A recipe object. The step will be added to the sequence of
-operations for this recipe.}
+\item{recipe}{A recipe object. The step will be added to the
+sequence of operations for this recipe.}
-\item{...}{One or more selector functions to choose variables for this step.
-See \code{\link[recipes:selections]{selections()}} for more details.}
+\item{...}{One or more selector functions to choose variables
+for this step. See \code{\link[recipes:selections]{selections()}} for more details.}
-\item{role}{Not used by this step since no new variables are created.}
+\item{role}{Not used by this step since no new variables are
+created.}
-\item{trained}{A logical to indicate if the quantities for preprocessing have
-been estimated.}
+\item{trained}{A logical to indicate if the quantities for
+preprocessing have been estimated.}
-\item{skip}{A logical. Should the step be skipped when the recipe is baked by
-\code{\link[recipes:bake]{bake()}}? While all operations are baked when \code{\link[recipes:prep]{prep()}} is run, some
-operations may not be able to be conducted on new data (e.g. processing the
-outcome variable(s)). Care should be taken when using \code{skip = TRUE} as it
-may affect the computations for subsequent operations.}
+\item{skip}{A logical. Should the step be skipped when the
+recipe is baked by \code{\link[recipes:bake]{bake()}}? While all operations are baked
+when \code{\link[recipes:prep]{prep()}} is run, some operations may not be able to be
+conducted on new data (e.g. processing the outcome variable(s)).
+Care should be taken when using \code{skip = TRUE} as it may affect
+the computations for subsequent operations.}
\item{id}{A character string that is unique to this step to identify it.}
}

@@ -72,7 +72,7 @@ retrieve_wisca_parameters(wisca_model, ...)
\item{ab_transform}{A character to transform antimicrobial input - must be one of the column names of the \link{antimicrobials} data set (defaults to \code{"name"}): \code{"ab"}, \code{"cid"}, \code{"name"}, \code{"group"}, \code{"atc"}, \code{"atc_group1"}, \code{"atc_group2"}, \code{"abbreviations"}, \code{"synonyms"}, \code{"oral_ddd"}, \code{"oral_units"}, \code{"iv_ddd"}, \code{"iv_units"}, or \code{"loinc"}. Can also be \code{NULL} to not transform the input.}
-\item{syndromic_group}{A column name of \code{x}, or values calculated to split rows of \code{x}, e.g. by using \code{\link[=ifelse]{ifelse()}} or \code{\link[dplyr:case-and-replace-when]{case_when()}}. See \emph{Examples}.}
+\item{syndromic_group}{A column name of \code{x}, or values calculated to split rows of \code{x}, e.g. by using \code{\link[=ifelse]{ifelse()}} or \code{\link[dplyr:case_when]{case_when()}}. See \emph{Examples}.}
\item{add_total_n}{\emph{(deprecated in favour of \code{formatting_type})} A \link{logical} to indicate whether \code{n_tested} available numbers per pathogen should be added to the table (default is \code{TRUE}). This will add the lowest and highest number of available isolates per antimicrobial (e.g, if for \emph{E. coli} 200 isolates are available for ciprofloxacin and 150 for amoxicillin, the returned number will be "150-200"). This option is unavailable when \code{wisca = TRUE}; in that case, use \code{\link[=retrieve_wisca_parameters]{retrieve_wisca_parameters()}} to get the parameters used for WISCA.}

@@ -150,9 +150,9 @@ The default \code{"conservative"} setting ensures cautious handling of uncertain
\item{col_mo}{Column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
-\item{parallel}{A \link{logical} to indicate if parallel computing must be used, defaults to \code{FALSE}. The \code{parallel} package is part of base \R and no additional packages are required. On Unix/macOS with \R >= 4.0.0, \code{\link[parallel:mclapply]{parallel::mclapply()}} (fork-based) is used; on Windows and \R < 4.0.0, \code{\link[parallel:clusterApply]{parallel::parLapply()}} with a PSOCK cluster is used (requires the AMR package to be installed, not just loaded via \code{devtools::load_all()}). Parallelism distributes columns across cores; it is most beneficial when there are many antibiotic columns and a large number of rows.}
+\item{parallel}{A \link{logical} to indicate if parallel computing must be used, defaults to \code{FALSE}. Requires the \code{\link[future.apply:future_lapply]{future.apply}} package. \strong{A non-sequential \code{\link[future:plan]{future::plan()}} must already be active before setting \code{parallel = TRUE}} — for example, \code{future::plan(future::multisession)}. An error is thrown if \code{parallel = TRUE} is used without a plan set by the user. Parallelism distributes columns (and optionally row batches) across workers; it is most beneficial when there are many antibiotic columns and a large number of rows.}
-\item{max_cores}{Maximum number of cores to use if \code{parallel = TRUE}. Use a negative value to subtract that number from the available number of cores, e.g. a value of \code{-2} on an 8-core machine means that at most 6 cores will be used. Defaults to \code{-1}. There will never be used more cores than variables to analyse. The available number of cores are detected using \code{\link[parallelly:availableCores]{parallelly::availableCores()}} if that package is installed, and base \R's \code{\link[parallel:detectCores]{parallel::detectCores()}} otherwise.}
+\item{max_cores}{Maximum number of workers to use when \code{parallel = TRUE}. Use a negative value to subtract that number from the available workers, e.g. a value of \code{-2} means at most \code{nbrOfWorkers() - 2} workers will be used. Defaults to \code{-1} (all but one worker). There will never be more workers used than there are antibiotic columns to analyse.}
\item{clean}{A \link{logical} to indicate whether previously stored results should be forgotten after returning the 'logbook' with results.}
}
@@ -183,7 +183,7 @@ your_data \%>\% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo
# for veterinary breakpoints, also set `host`:
your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_species", guideline = "CLSI")
-# fast processing with parallel computing:
+# fast processing with parallel computing (requires future.apply):
as.sir(your_data, ..., parallel = TRUE)
}\if{html}{\out{</div>}}
\item Operators like "<=" will be considered according to the \code{capped_mic_handling} setting. At default, an MIC value of e.g. ">2" will return "NI" (non-interpretable) if the breakpoint is 4-8; the \emph{true} MIC could be at either side of the breakpoint. This ensures that capped values from raw laboratory data are treated conservatively.
@@ -201,7 +201,7 @@ your_data \%>\% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), m
# for veterinary breakpoints, also set `host`:
your_data \%>\% mutate_if(is.disk, as.sir, host = "column_with_animal_species", guideline = "CLSI")
-# fast processing with parallel computing:
+# fast processing with parallel computing (requires future.apply):
as.sir(your_data, ..., parallel = TRUE)
}\if{html}{\out{</div>}}
}
@@ -313,7 +313,8 @@ as.sir(df_wide)
sir_interpretation_history()
\donttest{
-# using parallel computing, which is available in base R:
+# using parallel computing (requires the future.apply package):
+# future::plan(future::multisession) # set a parallel plan first (required)
as.sir(df_wide, parallel = TRUE, info = TRUE)

@@ -19,7 +19,7 @@ Define custom EUCAST rules for your organisation or specific analysis and use th
Some organisations have their own adoption of EUCAST rules. This function can be used to define custom EUCAST rules to be used in the \code{\link[=eucast_rules]{eucast_rules()}} function.
\subsection{Basics}{
-If you are familiar with the \code{\link[dplyr:case-and-replace-when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
+If you are familiar with the \code{\link[dplyr:case_when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{x <- custom_eucast_rules(TZP == "S" ~ aminopenicillins == "S",
TZP == "R" ~ aminopenicillins == "R")

@@ -26,7 +26,7 @@ Define custom a MDRO guideline for your organisation or specific analysis and us
Using a custom MDRO guideline is of importance if you have custom rules to determine MDROs in your hospital, e.g., rules that are dependent on ward, state of contact isolation or other variables in your data.
\subsection{Basics}{
-If you are familiar with the \code{\link[dplyr:case-and-replace-when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
+If you are familiar with the \code{\link[dplyr:case_when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{custom <- custom_mdro_guideline(CIP == "R" & age > 60 ~ "Elderly Type A",
ERY == "R" & age > 60 ~ "Elderly Type B")

@@ -45,9 +45,8 @@ A list with class \code{"htest"} containing the following
\item{residuals}{the Pearson residuals,
\code{(observed - expected) / sqrt(expected)}.}
\item{stdres}{standardized residuals,
-\code{(observed - expected) / sqrt(V)}, where \code{V} is the
-residual cell variance (Agresti, 2007, section 2.4.5
-for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
+\code{(observed - expected) / sqrt(V)}, where \code{V} is the residual cell variance (Agresti, 2007,
+section 2.4.5 for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
}
\description{
\code{\link[=g.test]{g.test()}} performs chi-squared contingency table tests and goodness-of-fit tests, just like \code{\link[=chisq.test]{chisq.test()}} but is more reliable (1). A \emph{G}-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a \strong{\emph{G}-test of goodness-of-fit}), or to see whether the proportions of one variable are different for different values of the other variable (called a \strong{\emph{G}-test of independence}).

@@ -32,7 +32,7 @@ pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
standard deviations are less than or equal to \code{tol} times the
standard deviation of the first component.) With the default null
setting, no components are omitted (unless \code{rank.} is specified
-less than \code{min(dim(x))}.). Other settings for \code{tol} could be
+less than \code{min(dim(x))}.). Other settings for tol could be
\code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which
would omit essentially constant components.}