Mirror of https://github.com/msberends/AMR.git (synced 2026-04-28 07:44:03 +02:00)

Compare commits: main ... claude/mig (2 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 20c9447096 | |
| | b1cf7a94ad | |
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 3.0.1.9052
-Date: 2026-04-25
+Version: 3.0.1.9053
+Date: 2026-04-27
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
 data analysis and to work with microbial and antimicrobial properties by
@@ -44,6 +44,8 @@ Suggests:
 curl,
 data.table,
 dplyr,
+future,
+future.apply,
 ggplot2,
 knitr,
 openxlsx,
NEWS.md (5 changed lines)
@@ -1,4 +1,4 @@
-# AMR 3.0.1.9052
+# AMR 3.0.1.9053
 
 ### New
 * Support for clinical breakpoints of 2026 of both CLSI and EUCAST, by adding all of their over 5,700 new clinical breakpoints to the `clinical_breakpoints` data set for usage in `as.sir()`. EUCAST 2026 is now the new default guideline for all MIC and disk diffusion interpretations.
@@ -38,9 +38,12 @@
 * Fixed `as.sir()` for data frames silently deleting columns whose AB class was already `<sir>` when called a second time (re-running on already-converted data) (#278)
 * Fixed `as.sir()` for data frames incorrectly treating metadata columns (e.g. `patient`, `ward`) as antibiotic columns when their names coincidentally matched an antibiotic code; column content is now validated against AMR data patterns before inclusion
 * Improved parallel computing in `as.sir()`: when the number of AB columns is smaller than the number of available cores, rows are now split into batches so all cores stay active (row-batch mode). Previously, a 6-column dataset on a 16-core machine would only use 6 cores; now all 16 are used, with each worker processing a smaller row slice (lower per-worker memory pressure)
+* Fixed false-positive `"as_wt_nwt is no longer used"` warnings that appeared during parallel `as.sir()` runs; `as_wt_nwt` is now excluded from the unused-argument check in `as_sir_method()`
+* **Breaking change**: `as.sir()` with `parallel = TRUE` now requires a non-sequential `future::plan()` to be active before the call — e.g., `future::plan(future::multisession)` — and throws an informative error if none is set; previously `as.sir()` would silently set up and tear down a `multisession` plan itself, which was slow and caused version-mismatch issues with `load_all()` workflows
 * Fixed `as.sir()` ignoring `info = FALSE` for columns with no breakpoints (e.g. cefoxitin against *E. coli*): an operator-precedence bug (`&&`/`||`) caused the "Interpreting MIC values" intro message to fire unconditionally when `nrow(breakpoints) == 0`, regardless of `info`; the progress bar title was also not gated by `info`
 
 ### Updates
+* `as.sir()` with `parallel = TRUE` now uses `future.apply::future_lapply()` instead of `parallel::mclapply()`/`parallel::parLapply()`, enabling transparent support for any `future` backend (including `mirai_multisession`) on all platforms; `future` and `future.apply` are now listed under `Suggests`
 * `as.sir()` with `reference_data`: custom guideline names now correctly classify values as R using EUCAST convention (`> breakpoint_R` for MIC, `< breakpoint_R` for disk); custom breakpoints with `host = NA` now serve as a host-agnostic fallback when no host-specific row matches (#239)
 * Extensive `cli` integration for better message handling and clickable links in messages and warnings (#191, #265)
 * `mdro()` now infers resistance for a _missing_ base drug column from an _available_ corresponding drug+inhibitor combination showing resistance (e.g., piperacillin is absent but required, while piperacillin/tazobactam available and resistant). Can be set with the new argument `infer_from_combinations`, which defaults to `TRUE` (#209). Note that this can yield a higher MDRO detection (which is a good thing as it has become more reliable).
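The NEWS entry gives a row-batch example of 6 antibiotic columns on 16 cores. A small base R sketch can check the arithmetic. `row_batch_jobs()` is a hypothetical helper and is not part of AMR. It applies the `pieces_per_col = ceiling(n_workers / n_cols)` rule from the `R/sir.R` changes. It assumes the data have enough rows that no row slice collapses.

```r
# Hypothetical helper (not part of AMR): number of row slices per column and
# total jobs dispatched, following pieces_per_col = ceiling(n_workers / n_cols)
# when there are fewer antibiotic columns than workers.
row_batch_jobs <- function(n_cols, n_workers) {
  pieces_per_col <- if (n_cols < n_workers) ceiling(n_workers / n_cols) else 1L
  list(
    pieces_per_col = as.integer(pieces_per_col),
    # assumes every column is cut into the full number of row slices
    n_jobs = as.integer(n_cols * pieces_per_col)
  )
}

# 6 antibiotic columns on 16 workers: 3 row slices per column, 18 jobs
row_batch_jobs(n_cols = 6, n_workers = 16)
# 20 columns on 16 workers: plain column mode, one job per column
row_batch_jobs(n_cols = 20, n_workers = 16)
```

With 6 columns and 16 workers, this dispatches 18 jobs rather than 6, so no worker sits idle. With more columns than workers, the rule falls back to one job per column.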
R/sir.R (133 changed lines)
@@ -95,7 +95,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
 #' # for veterinary breakpoints, also set `host`:
 #' your_data %>% mutate_if(is.mic, as.sir, host = "column_with_animal_species", guideline = "CLSI")
 #'
-#' # fast processing with parallel computing:
+#' # fast processing with parallel computing (requires future.apply):
 #' as.sir(your_data, ..., parallel = TRUE)
 #' ```
 #' * Operators like "<=" will be considered according to the `capped_mic_handling` setting. At default, an MIC value of e.g. ">2" will return "NI" (non-interpretable) if the breakpoint is 4-8; the *true* MIC could be at either side of the breakpoint. This is to prevent that capped values from raw laboratory data would not be treated conservatively.
@@ -112,7 +112,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
 #' # for veterinary breakpoints, also set `host`:
 #' your_data %>% mutate_if(is.disk, as.sir, host = "column_with_animal_species", guideline = "CLSI")
 #'
-#' # fast processing with parallel computing:
+#' # fast processing with parallel computing (requires future.apply):
 #' as.sir(your_data, ..., parallel = TRUE)
 #' ```
 #'
@@ -220,7 +220,8 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
 #' sir_interpretation_history()
 #'
 #' \donttest{
-#' # using parallel computing, which is available in base R:
+#' # using parallel computing (requires the future.apply package):
+#' # future::plan(future::multisession) # optional: set your own plan first
 #' as.sir(df_wide, parallel = TRUE, info = TRUE)
 #'
 #'
@@ -716,8 +717,8 @@ as.sir.disk <- function(x,
 }
 
 #' @rdname as.sir
-#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. The `parallel` package is part of base \R and no additional packages are required. On Unix/macOS with \R >= 4.0.0, [parallel::mclapply()] (fork-based) is used; on Windows and \R < 4.0.0, [parallel::parLapply()] with a PSOCK cluster is used (requires the AMR package to be installed, not just loaded via `devtools::load_all()`). Parallelism distributes columns across cores; it is most beneficial when there are many antibiotic columns and a large number of rows.
-#' @param max_cores Maximum number of cores to use if `parallel = TRUE`. Use a negative value to subtract that number from the available number of cores, e.g. a value of `-2` on an 8-core machine means that at most 6 cores will be used. Defaults to `-1`. There will never be used more cores than variables to analyse. The available number of cores are detected using [parallelly::availableCores()] if that package is installed, and base \R's [parallel::detectCores()] otherwise.
+#' @param parallel A [logical] to indicate if parallel computing must be used, defaults to `FALSE`. Requires the [`future.apply`][future.apply::future_lapply()] package. **A non-sequential [future::plan()] must already be active before setting `parallel = TRUE`** — for example, `future::plan(future::multisession)`. An error is thrown if `parallel = TRUE` is used without a plan set by the user. Parallelism distributes columns (and optionally row batches) across workers; it is most beneficial when there are many antibiotic columns and a large number of rows.
+#' @param max_cores Maximum number of workers to use when `parallel = TRUE`. Use a negative value to subtract that number from the available workers, e.g. a value of `-2` means at most `nbrOfWorkers() - 2` workers will be used. Defaults to `-1` (all but one worker). There will never be more workers used than there are antibiotic columns to analyse.
 #' @export
 as.sir.data.frame <- function(x,
 ...,
@@ -911,35 +912,34 @@ as.sir.data.frame <- function(x,
 }
 
 # set up parallel computing
-n_cores <- get_n_cores(max_cores = max_cores)
-n_cores <- min(n_cores, length(ab_cols)) # never more cores than variables required
-if (isTRUE(parallel) && (.Platform$OS.type == "windows" || getRversion() < "4.0.0")) {
-cl <- tryCatch(parallel::makeCluster(n_cores, type = "PSOCK"),
-error = function(e) {
-if (isTRUE(info)) {
-message_("Could not create parallel cluster, using single-core computation. Error message: ", conditionMessage(e))
-}
-return(NULL)
-}
-)
-if (!is.null(cl)) {
-# Each PSOCK worker is a fresh R session — the AMR package must be loaded there
-# so all exported functions (as.sir, as.mic, as.disk, ...) are available.
-amr_loaded_on_workers <- tryCatch({
-parallel::clusterEvalQ(cl, library(AMR, quietly = TRUE))
-TRUE
-}, error = function(e) FALSE)
-if (!amr_loaded_on_workers) {
-if (isTRUE(info)) {
-message_("Could not load AMR on parallel workers (package may not be installed); falling back to single-core computation.")
-}
-parallel::stopCluster(cl)
-cl <- NULL
-}
+if (isTRUE(parallel)) {
+if (!requireNamespace("future.apply", quietly = TRUE)) {
+stop_(
+"Setting {.arg parallel} to {.code TRUE} requires the {.pkg future.apply} package.\n",
+"Install it with: ", highlight_code('install.packages("future.apply")'), "."
+)
 }
-if (is.null(cl)) {
-n_cores <- 1
+if (inherits(future::plan(), "sequential")) {
+stop_(
+"Setting {.arg parallel} to {.code TRUE} requires a non-sequential {.help [future::plan](future::plan)} to be active.\n",
+"Set a parallel plan before calling {.help [{.fun as.sir}](AMR::as.sir)}, for example:\n",
+highlight_code("future::plan(future::multisession)"), "\n",
+"Or on Linux/macOS for fork-based workers:\n",
+highlight_code("future::plan(future::multicore)"), "\n",
+"See {.help [future::plan](future::plan)} for all available strategies.",
+call = FALSE
+)
 }
+n_workers <- future::nbrOfWorkers()
+n_cores <- if (max_cores < 0L) {
+max(1L, n_workers + max_cores)
+} else {
+min(max_cores, n_workers)
+}
+n_cores <- min(n_cores, length(ab_cols))
+} else {
+n_workers <- 1L
+n_cores <- 1L
 }
 
 if (isTRUE(info)) {
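The new worker-count rule above can be written as a standalone function. This is a sketch, not the package code. `resolve_n_cores()` is a hypothetical name, and `n_workers` stands in for the value `future::nbrOfWorkers()` returns under the active plan.

In the diff as shown, the resulting `n_cores` has two uses: it decides whether a parallel run happens at all (`is_parallel_run`), and it is reported in the progress message. `future.apply::future_lapply()` itself schedules jobs over the workers of the active plan.

```r
# Hypothetical helper (not part of AMR) mirroring the new max_cores logic.
# n_workers stands in for future::nbrOfWorkers() under the active plan.
resolve_n_cores <- function(max_cores, n_workers, n_cols) {
  n_cores <- if (max_cores < 0L) {
    # a negative max_cores subtracts from the available workers, keeping at least 1
    max(1L, n_workers + max_cores)
  } else {
    # a non-negative max_cores is an upper bound on the workers
    min(max_cores, n_workers)
  }
  # never more workers than antibiotic columns to analyse
  min(n_cores, n_cols)
}

resolve_n_cores(max_cores = -1, n_workers = 8, n_cols = 20)  # 7
resolve_n_cores(max_cores = -2, n_workers = 8, n_cols = 20)  # 6
resolve_n_cores(max_cores = 4, n_workers = 8, n_cols = 3)    # 3
```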
@@ -952,31 +952,23 @@ as.sir.data.frame <- function(x,
 is_parallel_run <- isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1
 effective_info <- if (is_parallel_run) FALSE else info
 
-# Row-batch mode: when n_cols < n_cores we would leave cores idle under plain
-# column-parallel dispatch. Instead we split rows into pieces so every core
-# gets work. pieces_per_col = ceil(n_cores / n_cols) gives ~n_cores jobs
+# Row-batch mode: when n_cols < n_workers we would leave workers idle under plain
+# column-parallel dispatch. Instead we split rows into pieces so every worker
+# gets work. pieces_per_col = ceil(n_workers / n_cols) gives ~n_workers jobs
 # total; each job processes one column on one row slice, which also reduces
 # per-worker memory pressure (smaller breakpoints search space).
-# Only used for the fork path (R >= 4.0, non-Windows); PSOCK clusters already
-# incur high per-job serialisation overhead so we keep column-mode there.
-use_fork <- is_parallel_run &&
-!(.Platform$OS.type == "windows" || getRversion() < "4.0.0")
-pieces_per_col <- if (use_fork && length(ab_cols) < n_cores) {
-ceiling(n_cores / length(ab_cols))
+pieces_per_col <- if (is_parallel_run && length(ab_cols) < n_workers) {
+ceiling(n_workers / length(ab_cols))
 } else {
 1L
 }
 
 run_as_sir_column <- function(i, rows = NULL) {
-# Always resolve AMR_env from the package namespace. This is essential for
-# PSOCK workers (where the closure-captured AMR_env is a stale serialised copy
-# while as.sir() writes to the live AMR:::AMR_env) and also avoids capturing
-# pre-existing log entries from earlier in the session when forking.
+# Always resolve AMR_env from the package namespace so workers get the live
+# environment rather than a stale serialised copy from the closure.
 .amr_env <- get("AMR_env", envir = asNamespace("AMR"), inherits = FALSE)
-# In parallel mode each worker (fork or PSOCK) has its own copy of the
-# history; record the current length so we capture only the new rows added
-# by the as.sir() call below, not any pre-existing entries inherited at fork
-# time or carried over from earlier as.sir() calls.
+# In parallel mode each worker has its own copy of the history; record the
+# current length so we capture only the rows added by this as.sir() call.
 if (is_parallel_run) pre_log_n <- NROW(.amr_env$sir_interpretation_history)
 
 ab_col <- ab_cols[i]
@@ -1090,31 +1082,17 @@ as.sir.data.frame <- function(x,
 return(out)
 }
 
-if (isTRUE(parallel) && n_cores > 1 && length(ab_cols) > 1) {
+if (is_parallel_run) {
 if (isTRUE(info)) {
 message_(as_note = FALSE)
 if (pieces_per_col > 1L) {
-message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), " (", pieces_per_col, " row slices per column)...", as_note = FALSE, appendLF = FALSE)
+message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " workers, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), " (", pieces_per_col, " row slices per column)...", as_note = FALSE, appendLF = FALSE)
 } else {
-message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " cores, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
+message_("Running in parallel mode using ", n_cores, " out of ", get_n_cores(Inf), " workers, on columns ", vector_and(paste0("{.field ", font_bold(ab_cols, collapse = NULL), "}"), quotes = FALSE, sort = FALSE), "...", as_note = FALSE, appendLF = FALSE)
 }
 }
-if (.Platform$OS.type == "windows" || getRversion() < "4.0.0") {
-# PSOCK cluster: column-mode only (row-batch serialisation overhead not worth it)
-on.exit(parallel::stopCluster(cl), add = TRUE)
-parallel::clusterExport(cl, varlist = c(
-"x", "x.bak", "x_mo", "ab_cols", "types",
-"capped_mic_handling", "as_wt_nwt", "add_intrinsic_resistance",
-"reference_data", "substitute_missing_r_breakpoint", "include_screening", "include_PKPD",
-"breakpoint_type", "guideline", "host", "uti", "verbose",
-"col_mo", "conserve_capped_values",
-"effective_info", "is_parallel_run",
-"run_as_sir_column"
-), envir = environment())
-result_list <- parallel::parLapply(cl, seq_along(ab_cols), run_as_sir_column)
-} else if (pieces_per_col > 1L) {
-# Row-batch mode (R >= 4.0, non-Windows, n_cols < n_cores):
-# build (col, row_slice) job pairs so all cores stay active
+if (pieces_per_col > 1L) {
+# Row-batch mode: build (col, row_slice) job pairs so all workers stay active
 row_cuts <- unique(round(seq(0, nrow(x), length.out = pieces_per_col + 1L)))
 row_ranges <- lapply(seq_len(length(row_cuts) - 1L), function(p) {
 seq.int(row_cuts[p] + 1L, row_cuts[p + 1L])
@@ -1122,9 +1100,9 @@ as.sir.data.frame <- function(x,
 jobs <- do.call(c, lapply(seq_along(ab_cols), function(ci) {
 lapply(seq_along(row_ranges), function(p) list(col = ci, rows = row_ranges[[p]]))
 }))
-flat <- parallel::mclapply(jobs, function(job) {
+flat <- future.apply::future_lapply(jobs, function(job) {
 run_as_sir_column(job$col, job$rows)
-}, mc.cores = n_cores)
+}, future.seed = TRUE)
 # Reassemble: for each column concatenate row pieces in order
 result_list <- lapply(seq_along(ab_cols), function(ci) {
 pieces <- flat[vapply(jobs, function(j) j$col == ci, logical(1L))]
@@ -1137,8 +1115,8 @@ as.sir.data.frame <- function(x,
 )
 })
 } else {
-# Column-parallel mode (R >= 4.0, non-Windows, n_cols >= n_cores)
-result_list <- parallel::mclapply(seq_along(ab_cols), run_as_sir_column, mc.cores = n_cores)
+# Column-parallel mode: one job per antibiotic column
+result_list <- future.apply::future_lapply(seq_along(ab_cols), run_as_sir_column, future.seed = TRUE)
 }
 if (isTRUE(info)) {
 message_(font_green_bg("\u00a0DONE\u00a0"), as_note = FALSE)
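The row-batch dispatch in the hunks above works in three steps. It cuts rows with `row_cuts`, builds `(col, rows)` jobs, and then reassembles each column from its slices. A base R sketch of this follows, under several assumptions:

* `process` stands in for AMR's internal `run_as_sir_column()`.
* Plain `lapply()` replaces `future.apply::future_lapply()`; both take `X` and `FUN`.
* The reassembly simply concatenates slices with `unlist()`, because the diff does not show the full reassembly code.

`split_rows_into_jobs()` and `run_row_batches()` are hypothetical names.

```r
# Hypothetical sketch (base R only, not AMR code): split each column's rows into
# contiguous slices, process every (column, slice) job, then stitch each
# column back together in order.
split_rows_into_jobs <- function(n_rows, n_cols, pieces_per_col) {
  # cut points between slices; unique() drops duplicates for short data
  row_cuts <- unique(round(seq(0, n_rows, length.out = pieces_per_col + 1L)))
  row_ranges <- lapply(seq_len(length(row_cuts) - 1L), function(p) {
    seq.int(row_cuts[p] + 1L, row_cuts[p + 1L])
  })
  # one job per (column, row slice) pair
  do.call(c, lapply(seq_len(n_cols), function(ci) {
    lapply(seq_along(row_ranges), function(p) list(col = ci, rows = row_ranges[[p]]))
  }))
}

run_row_batches <- function(df, process, pieces_per_col) {
  jobs <- split_rows_into_jobs(nrow(df), ncol(df), pieces_per_col)
  # lapply() stands in for future.apply::future_lapply()
  flat <- lapply(jobs, function(job) process(df[[job$col]][job$rows]))
  # reassemble: concatenate each column's slices in job order
  out <- lapply(seq_len(ncol(df)), function(ci) {
    pieces <- flat[vapply(jobs, function(j) j$col == ci, logical(1L))]
    unlist(pieces, use.names = FALSE)
  })
  names(out) <- names(df)
  as.data.frame(out, stringsAsFactors = FALSE)
}

df <- data.frame(
  AMX = c("s", "r", "i", "s", "r"),
  CIP = c("r", "r", "s", "s", "i"),
  stringsAsFactors = FALSE
)
run_row_batches(df, toupper, pieces_per_col = 3)
```

Because `unique(round(...))` drops duplicate cut points, a data set with fewer rows than `pieces_per_col` yields fewer slices rather than empty ones.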
@@ -1147,10 +1125,13 @@ as.sir.data.frame <- function(x,
 }
 } else {
 # sequential mode (non-parallel)
-if (isTRUE(info) && n_cores > 1 && NROW(x) * NCOL(x) > 10000) {
-# give a note that parallel mode might be better
+if (isTRUE(info) && get_n_cores(Inf) > 1 && NROW(x) * NCOL(x) > 10000) {
 message_(as_note = FALSE)
-message_("Running in sequential mode. Consider setting {.arg parallel} to {.code TRUE} to speed up processing on multiple cores.\n")
+if (requireNamespace("future.apply", quietly = TRUE)) {
+message_("Running in sequential mode. To speed up processing, set a parallel plan first (e.g., ", highlight_code("future::plan(future::multisession)"), ") and then set {.arg parallel} to {.code TRUE}.\n")
+} else {
+message_("Running in sequential mode. To speed up processing, install the {.pkg future.apply} package, set a parallel plan first (e.g., ", highlight_code("future::plan(future::multisession)"), ") and then set {.arg parallel} to {.code TRUE}.\n")
+}
 }
 # this will contain a progress bar already
 result_list <- lapply(seq_along(ab_cols), run_as_sir_column)
@@ -1280,7 +1261,7 @@ as_sir_method <- function(method_short,
 
 # backward compatibilty
 dots <- list(...)
-dots <- dots[which(!names(dots) %in% c("warn", "mo.bak", "is_data.frame"))]
+dots <- dots[which(!names(dots) %in% c("warn", "mo.bak", "is_data.frame", "as_wt_nwt"))]
 if (length(dots) != 0) {
 warning_("These arguments in {.help [{.fun as.sir}](AMR::as.sir)} are no longer used: ", vector_and(names(dots), quotes = "`"), ".", call = FALSE)
 }
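The `as_wt_nwt` fix listed in NEWS comes down to one change in `as_sir_method()`: `as_wt_nwt` is added to the internal argument names that are filtered out of `...` before the "no longer used" check. Below is a base R sketch of that filter. `unused_dot_args()` is a hypothetical wrapper that returns only the names that would still trigger the warning.

```r
# Hypothetical sketch of the argument filter in as_sir_method() (not AMR code):
# internal arguments passed through `...` are dropped first, so only genuinely
# unknown arguments would trigger the "no longer used" warning.
unused_dot_args <- function(...) {
  dots <- list(...)
  internal <- c("warn", "mo.bak", "is_data.frame", "as_wt_nwt")
  dots <- dots[which(!names(dots) %in% internal)]
  names(dots)
}

unused_dot_args(as_wt_nwt = TRUE, is_data.frame = TRUE)  # character(0)
unused_dot_args(as_wt_nwt = TRUE, old_arg = 1)           # "old_arg"
```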
@@ -31,22 +31,24 @@ step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
 skip = FALSE, id = recipes::rand_id("sir_numeric"))
 }
 \arguments{
-\item{recipe}{A recipe object. The step will be added to the sequence of
-operations for this recipe.}
+\item{recipe}{A recipe object. The step will be added to the
+sequence of operations for this recipe.}
 
-\item{...}{One or more selector functions to choose variables for this step.
-See \code{\link[recipes:selections]{selections()}} for more details.}
+\item{...}{One or more selector functions to choose variables
+for this step. See \code{\link[recipes:selections]{selections()}} for more details.}
 
-\item{role}{Not used by this step since no new variables are created.}
+\item{role}{Not used by this step since no new variables are
+created.}
 
-\item{trained}{A logical to indicate if the quantities for preprocessing have
-been estimated.}
+\item{trained}{A logical to indicate if the quantities for
+preprocessing have been estimated.}
 
-\item{skip}{A logical. Should the step be skipped when the recipe is baked by
-\code{\link[recipes:bake]{bake()}}? While all operations are baked when \code{\link[recipes:prep]{prep()}} is run, some
-operations may not be able to be conducted on new data (e.g. processing the
-outcome variable(s)). Care should be taken when using \code{skip = TRUE} as it
-may affect the computations for subsequent operations.}
+\item{skip}{A logical. Should the step be skipped when the
+recipe is baked by \code{\link[recipes:bake]{bake()}}? While all operations are baked
+when \code{\link[recipes:prep]{prep()}} is run, some operations may not be able to be
+conducted on new data (e.g. processing the outcome variable(s)).
+Care should be taken when using \code{skip = TRUE} as it may affect
+the computations for subsequent operations.}
 
 \item{id}{A character string that is unique to this step to identify it.}
 }
@@ -72,7 +72,7 @@ retrieve_wisca_parameters(wisca_model, ...)
 
 \item{ab_transform}{A character to transform antimicrobial input - must be one of the column names of the \link{antimicrobials} data set (defaults to \code{"name"}): \code{"ab"}, \code{"cid"}, \code{"name"}, \code{"group"}, \code{"atc"}, \code{"atc_group1"}, \code{"atc_group2"}, \code{"abbreviations"}, \code{"synonyms"}, \code{"oral_ddd"}, \code{"oral_units"}, \code{"iv_ddd"}, \code{"iv_units"}, or \code{"loinc"}. Can also be \code{NULL} to not transform the input.}
 
-\item{syndromic_group}{A column name of \code{x}, or values calculated to split rows of \code{x}, e.g. by using \code{\link[=ifelse]{ifelse()}} or \code{\link[dplyr:case-and-replace-when]{case_when()}}. See \emph{Examples}.}
+\item{syndromic_group}{A column name of \code{x}, or values calculated to split rows of \code{x}, e.g. by using \code{\link[=ifelse]{ifelse()}} or \code{\link[dplyr:case_when]{case_when()}}. See \emph{Examples}.}
 
 \item{add_total_n}{\emph{(deprecated in favour of \code{formatting_type})} A \link{logical} to indicate whether \code{n_tested} available numbers per pathogen should be added to the table (default is \code{TRUE}). This will add the lowest and highest number of available isolates per antimicrobial (e.g, if for \emph{E. coli} 200 isolates are available for ciprofloxacin and 150 for amoxicillin, the returned number will be "150-200"). This option is unavailable when \code{wisca = TRUE}; in that case, use \code{\link[=retrieve_wisca_parameters]{retrieve_wisca_parameters()}} to get the parameters used for WISCA.}
 
@@ -150,9 +150,9 @@ The default \code{"conservative"} setting ensures cautious handling of uncertain
 
 \item{col_mo}{Column name of the names or codes of the microorganisms (see \code{\link[=as.mo]{as.mo()}}) - the default is the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
 
-\item{parallel}{A \link{logical} to indicate if parallel computing must be used, defaults to \code{FALSE}. The \code{parallel} package is part of base \R and no additional packages are required. On Unix/macOS with \R >= 4.0.0, \code{\link[parallel:mclapply]{parallel::mclapply()}} (fork-based) is used; on Windows and \R < 4.0.0, \code{\link[parallel:clusterApply]{parallel::parLapply()}} with a PSOCK cluster is used (requires the AMR package to be installed, not just loaded via \code{devtools::load_all()}). Parallelism distributes columns across cores; it is most beneficial when there are many antibiotic columns and a large number of rows.}
+\item{parallel}{A \link{logical} to indicate if parallel computing must be used, defaults to \code{FALSE}. Requires the \code{\link[future.apply:future_lapply]{future.apply}} package. \strong{A non-sequential \code{\link[future:plan]{future::plan()}} must already be active before setting \code{parallel = TRUE}} — for example, \code{future::plan(future::multisession)}. An error is thrown if \code{parallel = TRUE} is used without a plan set by the user. Parallelism distributes columns (and optionally row batches) across workers; it is most beneficial when there are many antibiotic columns and a large number of rows.}
 
-\item{max_cores}{Maximum number of cores to use if \code{parallel = TRUE}. Use a negative value to subtract that number from the available number of cores, e.g. a value of \code{-2} on an 8-core machine means that at most 6 cores will be used. Defaults to \code{-1}. There will never be used more cores than variables to analyse. The available number of cores are detected using \code{\link[parallelly:availableCores]{parallelly::availableCores()}} if that package is installed, and base \R's \code{\link[parallel:detectCores]{parallel::detectCores()}} otherwise.}
+\item{max_cores}{Maximum number of workers to use when \code{parallel = TRUE}. Use a negative value to subtract that number from the available workers, e.g. a value of \code{-2} means at most \code{nbrOfWorkers() - 2} workers will be used. Defaults to \code{-1} (all but one worker). There will never be more workers used than there are antibiotic columns to analyse.}
 
 \item{clean}{A \link{logical} to indicate whether previously stored results should be forgotten after returning the 'logbook' with results.}
 }
@@ -183,7 +183,7 @@ your_data \%>\% mutate_if(is.mic, as.sir, ab = c("cipro", "ampicillin", ...), mo
 # for veterinary breakpoints, also set `host`:
 your_data \%>\% mutate_if(is.mic, as.sir, host = "column_with_animal_species", guideline = "CLSI")

-# fast processing with parallel computing:
+# fast processing with parallel computing (requires future.apply):
 as.sir(your_data, ..., parallel = TRUE)
 }\if{html}{\out{</div>}}
 \item Operators like "<=" will be considered according to the \code{capped_mic_handling} setting. At default, an MIC value of e.g. ">2" will return "NI" (non-interpretable) if the breakpoint is 4-8; the \emph{true} MIC could be at either side of the breakpoint. This is to prevent that capped values from raw laboratory data would not be treated conservatively.
@@ -201,7 +201,7 @@ your_data \%>\% mutate_if(is.disk, as.sir, ab = c("cipro", "ampicillin", ...), m
 # for veterinary breakpoints, also set `host`:
 your_data \%>\% mutate_if(is.disk, as.sir, host = "column_with_animal_species", guideline = "CLSI")

-# fast processing with parallel computing:
+# fast processing with parallel computing (requires future.apply):
 as.sir(your_data, ..., parallel = TRUE)
 }\if{html}{\out{</div>}}
 }
@@ -313,7 +313,8 @@ as.sir(df_wide)
 sir_interpretation_history()

 \donttest{
-# using parallel computing, which is available in base R:
+# using parallel computing (requires the future.apply package):
+# future::plan(future::multisession) # optional: set your own plan first
 as.sir(df_wide, parallel = TRUE, info = TRUE)


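The `max_cores` rule documented in the first hunk of this file (a negative value is subtracted from the available workers, and no more workers are used than there are antibiotic columns) can be sketched in base R. `effective_workers()` is a hypothetical helper written for illustration only, not the AMR package's internal code. In practice `n_available` would come from `future::nbrOfWorkers()` under the active plan; capping at `n_available` and keeping at least one worker are assumptions the documentation does not state.

```r
# Hypothetical helper illustrating the documented `max_cores` rule;
# not the AMR package's internal implementation.
effective_workers <- function(max_cores, n_available, n_columns) {
  # negative values are subtracted from the available workers
  n <- if (max_cores < 0) n_available + max_cores else max_cores
  # never more workers than antibiotic columns (documented);
  # capping at n_available is an assumption
  n <- min(n, n_available, n_columns)
  # assumption: always keep at least one worker
  max(1L, as.integer(n))
}

effective_workers(-1, n_available = 8, n_columns = 20)  # 7
effective_workers(-2, n_available = 8, n_columns = 3)   # 3, limited by columns
effective_workers(4, n_available = 8, n_columns = 20)   # 4
```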
@@ -19,7 +19,7 @@ Define custom EUCAST rules for your organisation or specific analysis and use th
 Some organisations have their own adoption of EUCAST rules. This function can be used to define custom EUCAST rules to be used in the \code{\link[=eucast_rules]{eucast_rules()}} function.
 \subsection{Basics}{

-If you are familiar with the \code{\link[dplyr:case-and-replace-when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
+If you are familiar with the \code{\link[dplyr:case_when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:

 \if{html}{\out{<div class="sourceCode r">}}\preformatted{x <- custom_eucast_rules(TZP == "S" ~ aminopenicillins == "S",
 TZP == "R" ~ aminopenicillins == "R")
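The formula notation described in this hunk (rule before the tilde, consequence after it) is ordinary R syntax: a two-sided formula stores both sides as unevaluated expressions. The base R sketch below only illustrates that mechanism. The `isolates` data frame is made up, `aminopenicillins` is treated as a single column rather than an antimicrobial group, and this is not how `custom_eucast_rules()` is implemented internally.

```r
# A two-sided formula keeps both sides as unevaluated calls:
# element 2 is the left-hand side (the rule), element 3 the right-hand side
# (the consequence).
rule <- TZP == "S" ~ aminopenicillins == "S"
lhs <- rule[[2]]
rhs <- rule[[3]]
deparse(lhs)

# Made-up example data; 'aminopenicillins' is a single column here,
# a simplification of the antimicrobial group the real rule refers to.
isolates <- data.frame(TZP = c("S", "R", "S"),
                       aminopenicillins = c("R", "R", "S"))

# Evaluate the rule against the columns of the data frame
matches <- eval(lhs, envir = isolates)

# Read the consequence as "column == value" and apply it to matching rows
target <- as.character(rhs[[2]])
value <- rhs[[3]]
isolates[matches, target] <- value
isolates
```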
@@ -26,7 +26,7 @@ Define custom a MDRO guideline for your organisation or specific analysis and us
 Using a custom MDRO guideline is of importance if you have custom rules to determine MDROs in your hospital, e.g., rules that are dependent on ward, state of contact isolation or other variables in your data.
 \subsection{Basics}{

-If you are familiar with the \code{\link[dplyr:case-and-replace-when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:
+If you are familiar with the \code{\link[dplyr:case_when]{case_when()}} function of the \code{dplyr} package, you will recognise the input method to set your own rules. Rules must be set using what \R considers to be the 'formula notation'. The rule itself is written \emph{before} the tilde (\code{~}) and the consequence of the rule is written \emph{after} the tilde:

 \if{html}{\out{<div class="sourceCode r">}}\preformatted{custom <- custom_mdro_guideline(CIP == "R" & age > 60 ~ "Elderly Type A",
 ERY == "R" & age > 60 ~ "Elderly Type B")
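Because the input method mirrors `dplyr::case_when()`, the rules passed to `custom_mdro_guideline()` read as an ordered list of `condition ~ outcome` pairs. The base R sketch below evaluates such a list with `case_when()`-style first-match semantics, where rows matching no rule stay `NA`. Both `first_match()` and the `patients` data frame are hypothetical, and whether `custom_mdro_guideline()` resolves overlapping rules exactly this way is an assumption, not something this diff states.

```r
# Hypothetical helper: evaluate "condition ~ outcome" formulas in order and
# give each row the outcome of the first rule it satisfies
# (case_when()-style); rows matching no rule stay NA.
first_match <- function(data, ...) {
  rules <- list(...)
  out <- rep(NA_character_, nrow(data))
  for (rule in rules) {
    cond <- eval(rule[[2]], envir = data)     # left-hand side: the condition
    outcome <- eval(rule[[3]], envir = data)  # right-hand side: the label
    # only fill rows that are still unassigned and satisfy the condition
    hit <- is.na(out) & !is.na(cond) & cond
    out[hit] <- outcome
  }
  out
}

# Made-up data; row 1 satisfies both rules and gets the first one's label
patients <- data.frame(CIP = c("R", "S", "R", "S"),
                       ERY = c("R", "R", "S", "S"),
                       age = c(70, 75, 50, 80))

res <- first_match(patients,
                   CIP == "R" & age > 60 ~ "Elderly Type A",
                   ERY == "R" & age > 60 ~ "Elderly Type B")
res
```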
@@ -45,9 +45,8 @@ A list with class \code{"htest"} containing the following
 \item{residuals}{the Pearson residuals,
 \code{(observed - expected) / sqrt(expected)}.}
 \item{stdres}{standardized residuals,
-\code{(observed - expected) / sqrt(V)}, where \code{V} is the
-residual cell variance (Agresti, 2007, section 2.4.5
-for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
+\code{(observed - expected) / sqrt(V)}, where \code{V} is the residual cell variance (Agresti, 2007,
+section 2.4.5 for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
 }
 \description{
 \code{\link[=g.test]{g.test()}} performs chi-squared contingency table tests and goodness-of-fit tests, just like \code{\link[=chisq.test]{chisq.test()}} but is more reliable (1). A \emph{G}-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a \strong{\emph{G}-test of goodness-of-fit}), or to see whether the proportions of one variable are different for different values of the other variable (called a \strong{\emph{G}-test of independence}).
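For the goodness-of-fit case, the two residual definitions in this hunk can be checked numerically: with expected counts `E = n * p`, the Pearson residuals are `(x - E) / sqrt(E)`, and with `V = n * p * (1 - p)` the standardized residuals are `(x - E) / sqrt(V)`. The sketch compares a manual calculation with the `residuals` and `stdres` components returned by base R's `stats::chisq.test()`, whose return value `g.test()` is documented to mirror. The counts and proportions are made up.

```r
# Made-up observed counts and hypothesised proportions
x <- c(A = 30, B = 50, C = 20)
p <- c(0.25, 0.50, 0.25)
n <- sum(x)

E <- n * p              # expected counts
V <- n * p * (1 - p)    # residual cell variance, goodness-of-fit case

pearson  <- (x - E) / sqrt(E)
standard <- (x - E) / sqrt(V)

# Compare against the components returned by chisq.test()
ref <- chisq.test(x, p = p)
all.equal(unname(pearson), unname(ref$residuals))
all.equal(unname(standard), unname(ref$stdres))
```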
@@ -32,7 +32,7 @@ pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
 standard deviations are less than or equal to \code{tol} times the
 standard deviation of the first component.) With the default null
 setting, no components are omitted (unless \code{rank.} is specified
-less than \code{min(dim(x))}.). Other settings for \code{tol} could be
+less than \code{min(dim(x))}.). Other settings for tol could be
 \code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which
 would omit essentially constant components.}

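The `tol` behaviour described in this hunk (components are omitted when their standard deviation is at most `tol` times that of the first component) matches the `tol` argument of base R's `stats::prcomp()`, whose documentation this text appears to reuse. The sketch calls `prcomp()` directly on made-up data with one exactly collinear column; whether `pca()` passes `tol` through unchanged is an assumption.

```r
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
# column c is an exact linear combination of a and b, so after centering and
# scaling the data have rank 2 and the third component is essentially constant
x <- cbind(a = a, b = b, c = a + b)

full    <- prcomp(x, center = TRUE, scale. = TRUE)
trimmed <- prcomp(x, center = TRUE, scale. = TRUE,
                  tol = sqrt(.Machine$double.eps))

ncol(full$rotation)     # default tol = NULL keeps all 3 components
ncol(trimmed$rotation)  # the near-zero third component is dropped: 2
```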