1
0
mirror of https://github.com/msberends/AMR.git synced 2026-03-24 08:42:29 +01:00
Files
AMR/R/mic.R
Matthijs Berends 4171d5b778 (v3.0.0.9036) Modernise messaging infrastructure to use cli markup (#265)
* Modernise messaging infrastructure with cli support

Rewrites message_(), warning_(), stop_() to use cli::cli_inform(),
cli::cli_warn(), and cli::cli_abort() when the cli package is available,
with a fully functional plain-text fallback for environments without cli.

Key changes:
- New cli_to_plain() helper converts cli inline markup ({.fun}, {.arg},
  {.val}, {.field}, {.cls}, {.pkg}, {.href}, {.url}, etc.) to readable
  plain-text equivalents for the non-cli fallback path
- word_wrap() simplified: drops add_fn, ANSI re-index algorithm, RStudio
  link injection, and operator spacing hack; returns pasted input unchanged
  when cli is available
- stop_() no longer references AMR_env$cli_abort; uses pkg_is_available()
  directly; passes sys.call() objects to cli::cli_abort() call= argument
- Removed add_fn parameter from message_(), warning_(), and word_wrap()
- All call sites across R/ updated: add_fn arguments removed, some paste0-
  based string construction converted to cli glue syntax ({.fun as.mo},
  {.arg col_mo}, {n} results, etc.)
- cli already listed in Suggests; no DESCRIPTION dependency changes needed

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Replace {.fun} with {.help} for all exported functions in messaging

All function names referenced via {.fun …} in cli-style messages are
exported in NAMESPACE, so {.help …} is the appropriate markup — it
renders as a clickable help link rather than plain function styling.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Qualify all {.help} tags with AMR:: and convert backtick ?func references

- Add AMR:: namespace prefix and trailing () to all {.help} cli markup
  so they render as clickable help links (e.g. {.help AMR::as.sir}())
- Convert `?funcname` backtick-quoted help references to {.help AMR::funcname}()
  in aa_helper_functions.R, custom_eucast_rules.R, interpretive_rules.R,
  key_antimicrobials.R, mo.R, plotting.R, resistance_predict.R, and sir.R
- Skipped `?proportion` in sir_calc.R as 'proportion' is not exported

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Require cli >= 3.0.0 for cli_inform/cli_warn/cli_abort availability checks

cli_inform, cli_warn, and cli_abort were introduced in cli 3.0.0.
Add min_version = "3.0.0" (as character) to all four pkg_is_available("cli")
checks so older cli versions fall back to base R messaging.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Implement cli::code_highlight() for R code examples in messages (issue #191)

Add highlight_code() helper that wraps cli::code_highlight() when cli >= 3.0.0
is available, falling back to plain code otherwise. Apply it to all inline
R code examples embedded in message/warning/stop strings across the package.

Also convert remaining backtick-quoted function and argument references in
messaging calls to proper cli markup: {.help AMR::fn}(), {.arg arg},
{.code expr}, and {.pkg pkg} throughout ab.R, ab_from_text.R, av_from_text.R,
amr_selectors.R, count.R, custom_antimicrobials.R, custom_microorganisms.R,
interpretive_rules.R, mo.R, mo_property.R, sir.R, sir_calc.R.

Fixes #191

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Fix {.help} markup to use correct cli link format [{.fun fn}](AMR::fn)

Replace all instances of {.help AMR::fn}() (incorrect format with manual
parentheses outside the link) with {.help [{.fun fn}](AMR::fn)} which is
the correct cli hyperlink syntax: the display text [{.fun fn}] renders the
function name with parentheses automatically, and (AMR::fn) is the link target.

Also update the plain-text fallback handler in aa_helper_functions.R to
extract the display text from the [text](topic) markdown link format,
so that non-cli environments show just the function name (e.g. `fn()`),
not the raw link markup.

Dynamic cases in amr_selectors.R and mo_property.R also updated.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Add {.topic} markup for non-function help page references

Replace {.code ?AMR-options} and backtick-style ?AMR-options / ?AMR-deprecated
references with proper {.topic AMR-options} / {.topic AMR-deprecated} cli markup
in count.R, interpretive_rules.R, proportion.R, and zz_deprecated.R.

Add {.topic} fallback handler to format_message() in aa_helper_functions.R:
plain-text environments render {.topic foo} as ?foo, and the [text](topic)
link form extracts just the display text (same pattern as {.help}).

Also convert remaining backtick function/arg references in proportion.R to
{.help [{.fun ...}](AMR::...)}, {.arg}, and {.code} markup for consistency.

Note: zzz.R intentionally keeps the backtick form since its startup message
goes through packageStartupMessage() which bypasses our cli infrastructure.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Fix {.topic} to use required pkg::topic format with display text

{.topic} in cli requires a package-qualified topic reference to generate
a valid x-r-help:pkg::topic URI. Bare {.topic AMR-options} produced a
malformed x-r-help:AMR-options URI (no package prefix).

Use the [display_text](pkg::topic) form throughout:
  {.topic [AMR-options](AMR::AMR-options)}
  {.topic [AMR-deprecated](AMR::AMR-deprecated)}

The hyphen in the topic name is fine as a URI string even though
AMR::AMR-options is not a valid R symbol expression.

The fallback handler in format_message() already handles the [text](uri)
form by extracting the display text, so plain-text output is unchanged.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Fix regexec() calls: remove perl=TRUE unsupported in older R

regexec() only gained the perl argument in R 4.1.0. The CI matrix
covers oldrel-1 through oldrel-4 (R 3.x/4.0.x), so perl=TRUE caused
an 'unused argument' error on every message_() call in those
environments.

All four affected regexec() calls use POSIX-extended compatible
patterns, so dropping perl=TRUE is safe.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Slim CI matrix for PRs to ubuntu-latest / r-release only

For pull requests, check-recent now runs a single job (ubuntu-latest,
r-release) via a setup job that emits the matrix as JSON. On push and
schedule the full matrix is unchanged (devel + release on all OSes,
oldrel-1 through oldrel-4).

Also removed the pull_request trigger from check-recent-dev-pkgs; the
dev-packages check only needs to run on push/schedule.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Restrict dev-versions and old-tinytest CI to main branch only

Both workflows were triggering on every push to every branch.
Narrowed push trigger to [main] so they only run after merging,
not on every feature/PR branch push.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Update NEWS.md to continuous log + add concise style rules to CLAUDE.md

NEWS.md is now a single continuous log under one heading per dev series,
not a new section per version bump. CLAUDE.md documents: only replace
line 1 (heading), append new entries, keep them extremely concise with
no trailing full stop.

Merged 9035 and 9036 entries into one section; condensed verbose 9036
bullets; added CI workflow change entry.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Replace single-quoted literals in messaging calls with cli markup

Converted bare 'value' strings inside stop_(), warning_(), message_()
to appropriate cli markup:
- {.val}: option values ('drug', 'dose', 'administration', 'SDD', 'logbook')
- {.cls}: class names ('sir', 'mo')
- {.field}: column names ('mo' in mo_source)
- {.code}: object/dataset names ('clinical_breakpoints')

Files changed: ab_from_text.R, av_from_text.R, sir.R, sir_calc.R, mo_source.R

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Apply {.topic}, {.cls}, and {.field} markup in sir.R messaging

- 'clinical_breakpoints' (dataset): {.code} -> {.topic [clinical_breakpoints](AMR::clinical_breakpoints)}
- "is of class" context: extract bad_col/bad_cls/exp_cls vars and use {.cls} + {.field} in glue syntax
- Column references in as.sir() messages: font_bold(col) with surrounding quotes -> {.field {col}}

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Replace glue-style dynamic markup with paste0() construction

{.field {variable}} and {.cls {variable}} patterns rely on glue
evaluation which is not safe in a zero-dependency package. Replace
all four occurrences with paste0("{.field ", var, "}") so the value
is baked into the markup string before reaching message_()/stop_().

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Limit push trigger to main in check-recent workflow

push: branches: '**' caused both the push event (9-worker matrix) and
the pull_request event (1-worker matrix) to fire simultaneously on every
PR commit. Restricting push to [main] means PR pushes only trigger the
pull_request path (1 worker), while direct pushes to main still get the
full 9-worker matrix.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Limit push trigger to main in code-coverage workflow

Same fix as check-recent: push: branches: '**' caused the workflow to
run twice per PR commit (once for push, once for pull_request). Restricting
push to [main] ensures coverage runs only once per PR update.

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Replace bare backticks with cli inline markup across all messaging calls

- {.arg} for argument names in stop_/warning_/message_ calls
- {.cls} after "of class" text in format_class() and elsewhere
- {.fun} for function names (replaces `fn()` pattern)
- {.pkg} for tidyverse package names (dplyr, ggplot2)
- {.code} for code literals (TRUE, FALSE, expressions)
- Rewrite print.ab: use cli named-vector with * bullets and code
  highlighting when cli >= 3.0.0; keep plain-text fallback otherwise
- Fix typo in as.sir(): "of must be" -> "or must be"
- switch sir.R verbose notes from message() to message_()

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* Pre-evaluate inline expressions, add format_inline_(), fix print.ab

- All bare {variable}/{expression} in message_()/warning_()/stop_() calls
  are now pre-evaluated via paste0(), so users without cli/glue never see
  raw template syntax (mo_source.R, first_isolate.R, join_microorganisms.R,
  antibiogram.R, atc_online.R)
- Add format_inline_() helper: formats a cli-markup string and returns it
  (not emits it), using cli::format_inline() when available and cli_to_plain()
  otherwise
- Rewrite .onAttach to use format_inline_() for all packageStartupMessage
  calls; also adds {.topic} link and {.code} markup for option names
- print.ab: pre-evaluate function_name via paste0 (no .envir needed),
  apply highlight_code() to each example bullet for R syntax highlighting
- join_microorganisms: pre-evaluate {type} and {nrow(...)} expressions

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* fixes

* Replace all "in \`funcname()\`:" with {.help [{.fun funcname}](AMR::funcname)}

Converts all "in `funcname()`:" prefixes in warning_()/message_()/stop_()
calls to the full {.help} link format for clickable help in supported
terminals. Also fixes adjacent backtick argument names to {.arg}.

Files changed: ab.R, ab_property.R, av.R, av_property.R, antibiogram.R,
key_antimicrobials.R, mdro.R, mic.R, mo.R, plotting.R

https://claude.ai/code/session_01XHWLohiSTdZvCutwD7ag2b

* fixes

* definitive

* version fix

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-20 17:01:34 +01:00

722 lines
26 KiB
R

# ==================================================================== #
# TITLE: #
# AMR: An R Package for Working with Antimicrobial Resistance Data #
# #
# SOURCE CODE: #
# https://github.com/msberends/AMR #
# #
# PLEASE CITE THIS SOFTWARE AS: #
# Berends MS, Luz CF, Friedrich AW, et al. (2022). #
# AMR: An R Package for Working with Antimicrobial Resistance Data. #
# Journal of Statistical Software, 104(3), 1-31. #
# https://doi.org/10.18637/jss.v104.i03 #
# #
# Developed at the University of Groningen and the University Medical #
# Center Groningen in The Netherlands, in collaboration with many #
# colleagues from around the world, see our website. #
# #
# This R package is free software; you can freely use and distribute #
# it for both personal and commercial purposes under the terms of the #
# GNU General Public License version 2.0 (GNU GPL-2), as published by #
# the Free Software Foundation. #
# We created this package for both routine data analysis and academic #
# research and it was publicly released in the hope that it will be #
# useful, but it comes WITHOUT ANY WARRANTY OR LIABILITY. #
# #
# Visit our website for the full manual and a complete tutorial about #
# how to conduct AMR data analysis: https://amr-for-r.org #
# ==================================================================== #
# these are allowed MIC values and will become factor levels
VALID_MIC_LEVELS <- c(
as.double(paste0("0.000", c(1:9))),
as.double(paste0("0.00", c(1:99, 1953125, 390625, 78125))),
as.double(paste0("0.0", c(1:99, 125, 128, 156, 165, 256, 512, 625, 3125, 15625))),
as.double(paste0("0.", c(1:99, 125, 128, 256, 512))),
1:9, 1.5,
c(10:98)[9:98 %% 2 == TRUE],
2^c(7:12), 192 * c(1:5), 80 * c(2:12)
)
VALID_MIC_LEVELS <- trimws(gsub("[.]?0+$", "", format(unique(sort(VALID_MIC_LEVELS)), scientific = FALSE), perl = TRUE))
operators <- c("<", "<=", "", ">=", ">")
VALID_MIC_LEVELS <- c(t(vapply(
FUN.VALUE = character(length(VALID_MIC_LEVELS)),
c("<", "<=", "", ">=", ">"),
paste0,
VALID_MIC_LEVELS
)))
COMMON_MIC_VALUES <- c(
0.0001, 0.0002, 0.0005,
0.001, 0.002, 0.004, 0.008,
0.016, 0.032, 0.064,
0.125, 0.25, 0.5,
1, 2, 4, 8,
16, 32, 64,
128, 256, 512,
1024, 2048, 4096
)
#' Transform Input to Minimum Inhibitory Concentrations (MIC)
#'
#' This transforms vectors to a new class [`mic`], which treats the input as decimal numbers, while maintaining operators (such as ">=") and only allowing valid MIC values known to the field of (medical) microbiology.
#' @rdname as.mic
#' @param x A [character] or [numeric] vector.
#' @param na.rm A [logical] indicating whether missing values should be removed.
#' @param keep_operators A [character] specifying how to handle operators (such as `>` and `<=`) in the input. Accepts one of three values: `"all"` (or `TRUE`) to keep all operators, `"none"` (or `FALSE`) to remove all operators, or `"edges"` to keep operators only at both ends of the range.
#' @param round_to_next_log2 A [logical] to round up all values to the next log2 level, that are not either `r vector_or(COMMON_MIC_VALUES, quotes = F)`. Values that are already in this list (with or without operators), are left unchanged (including any operators).
#' @param ... Arguments passed on to methods.
#' @details To interpret MIC values as SIR values, use [as.sir()] on MIC values. It supports guidelines from EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`) and CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "CLSI")$guideline)))`).
#'
#' This class for MIC values is a quite a special data type: formally it is an ordered [factor] with valid MIC values as [factor] levels (to make sure only valid MIC values are retained), but for any mathematical operation it acts as decimal numbers:
#'
#' ```
#' x <- random_mic(10)
#' x
#' #> Class 'mic'
#' #> [1] 16 1 8 8 64 >=128 0.0625 32 32 16
#'
#' is.factor(x)
#' #> [1] TRUE
#'
#' x[1] * 2
#' #> [1] 32
#'
#' median(x)
#' #> [1] 26
#' ```
#'
#' This makes it possible to maintain operators that often come with MIC values, such ">=" and "<=", even when filtering using [numeric] values in data analysis, e.g.:
#'
#' ```
#' x[x > 4]
#' #> Class 'mic'
#' #> [1] 16 8 8 64 >=128 32 32 16
#'
#' df <- data.frame(x, hospital = "A")
#' subset(df, x > 4) # or with dplyr: df %>% filter(x > 4)
#' #> x hospital
#' #> 1 16 A
#' #> 5 64 A
#' #> 6 >=128 A
#' #> 8 32 A
#' #> 9 32 A
#' #> 10 16 A
#' ```
#'
#' All so-called [group generic functions][groupGeneric()] are implemented for the MIC class (such as `!`, `!=`, `<`, `>=`, [exp()], [log2()]). Some mathematical functions are also implemented (such as [quantile()], [median()], [fivenum()]). Since [sd()] and [var()] are non-generic functions, these could not be extended. Use [mad()] as an alternative, or use e.g. `sd(as.numeric(x))` where `x` is your vector of MIC values.
#'
#' Using [as.double()] or [as.numeric()] on MIC values will remove the operators and return a numeric vector. Do **not** use [as.integer()] on MIC values as by the \R convention on [factor]s, it will return the index of the factor levels (which is often useless for regular users).
#'
#' The function [is.mic()] detects if the input contains class `mic`. If the input is a [data.frame] or [list], it iterates over all columns/items and returns a [logical] vector.
#'
#' Use [droplevels()] to drop unused levels. At default, it will return a plain factor. Use `droplevels(..., as.mic = TRUE)` to maintain the `mic` class.
#'
#' With [rescale_mic()], existing MIC ranges can be limited to a defined range of MIC values. This can be useful to better compare MIC distributions.
#'
#' For `ggplot2`, use one of the [`scale_*_mic()`][scale_x_mic()] functions to plot MIC values. They allows custom MIC ranges and to plot intermediate log2 levels for missing MIC values.
#' @return Ordered [factor] with additional class [`mic`], that in mathematical operations acts as a [numeric] vector. Bear in mind that the outcome of any mathematical operation on MICs will return a [numeric] value.
#' @aliases mic
#' @export
#' @seealso [as.sir()]
#' @examples
#' mic_data <- as.mic(c(">=32", "1.0", "1", "1.00", 8, "<=0.128", "8", "16", "16"))
#' mic_data
#' is.mic(mic_data)
#'
#' # this can also coerce combined MIC/SIR values:
#' as.mic("<=0.002; S")
#'
#' # mathematical processing treats MICs as, and returns, numeric values
#' fivenum(mic_data)
#' quantile(mic_data)
#' all(mic_data < 512)
#'
#' # rescale MICs using rescale_mic()
#' rescale_mic(mic_data, mic_range = c(4, 16))
#'
#' # round up to nearest log2 level, e.g. for CLSI breakpoint interpretation:
#' c(1:8)
#' as.mic(c(1:8), round_to_next_log2 = TRUE)
#'
#' # interpret MIC values
#' as.sir(
#' x = as.mic(2),
#' mo = as.mo("Streptococcus pneumoniae"),
#' ab = "AMX",
#' guideline = "EUCAST"
#' )
#' as.sir(
#' x = as.mic(c(0.01, 2, 4, 8)),
#' mo = as.mo("Streptococcus pneumoniae"),
#' ab = "AMX",
#' guideline = "EUCAST"
#' )
#'
#' # plot MIC values, see ?plot
#' plot(mic_data)
#' plot(mic_data, mo = "E. coli", ab = "cipro")
#'
#' if (require("ggplot2")) {
#' autoplot(mic_data, mo = "E. coli", ab = "cipro")
#' }
#' if (require("ggplot2")) {
#' autoplot(mic_data, mo = "E. coli", ab = "cipro", language = "nl") # Dutch
#' }
as.mic <- function(x, na.rm = FALSE, keep_operators = "all", round_to_next_log2 = FALSE) {
meet_criteria(x, allow_NA = TRUE)
meet_criteria(na.rm, allow_class = "logical", has_length = 1)
meet_criteria(keep_operators, allow_class = c("character", "logical"), is_in = c("all", "none", "edges", FALSE, TRUE), has_length = 1)
meet_criteria(round_to_next_log2, allow_class = "logical", has_length = 1)
if (isTRUE(keep_operators)) {
keep_operators <- "all"
} else if (isFALSE(keep_operators)) {
keep_operators <- "none"
}
if (is.mic(x) && (keep_operators == "all" || !any(x %like% "[>=<]", na.rm = TRUE))) {
if (isTRUE(round_to_next_log2)) {
x <- roundup_to_nearest_log2(x)
}
if (!identical(levels(x), VALID_MIC_LEVELS)) {
# might be from an older AMR version - just update MIC factor levels
x <- set_clean_class(factor(as.character(x), levels = VALID_MIC_LEVELS, ordered = TRUE),
new_class = c("mic", "ordered", "factor")
)
}
return(x)
}
x.bak <- NULL
if (is.numeric(x)) {
x.bak <- format(x, scientific = FALSE)
# MICs never have more than 9 decimals, so:
x <- format(round(x, 9), scientific = FALSE)
} else {
x <- as.character(unlist(x))
}
if (isTRUE(na.rm)) {
x <- x[!is.na(x)]
}
x <- trimws2(x)
x[x == ""] <- NA
if (is.null(x.bak)) {
x.bak <- x
}
# remove NAs on beforehand to not count them
x.bak <- gsub("(NA)+", "", x.bak)
# and trim
x.bak <- trimws2(x.bak)
# comma to period
x <- gsub(",", ".", x, fixed = TRUE)
# transform Unicode for >= and <=
x <- gsub("\u2264", "<=", x, fixed = TRUE)
x <- gsub("\u2265", ">=", x, fixed = TRUE)
if (any(x %like% "[0-9]/.*[0-9]", na.rm = TRUE)) {
warning_("Some MICs were combined values, only the first values are kept")
x[x %like% "[0-9]/.*[0-9]"] <- gsub("/.*", "", x[x %like% "[0-9]/.*[0-9]"])
}
x <- trimws2(gsub("[^e\\P{L}]", "", x, perl = TRUE)) # \p{L} is the Unicode category for all letters, including those with diacritics
# remove other invalid characters
x <- gsub("[^0-9e.><= -]+", "", x, perl = TRUE)
# transform => to >= and =< to <=
x <- gsub("=<", "<=", x, fixed = TRUE)
x <- gsub("=>", ">=", x, fixed = TRUE)
# Remove leading == and =
x <- gsub("^=+", "", x)
# retrieve operators and remove them from input
x_operators <- trimws(gsub("[^>=<]", "", x))
x <- trimws(gsub("[>=<]", "", x))
# dots without a leading zero must start with 0
x <- gsub("([^0-9]|^)[.]", "\\10.", x, perl = TRUE)
# values like "<=0.2560.512" should be 0.512
x <- gsub(".*[.].*[.]", "0.", x, perl = TRUE)
# remove ending .0
x <- gsub("[.]+0$", "", x, perl = TRUE)
# remove all after last digit
x <- gsub("[^0-9]+$", "", x, perl = TRUE)
# keep only one zero before dot
x <- gsub("^0+[.]", "0.", x, perl = TRUE)
# starting 00 is probably 0.0 if there's no dot yet
x[x %unlike% "[.]"] <- gsub("^00", "0.0", x[!x %like% "[.]"])
# remove last zeroes
x <- gsub("([.].?)0+$", "\\1", x, perl = TRUE)
x <- gsub("(.*[.])0+$", "\\10", x, perl = TRUE)
# remove ending .0 again
x[x %like% "[.]"] <- gsub("0+$", "", x[x %like% "[.]"])
# never end with dot
x <- gsub("[.]$", "", x, perl = TRUE)
# remove scientific notation
x[x %like% "[0-9]e[-]?[0-9]"] <- trimws(format(suppressWarnings(as.double(x[x %like% "[0-9]e[-]?[0-9]"])), scientific = FALSE))
# add operators again
x <- paste0(x_operators, x)
# remove NAs introduced by format()
x <- gsub("(NA)+", "", x)
# trim it
x <- trimws2(x)
## previously unempty values now empty - should return a warning later on
x[x.bak != "" & x == ""] <- "invalid"
na_before <- x[is.na(x) | x == ""] %pm>% length()
x[!as.character(x) %in% VALID_MIC_LEVELS] <- NA
na_after <- x[is.na(x) | x == ""] %pm>% length()
if (na_before != na_after) {
list_missing <- x.bak[is.na(x) & !is.na(x.bak) & x.bak != ""] %pm>%
unique() %pm>%
sort() %pm>%
vector_and(quotes = TRUE)
cur_col <- get_current_column()
warning_("in {.fun as.mic}: ", na_after - na_before, " result",
ifelse(na_after - na_before > 1, "s", ""),
ifelse(is.null(cur_col), "", paste0(" in column '", cur_col, "'")),
" truncated (",
round(((na_after - na_before) / length(x)) * 100),
"%) that were invalid MICs: ",
list_missing,
call = FALSE
)
}
if (keep_operators == "none" && !all(is.na(x))) {
x <- gsub("[>=<]", "", x)
} else if (keep_operators == "edges" && !all(is.na(x))) {
dbls <- as.double(gsub("[>=<]", "", x))
x[dbls == min(dbls, na.rm = TRUE)] <- paste0("<=", min(dbls, na.rm = TRUE))
x[dbls == max(dbls, na.rm = TRUE)] <- paste0(">=", max(dbls, na.rm = TRUE))
keep <- x[dbls == max(dbls, na.rm = TRUE) | dbls == min(dbls, na.rm = TRUE)]
x[!x %in% keep] <- gsub("[>=<]", "", x[!x %in% keep])
}
if (isTRUE(round_to_next_log2)) {
x <- roundup_to_nearest_log2(x)
}
set_clean_class(factor(x, levels = VALID_MIC_LEVELS, ordered = TRUE),
new_class = c("mic", "ordered", "factor")
)
}
#' @rdname as.mic
#' @export
is.mic <- function(x) {
if (identical(typeof(x), "list")) {
unname(vapply(FUN.VALUE = logical(1), x, is.mic))
} else {
isTRUE(inherits(x, "mic"))
}
}
#' @rdname as.mic
#' @details `NA_mic_` is a missing value of the new `mic` class, analogous to e.g. base \R's [`NA_character_`][base::NA].
#' @format NULL
#' @export
NA_mic_ <- set_clean_class(factor(NA, levels = VALID_MIC_LEVELS, ordered = TRUE),
new_class = c("mic", "ordered", "factor")
)
#' @rdname as.mic
#' @param mic_range A manual range to rescale the MIC values, e.g., `mic_range = c(0.001, 32)`. Use `NA` to prevent rescaling on one side, e.g., `mic_range = c(NA, 32)`.
#' @export
rescale_mic <- function(x, mic_range, keep_operators = "edges", as.mic = TRUE, round_to_next_log2 = FALSE) {
meet_criteria(mic_range, allow_class = c("numeric", "integer", "logical", "mic"), has_length = 2, allow_NA = TRUE, allow_NULL = TRUE)
if (is.numeric(mic_range)) {
mic_range <- trimws(format(mic_range, scientific = FALSE))
mic_range <- gsub("[.]0+$", "", mic_range)
mic_range[mic_range == "NA"] <- NA_character_
} else if (is.mic(mic_range)) {
mic_range <- as.character(mic_range)
}
stop_ifnot(
all(mic_range %in% c(VALID_MIC_LEVELS, NA)),
"Values in {.arg mic_range} must be valid MIC values. ",
"The allowed range is ", format(as.double(as.mic(VALID_MIC_LEVELS)[1]), scientific = FALSE), " to ", format(as.double(as.mic(VALID_MIC_LEVELS)[length(VALID_MIC_LEVELS)]), scientific = FALSE), ". ",
"Unvalid: ", vector_and(mic_range[!mic_range %in% c(VALID_MIC_LEVELS, NA)], quotes = FALSE), "."
)
x <- as.mic(x)
if (is.null(mic_range)) {
mic_range <- c(NA, NA)
}
mic_range <- as.mic(mic_range)
min_mic <- mic_range[1]
max_mic <- mic_range[2]
if (!is.na(min_mic)) {
x[x < min_mic] <- min_mic
}
if (!is.na(max_mic)) {
x[x > max_mic] <- max_mic
}
x <- as.mic(x, keep_operators = ifelse(keep_operators == "edges", "none", keep_operators), round_to_next_log2 = round_to_next_log2)
if (isTRUE(as.mic)) {
if (keep_operators == "edges" && length(unique(x)) > 1) {
x[x == min(x, na.rm = TRUE)] <- paste0("<=", x[x == min(x, na.rm = TRUE)])
x[x == max(x, na.rm = TRUE)] <- paste0(">=", x[x == max(x, na.rm = TRUE)])
}
return(x)
}
# create a manual factor with levels only within desired range
expanded <- plotrange_as_table(x,
expand = TRUE,
keep_operators = ifelse(keep_operators == "edges", "none", keep_operators),
mic_range = mic_range
)
if (keep_operators == "edges") {
names(expanded)[1] <- paste0("<=", names(expanded)[1])
names(expanded)[length(expanded)] <- paste0(">=", names(expanded)[length(expanded)])
}
# MICs contain all MIC levels, so strip this to only existing levels and their intermediate values
out <- factor(names(expanded),
levels = names(expanded),
ordered = TRUE
)
# and only keep the ones in the data
if (keep_operators == "edges") {
out <- out[match(x, as.double(as.mic(out, keep_operators = "all")))]
} else {
out <- out[match(x, out)]
}
out
}
#' @rdname as.mic
#' @details Use [mic_p50()] and [mic_p90()] to get the 50th and 90th percentile of MIC values. They return 'normal' [numeric] values.
#' @export
mic_p50 <- function(x, na.rm = FALSE, ...) {
x <- as.mic(x)
as.double(stats::quantile(x, probs = 0.5, na.rm = na.rm))
}
#' @rdname as.mic
#' @export
mic_p90 <- function(x, na.rm = FALSE, ...) {
x <- as.mic(x)
as.double(stats::quantile(x, probs = 0.9, na.rm = na.rm))
}
#' @method as.double mic
#' @export
#' @noRd
as.double.mic <- function(x, ...) {
as.double(gsub("[<=>]+", "", as.character(x), perl = TRUE))
}
#' @method as.numeric mic
#' @export
#' @noRd
as.numeric.mic <- function(x, ...) {
as.numeric(gsub("[<=>]+", "", as.character(x), perl = TRUE))
}
#' @rdname as.mic
#' @method droplevels mic
#' @param as.mic A [logical] to indicate whether the `mic` class should be kept - the default is `TRUE` for [rescale_mic()] and `FALSE` for [droplevels()]. When setting this to `FALSE` in [rescale_mic()], the output will have factor levels that acknowledge `mic_range`.
#' @export
droplevels.mic <- function(x, as.mic = FALSE, ...) {
x <- as.mic(x) # make sure that currently implemented MIC levels are used
x <- droplevels.factor(x, ...)
if (as.mic == TRUE) {
class(x) <- c("mic", "ordered", "factor")
}
x
}
all_valid_mics <- function(x) {
if (!inherits(x, c("mic", "character", "factor", "numeric", "integer"))) {
return(FALSE)
}
x_mic <- tryCatch(suppressWarnings(as.mic(x[!is.na(x)])),
error = function(e) NA
)
!any(is.na(x_mic)) && !all(is.na(x))
}
# this prevents the requirement for putting the dependency in Imports:
#' @rawNamespace if(getRversion() >= "3.0.0") S3method(pillar::pillar_shaft, mic)
pillar_shaft.mic <- function(x, ...) {
if (!identical(levels(x), VALID_MIC_LEVELS) && message_not_thrown_before("pillar_shaft.mic")) {
warning_(AMR_env$sup_1_icon, " These columns contain an outdated or altered structure - convert with {.fun as.mic} to update",
call = FALSE
)
}
crude_numbers <- as.double(x)
operators <- gsub("[^<=>]+", "", as.character(x))
# colourise operators
operators[!is.na(operators) & operators != ""] <- font_silver(operators[!is.na(operators) & operators != ""], collapse = NULL)
out <- trimws(paste0(operators, trimws(format(crude_numbers))))
out[is.na(x)] <- font_na(NA)
# make trailing zeroes less visible
if (is_dark()) {
fn <- font_silver
} else {
fn <- font_white
}
out[out %like% "[.]"] <- gsub("([.]?0+)$", fn("\\1"), out[out %like% "[.]"], perl = TRUE)
create_pillar_column(out, align = "right", width = max(nchar(font_stripstyle(out))))
}
# this prevents the requirement for putting the dependency in Imports:
#' @rawNamespace if(getRversion() >= "3.0.0") S3method(pillar::type_sum, mic)
type_sum.mic <- function(x, ...) {
if (!identical(levels(x), VALID_MIC_LEVELS)) {
paste0("mic", AMR_env$sup_1_icon)
} else {
"mic"
}
}
#' @method print mic
#' @export
#' @noRd
print.mic <- function(x, ...) {
cat("Class 'mic'")
if (!identical(levels(x), VALID_MIC_LEVELS)) {
cat(font_red(" with an outdated or altered structure - convert with `as.mic()` to update"))
}
cat("\n")
print(as.character(x), quote = FALSE)
att <- attributes(x)
if ("na.action" %in% names(att)) {
cat(font_silver(paste0("(NA ", class(att$na.action), ": ", paste0(att$na.action, collapse = ", "), ")\n")))
}
}
#' @method summary mic
#' @export
#' @noRd
summary.mic <- function(object, ...) {
summary(as.double(object), ...)
}
#' @method as.matrix mic
#' @export
#' @noRd
as.matrix.mic <- function(x, ...) {
as.matrix(as.double(x), ...)
}
#' @method as.vector mic
#' @export
#' @noRd
as.vector.mic <- function(x, mode = "numneric", ...) {
y <- NextMethod()
y <- as.mic(y)
calls <- unlist(lapply(sys.calls(), as.character))
if (any(calls %in% c("rbind", "cbind")) && message_not_thrown_before("as.vector.mic")) {
warning_("Functions {.fun rbind} and {.fun cbind} cannot preserve the structure of MIC values. Use {.pkg dplyr}'s {.fun bind_rows} or {.fun bind_cols} instead.", call = FALSE)
}
y
}
#' @method as.list mic
#' @export
#' @noRd
as.list.mic <- function(x, ...) {
lapply(as.list(as.character(x), ...), as.mic)
}
#' @method as.data.frame mic
#' @export
#' @noRd
as.data.frame.mic <- function(x, ...) {
as.data.frame.vector(as.mic(x), ...)
}
#' @method [ mic
#' @export
#' @noRd
"[.mic" <- function(x, ...) {
y <- NextMethod()
as.mic(y)
}
#' @method [[ mic
#' @export
#' @noRd
"[[.mic" <- function(x, ...) {
y <- NextMethod()
as.mic(y)
}
#' @method [<- mic
#' @export
#' @noRd
"[<-.mic" <- function(i, j, ..., value) {
value <- as.mic(value)
y <- NextMethod()
as.mic(y)
}
#' @method [[<- mic
#' @export
#' @noRd
"[[<-.mic" <- function(i, j, ..., value) {
value <- as.mic(value)
y <- NextMethod()
as.mic(y)
}
#' @method c mic
#' @export
#' @noRd
c.mic <- function(...) {
as.mic(unlist(lapply(list(...), as.character)))
}
#' @method unique mic
#' @export
#' @noRd
unique.mic <- function(x, incomparables = FALSE, ...) {
y <- NextMethod()
as.mic(y)
}
#' @method rep mic
#' @export
#' @noRd
rep.mic <- function(x, ...) {
y <- NextMethod()
as.mic(y)
}
#' @method sort mic
#' @export
#' @noRd
sort.mic <- function(x, decreasing = FALSE, ...) {
x <- as.mic(x) # make sure that currently implemented MIC levels are used
dbl <- as.double(x)
# make sure that e.g. '<0.001' comes before '0.001', and '>0.001' comes after
dbl[as.character(x) %like% "<[0-9]"] <- dbl[as.character(x) %like% "<[0-9]"] - 0.000002
dbl[as.character(x) %like% "<="] <- dbl[as.character(x) %like% "<="] - 0.000001
dbl[as.character(x) %like% ">="] <- dbl[as.character(x) %like% ">="] + 0.000001
dbl[as.character(x) %like% ">[0-9]"] <- dbl[as.character(x) %like% ">[0-9]"] + 0.000002
if (decreasing == TRUE) {
x[order(-dbl)]
} else {
x[order(dbl)]
}
}
#' @method hist mic
#' @importFrom graphics hist
#' @export
#' @noRd
hist.mic <- function(x, ...) {
warning_("in {.fun hist}: use {.fun plot} or {.pkg ggplot2}'s {.fun autoplot} for optimal plotting of MIC values")
hist(log2(x))
}
# this prevents the requirement for putting the dependency in Imports:
#' @rawNamespace if(getRversion() >= "3.0.0") S3method(skimr::get_skimmers, mic)
get_skimmers.mic <- function(column) {
column <- as.mic(column) # make sure that currently implemented MIC levels are used
skimr::sfl(
skim_type = "mic",
p0 = ~ stats::quantile(column, probs = 0, na.rm = TRUE, names = FALSE),
p25 = ~ stats::quantile(column, probs = 0.25, na.rm = TRUE, names = FALSE),
p50 = ~ stats::quantile(column, probs = 0.5, na.rm = TRUE, names = FALSE),
p75 = ~ stats::quantile(column, probs = 0.75, na.rm = TRUE, names = FALSE),
p100 = ~ stats::quantile(column, probs = 1, na.rm = TRUE, names = FALSE),
hist = ~ skimr::inline_hist(log2(stats::na.omit(column)), 10)
)
}
roundup_to_nearest_log2 <- function(x) {
x_dbl <- suppressWarnings(as.double(gsub("[>=<]", "", x)))
x_new <- vapply(
FUN.VALUE = double(1),
x_dbl,
function(val) {
if (is.na(val)) {
NA_real_
} else {
COMMON_MIC_VALUES[which(COMMON_MIC_VALUES >= val)][1]
}
}
)
x[!x_dbl %in% COMMON_MIC_VALUES] <- x_new[!x_dbl %in% COMMON_MIC_VALUES]
x
}
# Miscellaneous mathematical functions ------------------------------------
#' @method mean mic
#' @export
#' @noRd
mean.mic <- function(x, trim = 0, na.rm = FALSE, ...) {
mean(as.double(x), trim = trim, na.rm = na.rm, ...)
}
#' @method median mic
#' @importFrom stats median
#' @export
#' @noRd
median.mic <- function(x, na.rm = FALSE, ...) {
median(as.double(x), na.rm = na.rm, ...)
}
#' @method quantile mic
#' @importFrom stats quantile
#' @export
#' @noRd
quantile.mic <- function(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
names = TRUE, type = 7, ...) {
quantile(as.double(x), probs = probs, na.rm = na.rm, names = names, type = type, ...)
}
# Math (see ?groupGeneric) ------------------------------------------------
#' @export
Math.mic <- function(x, ...) {
x <- as.double(x)
# set class to numeric, because otherwise NextMethod will be factor (since mic is a factor)
.Class <- class(x)
NextMethod(.Generic)
}
# Ops (see ?groupGeneric) -------------------------------------------------
#' @export
Ops.mic <- function(e1, e2) {
e1_chr <- as.character(e1)
e2_chr <- character(0)
e1 <- as.double(e1)
if (!missing(e2)) {
# when .Generic is `!`, e2 is missing
e2_chr <- as.character(e2)
e2 <- as.double(e2)
}
if (as.character(.Generic) %in% c("<", "<=", "==", "!=", ">", ">=")) {
# make sure that <0.002 is lower than 0.002
# and that >32 is higher than 32, but equal to >=32
e1[e1_chr %like% "<" & e1_chr %unlike% "="] <- e1[e1_chr %like% "<" & e1_chr %unlike% "="] - 0.000001
e1[e1_chr %like% ">" & e1_chr %unlike% "="] <- e1[e1_chr %like% ">" & e1_chr %unlike% "="] + 0.000001
e2[e2_chr %like% "<" & e2_chr %unlike% "="] <- e2[e2_chr %like% "<" & e2_chr %unlike% "="] - 0.000001
e2[e2_chr %like% ">" & e2_chr %unlike% "="] <- e2[e2_chr %like% ">" & e2_chr %unlike% "="] + 0.000001
}
# set .Class to numeric, because otherwise NextMethod will be factor (since mic is a factor)
.Class <- class(e1)
NextMethod(.Generic)
}
# Complex (see ?groupGeneric) ---------------------------------------------
#' @export
Complex.mic <- function(z) {
z <- as.double(z)
# set class to numeric, because otherwise NextMethod will be factor (since mic is a factor)
.Class <- class(z)
NextMethod(.Generic)
}
# Summary (see ?groupGeneric) ---------------------------------------------
#' @export
Summary.mic <- function(..., na.rm = FALSE) {
# NextMethod() cannot be called from an anonymous function (`...`), so we get() the generic directly:
fn <- get(.Generic, envir = .GenericCallEnv)
fn(as.double(c(...)),
na.rm = na.rm
)
}