Add add_if_missing parameter to control NA handling in interpretive rules (#264)
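
The interaction between `overwrite` and `add_if_missing` can be illustrated with a small standalone sketch. It is not the package code: `choose_mask()` and `apply_rule()` are hypothetical helpers written for this description, and they work on a plain character vector rather than on the data frame subset that `edit_sir()` handles. The list of SIR levels is copied from `VALID_SIR_LEVELS` in `R/sir.R`.

```r
# Illustrative sketch only: choose_mask() and apply_rule() are hypothetical
# helpers that mimic the cell selection added to edit_sir() in this commit.
sir_levels <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")

choose_mask <- function(values, overwrite = FALSE, add_if_missing = TRUE) {
  # same guard as in interpretive_rules(): at least one of the two must be TRUE
  if (!overwrite && !add_if_missing) {
    stop("Either set overwrite or add_if_missing to TRUE, or both.")
  }
  is_sir <- !is.na(values) & values %in% sir_levels
  if (overwrite) {
    if (add_if_missing) {
      rep(TRUE, length(values)) # touch every cell
    } else {
      is_sir # only cells that already hold an SIR value
    }
  } else {
    !is_sir # only missing or placeholder cells
  }
}

apply_rule <- function(values, to, ...) {
  values[choose_mask(values, ...)] <- to
  values
}

x <- c("S", NA, "R", "")
apply_rule(x, "R") # "S" "R" "R" "R"
apply_rule(x, "R", overwrite = TRUE) # "R" "R" "R" "R"
apply_rule(x, "R", overwrite = TRUE, add_if_missing = FALSE) # "R" NA "R" ""
```

With `overwrite = TRUE, add_if_missing = FALSE`, the untested (`NA`) and empty cells stay as they are, which is the case this parameter was added for.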

2026-04-28 10:23:53 +02:00 · 2026-04-21 21:53:43 +02:00
parent fb8758f36b
commit 8ff5d4472a
46 changed files with 1232 additions and 1016 deletions
--- a/R/aa_helper_functions.R
+++ b/R/aa_helper_functions.R
@@ -766,7 +766,7 @@ vector_or <- function(v, quotes = TRUE, reverse = FALSE, sort = TRUE, initial_ca
  }
  if (isTRUE(quotes)) {
    if (isTRUE(documentation)) {
-      quotes <- '"'
+      quotes <- c("`\"", "\"`")
    } else {
      # use cli to format as values
      quotes <- c("{.val ", "}")
--- a/R/aa_options.R
+++ b/R/aa_options.R
@@ -35,7 +35,7 @@
 #' `options(AMR_guideline = "CLSI")`
 #' @section Options (alphabetical order):
 #' * `AMR_antibiogram_formatting_type` \cr A [numeric] (1-22) to use in [antibiogram()], to indicate which formatting type to use.
-#' * `AMR_breakpoint_type` \cr A [character] to use in [as.sir()], to indicate which breakpoint type to use. This must be either `r vector_or(clinical_breakpoints$type)`.
+#' * `AMR_breakpoint_type` \cr A [character] to use in [as.sir()], to indicate which breakpoint type to use. This must be either `r vector_or(clinical_breakpoints$type, documentation = TRUE)`.
 #' * `AMR_capped_mic_handling` \cr A [character] to use in [as.sir()], to indicate how capped MIC values (`<`, `<=`, `>`, `>=`) should be interpreted. Must be one of `"none"`, `"conservative"`, `"standard"`, or `"lenient"` - the default is `"conservative"`.
 #' * `AMR_cleaning_regex` \cr A [regular expression][base::regex] (case-insensitive) to use in [as.mo()] and all [`mo_*`][mo_property()] functions, to clean the user input. The default is the outcome of [mo_cleaning_regex()], which removes texts between brackets and texts such as "species" and "serovar".
 #' * `AMR_custom_ab` \cr A file location to an RDS file, to use custom antimicrobial drugs with this package. This is explained in [add_custom_antimicrobials()].
--- a/R/ab_property.R
+++ b/R/ab_property.R
@@ -32,7 +32,7 @@
 #' Use these functions to return a specific property of an antibiotic from the [antimicrobials] data set. All input values will be evaluated internally with [as.ab()].
 #' @param x Any (vector of) text that can be coerced to a valid antibiotic drug code with [as.ab()].
 #' @param tolower A [logical] to indicate whether the first [character] of every output should be transformed to a lower case [character]. This will lead to e.g. "polymyxin B" and not "polymyxin b".
-#' @param property One of the column names of one of the [antimicrobials] data set: `vector_or(colnames(antimicrobials), sort = FALSE)`.
+#' @param property One of the column names of the [antimicrobials] data set: `r vector_or(colnames(antimicrobials), documentation = TRUE, sort = FALSE)`.
 #' @param language Language of the returned text - the default is the current system language (see [get_AMR_locale()]) and can also be set with the package option [`AMR_locale`][AMR-options]. Use `language = NULL` or `language = ""` to prevent translation.
 #' @param administration Way of administration, either `"oral"` or `"iv"`.
 #' @param open Browse the URL using [utils::browseURL()].
--- a/R/antibiogram.R
+++ b/R/antibiogram.R
@@ -48,8 +48,8 @@
 #'     - `carbapenems() + "GEN"`
 #'     - `carbapenems() + c("", "GEN")`
 #'     - `carbapenems() + c("", aminoglycosides())`
-#' @param mo_transform A character to transform microorganism input - must be `"name"`, `"shortname"` (default), `"gramstain"`, or one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, quotes = TRUE)`. Can also be `NULL` to not transform the input or `NA` to consider all microorganisms 'unknown'.
-#' @param ab_transform A character to transform antimicrobial input - must be one of the column names of the [antimicrobials] data set (defaults to `"name"`): `r vector_or(colnames(antimicrobials), sort = FALSE, quotes = TRUE)`. Can also be `NULL` to not transform the input.
+#' @param mo_transform A character to transform microorganism input - must be `"name"`, `"shortname"` (default), `"gramstain"`, or one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. Can also be `NULL` to not transform the input or `NA` to consider all microorganisms 'unknown'.
+#' @param ab_transform A character to transform antimicrobial input - must be one of the column names of the [antimicrobials] data set (defaults to `"name"`): `r vector_or(colnames(antimicrobials), sort = FALSE, documentation = TRUE)`. Can also be `NULL` to not transform the input.
 #' @param syndromic_group A column name of `x`, or values calculated to split rows of `x`, e.g. by using [ifelse()] or [`case_when()`][dplyr::case_when()]. See *Examples*.
 #' @param add_total_n *(deprecated in favour of `formatting_type`)* A [logical] to indicate whether `n_tested` available numbers per pathogen should be added to the table (default is `TRUE`). This will add the lowest and highest number of available isolates per antimicrobial (e.g, if for *E. coli* 200 isolates are available for ciprofloxacin and 150 for amoxicillin, the returned number will be "150-200"). This option is unavailable when `wisca = TRUE`; in that case, use [retrieve_wisca_parameters()] to get the parameters used for WISCA.
 #' @param only_all_tested (for combination antibiograms): a [logical] to indicate that isolates must be tested for all antimicrobials, see *Details*.
--- a/R/av_property.R
+++ b/R/av_property.R
@@ -32,7 +32,7 @@
 #' Use these functions to return a specific property of an antiviral drug from the [antivirals] data set. All input values will be evaluated internally with [as.av()].
 #' @param x Any (vector of) text that can be coerced to a valid antiviral drug code with [as.av()].
 #' @param tolower A [logical] to indicate whether the first [character] of every output should be transformed to a lower case [character].
-#' @param property One of the column names of one of the [antivirals] data set: `vector_or(colnames(antivirals), sort = FALSE)`.
+#' @param property One of the column names of the [antivirals] data set: `r vector_or(colnames(antivirals), documentation = TRUE, sort = FALSE)`.
 #' @param language Language of the returned text - the default is system language (see [get_AMR_locale()]) and can also be set with the package option [`AMR_locale`][AMR-options]. Use `language = NULL` or `language = ""` to prevent translation.
 #' @param administration Way of administration, either `"oral"` or `"iv"`.
 #' @param open Browse the URL using [utils::browseURL()].
--- a/R/data.R
+++ b/R/data.R
@@ -106,12 +106,12 @@
 #' @format A [tibble][tibble::tibble] with `r format(nrow(microorganisms), big.mark = " ")` observations and `r ncol(microorganisms)` variables:
 #' - `mo`\cr ID of microorganism as used by this package. ***This is a unique identifier.***
 #' - `fullname`\cr Full name, like `"Escherichia coli"`. For the taxonomic ranks genus, species and subspecies, this is the 'pasted' text of genus, species, and subspecies. For all taxonomic ranks higher than genus, this is the name of the taxon. ***This is a unique identifier.***
-#' - `status` \cr Status of the taxon, either `r vector_or(microorganisms$status)`
+#' - `status` \cr Status of the taxon, either `r vector_or(microorganisms$status, documentation = TRUE)`
 #' - `kingdom`, `phylum`, `class`, `order`, `family`, `genus`, `species`, `subspecies`\cr Taxonomic rank of the microorganism. Note that for fungi, *phylum* is equal to their taxonomic *division*. Also, for fungi, *subkingdom* and *subdivision* were left out since they do not occur in the bacterial taxonomy.
 #' - `rank`\cr Text of the taxonomic rank of the microorganism, such as `"species"` or `"genus"`
 #' - `ref`\cr Author(s) and year of related scientific publication. This contains only the *first surname* and year of the *latest* authors, e.g. "Wallis *et al.* 2006 *emend.* Smith and Jones 2018" becomes "Smith *et al.*, 2018". This field is directly retrieved from the source specified in the column `source`. Moreover, accents were removed to comply with CRAN that only allows ASCII characters.
-#' - `oxygen_tolerance` \cr Oxygen tolerance, either `r vector_or(microorganisms$oxygen_tolerance)`. These data were retrieved from BacDive (see *Source*). Items that contain "likely" are missing from BacDive and were extrapolated from other species within the same genus to guess the oxygen tolerance. Currently `r round(length(microorganisms$oxygen_tolerance[which(!is.na(microorganisms$oxygen_tolerance))]) / nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]) * 100, 1)`% of all `r format_included_data_number(nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]))` bacteria in the data set contain an oxygen tolerance.
-#' - `source`\cr Either `r vector_or(microorganisms$source)` (see *Source*)
+#' - `oxygen_tolerance` \cr Oxygen tolerance, either `r vector_or(microorganisms$oxygen_tolerance, documentation = TRUE)`. These data were retrieved from BacDive (see *Source*). Items that contain "likely" are missing from BacDive and were extrapolated from other species within the same genus to guess the oxygen tolerance. Currently `r round(length(microorganisms$oxygen_tolerance[which(!is.na(microorganisms$oxygen_tolerance))]) / nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]) * 100, 1)`% of all `r format_included_data_number(nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]))` bacteria in the data set contain an oxygen tolerance.
+#' - `source`\cr Either `r vector_or(microorganisms$source, documentation = TRUE)` (see *Source*)
 #' - `lpsn`\cr Identifier ('Record number') of `r TAXONOMY_VERSION$LPSN$name`. This will be the first/highest LPSN identifier to keep one identifier per row. For example, *Acetobacter ascendens* has LPSN Record number 7864 and 11011. Only the first is available in the `microorganisms` data set. ***This is a unique identifier***, though available for only `r format_included_data_number(sum(!is.na(microorganisms$lpsn)))` records.
 #' - `lpsn_parent`\cr LPSN identifier of the parent taxon
 #' - `lpsn_renamed_to`\cr LPSN identifier of the currently valid taxon
@@ -222,8 +222,8 @@
 #' - `date`\cr Date of receipt at the laboratory
 #' - `patient`\cr ID of the patient
 #' - `age`\cr Age of the patient
-#' - `gender`\cr Gender of the patient, either `r vector_or(example_isolates$gender)`
-#' - `ward`\cr Ward type where the patient was admitted, either `r vector_or(example_isolates$ward)`
+#' - `gender`\cr Gender of the patient, either `r vector_or(example_isolates$gender, documentation = TRUE)`
+#' - `ward`\cr Ward type where the patient was admitted, either `r vector_or(example_isolates$ward, documentation = TRUE)`
 #' - `mo`\cr ID of microorganism created with [as.mo()], see also the [microorganisms] data set
 #' - `PEN:RIF`\cr `r sum(vapply(FUN.VALUE = logical(1), example_isolates, is.sir))` different antimicrobials with class [`sir`] (see [as.sir()]); these column names occur in the [antimicrobials] data set and can be translated with [set_ab_names()] or [ab_name()]
 #' @inheritSection AMR Download Our Reference Data
@@ -292,9 +292,9 @@
 #' Use [as.sir()] to transform MICs or disks measurements to SIR values.
 #' @format A [tibble][tibble::tibble] with `r format(nrow(clinical_breakpoints), big.mark = " ")` observations and `r ncol(clinical_breakpoints)` variables:
 #' - `guideline`\cr Name of the guideline
-#' - `type`\cr Breakpoint type, either `r vector_or(clinical_breakpoints$type)`
-#' - `host`\cr Host of infectious agent. This is mostly useful for veterinary breakpoints and is either `r vector_or(clinical_breakpoints$host)`
-#' - `method`\cr Testing method, either `r vector_or(clinical_breakpoints$method)`
+#' - `type`\cr Breakpoint type, either `r vector_or(clinical_breakpoints$type, documentation = TRUE)`
+#' - `host`\cr Host of infectious agent. This is mostly useful for veterinary breakpoints and is either `r vector_or(clinical_breakpoints$host, documentation = TRUE)`
+#' - `method`\cr Testing method, either `r vector_or(clinical_breakpoints$method, documentation = TRUE)`
 #' - `site`\cr Body site for which the breakpoint must be applied, e.g. "Oral" or "Respiratory"
 #' - `mo`\cr Microbial ID, see [as.mo()]
 #' - `rank_index`\cr Taxonomic rank index of `mo` from 1 (subspecies/infraspecies) to 5 (unknown microorganism)
@@ -307,7 +307,7 @@
 #' - `is_SDD`\cr A [logical] value (`TRUE`/`FALSE`) to indicate whether the intermediate range between "S" and "R" should be interpreted as "SDD", instead of "I". This currently applies to `r sum(clinical_breakpoints$is_SDD)` breakpoints.
 #' @details
 #' ### Different Types of Breakpoints
-#' Supported types of breakpoints are `r vector_and(clinical_breakpoints$type, quote = FALSE)`. ECOFF (Epidemiological cut-off) values are used in antimicrobial susceptibility testing to differentiate between wild-type and non-wild-type strains of bacteria or fungi.
+#' Supported types of breakpoints are `r vector_and(clinical_breakpoints$type, quotes = FALSE)`. ECOFF (Epidemiological cut-off) values are used in antimicrobial susceptibility testing to differentiate between wild-type and non-wild-type strains of bacteria or fungi.
 #'
 #' The default is `"human"`, which can also be set with the package option [`AMR_breakpoint_type`][AMR-options]. Use [`as.sir(..., breakpoint_type = ...)`][as.sir()] to interpret raw data using a specific breakpoint type, e.g. `as.sir(..., breakpoint_type = "ECOFF")` to use ECOFFs.
 #'
@@ -350,10 +350,10 @@
 #' @format A [tibble][tibble::tibble] with `r format(nrow(dosage), big.mark = " ")` observations and `r ncol(dosage)` variables:
 #' - `ab`\cr Antimicrobial ID as used in this package (such as `AMC`), using the official EARS-Net (European Antimicrobial Resistance Surveillance Network) codes where available
 #' - `name`\cr Official name of the antimicrobial drug as used by WHONET/EARS-Net or the WHO
-#' - `type`\cr Type of the dosage, either `r vector_or(dosage$type)`
+#' - `type`\cr Type of the dosage, either `r vector_or(dosage$type, documentation = TRUE)`
 #' - `dose`\cr Dose, such as "2 g" or "25 mg/kg"
 #' - `dose_times`\cr Number of times a dose must be administered
-#' - `administration`\cr Route of administration, either `r vector_or(dosage$administration)`
+#' - `administration`\cr Route of administration, either `r vector_or(dosage$administration, documentation = TRUE)`
 #' - `notes`\cr Additional dosage notes
 #' - `original_txt`\cr Original text in the PDF file of EUCAST
 #' - `eucast_version`\cr Version number of the EUCAST Clinical Breakpoints guideline to which these dosages apply, either `r vector_or(dosage$eucast_version, quotes = FALSE, sort = TRUE, reverse = TRUE)`
--- a/R/interpretive_rules.R
+++ b/R/interpretive_rules.R
@@ -64,16 +64,17 @@ format_eucast_version_nr <- function(version, markdown = TRUE) {
 #' @param guideline A guideline name, either "EUCAST" (default) or "CLSI". This can be set with the package option [`AMR_guideline`][AMR-options].
 #' @param rules A [character] vector that specifies which rules should be applied. Must be one or more of `"breakpoints"`, `"expected_phenotypes"`, `"expert"`, `"other"`, `"custom"`, `"all"`, and defaults to `c("breakpoints", "expected_phenotypes")`. The default value can be set to another value using the package option [`AMR_interpretive_rules`][AMR-options]: `options(AMR_interpretive_rules = "all")`. If using `"custom"`, be sure to fill in argument `custom_rules` too. Custom rules can be created with [custom_eucast_rules()].
 #' @param verbose A [logical] to turn Verbose mode on and off (default is off). In Verbose mode, the function does not apply rules to the data, but instead returns a data set in logbook form with extensive info about which rows and columns would be effected and in which way. Using Verbose mode takes a lot more time.
-#' @param version_breakpoints The version number to use for the EUCAST Clinical Breakpoints guideline. Can be `r vector_or(names(EUCAST_VERSION_BREAKPOINTS), reverse = TRUE)`.
-#' @param version_expected_phenotypes The version number to use for the EUCAST Expected Phenotypes. Can be `r vector_or(names(EUCAST_VERSION_EXPECTED_PHENOTYPES), reverse = TRUE)`.
-#' @param version_expertrules The version number to use for the EUCAST Expert Rules and Intrinsic Resistance guideline. Can be `r vector_or(names(EUCAST_VERSION_EXPERT_RULES), reverse = TRUE)`.
+#' @param version_breakpoints The version number to use for the EUCAST Clinical Breakpoints guideline. Can be `r vector_or(names(EUCAST_VERSION_BREAKPOINTS), documentation = TRUE, reverse = TRUE)`.
+#' @param version_expected_phenotypes The version number to use for the EUCAST Expected Phenotypes. Can be `r vector_or(names(EUCAST_VERSION_EXPECTED_PHENOTYPES), documentation = TRUE, reverse = TRUE)`.
+#' @param version_expertrules The version number to use for the EUCAST Expert Rules and Intrinsic Resistance guideline. Can be `r vector_or(names(EUCAST_VERSION_EXPERT_RULES), documentation = TRUE, reverse = TRUE)`.
 #' @param ampc_cephalosporin_resistance (only applies when `rules` contains `"expert"` or `"all"`) a [character] value that should be applied to cefotaxime, ceftriaxone and ceftazidime for AmpC de-repressed cephalosporin-resistant mutants - the default is `NA`. Currently only works when `version_expertrules` is `3.2` and higher; these versions of '*EUCAST Expert Rules on Enterobacterales*' state that results of cefotaxime, ceftriaxone and ceftazidime should be reported with a note, or results should be suppressed (emptied) for these three drugs. A value of `NA` (the default) for this argument will remove results for these three drugs, while e.g. a value of `"R"` will make the results for these drugs resistant. Use `NULL` or `FALSE` to not alter results for these three drugs of AmpC de-repressed cephalosporin-resistant mutants. Using `TRUE` is equal to using `"R"`. \cr For *EUCAST Expert Rules* v3.2, this rule applies to: `r vector_and(gsub("[^a-zA-Z ]+", "", unlist(strsplit(EUCAST_RULES_DF[which(EUCAST_RULES_DF$reference.version %in% c(3.2, 3.3) & EUCAST_RULES_DF$reference.rule %like% "ampc"), "this_value"][1], "|", fixed = TRUE))), quotes = "*")`.
 #' @param ... Column names of antimicrobials. To automatically detect antimicrobial column names, do not provide any named arguments; [guess_ab_col()] will then be used for detection. To manually specify a column, provide its name (case-insensitive) as an argument, e.g. `AMX = "amoxicillin"`. To skip a specific antimicrobial, set it to `NULL`, e.g. `TIC = NULL` to exclude ticarcillin. If a manually defined column does not exist in the data, it will be skipped with a warning.
 #' @param ab Any (vector of) text that can be coerced to a valid antimicrobial drug code with [as.ab()].
-#' @param administration Route of administration, either `r vector_or(dosage$administration)`.
+#' @param administration Route of administration, either `r vector_or(dosage$administration, documentation = TRUE)`.
 #' @param only_sir_columns A [logical] to indicate whether only antimicrobial columns must be included that were transformed to class [sir][as.sir()] on beforehand. Defaults to `FALSE` if no columns of `x` have a class [sir][as.sir()].
 #' @param custom_rules Custom rules to apply, created with [custom_eucast_rules()].
 #' @param overwrite A [logical] indicating whether to overwrite existing SIR values (default: `FALSE`). When `FALSE`, only non-SIR values are modified (i.e., any value that is not already S, I or R). To ensure compliance with EUCAST guidelines, **this should remain** `FALSE`, as EUCAST notes often state that an organism "should be tested for susceptibility to individual agents or be reported resistant".
+#' @param add_if_missing A [logical] indicating whether rules should also be applied to missing (`NA`) values (default: `TRUE`). When `FALSE`, rules are only applied to cells that already contain an SIR value; cells with `NA` are left untouched. Since at least one of `overwrite` and `add_if_missing` must be `TRUE`, setting this to `FALSE` requires `overwrite = TRUE`. This is particularly useful with custom rules, when you want to update reported results without imputing values for untested drugs.
 #' @inheritParams first_isolate
 #' @details
 #' **Note:** This function does not translate MIC or disk values to SIR values. Use [as.sir()] for that. \cr
@@ -170,6 +171,7 @@ interpretive_rules <- function(x,
                               only_sir_columns = any(is.sir(x)),
                               custom_rules = NULL,
                               overwrite = FALSE,
+                               add_if_missing = TRUE,
                               ...) {
  meet_criteria(x, allow_class = "data.frame")
  meet_criteria(col_mo, allow_class = "character", has_length = 1, is_in = colnames(x), allow_NULL = TRUE)
@@ -184,6 +186,12 @@ interpretive_rules <- function(x,
  meet_criteria(only_sir_columns, allow_class = "logical", has_length = 1)
  meet_criteria(custom_rules, allow_class = "custom_eucast_rules", allow_NULL = TRUE)
  meet_criteria(overwrite, allow_class = "logical", has_length = 1)
+  meet_criteria(add_if_missing, allow_class = "logical", has_length = 1)
+
+  stop_if(
+    !overwrite && !add_if_missing,
+    "At least one of {.arg overwrite} and {.arg add_if_missing} must be {.code TRUE}."
+  )

  stop_if(
    guideline == "CLSI",
@@ -533,7 +541,8 @@ interpretive_rules <- function(x,
          warned = warned,
          info = info,
          verbose = verbose,
-          overwrite = overwrite
+          overwrite = overwrite,
+          add_if_missing = add_if_missing
        )
        n_added <- n_added + run_changes$added
        n_changed <- n_changed + run_changes$changed
@@ -575,7 +584,8 @@ interpretive_rules <- function(x,
          warned = warned,
          info = info,
          verbose = verbose,
-          overwrite = overwrite
+          overwrite = overwrite,
+          add_if_missing = add_if_missing
        )
        n_added <- n_added + run_changes$added
        n_changed <- n_changed + run_changes$changed
@@ -595,7 +605,7 @@ interpretive_rules <- function(x,
  } else {
    if (isTRUE(info)) {
      cat("\n")
-      message_("Skipping inhibitor-inheritance rules defined by this AMR package: setting S to drug+inhibitor where drug is S, and setting R to drug where drug+inhibitor is R. Add \"other\" or \"all\" to the {.arg rules} argument to apply those rules.")
+      message_("Skipping inhibitor-inheritance rules defined by this AMR package: setting S to drug+inhibitor where drug is S, and setting R to drug where drug+inhibitor is R. Add {.val other} or {.val all} to the {.arg rules} argument to apply those rules.")
    }
  }

@@ -609,7 +619,7 @@ interpretive_rules <- function(x,
  # >>> Apply Official EUCAST rules <<< ---------------------------------------------------
  eucast_notification_shown <- FALSE
  if (!is.null(list(...)$eucast_rules_df)) {
-    # this allows: eucast_rules(x, eucast_rules_df = AMR:::EUCAST_RULES_DF %>% filter(is.na(have_these_values)))
+    # this allows: eucast_rules(x, eucast_rules_df = AMR:::EUCAST_RULES_DF |> filter(is.na(have_these_values)))
    eucast_rules_df_total <- list(...)$eucast_rules_df
  } else {
    # otherwise internal data file, created in data-raw/_pre_commit_checks.R
@@ -862,7 +872,8 @@ interpretive_rules <- function(x,
      warned = warned,
      info = info,
      verbose = verbose,
-      overwrite = overwrite
+      overwrite = overwrite,
+      add_if_missing = add_if_missing
    )
    n_added <- n_added + run_changes$added
    n_changed <- n_changed + run_changes$changed
@@ -932,7 +943,8 @@ interpretive_rules <- function(x,
        warned = warned,
        info = info,
        verbose = verbose,
-        overwrite = overwrite
+        overwrite = overwrite,
+        add_if_missing = add_if_missing
      )
      n_added <- n_added + run_changes$added
      n_changed <- n_changed + run_changes$changed
@@ -1063,13 +1075,13 @@ interpretive_rules <- function(x,
    warn_lacking_sir_class <- warn_lacking_sir_class[order(colnames(x.bak))]
    warn_lacking_sir_class <- warn_lacking_sir_class[!is.na(warn_lacking_sir_class)]
    warning_(
-      "in {.help [{.fun eucast_rules}](AMR::eucast_rules)}: not all columns with antimicrobial results are of class {.cls sir}. Transform them on beforehand, e.g.:\n",
-      "  - ", highlight_code(paste0(x_deparsed, " %>% as.sir(", ifelse(length(warn_lacking_sir_class) == 1,
+      "in {.help [{.fun eucast_rules}](AMR::eucast_rules)}: not all columns with antimicrobial results are of class {.cls sir}. Transform them beforehand, e.g.:\n\n",
+      "\u00a0\u00a0", AMR_env$bullet_icon, " ", highlight_code(paste0(x_deparsed, " |> as.sir(", ifelse(length(warn_lacking_sir_class) == 1,
        warn_lacking_sir_class,
        paste0(warn_lacking_sir_class[1], ":", warn_lacking_sir_class[length(warn_lacking_sir_class)])
-      ), ")")), "\n",
-      "  - ", highlight_code(paste0(x_deparsed, " %>% mutate_if(is_sir_eligible, as.sir)")), "\n",
-      "  - ", highlight_code(paste0(x_deparsed, " %>% mutate(across(where(is_sir_eligible), as.sir))"))
+      ), ")")), "\n\n",
+      "\u00a0\u00a0", AMR_env$bullet_icon, " ", highlight_code(paste0(x_deparsed, " |> mutate_if(is_sir_eligible, as.sir)")), "\n\n",
+      "\u00a0\u00a0", AMR_env$bullet_icon, " ", highlight_code(paste0(x_deparsed, " |> mutate(across(where(is_sir_eligible), as.sir))"))
    )
  }

@@ -1124,9 +1136,11 @@ edit_sir <- function(x,
                     warned,
                     info,
                     verbose,
-                     overwrite) {
+                     overwrite,
+                     add_if_missing) {
  cols <- unique(cols[!is.na(cols) & !is.null(cols)])
-
+  rows <- unique(rows)
+
  # for Verbose Mode, keep track of all changes and return them
  track_changes <- list(
    added = 0,
@@ -1152,32 +1166,50 @@ edit_sir <- function(x,
      track_changes$sir_warn <- cols[!vapply(FUN.VALUE = logical(1), x[, cols, drop = FALSE], is.sir)]
    }
    isNA <- is.na(new_edits[rows, cols])
-    isSIR <- !isNA & (new_edits[rows, cols] == "S" | new_edits[rows, cols] == "I" | new_edits[rows, cols] == "R" | new_edits[rows, cols] == "SDD" | new_edits[rows, cols] == "NI" | new_edits[rows, cols] == "WT" | new_edits[rows, cols] == "NWT" | new_edits[rows, cols] == "NS")
+    isSIR <- !isNA &
+      (new_edits[rows, cols] == "S" |
+        new_edits[rows, cols] == "I" |
+        new_edits[rows, cols] == "R" |
+        new_edits[rows, cols] == "SDD" |
+        new_edits[rows, cols] == "NI" |
+        new_edits[rows, cols] == "WT" |
+        new_edits[rows, cols] == "NWT" |
+        new_edits[rows, cols] == "NS")
    non_SIR <- !isSIR
    if (isFALSE(overwrite) && any(isSIR) && message_not_thrown_before("edit_sir.warning_overwrite")) {
-      warning_("Some values had SIR values and were not overwritten, since {.code overwrite = FALSE}.")
+      warning_("in {.help [{.fun eucast_rules}](AMR::eucast_rules)}: some columns had SIR values which were not overwritten, since {.code overwrite = FALSE}.")
    }
-    tryCatch(
-      # insert into original table
-      if (isTRUE(overwrite)) {
-        new_edits[rows, cols] <- to
+    # determine which cells to modify based on overwrite and add_if_missing
+    if (isTRUE(overwrite)) {
+      if (isTRUE(add_if_missing)) {
+        apply_mask <- rep(TRUE, length(isSIR))
      } else {
-        new_edits[rows, cols][non_SIR] <- to
-      },
+        apply_mask <- isSIR
+      }
+    } else {
+      # overwrite = FALSE, add_if_missing = TRUE: fill missing and placeholder cells only
+      apply_mask <- !isSIR
+    }
+
+    do_assign <- function() {
+      subset <- new_edits[rows, cols, drop = FALSE]
+      mask <- matrix(apply_mask, nrow = nrow(subset), ncol = ncol(subset))
+      subset[mask] <- to
+      new_edits[rows, cols] <<- subset
+    }
+
+    tryCatch(
+      do_assign(),
      warning = function(w) {
        if (w$message %like% "invalid factor level") {
-          xyz <- vapply(FUN.VALUE = logical(1), cols, function(col) {
+          vapply(FUN.VALUE = logical(1), cols, function(col) {
            new_edits[, col] <<- factor(
              x = as.character(pm_pull(new_edits, col)),
              levels = unique(c(to, levels(pm_pull(new_edits, col))))
            )
            TRUE
          })
-          if (isTRUE(overwrite)) {
-            suppressWarnings(new_edits[rows, cols] <<- to)
-          } else {
-            suppressWarnings(new_edits[rows, cols][non_SIR] <<- to)
-          }
+          suppressWarnings(do_assign())
          warning_(
            "in {.help [{.fun eucast_rules}](AMR::eucast_rules)}: value \"", to, "\" added to the factor levels of column",
            ifelse(length(cols) == 1, "", "s"),
@@ -1185,7 +1217,7 @@ edit_sir <- function(x,
            " because this value was not an existing factor level."
          )
          txt_warning()
-          warned <- FALSE
+          warned <<- FALSE
        } else {
          warning_("in {.help [{.fun eucast_rules}](AMR::eucast_rules)}: ", w$message)
          txt_warning()
--- a/R/mo.R
+++ b/R/mo.R
@@ -792,7 +792,7 @@ print.mo <- function(x, print.shortnames = FALSE, ...) {
  names(x) <- x_names
  if (!all(x %in% c(AMR_env$MO_lookup$mo, NA))) {
    warning_(
-      "Some MO codes are from a previous AMR package version. ",
+      "Some MO codes are from another AMR package version. ",
      "Please update the MO codes with {.help [{.fun as.mo}](AMR::as.mo)}.",
      call = FALSE
    )
@@ -826,7 +826,7 @@ as.data.frame.mo <- function(x, ...) {
  add_MO_lookup_to_AMR_env()
  if (!all(x %in% c(AMR_env$MO_lookup$mo, NA))) {
    warning_(
-      "The data contains old MO codes (from a previous AMR package version). ",
+      "The data contains MO codes from another AMR package version. ",
      "Please update your MO codes with {.help [{.fun as.mo}](AMR::as.mo)}."
    )
  }
--- a/R/mo_property.R
+++ b/R/mo_property.R
@@ -31,7 +31,7 @@
 #'
 #' Use these functions to return a specific property of a microorganism based on the latest accepted taxonomy. All input values will be evaluated internally with [as.mo()], which makes it possible to use microbial abbreviations, codes and names as input. See *Examples*.
 #' @param x Any [character] (vector) that can be coerced to a valid microorganism code with [as.mo()]. Can be left blank for auto-guessing the column containing microorganism codes if used in a data set, see *Examples*.
-#' @param property One of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, quotes = TRUE)`, or must be `"shortname"`.
+#' @param property One of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`, or must be `"shortname"`.
 #' @inheritParams as.mo
 #' @param ... Other arguments passed on to [as.mo()], such as 'minimum_matching_score', 'ignore_pattern', and 'remove_from_input'.
 #' @param ab Any (vector of) text that can be coerced to a valid antibiotic drug code with [as.ab()].
--- a/R/pca.R
+++ b/R/pca.R
@@ -66,12 +66,12 @@
 #'
 #' # new ggplot2 plotting method using this package:
 #' if (require("dplyr") && require("ggplot2")) {
-#'     ggplot_pca(pca_result)
+#'   ggplot_pca(pca_result)
 #' }
 #' if (require("dplyr") && require("ggplot2")) {
-#'     ggplot_pca(pca_result) +
-#'       scale_colour_viridis_d() +
-#'       labs(title = "Title here")
+#'   ggplot_pca(pca_result) +
+#'     scale_colour_viridis_d() +
+#'     labs(title = "Title here")
 #' }
 #' }
 pca <- function(x,
--- a/R/plotting.R
+++ b/R/plotting.R
@@ -200,7 +200,7 @@
 #'     theme_minimal() +
 #'     geom_boxplot(fill = NA, colour = "grey30") +
-#'     geom_jitter(width = 0.25)
-#'     labs(title = "scale_y_mic()/scale_colour_sir() automatically applied")
+#'     geom_jitter(width = 0.25) +
+#'     labs(title = "scale_y_mic()/scale_colour_sir() automatically applied")
 #'
 #'   mic_sir_plot
 #' }
--- a/R/sir.R
+++ b/R/sir.R
@@ -65,7 +65,7 @@ VALID_SIR_LEVELS <- c("S", "SDD", "I", "R", "NI", "WT", "NWT", "NS")
 #' @param substitute_missing_r_breakpoint A [logical] to indicate that a missing clinical breakpoints for R (resistant) must be substituted with R - the default is `FALSE`. Some (especially CLSI) breakpoints only have a breakpoint for S, meaning that the outcome can only be `"S"` or `NA`. Setting this to `TRUE` will convert the `NA`s in these cases to `"R"`. Can also be set with the package option [`AMR_substitute_missing_r_breakpoint`][AMR-options].
 #' @param include_screening A [logical] to indicate that clinical breakpoints for screening are allowed - the default is `FALSE`. Can also be set with the package option [`AMR_include_screening`][AMR-options].
 #' @param include_PKPD A [logical] to indicate that PK/PD clinical breakpoints must be applied as a last resort - the default is `TRUE`. Can also be set with the package option [`AMR_include_PKPD`][AMR-options].
-#' @param breakpoint_type The type of breakpoints to use, either `r vector_or(clinical_breakpoints$type)`. ECOFF stands for Epidemiological Cut-Off values. The default is `"human"`, which can also be set with the package option [`AMR_breakpoint_type`][AMR-options]. If `host` is set to values of veterinary species, this will automatically be set to `"animal"`.
+#' @param breakpoint_type The type of breakpoints to use, either `r vector_or(clinical_breakpoints$type, documentation = TRUE)`. ECOFF stands for Epidemiological Cut-Off values. The default is `"human"`, which can also be set with the package option [`AMR_breakpoint_type`][AMR-options]. If `host` is set to values of veterinary species, this will automatically be set to `"animal"`.
 #' @param host A vector (or column name) with [character]s to indicate the host. Only useful for veterinary breakpoints, as it requires `breakpoint_type = "animal"`. The values can be any text resembling the animal species, even in any of the `r length(LANGUAGES_SUPPORTED)` supported languages of this package. For foreign languages, be sure to set the language with [set_AMR_locale()] (though it will be automatically guessed based on the system language).
 #' @param language Language to convert values set in `host` when using animal breakpoints. Use one of these supported language names or [ISO 639-1 codes](https://en.wikipedia.org/wiki/ISO_639-1): `r vector_or(paste0(sapply(LANGUAGES_SUPPORTED_NAMES, function(x) x[[1]]), " (" , LANGUAGES_SUPPORTED, ")"), quotes = FALSE, sort = FALSE)`.
 #' @param verbose A [logical] to indicate that all notes should be printed during interpretation of MIC values or disk diffusion values.
--- a/R/tidymodels.R
+++ b/R/tidymodels.R
@@ -21,7 +21,6 @@
 #' @export
 #' @examples
 #' if (require("tidymodels")) {
-#'
 #'   # The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
 #'   # Presence of ESBL genes was predicted based on raw MIC values.
 #'
@@ -40,13 +39,10 @@
 #'
 #'   # Create and prep a recipe with MIC log2 transformation
 #'   mic_recipe <- recipe(esbl ~ ., data = training_data) %>%
-#'
 #'     # Optionally remove non-predictive variables
 #'     remove_role(genus, old_role = "predictor") %>%
-#'
 #'     # Apply the log2 transformation to all MIC predictors
 #'     step_mic_log2(all_mic_predictors()) %>%
-#'
 #'     # And apply the preparation steps
 #'     prep()
 #'
@@ -67,13 +63,15 @@
 #'     bind_cols(out_testing)
 #'
 #'   # Evaluate predictions using standard classification metrics
-#'   our_metrics <- metric_set(accuracy,
-#'                             recall,
-#'                             precision,
-#'                             sensitivity,
-#'                             specificity,
-#'                             ppv,
-#'                             npv)
+#'   our_metrics <- metric_set(
+#'     accuracy,
+#'     recall,
+#'     precision,
+#'     sensitivity,
+#'     specificity,
+#'     ppv,
+#'     npv
+#'   )
 #'   metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)
 #'
 #'   # Show performance
--- a/R/top_n_microorganisms.R
+++ b/R/top_n_microorganisms.R
@@ -32,7 +32,7 @@
 #' This function filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
 #' @param x A data frame containing microbial data.
 #' @param n An integer specifying the maximum number of unique values of the `property` to include in the output.
-#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, quotes = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subpecies"`, the genus will be added to make sure each (sub)species still belongs to the right genus.
+#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subpecies"`, the genus will be added to make sure each (sub)species still belongs to the right genus.
 #' @param n_for_each An optional integer specifying the maximum number of rows to retain for each value of the selected property. If `NULL`, all rows within the top *n* groups will be included.
 #' @param col_mo A character string indicating the column in `x` that contains microorganism names or codes. Defaults to the first column of class [`mo`]. Values will be coerced using [as.mo()].
 #' @param ... Additional arguments passed on to [mo_property()] when `property` is not `NULL`.