(v3.0.0.9034) fix MycoBank synonyms

2026-01-12 05:54:39 +01:00 · 2025-09-18 13:58:34 +01:00
parent 5796e8f3a4
commit 10ba36821e
16 changed files with 8653 additions and 7641 deletions
--- a/4
+++ b/4
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 3.0.0.9033
-Date: 2025-09-15
+Version: 3.0.0.9034
+Date: 2025-09-18
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
  data analysis and to work with microbial and antimicrobial properties by
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 3.0.0.9033
+# AMR 3.0.0.9034

 This is a bugfix release following the release of v3.0.0 in June 2025.

@@ -11,8 +11,9 @@ This is a bugfix release following the release of v3.0.0 in June 2025.
 * Fixed a bug in `as.sir()` to allow any tidyselect language (#220)
 * Fixed a bug in `as.sir()` to pick right breakpoint when `uti = FALSE` (#216)
 * Fixed a bug in `ggplot_sir()` when using `combine_SI = FALSE` (#213)
-* Fixed a bug the `antimicrobials` data set to remove statins (#229)
 * Fixed a bug in `mdro()` to make sure all genes specified in arguments are acknowledged
+* Fixed a bug the `antimicrobials` data set to remove statins (#229)
+* Fixed a bug the `microorganisms` data set for MycoBank IDs and synonyms (#233)
 * Fixed ATC J01CR05 to map to piperacillin/tazobactam rather than piperacillin/sulbactam (#230)
 * Fixed skimmers (`skimr` package) of class `ab`, `sir`, and `disk` (#234)
 * Fixed all plotting to contain a separate colour for SDD (susceptible dose-dependent) (#223)
--- a/R/aa_helper_functions.R
+++ b/R/aa_helper_functions.R
@@ -485,11 +485,7 @@ word_wrap <- function(...,
  }

  # format backticks
-  if (pkg_is_available("cli") &&
-    tryCatch(isTRUE(getExportedValue("ansi_has_hyperlink_support", ns = asNamespace("cli"))()), error = function(e) FALSE) &&
-    tryCatch(getExportedValue("isAvailable", ns = asNamespace("rstudioapi"))(), error = function(e) {
-      return(FALSE)
-    }) &&
+  if (pkg_is_available("cli") && in_rstudio() &&
    tryCatch(getExportedValue("versionInfo", ns = asNamespace("rstudioapi"))()$version > "2023.6.0.0", error = function(e) {
      return(FALSE)
    })) {
@@ -1188,6 +1184,13 @@ reset_all_thrown_messages <- function() {
  )
 }

+in_rstudio <- function() {
+  identical(Sys.getenv("RSTUDIO"), "1")
+}
+in_positron <- function() {
+  identical(Sys.getenv("POSITRON"), "1")
+}
+
 has_colour <- function() {
  if (is.null(AMR_env$supports_colour)) {
    if (Sys.getenv("EMACS") != "" || Sys.getenv("INSIDE_EMACS") != "") {
@@ -1222,7 +1225,7 @@ is_dark <- function() {
  AMR_env$current_theme <- tryCatch(getExportedValue("getThemeInfo", ns = asNamespace("rstudioapi"))()$editor, error = function(e) NULL)
  if (!identical(AMR_env$current_theme, AMR_env$former_theme) || is.null(AMR_env$is_dark_theme)) {
    AMR_env$former_theme <- AMR_env$current_theme
-    AMR_env$is_dark_theme <- !has_colour() || tryCatch(isTRUE(getExportedValue("getThemeInfo", ns = asNamespace("rstudioapi"))()$dark), error = function(e) FALSE)
+    AMR_env$is_dark_theme <- !has_colour() || tryCatch(isTRUE(getExportedValue("getThemeInfo", ns = asNamespace("rstudioapi"))()$dark), error = function(e) TRUE)
  }
  isTRUE(AMR_env$is_dark_theme)
 }
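
Editor's note on the hunks above: the two new helpers replace the `rstudioapi::isAvailable()` probe with a plain environment-variable check. The sketch below copies `in_rstudio()` and `in_positron()` from the diff and adds a small dispatcher to show how they might be combined. `current_ide()` is a hypothetical name that does not appear in the commit, and the order in which the two checks are tried is an assumption made for illustration only.

```r
# Copied from the diff: both helpers only read an environment variable,
# so they work without the rstudioapi package being installed.
in_rstudio <- function() {
  identical(Sys.getenv("RSTUDIO"), "1")
}
in_positron <- function() {
  identical(Sys.getenv("POSITRON"), "1")
}

# Hypothetical helper (not part of the commit): name the detected front end.
# Sys.getenv() returns "" for an unset variable, so both checks are FALSE
# in a plain terminal session and the function falls through to "other".
current_ide <- function() {
  if (in_positron()) {
    "positron"
  } else if (in_rstudio()) {
    "rstudio"
  } else {
    "other"
  }
}

current_ide()
```

Because the checks never touch `rstudioapi`, they cannot throw, which is why the `tryCatch()` wrapper around the old availability probe could be dropped in `word_wrap()`.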
--- a/R/first_isolate.R
+++ b/R/first_isolate.R
@@ -61,7 +61,7 @@
 #'
 #' All isolates with a microbial ID of `NA` will be excluded as first isolate.
 #'
-#' ### Different methods
+#' ## Different methods
 #'
 #' According to previously-mentioned sources, there are different methods (algorithms) to select first isolates with increasing reliability: isolate-based, patient-based, episode-based and phenotype-based. All methods select on a combination of the taxonomic genus and species (not subspecies).
 #'
@@ -89,21 +89,29 @@
 #' | - Major difference in any antimicrobial result   | - `first_isolate(x, type = "points")`                 |
 #' | - Any difference in key antimicrobial results    | - `first_isolate(x, type = "keyantimicrobials")`      |
 #'
-#' ### Isolate-based
+#' **Isolate-based**
+#'
+#' _Minimum variables required: Microorganism identifier_
 #'
 #' This method does not require any selection, as all isolates should be included. It does, however, respect all arguments set in the [first_isolate()] function. For example, the default setting for `include_unknown` (`FALSE`) will omit selection of rows without a microbial ID.
 #'
-#' ### Patient-based
+#' **Patient-based**
 #'
-#' To include every genus-species combination per patient once, set the `episode_days` to `Inf`. This method makes sure that no duplicate isolates are selected from the same patient. This method is preferred to e.g. identify the first MRSA finding of each patient to determine the incidence. Conversely, in a large longitudinal data set, this could mean that isolates are *excluded* that were found years after the initial isolate.
+#' _Minimum variables required: Microorganism identifier, Patient identifier_
 #'
-#' ### Episode-based
+#' This method includes every genus-species combination per patient once. This method makes sure that no duplicate isolates are selected from the same patient. This method is preferred to e.g. identify the first MRSA finding of each patient to determine the incidence. Conversely, in a large longitudinal data set, this could mean that isolates are *excluded* that were found years after the initial isolate.
 #'
-#' To include every genus-species combination per patient episode once, set the `episode_days` to a sensible number of days. Depending on the type of analysis, this could be 14, 30, 60 or 365. Short episodes are common for analysing specific hospital or ward data or ICU cases, long episodes are common for analysing regional and national data.
+#' **Episode-based**
+#'
+#' _Minimum variables required: Microorganism identifier, Patient identifier, Date_
+#'
+#' To include every genus-species combination per patient episode once, set the `episode_days` to a sensible number of days. Depending on the type of analysis, this could be e.g., 14, 30, 60 or 365. Short episodes are common for analysing specific hospital or ward data or ICU cases, long episodes are common for analysing regional and national data.
 #'
 #' This is the most common method to correct for duplicate isolates. Patients are categorised into episodes based on their ID and dates (e.g., the date of specimen receipt or laboratory result). While this is a common method, it does not take into account antimicrobial test results. This means that e.g. a methicillin-resistant *Staphylococcus aureus* (MRSA) isolate cannot be differentiated from a wildtype *Staphylococcus aureus* isolate.
 #'
-#' ### Phenotype-based
+#' **Phenotype-based**
+#'
+#' _Minimum variables required: Microorganism identifier, Patient identifier, Date, Antimicrobial test results_
 #'
 #' This is a more reliable method, since it also *weighs* the antibiogram (antimicrobial test results) yielding so-called 'first weighted isolates'. There are two different methods to weigh the antibiogram:
 #'
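
Editor's note on the documentation hunks above: the difference between the patient-based and episode-based methods can be illustrated with a few lines of base R. This is a simplified sketch and not the implementation used by `first_isolate()`. The data frame, its column names, and the function `first_in_episode()` are hypothetical, and an episode is assumed here to start at the first isolate and to last `episode_days` days.

```r
# Toy data with the minimum variables named in the documentation:
# microorganism identifier, patient identifier, and date.
isolates <- data.frame(
  patient = c("A", "A", "A", "B", "B"),
  mo = c("E. coli", "E. coli", "E. coli", "S. aureus", "E. coli"),
  date = as.Date(c(
    "2025-01-01", "2025-01-10", "2025-03-01", "2025-01-05", "2025-01-06"
  ))
)

# Hypothetical helper: flag the first isolate of each episode per
# patient/microorganism combination.
first_in_episode <- function(df, episode_days) {
  df <- df[order(df$patient, df$mo, df$date), ]
  keep <- logical(nrow(df))
  groups <- split(seq_len(nrow(df)), list(df$patient, df$mo), drop = TRUE)
  for (idx in groups) {
    episode_start <- as.Date(NA)
    for (i in idx) {
      days_since <- as.numeric(df$date[i] - episode_start, units = "days")
      # A new episode starts at the first isolate, or once the current
      # episode has lasted at least `episode_days` days.
      if (is.na(episode_start) || days_since >= episode_days) {
        keep[i] <- TRUE
        episode_start <- df$date[i]
      }
    }
  }
  df$first <- keep
  df
}

episode_based <- first_in_episode(isolates, episode_days = 30)
patient_based <- first_in_episode(isolates, episode_days = Inf)
```

With an infinite episode length every patient/microorganism combination is selected exactly once, which corresponds to the patient-based description; a finite length lets a later isolate from the same patient start a new episode.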
--- a/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R
+++ b/data-raw/_reproduction_scripts/reproduction_of_microorganisms.R
--- a/data-raw/datasets/microorganisms.dta
+++ b/data-raw/datasets/microorganisms.dta
--- a/data-raw/datasets/microorganisms.feather
+++ b/data-raw/datasets/microorganisms.feather
--- a/data-raw/datasets/microorganisms.parquet
+++ b/data-raw/datasets/microorganisms.parquet
--- a/data-raw/datasets/microorganisms.rds
+++ b/data-raw/datasets/microorganisms.rds
--- a/data-raw/datasets/microorganisms.sav
+++ b/data-raw/datasets/microorganisms.sav
--- a/data-raw/datasets/microorganisms.txt
+++ b/data-raw/datasets/microorganisms.txt
--- a/data-raw/datasets/microorganisms.xlsx
+++ b/data-raw/datasets/microorganisms.xlsx
--- a/data-raw/microorganisms.md5
+++ b/data-raw/microorganisms.md5
@@ -1 +1 @@
-5908f9e6e7687dfb8301d27fb26d1790
+6dc4dded108052760bfb626df03435e2
--- a/data/microorganisms.rda
+++ b/data/microorganisms.rda
--- a/man/first_isolate.Rd
+++ b/man/first_isolate.Rd
@@ -108,26 +108,30 @@ All mentioned methods are covered in the \code{\link[=first_isolate]{first_isola
   - Any difference in key antimicrobial results \tab - \code{first_isolate(x, type = "keyantimicrobials")} \cr
 }

-}

-\subsection{Isolate-based}{
+\strong{Isolate-based}
+
+\emph{Minimum variables required: Microorganism identifier}

 This method does not require any selection, as all isolates should be included. It does, however, respect all arguments set in the \code{\link[=first_isolate]{first_isolate()}} function. For example, the default setting for \code{include_unknown} (\code{FALSE}) will omit selection of rows without a microbial ID.
-}

-\subsection{Patient-based}{
+\strong{Patient-based}

-To include every genus-species combination per patient once, set the \code{episode_days} to \code{Inf}. This method makes sure that no duplicate isolates are selected from the same patient. This method is preferred to e.g. identify the first MRSA finding of each patient to determine the incidence. Conversely, in a large longitudinal data set, this could mean that isolates are \emph{excluded} that were found years after the initial isolate.
-}
+\emph{Minimum variables required: Microorganism identifier, Patient identifier}

-\subsection{Episode-based}{
+This method includes every genus-species combination per patient once. This method makes sure that no duplicate isolates are selected from the same patient. This method is preferred to e.g. identify the first MRSA finding of each patient to determine the incidence. Conversely, in a large longitudinal data set, this could mean that isolates are \emph{excluded} that were found years after the initial isolate.

-To include every genus-species combination per patient episode once, set the \code{episode_days} to a sensible number of days. Depending on the type of analysis, this could be 14, 30, 60 or 365. Short episodes are common for analysing specific hospital or ward data or ICU cases, long episodes are common for analysing regional and national data.
+\strong{Episode-based}
+
+\emph{Minimum variables required: Microorganism identifier, Patient identifier, Date}
+
+To include every genus-species combination per patient episode once, set the \code{episode_days} to a sensible number of days. Depending on the type of analysis, this could be e.g., 14, 30, 60 or 365. Short episodes are common for analysing specific hospital or ward data or ICU cases, long episodes are common for analysing regional and national data.

 This is the most common method to correct for duplicate isolates. Patients are categorised into episodes based on their ID and dates (e.g., the date of specimen receipt or laboratory result). While this is a common method, it does not take into account antimicrobial test results. This means that e.g. a methicillin-resistant \emph{Staphylococcus aureus} (MRSA) isolate cannot be differentiated from a wildtype \emph{Staphylococcus aureus} isolate.
-}

-\subsection{Phenotype-based}{
+\strong{Phenotype-based}
+
+\emph{Minimum variables required: Microorganism identifier, Patient identifier, Date, Antimicrobial test results}

 This is a more reliable method, since it also \emph{weighs} the antibiogram (antimicrobial test results) yielding so-called 'first weighted isolates'. There are two different methods to weigh the antibiogram:
 \enumerate{
--- a/man/microorganisms.Rd
+++ b/man/microorganisms.Rd
@@ -18,7 +18,7 @@ A \link[tibble:tibble]{tibble} with 78 679 observations and 26 variables:
 \item \code{lpsn}\cr Identifier ('Record number') of List of Prokaryotic names with Standing in Nomenclature (LPSN). This will be the first/highest LPSN identifier to keep one identifier per row. For example, \emph{Acetobacter ascendens} has LPSN Record number 7864 and 11011. Only the first is available in the \code{microorganisms} data set. \emph{\strong{This is a unique identifier}}, though available for only ~33 000 records.
 \item \code{lpsn_parent}\cr LPSN identifier of the parent taxon
 \item \code{lpsn_renamed_to}\cr LPSN identifier of the currently valid taxon
-\item \code{mycobank}\cr Identifier ('MycoBank #') of MycoBank. \emph{\strong{This is a unique identifier}}, though available for only ~18 000 records.
+\item \code{mycobank}\cr Identifier ('MycoBank #') of MycoBank. \emph{\strong{This is a unique identifier}}, though available for only ~19 000 records.
 \item \code{mycobank_parent}\cr MycoBank identifier of the parent taxon
 \item \code{mycobank_renamed_to}\cr MycoBank identifier of the currently valid taxon
 \item \code{gbif}\cr Identifier ('taxonID') of Global Biodiversity Information Facility (GBIF). \emph{\strong{This is a unique identifier}}, though available for only ~49 000 records.
@@ -70,7 +70,7 @@ Included taxonomic data from \href{https://lpsn.dsmz.de}{LPSN}, \href{https://ww
 \item ~28 000 species from the kingdom of Fungi. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package. Only relevant fungi are covered (such as all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histoplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).
 \item ~8 100 (sub)species from the kingdom of Protozoa
 \item ~1 600 (sub)species from 39 other relevant genera from the kingdom of Animalia (such as \emph{Strongyloides} and \emph{Taenia})
-\item All ~22 000 previously accepted names of all included (sub)species (these were taxonomically renamed)
+\item All ~26 000 previously accepted names of all included (sub)species (these were taxonomically renamed)
 \item The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
 \item The identifier of the parent taxons
 \item The year and first author of the related scientific publication