(v1.7.1.9051) updated taxonomy, updated git branch name

2026-07-15 12:30:54 +02:00 · 2021-10-06 13:23:57 +02:00
parent 8f5e5a3fc2
commit 37e6e35ec4
139 changed files with 2694 additions and 1862 deletions
--- a/R/catalogue_of_life.R
+++ b/R/catalogue_of_life.R
@@ -41,7 +41,7 @@ format_included_data_number <- function(data) {

 #' The Catalogue of Life
 #'
-#' This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life.
+#' This package contains the complete taxonomic tree (last updated: `r CATALOGUE_OF_LIFE$yearmonth_LPSN`) of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (CoL), supplemented with data from the List of Prokaryotic names with Standing in Nomenclature (LPSN).
 #' @section Catalogue of Life:
 #' \if{html}{\figure{logo_col.png}{options: height=40px style=margin-bottom:5px} \cr}
 #' This package contains the complete taxonomic tree of almost all microorganisms (`r format_included_data_number(microorganisms)` species) from the authoritative and comprehensive Catalogue of Life (CoL, <http://www.catalogueoflife.org>). The CoL is the most comprehensive and authoritative global index of species currently available. Nonetheless, we supplemented the CoL data with data from the List of Prokaryotic names with Standing in Nomenclature (LPSN, [lpsn.dsmz.de](https://lpsn.dsmz.de)). This supplementation is needed until the [CoL+ project](https://github.com/CatalogueOfLife/general) is finished, which we await.
@@ -58,7 +58,7 @@ format_included_data_number <- function(data) {
 #'
 #' The Catalogue of Life (<http://www.catalogueoflife.org>) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.9 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
 #'
-#' The syntax used to transform the original data to a cleansed \R format, can be found here: <https://github.com/msberends/AMR/blob/master/data-raw/reproduction_of_microorganisms.R>.
+#' The syntax used to transform the original data to a cleansed \R format, can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/reproduction_of_microorganisms.R>.
 #' @inheritSection AMR Read more on Our Website!
 #' @name catalogue_of_life
 #' @rdname catalogue_of_life
@@ -141,5 +141,5 @@ print.catalogue_of_life_version <- function(x, ...) {
             "  Number of included bacterial species: ", format(x$LPSN$n, big.mark = ","), "\n\n",
             "=> Total number of species included:  ", format(x$total_included$n_total_species, big.mark = ","), "\n",
             "=> Total number of synonyms included: ", format(x$total_included$n_total_synonyms, big.mark = ","), "\n\n",
-             "See for more info ?microorganisms and ?catalogue_of_life.\n"))
+             "For more info, see ", font_grey_bg("`?microorganisms`"), " and ", font_grey_bg("`?catalogue_of_life`"), ".\n"))
 }
--- a/R/data.R
+++ b/R/data.R
@@ -60,13 +60,13 @@
 #' ## Direct download
 #' These data sets are available as 'flat files' for use even without \R - you can find the files here:
 #' 
-#' * <https://github.com/msberends/AMR/raw/master/data-raw/antibiotics.txt>
-#' * <https://github.com/msberends/AMR/raw/master/data-raw/antivirals.txt>
+#' * <https://github.com/msberends/AMR/raw/main/data-raw/antibiotics.txt>
+#' * <https://github.com/msberends/AMR/raw/main/data-raw/antivirals.txt>
 #' 
 #' Files in \R format (with preserved data structure) can be found here:
 #' 
-#' * <https://github.com/msberends/AMR/raw/master/data/antibiotics.rda>
-#' * <https://github.com/msberends/AMR/raw/master/data/antivirals.rda>
+#' * <https://github.com/msberends/AMR/raw/main/data/antibiotics.rda>
+#' * <https://github.com/msberends/AMR/raw/main/data/antivirals.rda>
 #' @source World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology (WHOCC): <https://www.whocc.no/atc_ddd_index/>
 #'
 #' WHONET 2019 software: <http://www.whonet.org/software.html>
@@ -83,7 +83,7 @@

 #' Data Set with `r format(nrow(microorganisms), big.mark = ",")` Microorganisms
 #'
-#' A data set containing the microbial taxonomy, last updated in `r CATALOGUE_OF_LIFE$yearmonth_LPSN`, of six kingdoms from the Catalogue of Life (CoL) and the List of Prokaryotic names with Standing in Nomenclature (LPSN). MO codes can be looked up using [as.mo()].
+#' A data set containing the full microbial taxonomy (**last updated: `r CATALOGUE_OF_LIFE$yearmonth_LPSN`**) of `r nr2char(length(unique(microorganisms$kingdom[!microorganisms$kingdom %like% "unknown"])))` kingdoms from the Catalogue of Life (CoL) and the List of Prokaryotic names with Standing in Nomenclature (LPSN). MO codes can be looked up using [as.mo()].
 #' @inheritSection catalogue_of_life Catalogue of Life
 #' @format A [data.frame] with `r format(nrow(microorganisms), big.mark = ",")` observations and `r ncol(microorganisms)` variables:
 #' - `mo`\cr ID of microorganism as used by this package
@@ -105,20 +105,21 @@
 #' 
 #' - 11 entries of *Streptococcus* (beta-haemolytic: groups A, B, C, D, F, G, H, K and unspecified; other: viridans, milleri)
 #' - 2 entries of *Staphylococcus* (coagulase-negative (CoNS) and coagulase-positive (CoPS))
-#' - 3 entries of *Trichomonas* (*Trichomonas vaginalis*, and its family and genus)
-#' - 1 entry of *Candida* (*Candida krusei*), that is not (yet) in the Catalogue of Life
-#' - 1 entry of *Blastocystis* (*Blastocystis hominis*), although it officially does not exist (Noel *et al.* 2005, PMID 15634993)
+#' - 3 entries of *Trichomonas* (*T. vaginalis*, and its family and genus)
+#' - 1 entry of *Candida* (*C. krusei*), which is not (yet) in the Catalogue of Life
+#' - 1 entry of *Blastocystis* (*B. hominis*), although it officially does not exist (Noel *et al.* 2005, PMID 15634993)
+#' - 1 entry of *Moraxella* (*M. catarrhalis*), which was formally named *Branhamella catarrhalis* (Catlin, 1970), though this change was never accepted within the field of clinical microbiology
 #' - 5 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast and unknown fungus)
 #' - 6 families under the Enterobacterales order, according to Adeolu *et al.* (2016, PMID 27620848), that are not (yet) in the Catalogue of Life
 #' 
 #' ## Direct download
 #' This data set is available as 'flat file' for use even without \R - you can find the file here:
 #' 
-#' * <https://github.com/msberends/AMR/raw/master/data-raw/microorganisms.txt>
+#' * <https://github.com/msberends/AMR/raw/main/data-raw/microorganisms.txt>
 #' 
 #' The file in \R format (with preserved data structure) can be found here:
 #' 
-#' * <https://github.com/msberends/AMR/raw/master/data/microorganisms.rda>
+#' * <https://github.com/msberends/AMR/raw/main/data/microorganisms.rda>
 #' @section About the Records from LPSN (see *Source*):
 #' The List of Prokaryotic names with Standing in Nomenclature (LPSN) provides comprehensive information on the nomenclature of prokaryotes. LPSN is a free to use service founded by Jean P. Euzeby in 1997 and later on maintained by Aidan C. Parte.
 #' 
@@ -251,7 +252,7 @@
 #' - `breakpoint_S`\cr Lowest MIC value or highest number of millimetres that leads to "S"
 #' - `breakpoint_R`\cr Highest MIC value or lowest number of millimetres that leads to "R"
 #' - `uti`\cr A [logical] value (`TRUE`/`FALSE`) to indicate whether the rule applies to a urinary tract infection (UTI)
-#' @details The repository of this `AMR` package contains a file comprising this exact data set: <https://github.com/msberends/AMR/blob/master/data-raw/rsi_translation.txt>. This file **allows for machine reading EUCAST and CLSI guidelines**, which is almost impossible with the Excel and PDF files distributed by EUCAST and CLSI. The file is updated automatically.
+#' @details The repository of this `AMR` package contains a file comprising this exact data set: <https://github.com/msberends/AMR/blob/main/data-raw/rsi_translation.txt>. This file **allows for machine reading EUCAST and CLSI guidelines**, which is almost impossible with the Excel and PDF files distributed by EUCAST and CLSI. The file is updated automatically.
 #' @inheritSection AMR Reference Data Publicly Available
 #' @inheritSection AMR Read more on Our Website!
 #' @seealso [intrinsic_resistant]
@@ -263,7 +264,7 @@
 #' @format A [data.frame] with `r format(nrow(intrinsic_resistant), big.mark = ",")` observations and `r ncol(intrinsic_resistant)` variables:
 #' - `microorganism`\cr Name of the microorganism
 #' - `antibiotic`\cr Name of the antibiotic drug
-#' @details The repository of this `AMR` package contains a file comprising this exact data set: <https://github.com/msberends/AMR/blob/master/data-raw/intrinsic_resistant.txt>. This file **allows for machine reading EUCAST guidelines about intrinsic resistance**, which is almost impossible with the Excel and PDF files distributed by EUCAST. The file is updated automatically.
+#' @details The repository of this `AMR` package contains a file comprising this exact data set: <https://github.com/msberends/AMR/blob/main/data-raw/intrinsic_resistant.txt>. This file **allows for machine reading EUCAST guidelines about intrinsic resistance**, which is almost impossible with the Excel and PDF files distributed by EUCAST. The file is updated automatically.
 #' 
 #' This data set is based on `r format_eucast_version_nr(3.2)`.
 #' @inheritSection AMR Reference Data Publicly Available
--- a/R/eucast_rules.R
+++ b/R/eucast_rules.R
@@ -66,7 +66,7 @@ format_eucast_version_nr <- function(version, markdown = TRUE) {
 #' **Note:** This function does not translate MIC values to RSI values. Use [as.rsi()] for that. \cr
 #' **Note:** When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance. \cr
 #'
-#' The file containing all EUCAST rules is located here: <https://github.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv>.  **Note:** Old taxonomic names are replaced with the current taxonomy where applicable. For example, *Ochrobactrum anthropi* was renamed to *Brucella anthropi* in 2020; the original EUCAST rules v3.1 and v3.2 did not yet contain this new taxonomic name. The file used as input for this `AMR` package contains the taxonomy updated until [`r CATALOGUE_OF_LIFE$yearmonth_LPSN`][catalogue_of_life()].
+#' The file containing all EUCAST rules is located here: <https://github.com/msberends/AMR/blob/main/data-raw/eucast_rules.tsv>.  **Note:** Old taxonomic names are replaced with the current taxonomy where applicable. For example, *Ochrobactrum anthropi* was renamed to *Brucella anthropi* in 2020; the original EUCAST rules v3.1 and v3.2 did not yet contain this new taxonomic name. The file used as input for this `AMR` package contains the taxonomy updated until [`r CATALOGUE_OF_LIFE$yearmonth_LPSN`][catalogue_of_life()].
 #' 
 #' ## Custom Rules
 #' 
--- a/R/globals.R
+++ b/R/globals.R
@@ -54,7 +54,7 @@ CATALOGUE_OF_LIFE <- list(
  version = "Catalogue of Life: {year} Annual Checklist",
  url_CoL = "http://www.catalogueoflife.org/col/",
  url_LPSN = "https://lpsn.dsmz.de",
-  yearmonth_LPSN = "March 2021"
+  yearmonth_LPSN = "5 October 2021"
 )

 globalVariables(c(".rowid",
--- a/R/mo.R
+++ b/R/mo.R
@@ -201,7 +201,8 @@ as.mo <- function(x,
               & isFALSE(Becker)
               & isFALSE(Lancefield), error = function(e) FALSE)) {
    # to improve speed, special case for taxonomically correct full names (case-insensitive)
-    return(MO_lookup[match(gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE), MO_lookup$fullname_lower), "mo", drop = TRUE])
+    return(set_clean_class(MO_lookup[match(gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE), MO_lookup$fullname_lower), "mo", drop = TRUE],
+                           new_class = c("mo", "character")))
  }

  if (!is.null(reference_df)
@@ -233,7 +234,7 @@ as.mo <- function(x,
                     info = info,
                     ...)
  }
-
+  
  set_clean_class(y,
                  new_class = c("mo", "character"))
 }
@@ -1499,20 +1500,23 @@ exec_as.mo <- function(x,
    # - Becker et al. 2014, PMID 25278577
    # - Becker et al. 2019, PMID 30872103
    # - Becker et al. 2020, PMID 32056452
-    post_Becker <- character(0) # 2020-10-20 currently all are mentioned in above papers (otherwise uncomment the section below)
+    post_Becker <- c("caledonicus", "canis", "durrellii", "lloydii", "roterodami")
    
    # nolint start
-    # if (any(x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property])) {
-    #   warning_("Becker ", font_italic("et al."), " (2014, 2019) does not contain these species named after their publication: ",
-    #            font_italic(paste("S.",
-    #                              sort(mo_species(unique(x[x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property]]))),
-    #                              collapse = ", ")),
-    #            ".",
-    #            call = FALSE,
-    #            immediate = TRUE)
-    # }
+    # comment out the code below once all staphylococcal species are categorised as CoNS/CoPS in the Becker et al. papers
+    if (any(x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property])) {
+      if (message_not_thrown_before("as.mo_becker")) {
+        warning_("Becker ", font_italic("et al."), " (2014, 2019, 2020) do not contain these species, which were named after their publication: ",
+                 font_italic(paste("S.",
+                                   sort(mo_species(unique(x[x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property]]))),
+                                   collapse = ", ")),
+                 ". Categorisation as CoNS/CoPS was taken from the original scientific publication(s).",
+                 call = FALSE,
+                 immediate = TRUE)
+      }
+    }
    # nolint end
-
+    
    # 'MO_CONS' and 'MO_COPS' are <mo> vectors created in R/zzz.R
    CoNS <- MO_lookup[which(MO_lookup$mo %in% MO_CONS), property, drop = TRUE]
    x[x %in% CoNS] <- lookup(mo == "B_STPHY_CONS", uncertainty = -1)
@@ -1916,7 +1920,7 @@ print.mo_uncertainties <- function(x, ...) {
    txt <- paste(txt,
                 paste0(
                   strwrap(
-                     paste0(font_red('"', x[i, ]$input, '"', collapse = ""),
+                     paste0('"', x[i, ]$input, '"',
                            " -> ",
                            paste0(font_bold(font_italic(x[i, ]$fullname)),
                                   ifelse(!is.na(x[i, ]$renamed_to), paste(", renamed to", font_italic(x[i, ]$renamed_to)), ""),
@@ -2047,6 +2051,8 @@ parse_and_convert <- function(x) {
 }

 replace_old_mo_codes <- function(x, property) {
+  # this function transforms old MO codes to current codes, such as:
+  # B_ESCH_COL (AMR v0.5.0) -> B_ESCHR_COLI
  ind <- x %like_case% "^[A-Z]_[A-Z_]+$" & !x %in% MO_lookup$mo
  if (any(ind)) {
    # get the ones that match
@@ -2066,6 +2072,12 @@ replace_old_mo_codes <- function(x, property) {
                                                               MO_lookup$fullname_lower %like_case% name]
                                     if (length(results) > 1) {
                                       all_direct_matches <<- FALSE
+                                     } else if (length(results) == 0) {
+                                       # not found, so now search in old taxonomic names
+                                       results <- MO.old_lookup$fullname_new[MO.old_lookup$fullname_lower %like% name]
+                                       if (length(results) > 0) {
+                                         results <- MO_lookup$mo[match(results, MO_lookup$fullname)]
+                                       }
                                     }
                                     results[1L]
                                   }), use.names = FALSE)
@@ -2073,6 +2085,8 @@ replace_old_mo_codes <- function(x, property) {
    # assign on places where a match was found
    x[ind] <- solved
    n_matched <- length(affected[!is.na(affected)])
+    n_solved <- length(affected[!is.na(solved)])
+    n_unsolved <- length(affected[is.na(solved)])
    n_unique <- length(affected_unique[!is.na(affected_unique)])
    if (n_unique < n_matched) {
      n_unique <- paste0(n_unique, " unique, ")
@@ -2086,12 +2100,17 @@ replace_old_mo_codes <- function(x, property) {
                      "Please update your MO codes with `as.mo()` to increase speed."),
               call = FALSE)
    } else {
-      warning_(paste0(n_matched, " old MO code", ifelse(n_matched == 1, "", "s"), 
-                      " (", n_unique, "from a previous AMR package version) ", 
-                      ifelse(n_matched == 1, "was", "were"), 
+      warning_(paste0("The input contained ", n_matched,
+                      " old MO code", ifelse(n_matched == 1, "", "s"),
+                      " (", n_unique, "from a previous AMR package version). ",
+                      n_solved, " old MO code", ifelse(n_solved == 1, "", "s"), 
+                      ifelse(n_solved == 1, " was", " were"), 
                      ifelse(all_direct_matches, " updated ", font_bold(" guessed ")),
-                      "to ", ifelse(n_matched == 1, "a ", ""), 
-                      "currently used MO code", ifelse(n_matched == 1, "", "s"), "."),
+                      "to ", ifelse(n_solved == 1, "a ", ""), 
+                      "currently used MO code", ifelse(n_solved == 1, "", "s"),
+                      ifelse(n_unsolved > 0,
+                             paste0(" and ", n_unsolved, " old MO code", ifelse(n_unsolved == 1, "", "s"), " could not be updated."),
+                             ".")),
               call = FALSE)
    }
  }
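The fallback added to `replace_old_mo_codes()` above works in two stages. It first runs a pattern search among current full names. Only when that finds nothing does it search old (renamed) names and translate their new name to a current MO code. Below is a minimal base-R sketch of that logic, assuming simplified stand-in tables. `mo_lookup`, `mo_old_lookup` and `lookup_mo_code()` are hypothetical names, not the package's internal `MO_lookup`/`MO.old_lookup` objects. The code `"B_PLACEHOLDER"` is made up.

```r
# Hypothetical, simplified stand-ins for the package's internal lookup tables;
# "B_PLACEHOLDER" is a made-up MO code used for illustration only.
mo_lookup <- data.frame(
  mo       = c("B_ESCHR_COLI", "B_PLACEHOLDER"),
  fullname = c("Escherichia coli", "Brucella anthropi"),
  stringsAsFactors = FALSE
)
mo_old_lookup <- data.frame(
  fullname     = "Ochrobactrum anthropi",
  fullname_new = "Brucella anthropi",
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve one name pattern to a current MO code
lookup_mo_code <- function(name, current, old) {
  # Stage 1: case-insensitive regex search among current full names
  results <- current$mo[grepl(name, current$fullname, ignore.case = TRUE)]
  if (length(results) == 0) {
    # Stage 2: search old names, then map their new name to a current code
    new_names <- old$fullname_new[grepl(name, old$fullname, ignore.case = TRUE)]
    if (length(new_names) > 0) {
      results <- current$mo[match(new_names, current$fullname)]
    }
  }
  # First hit; indexing an empty vector with 1L yields NA (nothing found)
  results[1L]
}

patterns <- c("^escherichia col", "^ochrobactrum anth", "^nonexistent")
solved <- vapply(patterns, lookup_mo_code, character(1),
                 current = mo_lookup, old = mo_old_lookup,
                 USE.NAMES = FALSE)

# Counts analogous to n_solved / n_unsolved in the revised warning
n_solved   <- sum(!is.na(solved))
n_unsolved <- sum(is.na(solved))
message(n_solved, " old MO code", if (n_solved == 1) "" else "s",
        " updated; ", n_unsolved, " could not be updated.")
```

Returning `results[1L]` keeps only the first match when a pattern is ambiguous. The original code handles that case by setting `all_direct_matches` to `FALSE`, so the warning can say the codes were guessed rather than updated.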
--- a/R/rsi.R
+++ b/R/rsi.R
@@ -75,7 +75,7 @@
 #' 
 #' ## Machine-Readable Interpretation Guidelines
 #' 
-#' The repository of this package [contains a machine-readable version](https://github.com/msberends/AMR/blob/master/data-raw/rsi_translation.txt) of all guidelines. This is a CSV file consisting of `r format(nrow(AMR::rsi_translation), big.mark = ",")` rows and `r ncol(AMR::rsi_translation)` columns. This file is machine-readable, since it contains one row for every unique combination of the test method (MIC or disk diffusion), the antimicrobial agent and the microorganism. **This allows for easy implementation of these rules in laboratory information systems (LIS)**. Note that it only contains interpretation guidelines for humans - interpretation guidelines from CLSI for animals were removed.
+#' The repository of this package [contains a machine-readable version](https://github.com/msberends/AMR/blob/main/data-raw/rsi_translation.txt) of all guidelines. This is a CSV file consisting of `r format(nrow(AMR::rsi_translation), big.mark = ",")` rows and `r ncol(AMR::rsi_translation)` columns. This file is machine-readable, since it contains one row for every unique combination of the test method (MIC or disk diffusion), the antimicrobial agent and the microorganism. **This allows for easy implementation of these rules in laboratory information systems (LIS)**. Note that it only contains interpretation guidelines for humans - interpretation guidelines from CLSI for animals were removed.
 #'
 #' ## Other
 #' 
--- a/R/sysdata.rda
+++ b/R/sysdata.rda
--- a/R/translate.R
+++ b/R/translate.R
@@ -27,7 +27,7 @@
 #'
 #' For language-dependent output of AMR functions, like [mo_name()], [mo_gramstain()], [mo_type()] and [ab_name()].
 #' @inheritSection lifecycle Stable Lifecycle
-#' @details Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: <https://github.com/msberends/AMR/blob/master/data-raw/translations.tsv>. This file will be read by all functions where a translated output can be desired, like all [`mo_*`][mo_property()] functions (such as [mo_name()], [mo_gramstain()], [mo_type()], etc.) and [`ab_*`][ab_property()] functions (such as [ab_name()], [ab_group()], etc.). 
+#' @details Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/translations.tsv>. This file will be read by all functions where a translated output can be desired, like all [`mo_*`][mo_property()] functions (such as [mo_name()], [mo_gramstain()], [mo_type()], etc.) and [`ab_*`][ab_property()] functions (such as [ab_name()], [ab_group()], etc.). 
 #'
 #' Currently supported languages are: `r vector_and(gsub(";.*", "", ISOcodes::ISO_639_2[which(ISOcodes::ISO_639_2$Alpha_2 %in% LANGUAGES_SUPPORTED), "Name"]), quotes = FALSE)`. All these languages have translations available for all antimicrobial agents and colloquial microorganism names.
 #'