diff --git a/.Rbuildignore b/.Rbuildignore index 23bb08a6..7e567125 100755 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -21,3 +21,4 @@ ^Meta$ ^pkgdown$ ^public$ +^reproduction.*R$ diff --git a/DESCRIPTION b/DESCRIPTION index a2047d31..f7d59cd2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR Version: 0.5.0.9018 -Date: 2019-02-18 +Date: 2019-02-20 Title: Antimicrobial Resistance Analysis Authors@R: c( person( @@ -56,6 +56,7 @@ Suggests: covr (>= 3.0.1), curl, ggplot2, + microbenchmark, readxl, rmarkdown, rstudioapi, diff --git a/NAMESPACE b/NAMESPACE index c87056bd..23c2c8f5 100755 --- a/NAMESPACE +++ b/NAMESPACE @@ -68,6 +68,7 @@ export(atc_trivial_nl) export(atc_umcg) export(availability) export(brmo) +export(catalogue_of_life_version) export(count_I) export(count_IR) export(count_R) @@ -123,11 +124,11 @@ export(mo_ref) export(mo_renamed) export(mo_shortname) export(mo_species) -export(mo_subkingdom) export(mo_subspecies) export(mo_taxonomy) export(mo_type) export(mo_uncertainties) +export(mo_url) export(mo_year) export(mrgn) export(n_rsi) diff --git a/NEWS.md b/NEWS.md index 73952f58..6d347bb1 100755 --- a/NEWS.md +++ b/NEWS.md @@ -10,12 +10,12 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git #### New * **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo()` to identify an MO code. -* Catalogue of Life (CoL) inclusion for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains: - * Almost 60,000 species from six different kingdoms - * Almost 15,000 previously accepted names which are now taxonomic 'synonyms' - * All (sub)species from the kingdoms Archaea, Bacteria, Chromista, Protozoa and Viruses - * All (sub)species from the orders Eurotiales, Saccharomycetales and Onygenales of the kingdom Fungi. The complete taxonomy of this kingdom has more than 130,000 species. 
The orders we included contains at least all memebers of the families *Candida*, *Aspergillus* and *Trichophyton*. - * Due to this change, the ID of *Streptococcus* was changed from `B_STRPTC` to `B_STRPT`. +* Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains: + * All ~55,000 species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses + * All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera *Aspergillus*, *Candida*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*). + * All ~15,000 previously accepted names of species that have been taxonomically renamed + * The responsible author(s) and year of scientific publication + * Due to this change, some `mo` codes changed (e.g. *Streptococcus* changed from `B_STRPTC` to `B_STRPT`). A translation table is used internally to support older microorganism IDs, so users will not notice this difference. * Support for data from [WHONET](https://whonet.org/) and [EARS-Net](https://ecdc.europa.eu/en/about-us/partnerships-and-networks/disease-and-laboratory-networks/ears-net) (European Antimicrobial Resistance Surveillance Network): * Exported files from WHONET can be read and used in this package. For functions like `first_isolate()` and `eucast_rules()`, all parameters will be filled in automatically. * This package now knows all antibiotic abbrevations by EARS-Net (which are also being used by WHONET) - the `antibiotics` data set now contains a column `ears_net`. 
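Note on the NEWS entry above: the internal translation of older `mo` codes can be pictured with the minimal sketch below. It assumes a named character vector as the lookup table and the same prefix pattern that this change uses in `exec_as.mo()`. The helper name `translate_legacy_mo`, the shortened table `legacy_codes` and the example inputs are hypothetical and only serve as illustration; they are not part of the package.

```r
# Hypothetical, shortened lookup table (assumption for illustration):
# names are old v0.5.0 prefixes, values are the new prefixes.
legacy_codes <- c(B_STRPTC = "B_STRPT",
                  B_PDMNS  = "B_PSDMN",
                  B_CTRDM  = "B_CLSTR")

# Hypothetical helper, not an exported AMR function
translate_legacy_mo <- function(x, lookup = legacy_codes) {
  pattern <- "^([BFP]_[A-Z]{3,7})(.*)$"
  matched <- grepl(pattern, x)
  # split each code into its leading part and the remainder (e.g. "_PYO")
  left  <- sub(pattern, "\\1", x)
  right <- sub(pattern, "\\2", x)
  # look up the new prefix; unknown prefixes yield NA and are left untouched
  new_left <- unname(lookup[left])
  replace <- matched & !is.na(new_left)
  x[replace] <- paste0(new_left[replace], right[replace])
  x
}

translate_legacy_mo(c("B_STRPTC_PYO", "B_ESCHR_COL", "B_PDMNS"))
# [1] "B_STRPT_PYO" "B_ESCHR_COL" "B_PSDMN"
```

Only the leading part of a code is translated, so any species suffix is carried over unchanged and codes without an entry in the table pass through as they are.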
diff --git a/R/catalogue_of_life.R b/R/catalogue_of_life.R new file mode 100755 index 00000000..25c41ba5 --- /dev/null +++ b/R/catalogue_of_life.R @@ -0,0 +1,73 @@ +# ==================================================================== # +# TITLE # +# Antimicrobial Resistance (AMR) Analysis # +# # +# SOURCE # +# https://gitlab.com/msberends/AMR # +# # +# LICENCE # +# (c) 2019 Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl) # +# # +# This R package is free software; you can freely use and distribute # +# it for both personal and commercial purposes under the terms of the # +# GNU General Public License version 2.0 (GNU GPL-2), as published by # +# the Free Software Foundation. # +# # +# This R package was created for academic research and was publicly # +# released in the hope that it will be useful, but it comes WITHOUT # +# ANY WARRANTY OR LIABILITY. # +# Visit our website for more info: https://msberends.gitlab.io/AMR. # +# ==================================================================== # + +#' The Catalogue of Life +#' +#' This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life. +#' @section Catalogue of Life: +#' \if{html}{\figure{logo_col.png}{options: height=60px style=margin-bottom:5px} \cr} +#' This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (\url{http://www.catalogueoflife.org}). This data is updated annually - check the included version with \code{\link{catalogue_of_life_version}}. +#' +#' Included are: +#' \itemize{ +#' \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} +#' \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. 
The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} +#' \item{All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed} +#' \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} +#' \item{The responsible author(s) and year of scientific publication} +#' } +#' +#' The Catalogue of Life (\url{http://www.catalogueoflife.org}) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation. +#' +#' The syntax used to transform the original data to a cleansed R format can be found here: \url{https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R}. +#' @inheritSection AMR Read more on our website! 
+#' @name catalogue_of_life +#' @rdname catalogue_of_life +#' @examples +#' # Get version info of included data set +#' catalogue_of_life_version() +#' +#' +#' # Get a note when a species was renamed +#' mo_shortname("Chlamydia psittaci") +#' # Note: 'Chlamydia psittaci' (Page, 1968) was renamed +#' # 'Chlamydophila psittaci' (Everett et al., 1999) +#' # [1] "C. psittaci" +#' +#' # Get any property from the entire taxonomic tree for all included species +#' mo_class("E. coli") +#' # [1] "Gammaproteobacteria" +#' +#' mo_family("E. coli") +#' # [1] "Enterobacteriaceae" +#' +#' mo_gramstain("E. coli") # based on kingdom and phylum, see ?mo_gramstain +#' # [1] "Gram negative" +#' +#' mo_ref("E. coli") +#' # [1] "Castellani et al., 1919" +#' +#' # Do not get mistaken - the package only includes microorganisms +#' mo_phylum("C. elegans") +#' # [1] "Cyanobacteria" # Bacteria?! +#' mo_fullname("C. elegans") +#' # [1] "Chroococcus limneticus elegans" # Because a microorganism was found +NULL diff --git a/R/data.R b/R/data.R index 3dac1489..85e4c27d 100755 --- a/R/data.R +++ b/R/data.R @@ -133,26 +133,25 @@ #' Data set with ~60,000 microorganisms #' #' A data set containing the microbial taxonomy of six kingdoms from the Catalogue of Life. MO codes can be looked up using \code{\link{as.mo}}. 
-#' @inheritSection ITIS ITIS -#' @format A \code{\link{data.frame}} with 56,659 observations and 15 variables: +#' @inheritSection catalogue_of_life Catalogue of Life +#' @format A \code{\link{data.frame}} with 56,672 observations and 14 variables: #' \describe{ -#' \item{\code{mo}}{ID of microorganism} +#' \item{\code{mo}}{ID of microorganism as used by this package} #' \item{\code{col_id}}{Catalogue of Life ID} -#' \item{\code{genus}}{Taxonomic genus of the microorganism as found in ITIS, see Source} -#' \item{\code{species}}{Taxonomic species of the microorganism as found in ITIS, see Source} -#' \item{\code{subspecies}}{Taxonomic subspecies of the microorganism as found in ITIS, see Source} #' \item{\code{fullname}}{Full name, like \code{"Echerichia coli"}} -#' \item{\code{family}}{Taxonomic family of the microorganism as found in ITIS, see Source} -#' \item{\code{order}}{Taxonomic order of the microorganism as found in ITIS, see Source} -#' \item{\code{class}}{Taxonomic class of the microorganism as found in ITIS, see Source} -#' \item{\code{phylum}}{Taxonomic phylum of the microorganism as found in ITIS, see Source} -#' \item{\code{subkingdom}}{Taxonomic subkingdom of the microorganism as found in ITIS, see Source} -#' \item{\code{kingdom}}{Taxonomic kingdom of the microorganism as found in ITIS, see Source} -#' \item{\code{gramstain}}{Gram of microorganism, like \code{"Gram negative"}} -#' \item{\code{prevalence}}{An integer based on estimated prevalence of the microorganism in humans. Used internally by \code{\link{as.mo}}, otherwise quite meaningless. 
It has a value of 25 for manually added items and a value of 1000 for all unprevalent microorganisms whose genus was somewhere in the top 250 (with another species).} -#' \item{\code{ref}}{Author(s) and year of concerning publication as found in ITIS, see Source} +#' \item{\code{kingdom}}{Taxonomic kingdom of the microorganism} +#' \item{\code{phylum}}{Taxonomic phylum of the microorganism} +#' \item{\code{class}}{Taxonomic class of the microorganism} +#' \item{\code{order}}{Taxonomic order of the microorganism} +#' \item{\code{family}}{Taxonomic family of the microorganism} +#' \item{\code{genus}}{Taxonomic genus of the microorganism} +#' \item{\code{species}}{Taxonomic species of the microorganism} +#' \item{\code{subspecies}}{Taxonomic subspecies of the microorganism} +#' \item{\code{rank}}{Taxonomic rank of the microorganism, like \code{"species"} or \code{"genus"}} +#' \item{\code{ref}}{Author(s) and year of concerning scientific publication} +#' \item{\code{species_id}}{ID of the species as used by the Catalogue of Life} #' } -#' @source Integrated Taxonomic Information System (ITIS) public online database, \url{https://www.itis.gov}. +#' @source Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. #' @details Manually added were: #' \itemize{ #' \item{9 species of \emph{Streptococcus} (beta haemolytic groups A, B, C, D, F, G, H, K and unspecified)} @@ -160,21 +159,37 @@ #' \item{2 other undefined (unknown Gram negatives and unknown Gram positives)} #' } #' @inheritSection AMR Read more on our website! 
-#' @seealso \code{\link{as.mo}} \code{\link{mo_property}} \code{\link{microorganisms.codes}} +#' @seealso \code{\link{as.mo}}, \code{\link{mo_property}}, \code{\link{microorganisms.codes}} "microorganisms" +catalogue_of_life <- list( + version = "Catalogue of Life: 2018 Annual Checklist", + url = "http://www.catalogueoflife.org/annual-checklist/2018" +) + +#' Version info of included Catalogue of Life +#' @seealso \code{\link{microorganisms}} +#' @inheritSection catalogue_of_life Catalogue of Life +#' @export +catalogue_of_life_version <- function() { + list(version = catalogue_of_life$version, + url = catalogue_of_life$url, + no_of_species = nrow(AMR::microorganisms), + no_of_synonyms = nrow(AMR::microorganisms.old)) +} + #' Data set with previously accepted taxonomic names #' -#' A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by \code{\link{as.mo}}. -#' @inheritSection as.mo ITIS +#' A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by \code{\link{as.mo}}. +#' @inheritSection catalogue_of_life Catalogue of Life #' @format A \code{\link{data.frame}} with 14,506 observations and 4 variables: #' \describe{ #' \item{\code{col_id}}{Catalogue of Life ID} #' \item{\code{tsn_new}}{New Catalogue of Life ID} -#' \item{\code{fullname}}{Old taxonomic name of the microorganism as found in the CoL, see Source} -#' \item{\code{ref}}{Author(s) and year of concerning publication as found in the CoL, see Source} +#' \item{\code{fullname}}{Old taxonomic name of the microorganism} +#' \item{\code{ref}}{Author(s) and year of concerning scientific publication} #' } -#' @source [3] Integrated Taxonomic Information System (ITIS) on-line database, \url{https://www.itis.gov}. +#' @source [3] Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. 
#' @inheritSection AMR Read more on our website! #' @seealso \code{\link{as.mo}} \code{\link{mo_property}} \code{\link{microorganisms}} "microorganisms.old" @@ -187,7 +202,7 @@ #' \item{\code{certe}}{Commonly used code of a microorganism} #' \item{\code{mo}}{ID of the microorganism in the \code{\link{microorganisms}} data set} #' } -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life #' @inheritSection AMR Read more on our website! #' @seealso \code{\link{as.mo}} \code{\link{microorganisms}} "microorganisms.codes" diff --git a/R/globals.R b/R/globals.R index b868fea1..caf62176 100755 --- a/R/globals.R +++ b/R/globals.R @@ -29,6 +29,7 @@ globalVariables(c(".", "Becker", "certe", "cnt", + "col_id", "count", "count.x", "count.y", @@ -49,6 +50,7 @@ globalVariables(c(".", "key_ab", "key_ab_lag", "key_ab_other", + "kingdom", "labs", "Lancefield", "Last name", @@ -73,7 +75,9 @@ globalVariables(c(".", "other_pat_or_mo", "Pasted", "patient_id", + "phylum", "prevalence", + "prevalent", "psae", "R", "real_first_isolate", @@ -85,6 +89,7 @@ globalVariables(c(".", "Sex", "shortname", "species", + "superprevalent", "trade_name", "transmute", "tsn", diff --git a/R/itis.R b/R/itis.R deleted file mode 100755 index 0b5b812e..00000000 --- a/R/itis.R +++ /dev/null @@ -1,63 +0,0 @@ -# ==================================================================== # -# TITLE # -# Antimicrobial Resistance (AMR) Analysis # -# # -# SOURCE # -# https://gitlab.com/msberends/AMR # -# # -# LICENCE # -# (c) 2019 Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl) # -# # -# This R package is free software; you can freely use and distribute # -# it for both personal and commercial purposes under the terms of the # -# GNU General Public License version 2.0 (GNU GPL-2), as published by # -# the Free Software Foundation. 
# -# # -# This R package was created for academic research and was publicly # -# released in the hope that it will be useful, but it comes WITHOUT # -# ANY WARRANTY OR LIABILITY. # -# Visit our website for more info: https://msberends.gitab.io/AMR. # -# ==================================================================== # - -#' ITIS: Integrated Taxonomic Information System -#' -#' All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS). -#' @section ITIS: -#' \if{html}{\figure{logo_itis.jpg}{options: height=60px style=margin-bottom:5px} \cr} -#' This package contains the \strong{complete microbial taxonomic data} (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, \url{https://www.itis.gov}). -#' -#' All ~20,000 (sub)species from \strong{the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package}, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria. -#' -#' ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3]. -#' @inheritSection AMR Read more on our website! -#' @name ITIS -#' @rdname ITIS -#' @examples -#' # Get a note when a species was renamed -#' mo_shortname("Chlamydia psittaci") -#' # Note: 'Chlamydia psittaci' (Page, 1968) was renamed -#' # 'Chlamydophila psittaci' (Everett et al., 1999) -#' # [1] "C. 
psittaci" -#' -#' # Get any property from the entire taxonomic tree for all included species -#' mo_class("E. coli") -#' # [1] "Gammaproteobacteria" -#' -#' mo_family("E. coli") -#' # [1] "Enterobacteriaceae" -#' -#' mo_subkingdom("E. coli") -#' # [1] "Negibacteria" -#' -#' mo_gramstain("E. coli") # based on subkingdom -#' # [1] "Gram negative" -#' -#' mo_ref("E. coli") -#' # [1] "Castellani and Chalmers, 1919" -#' -#' # Do not get mistaken - the package only includes microorganisms -#' mo_phylum("C. elegans") -#' # [1] "Cyanobacteria" # Bacteria?! -#' mo_fullname("C. elegans") -#' # [1] "Chroococcus limneticus elegans" # Because a microorganism was found -NULL diff --git a/R/mo.R b/R/mo.R index 85b2dfc1..e2550ed8 100755 --- a/R/mo.R +++ b/R/mo.R @@ -77,7 +77,7 @@ #' \item{It strips off values between brackets and the brackets itself, and re-evaluates the input with all previous rules} #' \item{It strips off words from the end one by one and re-evaluates the input with all previous rules} #' \item{It strips off words from the start one by one and re-evaluates the input with all previous rules} -#' \item{It tries to look for some manual changes which are not yet published to the ITIS database (like \emph{Propionibacterium} not yet being \emph{Cutibacterium})} +#' \item{It tries to look for some manual changes which are not yet published to the Catalogue of Life (like \emph{Propionibacterium} not yet being \emph{Cutibacterium})} #' } #' #' Examples: @@ -94,17 +94,17 @@ #' #' Use \code{mo_renamed()} to get a vector with all values that could be coerced based on an old, previously accepted taxonomic name. #' -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life # (source as a section, so it can be inherited by other man pages) #' @section Source: #' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. 
\url{https://dx.doi.org/10.1128/CMR.00109-13} #' #' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571} #' -#' [3] Integrated Taxonomic Information System (ITIS). Retrieved September 2018. \url{http://www.itis.gov} +#' [3] Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. #' @export #' @return Character (vector) with class \code{"mo"}. Unknown values will return \code{NA}. -#' @seealso \code{\link{microorganisms}} for the \code{data.frame} with ITIS content that is being used to determine ID's. \cr +#' @seealso \code{\link{microorganisms}} for the \code{data.frame} that is being used to determine ID's. \cr #' The \code{\link{mo_property}} functions (like \code{\link{mo_genus}}, \code{\link{mo_gramstain}}) to get properties based on the returned code. #' @inheritSection AMR Read more on our website! #' @examples @@ -216,15 +216,15 @@ exec_as.mo <- function(x, Becker = FALSE, Lancefield = FALSE, x <- x[!is.na(x) & !is.null(x) & !identical(x, "")] - # conversion v0.5.0 to v0.6.0, remove for v0.7.0 - x <- gsub("B_STRPTC", "B_STRPT", x) - x <- gsub("B_STRPT_EQUI", "B_STRPT_EQU", x) - x <- gsub("B_PDMNS", "B_PSDMN", x) - x <- gsub("B_CTRDM", "B_CLSTR", x) - x <- gsub("F_CANDD_GLB", "F_CANDD_GLA", x) - x <- gsub("F_CANDD_LUS", "F_CANDD", x) - x <- gsub("B_FCTRM", "B_FSBCT", x) - + # conversion of old MO codes from v0.5.0 (ITIS) to later versions (Catalogue of Life) + if (any(x %like% "^[BFP]_[A-Z]{3,7}")) { + leftpart <- gsub("^([BFP]_[A-Z]{3,7}).*", "\\1", x) + if (any(leftpart %in% names(mo_codes_v0.5.0))) { + rightpart <- gsub("^[BFP]_[A-Z]{3,7}(.*)", "\\1", x) + leftpart <- mo_codes_v0.5.0[leftpart] + x[!is.na(leftpart)] <- paste0(leftpart[!is.na(leftpart)], rightpart[!is.na(leftpart)]) + } + } # defined df to check for if (!is.null(reference_df)) { diff --git a/R/mo_property.R 
b/R/mo_property.R index 9443687e..82b409c3 100755 --- a/R/mo_property.R +++ b/R/mo_property.R @@ -26,14 +26,18 @@ #' @param property one of the column names of one of the \code{\link{microorganisms}} data set or \code{"shortname"} #' @param language language of the returned text, defaults to system language (see \code{\link{get_locale}}) and can also be set with \code{\link{getOption}("AMR_locale")}. Use \code{language = NULL} or \code{language = ""} to prevent translation. #' @param ... other parameters passed on to \code{\link{as.mo}} -#' @details All functions will return the most recently known taxonomic property according to ITIS, except for \code{mo_ref}, \code{mo_authors} and \code{mo_year}. This leads to the following results: +#' @details All functions will return the most recently known taxonomic property according to the Catalogue of Life, except for \code{mo_ref}, \code{mo_authors} and \code{mo_year}. This leads to the following results: #' \itemize{ #' \item{\code{mo_fullname("Chlamydia psittaci")} will return \code{"Chlamydophila psittaci"} (with a warning about the renaming)} #' \item{\code{mo_ref("Chlamydia psittaci")} will return \code{"Page, 1968"} (with a warning about the renaming)} #' \item{\code{mo_ref("Chlamydophila psittaci")} will return \code{"Everett et al., 1999"} (without a warning)} #' } +#' +#' The Gram stain - \code{mo_gramstain()} - will be determined based on the taxonomic kingdom and phylum. According to Cavalier-Smith (2002), who defined subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes (ref: \url{https://itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=956097}). These bacteria are considered Gram positive - all other bacteria are considered Gram negative. Species outside the kingdom of Bacteria will return a value \code{NA}. +#' +#' The function \code{mo_url()} will return the direct URL to the species in the Catalogue of Life. 
#' @inheritSection get_locale Supported languages -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life #' @inheritSection as.mo Source #' @rdname mo_property #' @name mo_property @@ -49,14 +53,13 @@ #' # All properties of Escherichia coli #' ## taxonomic properties #' mo_kingdom("E. coli") # "Bacteria" -#' mo_subkingdom("E. coli") # "Negibacteria" #' mo_phylum("E. coli") # "Proteobacteria" #' mo_class("E. coli") # "Gammaproteobacteria" #' mo_order("E. coli") # "Enterobacteriales" #' mo_family("E. coli") # "Enterobacteriaceae" #' mo_genus("E. coli") # "Escherichia" #' mo_species("E. coli") # "coli" -#' mo_subspecies("E. coli") # NA +#' mo_subspecies("E. coli") # "" #' #' ## colloquial properties #' mo_fullname("E. coli") # "Escherichia coli" @@ -220,12 +223,6 @@ mo_phylum <- function(x, ...) { mo_validate(x = x, property = "phylum", ...) } -#' @rdname mo_property -#' @export -mo_subkingdom <- function(x, ...) { - mo_validate(x = x, property = "subkingdom", ...) -} - #' @rdname mo_property #' @export mo_kingdom <- function(x, ...) { @@ -290,6 +287,15 @@ mo_taxonomy <- function(x, ...) { subspecies = mo_subspecies(x)) } +#' @rdname mo_property +#' @export +mo_url <- function(x, ...) { + u <- mo_validate(x = x, property = "species_id", ...) 
+ u[u != ""] <- paste0(catalogue_of_life$url, "/details/species/id/", u[u != ""]) + u +} + + #' @rdname mo_property #' @importFrom data.table data.table as.data.table setkey #' @export diff --git a/R/zzz.R b/R/zzz.R index 7dcfa05b..4e3c2bef 100755 --- a/R/zzz.R +++ b/R/zzz.R @@ -128,6 +128,198 @@ NULL value = microorganisms.oldDT, envir = asNamespace("AMR")) + # conversion of old MO codes from v0.5.0 (ITIS) to later versions (Catalogue of Life) + mo_codes_v0.5.0 <- c(B_ACHRMB = "B_ACHRM", B_ANNMA = "B_ACTNS", B_ACLLS = "B_ALCYC", + B_AHNGM = "B_ARCHN", B_ARMTM = "B_ARMTMN", B_ARTHRS = "B_ARTHR", + B_APHLS = "B_AZRHZP", B_BRCHA = "B_BRCHY", B_BCTRM = "B_BRVBCT", + B_CLRBCT = "B_CLRBC", B_CTRDM = "B_CLSTR", B_CPRMM = "B_CYLND", + B_DLCLN = "B_DPLCL", B_DMCLM = "B_DSLFT", B_DSLFVB = "B_DSLFV", + B_FCTRM = "B_FSBCT", B_GNRLA = "B_GRDNR", B_HNRBM = "B_HLNRB", + B_HPHGA = "B_HNPHGA", B_HCCCS = "B_HYDRC", B_MCRCLS = "B_MCRCL", + B_MTHYLS = "B_MLSMA", B_MARCLS = "B_MRCLS", B_MGCLS = "B_MSTGC", + B_MCLLA = "B_MTHYLC", B_MYCPLS = "B_MYCPL", B_NBCTR = "B_NTRBC", + B_OCLLS = "B_OCNBC", B_PTHRX = "B_PLNKT", B_PCCCS = "B_PRCHL", + B_PSPHN = "B_PRPHY", B_PDMNS = "B_PSDMN", B_SCCHRP = "B_SCCHR", + B_SRBCTR = "B_SHRBCTR", B_STRPTC = "B_STRPT", B_SHMNS = "B_SYNTR", + B_TRBCTR = "B_THRMN", P_ALBMN = "C_ABMNA", F_ACHLY = "C_ACHLY", + P_ACINT = "C_ACINT", P_ARTCL = "C_ACLNA", P_ACRVL = "C_ACRVL", + P_ADRCT = "C_ADRCT", P_AMPHS = "C_AHSRS", F_ALBUG = "C_ALBUG", + P_ALCNT = "C_ALCNT", P_ALFRD = "C_ALFRD", P_ALLGR = "C_ALLGR", + P_AMPHL = "C_ALPTS", F_ALTHR = "C_ALTHR", P_AMLLA = "C_AMLLA", + P_ANMLN = "C_AMLNA", P_AMMBC = "C_AMMBC", P_AMMDS = "C_AMMDS", + P_AMMLG = "C_AMMLG", P_AMMMR = "C_AMMMR", P_AMMMS = "C_AMMMS", + P_AMMON = "C_AMMON", P_AMMSC = "C_AMMSC", P_AMMSP = "C_AMMSP", + P_AMMST = "C_AMMST", P_AMMTM = "C_AMMTM", F_AMYCS = "C_AMYCS", + P_ANARM = "C_ANARM", P_ANGLD = "C_ANGLD", P_ANGLG = "C_ANGLG", + P_ANNLC = "C_ANNLC", F_ANSLP = "C_ANSLP", F_APDCH = "C_APDCH", + F_APHND = "C_APHND", 
F_APLNC = "C_APLNC", F_AQLND = "C_AQLND", + P_ARCHS = "C_ARCHAS", P_ASTRN = "C_ARNNN", P_ARNPR = "C_ARNPR", + F_ARSPR = "C_ARSPR", P_ARTST = "C_ARTSTR", P_AMPHC = "C_ARYNA", + P_ASCHM = "C_ASCHM", P_ASPDS = "C_ASPDS", P_ASTCL = "C_ASTCL", + P_ASTRG = "C_ASTRGR", P_ASTRM = "C_ASTRMM", P_ASTRR = "C_ASTRR", + P_ASTRT = "C_ASTRTR", F_ATKNS = "C_ATKNS", F_AYLLA = "C_AYLLA", + P_BAGGN = "C_BAGGN", P_BCCLL = "C_BCCLL", P_BDLLD = "C_BDLLD", + P_BGNRN = "C_BGNRN", P_BLCLN = "C_BLCLN", P_BLMND = "C_BLMND", + P_BLMNL = "C_BLMNL", P_BLPHR = "C_BLPHR", P_BLVNT = "C_BLVNT", + P_BOLVN = "C_BOLVN", P_BORLS = "C_BORLS", P_BRNNM = "C_BRNNM", + P_BRSLN = "C_BRSLN", P_BRSRD = "C_BRSRD", F_BRVLG = "C_BRVLG", + F_BNLLA = "C_BRVLGN", P_BSCCM = "C_BSCCM", F_BSDPH = "C_BSDPH", + P_BTHYS = "C_BTHYS", P_BTLLN = "C_BTLLN", P_BULMN = "C_BULMN", + P_CCLDM = "C_CCLDM", P_CDNLL = "C_CDNLL", P_CLPSS = "C_CDNLLP", + P_CHLDN = "C_CHLDNL", P_CHLST = "C_CHLST", P_CHNLM = "C_CHNLM", + P_CHRYS = "C_CHRYSL", P_CHTSP = "C_CHTSP", P_CBCDS = "C_CIBCDS", + P_CLCRN = "C_CLCRN", P_CLMNA = "C_CLMNA", P_CLPDM = "C_CLPDM", + P_CLPHR = "C_CLPHRY", P_CLVLN = "C_CLVLN", P_CMPNL = "C_CMPNL", + P_CNCRS = "C_CNCRS", P_CNTCH = "C_CNTCH", F_CNTRM = "C_CNTRMY", + P_COLPD = "C_COLPD", P_COLPS = "C_COLPS", P_CPRDS = "C_CPRDS", + P_CRNSP = "C_CPRMA", P_CRBNL = "C_CRBNL", P_CRBRB = "C_CRBRB", + P_CRBRG = "C_CRBRG", P_CRBRS = "C_CRBRS", P_CRCHS = "C_CRCHS", + P_CRCLC = "C_CRCLC", P_CRNLC = "C_CRNLC", P_CRNTH = "C_CRNTH", + P_CRPNT = "C_CRPNT", P_CRSTG = "C_CRSTG", P_CRTHN = "C_CRTHN", + P_CRTRN = "C_CRTRN", P_CYMBL = "C_CRTTA", P_CRYPT = "C_CRYPT", + P_CSHMN = "C_CSHMNL", P_CSSDL = "C_CSSDL", P_CLNDS = "C_CSSDLN", + P_CHRNA = "C_CTHRN", P_CTPSS = "C_CTPSS", P_CUNLN = "C_CUNLN", + P_CYLND = "C_CVLNA", P_CYCLC = "C_CYCLCB", P_CDNTA = "C_CYCLD", + P_CYCLG = "C_CYCLG", P_CYCLM = "C_CYCLM", P_CYRTL = "C_CYRTL", + P_CYSTM = "C_CYSTM", P_DCHLM = "C_DCHLM", P_DCRBS = "C_DCRBS", + P_DCTYC = "C_DCTYC", P_DIDNM = "C_DIDNM", P_DLPTS = 
"C_DLPTS", + P_DNTLN = "C_DNTLN", P_DNTST = "C_DNTST", P_DORTH = "C_DORTH", + P_DCTYP = "C_DPHMS", F_DPLCY = "C_DPLCY", P_DNDRT = "C_DRTNA", + P_DSCMM = "C_DSCMM", P_DSCRB = "C_DSCRB", P_DSCRN = "C_DSCRN", + P_DSCSP = "C_DSCSP", P_DSNBR = "C_DSNBR", P_DYCBC = "C_DYCBC", + F_DCTYC = "C_DYCHS", F_ECTRG = "C_ECTRG", B_EDWRD = "C_EDWRD", + P_EGGRL = "C_EGGRL", P_EHLYS = "C_EHLYS", P_EHRNB = "C_EHRNB", + P_ELPHD = "C_ELPHD", P_ENCHL = "C_ELYDM", P_EPHDM = "C_EPHDM", + P_EPLTS = "C_EPLTS", P_EPLXL = "C_EPLXL", P_EPNDL = "C_EPNDL", + P_EPNDS = "C_EPNDS", P_ENLLA = "C_EPSTM", P_EPSTY = "C_EPSTY", + F_ERYCH = "C_ERYCH", F_ESMDM = "C_ESMDM", P_ESSYR = "C_ESSYR", + P_FSCHR = "C_FHRNA", P_FLRLS = "C_FLRLS", P_FLNTN = "C_FNTNA", + P_FRNDC = "C_FRNDC", P_FRNTN = "C_FRNTN", P_FRSNK = "C_FRSNK", + P_FNLLA = "C_FSCHRN", P_FSSRN = "C_FSSRN", P_FVCSS = "C_FVCSS", + P_GDRYN = "C_GDRYN", F_GELGN = "C_GELGN", P_GERDA = "C_GERDA", + P_GLACM = "C_GLACM", P_GLBBL = "C_GLBBL", P_GLBGR = "C_GLBGR", + P_GLBLN = "C_GLBLN", P_GRTLA = "C_GLBRT", P_GLBTX = "C_GLBTX", + P_GLLNA = "C_GLLNA", P_GLMSP = "C_GLMSP", P_GLNDL = "C_GLNDL", + F_GNMCH = "C_GNMCH", P_GOSLL = "C_GOSLL", P_GRNDS = "C_GRNDS", + P_GRNTA = "C_GRNTA", P_GLBRT = "C_GTLLA", P_GTTLN = "C_GTTLN", + P_GVLNP = "C_GVLNP", P_GYPSN = "C_GYPSN", P_GYRDN = "C_GYRDN", + P_HALTR = "C_HALTR", P_HANZW = "C_HANZW", P_HAURN = "C_HAURN", + P_HELNN = "C_HELNN", P_HLPHR = "C_HHRYA", P_HLNTA = "C_HLNTA", + F_HLPHT = "C_HLPHT", P_HLSTC = "C_HLSTC", P_HMSPH = "C_HMSPH", + P_HMTRM = "C_HMTRM", P_HPKNS = "C_HPKNS", P_HPLPH = "C_HPLPH", + P_HPPCR = "C_HPPCR", P_HNLLA = "C_HPPCRP", P_HRMSN = "C_HRMSN", + P_HRNLL = "C_HRNLL", F_HRPCH = "C_HRPCH", P_HSTGR = "C_HSTGR", + P_HSTTL = "C_HSTTL", P_HTRST = "C_HTGNA", P_HTRLL = "C_HTRLL", + P_HTRPH = "C_HTRPH", F_HYPHC = "C_HYPHC", P_HYPRM = "C_HYPRM", + P_INTRN = "C_INTRN", P_IRIDI = "C_IRIDI", P_ISLND = "C_ISLND", + P_JCLLL = "C_JCLLL", P_KHLLL = "C_KHLLL", P_KRNPS = "C_KRNPS", + P_KRRRL = "C_KRRRL", P_LABOE = 
"C_LABOE", P_LAGEN = "C_LAGEN", + P_LBSLL = "C_LBSLL", F_LTHLA = "C_LBYRN", P_LCRYM = "C_LCRYM", + P_LEMBS = "C_LEMBS", F_LGNDM = "C_LGNDM", P_LGNMM = "C_LGNMM", + P_LGNPH = "C_LGNPHR", F_LGNSM = "C_LGNSM", P_LGYNP = "C_LGYNP", + P_LITTB = "C_LITTB", P_LITUL = "C_LITUL", P_LMBDN = "C_LMBDN", + P_LMRCK = "C_LMRCK", F_LBYRN = "C_LMYXA", P_LNGLN = "C_LNGLN", + P_LNTCL = "C_LNTCL", P_LOXDS = "C_LOXDS", F_LPTLG = "C_LPTLG", + F_LNLLA = "C_LPTLGN", F_LPTMT = "C_LPTMT", P_LRYNG = "C_LRYNG", + P_LTCRN = "C_LTCRN", P_LTHPL = "C_LTHPL", P_LTNTS = "C_LTNTS", + F_LTRST = "C_LTRST", P_LXPHY = "C_LXPHY", P_MCRTH = "C_MCRTH", + P_MELNS = "C_MELNS", P_MSDNM = "C_MESDNM", P_METPS = "C_METPS", + P_MIMSN = "C_MIMSN", P_MINCN = "C_MINCN", P_MLLNL = "C_MLLNL", + P_MLMMN = "C_MLMMN", F_MNDNL = "C_MNDNL", P_MNLYS = "C_MNLYS", + P_MNPSS = "C_MNPSS", P_MRGNL = "C_MRGNL", P_MRGNP = "C_MRGNP", + P_MRSPL = "C_MRSPL", P_MRTNT = "C_MRTNT", P_MSSLN = "C_MSSLN", + P_MSSSS = "C_MSSSS", P_MTCNT = "C_MTCNT", P_MYCHS = "C_MYCHS", + P_MYSCH = "C_MYSCH", F_MYZCY = "C_MYZCY", P_NASSL = "C_NASSL", + P_NBCLN = "C_NBCLN", P_NBCLR = "C_NBCLR", P_NCNRB = "C_NCNRB", + P_NDBCL = "C_NDBCL", P_NRLLA = "C_NDBCLR", P_NMMLC = "C_NMMLC", + F_NMTPH = "C_NMTPH", P_NNNLL = "C_NNNLL", P_NODSR = "C_NODSR", + P_NONIN = "C_NONIN", P_NOURI = "C_NOURI", P_OCLNA = "C_OCLNA", + P_OGLNA = "C_OGLNA", P_OPHTH = "C_OLMDM", F_OLPDP = "C_OLPDP", + P_ONYCH = "C_OMPSS", P_OOLIN = "C_OOLIN", P_OPRCL = "C_OPRCL", + P_ORBLN = "C_ORBLN", F_ORCAD = "C_ORCAD", P_ORDRS = "C_ORDRS", + P_OPHRY = "C_ORYDM", P_OSNGL = "C_OSNGL", P_OXYTR = "C_OXYTR", + P_PARRN = "C_PARRN", P_PATRS = "C_PATRS", P_PAVNN = "C_PAVNN", + P_PTYCH = "C_PCYLS", P_PDPHR = "C_PDPHR", P_PELSN = "C_PELSN", + F_PHGMY = "C_PHGMY", F_PSDSP = "C_PHRTA", P_PHRYG = "C_PHRYG", + P_PHYSL = "C_PHYSL", F_PHYTP = "C_PHYTP", P_PLACS = "C_PLACS", + P_PLCPS = "C_PLCPS", P_PLCPSL = "C_PLCPSL", P_PLCTN = "C_PLCTN", + P_PLGPH = "C_PLGPH", B_PLGTH = "C_PLGTH", P_PLMRN = "C_PLMRN", + P_PLNCT = 
"C_PLNCT", P_PLNDSC = "C_PLNDSC", P_PLNGY = "C_PLNGY", + P_PLNRBL = "C_PLNLLA", P_PLNLN = "C_PLNLN", P_PLNLR = "C_PLNLR", + P_PLNRB = "C_PLNRB", P_PLNSP = "C_PLNSPR", P_PLRNM = "C_PLRNM", + P_PLRST = "C_PLRST", P_PLRTR = "C_PLRTR", F_PLSMD = "C_PLSMD", + P_PLTYC = "C_PLTYC", P_PSDBL = "C_PLVNA", P_PLYMR = "C_PLYMR", + P_PLTYN = "C_PNMTM", P_PNRPL = "C_PNRPL", F_PNTSM = "C_PNTSM", + P_PRCNT = "C_PRCNT", P_PRFSS = "C_PRFSS", P_PRMCM = "C_PRMCUM", + F_PRNSP = "C_PRNSP", P_PRPND = "C_PRPND", P_PRPYX = "C_PRPYX", + P_PRRDN = "C_PRRDN", P_PSDDF = "C_PSDDF", P_PSDMC = "C_PSDMC", + P_PSDND = "C_PSDND", P_PSDNN = "C_PSDNN", P_PSDPL = "C_PSDPLY", + P_PSMMS = "C_PSMMS", P_PTLLN = "C_PTLLN", P_PTLLND = "C_PTLLND", + F_PTRSN = "C_PTRSN", P_PULLN = "C_PULLN", P_PUTLN = "C_PUTLN", + P_PRTTR = "C_PYMNA", P_PYRGL = "C_PYRGL", P_PYRGO = "C_PYRGO", + P_PYRLN = "C_PYRLN", F_PYTHM = "C_PYTHIM", F_PYTHL = "C_PYTHL", + P_PYXCL = "C_PYXCL", P_QNQLC = "C_QNQLC", P_RAMLN = "C_RAMLN", + P_RBRTN = "C_RBRTN", P_RCRVD = "C_RCRVD", P_RCTBL = "C_RCTBL", + P_RCTCB = "C_RCTCB", P_RCTGL = "C_RCTGL", P_RCTVG = "C_RCTVG", + P_RDGDR = "C_RDGDR", P_REMNC = "C_REMNC", P_REPHX = "C_REPHX", + P_RHBDM = "C_RHBDMM", F_RHBDS = "C_RHBDSP", P_RHPDD = "C_RHPDD", + F_RHPDM = "C_RHPDM", F_RHZDMY = "C_RHZDM", P_RHZMM = "C_RHZMM", + P_RIVRN = "C_RIVRN", P_ROSLN = "C_ROSLN", P_ROTAL = "C_ROTAL", + P_RPHDP = "C_RPHDP", P_RPRTN = "C_RPRTN", P_RSSLL = "C_RSSLL", + P_RTLMM = "C_RTLMM", P_RTYLA = "C_RTYLA", P_RUGID = "C_RUGID", + F_RZLLP = "C_RZLLP", P_SAGRN = "C_SAGRN", P_SCCMM = "C_SCCMM", + P_SCCRH = "C_SCCRH", P_SCHLM = "C_SCHLM", F_SCLRS = "C_SCLRS", + P_SCTLR = "C_SCTLR", P_SEBRK = "C_SEBRK", P_SGMLN = "C_SGMLN", + P_SGMLP = "C_SGMLP", P_SGMMR = "C_SGMMR", P_SGMVR = "C_SGMVR", + F_SMMRS = "C_SMMRS", P_SNNDS = "C_SNNDS", P_SORTS = "C_SORTS", + P_SPHGN = "C_SPHGN", P_SPHNN = "C_SPHNN", P_SNLLA = "C_SPHNNL", + P_SPHTR = "C_SPHTR", P_SPHTX = "C_SPHTX", P_SPHVG = "C_SPHVG", + P_SPRDT = "C_SPRDT", P_SPRLC = "C_SPRLC", 
F_SPRLG = "C_SPRLG", + P_SPRLL = "C_SPRLL", F_SPRMY = "C_SPRMY", P_SPRPL = "C_SPRPL", + P_SPRSG = "C_SPRSG", P_SPRST = "C_SPRST", P_SPHNP = "C_SPRTA", + P_SPRZN = "C_SPRZN", P_SPHRG = "C_SPSNA", P_STHDM = "C_SPTHD", + P_SRCNR = "C_SRCNR", F_SRLPD = "C_SRLPD", F_SPNGS = "C_SSPRA", + F_STEIN = "C_STEIN", P_SPTHD = "C_STHDDS", P_STHRP = "C_STHRP", + P_STNFR = "C_STNFR", P_STNSM = "C_STNSM", P_STNTR = "C_STNTR", + P_STRBL = "C_STRBL", P_STRMB = "C_STRMB", P_STTSN = "C_STTSN", + P_STYLN = "C_SYCHA", F_SCHZC = "C_SYTRM", P_TBNLL = "C_TBNLL", + P_TRCHL = "C_TCHLS", P_TCHNT = "C_TCHNT", P_THRCL = "C_THRCL", + P_THRMM = "C_THRMM", P_TIARN = "C_TIARN", P_TKPHR = "C_TKPHR", + P_TLNMA = "C_TLNMA", P_TLYPM = "C_TLYPM", P_TMNDS = "C_TMNDS", + P_TMNTA = "C_TMNTA", P_TNTNN = "C_TNNDM", P_TTNNS = "C_TNTNN", + P_TNPSS = "C_TNTNNP", P_TONTN = "C_TONTN", P_TOSAI = "C_TOSAI", + P_TPHTR = "C_TPHTR", P_TRCHH = "C_TRCHH", P_TRPHS = "C_TRCHLR", + P_TMMNA = "C_TRCHM", P_TRCHS = "C_TRCHSP", P_TRFRN = "C_TRFRN", + P_TRLCL = "C_TRLCL", P_TRTXL = "C_TRTXL", P_TRTXS = "C_TRTXS", + P_TTRHY = "C_TTRHY", F_TTRMY = "C_TTRMY", P_TXTLR = "C_TXTLR", + F_THRST = "C_TYTRM", P_URLPT = "C_ULPTS", P_UNGLT = "C_UNGLT", + P_URCNT = "C_URCNT", P_URONM = "C_URONM", P_UROSM = "C_UROSM", + P_URTRC = "C_URTRC", P_URSTY = "C_UTYLA", P_UVGRN = "C_UVGRN", + P_VLVLN = "C_VALVLN", P_VGNLN = "C_VGNLN", P_VGNLNP = "C_VGNLNP", + P_VLNRA = "C_VLVLN", P_VGNCL = "C_VNCLA", P_VRGLN = "C_VRGLN", + P_VRGLNP = "C_VRGLNP", P_VRTCL = "C_VRTCL", P_WBBNL = "C_WBBNL", + P_WEBBN = "C_WEBBN", P_WSNRL = "C_WSNRL", P_ZTHMN = "C_ZHMNM", + B_ZOOGL = "C_ZOOGL", F_DDSCS = "F_DPDSC", F_SCCHR = "F_SMYCS", + P_AMTRN = "P_ACNTH", F_AMBDM = "P_AMBDM", F_ARCYR = "P_ARCYR", + F_BADHM = "P_BADHM", F_BDHMP = "P_BDHMP", F_BRBYL = "P_BRBYL", + F_BRFLD = "P_BRFLD", F_CLMYX = "P_CLMYX", F_CLSTD = "P_CLSTD", + F_CMTRC = "P_CMTRC", F_CRBRR = "P_CRBRR", F_CRTMY = "P_CRTMY", + F_CRTRM = "P_CRTRM", F_DCTYD = "P_DCTYD", F_DDYMM = "P_DDYMM", + F_DIACH = 
"P_DIACH", F_DIANM = "P_DIANM", F_DIDRM = "P_DIDRM", + F_ELMYX = "P_ELMYX", F_ESTLM = "P_ESTLM", F_FULIG = "P_FULIG", + F_HMTRC = "P_HMTRC", F_LCRPS = "P_LCRPS", F_LICEA = "P_LICEA", + F_LMPRD = "P_LMPRD", F_LPTDR = "P_LPTDR", F_LSTRL = "P_LSTRL", + F_LYCGL = "P_LYCGL", F_MCBRD = "P_MCBRD", F_MNKTL = "P_MNKTL", + F_MTTRC = "P_MTTRC", F_MUCLG = "P_MUCLG", F_PHYSR = "P_PHYSR", + F_PRCHN = "P_PRCHN", F_PRMBD = "P_PRMBD", F_PRTPH = "P_PRTPH", + F_PSRNA = "P_PSRNA", F_PYSRM = "P_PYSRM", F_RTCLR = "P_RTCLR", + F_STMNT = "P_STMNT", F_SYMPH = "P_SYMPH", F_TRBRK = "P_TRBRK", + F_TRICH = "P_TRICH", F_TUBFR = "P_TUBFR") + + assign(x = "mo_codes_v0.5.0", + value = mo_codes_v0.5.0, + envir = asNamespace("AMR")) + # packageStartupMessage("OK.", appendLF = TRUE) } } diff --git a/_pkgdown.yml b/_pkgdown.yml index 8f3683e5..f701779f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -88,7 +88,8 @@ reference: for more information about how to work with functions in this package. contents: - '`AMR`' - - '`ITIS`' + - '`catalogue_of_life`' + - '`catalogue_of_life_version`' - '`WHOCC`' - title: 'Cleaning your data' desc: > @@ -145,7 +146,6 @@ reference: - '`WHONET`' - '`microorganisms.codes`' - '`microorganisms.old`' - - '`supplementary_data`' - title: Other desc: > These functions are mostly for internal use, but some of diff --git a/data/microorganisms.old.rda b/data/microorganisms.old.rda index 02c8c939..027d5334 100644 Binary files a/data/microorganisms.old.rda and b/data/microorganisms.old.rda differ diff --git a/data/microorganisms.rda b/data/microorganisms.rda index 24a35f47..50db4fbe 100755 Binary files a/data/microorganisms.rda and b/data/microorganisms.rda differ diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 212c53ae..f9678395 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -192,7 +192,7 @@
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 18 February 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 20 February 2019.
Now, let’s start the cleaning and the analysis!
@@ -411,8 +411,8 @@ #> #> Item Count Percent Cum. Count Cum. Percent #> --- ----- ------- -------- ----------- ------------- -#> 1 M 10,283 51.4% 10,283 51.4% -#> 2 F 9,717 48.6% 20,000 100.0% +#> 1 M 10,471 52.4% 10,471 52.4% +#> 2 F 9,529 47.6% 20,000 100.0%So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M
and F
. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -443,10 +443,10 @@
#> Kingella kingae (no changes)
#>
#> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1: Intrinsic resistance in Enterobacteriaceae (1294 changes)
+#> Table 1: Intrinsic resistance in Enterobacteriaceae (1288 changes)
#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4: Intrinsic resistance in Gram-positive bacteria (2822 changes)
+#> Table 4: Intrinsic resistance in Gram-positive bacteria (2757 changes)
#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -462,9 +462,9 @@
#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#>
-#> => EUCAST rules affected 7,463 out of 20,000 rows
+#> => EUCAST rules affected 7,243 out of 20,000 rows
#> -> added 0 test results
-#> -> changed 4,116 test results (0 to S; 0 to I; 4,116 to R)
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
isolate | @@ -654,8 +654,8 @@|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-04-05 | -Q7 | +2010-01-31 | +C6 | B_ESCHR_COL | R | S | @@ -666,11 +666,11 @@||||||||
2 | -2010-05-10 | -Q7 | +2010-03-09 | +C6 | B_ESCHR_COL | S | -R | +S | S | S | FALSE | @@ -678,11 +678,11 @@||||
3 | -2010-06-22 | -Q7 | +2010-05-30 | +C6 | B_ESCHR_COL | S | -I | +S | S | S | FALSE | @@ -690,8 +690,8 @@||||
4 | -2010-06-25 | -Q7 | +2010-06-22 | +C6 | B_ESCHR_COL | S | S | @@ -702,83 +702,83 @@||||||||
5 | -2010-10-03 | -Q7 | +2011-01-10 | +C6 | B_ESCHR_COL | S | -S | -S | +I | +R | S | FALSE | -FALSE | +TRUE | |
6 | -2010-10-08 | -Q7 | +2011-02-19 | +C6 | B_ESCHR_COL | +I | S | S | S | -S | -FALSE | -FALSE | +TRUE | +TRUE | |
7 | -2011-03-24 | -Q7 | +2011-02-21 | +C6 | B_ESCHR_COL | -S | -S | -S | +R | +I | +R | S | FALSE | -FALSE | +TRUE |
8 | -2011-03-27 | -Q7 | +2011-05-11 | +C6 | B_ESCHR_COL | S | S | S | S | FALSE | -FALSE | +TRUE | |||
9 | -2011-05-11 | -Q7 | +2011-07-06 | +C6 | B_ESCHR_COL | S | S | -R | -R | -TRUE | -TRUE | -||||
10 | -2011-07-22 | -Q7 | -B_ESCHR_COL | -R | -S | S | S | FALSE | -TRUE | +FALSE | +|||||
10 | +2011-08-11 | +C6 | +B_ESCHR_COL | +S | +S | +S | +S | +FALSE | +FALSE |
Instead of 2, now 4 isolates are flagged. In total, 79% of all isolates are marked ‘first weighted’ - 50.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 6 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,805 isolates for analysis.
+So we end up with 15,864 isolates for analysis.
We can remove unneeded columns:
@@ -803,45 +803,45 @@Time for the analysis!
@@ -915,9 +915,9 @@Or can be used like the dplyr
way, which is easier to read:
Frequency table of genus
and species
from a data.frame
(15,805 x 13)
Frequency table of genus
and species
from a data.frame
(15,864 x 13)
Columns: 2
-Length: 15,805 (of which NA: 0 = 0.00%)
+Length: 15,864 (of which NA: 0 = 0.00%)
Unique: 4
Shortest: 16
Longest: 24
The functions portion_R
, portion_RI
, portion_I
, portion_IS
and portion_S
can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -984,19 +984,19 @@ Longest: 24
Hospital A
-0.4661749
+0.4814277
Hospital B
-0.4820906
+0.4811252
Hospital C
-0.4847862
+0.4843815
Hospital D
-0.4790875
+0.4610721
@@ -1014,23 +1014,23 @@ Longest: 24
Hospital A
-0.4661749
-4745
+0.4814277
+4819
Hospital B
-0.4820906
-5472
+0.4811252
+5510
Hospital C
-0.4847862
-2432
+0.4843815
+2401
Hospital D
-0.4790875
-3156
+0.4610721
+3134
@@ -1050,27 +1050,27 @@ Longest: 24
Escherichia
-0.7332054
-0.9033909
-0.9754319
+0.7180533
+0.8983310
+0.9704421
Klebsiella
-0.7142857
-0.9095675
-0.9737877
+0.7189501
+0.8879641
+0.9654289
Staphylococcus
-0.7179037
-0.9162487
-0.9779338
+0.7231920
+0.9216958
+0.9790524
Streptococcus
-0.7289984
+0.7372084
0.0000000
-0.7289984
+0.7372084
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index 86aa274d..b87bb214 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 1f6d21fd..3a6a2c1f 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index 2689fa2e..eec1e91b 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index bac17cf6..dae88c56 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 908711d9..8eabbc3a 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -192,7 +192,7 @@
How to apply EUCAST rules
Matthijs S. Berends
- 18 February 2019
+ 20 February 2019
EUCAST.Rmd
diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html
index 589abac2..304f848b 100644
--- a/docs/articles/G_test.html
+++ b/docs/articles/G_test.html
@@ -192,7 +192,7 @@
How to use the G-test
Matthijs S. Berends
- 18 February 2019
+ 20 February 2019
G_test.Rmd
diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html
index 9d3b980e..52633355 100644
--- a/docs/articles/WHONET.html
+++ b/docs/articles/WHONET.html
@@ -192,7 +192,7 @@
How to work with WHONET data
Matthijs S. Berends
- 18 February 2019
+ 20 February 2019
WHONET.Rmd
diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html
index 0760a382..f9bcbfe3 100644
--- a/docs/articles/atc_property.html
+++ b/docs/articles/atc_property.html
@@ -192,7 +192,7 @@
How to get properties of an antibiotic
Matthijs S. Berends
- 18 February 2019
+ 20 February 2019
atc_property.Rmd
diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html
index 7fdf2257..f50800a3 100644
--- a/docs/articles/benchmarks.html
+++ b/docs/articles/benchmarks.html
@@ -192,7 +192,7 @@
Benchmarks
Matthijs S. Berends
- 14 February 2019
+ 20 February 2019
benchmarks.Rmd
@@ -201,161 +201,183 @@
-One of the most important features of this package is the complete microbial taxonomic database, supplied by ITIS (https://www.itis.gov). We created a function as.mo()
that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) and based on the taxonomic tree of ITIS.
-Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
calculates different input expressions independently of each others and runs every expression 100 times.
+One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life (http://catalogueoflife.org). We created a function as.mo()
that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) combined with the taxonomic tree of Catalogue of Life.
+Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its MO code B_STPHY_AUR
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot. Here, the AI effect can be reviewed best:
-benchmark <- microbenchmark(as.mo("sau"),
- as.mo("stau"),
- as.mo("staaur"),
- as.mo("S. aureus"),
- as.mo("S. aureus"),
- as.mo("STAAUR"),
- as.mo("Staphylococcus aureus"),
- as.mo("B_STPHY_AUR"))
-print(benchmark, unit = "ms")
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# as.mo("sau") 18.983141 19.121148 19.9676944 19.1967505 19.2871260 38.635012 100
-# as.mo("stau") 37.503863 37.692049 38.9856547 37.8244335 37.9851040 57.576107 100
-# as.mo("staaur") 18.945427 19.122579 19.6392560 19.2241285 19.3536140 38.687672 100
-# as.mo("S. aureus") 15.305229 15.471103 16.3477096 15.5545630 15.6689280 36.363005 100
-# as.mo("S. aureus") 15.308232 15.469881 16.5269706 15.5506870 15.6277560 42.155292 100
-# as.mo("STAAUR") 18.984049 19.117166 19.6104597 19.2219285 19.3161095 38.638783 100
-# as.mo("Staphylococcus aureus") 8.103546 8.198285 8.6422018 8.2636915 8.3200535 27.002527 100
-# as.mo("B_STPHY_AUR") 0.156236 0.196779 0.2017926 0.2035535 0.2115505 0.241861 100
-
-par(mar = c(5, 15, 4, 2)) # set more space for left margin text (15)
-boxplot(benchmark, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, 200),
- main = expression(paste("Benchmark of ", italic("Staphylococcus aureus"))))
-
-In the table above, all measurements are shown in milliseconds (thousands of seconds), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 8 milliseconds means it can determine 125 input values per second. It case of 40 milliseconds, this is only 25 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR")
, the input is already a valid MO code, so it only almost takes no time at all (0.0002 seconds on our server).
-To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far less faster. See this example for the ID of Burkholderia nodosa (B_BRKHL_NOD
):
-benchmark <- microbenchmark(as.mo("buno"),
- as.mo("burnod"),
- as.mo("B. nodosa"),
- as.mo("B. nodosa"),
- as.mo("BURNOD"),
- as.mo("Burkholderia nodosa"),
- as.mo("B_BRKHL_NOD"))
-print(benchmark, unit = "ms")
-
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# as.mo("buno") 125.141333 125.8553210 129.5727691 126.3899910 127.0954925 194.51985 100
-# as.mo("burnod") 142.300359 144.1611750 147.0642288 144.6074960 145.5243025 176.91649 100
-# as.mo("B. nodosa") 81.530132 81.9360840 83.3915418 82.1852770 82.6848870 102.63184 100
-# as.mo("B. nodosa") 81.109547 81.9836805 84.7595894 82.3437825 82.8282705 110.67036 100
-# as.mo("BURNOD") 143.163527 143.9134485 148.7192688 144.5582580 145.7489115 314.92070 100
-# as.mo("Burkholderia nodosa") 36.226325 36.5499000 37.1309929 36.6581540 36.7551985 56.25597 100
-# as.mo("B_BRKHL_NOD") 0.172509 0.3038455 0.4806591 0.3078265 0.3121215 19.16173 100
-
-boxplot(benchmark, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, 200),
- main = expression(paste("Benchmark of ", italic("Burkholderia nodosa"))))
-
-That takes up to 8 times as much time! A value of 145 milliseconds means it can only determine ~7 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance.
+S.aureus <- microbenchmark(as.mo("sau"),
+ as.mo("stau"),
+ as.mo("staaur"),
+ as.mo("S. aureus"),
+ as.mo("S. aureus"),
+ as.mo("STAAUR"),
+ as.mo("Staphylococcus aureus"),
+ as.mo("B_STPHY_AUR"),
+ times = 10)
+print(S.aureus, unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean median
+#> as.mo("sau") 42.58139 42.645368 43.3006677 42.970095
+#> as.mo("stau") 76.60094 77.168264 83.7686909 77.316642
+#> as.mo("staaur") 42.86607 42.947083 43.5035571 43.497293
+#> as.mo("S. aureus") 18.39354 18.432582 22.4304233 18.495928
+#> as.mo("S. aureus") 18.46513 18.559903 18.6640991 18.579110
+#> as.mo("STAAUR") 42.71975 42.788612 44.3280682 43.069864
+#> as.mo("Staphylococcus aureus") 11.56285 11.591419 15.9457298 11.667161
+#> as.mo("B_STPHY_AUR") 0.40487 0.450128 0.5036822 0.481417
+#> uq max neval
+#> 43.448543 45.058105 10
+#> 78.335591 127.180349 10
+#> 43.817095 44.999509 10
+#> 19.007097 56.501460 10
+#> 18.651814 19.373275 10
+#> 43.741388 54.703256 10
+#> 12.323077 50.121808 10
+#> 0.519271 0.766578 10
+In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 10 milliseconds means it can determine 100 input values per second. In case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR")
, the input is already a valid MO code, so it takes almost no time at all (404 millionths of a second).
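The conversion from time per call to throughput used above is simple arithmetic. A minimal sketch in base R, where `values_per_second()` is a hypothetical helper name and not part of the AMR package:

```r
# Convert a per-call duration in milliseconds into calls per second.
# There are 1000 milliseconds in one second.
values_per_second <- function(ms) {
  1000 / ms
}

# The two durations mentioned in the text: 10 ms and 50 ms per input value.
throughput <- values_per_second(c(10, 50))
print(throughput)
```

This gives 100 and 20 input values per second, matching the figures in the paragraph above.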
+To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far more slowly. See this example for the ID of Mycoplasma leonicaptivi (B_MYCPL_LEO
), a bug probably never found before in humans:
+M.leonicaptivi <- microbenchmark(as.mo("myle"),
+ as.mo("mycleo"),
+ as.mo("M. leonicaptivi"),
+ as.mo("M. leonicaptivi"),
+ as.mo("MYCLEO"),
+ as.mo("Mycoplasma leonicaptivi"),
+ as.mo("B_MYCPL_LEO"),
+ times = 10)
+print(M.leonicaptivi, unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean
+#> as.mo("myle") 112.28656 112.372601 112.751678
+#> as.mo("mycleo") 382.46812 382.757612 383.432440
+#> as.mo("M. leonicaptivi") 202.68674 203.654949 210.461303
+#> as.mo("M. leonicaptivi") 202.89759 203.440956 203.816387
+#> as.mo("MYCLEO") 382.27864 383.090895 401.904482
+#> as.mo("Mycoplasma leonicaptivi") 102.99676 103.191196 109.196394
+#> as.mo("B_MYCPL_LEO") 0.32155 0.564807 4.320068
+#> median uq max neval
+#> 112.540884 112.76874 113.76321 10
+#> 383.232219 384.05897 385.28587 10
+#> 204.255445 205.80976 242.53035 10
+#> 203.613673 203.82802 206.15038 10
+#> 386.478757 421.87837 437.26978 10
+#> 103.596136 104.65940 142.25748 10
+#> 0.593652 0.62522 37.96384 10
+That takes 6 times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance:
+par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
+
+# highest value on y axis
+max_y_axis <- max(S.aureus$time, M.leonicaptivi$time, na.rm = TRUE) / 1e6
+
+boxplot(S.aureus, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
+ main = expression(paste("Benchmark of ", italic("Staphylococcus aureus"))))
+
+
+boxplot(M.leonicaptivi, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
+ main = expression(paste("Benchmark of ", italic("Mycoplasma leonicaptivi"))))
+
To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
Repetitive results
-Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_fullname()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) and uses as.mo()
internally.
-library(dplyr)
-# take 500,000 random MO codes from the septic_patients data set
-x = septic_patients %>%
- sample_n(500000, replace = TRUE) %>%
- pull(mo)
-
-# got the right length?
-length(x)
-# [1] 500000
-
-# and how many unique values do we have?
-n_distinct(x)
-# [1] 96
-
-# only 96, but distributed in 500,000 results. now let's see:
-microbenchmark(X = mo_fullname(x),
- times = 10,
- unit = "ms")
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# X 114.9342 117.1076 129.6448 120.2047 131.5005 168.6371 10
-So transforming 500,000 values (!) of 96 unique values only takes 0.12 seconds (120 ms). You only lose time on your unique input values.
-Results of a tenfold - 5,000,000 values:
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# X 882.9045 901.3011 1001.677 940.3421 1168.088 1226.846 10
-Even determining the full names of 5 Million values is done within a second.
+Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_fullname()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
+library(dplyr)
+#>
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#>
+#> filter, lag
+#> The following objects are masked from 'package:base':
+#>
+#> intersect, setdiff, setequal, union
+# take 500,000 random MO codes from the septic_patients data set
+x = septic_patients %>%
+ sample_n(500000, replace = TRUE) %>%
+ pull(mo)
+
+# got the right length?
+length(x)
+#> [1] 500000
+
+# and how many unique values do we have?
+n_distinct(x)
+#> [1] 95
+
+# now let's see:
+run_it <- microbenchmark(X = mo_fullname(x),
+ times = 10)
+print(run_it, unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> X 435.7086 442.1682 465.5949 468.8453 477.1915 505.961 10
+So transforming 500,000 values (!) of 95 unique values only takes 0.47 seconds (468 ms). You only lose time on your unique input values.
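The idea of only paying for unique input values can be sketched in base R. This is an illustration of the general technique, not the package's actual implementation: `slow_lookup()` is a hypothetical stand-in for an expensive per-value function such as `as.mo()`, and the call counter exists only to show how often it runs.

```r
# Counter to show how often the expensive function is really called.
calls <- 0

# Hypothetical stand-in for an expensive per-value lookup.
slow_lookup <- function(value) {
  calls <<- calls + 1
  toupper(value)
}

# Evaluate the lookup once per unique value, then map the results
# back onto the original positions with match().
lookup_unique_only <- function(x) {
  unique_values <- unique(x)
  unique_results <- vapply(unique_values, slow_lookup, character(1),
                           USE.NAMES = FALSE)
  unique_results[match(x, unique_values)]
}

set.seed(42)
x <- sample(c("sau", "stau", "staaur"), size = 500000, replace = TRUE)
result <- lookup_unique_only(x)

length(result)  # same length as the input
calls           # equals the number of unique input values
```

The cost of the expensive function then scales with the number of unique values, while the remaining work is a single vectorised `match()` over the full input.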
Precalculated results
What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname()
, it almost doesn’t take any time at all (see ‘C’ below):
-microbenchmark(A = mo_fullname("B_STPHY_AUR"),
- B = mo_fullname("S. aureus"),
- C = mo_fullname("Staphylococcus aureus"),
- times = 10,
- unit = "ms")
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# A 11.364086 11.460537 11.5104799 11.4795330 11.524860 11.818263 10
-# B 11.976454 12.012352 12.1704592 12.0853020 12.210004 12.881737 10
-# C 0.095823 0.102528 0.1167754 0.1153785 0.132629 0.140661 10
-So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0001 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
-microbenchmark(A = mo_species("aureus"),
- B = mo_genus("Staphylococcus"),
- C = mo_fullname("Staphylococcus aureus"),
- D = mo_family("Staphylococcaceae"),
- E = mo_order("Bacillales"),
- F = mo_class("Bacilli"),
- G = mo_phylum("Firmicutes"),
- H = mo_subkingdom("Posibacteria"),
- I = mo_kingdom("Bacteria"),
- times = 10,
- unit = "ms")
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# A 0.105181 0.121314 0.1478538 0.1465265 0.166711 0.211409 10
-# B 0.132558 0.146388 0.1584278 0.1499835 0.164895 0.208477 10
-# C 0.135492 0.160355 0.2341847 0.1884665 0.348857 0.395931 10
-# D 0.109650 0.115727 0.1270481 0.1264130 0.128648 0.168317 10
-# E 0.081574 0.096940 0.0992582 0.0980915 0.101479 0.120477 10
-# F 0.081575 0.088489 0.0988463 0.0989650 0.103365 0.126482 10
-# G 0.091981 0.095333 0.1043568 0.1001530 0.111327 0.129625 10
-# H 0.092610 0.093169 0.1009135 0.0985455 0.101828 0.120406 10
-# I 0.087371 0.091213 0.1069758 0.0941815 0.109302 0.192831 10
-Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known microorganisms (according to ITIS), it can just return the initial value immediately.
+run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
+ B = mo_fullname("S. aureus"),
+ C = mo_fullname("Staphylococcus aureus"),
+ times = 10)
+print(run_it, unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> A 38.887977 38.920313 39.3674024 39.076862 39.258415 42.166327 10
+#> B 19.589084 19.631059 19.8682396 19.781567 19.955611 20.751941 10
+#> C 0.255829 0.382732 0.4199913 0.400156 0.499156 0.564807 10
+So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0004 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
+microbenchmark(A = mo_species("aureus"),
+ B = mo_genus("Staphylococcus"),
+ C = mo_fullname("Staphylococcus aureus"),
+ D = mo_family("Staphylococcaceae"),
+ E = mo_order("Bacillales"),
+ F = mo_class("Bacilli"),
+ G = mo_phylum("Firmicutes"),
+ H = mo_kingdom("Bacteria"),
+ times = 10,
+ unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> A 0.250242 0.292496 0.3891774 0.4266960 0.456902 0.520388 10
+#> B 0.259461 0.311702 0.3428727 0.3412800 0.374141 0.443912 10
+#> C 0.290960 0.313169 0.4334429 0.4097595 0.520389 0.725373 10
+#> D 0.271823 0.282789 0.3187217 0.3192800 0.352909 0.375398 10
+#> E 0.245353 0.270985 0.3081197 0.2960235 0.330839 0.429036 10
+#> F 0.246122 0.266585 0.2991101 0.3089435 0.332794 0.351582 10
+#> G 0.271893 0.272452 0.3085039 0.2850580 0.368204 0.385525 10
+#> H 0.252686 0.259251 0.3161791 0.2985025 0.334820 0.422680 10
+Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
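A shortcut like this could look roughly as follows. This is a sketch under stated assumptions, not the package's code: `known_phyla` is a small illustrative subset, and `phylum_with_shortcut()` and `expensive_lookup` are hypothetical names.

```r
# Illustrative subset of phylum names; the real data set holds the full list.
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Return inputs that are already valid phylum names unchanged and only
# run the expensive lookup on the remaining values.
# `expensive_lookup` is a hypothetical stand-in for the full taxonomic search.
phylum_with_shortcut <- function(x, expensive_lookup) {
  result <- x
  needs_lookup <- !(x %in% known_phyla)
  if (any(needs_lookup)) {
    result[needs_lookup] <- expensive_lookup(x[needs_lookup])
  }
  result
}

# The lookup function below would raise an error if it were called,
# so a normal return shows the shortcut was taken.
shortcut <- phylum_with_shortcut("Firmicutes",
                                 function(v) stop("lookup should not run"))
print(shortcut)
```

Inputs that are not yet a known phylum name still go through the lookup, so the shortcut only changes how long the call takes, not what it returns.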
Results in other languages
When the system language is non-English and supported by this AMR
package, some functions take a little while longer:
-mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
-# "Coagulase Negative Staphylococcus (CoNS)"
-
-mo_fullname("CoNS", language = "fr") # or just mo_fullname("CoNS") on a French system
-# "Staphylococcus à coagulase négative (CoNS)"
-
-microbenchmark(en = mo_fullname("CoNS", language = "en"),
- de = mo_fullname("CoNS", language = "de"),
- nl = mo_fullname("CoNS", language = "nl"),
- es = mo_fullname("CoNS", language = "es"),
- it = mo_fullname("CoNS", language = "it"),
- fr = mo_fullname("CoNS", language = "fr"),
- pt = mo_fullname("CoNS", language = "pt"),
- times = 10,
- unit = "ms")
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# en 6.093583 6.51724 6.555105 6.562986 6.630663 6.99698 100
-# de 13.934874 14.35137 16.891587 14.462210 14.764658 43.63956 100
-# nl 13.900092 14.34729 15.943268 14.424565 14.581535 43.76283 100
-# es 13.833813 14.34596 14.574783 14.439757 14.653994 17.49168 100
-# it 13.811883 14.36621 15.179060 14.453515 14.812359 43.64284 100
-# fr 13.798683 14.37019 16.344731 14.468775 14.697610 48.62923 100
-# pt 13.789674 14.36244 15.706321 14.443772 14.679905 44.76701 100
+mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
+#> [1] "Coagulase Negative Staphylococcus (CoNS)"
+
+mo_fullname("CoNS", language = "fr") # or just mo_fullname("CoNS") on a French system
+#> [1] "Staphylococcus à coagulase négative (CoNS)"
+
+microbenchmark(en = mo_fullname("CoNS", language = "en"),
+ de = mo_fullname("CoNS", language = "de"),
+ nl = mo_fullname("CoNS", language = "nl"),
+ es = mo_fullname("CoNS", language = "es"),
+ it = mo_fullname("CoNS", language = "it"),
+ fr = mo_fullname("CoNS", language = "fr"),
+ pt = mo_fullname("CoNS", language = "pt"),
+ times = 10,
+ unit = "ms")
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> en 10.67105 11.03136 11.06332 11.07271 11.15310 11.45006 10
+#> de 19.13393 19.50080 26.13799 19.61419 20.23400 52.66501 10
+#> nl 19.05410 19.53789 22.94707 19.59205 20.12616 52.47399 10
+#> es 19.31635 19.55221 26.22342 19.58633 20.01875 52.97636 10
+#> it 19.21725 19.47105 19.63980 19.58053 19.68162 20.58914 10
+#> fr 19.07854 19.45450 19.67303 19.56153 19.64517 20.45651 10
+#> pt 19.00668 19.28388 19.53493 19.57857 19.66423 20.55317 10
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
freq.Rmd
colnames(microorganisms)
# [1] "mo" "col_id" "fullname" "kingdom" "phylum"
# [6] "class" "order" "family" "genus" "species"
-# [11] "subspecies" "rank" "ref"
If we compare the dimensions between the old and new dataset, we can see that these 12 variables were added:
+# [11] "subspecies" "rank" "ref" "species_id"
+If we compare the dimensions between the old and new dataset, we can see that these 13 variables were added:
+# [1] 2000 62
So now the genus
and species
variables are available. A frequency table of these combined variables can be created like this:
Frequency table of genus
and species
from a data.frame
(2,000 x 61)
Frequency table of genus
and species
from a data.frame
(2,000 x 62)
Columns: 2
Length: 2,000 (of which NA: 0 = 0.00%)
Unique: 95
mo_property.Rmd
resistance_predict.Rmd
This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
-All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-Read more about the data from ITIS in our manual.
+ +This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version()
.
Included are:
+The Catalogue of Life (www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+Read more about the data from the Catalogue of Life in our manual.
as.mo()
to identify an MO code.
microorganisms
data set now contains:
+microorganisms
data set now contains:
B_STRPTC
to B_STRPT
.mo
codes changed (e.g. Streptococcus changed from B_STRPTC
to B_STRPT
). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
allow_uncertain = TRUE
(which is the default setting), i
It strips off values between brackets and the brackets themselves, and re-evaluates the input with all previous rules
It strips off words from the end one by one and re-evaluates the input with all previous rules
It strips off words from the start one by one and re-evaluates the input with all previous rules
It tries to look for some manual changes which are not yet published to the ITIS database (like Propionibacterium not yet being Cutibacterium)
It tries to look for some manual changes which are not yet published to the Catalogue of Life (like Propionibacterium not yet being Cutibacterium)
Examples:
"Streptococcus group B (known as S. agalactiae)"
. The text between brackets will be removed and a warning will be thrown that the result Streptococcus group B (B_STRPT_GRB
) needs review.
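The stripping steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `uncertain_candidates()` is a hypothetical helper that lists the progressively simplified inputs in the order described, whereas the package re-evaluates each of them against all previous rules.

```r
# Hypothetical helper (not part of the AMR package): list the simplified
# inputs that a lookup could retry, in the order described above.
uncertain_candidates <- function(x) {
  # 1. strip values between brackets, and the brackets themselves
  no_brackets <- trimws(gsub("[[:space:]]*\\([^)]*\\)", "", x))
  words <- strsplit(no_brackets, "[[:space:]]+")[[1]]
  n <- length(words)
  candidates <- no_brackets
  if (n > 1) {
    # 2. strip words from the end, one by one
    from_end <- vapply(seq(n - 1, 1),
                       function(i) paste(words[seq_len(i)], collapse = " "),
                       character(1))
    # 3. strip words from the start, one by one
    from_start <- vapply(seq(2, n),
                         function(i) paste(words[i:n], collapse = " "),
                         character(1))
    candidates <- c(candidates, from_end, from_start)
  }
  unique(candidates)
}

uncertain_candidates("Streptococcus group B (known as S. agalactiae)")
```

The first candidate, "Streptococcus group B", is the text that remains once the bracketed part is removed, which matches the example above.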
allow_uncertain = TRUE
(which is the default setting), i
[1] Becker K et al. Coagulase-Negative Staphylococci. 2014. Clin Microbiol Rev. 27(4): 870–926. https://dx.doi.org/10.1128/CMR.00109-13
[2] Lancefield RC. A serological differentiation of human and other groups of hemolytic streptococci. 1933. J Exp Med. 57(4): 571–95. https://dx.doi.org/10.1084/jem.57.4.571
-[3] Integrated Taxonomic Information System (ITIS). Retrieved September 2018. http://www.itis.gov
+[3] Catalogue of Life: Annual Checklist (public online database), www.catalogueoflife.org.
-
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
microorganisms
for the data.frame
with ITIS content that is being used to determine IDs.
+
microorganisms
for the data.frame
that is being used to determine IDs.
The mo_property
functions (like mo_genus
, mo_gramstain
) to get properties based on the returned code.
mo_property
functions (like
catalogue_of_life.Rd
This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life.
+ +
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
+ +
+On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
# NOT RUN {
+# Get version info of included data set
+catalogue_of_life_version()
+
+
+# Get a note when a species was renamed
+mo_shortname("Chlamydia psittaci")
+# Note: 'Chlamydia psittaci' (Page, 1968) was renamed
+# 'Chlamydophila psittaci' (Everett et al., 1999)
+# [1] "C. psittaci"
+
+# Get any property from the entire taxonomic tree for all included species
+mo_class("E. coli")
+# [1] "Gammaproteobacteria"
+
+mo_family("E. coli")
+# [1] "Enterobacteriaceae"
+
+mo_gramstain("E. coli") # based on kingdom and phylum, see ?mo_gramstain
+# [1] "Gram negative"
+
+mo_ref("E. coli")
+# [1] "Castellani et al., 1919"
+
+# Do not get mistaken - the package only includes microorganisms
+mo_phylum("C. elegans")
+# [1] "Cyanobacteria" # Bacteria?!
+mo_fullname("C. elegans")
+# [1] "Chroococcus limneticus elegans" # Because a microorganism was found
+# }
+
catalogue_of_life_version.Rd
Version info of included Catalogue of Life
+ +catalogue_of_life_version()
+
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
+ +ITIS: Integrated Taxonomic Information System
The Catalogue of Life
Version info of included Catalogue of Life
mo_fullname()
mo_shortname()
mo_subspecies()
mo_species()
mo_genus()
mo_family()
mo_order()
mo_class()
mo_phylum()
mo_subkingdom()
mo_kingdom()
mo_type()
mo_gramstain()
mo_ref()
mo_authors()
mo_year()
mo_taxonomy()
mo_property()
mo_fullname()
mo_shortname()
mo_subspecies()
mo_species()
mo_genus()
mo_family()
mo_order()
mo_class()
mo_phylum()
mo_kingdom()
mo_type()
mo_gramstain()
mo_ref()
mo_authors()
mo_year()
mo_taxonomy()
mo_url()
mo_property()
Property of a microorganism
mo
ID of the microorganism in the microorganisms
data set
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
A data.frame
with 56,659 observations and 15 variables:
mo
ID of microorganism
A data.frame
with 56,672 observations and 14 variables:
mo
ID of microorganism as used by this package
col_id
Catalogue of Life ID
genus
Taxonomic genus of the microorganism as found in ITIS, see Source
species
Taxonomic species of the microorganism as found in ITIS, see Source
subspecies
Taxonomic subspecies of the microorganism as found in ITIS, see Source
fullname
Full name, like "Escherichia coli"
family
Taxonomic family of the microorganism as found in ITIS, see Source
order
Taxonomic order of the microorganism as found in ITIS, see Source
class
Taxonomic class of the microorganism as found in ITIS, see Source
phylum
Taxonomic phylum of the microorganism as found in ITIS, see Source
subkingdom
Taxonomic subkingdom of the microorganism as found in ITIS, see Source
kingdom
Taxonomic kingdom of the microorganism as found in ITIS, see Source
gramstain
Gram stain of the microorganism, like "Gram negative"
prevalence
An integer based on estimated prevalence of the microorganism in humans. Used internally by as.mo
, otherwise quite meaningless. It has a value of 25 for manually added items and a value of 1000 for all unprevalent microorganisms whose genus was somewhere in the top 250 (with another species).
ref
Author(s) and year of concerning publication as found in ITIS, see Source
kingdom
Taxonomic kingdom of the microorganism
phylum
Taxonomic phylum of the microorganism
class
Taxonomic class of the microorganism
order
Taxonomic order of the microorganism
family
Taxonomic family of the microorganism
genus
Taxonomic genus of the microorganism
species
Taxonomic species of the microorganism
subspecies
Taxonomic subspecies of the microorganism
rank
Taxonomic rank of the microorganism, like "species"
or "genus"
ref
Author(s) and year of concerning scientific publication
species_id
ID of the species as used by the Catalogue of Life
Integrated Taxonomic Information System (ITIS) public online database, https://www.itis.gov.
+Catalogue of Life: Annual Checklist (public online database), www.catalogueoflife.org.
2 other undefined (unknown Gram negatives and unknown Gram positives)
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by as.mo
.
A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by as.mo
.
A data.frame
with 14,506 observations and 4 variables:
col_id
Catalogue of Life ID
tsn_new
New Catalogue of Life ID
fullname
Old taxonomic name of the microorganism as found in the CoL, see Source
ref
Author(s) and year of concerning publication as found in the CoL, see Source
fullname
Old taxonomic name of the microorganism
ref
Author(s) and year of concerning scientific publication
[3] Integrated Taxonomic Information System (ITIS) on-line database, https://www.itis.gov.
+[3] Catalogue of Life: Annual Checklist (public online database), www.catalogueoflife.org.
-
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
All functions will return the most recently known taxonomic property according to ITIS, except for mo_ref
, mo_authors
and mo_year
. This leads to the following results:
All functions will return the most recently known taxonomic property according to the Catalogue of Life, except for mo_ref
, mo_authors
and mo_year
. This leads to the following results:
mo_fullname("Chlamydia psittaci")
will return "Chlamydophila psittaci"
(with a warning about the renaming)
mo_ref("Chlamydia psittaci")
will return "Page, 1968"
(with a warning about the renaming)
mo_ref("Chlamydophila psittaci")
will return "Everett et al., 1999"
(without a warning)
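The renaming behaviour listed above can be illustrated with a small base R sketch. The two names and references come from the examples above. The table layout, the column `fullname_new` and the functions `lookup_fullname()` and `lookup_ref()` are hypothetical and only mimic the described behaviour; they are not the package's internals.

```r
# Hypothetical miniature tables; names and references are taken from the
# examples above, the structure is an assumption for illustration only.
old_names <- data.frame(
  fullname     = "Chlamydia psittaci",
  ref          = "Page, 1968",
  fullname_new = "Chlamydophila psittaci",
  stringsAsFactors = FALSE
)
current_names <- data.frame(
  fullname = "Chlamydophila psittaci",
  ref      = "Everett et al., 1999",
  stringsAsFactors = FALSE
)

# Returns the currently accepted name; warns when the input is an old name
lookup_fullname <- function(x) {
  i <- match(x, old_names$fullname)
  if (!is.na(i)) {
    warning("'", x, "' was renamed '", old_names$fullname_new[i], "'",
            call. = FALSE)
    return(old_names$fullname_new[i])
  }
  current_names$fullname[match(x, current_names$fullname)]
}

# Returns the reference that belongs to the name as it was typed
lookup_ref <- function(x) {
  i <- match(x, old_names$fullname)
  if (!is.na(i)) {
    warning("'", x, "' was renamed '", old_names$fullname_new[i], "'",
            call. = FALSE)
    return(old_names$ref[i])
  }
  current_names$ref[match(x, current_names$fullname)]
}

suppressWarnings(lookup_fullname("Chlamydia psittaci"))
suppressWarnings(lookup_ref("Chlamydia psittaci"))
lookup_ref("Chlamydophila psittaci")
```

Keeping the old reference with the old name is what lets the reference reflect the publication that introduced the name the user actually entered.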
The Gram stain - mo_gramstain()
- is determined from the taxonomic kingdom and phylum. According to Cavalier-Smith (2002), who defined the subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes (ref: https://itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=956097). These bacteria are considered Gram positive - all other bacteria are considered Gram negative. For species outside the kingdom of Bacteria, the returned value is NA
.
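The rule above is simple enough to sketch in base R. The function name `gramstain_sketch()` is hypothetical, and the label "Gram positive" is assumed by analogy with the "Gram negative" value shown elsewhere in this documentation; the package's own implementation may differ.

```r
# Hypothetical sketch of the Gram stain rule: Bacteria from the four
# Posibacteria phyla are Gram positive, all other Bacteria are Gram negative,
# and everything outside the kingdom of Bacteria is NA.
gramstain_sketch <- function(kingdom, phylum) {
  posibacteria <- c("Actinobacteria", "Chloroflexi", "Firmicutes", "Tenericutes")
  result <- rep(NA_character_, length(kingdom))
  is_bacteria <- !is.na(kingdom) & kingdom == "Bacteria"
  result[is_bacteria] <- "Gram negative"
  result[is_bacteria & phylum %in% posibacteria] <- "Gram positive"
  result
}

gramstain_sketch(
  kingdom = c("Bacteria", "Bacteria", "Fungi"),
  phylum  = c("Proteobacteria", "Firmicutes", "Ascomycota")
)
```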
The function mo_url()
will return the direct URL to the species in the Catalogue of Life.
Supported languages are "en"
(English), "de"
(German), "nl"
(Dutch), "es"
(Spanish), "it"
(Italian), "fr"
(French), and "pt"
(Portuguese).
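As a minimal sketch, the supported language codes could be validated as follows before a `language` value is passed on. `check_language()` is a hypothetical helper, not a function of this package; only the codes and language names are taken from the list above.

```r
# Language codes and names as listed above
supported_languages <- c(en = "English", de = "German", nl = "Dutch",
                         es = "Spanish", it = "Italian", fr = "French",
                         pt = "Portuguese")

# Hypothetical validator: returns the language name for a supported code
# and stops with the list of valid codes otherwise
check_language <- function(language = "en") {
  if (length(language) != 1 || !language %in% names(supported_languages)) {
    stop("unsupported language; use one of: ",
         paste(names(supported_languages), collapse = ", "), call. = FALSE)
  }
  unname(supported_languages[language])
}

check_language("fr")
```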
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
+
+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algorithms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
+The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.
[1] Becker K et al. Coagulase-Negative Staphylococci. 2014. Clin Microbiol Rev. 27(4): 870–926. https://dx.doi.org/10.1128/CMR.00109-13
[2] Lancefield RC. A serological differentiation of human and other groups of hemolytic streptococci. 1933. J Exp Med. 57(4): 571–95. https://dx.doi.org/10.1084/jem.57.4.571
-[3] Integrated Taxonomic Information System (ITIS). Retrieved September 2018. http://www.itis.gov
+[3] Catalogue of Life: Annual Checklist (public online database), www.catalogueoflife.org.