diff --git a/.Rbuildignore b/.Rbuildignore index 23bb08a6..7e567125 100755 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -21,3 +21,4 @@ ^Meta$ ^pkgdown$ ^public$ +^reproduction.*R$ diff --git a/DESCRIPTION b/DESCRIPTION index a2047d31..f7d59cd2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR Version: 0.5.0.9018 -Date: 2019-02-18 +Date: 2019-02-20 Title: Antimicrobial Resistance Analysis Authors@R: c( person( @@ -56,6 +56,7 @@ Suggests: covr (>= 3.0.1), curl, ggplot2, + microbenchmark, readxl, rmarkdown, rstudioapi, diff --git a/NAMESPACE b/NAMESPACE index c87056bd..23c2c8f5 100755 --- a/NAMESPACE +++ b/NAMESPACE @@ -68,6 +68,7 @@ export(atc_trivial_nl) export(atc_umcg) export(availability) export(brmo) +export(catalogue_of_life_version) export(count_I) export(count_IR) export(count_R) @@ -123,11 +124,11 @@ export(mo_ref) export(mo_renamed) export(mo_shortname) export(mo_species) -export(mo_subkingdom) export(mo_subspecies) export(mo_taxonomy) export(mo_type) export(mo_uncertainties) +export(mo_url) export(mo_year) export(mrgn) export(n_rsi) diff --git a/NEWS.md b/NEWS.md index 73952f58..6d347bb1 100755 --- a/NEWS.md +++ b/NEWS.md @@ -10,12 +10,12 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git #### New * **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo()` to identify an MO code. -* Catalogue of Life (CoL) inclusion for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains: - * Almost 60,000 species from six different kingdoms - * Almost 15,000 previously accepted names which are now taxonomic 'synonyms' - * All (sub)species from the kingdoms Archaea, Bacteria, Chromista, Protozoa and Viruses - * All (sub)species from the orders Eurotiales, Saccharomycetales and Onygenales of the kingdom Fungi. The complete taxonomy of this kingdom has more than 130,000 species. 
The orders we included contains at least all memebers of the families *Candida*, *Aspergillus* and *Trichophyton*. - * Due to this change, the ID of *Streptococcus* was changed from `B_STRPTC` to `B_STRPT`. +* Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains: + * All ~55,000 species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses + * All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including all of them would slow down our algorithms tremendously, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera *Aspergillus*, *Candida*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*). + * All ~15,000 previously accepted names of species that have been taxonomically renamed + * The responsible author(s) and year of scientific publication + * Due to this change, some `mo` codes changed (e.g. *Streptococcus* changed from `B_STRPTC` to `B_STRPT`). A translation table is used internally to support older microorganism IDs, so users will not notice this difference. * Support for data from [WHONET](https://whonet.org/) and [EARS-Net](https://ecdc.europa.eu/en/about-us/partnerships-and-networks/disease-and-laboratory-networks/ears-net) (European Antimicrobial Resistance Surveillance Network): * Exported files from WHONET can be read and used in this package. For functions like `first_isolate()` and `eucast_rules()`, all parameters will be filled in automatically. * This package now knows all antibiotic abbrevations by EARS-Net (which are also being used by WHONET) - the `antibiotics` data set now contains a column `ears_net`. 
diff --git a/R/catalogue_of_life.R b/R/catalogue_of_life.R new file mode 100755 index 00000000..25c41ba5 --- /dev/null +++ b/R/catalogue_of_life.R @@ -0,0 +1,73 @@ +# ==================================================================== # +# TITLE # +# Antimicrobial Resistance (AMR) Analysis # +# # +# SOURCE # +# https://gitlab.com/msberends/AMR # +# # +# LICENCE # +# (c) 2019 Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl) # +# # +# This R package is free software; you can freely use and distribute # +# it for both personal and commercial purposes under the terms of the # +# GNU General Public License version 2.0 (GNU GPL-2), as published by # +# the Free Software Foundation. # +# # +# This R package was created for academic research and was publicly # +# released in the hope that it will be useful, but it comes WITHOUT # +# ANY WARRANTY OR LIABILITY. # +# Visit our website for more info: https://msberends.gitlab.io/AMR. # +# ==================================================================== # + +#' The Catalogue of Life +#' +#' This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life. +#' @section Catalogue of Life: +#' \if{html}{\figure{logo_col.png}{options: height=60px style=margin-bottom:5px} \cr} +#' This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (\url{http://www.catalogueoflife.org}). This data is updated annually - check the included version with \code{\link{catalogue_of_life_version}}. +#' +#' Included are: +#' \itemize{ +#' \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} +#' \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. 
The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including all of them would slow down our algorithms tremendously, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} +#' \item{All ~15,000 previously accepted names of (sub)species that have been taxonomically renamed} +#' \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} +#' \item{The responsible author(s) and year of scientific publication} +#' } +#' +#' The Catalogue of Life (\url{http://www.catalogueoflife.org}) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation. +#' +#' The syntax used to transform the original data to a cleansed R format can be found here: \url{https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R}. +#' @inheritSection AMR Read more on our website! 
+#' @name catalogue_of_life +#' @rdname catalogue_of_life +#' @examples +#' # Get version info of included data set +#' catalogue_of_life_version() +#' +#' +#' # Get a note when a species was renamed +#' mo_shortname("Chlamydia psittaci") +#' # Note: 'Chlamydia psittaci' (Page, 1968) was renamed +#' # 'Chlamydophila psittaci' (Everett et al., 1999) +#' # [1] "C. psittaci" +#' +#' # Get any property from the entire taxonomic tree for all included species +#' mo_class("E. coli") +#' # [1] "Gammaproteobacteria" +#' +#' mo_family("E. coli") +#' # [1] "Enterobacteriaceae" +#' +#' mo_gramstain("E. coli") # based on kingdom and phylum, see ?mo_gramstain +#' # [1] "Gram negative" +#' +#' mo_ref("E. coli") +#' # [1] "Castellani et al., 1919" +#' +#' # Do not get mistaken - the package only includes microorganisms +#' mo_phylum("C. elegans") +#' # [1] "Cyanobacteria" # Bacteria?! +#' mo_fullname("C. elegans") +#' # [1] "Chroococcus limneticus elegans" # Because a microorganism was found +NULL diff --git a/R/data.R b/R/data.R index 3dac1489..85e4c27d 100755 --- a/R/data.R +++ b/R/data.R @@ -133,26 +133,25 @@ #' Data set with ~60,000 microorganisms #' #' A data set containing the microbial taxonomy of six kingdoms from the Catalogue of Life. MO codes can be looked up using \code{\link{as.mo}}. 
-#' @inheritSection ITIS ITIS -#' @format A \code{\link{data.frame}} with 56,659 observations and 15 variables: +#' @inheritSection catalogue_of_life Catalogue of Life +#' @format A \code{\link{data.frame}} with 56,672 observations and 14 variables: #' \describe{ -#' \item{\code{mo}}{ID of microorganism} +#' \item{\code{mo}}{ID of microorganism as used by this package} #' \item{\code{col_id}}{Catalogue of Life ID} -#' \item{\code{genus}}{Taxonomic genus of the microorganism as found in ITIS, see Source} -#' \item{\code{species}}{Taxonomic species of the microorganism as found in ITIS, see Source} -#' \item{\code{subspecies}}{Taxonomic subspecies of the microorganism as found in ITIS, see Source} #' \item{\code{fullname}}{Full name, like \code{"Echerichia coli"}} -#' \item{\code{family}}{Taxonomic family of the microorganism as found in ITIS, see Source} -#' \item{\code{order}}{Taxonomic order of the microorganism as found in ITIS, see Source} -#' \item{\code{class}}{Taxonomic class of the microorganism as found in ITIS, see Source} -#' \item{\code{phylum}}{Taxonomic phylum of the microorganism as found in ITIS, see Source} -#' \item{\code{subkingdom}}{Taxonomic subkingdom of the microorganism as found in ITIS, see Source} -#' \item{\code{kingdom}}{Taxonomic kingdom of the microorganism as found in ITIS, see Source} -#' \item{\code{gramstain}}{Gram of microorganism, like \code{"Gram negative"}} -#' \item{\code{prevalence}}{An integer based on estimated prevalence of the microorganism in humans. Used internally by \code{\link{as.mo}}, otherwise quite meaningless. 
It has a value of 25 for manually added items and a value of 1000 for all unprevalent microorganisms whose genus was somewhere in the top 250 (with another species).} -#' \item{\code{ref}}{Author(s) and year of concerning publication as found in ITIS, see Source} +#' \item{\code{kingdom}}{Taxonomic kingdom of the microorganism} +#' \item{\code{phylum}}{Taxonomic phylum of the microorganism} +#' \item{\code{class}}{Taxonomic class of the microorganism} +#' \item{\code{order}}{Taxonomic order of the microorganism} +#' \item{\code{family}}{Taxonomic family of the microorganism} +#' \item{\code{genus}}{Taxonomic genus of the microorganism} +#' \item{\code{species}}{Taxonomic species of the microorganism} +#' \item{\code{subspecies}}{Taxonomic subspecies of the microorganism} +#' \item{\code{rank}}{Taxonomic rank of the microorganism, like \code{"species"} or \code{"genus"}} +#' \item{\code{ref}}{Author(s) and year of concerning scientific publication} +#' \item{\code{species_id}}{ID of the species as used by the Catalogue of Life} #' } -#' @source Integrated Taxonomic Information System (ITIS) public online database, \url{https://www.itis.gov}. +#' @source Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. #' @details Manually added were: #' \itemize{ #' \item{9 species of \emph{Streptococcus} (beta haemolytic groups A, B, C, D, F, G, H, K and unspecified)} @@ -160,21 +159,37 @@ #' \item{2 other undefined (unknown Gram negatives and unknown Gram positives)} #' } #' @inheritSection AMR Read more on our website! 
-#' @seealso \code{\link{as.mo}} \code{\link{mo_property}} \code{\link{microorganisms.codes}} +#' @seealso \code{\link{as.mo}}, \code{\link{mo_property}}, \code{\link{microorganisms.codes}} "microorganisms" +catalogue_of_life <- list( + version = "Catalogue of Life: 2018 Annual Checklist", + url = "http://www.catalogueoflife.org/annual-checklist/2018" +) + +#' Version info of included Catalogue of Life +#' @seealso \code{\link{microorganisms}} +#' @inheritSection catalogue_of_life Catalogue of Life +#' @export +catalogue_of_life_version <- function() { + list(version = catalogue_of_life$version, + url = catalogue_of_life$url, + no_of_species = nrow(AMR::microorganisms), + no_of_synonyms = nrow(AMR::microorganisms.old)) +} + #' Data set with previously accepted taxonomic names #' -#' A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by \code{\link{as.mo}}. -#' @inheritSection as.mo ITIS +#' A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by \code{\link{as.mo}}. +#' @inheritSection catalogue_of_life Catalogue of Life #' @format A \code{\link{data.frame}} with 14,506 observations and 4 variables: #' \describe{ #' \item{\code{col_id}}{Catalogue of Life ID} #' \item{\code{tsn_new}}{New Catalogue of Life ID} -#' \item{\code{fullname}}{Old taxonomic name of the microorganism as found in the CoL, see Source} -#' \item{\code{ref}}{Author(s) and year of concerning publication as found in the CoL, see Source} +#' \item{\code{fullname}}{Old taxonomic name of the microorganism} +#' \item{\code{ref}}{Author(s) and year of concerning scientific publication} #' } -#' @source [3] Integrated Taxonomic Information System (ITIS) on-line database, \url{https://www.itis.gov}. +#' @source [3] Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. 
#' @inheritSection AMR Read more on our website! #' @seealso \code{\link{as.mo}} \code{\link{mo_property}} \code{\link{microorganisms}} "microorganisms.old" @@ -187,7 +202,7 @@ #' \item{\code{certe}}{Commonly used code of a microorganism} #' \item{\code{mo}}{ID of the microorganism in the \code{\link{microorganisms}} data set} #' } -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life #' @inheritSection AMR Read more on our website! #' @seealso \code{\link{as.mo}} \code{\link{microorganisms}} "microorganisms.codes" diff --git a/R/globals.R b/R/globals.R index b868fea1..caf62176 100755 --- a/R/globals.R +++ b/R/globals.R @@ -29,6 +29,7 @@ globalVariables(c(".", "Becker", "certe", "cnt", + "col_id", "count", "count.x", "count.y", @@ -49,6 +50,7 @@ globalVariables(c(".", "key_ab", "key_ab_lag", "key_ab_other", + "kingdom", "labs", "Lancefield", "Last name", @@ -73,7 +75,9 @@ globalVariables(c(".", "other_pat_or_mo", "Pasted", "patient_id", + "phylum", "prevalence", + "prevalent", "psae", "R", "real_first_isolate", @@ -85,6 +89,7 @@ globalVariables(c(".", "Sex", "shortname", "species", + "superprevalent", "trade_name", "transmute", "tsn", diff --git a/R/itis.R b/R/itis.R deleted file mode 100755 index 0b5b812e..00000000 --- a/R/itis.R +++ /dev/null @@ -1,63 +0,0 @@ -# ==================================================================== # -# TITLE # -# Antimicrobial Resistance (AMR) Analysis # -# # -# SOURCE # -# https://gitlab.com/msberends/AMR # -# # -# LICENCE # -# (c) 2019 Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl) # -# # -# This R package is free software; you can freely use and distribute # -# it for both personal and commercial purposes under the terms of the # -# GNU General Public License version 2.0 (GNU GPL-2), as published by # -# the Free Software Foundation. 
# -# # -# This R package was created for academic research and was publicly # -# released in the hope that it will be useful, but it comes WITHOUT # -# ANY WARRANTY OR LIABILITY. # -# Visit our website for more info: https://msberends.gitab.io/AMR. # -# ==================================================================== # - -#' ITIS: Integrated Taxonomic Information System -#' -#' All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS). -#' @section ITIS: -#' \if{html}{\figure{logo_itis.jpg}{options: height=60px style=margin-bottom:5px} \cr} -#' This package contains the \strong{complete microbial taxonomic data} (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, \url{https://www.itis.gov}). -#' -#' All ~20,000 (sub)species from \strong{the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package}, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria. -#' -#' ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3]. -#' @inheritSection AMR Read more on our website! -#' @name ITIS -#' @rdname ITIS -#' @examples -#' # Get a note when a species was renamed -#' mo_shortname("Chlamydia psittaci") -#' # Note: 'Chlamydia psittaci' (Page, 1968) was renamed -#' # 'Chlamydophila psittaci' (Everett et al., 1999) -#' # [1] "C. 
psittaci" -#' -#' # Get any property from the entire taxonomic tree for all included species -#' mo_class("E. coli") -#' # [1] "Gammaproteobacteria" -#' -#' mo_family("E. coli") -#' # [1] "Enterobacteriaceae" -#' -#' mo_subkingdom("E. coli") -#' # [1] "Negibacteria" -#' -#' mo_gramstain("E. coli") # based on subkingdom -#' # [1] "Gram negative" -#' -#' mo_ref("E. coli") -#' # [1] "Castellani and Chalmers, 1919" -#' -#' # Do not get mistaken - the package only includes microorganisms -#' mo_phylum("C. elegans") -#' # [1] "Cyanobacteria" # Bacteria?! -#' mo_fullname("C. elegans") -#' # [1] "Chroococcus limneticus elegans" # Because a microorganism was found -NULL diff --git a/R/mo.R b/R/mo.R index 85b2dfc1..e2550ed8 100755 --- a/R/mo.R +++ b/R/mo.R @@ -77,7 +77,7 @@ #' \item{It strips off values between brackets and the brackets itself, and re-evaluates the input with all previous rules} #' \item{It strips off words from the end one by one and re-evaluates the input with all previous rules} #' \item{It strips off words from the start one by one and re-evaluates the input with all previous rules} -#' \item{It tries to look for some manual changes which are not yet published to the ITIS database (like \emph{Propionibacterium} not yet being \emph{Cutibacterium})} +#' \item{It tries to look for some manual changes which are not yet published to the Catalogue of Life (like \emph{Propionibacterium} not yet being \emph{Cutibacterium})} #' } #' #' Examples: @@ -94,17 +94,17 @@ #' #' Use \code{mo_renamed()} to get a vector with all values that could be coerced based on an old, previously accepted taxonomic name. #' -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life # (source as a section, so it can be inherited by other man pages) #' @section Source: #' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. 
\url{https://dx.doi.org/10.1128/CMR.00109-13} #' #' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571} #' -#' [3] Integrated Taxonomic Information System (ITIS). Retrieved September 2018. \url{http://www.itis.gov} +#' [3] Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}. #' @export #' @return Character (vector) with class \code{"mo"}. Unknown values will return \code{NA}. -#' @seealso \code{\link{microorganisms}} for the \code{data.frame} with ITIS content that is being used to determine ID's. \cr +#' @seealso \code{\link{microorganisms}} for the \code{data.frame} that is being used to determine ID's. \cr #' The \code{\link{mo_property}} functions (like \code{\link{mo_genus}}, \code{\link{mo_gramstain}}) to get properties based on the returned code. #' @inheritSection AMR Read more on our website! #' @examples @@ -216,15 +216,15 @@ exec_as.mo <- function(x, Becker = FALSE, Lancefield = FALSE, x <- x[!is.na(x) & !is.null(x) & !identical(x, "")] - # conversion v0.5.0 to v0.6.0, remove for v0.7.0 - x <- gsub("B_STRPTC", "B_STRPT", x) - x <- gsub("B_STRPT_EQUI", "B_STRPT_EQU", x) - x <- gsub("B_PDMNS", "B_PSDMN", x) - x <- gsub("B_CTRDM", "B_CLSTR", x) - x <- gsub("F_CANDD_GLB", "F_CANDD_GLA", x) - x <- gsub("F_CANDD_LUS", "F_CANDD", x) - x <- gsub("B_FCTRM", "B_FSBCT", x) - + # conversion of old MO codes from v0.5.0 (ITIS) to later versions (Catalogue of Life) + if (any(x %like% "^[BFP]_[A-Z]{3,7}")) { + leftpart <- gsub("^([BFP]_[A-Z]{3,7}).*", "\\1", x) + if (any(leftpart %in% names(mo_codes_v0.5.0))) { + rightpart <- gsub("^[BFP]_[A-Z]{3,7}(.*)", "\\1", x) + leftpart <- mo_codes_v0.5.0[leftpart] + x[!is.na(leftpart)] <- paste0(leftpart[!is.na(leftpart)], rightpart[!is.na(leftpart)]) + } + } # defined df to check for if (!is.null(reference_df)) { diff --git a/R/mo_property.R 
b/R/mo_property.R index 9443687e..82b409c3 100755 --- a/R/mo_property.R +++ b/R/mo_property.R @@ -26,14 +26,18 @@ #' @param property one of the column names of one of the \code{\link{microorganisms}} data set or \code{"shortname"} #' @param language language of the returned text, defaults to system language (see \code{\link{get_locale}}) and can also be set with \code{\link{getOption}("AMR_locale")}. Use \code{language = NULL} or \code{language = ""} to prevent translation. #' @param ... other parameters passed on to \code{\link{as.mo}} -#' @details All functions will return the most recently known taxonomic property according to ITIS, except for \code{mo_ref}, \code{mo_authors} and \code{mo_year}. This leads to the following results: +#' @details All functions will return the most recently known taxonomic property according to the Catalogue of Life, except for \code{mo_ref}, \code{mo_authors} and \code{mo_year}. This leads to the following results: #' \itemize{ #' \item{\code{mo_fullname("Chlamydia psittaci")} will return \code{"Chlamydophila psittaci"} (with a warning about the renaming)} #' \item{\code{mo_ref("Chlamydia psittaci")} will return \code{"Page, 1968"} (with a warning about the renaming)} #' \item{\code{mo_ref("Chlamydophila psittaci")} will return \code{"Everett et al., 1999"} (without a warning)} #' } +#' +#' The Gram stain - \code{mo_gramstain()} - is determined from the taxonomic kingdom and phylum. According to Cavalier-Smith (2002), who defined the subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes (ref: \url{https://itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=956097}). These bacteria are considered Gram positive - all other bacteria are considered Gram negative. Species outside the kingdom of Bacteria will return \code{NA}. +#' +#' The function \code{mo_url()} will return the direct URL to the species in the Catalogue of Life. 
#' @inheritSection get_locale Supported languages -#' @inheritSection ITIS ITIS +#' @inheritSection catalogue_of_life Catalogue of Life #' @inheritSection as.mo Source #' @rdname mo_property #' @name mo_property @@ -49,14 +53,13 @@ #' # All properties of Escherichia coli #' ## taxonomic properties #' mo_kingdom("E. coli") # "Bacteria" -#' mo_subkingdom("E. coli") # "Negibacteria" #' mo_phylum("E. coli") # "Proteobacteria" #' mo_class("E. coli") # "Gammaproteobacteria" #' mo_order("E. coli") # "Enterobacteriales" #' mo_family("E. coli") # "Enterobacteriaceae" #' mo_genus("E. coli") # "Escherichia" #' mo_species("E. coli") # "coli" -#' mo_subspecies("E. coli") # NA +#' mo_subspecies("E. coli") # "" #' #' ## colloquial properties #' mo_fullname("E. coli") # "Escherichia coli" @@ -220,12 +223,6 @@ mo_phylum <- function(x, ...) { mo_validate(x = x, property = "phylum", ...) } -#' @rdname mo_property -#' @export -mo_subkingdom <- function(x, ...) { - mo_validate(x = x, property = "subkingdom", ...) -} - #' @rdname mo_property #' @export mo_kingdom <- function(x, ...) { @@ -290,6 +287,15 @@ mo_taxonomy <- function(x, ...) { subspecies = mo_subspecies(x)) } +#' @rdname mo_property +#' @export +mo_url <- function(x, ...) { + u <- mo_validate(x = x, property = "species_id", ...) 
+ u[u != ""] <- paste0(catalogue_of_life$url, "/details/species/id/", u[u != ""]) + u +} + + #' @rdname mo_property #' @importFrom data.table data.table as.data.table setkey #' @export diff --git a/R/zzz.R b/R/zzz.R index 7dcfa05b..4e3c2bef 100755 --- a/R/zzz.R +++ b/R/zzz.R @@ -128,6 +128,198 @@ NULL value = microorganisms.oldDT, envir = asNamespace("AMR")) + # conversion of old MO codes from v0.5.0 (ITIS) to later versions (Catalogue of Life) + mo_codes_v0.5.0 <- c(B_ACHRMB = "B_ACHRM", B_ANNMA = "B_ACTNS", B_ACLLS = "B_ALCYC", + B_AHNGM = "B_ARCHN", B_ARMTM = "B_ARMTMN", B_ARTHRS = "B_ARTHR", + B_APHLS = "B_AZRHZP", B_BRCHA = "B_BRCHY", B_BCTRM = "B_BRVBCT", + B_CLRBCT = "B_CLRBC", B_CTRDM = "B_CLSTR", B_CPRMM = "B_CYLND", + B_DLCLN = "B_DPLCL", B_DMCLM = "B_DSLFT", B_DSLFVB = "B_DSLFV", + B_FCTRM = "B_FSBCT", B_GNRLA = "B_GRDNR", B_HNRBM = "B_HLNRB", + B_HPHGA = "B_HNPHGA", B_HCCCS = "B_HYDRC", B_MCRCLS = "B_MCRCL", + B_MTHYLS = "B_MLSMA", B_MARCLS = "B_MRCLS", B_MGCLS = "B_MSTGC", + B_MCLLA = "B_MTHYLC", B_MYCPLS = "B_MYCPL", B_NBCTR = "B_NTRBC", + B_OCLLS = "B_OCNBC", B_PTHRX = "B_PLNKT", B_PCCCS = "B_PRCHL", + B_PSPHN = "B_PRPHY", B_PDMNS = "B_PSDMN", B_SCCHRP = "B_SCCHR", + B_SRBCTR = "B_SHRBCTR", B_STRPTC = "B_STRPT", B_SHMNS = "B_SYNTR", + B_TRBCTR = "B_THRMN", P_ALBMN = "C_ABMNA", F_ACHLY = "C_ACHLY", + P_ACINT = "C_ACINT", P_ARTCL = "C_ACLNA", P_ACRVL = "C_ACRVL", + P_ADRCT = "C_ADRCT", P_AMPHS = "C_AHSRS", F_ALBUG = "C_ALBUG", + P_ALCNT = "C_ALCNT", P_ALFRD = "C_ALFRD", P_ALLGR = "C_ALLGR", + P_AMPHL = "C_ALPTS", F_ALTHR = "C_ALTHR", P_AMLLA = "C_AMLLA", + P_ANMLN = "C_AMLNA", P_AMMBC = "C_AMMBC", P_AMMDS = "C_AMMDS", + P_AMMLG = "C_AMMLG", P_AMMMR = "C_AMMMR", P_AMMMS = "C_AMMMS", + P_AMMON = "C_AMMON", P_AMMSC = "C_AMMSC", P_AMMSP = "C_AMMSP", + P_AMMST = "C_AMMST", P_AMMTM = "C_AMMTM", F_AMYCS = "C_AMYCS", + P_ANARM = "C_ANARM", P_ANGLD = "C_ANGLD", P_ANGLG = "C_ANGLG", + P_ANNLC = "C_ANNLC", F_ANSLP = "C_ANSLP", F_APDCH = "C_APDCH", + F_APHND = "C_APHND", 
F_APLNC = "C_APLNC", F_AQLND = "C_AQLND", + P_ARCHS = "C_ARCHAS", P_ASTRN = "C_ARNNN", P_ARNPR = "C_ARNPR", + F_ARSPR = "C_ARSPR", P_ARTST = "C_ARTSTR", P_AMPHC = "C_ARYNA", + P_ASCHM = "C_ASCHM", P_ASPDS = "C_ASPDS", P_ASTCL = "C_ASTCL", + P_ASTRG = "C_ASTRGR", P_ASTRM = "C_ASTRMM", P_ASTRR = "C_ASTRR", + P_ASTRT = "C_ASTRTR", F_ATKNS = "C_ATKNS", F_AYLLA = "C_AYLLA", + P_BAGGN = "C_BAGGN", P_BCCLL = "C_BCCLL", P_BDLLD = "C_BDLLD", + P_BGNRN = "C_BGNRN", P_BLCLN = "C_BLCLN", P_BLMND = "C_BLMND", + P_BLMNL = "C_BLMNL", P_BLPHR = "C_BLPHR", P_BLVNT = "C_BLVNT", + P_BOLVN = "C_BOLVN", P_BORLS = "C_BORLS", P_BRNNM = "C_BRNNM", + P_BRSLN = "C_BRSLN", P_BRSRD = "C_BRSRD", F_BRVLG = "C_BRVLG", + F_BNLLA = "C_BRVLGN", P_BSCCM = "C_BSCCM", F_BSDPH = "C_BSDPH", + P_BTHYS = "C_BTHYS", P_BTLLN = "C_BTLLN", P_BULMN = "C_BULMN", + P_CCLDM = "C_CCLDM", P_CDNLL = "C_CDNLL", P_CLPSS = "C_CDNLLP", + P_CHLDN = "C_CHLDNL", P_CHLST = "C_CHLST", P_CHNLM = "C_CHNLM", + P_CHRYS = "C_CHRYSL", P_CHTSP = "C_CHTSP", P_CBCDS = "C_CIBCDS", + P_CLCRN = "C_CLCRN", P_CLMNA = "C_CLMNA", P_CLPDM = "C_CLPDM", + P_CLPHR = "C_CLPHRY", P_CLVLN = "C_CLVLN", P_CMPNL = "C_CMPNL", + P_CNCRS = "C_CNCRS", P_CNTCH = "C_CNTCH", F_CNTRM = "C_CNTRMY", + P_COLPD = "C_COLPD", P_COLPS = "C_COLPS", P_CPRDS = "C_CPRDS", + P_CRNSP = "C_CPRMA", P_CRBNL = "C_CRBNL", P_CRBRB = "C_CRBRB", + P_CRBRG = "C_CRBRG", P_CRBRS = "C_CRBRS", P_CRCHS = "C_CRCHS", + P_CRCLC = "C_CRCLC", P_CRNLC = "C_CRNLC", P_CRNTH = "C_CRNTH", + P_CRPNT = "C_CRPNT", P_CRSTG = "C_CRSTG", P_CRTHN = "C_CRTHN", + P_CRTRN = "C_CRTRN", P_CYMBL = "C_CRTTA", P_CRYPT = "C_CRYPT", + P_CSHMN = "C_CSHMNL", P_CSSDL = "C_CSSDL", P_CLNDS = "C_CSSDLN", + P_CHRNA = "C_CTHRN", P_CTPSS = "C_CTPSS", P_CUNLN = "C_CUNLN", + P_CYLND = "C_CVLNA", P_CYCLC = "C_CYCLCB", P_CDNTA = "C_CYCLD", + P_CYCLG = "C_CYCLG", P_CYCLM = "C_CYCLM", P_CYRTL = "C_CYRTL", + P_CYSTM = "C_CYSTM", P_DCHLM = "C_DCHLM", P_DCRBS = "C_DCRBS", + P_DCTYC = "C_DCTYC", P_DIDNM = "C_DIDNM", P_DLPTS = 
"C_DLPTS", + P_DNTLN = "C_DNTLN", P_DNTST = "C_DNTST", P_DORTH = "C_DORTH", + P_DCTYP = "C_DPHMS", F_DPLCY = "C_DPLCY", P_DNDRT = "C_DRTNA", + P_DSCMM = "C_DSCMM", P_DSCRB = "C_DSCRB", P_DSCRN = "C_DSCRN", + P_DSCSP = "C_DSCSP", P_DSNBR = "C_DSNBR", P_DYCBC = "C_DYCBC", + F_DCTYC = "C_DYCHS", F_ECTRG = "C_ECTRG", B_EDWRD = "C_EDWRD", + P_EGGRL = "C_EGGRL", P_EHLYS = "C_EHLYS", P_EHRNB = "C_EHRNB", + P_ELPHD = "C_ELPHD", P_ENCHL = "C_ELYDM", P_EPHDM = "C_EPHDM", + P_EPLTS = "C_EPLTS", P_EPLXL = "C_EPLXL", P_EPNDL = "C_EPNDL", + P_EPNDS = "C_EPNDS", P_ENLLA = "C_EPSTM", P_EPSTY = "C_EPSTY", + F_ERYCH = "C_ERYCH", F_ESMDM = "C_ESMDM", P_ESSYR = "C_ESSYR", + P_FSCHR = "C_FHRNA", P_FLRLS = "C_FLRLS", P_FLNTN = "C_FNTNA", + P_FRNDC = "C_FRNDC", P_FRNTN = "C_FRNTN", P_FRSNK = "C_FRSNK", + P_FNLLA = "C_FSCHRN", P_FSSRN = "C_FSSRN", P_FVCSS = "C_FVCSS", + P_GDRYN = "C_GDRYN", F_GELGN = "C_GELGN", P_GERDA = "C_GERDA", + P_GLACM = "C_GLACM", P_GLBBL = "C_GLBBL", P_GLBGR = "C_GLBGR", + P_GLBLN = "C_GLBLN", P_GRTLA = "C_GLBRT", P_GLBTX = "C_GLBTX", + P_GLLNA = "C_GLLNA", P_GLMSP = "C_GLMSP", P_GLNDL = "C_GLNDL", + F_GNMCH = "C_GNMCH", P_GOSLL = "C_GOSLL", P_GRNDS = "C_GRNDS", + P_GRNTA = "C_GRNTA", P_GLBRT = "C_GTLLA", P_GTTLN = "C_GTTLN", + P_GVLNP = "C_GVLNP", P_GYPSN = "C_GYPSN", P_GYRDN = "C_GYRDN", + P_HALTR = "C_HALTR", P_HANZW = "C_HANZW", P_HAURN = "C_HAURN", + P_HELNN = "C_HELNN", P_HLPHR = "C_HHRYA", P_HLNTA = "C_HLNTA", + F_HLPHT = "C_HLPHT", P_HLSTC = "C_HLSTC", P_HMSPH = "C_HMSPH", + P_HMTRM = "C_HMTRM", P_HPKNS = "C_HPKNS", P_HPLPH = "C_HPLPH", + P_HPPCR = "C_HPPCR", P_HNLLA = "C_HPPCRP", P_HRMSN = "C_HRMSN", + P_HRNLL = "C_HRNLL", F_HRPCH = "C_HRPCH", P_HSTGR = "C_HSTGR", + P_HSTTL = "C_HSTTL", P_HTRST = "C_HTGNA", P_HTRLL = "C_HTRLL", + P_HTRPH = "C_HTRPH", F_HYPHC = "C_HYPHC", P_HYPRM = "C_HYPRM", + P_INTRN = "C_INTRN", P_IRIDI = "C_IRIDI", P_ISLND = "C_ISLND", + P_JCLLL = "C_JCLLL", P_KHLLL = "C_KHLLL", P_KRNPS = "C_KRNPS", + P_KRRRL = "C_KRRRL", P_LABOE = 
"C_LABOE", P_LAGEN = "C_LAGEN", + P_LBSLL = "C_LBSLL", F_LTHLA = "C_LBYRN", P_LCRYM = "C_LCRYM", + P_LEMBS = "C_LEMBS", F_LGNDM = "C_LGNDM", P_LGNMM = "C_LGNMM", + P_LGNPH = "C_LGNPHR", F_LGNSM = "C_LGNSM", P_LGYNP = "C_LGYNP", + P_LITTB = "C_LITTB", P_LITUL = "C_LITUL", P_LMBDN = "C_LMBDN", + P_LMRCK = "C_LMRCK", F_LBYRN = "C_LMYXA", P_LNGLN = "C_LNGLN", + P_LNTCL = "C_LNTCL", P_LOXDS = "C_LOXDS", F_LPTLG = "C_LPTLG", + F_LNLLA = "C_LPTLGN", F_LPTMT = "C_LPTMT", P_LRYNG = "C_LRYNG", + P_LTCRN = "C_LTCRN", P_LTHPL = "C_LTHPL", P_LTNTS = "C_LTNTS", + F_LTRST = "C_LTRST", P_LXPHY = "C_LXPHY", P_MCRTH = "C_MCRTH", + P_MELNS = "C_MELNS", P_MSDNM = "C_MESDNM", P_METPS = "C_METPS", + P_MIMSN = "C_MIMSN", P_MINCN = "C_MINCN", P_MLLNL = "C_MLLNL", + P_MLMMN = "C_MLMMN", F_MNDNL = "C_MNDNL", P_MNLYS = "C_MNLYS", + P_MNPSS = "C_MNPSS", P_MRGNL = "C_MRGNL", P_MRGNP = "C_MRGNP", + P_MRSPL = "C_MRSPL", P_MRTNT = "C_MRTNT", P_MSSLN = "C_MSSLN", + P_MSSSS = "C_MSSSS", P_MTCNT = "C_MTCNT", P_MYCHS = "C_MYCHS", + P_MYSCH = "C_MYSCH", F_MYZCY = "C_MYZCY", P_NASSL = "C_NASSL", + P_NBCLN = "C_NBCLN", P_NBCLR = "C_NBCLR", P_NCNRB = "C_NCNRB", + P_NDBCL = "C_NDBCL", P_NRLLA = "C_NDBCLR", P_NMMLC = "C_NMMLC", + F_NMTPH = "C_NMTPH", P_NNNLL = "C_NNNLL", P_NODSR = "C_NODSR", + P_NONIN = "C_NONIN", P_NOURI = "C_NOURI", P_OCLNA = "C_OCLNA", + P_OGLNA = "C_OGLNA", P_OPHTH = "C_OLMDM", F_OLPDP = "C_OLPDP", + P_ONYCH = "C_OMPSS", P_OOLIN = "C_OOLIN", P_OPRCL = "C_OPRCL", + P_ORBLN = "C_ORBLN", F_ORCAD = "C_ORCAD", P_ORDRS = "C_ORDRS", + P_OPHRY = "C_ORYDM", P_OSNGL = "C_OSNGL", P_OXYTR = "C_OXYTR", + P_PARRN = "C_PARRN", P_PATRS = "C_PATRS", P_PAVNN = "C_PAVNN", + P_PTYCH = "C_PCYLS", P_PDPHR = "C_PDPHR", P_PELSN = "C_PELSN", + F_PHGMY = "C_PHGMY", F_PSDSP = "C_PHRTA", P_PHRYG = "C_PHRYG", + P_PHYSL = "C_PHYSL", F_PHYTP = "C_PHYTP", P_PLACS = "C_PLACS", + P_PLCPS = "C_PLCPS", P_PLCPSL = "C_PLCPSL", P_PLCTN = "C_PLCTN", + P_PLGPH = "C_PLGPH", B_PLGTH = "C_PLGTH", P_PLMRN = "C_PLMRN", + P_PLNCT = 
"C_PLNCT", P_PLNDSC = "C_PLNDSC", P_PLNGY = "C_PLNGY", + P_PLNRBL = "C_PLNLLA", P_PLNLN = "C_PLNLN", P_PLNLR = "C_PLNLR", + P_PLNRB = "C_PLNRB", P_PLNSP = "C_PLNSPR", P_PLRNM = "C_PLRNM", + P_PLRST = "C_PLRST", P_PLRTR = "C_PLRTR", F_PLSMD = "C_PLSMD", + P_PLTYC = "C_PLTYC", P_PSDBL = "C_PLVNA", P_PLYMR = "C_PLYMR", + P_PLTYN = "C_PNMTM", P_PNRPL = "C_PNRPL", F_PNTSM = "C_PNTSM", + P_PRCNT = "C_PRCNT", P_PRFSS = "C_PRFSS", P_PRMCM = "C_PRMCUM", + F_PRNSP = "C_PRNSP", P_PRPND = "C_PRPND", P_PRPYX = "C_PRPYX", + P_PRRDN = "C_PRRDN", P_PSDDF = "C_PSDDF", P_PSDMC = "C_PSDMC", + P_PSDND = "C_PSDND", P_PSDNN = "C_PSDNN", P_PSDPL = "C_PSDPLY", + P_PSMMS = "C_PSMMS", P_PTLLN = "C_PTLLN", P_PTLLND = "C_PTLLND", + F_PTRSN = "C_PTRSN", P_PULLN = "C_PULLN", P_PUTLN = "C_PUTLN", + P_PRTTR = "C_PYMNA", P_PYRGL = "C_PYRGL", P_PYRGO = "C_PYRGO", + P_PYRLN = "C_PYRLN", F_PYTHM = "C_PYTHIM", F_PYTHL = "C_PYTHL", + P_PYXCL = "C_PYXCL", P_QNQLC = "C_QNQLC", P_RAMLN = "C_RAMLN", + P_RBRTN = "C_RBRTN", P_RCRVD = "C_RCRVD", P_RCTBL = "C_RCTBL", + P_RCTCB = "C_RCTCB", P_RCTGL = "C_RCTGL", P_RCTVG = "C_RCTVG", + P_RDGDR = "C_RDGDR", P_REMNC = "C_REMNC", P_REPHX = "C_REPHX", + P_RHBDM = "C_RHBDMM", F_RHBDS = "C_RHBDSP", P_RHPDD = "C_RHPDD", + F_RHPDM = "C_RHPDM", F_RHZDMY = "C_RHZDM", P_RHZMM = "C_RHZMM", + P_RIVRN = "C_RIVRN", P_ROSLN = "C_ROSLN", P_ROTAL = "C_ROTAL", + P_RPHDP = "C_RPHDP", P_RPRTN = "C_RPRTN", P_RSSLL = "C_RSSLL", + P_RTLMM = "C_RTLMM", P_RTYLA = "C_RTYLA", P_RUGID = "C_RUGID", + F_RZLLP = "C_RZLLP", P_SAGRN = "C_SAGRN", P_SCCMM = "C_SCCMM", + P_SCCRH = "C_SCCRH", P_SCHLM = "C_SCHLM", F_SCLRS = "C_SCLRS", + P_SCTLR = "C_SCTLR", P_SEBRK = "C_SEBRK", P_SGMLN = "C_SGMLN", + P_SGMLP = "C_SGMLP", P_SGMMR = "C_SGMMR", P_SGMVR = "C_SGMVR", + F_SMMRS = "C_SMMRS", P_SNNDS = "C_SNNDS", P_SORTS = "C_SORTS", + P_SPHGN = "C_SPHGN", P_SPHNN = "C_SPHNN", P_SNLLA = "C_SPHNNL", + P_SPHTR = "C_SPHTR", P_SPHTX = "C_SPHTX", P_SPHVG = "C_SPHVG", + P_SPRDT = "C_SPRDT", P_SPRLC = "C_SPRLC", 
F_SPRLG = "C_SPRLG", + P_SPRLL = "C_SPRLL", F_SPRMY = "C_SPRMY", P_SPRPL = "C_SPRPL", + P_SPRSG = "C_SPRSG", P_SPRST = "C_SPRST", P_SPHNP = "C_SPRTA", + P_SPRZN = "C_SPRZN", P_SPHRG = "C_SPSNA", P_STHDM = "C_SPTHD", + P_SRCNR = "C_SRCNR", F_SRLPD = "C_SRLPD", F_SPNGS = "C_SSPRA", + F_STEIN = "C_STEIN", P_SPTHD = "C_STHDDS", P_STHRP = "C_STHRP", + P_STNFR = "C_STNFR", P_STNSM = "C_STNSM", P_STNTR = "C_STNTR", + P_STRBL = "C_STRBL", P_STRMB = "C_STRMB", P_STTSN = "C_STTSN", + P_STYLN = "C_SYCHA", F_SCHZC = "C_SYTRM", P_TBNLL = "C_TBNLL", + P_TRCHL = "C_TCHLS", P_TCHNT = "C_TCHNT", P_THRCL = "C_THRCL", + P_THRMM = "C_THRMM", P_TIARN = "C_TIARN", P_TKPHR = "C_TKPHR", + P_TLNMA = "C_TLNMA", P_TLYPM = "C_TLYPM", P_TMNDS = "C_TMNDS", + P_TMNTA = "C_TMNTA", P_TNTNN = "C_TNNDM", P_TTNNS = "C_TNTNN", + P_TNPSS = "C_TNTNNP", P_TONTN = "C_TONTN", P_TOSAI = "C_TOSAI", + P_TPHTR = "C_TPHTR", P_TRCHH = "C_TRCHH", P_TRPHS = "C_TRCHLR", + P_TMMNA = "C_TRCHM", P_TRCHS = "C_TRCHSP", P_TRFRN = "C_TRFRN", + P_TRLCL = "C_TRLCL", P_TRTXL = "C_TRTXL", P_TRTXS = "C_TRTXS", + P_TTRHY = "C_TTRHY", F_TTRMY = "C_TTRMY", P_TXTLR = "C_TXTLR", + F_THRST = "C_TYTRM", P_URLPT = "C_ULPTS", P_UNGLT = "C_UNGLT", + P_URCNT = "C_URCNT", P_URONM = "C_URONM", P_UROSM = "C_UROSM", + P_URTRC = "C_URTRC", P_URSTY = "C_UTYLA", P_UVGRN = "C_UVGRN", + P_VLVLN = "C_VALVLN", P_VGNLN = "C_VGNLN", P_VGNLNP = "C_VGNLNP", + P_VLNRA = "C_VLVLN", P_VGNCL = "C_VNCLA", P_VRGLN = "C_VRGLN", + P_VRGLNP = "C_VRGLNP", P_VRTCL = "C_VRTCL", P_WBBNL = "C_WBBNL", + P_WEBBN = "C_WEBBN", P_WSNRL = "C_WSNRL", P_ZTHMN = "C_ZHMNM", + B_ZOOGL = "C_ZOOGL", F_DDSCS = "F_DPDSC", F_SCCHR = "F_SMYCS", + P_AMTRN = "P_ACNTH", F_AMBDM = "P_AMBDM", F_ARCYR = "P_ARCYR", + F_BADHM = "P_BADHM", F_BDHMP = "P_BDHMP", F_BRBYL = "P_BRBYL", + F_BRFLD = "P_BRFLD", F_CLMYX = "P_CLMYX", F_CLSTD = "P_CLSTD", + F_CMTRC = "P_CMTRC", F_CRBRR = "P_CRBRR", F_CRTMY = "P_CRTMY", + F_CRTRM = "P_CRTRM", F_DCTYD = "P_DCTYD", F_DDYMM = "P_DDYMM", + F_DIACH = 
"P_DIACH", F_DIANM = "P_DIANM", F_DIDRM = "P_DIDRM", + F_ELMYX = "P_ELMYX", F_ESTLM = "P_ESTLM", F_FULIG = "P_FULIG", + F_HMTRC = "P_HMTRC", F_LCRPS = "P_LCRPS", F_LICEA = "P_LICEA", + F_LMPRD = "P_LMPRD", F_LPTDR = "P_LPTDR", F_LSTRL = "P_LSTRL", + F_LYCGL = "P_LYCGL", F_MCBRD = "P_MCBRD", F_MNKTL = "P_MNKTL", + F_MTTRC = "P_MTTRC", F_MUCLG = "P_MUCLG", F_PHYSR = "P_PHYSR", + F_PRCHN = "P_PRCHN", F_PRMBD = "P_PRMBD", F_PRTPH = "P_PRTPH", + F_PSRNA = "P_PSRNA", F_PYSRM = "P_PYSRM", F_RTCLR = "P_RTCLR", + F_STMNT = "P_STMNT", F_SYMPH = "P_SYMPH", F_TRBRK = "P_TRBRK", + F_TRICH = "P_TRICH", F_TUBFR = "P_TUBFR") + + assign(x = "mo_codes_v0.5.0", + value = mo_codes_v0.5.0, + envir = asNamespace("AMR")) + # packageStartupMessage("OK.", appendLF = TRUE) } } diff --git a/_pkgdown.yml b/_pkgdown.yml index 8f3683e5..f701779f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -88,7 +88,8 @@ reference: for more information about how to work with functions in this package. contents: - '`AMR`' - - '`ITIS`' + - '`catalogue_of_life`' + - '`catalogue_of_life_version`' - '`WHOCC`' - title: 'Cleaning your data' desc: > @@ -145,7 +146,6 @@ reference: - '`WHONET`' - '`microorganisms.codes`' - '`microorganisms.old`' - - '`supplementary_data`' - title: Other desc: > These functions are mostly for internal use, but some of diff --git a/data/microorganisms.old.rda b/data/microorganisms.old.rda index 02c8c939..027d5334 100644 Binary files a/data/microorganisms.old.rda and b/data/microorganisms.old.rda differ diff --git a/data/microorganisms.rda b/data/microorganisms.rda index 24a35f47..50db4fbe 100755 Binary files a/data/microorganisms.rda and b/data/microorganisms.rda differ diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 212c53ae..f9678395 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -192,7 +192,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

@@ -201,7 +201,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 18 February 2019.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 20 February 2019.

Introduction

@@ -217,21 +217,21 @@ -2019-02-18 +2019-02-20 abcd Escherichia coli S S -2019-02-18 +2019-02-20 abcd Escherichia coli S R -2019-02-18 +2019-02-20 efgh Escherichia coli R @@ -327,71 +327,71 @@ -2011-01-25 -V6 -Hospital A -Staphylococcus aureus -I -S -S -S -F - - -2013-07-31 -A8 -Hospital D -Escherichia coli -S -S -S -S -M - - -2016-10-12 -Y3 -Hospital D -Escherichia coli -R -S -S -R -F - - -2013-12-31 -B4 +2012-08-24 +V3 Hospital B Escherichia coli S S S S -M +F - -2014-08-10 -F9 -Hospital A + +2013-04-28 +S3 +Hospital B Staphylococcus aureus R S +S +S +F + + +2013-05-13 +F2 +Hospital D +Staphylococcus aureus +S +S R S M -2017-12-05 -H7 -Hospital A +2017-04-04 +I9 +Hospital C Escherichia coli +R +S +S +S +M + + +2015-12-23 +G9 +Hospital B +Staphylococcus aureus S S S S M + +2011-12-19 +T2 +Hospital B +Staphylococcus aureus +R +S +S +S +F +

Now, let’s start the cleaning and the analysis!

@@ -411,8 +411,8 @@ #> #> Item Count Percent Cum. Count Cum. Percent #> --- ----- ------- -------- ----------- ------------- -#> 1 M 10,283 51.4% 10,283 51.4% -#> 2 F 9,717 48.6% 20,000 100.0% +#> 1 M 10,471 52.4% 10,471 52.4% +#> 2 F 9,529 47.6% 20,000 100.0%

So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

data <- data %>%
@@ -443,10 +443,10 @@
 #> Kingella kingae (no changes)
 #> 
 #> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1294 changes)
+#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1288 changes)
 #> Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
 #> Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2822 changes)
+#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2757 changes)
 #> Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
 #> Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
 #> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -462,9 +462,9 @@
 #> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
 #> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
 #> 
-#> => EUCAST rules affected 7,463 out of 20,000 rows
+#> => EUCAST rules affected 7,243 out of 20,000 rows
 #>    -> added 0 test results
-#>    -> changed 4,116 test results (0 to S; 0 to I; 4,116 to R)
+#> -> changed 4,045 test results (0 to S; 0 to I; 4,045 to R)

@@ -489,8 +489,8 @@ #> NOTE: Using column `bacteria` as input for `col_mo`. #> NOTE: Using column `date` as input for `col_date`. #> NOTE: Using column `patient_id` as input for `col_patient_id`. -#> => Found 5,654 first isolates (28.3% of total)

-

So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+#> => Found 5,692 first isolates (28.5% of total) +

So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

data_1st <- data %>% 
   filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

@@ -516,8 +516,8 @@ 1 -2010-04-05 -Q7 +2010-01-31 +C6 B_ESCHR_COL R S @@ -527,30 +527,30 @@ 2 -2010-05-10 -Q7 +2010-03-09 +C6 B_ESCHR_COL S -R +S S S FALSE 3 -2010-06-22 -Q7 +2010-05-30 +C6 B_ESCHR_COL S -I +S S S FALSE 4 -2010-06-25 -Q7 +2010-06-22 +C6 B_ESCHR_COL S S @@ -560,41 +560,41 @@ 5 -2010-10-03 -Q7 +2011-01-10 +C6 B_ESCHR_COL S -S -S +I +R S FALSE 6 -2010-10-08 -Q7 +2011-02-19 +C6 B_ESCHR_COL +I S S S -S -FALSE +TRUE 7 -2011-03-24 -Q7 +2011-02-21 +C6 B_ESCHR_COL -S -S -S +R +I +R S FALSE 8 -2011-03-27 -Q7 +2011-05-11 +C6 B_ESCHR_COL S S @@ -604,21 +604,21 @@ 9 -2011-05-11 -Q7 +2011-07-06 +C6 B_ESCHR_COL S S -R -R -TRUE +S +S +FALSE 10 -2011-07-22 -Q7 +2011-08-11 +C6 B_ESCHR_COL -R +S S S S @@ -637,7 +637,7 @@ #> NOTE: Using column `patient_id` as input for `col_patient_id`. #> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this. #> [Criterion] Inclusion based on key antibiotics, ignoring I. -#> => Found 15,805 first weighted isolates (79.0% of total) +#> => Found 15,864 first weighted isolates (79.3% of total) @@ -654,8 +654,8 @@ - - + + @@ -666,11 +666,11 @@ - - + + - + @@ -678,11 +678,11 @@ - - + + - + @@ -690,8 +690,8 @@ - - + + @@ -702,83 +702,83 @@ - - + + - - + + - + - - + + + - - - + + - - + + - - - + + + - + - - + + - + - - + + - - - - - - - - - - - - - + + + + + + + + + + + + +
isolate
12010-04-05Q72010-01-31C6 B_ESCHR_COL R S
22010-05-10Q72010-03-09C6 B_ESCHR_COL SRS S S FALSE
32010-06-22Q72010-05-30C6 B_ESCHR_COL SIS S S FALSE
42010-06-25Q72010-06-22C6 B_ESCHR_COL S S
52010-10-03Q72011-01-10C6 B_ESCHR_COL SSSIR S FALSEFALSETRUE
62010-10-08Q72011-02-19C6 B_ESCHR_COLI S S SSFALSEFALSETRUETRUE
72011-03-24Q72011-02-21C6 B_ESCHR_COLSSSRIR S FALSEFALSETRUE
82011-03-27Q72011-05-11C6 B_ESCHR_COL S S S S FALSEFALSETRUE
92011-05-11Q72011-07-06C6 B_ESCHR_COL S SRRTRUETRUE
102011-07-22Q7B_ESCHR_COLRS S S FALSETRUEFALSE
102011-08-11C6B_ESCHR_COLSSSSFALSEFALSE
-

Instead of 2, now 4 isolates are flagged. In total, 79% of all isolates are marked ‘first weighted’ - 50.8 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 2, now 6 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
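The shares quoted above can be checked with a few lines of base R. This is only a sketch that reuses the counts printed on this page (5,692 first isolates and 15,864 first weighted isolates out of 20,000 rows); the variable names are made up for illustration.

```r
# Counts as printed in the output on this page
n_total          <- 20000
n_first          <- 5692   # first isolates
n_first_weighted <- 15864  # first weighted isolates

pct_first          <- 100 * n_first / n_total
pct_first_weighted <- 100 * n_first_weighted / n_total

round(pct_first, 1)           # 28.5
round(pct_first_weighted, 1)  # 79.3

# Difference between the two shares, in percentage points
round(pct_first_weighted - pct_first, 1)  # 50.9
```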

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

data_1st <- data %>% 
   filter_first_weighted_isolate()
-

So we end up with 15,805 isolates for analysis.

+

So we end up with 15,864 isolates for analysis.

We can remove unneeded columns:

data_1st <- data_1st %>% 
   select(-c(first, keyab))
@@ -803,45 +803,45 @@ -2 -2013-07-31 -A8 -Hospital D +1 +2012-08-24 +V3 +Hospital B B_ESCHR_COL S S S S -M -Gram negative -Escherichia -coli -TRUE - - -3 -2016-10-12 -Y3 -Hospital D -B_ESCHR_COL -R -S -S -R F Gram negative Escherichia coli TRUE - -5 -2014-08-10 -F9 -Hospital A + +2 +2013-04-28 +S3 +Hospital B B_STPHY_AUR R S +S +S +F +Gram positive +Staphylococcus +aureus +TRUE + + +3 +2013-05-13 +F2 +Hospital D +B_STPHY_AUR +S +S R S M @@ -851,53 +851,53 @@ TRUE +5 +2015-12-23 +G9 +Hospital B +B_STPHY_AUR +S +S +S +S +M +Gram positive +Staphylococcus +aureus +TRUE + + 6 -2017-12-05 -H7 +2011-12-19 +T2 +Hospital B +B_STPHY_AUR +R +S +S +S +F +Gram positive +Staphylococcus +aureus +TRUE + + +7 +2011-11-26 +J1 Hospital A B_ESCHR_COL +R S S -S -S +R M Gram negative Escherichia coli TRUE - -7 -2014-02-06 -D7 -Hospital C -B_STPHY_AUR -R -R -R -S -M -Gram positive -Staphylococcus -aureus -TRUE - - -8 -2013-05-11 -A5 -Hospital A -B_STPHY_AUR -S -S -S -S -M -Gram positive -Staphylococcus -aureus -TRUE -

Time for the analysis!

@@ -915,9 +915,9 @@
freq(paste(data_1st$genus, data_1st$species))

Or it can be used the dplyr way, which is easier to read:

data_1st %>% freq(genus, species)
-

Frequency table of genus and species from a data.frame (15,805 x 13)

+

Frequency table of genus and species from a data.frame (15,864 x 13)

Columns: 2
-Length: 15,805 (of which NA: 0 = 0.00%)
+Length: 15,864 (of which NA: 0 = 0.00%)
Unique: 4

Shortest: 16
Longest: 24

@@ -934,33 +934,33 @@ Longest: 24

1 Escherichia coli -7,815 -49.4% -7,815 -49.4% +7,849 +49.5% +7,849 +49.5% 2 Staphylococcus aureus -3,988 -25.2% -11,803 -74.7% +4,010 +25.3% +11,859 +74.8% 3 Streptococcus pneumoniae -2,476 -15.7% -14,279 -90.3% +2,443 +15.4% +14,302 +90.2% 4 Klebsiella pneumoniae -1,526 -9.7% -15,805 +1,562 +9.8% +15,864 100.0% @@ -971,7 +971,7 @@ Longest: 24

Resistance percentages

The functions portion_R, portion_IR, portion_I, portion_SI and portion_S can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:

data_1st %>% portion_IR(amox)
-#> [1] 0.4771275
+#> [1] 0.4777484
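
As a rough illustration of what such a portion represents, the base R sketch below computes the share of isolates that are I or R among all isolates with a result. It is not the package's implementation (which may, for example, require a minimum number of isolates); `portion_ir_sketch` and the example vector are hypothetical.

```r
# Hypothetical helper: share of I or R among non-missing S/I/R results
portion_ir_sketch <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  sum(x %in% c("I", "R")) / length(x)
}

# Made-up example data: 7 results plus one missing value
amox_example <- c("S", "R", "I", "S", NA, "R", "S", "S")

# 3 of the 7 available results are I or R
portion_ir_sketch(amox_example)
```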

Or they can be used in conjunction with group_by() and summarise(), both from the dplyr package:

data_1st %>% 
   group_by(hospital) %>% 
@@ -984,19 +984,19 @@ Longest: 24

Hospital A -0.4661749 +0.4814277 Hospital B -0.4820906 +0.4811252 Hospital C -0.4847862 +0.4843815 Hospital D -0.4790875 +0.4610721 @@ -1014,23 +1014,23 @@ Longest: 24

Hospital A -0.4661749 -4745 +0.4814277 +4819 Hospital B -0.4820906 -5472 +0.4811252 +5510 Hospital C -0.4847862 -2432 +0.4843815 +2401 Hospital D -0.4790875 -3156 +0.4610721 +3134 @@ -1050,27 +1050,27 @@ Longest: 24

Escherichia -0.7332054 -0.9033909 -0.9754319 +0.7180533 +0.8983310 +0.9704421 Klebsiella -0.7142857 -0.9095675 -0.9737877 +0.7189501 +0.8879641 +0.9654289 Staphylococcus -0.7179037 -0.9162487 -0.9779338 +0.7231920 +0.9216958 +0.9790524 Streptococcus -0.7289984 +0.7372084 0.0000000 -0.7289984 +0.7372084 diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index 86aa274d..b87bb214 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index 1f6d21fd..3a6a2c1f 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 2689fa2e..eec1e91b 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index bac17cf6..dae88c56 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html index 908711d9..8eabbc3a 100644 --- a/docs/articles/EUCAST.html +++ b/docs/articles/EUCAST.html @@ -192,7 +192,7 @@

How to apply EUCAST rules

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html index 589abac2..304f848b 100644 --- a/docs/articles/G_test.html +++ b/docs/articles/G_test.html @@ -192,7 +192,7 @@

How to use the G-test

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html index 9d3b980e..52633355 100644 --- a/docs/articles/WHONET.html +++ b/docs/articles/WHONET.html @@ -192,7 +192,7 @@

How to work with WHONET data

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html index 0760a382..f9bcbfe3 100644 --- a/docs/articles/atc_property.html +++ b/docs/articles/atc_property.html @@ -192,7 +192,7 @@

How to get properties of an antibiotic

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 7fdf2257..f50800a3 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -192,7 +192,7 @@

Benchmarks

Matthijs S. Berends

-

14 February 2019

+

20 February 2019

@@ -201,161 +201,183 @@ -

One of the most important features of this package is the complete microbial taxonomic database, supplied by ITIS (https://www.itis.gov). We created a function as.mo() that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) and based on the taxonomic tree of ITIS.

-

Using the microbenchmark package, we can review the calculation performance of this function. Its function microbenchmark() calculates different input expressions independently of each other and runs every expression 100 times.

+

One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life (http://catalogueoflife.org). We created a function as.mo() that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) combined with the taxonomic tree of the Catalogue of Life.

+

Using the microbenchmark package, we can review the calculation performance of this function. Its function microbenchmark() runs different input expressions independently of each other and measures their time-to-result.

library(microbenchmark)
 library(AMR)

In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its MO code B_STPHY_AUR (B stands for Bacteria, the taxonomic kingdom).

But the calculation time differs a lot. Here, the AI effect can be reviewed best:

- -

-

In the table above, all measurements are shown in milliseconds (thousandths of a second), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 8 milliseconds means it can determine 125 input values per second. In case of 40 milliseconds, this is only 25 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR"), the input is already a valid MO code, so it takes almost no time at all (0.0002 seconds on our server).

-

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far more slowly. See this example for the ID of Burkholderia nodosa (B_BRKHL_NOD):

- -

-

That takes up to 8 times as much time! A value of 145 milliseconds means it can only determine ~7 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance.

+ +

In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 10 milliseconds means it can determine 100 input values per second. In case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR"), the input is already a valid MO code, so it takes almost no time at all (404 millionths of a second).
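The conversion between time per value and throughput used here is simple arithmetic. A small sketch in base R (the function name `values_per_second` is hypothetical):

```r
# Convert a time per input value (in milliseconds) to values per second
values_per_second <- function(ms) 1000 / ms

values_per_second(10)  # 100 values per second
values_per_second(50)  # 20 values per second

# 404 millionths of a second (microseconds), expressed in milliseconds
404 / 1000             # 0.404
```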

+

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far more slowly. See this example for the ID of Mycoplasma leonicaptivi (B_MYCPL_LEO), a bug probably never found before in humans:

+ +

That takes 6 times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance:

+
par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
+
+# highest value on y axis
+max_y_axis <- max(S.aureus$time, M.leonicaptivi$time, na.rm = TRUE) / 1e6
+
+boxplot(S.aureus, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
+        main = expression(paste("Benchmark of ", italic("Staphylococcus aureus"))))
+

+

+boxplot(M.leonicaptivi, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
+        main = expression(paste("Benchmark of ", italic("Mycoplasma leonicaptivi"))))
+

To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.

Repetitive results

-

Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) and uses as.mo() internally.

- -

So transforming 500,000 values (!) of 96 unique values only takes 0.12 seconds (120 ms). You only lose time on your unique input values.

-

Results of a tenfold - 5,000,000 values:

- -

Even determining the full names of 5 Million values is done within a second.

+

Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

+ +

So transforming 500,000 values (!) of 95 unique values only takes 0.47 seconds (468 ms). You only lose time on your unique input values.
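The idea of computing each unique value only once can be sketched in base R with `unique()` and `match()`. This illustrates the general technique only, not how `as.mo()` is implemented internally; `slow_lookup` and `lookup_unique_once` are hypothetical names.

```r
# Hypothetical stand-in for an expensive per-value computation;
# it counts its calls so the saving is visible
calls <- 0
slow_lookup <- function(value) {
  calls <<- calls + 1
  toupper(value)
}

# Run the expensive function once per unique value, then map the
# results back to the original positions
lookup_unique_once <- function(x, f) {
  ux <- unique(x)
  res <- vapply(ux, f, character(1), USE.NAMES = FALSE)
  res[match(x, ux)]
}

input <- rep(c("e. coli", "s. aureus", "k. pneumoniae"), times = 1000)
result <- lookup_unique_once(input, slow_lookup)

length(result)  # 3000, same length as the input
calls           # 3, one call per unique value
```

With this approach the cost grows with the number of unique values rather than with the length of the input.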

Precalculated results

What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname(), it almost doesn’t take any time at all (see ‘C’ below):

- -

So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0001 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

- -

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known microorganisms (according to ITIS), it can just return the initial value immediately.

+ +

So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0004 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

+ +

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
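A minimal sketch of this short-circuit, assuming a set of known result values is available up front. The vector `known_phyla` and both functions are hypothetical and only mimic the behaviour described above; they are not taken from the package.

```r
# Hypothetical set of values that are already valid results
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Hypothetical stand-in for the expensive lookup by organism name
expensive_phylum_lookup <- function(x) {
  phyla <- c("Staphylococcus aureus" = "Firmicutes",
             "Escherichia coli"      = "Proteobacteria")
  unname(phyla[x])
}

phylum_sketch <- function(x) {
  out <- x
  # Only values that are not already a known phylum need a lookup
  needs_lookup <- !(x %in% known_phyla)
  out[needs_lookup] <- expensive_phylum_lookup(x[needs_lookup])
  out
}

# The first value is returned as-is; the other two are looked up
phylum_sketch(c("Firmicutes", "Staphylococcus aureus", "Escherichia coli"))
```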

Results in other languages

When the system language is non-English and supported by this AMR package, some functions take a little while longer:

- +

Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 00000000..331cac82 Binary files /dev/null and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-2.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-2.png new file mode 100644 index 00000000..c1225095 Binary files /dev/null and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-2.png differ diff --git a/docs/articles/freq.html b/docs/articles/freq.html index d964bf24..f605df37 100644 --- a/docs/articles/freq.html +++ b/docs/articles/freq.html @@ -192,7 +192,7 @@

How to create frequency tables

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

@@ -258,16 +258,16 @@ Longest: 1

colnames(microorganisms)
 #  [1] "mo"         "col_id"     "fullname"   "kingdom"    "phylum"    
 #  [6] "class"      "order"      "family"     "genus"      "species"   
-# [11] "subspecies" "rank"       "ref"
-

If we compare the dimensions between the old and new dataset, we can see that these 12 variables were added:

+# [11] "subspecies" "rank" "ref" "species_id"
+

If we compare the dimensions between the old and new dataset, we can see that these 13 variables were added:

dim(septic_patients)
 # [1] 2000   49
 dim(my_patients)
-# [1] 2000   61
+# [1] 2000 62

So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

my_patients %>%
   freq(genus, species, nmax = 15)
-

Frequency table of genus and species from a data.frame (2,000 x 61)

+

Frequency table of genus and species from a data.frame (2,000 x 62)

Columns: 2
Length: 2,000 (of which NA: 0 = 0.00%)
Unique: 95

diff --git a/docs/articles/mo_property.html b/docs/articles/mo_property.html index 1700164e..8f5fb8cb 100644 --- a/docs/articles/mo_property.html +++ b/docs/articles/mo_property.html @@ -192,7 +192,7 @@

How to get properties of a microorganism

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html index 03e8e556..5afd65c2 100644 --- a/docs/articles/resistance_predict.html +++ b/docs/articles/resistance_predict.html @@ -192,7 +192,7 @@

How to predict antimicrobial resistance

Matthijs S. Berends

-

18 February 2019

+

20 February 2019

diff --git a/docs/index.html b/docs/index.html index e8870b83..2b9faf0d 100644 --- a/docs/index.html +++ b/docs/index.html @@ -284,10 +284,17 @@

Microbial (taxonomic) reference data

-

-

This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

-

All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.

-

Read more about the data from ITIS in our manual.

+

+

This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version().

+

Included are:

+ +

The Catalogue of Life (www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.

+

Read more about the data from the Catalogue of Life in our manual.

diff --git a/docs/news/index.html b/docs/news/index.html index 59b41ad3..32062816 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -249,13 +249,13 @@

Examples:

-

ITIS

+

Catalogue of Life

-


-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

-

All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.

-

ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].

+


+This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version.

+

Included are:

+

The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.

+

The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.

Read more on our website!

@@ -291,7 +297,7 @@ On our website https://msberends.gitla

See also

-

as.mo mo_property microorganisms.codes

+

as.mo, mo_property, microorganisms.codes

@@ -305,7 +311,7 @@ On our website https://msberends.gitla
  • Details
  • -
  • ITIS
  • +
  • Catalogue of Life
  • Read more on our website!
diff --git a/docs/reference/microorganisms.old.html b/docs/reference/microorganisms.old.html
index b21b8ed1..928aff99 100644
--- a/docs/reference/microorganisms.old.html
+++ b/docs/reference/microorganisms.old.html
@@ -47,7 +47,7 @@
-
+
@@ -237,7 +237,7 @@
    -

    A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by as.mo.

    +

    A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by as.mo.

    @@ -248,21 +248,28 @@

    A data.frame with 14,506 observations and 4 variables:

    col_id

    Catalogue of Life ID

    tsn_new

    New Catalogue of Life ID

    -
    fullname

    Old taxonomic name of the microorganism as found in the CoL, see Source

    -
    ref

    Author(s) and year of concerning publication as found in the CoL, see Source

    +
    fullname

    Old taxonomic name of the microorganism

    +
    ref

    Author(s) and year of concerning scientific publication

    Source

    -

    [3] Integrated Taxonomic Information System (ITIS) on-line database, https://www.itis.gov.

    +

    [3] Catalogue of Life: Annual Checklist (public online database), www.catalogueoflife.org.

    -

    ITIS

    +

    Catalogue of Life

    -


    -This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

    -

    All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.

    -

    ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].

    +


    +This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version.

    +

    Included are:

    +

    The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.6 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.

    +

    The syntax used to transform the original data to a cleansed R format, can be found here: https://gitlab.com/msberends/AMR/blob/master/reproduction_of_microorganisms.R.

    Read more on our website!

    @@ -284,7 +291,7 @@ On our website https://msberends.gitla
  • Source
  • -
  • ITIS
  • +
  • Catalogue of Life
  • Read more on our website!
diff --git a/docs/reference/mo_property.html b/docs/reference/mo_property.html
index 9ff1dde2..0a44ac78 100644
--- a/docs/reference/mo_property.html
+++ b/docs/reference/mo_property.html
@@ -259,8 +259,6 @@ mo_phylum(x, ...)
-mo_subkingdom(x, ...)
-
 mo_kingdom(x, ...)
 mo_type(x, language = get_locale(), ...)
@@ -275,6 +273,8 @@ mo_taxonomy(x, ...)
+mo_url(x, ...)
+
 mo_property(x, property = "fullname", language = get_locale(), ...)

    Arguments

    @@ -310,31 +310,40 @@

    Details

    -

    All functions will return the most recently known taxonomic property according to ITIS, except for mo_ref, mo_authors and mo_year. This leads to the following results: