mirror of https://github.com/msberends/AMR.git, synced 2025-07-09 02:03:04 +02:00
DSMZ data
25
man/as.mo.Rd
@@ -26,7 +26,7 @@ clean_mo_history()
\arguments{
\item{x}{a character vector or a \code{data.frame} with one or two columns}

\item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1].
\item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1]. Note that this does not include species that were newly named after this publication.

This excludes \emph{Staphylococcus aureus} by default; use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".}

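The Becker grouping above amounts to a lookup from species to coagulase group, with S. aureus handled separately. The R sketch below illustrates only that rule under stated assumptions: becker_group() is a hypothetical helper, not an AMR function, and its species lists are a small illustrative subset rather than the package's reference data.

```r
# Hypothetical helper, NOT part of the AMR package: maps a Staphylococcus
# species epithet to "CoNS"/"CoPS" following the Becker rule described above.
# The species lists are a small illustrative subset only.
becker_group <- function(species, becker = TRUE) {
  cons <- c("epidermidis", "haemolyticus", "hominis")  # coagulase-negative examples
  cops <- c("aureus", "intermedius")                   # coagulase-positive examples
  full <- paste("Staphylococcus", species)
  if (isFALSE(becker)) {
    return(full)                                       # no grouping requested
  }
  # S. aureus keeps its own species unless Becker = "all"
  group_aureus <- identical(becker, "all")
  to_cops <- species %in% cops & (species != "aureus" | group_aureus)
  ifelse(species %in% cons, "CoNS", ifelse(to_cops, "CoPS", full))
}

becker_group(c("epidermidis", "aureus"))  # "CoNS" "Staphylococcus aureus"
becker_group("aureus", becker = "all")    # "CoPS"
```

Species missing from both lists keep their full name, which loosely mirrors the note that species named after the Becker publication are not grouped.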
@@ -44,7 +44,7 @@ clean_mo_history()
Character (vector) with class \code{"mo"}. Unknown values will return \code{NA}.
}
\description{
Use this function to determine a valid microorganism ID (\code{mo}). Determination is done using intelligent rules and the complete taxonomic kingdoms Bacteria, Chromista, Protozoa, Archaea, Viruses, and most microbial species from the kingdom Fungi (see Source). The input can be almost anything: a full name (like \code{"Staphylococcus aureus"}), an abbreviated name (like \code{"S. aureus"}), an abbreviation known in the field (like \code{"MRSA"}), or just a genus. Please see Examples.
Use this function to determine a valid microorganism ID (\code{mo}). Determination is done using intelligent rules and the complete taxonomic kingdoms Bacteria, Chromista, Protozoa, Archaea and most microbial species from the kingdom Fungi (see Source). The input can be almost anything: a full name (like \code{"Staphylococcus aureus"}), an abbreviated name (like \code{"S. aureus"}), an abbreviation known in the field (like \code{"MRSA"}), or just a genus. Please see Examples.
}
\details{
\strong{General info} \cr
@@ -61,13 +61,15 @@ A microbial ID from this package (class: \code{mo}) typically looks like these e
| | ----> species, a 3-4 letter acronym
| ----> genus, a 5-7 letter acronym, mostly without vowels
----> taxonomic kingdom: A (Archaea), AN (Animalia), B (Bacteria), C (Chromista),
                         F (Fungi), P (Protozoa), PL (Plantae) or V (Viruses)
                         F (Fungi), P (Protozoa) or PL (Plantae)
}

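To make the code layout in the diagram concrete, here is a small R sketch that splits an ID such as B_ESCHR_COL (from the Examples) into its parts. parse_mo_code() and kingdom_codes are hypothetical names; the kingdom table follows the updated list in this hunk (without V for Viruses), and splitting on underscores assumes the genus, species and subspecies acronyms never contain one.

```r
# Hypothetical parser (not an AMR function) for the MO code layout shown in the
# diagram above: <kingdom>_<genus>[_<species>[_<subspecies>]].
kingdom_codes <- c(A = "Archaea", AN = "Animalia", B = "Bacteria",
                   C = "Chromista", F = "Fungi", P = "Protozoa", PL = "Plantae")

parse_mo_code <- function(code) {
  parts <- strsplit(code, "_", fixed = TRUE)[[1]]
  # Return the i-th part, or NA when the code is shorter than that
  part_or_na <- function(i) if (length(parts) >= i) parts[i] else NA_character_
  list(
    kingdom    = unname(kingdom_codes[parts[1]]),  # NA if the prefix is unknown
    genus      = part_or_na(2),
    species    = part_or_na(3),
    subspecies = part_or_na(4)
  )
}

str(parse_mo_code("B_ESCHR_COL"))
```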
Values that cannot be coerced will be considered 'unknown' and have an MO code \code{UNKNOWN}.

Use the \code{\link{mo_property}_*} functions to get properties based on the returned code; see Examples.

The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{?microorganisms}).

\strong{Self-learning algorithm} \cr
The \code{as.mo()} function gains experience from previously determined microbial IDs and learns from it. This drastically improves both speed and reliability. Use \code{clean_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used, because the taxonomic tree included in this package may change for any organism in a future version, after which the algorithm has to rebuild its knowledge. Usually, any guess after the first try runs 90-95\% faster than the first try. The algorithm saves its previous findings to \code{~/.Rhistory_mo}.

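The self-learning behaviour is essentially memoisation: remember each input's result and skip the expensive search next time. The sketch below assumes a plain in-memory environment as the cache; unlike as.mo(), it does not persist anything to ~/.Rhistory_mo or tie the cache to a package version, and slow_determine(), cached_determine() and clear_cache() are hypothetical stand-ins (clear_cache() plays the role of clean_mo_history()).

```r
# In-memory cache: input string -> previously determined ID
mo_cache <- new.env(parent = emptyenv())
lookup_count <- 0  # counts how often the expensive path runs

# Hypothetical stand-in for the expensive first-time determination
slow_determine <- function(x) {
  lookup_count <<- lookup_count + 1
  paste0("ID_", toupper(gsub("[^A-Za-z]", "", x)))
}

# Return a cached result when available, otherwise compute and remember it.
# Inputs must be non-empty strings, since they are used as variable names.
cached_determine <- function(x) {
  vapply(x, function(one) {
    if (!exists(one, envir = mo_cache, inherits = FALSE)) {
      assign(one, slow_determine(one), envir = mo_cache)
    }
    get(one, envir = mo_cache, inherits = FALSE)
  }, character(1), USE.NAMES = FALSE)
}

# Analogue of resetting the history: forget everything learned so far
clear_cache <- function() rm(list = ls(mo_cache, all.names = TRUE), envir = mo_cache)

cached_determine(c("E. coli", "E. coli", "S. aureus"))
lookup_count  # 2: the repeated "E. coli" was served from the cache
```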
@@ -76,7 +78,7 @@ This function uses intelligent rules to help getting fast and logical results. I
\itemize{
\item{Valid MO codes and full names: it first searches in already valid MO codes and known genus/species combinations}
\item{Human pathogenic prevalence: it first searches in more prevalent microorganisms, then less prevalent ones (see \emph{Microbial prevalence of pathogens in humans} below)}
\item{Taxonomic kingdom: it first searches in Bacteria/Chromista, then Fungi, then Protozoa, then Viruses}
\item{Taxonomic kingdom: it first searches in Bacteria/Chromista, then Fungi, then Protozoa}
\item{Breakdown of input values: from here it starts to break down input values to find possible matches}
}

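One plausible reading of the search order above is: an exact full-name match wins, otherwise candidates are ranked by prevalence and then by kingdom. The sketch encodes that reading on a made-up three-row table; pick_candidate(), toy_ref and kingdom_rank are hypothetical, the prevalence values are invented for the demo, and lower numbers are assumed to mean more prevalent. MO-code matching and the breakdown of input values are left out.

```r
# Toy reference table for illustration only (not the package's data).
# Assumption: a lower prevalence number means "more prevalent in humans".
toy_ref <- data.frame(
  fullname   = c("Escherichia coli", "Entamoeba coli", "Candida albicans"),
  kingdom    = c("Bacteria", "Protozoa", "Fungi"),
  prevalence = c(1, 2, 2),
  stringsAsFactors = FALSE
)
# Bacteria and Chromista share the first rank, as in the list above
kingdom_rank <- c(Bacteria = 1, Chromista = 1, Fungi = 2, Protozoa = 3)

pick_candidate <- function(input, ref) {
  # Exact (case-insensitive) full-name match wins outright
  exact <- ref$fullname[tolower(ref$fullname) == tolower(input)]
  if (length(exact) > 0) return(exact[1])
  # Otherwise collect partial matches
  hits <- ref[grepl(tolower(input), tolower(ref$fullname), fixed = TRUE), , drop = FALSE]
  if (nrow(hits) == 0) return(NA_character_)
  # Prefer more prevalent organisms, then the earlier kingdom
  ord <- order(hits$prevalence, unname(kingdom_rank[hits$kingdom]))
  hits$fullname[ord][1]
}

pick_candidate("coli", toy_ref)  # "Escherichia coli"
```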
@@ -93,7 +95,6 @@ The algorithm can additionally use three different levels of uncertainty to gues
\itemize{
\item{(uncertainty level 1): It looks for matching genera only}
\item{(uncertainty level 1): It looks for previously accepted (but now invalid) taxonomic names}
\item{(uncertainty level 1): It looks for some manual changes which are not (yet) published to the Catalogue of Life (like \emph{Propionibacterium} being \emph{Cutibacterium})}
\item{(uncertainty level 2): It strips off values between brackets and the brackets themselves, and re-evaluates the input with all previous rules}
\item{(uncertainty level 2): It strips off words from the end one by one and re-evaluates the input with all previous rules}
\item{(uncertainty level 3): It strips off words from the start one by one and re-evaluates the input with all previous rules}
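The level 2 and level 3 rules above are plain string manipulation, so they can be sketched without the package's reference data. In this hypothetical R sketch, exact_lookup() stands in for "all previous rules" and only knows two names; the level 1 rules (genera, previously accepted names, manual renames) are omitted because they need taxonomic data. Level 0 marks a direct hit.

```r
# Hypothetical exact-match lookup over a two-name toy list (not package data)
known_names <- c("Staphylococcus aureus", "Escherichia coli")
exact_lookup <- function(x) {
  hit <- known_names[tolower(known_names) == tolower(x)]
  if (length(hit) > 0) hit[1] else NA_character_
}

guess_with_uncertainty <- function(x, lookup = exact_lookup) {
  x <- trimws(x)
  hit <- lookup(x)
  if (!is.na(hit)) return(list(match = hit, level = 0L))
  # Level 2: remove (...) or [...] together with the brackets, then retry
  x <- trimws(gsub("\\s*(\\([^)]*\\)|\\[[^\\]]*\\])", "", x, perl = TRUE))
  hit <- lookup(x)
  if (!is.na(hit)) return(list(match = hit, level = 2L))
  words <- strsplit(x, "\\s+")[[1]]
  n_drop <- max(length(words) - 1, 0)  # always keep at least one word
  # Level 2: drop words from the end, one at a time
  for (n in rev(seq_len(n_drop))) {
    hit <- lookup(paste(words[seq_len(n)], collapse = " "))
    if (!is.na(hit)) return(list(match = hit, level = 2L))
  }
  # Level 3: drop words from the start, one at a time
  for (n in seq_len(n_drop)) {
    hit <- lookup(paste(words[-seq_len(n)], collapse = " "))
    if (!is.na(hit)) return(list(match = hit, level = 3L))
  }
  list(match = NA_character_, level = NA_integer_)
}

guess_with_uncertainty("Escherichia coli isolate 7")$match  # "Escherichia coli"
```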
@@ -164,6 +165,12 @@ as.mo("MRSA") # Methicillin Resistant S. aureus
as.mo("VISA") # Vancomycin Intermediate S. aureus
as.mo("VRSA") # Vancomycin Resistant S. aureus

# Dyslexia is no problem - these all work:
as.mo("Ureaplasma urealyticum")
as.mo("Ureaplasma urealyticus")
as.mo("Ureaplasmium urealytica")
as.mo("Ureaplazma urealitycium")

as.mo("Streptococcus group A")
as.mo("GAS") # Group A Streptococci
as.mo("GBS") # Group B Streptococci
@@ -174,13 +181,9 @@ as.mo("S. epidermidis", Becker = TRUE) # will not remain species: B_STPHY_CNS
as.mo("S. pyogenes") # will remain species: B_STRPT_PYO
as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRA

# Use mo_* functions to get a specific property based on `mo`
Ecoli <- as.mo("E. coli") # returns `B_ESCHR_COL`
mo_genus(Ecoli) # returns "Escherichia"
mo_gramstain(Ecoli) # returns "Gram negative"
# but it uses as.mo internally too, so you could also just use:
# All mo_* functions use as.mo() internally too (see ?mo_property):
mo_genus("E. coli") # returns "Escherichia"

mo_gramstain("E. coli") # returns "Gram negative"

\dontrun{
df$mo <- as.mo(df$microorganism_name)

@@ -18,7 +18,7 @@ This package contains the complete taxonomic tree of almost all microorganisms (

Included are:
\itemize{
\item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses}
\item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa}
\item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package, and including all of them would also slow down our algorithms considerably. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histoplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).}
\item{All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like \emph{Strongyloides} and \emph{Taenia})}
\item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed}
@@ -66,3 +66,7 @@ mo_phylum("C. elegans")
mo_fullname("C. elegans")
# [1] "Chroococcus limneticus elegans" # Because a microorganism was found
}
\seealso{
Data set \code{\link{microorganisms}} for the actual data. \cr
Function \code{\link{as.mo}()} to use the data for intelligent determination of microorganisms.
}

@@ -6,11 +6,16 @@
\usage{
catalogue_of_life_version()
}
\value{
a \code{list}, invisibly
}
\description{
This function returns a list with info about the included data from the Catalogue of Life. It also shows whether the included version is the latest annual release. The Catalogue of Life publishes its annual release in March each year.
This function returns information about the included data from the Catalogue of Life. It also shows whether the included version is the latest annual release. The Catalogue of Life publishes its annual release in March each year.
}
\details{
The list item \code{is_latest_annual_release} is based on the system date.

For DSMZ, see \code{?microorganisms}.
}
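Because is_latest_annual_release depends only on the system date and the March release cycle, the check can be sketched with base R date handling. The helpers below are hypothetical, as is passing the included checklist year as a plain integer; the exact release day is not stated above, so 1 March is assumed.

```r
# Hypothetical helpers, assuming a new annual checklist appears on 1 March.
latest_col_release_year <- function(today = Sys.Date()) {
  year  <- as.integer(format(today, "%Y"))
  month <- as.integer(format(today, "%m"))
  if (month >= 3) year else year - 1L  # before March, last year's release is newest
}

is_latest_col_release <- function(included_year, today = Sys.Date()) {
  included_year == latest_col_release_year(today)
}

is_latest_col_release(2018, today = as.Date("2019-02-15"))  # TRUE
is_latest_col_release(2018, today = as.Date("2019-06-01"))  # FALSE
```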
\section{Catalogue of Life}{

@@ -3,8 +3,8 @@
\docType{data}
\name{microorganisms}
\alias{microorganisms}
\title{Data set with ~60,000 microorganisms}
\format{A \code{\link{data.frame}} with 59,985 observations and 15 variables:
\title{Data set with ~65,000 microorganisms}
\format{A \code{\link{data.frame}} with 65,629 observations and 16 variables:
\describe{
\item{\code{mo}}{ID of microorganism as used by this package}
\item{\code{col_id}}{Catalogue of Life ID}
@@ -20,10 +20,13 @@
\item{\code{rank}}{Taxonomic rank of the microorganism, like \code{"species"} or \code{"genus"}}
\item{\code{ref}}{Author(s) and year of the relevant scientific publication}
\item{\code{species_id}}{ID of the species as used by the Catalogue of Life}
\item{\code{source}}{Either \code{"CoL"}, \code{"DSMZ"} (see source) or "manually added"}
\item{\code{prevalence}}{Prevalence of the microorganism, see \code{?as.mo}}
}}
\source{
Catalogue of Life: Annual Checklist (public online database), \url{www.catalogueoflife.org}.
Catalogue of Life: Annual Checklist (public online taxonomic database), \url{www.catalogueoflife.org} (check included annual version with \code{\link{catalogue_of_life_version}()}).

Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Germany, Prokaryotic Nomenclature Up-to-Date, \url{http://www.dsmz.de/bacterial-diversity/prokaryotic-nomenclature-up-to-date} (check included version with \code{\link{catalogue_of_life_version}()}).
}
\usage{
microorganisms
@@ -36,9 +39,17 @@ Manually added were:
\itemize{
\item{9 species of \emph{Streptococcus} (beta haemolytic groups A, B, C, D, F, G, H, K and unspecified)}
\item{2 species of \emph{Staphylococcus} (coagulase-negative [CoNS] and coagulase-positive [CoPS])}
\item{2 other undefined (unknown Gram negatives and unknown Gram positives)}
\item{3 other undefined (unknown, unknown Gram negatives and unknown Gram positives)}
\item{8,830 species from the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen) that are not in the Catalogue of Life}
}
}
\section{About the records from DSMZ (see source)}{

Names of prokaryotes are defined as being validly published by the International Code of Nomenclature of Bacteria. Validly published are all names which are included in the Approved Lists of Bacterial Names and the names subsequently published in the International Journal of Systematic Bacteriology (IJSB) and, from January 2000, in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) as original articles or in the validation lists.

From: \url{https://www.dsmz.de/support/bacterial-nomenclature-up-to-date-downloads/readme.html}
}

\section{Catalogue of Life}{

\if{html}{\figure{logo_col.png}{options: height=40px style=margin-bottom:5px} \cr}

@@ -4,7 +4,7 @@
\name{microorganisms.old}
\alias{microorganisms.old}
\title{Data set with previously accepted taxonomic names}
\format{A \code{\link{data.frame}} with 17,069 observations and 4 variables:
\format{A \code{\link{data.frame}} with 16,911 observations and 4 variables:
\describe{
\item{\code{col_id}}{Catalogue of Life ID}
\item{\code{tsn_new}}{New Catalogue of Life ID}

@@ -67,7 +67,7 @@ mo_property(x, property = "fullname", language = get_locale(), ...)

\item{...}{other parameters passed on to \code{\link{as.mo}}}

\item{open}{browse the URL using \code{\link[utils]{browseURL}}}
\item{open}{browse the URL using \code{\link[utils]{browseURL}()}}

\item{property}{one of the column names of the \code{\link{microorganisms}} data set or \code{"shortname"}}
}

@@ -90,9 +90,9 @@ All functions will return the most recently known taxonomic property according t
\item{\code{mo_ref("Chlamydophila psittaci")} will return \code{"Everett et al., 1999"} (without a warning)}
}

The Gram stain (\code{mo_gramstain()}) is determined from the taxonomic kingdom and phylum. According to Cavalier-Smith (2002), who defined the subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes (ref: \url{https://itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=956097}). These bacteria are considered Gram positive; all other bacteria are considered Gram negative. Species outside the kingdom of Bacteria will return \code{NA}.
The Gram stain (\code{mo_gramstain()}) is determined from the taxonomic kingdom and phylum. According to Cavalier-Smith (2002), who defined the subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes. These bacteria are considered Gram positive; all other bacteria are considered Gram negative. Species outside the kingdom of Bacteria will return \code{NA}.

The function \code{mo_url()} will return the direct URL to the species in the Catalogue of Life.
The function \code{mo_url()} will return the direct URL to the online database entry, which also shows the scientific reference of the species concerned.
}

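The Gram stain rule above depends only on kingdom and phylum, which makes it easy to express as a vectorised R function. This is a sketch of the documented rule, not the package's code; gram_stain_from_taxonomy() and posibacteria_phyla are hypothetical names, and the inputs are assumed to be plain character vectors of kingdom and phylum names.

```r
# Sketch of the documented rule; not the package's implementation.
posibacteria_phyla <- c("Actinobacteria", "Chloroflexi", "Firmicutes", "Tenericutes")

gram_stain_from_taxonomy <- function(kingdom, phylum) {
  out <- ifelse(phylum %in% posibacteria_phyla, "Gram positive", "Gram negative")
  out[is.na(kingdom) | kingdom != "Bacteria"] <- NA  # only Bacteria get a Gram stain
  out
}

gram_stain_from_taxonomy(c("Bacteria", "Bacteria", "Fungi"),
                         c("Firmicutes", "Proteobacteria", "Ascomycota"))
# "Gram positive" "Gram negative" NA
```

Proteobacteria, the phylum of E. coli, is not in the Posibacteria list, which is consistent with mo_gramstain("E. coli") returning "Gram negative" in the Examples.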
\section{Supported languages}{

@@ -169,7 +169,7 @@ mo_shortname("K. pneu rh") # "K. pneumoniae"

# Becker classification, see ?as.mo
mo_fullname("S. epi") # "Staphylococcus epidermidis"
mo_fullname("S. epi", Becker = TRUE) # "Coagulase Negative Staphylococcus (CoNS)"
mo_fullname("S. epi", Becker = TRUE) # "Coagulase-negative Staphylococcus (CoNS)"
mo_shortname("S. epi") # "S. epidermidis"
mo_shortname("S. epi", Becker = TRUE) # "CoNS"