A data set containing the full microbial taxonomy (last updated: 5 October 2021) of six kingdoms from the Catalogue of Life (CoL) and the List of Prokaryotic names with Standing in Nomenclature (LPSN). MO codes can be looked up using as.mo().
Format
A tibble with 70,764 observations and 16 variables:
mo
ID of microorganism as used by this package
fullname
Full name, like "Escherichia coli"
kingdom, phylum, class, order, family, genus, species, subspecies
Taxonomic rank of the microorganism
rank
Text of the taxonomic rank of the microorganism, like "species" or "genus"
ref
Author(s) and year of concerning scientific publication
species_id
ID of the species as used by the Catalogue of Life
source
Either "CoL", "LPSN" or "manually added" (see Source)
prevalence
Prevalence of the microorganism, see as.mo()
snomed
Systematized Nomenclature of Medicine (SNOMED) code of the microorganism, according to the US Edition of SNOMED CT from 1 September 2020 (see Source). Use mo_snomed() to retrieve it quickly, see mo_property().
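As a minimal sketch of how these columns can be used together, the snippet below looks up one organism with as.mo() and then reads its row from the data set. It assumes the AMR version documented on this page is installed; the object names (ecoli_id, ecoli_row, source_counts) are illustrative only.

```r
library(AMR)

# Translate free text to the package's microorganism ID (the `mo` column)
ecoli_id <- as.mo("Escherichia coli")

# Select the matching row and a few of the documented columns
ecoli_row <- microorganisms[
  which(as.character(microorganisms$mo) == as.character(ecoli_id)),
  c("mo", "fullname", "kingdom", "genus", "species", "rank", "source")
]
ecoli_row

# Count entries per data source ("CoL", "LPSN" or "manually added")
source_counts <- table(microorganisms$source)
source_counts

# Retrieve the SNOMED code(s) without indexing the list column by hand
mo_snomed("Escherichia coli")
```

Going through as.mo() first, rather than matching on fullname directly, also lets abbreviated or misspelled input resolve to the intended row.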
Source
Catalogue of Life: 2019 Annual Checklist as currently implemented in this AMR package:
Annual Checklist (public online taxonomic database), http://www.catalogueoflife.org
List of Prokaryotic names with Standing in Nomenclature (5 October 2021) as currently implemented in this AMR package:
Parte, A.C., Sarda Carbasse, J., Meier-Kolthoff, J.P., Reimer, L.C. and Goker, M. (2020). List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ. International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; doi:10.1099/ijsem.0.004332
Parte, A.C. (2018). LPSN - List of Prokaryotic names with Standing in Nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology, 68, 1825-1829; doi:10.1099/ijsem.0.002786
Parte, A.C. (2014). LPSN - List of Prokaryotic names with Standing in Nomenclature. Nucleic Acids Research, 42, Issue D1, D613-D616; doi:10.1093/nar/gkt1111
Euzeby, J.P. (1997). List of Bacterial Names with Standing in Nomenclature: a Folder Available on the Internet. International Journal of Systematic Bacteriology, 47, 590-592; doi:10.1099/00207713-47-2-590
US Edition of SNOMED CT from 1 September 2020 as currently implemented in this AMR package:
Retrieved from the Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS), OID 2.16.840.1.114222.4.11.1009, version 12; url: https://phinvads.cdc.gov/vads/ViewValueSet.action?oid=2.16.840.1.114222.4.11.1009
Details
Please note that entries are only based on the Catalogue of Life and the LPSN (see Source). Because these sources incorporate entries based on (recent) publications in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), the year of publication recorded for a species can be later than the year in which it was first described.
For example, Staphylococcus pettenkoferi was described for the first time in Diagnostic Microbiology and Infectious Disease in 2002 (doi:10.1016/s0732-8893(02)00399-1), but it was not before 2007 that a publication in IJSEM followed (doi:10.1099/ijs.0.64381-0). Consequently, the AMR package returns 2007 for mo_year("S. pettenkoferi").
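The example above can be checked directly. The following sketch assumes the AMR version documented on this page; pett is an illustrative object name, and the ref column is shown only because it holds the author(s) and year for the entry.

```r
library(AMR)

# Year of the taxonomic reference; 2007 according to the text above
mo_year("S. pettenkoferi")

# The stored reference (author(s) and year) for the same organism
pett <- microorganisms[
  which(microorganisms$fullname == "Staphylococcus pettenkoferi"),
  c("mo", "fullname", "ref")
]
pett
```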
Manual additions
For convenience, some entries were added manually:
11 entries of Streptococcus (beta-haemolytic: groups A, B, C, D, F, G, H, K and unspecified; other: viridans, milleri)
2 entries of Staphylococcus (coagulase-negative (CoNS) and coagulase-positive (CoPS))
3 entries of Trichomonas (T. vaginalis, and its family and genus)
4 entries of Toxoplasma (T. gondii, and its order, family and genus)
1 entry of Candida (C. krusei), that is not (yet) in the Catalogue of Life
1 entry of Blastocystis (B. hominis), although it officially does not exist (Noel et al. 2005, PMID 15634993)
1 entry of Moraxella (M. catarrhalis), which was formally named Branhamella catarrhalis (Catlin, 1970) though this change was never accepted within the field of clinical microbiology
5 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast and unknown fungus)
6 families under the Enterobacterales order, according to Adeolu et al. (2016, PMID 27620848), that are not (yet) in the Catalogue of Life
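The manual additions listed above can be inspected through the source column, which is documented as taking the value "manually added" for these entries. This is a minimal sketch assuming the AMR version documented on this page; manual is an illustrative object name.

```r
library(AMR)

# Keep only the entries that were added by hand
manual <- microorganisms[
  which(microorganisms$source == "manually added"),
  c("mo", "fullname", "rank")
]

# Number of manual entries, followed by the first few rows
nrow(manual)
head(manual, 10)
```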
Direct download
Like all data sets in this package, this data set is publicly available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. Please visit our website for the download links. The actual files are of course available on our GitHub repository.
About the Records from LPSN (see Source)
The List of Prokaryotic names with Standing in Nomenclature (LPSN) provides comprehensive information on the nomenclature of prokaryotes. LPSN is a free-to-use service founded by Jean P. Euzeby in 1997 and later maintained by Aidan C. Parte.
As of February 2020, the regularly augmented LPSN database at DSMZ is the basis of the new LPSN service. The new database was implemented for the Type-Strain Genome Server and augmented in 2018 to store all kinds of nomenclatural information. Data from the previous version of LPSN and from the Prokaryotic Nomenclature Up-to-date (PNU) service were imported into the new system. PNU had been established in 1993 as a service of the Leibniz Institute DSMZ, and was curated by Norbert Weiss, Manfred Kracht and Dorothea Gleim.
Catalogue of Life
This package contains the complete taxonomic tree of almost all microorganisms (~71,000 species) from the Catalogue of Life (CoL, http://www.catalogueoflife.org), the most comprehensive and authoritative global index of species currently available. Nonetheless, we supplemented the CoL data with data from the List of Prokaryotic names with Standing in Nomenclature (LPSN, lpsn.dsmz.de). This supplementation is needed until the CoL+ project is finished.
Click here for more information about the included taxa. Check which versions of the CoL and LPSN were included in this package with catalogue_of_life_version().
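A short sketch of that version check follows. It assumes the AMR version documented on this page, where catalogue_of_life_version() is available; the function may not exist in other releases of the package, and col_version is an illustrative object name.

```r
library(AMR)

# Report the CoL and LPSN versions bundled with the installed package
col_version <- catalogue_of_life_version()
print(col_version)
```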
Examples
microorganisms
#> # A tibble: 70,764 × 16
#> mo fullname kingdom phylum class order family genus species
#> <mo> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 F_FUNGUS (unknown fu… Fungi (unkn… (unk… (unk… "(unk… "(un… "(unkn…
#> 2 B_GRAMN (unknown Gr… Bacter… (unkn… (unk… (unk… "(unk… "(un… "(unkn…
#> 3 B_GRAMP (unknown Gr… Bacter… (unkn… (unk… (unk… "(unk… "(un… "(unkn…
#> 4 UNKNOWN (unknown na… (unkno… (unkn… (unk… (unk… "(unk… "(un… "(unkn…
#> 5 F_YEAST (unknown ye… Fungi (unkn… (unk… (unk… "(unk… "(un… "(unkn…
#> 6 B_[FAM]_ABDTBCTR Abditibacte… Bacter… Abdit… Abdi… Abdi… "Abdi… "" ""
#> 7 B_[ORD]_ABDTBCTR Abditibacte… Bacter… Abdit… Abdi… Abdi… "" "" ""
#> 8 B_ABDTB Abditibacte… Bacter… Abdit… Abdi… Abdi… "Abdi… "Abd… ""
#> 9 B_ABDTB_UTST Abditibacte… Bacter… Abdit… Abdi… Abdi… "Abdi… "Abd… "utste…
#> 10 C_ABDTD Abditodentr… Chromi… Foram… Glob… Rota… "Boli… "Abd… ""
#> # … with 70,754 more rows, and 7 more variables: subspecies <chr>, rank <chr>,
#> # ref <chr>, species_id <dbl>, source <chr>, prevalence <dbl>, snomed <list>