<!-- Generated by pkgdown: do not edit by hand --><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><title>Data Set with 78 678 Taxonomic Records of Microorganisms — microorganisms • AMR (for R)</title><!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png"><link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png"><link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png"><link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png"><link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png"><link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png"><script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet"><script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/Lato-0.4.9/font.css" rel="stylesheet"><link href="../deps/Fira_Code-0.4.9/font.css" rel="stylesheet"><link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet"><link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet"><script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown 
--><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet"><script src="../extra.js"></script><meta property="og:title" content="Data Set with 78 678 Taxonomic Records of Microorganisms — microorganisms"><meta name="description" content="A data set containing the full microbial taxonomy (last updated: June 24th, 2024) of six kingdoms. This data set is the backbone of this AMR package. MO codes can be looked up using as.mo() and microorganism properties can be looked up using any of the mo_* functions.
This data set is carefully crafted, yet made 100% reproducible from public and authoritative taxonomic sources (using this script), namely: List of Prokaryotic names with Standing in Nomenclature (LPSN) for bacteria, MycoBank for fungi, and Global Biodiversity Information Facility (GBIF) for all other taxa."><meta property="og:description" content="A data set containing the full microbial taxonomy (last updated: June 24th, 2024) of six kingdoms. This data set is the backbone of this AMR package. MO codes can be looked up using as.mo() and microorganism properties can be looked up using any of the mo_* functions.
This data set is carefully crafted, yet made 100% reproducible from public and authoritative taxonomic sources (using this script), namely: List of Prokaryotic names with Standing in Nomenclature (LPSN) for bacteria, MycoBank for fungi, and Global Biodiversity Information Facility (GBIF) for all other taxa."><meta property="og:image" content="https://msberends.github.io/AMR/logo.svg"></head><body>
<p>A data set containing the full microbial taxonomy (<strong>last updated: June 24th, 2024</strong>) of six kingdoms. This data set is the backbone of this <code>AMR</code> package. MO codes can be looked up using <code><a href="as.mo.html">as.mo()</a></code> and microorganism properties can be looked up using any of the <code><a href="mo_property.html">mo_*</a></code> functions.</p>
<p>This data set is carefully crafted, yet made 100% reproducible from public and authoritative taxonomic sources (using <a href="https://github.com/msberends/AMR/blob/main/data-raw/reproduction_of_microorganisms.R" class="external-link">this script</a>), namely: <em>List of Prokaryotic names with Standing in Nomenclature (LPSN)</em> for bacteria, <em>MycoBank</em> for fungi, and <em>Global Biodiversity Information Facility (GBIF)</em> for all other taxa.</p>
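<p>As a minimal sketch of the lookups described above, assuming the <code>AMR</code> package is installed: <code>as.mo()</code> turns a name into an MO code, and the <code>mo_*</code> functions return properties for a name or code. The variable names <code>code</code> and <code>row</code> are only illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code>library(AMR)

# Coerce a free-text name to the MO code used by this package
code &lt;- as.mo("Escherichia coli")

# mo_*() functions accept names as well as MO codes
mo_fullname(code)
mo_genus("E. coli")
mo_gramstain("E. coli")

# Look up the corresponding record in the data set itself;
# 'fullname' is documented as a unique identifier
row &lt;- microorganisms[which(microorganisms$fullname == "Escherichia coli"),
                      c("mo", "fullname", "status", "source")]
row
</code></pre></div>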
<p>A <a href="https://tibble.tidyverse.org/reference/tibble.html" class="external-link">tibble</a> with 78 678 observations and 26 variables:</p><ul><li><p><code>mo</code><br> ID of microorganism as used by this package. <em><strong>This is a unique identifier.</strong></em></p></li>
<li><p><code>fullname</code><br> Full name, like <code>"Escherichia coli"</code>. For the taxonomic ranks genus, species and subspecies, this is the 'pasted' text of genus, species, and subspecies. For all taxonomic ranks higher than genus, this is the name of the taxon. <em><strong>This is a unique identifier.</strong></em></p></li>
<li><p><code>status</code><br> Status of the taxon, either "accepted", "not validly published", "synonym", or "unknown"</p></li>
<li><p><code>kingdom</code>, <code>phylum</code>, <code>class</code>, <code>order</code>, <code>family</code>, <code>genus</code>, <code>species</code>, <code>subspecies</code><br> Taxonomic rank of the microorganism. Note that for fungi, <em>phylum</em> is equal to their taxonomic <em>division</em>. Also, for fungi, <em>subkingdom</em> and <em>subdivision</em> were left out since they do not occur in the bacterial taxonomy.</p></li>
<li><p><code>ref</code><br> Author(s) and year of related scientific publication. This contains only the <em>first surname</em> and year of the <em>latest</em> authors, e.g. "Wallis <em>et al.</em> 2006 <em>emend.</em> Smith and Jones 2018" becomes "Smith <em>et al.</em>, 2018". This field is directly retrieved from the source specified in the column <code>source</code>. Moreover, accents were removed to comply with CRAN, which only allows ASCII characters.</p></li>
<li><p><code>oxygen_tolerance</code><br> Oxygen tolerance, either "aerobe", "anaerobe", "anaerobe/microaerophile", "facultative anaerobe", "likely facultative anaerobe", or "microaerophile". These data were retrieved from BacDive (see <em>Source</em>). Values that contain "likely" are missing from BacDive and were instead extrapolated from other species within the same genus. Currently, 68.3% of all ~39 000 bacteria in the data set have an oxygen tolerance value.</p></li>
<li><p><code>source</code><br> Either "GBIF", "LPSN", "MycoBank", or "manually added" (see <em>Source</em>)</p></li>
<li><p><code>lpsn</code><br> Identifier ('Record number') of List of Prokaryotic names with Standing in Nomenclature (LPSN). This will be the first/highest LPSN identifier to keep one identifier per row. For example, <em>Acetobacter ascendens</em> has LPSN Record numbers 7864 and 11011. Only the first is available in the <code>microorganisms</code> data set. <em><strong>This is a unique identifier</strong></em>, though available for only ~33 000 records.</p></li>
<li><p><code>mycobank</code><br> Identifier ('MycoBank #') of MycoBank. <em><strong>This is a unique identifier</strong></em>, though available for only ~18 000 records.</p></li>
<li><p><code>gbif</code><br> Identifier ('taxonID') of Global Biodiversity Information Facility (GBIF). <em><strong>This is a unique identifier</strong></em>, though available for only ~49 000 records.</p></li>
<li><p><code>prevalence</code><br> Prevalence of the microorganism based on Bartlett <em>et al.</em> (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>).</p></li>
<li><p><code>snomed</code><br> Systematized Nomenclature of Medicine (SNOMED) code of the microorganism, version of July 16th, 2024 (see <em>Source</em>). Use <code><a href="mo_property.html">mo_snomed()</a></code> to retrieve it quickly; see also <code><a href="mo_property.html">mo_property()</a></code>.</p></li></ul>
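<p>The variables listed above can be explored with base R. The following is a sketch, not an official recipe: it assumes that missing oxygen tolerance values are stored as <code>NA</code>, and the object name <code>accepted_bacteria</code> is only illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code>library(AMR)

# Number of records per taxonomic status and per source
table(microorganisms$status)
table(microorganisms$source)

# 'mo' and 'fullname' are documented as unique identifiers;
# anyDuplicated() returns 0 when there are no duplicates
anyDuplicated(microorganisms$mo)
anyDuplicated(microorganisms$fullname)

# Accepted bacterial taxa only; which() drops rows where the condition is NA
accepted_bacteria &lt;- microorganisms[
  which(microorganisms$kingdom == "Bacteria" &amp;
          microorganisms$status == "accepted"), ]

# Share of these records with an oxygen tolerance value
# (assumption: missing values are coded as NA)
mean(!is.na(accepted_bacteria$oxygen_tolerance))
</code></pre></div>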
<p>Taxonomic entries were imported in this order of importance:</p><ol><li><p>List of Prokaryotic names with Standing in Nomenclature (LPSN):<br><br>
Parte, AC <em>et al.</em> (2020). <strong>List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ.</strong> International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; <a href="https://doi.org/10.1099/ijsem.0.004332" class="external-link">doi:10.1099/ijsem.0.004332</a>. Accessed from <a href="https://lpsn.dsmz.de" class="external-link">https://lpsn.dsmz.de</a> on June 24th, 2024.</p></li>
<li><p>MycoBank:<br><br>
Robert, V <em>et al.</em> (2013). <strong>MycoBank gearing up for new horizons.</strong> IMA Fungus, 4(2), 371-379; <a href="https://doi.org/10.5598/imafungus.2013.04.02.16" class="external-link">doi:10.5598/imafungus.2013.04.02.16</a>. Accessed from <a href="https://www.mycobank.org" class="external-link">https://www.mycobank.org</a> on June 24th, 2024.</p></li>
<li><p>Global Biodiversity Information Facility (GBIF):<br><br>
Accessed from <a href="https://www.gbif.org" class="external-link">https://www.gbif.org</a> on June 24th, 2024.</p></li>
</ol><p>Furthermore, these sources were used for additional details:</p><ul><li><p>BacDive:<br><br>
Reimer, LC <em>et al.</em> (2022). <strong><em>BacDive</em> in 2022: the knowledge base for standardized bacterial and archaeal data.</strong> Nucleic Acids Res., 50(D1):D741-D74; <a href="https://doi.org/10.1093/nar/gkab961" class="external-link">doi:10.1093/nar/gkab961</a>. Accessed from <a href="https://bacdive.dsmz.de" class="external-link">https://bacdive.dsmz.de</a> on July 16th, 2024.</p></li>
<li><p>Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT):<br><br>
Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12). Accessed from <a href="https://phinvads.cdc.gov" class="external-link">https://phinvads.cdc.gov</a> on July 16th, 2024.</p></li>
<li><p>Grimont <em>et al.</em> (2007). Antigenic Formulae of the Salmonella Serovars, 9th Edition. WHO Collaborating Centre for Reference and Research on <em>Salmonella</em> (WHOCC-SALM).</p></li>
<li><p>Bartlett <em>et al.</em> (2022). <strong>A comprehensive list of bacterial pathogens infecting humans.</strong> <em>Microbiology</em> 168:001269; <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a></p></li></ul>
<p>Please note that entries are only based on LPSN, MycoBank, and GBIF (see below). Since these sources incorporate entries based on (recent) publications in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), the year of publication can be later than one might expect.</p>
<p>For example, <em>Staphylococcus pettenkoferi</em> was described for the first time in Diagnostic Microbiology and Infectious Disease in 2002 (<a href="https://doi.org/10.1016/s0732-8893%2802%2900399-1" class="external-link">doi:10.1016/s0732-8893(02)00399-1</a>), but it was not until 2007 that a publication in IJSEM followed (<a href="https://doi.org/10.1099/ijs.0.64381-0" class="external-link">doi:10.1099/ijs.0.64381-0</a>).</p>
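<p>To see which authors and year the data set records for this species, the reference can be retrieved with the <code>mo_*</code> functions. This is a small sketch assuming the <code>AMR</code> package is installed; the variable name <code>ref</code> is only illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code>library(AMR)

# Author(s) and year as stored in the 'ref' column
ref &lt;- mo_ref("Staphylococcus pettenkoferi")
ref

# The year component on its own
mo_year("Staphylococcus pettenkoferi")
</code></pre></div>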
<p>Included taxonomic data from <a href="https://lpsn.dsmz.de" class="external-link">LPSN</a>, <a href="https://www.mycobank.org" class="external-link">MycoBank</a>, and <a href="https://www.gbif.org" class="external-link">GBIF</a> are:</p><ul><li><p>All ~39 000 (sub)species from the kingdoms of Archaea and Bacteria</p></li>
<li><p>~28 000 species from the kingdom of Fungi. The kingdom of Fungi is a very large taxon with almost 300 000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package. Only relevant fungi are covered (such as all species of <em>Aspergillus</em>, <em>Candida</em>, <em>Cryptococcus</em>, <em>Histoplasma</em>, <em>Pneumocystis</em>, <em>Saccharomyces</em> and <em>Trichophyton</em>).</p></li></ul>
<p>For convenience, some entries were added manually:</p><ul><li><p>~1 500 entries of <em>Salmonella</em>, such as the city-like serovars and groups A to H</p></li>
<li><p>36 species groups (such as the beta-haemolytic <em>Streptococcus</em> groups A to K, coagulase-negative <em>Staphylococcus</em> (CoNS), <em>Mycobacterium tuberculosis</em> complex, etc.), of which the group compositions are stored in the <a href="microorganisms.groups.html">microorganisms.groups</a> data set</p></li>
<li><p>1 entry of <em>Blastocystis</em> (<em>B. hominis</em>), although it officially does not exist (Noel <em>et al.</em> 2005, PMID 15634993)</p></li>
<li><p>1 entry of <em>Moraxella</em> (<em>M. catarrhalis</em>), which was formally named <em>Branhamella catarrhalis</em> (Catlin, 1970) though this change was never accepted within the field of clinical microbiology</p></li>
</ul><p>The script used to transform the original data into a cleansed <span style="R">R</span> format can be <a href="https://github.com/msberends/AMR/blob/main/data-raw/reproduction_of_microorganisms.R" class="external-link">found here</a>.</p>
<p>Like all data sets in this package, this data set is publicly available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, and Stata. Please visit <a href="https://msberends.github.io/AMR/articles/datasets.html">our website for the download links</a>. The actual files are of course available on <a href="https://github.com/msberends/AMR/tree/main/data-raw" class="external-link">our GitHub repository</a>.</p>
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>