mirror of https://github.com/msberends/AMR.git synced 2024-12-25 07:26:12 +01:00
AMR/reference/mo_matching_score.html
2024-12-20 10:03:24 +00:00

<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><title>Calculate the Matching Score for Microorganisms — mo_matching_score • AMR (for R)</title><!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png"><link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png"><link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png"><link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png"><link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png"><link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png"><script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"><link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet"><script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/Lato-0.4.9/font.css" rel="stylesheet"><link href="../deps/Fira_Code-0.4.9/font.css" rel="stylesheet"><link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet"><link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet"><script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script 
src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet"><script src="../extra.js"></script><meta property="og:title" content="Calculate the Matching Score for Microorganisms — mo_matching_score"><meta name="description" content="This algorithm is used by as.mo() and all the mo_* functions to determine the most probable match of taxonomic records based on user input."><meta property="og:description" content="This algorithm is used by as.mo() and all the mo_* functions to determine the most probable match of taxonomic records based on user input."><meta property="og:image" content="https://msberends.github.io/AMR/logo.svg"></head><body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar navbar-expand-lg fixed-top bg-primary" data-bs-theme="dark" aria-label="Site navigation"><div class="container">
<a class="navbar-brand me-2" href="../index.html">AMR (for R)</a>
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">2.1.1.9122</small>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto"><li class="nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-how-to" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true"><span class="fa fa-question-circle"></span> How to</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-how-to"><li><a class="dropdown-item" href="../articles/AMR.html"><span class="fa fa-directions"></span> Conduct AMR Analysis</a></li>
<li><a class="dropdown-item" href="../reference/antibiogram.html"><span class="fa fa-file-prescription"></span> Generate Antibiogram (Trad./Syndromic/WISCA)</a></li>
<li><a class="dropdown-item" href="../articles/resistance_predict.html"><span class="fa fa-dice"></span> Predict Antimicrobial Resistance</a></li>
<li><a class="dropdown-item" href="../articles/datasets.html"><span class="fa fa-database"></span> Download Data Sets for Own Use</a></li>
<li><a class="dropdown-item" href="../articles/AMR_with_tidymodels.html"><span class="fa fa-square-root-variable"></span> Use AMR for Predictive Modelling (tidymodels)</a></li>
<li><a class="dropdown-item" href="../reference/AMR-options.html"><span class="fa fa-gear"></span> Set User- Or Team-specific Package Settings</a></li>
<li><a class="dropdown-item" href="../articles/PCA.html"><span class="fa fa-compress"></span> Conduct Principal Component Analysis for AMR</a></li>
<li><a class="dropdown-item" href="../articles/MDR.html"><span class="fa fa-skull-crossbones"></span> Determine Multi-Drug Resistance (MDR)</a></li>
<li><a class="dropdown-item" href="../articles/WHONET.html"><span class="fa fa-globe-americas"></span> Work with WHONET Data</a></li>
<li><a class="dropdown-item" href="../articles/EUCAST.html"><span class="fa fa-exchange-alt"></span> Apply Eucast Rules</a></li>
<li><a class="dropdown-item" href="../reference/mo_property.html"><span class="fa fa-bug"></span> Get Taxonomy of a Microorganism</a></li>
<li><a class="dropdown-item" href="../reference/ab_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antibiotic Drug</a></li>
<li><a class="dropdown-item" href="../reference/av_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antiviral Drug</a></li>
</ul></li>
<li class="nav-item"><a class="nav-link" href="../articles/AMR_for_Python.html"><span class="fa fab fa-python"></span> AMR for Python</a></li>
<li class="active nav-item"><a class="nav-link" href="../reference/index.html"><span class="fa fa-book-open"></span> Manual</a></li>
<li class="nav-item"><a class="nav-link" href="../authors.html"><span class="fa fa-users"></span> Authors</a></li>
</ul><ul class="navbar-nav"><li class="nav-item"><a class="nav-link" href="../news/index.html"><span class="fa far fa-newspaper"></span> Changelog</a></li>
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/msberends/AMR"><span class="fa fab fa-github"></span> Source Code</a></li>
</ul></div>
</div>
</nav><div class="container template-reference-topic">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<img src="../logo.svg" class="logo" alt=""><h1>Calculate the Matching Score for Microorganisms</h1>
<small class="dont-index">Source: <a href="https://github.com/msberends/AMR/blob/main/R/mo_matching_score.R" class="external-link"><code>R/mo_matching_score.R</code></a></small>
<div class="d-none name"><code>mo_matching_score.Rd</code></div>
</div>
<div class="ref-description section level2">
<p>This algorithm is used by <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions to determine the most probable match of taxonomic records based on user input.</p>
</div>
<div class="section level2">
<h2 id="ref-usage">Usage<a class="anchor" aria-label="anchor" href="#ref-usage"></a></h2>
<div class="sourceCode"><pre class="sourceCode r"><code><span><span class="fu">mo_matching_score</span><span class="op">(</span><span class="va">x</span>, <span class="va">n</span><span class="op">)</span></span></code></pre></div>
</div>
<div class="section level2">
<h2 id="arguments">Arguments<a class="anchor" aria-label="anchor" href="#arguments"></a></h2>
<dl><dt id="arg-x">x<a class="anchor" aria-label="anchor" href="#arg-x"></a></dt>
<dd><p>Any user input value(s)</p></dd>
<dt id="arg-n">n<a class="anchor" aria-label="anchor" href="#arg-n"></a></dt>
<dd><p>A full taxonomic name, that exists in <code><a href="microorganisms.html">microorganisms$fullname</a></code></p></dd>
</dl></div>
<div class="section level2">
<h2 id="note">Note<a class="anchor" aria-label="anchor" href="#note"></a></h2>
<p>This algorithm was originally developed in 2018 and subsequently described in: Berends MS <em>et al.</em> (2022). <strong>AMR: An R Package for Working with Antimicrobial Resistance Data</strong>. <em>Journal of Statistical Software</em>, 104(3), 1-31; <a href="https://doi.org/10.18637/jss.v104.i03" class="external-link">doi:10.18637/jss.v104.i03</a>.</p>
<p>Later, the work of Bartlett A <em>et al.</em> on bacterial pathogens infecting humans (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>) was incorporated, and optimisations to the algorithm were made.</p>
</div>
<div class="section level2">
<h2 id="matching-score-for-microorganisms">Matching Score for Microorganisms<a class="anchor" aria-label="anchor" href="#matching-score-for-microorganisms"></a></h2>
<p>With ambiguous user input in <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions, the returned results are chosen based on their matching score using <code>mo_matching_score()</code>. This matching score \(m\) is calculated as:</p>
<p><img src="figures/mo_matching_score.png" width="300" alt="mo matching score"></p>
<p>where:</p><ul><li><p>\(x\) is the user input;</p></li>
<li><p>\(n\) is a taxonomic name (genus, species, and subspecies);</p></li>
<li><p>\(l_n\) is the length of \(n\);</p></li>
<li><p>\(lev\) is the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance" class="external-link">Levenshtein distance</a> (counting each insertion as 1, and each deletion or substitution as 2) needed to change \(x\) into \(n\);</p></li>
<li><p>\(p_n\) is the human pathogenic prevalence group of \(n\), as described below;</p></li>
<li><p>\(k_n\) is the taxonomic kingdom of \(n\), set as Bacteria = 1, Fungi = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.</p></li>
</ul><p>The grouping into human pathogenic prevalence \(p\) is based on work by Bartlett <em>et al.</em> (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>), who extensively studied medical-scientific literature to categorise all bacterial species into these groups:</p><ul><li><p><strong>Established</strong>, if a taxonomic species has infected at least three persons in three or more references. These records have <code>prevalence = 1.15</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p><strong>Putative</strong>, if a taxonomic species has fewer than three known cases. These records have <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>Furthermore,</p><ul><li><p>Genera from the World Health Organization's (WHO) Priority Pathogen List have <code>prevalence = 1.0</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any genus present in the <strong>established</strong> list also has <code>prevalence = 1.15</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other genus present in the <strong>putative</strong> list has <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other species or subspecies whose genus is present in the two aforementioned groups has <code>prevalence = 1.5</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any <em>non-bacterial</em> genus, species or subspecies of which the genus is present in the following list, has <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set: <em>Absidia</em>, <em>Acanthamoeba</em>, <em>Acremonium</em>, <em>Actinomucor</em>, <em>Aedes</em>, <em>Alternaria</em>, <em>Amoeba</em>, <em>Ancylostoma</em>, <em>Angiostrongylus</em>, <em>Anisakis</em>, <em>Anopheles</em>, <em>Apophysomyces</em>, <em>Arthroderma</em>, <em>Aspergillus</em>, <em>Aureobasidium</em>, <em>Basidiobolus</em>, <em>Beauveria</em>, <em>Bipolaris</em>, <em>Blastobotrys</em>, <em>Blastocystis</em>, <em>Blastomyces</em>, <em>Candida</em>, <em>Capillaria</em>, <em>Chaetomium</em>, <em>Chilomastix</em>, <em>Chrysonilia</em>, <em>Chrysosporium</em>, <em>Cladophialophora</em>, <em>Cladosporium</em>, <em>Clavispora</em>, <em>Coccidioides</em>, <em>Cokeromyces</em>, <em>Conidiobolus</em>, <em>Coniochaeta</em>, <em>Contracaecum</em>, <em>Cordylobia</em>, <em>Cryptococcus</em>, <em>Cryptosporidium</em>, <em>Cunninghamella</em>, <em>Curvularia</em>, <em>Cyberlindnera</em>, <em>Debaryozyma</em>, <em>Demodex</em>, <em>Dermatobia</em>, <em>Dientamoeba</em>, <em>Diphyllobothrium</em>, <em>Dirofilaria</em>, <em>Echinostoma</em>, <em>Entamoeba</em>, <em>Enterobius</em>, <em>Epidermophyton</em>, <em>Exidia</em>, <em>Exophiala</em>, <em>Exserohilum</em>, <em>Fasciola</em>, <em>Fonsecaea</em>, <em>Fusarium</em>, <em>Geotrichum</em>, <em>Giardia</em>, <em>Graphium</em>, <em>Haloarcula</em>, <em>Halobacterium</em>, <em>Halococcus</em>, <em>Hansenula</em>, <em>Hendersonula</em>, <em>Heterophyes</em>, <em>Histomonas</em>, <em>Histoplasma</em>, <em>Hortaea</em>, <em>Hymenolepis</em>, <em>Hypomyces</em>, <em>Hysterothylacium</em>, <em>Kloeckera</em>, <em>Kluyveromyces</em>, <em>Kodamaea</em>, <em>Lacazia</em>, <em>Leishmania</em>, <em>Lichtheimia</em>, <em>Lodderomyces</em>, <em>Lomentospora</em>, <em>Madurella</em>, <em>Malassezia</em>, 
<em>Malbranchea</em>, <em>Metagonimus</em>, <em>Meyerozyma</em>, <em>Microsporidium</em>, <em>Microsporum</em>, <em>Millerozyma</em>, <em>Mortierella</em>, <em>Mucor</em>, <em>Mycocentrospora</em>, <em>Nannizzia</em>, <em>Necator</em>, <em>Nectria</em>, <em>Ochroconis</em>, <em>Oesophagostomum</em>, <em>Oidiodendron</em>, <em>Opisthorchis</em>, <em>Paecilomyces</em>, <em>Paracoccidioides</em>, <em>Pediculus</em>, <em>Penicillium</em>, <em>Phaeoacremonium</em>, <em>Phaeomoniella</em>, <em>Phialophora</em>, <em>Phlebotomus</em>, <em>Phoma</em>, <em>Pichia</em>, <em>Piedraia</em>, <em>Pithomyces</em>, <em>Pityrosporum</em>, <em>Pneumocystis</em>, <em>Pseudallescheria</em>, <em>Pseudoscopulariopsis</em>, <em>Pseudoterranova</em>, <em>Pulex</em>, <em>Purpureocillium</em>, <em>Quambalaria</em>, <em>Rhinocladiella</em>, <em>Rhizomucor</em>, <em>Rhizopus</em>, <em>Rhodotorula</em>, <em>Saccharomyces</em>, <em>Saksenaea</em>, <em>Saprochaete</em>, <em>Sarcoptes</em>, <em>Scedosporium</em>, <em>Schistosoma</em>, <em>Schizosaccharomyces</em>, <em>Scolecobasidium</em>, <em>Scopulariopsis</em>, <em>Scytalidium</em>, <em>Spirometra</em>, <em>Sporobolomyces</em>, <em>Sporopachydermia</em>, <em>Sporothrix</em>, <em>Sporotrichum</em>, <em>Stachybotrys</em>, <em>Strongyloides</em>, <em>Syncephalastrum</em>, <em>Syngamus</em>, <em>Taenia</em>, <em>Talaromyces</em>, <em>Teleomorph</em>, <em>Toxocara</em>, <em>Trichinella</em>, <em>Trichobilharzia</em>, <em>Trichoderma</em>, <em>Trichomonas</em>, <em>Trichophyton</em>, <em>Trichosporon</em>, <em>Trichostrongylus</em>, <em>Trichuris</em>, <em>Tritirachium</em>, <em>Trombicula</em>, <em>Trypanosoma</em>, <em>Tunga</em>, <em>Ulocladium</em>, <em>Ustilago</em>, <em>Verticillium</em>, <em>Wallemia</em>, <em>Wangiella</em>, <em>Wickerhamomyces</em>, <em>Wuchereria</em>, <em>Yarrowia</em>, or <em>Zygosaccharomyces</em>;</p></li>
<li><p>All other records have <code>prevalence = 2.0</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>When calculating the matching score, all characters in \(x\) and \(n\) other than A-Z, a-z, 0-9, spaces and parentheses are ignored.</p>
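<p>The definitions above can be sketched in R as follows. This is a minimal sketch and not the package's internal code: the names <code>clean_input()</code>, <code>kingdom_weight</code> and <code>sketch_matching_score()</code> are hypothetical, the formula is assumed to be \((l_n - 0.5 \cdot \min(l_n, lev)) / (l_n \cdot p_n \cdot k_n)\) (a form that reproduces the values in the Examples section), and the prevalence and kingdom are passed in by hand rather than looked up in the <a href="microorganisms.html">microorganisms</a> data set.</p>
<div class="sourceCode"><pre class="sourceCode r"><code># Hypothetical sketch; mo_matching_score() itself may be implemented differently.

# Kingdom weights k_n as listed above; any other kingdom gets 3.
kingdom_weight &lt;- c(Bacteria = 1, Fungi = 1.25, Protozoa = 1.5,
                    Chromista = 1.75, Archaea = 2)

# Drop every character other than A-Z, a-z, 0-9, spaces and parentheses.
clean_input &lt;- function(s) gsub("[^A-Za-z0-9 ()]", "", s)

# x: a single user input; n, prevalence, kingdom: vectors of equal length.
sketch_matching_score &lt;- function(x, n, prevalence, kingdom) {
  x &lt;- clean_input(x)
  n &lt;- clean_input(n)
  l_n &lt;- nchar(n)
  # Weighted edit distance from x to n: insertion = 1, deletion/substitution = 2.
  lev &lt;- drop(utils::adist(x, n,
    costs = c(insertions = 1, deletions = 2, substitutions = 2)
  ))
  k_n &lt;- unname(kingdom_weight[kingdom])
  k_n[is.na(k_n)] &lt;- 3
  (l_n - 0.5 * pmin(l_n, lev)) / (l_n * prevalence * k_n)
}

scores &lt;- sketch_matching_score(
  x = "E. coli",
  n = c("Escherichia coli", "Entamoeba coli"),
  prevalence = c(1.0, 1.25),           # assumed from the prevalence rules above
  kingdom = c("Bacteria", "Protozoa")  # assumed kingdoms
)
scores
</code></pre></div>
<p>Under these assumptions the two scores should agree with the output of <code>mo_matching_score()</code> in the Examples section. <code>utils::adist()</code> is used here only because its <code>costs</code> argument allows the insertion, deletion and substitution weights to be set separately.</p>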
<p>All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, <code>"E. coli"</code> returns the microbial ID of <em>Escherichia coli</em> (\(m = 0.688\), a highly prevalent microorganism found in humans) and not <em>Entamoeba coli</em> (\(m = 0.381\), a less prevalent microorganism in humans), although the latter would come first alphabetically.</p>
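<p>As a worked check of these two values, assume the score shown in the figure above has the form \(m_{(x, n)} = \frac{l_n - 0.5 \cdot \min(l_n, lev(x, n))}{l_n \cdot p_n \cdot k_n}\); this form is an assumption here, chosen because it reproduces the numbers in the Examples section. After the ignored characters are dropped, <code>"E. coli"</code> becomes <code>"E coli"</code>, which can be turned into either name by insertions only:</p>
<ul><li><p><em>Escherichia coli</em>: \(l_n = 16\), \(lev = 10\) (ten insertions), and assuming \(p_n = 1.0\) and \(k_n = 1\): \(m = \frac{16 - 0.5 \cdot 10}{16 \cdot 1.0 \cdot 1} = \frac{11}{16} = 0.6875\);</p></li>
<li><p><em>Entamoeba coli</em>: \(l_n = 14\), \(lev = 8\) (eight insertions), and assuming \(p_n = 1.25\) and \(k_n = 1.5\) (Protozoa): \(m = \frac{14 - 0.5 \cdot 8}{14 \cdot 1.25 \cdot 1.5} = \frac{10}{26.25} \approx 0.381\).</p></li>
</ul>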
</div>
<div class="section level2">
<h2 id="reference-data-publicly-available">Reference Data Publicly Available<a class="anchor" aria-label="anchor" href="#reference-data-publicly-available"></a></h2>
<p>All data sets in this <code>AMR</code> package (about microorganisms, antibiotics, SIR interpretation, EUCAST rules, etc.) are publicly and freely available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, and Stata. We also provide tab-separated plain text files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please visit <a href="https://msberends.github.io/AMR/articles/datasets.html">our website for the download links</a>. The actual files are of course available on <a href="https://github.com/msberends/AMR/tree/main/data-raw" class="external-link">our GitHub repository</a>.</p>
</div>
<div class="section level2">
<h2 id="ref-examples">Examples<a class="anchor" aria-label="anchor" href="#ref-examples"></a></h2>
<div class="sourceCode"><pre class="sourceCode r"><code><span class="r-in"><span><span class="fu"><a href="as.mo.html">mo_reset_session</a></span><span class="op">(</span><span class="op">)</span></span></span>
<span class="r-msg co"><span class="r-pr">#&gt;</span> Reset 17 previously matched input values.</span>
<span class="r-in"><span></span></span>
<span class="r-in"><span><span class="fu"><a href="as.mo.html">as.mo</a></span><span class="op">(</span><span class="st">"E. coli"</span><span class="op">)</span></span></span>
<span class="r-out co"><span class="r-pr">#&gt;</span> Class 'mo'</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> [1] B_ESCHR_COLI</span>
<span class="r-in"><span><span class="fu"><a href="as.mo.html">mo_uncertainties</a></span><span class="op">(</span><span class="op">)</span></span></span>
<span class="r-out co"><span class="r-pr">#&gt;</span> Matching scores are based on the resemblance between the input and the full</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> taxonomic name, and the pathogenicity in humans. See ?mo_matching_score.</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> </span>
<span class="r-out co"><span class="r-pr">#&gt;</span> --------------------------------------------------------------------------------</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> "E. coli" -&gt; Escherichia coli (B_ESCHR_COLI, 0.688)</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> Also matched: Enterococcus crotali (0.650), Escherichia coli coli</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> (0.643), Escherichia coli expressing (0.611), Enterobacter cowanii</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> (0.600), Enterococcus columbae (0.595), Enterococcus camelliae (0.591),</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> Enterococcus casseliflavus (0.577), Enterobacter cloacae cloacae</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> (0.571), Enterobacter cloacae complex (0.571), and Enterobacter cloacae</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> dissolvens (0.565)</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> </span>
<span class="r-out co"><span class="r-pr">#&gt;</span> Only the first 10 other matches of each record are shown. Run</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> print(mo_uncertainties(), n = ...) to view more entries, or save</span>
<span class="r-out co"><span class="r-pr">#&gt;</span> mo_uncertainties() to an object.</span>
<span class="r-in"><span></span></span>
<span class="r-in"><span><span class="fu">mo_matching_score</span><span class="op">(</span></span></span>
<span class="r-in"><span> x <span class="op">=</span> <span class="st">"E. coli"</span>,</span></span>
<span class="r-in"><span> n <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"Escherichia coli"</span>, <span class="st">"Entamoeba coli"</span><span class="op">)</span></span></span>
<span class="r-in"><span><span class="op">)</span></span></span>
<span class="r-out co"><span class="r-pr">#&gt;</span> [1] 0.6875000 0.3809524</span>
</code></pre></div>
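<p>Because <code>n</code> accepts a vector, as in the call above, candidate names can be ranked by their score. A minimal sketch, assuming the <code>AMR</code> package is installed and that the chosen names exist in <code>microorganisms$fullname</code>; the object names <code>candidates</code> and <code>scores</code> are only illustrative:</p>
<div class="sourceCode"><pre class="sourceCode r"><code>library(AMR)

# Candidate full names taken from the output above
candidates &lt;- c("Entamoeba coli", "Enterococcus columbae", "Escherichia coli")

# One score per candidate for the same user input
scores &lt;- mo_matching_score(x = "E. coli", n = candidates)

# Candidates ordered from best to worst match
candidates[order(scores, decreasing = TRUE)]
</code></pre></div>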
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside></div>
<footer><div class="pkgdown-footer-left">
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>
</div>
<div class="pkgdown-footer-right">
<p><a target="_blank" href="https://www.rug.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_rug.svg" style="max-width: 150px;"></a><a target="_blank" href="https://www.umcg.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_umcg.svg" style="max-width: 150px;"></a></p>
</div>
</footer></div>
</body></html>