<h1>Calculate the Matching Score for Microorganisms</h1>
<p>This algorithm is used by <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions to determine the most probable match of taxonomic records based on user input.</p>
<p>This algorithm was originally described in: Berends MS <em>et al.</em> (2022). <strong>AMR: An R Package for Working with Antimicrobial Resistance Data</strong>. <em>Journal of Statistical Software</em>, 104(3), 1-31; <a href="https://doi.org/10.18637/jss.v104.i03" class="external-link">doi:10.18637/jss.v104.i03</a>.</p>
<p>Later, the work of Bartlett A <em>et al.</em> about bacterial pathogens infecting humans (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>) was incorporated.</p>
<h2 id="matching-score-for-microorganisms">Matching Score for Microorganisms<a class="anchor" aria-label="anchor" href="#matching-score-for-microorganisms"></a></h2>
<p>With ambiguous user input in <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions, the returned results are chosen based on their matching score using <code>mo_matching_score()</code>. This matching score \(m\) is calculated as:</p>
<p>$$m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min\left(l_{n}, \textrm{lev}(x, n)\right)}{l_{n} \cdot p_{n} \cdot k_{n}}$$</p>
<p>where:</p>
<ul><li><p>\(x\) is the user input;</p></li>
<li><p>\(n\) is a taxonomic name (genus, species, and subspecies);</p></li>
<li><p>\(l_n\) is the length of \(n\);</p></li>
<li><p>\(\textrm{lev}\) is the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance" class="external-link">Levenshtein distance</a> (counting any insertion as 1, and any deletion or substitution as 2) needed to change \(x\) into \(n\);</p></li>
<li><p>\(p_n\) is the human pathogenic prevalence group of \(n\), as described below;</p></li>
<li><p>\(k_n\) is a weight for the taxonomic kingdom of \(n\), set as 1 for Bacteria and higher for other kingdoms.</p></li>
</ul><p>The grouping into human pathogenic prevalence \(p\) is based on recent work from Bartlett <em>et al.</em> (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>), who extensively studied medical-scientific literature to categorise all bacterial species into these groups:</p><ul><li><p><strong>Established</strong>, if a taxonomic species has infected at least three persons in three or more references. These records have <code>prevalence = 1.0</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p><strong>Putative</strong>, if a taxonomic species has fewer than three known cases. These records have <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>Furthermore,</p><ul><li><p>Any genus present in the <strong>established</strong> list also has <code>prevalence = 1.0</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other genus present in the <strong>putative</strong> list has <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other species or subspecies whose genus is present in either of the two aforementioned groups has <code>prevalence = 1.5</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>When calculating the matching score, all characters in \(x\) and \(n\) other than A-Z, a-z, 0-9, spaces, and parentheses are ignored.</p>
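<p>The character filtering described above, together with the weighted Levenshtein distance from the definitions, can be sketched in base R. This is a minimal illustration and not the package's own code: <code>clean_input()</code> and <code>weighted_lev()</code> are hypothetical helper names, and the sketch assumes that <code>utils::adist()</code> with custom costs is an acceptable stand-in for the distance used internally.</p>

```r
# Keep only A-Z, a-z, 0-9, spaces and parentheses; drop everything else.
# `clean_input` is a hypothetical helper name, not part of the AMR package.
clean_input <- function(x) {
  gsub("[^A-Za-z0-9 ()]", "", x)
}

# Weighted Levenshtein distance needed to change x into n:
# any insertion costs 1, any deletion or substitution costs 2.
# `weighted_lev` is likewise hypothetical; utils::adist() ships with base R.
weighted_lev <- function(x, n) {
  d <- utils::adist(
    clean_input(x), clean_input(n),
    costs = list(insertions = 1, deletions = 2, substitutions = 2)
  )
  as.vector(d)
}

clean_input("E. coli")
# [1] "E coli"
weighted_lev("E. coli", "Escherichia coli")
# [1] 10
```

<p>Because insertions are the cheapest operation, a short abbreviation that only needs characters added to become a full name is penalised less than an input that must be shortened or altered.</p>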
<p>All matches are sorted by matching score in descending order, and the top match is returned for each user input value. As a result, <code>"E. coli"</code> returns the microbial ID of <em>Escherichia coli</em> (\(m = 0.688\), a highly prevalent microorganism found in humans) and not <em>Entamoeba coli</em> (\(m = 0.381\), a less prevalent microorganism in humans), although the latter would come first alphabetically.</p>
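<p>The scoring and ranking can be illustrated with a small base R sketch. It assumes the score has the form \(m = (l_n - 0.5 \cdot \min(l_n, \textrm{lev}(x, n))) / (l_n \cdot p_n \cdot k_n)\), where \(k_n\) is a kingdom weight. The function name <code>matching_score_sketch()</code> is hypothetical, and the \(p\) and \(k\) values in the example are illustrative assumptions chosen to be consistent with the scores quoted above; they are not looked up in the <code>microorganisms</code> data set.</p>

```r
# Hypothetical sketch for illustration only; within the AMR package the real
# computation is done by mo_matching_score().
matching_score_sketch <- function(x, n, p, k) {
  # Ignore everything except A-Z, a-z, 0-9, spaces and parentheses
  keep <- function(s) gsub("[^A-Za-z0-9 ()]", "", s)
  x <- keep(x)
  n <- keep(n)
  l_n <- nchar(n)
  # Weighted Levenshtein distance from x to each n:
  # insertion = 1, deletion or substitution = 2
  lev <- mapply(
    function(a, b) {
      as.vector(utils::adist(
        a, b,
        costs = list(insertions = 1, deletions = 2, substitutions = 2)
      ))
    },
    x, n,
    USE.NAMES = FALSE
  )
  # Assumed form of the score: (l_n - 0.5 * min(l_n, lev)) / (l_n * p * k)
  (l_n - 0.5 * pmin(l_n, lev)) / (l_n * p * k)
}

# ASSUMPTION: p and k below are illustrative weights, not values looked up
# in the microorganisms data set.
candidates <- data.frame(
  n = c("Entamoeba coli", "Escherichia coli"),
  p = c(1.25, 1.0),
  k = c(1.5, 1.0),
  stringsAsFactors = FALSE
)
candidates$m <- matching_score_sketch(
  "E. coli", candidates$n, candidates$p, candidates$k
)

# Sort descending on the score; the first row is the match that is returned
ranked <- candidates[order(candidates$m, decreasing = TRUE), ]
ranked$n[1]
# [1] "Escherichia coli"
```

<p>Under these assumed weights the scores come out at 0.6875 for <em>Escherichia coli</em> and about 0.381 for <em>Entamoeba coli</em>, so the alphabetically later name is ranked first.</p>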
<h2 id="reference-data-publicly-available">Reference Data Publicly Available<a class="anchor" aria-label="anchor" href="#reference-data-publicly-available"></a></h2>
<p>All data sets in this <code>AMR</code> package (about microorganisms, antibiotics, SIR interpretation, EUCAST rules, etc.) are publicly and freely available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated plain text files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please visit <a href="https://msberends.github.io/AMR/articles/datasets.html">our website for the download links</a>. The actual files are of course available on <a href="https://github.com/msberends/AMR/tree/main/data-raw" class="external-link">our GitHub repository</a>.</p>
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>