<!-- Generated by pkgdown: do not edit by hand --><htmllang="en"><head><metahttp-equiv="Content-Type"content="text/html; charset=UTF-8"><metacharset="utf-8"><metahttp-equiv="X-UA-Compatible"content="IE=edge"><metaname="viewport"content="width=device-width, initial-scale=1, shrink-to-fit=no"><metaname="description"content="This algorithm is used by as.mo() and all the mo_* functions to determine the most probable match of taxonomic records based on user input."><title>Calculate the Matching Score for Microorganisms — mo_matching_score • AMR (for R)</title><!-- favicons --><linkrel="icon"type="image/png"sizes="16x16"href="../favicon-16x16.png"><linkrel="icon"type="image/png"sizes="32x32"href="../favicon-32x32.png"><linkrel="apple-touch-icon"type="image/png"sizes="180x180"href="../apple-touch-icon.png"><linkrel="apple-touch-icon"type="image/png"sizes="120x120"href="../apple-touch-icon-120x120.png"><linkrel="apple-touch-icon"type="image/png"sizes="76x76"href="../apple-touch-icon-76x76.png"><linkrel="apple-touch-icon"type="image/png"sizes="60x60"href="../apple-touch-icon-60x60.png"><scriptsrc="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><metaname="viewport"content="width=device-width, initial-scale=1, shrink-to-fit=no"><linkhref="../deps/bootstrap-5.2.2/bootstrap.min.css"rel="stylesheet"><scriptsrc="../deps/bootstrap-5.2.2/bootstrap.bundle.min.js"></script><linkhref="../deps/Fira_Code-0.4.4/font.css"rel="stylesheet"><!-- Font Awesome icons --><linkrel="stylesheet"href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css"integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk="crossorigin="anonymous"><linkrel="stylesheet"href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css"integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw="crossorigin="anonymous"><!-- bootstrap-toc 
--><scriptsrc="https://cdn.jsdelivr.net/gh/afeld/bootstrap-toc@v1.0.1/dist/bootstrap-toc.min.js"integrity="sha256-4veVQbu7//Lk5TSmc7YV48MxtMy98e26cf5MrgZYnwo="crossorigin="anonymous"></script><!-- headroom.js --><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js"integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0="crossorigin="anonymous"></script><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js"integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4="crossorigin="anonymous"></script><!-- clipboard.js --><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js"integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI="crossorigin="anonymous"></script><!-- search --><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js"integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A=="crossorigin="anonymous"></script><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js"integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg=="crossorigin="anonymous"></script><scriptsrc="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js"integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww=="crossorigin="anonymous"></script><!-- pkgdown --><scriptsrc="../pkgdown.js"></script><linkhref="../extra.css"rel="stylesheet"><scriptsrc="../extra.js"></script><metaproperty="og:title"content="Calculate the Matching Score for Microorganisms — mo_matching_score"><metaproperty="og:description"content="This algorithm is used by as.mo() and all the mo_* functions to determine the most probable match of taxonomic records based on user 
input."><metaproperty="og:image"content="https://msberends.github.io/AMR/logo.svg"><metaname="twitter:card"content="summary_large_image"><metaname="twitter:creator"content="@msberends"><metaname="twitter:site"content="@msberends"><!-- mathjax --><scriptsrc="htt
<p>This algorithm is used by <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions to determine the most probable match of taxonomic records based on user input.</p>
<p>This algorithm was originally described in: Berends MS <em>et al.</em> (2022). <strong>AMR: An R Package for Working with Antimicrobial Resistance Data</strong>. <em>Journal of Statistical Software</em>, 104(3), 1-31; <a href="https://doi.org/10.18637/jss.v104.i03" class="external-link">doi:10.18637/jss.v104.i03</a>.</p>
<p>Later, the work of Bartlett A <em>et al.</em> about bacterial pathogens infecting humans (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>) was incorporated.</p>
<h2 id="matching-score-for-microorganisms">Matching Score for Microorganisms<a class="anchor" aria-label="anchor" href="#matching-score-for-microorganisms"></a></h2>
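<p>With ambiguous user input in <code><a href="as.mo.html">as.mo()</a></code> and all the <code><a href="mo_property.html">mo_*</a></code> functions, the returned results are chosen based on their matching score using <code>mo_matching_score()</code>. This matching score \(m\) is calculated as:</p>
<p>$$m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min \begin{cases}l_{n} \\ \operatorname{lev}(x, n)\end{cases}}{l_{n} \cdot p_{n} \cdot k_{n}}$$</p>
<p>where:</p>
<ul><li><p>\(x\) is the user input;</p></li>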
<li><p>\(n\) is a taxonomic name (genus, species, and subspecies);</p></li>
<li><p>\(l_n\) is the length of \(n\);</p></li>
<li><p>\(lev\) is the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance" class="external-link">Levenshtein distance</a> needed to change \(x\) into \(n\), counting any insertion as 1 and any deletion or substitution as 2;</p></li>
<li><p>\(p_{n}\) is the human pathogenic prevalence group of \(n\), as described below;</p></li>
<li><p>\(k_n\) is the taxonomic kingdom of \(n\), set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.</p></li>
</ul><p>The grouping into human pathogenic prevalence \(p\) is based on recent work from Bartlett <em>et al.</em> (2022, <a href="https://doi.org/10.1099/mic.0.001269" class="external-link">doi:10.1099/mic.0.001269</a>), who extensively studied medical-scientific literature to categorise all bacterial species into these groups:</p><ul><li><p><strong>Established</strong>, if a taxonomic species has infected at least three persons in three or more references. These records have <code>prevalence = 1.0</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p><strong>Putative</strong>, if a taxonomic species has fewer than three known cases. These records have <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>Furthermore,</p><ul><li><p>Any genus present in the <strong>established</strong> list also has <code>prevalence = 1.0</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other genus present in the <strong>putative</strong> list has <code>prevalence = 1.25</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any other species or subspecies whose genus is present in either of the two aforementioned groups has <code>prevalence = 1.5</code> in the <a href="microorganisms.html">microorganisms</a> data set;</p></li>
<li><p>Any <em>non-bacterial</em> genus, species or subspecies whose genus is present in the following list has <code>prevalence = 1.5</code> in the <a href="microorganisms.html">microorganisms</a> data set: <em>Absidia</em>, <em>Acanthamoeba</em>, <em>Acremonium</em>, <em>Aedes</em>, <em>Alternaria</em>, <em>Amoeba</em>, <em>Ancylostoma</em>, <em>Angiostrongylus</em>, <em>Anisakis</em>, <em>Anopheles</em>, <em>Apophysomyces</em>, <em>Aspergillus</em>, <em>Aureobasidium</em>, <em>Basidiobolus</em>, <em>Beauveria</em>, <em>Blastocystis</em>, <em>Blastomyces</em>, <em>Candida</em>, <em>Capillaria</em>, <em>Chaetomium</em>, <em>Chrysonilia</em>, <em>Cladophialophora</em>, <em>Cladosporium</em>, <em>Conidiobolus</em>, <em>Contracaecum</em>, <em>Cordylobia</em>, <em>Cryptococcus</em>, <em>Curvularia</em>, <em>Demodex</em>, <em>Dermatobia</em>, <em>Dientamoeba</em>, <em>Diphyllobothrium</em>, <em>Dirofilaria</em>, <em>Echinostoma</em>, <em>Entamoeba</em>, <em>Enterobius</em>, <em>Exophiala</em>, <em>Exserohilum</em>, <em>Fasciola</em>, <em>Fonsecaea</em>, <em>Fusarium</em>, <em>Giardia</em>, <em>Haloarcula</em>, <em>Halobacterium</em>, <em>Halococcus</em>, <em>Hendersonula</em>, <em>Heterophyes</em>, <em>Histomonas</em>, <em>Histoplasma</em>, <em>Hymenolepis</em>, <em>Hypomyces</em>, <em>Hysterothylacium</em>, <em>Leishmania</em>, <em>Malassezia</em>, <em>Malbranchea</em>, <em>Metagonimus</em>, <em>Meyerozyma</em>, <em>Microsporidium</em>, <em>Microsporum</em>, <em>Mortierella</em>, <em>Mucor</em>, <em>Mycocentrospora</em>, <em>Necator</em>, <em>Nectria</em>, <em>Ochroconis</em>, <em>Oesophagostomum</em>, <em>Oidiodendron</em>, <em>Opisthorchis</em>, <em>Pediculus</em>, <em>Phlebotomus</em>, <em>Phoma</em>, <em>Pichia</em>, <em>Piedraia</em>, <em>Pithomyces</em>, <em>Pityrosporum</em>, <em>Pneumocystis</em>, <em>Pseudallescheria</em>, <em>Pseudoterranova</em>, <em>Pulex</em>, <em>Rhizomucor</em>, <em>Rhizopus</em>, <em>Rhodotorula</em>, <em>Saccharomyces</em>,
<em>Sarcoptes</em>, <em>Scolecobasidium</em>, <em>Scopulariopsis</em>, <em>Scytalidium</em>, <em>Spirometra</em>, <em>Sporobolomyces</em>, <em>Stachybotrys</em>, <em>Strongyloides</em>, <em>Syngamus</em>, <em>Taenia</em>, <em>Toxocara</em>, <em>Trichinella</em>, <em>Trichobilharzia</em>, <em>Trichoderma</em>, <em>Trichomonas</em>, <em>Trichophyton</em>, <em>Trichosporon</em>, <em>Trichostrongylus</em>, <em>Trichuris</em>, <em>Tritirachium</em>, <em>Trombicula</em>, <em>Trypanosoma</em>, <em>Tunga</em> or <em>Wuchereria</em>;</p></li>
<li><p>All other records have <code>prevalence = 2.0</code> in the <a href="microorganisms.html">microorganisms</a> data set.</p></li>
</ul><p>When calculating the matching score, all characters in \(x\) and \(n\) other than A-Z, a-z, 0-9, spaces and parentheses are ignored.</p>
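<p>The calculation described above can be sketched in a few lines. The following is an illustrative Python sketch, not the package's R implementation: the function names <code>normalise()</code>, <code>weighted_levenshtein()</code> and <code>matching_score()</code> are hypothetical, the comparison is assumed to be case-sensitive, and the score is assumed to take the form \((l_n - 0.5 \cdot \min(l_n, lev)) / (l_n \cdot p_n \cdot k_n)\), which is consistent with the term definitions and reproduces the two scores quoted on this page.</p>

```python
import re

# Everything other than A-Z, a-z, 0-9, spaces and parentheses is ignored.
_IGNORED = re.compile(r"[^A-Za-z0-9 ()]")


def normalise(text: str) -> str:
    """Drop all characters that the matching score ignores."""
    return _IGNORED.sub("", text)


def weighted_levenshtein(x: str, n: str, insertion: int = 1,
                         deletion: int = 2, substitution: int = 2) -> int:
    """Cost of changing x into n with the weights given in the text."""
    # previous[j] holds the cost of turning the processed prefix of x into n[:j].
    previous = [j * insertion for j in range(len(n) + 1)]
    for i, char_x in enumerate(x, start=1):
        current = [i * deletion]  # turning x[:i] into the empty string
        for j, char_n in enumerate(n, start=1):
            current.append(min(
                previous[j] + deletion,          # drop char_x
                current[j - 1] + insertion,      # add char_n
                previous[j - 1] + (0 if char_x == char_n else substitution),
            ))
        previous = current
    return previous[-1]


def matching_score(x: str, n: str, prevalence: float, kingdom: int) -> float:
    """Assumed form of m; n must be a non-empty taxonomic name."""
    x, n = normalise(x), normalise(n)
    l_n = len(n)
    lev = weighted_levenshtein(x, n)
    return (l_n - 0.5 * min(l_n, lev)) / (l_n * prevalence * kingdom)


if __name__ == "__main__":
    # Prevalence and kingdom values are supplied by hand for illustration.
    print(matching_score("E. coli", "Escherichia coli", prevalence=1.0, kingdom=1))
    print(matching_score("E. coli", "Entamoeba coli", prevalence=1.5, kingdom=3))
```

<p>Capping the distance at \(l_n\) keeps the numerator from dropping below half the name length, so the prevalence group and the kingdom index in the denominator largely decide between candidates that are otherwise equally close to the input.</p>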
<p>All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, <code>"E. coli"</code> returns the microbial ID of <em>Escherichia coli</em> (\(m = 0.688\), a highly prevalent microorganism found in humans) and not <em>Entamoeba coli</em> (\(m = 0.159\), a less prevalent microorganism in humans), although the latter would come first alphabetically.</p>
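<p>These two scores can be reproduced by hand, assuming the score takes the form \((l_n - 0.5 \cdot \min(l_n, lev)) / (l_n \cdot p_n \cdot k_n)\), which is consistent with the term definitions above. After the dot is dropped, the input is <code>"E coli"</code>. Changing it into <em>Escherichia coli</em> (\(l_n = 16\)) takes ten insertions, so \(lev = 10\); taking \(p_n = 1.0\) and \(k_n = 1\) (Bacteria) gives:</p>
<p>$$m = \frac{16 - 0.5 \cdot \min(16, 10)}{16 \cdot 1.0 \cdot 1} = \frac{11}{16} \approx 0.688$$</p>
<p>Changing it into <em>Entamoeba coli</em> (\(l_n = 14\)) takes eight insertions, so \(lev = 8\); with \(p_n = 1.5\) (<em>Entamoeba</em> is in the non-bacterial list above) and \(k_n = 3\) (Protozoa):</p>
<p>$$m = \frac{14 - 0.5 \cdot \min(14, 8)}{14 \cdot 1.5 \cdot 3} = \frac{10}{63} \approx 0.159$$</p>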
<h2 id="reference-data-publicly-available">Reference Data Publicly Available<a class="anchor" aria-label="anchor" href="#reference-data-publicly-available"></a></h2>
<p>All data sets in this <code>AMR</code> package (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) are publicly and freely available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated plain text files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please visit <a href="https://msberends.github.io/AMR/articles/datasets.html">our website for the download links</a>. The actual files are of course available on <a href="https://github.com/msberends/AMR/tree/main/data-raw" class="external-link">our GitHub repository</a>.</p>
<p></p><p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>