1
0
mirror of https://github.com/msberends/AMR.git synced 2025-07-23 23:43:14 +02:00

(v1.3.0.9030) matching score update

This commit is contained in:
2020-09-26 16:26:01 +02:00
parent 9667c2994f
commit 050a9a04fb
33 changed files with 249 additions and 175 deletions

View File

@ -82,7 +82,7 @@
</button>
<span class="navbar-brand">
<a class="navbar-link" href="../index.html">AMR (for R)</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9028</span>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Latest development version">1.3.0.9030</span>
</span>
</div>
@ -242,7 +242,7 @@
<p>This helper function is used by <code><a href='as.mo.html'>as.mo()</a></code> to determine the most probable match of taxonomic records, based on user input.</p>
</div>
<pre class="usage"><span class='fu'>mo_matching_score</span>(<span class='kw'>x</span>, <span class='kw'>fullname</span>, uncertainty = <span class='fl'>1</span>)</pre>
<pre class="usage"><span class='fu'>mo_matching_score</span>(<span class='kw'>x</span>, <span class='kw'>n</span>)</pre>
<h2 class="hasAnchor" id="arguments"><a class="anchor" href="#arguments"></a>Arguments</h2>
<table class="ref-arguments">
@ -252,7 +252,7 @@
<td><p>Any user input value(s)</p></td>
</tr>
<tr>
<th>fullname</th>
<th>n</th>
<td><p>A full taxonomic name, that exists in <code><a href='microorganisms.html'>microorganisms$fullname</a></code></p></td>
</tr>
<tr>
@ -261,22 +261,28 @@
</tr>
</table>
<h2 class="hasAnchor" id="details"><a class="anchor" href="#details"></a>Details</h2>
<h2 class="hasAnchor" id="matching-score-for-microorganisms"><a class="anchor" href="#matching-score-for-microorganisms"></a>Matching score for microorganisms</h2>
<p>The matching score is based on four parameters:</p><ol>
<li><p>A human pathogenic prevalence \(P\), that is categorised into group 1, 2 and 3 (see <code><a href='as.mo.html'>as.mo()</a></code>);</p></li>
<li><p>A kingdom index \(K\) is set as follows: Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, and all others = 5;</p></li>
<li><p>The level of uncertainty \(U\) that is needed to get to a result (1 to 3, see <code><a href='as.mo.html'>as.mo()</a></code>);</p></li>
<li><p>The <a href='https://en.wikipedia.org/wiki/Levenshtein_distance'>Levenshtein distance</a> \(L\) is the distance between the user input and all taxonomic full names, with the text length of the user input being the maximum distance. A modified version of the Levenshtein distance \(L'\) based on the text length of the full name \(F\) is calculated as:</p></li>
</ol>
<p>$$L' = 1 - \frac{0.5L}{F}$$</p>
<p>The final matching score \(M\) is calculated as:
$$M = L' \times \frac{1}{P K U} = \frac{F - 0.5L}{F P K U}$$</p>
<p>With ambiguous user input in <code><a href='as.mo.html'>as.mo()</a></code> and all the <code><a href='mo_property.html'>mo_*</a></code> functions, the returned results are chosen based on their matching score using <code>mo_matching_score()</code>. This matching score \(m\) is calculated as:</p>
<p>$$m_{(x, n)} = \frac{l_{n} - 0.5 \times \min \begin{cases}l_{n} \\ \operatorname{lev}(x, n)\end{cases}}{l_{n} p k}$$</p>
<p>where:</p><ul>
<li><p>\(x\) is the user input;</p></li>
<li><p>\(n\) is a taxonomic name (genus, species and subspecies);</p></li>
<li><p>\(l_{n}\) is the length of the taxonomic name;</p></li>
<li><p>\(\operatorname{lev}\) is the <a href='https://en.wikipedia.org/wiki/Levenshtein_distance'>Levenshtein distance</a> function;</p></li>
<li><p>\(p\) is the human pathogenic prevalence, categorised into group \(1\), \(2\) and \(3\) (see <em>Details</em> in <code><a href='as.mo.html'>?as.mo</a></code>), meaning that \(p = \{1, 2 , 3\}\);</p></li>
<li><p>\(k\) is the kingdom index, set as follows: Bacteria = \(1\), Fungi = \(2\), Protozoa = \(3\), Archaea = \(4\), and all others = \(5\), meaning that \(k = \{1, 2 , 3, 4, 5\}\).</p></li>
</ul>
<p>All matches are sorted descending on their matching score and for all user input values, the top match will be returned.</p>
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><span class='fu'><a href='as.mo.html'>as.mo</a></span>(<span class='st'>"E. coli"</span>)
<span class='fu'><a href='as.mo.html'>mo_uncertainties</a></span>()
<span class='fu'>mo_matching_score</span>(<span class='st'>"E. coli"</span>, <span class='st'>"Escherichia coli"</span>)
</pre>
</div>
<div class="col-md-3 hidden-xs hidden-sm" id="pkgdown-sidebar">