This helper function is used by as.mo() to determine the most probable match of taxonomic records, based on user input.

mo_matching_score(x, n)

Arguments

x

Any user input value(s)

n

A full taxonomic name that exists in microorganisms$fullname

uncertainty

The level of uncertainty set in as.mo(); see allow_uncertain in that function. Here it defaults to 1, but as.mo() determines it automatically from the number of transformations needed to get to a result.

Matching score for microorganisms

With ambiguous user input in as.mo() and all the mo_* functions, the returned results are chosen based on their matching score using mo_matching_score(). This matching score \(m\) is calculated as:

$$m_{(x, n)} = \frac{l_{n} - 0.5 \times \min \begin{cases}l_{n} \\ \operatorname{lev}(x, n)\end{cases}}{l_{n} p k}$$

where:

  • \(x\) is the user input;

  • \(n\) is a taxonomic name (genus, species and subspecies);

  • \(l_{n}\) is the length of the taxonomic name;

  • \(\operatorname{lev}\) is the Levenshtein distance function;

  • \(p\) is the human pathogenic prevalence, categorised into groups \(1\), \(2\) and \(3\) (see Details in ?as.mo), meaning that \(p \in \{1, 2, 3\}\);

  • \(k\) is the kingdom index, set as follows: Bacteria = \(1\), Fungi = \(2\), Protozoa = \(3\), Archaea = \(4\), and all others = \(5\), meaning that \(k \in \{1, 2, 3, 4, 5\}\).
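
As a worked illustration of the formula, take \(x\) = "E. coli" and \(n\) = "Escherichia coli". This assumes a plain, case-sensitive Levenshtein distance on the unprocessed strings (the package may clean the input first, so its actual result can differ) and assumes prevalence group \(p = 1\); \(k = 1\) follows from the kingdom being Bacteria. Then \(l_{n} = 16\) and \(\operatorname{lev}(x, n) = 10\) (one substitution and nine insertions), so:

$$m_{(x, n)} = \frac{16 - 0.5 \times \min(16, 10)}{16 \times 1 \times 1} = \frac{11}{16} = 0.6875$$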

All matches are sorted in descending order of their matching score, and for each user input value the top match is returned.
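
The scoring and ranking described above can be sketched in base R. This is a minimal illustration, not the package's implementation: sketch_matching_score() is a hypothetical helper, the Levenshtein distance comes from utils::adist() on the raw strings, and the p values in the candidate table are placeholders rather than values looked up in the microorganisms data set.

```r
# Hypothetical helper that applies the formula literally.
# x: user input; n: candidate full names; p: prevalence group; k: kingdom index.
sketch_matching_score <- function(x, n, p = 1, k = 1) {
  l_n <- nchar(n)
  # Element-wise, case-sensitive Levenshtein distance between x and each n
  lev <- mapply(function(a, b) utils::adist(a, b)[1, 1], x, n,
                USE.NAMES = FALSE)
  (l_n - 0.5 * pmin(l_n, lev)) / (l_n * p * k)
}

# Two candidate names for the same ambiguous input.
# k follows the kingdom index above (Bacteria = 1, Protozoa = 3);
# p is an assumed placeholder for both rows.
candidates <- data.frame(
  fullname = c("Escherichia coli", "Entamoeba coli"),
  p = c(1, 1),
  k = c(1, 3),
  stringsAsFactors = FALSE
)

candidates$score <- sketch_matching_score(
  "E. coli", candidates$fullname, candidates$p, candidates$k
)

# Sort descending on the score; the first row is the top match
ranked <- candidates[order(candidates$score, decreasing = TRUE), ]
ranked$fullname[1]
```

Under these assumptions the bacterial name ranks first, mainly because the larger kingdom index of the protozoan candidate increases the denominator and lowers its score.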

Examples

as.mo("E. coli")
mo_uncertainties()

mo_matching_score("E. coli", "Escherichia coli")