Calculate the matching score for microorganisms — mo_matching

This helper function is used by as.mo() to determine the most probable match of taxonomic records, based on user input.

Usage

mo_matching_score(x, fullname, uncertainty = 1)

Arguments

x	Any user input value(s)
fullname	A full taxonomic name that exists in `microorganisms$fullname`
uncertainty	The level of uncertainty set in `as.mo()`; see `allow_uncertain` in that function. Here it defaults to 1, but `as.mo()` determines it automatically from the number of transformations needed to get to a result

The matching score is based on four parameters:

The human pathogenic prevalence $P$, which is categorised into group 1, 2 or 3 (see as.mo());
The kingdom index $K$, which is set as follows: Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, and all others = 5;
The level of uncertainty $U$ that is needed to get to a result (1 to 3, see as.mo());
The Levenshtein distance $L$ between the user input and each taxonomic full name, with the text length of the user input being the maximum distance. A modified version of the Levenshtein distance, $L'$, which takes the text length of the full name $F$ into account, is calculated as:

$$L' = 1 - \frac{0.5L}{F}$$

The final matching score $M$ is calculated as: $$M = L' \times \frac{1}{P K U} = \frac{F - 0.5L}{F P K U}$$
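The formula for $M$ can be sketched in a few lines of R. This is only an illustration of the arithmetic, not the package's implementation: the function names `matching_score_sketch()` and `kingdom_index_sketch()` and the argument name `F_len` are hypothetical, and $L$, $F$, $P$, $K$ and $U$ are assumed to be known already.

```r
# Hypothetical helper (not part of the package): evaluates
# M = (F - 0.5 * L) / (F * P * K * U) for precomputed inputs.
matching_score_sketch <- function(L, F_len, P, K, U = 1) {
  # Modified Levenshtein distance L', scaled by the full name length F
  L_mod <- 1 - (0.5 * L) / F_len
  # Divide by prevalence group, kingdom index and uncertainty level
  L_mod / (P * K * U)
}

# Hypothetical lookup for the kingdom index K as listed above;
# any kingdom not named in the list maps to 5.
kingdom_index_sketch <- function(kingdom) {
  known <- c(Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4)
  K <- unname(known[kingdom])
  K[is.na(K)] <- 5
  K
}

# Illustrative values only: L = 4 against a full name of 16 characters
matching_score_sketch(L = 4, F_len = 16, P = 1,
                      K = kingdom_index_sketch("Bacteria"), U = 1)
# 0.875
matching_score_sketch(L = 4, F_len = 16, P = 2,
                      K = kingdom_index_sketch("Fungi"), U = 1)
# 0.21875
```

With the same $L$ and $F$, a higher prevalence group, kingdom index or uncertainty level lowers the score, because each of them appears in the denominator.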

Examples

as.mo("E. coli")
mo_uncertainties()