AMR/man/mo_matching_score.Rd

66 lines
8.1 KiB
Plaintext
Raw Normal View History

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mo_matching_score.R
\name{mo_matching_score}
\alias{mo_matching_score}
\title{Calculate the Matching Score for Microorganisms}
\usage{
2020-09-26 16:26:01 +02:00
mo_matching_score(x, n)
}
\arguments{
\item{x}{Any user input value(s)}
2020-09-26 16:26:01 +02:00
\item{n}{A full taxonomic name, that exists in \code{\link[=microorganisms]{microorganisms$fullname}}}
}
\description{
2020-11-05 01:11:49 +01:00
This algorithm is used by \code{\link[=as.mo]{as.mo()}} and all the \code{\link[=mo_property]{mo_*}} functions to determine the most probable match of taxonomic records based on user input.
}
2022-10-05 09:12:22 +02:00
\note{
This algorithm was described in: Berends MS \emph{et al.} (2022). \strong{AMR: An R Package for Working with Antimicrobial Resistance Data}. \emph{Journal of Statistical Software}, 104(3), 1-31; \doi{10.18637/jss.v104.i03}.
}
\section{Matching Score for Microorganisms}{
2020-09-26 16:26:01 +02:00
2020-09-28 11:00:59 +02:00
With ambiguous user input in \code{\link[=as.mo]{as.mo()}} and all the \code{\link[=mo_property]{mo_*}} functions, the returned results are chosen based on their matching score using \code{\link[=mo_matching_score]{mo_matching_score()}}. This matching score \eqn{m}, is calculated as:
2020-09-26 16:26:01 +02:00
\ifelse{latex}{\deqn{m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min \begin{cases}l_{n} \\ \textrm{lev}(x, n)\end{cases}}{l_{n} \cdot p_{n} \cdot k_{n}}}}{\ifelse{html}{\figure{mo_matching_score.png}{options: width="300" alt="mo matching score"}}{m(x, n) = ( l_n * min(l_n, lev(x, n) ) ) / ( l_n * p_n * k_n )}}
2020-09-26 16:26:01 +02:00
where:
\itemize{
\item \ifelse{html}{\out{<i>x</i> is the user input;}}{\eqn{x} is the user input;}
\item \ifelse{html}{\out{<i>n</i> is a taxonomic name (genus, species, and subspecies);}}{\eqn{n} is a taxonomic name (genus, species, and subspecies);}
\item \ifelse{html}{\out{<i>l<sub>n</sub></i> is the length of <i>n</i>;}}{l_n is the length of \eqn{n};}
2022-10-05 09:12:22 +02:00
\item \ifelse{html}{\out{<i>lev</i> is the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance function</a> (counting any insertion as 1, and any deletion or substitution as 2) that is needed to change <i>x</i> into <i>n</i>;}}{lev is the Levenshtein distance function (counting any insertion as 1, and any deletion or substitution as 2) that is needed to change \eqn{x} into \eqn{n};}
\item \ifelse{html}{\out{<i>p<sub>n</sub></i> is the human pathogenic prevalence group of <i>n</i>, as described below;}}{p_n is the human pathogenic prevalence group of \eqn{n}, as described below;}
\item \ifelse{html}{\out{<i>k<sub>n</sub></i> is the taxonomic kingdom of <i>n</i>, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}}{l_n is the taxonomic kingdom of \eqn{n}, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}
2020-09-26 16:26:01 +02:00
}
2022-10-05 09:12:22 +02:00
The grouping into human pathogenic prevalence (\eqn{p}) is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence:
2020-09-26 16:51:17 +02:00
2022-10-05 09:12:22 +02:00
\strong{Group 1} (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is \emph{Enterococcus}, \emph{Staphylococcus} or \emph{Streptococcus}. This group consequently contains all common Gram-negative bacteria, such as \emph{Pseudomonas} and \emph{Legionella} and all species within the order Enterobacterales.
2022-12-11 11:44:29 +01:00
\strong{Group 2} consists of all microorganisms where the taxonomic phylum is Pseudomonadota (previously named Proteobacteria), Bacillota (previously named Firmicutes), Actinomycetota (previously named Actinobacteria) or Sarcomastigophora, or where the taxonomic genus is \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acholeplasma}, \emph{Acremonium}, \emph{Actinotignum}, \emph{Aedes}, \emph{Alistipes}, \emph{Alloprevotella}, \emph{Alternaria}, \emph{Amoeba}, \emph{Anaerosalibacter}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Arachnia}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Bacteroides}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Bergeyella}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Borrelia}, \emph{Brachyspira}, \emph{Branhamella}, \emph{Butyricimonas}, \emph{Candida}, \emph{Capillaria}, \emph{Capnocytophaga}, \emph{Catabacter}, \emph{Cetobacterium}, \emph{Chaetomium}, \emph{Chlamydia}, \emph{Chlamydophila}, \emph{Christensenella}, \emph{Chryseobacterium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Deinococcus}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Dysgonomonas}, \emph{Echinostoma}, \emph{Elizabethkingia}, \emph{Empedobacter}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Flavobacterium}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Lelliottia}, \emph{Leptosphaeria}, \emph{Leptotrichia}, \emph{Lucilia}, \emph{Lumbricus}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, \emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Mycoplasma}, \emph{Myroides}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Odoribacter}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Ornithobacterium}, \emph{Parabacteroides}, \emph{Pediculus}, \emph{Pedobacter}, \emph{Phlebotomus}, \emph{Phocaeicola}, \emph{Phocanema}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Porphyromonas}, \emph{Prevotella}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Riemerella}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Sphingobacterium}, \emph{Spirometra}, \emph{Spiroplasma}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Streptobacillus}, \emph{Strongyloides}, \emph{Syngamus}, \emph{Taenia}, \emph{Tannerella}, \emph{Tenacibaculum}, \emph{Terrimonas}, \emph{Toxocara}, \emph{Treponema}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga}, \emph{Ureaplasma}, \emph{Victivallis}, \emph{Wautersiella}, \emph{Weeksella} or \emph{Wuchereria}.
2022-10-05 09:12:22 +02:00
\strong{Group 3} consists of all other microorganisms.
All characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
2022-12-09 13:37:08 +01:00
All matches are sorted descending on their matching score and for all user input values, the top match will be returned. This will lead to the effect that e.g., \code{"E. coli"} will return the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.079}, a less prevalent microorganism in humans), although the latter would alphabetically come first.
}
2020-09-26 16:26:01 +02:00
\section{Reference Data Publicly Available}{
2022-08-27 20:49:37 +02:00
All data sets in this \code{AMR} package (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) are publicly and freely available for download in the following formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated plain text files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please visit \href{https://msberends.github.io/AMR/articles/datasets.html}{our website for the download links}. The actual files are of course available on \href{https://github.com/msberends/AMR/tree/main/data-raw}{our GitHub repository}.
}
\examples{
as.mo("E. coli")
mo_uncertainties()
2020-09-26 16:26:01 +02:00
2022-08-28 10:31:50 +02:00
mo_matching_score(
x = "E. coli",
n = c("Escherichia coli", "Entamoeba coli")
)
}
2020-11-05 01:11:49 +01:00
\author{
2022-08-28 22:38:08 +02:00
Dr. Matthijs Berends
2020-11-05 01:11:49 +01:00
}