Built site for AMR@3.0.1.9003: ba30b08

2026-06-01 00:21:53 +02:00 · 2025-11-24 10:42:21 +00:00
parent 7d16891987
commit 141fc468f8
161 changed files with 21798 additions and 313 deletions
--- a/reference/mo_matching_score.md
+++ b/reference/mo_matching_score.md
@@ -0,0 +1,214 @@
+# Calculate the Matching Score for Microorganisms
+
+This algorithm is used by
+[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) and all the
+[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions to
+determine the most probable match of taxonomic records based on user
+input.
+
+## Usage
+
+``` r
+mo_matching_score(x, n)
+```
+
+## Arguments
+
+- x:
+
+  Any user input value(s).
+
+- n:
+
+  A full taxonomic name, that exists in
+  [`microorganisms$fullname`](https://amr-for-r.org/reference/microorganisms.md).
+
+## Note
+
+This algorithm was originally developed in 2018 and subsequently
+described in: Berends MS *et al.* (2022). **AMR: An R Package for
+Working with Antimicrobial Resistance Data**. *Journal of Statistical
+Software*, 104(3), 1-31;
+[doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03).
+
+Later, the work of Bartlett A *et al.* about bacterial pathogens
+infecting humans (2022,
+[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) was
+incorporated, and optimisations to the algorithm were made.
+
+## Matching Score for Microorganisms
+
+When user input to
+[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) or any of the
+[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions is
+ambiguous, the returned result is chosen based on its matching score,
+computed with `mo_matching_score()`. This matching score \\m\\ is
+calculated as:
+
+\$\$m\_{(x, n)} = \frac{l\_{n} - 0.5 \cdot \min \begin{cases}l\_{n} \\
+\textrm{lev}(x, n)\end{cases}}{l\_{n} \cdot p\_{n} \cdot k\_{n}}\$\$
+
+where:
+
+- \\x\\ is the user input;
+
+- \\n\\ is a taxonomic name (genus, species, and subspecies);
+
+- \\l_n\\ is the length of \\n\\;
+
+- \\lev\\ is the [Levenshtein
+  distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
+  (counting any insertion as 1, and any deletion or substitution as 2)
+  needed to change \\x\\ into \\n\\;
+
+- \\p_n\\ is the human pathogenic prevalence group of \\n\\, as
+  described below;
+
+- \\k_n\\ is the taxonomic kingdom of \\n\\, set as Bacteria = 1, Fungi
+  = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.
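+
+As a rough illustration, the formula can be reproduced by hand in base
+R. This is a sketch under stated assumptions, not the package's
+implementation: `score_sketch()` is a hypothetical helper, the weighted
+Levenshtein distance is taken from `utils::adist()`, the input is
+assumed to be already cleaned to `"E coli"`, and the values for
+\\p_n\\ and \\k_n\\ are supplied by hand.
+
+``` r
+# Hypothetical helper: computes m for one input/name pair,
+# with p_n (prevalence) and k_n (kingdom weight) supplied by hand
+score_sketch <- function(x, n, p_n, k_n) {
+  l_n <- nchar(n)
+  # Weighted Levenshtein distance: insertion = 1, deletion = 2, substitution = 2
+  lev <- as.numeric(utils::adist(
+    x, n,
+    costs = c(insertions = 1, deletions = 2, substitutions = 2)
+  ))
+  (l_n - 0.5 * min(l_n, lev)) / (l_n * p_n * k_n)
+}
+
+# Assumed values: prevalence 1.0, kingdom Bacteria (k = 1)
+score_sketch("E coli", "Escherichia coli", p_n = 1, k_n = 1)
+#> [1] 0.6875
+
+# Assumed values: prevalence 1.25, kingdom Protozoa (k = 1.5)
+score_sketch("E coli", "Entamoeba coli", p_n = 1.25, k_n = 1.5)
+#> [1] 0.3809524
+```
+
+With these assumed weights, the sketch gives the same two scores as the
+`mo_matching_score()` call in the Examples section.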
+
+The grouping into human pathogenic prevalence \\p\\ is based on recent
+work from Bartlett *et al.* (2022,
+[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)), who
+extensively studied medical-scientific literature to categorise all
+bacterial species into these groups:
+
+- **Established**, if a taxonomic species has infected at least three
+  persons in three or more references. These records have
+  `prevalence = 1.15` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set;
+
+- **Putative**, if a taxonomic species has fewer than three known cases.
+  These records have `prevalence = 1.25` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set.
+
+Furthermore,
+
+- Genera from the World Health Organization's (WHO) Priority Pathogen
+  List have `prevalence = 1.0` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set;
+
+- Any genus present in the **established** list also has
+  `prevalence = 1.15` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set;
+
+- Any other genus present in the **putative** list has
+  `prevalence = 1.25` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set;
+
+- Any other species or subspecies whose genus is present in either of
+  the two aforementioned groups has `prevalence = 1.5` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set;
+
+- Any *non-bacterial* genus, species or subspecies whose genus is
+  present in the following list has `prevalence = 1.25` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set: *Absidia*, *Acanthamoeba*, *Acremonium*, *Actinomucor*,
+  *Aedes*, *Alternaria*, *Amoeba*, *Ancylostoma*, *Angiostrongylus*,
+  *Anisakis*, *Anopheles*, *Apophysomyces*, *Arthroderma*,
+  *Aspergillus*, *Aureobasidium*, *Basidiobolus*, *Beauveria*,
+  *Bipolaris*, *Blastobotrys*, *Blastocystis*, *Blastomyces*, *Candida*,
+  *Capillaria*, *Chaetomium*, *Chilomastix*, *Chrysonilia*,
+  *Chrysosporium*, *Cladophialophora*, *Cladosporium*, *Clavispora*,
+  *Coccidioides*, *Cokeromyces*, *Conidiobolus*, *Coniochaeta*,
+  *Contracaecum*, *Cordylobia*, *Cryptococcus*, *Cryptosporidium*,
+  *Cunninghamella*, *Curvularia*, *Cyberlindnera*, *Debaryozyma*,
+  *Demodex*, *Dermatobia*, *Dientamoeba*, *Diphyllobothrium*,
+  *Dirofilaria*, *Echinostoma*, *Entamoeba*, *Enterobius*,
+  *Epidermophyton*, *Exidia*, *Exophiala*, *Exserohilum*, *Fasciola*,
+  *Fonsecaea*, *Fusarium*, *Geotrichum*, *Giardia*, *Graphium*,
+  *Haloarcula*, *Halobacterium*, *Halococcus*, *Hansenula*,
+  *Hendersonula*, *Heterophyes*, *Histomonas*, *Histoplasma*, *Hortaea*,
+  *Hymenolepis*, *Hypomyces*, *Hysterothylacium*, *Kloeckera*,
+  *Kluyveromyces*, *Kodamaea*, *Lacazia*, *Leishmania*, *Lichtheimia*,
+  *Lodderomyces*, *Lomentospora*, *Madurella*, *Malassezia*,
+  *Malbranchea*, *Metagonimus*, *Meyerozyma*, *Microsporidium*,
+  *Microsporum*, *Millerozyma*, *Mortierella*, *Mucor*,
+  *Mycocentrospora*, *Nannizzia*, *Necator*, *Nectria*, *Ochroconis*,
+  *Oesophagostomum*, *Oidiodendron*, *Opisthorchis*, *Paecilomyces*,
+  *Paracoccidioides*, *Pediculus*, *Penicillium*, *Phaeoacremonium*,
+  *Phaeomoniella*, *Phialophora*, *Phlebotomus*, *Phoma*, *Pichia*,
+  *Piedraia*, *Pithomyces*, *Pityrosporum*, *Pneumocystis*,
+  *Pseudallescheria*, *Pseudoscopulariopsis*, *Pseudoterranova*,
+  *Pulex*, *Purpureocillium*, *Quambalaria*, *Rhinocladiella*,
+  *Rhizomucor*, *Rhizopus*, *Rhodotorula*, *Saccharomyces*, *Saksenaea*,
+  *Saprochaete*, *Sarcoptes*, *Scedosporium*, *Schistosoma*,
+  *Schizosaccharomyces*, *Scolecobasidium*, *Scopulariopsis*,
+  *Scytalidium*, *Spirometra*, *Sporobolomyces*, *Sporopachydermia*,
+  *Sporothrix*, *Sporotrichum*, *Stachybotrys*, *Strongyloides*,
+  *Syncephalastrum*, *Syngamus*, *Taenia*, *Talaromyces*, *Teleomorph*,
+  *Toxocara*, *Trichinella*, *Trichobilharzia*, *Trichoderma*,
+  *Trichomonas*, *Trichophyton*, *Trichosporon*, *Trichostrongylus*,
+  *Trichuris*, *Tritirachium*, *Trombicula*, *Trypanosoma*, *Tunga*,
+  *Ulocladium*, *Ustilago*, *Verticillium*, *Wallemia*, *Wangiella*,
+  *Wickerhamomyces*, *Wuchereria*, *Yarrowia*, or *Zygosaccharomyces*;
+
+- All other records have `prevalence = 2.0` in the
+  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
+  data set.
+
+When calculating the matching score, all characters in \\x\\ and \\n\\
+other than A-Z, a-z, 0-9, spaces and parentheses are ignored.
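+
+The character filter can be approximated with a single regular
+expression. This is a minimal sketch only; `clean_sketch()` is a
+hypothetical helper and may differ from how the package cleans input
+internally.
+
+``` r
+# Hypothetical helper: drop every character except
+# A-Z, a-z, 0-9, spaces and parentheses
+clean_sketch <- function(x) gsub("[^A-Za-z0-9 ()]", "", x)
+
+clean_sketch("E. coli")
+#> [1] "E coli"
+```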
+
+All matches are sorted in descending order of matching score, and for
+each user input value the top match is returned. As a result,
+`"E. coli"` for example returns the microbial ID of
+*Escherichia coli* (\\m = 0.688\\, a highly prevalent microorganism
+found in humans) and not *Entamoeba coli* (\\m = 0.381\\, a less
+prevalent microorganism in humans), even though the latter comes first
+alphabetically.
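+
+The selection step amounts to ranking the candidates by score and
+taking the first one. The sketch below uses the two scores quoted above
+in a plain named vector; it illustrates the ranking only and does not
+represent the package's internal data structures.
+
+``` r
+# Scores quoted in the text, listed in alphabetical order of name
+scores <- c("Entamoeba coli" = 0.381, "Escherichia coli" = 0.688)
+
+# Sort descending on score; the first element is the match that is returned
+ranked <- sort(scores, decreasing = TRUE)
+names(ranked)[1]
+#> [1] "Escherichia coli"
+```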
+
+## Download Our Reference Data
+
+All reference data sets in the AMR package - including information on
+microorganisms, antimicrobials, and clinical breakpoints - are freely
+available for download in multiple formats: R, MS Excel, Apache Feather,
+Apache Parquet, SPSS, and Stata.
+
+For maximum compatibility, we also provide machine-readable,
+tab-separated plain text files suitable for use in any software,
+including laboratory information systems.
+
+Visit [our website for direct download
+links](https://amr-for-r.org/articles/datasets.html), or explore the
+actual files in [our GitHub
+repository](https://github.com/msberends/AMR/tree/main/data-raw/datasets).
+
+## Examples
+
+``` r
+mo_reset_session()
+#> ℹ Reset 17 previously matched input values.
+
+as.mo("E. coli")
+#> Class 'mo'
+#> [1] B_ESCHR_COLI
+mo_uncertainties()
+#> Matching scores are based on the resemblance between the input and the full
+#> taxonomic name, and the pathogenicity in humans. See `?mo_matching_score`.
+#> Colour keys:  0.000-0.549  0.550-0.649  0.650-0.749  0.750-1.000 
+#> 
+#> --------------------------------------------------------------------------------
+#> "E. coli" -> Escherichia coli (B_ESCHR_COLI, 0.688)
+#> Also matched: Enterococcus crotali (0.650), Escherichia coli coli
+#>               (0.643), Escherichia coli expressing (0.611), Enterobacter cowanii
+#>               (0.600), Enterococcus columbae (0.595), Enterococcus camelliae (0.591),
+#>               Enterococcus casseliflavus (0.577), Enterobacter cloacae cloacae
+#>               (0.571), Enterobacter cloacae complex (0.571), and Enterobacter cloacae
+#>               dissolvens (0.565)
+#> 
+#> Only the first 10 other matches of each record are shown. Run
+#> `print(mo_uncertainties(), n = ...)` to view more entries, or save
+#> `mo_uncertainties()` to an object.
+
+mo_matching_score(
+  x = "E. coli",
+  n = c("Escherichia coli", "Entamoeba coli")
+)
+#> [1] 0.6875000 0.3809524
+```