mirror of
https://github.com/msberends/AMR.git
synced 2026-02-09 11:12:55 +01:00
Built site for AMR@3.0.1.9003: ba30b08
214
reference/mo_matching_score.md
Normal file
@@ -0,0 +1,214 @@
# Calculate the Matching Score for Microorganisms

This algorithm is used by
[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) and all the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions to
determine the most probable match of taxonomic records based on user
input.

## Usage

``` r
mo_matching_score(x, n)
```

## Arguments

- x:

  Any user input value(s).

- n:

  A full taxonomic name that exists in
  [`microorganisms$fullname`](https://amr-for-r.org/reference/microorganisms.md).

## Note

This algorithm was originally developed in 2018 and subsequently
described in: Berends MS *et al.* (2022). **AMR: An R Package for
Working with Antimicrobial Resistance Data**. *Journal of Statistical
Software*, 104(3), 1-31;
[doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03).

Later, the work of Bartlett A *et al.* about bacterial pathogens
infecting humans (2022,
[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) was
incorporated, and optimisations to the algorithm were made.

## Matching Score for Microorganisms

With ambiguous user input in
[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) and all the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions, the
returned results are chosen based on their matching score using
`mo_matching_score()`. This matching score $m$ is calculated as:

$$m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min \begin{cases}l_{n} \\ \textrm{lev}(x, n)\end{cases}}{l_{n} \cdot p_{n} \cdot k_{n}}$$

where:

- $x$ is the user input;

- $n$ is a taxonomic name (genus, species, and subspecies);

- $l_n$ is the length of $n$;

- $\textrm{lev}$ is the [Levenshtein distance
  function](https://en.wikipedia.org/wiki/Levenshtein_distance)
  (counting any insertion as 1, and any deletion or substitution as 2)
  that is needed to change $x$ into $n$;

- $p_n$ is the human pathogenic prevalence group of $n$, as
  described below;

- $k_n$ is the taxonomic kingdom of $n$, set as Bacteria = 1, Fungi
  = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.
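
As a rough illustration of the formula above, the following R sketch computes the matching score for input that has already been cleaned. The helper name `sketch_matching_score()` and its `p_n` and `k_n` arguments are assumptions made for this example; they are not part of the AMR package, which looks up prevalence and kingdom itself. The weighted Levenshtein distance is approximated here with `utils::adist()` and custom costs, which may differ in detail from the package's own implementation.

``` r
# Hypothetical helper for illustration only; not the AMR package's implementation.
sketch_matching_score <- function(x, n, p_n, k_n) {
  l_n <- nchar(n)
  # Weighted Levenshtein distance from x to each n:
  # insertions cost 1, deletions and substitutions cost 2.
  lev <- as.vector(utils::adist(
    x, n,
    costs = c(insertions = 1, deletions = 2, substitutions = 2)
  ))
  # The distance is capped at the length of n, then halved.
  (l_n - 0.5 * pmin(l_n, lev)) / (l_n * p_n * k_n)
}

# "E coli" needs 10 insertions to become "Escherichia coli" (16 characters).
sketch_matching_score("E coli", "Escherichia coli", p_n = 1, k_n = 1)
#> [1] 0.6875
```

Passing the prevalence and kingdom factors as plain arguments keeps the sketch independent of the `microorganisms` data set.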

The grouping into human pathogenic prevalence $p$ is based on recent
work from Bartlett *et al.* (2022,
[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) who
extensively studied medical-scientific literature to categorise all
bacterial species into these groups:

- **Established**, if a taxonomic species has infected at least three
  persons in three or more references. These records have
  `prevalence = 1.15` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- **Putative**, if a taxonomic species has fewer than three known cases.
  These records have `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set.

Furthermore,

- Genera from the World Health Organization's (WHO) Priority Pathogen
  List have `prevalence = 1.0` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any genus present in the **established** list also has
  `prevalence = 1.15` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any other genus present in the **putative** list has
  `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any other species or subspecies whose genus is present in the
  two aforementioned groups has `prevalence = 1.5` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any *non-bacterial* genus, species or subspecies whose genus is
  present in the following list has `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set: *Absidia*, *Acanthamoeba*, *Acremonium*, *Actinomucor*,
  *Aedes*, *Alternaria*, *Amoeba*, *Ancylostoma*, *Angiostrongylus*,
  *Anisakis*, *Anopheles*, *Apophysomyces*, *Arthroderma*,
  *Aspergillus*, *Aureobasidium*, *Basidiobolus*, *Beauveria*,
  *Bipolaris*, *Blastobotrys*, *Blastocystis*, *Blastomyces*, *Candida*,
  *Capillaria*, *Chaetomium*, *Chilomastix*, *Chrysonilia*,
  *Chrysosporium*, *Cladophialophora*, *Cladosporium*, *Clavispora*,
  *Coccidioides*, *Cokeromyces*, *Conidiobolus*, *Coniochaeta*,
  *Contracaecum*, *Cordylobia*, *Cryptococcus*, *Cryptosporidium*,
  *Cunninghamella*, *Curvularia*, *Cyberlindnera*, *Debaryozyma*,
  *Demodex*, *Dermatobia*, *Dientamoeba*, *Diphyllobothrium*,
  *Dirofilaria*, *Echinostoma*, *Entamoeba*, *Enterobius*,
  *Epidermophyton*, *Exidia*, *Exophiala*, *Exserohilum*, *Fasciola*,
  *Fonsecaea*, *Fusarium*, *Geotrichum*, *Giardia*, *Graphium*,
  *Haloarcula*, *Halobacterium*, *Halococcus*, *Hansenula*,
  *Hendersonula*, *Heterophyes*, *Histomonas*, *Histoplasma*, *Hortaea*,
  *Hymenolepis*, *Hypomyces*, *Hysterothylacium*, *Kloeckera*,
  *Kluyveromyces*, *Kodamaea*, *Lacazia*, *Leishmania*, *Lichtheimia*,
  *Lodderomyces*, *Lomentospora*, *Madurella*, *Malassezia*,
  *Malbranchea*, *Metagonimus*, *Meyerozyma*, *Microsporidium*,
  *Microsporum*, *Millerozyma*, *Mortierella*, *Mucor*,
  *Mycocentrospora*, *Nannizzia*, *Necator*, *Nectria*, *Ochroconis*,
  *Oesophagostomum*, *Oidiodendron*, *Opisthorchis*, *Paecilomyces*,
  *Paracoccidioides*, *Pediculus*, *Penicillium*, *Phaeoacremonium*,
  *Phaeomoniella*, *Phialophora*, *Phlebotomus*, *Phoma*, *Pichia*,
  *Piedraia*, *Pithomyces*, *Pityrosporum*, *Pneumocystis*,
  *Pseudallescheria*, *Pseudoscopulariopsis*, *Pseudoterranova*,
  *Pulex*, *Purpureocillium*, *Quambalaria*, *Rhinocladiella*,
  *Rhizomucor*, *Rhizopus*, *Rhodotorula*, *Saccharomyces*, *Saksenaea*,
  *Saprochaete*, *Sarcoptes*, *Scedosporium*, *Schistosoma*,
  *Schizosaccharomyces*, *Scolecobasidium*, *Scopulariopsis*,
  *Scytalidium*, *Spirometra*, *Sporobolomyces*, *Sporopachydermia*,
  *Sporothrix*, *Sporotrichum*, *Stachybotrys*, *Strongyloides*,
  *Syncephalastrum*, *Syngamus*, *Taenia*, *Talaromyces*, *Teleomorph*,
  *Toxocara*, *Trichinella*, *Trichobilharzia*, *Trichoderma*,
  *Trichomonas*, *Trichophyton*, *Trichosporon*, *Trichostrongylus*,
  *Trichuris*, *Tritirachium*, *Trombicula*, *Trypanosoma*, *Tunga*,
  *Ulocladium*, *Ustilago*, *Verticillium*, *Wallemia*, *Wangiella*,
  *Wickerhamomyces*, *Wuchereria*, *Yarrowia*, or *Zygosaccharomyces*;

- All other records have `prevalence = 2.0` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set.

When calculating the matching score, all characters in $x$ and $n$
other than A-Z, a-z, 0-9, spaces and parentheses are ignored.
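
The character filtering described above can be sketched with a single regular expression. The helper name `clean_for_scoring()` is hypothetical and only serves this example; the AMR package may implement this step differently.

``` r
# Hypothetical helper for illustration only; not an AMR package function.
clean_for_scoring <- function(s) {
  # Drop every character that is not A-Z, a-z, 0-9, a space or a parenthesis.
  gsub("[^A-Za-z0-9 ()]", "", s)
}

clean_for_scoring("E. coli")
#> [1] "E coli"
```

Under this filtering, the period in `"E. coli"` does not count towards the distance between the input and a taxonomic name.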

All matches are sorted in descending order of matching score, and for
each user input value the top match is returned. As a result, e.g.,
`"E. coli"` will return the microbial ID of
*Escherichia coli* ($m = 0.688$, a highly prevalent microorganism
found in humans) and not *Entamoeba coli* ($m = 0.381$, a less
prevalent microorganism in humans), although the latter would
alphabetically come first.
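
The two scores quoted above can be checked by hand. Once the period is dropped, "E coli" can be turned into either name by insertions alone: 10 for *Escherichia coli* (16 characters) and 8 for *Entamoeba coli* (14 characters). The prevalence and kingdom factors used below are inferred from the rules on this page, assuming *Escherichia* has `prevalence = 1.0` as a WHO priority genus and *Entamoeba coli* is classified under Protozoa (kingdom factor 1.5) with `prevalence = 1.25`:

$$m_{(\text{E coli},\ \text{Escherichia coli})} = \frac{16 - 0.5 \cdot \min(16, 10)}{16 \cdot 1.0 \cdot 1.0} = \frac{11}{16} = 0.6875$$

$$m_{(\text{E coli},\ \text{Entamoeba coli})} = \frac{14 - 0.5 \cdot \min(14, 8)}{14 \cdot 1.25 \cdot 1.5} = \frac{10}{26.25} \approx 0.381$$

The numerators are close (11 versus 10); the gap between the two scores comes mostly from the prevalence and kingdom factors in the denominator.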

## Download Our Reference Data

All reference data sets in the AMR package - including information on
microorganisms, antimicrobials, and clinical breakpoints - are freely
available for download in multiple formats: R, MS Excel, Apache Feather,
Apache Parquet, SPSS, and Stata.

For maximum compatibility, we also provide machine-readable,
tab-separated plain text files suitable for use in any software,
including laboratory information systems.

Visit [our website for direct download
links](https://amr-for-r.org/articles/datasets.html), or explore the
actual files in [our GitHub
repository](https://github.com/msberends/AMR/tree/main/data-raw/datasets).

## Examples

``` r
mo_reset_session()
#> ℹ Reset 17 previously matched input values.

as.mo("E. coli")
#> Class 'mo'
#> [1] B_ESCHR_COLI
mo_uncertainties()
#> Matching scores are based on the resemblance between the input and the full
#> taxonomic name, and the pathogenicity in humans. See `?mo_matching_score`.
#> Colour keys: 0.000-0.549 0.550-0.649 0.650-0.749 0.750-1.000
#>
#> --------------------------------------------------------------------------------
#> "E. coli" -> Escherichia coli (B_ESCHR_COLI, 0.688)
#> Also matched: Enterococcus crotali (0.650), Escherichia coli coli
#> (0.643), Escherichia coli expressing (0.611), Enterobacter cowanii
#> (0.600), Enterococcus columbae (0.595), Enterococcus camelliae (0.591),
#> Enterococcus casseliflavus (0.577), Enterobacter cloacae cloacae
#> (0.571), Enterobacter cloacae complex (0.571), and Enterobacter cloacae
#> dissolvens (0.565)
#>
#> Only the first 10 other matches of each record are shown. Run
#> `print(mo_uncertainties(), n = ...)` to view more entries, or save
#> `mo_uncertainties()` to an object.

mo_matching_score(
  x = "E. coli",
  n = c("Escherichia coli", "Entamoeba coli")
)
#> [1] 0.6875000 0.3809524
```