mirror of https://github.com/msberends/AMR.git synced 2025-12-15 23:10:28 +01:00
# Transform Arbitrary Input to Valid Microbial Taxonomy
Use this function to get a valid microorganism code (`mo`) based on
arbitrary user input. Determination is done using intelligent rules and
the complete taxonomic tree of the kingdoms Animalia, Archaea, Bacteria,
Chromista, and Protozoa, and most microbial species from the kingdom
Fungi (see *Source*). The input can be almost anything: a full name
(like `"Staphylococcus aureus"`), an abbreviated name (such as
`"S. aureus"`), an abbreviation known in the field (such as `"MRSA"`),
or just a genus. See *Examples*.
## Usage
``` r
as.mo(x, Becker = FALSE, Lancefield = FALSE,
minimum_matching_score = NULL,
keep_synonyms = getOption("AMR_keep_synonyms", FALSE),
reference_df = get_mo_source(),
ignore_pattern = getOption("AMR_ignore_pattern", NULL),
cleaning_regex = getOption("AMR_cleaning_regex", mo_cleaning_regex()),
only_fungi = getOption("AMR_only_fungi", FALSE),
language = get_AMR_locale(), info = interactive(), ...)
is.mo(x)
mo_uncertainties()
mo_renamed()
mo_failures()
mo_reset_session()
mo_cleaning_regex()
```
## Arguments
- x:
A [character](https://rdrr.io/r/base/character.html) vector or a
[data.frame](https://rdrr.io/r/base/data.frame.html) with one or two
columns.
- Becker:
A [logical](https://rdrr.io/r/base/logical.html) to indicate whether
staphylococci should be categorised into coagulase-negative
staphylococci ("CoNS") and coagulase-positive staphylococci ("CoPS")
instead of their own species, according to Karsten Becker *et al.*
(see *Source*). Please see *Details* for a full list of staphylococcal
species that will be converted.
By default, this excludes *Staphylococcus aureus*; use `Becker = "all"`
to also categorise *S. aureus* as "CoPS".
- Lancefield:
A [logical](https://rdrr.io/r/base/logical.html) to indicate whether a
beta-haemolytic *Streptococcus* should be categorised into Lancefield
groups instead of their own species, according to Rebecca C.
Lancefield (see *Source*). These streptococci will be categorised in
their first group, e.g. *Streptococcus dysgalactiae* will be group C,
although officially it was also categorised into groups G and L.
Please see *Details* for a full list of streptococcal species that
will be converted.
By default, this excludes enterococci (which are in group D); use
`Lancefield = "all"` to also categorise all enterococci as group D.
- minimum_matching_score:
A numeric value to set as the lower limit for the [MO matching
score](https://amr-for-r.org/reference/mo_matching_score.md). When
left blank, this will be determined automatically based on the
character length of `x`, its [taxonomic
kingdom](https://amr-for-r.org/reference/microorganisms.md) and [human
pathogenicity](https://amr-for-r.org/reference/mo_matching_score.md).
- keep_synonyms:
A [logical](https://rdrr.io/r/base/logical.html) to indicate if old,
previously valid taxonomic names must be preserved and not be
corrected to currently accepted names. The default is `FALSE`, which
will return a note if old taxonomic names were processed. The default
can be set with the package option
[`AMR_keep_synonyms`](https://amr-for-r.org/reference/AMR-options.md),
i.e. `options(AMR_keep_synonyms = TRUE)` or
`options(AMR_keep_synonyms = FALSE)`.
- reference_df:
A [data.frame](https://rdrr.io/r/base/data.frame.html) to be used for
extra reference when translating `x` to a valid `mo`. See
[`set_mo_source()`](https://amr-for-r.org/reference/mo_source.md) and
[`get_mo_source()`](https://amr-for-r.org/reference/mo_source.md) to
automate the usage of your own codes (e.g. used in your analysis or
organisation).
- ignore_pattern:
A Perl-compatible [regular
expression](https://rdrr.io/r/base/regex.html) (case-insensitive) of
which all matches in `x` must return `NA`. This can be convenient to
exclude known non-relevant input and can also be set with the package
option
[`AMR_ignore_pattern`](https://amr-for-r.org/reference/AMR-options.md),
e.g.
`options(AMR_ignore_pattern = "(not reported|contaminated flora)")`.
- cleaning_regex:
A Perl-compatible [regular
expression](https://rdrr.io/r/base/regex.html) (case-insensitive) to
clean the input of `x`. Every matched part in `x` will be removed. By
default, this is the outcome of `mo_cleaning_regex()`, which removes
texts between brackets and texts such as "species" and "serovar". The
default can be set with the package option
[`AMR_cleaning_regex`](https://amr-for-r.org/reference/AMR-options.md).
- only_fungi:
A [logical](https://rdrr.io/r/base/logical.html) to indicate if only
fungi must be found, making sure that e.g. misspellings always return
records from the kingdom of Fungi. This can be set globally for [all
microorganism
functions](https://amr-for-r.org/reference/mo_property.md) with the
package option
[`AMR_only_fungi`](https://amr-for-r.org/reference/AMR-options.md),
i.e. `options(AMR_only_fungi = TRUE)`.
- language:
Language to translate text like "no growth", which defaults to the
system language (see
[`get_AMR_locale()`](https://amr-for-r.org/reference/translate.md)).
- info:
A [logical](https://rdrr.io/r/base/logical.html) to indicate that info
must be printed, e.g. a progress bar when more than 25 items are to be
coerced, or a list with old taxonomic names. The default is `TRUE`
only in interactive mode.
- ...:
Other arguments passed on to functions.
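
The effect of `ignore_pattern` described above can be pictured with a minimal base R sketch. It only mimics the documented behaviour (a Perl-compatible, case-insensitive match that turns the whole value into `NA`); it is not the package's internal code, and the input vector is made up for illustration.

```r
# Pattern taken from the example in the argument description
ignore_pattern <- "(not reported|contaminated flora)"

# Hypothetical laboratory input
x <- c("E. coli", "Not reported", "contaminated flora, mixed")

# Perl-compatible and case-insensitive, as documented for `ignore_pattern`
ignored <- grepl(ignore_pattern, x, perl = TRUE, ignore.case = TRUE)

# Every value that matches anywhere becomes NA before any matching is attempted
x[ignored] <- NA_character_
x
```

Only `"E. coli"` survives here; the other two values match the pattern regardless of case and are set to `NA`.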
## Value
A [character](https://rdrr.io/r/base/character.html)
[vector](https://rdrr.io/r/base/vector.html) with additional class `mo`.
## Details
A microorganism (MO) code from this package (class: `mo`) is
human-readable and typically looks like these examples:

    Code               Full name
    ---------------    --------------------------------------
    B_KLBSL            Klebsiella
    B_KLBSL_PNMN       Klebsiella pneumoniae
    B_KLBSL_PNMN_RHNS  Klebsiella pneumoniae rhinoscleromatis
    |   |    |    |
    |   |    |    |
    |   |    |    \---> subspecies, a 3-5 letter acronym
    |   |    \----> species, a 3-6 letter acronym
    |   \----> genus, a 4-8 letter acronym
    \----> kingdom: A (Archaea), AN (Animalia), B (Bacteria),
                    C (Chromista), F (Fungi), PL (Plantae),
                    P (Protozoa)

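
To make the layout of these codes concrete, the following base R sketch splits a code on its underscores. The helper `parse_mo_code()` is hypothetical and not part of the package; it assumes the regular kingdom/genus/species/subspecies layout shown above and will not give meaningful parts for special codes such as `UNKNOWN`.

```r
# Hypothetical helper: split an MO code into its taxonomic parts
parse_mo_code <- function(code) {
  parts <- strsplit(code, "_", fixed = TRUE)[[1]]
  # Pad with NA so that missing lower ranks are explicit
  length(parts) <- 4
  names(parts) <- c("kingdom", "genus", "species", "subspecies")
  parts
}

parse_mo_code("B_KLBSL_PNMN_RHNS")
parse_mo_code("B_KLBSL")
```

For a genus-level code such as `B_KLBSL`, the species and subspecies parts come back as `NA`.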
Values that cannot be coerced will be considered 'unknown' and will
return the MO code `UNKNOWN` with a warning.
Use the [`mo_*`](https://amr-for-r.org/reference/mo_property.md)
functions to get properties based on the returned code, see *Examples*.
The `as.mo()` function uses a novel and scientifically validated
([doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03))
matching score algorithm (see *Matching Score for Microorganisms* below)
to match input against the [available microbial
taxonomy](https://amr-for-r.org/reference/microorganisms.md) in this
package. This means that e.g. `"E. coli"` (a microorganism highly
prevalent in humans) will return the microbial ID of *Escherichia coli*
and not *Entamoeba coli* (a microorganism less prevalent in humans),
although the latter would alphabetically come first.
### Coping with Uncertain Results
Results of non-exact taxonomic input are based on their [matching
score](https://amr-for-r.org/reference/mo_matching_score.md). The lowest
allowed score can be set with the `minimum_matching_score` argument. By
default, this is determined based on the character length of the
input, the [taxonomic
kingdom](https://amr-for-r.org/reference/microorganisms.md), and the
[human
pathogenicity](https://amr-for-r.org/reference/mo_matching_score.md) of
the taxonomic outcome. If values are matched with uncertainty, a message
will suggest inspecting the results with
`mo_uncertainties()`, which returns a
[data.frame](https://rdrr.io/r/base/data.frame.html) with all
specifications.
To increase the quality of matching, the `cleaning_regex` argument is
used to clean the input. This must be a [regular
expression](https://rdrr.io/r/base/regex.html) that matches parts of the
input that should be removed before the input is matched against the
[available microbial
taxonomy](https://amr-for-r.org/reference/microorganisms.md). It will be
matched Perl-compatible and case-insensitive. The default value of
`cleaning_regex` is the outcome of the helper function
`mo_cleaning_regex()`.
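
As a rough illustration of this cleaning step, the base R sketch below removes matched parts with a Perl-compatible, case-insensitive regular expression. The pattern `simple_cleaning_regex` and the helper `clean_input()` are hypothetical and deliberately much simpler than the real outcome of `mo_cleaning_regex()`; the whitespace tidying afterwards is also an assumption of this sketch.

```r
# Hypothetical, simplified stand-in for the default cleaning regex:
# removes text between parentheses and the words "species" and "serovar"
simple_cleaning_regex <- "(\\(.*?\\)|\\b(species|serovar)\\b)"

clean_input <- function(x, cleaning_regex = simple_cleaning_regex) {
  # Every matched part is removed, Perl-compatible and case-insensitive
  x <- gsub(cleaning_regex, "", x, perl = TRUE, ignore.case = TRUE)
  # Collapse the whitespace left behind by the removed parts
  trimws(gsub(" +", " ", x))
}

cleaned <- clean_input(c(
  "Staphylococcus aureus (MRSA)",
  "Salmonella enterica serovar Typhimurium",
  "Klebsiella SPECIES"
))
cleaned
```

After cleaning, only the taxonomic words remain, which is what would then be compared against the available taxonomy.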
There are three helper functions that can be run after using the
`as.mo()` function:
- Use `mo_uncertainties()` to get a
[data.frame](https://rdrr.io/r/base/data.frame.html) that prints in a
pretty format with all taxonomic names that were guessed. The output
contains the matching score for all matches (see *Matching Score for
Microorganisms* below).
- Use `mo_failures()` to get a
[character](https://rdrr.io/r/base/character.html)
[vector](https://rdrr.io/r/base/vector.html) with all values that
could not be coerced to a valid value.
- Use `mo_renamed()` to get a
[data.frame](https://rdrr.io/r/base/data.frame.html) with all values
that could be coerced based on old, previously accepted taxonomic
names.
### For Mycologists
The [matching score
algorithm](https://amr-for-r.org/reference/mo_matching_score.md) gives
precedence to bacteria over fungi. If you are only analysing fungi, be
sure to use `only_fungi = TRUE`, or better yet, add this to your code
and run it once every session:

    options(AMR_only_fungi = TRUE)

This will make sure that no bacteria or other 'non-fungi' will be
returned by `as.mo()`, or any of the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions.
### Coagulase-negative and Coagulase-positive Staphylococci
With `Becker = TRUE`, the following staphylococci will be converted to
their corresponding coagulase group:
- Coagulase-negative: *S. americanisciuri*, *S. argensis*, *S.
arlettae*, *S. auricularis*, *S. borealis*, *S. brunensis*, *S.
caeli*, *S. caledonicus*, *S. canis*, *S. capitis*, *S. capitis
capitis*, *S. capitis urealyticus*, *S. capitis ureolyticus*, *S.
caprae*, *S. carnosus*, *S. carnosus carnosus*, *S. carnosus utilis*,
*S. casei*, *S. caseolyticus*, *S. chromogenes*, *S. cohnii*, *S.
cohnii cohnii*, *S. cohnii urealyticum*, *S. cohnii urealyticus*, *S.
condimenti*, *S. croceilyticus*, *S. debuckii*, *S. devriesei*, *S.
durrellii*, *S. edaphicus*, *S. epidermidis*, *S. equorum*, *S.
equorum equorum*, *S. equorum linens*, *S. felis*, *S. fleurettii*,
*S. gallinarum*, *S. haemolyticus*, *S. hominis*, *S. hominis
hominis*, *S. hominis novobiosepticus*, *S. jettensis*, *S. kloosii*,
*S. lentus*, *S. lloydii*, *S. lugdunensis*, *S. marylandisciuri*, *S.
massiliensis*, *S. microti*, *S. muscae*, *S. nepalensis*, *S.
pasteuri*, *S. petrasii*, *S. petrasii croceilyticus*, *S. petrasii
jettensis*, *S. petrasii petrasii*, *S. petrasii pragensis*, *S.
pettenkoferi*, *S. piscifermentans*, *S. pragensis*, *S.
pseudoxylosus*, *S. pulvereri*, *S. ratti*, *S. rostri*, *S.
saccharolyticus*, *S. saprophyticus*, *S. saprophyticus bovis*, *S.
saprophyticus saprophyticus*, *S. schleiferi*, *S. schleiferi
schleiferi*, *S. sciuri*, *S. sciuri carnaticus*, *S. sciuri lentus*,
*S. sciuri rodentium*, *S. sciuri sciuri*, *S. shinii*, *S. simulans*,
*S. stepanovicii*, *S. succinus*, *S. succinus casei*, *S. succinus
succinus*, *S. taiwanensis*, *S. urealyticus*, *S. ureilyticus*, *S.
veratri*, *S. vitulinus*, *S. vitulus*, *S. warneri*, and *S. xylosus*
- Coagulase-positive: *S. agnetis*, *S. argenteus*, *S. coagulans*, *S.
cornubiensis*, *S. delphini*, *S. hyicus*, *S. hyicus chromogenes*,
*S. hyicus hyicus*, *S. intermedius*, *S. lutrae*, *S.
pseudintermedius*, *S. roterodami*, *S. schleiferi coagulans*, *S.
schweitzeri*, *S. simiae*, and *S. singaporensis*
This is based on:
- Becker K *et al.* (2014). **Coagulase-Negative Staphylococci.** *Clin
Microbiol Rev.* 27(4): 870-926;
[doi:10.1128/CMR.00109-13](https://doi.org/10.1128/CMR.00109-13)
- Becker K *et al.* (2019). **Implications of identifying the recently
defined members of the *S. aureus* complex, *S. argenteus* and *S.
schweitzeri*: A position paper of members of the ESCMID Study Group
for staphylococci and Staphylococcal Diseases (ESGS).** *Clin
Microbiol Infect*;
[doi:10.1016/j.cmi.2019.02.028](https://doi.org/10.1016/j.cmi.2019.02.028)
- Becker K *et al.* (2020). **Emergence of coagulase-negative
staphylococci.** *Expert Rev Anti Infect Ther.* 18(4):349-366;
[doi:10.1080/14787210.2020.1730813](https://doi.org/10.1080/14787210.2020.1730813)
For newly named staphylococcal species, such as *S. brunensis* (2024)
and *S. shinii* (2023), we looked up the scientific reference to make
sure the species are considered for the correct coagulase group.
### Lancefield Groups in Streptococci
With `Lancefield = TRUE`, the following streptococci will be converted
to their corresponding Lancefield group:
- Streptococcus Group A: *S. pyogenes*
- Streptococcus Group B: *S. agalactiae*
- Streptococcus Group C: *S. dysgalactiae*, *S. dysgalactiae
dysgalactiae*, *S. dysgalactiae equisimilis*, *S. equi*, *S. equi
equi*, *S. equi ruminatorum*, and *S. equi zooepidemicus*
- Streptococcus Group F: *S. anginosus*, *S. anginosus anginosus*, *S.
anginosus whileyi*, *S. constellatus*, *S. constellatus constellatus*,
*S. constellatus pharyngis*, *S. constellatus viborgensis*, and *S.
intermedius*
- Streptococcus Group G: *S. canis*, *S. dysgalactiae*, *S. dysgalactiae
dysgalactiae*, and *S. dysgalactiae equisimilis*
- Streptococcus Group H: *S. sanguinis*
- Streptococcus Group K: *S. salivarius*, *S. salivarius salivarius*,
and *S. salivarius thermophilus*
- Streptococcus Group L: *S. dysgalactiae*, *S. dysgalactiae
dysgalactiae*, and *S. dysgalactiae equisimilis*
This is based on:
- Lancefield RC (1933). **A serological differentiation of human and
other groups of hemolytic streptococci.** *J Exp Med.* 57(4): 571-95;
[doi:10.1084/jem.57.4.571](https://doi.org/10.1084/jem.57.4.571)
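
Because some species appear in more than one group above, the "first group" rule from the `Lancefield` argument can be sketched as a simple lookup. This is an illustrative base R sketch with an abridged, species-level copy of the list; the names `lancefield_groups` and `first_lancefield_group()` are hypothetical and do not reflect how the package stores or resolves these groups internally.

```r
# Abridged copy of the groups listed above (species level only)
lancefield_groups <- list(
  A = "S. pyogenes",
  B = "S. agalactiae",
  C = c("S. dysgalactiae", "S. equi"),
  F = c("S. anginosus", "S. constellatus", "S. intermedius"),
  G = c("S. canis", "S. dysgalactiae"),
  H = "S. sanguinis",
  K = "S. salivarius",
  L = "S. dysgalactiae"
)

# Hypothetical helper: return the first group that lists the species
first_lancefield_group <- function(species) {
  is_member <- vapply(
    lancefield_groups,
    function(members) species %in% members,
    logical(1)
  )
  hits <- names(lancefield_groups)[is_member]
  if (length(hits) == 0) NA_character_ else hits[1]
}

first_lancefield_group("S. dysgalactiae") # listed under C, G and L; first is "C"
first_lancefield_group("S. canis")        # "G"
```

Species that are not in any group, such as a staphylococcus, yield `NA` in this sketch.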
## Source
- Berends MS *et al.* (2022). **AMR: An R Package for Working with
Antimicrobial Resistance Data**. *Journal of Statistical Software*,
104(3), 1-31;
[doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)
- Parte, AC *et al.* (2020). **List of Prokaryotic names with Standing
in Nomenclature (LPSN) moves to the DSMZ.** International Journal of
Systematic and Evolutionary Microbiology, 70, 5607-5612;
[doi:10.1099/ijsem.0.004332](https://doi.org/10.1099/ijsem.0.004332).
Accessed from <https://lpsn.dsmz.de> on June 24th, 2024.
- Vincent, R *et al.* (2013). **MycoBank gearing up for new horizons.**
IMA Fungus, 4(2), 371-9;
[doi:10.5598/imafungus.2013.04.02.16](https://doi.org/10.5598/imafungus.2013.04.02.16).
Accessed from <https://www.mycobank.org> on June 24th, 2024.
- GBIF Secretariat (2023). GBIF Backbone Taxonomy. Checklist dataset
[doi:10.15468/39omei](https://doi.org/10.15468/39omei). Accessed from
<https://www.gbif.org> on June 24th, 2024.
- Reimer, LC *et al.* (2022). ***BacDive* in 2022: the knowledge base
for standardized bacterial and archaeal data.** Nucleic Acids Res.,
50(D1):D741-D746;
[doi:10.1093/nar/gkab961](https://doi.org/10.1093/nar/gkab961).
Accessed from <https://bacdive.dsmz.de> on July 16th, 2024.
- Public Health Information Network Vocabulary Access and Distribution
System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020.
Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12).
URL: <https://www.cdc.gov/phin/php/phinvads/>
- Bartlett A *et al.* (2022). **A comprehensive list of bacterial
pathogens infecting humans.** *Microbiology* 168:001269;
[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)
## Matching Score for Microorganisms
With ambiguous user input in `as.mo()` and all the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions, the
returned results are chosen based on their matching score using
[`mo_matching_score()`](https://amr-for-r.org/reference/mo_matching_score.md).
This matching score, $m$, is calculated as:
$$m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min\left(l_{n},\ \textrm{lev}(x, n)\right)}{l_{n} \cdot p_{n} \cdot k_{n}}$$
where:
- $x$ is the user input;
- $n$ is a taxonomic name (genus, species, and subspecies);
- $l_n$ is the length of $n$;
- $\textrm{lev}$ is the [Levenshtein distance
function](https://en.wikipedia.org/wiki/Levenshtein_distance)
(counting any insertion as 1, and any deletion or substitution as 2)
that is needed to change $x$ into $n$;
- $p_n$ is the human pathogenic prevalence group of $n$, as
described below;
- $k_n$ is the taxonomic kingdom of $n$, set as Bacteria = 1,
Fungi = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.
The grouping into human pathogenic prevalence $p$ is based on recent
work from Bartlett *et al.* (2022,
[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) who
extensively studied medical-scientific literature to categorise all
bacterial species into these groups:
- **Established**, if a taxonomic species has infected at least three
persons in three or more references. These records have
`prevalence = 1.15` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set;
- **Putative**, if a taxonomic species has fewer than three known cases.
These records have `prevalence = 1.25` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set.
Furthermore,
- Genera from the World Health Organization's (WHO) Priority Pathogen
List have `prevalence = 1.0` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set;
- Any genus present in the **established** list also has
`prevalence = 1.15` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set;
- Any other genus present in the **putative** list has
`prevalence = 1.25` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set;
- Any other species or subspecies of which the genus is present in the
two aforementioned groups, has `prevalence = 1.5` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set;
- Any *non-bacterial* genus, species or subspecies of which the genus is
present in the following list, has `prevalence = 1.25` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set: *Absidia*, *Acanthamoeba*, *Acremonium*, *Actinomucor*,
*Aedes*, *Alternaria*, *Amoeba*, *Ancylostoma*, *Angiostrongylus*,
*Anisakis*, *Anopheles*, *Apophysomyces*, *Arthroderma*,
*Aspergillus*, *Aureobasidium*, *Basidiobolus*, *Beauveria*,
*Bipolaris*, *Blastobotrys*, *Blastocystis*, *Blastomyces*, *Candida*,
*Capillaria*, *Chaetomium*, *Chilomastix*, *Chrysonilia*,
*Chrysosporium*, *Cladophialophora*, *Cladosporium*, *Clavispora*,
*Coccidioides*, *Cokeromyces*, *Conidiobolus*, *Coniochaeta*,
*Contracaecum*, *Cordylobia*, *Cryptococcus*, *Cryptosporidium*,
*Cunninghamella*, *Curvularia*, *Cyberlindnera*, *Debaryozyma*,
*Demodex*, *Dermatobia*, *Dientamoeba*, *Diphyllobothrium*,
*Dirofilaria*, *Echinostoma*, *Entamoeba*, *Enterobius*,
*Epidermophyton*, *Exidia*, *Exophiala*, *Exserohilum*, *Fasciola*,
*Fonsecaea*, *Fusarium*, *Geotrichum*, *Giardia*, *Graphium*,
*Haloarcula*, *Halobacterium*, *Halococcus*, *Hansenula*,
*Hendersonula*, *Heterophyes*, *Histomonas*, *Histoplasma*, *Hortaea*,
*Hymenolepis*, *Hypomyces*, *Hysterothylacium*, *Kloeckera*,
*Kluyveromyces*, *Kodamaea*, *Lacazia*, *Leishmania*, *Lichtheimia*,
*Lodderomyces*, *Lomentospora*, *Madurella*, *Malassezia*,
*Malbranchea*, *Metagonimus*, *Meyerozyma*, *Microsporidium*,
*Microsporum*, *Millerozyma*, *Mortierella*, *Mucor*,
*Mycocentrospora*, *Nannizzia*, *Necator*, *Nectria*, *Ochroconis*,
*Oesophagostomum*, *Oidiodendron*, *Opisthorchis*, *Paecilomyces*,
*Paracoccidioides*, *Pediculus*, *Penicillium*, *Phaeoacremonium*,
*Phaeomoniella*, *Phialophora*, *Phlebotomus*, *Phoma*, *Pichia*,
*Piedraia*, *Pithomyces*, *Pityrosporum*, *Pneumocystis*,
*Pseudallescheria*, *Pseudoscopulariopsis*, *Pseudoterranova*,
*Pulex*, *Purpureocillium*, *Quambalaria*, *Rhinocladiella*,
*Rhizomucor*, *Rhizopus*, *Rhodotorula*, *Saccharomyces*, *Saksenaea*,
*Saprochaete*, *Sarcoptes*, *Scedosporium*, *Schistosoma*,
*Schizosaccharomyces*, *Scolecobasidium*, *Scopulariopsis*,
*Scytalidium*, *Spirometra*, *Sporobolomyces*, *Sporopachydermia*,
*Sporothrix*, *Sporotrichum*, *Stachybotrys*, *Strongyloides*,
*Syncephalastrum*, *Syngamus*, *Taenia*, *Talaromyces*, *Teleomorph*,
*Toxocara*, *Trichinella*, *Trichobilharzia*, *Trichoderma*,
*Trichomonas*, *Trichophyton*, *Trichosporon*, *Trichostrongylus*,
*Trichuris*, *Tritirachium*, *Trombicula*, *Trypanosoma*, *Tunga*,
*Ulocladium*, *Ustilago*, *Verticillium*, *Wallemia*, *Wangiella*,
*Wickerhamomyces*, *Wuchereria*, *Yarrowia*, or *Zygosaccharomyces*;
- All other records have `prevalence = 2.0` in the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set.
When calculating the matching score, all characters in $x$ and $n$
other than A-Z, a-z, 0-9, spaces and parentheses are ignored.
All matches are sorted in descending order of matching score, and for
each user input value the top match is returned. As a result, e.g.
`"E. coli"` will return the microbial ID of
*Escherichia coli* ($m = 0.688$, a highly prevalent microorganism
found in humans) and not *Entamoeba coli* ($m = 0.381$, a less
prevalent microorganism in humans), although the latter would
alphabetically come first.
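
The two scores quoted above can be reproduced with a few lines of base R. This is a minimal sketch of the formula in this section, not the package's own `mo_matching_score()`: the helper `matching_score()` is hypothetical, the prevalence and kingdom values are passed in by hand, and `p_n = 1.0` for *Escherichia coli* is an assumption chosen because it reproduces the reported value. For *Entamoeba coli*, `p_n = 1.25` (the genus is in the non-bacterial list above) and `k_n = 1.5` (Protozoa) follow from this section.

```r
# Hypothetical re-implementation of the matching score formula
matching_score <- function(x, n, p_n, k_n) {
  # Ignore everything other than A-Z, a-z, 0-9, spaces and parentheses
  keep <- function(s) gsub("[^A-Za-z0-9 ()]+", "", s)
  x <- keep(x)
  n <- keep(n)
  l_n <- nchar(n)
  # Weighted Levenshtein distance: insertion = 1, deletion = substitution = 2
  lev <- as.numeric(utils::adist(
    x, n,
    costs = c(insertions = 1, deletions = 2, substitutions = 2)
  ))
  (l_n - 0.5 * min(l_n, lev)) / (l_n * p_n * k_n)
}

# "E coli" needs 10 insertions to become "Escherichia coli" (16 characters):
# (16 - 0.5 * 10) / (16 * 1.0 * 1) = 0.6875
score_escherichia <- matching_score("E. coli", "Escherichia coli", p_n = 1.0, k_n = 1)

# "E coli" needs 8 insertions to become "Entamoeba coli" (14 characters):
# (14 - 0.5 * 8) / (14 * 1.25 * 1.5) = 10 / 26.25, approximately 0.381
score_entamoeba <- matching_score("E. coli", "Entamoeba coli", p_n = 1.25, k_n = 1.5)

c(score_escherichia, score_entamoeba)
```

The edit distance is larger for *Escherichia coli*, yet it still wins because the prevalence and kingdom factors in the denominator penalise *Entamoeba coli* more heavily.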
## Download Our Reference Data
All reference data sets in the AMR package - including information on
microorganisms, antimicrobials, and clinical breakpoints - are freely
available for download in multiple formats: R, MS Excel, Apache Feather,
Apache Parquet, SPSS, and Stata.
For maximum compatibility, we also provide machine-readable,
tab-separated plain text files suitable for use in any software,
including laboratory information systems.
Visit [our website for direct download
links](https://amr-for-r.org/articles/datasets.html), or explore the
actual files in [our GitHub
repository](https://github.com/msberends/AMR/tree/main/data-raw/datasets).
## See also
[microorganisms](https://amr-for-r.org/reference/microorganisms.md) for
the [data.frame](https://rdrr.io/r/base/data.frame.html) that is being
used to determine IDs.
The [`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions
(such as [`mo_genus()`](https://amr-for-r.org/reference/mo_property.md),
[`mo_gramstain()`](https://amr-for-r.org/reference/mo_property.md)) to
get properties based on the returned code.
## Examples
``` r
# \donttest{
# These examples all return "B_STPHY_AURS", the ID of S. aureus:
as.mo(c(
"sau", # WHONET code
"stau",
"STAU",
"staaur",
"S. aureus",
"S aureus",
"Sthafilokkockus aureus", # handles incorrect spelling
"Staphylococcus aureus (MRSA)",
"MRSA", # Methicillin Resistant S. aureus
"VISA", # Vancomycin Intermediate S. aureus
"VRSA", # Vancomycin Resistant S. aureus
115329001 # SNOMED CT code
))
#> Class 'mo'
#> [1] B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS
#> [6] B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS
#> [11] B_STPHY_AURS B_STPHY_AURS
# Dyslexia is no problem - these all work:
as.mo(c(
"Ureaplasma urealyticum",
"Ureaplasma urealyticus",
"Ureaplasmium urealytica",
"Ureaplazma urealitycium"
))
#> Class 'mo'
#> [1] B_URPLS_URLY B_URPLS_URLY B_URPLS_URLY B_URPLS_URLY
# input is cleaned using the regular expression given in the `cleaning_regex` argument,
# which defaults to `mo_cleaning_regex()`:
cat(mo_cleaning_regex(), "\n")
#> ([^A-Za-z- \(\)\[\]{}]+|([({]|\[).+([})]|\])|(^| )( ?[a-z-]+[-](resistant|susceptible) ?|e?spp([^a-z]+|$)|e?ssp([^a-z]+|$)|serogr.?up[a-z]*|e?ss([^a-z]+|$)|e?sp([^a-z]+|$)|var([^a-z]+|$)|serovar[a-z]*|sube?species|biovar[a-z]*|e?species|Ig[ADEGM]|e?subsp|biotype|titer|dummy))
as.mo("Streptococcus group A")
#> Class 'mo'
#> [1] B_STRPT_GRPA
as.mo("S. epidermidis") # will remain species: B_STPHY_EPDR
#> Class 'mo'
#> [1] B_STPHY_EPDR
as.mo("S. epidermidis", Becker = TRUE) # will not remain species: B_STPHY_CONS
#> Class 'mo'
#> [1] B_STPHY_CONS
as.mo("S. pyogenes") # will remain species: B_STRPT_PYGN
#> Class 'mo'
#> [1] B_STRPT_PYGN
as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRPA
#> Class 'mo'
#> [1] B_STRPT_GRPA
# All mo_* functions use as.mo() internally too (see ?mo_property):
mo_genus("E. coli")
#> [1] "Escherichia"
mo_gramstain("ESCO")
#> [1] "Gram-negative"
mo_is_intrinsic_resistant("ESCCOL", ab = "vanco")
#> Determining intrinsic resistance based on 'EUCAST Expected Resistant
#> Phenotypes' v1.2 (2023). This note will be shown once per session.
#> [1] TRUE
# }
```