# Transform Arbitrary Input to Valid Microbial Taxonomy

Use this function to get a valid microorganism code (`mo`) based on
arbitrary user input. Determination is done using intelligent rules and
the complete taxonomic tree of the kingdoms Animalia, Archaea, Bacteria,
Chromista, and Protozoa, and most microbial species from the kingdom
Fungi (see *Source*). The input can be almost anything: a full name
(like `"Staphylococcus aureus"`), an abbreviated name (such as
`"S. aureus"`), an abbreviation known in the field (such as `"MRSA"`),
or just a genus. See *Examples*.

## Usage

``` r
as.mo(x, Becker = FALSE, Lancefield = FALSE,
  minimum_matching_score = NULL,
  keep_synonyms = getOption("AMR_keep_synonyms", FALSE),
  reference_df = get_mo_source(),
  ignore_pattern = getOption("AMR_ignore_pattern", NULL),
  cleaning_regex = getOption("AMR_cleaning_regex", mo_cleaning_regex()),
  only_fungi = getOption("AMR_only_fungi", FALSE),
  language = get_AMR_locale(), info = interactive(), ...)

is.mo(x)

mo_uncertainties()

mo_renamed()

mo_failures()

mo_reset_session()

mo_cleaning_regex()
```

## Arguments

- x:

  A [character](https://rdrr.io/r/base/character.html) vector or a
  [data.frame](https://rdrr.io/r/base/data.frame.html) with one or two
  columns.

- Becker:

  A [logical](https://rdrr.io/r/base/logical.html) to indicate whether
  staphylococci should be categorised into coagulase-negative
  staphylococci ("CoNS") and coagulase-positive staphylococci ("CoPS")
  instead of their own species, according to Karsten Becker *et al.*
  (see *Source*). Please see *Details* for a full list of staphylococcal
  species that will be converted.

  By default, this excludes *Staphylococcus aureus*; use `Becker = "all"`
  to also categorise *S. aureus* as "CoPS".

- Lancefield:

  A [logical](https://rdrr.io/r/base/logical.html) to indicate whether a
  beta-haemolytic *Streptococcus* should be categorised into Lancefield
  groups instead of their own species, according to Rebecca C.
  Lancefield (see *Source*). These streptococci will be categorised in
  their first group, e.g. *Streptococcus dysgalactiae* will be group C,
  although officially it was also categorised into groups G and L.
  Please see *Details* for a full list of streptococcal species that
  will be converted.

  By default, this excludes enterococci (which are in group D); use
  `Lancefield = "all"` to also categorise all enterococci as group D.

- minimum_matching_score:

  A numeric value to set as the lower limit for the [MO matching
  score](https://amr-for-r.org/reference/mo_matching_score.md). When
  left blank, this will be determined automatically based on the
  character length of `x`, its [taxonomic
  kingdom](https://amr-for-r.org/reference/microorganisms.md) and [human
  pathogenicity](https://amr-for-r.org/reference/mo_matching_score.md).

- keep_synonyms:

  A [logical](https://rdrr.io/r/base/logical.html) to indicate if old,
  previously valid taxonomic names must be preserved and not be
  corrected to currently accepted names. The default is `FALSE`, which
  will return a note if old taxonomic names were processed. The default
  can be set with the package option
  [`AMR_keep_synonyms`](https://amr-for-r.org/reference/AMR-options.md),
  i.e. `options(AMR_keep_synonyms = TRUE)` or
  `options(AMR_keep_synonyms = FALSE)`.

- reference_df:

  A [data.frame](https://rdrr.io/r/base/data.frame.html) to be used for
  extra reference when translating `x` to a valid `mo`. See
  [`set_mo_source()`](https://amr-for-r.org/reference/mo_source.md) and
  [`get_mo_source()`](https://amr-for-r.org/reference/mo_source.md) to
  automate the usage of your own codes (e.g. used in your analysis or
  organisation).

- ignore_pattern:

  A Perl-compatible [regular
  expression](https://rdrr.io/r/base/regex.html) (case-insensitive) of
  which all matches in `x` must return `NA`. This can be convenient to
  exclude known non-relevant input and can also be set with the package
  option
  [`AMR_ignore_pattern`](https://amr-for-r.org/reference/AMR-options.md),
  e.g.
  `options(AMR_ignore_pattern = "(not reported|contaminated flora)")`.

- cleaning_regex:

  A Perl-compatible [regular
  expression](https://rdrr.io/r/base/regex.html) (case-insensitive) to
  clean the input of `x`. Every matched part in `x` will be removed. By
  default, this is the outcome of `mo_cleaning_regex()`, which removes
  texts between brackets and texts such as "species" and "serovar". The
  default can be set with the package option
  [`AMR_cleaning_regex`](https://amr-for-r.org/reference/AMR-options.md).

- only_fungi:

  A [logical](https://rdrr.io/r/base/logical.html) to indicate if only
  fungi must be found, making sure that e.g. misspellings always return
  records from the kingdom of Fungi. This can be set globally for [all
  microorganism
  functions](https://amr-for-r.org/reference/mo_property.md) with the
  package option
  [`AMR_only_fungi`](https://amr-for-r.org/reference/AMR-options.md),
  i.e. `options(AMR_only_fungi = TRUE)`.

- language:

  Language to translate text like "no growth", which defaults to the
  system language (see
  [`get_AMR_locale()`](https://amr-for-r.org/reference/translate.md)).

- info:

  A [logical](https://rdrr.io/r/base/logical.html) to indicate that info
  must be printed, e.g. a progress bar when more than 25 items are to be
  coerced, or a list with old taxonomic names. The default is `TRUE`
  only in interactive mode.

- ...:

  Other arguments passed on to functions.
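
Several of the defaults described above can be set once per session through package options. The following is a minimal sketch that uses only base R `options()` and `getOption()`; the option names are the ones listed in the argument descriptions, while the values and the input vector `x` are illustrative assumptions.

``` r
# Set session-wide defaults for as.mo() (values are illustrative only)
old <- options(
  AMR_keep_synonyms  = TRUE,  # keep old taxonomic names instead of correcting them
  AMR_ignore_pattern = "(not reported|contaminated flora)",
  AMR_only_fungi     = FALSE
)

# as.mo() reads these through getOption(), as shown in the Usage section
getOption("AMR_keep_synonyms", FALSE)
getOption("AMR_ignore_pattern", NULL)

# Base R sketch of which inputs the ignore pattern matches;
# per the argument description, such inputs would return NA
x <- c("E. coli", "Not reported", "contaminated flora")
ignored <- grepl(getOption("AMR_ignore_pattern"), x,
                 ignore.case = TRUE, perl = TRUE)
ignored
#> [1] FALSE  TRUE  TRUE

# Restore the previous option values
options(old)
```

Storing the result of `options()` in `old` makes it possible to restore the earlier settings afterwards, which is convenient in scripts that should not leave session-wide side effects.
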

## Value

A [character](https://rdrr.io/r/base/character.html)
[vector](https://rdrr.io/r/base/vector.html) with additional class `mo`.

## Details

A microorganism (MO) code from this package (class: `mo`) is
human-readable and typically looks like these examples:

    Code               Full name
    ---------------    --------------------------------------
    B_KLBSL            Klebsiella
    B_KLBSL_PNMN       Klebsiella pneumoniae
    B_KLBSL_PNMN_RHNS  Klebsiella pneumoniae rhinoscleromatis
    | |     |    |
    | |     |    |
    | |     |    \---> subspecies, a 3-5 letter acronym
    | |     \----> species, a 3-6 letter acronym
    | \----> genus, a 4-8 letter acronym
    \----> kingdom: A (Archaea), AN (Animalia), B (Bacteria),
                    C (Chromista), F (Fungi), PL (Plantae),
                    P (Protozoa)

Values that cannot be coerced will be considered 'unknown' and will
return the MO code `UNKNOWN` with a warning.
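
To make the code layout concrete, here is a minimal base R sketch that splits an MO code into its underscore-separated parts. The helper `parse_mo_code()` is hypothetical and not part of the package; it only illustrates the structure described above and does not handle special codes such as `UNKNOWN`.

``` r
# Hypothetical helper (not part of AMR): split an MO code into its parts
parse_mo_code <- function(code) {
  # Kingdom prefixes as listed in the code layout above
  kingdoms <- c(A = "Archaea", AN = "Animalia", B = "Bacteria",
                C = "Chromista", F = "Fungi", PL = "Plantae",
                P = "Protozoa")
  parts <- strsplit(code, "_", fixed = TRUE)[[1]]
  list(
    kingdom    = unname(kingdoms[parts[1]]),
    genus      = parts[2],
    species    = parts[3],  # NA when the code stops at genus level
    subspecies = parts[4]   # NA when the code stops at species level
  )
}

parsed <- parse_mo_code("B_KLBSL_PNMN_RHNS")
str(parsed)

genus_only <- parse_mo_code("B_KLBSL")
is.na(genus_only$species)
#> [1] TRUE
```

In practice, the `mo_*` functions are the intended way to retrieve properties from a code; this sketch is only meant to show how the acronym parts line up.
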

Use the [`mo_*`](https://amr-for-r.org/reference/mo_property.md)
functions to get properties based on the returned code, see *Examples*.

The `as.mo()` function uses a novel and scientifically validated
([doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03))
matching score algorithm (see *Matching Score for Microorganisms* below)
to match input against the [available microbial
taxonomy](https://amr-for-r.org/reference/microorganisms.md) in this
package. This means that e.g. `"E. coli"` (a microorganism highly
prevalent in humans) will return the microbial ID of *Escherichia coli*
and not *Entamoeba coli* (a microorganism less prevalent in humans),
although the latter would alphabetically come first.

### Coping with Uncertain Results

Results of non-exact taxonomic input are based on their [matching
score](https://amr-for-r.org/reference/mo_matching_score.md). The lowest
allowed score can be set with the `minimum_matching_score` argument. By
default, this is determined from the character length of the
input, the [taxonomic
kingdom](https://amr-for-r.org/reference/microorganisms.md), and the
[human
pathogenicity](https://amr-for-r.org/reference/mo_matching_score.md) of
the taxonomic outcome. If values are matched with uncertainty, a message
will suggest inspecting the results with
`mo_uncertainties()`, which returns a
[data.frame](https://rdrr.io/r/base/data.frame.html) with all
specifications.

To increase the quality of matching, the `cleaning_regex` argument is
used to clean the input. This must be a [regular
expression](https://rdrr.io/r/base/regex.html) that matches parts of the
input that should be removed before the input is matched against the
[available microbial
taxonomy](https://amr-for-r.org/reference/microorganisms.md). It is
matched as a Perl-compatible, case-insensitive expression. The default
value of `cleaning_regex` is the outcome of the helper function
`mo_cleaning_regex()`.
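
The effect of such a cleaning step can be sketched in base R. The pattern below is a hypothetical, much simplified stand-in and not the package default; it only removes text between parentheses and the words "species" and "serovar", mirroring the description of the default behaviour.

``` r
# Hypothetical, simplified cleaning pattern (NOT the package default)
pattern <- "(\\(.*\\)|\\b(species|serovar)\\b)"

raw <- c("Staphylococcus aureus (MRSA)", "Salmonella serovar Typhi")

# Remove every match, Perl-compatible and case-insensitive
cleaned <- gsub(pattern, "", raw, perl = TRUE, ignore.case = TRUE)

# Collapse leftover whitespace
cleaned <- trimws(gsub(" +", " ", cleaned))
cleaned
#> [1] "Staphylococcus aureus" "Salmonella Typhi"
```

A custom pattern like this could be supplied as `cleaning_regex = pattern` in a call to `as.mo()`, or set session-wide through the `AMR_cleaning_regex` option.
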

There are three helper functions that can be run after using the
`as.mo()` function:

- Use `mo_uncertainties()` to get a
  [data.frame](https://rdrr.io/r/base/data.frame.html) that prints in a
  pretty format with all taxonomic names that were guessed. The output
  contains the matching score for all matches (see *Matching Score for
  Microorganisms* below).

- Use `mo_failures()` to get a
  [character](https://rdrr.io/r/base/character.html)
  [vector](https://rdrr.io/r/base/vector.html) with all values that
  could not be coerced to a valid value.

- Use `mo_renamed()` to get a
  [data.frame](https://rdrr.io/r/base/data.frame.html) with all values
  that could be coerced based on old, previously accepted taxonomic
  names.
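
A typical session using these helpers might look like the sketch below. It assumes the AMR package is installed; the input values are made up for illustration, and no output is shown because it depends on the package version and the taxonomy it ships with.

``` r
library(AMR)

# Mixed-quality input: an exact name, a misspelling, and free text
x <- as.mo(c("Escherichia coli", "Sthafilokkockus aureus", "qwerty 12345"),
           info = FALSE)

mo_uncertainties() # data.frame of guessed names with their matching scores
mo_failures()      # character vector of inputs that could not be coerced
mo_renamed()       # data.frame of inputs matched through old taxonomic names

# Listed in Usage; assumed here to clear what as.mo() stored in this session
mo_reset_session()
```

Inspecting these three results after a bulk conversion is a quick way to decide whether `minimum_matching_score` or `cleaning_regex` needs adjusting for a particular data source.
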

### For Mycologists

The [matching score
algorithm](https://amr-for-r.org/reference/mo_matching_score.md) gives
precedence to bacteria over fungi. If you are only analysing fungi, be
sure to use `only_fungi = TRUE`, or better yet, add this to your code
and run it once every session:

    options(AMR_only_fungi = TRUE)

This will make sure that no bacteria or other 'non-fungi' will be
returned by `as.mo()`, or any of the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions.

### Coagulase-negative and Coagulase-positive Staphylococci

With `Becker = TRUE`, the following staphylococci will be converted to
their corresponding coagulase group:

- Coagulase-negative: *S. americanisciuri*, *S. argensis*,
  *S. arlettae*, *S. auricularis*, *S. borealis*, *S. brunensis*,
  *S. caeli*, *S. caledonicus*, *S. canis*, *S. capitis*,
  *S. capitis capitis*, *S. capitis urealyticus*,
  *S. capitis ureolyticus*, *S. caprae*, *S. carnosus*,
  *S. carnosus carnosus*, *S. carnosus utilis*, *S. casei*,
  *S. caseolyticus*, *S. chromogenes*, *S. cohnii*, *S. cohnii cohnii*,
  *S. cohnii urealyticum*, *S. cohnii urealyticus*, *S. condimenti*,
  *S. croceilyticus*, *S. debuckii*, *S. devriesei*, *S. durrellii*,
  *S. edaphicus*, *S. epidermidis*, *S. equorum*, *S. equorum equorum*,
  *S. equorum linens*, *S. felis*, *S. fleurettii*, *S. gallinarum*,
  *S. haemolyticus*, *S. hominis*, *S. hominis hominis*,
  *S. hominis novobiosepticus*, *S. jettensis*, *S. kloosii*,
  *S. lentus*, *S. lloydii*, *S. lugdunensis*, *S. marylandisciuri*,
  *S. massiliensis*, *S. microti*, *S. muscae*, *S. nepalensis*,
  *S. pasteuri*, *S. petrasii*, *S. petrasii croceilyticus*,
  *S. petrasii jettensis*, *S. petrasii petrasii*,
  *S. petrasii pragensis*, *S. pettenkoferi*, *S. piscifermentans*,
  *S. pragensis*, *S. pseudoxylosus*, *S. pulvereri*, *S. ratti*,
  *S. rostri*, *S. saccharolyticus*, *S. saprophyticus*,
  *S. saprophyticus bovis*, *S. saprophyticus saprophyticus*,
  *S. schleiferi*, *S. schleiferi schleiferi*, *S. sciuri*,
  *S. sciuri carnaticus*, *S. sciuri lentus*, *S. sciuri rodentium*,
  *S. sciuri sciuri*, *S. shinii*, *S. simulans*, *S. stepanovicii*,
  *S. succinus*, *S. succinus casei*, *S. succinus succinus*,
  *S. taiwanensis*, *S. urealyticus*, *S. ureilyticus*, *S. veratri*,
  *S. vitulinus*, *S. vitulus*, *S. warneri*, and *S. xylosus*

- Coagulase-positive: *S. agnetis*, *S. argenteus*, *S. coagulans*,
  *S. cornubiensis*, *S. delphini*, *S. hyicus*,
  *S. hyicus chromogenes*, *S. hyicus hyicus*, *S. intermedius*,
  *S. lutrae*, *S. pseudintermedius*, *S. roterodami*,
  *S. schleiferi coagulans*, *S. schweitzeri*, *S. simiae*, and
  *S. singaporensis*

This is based on:

- Becker K *et al.* (2014). **Coagulase-Negative Staphylococci.** *Clin
  Microbiol Rev.* 27(4): 870-926;
  [doi:10.1128/CMR.00109-13](https://doi.org/10.1128/CMR.00109-13)

- Becker K *et al.* (2019). **Implications of identifying the recently
  defined members of the *S. aureus* complex, *S. argenteus* and
  *S. schweitzeri*: A position paper of members of the ESCMID Study Group
  for staphylococci and Staphylococcal Diseases (ESGS).** *Clin
  Microbiol Infect*;
  [doi:10.1016/j.cmi.2019.02.028](https://doi.org/10.1016/j.cmi.2019.02.028)

- Becker K *et al.* (2020). **Emergence of coagulase-negative
  staphylococci.** *Expert Rev Anti Infect Ther.* 18(4):349-366;
  [doi:10.1080/14787210.2020.1730813](https://doi.org/10.1080/14787210.2020.1730813)

For newly named staphylococcal species, such as *S. brunensis* (2024)
and *S. shinii* (2023), we looked up the scientific reference to make
sure the species are assigned to the correct coagulase group.

### Lancefield Groups in Streptococci

With `Lancefield = TRUE`, the following streptococci will be converted
to their corresponding Lancefield group:

- Streptococcus Group A: *S. pyogenes*

- Streptococcus Group B: *S. agalactiae*

- Streptococcus Group C: *S. dysgalactiae*,
  *S. dysgalactiae dysgalactiae*, *S. dysgalactiae equisimilis*,
  *S. equi*, *S. equi equi*, *S. equi ruminatorum*, and
  *S. equi zooepidemicus*

- Streptococcus Group F: *S. anginosus*, *S. anginosus anginosus*,
  *S. anginosus whileyi*, *S. constellatus*,
  *S. constellatus constellatus*, *S. constellatus pharyngis*,
  *S. constellatus viborgensis*, and *S. intermedius*

- Streptococcus Group G: *S. canis*, *S. dysgalactiae*,
  *S. dysgalactiae dysgalactiae*, and *S. dysgalactiae equisimilis*

- Streptococcus Group H: *S. sanguinis*

- Streptococcus Group K: *S. salivarius*, *S. salivarius salivarius*,
  and *S. salivarius thermophilus*

- Streptococcus Group L: *S. dysgalactiae*,
  *S. dysgalactiae dysgalactiae*, and *S. dysgalactiae equisimilis*

This is based on:

- Lancefield RC (1933). **A serological differentiation of human and
  other groups of hemolytic streptococci.** *J Exp Med.* 57(4): 571-95;
  [doi:10.1084/jem.57.4.571](https://doi.org/10.1084/jem.57.4.571)

## Source

- Berends MS *et al.* (2022). **AMR: An R Package for Working with
  Antimicrobial Resistance Data**. *Journal of Statistical Software*,
  104(3), 1-31;
  [doi:10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)

- Parte, AC *et al.* (2020). **List of Prokaryotic names with Standing
  in Nomenclature (LPSN) moves to the DSMZ.** International Journal of
  Systematic and Evolutionary Microbiology, 70, 5607-5612;
  [doi:10.1099/ijsem.0.004332](https://doi.org/10.1099/ijsem.0.004332).
  Accessed from <https://lpsn.dsmz.de> on June 24th, 2024.

- Vincent, R *et al.* (2013). **MycoBank gearing up for new horizons.**
  IMA Fungus, 4(2), 371-9;
  [doi:10.5598/imafungus.2013.04.02.16](https://doi.org/10.5598/imafungus.2013.04.02.16).
  Accessed from <https://www.mycobank.org> on June 24th, 2024.

- GBIF Secretariat (2023). GBIF Backbone Taxonomy. Checklist dataset
  [doi:10.15468/39omei](https://doi.org/10.15468/39omei). Accessed from
  <https://www.gbif.org> on June 24th, 2024.

- Reimer, LC *et al.* (2022). ***BacDive* in 2022: the knowledge base
  for standardized bacterial and archaeal data.** Nucleic Acids Res.,
  50(D1):D741-D74;
  [doi:10.1093/nar/gkab961](https://doi.org/10.1093/nar/gkab961).
  Accessed from <https://bacdive.dsmz.de> on July 16th, 2024.

- Public Health Information Network Vocabulary Access and Distribution
  System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020.
  Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12).
  URL: <https://www.cdc.gov/phin/php/phinvads/>

- Bartlett A *et al.* (2022). **A comprehensive list of bacterial
  pathogens infecting humans.** *Microbiology* 168:001269;
  [doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)

## Matching Score for Microorganisms

With ambiguous user input in `as.mo()` and all the
[`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions, the
returned results are chosen based on their matching score using
[`mo_matching_score()`](https://amr-for-r.org/reference/mo_matching_score.md).
This matching score $m$ is calculated as:

$$m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min \begin{cases}l_{n} \\ \textrm{lev}(x, n)\end{cases}}{l_{n} \cdot p_{n} \cdot k_{n}}$$

where:

- $x$ is the user input;

- $n$ is a taxonomic name (genus, species, and subspecies);

- $l_n$ is the length of $n$;

- $\textrm{lev}$ is the [Levenshtein distance
  function](https://en.wikipedia.org/wiki/Levenshtein_distance)
  (counting any insertion as 1, and any deletion or substitution as 2)
  that is needed to change $x$ into $n$;

- $p_n$ is the human pathogenic prevalence group of $n$, as
  described below;

- $k_n$ is the taxonomic kingdom of $n$, set as Bacteria = 1, Fungi
  = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.

The grouping into human pathogenic prevalence $p$ is based on recent
work from Bartlett *et al.* (2022,
[doi:10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) who
extensively studied medical-scientific literature to categorise all
bacterial species into these groups:

- **Established**, if a taxonomic species has infected at least three
  persons in three or more references. These records have
  `prevalence = 1.15` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- **Putative**, if a taxonomic species has fewer than three known cases.
  These records have `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set.

Furthermore,

- Genera from the World Health Organization's (WHO) Priority Pathogen
  List have `prevalence = 1.0` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any genus present in the **established** list also has
  `prevalence = 1.15` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any other genus present in the **putative** list has
  `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any other species or subspecies of which the genus is present in the
  two aforementioned groups has `prevalence = 1.5` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set;

- Any *non-bacterial* genus, species or subspecies of which the genus is
  present in the following list has `prevalence = 1.25` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set: *Absidia*, *Acanthamoeba*, *Acremonium*, *Actinomucor*,
  *Aedes*, *Alternaria*, *Amoeba*, *Ancylostoma*, *Angiostrongylus*,
  *Anisakis*, *Anopheles*, *Apophysomyces*, *Arthroderma*,
  *Aspergillus*, *Aureobasidium*, *Basidiobolus*, *Beauveria*,
  *Bipolaris*, *Blastobotrys*, *Blastocystis*, *Blastomyces*, *Candida*,
  *Capillaria*, *Chaetomium*, *Chilomastix*, *Chrysonilia*,
  *Chrysosporium*, *Cladophialophora*, *Cladosporium*, *Clavispora*,
  *Coccidioides*, *Cokeromyces*, *Conidiobolus*, *Coniochaeta*,
  *Contracaecum*, *Cordylobia*, *Cryptococcus*, *Cryptosporidium*,
  *Cunninghamella*, *Curvularia*, *Cyberlindnera*, *Debaryozyma*,
  *Demodex*, *Dermatobia*, *Dientamoeba*, *Diphyllobothrium*,
  *Dirofilaria*, *Echinostoma*, *Entamoeba*, *Enterobius*,
  *Epidermophyton*, *Exidia*, *Exophiala*, *Exserohilum*, *Fasciola*,
  *Fonsecaea*, *Fusarium*, *Geotrichum*, *Giardia*, *Graphium*,
  *Haloarcula*, *Halobacterium*, *Halococcus*, *Hansenula*,
  *Hendersonula*, *Heterophyes*, *Histomonas*, *Histoplasma*, *Hortaea*,
  *Hymenolepis*, *Hypomyces*, *Hysterothylacium*, *Kloeckera*,
  *Kluyveromyces*, *Kodamaea*, *Lacazia*, *Leishmania*, *Lichtheimia*,
  *Lodderomyces*, *Lomentospora*, *Madurella*, *Malassezia*,
  *Malbranchea*, *Metagonimus*, *Meyerozyma*, *Microsporidium*,
  *Microsporum*, *Millerozyma*, *Mortierella*, *Mucor*,
  *Mycocentrospora*, *Nannizzia*, *Necator*, *Nectria*, *Ochroconis*,
  *Oesophagostomum*, *Oidiodendron*, *Opisthorchis*, *Paecilomyces*,
  *Paracoccidioides*, *Pediculus*, *Penicillium*, *Phaeoacremonium*,
  *Phaeomoniella*, *Phialophora*, *Phlebotomus*, *Phoma*, *Pichia*,
  *Piedraia*, *Pithomyces*, *Pityrosporum*, *Pneumocystis*,
  *Pseudallescheria*, *Pseudoscopulariopsis*, *Pseudoterranova*,
  *Pulex*, *Purpureocillium*, *Quambalaria*, *Rhinocladiella*,
  *Rhizomucor*, *Rhizopus*, *Rhodotorula*, *Saccharomyces*, *Saksenaea*,
  *Saprochaete*, *Sarcoptes*, *Scedosporium*, *Schistosoma*,
  *Schizosaccharomyces*, *Scolecobasidium*, *Scopulariopsis*,
  *Scytalidium*, *Spirometra*, *Sporobolomyces*, *Sporopachydermia*,
  *Sporothrix*, *Sporotrichum*, *Stachybotrys*, *Strongyloides*,
  *Syncephalastrum*, *Syngamus*, *Taenia*, *Talaromyces*, *Teleomorph*,
  *Toxocara*, *Trichinella*, *Trichobilharzia*, *Trichoderma*,
  *Trichomonas*, *Trichophyton*, *Trichosporon*, *Trichostrongylus*,
  *Trichuris*, *Tritirachium*, *Trombicula*, *Trypanosoma*, *Tunga*,
  *Ulocladium*, *Ustilago*, *Verticillium*, *Wallemia*, *Wangiella*,
  *Wickerhamomyces*, *Wuchereria*, *Yarrowia*, or *Zygosaccharomyces*;

- All other records have `prevalence = 2.0` in the
  [microorganisms](https://amr-for-r.org/reference/microorganisms.md)
  data set.
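
The prevalence values described above can be inspected directly in the `microorganisms` data set. The sketch below assumes the AMR package is installed and that the data set has the columns `mo`, `fullname`, `kingdom`, and `prevalence`; only `prevalence` is named in the text above, so the other column names are assumptions.

``` r
library(AMR)

# Look up the stored prevalence value for a few taxonomic names
wanted <- c("Escherichia coli", "Entamoeba coli", "Klebsiella pneumoniae")
prev <- microorganisms[microorganisms$fullname %in% wanted,
                       c("mo", "fullname", "kingdom", "prevalence")]
prev
```

No output is shown here because the exact rows depend on the taxonomy included in the installed package version.
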

When calculating the matching score, all characters in $x$ and $n$
other than A-Z, a-z, 0-9, spaces and parentheses are ignored.

All matches are sorted in descending order of their matching score, and
for each user input value the top match is returned. As a result,
`"E. coli"` will return the microbial ID of
*Escherichia coli* ($m = 0.688$, a highly prevalent microorganism
found in humans) and not *Entamoeba coli* ($m = 0.381$, a less
prevalent microorganism in humans), although the latter would
alphabetically come first.
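
The formula can be checked with a few lines of base R. This is a sketch under stated assumptions and not the package implementation: the function name `matching_score()` is hypothetical, matching is case-sensitive, and the values for `p_n` and `k_n` are assumptions derived from the lists above (a WHO priority pathogen genus in Bacteria for *Escherichia coli*; a listed non-bacterial genus in Protozoa for *Entamoeba coli*).

``` r
# Base R sketch of the matching score formula (not the AMR implementation)
matching_score <- function(x, n, p_n, k_n) {
  # Ignore everything except letters, digits, spaces and parentheses
  clean <- function(s) gsub("[^A-Za-z0-9 ()]", "", s)
  x <- clean(x)
  n <- clean(n)
  l_n <- nchar(n)
  # Weighted Levenshtein distance: insertion = 1, deletion = 2, substitution = 2
  lev <- drop(utils::adist(x, n,
    costs = c(insertions = 1, deletions = 2, substitutions = 2)))
  (l_n - 0.5 * min(l_n, lev)) / (l_n * p_n * k_n)
}

# Assumed inputs: p_n = 1.0, k_n = 1 (Bacteria)
m_escherichia <- matching_score("E. coli", "Escherichia coli", p_n = 1.0, k_n = 1)

# Assumed inputs: p_n = 1.25, k_n = 1.5 (Protozoa)
m_entamoeba <- matching_score("E. coli", "Entamoeba coli", p_n = 1.25, k_n = 1.5)

c(m_escherichia, m_entamoeba) # about 0.688 and 0.381, the values quoted above
```

With these assumed inputs, `"E. coli"` becomes `"E coli"` after cleaning, so ten insertions turn it into *Escherichia coli* and eight into *Entamoeba coli*; the difference in score then comes mostly from the prevalence and kingdom factors in the denominator.
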

## Download Our Reference Data

All reference data sets in the AMR package - including information on
microorganisms, antimicrobials, and clinical breakpoints - are freely
available for download in multiple formats: R, MS Excel, Apache Feather,
Apache Parquet, SPSS, and Stata.

For maximum compatibility, we also provide machine-readable,
tab-separated plain text files suitable for use in any software,
including laboratory information systems.

Visit [our website for direct download
links](https://amr-for-r.org/articles/datasets.html), or explore the
actual files in [our GitHub
repository](https://github.com/msberends/AMR/tree/main/data-raw/datasets).

## See also

[microorganisms](https://amr-for-r.org/reference/microorganisms.md) for
the [data.frame](https://rdrr.io/r/base/data.frame.html) that is being
used to determine IDs.

The [`mo_*`](https://amr-for-r.org/reference/mo_property.md) functions
(such as [`mo_genus()`](https://amr-for-r.org/reference/mo_property.md),
[`mo_gramstain()`](https://amr-for-r.org/reference/mo_property.md)) to
get properties based on the returned code.

## Examples

``` r
# \donttest{
# These examples all return "B_STPHY_AURS", the ID of S. aureus:
as.mo(c(
  "sau", # WHONET code
  "stau",
  "STAU",
  "staaur",
  "S. aureus",
  "S aureus",
  "Sthafilokkockus aureus", # handles incorrect spelling
  "Staphylococcus aureus (MRSA)",
  "MRSA", # Methicillin Resistant S. aureus
  "VISA", # Vancomycin Intermediate S. aureus
  "VRSA", # Vancomycin Resistant S. aureus
  115329001 # SNOMED CT code
))
#> Class 'mo'
#>  [1] B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS
#>  [6] B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS B_STPHY_AURS
#> [11] B_STPHY_AURS B_STPHY_AURS

# Dyslexia is no problem - these all work:
as.mo(c(
  "Ureaplasma urealyticum",
  "Ureaplasma urealyticus",
  "Ureaplasmium urealytica",
  "Ureaplazma urealitycium"
))
#> Class 'mo'
#> [1] B_URPLS_URLY B_URPLS_URLY B_URPLS_URLY B_URPLS_URLY

# Input is cleaned with the regular expression given in the `cleaning_regex`
# argument, which defaults to `mo_cleaning_regex()`:
cat(mo_cleaning_regex(), "\n")
#> ([^A-Za-z- \(\)\[\]{}]+|([({]|\[).+([})]|\])|(^| )( ?[a-z-]+[-](resistant|susceptible) ?|e?spp([^a-z]+|$)|e?ssp([^a-z]+|$)|serogr.?up[a-z]*|e?ss([^a-z]+|$)|e?sp([^a-z]+|$)|var([^a-z]+|$)|serovar[a-z]*|sube?species|biovar[a-z]*|e?species|Ig[ADEGM]|e?subsp|biotype|titer|dummy))

as.mo("Streptococcus group A")
#> Class 'mo'
#> [1] B_STRPT_GRPA

as.mo("S. epidermidis") # will remain species: B_STPHY_EPDR
#> Class 'mo'
#> [1] B_STPHY_EPDR
as.mo("S. epidermidis", Becker = TRUE) # will not remain species: B_STPHY_CONS
#> Class 'mo'
#> [1] B_STPHY_CONS

as.mo("S. pyogenes") # will remain species: B_STRPT_PYGN
#> Class 'mo'
#> [1] B_STRPT_PYGN
as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRPA
#> Class 'mo'
#> [1] B_STRPT_GRPA

# All mo_* functions use as.mo() internally too (see ?mo_property):
mo_genus("E. coli")
#> [1] "Escherichia"
mo_gramstain("ESCO")
#> [1] "Gram-negative"
mo_is_intrinsic_resistant("ESCCOL", ab = "vanco")
#> ℹ Determining intrinsic resistance based on 'EUCAST Expected Resistant
#>   Phenotypes' v1.2 (2023). This note will be shown once per session.
#> [1] TRUE
# }
```