incorporate Bartlett et al (2022)

dr. M.S. (Matthijs) Berends 2022-12-19 15:32:41 +01:00
parent 23fe427cbc
commit b1b7534c78
24 changed files with 36375 additions and 36304 deletions


@ -1,6 +1,6 @@
Package: AMR
Version: 1.8.2.9064
Date: 2022-12-17
Version: 1.8.2.9065
Date: 2022-12-19
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by


@ -1,4 +1,4 @@
# AMR 1.8.2.9064
# AMR 1.8.2.9065
*(this beta version will eventually become v2.0! We're happy to reach a new major milestone soon!)*
@ -7,7 +7,7 @@ This is a new major release of the AMR package, with great new additions but als
**[TL;DR](https://en.wikipedia.org/wiki/TL;DR)**
* Microbiological taxonomy (`microorganisms` data set) updated to 2022 and now based on LPSN and GBIF
* Much improved algorithms to translate user input to valid taxonomy
* Much improved algorithms to translate user input to valid taxonomy, e.g. by using [recent scientific work](https://doi.org/10.1099/mic.0.001269) about per-species human pathogenicity
* Clinical breakpoints added for EUCAST 2022 and CLSI 2022
* 20 new antibiotics added and updated all DDDs and ATC codes
* Extended support for antiviral agents (`antivirals` data set), with many new functions
@ -33,6 +33,8 @@ We are very grateful for the valuable input by our colleagues from other countri
The `microorganisms` data set no longer relies on the Catalogue of Life, but on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and is supplemented with the 'backbone taxonomy' from the Global Biodiversity Information Facility (GBIF). The structure of this data set has changed to include separate LPSN and GBIF identifiers. Almost all previous MO codes were retained. It contains over 1,400 taxonomic names from 2022.
We previously relied on our own experience to categorise species into pathogenic groups, but we were very happy to encounter the very recent work of Bartlett *et al.* (2022, DOI [10.1099/mic.0.001269](https://doi.org/10.1099/mic.0.001269)) who extensively studied medical-scientific literature to categorise all bacterial species into groups. See `mo_matching_score()` for how their work was incorporated into the `prevalence` column of the `microorganisms` data set. Using their results, `as.mo()` and all `mo_*()` functions are now much better at converting user input to valid taxonomic records.
We also made the following changes regarding the included taxonomy or microorganisms functions:
* Updated full microbiological taxonomy according to the latest daily LPSN data set (December 2022) and latest yearly GBIF taxonomy backbone (November 2022)


@ -99,7 +99,7 @@
#' - `gbif_parent`\cr GBIF identifier of the parent taxon
#' - `gbif_renamed_to`\cr GBIF identifier of the currently valid taxon
#' - `source`\cr Either `r vector_or(microorganisms$source)` (see *Source*)
#' - `prevalence`\cr Prevalence of the microorganism, see [as.mo()]
#' - `prevalence`\cr Prevalence of the microorganism according to Bartlett *et al.* (2022, \doi{10.1099/mic.0.001269}), see [mo_matching_score()] for the full explanation
#' - `snomed`\cr Systematized Nomenclature of Medicine (SNOMED) code of the microorganism, version of `r documentation_date(TAXONOMY_VERSION$SNOMED$accessed_date)` (see *Source*). Use [mo_snomed()] to retrieve it quickly, see [mo_property()].
#' @details
#' Please note that entries are only based on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and the Global Biodiversity Information Facility (GBIF) (see below). Since these sources incorporate entries based on (recent) publications in the International Journal of Systematic and Evolutionary Microbiology (IJSEM), it can happen that the year of publication is sometimes later than one might expect.
@ -142,7 +142,9 @@
#'
#' * `r TAXONOMY_VERSION$SNOMED$citation` URL: <`r TAXONOMY_VERSION$SNOMED$url`>
#'
#' * Grimont *et al.*. Antigenic Formulae of the Salmonella Serovars, 2007, 9th Edition. WHO Collaborating Centre for Reference and Research on *Salmonella* (WHOCC-SALM).
#' * Grimont *et al.* (2007). Antigenic Formulae of the Salmonella Serovars, 9th Edition. WHO Collaborating Centre for Reference and Research on *Salmonella* (WHOCC-SALM).
#'
#' * Bartlett *et al.* (2022). **A comprehensive list of bacterial pathogens infecting humans**. *Microbiology* 168:001269; \doi{10.1099/mic.0.001269}
#' @seealso [as.mo()], [mo_property()], [microorganisms.codes], [intrinsic_resistant]
#' @examples
#' microorganisms


@ -1289,7 +1289,7 @@ mdro <- function(x = NULL,
)
trans_tbl(
3,
which(x$genus == "Clostridium" & x$species == "difficile"),
which(x$genus %in% c("Clostridium", "Clostridioides") & x$species == "difficile"),
c(MTR, VAN),
"any"
)
@ -1390,7 +1390,7 @@ mdro <- function(x = NULL,
)
trans_tbl(
3,
which(x$genus == "Clostridium" & x$species == "difficile"),
which(x$genus %in% c("Clostridium", "Clostridioides") & x$species == "difficile"),
c(MTR, VAN, FDX),
"any"
)
@ -1492,7 +1492,7 @@ mdro <- function(x = NULL,
)
trans_tbl(
3,
which(x$genus == "Clostridium" & x$species == "difficile"),
which(x$genus %in% c("Clostridium", "Clostridioides") & x$species == "difficile"),
c(MTR, VAN, FDX),
"any"
)
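
The changed genus condition in the three hunks above can be tried in isolation. The following is a minimal sketch, assuming a small hypothetical data frame `x` with only `genus` and `species` columns (the real object inside `mdro()` is not shown in this diff). It illustrates why matching on both genus names is needed once the taxonomy lists *C. difficile* under *Clostridioides*.

```r
# Hypothetical toy data; the real `x` inside mdro() has many more columns.
x <- data.frame(
  genus = c("Clostridium", "Clostridioides", "Escherichia", "Clostridium"),
  species = c("difficile", "difficile", "coli", "perfringens"),
  stringsAsFactors = FALSE
)

# Old condition: only matches the former genus name
old_rows <- which(x$genus == "Clostridium" & x$species == "difficile")

# New condition: matches both the former and the current genus name
new_rows <- which(x$genus %in% c("Clostridium", "Clostridioides") & x$species == "difficile")

print(old_rows)
print(new_rows)
```

With the old condition, the row that uses the current genus name is silently skipped; the `%in%` form picks up both spellings.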

R/mo.R

@ -97,6 +97,7 @@
#' 7. `r TAXONOMY_VERSION$LPSN$citation` Accessed from <`r TAXONOMY_VERSION$LPSN$url`> on `r documentation_date(TAXONOMY_VERSION$LPSN$accessed_date)`.
#' 8. `r TAXONOMY_VERSION$GBIF$citation` Accessed from <`r TAXONOMY_VERSION$GBIF$url`> on `r documentation_date(TAXONOMY_VERSION$GBIF$accessed_date)`.
#' 9. `r TAXONOMY_VERSION$SNOMED$citation` URL: <`r TAXONOMY_VERSION$SNOMED$url`>
#' 10. Bartlett A *et al.* (2022). **A comprehensive list of bacterial pathogens infecting humans**. *Microbiology* 168:001269; \doi{10.1099/mic.0.001269}
#' @export
#' @return A [character] [vector] with additional class [`mo`]
#' @seealso [microorganisms] for the [data.frame] that is being used to determine IDs.
@ -782,7 +783,7 @@ print.mo_uncertainties <- function(x, ...) {
return(invisible(NULL))
}
cat(word_wrap("Matching scores are based on the resemblance between the input and the full taxonomic name, and the pathogenicity in humans. See `?mo_matching_score`.\n\n", add_fn = font_blue))
cat(word_wrap("Matching scores are based on the resemblance between the input and the full taxonomic name, and the pathogenicity in humans according to Bartlett ", font_italic("et al."), " (2022). See `?mo_matching_score`.\n\n", add_fn = font_blue))
if (has_colour()) {
cat(word_wrap("Colour keys: ",
font_red_bg(" 0.000-0.499 "),


@ -33,7 +33,9 @@
#' @author Dr. Matthijs Berends
#' @param x Any user input value(s)
#' @param n A full taxonomic name, that exists in [`microorganisms$fullname`][microorganisms]
#' @note This algorithm was described in: Berends MS *et al.* (2022). **AMR: An R Package for Working with Antimicrobial Resistance Data**. *Journal of Statistical Software*, 104(3), 1-31; \doi{10.18637/jss.v104.i03}.
#' @note This algorithm was originally described in: Berends MS *et al.* (2022). **AMR: An R Package for Working with Antimicrobial Resistance Data**. *Journal of Statistical Software*, 104(3), 1-31; \doi{10.18637/jss.v104.i03}.
#'
#' Later, the work of Bartlett A *et al.* about bacterial pathogens infecting humans (2022, \doi{10.1099/mic.0.001269}) was incorporated.
#' @section Matching Score for Microorganisms:
#' With ambiguous user input in [as.mo()] and all the [`mo_*`][mo_property()] functions, the returned results are chosen based on their matching score using [mo_matching_score()]. This matching score \eqn{m}, is calculated as:
#'
@ -48,15 +50,18 @@
#' * \ifelse{html}{\out{<i>p<sub>n</sub></i> is the human pathogenic prevalence group of <i>n</i>, as described below;}}{p_n is the human pathogenic prevalence group of \eqn{n}, as described below;}
#' * \ifelse{html}{\out{<i>k<sub>n</sub></i> is the taxonomic kingdom of <i>n</i>, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}}{k_n is the taxonomic kingdom of \eqn{n}, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}
#'
#' The grouping into human pathogenic prevalence (\eqn{p}) is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence:
#' The grouping into human pathogenic prevalence (\eqn{p}) is based on recent work from Bartlett *et al.* (2022, \doi{10.1099/mic.0.001269}) who extensively studied medical-scientific literature to categorise all bacterial species into these groups:
#'
#' - **Established**, if a taxonomic species has infected at least three persons in three or more references. These records have `prevalence = 1.0` in the [microorganisms] data set.
#' - **Putative**, if a taxonomic species has fewer than three known cases. These records have `prevalence = 2.0` in the [microorganisms] data set.
#'
#' Furthermore,
#'
#' - Any *other* bacterial genus, species or subspecies of which the genus is present in the two aforementioned groups, has `prevalence = 2.5` in the [microorganisms] data set.
#' - Any *non-bacterial* genus, species or subspecies of which the genus is present in the following list, also has `prevalence = 2.5` in the [microorganisms] data set: `r vector_or(MO_PREVALENT_GENERA, quotes = "*")`.
#' - All other records have `prevalence = 3.0` in the [microorganisms] data set.
#'
#' **Group 1** (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is *Enterococcus*, *Staphylococcus* or *Streptococcus*. This group consequently contains all common Gram-negative bacteria, such as *Pseudomonas* and *Legionella* and all species within the order Enterobacterales.
#'
#' **Group 2** consists of all microorganisms where the taxonomic phylum is Pseudomonadota (previously named Proteobacteria), Bacillota (previously named Firmicutes), Actinomycetota (previously named Actinobacteria) or Sarcomastigophora, or where the taxonomic genus is `r vector_or(MO_PREVALENT_GENERA, quotes = "*")`.
#'
#' **Group 3** consists of all other microorganisms.
#'
#' All characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
#' When calculating the matching score, all characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
#'
#' All matches are sorted descending on their matching score and for all user input values, the top match will be returned. This will lead to the effect that e.g., `"E. coli"` will return the microbial ID of *Escherichia coli* (\eqn{m = `r round(mo_matching_score("E. coli", "Escherichia coli"), 3)`}, a highly prevalent microorganism found in humans) and not *Entamoeba coli* (\eqn{m = `r round(mo_matching_score("E. coli", "Entamoeba coli"), 3)`}, a less prevalent microorganism in humans), although the latter would alphabetically come first.
#' @export
@ -82,7 +87,7 @@ mo_matching_score <- function(x, n) {
# force a capital letter, so this conversion will not count as a substitution
substr(x, 1, 1) <- toupper(substr(x, 1, 1))
# n is always a taxonomically valid full name
if (length(n) == 1) {
n <- rep(n, length(x))
@ -90,11 +95,12 @@ mo_matching_score <- function(x, n) {
if (length(x) == 1) {
x <- rep(x, length(n))
}
# length of fullname
l_n <- nchar(n)
lev <- double(length = length(x))
l_n.lev <- double(length = length(x))
# get Levenshtein distance
lev <- unlist(Map(f = function(a, b) {
as.double(utils::adist(a, b,
ignore.case = FALSE,
@ -112,7 +118,7 @@ mo_matching_score <- function(x, n) {
p_n <- AMR_env$MO_lookup[match(n, AMR_env$MO_lookup$fullname), "prevalence", drop = TRUE]
# kingdom index (Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5)
k_n <- AMR_env$MO_lookup[match(n, AMR_env$MO_lookup$fullname), "kingdom_index", drop = TRUE]
# matching score:
(l_n - 0.5 * l_n.lev) / (l_n * p_n * k_n)
}
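
The final expression of `mo_matching_score()` can be checked by hand with a small stand-alone sketch. The function below is hypothetical and not part of the package. It takes `p_n` and `k_n` as plain arguments instead of looking them up in `AMR_env$MO_lookup`. Capping the Levenshtein distance at the length of the full name is an assumption about the part of the function that this diff does not show.

```r
# Stand-alone sketch of the matching score; p_n and k_n are supplied by hand,
# whereas the package looks them up in its internal lookup table.
matching_score_sketch <- function(x, n, p_n, k_n) {
  # keep only A-Z, a-z, 0-9, spaces and parentheses
  clean <- function(s) gsub("[^A-Za-z0-9 ()]", "", s)
  x <- clean(x)
  n <- clean(n)
  # force a capital first letter, so this does not count as a substitution
  substr(x, 1, 1) <- toupper(substr(x, 1, 1))
  # length of the full name
  l_n <- nchar(n)
  # Levenshtein distance between input and full name
  lev <- as.double(utils::adist(x, n, ignore.case = FALSE))
  # assumption: the distance is capped at the length of the full name
  l_n.lev <- pmin(lev, l_n)
  (l_n - 0.5 * l_n.lev) / (l_n * p_n * k_n)
}

# Bacteria (k_n = 1) in the 'established' group (p_n = 1)
score_escherichia <- matching_score_sketch("E. coli", "Escherichia coli", p_n = 1, k_n = 1)
# Protozoa (k_n = 3); p_n = 3 is chosen here purely for illustration
score_entamoeba <- matching_score_sketch("E. coli", "Entamoeba coli", p_n = 3, k_n = 3)

print(round(score_escherichia, 3))
print(round(score_entamoeba, 3))
```

For *Escherichia coli* this gives (16 - 0.5 * 10) / 16 = 0.6875, in line with the value of 0.688 quoted in the documentation. The second score depends entirely on the `p_n` passed in, so it should be read as an illustration of the formula and not as the package's current result.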

Binary file not shown.


@ -168,29 +168,21 @@ MO_STREP_ABCG <- AMR_env$MO_lookup$mo[which(AMR_env$MO_lookup$genus == "Streptoc
))]
MO_FULLNAME_LOWER <- create_MO_fullname_lower()
MO_PREVALENT_GENERA <- c(
"Absidia", "Acanthamoeba", "Acholeplasma", "Acremonium", "Actinotignum", "Aedes", "Alistipes", "Alloprevotella",
"Alternaria", "Amoeba", "Anaerosalibacter", "Ancylostoma", "Angiostrongylus", "Anisakis", "Anopheles",
"Apophysomyces", "Arachnia", "Aspergillus", "Aureobasidium", "Bacteroides", "Basidiobolus",
"Beauveria", "Bergeyella", "Blastocystis", "Blastomyces", "Borrelia", "Brachyspira", "Branhamella",
"Butyricimonas", "Candida", "Capillaria", "Capnocytophaga", "Catabacter", "Cetobacterium", "Chaetomium",
"Chlamydia", "Chlamydophila", "Christensenella", "Chryseobacterium", "Chrysonilia", "Cladophialophora", "Cladosporium",
"Conidiobolus", "Contracaecum", "Cordylobia", "Cryptococcus", "Curvularia", "Deinococcus", "Demodex",
"Dermatobia", "Dientamoeba", "Diphyllobothrium", "Dirofilaria", "Dysgonomonas", "Echinostoma", "Elizabethkingia",
"Empedobacter", "Entamoeba", "Enterobius", "Exophiala", "Exserohilum", "Fasciola", "Flavobacterium", "Fonsecaea",
"Fusarium", "Fusobacterium", "Giardia", "Haloarcula", "Halobacterium", "Halococcus", "Hendersonula",
"Heterophyes", "Histomonas", "Histoplasma", "Hymenolepis", "Hypomyces", "Hysterothylacium", "Leishmania", "Lelliottia",
"Leptosphaeria", "Leptotrichia", "Lucilia", "Lumbricus", "Malassezia", "Malbranchea", "Metagonimus", "Meyerozyma",
"Microsporidium", "Microsporum", "Mortierella", "Mucor", "Mycocentrospora", "Mycoplasma", "Myroides", "Necator",
"Nectria", "Ochroconis", "Odoribacter", "Oesophagostomum", "Oidiodendron", "Opisthorchis",
"Ornithobacterium", "Parabacteroides", "Pediculus", "Pedobacter", "Phlebotomus", "Phocaeicola",
"Phocanema", "Phoma", "Pichia", "Piedraia", "Pithomyces", "Pityrosporum", "Pneumocystis", "Porphyromonas", "Prevotella",
"Pseudallescheria", "Pseudoterranova", "Pulex", "Rhizomucor", "Rhizopus", "Rhodotorula", "Riemerella",
"Saccharomyces", "Sarcoptes", "Scolecobasidium", "Scopulariopsis", "Scytalidium", "Sphingobacterium",
"Spirometra", "Spiroplasma", "Sporobolomyces", "Stachybotrys", "Streptobacillus", "Strongyloides",
"Syngamus", "Taenia", "Tannerella", "Tenacibaculum", "Terrimonas", "Toxocara", "Treponema", "Trichinella",
"Trichobilharzia", "Trichoderma", "Trichomonas", "Trichophyton", "Trichosporon", "Trichostrongylus",
"Trichuris", "Tritirachium", "Trypanosoma", "Trombicula", "Tunga", "Ureaplasma", "Victivallis", "Wautersiella",
"Weeksella", "Wuchereria"
"Absidia", "Acanthamoeba", "Acremonium", "Aedes", "Alternaria", "Amoeba", "Ancylostoma", "Angiostrongylus",
"Anisakis", "Anopheles", "Apophysomyces", "Aspergillus", "Aureobasidium", "Basidiobolus", "Beauveria",
"Blastocystis", "Blastomyces", "Candida", "Capillaria", "Chaetomium", "Chrysonilia", "Cladophialophora",
"Cladosporium", "Conidiobolus", "Contracaecum", "Cordylobia", "Cryptococcus", "Curvularia", "Demodex",
"Dermatobia", "Dientamoeba", "Diphyllobothrium", "Dirofilaria", "Echinostoma", "Entamoeba", "Enterobius",
"Exophiala", "Exserohilum", "Fasciola", "Fonsecaea", "Fusarium", "Giardia", "Haloarcula", "Halobacterium",
"Halococcus", "Hendersonula", "Heterophyes", "Histomonas", "Histoplasma", "Hymenolepis", "Hypomyces",
"Hysterothylacium", "Leishmania", "Malassezia", "Malbranchea", "Metagonimus", "Meyerozyma", "Microsporidium",
"Microsporum", "Mortierella", "Mucor", "Mycocentrospora", "Necator", "Nectria", "Ochroconis", "Oesophagostomum",
"Oidiodendron", "Opisthorchis", "Pediculus", "Phlebotomus", "Phoma", "Pichia", "Piedraia", "Pithomyces",
"Pityrosporum", "Pneumocystis", "Pseudallescheria", "Pseudoterranova", "Pulex", "Rhizomucor", "Rhizopus",
"Rhodotorula", "Saccharomyces", "Sarcoptes", "Scolecobasidium", "Scopulariopsis", "Scytalidium", "Spirometra",
"Sporobolomyces", "Stachybotrys", "Strongyloides", "Syngamus", "Taenia", "Toxocara", "Trichinella", "Trichobilharzia",
"Trichoderma", "Trichomonas", "Trichophyton", "Trichosporon", "Trichostrongylus", "Trichuris", "Tritirachium",
"Trombicula", "Trypanosoma", "Tunga", "Wuchereria"
)
# antibiotic groups


@ -69,9 +69,9 @@ genus_species is Moraxella catarrhalis ERY S AZM, CLR, RXT S Moraxella catarrhal
genus_species is Moraxella catarrhalis ERY I AZM, CLR, RXT I Moraxella catarrhalis Breakpoints 10
genus_species is Moraxella catarrhalis ERY R AZM, CLR, RXT R Moraxella catarrhalis Breakpoints 10
genus_species is Moraxella catarrhalis TCY S DOX, MNO S Moraxella catarrhalis Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN S AMP, AMX, PIP, TZP, TIC S Anaerobic Gram-positives Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN I AMP, AMX, PIP, TZP, TIC I Anaerobic Gram-positives Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN R AMP, AMX, PIP, TZP, TIC R Anaerobic Gram-positives Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN S AMP, AMX, PIP, TZP, TIC S Anaerobic Gram-positives Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN I AMP, AMX, PIP, TZP, TIC I Anaerobic Gram-positives Breakpoints 10
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN R AMP, AMX, PIP, TZP, TIC R Anaerobic Gram-positives Breakpoints 10
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN S AMP, AMX, PIP, TZP, TIC S Anaerobic Gram-negatives Breakpoints 10
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN I AMP, AMX, PIP, TZP, TIC I Anaerobic Gram-negatives Breakpoints 10
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN R AMP, AMX, PIP, TZP, TIC R Anaerobic Gram-negatives Breakpoints 10
@ -175,9 +175,9 @@ genus_species is Moraxella catarrhalis ERY S AZM, CLR, RXT S Moraxella catarrhal
genus_species is Moraxella catarrhalis ERY I AZM, CLR, RXT I Moraxella catarrhalis Breakpoints 11
genus_species is Moraxella catarrhalis ERY R AZM, CLR, RXT R Moraxella catarrhalis Breakpoints 11
genus_species is Moraxella catarrhalis TCY S DOX, MNO S Moraxella catarrhalis Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN S AMP, AMX, PIP, TIC S Anaerobic Gram-positives Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN I AMP, AMX, PIP, TIC I Anaerobic Gram-positives Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN R AMP, AMX, PIP, TIC R Anaerobic Gram-positives Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN S AMP, AMX, PIP, TIC S Anaerobic Gram-positives Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN I AMP, AMX, PIP, TIC I Anaerobic Gram-positives Breakpoints 11
genus one_of Actinomyces, Bifidobacterium, Clostridium, Clostridioides, Cutibacterium, Eggerthella, Eubacterium, Lactobacillus, Propionibacterium PEN R AMP, AMX, PIP, TIC R Anaerobic Gram-positives Breakpoints 11
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN S AMP, AMX, PIP, TIC S Anaerobic Gram-negatives Breakpoints 11
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN I AMP, AMX, PIP, TIC I Anaerobic Gram-negatives Breakpoints 11
genus one_of Bacteroides, Bilophila , Fusobacterium, Mobiluncus, Porphyromonas, Prevotella PEN R AMP, AMX, PIP, TIC R Anaerobic Gram-negatives Breakpoints 11


Binary file not shown.

Binary file not shown.


@ -1 +1 @@
6e6f44705995094be5eddc00e0878308
9e75112567e7786a6712024730056057

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

Binary file not shown.


@ -36,20 +36,29 @@
# 2. Go to https://lpsn.dsmz.de/downloads (register first) and download the latest
# CSV file (~12.5 MB) as "taxonomy.csv". Their API unfortunately does
# not include the full taxonomy and is currently (2022) pretty worthless.
# 3. Set this folder_location to the path where these two files are:
# 3. For data about human pathogens, we use Bartlett et al. (2022),
# https://doi.org/10.1099/mic.0.001269. Their latest supplementary material
# can be found here: https://github.com/padpadpadpad/bartlett_et_al_2022_human_pathogens.
# Download their latest xlsx file in the `data` folder and save it to our
# `data-raw` folder.
# 4. Set this folder_location to the path where these two files are:
folder_location <- "~/Downloads/backbone/"
file_gbif <- paste0(folder_location, "Taxon.tsv")
file_lpsn <- paste0(folder_location, "taxonomy.csv")
file_bartlett <- "data-raw/bartlett_et_al_2022_human_pathogens.xlsx"
# 5. Run the rest of this script line by line and check everything :)
if (!file.exists(file_gbif)) stop("GBIF file not found")
if (!file.exists(file_lpsn)) stop("LPSN file not found")
if (!file.exists(file_bartlett)) stop("Bartlett et al. Excel file not found")
library(dplyr)
library(vroom) # to import files
library(rvest) # to scrape LPSN website
library(progress) # to show progress bars
library(readxl) # for reading the Bartlett Excel file
devtools::load_all(".") # load AMR package
# Helper functions --------------------------------------------------------
@ -776,32 +785,6 @@ taxonomy$gbif_parent[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)] <- t
all(taxonomy$lpsn_parent %in% taxonomy$lpsn)
all(taxonomy$gbif_parent %in% taxonomy$gbif)
# Add prevalence ----------------------------------------------------------
# update prevalence based on taxonomy (our own JSS paper: Berends MS et al. (2022), DOI 10.18637/jss.v104.i03)
taxonomy <- taxonomy %>%
mutate(prevalence = case_when(
class == "Gammaproteobacteria" |
genus %in% c("Enterococcus", "Staphylococcus", "Streptococcus")
~ 1,
kingdom %in% c("Archaea", "Bacteria", "Chromista", "Fungi") &
(phylum %in% c(
"Sarcomastigophora",
"Firmicutes", # old, now Bacillota
"Bacillota",
"Proteobacteria", # old, now Pseudomonadota
"Pseudomonadota",
"Actinobacteria", # old, now Actinomycetota
"Actinomycetota"
) |
genus %in% AMR:::MO_PREVALENT_GENERA)
~ 2,
TRUE ~ 3
))
table(taxonomy$prevalence, useNA = "always")
# (a lot will be removed further below)
# fix rank
taxonomy <- taxonomy %>%
mutate(rank = case_when(
@ -817,6 +800,71 @@ taxonomy <- taxonomy %>%
))
# Add prevalence ----------------------------------------------------------
pathogens <- read_excel(file_bartlett, sheet = "Tab 6 Full List")
# get all established, both old and current taxonomic names
established <- pathogens %>%
filter(status == "established") %>%
mutate(fullname = paste(genus, species)) %>%
pull(fullname) %>%
c(unlist(mo_current(.)),
unlist(mo_synonyms(., keep_synonyms = FALSE))) %>%
strsplit(" ", fixed = TRUE) %>%
sapply(function(x) ifelse(length(x) == 1, x, paste(x[1], x[2]))) %>%
sort() %>%
unique()
# get all putative, both old and current taxonomic names
putative <- pathogens %>%
filter(status == "putative") %>%
mutate(fullname = paste(genus, species)) %>%
pull(fullname) %>%
c(unlist(mo_current(.)),
unlist(mo_synonyms(., keep_synonyms = FALSE))) %>%
strsplit(" ", fixed = TRUE) %>%
sapply(function(x) ifelse(length(x) == 1, x, paste(x[1], x[2]))) %>%
sort() %>%
unique()
established <- established[established %unlike% "unknown"]
putative <- putative[putative %unlike% "unknown"]
other_bacterial_genera <- c(established, putative) %>%
strsplit(" ", fixed = TRUE) %>%
sapply(function(x) x[1]) %>%
sort() %>%
unique()
other_genera <- AMR:::MO_PREVALENT_GENERA %>%
c(unlist(mo_current(.)),
unlist(mo_synonyms(., keep_synonyms = FALSE))) %>%
strsplit(" ", fixed = TRUE) %>%
sapply(function(x) x[1]) %>%
sort() %>%
unique()
other_genera <- other_genera[other_genera %unlike% "unknown"]
# update prevalence based on taxonomy (following the recent and thorough work of Bartlett et al., 2022)
# see https://doi.org/10.1099/mic.0.001269
taxonomy <- taxonomy %>%
mutate(prevalence = case_when(
# 'established' gets a 1 and means 'have infected at least three persons in three or more references'
paste(genus, species) %in% established & rank %in% c("genus", "species", "subspecies") ~ 1.0,
# 'putative' gets a 2 and means 'fewer than three known cases'
paste(genus, species) %in% putative & rank %in% c("genus", "species", "subspecies") ~ 2.0,
# other species from a genus in either group get a 2.5
genus %in% other_bacterial_genera & rank %in% c("genus", "species", "subspecies") ~ 2.5,
# we also keep track of prevalent genera of non-bacterial species
genus %in% AMR:::MO_PREVALENT_GENERA & kingdom != "Bacteria" & rank %in% c("genus", "species", "subspecies") ~ 2.5,
# all others get a 3
TRUE ~ 3.0))
table(taxonomy$prevalence, useNA = "always")
# (a lot will be removed further below)
# Save intermediate results (2) -------------------------------------------
saveRDS(taxonomy, "data-raw/taxonomy2.rds")
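
The order of the prevalence rules above matters, because `case_when()` uses the first condition that matches. The following is a minimal base R sketch of the same rule order on toy data. All names in it are hypothetical placeholders and are not taken from the Bartlett *et al.* spreadsheet or from `MO_PREVALENT_GENERA`.

```r
# Hypothetical placeholder names, NOT taken from Bartlett et al. (2022)
established <- c("Genusa alpha")
putative <- c("Genusb beta")
prevalent_non_bacterial <- c("Genusf") # stands in for MO_PREVALENT_GENERA

# genera that occur in either group (first word of each name)
other_bacterial_genera <- unique(sapply(
  strsplit(c(established, putative), " ", fixed = TRUE),
  function(x) x[1]
))

taxonomy <- data.frame(
  kingdom = c("Bacteria", "Bacteria", "Bacteria", "Fungi", "Bacteria", "Bacteria"),
  genus = c("Genusa", "Genusb", "Genusa", "Genusf", "Genusz", "Genusa"),
  species = c("alpha", "beta", "gamma", "delta", "omega", ""),
  rank = c("species", "species", "species", "species", "species", "genus"),
  stringsAsFactors = FALSE
)

fullname <- paste(taxonomy$genus, taxonomy$species)
rank_ok <- taxonomy$rank %in% c("genus", "species", "subspecies")

# Start with the fallback value, then assign from lowest to highest priority,
# so that the most specific rule wins (same effect as the case_when() order).
prevalence <- rep(3.0, nrow(taxonomy))
prevalence[rank_ok & taxonomy$genus %in% prevalent_non_bacterial &
  taxonomy$kingdom != "Bacteria"] <- 2.5
prevalence[rank_ok & taxonomy$genus %in% other_bacterial_genera] <- 2.5
prevalence[rank_ok & fullname %in% putative] <- 2.0
prevalence[rank_ok & fullname %in% established] <- 1.0
taxonomy$prevalence <- prevalence

print(taxonomy)
```

In this toy table, the genus-level record for `Genusa` ends up with 2.5: its pasted name does not appear in `established`, but its genus does appear in one of the two groups.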

Binary file not shown.


@ -123,6 +123,7 @@ The coercion rules consider the prevalence of microorganisms in humans grouped i
\item Parte, AC \emph{et al.} (2020). \strong{List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ.} International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; \doi{10.1099/ijsem.0.004332}. Accessed from \url{https://lpsn.dsmz.de} on 11 December, 2022.
\item GBIF Secretariat (2022). GBIF Backbone Taxonomy. Checklist dataset \doi{10.15468/39omei}. Accessed from \url{https://www.gbif.org} on 11 December, 2022.
\item Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name 'Microoganism', OID 2.16.840.1.114222.4.11.1009 (v12). URL: \url{https://phinvads.cdc.gov}
\item Bartlett A \emph{et al.} (2022). \strong{A comprehensive list of bacterial pathogens infecting humans}. \emph{Microbiology} 168:001269; \doi{10.1099/mic.0.001269}
}
}
@ -142,17 +143,22 @@ where:
\item \ifelse{html}{\out{<i>k<sub>n</sub></i> is the taxonomic kingdom of <i>n</i>, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}}{k_n is the taxonomic kingdom of \eqn{n}, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}
}
The grouping into human pathogenic prevalence (\eqn{p}) is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence:
The grouping into human pathogenic prevalence (\eqn{p}) is based on recent work from Bartlett \emph{et al.} (2022, \doi{10.1099/mic.0.001269}) who extensively studied medical-scientific literature to categorise all bacterial species into these groups:
\itemize{
\item \strong{Established}, if a taxonomic species has infected at least three persons in three or more references. These records have \code{prevalence = 1.0} in the \link{microorganisms} data set.
\item \strong{Putative}, if a taxonomic species has fewer than three known cases. These records have \code{prevalence = 2.0} in the \link{microorganisms} data set.
}
\strong{Group 1} (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is \emph{Enterococcus}, \emph{Staphylococcus} or \emph{Streptococcus}. This group consequently contains all common Gram-negative bacteria, such as \emph{Pseudomonas} and \emph{Legionella} and all species within the order Enterobacterales.
Furthermore,
\itemize{
\item Any \emph{other} bacterial genus, species or subspecies of which the genus is present in the two aforementioned groups, has \code{prevalence = 2.5} in the \link{microorganisms} data set.
\item Any \emph{non-bacterial} genus, species or subspecies of which the genus is present in the following list, also has \code{prevalence = 2.5} in the \link{microorganisms} data set: \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acremonium}, \emph{Aedes}, \emph{Alternaria}, \emph{Amoeba}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Candida}, \emph{Capillaria}, \emph{Chaetomium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Echinostoma}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, \emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Pediculus}, \emph{Phlebotomus}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Spirometra}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Strongyloides}, \emph{Syngamus}, 
\emph{Taenia}, \emph{Toxocara}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga} or \emph{Wuchereria}.
\item All other records have \code{prevalence = 3.0} in the \link{microorganisms} data set.
}
\strong{Group 2} consists of all microorganisms where the taxonomic phylum is Pseudomonadota (previously named Proteobacteria), Bacillota (previously named Firmicutes), Actinomycetota (previously named Actinobacteria) or Sarcomastigophora, or where the taxonomic genus is \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acholeplasma}, \emph{Acremonium}, \emph{Actinotignum}, \emph{Aedes}, \emph{Alistipes}, \emph{Alloprevotella}, \emph{Alternaria}, \emph{Amoeba}, \emph{Anaerosalibacter}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Arachnia}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Bacteroides}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Bergeyella}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Borrelia}, \emph{Brachyspira}, \emph{Branhamella}, \emph{Butyricimonas}, \emph{Candida}, \emph{Capillaria}, \emph{Capnocytophaga}, \emph{Catabacter}, \emph{Cetobacterium}, \emph{Chaetomium}, \emph{Chlamydia}, \emph{Chlamydophila}, \emph{Christensenella}, \emph{Chryseobacterium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Deinococcus}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Dysgonomonas}, \emph{Echinostoma}, \emph{Elizabethkingia}, \emph{Empedobacter}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Flavobacterium}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Lelliottia}, \emph{Leptosphaeria}, \emph{Leptotrichia}, \emph{Lucilia}, \emph{Lumbricus}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, 
\emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Mycoplasma}, \emph{Myroides}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Odoribacter}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Ornithobacterium}, \emph{Parabacteroides}, \emph{Pediculus}, \emph{Pedobacter}, \emph{Phlebotomus}, \emph{Phocaeicola}, \emph{Phocanema}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Porphyromonas}, \emph{Prevotella}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Riemerella}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Sphingobacterium}, \emph{Spirometra}, \emph{Spiroplasma}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Streptobacillus}, \emph{Strongyloides}, \emph{Syngamus}, \emph{Taenia}, \emph{Tannerella}, \emph{Tenacibaculum}, \emph{Terrimonas}, \emph{Toxocara}, \emph{Treponema}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga}, \emph{Ureaplasma}, \emph{Victivallis}, \emph{Wautersiella}, \emph{Weeksella} or \emph{Wuchereria}.
When calculating the matching score, all characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
\strong{Group 3} consists of all other microorganisms.
All characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.079}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.095}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
}
\section{Reference Data Publicly Available}{

View File

@ -20,7 +20,7 @@ A \link[tibble:tibble]{tibble} with 52,140 observations and 22 variables:
\item \code{gbif_parent}\cr GBIF identifier of the parent taxon
\item \code{gbif_renamed_to}\cr GBIF identifier of the currently valid taxon
\item \code{source}\cr Either "GBIF", "LPSN" or "manually added" (see \emph{Source})
\item \code{prevalence}\cr Prevalence of the microorganism, see \code{\link[=as.mo]{as.mo()}}
\item \code{prevalence}\cr Prevalence of the microorganism according to Bartlett \emph{et al.} (2022, \doi{10.1099/mic.0.001269}), see \code{\link[=mo_matching_score]{mo_matching_score()}} for the full explanation
\item \code{snomed}\cr Systematized Nomenclature of Medicine (SNOMED) code of the microorganism, version of 1 July, 2021 (see \emph{Source}). Use \code{\link[=mo_snomed]{mo_snomed()}} to retrieve it quickly, see \code{\link[=mo_property]{mo_property()}}.
}
}
@ -29,7 +29,8 @@ A \link[tibble:tibble]{tibble} with 52,140 observations and 22 variables:
\item Parte, AC \emph{et al.} (2020). \strong{List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ.} International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; \doi{10.1099/ijsem.0.004332}. Accessed from \url{https://lpsn.dsmz.de} on 11 December, 2022.
\item GBIF Secretariat (2022). GBIF Backbone Taxonomy. Checklist dataset \doi{10.15468/39omei}. Accessed from \url{https://www.gbif.org} on 11 December, 2022.
\item Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12). URL: \url{https://phinvads.cdc.gov}
\item Grimont \emph{et al.}. Antigenic Formulae of the Salmonella Serovars, 2007, 9th Edition. WHO Collaborating Centre for Reference and Research on \emph{Salmonella} (WHOCC-SALM).
\item Grimont \emph{et al.} (2007). Antigenic Formulae of the Salmonella Serovars, 9th Edition. WHO Collaborating Centre for Reference and Research on \emph{Salmonella} (WHOCC-SALM).
\item Bartlett A \emph{et al.} (2022). \strong{A comprehensive list of bacterial pathogens infecting humans.} \emph{Microbiology}, 168:001269; \doi{10.1099/mic.0.001269}
}
}
\usage{

View File

@ -15,7 +15,9 @@ mo_matching_score(x, n)
This algorithm is used by \code{\link[=as.mo]{as.mo()}} and all the \code{\link[=mo_property]{mo_*}} functions to determine the most probable match of taxonomic records based on user input.
}
\note{
This algorithm was described in: Berends MS \emph{et al.} (2022). \strong{AMR: An R Package for Working with Antimicrobial Resistance Data}. \emph{Journal of Statistical Software}, 104(3), 1-31; \doi{10.18637/jss.v104.i03}.
This algorithm was originally described in: Berends MS \emph{et al.} (2022). \strong{AMR: An R Package for Working with Antimicrobial Resistance Data}. \emph{Journal of Statistical Software}, 104(3), 1-31; \doi{10.18637/jss.v104.i03}.
Later, the work of Bartlett A \emph{et al.} about bacterial pathogens infecting humans (2022, \doi{10.1099/mic.0.001269}) was incorporated.
}
\section{Matching Score for Microorganisms}{
@ -33,17 +35,22 @@ where:
\item \ifelse{html}{\out{<i>k<sub>n</sub></i> is the taxonomic kingdom of <i>n</i>, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}}{k_n is the taxonomic kingdom of \eqn{n}, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}
}
The grouping into human pathogenic prevalence (\eqn{p}) is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence:
The grouping into human pathogenic prevalence (\eqn{p}) is based on recent work by Bartlett \emph{et al.} (2022, \doi{10.1099/mic.0.001269}), who extensively studied medical-scientific literature to categorise all bacterial species into these groups:
\itemize{
\item \strong{Established}, if a taxonomic species has infected at least three persons in three or more references. These records have \code{prevalence = 1.0} in the \link{microorganisms} data set.
\item \strong{Putative}, if a taxonomic species has fewer than three known cases. These records have \code{prevalence = 2.0} in the \link{microorganisms} data set.
}
\strong{Group 1} (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is \emph{Enterococcus}, \emph{Staphylococcus} or \emph{Streptococcus}. This group consequently contains all common Gram-negative bacteria, such as \emph{Pseudomonas} and \emph{Legionella} and all species within the order Enterobacterales.
Furthermore,
\itemize{
\item Any \emph{other} bacterial genus, species or subspecies whose genus is present in the two aforementioned groups has \code{prevalence = 2.5} in the \link{microorganisms} data set.
\item Any \emph{non-bacterial} genus, species or subspecies of which the genus is present in the following list, also has \code{prevalence = 2.5} in the \link{microorganisms} data set: \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acremonium}, \emph{Aedes}, \emph{Alternaria}, \emph{Amoeba}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Candida}, \emph{Capillaria}, \emph{Chaetomium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Echinostoma}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, \emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Pediculus}, \emph{Phlebotomus}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Spirometra}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Strongyloides}, \emph{Syngamus}, 
\emph{Taenia}, \emph{Toxocara}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga} or \emph{Wuchereria}.
\item All other records have \code{prevalence = 3.0} in the \link{microorganisms} data set.
}
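The prevalence rules listed above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the function \code{prevalence_sketch()} and its arguments \code{category} and \code{genus_listed} are hypothetical names that do not exist in the package, and the real \code{prevalence} column is derived from the full taxonomic data rather than from two flags.

```r
# Hypothetical sketch of the prevalence rules listed above; the function and
# argument names are illustrative and are not part of the AMR package.
prevalence_sketch <- function(category, genus_listed) {
  # category:     "established", "putative" or NA (species not categorised)
  # genus_listed: TRUE if the genus occurs in one of the two groups (bacteria)
  #               or in the listed non-bacterial genera
  ifelse(!is.na(category) & category == "established", 1.0,
    ifelse(!is.na(category) & category == "putative", 2.0,
      ifelse(genus_listed, 2.5, 3.0)))
}

result <- prevalence_sketch(
  category     = c("established", "putative", NA, NA),
  genus_listed = c(TRUE, TRUE, TRUE, FALSE)
)
print(result)
# [1] 1.0 2.0 2.5 3.0
```

The checks run in order, so an established or putative species keeps its value of 1.0 or 2.0 even when its genus also appears in a list.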
\strong{Group 2} consists of all microorganisms where the taxonomic phylum is Pseudomonadota (previously named Proteobacteria), Bacillota (previously named Firmicutes), Actinomycetota (previously named Actinobacteria) or Sarcomastigophora, or where the taxonomic genus is \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acholeplasma}, \emph{Acremonium}, \emph{Actinotignum}, \emph{Aedes}, \emph{Alistipes}, \emph{Alloprevotella}, \emph{Alternaria}, \emph{Amoeba}, \emph{Anaerosalibacter}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Arachnia}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Bacteroides}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Bergeyella}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Borrelia}, \emph{Brachyspira}, \emph{Branhamella}, \emph{Butyricimonas}, \emph{Candida}, \emph{Capillaria}, \emph{Capnocytophaga}, \emph{Catabacter}, \emph{Cetobacterium}, \emph{Chaetomium}, \emph{Chlamydia}, \emph{Chlamydophila}, \emph{Christensenella}, \emph{Chryseobacterium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Deinococcus}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Dysgonomonas}, \emph{Echinostoma}, \emph{Elizabethkingia}, \emph{Empedobacter}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Flavobacterium}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Lelliottia}, \emph{Leptosphaeria}, \emph{Leptotrichia}, \emph{Lucilia}, \emph{Lumbricus}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, 
\emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Mycoplasma}, \emph{Myroides}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Odoribacter}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Ornithobacterium}, \emph{Parabacteroides}, \emph{Pediculus}, \emph{Pedobacter}, \emph{Phlebotomus}, \emph{Phocaeicola}, \emph{Phocanema}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Porphyromonas}, \emph{Prevotella}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Riemerella}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Sphingobacterium}, \emph{Spirometra}, \emph{Spiroplasma}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Streptobacillus}, \emph{Strongyloides}, \emph{Syngamus}, \emph{Taenia}, \emph{Tannerella}, \emph{Tenacibaculum}, \emph{Terrimonas}, \emph{Toxocara}, \emph{Treponema}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga}, \emph{Ureaplasma}, \emph{Victivallis}, \emph{Wautersiella}, \emph{Weeksella} or \emph{Wuchereria}.
When calculating the matching score, all characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
\strong{Group 3} consists of all other microorganisms.
All characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.079}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.095}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
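The \emph{E. coli} comparison above can be reproduced with a minimal R sketch. It assumes the score has the form m = (l_n - 0.5 * min(l_n, lev(x, n))) / (l_n * p_n * k_n), where l_n is the length of \eqn{n} and lev is the Levenshtein distance; this form reproduces the values 0.688 and 0.095 quoted above, but it is written out here as an assumption. The helper \code{matching_score_sketch()} and the candidate table are hypothetical and are not the package's own implementation of \code{mo_matching_score()}.

```r
# Hypothetical sketch, not the AMR package's own mo_matching_score() code.
# Assumed formula: m = (l_n - 0.5 * min(l_n, lev(x, n))) / (l_n * p_n * k_n)
matching_score_sketch <- function(x, n, p, k) {
  # Ignore everything other than A-Z, a-z, 0-9, spaces and parentheses
  drop <- "[^A-Za-z0-9 ()]"
  x <- gsub(drop, "", x)
  n <- gsub(drop, "", n)
  l_n <- nchar(n)
  # Levenshtein distance between the user input and each candidate name
  lev <- as.vector(utils::adist(x, n))
  (l_n - 0.5 * pmin(l_n, lev)) / (l_n * p * k)
}

# Illustrative candidates with assumed prevalence (p) and kingdom (k) values
candidates <- data.frame(
  name = c("Entamoeba coli", "Escherichia coli"),
  p = c(2.5, 1.0),
  k = c(3, 1), # Protozoa = 3, Bacteria = 1
  stringsAsFactors = FALSE
)
candidates$m <- matching_score_sketch("E. coli", candidates$name,
                                      candidates$p, candidates$k)

# Sort descending on the score; the first row is the match that is returned
ranked <- candidates[order(candidates$m, decreasing = TRUE), ]
top_match <- ranked$name[1]
print(ranked)
```

Under these assumptions the scores are 11/16 for \emph{Escherichia coli} and 10/105 for \emph{Entamoeba coli}, so the bacterial name ranks first even though it sorts later alphabetically.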
}
\section{Reference Data Publicly Available}{

View File

@ -326,17 +326,22 @@ where:
\item \ifelse{html}{\out{<i>k<sub>n</sub></i> is the taxonomic kingdom of <i>n</i>, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}}{k_n is the taxonomic kingdom of \eqn{n}, set as Bacteria = 1, Fungi = 2, Protozoa = 3, Archaea = 4, others = 5.}
}
The grouping into human pathogenic prevalence (\eqn{p}) is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence:
The grouping into human pathogenic prevalence (\eqn{p}) is based on recent work by Bartlett \emph{et al.} (2022, \doi{10.1099/mic.0.001269}), who extensively studied medical-scientific literature to categorise all bacterial species into these groups:
\itemize{
\item \strong{Established}, if a taxonomic species has infected at least three persons in three or more references. These records have \code{prevalence = 1.0} in the \link{microorganisms} data set.
\item \strong{Putative}, if a taxonomic species has fewer than three known cases. These records have \code{prevalence = 2.0} in the \link{microorganisms} data set.
}
\strong{Group 1} (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is \emph{Enterococcus}, \emph{Staphylococcus} or \emph{Streptococcus}. This group consequently contains all common Gram-negative bacteria, such as \emph{Pseudomonas} and \emph{Legionella} and all species within the order Enterobacterales.
Furthermore,
\itemize{
\item Any \emph{other} bacterial genus, species or subspecies whose genus is present in the two aforementioned groups has \code{prevalence = 2.5} in the \link{microorganisms} data set.
\item Any \emph{non-bacterial} genus, species or subspecies of which the genus is present in the following list, also has \code{prevalence = 2.5} in the \link{microorganisms} data set: \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acremonium}, \emph{Aedes}, \emph{Alternaria}, \emph{Amoeba}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Candida}, \emph{Capillaria}, \emph{Chaetomium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Echinostoma}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, \emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Pediculus}, \emph{Phlebotomus}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Spirometra}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Strongyloides}, \emph{Syngamus}, 
\emph{Taenia}, \emph{Toxocara}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga} or \emph{Wuchereria}.
\item All other records have \code{prevalence = 3.0} in the \link{microorganisms} data set.
}
\strong{Group 2} consists of all microorganisms where the taxonomic phylum is Pseudomonadota (previously named Proteobacteria), Bacillota (previously named Firmicutes), Actinomycetota (previously named Actinobacteria) or Sarcomastigophora, or where the taxonomic genus is \emph{Absidia}, \emph{Acanthamoeba}, \emph{Acholeplasma}, \emph{Acremonium}, \emph{Actinotignum}, \emph{Aedes}, \emph{Alistipes}, \emph{Alloprevotella}, \emph{Alternaria}, \emph{Amoeba}, \emph{Anaerosalibacter}, \emph{Ancylostoma}, \emph{Angiostrongylus}, \emph{Anisakis}, \emph{Anopheles}, \emph{Apophysomyces}, \emph{Arachnia}, \emph{Aspergillus}, \emph{Aureobasidium}, \emph{Bacteroides}, \emph{Basidiobolus}, \emph{Beauveria}, \emph{Bergeyella}, \emph{Blastocystis}, \emph{Blastomyces}, \emph{Borrelia}, \emph{Brachyspira}, \emph{Branhamella}, \emph{Butyricimonas}, \emph{Candida}, \emph{Capillaria}, \emph{Capnocytophaga}, \emph{Catabacter}, \emph{Cetobacterium}, \emph{Chaetomium}, \emph{Chlamydia}, \emph{Chlamydophila}, \emph{Christensenella}, \emph{Chryseobacterium}, \emph{Chrysonilia}, \emph{Cladophialophora}, \emph{Cladosporium}, \emph{Conidiobolus}, \emph{Contracaecum}, \emph{Cordylobia}, \emph{Cryptococcus}, \emph{Curvularia}, \emph{Deinococcus}, \emph{Demodex}, \emph{Dermatobia}, \emph{Dientamoeba}, \emph{Diphyllobothrium}, \emph{Dirofilaria}, \emph{Dysgonomonas}, \emph{Echinostoma}, \emph{Elizabethkingia}, \emph{Empedobacter}, \emph{Entamoeba}, \emph{Enterobius}, \emph{Exophiala}, \emph{Exserohilum}, \emph{Fasciola}, \emph{Flavobacterium}, \emph{Fonsecaea}, \emph{Fusarium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Haloarcula}, \emph{Halobacterium}, \emph{Halococcus}, \emph{Hendersonula}, \emph{Heterophyes}, \emph{Histomonas}, \emph{Histoplasma}, \emph{Hymenolepis}, \emph{Hypomyces}, \emph{Hysterothylacium}, \emph{Leishmania}, \emph{Lelliottia}, \emph{Leptosphaeria}, \emph{Leptotrichia}, \emph{Lucilia}, \emph{Lumbricus}, \emph{Malassezia}, \emph{Malbranchea}, \emph{Metagonimus}, 
\emph{Meyerozyma}, \emph{Microsporidium}, \emph{Microsporum}, \emph{Mortierella}, \emph{Mucor}, \emph{Mycocentrospora}, \emph{Mycoplasma}, \emph{Myroides}, \emph{Necator}, \emph{Nectria}, \emph{Ochroconis}, \emph{Odoribacter}, \emph{Oesophagostomum}, \emph{Oidiodendron}, \emph{Opisthorchis}, \emph{Ornithobacterium}, \emph{Parabacteroides}, \emph{Pediculus}, \emph{Pedobacter}, \emph{Phlebotomus}, \emph{Phocaeicola}, \emph{Phocanema}, \emph{Phoma}, \emph{Pichia}, \emph{Piedraia}, \emph{Pithomyces}, \emph{Pityrosporum}, \emph{Pneumocystis}, \emph{Porphyromonas}, \emph{Prevotella}, \emph{Pseudallescheria}, \emph{Pseudoterranova}, \emph{Pulex}, \emph{Rhizomucor}, \emph{Rhizopus}, \emph{Rhodotorula}, \emph{Riemerella}, \emph{Saccharomyces}, \emph{Sarcoptes}, \emph{Scolecobasidium}, \emph{Scopulariopsis}, \emph{Scytalidium}, \emph{Sphingobacterium}, \emph{Spirometra}, \emph{Spiroplasma}, \emph{Sporobolomyces}, \emph{Stachybotrys}, \emph{Streptobacillus}, \emph{Strongyloides}, \emph{Syngamus}, \emph{Taenia}, \emph{Tannerella}, \emph{Tenacibaculum}, \emph{Terrimonas}, \emph{Toxocara}, \emph{Treponema}, \emph{Trichinella}, \emph{Trichobilharzia}, \emph{Trichoderma}, \emph{Trichomonas}, \emph{Trichophyton}, \emph{Trichosporon}, \emph{Trichostrongylus}, \emph{Trichuris}, \emph{Tritirachium}, \emph{Trombicula}, \emph{Trypanosoma}, \emph{Tunga}, \emph{Ureaplasma}, \emph{Victivallis}, \emph{Wautersiella}, \emph{Weeksella} or \emph{Wuchereria}.
When calculating the matching score, all characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
\strong{Group 3} consists of all other microorganisms.
All characters in \eqn{x} and \eqn{n} are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.079}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
All matches are sorted in descending order of matching score, and for each user input value the top match is returned. As a result, \code{"E. coli"} returns the microbial ID of \emph{Escherichia coli} (\eqn{m = 0.688}, a highly prevalent microorganism found in humans) and not \emph{Entamoeba coli} (\eqn{m = 0.095}, a less prevalent microorganism in humans), although the latter would come first alphabetically.
}
\section{Source}{
@ -351,6 +356,7 @@ All matches are sorted descending on their matching score and for all user input
\item Parte, AC \emph{et al.} (2020). \strong{List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ.} International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; \doi{10.1099/ijsem.0.004332}. Accessed from \url{https://lpsn.dsmz.de} on 11 December, 2022.
\item GBIF Secretariat (2022). GBIF Backbone Taxonomy. Checklist dataset \doi{10.15468/39omei}. Accessed from \url{https://www.gbif.org} on 11 December, 2022.
\item Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12). URL: \url{https://phinvads.cdc.gov}
\item Bartlett A \emph{et al.} (2022). \strong{A comprehensive list of bacterial pathogens infecting humans.} \emph{Microbiology}, 168:001269; \doi{10.1099/mic.0.001269}
}
}