1
0
mirror of https://github.com/msberends/AMR.git synced 2025-01-23 21:44:35 +01:00

fix missing R breakpoints

This commit is contained in:
dr. M.S. (Matthijs) Berends 2022-10-29 14:15:23 +02:00
parent c2801ba7a1
commit 6ad7857d39
34 changed files with 959 additions and 865 deletions

View File

@ -38,7 +38,7 @@ on:
# this is to check that all dependencies are still available (see R/zzz.R)
- cron: '0 1 * * *'
name: check-devel
name: check-current
jobs:
R-code-check:

View File

@ -36,7 +36,7 @@ on:
# this is to check that all dependencies are still available (see R/zzz.R)
- cron: '0 1 * * *'
name: check-release
name: check-old
jobs:
R-code-check:

View File

@ -1,6 +1,6 @@
Package: AMR
Version: 1.8.2.9033
Date: 2022-10-22
Version: 1.8.2.9034
Date: 2022-10-29
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by

View File

@ -1,4 +1,4 @@
# AMR 1.8.2.9033
# AMR 1.8.2.9034
This version will eventually become v2.0! We're happy to reach a new major milestone soon!
@ -35,6 +35,7 @@ This version will eventually become v2.0! We're happy to reach a new major miles
### Changed
* Fix for using `as.rsi()` on certain EUCAST breakpoints for MIC values
* Fix for using `as.rsi()` on `NA` values (e.g. `as.rsi(as.disk(NA), ...)`)
* Fix for using `as.rsi()` on drug-drug combinations with multiple breakpoints for different body sites
* Removed `as.integer()` for MIC values, since MIC are not integer values and running `table()` on MIC values consequently failed for not being able to retrieve the level position (as that's how normally `as.integer()` on `factor`s work)
* `droplevels()` on MIC will now return a common `factor` at default and will lose the `<mic>` class. Use `droplevels(..., as.mic = TRUE)` to keep the `<mic>` class.
* Small fix for using `ab_from_text()`
@ -51,6 +52,7 @@ This version will eventually become v2.0! We're happy to reach a new major miles
* Fix for using `as.ab()` on `NA` values
* Updated support for all WHONET 2022 microorganism codes
* Antimicrobial interpretation 'SDD' (susceptible dose-dependent, coined by CLSI) will be interpreted as 'I' to comply with EUCAST's 'I' in `as.rsi()`
* Fix for `mo_shortname()` in case of higher taxonomic ranks (order, class, phylum)
### Other
* New website to make use of the new Bootstrap 5 and pkgdown 2.0. The website now contains results for all examples and will be automatically regenerated with every change to our repository, using GitHub Actions

View File

@ -118,7 +118,7 @@
#' - 2 entries of *Staphylococcus* (coagulase-negative (CoNS) and coagulase-positive (CoPS))
#' - 1 entry of *Blastocystis* (*B. hominis*), although it officially does not exist (Noel *et al.* 2005, PMID 15634993)
#' - 1 entry of *Moraxella* (*M. catarrhalis*), which was formally named *Branhamella catarrhalis* (Catlin, 1970) though this change was never accepted within the field of clinical microbiology
#' - 5 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast and unknown fungus)
#' - 6 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast, unknown fungus, and unknown anaerobic bacteria)
#'
#' The syntax used to transform the original data to a cleansed \R format, can be found here: <https://github.com/msberends/AMR/blob/main/data-raw/reproduction_of_microorganisms.R>.
#'

View File

@ -34,6 +34,8 @@
#' @param x vector
#' @param na.rm a [logical] indicating whether missing values should be removed
#' @details Interpret disk values as RSI values with [as.rsi()]. It supports guidelines from EUCAST and CLSI.
#'
#' Disk diffusion growth zone sizes must be between 6 and 50 millimetres. Values higher than 50 but lower than 100 will be maximised to 50. All others input values outside the 6-50 range will return `NA`.
#' @return An [integer] with additional class [`disk`]
#' @aliases disk
#' @export
@ -107,7 +109,8 @@ as.disk <- function(x, na.rm = FALSE) {
x <- as.integer(ceiling(clean_double2(x)))
# disks can never be less than 6 mm (size of smallest disk) or more than 50 mm
x[x < 6 | x > 50] <- NA_integer_
x[x < 6 | x > 99] <- NA_integer_
x[x > 50] <- 50L
na_after <- length(x[is.na(x)])
if (na_before != na_after) {

View File

@ -224,6 +224,8 @@ mo_shortname <- function(x, language = get_AMR_locale(), keep_synonyms = getOpti
# unknown species etc.
shortnames[shortnames %like% "unknown"] <- paste0("(", trimws2(gsub("[^a-zA-Z -]", "", shortnames[shortnames %like% "unknown"], perl = TRUE)), ")")
shortnames[mo_rank(x.mo) %in% c("kingdom", "phylum", "class", "order", "family")] <- mo_name(x.mo, language = NULL, keep_synonyms = keep_synonyms)
shortnames[is.na(x.mo)] <- NA_character_
load_mo_uncertainties(metadata)
translate_into_language(shortnames, language = language, only_unknown = FALSE, only_affect_mo_names = TRUE)

72
R/rsi.R
View File

@ -396,7 +396,7 @@ as.rsi.mic <- function(x,
mo = NULL,
ab = deparse(substitute(x)),
guideline = "EUCAST",
uti = FALSE,
uti = NULL,
conserve_capped_values = FALSE,
add_intrinsic_resistance = FALSE,
reference_data = AMR::rsi_translation,
@ -422,7 +422,7 @@ as.rsi.disk <- function(x,
mo = NULL,
ab = deparse(substitute(x)),
guideline = "EUCAST",
uti = FALSE,
uti = NULL,
add_intrinsic_resistance = FALSE,
reference_data = AMR::rsi_translation,
...) {
@ -454,7 +454,7 @@ as.rsi.data.frame <- function(x,
meet_criteria(x, allow_class = "data.frame") # will also check for dimensions > 0
meet_criteria(col_mo, allow_class = "character", is_in = colnames(x), allow_NULL = TRUE)
meet_criteria(guideline, allow_class = "character", has_length = 1)
meet_criteria(uti, allow_class = c("logical", "character"), allow_NULL = TRUE)
meet_criteria(uti, allow_class = c("logical", "character"), allow_NULL = TRUE, allow_NA = TRUE)
meet_criteria(conserve_capped_values, allow_class = "logical", has_length = 1)
meet_criteria(add_intrinsic_resistance, allow_class = "logical", has_length = 1)
meet_criteria(reference_data, allow_class = "data.frame")
@ -495,7 +495,7 @@ as.rsi.data.frame <- function(x,
uti <- as.logical(x[, col_uti, drop = TRUE])
}
} else {
# look for specimen column and make logicals of the urines
# col_uti is still NULL - look for specimen column and make logicals of the urines
col_specimen <- suppressMessages(search_type_in_df(x = x, type = "specimen"))
if (!is.null(col_specimen)) {
uti <- x[, col_specimen, drop = TRUE] %like% "urin"
@ -514,7 +514,7 @@ as.rsi.data.frame <- function(x,
)
} else {
# no data about UTI's found
uti <- FALSE
uti <- NULL
}
}
@ -672,9 +672,9 @@ as_rsi_method <- function(method_short,
...) {
meet_criteria(x, allow_NA = TRUE, .call_depth = -2)
meet_criteria(mo, allow_class = c("mo", "character"), allow_NULL = TRUE, .call_depth = -2)
meet_criteria(ab, allow_class = c("ab", "character"), .call_depth = -2)
meet_criteria(ab, allow_class = c("ab", "character"), has_length = 1, .call_depth = -2)
meet_criteria(guideline, allow_class = "character", has_length = 1, .call_depth = -2)
meet_criteria(uti, allow_class = "logical", has_length = c(1, length(x)), .call_depth = -2)
meet_criteria(uti, allow_class = "logical", has_length = c(1, length(x)), allow_NULL = TRUE, allow_NA = TRUE, .call_depth = -2)
meet_criteria(conserve_capped_values, allow_class = "logical", has_length = 1, .call_depth = -2)
meet_criteria(add_intrinsic_resistance, allow_class = "logical", has_length = 1, .call_depth = -2)
meet_criteria(reference_data, allow_class = "data.frame", .call_depth = -2)
@ -745,6 +745,9 @@ as_rsi_method <- function(method_short,
if (length(mo) == 1) {
mo <- rep(mo, length(x))
}
if (is.null(uti)) {
uti <- NA
}
if (length(uti) == 1) {
uti <- rep(uti, length(x))
}
@ -860,7 +863,6 @@ as_rsi_method <- function(method_short,
}
get_record <- trans %pm>%
# no subsetting to UTI here
subset(lookup %in% c(
lookup_mo[i],
lookup_genus[i],
@ -871,33 +873,63 @@ as_rsi_method <- function(method_short,
lookup_other[i]
))
if (any(nrow(get_record) == 1 && get_record$uti == TRUE, na.rm = TRUE) && !any(uti == TRUE, na.rm = TRUE) && message_not_thrown_before("as.rsi", "uti", ab_param)) {
warning_("in `as.rsi()`: interpretation of ", font_bold(ab_name(ab_param, tolower = TRUE)), " is only available for (uncomplicated) urinary tract infections (UTI) for some microorganisms. Use argument `uti` to set which isolates are from urine. See ?as.rsi.")
if (NROW(get_record) == 0) {
warning_("No ", method_param, " breakpoints available for ",
font_italic(suppressMessages(suppressWarnings(mo_shortname(mo[i], language = NULL, keep_synonyms = FALSE)))),
paste0(" / "),
suppressMessages(suppressWarnings(ab_name(ab, language = NULL, tolower = TRUE))))
rise_warning <- TRUE
next
}
if (isTRUE(uti[i])) {
get_record <- get_record %pm>%
# be as specific as possible (i.e. prefer species over genus):
# pm_desc(uti) = TRUE on top and FALSE on bottom
pm_arrange(pm_desc(uti), rank_index) # 'uti' is a column in data set 'rsi_translation'
# the below `pm_desc(uti)` will put `TRUE` on top and FALSE on bottom
pm_arrange(rank_index, pm_desc(uti)) # 'uti' is a column in data set 'rsi_translation'
} else {
get_record <- get_record %pm>%
pm_filter(uti == FALSE) %pm>% # 'uti' is a column in rsi_translation
pm_arrange(rank_index)
# sort UTI = FALSE first, then UTI = TRUE
pm_arrange(rank_index, uti)
}
# warning section
records_same_mo <- get_record[get_record$mo == get_record[1, "mo", drop = TRUE], , drop = FALSE]
if (message_not_thrown_before("as.rsi", "site", records_same_mo$mo[1]) && nrow(records_same_mo) > 1 && length(unique(records_same_mo$site)) > 1) {
warning_("in `as.rsi()`: assuming site '",
get_record[1L, "site", drop = FALSE], "' for ",
font_italic(suppressMessages(suppressWarnings(mo_name(records_same_mo$mo[1], language = NULL, keep_synonyms = FALSE)))),
if (nrow(get_record) == 1 && all(get_record$uti == TRUE) && uti[i] %in% c(FALSE, NA) && message_not_thrown_before("as.rsi", "uti", ab_param)) {
# uti not set as TRUE, but there are only a UTI breakpoints available, so throw warning
warning_("in `as.rsi()`: interpretation of ", font_bold(ab_name(ab_param, tolower = TRUE)), " is only available for (uncomplicated) urinary tract infections (UTI) for some microorganisms, thus assuming `uti = TRUE`. See ?as.rsi.")
rise_warning <- TRUE
} else if (nrow(records_same_mo) > 1 && length(unique(records_same_mo$site)) > 1 && uti[i] == FALSE && all(c(TRUE, FALSE) %in% records_same_mo$uti, na.rm = TRUE) && message_not_thrown_before("as.rsi", "siteUTI", records_same_mo$mo[1], records_same_mo$ab[1])) {
# uti not set and both UTI and non-UTI breakpoints available, so throw warning
warning_("in `as.rsi()`: breakpoints for UTI ", font_underline("and"), " non-UTI available for ",
font_italic(suppressMessages(suppressWarnings(mo_shortname(records_same_mo$mo[1], language = NULL, keep_synonyms = FALSE)))),
paste0(" / "),
suppressMessages(suppressWarnings(ab_name(records_same_mo$ab[1], language = NULL, tolower = TRUE))),
paste0(" - assuming non-UTI. Use argument `uti` to set which isolates are from urine. See ?as.rsi. '"),
call = FALSE)
get_record <- get_record %pm>%
pm_filter(uti == FALSE)
rise_warning <- TRUE
} else if (nrow(records_same_mo) > 1 && length(unique(records_same_mo$site)) > 1 && all(records_same_mo$uti == FALSE, na.rm = TRUE) && message_not_thrown_before("as.rsi", "site", records_same_mo$mo[1], records_same_mo$ab[1])) {
# breakpoints for multiple body sites available, so throw warning
site <- get_record[1L, "site", drop = FALSE]
if (is.na(site)) {
site <- paste0("an unspecified body site")
} else {
site <- paste0("body site '", get_record[1L, "site", drop = FALSE], "'")
}
warning_("in `as.rsi()`: breakpoints available for ",
font_italic(suppressMessages(suppressWarnings(mo_shortname(records_same_mo$mo[1], language = NULL, keep_synonyms = FALSE)))),
paste0(" / "),
suppressMessages(suppressWarnings(ab_name(records_same_mo$ab[1], language = NULL, tolower = TRUE))),
paste0(" - assuming ", site),
call = FALSE)
rise_warning <- TRUE
}
get_record <- get_record[1L, , drop = FALSE]
if (NROW(get_record) > 0) {
# get the best hit: the top one
get_record <- get_record[1L, , drop = FALSE]
if (is.na(x[i]) | (is.na(get_record$breakpoint_S) & is.na(get_record$breakpoint_R))) {
new_rsi[i] <- NA_character_
} else if (method == "mic") {

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -1 +1 @@
8c1fdbe23853d30840dc5d863bc761df
4cb5e83062897061b17ddac6d5cd31d7

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -592,18 +592,61 @@ for (i in 2:6) {
status = "accepted",
source = "manually added"
) %>%
filter(!paste(kingdom, .[[ncol(.) - 4]], rank) %in% paste(taxonomy$kingdom, taxonomy[[i + 1]], taxonomy$rank)) %>%
filter(!paste(kingdom, .[[ncol(.) - 4]], rank) %in% paste(taxonomy$kingdom, taxonomy[[i + 1]], taxonomy$rank))# %>%
# get GBIF identifier where available
left_join(current_gbif %>%
select(kingdom, all_of(i_name), rank = taxonRank, ref = scientificNameAuthorship, gbif = taxonID, gbif_parent = parentNameUsageID),
by = c("kingdom", "rank", i_name)
) %>%
mutate(source = ifelse(!is.na(gbif), "GBIF", source))
# left_join(current_gbif %>%
# select(kingdom, all_of(i_name), rank = taxonRank, ref = scientificNameAuthorship, gbif = taxonID, gbif_parent = parentNameUsageID),
# by = c("kingdom", "rank", i_name)
# ) %>%
# mutate(source = ifelse(!is.na(gbif), "GBIF", source))
message("n = ", nrow(to_add))
taxonomy <- taxonomy %>%
bind_rows(to_add)
}
# FIX LATER: added missings after finding out still some taxonomic levels were missing
# this should not be needed - it was the only part that was required after last update
# can now be removed? Check with next update!
new_df <- AMR::microorganisms[0, ]
for (tax in c("phylum", "class", "order", "family", "genus")) {
print(tax)
out <- AMR::microorganisms %>% pull(tax) %>% unique()
missing <- vapply(FUN.VALUE = logical(1), out, function(x) length(which(AMR::microorganisms[[tax]] == x & AMR::microorganisms$rank == tax)) == 0)
missing <- names(missing)[which(missing == TRUE & names(missing) != "" & names(missing) %unlike% "unknown")]
out <- microorganisms %>%
filter(.[[tax]] %in% missing) %>%
distinct(.[[tax]], .keep_all = TRUE) %>%
mutate_at(vars((which(colnames(.) == tax) + 1):subspecies), ~"") %>%
mutate_at(vars(lpsn:gbif_renamed_to), ~NA_character_) %>%
mutate(rank = tax,
ref = NA_character_,
status = "accepted",
fullname = .[[tax]],
source = "manually added",
snomed = rep(list(character(0)), nrow(.)))
new_df <- bind_rows(new_df, out)
if (".[[tax]]" %in% colnames(new_df)) {
new_df <- new_df %>% select(-`.[[tax]]`)
}
}
new_df <- new_df %>%
mutate(mo = as.character(mo))
new_mo <- new_df %>%
filter(rank == "family") %>%
mutate(
mo_rank_new8 = abbreviate_mo(family, minlength = 8, prefix = "[FAM]_"),
mo_rank_new9 = abbreviate_mo(family, minlength = 9, prefix = "[FAM]_"),
mo_rank_new = mo_rank_new8,
mo_duplicated = duplicated(mo_rank_new),
mo_rank_new = ifelse(mo_duplicated, mo_rank_new9, mo_rank_new),
mo_duplicated = duplicated(mo_rank_new)
) %>%
transmute(fullname, mo_rank_new = paste0(gsub("_.*", "_", as.character(mo)), mo_rank_new))
any(new_mo$mo_rank_new %in% microorganisms$mo)
new_df[which(new_df$fullname %in% new_mo$fullname), "mo"] <- new_mo$mo_rank_new
# species (requires combination with genus)
taxonomy <- taxonomy %>%
bind_rows(taxonomy %>%
@ -998,9 +1041,16 @@ taxonomy <- taxonomy %>%
.before = 1
) %>%
select(!starts_with("mo_")) %>%
arrange(fullname) %>%
arrange(fullname)
# now check these - e.g. Nitrospira is the name of a genus AND its class
taxonomy %>% filter(fullname %in% .[duplicated(fullname), "fullname", drop = TRUE])
taxonomy <- taxonomy %>%
distinct(fullname, .keep_all = TRUE)
# This must not exist:
taxonomy %>% filter(mo %like% "__")
# Remove unwanted taxonomic entries from Protoza/Fungi --------------------
@ -1027,7 +1077,7 @@ message("\nCongratulations! The new taxonomic table will contain ", format(nrow(
# we will use Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS)
# as a source, which copies directly from the latest US SNOMED CT version
# - go to https://phinvads.cdc.gov/vads/ViewValueSet.action?oid=2.16.840.1.114222.4.11.1009
# - check that current online version is higher than SNOMED_VERSION$current_version
# - check that current online version is higher than TAXONOMY_VERSION$SNOMED
# - if so, click on 'Download Value Set', choose 'TXT'
snomed <- vroom("data-raw/SNOMED_PHVS_Microorganism_CDC_V12.txt", skip = 3) %>%
select(1:2) %>%

View File

@ -37,6 +37,7 @@ library(AMR)
# Install the WHONET 2022 software on Windows (http://www.whonet.org/software.html),
# and copy the folder C:\WHONET\Resources to the data-raw/WHONET/ folder
# (for ASIARS-Net update, also copy C:\WHONET\Codes to the data-raw/WHONET/ folder)
# Load source data ----
whonet_organisms <- read_tsv("data-raw/WHONET/Resources/Organisms.txt", na = c("", "NA", "-"), show_col_types = FALSE) %>%
@ -134,9 +135,8 @@ breakpoints_new <- breakpoints %>%
ab = as.ab(WHONET_ABX_CODE),
ref_tbl = REFERENCE_TABLE,
disk_dose = POTENCY,
# keep disks within 6-50 mm
breakpoint_S = if_else(method == "DISK", S %>% pmax(6) %>% pmin(50), S),
breakpoint_R = if_else(method == "DISK", R %>% pmax(6) %>% pmin(50), R),
breakpoint_S = S,
breakpoint_R = R,
uti = SITE_OF_INFECTION %like% "(UTI|urinary|urine)") %>%
# Greek symbols and EM dash symbols are not allowed by CRAN, so replace them with ASCII:
mutate(disk_dose = disk_dose %>%
@ -177,6 +177,9 @@ breakpoints_new <- breakpoints_new %>%
mutate(breakpoint_R = ifelse(guideline %like% "EUCAST" & method == "DISK" & breakpoint_S - breakpoint_R != 0,
breakpoint_R + 1,
breakpoint_R))
# fix missing R breakpoint where there is an S breakpoint
breakpoints_new[which(is.na(breakpoints_new$breakpoint_R)), "breakpoint_R"] <- breakpoints_new[which(is.na(breakpoints_new$breakpoint_R)), "breakpoint_S"]
# check again
breakpoints_new %>% filter(guideline == "EUCAST 2022", ab == "AMC", mo == "B_[ORD]_ENTRBCTR", method == "MIC")
# compare with current version

View File

@ -1 +1 @@
c7fbfa8e8b012a00c9e0de1476e28f99
547b6b086e20bcfb918b3db6f55f84a5

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large Load Diff

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -103,7 +103,7 @@ if (AMR:::pkg_is_available("skimr", min_version = "2.0.0")) {
expect_equal(as.rsi(c("", "-", NA, "NULL")), c(NA_rsi_, NA_rsi_, NA_rsi_, NA_rsi_))
# S. pneumoniae/ampicillin in EUCAST 2020: 0.5-2 ug/ml (R is only > 2)
expect_equal(
expect_equal(suppressMessages(
as.character(
as.rsi(
x = as.mic(c(0.125, 0.5, 1, 2, 4)),
@ -111,11 +111,11 @@ expect_equal(
ab = "AMP",
guideline = "EUCAST 2020"
)
),
)),
c("S", "S", "I", "I", "R")
)
# S. pneumoniae/amoxicillin in CLSI 2019: 2-8 ug/ml (R is 8 and > 8)
expect_equal(
expect_equal(suppressMessages(
as.character(
as.rsi(
x = as.mic(c(1, 2, 4, 8, 16)),
@ -123,7 +123,7 @@ expect_equal(
ab = "AMX",
guideline = "CLSI 2019"
)
),
)),
c("S", "S", "I", "R", "R")
)
@ -133,11 +133,11 @@ expect_true(is.null(rsi_interpretation_history()))
# cutoffs at MIC = 8
expect_equal(
as.rsi(as.mic(2), "E. coli", "ampicillin", guideline = "EUCAST 2020"),
suppressMessages(as.rsi(as.mic(2), "E. coli", "ampicillin", guideline = "EUCAST 2020")),
as.rsi("S")
)
expect_equal(
as.rsi(as.mic(32), "E. coli", "ampicillin", guideline = "EUCAST 2020"),
suppressMessages(as.rsi(as.mic(32), "E. coli", "ampicillin", guideline = "EUCAST 2020")),
as.rsi("R")
)
if (AMR:::pkg_is_available("dplyr", min_version = "1.0.0")) {

View File

@ -31,6 +31,8 @@ This transforms a vector to a new class \code{\link{disk}}, which is a disk diff
\details{
Interpret disk values as RSI values with \code{\link[=as.rsi]{as.rsi()}}. It supports guidelines from EUCAST and CLSI.
Disk diffusion growth zone sizes must be between 6 and 50 millimetres. Values higher than 50 but lower than 100 will be maximised to 50. All others input values outside the 6-50 range will return \code{NA}.
\code{NA_disk_} is a missing value of the new \code{disk} class.
}
\examples{

View File

@ -37,7 +37,7 @@ is.rsi.eligible(x, threshold = 0.05)
mo = NULL,
ab = deparse(substitute(x)),
guideline = "EUCAST",
uti = FALSE,
uti = NULL,
conserve_capped_values = FALSE,
add_intrinsic_resistance = FALSE,
reference_data = AMR::rsi_translation,
@ -49,7 +49,7 @@ is.rsi.eligible(x, threshold = 0.05)
mo = NULL,
ab = deparse(substitute(x)),
guideline = "EUCAST",
uti = FALSE,
uti = NULL,
add_intrinsic_resistance = FALSE,
reference_data = AMR::rsi_translation,
...

View File

@ -3,9 +3,9 @@
\docType{data}
\name{microorganisms}
\alias{microorganisms}
\title{Data Set with 48,788 Microorganisms}
\title{Data Set with 48,883 Microorganisms}
\format{
A \link[tibble:tibble]{tibble} with 48,788 observations and 22 variables:
A \link[tibble:tibble]{tibble} with 48,883 observations and 22 variables:
\itemize{
\item \code{mo}\cr ID of microorganism as used by this package
\item \code{fullname}\cr Full name, like \code{"Escherichia coli"}. For the taxonomic ranks genus, species and subspecies, this is the 'pasted' text of genus, species, and subspecies. For all taxonomic ranks higher than genus, this is the name of the taxon.
@ -48,7 +48,7 @@ Included taxonomic data are:
\itemize{
\item All ~34,000 (sub)species from the kingdoms of Archaea and Bacteria
\item ~7,400 (sub)species from the kingdom of Fungi. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package. Only relevant fungi are covered (such as all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histoplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).
\item ~4,900 (sub)species from the kingdom of Protozoa
\item ~5,000 (sub)species from the kingdom of Protozoa
\item ~1,500 (sub)species from ~50 other relevant genera from the kingdom of Animalia (such as \emph{Strongyloides} and \emph{Taenia})
\item All ~9,400 previously accepted names of all included (sub)species (these were taxonomically renamed)
\item The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
@ -63,7 +63,7 @@ For convenience, some entries were added manually:
\item 2 entries of \emph{Staphylococcus} (coagulase-negative (CoNS) and coagulase-positive (CoPS))
\item 1 entry of \emph{Blastocystis} (\emph{B. hominis}), although it officially does not exist (Noel \emph{et al.} 2005, PMID 15634993)
\item 1 entry of \emph{Moraxella} (\emph{M. catarrhalis}), which was formally named \emph{Branhamella catarrhalis} (Catlin, 1970) though this change was never accepted within the field of clinical microbiology
\item 5 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast and unknown fungus)
\item 6 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast, unknown fungus, and unknown anaerobic bacteria)
}
The syntax used to transform the original data to a cleansed \R format, can be found here: \url{https://github.com/msberends/AMR/blob/main/data-raw/reproduction_of_microorganisms.R}.