diff --git a/DESCRIPTION b/DESCRIPTION index 6635ea05..bce7122e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.8.0.9030 -Date: 2019-11-11 +Version: 0.8.0.9031 +Date: 2019-11-15 Title: Antimicrobial Resistance Analysis Authors@R: c( person(role = c("aut", "cre"), diff --git a/NAMESPACE b/NAMESPACE index d6109758..e44adf47 100755 --- a/NAMESPACE +++ b/NAMESPACE @@ -323,6 +323,7 @@ importFrom(stats,pchisq) importFrom(stats,predict) importFrom(tidyr,pivot_longer) importFrom(tidyr,pivot_wider) +importFrom(utils,adist) importFrom(utils,browseURL) importFrom(utils,menu) importFrom(utils,read.csv) diff --git a/NEWS.md b/NEWS.md index dca3b445..a8d7cf91 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,16 @@ -# AMR 0.8.0.9030 -Last updated: 11-Nov-2019 +# AMR 0.8.0.9031 +Last updated: 15-Nov-2019 + +### Breaking +* Adopted Adeolu *et al.* (2016), [PMID 27620848](https://www.ncbi.nlm.nih.gov/pubmed/27620848) for the `microorganisms` data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like *Morganellaceae* and *Yersiniaceae*). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with `mdro()` will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family. + * If you were dependent on the old Enterobacteriaceae family e.g. by using in your code: + ```r + if (mo_family(somebugs) == "Enterobacteriaceae") ... + ``` + then please adjust this to: + ```r + if (mo_order(somebugs) == "Enterobacterales") ... + ``` ### New * Functions `susceptibility()` and `resistance()` as aliases of `proportion_SI()` and `proportion_R()`, respectively. These functions were added to make it more clear that "I" should be considered susceptible and not resistant. 
@@ -16,11 +27,29 @@ * The new Verbose mode (`mdro(...., verbose = TRUE)`) returns an informative data set where the reason for MDRO determination is given for every isolate, and an list of the resistant antimicrobial agents ### Changes +* Improvements to algorithm in `as.mo()`: + * Now allows "ou" where "au" should have been used and vice versa + * More intelligent way of coping with some consonants like "l" and "r" + * Added a score (a certainty percentage) to `mo_uncertainties()`, that is calculated using the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): + ```r + as.mo(c("Stafylococcus aureus", + "staphylokok aureuz")) + #> Warning: + #> Results of two values was guessed with uncertainty. Use mo_uncertainties() to review them. + #> Class 'mo' + #> [1] B_STPHY_AURS B_STPHY_AURS + + mo_uncertainties() + #> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%) + #> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%) + ``` * Removed previously deprecated function `as.atc()` - this function was replaced by `ab_atc()` * Renamed all `portion_*` functions to `proportion_*`. All `portion_*` functions are still available as deprecated functions, and will return a warning when used. * When running `as.rsi()` over a data set, it will now print the guideline that will be used if it is not specified by the user -* Fix for `eucast_rules()`: *Stenotrophomonas maltophilia* not interpreted "R" to ceftazidime anymore (following EUCAST v3.1) -* Adopted Adeolu *et al.* (2016), [PMID 27620848](https://www.ncbi.nlm.nih.gov/pubmed/27620848) for the `microorganisms` data set, which means that the new order Enterobacterales now consists of a part of the existing family *Enterobacteriaceae*, but that this family has been split into other families as well (like *Morganellaceae* and *Yersiniaceae*). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. 
All MDRO determinations with `mdro()` will now use the Enterobacterales order for all guidelines before 2016. +* Improvements for `eucast_rules()`: + * Fix where *Stenotrophomonas maltophilia* would always become ceftazidime R (following EUCAST v3.1) + * Fix where *Leuconostoc* and *Pediococcus* would not always become glycopeptides R + * Non-EUCAST rules in `eucast_rules()` are now applied first and no longer last. This is to improve the dependency on certain antibiotics for the official EUCAST rules. Please see `?eucast_rules`. * Fix for interpreting MIC values with `as.rsi()` where the input is `NA` * Added "imi" and "imp" as allowed abbreviation for Imipenem (IPM) * Fix for automatically determining columns with antibiotic results in `mdro()` and `eucast_rules()` diff --git a/R/eucast_rules.R b/R/eucast_rules.R index 38375369..8d6d4915 100755 --- a/R/eucast_rules.R +++ b/R/eucast_rules.R @@ -24,8 +24,11 @@ EUCAST_VERSION_BREAKPOINTS <- "9.0, 2019" EUCAST_VERSION_EXPERT_RULES <- "3.1, 2016" #' EUCAST rules -#' -#' Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables. +#' +#' @description +#' Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables. +#' +#' To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules are applied by default, see Details. #' @param x data with antibiotic columns, like e.g. 
\code{AMX} and \code{AMC} #' @param info print progress #' @param rules a character vector that specifies which rules should be applied - one or more of \code{c("breakpoints", "expert", "other", "all")} @@ -36,6 +39,19 @@ EUCAST_VERSION_EXPERT_RULES <- "3.1, 2016" #' \strong{Note:} This function does not translate MIC values to RSI values. Use \code{\link{as.rsi}} for that. \cr #' \strong{Note:} When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance. #' +#' Before further processing, some non-EUCAST rules are applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, that are applied to all isolates, are: +#' \itemize{ +#' \item{Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;} +#' \item{Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;} +#' \item{Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;} +#' \item{Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;} +#' \item{Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;} +#' \item{Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;} +#' \item{Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;} +#' \item{Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.} +#' } +#' To \emph{not} use these rules, please use \code{eucast_rules(..., rules = c("breakpoints", "expert"))}. +#' #' The file containing all EUCAST rules is located here: \url{https://gitlab.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv}. 
#' #' @section Antibiotics: @@ -516,29 +532,7 @@ eucast_rules <- function(x, as.data.frame(stringsAsFactors = FALSE) ) - if (info == TRUE) { - cat(paste0( - "\nRules by the ", bold("European Committee on Antimicrobial Susceptibility Testing (EUCAST)"), - "\n", blue("http://eucast.org/"), "\n")) - } - - # since ampicillin ^= amoxicillin, get the first from the latter (not in original EUCAST table) - if (!ab_missing(AMP) & !ab_missing(AMX)) { - if (verbose == TRUE) { - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "S" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'S' based on amoxicillin. ") - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "I" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'I' based on amoxicillin. ") - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "R" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'R' based on amoxicillin. \n") - } - x[which(x[, AMX] == "S" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "S" - x[which(x[, AMX] == "I" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "I" - x[which(x[, AMX] == "R" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "R" - } else if (ab_missing(AMP) & !ab_missing(AMX)) { + if (ab_missing(AMP) & !ab_missing(AMX)) { # ampicillin column is missing, but amoxicillin is available message(blue(paste0("NOTE: Using column `", bold(AMX), "` as input for ampicillin (J01CA01) since many EUCAST rules depend on it."))) AMP <- AMX @@ -611,6 +605,7 @@ eucast_rules <- function(x, } } + eucast_notification_shown <- FALSE eucast_rules_df <- eucast_rules_file # internal data file no_added <- 0 no_changed <- 0 @@ -648,6 +643,13 @@ eucast_rules <- function(x, next } + if (info == TRUE & !rule_group_current %like% "other" & eucast_notification_shown == FALSE) { + cat(paste0( + "\n----\nRules by the ", bold("European Committee on Antimicrobial Susceptibility Testing (EUCAST)"), + "\n", blue("http://eucast.org/"), "\n")) + eucast_notification_shown 
<- TRUE + } + if (info == TRUE) { # Print rule (group) ------------------------------------------------------ @@ -660,7 +662,7 @@ eucast_rules <- function(x, rule_group_current %like% "expert" ~ paste0("\nEUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v", EUCAST_VERSION_EXPERT_RULES, ")\n"), TRUE ~ - "\nOther rules\n" + "\nOther rules by this AMR package\n" ) )) } @@ -707,6 +709,7 @@ eucast_rules <- function(x, } if (like_is_one_of == "is") { + # so 'Enterococcus' will turn into '^Enterococcus$' mo_value <- paste0("^", eucast_rules_df[i, 3], "$") } else if (like_is_one_of == "one_of") { # so 'Clostridium, Actinomyces, ...' will turn into '^(Clostridium|Actinomyces|...)$' @@ -717,7 +720,7 @@ eucast_rules <- function(x, } else if (like_is_one_of == "like") { mo_value <- eucast_rules_df[i, 3] } else { - stop("invalid like_is_one_of", call. = FALSE) + stop("invalid value for column 'like.is.one_of'", call. = FALSE) } source_antibiotics <- eucast_rules_df[i, 4] diff --git a/R/mo.R b/R/mo.R index 5d64def4..bad263d4 100755 --- a/R/mo.R +++ b/R/mo.R @@ -59,15 +59,6 @@ #' #' The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{\link{microorganisms}}). #' -#' \strong{Self-learning algoritm} \cr -#' The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge. -#' -#' Usually, any guess after the first try runs 80-95\% faster than the first try. -#' -# \emph{For now, learning only works per session. If R is closed or terminated, the algorithms reset. 
This might be resolved in a future version.} -#' This resets with every update of this \code{AMR} package since results are saved to your local package library folder. -#' -#' \strong{Intelligent rules} \cr #' The \code{as.mo()} function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order: #' \itemize{ @@ -76,7 +67,10 @@ #' \item{Breakdown of input values to identify possible matches.} #' } #' -#' This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: +#' This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. +#' +#' \strong{Coping with uncertain results} \cr +#' In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: #' #' \itemize{ #' \item{Uncertainty level 0: no additional rules are applied;} @@ -95,9 +89,12 @@ #' #' The level of uncertainty can be set using the argument \code{allow_uncertain}. The default is \code{allow_uncertain = TRUE}, which is equal to uncertainty level 2. Using \code{allow_uncertain = FALSE} is equal to uncertainty level 0 and will skip all rules. You can also use e.g. \code{as.mo(..., allow_uncertain = 1)} to only allow up to level 1 uncertainty. #' -#' Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value. 
\cr -#' Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. \cr -#' Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name. +#' There are three helper functions that can be run after the \code{as.mo()} function: +#' \itemize{ +#' \item{Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. The output contains a score that is calculated as \code{(n - 0.5 * L) / n}, where \emph{n} is the number of characters of the returned full name of the microorganism, and \emph{L} is the \href{https://en.wikipedia.org/wiki/Levenshtein_distance}{Levenshtein distance} between that full name and the user input.} +#' \item{Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value.} +#' \item{Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name.} +#' } #' #' \strong{Microbial prevalence of pathogens in humans} \cr #' The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the \code{prevalence} columns in the \code{\link{microorganisms}} and \code{\link{microorganisms.old}} data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence. 
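Note on the score documented in the hunk above: the formula `(n - 0.5 * L) / n` can be tried out with a minimal base-R sketch. It assumes `utils::adist()` for the Levenshtein distance (the function this patch imports) and a case-insensitive comparison capped at `n`; the name `score_sketch` is hypothetical and not part of the package.

```r
# Hypothetical helper illustrating the documented score; not the package's own function.
score_sketch <- function(input, fullname) {
  # n: number of characters of the returned full name
  n <- nchar(fullname)
  # L: case-insensitive Levenshtein distance per input/full-name pair
  L <- mapply(function(a, b) utils::adist(a, b, ignore.case = TRUE)[1, 1],
              input, fullname, USE.NAMES = FALSE)
  # cap the distance at n, so the score cannot drop below 0.5
  L <- pmin(L, n)
  (n - 0.5 * L) / n
}

scores <- score_sketch(c("Stafylococcus aureus", "staphylokok aureuz"),
                       rep("Staphylococcus aureus", 2))
round(100 * scores, 1)
#> [1] 95.2 85.7
```

The two inputs are the ones from the NEWS entry in this commit, and the sketch reproduces the percentages shown there. With the cap in place, a hopeless match scores 0.5 rather than 0, which is worth keeping in mind when reading the score as a certainty percentage.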
@@ -107,6 +104,14 @@ #' Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is \emph{Aspergillus}, \emph{Bacteroides}, \emph{Candida}, \emph{Capnocytophaga}, \emph{Chryseobacterium}, \emph{Cryptococcus}, \emph{Elisabethkingia}, \emph{Flavobacterium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Leptotrichia}, \emph{Mycoplasma}, \emph{Prevotella}, \emph{Rhodotorula}, \emph{Treponema}, \emph{Trichophyton} or \emph{Ureaplasma}. #' #' Group 3 (least prevalent microorganisms) consists of all other microorganisms. +#' +#' \strong{Self-learning algorithm} \cr +#' The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge. +#' +#' Usually, any guess after the first try runs 80-95\% faster than the first try. +#' +# \emph{For now, learning only works per session. If R is closed or terminated, the algorithms reset. This might be resolved in a future version.} +#' This resets with every update of this \code{AMR} package since results are saved to your local package library folder. #' @inheritSection catalogue_of_life Catalogue of Life # (source as a section here, so it can be inherited by other man pages:) #' @section Source: @@ -134,7 +139,7 @@ #' as.mo("S aureus") #' as.mo("Staphylococcus aureus") #' as.mo("Staphylococcus aureus (MRSA)") -#' as.mo("Sthafilokkockus aaureuz") # handles incorrect spelling +#' as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling #' as.mo("MRSA") # Methicillin Resistant S. aureus #' as.mo("VISA") # Vancomycin Intermediate S. 
aureus #' as.mo("VRSA") # Vancomycin Resistant S. aureus @@ -287,7 +292,7 @@ exec_as.mo <- function(x, disable_mo_history = FALSE, debug = FALSE, reference_data_to_use = microorganismsDT) { - + if (!"AMR" %in% base::.packages()) { require("AMR") # check onLoad() in R/zzz.R: data tables are created there. @@ -518,7 +523,7 @@ exec_as.mo <- function(x, x <- gsub("(alpha|beta|gamma).?ha?emoly", "\\1-haemoly", x) # remove genus as first word x <- gsub("^genus ", "", x) - # remove 'uncertain' like texts + # remove 'uncertain'-like texts x <- trimws(gsub("(uncertain|susp[ie]c[a-z]+|verdacht)", "", x)) # allow characters that resemble others = dyslexia_mode ---- if (dyslexia_mode == TRUE) { @@ -539,13 +544,19 @@ exec_as.mo <- function(x, x <- gsub("e+", "e+", x) x <- gsub("o+", "o+", x) x <- gsub("(.)\\1+", "\\1+", x) + # allow multiplication of all other consonants + x <- gsub("([bdghjlnrw]+)", "\\1+", x) # allow ending in -en or -us x <- gsub("e\\+n(?![a-z[])", "(e+n|u+(c|k|q|qu|s|z|x|ks)+)", x, perl = TRUE) - # if the input is longer than 10 characters, allow any constant between all characters, as some might have forgotten a character + # if the input is longer than 10 characters, allow any forgotten consonant between all characters, as some might just have forgotten one... # this will allow "Pasteurella damatis" to be correctly read as "Pasteurella dagmatis". 
- constants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "") - - x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", constants, "]?"), x[nchar(x_backup_without_spp) > 10]) + consonants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "") + x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", consonants, "]?"), x[nchar(x_backup_without_spp) > 10]) + # allow au and ou after all these regex implementations + x <- gsub("a+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE) + x <- gsub("o+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE) + # make sure to remove regex overkill (will lead to errors) + x <- gsub("++", "+", x, fixed = TRUE) } x <- strip_whitespace(x, dyslexia_mode) @@ -578,7 +589,7 @@ exec_as.mo <- function(x, } progress <- progress_estimated(n = length(x), min_time = 3) - + for (i in seq_len(length(x))) { progress$tick()$print() @@ -834,8 +845,8 @@ exec_as.mo <- function(x, next } # streptococcal groups: milleri and viridans - if (x_trimmed[i] %like_case% "strepto.* milleri" - | x_backup_without_spp[i] %like_case% "strepto.* milleri" + if (x_trimmed[i] %like_case% "strepto.* mil+er+i" + | x_backup_without_spp[i] %like_case% "strepto.* mil+er+i" | x_backup_without_spp[i] %like_case% "mgs[^a-z]?$") { # Milleri Group Streptococcus (MGS) x[i] <- microorganismsDT[mo == "B_STRPT_MILL", ..property][[1]][1L] @@ -1863,6 +1874,7 @@ mo_uncertainties <- function() { #' @exportMethod print.mo_uncertainties #' @importFrom crayon green yellow red white black bgGreen bgYellow bgRed +#' @importFrom cleaner percentage #' @export #' @noRd print.mo_uncertainties <- function(x, ...) { @@ -1890,7 +1902,9 @@ print.mo_uncertainties <- function(x, ...) 
{ paste0(colour2(paste0(" [", x[i, "uncertainty"], "] ")), ' "', x[i, "input"], '" -> ', colour1(paste0(italic(x[i, "fullname"]), ifelse(!is.na(x[i, "renamed_to"]), paste(", renamed to", italic(x[i, "renamed_to"])), ""), - " (", x[i, "mo"], ")"))), + " (", x[i, "mo"], + ", score: ", percentage(levenshtein_fraction(x[i, "input"], x[i, "fullname"]), digits = 1), + ")"))), sep = "\n") } cat(msg) @@ -1977,3 +1991,15 @@ load_mo_failures_uncertainties_renamed <- function(metadata) { options("mo_uncertainties" = metadata$uncertainties) options("mo_renamed" = metadata$renamed) } + +#' @importFrom utils adist +levenshtein_fraction <- function(input, output) { + levenshtein <- double(length = length(input)) + for (i in seq_len(length(input))) { + # determine levenshtein distance, but maximise to nchar of output + levenshtein[i] <- base::min(base::as.double(adist(input[i], output[i], ignore.case = TRUE)), + base::nchar(output[i])) + } + # self-made score between 0 and 1 (for % certainty, so 0 means huge distance, 1 means no distance) + (base::nchar(output) - 0.5 * levenshtein) / nchar(output) +} diff --git a/R/sysdata.rda b/R/sysdata.rda index c658d3ee..2fadec0f 100644 Binary files a/R/sysdata.rda and b/R/sysdata.rda differ diff --git a/R/zzz.R b/R/zzz.R index 03233e24..f0323055 100755 --- a/R/zzz.R +++ b/R/zzz.R @@ -47,15 +47,21 @@ # maybe add survey later: "https://www.surveymonkey.com/r/AMR_for_R" #' @importFrom data.table as.data.table setkey +#' @importFrom dplyr %>% mutate case_when make_DT <- function() { microorganismsDT <- as.data.table(AMR::microorganisms %>% mutate(kingdom_index = case_when(kingdom == "Bacteria" ~ 1, kingdom == "Fungi" ~ 2, kingdom == "Protozoa" ~ 3, kingdom == "Archaea" ~ 4, - TRUE ~ 6))) - # for fullname_lower: keep only dots, letters, numbers, slashes, spaces and dashes - microorganismsDT$fullname_lower <- gsub("[^.a-z0-9/ \\-]+", "", tolower(microorganismsDT$fullname)) + TRUE ~ 99), + # for fullname_lower: keep only dots, letters, + # numbers, 
slashes, spaces and dashes + fullname_lower = gsub("[^.a-z0-9/ \\-]+", "", + # use this paste instead of `fullname` to + # work with Viridans Group Streptococci, etc. + tolower(trimws(paste(genus, species, subspecies)))))) + # so arrange data on prevalence first, then kingdom, then full name setkey(microorganismsDT, prevalence, kingdom_index, diff --git a/data-raw/eucast_rules.tsv b/data-raw/eucast_rules.tsv index cac99db2..70110bb1 100644 --- a/data-raw/eucast_rules.tsv +++ b/data-raw/eucast_rules.tsv @@ -9,6 +9,19 @@ # >>>>> IF YOU WANT TO IMPORT THIS FILE INTO YOUR OWN SOFTWARE, HAVE THE FIRST 10 LINES SKIPPED <<<<< # ------------------------------------------------------------------------------------------------------------------------------- if_mo_property like.is.one_of this_value and_these_antibiotics have_these_values then_change_these_antibiotics to_value reference.rule reference.rule_group +genus like .* AMP S AMX S Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMP I AMX I Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMP R AMX R Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMX S AMP S Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMX I AMP I Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMX R AMP R Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMC R AMP, AMX R Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R Other rules +genus like .* TZP R PIP R Non-EUCAST: set piperacillin = R where piperacillin/tazobactam = R Other rules +genus like .* SXT R TMP R Non-EUCAST: set trimethoprim = R where trimethoprim/sulfa = R Other rules +genus like .* AMP S AMC S Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S Other rules +genus like .* AMX S AMC S 
Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S Other rules +genus like .* PIP S TZP S Non-EUCAST: set piperacillin/tazobactam = S where piperacillin = S Other rules +genus like .* TMP S SXT S Non-EUCAST: set trimethoprim/sulfa = S where trimethoprim = S Other rules order is Enterobacterales AMP S AMX S Enterobacterales (Order) Breakpoints order is Enterobacterales AMP I AMX I Enterobacterales (Order) Breakpoints order is Enterobacterales AMP R AMX R Enterobacterales (Order) Breakpoints @@ -53,7 +66,7 @@ genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallol genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallolyticus|gordonii|infantarius|infantis|mitis|mutans|oligofermentans|oralis|peroris|pseudopneumoniae|salivarius|sinensis|sobrinus|thermophilus|vestibularis|anginosus|equinus|intermedius|parasanguinis|sanguinis)$ AMP I AMX, AMC, PIP, TZP I Viridans group streptococci Breakpoints genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallolyticus|gordonii|infantarius|infantis|mitis|mutans|oligofermentans|oralis|peroris|pseudopneumoniae|salivarius|sinensis|sobrinus|thermophilus|vestibularis|anginosus|equinus|intermedius|parasanguinis|sanguinis)$ AMP R AMX, AMC, PIP, TZP R Viridans group streptococci Breakpoints genus_species is Haemophilus influenzae AMP S AMX, PIP S Haemophilus influenzae Breakpoints -genus_species is ^Haemophilus influenzae AMP I AMX, PIP I Haemophilus influenzae Breakpoints +genus_species is Haemophilus influenzae AMP I AMX, PIP I Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae AMP R AMX, PIP R Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae PEN S AMP, AMX, AMC, PIP, TZP S Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae AMC S TZP S Haemophilus influenzae Breakpoints @@ -164,7 +177,7 @@ genus_species is Enterococcus casseliflavus FUS, CAZ, cephalosporins_without_C genus_species is 
Enterococcus faecium FUS, CAZ, cephalosporins_without_CAZ, aminoglycosides, macrolides, TMP, SXT R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus is Corynebacterium FOS R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Listeria monocytogenes cephalosporins R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules -genus is Leuconostoc, Pediococcus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules +genus one_of Leuconostoc, Pediococcus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus is Lactobacillus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Clostridium ramosum VAN R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Clostridium innocuum VAN R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules @@ -172,9 +185,9 @@ genus_species like ^Streptococcus (pyogenes|agalactiae|dysgalactiae|group A|grou genus is Enterococcus AMP R ureidopenicillins, carbapenems R Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci Expert Rules genus is Enterococcus AMX R ureidopenicillins, carbapenems R Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci Expert Rules family is Enterobacteriaceae TIC, PIP R, S PIP R Table 09: Interpretive rules for B-lactam agents and Gram-negative rods Expert Rules -genus is .* ERY S AZM, CLR S Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules -genus is .* ERY I AZM, CLR I Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules -genus is .* ERY R AZM, CLR R Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules +genus like .* ERY S AZM, CLR S Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules 
+genus like .* ERY I AZM, CLR I Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules +genus like .* ERY R AZM, CLR R Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules genus is Staphylococcus TOB R KAN, AMK R Table 12: Interpretive rules for aminoglycosides Expert Rules genus is Staphylococcus GEN R aminoglycosides R Table 12: Interpretive rules for aminoglycosides Expert Rules order is Enterobacterales GEN, TOB I, S GEN R Table 12: Interpretive rules for aminoglycosides Expert Rules @@ -183,10 +196,3 @@ genus is Staphylococcus MFX R fluoroquinolones R Table 13: Interpretive rules fo genus_species is Streptococcus pneumoniae MFX R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules order is Enterobacterales CIP R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules genus_species is Neisseria gonorrhoeae CIP R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules -genus is .* AMC R AMP, AMX R Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R Other rules -genus is .* TZP R PIP R Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R Other rules -genus is .* SXT R TMP R Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R Other rules -genus is .* AMP S AMC S Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S Other rules -genus is .* AMX S AMC S Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S Other rules -genus is .* PIP S TZP S Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S Other rules -genus is .* TMP S SXT S Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S Other rules diff --git a/data-raw/internals.R b/data-raw/internals.R index 9ad2c2a8..555e7dc9 100644 --- a/data-raw/internals.R +++ b/data-raw/internals.R @@ -2,14 +2,18 @@ # source("data-raw/internals.R") # See 'data-raw/eucast_rules.tsv' for the EUCAST reference file -eucast_rules_file <- 
dplyr::arrange( - .data = utils::read.delim(file = "data-raw/eucast_rules.tsv", +eucast_rules_file <- utils::read.delim(file = "data-raw/eucast_rules.tsv", skip = 10, sep = "\t", stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE, - na = c(NA, "", NULL)), + na = c(NA, "", NULL)) +# take the order of the reference.rule_group column in the original data file +eucast_rules_file$reference.rule_group <- factor(eucast_rules_file$reference.rule_group, + levels = unique(eucast_rules_file$reference.rule_group), + ordered = TRUE) +eucast_rules_file <- dplyr::arrange(eucast_rules_file, reference.rule_group, reference.rule) diff --git a/data/example_isolates.rda b/data/example_isolates.rda index 82568530..15c315bc 100644 Binary files a/data/example_isolates.rda and b/data/example_isolates.rda differ diff --git a/docs/404.html b/docs/404.html index 5927d9a0..7bdaaa2a 100644 --- a/docs/404.html +++ b/docs/404.html @@ -84,7 +84,7 @@
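Note on the `data-raw/internals.R` change above: turning `reference.rule_group` into an ordered factor whose levels follow first appearance makes the sort respect the order of groups in the TSV instead of the alphabet, presumably so that the "Other rules" rows moved to the top of the file also run first. A small sketch with made-up rows; the data frame `rules` is illustrative only, and base `order()` stands in for `dplyr::arrange()` to avoid the dependency.

```r
# Illustrative rows only; not taken from eucast_rules.tsv.
rules <- data.frame(
  reference.rule = c("rule b", "rule a", "rule d", "rule c"),
  reference.rule_group = c("Other rules", "Other rules", "Breakpoints", "Expert Rules"),
  stringsAsFactors = FALSE
)

# Levels follow first appearance in the file, not alphabetical order.
rules$reference.rule_group <- factor(rules$reference.rule_group,
                                     levels = unique(rules$reference.rule_group),
                                     ordered = TRUE)

# Sort by group (file order), then by rule name within each group.
rules <- rules[order(rules$reference.rule_group, rules$reference.rule), ]

# Groups now run: Other rules, Other rules, Breakpoints, Expert Rules
as.character(rules$reference.rule_group)
```

A plain character column would have sorted "Breakpoints" before "Other rules"; the factor levels are what keep the file order.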
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index ba6220da..858b9650 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -84,7 +84,7 @@ diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index c6d9730e..412273a1 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -41,7 +41,7 @@ @@ -187,7 +187,7 @@AMR.Rmd
-Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 11 November 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 15 November 2019.
So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values `M` and `F`. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -419,60 +419,62 @@
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
fixes this:
data <- eucast_rules(data, col_mo = "bacteria")
#
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-# http://eucast.org/
-#
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Aerococcus sanguinicola (no changes)
-# Aerococcus urinae (no changes)
-# Anaerobic Gram-negatives (no changes)
-# Anaerobic Gram-positives (no changes)
-# Campylobacter coli (no changes)
-# Campylobacter jejuni (no changes)
-# Enterobacterales (Order) (no changes)
-# Enterococcus (no changes)
-# Haemophilus influenzae (no changes)
-# Kingella kingae (no changes)
-# Moraxella catarrhalis (no changes)
-# Pasteurella multocida (no changes)
-# Staphylococcus (no changes)
-# Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (1,552 values changed)
-# Viridans group streptococci (no changes)
-#
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,279 values changed)
-# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
-# Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,800 values changed)
-# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
-# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
-# Table 12: Interpretive rules for aminoglycosides (no changes)
-# Table 13: Interpretive rules for quinolones (no changes)
+# Other rules by this AMR package
+# Non-EUCAST: inherit amoxicillin results for unavailable ampicillin (no changes)
+# Non-EUCAST: inherit ampicillin results for unavailable amoxicillin (no changes)
+# Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S (3,022 values changed)
+# Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R (151 values changed)
+# Non-EUCAST: set piperacillin = R where piperacillin/tazobactam = R (no changes)
+# Non-EUCAST: set piperacillin/tazobactam = S where piperacillin = S (no changes)
+# Non-EUCAST: set trimethoprim = R where trimethoprim/sulfa = R (no changes)
+# Non-EUCAST: set trimethoprim/sulfa = S where trimethoprim = S (no changes)
+#
+# ----
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# http://eucast.org/
+#
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Aerococcus sanguinicola (no changes)
+# Aerococcus urinae (no changes)
+# Anaerobic Gram-negatives (no changes)
+# Anaerobic Gram-positives (no changes)
+# Campylobacter coli (no changes)
+# Campylobacter jejuni (no changes)
+# Enterobacterales (Order) (no changes)
+# Enterococcus (no changes)
+# Haemophilus influenzae (no changes)
+# Kingella kingae (no changes)
+# Moraxella catarrhalis (no changes)
+# Pasteurella multocida (no changes)
+# Staphylococcus (no changes)
+# Streptococcus groups A, B, C, G (no changes)
+# Streptococcus pneumoniae (1,071 values changed)
+# Viridans group streptococci (no changes)
#
-# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,257 values changed)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (132 values changed)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
-#
-# --------------------------------------------------------------------------
-# EUCAST rules affected 6,599 out of 20,000 rows, making a total of 8,020 edits
-# => added 0 test results
-#
-# => changed 8,020 test results
-# - 119 test results changed from S to I
-# - 4,832 test results changed from S to R
-# - 1,096 test results changed from I to S
-# - 342 test results changed from I to R
-# - 1,607 test results changed from R to S
-# - 24 test results changed from R to I
-# --------------------------------------------------------------------------
-#
-# Use eucast_rules(..., verbose = TRUE) (on your original data) to get a data.frame with all specified edits instead.
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,282 values changed)
+# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
+# Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,783 values changed)
+# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
+# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
+# Table 12: Interpretive rules for aminoglycosides (no changes)
+# Table 13: Interpretive rules for quinolones (no changes)
+#
+# --------------------------------------------------------------------------
+# EUCAST rules affected 6,586 out of 20,000 rows, making a total of 8,309 edits
+# => added 0 test results
+#
+# => changed 8,309 test results
+# - 129 test results changed from S to I
+# - 4,834 test results changed from S to R
+# - 1,222 test results changed from I to S
+# - 324 test results changed from I to R
+# - 1,800 test results changed from R to S
+# --------------------------------------------------------------------------
+#
+# Use eucast_rules(..., verbose = TRUE) (on your original data) to get a data.frame with all specified edits instead.
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient D2, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient I9, sorted on date:
isolate | @@ -524,19 +526,19 @@||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-02-14 | -D2 | +2010-02-08 | +I9 | B_ESCHR_COLI | +S | +S | R | S | -S | -S | TRUE | ||||||||||
2 | -2010-04-27 | -D2 | +2010-03-05 | +I9 | B_ESCHR_COLI | S | S | @@ -546,30 +548,30 @@|||||||||||||||
3 | -2010-05-31 | -D2 | +2010-05-14 | +I9 | B_ESCHR_COLI | +S | +S | +S | R | -S | -S | -S | FALSE | |||||||||
4 | -2010-08-21 | -D2 | +2010-12-10 | +I9 | B_ESCHR_COLI | +R | S | -S | -S | +R | S | FALSE | ||||||||||
5 | -2010-09-21 | -D2 | +2010-12-17 | +I9 | B_ESCHR_COLI | S | S | @@ -579,30 +581,30 @@|||||||||||||||
6 | -2010-10-04 | -D2 | +2011-04-18 | +I9 | B_ESCHR_COLI | -R | S | S | S | -FALSE | +S | +TRUE | ||||||||||
7 | -2010-10-11 | -D2 | +2011-04-25 | +I9 | B_ESCHR_COLI | -S | -S | R | S | +S | +S | FALSE | ||||||||||
8 | -2010-11-16 | -D2 | +2011-06-06 | +I9 | B_ESCHR_COLI | S | S | @@ -612,23 +614,23 @@|||||||||||||||
9 | -2011-03-05 | -D2 | +2011-07-14 | +I9 | B_ESCHR_COLI | S | S | S | S | -TRUE | +FALSE | |||||||||||
10 | -2011-04-18 | -D2 | +2011-07-31 | +I9 | B_ESCHR_COLI | S | S | -S | +R | S | FALSE |
isolate | @@ -662,20 +664,20 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-02-14 | -D2 | +2010-02-08 | +I9 | B_ESCHR_COLI | +S | +S | R | S | -S | -S | TRUE | TRUE | |
2 | -2010-04-27 | -D2 | +2010-03-05 | +I9 | B_ESCHR_COLI | S | S | @@ -686,68 +688,68 @@|||||||
3 | -2010-05-31 | -D2 | +2010-05-14 | +I9 | B_ESCHR_COLI | +S | +S | +S | R | -S | -S | -S | FALSE | TRUE |
4 | -2010-08-21 | -D2 | +2010-12-10 | +I9 | B_ESCHR_COLI | +R | S | -S | -S | +R | S | FALSE | TRUE | |
5 | -2010-09-21 | -D2 | +2010-12-17 | +I9 | B_ESCHR_COLI | S | S | S | S | FALSE | -FALSE | -|||
6 | -2010-10-04 | -D2 | -B_ESCHR_COLI | -R | -S | -S | -S | -FALSE | TRUE | |||||
7 | -2010-10-11 | -D2 | +||||||||||||
6 | +2011-04-18 | +I9 | B_ESCHR_COLI | S | S | +S | +S | +TRUE | +TRUE | +|||||
7 | +2011-04-25 | +I9 | +B_ESCHR_COLI | R | S | +S | +S | FALSE | TRUE | |||||
8 | -2010-11-16 | -D2 | +2011-06-06 | +I9 | B_ESCHR_COLI | S | S | @@ -758,35 +760,35 @@|||||||
9 | -2011-03-05 | -D2 | +2011-07-14 | +I9 | B_ESCHR_COLI | S | S | S | S | -TRUE | -TRUE | +FALSE | +FALSE | |
10 | -2011-04-18 | -D2 | +2011-07-31 | +I9 | B_ESCHR_COLI | S | S | -S | +R | S | FALSE | -FALSE | +TRUE |
Instead of 2, now 8 isolates are flagged. In total, 75.0% of all isolates are marked ‘first weighted’ - 46.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 9 isolates are flagged. In total, 75.3% of all isolates are marked ‘first weighted’ - 46.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,009 isolates for analysis.
+So we end up with 15,051 isolates for analysis.
We can remove unneeded columns:
@@ -811,45 +813,13 @@Frequency table
Class: character
-Length: 15,009 (of which NA: 0 = 0%)
+Length: 15,051 (of which NA: 0 = 0%)
Unique: 4
Shortest: 16
Longest: 24
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
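As a rough illustration of what these two proportions mean, the following base R sketch computes them by hand. It is not how the package implements `resistance()` or `susceptibility()`; the vector `amx` is made-up data and the variable names are assumptions used only for this example.

```r
# Hypothetical interpretations for one antibiotic (made-up values, not package data)
amx <- c("S", "S", "I", "R", "R", "S", NA, "R")

# Only isolates with an available test result count towards the denominator
tested <- amx[!is.na(amx)]

# Resistance: proportion of R
prop_R <- mean(tested == "R")

# Susceptibility: proportion of S and I together
prop_SI <- mean(tested %in% c("S", "I"))

# Both proportions together cover all tested isolates
prop_R + prop_SI
```

Because "I" is counted on the susceptible side, the two proportions add up to 1 for the tested isolates.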
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -993,19 +995,19 @@ Longest: 24
Hospital A
-0.4640823
+0.4651671
Hospital B
-0.4663609
+0.4687618
Hospital C
-0.4736130
+0.4569626
Hospital D
-0.4749499
+0.4668462
SPSS.Rmd
Last updated: 11-Nov-2019
+Last updated: 15-Nov-2019
+microorganisms
data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like Morganellaceae and Yersiniaceae). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with mdro()
will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family.
+If you were dependent on the old Enterobacteriaceae family e.g. by using in your code:
+ +then please adjust this to:
+ +Functions susceptibility()
and resistance()
as aliases of proportion_SI()
and proportion_R()
, respectively. These functions were added to make it clearer that “I” should be considered susceptible and not resistant.
as.mo()
:
+Added a score (a certainty percentage) to mo_uncertainties()
, that is calculated using the Levenshtein distance:
as.mo(c("Stafylococcus aureus",
+ "staphylokok aureuz"))
+#> Warning:
+#> Results of two values was guessed with uncertainty. Use mo_uncertainties() to review them.
+#> Class 'mo'
+#> [1] B_STPHY_AURS B_STPHY_AURS
+
+mo_uncertainties()
+#> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%)
+#> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%)
as.atc()
- this function was replaced by ab_atc()
portion_*
functions to proportion_*
. All portion_*
functions are still available as deprecated functions, and will return a warning when used.as.rsi()
over a data set, it will now print the guideline that will be used if it is not specified by the usereucast_rules()
: Stenotrophomonas maltophilia not interpreted “R” to ceftazidime anymore (following EUCAST v3.1)microorganisms
data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like Morganellaceae and Yersiniaceae). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with mdro()
will now use the Enterobacterales order for all guidelines before 2016.eucast_rules()
:
+eucast_rules()
are now applied first instead of last. This improves how the official EUCAST rules handle their dependency on certain antibiotics. Please see ?eucast_rules
.as.rsi()
where the input is NA
Determination of first isolates now excludes all ‘unknown’ microorganisms by default, i.e. microbial code "UNKNOWN"
. They can be included with the new parameter include_unknown
:
"con"
(contamination) will be excluded by default, since as.mo("con") = "UNKNOWN"
. The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.For code consistency, classes ab
and mo
will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA
:
# how it works in base R:
-x <- factor("A")
-x[1] <- "B"
-#> Warning message:
-#> invalid factor level, NA generated
-
-# how it now works similarly for classes 'mo' and 'ab':
-x <- as.mo("E. coli")
-x[1] <- "testvalue"
-#> Warning message:
-#> invalid microorganism code, NA generated
# how it works in base R:
+x <- factor("A")
+x[1] <- "B"
+#> Warning message:
+#> invalid factor level, NA generated
+
+# how it now works similarly for classes 'mo' and 'ab':
+x <- as.mo("E. coli")
+x[1] <- "testvalue"
+#> Warning message:
+#> invalid microorganism code, NA generated
"testvalue"
could never be understood by e.g. mo_name()
, although the class would suggest a valid microbial code.freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).Renamed data set septic_patients
to example_isolates
"testvalue"
could never be
Function bug_drug_combinations()
to quickly get a data.frame
with the results of all bug-drug combinations in a data set. The column containing microorganism codes is guessed automatically and its input is transformed with mo_shortname()
by default:
x <- bug_drug_combinations(example_isolates)
-#> NOTE: Using column `mo` as input for `col_mo`.
-x[1:4, ]
-#> mo ab S I R total
-#> 1 A. baumannii AMC 0 0 3 3
-#> 2 A. baumannii AMK 0 0 0 0
-#> 3 A. baumannii AMP 0 0 3 3
-#> 4 A. baumannii AMX 0 0 3 3
-#> NOTE: Use 'format()' on this result to get a publicable/printable format.
-
-# change the transformation with the FUN argument to anything you like:
-x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
-#> NOTE: Using column `mo` as input for `col_mo`.
-x[1:4, ]
-#> mo ab S I R total
-#> 1 Gram-negative AMC 469 89 174 732
-#> 2 Gram-negative AMK 251 0 2 253
-#> 3 Gram-negative AMP 227 0 405 632
-#> 4 Gram-negative AMX 227 0 405 632
-#> NOTE: Use 'format()' on this result to get a publicable/printable format.
x <- bug_drug_combinations(example_isolates)
+#> NOTE: Using column `mo` as input for `col_mo`.
+x[1:4, ]
+#> mo ab S I R total
+#> 1 A. baumannii AMC 0 0 3 3
+#> 2 A. baumannii AMK 0 0 0 0
+#> 3 A. baumannii AMP 0 0 3 3
+#> 4 A. baumannii AMX 0 0 3 3
+#> NOTE: Use 'format()' on this result to get a publicable/printable format.
+
+# change the transformation with the FUN argument to anything you like:
+x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
+#> NOTE: Using column `mo` as input for `col_mo`.
+x[1:4, ]
+#> mo ab S I R total
+#> 1 Gram-negative AMC 469 89 174 732
+#> 2 Gram-negative AMK 251 0 2 253
+#> 3 Gram-negative AMP 227 0 405 632
+#> 4 Gram-negative AMX 227 0 405 632
+#> NOTE: Use 'format()' on this result to get a publicable/printable format.
You can format this to a printable format, ready for reporting or exporting to e.g. Excel with the base R format()
function:
Additional way to calculate co-resistance, i.e. when using multiple antimicrobials as input for portion_*
functions or count_*
functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested
(which defaults to FALSE
) replaces the old also_single_tested
and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion
and count
help pages), where the %SI is being determined:
# --------------------------------------------------------------------
-# only_all_tested = FALSE only_all_tested = TRUE
-# ----------------------- -----------------------
-# Drug A Drug B include as include as include as include as
-# numerator denominator numerator denominator
-# -------- -------- ---------- ----------- ---------- -----------
-# S or I S or I X X X X
-# R S or I X X X X
-# <NA> S or I X X - -
-# S or I R X X X X
-# R R - X - X
-# <NA> R - - - -
-# S or I <NA> X X - -
-# R <NA> - - - -
-# <NA> <NA> - - - -
-# --------------------------------------------------------------------
# --------------------------------------------------------------------
+# only_all_tested = FALSE only_all_tested = TRUE
+# ----------------------- -----------------------
+# Drug A Drug B include as include as include as include as
+# numerator denominator numerator denominator
+# -------- -------- ---------- ----------- ---------- -----------
+# S or I S or I X X X X
+# R S or I X X X X
+# <NA> S or I X X - -
+# S or I R X X X X
+# R R - X - X
+# <NA> R - - - -
+# S or I <NA> X X - -
+# R <NA> - - - -
+# <NA> <NA> - - - -
+# --------------------------------------------------------------------
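The table can be reproduced with a short base R sketch. This is only an illustration of the counting logic as the table describes it, not the package's own implementation; the vectors `drug_a` and `drug_b` and all helper names are assumptions, with one made-up isolate per table row.

```r
# One hypothetical isolate per row of the table (S stands in for "S or I")
drug_a <- c("S", "R", NA, "S", "R", NA, "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA, NA, NA)

# TRUE where a drug is S or I; NA results count as FALSE here
si_a <- drug_a %in% c("S", "I")
si_b <- drug_b %in% c("S", "I")
any_si <- si_a | si_b

# TRUE where both drugs have a test result
all_tested <- !is.na(drug_a) & !is.na(drug_b)

# only_all_tested = FALSE: any S/I counts as susceptible; the denominator
# also includes isolates where both drugs were tested and both are R
num_false <- sum(any_si)
den_false <- sum(any_si | all_tested)

# only_all_tested = TRUE: only isolates tested for both drugs are considered
num_true <- sum(any_si & all_tested)
den_true <- sum(all_tested)

c(num_false, den_false, num_true, den_true)
#> [1] 5 6 3 4
```

With these nine rows the %SI is 5/6 when `only_all_tested = FALSE` and 3/4 when `only_all_tested = TRUE`, which matches the number of X marks in the corresponding columns of the table.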
also_single_tested
will throw an informative error that it has been replaced by only_all_tested
.tibble
printing support for classes rsi
, mic
, disk
, ab
and mo
. When using tibble
s containing antimicrobial columns, values S
will print in green, values I
will print in yellow and values R
will print in red. Microbial IDs (class mo
) will emphasise the genus and species, not the kingdom.
also_single_tested
w
Function rsi_df()
to transform a data.frame
to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df()
and portion_df()
to immediately show resistance percentages and number of available isolates:
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
@@ -471,12 +511,12 @@ Since this is a major change, usage of the oldalso_single_tested
w
All these lead to the microbial ID of E. coli:
-as.mo("UPEC")
-# B_ESCHR_COL
-mo_name("UPEC")
-# "Escherichia coli"
-mo_gramstain("EHEC")
-# "Gram-negative"
as.mo("UPEC")
+# B_ESCHR_COL
+mo_name("UPEC")
+# "Escherichia coli"
+mo_gramstain("EHEC")
+# "Gram-negative"
mo_info()
as an analogy to ab_info()
. The mo_info()
prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
Function mo_synonyms()
to get all previously accepted taxonomic names of a microorganism
septic_patients %>%
- freq(age) %>%
- boxplot()
-# grouped boxplots:
-septic_patients %>%
- group_by(hospital_id) %>%
- freq(age) %>%
- boxplot()
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
-filter_aminoglycosides()
-filter_carbapenems()
-filter_cephalosporins()
-filter_1st_cephalosporins()
-filter_2nd_cephalosporins()
-filter_3rd_cephalosporins()
-filter_4th_cephalosporins()
-filter_fluoroquinolones()
-filter_glycopeptides()
-filter_macrolides()
-filter_tetracyclines()
filter_aminoglycosides()
+filter_carbapenems()
+filter_cephalosporins()
+filter_1st_cephalosporins()
+filter_2nd_cephalosporins()
+filter_3rd_cephalosporins()
+filter_4th_cephalosporins()
+filter_fluoroquinolones()
+filter_glycopeptides()
+filter_macrolides()
+filter_tetracyclines()
The antibiotics
data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set. For example:
All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property -> atc_property()
-ab_name -> atc_name()
-ab_official -> atc_official()
-ab_trivial_nl -> atc_trivial_nl()
-ab_certe -> atc_certe()
-ab_umcg -> atc_umcg()
-ab_tradenames -> atc_tradenames()
ab_property -> atc_property()
+ab_name -> atc_name()
+ab_official -> atc_official()
+ab_trivial_nl -> atc_trivial_nl()
+ab_certe -> atc_certe()
+ab_umcg -> atc_umcg()
+ab_tradenames -> atc_tradenames()
as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functionsdplyr
version 0.8.0as.atc()
internally. The old atc_property
New function age_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
-
+
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering of data sets with antimicrobial results, e.g.:
-
+
is equal to:
-
+
New function availability()
to check the number of available (non-empty) results in a data.frame
@@ -746,33 +786,33 @@ These functions use as.atc()
internally. The old atc_property
-
Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
-
+
-
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
-# equal:
-as.mo(..., allow_uncertain = TRUE)
-as.mo(..., allow_uncertain = 2)
-
-# also equal:
-as.mo(..., allow_uncertain = FALSE)
-as.mo(..., allow_uncertain = 0)
+# equal:
+as.mo(..., allow_uncertain = TRUE)
+as.mo(..., allow_uncertain = 2)
+
+# also equal:
+as.mo(..., allow_uncertain = FALSE)
+as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
- Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
- All microbial IDs that were found are now saved to a local file
~/.Rhistory_mo
. Use the new function clean_mo_history()
to delete this file, which resets the algorithms.
-
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:
-
+
- Fix for vector containing only empty values
- Finds better results when input is in other languages
@@ -818,19 +858,19 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
-# Determine genus of microorganisms (mo) in `septic_patients` data set:
-# OLD WAY
-septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
-# NEW WAY
-septic_patients %>%
- freq(mo_genus(mo))
-
-# Even supports grouping variables:
-septic_patients %>%
- group_by(gender) %>%
- freq(mo_genus(mo))
+# Determine genus of microorganisms (mo) in `septic_patients` data set:
+# OLD WAY
+septic_patients %>%
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
+# NEW WAY
+septic_patients %>%
+ freq(mo_genus(mo))
+
+# Even supports grouping variables:
+septic_patients %>%
+ group_by(gender) %>%
+ freq(mo_genus(mo))
- Header info is now available as a list, with the
header
function
- The parameter
header
is now set to TRUE
by default, even for markdown
@@ -905,10 +945,10 @@ Using as.mo(..., allow_uncertain = 3)Fewer than 3 characters as input for as.mo
will return NA
-
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
-
+
- Added parameter
combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
- Fix for
portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
@@ -921,15 +961,15 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for grouping variables, test with:
-
+
-
Support for (un)selecting columns:
-
+
- Check for
hms::is.hms
@@ -1009,18 +1049,18 @@ Using as.mo(..., allow_uncertain = 3)
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
-# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
-# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
-# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
-# [1] "Streptococcus grupo A"
+mo_gramstain("E. coli")
+# [1] "Gram negative"
+mo_gramstain("E. coli", language = "de") # German
+# [1] "Gramnegativ"
+mo_gramstain("E. coli", language = "es") # Spanish
+# [1] "Gram negativo"
+mo_fullname("S. group A", language = "pt") # Portuguese
+# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
-mo_gramstain("Esc blattae")
-# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
-# [1] "Gram negative"
+mo_gramstain("Esc blattae")
+# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
+# [1] "Gram negative"
Functions count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
@@ -1031,18 +1071,18 @@ Using as.mo(..., allow_uncertain = 3)
-
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
-as.mo("E. coli")
-# [1] B_ESCHR_COL
-as.mo("MRSA")
-# [1] B_STPHY_AUR
-as.mo("S group A")
-# [1] B_STRPTC_GRA
+as.mo("E. coli")
+# [1] B_ESCHR_COL
+as.mo("MRSA")
+# [1] B_STPHY_AUR
+as.mo("S group A")
+# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
-
+
- Added parameter
reference_df
for as.mo
, so users can supply their own microbial IDs, name or codes as a reference table
- Renamed all previous references to
bactid
to mo
, like:
@@ -1070,12 +1110,12 @@ Using as.mo(..., allow_uncertain = 3)Added three antimicrobial agents to the antibiotics
data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
-
Added 163 trade names to the antibiotics
data set, it now contains 298 different trade names in total, e.g.:
-
+
- For
first_isolate
, rows will be ignored when there’s no species available
- Function
ratio
is now deprecated and will be removed in a future release, as it is not really the scope of this package
@@ -1086,13 +1126,13 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for quasiquotation in the functions series count_*
and portions_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
-
+
- Edited
ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
by default, but can be set to count_df
.
- Fix for
ggplot_rsi
when the ggplot2
package was not loaded
@@ -1106,12 +1146,12 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for types (classes) list and matrix for freq
-
+
For lists, subsetting is possible:
-
+
as.mo(..., allow_uncertain = 3)
Contents
mo
) typically looks li
Values that cannot be coerced will be considered 'unknown' and will get the MO code UNKNOWN
.
Use the mo_property_*
functions to get properties based on the returned code, see Examples.
The algorithm uses data from the Catalogue of Life (see below) and from one other source (see microorganisms
).
Self-learning algorithm
-The as.mo()
function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use clear_mo_history()
to reset the algorithms. Only experience from your current AMR
package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge.
Usually, any guess after the first try runs 80-95% faster than the first try.
-This resets with every update of this AMR
package since results are saved to your local package library folder.
Intelligent rules
-The as.mo()
function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
The as.mo()
function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;
Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;
Breakdown of input values to identify possible matches.
This will lead to the effect that e.g. "E. coli"
(a highly prevalent microorganism found in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the as.mo()
function can differentiate four levels of uncertainty to guess valid results:
This will lead to the effect that e.g. "E. coli"
(a highly prevalent microorganism found in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a less prevalent microorganism in humans), although the latter would alphabetically come first.
Coping with uncertain results
+In addition, the as.mo()
function can differentiate four levels of uncertainty to guess valid results:
Uncertainty level 0: no additional rules are applied;
Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
The level of uncertainty can be set using the argument allow_uncertain
. The default is allow_uncertain = TRUE
, which is equal to uncertainty level 2. Using allow_uncertain = FALSE
is equal to uncertainty level 0 and will skip all rules. You can also use e.g. as.mo(..., allow_uncertain = 1)
to only allow up to level 1 uncertainty.
Use mo_failures()
to get a vector with all values that could not be coerced to a valid value.
-Use mo_uncertainties()
to get a data.frame
with all values that were coerced to a valid value, but with uncertainty.
-Use mo_renamed()
to get a data.frame
with all values that could be coerced based on an old, previously accepted taxonomic name.
There are three helper functions that can be run after the as.mo()
function:
Use mo_uncertainties()
to get a data.frame
with all values that were coerced to a valid value, but with uncertainty. The output contains a score that is calculated as (n - 0.5 * L) / n
, where n is the number of characters of the returned full name of the microorganism, and L is the Levenshtein distance between that full name and the user input.
Use mo_failures()
to get a vector with all values that could not be coerced to a valid value.
Use mo_renamed()
to get a data.frame
with all values that could be coerced based on an old, previously accepted taxonomic name.
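The score formula described for mo_uncertainties() can be checked by hand with base R. The following is a minimal sketch, not the package's internal code: the helper name `uncertainty_score` is hypothetical, and computing the Levenshtein distance case-insensitively is an assumption.

```r
# Hypothetical helper illustrating the documented score (n - 0.5 * L) / n.
# Assumption: the Levenshtein distance ignores case.
# Intended for one input string and one full name at a time.
uncertainty_score <- function(input, fullname) {
  # n: number of characters of the returned full name
  n <- nchar(fullname)
  # L: Levenshtein distance between the full name and the user input
  L <- as.vector(utils::adist(input, fullname, ignore.case = TRUE))
  (n - 0.5 * L) / n
}

round(100 * uncertainty_score("Stafylococcus aureus", "Staphylococcus aureus"), 1)
#> [1] 95.2
round(100 * uncertainty_score("staphylokok aureuz", "Staphylococcus aureus"), 1)
#> [1] 85.7
```

Because L is weighted by 0.5, two edits in a 21-character name only lower the score by about 4.8 percentage points.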
Microbial prevalence of pathogens in humans
The intelligent rules consider the prevalence of microorganisms in humans, grouped into three groups, which are available as the prevalence
columns in the microorganisms
and microorganisms.old
data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence.
Group 1 (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is Enterococcus, Staphylococcus or Streptococcus. This group consequently contains all common Gram-negative bacteria, such as Pseudomonas and Legionella, and all species within the order Enterobacterales.
Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is Aspergillus, Bacteroides, Candida, Capnocytophaga, Chryseobacterium, Cryptococcus, Elisabethkingia, Flavobacterium, Fusobacterium, Giardia, Leptotrichia, Mycoplasma, Prevotella, Rhodotorula, Treponema, Trichophyton or Ureaplasma.
Group 3 (least prevalent microorganisms) consists of all other microorganisms.
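The three group definitions above can be expressed as a small lookup. This is only a sketch of the described grouping, assuming the taxonomic properties are available as character vectors; the helper name `prevalence_group` and the example rows are hypothetical and are not taken from the microorganisms data set.

```r
# Hypothetical helper: derive the prevalence group (1, 2 or 3) from
# taxonomic properties, following the group definitions in the text.
prevalence_group <- function(phylum, class, genus) {
  group1_genera <- c("Enterococcus", "Staphylococcus", "Streptococcus")
  group2_phyla <- c("Proteobacteria", "Firmicutes", "Actinobacteria",
                    "Sarcomastigophora")
  group2_genera <- c("Aspergillus", "Bacteroides", "Candida", "Capnocytophaga",
                     "Chryseobacterium", "Cryptococcus", "Elisabethkingia",
                     "Flavobacterium", "Fusobacterium", "Giardia",
                     "Leptotrichia", "Mycoplasma", "Prevotella", "Rhodotorula",
                     "Treponema", "Trichophyton", "Ureaplasma")
  # %in% is used instead of == so that missing values count as "no match"
  in_group1 <- class %in% "Gammaproteobacteria" | genus %in% group1_genera
  in_group2 <- phylum %in% group2_phyla | genus %in% group2_genera
  ifelse(in_group1, 1L, ifelse(in_group2, 2L, 3L))
}

# Illustrative input only (assumed taxonomy, for demonstration)
bugs <- data.frame(
  phylum = c("Proteobacteria", "Firmicutes", "Bacteroidetes", "Chlamydiae"),
  class  = c("Gammaproteobacteria", "Bacilli", "Bacteroidia", "Chlamydiia"),
  genus  = c("Escherichia", "Staphylococcus", "Bacteroides", "Chlamydia"),
  stringsAsFactors = FALSE
)
prevalence_group(bugs$phylum, bugs$class, bugs$genus)
#> [1] 1 1 2 3
```

Group 1 is tested first, so a genus such as Staphylococcus lands in group 1 even though its phylum (Firmicutes) would also satisfy the group 2 definition.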
+Self-learning algorithm
+The as.mo()
function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use clear_mo_history()
to reset the algorithms. Only experience from your current AMR
package version is used. This is because the taxonomic tree (which is included in this package) may change for any organism in a future version, after which the function has to rebuild its knowledge.
Usually, any guess after the first try runs 80-95% faster than the first try.
+This resets with every update of this AMR
package since results are saved to your local package library folder.
mo_property
functions (like
Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, http://eucast.org), see Source. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables.
+To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules are applied by default; see Details.
eucast_rules(x, col_mo = NULL, info = TRUE, rules = c("breakpoints",
@@ -289,6 +291,16 @@
Note: This function does not translate MIC values to RSI values. Use
+as.rsi
for that.
Note: When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance.
Before further processing, some non-EUCAST rules are applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, which are applied to all isolates, are:
Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;
Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;
Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;
Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;
Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;
Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;
Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;
Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.
To not use these rules, please use eucast_rules(..., rules = c("breakpoints", "expert"))
.
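To make the effect of the eight non-EUCAST rules concrete, they can be sketched in base R on a plain data.frame. This is an illustration under stated assumptions, not the implementation used by eucast_rules(): the helper name `apply_other_rules` is hypothetical, "unavailable" is taken to mean `NA` for that isolate, and the rules are assumed to run in the order listed.

```r
# Hypothetical helper: apply the eight listed rules to a data.frame with
# character columns AMP, AMX, AMC, PIP, TZP, TMP and SXT ("S", "I", "R" or NA).
# Assumptions: "unavailable" means NA, and rules run in the listed order.
apply_other_rules <- function(x) {
  # Rules 1-2: inherit between ampicillin and amoxicillin where one is missing
  amx_missing <- is.na(x$AMX)
  amp_missing <- is.na(x$AMP)
  x$AMX[amx_missing] <- x$AMP[amx_missing]
  x$AMP[amp_missing] <- x$AMX[amp_missing]

  # Each pair is c(single drug, combination with an inhibitor or partner drug)
  pairs <- list(c("AMX", "AMC"), c("PIP", "TZP"), c("TMP", "SXT"))

  # Rules 3-5: combination = R implies single drug = R
  for (p in pairs) {
    x[[p[1]]][x[[p[2]]] %in% "R"] <- "R"
  }
  # Rules 6-8: single drug = S implies combination = S
  for (p in pairs) {
    x[[p[2]]][x[[p[1]]] %in% "S"] <- "S"
  }
  x
}

# Made-up isolates for demonstration
isolates <- data.frame(
  AMP = c("S", NA, NA),
  AMX = c(NA, "R", NA),
  AMC = c(NA, NA, "R"),
  PIP = c("S", NA, "I"),
  TZP = c(NA, "R", NA),
  TMP = c(NA, "S", NA),
  SXT = c("R", NA, NA),
  stringsAsFactors = FALSE
)
result <- apply_other_rules(isolates)
result
```

In the first isolate, for example, amoxicillin inherits "S" from ampicillin, which in turn sets amoxicillin/clavulanic acid to "S", while trimethoprim becomes "R" because trimethoprim/sulfamethoxazole is "R".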
The file containing all EUCAST rules is located here: https://gitlab.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv.