diff --git a/DESCRIPTION b/DESCRIPTION index 6635ea05..bce7122e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.8.0.9030 -Date: 2019-11-11 +Version: 0.8.0.9031 +Date: 2019-11-15 Title: Antimicrobial Resistance Analysis Authors@R: c( person(role = c("aut", "cre"), diff --git a/NAMESPACE b/NAMESPACE index d6109758..e44adf47 100755 --- a/NAMESPACE +++ b/NAMESPACE @@ -323,6 +323,7 @@ importFrom(stats,pchisq) importFrom(stats,predict) importFrom(tidyr,pivot_longer) importFrom(tidyr,pivot_wider) +importFrom(utils,adist) importFrom(utils,browseURL) importFrom(utils,menu) importFrom(utils,read.csv) diff --git a/NEWS.md b/NEWS.md index dca3b445..a8d7cf91 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,16 @@ -# AMR 0.8.0.9030 -Last updated: 11-Nov-2019 +# AMR 0.8.0.9031 +Last updated: 15-Nov-2019 + +### Breaking +* Adopted Adeolu *et al.* (2016), [PMID 27620848](https://www.ncbi.nlm.nih.gov/pubmed/27620848) for the `microorganisms` data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like *Morganellaceae* and *Yersiniaceae*). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with `mdro()` will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family. + * If you were dependent on the old Enterobacteriaceae family e.g. by using in your code: + ```r + if (mo_family(somebugs) == "Enterobacteriaceae") ... + ``` + then please adjust this to: + ```r + if (mo_order(somebugs) == "Enterobacterales") ... + ``` ### New * Functions `susceptibility()` and `resistance()` as aliases of `proportion_SI()` and `proportion_R()`, respectively. These functions were added to make it more clear that "I" should be considered susceptible and not resistant. 
@@ -16,11 +27,29 @@ * The new Verbose mode (`mdro(...., verbose = TRUE)`) returns an informative data set where the reason for MDRO determination is given for every isolate, and an list of the resistant antimicrobial agents ### Changes +* Improvements to algorithm in `as.mo()`: + * Now allows "ou" where "au" should have been used and vice versa + * More intelligent way of coping with some consonants like "l" and "r" + * Added a score (a certainty percentage) to `mo_uncertainties()`, that is calculated using the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): + ```r + as.mo(c("Stafylococcus aureus", + "staphylokok aureuz")) + #> Warning: + #> Results of two values was guessed with uncertainty. Use mo_uncertainties() to review them. + #> Class 'mo' + #> [1] B_STPHY_AURS B_STPHY_AURS + + mo_uncertainties() + #> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%) + #> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%) + ``` * Removed previously deprecated function `as.atc()` - this function was replaced by `ab_atc()` * Renamed all `portion_*` functions to `proportion_*`. All `portion_*` functions are still available as deprecated functions, and will return a warning when used. * When running `as.rsi()` over a data set, it will now print the guideline that will be used if it is not specified by the user -* Fix for `eucast_rules()`: *Stenotrophomonas maltophilia* not interpreted "R" to ceftazidime anymore (following EUCAST v3.1) -* Adopted Adeolu *et al.* (2016), [PMID 27620848](https://www.ncbi.nlm.nih.gov/pubmed/27620848) for the `microorganisms` data set, which means that the new order Enterobacterales now consists of a part of the existing family *Enterobacteriaceae*, but that this family has been split into other families as well (like *Morganellaceae* and *Yersiniaceae*). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. 
All MDRO determinations with `mdro()` will now use the Enterobacterales order for all guidelines before 2016. +* Improvements for `eucast_rules()`: + * Fix where *Stenotrophomonas maltophilia* would always become ceftazidime R (following EUCAST v3.1) + * Fix where *Leuconostoc* and *Pediococcus* would not always become glycopeptides R + * Non-EUCAST rules in `eucast_rules()` are now applied first instead of last, so that the official EUCAST rules can build on the values these rules fill in (e.g. ampicillin inherited from amoxicillin). Please see `?eucast_rules`. * Fix for interpreting MIC values with `as.rsi()` where the input is `NA` * Added "imi" and "imp" as allowed abbreviation for Imipenem (IPM) * Fix for automatically determining columns with antibiotic results in `mdro()` and `eucast_rules()` diff --git a/R/eucast_rules.R b/R/eucast_rules.R index 38375369..8d6d4915 100755 --- a/R/eucast_rules.R +++ b/R/eucast_rules.R @@ -24,8 +24,11 @@ EUCAST_VERSION_BREAKPOINTS <- "9.0, 2019" EUCAST_VERSION_EXPERT_RULES <- "3.1, 2016" #' EUCAST rules -#' -#' Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables. +#' +#' @description +#' Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables. +#' +#' To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules are applied by default, see Details. #' @param x data with antibiotic columns, like e.g.
\code{AMX} and \code{AMC} #' @param info print progress #' @param rules a character vector that specifies which rules should be applied - one or more of \code{c("breakpoints", "expert", "other", "all")} @@ -36,6 +39,19 @@ EUCAST_VERSION_EXPERT_RULES <- "3.1, 2016" #' \strong{Note:} This function does not translate MIC values to RSI values. Use \code{\link{as.rsi}} for that. \cr #' \strong{Note:} When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance. #' +#' Before further processing, some non-EUCAST rules are applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, that are applied to all isolates, are: +#' \itemize{ +#' \item{Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;} +#' \item{Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;} +#' \item{Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;} +#' \item{Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;} +#' \item{Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;} +#' \item{Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;} +#' \item{Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;} +#' \item{Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.} +#' } +#' To \emph{not} use these rules, please use \code{eucast_rules(..., rules = c("breakpoints", "expert"))}. +#' #' The file containing all EUCAST rules is located here: \url{https://gitlab.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv}. 
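The non-EUCAST rules can only run first if the rules file keeps its group order after sorting. In this commit, `data-raw/internals.R` achieves this by turning `reference.rule_group` into an ordered factor whose levels follow first appearance, and then calling `dplyr::arrange(eucast_rules_file, reference.rule_group, reference.rule)`. The base-R sketch below reproduces that idea on a hypothetical five-row stand-in for the rules file, using `order()` in place of `arrange()`. It also shows a side effect: within a group, rules end up sorted alphabetically by their description text, not in file order.

```r
# Hypothetical stand-in for the rules file: group labels appear in file order
rules <- data.frame(
  group = c("Other rules", "Other rules", "Breakpoints", "Expert Rules", "Breakpoints"),
  rule  = c("b", "a", "d", "c", "e"),
  stringsAsFactors = FALSE
)

# Plain character sorting would put "Breakpoints" before "Other rules"
sort(unique(rules$group))

# An ordered factor with levels in order of first appearance keeps the file's group order
rules$group <- factor(rules$group, levels = unique(rules$group), ordered = TRUE)

# Sort by group (file order), then by rule text (alphabetical within each group),
# mirroring arrange(reference.rule_group, reference.rule)
sorted <- rules[order(rules$group, rules$rule), ]
sorted
```

The "Other rules" rows now come first, in the order "a", "b". They are followed by "Breakpoints" ("d", "e") and then "Expert Rules" ("c").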
#' #' @section Antibiotics: @@ -516,29 +532,7 @@ eucast_rules <- function(x, as.data.frame(stringsAsFactors = FALSE) ) - if (info == TRUE) { - cat(paste0( - "\nRules by the ", bold("European Committee on Antimicrobial Susceptibility Testing (EUCAST)"), - "\n", blue("http://eucast.org/"), "\n")) - } - - # since ampicillin ^= amoxicillin, get the first from the latter (not in original EUCAST table) - if (!ab_missing(AMP) & !ab_missing(AMX)) { - if (verbose == TRUE) { - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "S" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'S' based on amoxicillin. ") - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "I" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'I' based on amoxicillin. ") - cat("\n VERBOSE: transforming", - length(which(x[, AMX] == "R" & !x[, AMP] %in% c("S", "I", "R"))), - "empty ampicillin fields to 'R' based on amoxicillin. \n") - } - x[which(x[, AMX] == "S" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "S" - x[which(x[, AMX] == "I" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "I" - x[which(x[, AMX] == "R" & !x[, AMP] %in% c("S", "I", "R")), AMP] <- "R" - } else if (ab_missing(AMP) & !ab_missing(AMX)) { + if (ab_missing(AMP) & !ab_missing(AMX)) { # ampicillin column is missing, but amoxicillin is available message(blue(paste0("NOTE: Using column `", bold(AMX), "` as input for ampicillin (J01CA01) since many EUCAST rules depend on it."))) AMP <- AMX @@ -611,6 +605,7 @@ eucast_rules <- function(x, } } + eucast_notification_shown <- FALSE eucast_rules_df <- eucast_rules_file # internal data file no_added <- 0 no_changed <- 0 @@ -648,6 +643,13 @@ eucast_rules <- function(x, next } + if (info == TRUE & !rule_group_current %like% "other" & eucast_notification_shown == FALSE) { + cat(paste0( + "\n----\nRules by the ", bold("European Committee on Antimicrobial Susceptibility Testing (EUCAST)"), + "\n", blue("http://eucast.org/"), "\n")) + eucast_notification_shown 
<- TRUE + } + if (info == TRUE) { # Print rule (group) ------------------------------------------------------ @@ -660,7 +662,7 @@ eucast_rules <- function(x, rule_group_current %like% "expert" ~ paste0("\nEUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v", EUCAST_VERSION_EXPERT_RULES, ")\n"), TRUE ~ - "\nOther rules\n" + "\nOther rules by this AMR package\n" ) )) } @@ -707,6 +709,7 @@ eucast_rules <- function(x, } if (like_is_one_of == "is") { + # so 'Enterococcus' will turn into '^Enterococcus$' mo_value <- paste0("^", eucast_rules_df[i, 3], "$") } else if (like_is_one_of == "one_of") { # so 'Clostridium, Actinomyces, ...' will turn into '^(Clostridium|Actinomyces|...)$' @@ -717,7 +720,7 @@ eucast_rules <- function(x, } else if (like_is_one_of == "like") { mo_value <- eucast_rules_df[i, 3] } else { - stop("invalid like_is_one_of", call. = FALSE) + stop("invalid value for column 'like.is.one_of'", call. = FALSE) } source_antibiotics <- eucast_rules_df[i, 4] diff --git a/R/mo.R b/R/mo.R index 5d64def4..bad263d4 100755 --- a/R/mo.R +++ b/R/mo.R @@ -59,15 +59,6 @@ #' #' The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{\link{microorganisms}}). #' -#' \strong{Self-learning algoritm} \cr -#' The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge. -#' -#' Usually, any guess after the first try runs 80-95\% faster than the first try. -#' -# \emph{For now, learning only works per session. If R is closed or terminated, the algorithms reset. 
This might be resolved in a future version.} -#' This resets with every update of this \code{AMR} package since results are saved to your local package library folder. -#' -#' \strong{Intelligent rules} \cr #' The \code{as.mo()} function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order: #' \itemize{ @@ -76,7 +67,10 @@ #' \item{Breakdown of input values to identify possible matches.} #' } #' -#' This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: +#' This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. +#' +#' \strong{Coping with uncertain results} \cr +#' In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: #' #' \itemize{ #' \item{Uncertainty level 0: no additional rules are applied;} @@ -95,9 +89,12 @@ #' #' The level of uncertainty can be set using the argument \code{allow_uncertain}. The default is \code{allow_uncertain = TRUE}, which is equal to uncertainty level 2. Using \code{allow_uncertain = FALSE} is equal to uncertainty level 0 and will skip all rules. You can also use e.g. \code{as.mo(..., allow_uncertain = 1)} to only allow up to level 1 uncertainty. #' -#' Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value. 
\cr -#' Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. \cr -#' Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name. +#' There are three helper functions that can be run after the \code{as.mo()} function: +#' \itemize{ +#' \item{Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. The output contains a score, which is calculated as \code{(n - 0.5 * L) / n}, where \emph{n} is the number of characters of the returned full name of the microorganism, and \emph{L} is the case-insensitive \href{https://en.wikipedia.org/wiki/Levenshtein_distance}{Levenshtein distance} between that full name and the user input. Since \emph{L} is capped at \emph{n}, the score ranges from 50\% to 100\%.} +#' \item{Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value.} +#' \item{Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name.} +#' } #' #' \strong{Microbial prevalence of pathogens in humans} \cr #' The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the \code{prevalence} columns in the \code{\link{microorganisms}} and \code{\link{microorganisms.old}} data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence.
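The sketch below illustrates the score formula as standalone code. It re-implements the logic of the `levenshtein_fraction()` helper added in this commit under a hypothetical name, `certainty_score()`, and uses `utils::adist()` for the Levenshtein distance. With the two inputs from the NEWS example, it reproduces the reported 95.2% and 85.7%.

```r
# Sketch of the mo_uncertainties() score: (n - 0.5 * L) / n
# certainty_score() is a hypothetical name, not an exported AMR function
certainty_score <- function(input, fullname) {
  L <- mapply(function(i, f) {
    # case-insensitive Levenshtein distance, capped at the length of the full name
    min(utils::adist(i, f, ignore.case = TRUE)[1, 1], nchar(f))
  }, input, fullname, USE.NAMES = FALSE)
  n <- nchar(fullname)
  (n - 0.5 * L) / n
}

scores <- certainty_score(c("Stafylococcus aureus", "staphylokok aureuz"),
                          rep("Staphylococcus aureus", 2))
round(100 * scores, 1)
#> [1] 95.2 85.7

# A completely unrelated input hits the cap (L = n), so the score bottoms out at 0.5
certainty_score(strrep("q", 50), "Staphylococcus aureus")
#> [1] 0.5
```

For the first input, two edits are needed ("f" becomes "ph"), so L = 2 and n = 21, giving (21 - 1) / 21 ≈ 0.952.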
@@ -107,6 +104,14 @@ #' Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is \emph{Aspergillus}, \emph{Bacteroides}, \emph{Candida}, \emph{Capnocytophaga}, \emph{Chryseobacterium}, \emph{Cryptococcus}, \emph{Elisabethkingia}, \emph{Flavobacterium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Leptotrichia}, \emph{Mycoplasma}, \emph{Prevotella}, \emph{Rhodotorula}, \emph{Treponema}, \emph{Trichophyton} or \emph{Ureaplasma}. #' #' Group 3 (least prevalent microorganisms) consists of all other microorganisms. +#' +#' \strong{Self-learning algorithm} \cr +#' The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge. +#' +#' Usually, any guess after the first try runs 80-95\% faster than the first try. +#' +# \emph{For now, learning only works per session. If R is closed or terminated, the algorithms reset. This might be resolved in a future version.} +#' This resets with every update of this \code{AMR} package since results are saved to your local package library folder. #' @inheritSection catalogue_of_life Catalogue of Life # (source as a section here, so it can be inherited by other man pages:) #' @section Source: @@ -134,7 +139,7 @@ #' as.mo("S aureus") #' as.mo("Staphylococcus aureus") #' as.mo("Staphylococcus aureus (MRSA)") -#' as.mo("Sthafilokkockus aaureuz") # handles incorrect spelling +#' as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling #' as.mo("MRSA") # Methicillin Resistant S. aureus #' as.mo("VISA") # Vancomycin Intermediate S. 
aureus #' as.mo("VRSA") # Vancomycin Resistant S. aureus @@ -287,7 +292,7 @@ exec_as.mo <- function(x, disable_mo_history = FALSE, debug = FALSE, reference_data_to_use = microorganismsDT) { - + if (!"AMR" %in% base::.packages()) { require("AMR") # check onLoad() in R/zzz.R: data tables are created there. @@ -518,7 +523,7 @@ exec_as.mo <- function(x, x <- gsub("(alpha|beta|gamma).?ha?emoly", "\\1-haemoly", x) # remove genus as first word x <- gsub("^genus ", "", x) - # remove 'uncertain' like texts + # remove 'uncertain'-like texts x <- trimws(gsub("(uncertain|susp[ie]c[a-z]+|verdacht)", "", x)) # allow characters that resemble others = dyslexia_mode ---- if (dyslexia_mode == TRUE) { @@ -539,13 +544,19 @@ exec_as.mo <- function(x, x <- gsub("e+", "e+", x) x <- gsub("o+", "o+", x) x <- gsub("(.)\\1+", "\\1+", x) + # allow multiplication of all other consonants + x <- gsub("([bdghjlnrw]+)", "\\1+", x) # allow ending in -en or -us x <- gsub("e\\+n(?![a-z[])", "(e+n|u+(c|k|q|qu|s|z|x|ks)+)", x, perl = TRUE) - # if the input is longer than 10 characters, allow any constant between all characters, as some might have forgotten a character + # if the input is longer than 10 characters, allow any forgotten consonant between all characters, as some might just have forgotten one... # this will allow "Pasteurella damatis" to be correctly read as "Pasteurella dagmatis". 
- constants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "") - - x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", constants, "]?"), x[nchar(x_backup_without_spp) > 10]) + consonants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "") + x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", consonants, "]?"), x[nchar(x_backup_without_spp) > 10]) + # allow au and ou after all these regex implementations + x <- gsub("a+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE) + x <- gsub("o+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE) + # make sure to remove regex overkill (will lead to errors) + x <- gsub("++", "+", x, fixed = TRUE) } x <- strip_whitespace(x, dyslexia_mode) @@ -578,7 +589,7 @@ exec_as.mo <- function(x, } progress <- progress_estimated(n = length(x), min_time = 3) - + for (i in seq_len(length(x))) { progress$tick()$print() @@ -834,8 +845,8 @@ exec_as.mo <- function(x, next } # streptococcal groups: milleri and viridans - if (x_trimmed[i] %like_case% "strepto.* milleri" - | x_backup_without_spp[i] %like_case% "strepto.* milleri" + if (x_trimmed[i] %like_case% "strepto.* mil+er+i" + | x_backup_without_spp[i] %like_case% "strepto.* mil+er+i" | x_backup_without_spp[i] %like_case% "mgs[^a-z]?$") { # Milleri Group Streptococcus (MGS) x[i] <- microorganismsDT[mo == "B_STRPT_MILL", ..property][[1]][1L] @@ -1863,6 +1874,7 @@ mo_uncertainties <- function() { #' @exportMethod print.mo_uncertainties #' @importFrom crayon green yellow red white black bgGreen bgYellow bgRed +#' @importFrom cleaner percentage #' @export #' @noRd print.mo_uncertainties <- function(x, ...) { @@ -1890,7 +1902,9 @@ print.mo_uncertainties <- function(x, ...) 
{ paste0(colour2(paste0(" [", x[i, "uncertainty"], "] ")), ' "', x[i, "input"], '" -> ', colour1(paste0(italic(x[i, "fullname"]), ifelse(!is.na(x[i, "renamed_to"]), paste(", renamed to", italic(x[i, "renamed_to"])), ""), - " (", x[i, "mo"], ")"))), + " (", x[i, "mo"], + ", score: ", percentage(levenshtein_fraction(x[i, "input"], x[i, "fullname"]), digits = 1), + ")"))), sep = "\n") } cat(msg) @@ -1977,3 +1991,15 @@ load_mo_failures_uncertainties_renamed <- function(metadata) { options("mo_uncertainties" = metadata$uncertainties) options("mo_renamed" = metadata$renamed) } + +#' @importFrom utils adist +levenshtein_fraction <- function(input, output) { + levenshtein <- double(length = length(input)) + for (i in seq_len(length(input))) { + # determine Levenshtein distance, capped at nchar of output + levenshtein[i] <- base::min(base::as.double(adist(input[i], output[i], ignore.case = TRUE)), + base::nchar(output[i])) + } + # self-made score between 0.5 and 1 (for % certainty; the distance is capped at nchar(output), so 0.5 means maximal distance and 1 means no distance) + (base::nchar(output) - 0.5 * levenshtein) / nchar(output) +} diff --git a/R/sysdata.rda b/R/sysdata.rda index c658d3ee..2fadec0f 100644 Binary files a/R/sysdata.rda and b/R/sysdata.rda differ diff --git a/R/zzz.R b/R/zzz.R index 03233e24..f0323055 100755 --- a/R/zzz.R +++ b/R/zzz.R @@ -47,15 +47,21 @@ # maybe add survey later: "https://www.surveymonkey.com/r/AMR_for_R" #' @importFrom data.table as.data.table setkey +#' @importFrom dplyr %>% mutate case_when make_DT <- function() { microorganismsDT <- as.data.table(AMR::microorganisms %>% mutate(kingdom_index = case_when(kingdom == "Bacteria" ~ 1, kingdom == "Fungi" ~ 2, kingdom == "Protozoa" ~ 3, kingdom == "Archaea" ~ 4, - TRUE ~ 6))) - # for fullname_lower: keep only dots, letters, numbers, slashes, spaces and dashes - microorganismsDT$fullname_lower <- gsub("[^.a-z0-9/ \\-]+", "", tolower(microorganismsDT$fullname)) + TRUE ~ 99), + # for fullname_lower: keep only dots, letters, + # numbers,
slashes, spaces and dashes + fullname_lower = gsub("[^.a-z0-9/ \\-]+", "", + # use this paste instead of `fullname` to + # work with Viridans Group Streptococci, etc. + tolower(trimws(paste(genus, species, subspecies)))))) + # so arrange data on prevalence first, then kingdom, then full name setkey(microorganismsDT, prevalence, kingdom_index, diff --git a/data-raw/eucast_rules.tsv b/data-raw/eucast_rules.tsv index cac99db2..70110bb1 100644 --- a/data-raw/eucast_rules.tsv +++ b/data-raw/eucast_rules.tsv @@ -9,6 +9,19 @@ # >>>>> IF YOU WANT TO IMPORT THIS FILE INTO YOUR OWN SOFTWARE, HAVE THE FIRST 10 LINES SKIPPED <<<<< # ------------------------------------------------------------------------------------------------------------------------------- if_mo_property like.is.one_of this_value and_these_antibiotics have_these_values then_change_these_antibiotics to_value reference.rule reference.rule_group +genus like .* AMP S AMX S Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMP I AMX I Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMP R AMX R Non-EUCAST: inherit ampicillin results for unavailable amoxicillin Other rules +genus like .* AMX S AMP S Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMX I AMP I Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMX R AMP R Non-EUCAST: inherit amoxicillin results for unavailable ampicillin Other rules +genus like .* AMC R AMP, AMX R Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R Other rules +genus like .* TZP R PIP R Non-EUCAST: set piperacillin = R where piperacillin/tazobactam = R Other rules +genus like .* SXT R TMP R Non-EUCAST: set trimethoprim = R where trimethoprim/sulfa = R Other rules +genus like .* AMP S AMC S Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S Other rules +genus like .* AMX S AMC S 
Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S Other rules +genus like .* PIP S TZP S Non-EUCAST: set piperacillin/tazobactam = S where piperacillin = S Other rules +genus like .* TMP S SXT S Non-EUCAST: set trimethoprim/sulfa = S where trimethoprim = S Other rules order is Enterobacterales AMP S AMX S Enterobacterales (Order) Breakpoints order is Enterobacterales AMP I AMX I Enterobacterales (Order) Breakpoints order is Enterobacterales AMP R AMX R Enterobacterales (Order) Breakpoints @@ -53,7 +66,7 @@ genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallol genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallolyticus|gordonii|infantarius|infantis|mitis|mutans|oligofermentans|oralis|peroris|pseudopneumoniae|salivarius|sinensis|sobrinus|thermophilus|vestibularis|anginosus|equinus|intermedius|parasanguinis|sanguinis)$ AMP I AMX, AMC, PIP, TZP I Viridans group streptococci Breakpoints genus_species like ^Streptococcus (australis|bovis|constellatus|cristatus|gallolyticus|gordonii|infantarius|infantis|mitis|mutans|oligofermentans|oralis|peroris|pseudopneumoniae|salivarius|sinensis|sobrinus|thermophilus|vestibularis|anginosus|equinus|intermedius|parasanguinis|sanguinis)$ AMP R AMX, AMC, PIP, TZP R Viridans group streptococci Breakpoints genus_species is Haemophilus influenzae AMP S AMX, PIP S Haemophilus influenzae Breakpoints -genus_species is ^Haemophilus influenzae AMP I AMX, PIP I Haemophilus influenzae Breakpoints +genus_species is Haemophilus influenzae AMP I AMX, PIP I Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae AMP R AMX, PIP R Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae PEN S AMP, AMX, AMC, PIP, TZP S Haemophilus influenzae Breakpoints genus_species is Haemophilus influenzae AMC S TZP S Haemophilus influenzae Breakpoints @@ -164,7 +177,7 @@ genus_species is Enterococcus casseliflavus FUS, CAZ, cephalosporins_without_C genus_species is 
Enterococcus faecium FUS, CAZ, cephalosporins_without_CAZ, aminoglycosides, macrolides, TMP, SXT R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus is Corynebacterium FOS R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Listeria monocytogenes cephalosporins R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules -genus is Leuconostoc, Pediococcus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules +genus one_of Leuconostoc, Pediococcus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus is Lactobacillus glycopeptides R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Clostridium ramosum VAN R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules genus_species is Clostridium innocuum VAN R Table 04: Intrinsic resistance in Gram-positive bacteria Expert Rules @@ -172,9 +185,9 @@ genus_species like ^Streptococcus (pyogenes|agalactiae|dysgalactiae|group A|grou genus is Enterococcus AMP R ureidopenicillins, carbapenems R Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci Expert Rules genus is Enterococcus AMX R ureidopenicillins, carbapenems R Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci Expert Rules family is Enterobacteriaceae TIC, PIP R, S PIP R Table 09: Interpretive rules for B-lactam agents and Gram-negative rods Expert Rules -genus is .* ERY S AZM, CLR S Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules -genus is .* ERY I AZM, CLR I Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules -genus is .* ERY R AZM, CLR R Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules +genus like .* ERY S AZM, CLR S Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules 
+genus like .* ERY I AZM, CLR I Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules +genus like .* ERY R AZM, CLR R Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins Expert Rules genus is Staphylococcus TOB R KAN, AMK R Table 12: Interpretive rules for aminoglycosides Expert Rules genus is Staphylococcus GEN R aminoglycosides R Table 12: Interpretive rules for aminoglycosides Expert Rules order is Enterobacterales GEN, TOB I, S GEN R Table 12: Interpretive rules for aminoglycosides Expert Rules @@ -183,10 +196,3 @@ genus is Staphylococcus MFX R fluoroquinolones R Table 13: Interpretive rules fo genus_species is Streptococcus pneumoniae MFX R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules order is Enterobacterales CIP R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules genus_species is Neisseria gonorrhoeae CIP R fluoroquinolones R Table 13: Interpretive rules for quinolones Expert Rules -genus is .* AMC R AMP, AMX R Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R Other rules -genus is .* TZP R PIP R Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R Other rules -genus is .* SXT R TMP R Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R Other rules -genus is .* AMP S AMC S Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S Other rules -genus is .* AMX S AMC S Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S Other rules -genus is .* PIP S TZP S Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S Other rules -genus is .* TMP S SXT S Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S Other rules diff --git a/data-raw/internals.R b/data-raw/internals.R index 9ad2c2a8..555e7dc9 100644 --- a/data-raw/internals.R +++ b/data-raw/internals.R @@ -2,14 +2,18 @@ # source("data-raw/internals.R") # See 'data-raw/eucast_rules.tsv' for the EUCAST reference file -eucast_rules_file <- 
dplyr::arrange( - .data = utils::read.delim(file = "data-raw/eucast_rules.tsv", +eucast_rules_file <- utils::read.delim(file = "data-raw/eucast_rules.tsv", skip = 10, sep = "\t", stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE, - na = c(NA, "", NULL)), + na = c(NA, "", NULL)) +# take the order of the reference.rule_group column in the original data file +eucast_rules_file$reference.rule_group <- factor(eucast_rules_file$reference.rule_group, + levels = unique(eucast_rules_file$reference.rule_group), + ordered = TRUE) +eucast_rules_file <- dplyr::arrange(eucast_rules_file, reference.rule_group, reference.rule) diff --git a/data/example_isolates.rda b/data/example_isolates.rda index 82568530..15c315bc 100644 Binary files a/data/example_isolates.rda and b/data/example_isolates.rda differ diff --git a/docs/404.html b/docs/404.html index 5927d9a0..7bdaaa2a 100644 --- a/docs/404.html +++ b/docs/404.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index ba6220da..858b9650 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index c6d9730e..412273a1 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -41,7 +41,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 @@ -187,7 +187,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-11 November 2019
+15 November 2019

@@ -196,7 +196,7 @@
-Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 11 November 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 15 November 2019.

Introduction

@@ -212,21 +212,21 @@ -2019-11-11 +2019-11-15 abcd Escherichia coli S S -2019-11-11 +2019-11-15 abcd Escherichia coli S R -2019-11-11 +2019-11-15 efgh Escherichia coli R @@ -321,64 +321,64 @@ -2011-09-25 -O7 -Hospital C -Staphylococcus aureus -S -S -S -S -F - - -2012-04-04 -O9 -Hospital A -Escherichia coli -S -S -S -S -F - - -2015-03-11 -S3 -Hospital A -Escherichia coli -R -S -S -S -F - - -2014-12-11 -G1 +2015-09-02 +V3 Hospital B Escherichia coli S S S S +F + + +2017-06-27 +H4 +Hospital C +Escherichia coli +R +R +S +S M -2013-01-02 -J8 -Hospital D +2012-05-31 +X3 +Hospital B +Streptococcus pneumoniae +I +S +S +S +F + + +2013-06-08 +K2 +Hospital B +Staphylococcus aureus +I +S +S +S +M + + +2012-02-10 +M5 +Hospital A Escherichia coli S S S -R +S M -2014-08-17 -S8 -Hospital C +2010-06-25 +N7 +Hospital D Escherichia coli S S @@ -406,8 +406,8 @@ # # Item Count Percent Cum. Count Cum. Percent # --- ----- ------- -------- ----------- ------------- -# 1 M 10,427 52.14% 10,427 52.14% -# 2 F 9,573 47.87% 20,000 100.00% +# 1 M 10,309 51.55% 10,309 51.55% +# 2 F 9,691 48.46% 20,000 100.00%

So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M and F. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

data <- data %>%
@@ -419,60 +419,62 @@
 

Because the amoxicillin (column AMX) and amoxicillin/clavulanic acid (column AMC) results in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules() function fixes this:

data <- eucast_rules(data, col_mo = "bacteria")
 # 
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-# http://eucast.org/
-# 
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Aerococcus sanguinicola (no changes)
-# Aerococcus urinae (no changes)
-# Anaerobic Gram-negatives (no changes)
-# Anaerobic Gram-positives (no changes)
-# Campylobacter coli (no changes)
-# Campylobacter jejuni (no changes)
-# Enterobacterales (Order) (no changes)
-# Enterococcus (no changes)
-# Haemophilus influenzae (no changes)
-# Kingella kingae (no changes)
-# Moraxella catarrhalis (no changes)
-# Pasteurella multocida (no changes)
-# Staphylococcus (no changes)
-# Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (1,552 values changed)
-# Viridans group streptococci (no changes)
-# 
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,279 values changed)
-# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
-# Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,800 values changed)
-# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
-# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
-# Table 12: Interpretive rules for aminoglycosides (no changes)
-# Table 13: Interpretive rules for quinolones (no changes)
+# Other rules by this AMR package
+# Non-EUCAST: inherit amoxicillin results for unavailable ampicillin (no changes)
+# Non-EUCAST: inherit ampicillin results for unavailable amoxicillin (no changes)
+# Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S (3,022 values changed)
+# Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R (151 values changed)
+# Non-EUCAST: set piperacillin = R where piperacillin/tazobactam = R (no changes)
+# Non-EUCAST: set piperacillin/tazobactam = S where piperacillin = S (no changes)
+# Non-EUCAST: set trimethoprim = R where trimethoprim/sulfa = R (no changes)
+# Non-EUCAST: set trimethoprim/sulfa = S where trimethoprim = S (no changes)
+# 
+# ----
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# http://eucast.org/
+# 
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Aerococcus sanguinicola (no changes)
+# Aerococcus urinae (no changes)
+# Anaerobic Gram-negatives (no changes)
+# Anaerobic Gram-positives (no changes)
+# Campylobacter coli (no changes)
+# Campylobacter jejuni (no changes)
+# Enterobacterales (Order) (no changes)
+# Enterococcus (no changes)
+# Haemophilus influenzae (no changes)
+# Kingella kingae (no changes)
+# Moraxella catarrhalis (no changes)
+# Pasteurella multocida (no changes)
+# Staphylococcus (no changes)
+# Streptococcus groups A, B, C, G (no changes)
+# Streptococcus pneumoniae (1,071 values changed)
+# Viridans group streptococci (no changes)
 # 
-# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,257 values changed)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (132 values changed)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
-# 
-# --------------------------------------------------------------------------
-# EUCAST rules affected 6,599 out of 20,000 rows, making a total of 8,020 edits
-# => added 0 test results
-# 
-# => changed 8,020 test results
-#    - 119 test results changed from S to I
-#    - 4,832 test results changed from S to R
-#    - 1,096 test results changed from I to S
-#    - 342 test results changed from I to R
-#    - 1,607 test results changed from R to S
-#    - 24 test results changed from R to I
-# --------------------------------------------------------------------------
-# 
-# Use eucast_rules(..., verbose = TRUE) (on your original data) to get a data.frame with all specified edits instead.
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016) +# Table 01: Intrinsic resistance in Enterobacteriaceae (1,282 values changed) +# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes) +# Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes) +# Table 04: Intrinsic resistance in Gram-positive bacteria (2,783 values changed) +# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes) +# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes) +# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes) +# Table 12: Interpretive rules for aminoglycosides (no changes) +# Table 13: Interpretive rules for quinolones (no changes) +# +# -------------------------------------------------------------------------- +# EUCAST rules affected 6,586 out of 20,000 rows, making a total of 8,309 edits +# => added 0 test results +# +# => changed 8,309 test results +# - 129 test results changed from S to I +# - 4,834 test results changed from S to R +# - 1,222 test results changed from I to S +# - 324 test results changed from I to R +# - 1,800 test results changed from R to S +# -------------------------------------------------------------------------- +# +# Use eucast_rules(..., verbose = TRUE) (on your original data) to get a data.frame with all specified edits instead.

@@ -497,8 +499,8 @@ # NOTE: Using column `bacteria` as input for `col_mo`. # NOTE: Using column `date` as input for `col_date`. # NOTE: Using column `patient_id` as input for `col_patient_id`. -# => Found 5,657 first isolates (28.3% of total)

-

So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+# => Found 5,688 first isolates (28.4% of total) +

So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

data_1st <- data %>% 
   filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

@@ -508,7 +510,7 @@

First weighted isolates

-

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient D2, sorted on date:

+

We made a slight twist to the CLSI algorithm to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient I9, sorted on date:

@@ -524,19 +526,19 @@ - - + + + + - - - - + + @@ -546,30 +548,30 @@ - - + + + + + - - - - - + + + - - + - - + + @@ -579,30 +581,30 @@ - - + + - - + + - - + + - - + + - - + + @@ -612,23 +614,23 @@ - - + + - + - - + + - + @@ -645,7 +647,7 @@ # NOTE: Using column `patient_id` as input for `col_patient_id`.# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.# [Criterion] Inclusion based on key antibiotics, ignoring I -# => Found 15,009 first weighted isolates (75.0% of total) +# => Found 15,051 first weighted isolates (75.3% of total)
isolate
12010-02-14D22010-02-08I9 B_ESCHR_COLISS R SSS TRUE
22010-04-27D22010-03-05I9 B_ESCHR_COLI S S
32010-05-31D22010-05-14I9 B_ESCHR_COLISSS RSSS FALSE
42010-08-21D22010-12-10I9 B_ESCHR_COLIR SSSR S FALSE
52010-09-21D22010-12-17I9 B_ESCHR_COLI S S
62010-10-04D22011-04-18I9 B_ESCHR_COLIR S S SFALSESTRUE
72010-10-11D22011-04-25I9 B_ESCHR_COLISS R SSS FALSE
82010-11-16D22011-06-06I9 B_ESCHR_COLI S S
92011-03-05D22011-07-14I9 B_ESCHR_COLI S S S STRUEFALSE
102011-04-18D22011-07-31I9 B_ESCHR_COLI S SSR S FALSE
@@ -662,20 +664,20 @@ - - + + + + - - - - + + @@ -686,68 +688,68 @@ - - + + + + + - - - - - + + + - - + - - + + - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + - - + + @@ -758,35 +760,35 @@ - - + + - - + + - - + + - + - +
isolate
12010-02-14D22010-02-08I9 B_ESCHR_COLISS R SSS TRUE TRUE
22010-04-27D22010-03-05I9 B_ESCHR_COLI S S
32010-05-31D22010-05-14I9 B_ESCHR_COLISSS RSSS FALSE TRUE
42010-08-21D22010-12-10I9 B_ESCHR_COLIR SSSR S FALSE TRUE
52010-09-21D22010-12-17I9 B_ESCHR_COLI S S S S FALSEFALSE
62010-10-04D2B_ESCHR_COLIRSSSFALSE TRUE
72010-10-11D2
62011-04-18I9 B_ESCHR_COLI S SSSTRUETRUE
72011-04-25I9B_ESCHR_COLI R SSS FALSE TRUE
82010-11-16D22011-06-06I9 B_ESCHR_COLI S S
92011-03-05D22011-07-14I9 B_ESCHR_COLI S S S STRUETRUEFALSEFALSE
102011-04-18D22011-07-31I9 B_ESCHR_COLI S SSR S FALSEFALSETRUE
-

Instead of 2, now 8 isolates are flagged. In total, 75.0% of all isolates are marked ‘first weighted’ - 46.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 2, now 9 isolates are flagged. In total, 75.3% of all isolates are marked ‘first weighted’, 46.8 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
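The log line "[Criterion] Inclusion based on key antibiotics, ignoring I" suggests that the extra isolates are picked up by comparing key-antibiotic profiles. The sketch below shows one way that comparison could work. `key_ab_differs()` is a hypothetical helper, not an AMR function. Two things are assumptions of this sketch: that profiles are encoded as one character per antibiotic, and that any S/R difference counts. The real `first_isolate()` may apply further criteria.

```r
# Hypothetical helper, not an AMR function: do two key-antibiotic profiles
# (one character per antibiotic, e.g. "SSRS") differ, ignoring any "I"?
key_ab_differs <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  # Only compare positions where neither profile has an "I"
  comparable <- x != "I" & y != "I"
  any(x[comparable] != y[comparable])
}

key_ab_differs("SSRS", "SSSS")  # TRUE: one antibiotic changed from S to R
key_ab_differs("SIRS", "SSRS")  # FALSE: the only difference involves an I
```

Under these assumptions, a later isolate of the same patient and species would count as an extra "first weighted" isolate when `key_ab_differs()` returns `TRUE` against an earlier isolate.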

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

data_1st <- data %>% 
   filter_first_weighted_isolate()
-

So we end up with 15,009 isolates for analysis.

+

So we end up with 15,051 isolates for analysis.

We can remove unneeded columns:

data_1st <- data_1st %>% 
   select(-c(first, keyab))
@@ -811,45 +813,13 @@ -1 -2011-09-25 -O7 +2 +2017-06-27 +H4 Hospital C -B_STPHY_AURS -S -S -S -S -F -Gram-positive -Staphylococcus -aureus -TRUE - - -3 -2015-03-11 -S3 -Hospital A B_ESCHR_COLI R -S -S -S -F -Gram-negative -Escherichia -coli -TRUE - - -4 -2014-12-11 -G1 -Hospital B -B_ESCHR_COLI -S -S +R S S M @@ -859,15 +829,47 @@ TRUE +3 +2012-05-31 +X3 +Hospital B +B_STRPT_PNMN +I +I +S +R +F +Gram-positive +Streptococcus +pneumoniae +TRUE + + +4 +2013-06-08 +K2 +Hospital B +B_STPHY_AURS +I +S +S +S +M +Gram-positive +Staphylococcus +aureus +TRUE + + 5 -2013-01-02 -J8 -Hospital D +2012-02-10 +M5 +Hospital A B_ESCHR_COLI S S S -R +S M Gram-negative Escherichia @@ -876,34 +878,34 @@ 7 -2013-08-06 -H4 -Hospital B -B_ESCHR_COLI -R -I +2015-10-03 +B7 +Hospital A +B_KLBSL_PNMN R S +S +R M Gram-negative -Escherichia -coli +Klebsiella +pneumoniae TRUE 9 -2016-10-03 -F3 -Hospital D -B_ESCHR_COLI +2016-04-27 +O7 +Hospital B +B_STRPT_PNMN +S S S R -S -M -Gram-negative -Escherichia -coli +F +Gram-positive +Streptococcus +pneumoniae TRUE @@ -925,7 +927,7 @@
data_1st %>% freq(genus, species)

Frequency table

Class: character
-Length: 15,009 (of which NA: 0 = 0%)
+Length: 15,051 (of which NA: 0 = 0%)
Unique: 4

Shortest: 16
Longest: 24

@@ -942,33 +944,33 @@ Longest: 24

1 Escherichia coli -7,411 -49.38% -7,411 -49.38% +7,443 +49.45% +7,443 +49.45% 2 Staphylococcus aureus -3,707 -24.70% -11,118 -74.08% +3,689 +24.51% +11,132 +73.96% 3 Streptococcus pneumoniae -2,318 -15.44% -13,436 -89.52% +2,398 +15.93% +13,530 +89.89% 4 Klebsiella pneumoniae -1,573 -10.48% -15,009 +1,521 +10.11% +15,051 100.00% @@ -980,7 +982,7 @@ Longest: 24

The functions resistance() and susceptibility() can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S(), proportion_SI(), proportion_I(), proportion_IR() and proportion_R() can be used to determine the proportion of a specific antimicrobial outcome.

As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R(), equal to resistance()) and susceptibility as the proportion of S and I (proportion_SI(), equal to susceptibility()). These functions can be used on their own:

data_1st %>% resistance(AMX)
-# [1] 0.4684523
+# [1] 0.4655505
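To make the definitions above concrete, here is the arithmetic in base R on a small made-up vector of results. This sketch excludes missing results from the denominator, which is an assumption on our part. The package functions may handle edge cases differently.

```r
# Made-up interpretations for one antibiotic; NA means "not tested"
results <- c("S", "I", "R", "R", "S", NA)
tested <- results[!is.na(results)]

prop_R  <- mean(tested == "R")             # the quantity resistance() reports
prop_SI <- mean(tested %in% c("S", "I"))   # the quantity susceptibility() reports
c(resistance = prop_R, susceptibility = prop_SI)
```

I is counted as susceptible. As a result, the two proportions always add up to 1 when only S, I and R occur.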

These functions can also be used in conjunction with group_by() and summarise(), both from the dplyr package:

data_1st %>% 
   group_by(hospital) %>% 
@@ -993,19 +995,19 @@ Longest: 24

Hospital A -0.4640823 +0.4651671 Hospital B -0.4663609 +0.4687618 Hospital C -0.4736130 +0.4569626 Hospital D -0.4749499 +0.4668462 @@ -1023,23 +1025,23 @@ Longest: 24

Hospital A -0.4640823 -4566 +0.4651671 +4579 Hospital B -0.4663609 -5232 +0.4687618 +5282 Hospital C -0.4736130 -2217 +0.4569626 +2219 Hospital D -0.4749499 -2994 +0.4668462 +2971 @@ -1059,27 +1061,27 @@ Longest: 24

Escherichia -0.9211982 -0.8896235 -0.9929834 +0.9183125 +0.9039366 +0.9934166 Klebsiella -0.8239034 -0.8804832 -0.9809282 +0.9119001 +0.8954635 +0.9960552 Staphylococcus -0.9188023 -0.9209603 -0.9932560 +0.9262673 +0.9227433 +0.9951206 Streptococcus -0.5974978 +0.6092577 0.0000000 -0.5974978 +0.6092577 diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index a6172ff6..1c0a9842 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index 4e4dfe54..8c3a0329 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 3d384a83..a3a91508 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index cbcb6264..d929dd38 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html index 2863556e..f467342e 100644 --- a/docs/articles/SPSS.html +++ b/docs/articles/SPSS.html @@ -41,7 +41,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031
@@ -187,7 +187,7 @@

How to import data from SPSS / SAS / Stata

Matthijs S. Berends

-

11 November 2019

+

15 November 2019

diff --git a/docs/articles/index.html b/docs/articles/index.html index 2270e9a5..aedec170 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/authors.html b/docs/authors.html index 881f4b86..b2ef68f7 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/index.html b/docs/index.html index 6686dfef..01f2d259 100644 --- a/docs/index.html +++ b/docs/index.html @@ -45,7 +45,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/news/index.html b/docs/news/index.html index 63b55043..33f7bacf 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 @@ -231,23 +231,39 @@ -
+

-AMR 0.8.0.9030 Unreleased +AMR 0.8.0.9031 Unreleased

-

Last updated: 11-Nov-2019

+

Last updated: 15-Nov-2019

+
+

+Breaking

+
    +
  • Adopted Adeolu et al. (2016), PMID 27620848 for the microorganisms data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like Morganellaceae and Yersiniaceae). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with mdro() will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family. +
      +
    • +

      If you were dependent on the old Enterobacteriaceae family e.g. by using in your code:

      +
      if (mo_family(somebugs) == "Enterobacteriaceae") ...
      +

      then please adjust this to:

      +
      if (mo_order(somebugs) == "Enterobacterales") ...
      +
    • +
    +
  • +
+

New

@@ -1345,7 +1385,7 @@ Using as.mo(..., allow_uncertain = 3)

Contents

@@ -305,18 +305,15 @@ A microorganism ID from this package (class: mo) typically looks li

Values that cannot be coerced will be considered 'unknown' and will get the MO code UNKNOWN.

Use the mo_property_* functions to get properties based on the returned code, see Examples.

The algorithm uses data from the Catalogue of Life (see below) and from one other source (see microorganisms).

-

Self-learning algoritm
-The as.mo() function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use clear_mo_history() to reset the algorithms. Only experience from your current AMR package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge.

-

Usually, any guess after the first try runs 80-95% faster than the first try.

-

This resets with every update of this AMR package since results are saved to your local package library folder.

-

Intelligent rules
-The as.mo() function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:

    +

    The as.mo() function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:

    • Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;

    • Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;

    • Breakdown of input values to identify possible matches.

    -

    This will lead to the effect that e.g. "E. coli" (a highly prevalent microorganism found in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the as.mo() function can differentiate four levels of uncertainty to guess valid results:

    +

    As a result, e.g. "E. coli" (a highly prevalent microorganism found in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a less prevalent microorganism in humans), although the latter would come first alphabetically.
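The "E. coli" example can be illustrated with a toy matcher that prefers the more prevalent candidate over alphabetical order. `match_abbreviation()` and the two-row candidate table are hypothetical. The prevalence values are assumptions made for illustration. This is not how `as.mo()` is implemented internally.

```r
# Made-up candidate list; prevalence groups (1 = most prevalent) are assumed
candidates <- data.frame(
  fullname   = c("Entamoeba coli", "Escherichia coli"),
  prevalence = c(3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical matcher, not the as.mo() implementation: match an abbreviation
# like "E. coli" on genus initial + species, then prefer the most prevalent hit
match_abbreviation <- function(input, candidates) {
  parts <- strsplit(tolower(gsub(".", "", input, fixed = TRUE)), " +")[[1]]
  words <- strsplit(tolower(candidates$fullname), " ", fixed = TRUE)
  hit <- vapply(words,
                function(w) startsWith(w[1], parts[1]) && w[2] == parts[2],
                logical(1))
  hits <- candidates[hit, ]
  # Sort on prevalence first, then alphabetically
  hits$fullname[order(hits$prevalence, hits$fullname)][1]
}

match_abbreviation("E. coli", candidates)  # "Escherichia coli"
```

Both candidates match the abbreviation. Sorting on prevalence before name is what makes *Escherichia coli* win.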

    +

    Coping with uncertain results
    +In addition, the as.mo() function can differentiate four levels of uncertainty to guess valid results:

    • Uncertainty level 0: no additional rules are applied;

    • Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;

    • @@ -332,14 +329,21 @@ The as.mo() function uses several coercion rules for fast and logic

    The level of uncertainty can be set using the argument allow_uncertain. The default is allow_uncertain = TRUE, which is equal to uncertainty level 2. Using allow_uncertain = FALSE is equal to uncertainty level 0 and will skip all rules. You can also use e.g. as.mo(..., allow_uncertain = 1) to only allow up to level 1 uncertainty.
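The mapping from `allow_uncertain` to an uncertainty level described above can be written as a small helper. `uncertainty_level()` is hypothetical and not exported by AMR. It only mirrors the documented behaviour: `TRUE` means level 2, `FALSE` means level 0, and a number from 0 to 3 is used as is.

```r
# Hypothetical helper (not exported by AMR) mirroring the documented mapping
uncertainty_level <- function(allow_uncertain) {
  if (isTRUE(allow_uncertain)) return(2L)
  if (identical(allow_uncertain, FALSE)) return(0L)
  level <- as.integer(allow_uncertain)
  stopifnot(length(level) == 1, level %in% 0:3)
  level
}

uncertainty_level(TRUE)   # 2
uncertainty_level(FALSE)  # 0
uncertainty_level(1)      # 1
```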

    -

    Use mo_failures() to get a vector with all values that could not be coerced to a valid value.
    -Use mo_uncertainties() to get a data.frame with all values that were coerced to a valid value, but with uncertainty.
    -Use mo_renamed() to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name.

    +

    There are three helper functions that can be run after the as.mo() function:

      +
    • Use mo_uncertainties() to get a data.frame with all values that were coerced to a valid value, but with uncertainty. The output contains a score that is calculated as (n - 0.5 * L) / n, where n is the number of characters in the returned full name of the microorganism and L is the Levenshtein distance between that full name and the user input.

    • +
    • Use mo_failures() to get a vector with all values that could not be coerced to a valid value.

    • +
    • Use mo_renamed() to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name.

    • +
    +
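The score formula can be reproduced with base R's `utils::adist()`, which this diff now imports in `NAMESPACE`. `mo_score()` is a hypothetical helper. Ignoring case is an assumption: we chose it because it reproduces the 95.2% and 85.7% scores shown in `NEWS.md`.

```r
# Hypothetical helper (not an AMR function): (n - 0.5 * L) / n, where n is the
# length of the returned full name and L the Levenshtein distance to the input
mo_score <- function(input, fullname) {
  L <- mapply(function(x, y) utils::adist(x, y, ignore.case = TRUE)[1, 1],
              input, fullname, USE.NAMES = FALSE)
  n <- nchar(fullname)
  (n - 0.5 * L) / n
}

scores <- mo_score(c("Stafylococcus aureus", "staphylokok aureuz"),
                   "Staphylococcus aureus")
round(100 * scores, 1)  # 95.2 85.7
```

For the first input, L = 2 ("f" becomes "ph") and n = 21, so the score is 20 / 21.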

    Microbial prevalence of pathogens in humans
    The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the prevalence columns in the microorganisms and microorganisms.old data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence.

    Group 1 (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is Enterococcus, Staphylococcus or Streptococcus. This group consequently contains all common Gram-negative bacteria, such as Pseudomonas and Legionella, and all species within the order Enterobacterales.

    Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is Aspergillus, Bacteroides, Candida, Capnocytophaga, Chryseobacterium, Cryptococcus, Elizabethkingia, Flavobacterium, Fusobacterium, Giardia, Leptotrichia, Mycoplasma, Prevotella, Rhodotorula, Treponema, Trichophyton or Ureaplasma.

    Group 3 (least prevalent microorganisms) consists of all other microorganisms.
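The three grouping rules can be expressed as a vectorised function. `prevalence_group()` is hypothetical and is not how the `prevalence` column is actually built. The example taxa and their ranks are chosen for illustration only.

```r
# Hypothetical helper (not an AMR function) encoding the grouping rules above
prevalence_group <- function(tax_class, phylum, genus) {
  group1_genera <- c("Enterococcus", "Staphylococcus", "Streptococcus")
  group2_phyla  <- c("Proteobacteria", "Firmicutes", "Actinobacteria",
                     "Sarcomastigophora")
  group2_genera <- c("Aspergillus", "Bacteroides", "Candida", "Capnocytophaga",
                     "Chryseobacterium", "Cryptococcus", "Elizabethkingia",
                     "Flavobacterium", "Fusobacterium", "Giardia", "Leptotrichia",
                     "Mycoplasma", "Prevotella", "Rhodotorula", "Treponema",
                     "Trichophyton", "Ureaplasma")
  # Group 1 is checked first, then group 2; everything else is group 3
  ifelse(tax_class %in% "Gammaproteobacteria" | genus %in% group1_genera, 1L,
         ifelse(phylum %in% group2_phyla | genus %in% group2_genera, 2L, 3L))
}

groups <- prevalence_group(
  tax_class = c("Gammaproteobacteria", "Bacilli", "Saccharomycetes", "Sordariomycetes"),
  phylum    = c("Proteobacteria", "Firmicutes", "Ascomycota", "Ascomycota"),
  genus     = c("Escherichia", "Bacillus", "Candida", "Fusarium")
)
groups  # 1 2 2 3
```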

    +

    Self-learning algorithm
    +The as.mo() function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use clear_mo_history() to reset the algorithms. Only experience from your current AMR package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge.

    +

    Usually, any guess after the first try runs 80-95% faster than the first try.

    +

    This resets with every update of this AMR package since results are saved to your local package library folder.
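The self-learning behaviour amounts to reusing results for input that was seen before. Below is a minimal sketch of that idea. It uses an in-memory cache rather than a file in the package library folder, and `make_cached()` and `slow_lookup()` are hypothetical names. Clearing such a cache would play roughly the role that `clear_mo_history()` plays for `as.mo()`.

```r
# Hypothetical sketch, not the as.mo() implementation: remember earlier
# results in memory so that repeated input skips the slow lookup
make_cached <- function(lookup) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    vapply(x, function(key) {
      if (!exists(key, envir = cache, inherits = FALSE)) {
        assign(key, lookup(key), envir = cache)
      }
      get(key, envir = cache, inherits = FALSE)
    }, character(1), USE.NAMES = FALSE)
  }
}

n_lookups <- 0
slow_lookup <- function(x) {
  n_lookups <<- n_lookups + 1  # count how often the expensive step runs
  toupper(x)                   # stand-in for a real taxonomic search
}

cached_lookup <- make_cached(slow_lookup)
result <- cached_lookup(c("e. coli", "e. coli", "s. aureus"))
n_lookups  # 2: the second "e. coli" came from the cache
```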

    Source

    @@ -376,7 +380,7 @@ The mo_property functions (like as.mo("S aureus") as.mo("Staphylococcus aureus") as.mo("Staphylococcus aureus (MRSA)") -as.mo("Sthafilokkockus aaureuz") # handles incorrect spelling +as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling as.mo("MRSA") # Methicillin Resistant S. aureus as.mo("VISA") # Vancomycin Intermediate S. aureus as.mo("VRSA") # Vancomycin Resistant S. aureus diff --git a/docs/reference/eucast_rules.html b/docs/reference/eucast_rules.html index 14727388..5372f500 100644 --- a/docs/reference/eucast_rules.html +++ b/docs/reference/eucast_rules.html @@ -51,7 +51,8 @@ - + @@ -85,7 +86,7 @@ AMR (for R) - 0.8.0.9027 + 0.8.0.9031
@@ -235,6 +236,7 @@

Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, http://eucast.org), see Source. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables.

+

To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules are applied by default, see Details.

eucast_rules(x, col_mo = NULL, info = TRUE, rules = c("breakpoints",
@@ -289,6 +291,16 @@
 
     

Note: This function does not translate MIC values to RSI values. Use as.rsi for that.
Note: When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance.

+

Before further processing, some non-EUCAST rules are applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, which are applied to all isolates, are:

    +
  • Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;

  • +
  • Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;

  • +
  • Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;

  • +
  • Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;

  • +
  • Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;

  • +
  • Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;

  • +
  • Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;

  • +
  • Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.

  • +

To skip these rules, use eucast_rules(..., rules = c("breakpoints", "expert")).
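Here is a minimal base-R sketch of the non-EUCAST rules listed above, applied in the listed order. `apply_non_eucast_rules()` is a hypothetical name, and this is not the `eucast_rules()` implementation. The sketch assumes one character column per antibiotic code. It also assumes that "unavailable" means a missing (`NA`) value in that row.

```r
# Hypothetical sketch, not the eucast_rules() implementation. Assumes one
# character column per antibiotic code and that "unavailable" means NA.
apply_non_eucast_rules <- function(df) {
  inherit <- function(target, source) ifelse(is.na(target), source, target)
  df$AMX <- inherit(df$AMX, df$AMP)  # AMX from AMP where AMX is missing
  df$AMP <- inherit(df$AMP, df$AMX)  # AMP from AMX where AMP is missing
  df$AMX[df$AMC %in% "R"] <- "R"     # AMX = R where AMC = R
  df$PIP[df$TZP %in% "R"] <- "R"     # PIP = R where TZP = R
  df$TMP[df$SXT %in% "R"] <- "R"     # TMP = R where SXT = R
  df$AMC[df$AMX %in% "S"] <- "S"     # AMC = S where AMX = S
  df$TZP[df$PIP %in% "S"] <- "S"     # TZP = S where PIP = S
  df$SXT[df$TMP %in% "S"] <- "S"     # SXT = S where TMP = S
  df
}

isolates <- data.frame(
  AMP = c("S", NA),  AMX = c(NA, "R"),  AMC = c("I", "R"),
  PIP = c("S", "S"), TZP = c(NA, "R"),
  TMP = c(NA, "S"),  SXT = c("R", "I"),
  stringsAsFactors = FALSE
)
out <- apply_non_eucast_rules(isolates)
out
```

In this example, the first row ends up S for every beta-lactam. Its TMP becomes R because SXT is R.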

The file containing all EUCAST rules is located here: https://gitlab.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv.

Antibiotics

diff --git a/docs/reference/index.html b/docs/reference/index.html index 82a46449..12a663db 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -84,7 +84,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/docs/reference/resistance_predict.html b/docs/reference/resistance_predict.html index 568c6a2d..0d1cc210 100644 --- a/docs/reference/resistance_predict.html +++ b/docs/reference/resistance_predict.html @@ -85,7 +85,7 @@ AMR (for R) - 0.8.0.9030 + 0.8.0.9031 diff --git a/man/as.mo.Rd b/man/as.mo.Rd index 81328b86..85ffb00e 100644 --- a/man/as.mo.Rd +++ b/man/as.mo.Rd @@ -70,14 +70,6 @@ Use the \code{\link{mo_property}_*} functions to get properties based on the ret The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{\link{microorganisms}}). -\strong{Self-learning algoritm} \cr -The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithms. Only experience from your current \code{AMR} package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge. - -Usually, any guess after the first try runs 80-95\% faster than the first try. - -This resets with every update of this \code{AMR} package since results are saved to your local package library folder. - -\strong{Intelligent rules} \cr The \code{as.mo()} function uses several coercion rules for fast and logical results. 
It assesses the input matching criteria in the following order: \itemize{ \item{Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;} @@ -85,7 +77,10 @@ The \code{as.mo()} function uses several coercion rules for fast and logical res \item{Breakdown of input values to identify possible matches.} } -This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: +This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. + +\strong{Coping with uncertain results} \cr +In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: \itemize{ \item{Uncertainty level 0: no additional rules are applied;} @@ -104,9 +99,12 @@ This leads to e.g.: The level of uncertainty can be set using the argument \code{allow_uncertain}. The default is \code{allow_uncertain = TRUE}, which is equal to uncertainty level 2. Using \code{allow_uncertain = FALSE} is equal to uncertainty level 0 and will skip all rules. You can also use e.g. \code{as.mo(..., allow_uncertain = 1)} to only allow up to level 1 uncertainty. -Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value. \cr -Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. 
\cr -Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name. +There are three helper functions that can be run after then \code{as.mo()} function: +\itemize{ + \item{Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. The output contains a score, that is calculated as \code{(n - 0.5 * L) / n}, where \emph{n} is the number of characters of the returned full name of the microorganism, and \emph{L} is the \href{https://en.wikipedia.org/wiki/Levenshtein_distance}{Levenshtein distance} between that full name and the user input.} + \item{Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value.} + \item{Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name.} +} \strong{Microbial prevalence of pathogens in humans} \cr The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the \code{prevalence} columns in the \code{\link{microorganisms}} and \code{\link{microorganisms.old}} data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence. 
@@ -116,6 +114,13 @@ Group 1 (most prevalent microorganisms) consists of all microorganisms where the
 Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is \emph{Aspergillus}, \emph{Bacteroides}, \emph{Candida}, \emph{Capnocytophaga}, \emph{Chryseobacterium}, \emph{Cryptococcus}, \emph{Elisabethkingia}, \emph{Flavobacterium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Leptotrichia}, \emph{Mycoplasma}, \emph{Prevotella}, \emph{Rhodotorula}, \emph{Treponema}, \emph{Trichophyton} or \emph{Ureaplasma}.
 
 Group 3 (least prevalent microorganisms) consists of all other microorganisms.
+
+\strong{Self-learning algorithm} \cr
+The \code{as.mo()} function learns from previously determined microorganism IDs, which drastically improves both its speed and reliability. Use \code{clear_mo_history()} to reset the algorithm. Only experience from your current \code{AMR} package version is used, because the taxonomic tree included in this package may change for any organism in a future version, in which case the algorithm has to rebuild its knowledge.
+
+Usually, any guess after the first try runs 80-95\% faster than the first try.
+
+The learned history resets with every update of the \code{AMR} package, since results are saved to your local package library folder.
 }
 
 \section{Source}{
@@ -152,7 +157,7 @@ as.mo("S. aureus")
 as.mo("S aureus")
 as.mo("Staphylococcus aureus")
 as.mo("Staphylococcus aureus (MRSA)")
-as.mo("Sthafilokkockus aaureuz") # handles incorrect spelling
+as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling
 as.mo("MRSA") # Methicillin Resistant S. aureus
 as.mo("VISA") # Vancomycin Intermediate S. aureus
 as.mo("VRSA") # Vancomycin Resistant S. aureus
diff --git a/man/eucast_rules.Rd b/man/eucast_rules.Rd
index 97a982fd..24b6e799 100644
--- a/man/eucast_rules.Rd
+++ b/man/eucast_rules.Rd
@@ -42,12 +42,27 @@ eucast_rules(x, col_mo = NULL, info = TRUE, rules = c("breakpoints",
 The input of \code{x}, possibly with edited values of antibiotics. Or, if \code{verbose = TRUE}, a \code{data.frame} with all original and new values of the affected bug-drug combinations.
 }
 \description{
-Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables.
+Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, \url{http://eucast.org}), see \emph{Source}. This includes (1) expert rules, (2) intrinsic resistance and (3) inferred resistance as defined in their breakpoint tables.
+
+To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules are applied by default, see Details.
 }
 \details{
 \strong{Note:} This function does not translate MIC values to RSI values. Use \code{\link{as.rsi}} for that. \cr
 \strong{Note:} When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance.
 
+Before further processing, some non-EUCAST rules are applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, which are applied to all isolates, are:
+\itemize{
+  \item{Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;}
+  \item{Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;}
+  \item{Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;}
+  \item{Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;}
+  \item{Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;}
+  \item{Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;}
+  \item{Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;}
+  \item{Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.}
+}
+To \emph{not} use these rules, please use \code{eucast_rules(..., rules = c("breakpoints", "expert"))}.
+
 The file containing all EUCAST rules is located here: \url{https://gitlab.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv}.
 }
 \section{Antibiotics}{
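The eight non-EUCAST rules listed in the `eucast_rules()` details are simple column-wise updates. The following is a minimal sketch on a plain `data.frame` of `"S"`/`"I"`/`"R"` character values. `apply_non_eucast_sketch()` and its inner `set_where()` are hypothetical names, not the `eucast_rules()` implementation, and treating `NA` as "unavailable" is an assumption.

```r
# Hypothetical sketch of the non-EUCAST pre-processing rules, NOT the
# implementation used by AMR::eucast_rules(). Assumes character columns
# named AMP, AMX, AMC, PIP, TZP, TMP, SXT and treats NA as "unavailable".
apply_non_eucast_sketch <- function(df) {
  # Set `target` to `new_value` in rows where `source` equals `if_value`;
  # does nothing when either column is absent
  set_where <- function(df, target, source, if_value, new_value) {
    if (all(c(target, source) %in% names(df))) {
      hit <- !is.na(df[[source]]) & df[[source]] == if_value
      df[[target]][hit] <- new_value
    }
    df
  }
  # Inherit AMX from AMP and AMP from AMX where one of them is NA
  if (all(c("AMP", "AMX") %in% names(df))) {
    amx_na <- is.na(df$AMX)
    df$AMX[amx_na] <- df$AMP[amx_na]
    amp_na <- is.na(df$AMP)
    df$AMP[amp_na] <- df$AMX[amp_na]
  }
  # R to the combination implies R to the single agent
  df <- set_where(df, "AMX", "AMC", "R", "R")
  df <- set_where(df, "PIP", "TZP", "R", "R")
  df <- set_where(df, "TMP", "SXT", "R", "R")
  # S to the single agent implies S to the combination
  df <- set_where(df, "AMC", "AMX", "S", "S")
  df <- set_where(df, "TZP", "PIP", "S", "S")
  df <- set_where(df, "SXT", "TMP", "S", "S")
  df
}

isolates <- data.frame(
  AMP = c("S", NA,  "R"),
  AMX = c(NA,  "R", "R"),
  AMC = c(NA,  NA,  "R"),
  stringsAsFactors = FALSE
)
apply_non_eucast_sketch(isolates)
```

Rule order matters in this sketch: the R-propagation steps run before the S-propagation steps, so an isolate with AMC = R and AMX = S ends up with AMX = R rather than AMC = S.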