diff --git a/DESCRIPTION b/DESCRIPTION index 2c786620..5cf924b4 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.7.1.9034 -Date: 2019-08-09 +Version: 0.7.1.9035 +Date: 2019-08-11 Title: Antimicrobial Resistance Analysis Authors@R: c( person(role = c("aut", "cre"), diff --git a/NEWS.md b/NEWS.md index 53104f81..59a8373a 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,8 +1,12 @@ -# AMR 0.7.1.9034 +# AMR 0.7.1.9035 ### Breaking * Function `freq()` has moved to a new package, [`clean`](https://github.com/msberends/clean) ([CRAN link](https://cran.r-project.org/package=clean)). Creating frequency tables is actually not the scope of this package (never was) and this function has matured a lot over the last two years. Therefore, a new package was created for data cleaning and checking and it perfectly fits the `freq()` function. The [`clean`](https://github.com/msberends/clean) package is available on CRAN and will be installed automatically when updating the `AMR` package, that now imports it. In a later stage, the `skewness()` and `kurtosis()` functions will be moved to the `clean` package too. -* Determination of first isolates now **excludes** all 'unknown' microorganisms at default, i.e. microbial code `"UNKNOWN"`. They can be included with the new parameter `include_unknown`: `first_isolates(..., include_unknown = TRUE)`. For WHONET users, this means that all records with organism code `"con"` (*contamination*) will be excluded at default, since `as.mo("con") = "UNKNOWN"`. +* Determination of first isolates now **excludes** all 'unknown' microorganisms at default, i.e. microbial code `"UNKNOWN"`. They can be included with the new parameter `include_unknown`: + ```r + first_isolate(..., include_unknown = TRUE) + ``` + For WHONET users, this means that all records with organism code `"con"` (*contamination*) will be excluded at default, since `as.mo("con") = "UNKNOWN"`. The function always shows a note with the number of 'unknown' microorganisms that were included or excluded. ### New * Additional way to calculate co-resistance, i.e. when using multiple antibiotics as input for `portion_*` functions or `count_*` functions. This can be used to determine the empiric susceptibily of a combination therapy. A new parameter `only_all_tested` (**which defaults to `FALSE`**) replaces the old `also_single_tested` and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the `portion` and `count` help pages), where the %SI is being determined: @@ -44,7 +48,7 @@ * Using factors as input now adds missing factors levels when the function changes antibiotic results * Added tibble printing support for classes `rsi`, `mic`, `ab` and `mo`. When using tibbles containing antibiotic columns, values `S` will print in green, values `I` will print in yellow and values `R` will print in red: ```r - (run this on your own console, as this page does not support colour printing) + # (run this on your own console, as this page does not support colour printing) tibble(mo = sample(AMR::microorganisms$fullname, 10), drug1 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE, prob = c(0.6, 0.1, 0.3))), @@ -65,7 +69,7 @@ * Fix for using `mo_*` functions where the coercion uncertainties and failures would not be available through `mo_uncertainties()` and `mo_failures()` anymore * Deprecated the `country` parameter of `mdro()` in favour of the already existing `guideline` parameter to support multiple guidelines within one country * The `name` of `RIF` is now Rifampicin instead of Rifampin -* The `antibiotics` data set is now sorted by name +* The `antibiotics` data set is now sorted by name and all cephalosporines now have their generation between brackets * Speed improvement for `guess_ab_col()` which is now 30 times faster for antibiotic abbreviations #### Other diff --git a/R/ab.R b/R/ab.R index 19936a30..3fbc5929 100755 --- a/R/ab.R +++ b/R/ab.R @@ -233,7 +233,7 @@ as.ab <- function(x) { } if (length(x_unknown) > 0) { - warning("These values could not be coerced to a valid antibiotic ID: ", + warning("These values could not be coerced to a valid antimicrobial ID: ", paste('"', sort(unique(x_unknown)), '"', sep = "", collapse = ', '), ".", call. = FALSE) diff --git a/R/deprecated.R b/R/deprecated.R index 1096aede..7f75e9e5 100755 --- a/R/deprecated.R +++ b/R/deprecated.R @@ -29,6 +29,5 @@ #' @rdname AMR-deprecated as.atc <- function(x) { .Deprecated("ab_atc", package = "AMR") - ab_atc(x) + AMR::ab_atc(x) } - diff --git a/R/eucast_rules.R b/R/eucast_rules.R index ac553d3d..13b3283d 100755 --- a/R/eucast_rules.R +++ b/R/eucast_rules.R @@ -669,7 +669,7 @@ eucast_rules <- function(x, suppressWarnings( all_staph <- AMR::microorganisms %>% filter(genus == "Staphylococcus") %>% - mutate(CNS_CPS = mo_fullname(mo, Becker = "all")) + mutate(CNS_CPS = mo_name(mo, Becker = "all")) ) if (eucast_rules_df[i, 3] %like% "coagulase-") { eucast_rules_df[i, 3] <- paste0("^(", diff --git a/R/get_locale.R b/R/get_locale.R index b5316b24..a7248214 100755 --- a/R/get_locale.R +++ b/R/get_locale.R @@ -21,14 +21,16 @@ #' Translate strings from AMR package #' -#' For language-dependent output of AMR functions, like \code{\link{mo_fullname}} and \code{\link{mo_type}}. +#' For language-dependent output of AMR functions, like \code{\link{mo_name}}, \code{\link{mo_type}} and \code{\link{ab_name}}. #' @details Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: \url{https://gitlab.com/msberends/AMR/blob/master/data-raw/translations.tsv}. #' +#' Currently supported languages can be found if running: \code{unique(AMR:::translations_file$lang)}. +#' #' Please suggest your own translations \href{https://gitlab.com/msberends/AMR/issues/new?issue[title]=Translation\%20suggestion}{by creating a new issue on our repository}. #' #' This file will be read by all functions where a translated output can be desired, like all \code{\link{mo_property}} functions (\code{\link{mo_fullname}}, \code{\link{mo_type}}, etc.). #' -#' The system language will be used at default, if supported, using \code{\link{get_locale}}. The system language can be overwritten with \code{\link{getOption}("AMR_locale")}. +#' The system language will be used at default, if that language is supported. The system language can be overwritten with \code{\link{getOption}("AMR_locale")}. #' @inheritSection AMR Read more on our website! #' @rdname translate #' @name translate @@ -39,27 +41,27 @@ #' # with get_locale() #' #' # English -#' mo_fullname("CoNS", language = "en") +#' mo_name("CoNS", language = "en") #' #> "Coagulase-negative Staphylococcus (CoNS)" #' #' # German -#' mo_fullname("CoNS", language = "de") +#' mo_name("CoNS", language = "de") #' #> "Koagulase-negative Staphylococcus (KNS)" #' #' # Dutch -#' mo_fullname("CoNS", language = "nl") +#' mo_name("CoNS", language = "nl") #' #> "Coagulase-negatieve Staphylococcus (CNS)" #' #' # Spanish -#' mo_fullname("CoNS", language = "es") +#' mo_name("CoNS", language = "es") #' #> "Staphylococcus coagulasa negativo (SCN)" #' #' # Italian -#' mo_fullname("CoNS", language = "it") +#' mo_name("CoNS", language = "it") #' #> "Staphylococcus negativo coagulasi (CoNS)" #' #' # Portuguese -#' mo_fullname("CoNS", language = "pt") +#' mo_name("CoNS", language = "pt") #' #> "Staphylococcus coagulase negativo (CoNS)" get_locale <- function() { if (getOption("AMR_locale", "en") != "en") { diff --git a/R/like.R b/R/like.R index b7d6c158..2ab07bdc 100755 --- a/R/like.R +++ b/R/like.R @@ -28,7 +28,7 @@ #' @rdname like #' @export #' @details Using RStudio? This function can also be inserted from the Addins menu and can have its own Keyboard Shortcut like Ctrl+Shift+L or Cmd+Shift+L (see Tools > Modify Keyboard Shortcuts...). -#' @source Idea from the \href{https://github.com/Rdatatable/data.table/blob/master/R/like.R}{\code{like} function from the \code{data.table} package}, but made it case insensitive at default and let it support multiple patterns. +#' @source Idea from the \href{https://github.com/Rdatatable/data.table/blob/master/R/like.R}{\code{like} function from the \code{data.table} package}, but made it case insensitive at default and let it support multiple patterns. Also, if the regex fails the first time, it tries again with \code{perl = TRUE}. #' @seealso \code{\link[base]{grep}} #' @inheritSection AMR Read more on our website! #' @examples diff --git a/R/misc.R b/R/misc.R index 7040eb00..d9fca3bc 100755 --- a/R/misc.R +++ b/R/misc.R @@ -29,48 +29,6 @@ addin_insert_like <- function() { rstudioapi::insertText(" %like% ") } -# No export, no Rd -# works exactly like round(), but rounds `round(44.55, 1)` as 44.6 instead of 44.5 -# and adds decimal zeroes until `digits` is reached when force_zero = TRUE -round2 <- function(x, digits = 0, force_zero = TRUE) { - # https://stackoverflow.com/a/12688836/4575331 - val <- (trunc((abs(x) * 10 ^ digits) + 0.5) / 10 ^ digits) * sign(x) - if (digits > 0 & force_zero == TRUE) { - val[val != as.integer(val)] <- paste0(val[val != as.integer(val)], - strrep("0", max(0, digits - nchar(gsub(".*[.](.*)$", "\\1", val[val != as.integer(val)]))))) - } - val -} - -# Coefficient of variation (CV) -cv <- function(x, na.rm = TRUE) { - stats::sd(x, na.rm = na.rm) / base::abs(base::mean(x, na.rm = na.rm)) -} - -# Coefficient of dispersion, or coefficient of quartile variation (CQV). -# (Bonett et al., 2006: Confidence interval for a coefficient of quartile variation). -cqv <- function(x, na.rm = TRUE) { - fives <- stats::fivenum(x, na.rm = na.rm) - (fives[4] - fives[2]) / (fives[4] + fives[2]) -} - -# show bytes as kB/MB/GB -# size_humanreadable(123456) # 121 kB -# size_humanreadable(12345678) # 11.8 MB -size_humanreadable <- function(bytes, decimals = 1) { - bytes <- bytes %>% as.double() - # Adapted from: - # http://jeffreysambells.com/2012/10/25/human-readable-filesize-php - size <- c('B','kB','MB','GB','TB','PB','EB','ZB','YB') - factor <- floor((nchar(bytes) - 1) / 3) - # added slight improvement; no decimals for B and kB: - decimals <- rep(decimals, length(bytes)) - decimals[size[factor + 1] %in% c('B', 'kB')] <- 0 - - out <- paste(sprintf(paste0("%.", decimals, "f"), bytes / (1024 ^ factor)), size[factor + 1]) - out -} - percent_clean <- clean:::percent # No export, no Rd percent <- function(x, round = 1, force_zero = FALSE, decimal.mark = getOption("OutDec"), big.mark = ",", ...) { diff --git a/R/mo.R b/R/mo.R index 2f9f7405..ed3307f0 100755 --- a/R/mo.R +++ b/R/mo.R @@ -477,7 +477,8 @@ exec_as.mo <- function(x, # translate to English for supported languages of mo_property x <- gsub("(gruppe|groep|grupo|gruppo|groupe)", "group", x, ignore.case = TRUE) x <- gsub("(hefe|gist|gisten|levadura|lievito|fermento|levure)[a-z]*", "yeast", x, ignore.case = TRUE) - x <- gsub("(schimmels?|mofo|molde|stampo|moisissure)[a-z]*", "fungus", x, ignore.case = TRUE) + x <- gsub("(schimmels?|mofo|molde|stampo|moisissure|fungi)[a-z]*", "fungus", x, ignore.case = TRUE) + x <- gsub("Fungus[ph|f]rya", "Fungiphrya", x, ignore.case = TRUE) # remove non-text in case of "E. coli" except dots and spaces x <- gsub("[^.a-zA-Z0-9/ \\-]+", "", x) # replace minus by a space @@ -1216,7 +1217,7 @@ exec_as.mo <- function(x, } return(found[1L]) } - if (b.x_trimmed %like% "fungus") { + if (b.x_trimmed %like% "(fungus|fungi)" & !b.x_trimmed %like% "Fungiphrya") { found <- "F_FUNGUS" found_result <- found found <- microorganismsDT[mo == found, ..property][[1]] diff --git a/R/mo_property.R b/R/mo_property.R index 0ef59ed4..bd3e1d27 100755 --- a/R/mo_property.R +++ b/R/mo_property.R @@ -318,7 +318,7 @@ mo_synonyms <- function(x, ...) { } }) if (length(syns) > 1) { - names(syns) <- mo_fullname(x) + names(syns) <- mo_name(x) result <- syns } else { result <- unlist(syns) @@ -340,7 +340,7 @@ mo_info <- function(x, language = get_locale(), ...) { url = unname(mo_url(y, open = FALSE)), ref = mo_ref(y)))) if (length(info) > 1) { - names(info) <- mo_fullname(x) + names(info) <- mo_name(x) result <- info } else { result <- info[[1L]] @@ -368,7 +368,7 @@ mo_url <- function(x, open = FALSE, ...) { NA_character_)) u <- df$url - names(u) <- AMR::mo_fullname(mo) + names(u) <- AMR::mo_name(mo) if (open == TRUE) { if (length(u) > 1) { warning("only the first URL will be opened, as `browseURL()` only suports one string.") diff --git a/R/rsi.R b/R/rsi.R index b772db61..0274ee29 100755 --- a/R/rsi.R +++ b/R/rsi.R @@ -25,14 +25,14 @@ #' @rdname as.rsi #' @param x vector of values (for class \code{mic}: an MIC value in mg/L, for class \code{disk}: a disk diffusion radius in millimeters) #' @param mo a microorganism code, generated with \code{\link{as.mo}} -#' @param ab an antibiotic code, generated with \code{\link{as.ab}} +#' @param ab an antimicrobial code, generated with \code{\link{as.ab}} #' @inheritParams first_isolate #' @param guideline defaults to the latest included EUCAST guideline, run \code{unique(AMR::rsi_translation$guideline)} for all options -#' @param threshold maximum fraction of \code{x} that is allowed to fail transformation, see Examples +#' @param threshold maximum fraction of invalid antimicrobial interpretations of \code{x}, see Examples #' @param ... parameters passed on to methods #' @details Run \code{unique(AMR::rsi_translation$guideline)} for a list of all supported guidelines. #' -#' After using \code{as.rsi}, you can use \code{\link{eucast_rules}} to (1) apply inferred susceptibility and resistance based on results of other antibiotics and (2) apply intrinsic resistance based on taxonomic properties of a microorganism. +#' After using \code{as.rsi}, you can use \code{\link{eucast_rules}} to (1) apply inferred susceptibility and resistance based on results of other antimicrobials and (2) apply intrinsic resistance based on taxonomic properties of a microorganism. #' #' The function \code{is.rsi.eligible} returns \code{TRUE} when a columns contains at most 5\% invalid antimicrobial interpretations (not S and/or I and/or R), and \code{FALSE} otherwise. The threshold of 5\% can be set with the \code{threshold} parameter. #' @section Interpretation of S, I and R: @@ -265,7 +265,7 @@ as.rsi.data.frame <- function(x, col_mo = NULL, guideline = "EUCAST", ...) { ab_cols <- colnames(x)[sapply(x, function(y) is.mic(y) | is.disk(y))] if (length(ab_cols) == 0) { - stop("No columns with MIC values or disk zones found in this data set. Use as.mic or as.disk to transform antibiotic columns.", call. = FALSE) + stop("No columns with MIC values or disk zones found in this data set. Use as.mic or as.disk to transform antimicrobial columns.", call. = FALSE) } # try to find columns based on type diff --git a/R/sysdata.rda b/R/sysdata.rda index 5c44c6ce..186c9e9c 100644 Binary files a/R/sysdata.rda and b/R/sysdata.rda differ diff --git a/data-raw/reproduction_of_antibiotics.R b/data-raw/reproduction_of_antibiotics.R index b6b7835f..78f34089 100644 --- a/data-raw/reproduction_of_antibiotics.R +++ b/data-raw/reproduction_of_antibiotics.R @@ -294,6 +294,8 @@ antibiotics[which(antibiotics$ab == "RIF"), "name"] <- "Rifampicin" # PME and PVM1 (the J0 one) both mean 'Pivmecillinam', so: antibiotics <- filter(antibiotics, ab != "PME") antibiotics[which(antibiotics$ab == "PVM1"), "ab"] <- "PME" +# Remove Sinecatechins +antibiotics <- filter(antibiotics, ab != "SNC") # ESBL E-test codes: antibiotics[which(antibiotics$ab == "CCV"), "abbreviations"][[1]] <- list(c("xtzl")) antibiotics[which(antibiotics$ab == "CAZ"), "abbreviations"][[1]] <- list(c(antibiotics[which(antibiotics$ab == "CAZ"), "abbreviations"][[1]], "xtz", "cefta")) @@ -304,6 +306,44 @@ antibiotics[which(antibiotics$ab == "CTX"), "abbreviations"][[1]] <- list(c(anti antibiotics <- antibiotics %>% arrange(name) +# set cephalosporins groups for the ones that could not be determined automatically: +antibiotics <- antibiotics %>% + mutate(group = case_when( + name == "Cefcapene" ~ "Cephalosporins (3rd gen.)", + name == "Cefcapene pivoxil" ~ "Cephalosporins (3rd gen.)", + name == "Cefditoren pivoxil" ~ "Cephalosporins (3rd gen.)", + name == "Cefepime/clavulanic acid" ~ "Cephalosporins (4th gen.)", + name == "Cefepime/tazobactam" ~ "Cephalosporins (4th gen.)", + name == "Cefetamet pivoxil" ~ "Cephalosporins (3rd gen.)", + name == "Cefetecol (Cefcatacol)" ~ "Cephalosporins (4th gen.)", + name == "Cefetrizole" ~ "Cephalosporins (unclassified gen.)", + name == "Cefoselis" ~ "Cephalosporins (4th gen.)", + name == "Cefotaxime/clavulanic acid" ~ "Cephalosporins (3rd gen.)", + name == "Cefotaxime/sulbactam" ~ "Cephalosporins (3rd gen.)", + name == "Cefotiam hexetil" ~ "Cephalosporins (3rd gen.)", + name == "Cefovecin" ~ "Cephalosporins (3rd gen.)", + name == "Cefozopran" ~ "Cephalosporins (4th gen.)", + name == "Cefpimizole" ~ "Cephalosporins (3rd gen.)", + name == "Cefpodoxime proxetil" ~ "Cephalosporins (3rd gen.)", + name == "Cefpodoxime/clavulanic acid" ~ "Cephalosporins (3rd gen.)", + name == "Cefquinome" ~ "Cephalosporins (4th gen.)", + name == "Cefsumide" ~ "Cephalosporins (unclassified gen.)", + name == "Ceftaroline" ~ "Cephalosporins (5th gen.)", + name == "Ceftaroline/avibactam" ~ "Cephalosporins (5th gen.)", + name == "Ceftazidime/avibactam" ~ "Cephalosporins (3rd gen.)", + name == "Cefteram" ~ "Cephalosporins (3rd gen.)", + name == "Cefteram pivoxil" ~ "Cephalosporins (3rd gen.)", + name == "Ceftiofur" ~ "Cephalosporins (3rd gen.)", + name == "Ceftizoxime alapivoxil" ~ "Cephalosporins (3rd gen.)", + name == "Ceftobiprole" ~ "Cephalosporins (5th gen.)", + name == "Ceftobiprole medocaril" ~ "Cephalosporins (5th gen.)", + name == "Ceftolozane/enzyme inhibitor" ~ "Cephalosporins (5th gen.)", + name == "Ceftolozane/tazobactam" ~ "Cephalosporins (5th gen.)", + name == "Cefuroxime axetil" ~ "Cephalosporins (2nd gen.)", + TRUE ~ group)) + +# set as data.frame again +antibiotics <- as.data.frame(antibiotics, stringsAsFactors = FALSE) class(antibiotics$ab) <- "ab" dim(antibiotics) # for R/data.R diff --git a/data-raw/translations.tsv b/data-raw/translations.tsv index 0456fe47..b1f6e6dc 100644 --- a/data-raw/translations.tsv +++ b/data-raw/translations.tsv @@ -368,6 +368,8 @@ nl Cephalosporins (1st gen.) Cefalosporines (1e gen.) TRUE FALSE nl Cephalosporins (2nd gen.) Cefalosporines (2e gen.) TRUE FALSE nl Cephalosporins (3rd gen.) Cefalosporines (3e gen.) TRUE FALSE nl Cephalosporins (4th gen.) Cefalosporines (4e gen.) TRUE FALSE +nl Cephalosporins (5th gen.) Cefalosporines (5e gen.) TRUE FALSE +nl Cephalosporins (unclassified gen.) Cefalosporines (ongeclassificeerd) TRUE FALSE nl Cephalosporins Cefalosporines TRUE FALSE nl Glycopeptides Glycopeptiden TRUE FALSE nl Macrolides/lincosamides Macroliden/lincosamiden TRUE FALSE diff --git a/data/antibiotics.rda b/data/antibiotics.rda index 82025b42..aef6b59d 100755 Binary files a/data/antibiotics.rda and b/data/antibiotics.rda differ diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index 71b07b4f..80b7586d 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 2d2dbb5b..7214e2c2 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9029 + 0.7.1.9035 @@ -185,7 +185,7 @@

Benchmarks

Matthijs S. Berends

-

08 August 2019

+

11 August 2019

@@ -211,13 +211,13 @@ print(S.aureus, unit = "ms", signif = 2) # Unit: milliseconds # expr min lq mean median uq max neval -# as.mo("sau") 8.4 8.5 12 9.1 10.0 27.0 10 -# as.mo("stau") 30.0 31.0 33 32.0 32.0 48.0 10 -# as.mo("staaur") 8.4 8.6 11 9.0 9.9 26.0 10 -# as.mo("STAAUR") 8.5 8.5 16 8.7 9.7 56.0 10 -# as.mo("S. aureus") 22.0 22.0 24 22.0 23.0 40.0 10 -# as.mo("S. aureus") 22.0 22.0 36 24.0 41.0 98.0 10 -# as.mo("Staphylococcus aureus") 3.9 4.0 4 4.0 4.1 4.3 10 +# as.mo("sau") 8.3 8.5 14.0 9.1 9.2 39.0 10 +# as.mo("stau") 31.0 32.0 38.0 32.0 47.0 53.0 10 +# as.mo("staaur") 8.2 8.4 20.0 8.7 40.0 55.0 10 +# as.mo("STAAUR") 8.2 8.4 10.0 8.9 9.4 24.0 10 +# as.mo("S. aureus") 23.0 24.0 33.0 24.0 24.0 99.0 10 +# as.mo("S. aureus") 23.0 23.0 29.0 24.0 40.0 42.0 10 +# as.mo("Staphylococcus aureus") 3.7 3.9 4.1 4.1 4.2 4.6 10

In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 5 milliseconds means it can determine 200 input values per second. It case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of Thermus islandicus (B_THERMS_ISL), a bug probably never found before in humans:

T.islandicus <- microbenchmark(as.mo("theisl"),
@@ -229,12 +229,12 @@
 print(T.islandicus, unit = "ms", signif = 2)
 # Unit: milliseconds
 #                         expr min  lq mean median  uq max neval
-#              as.mo("theisl") 260 270  280    280 290 310    10
-#              as.mo("THEISL") 260 270  290    280 290 380    10
-#       as.mo("T. islandicus") 130 140  150    150 150 160    10
-#      as.mo("T.  islandicus") 130 140  140    140 150 160    10
-#  as.mo("Thermus islandicus")  47  50   58     62  65  68    10
-

That takes 9.4 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are almost fast - these are the most probable input from most data sets.

+# as.mo("theisl") 270 270 280 290 290 300 10 +# as.mo("THEISL") 280 290 290 290 300 300 10 +# as.mo("T. islandicus") 130 130 150 150 160 160 10 +# as.mo("T. islandicus") 130 130 150 150 150 160 10 +# as.mo("Thermus islandicus") 46 48 54 50 63 71 10 +

That takes 8.8 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are almost fast - these are the most probable input from most data sets.

In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Thermus islandicus (which is very uncommon):

par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
 
@@ -253,7 +253,7 @@
 

Repetitive results

-

Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

+

Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_name() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

-

So transforming 500,000 values (!!) of 50 unique values only takes 0.67 seconds (666 ms). You only lose time on your unique input values.

+# expr min lq mean median uq max neval +# mo_name(x) 623 631 659 637 697 729 10
+

So transforming 500,000 values (!!) of 50 unique values only takes 0.64 seconds (637 ms). You only lose time on your unique input values.

Precalculated results

-

What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname(), it almost doesn’t take any time at all (see ‘C’ below):

-
run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
-                         B = mo_fullname("S. aureus"),
-                         C = mo_fullname("Staphylococcus aureus"),
+

What about precalculated results? If the input is an already precalculated result of a helper function like mo_name(), it almost doesn’t take any time at all (see ‘C’ below):

+ -

So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

+# expr min lq mean median uq max neval +# A 6.290 6.730 7.170 7.010 7.760 8.09 10 +# B 22.600 22.700 26.200 23.000 25.400 44.30 10 +# C 0.798 0.806 0.874 0.844 0.891 1.05 10
+

So going from mo_name("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

run_it <- microbenchmark(A = mo_species("aureus"),
                          B = mo_genus("Staphylococcus"),
-                         C = mo_fullname("Staphylococcus aureus"),
+                         C = mo_name("Staphylococcus aureus"),
                          D = mo_family("Staphylococcaceae"),
                          E = mo_order("Bacillales"),
                          F = mo_class("Bacilli"),
@@ -310,47 +310,47 @@
 print(run_it, unit = "ms", signif = 3)
 # Unit: milliseconds
 #  expr   min    lq  mean median    uq   max neval
-#     A 0.437 0.456 0.499  0.482 0.560 0.607    10
-#     B 0.474 0.484 0.534  0.509 0.588 0.627    10
-#     C 0.621 0.712 0.799  0.812 0.829 1.020    10
-#     D 0.469 0.482 0.534  0.513 0.595 0.654    10
-#     E 0.415 0.434 0.493  0.459 0.557 0.678    10
-#     F 0.458 0.523 0.538  0.546 0.554 0.601    10
-#     G 0.416 0.438 0.484  0.450 0.563 0.621    10
-#     H 0.420 0.434 0.491  0.448 0.577 0.620    10
+# A 0.455 0.458 0.471 0.465 0.482 0.504 10 +# B 0.480 0.482 0.497 0.491 0.497 0.554 10 +# C 0.662 0.687 0.754 0.750 0.788 0.964 10 +# D 0.484 0.484 0.496 0.488 0.501 0.544 10 +# E 0.442 0.450 0.459 0.456 0.462 0.492 10 +# F 0.440 0.447 0.456 0.452 0.463 0.486 10 +# G 0.450 0.452 0.462 0.459 0.463 0.485 10 +# H 0.455 0.461 0.467 0.467 0.471 0.492 10

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.

Results in other languages

When the system language is non-English and supported by this AMR package, some functions will have a translated result. This almost does’t take extra time:

-
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
+
mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system
 # [1] "Coagulase-negative Staphylococcus (CoNS)"
 
-mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
+mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system
 # [1] "Staphylococcus coagulasa negativo (SCN)"
 
-mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
+mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system
 # [1] "Coagulase-negatieve Staphylococcus (CNS)"
 
-run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
-                         de = mo_fullname("CoNS", language = "de"),
-                         nl = mo_fullname("CoNS", language = "nl"),
-                         es = mo_fullname("CoNS", language = "es"),
-                         it = mo_fullname("CoNS", language = "it"),
-                         fr = mo_fullname("CoNS", language = "fr"),
-                         pt = mo_fullname("CoNS", language = "pt"),
+run_it <- microbenchmark(en = mo_name("CoNS", language = "en"),
+                         de = mo_name("CoNS", language = "de"),
+                         nl = mo_name("CoNS", language = "nl"),
+                         es = mo_name("CoNS", language = "es"),
+                         it = mo_name("CoNS", language = "it"),
+                         fr = mo_name("CoNS", language = "fr"),
+                         pt = mo_name("CoNS", language = "pt"),
                          times = 10)
 print(run_it, unit = "ms", signif = 4)
 # Unit: milliseconds
-#  expr   min    lq  mean median    uq    max neval
-#    en 17.02 17.26 17.89  17.85 18.50  18.84    10
-#    de 18.28 18.65 22.91  18.84 19.67  41.64    10
-#    nl 24.07 24.31 32.74  24.60 25.02 105.60    10
-#    es 18.59 19.00 19.99  19.32 19.81  26.42    10
-#    it 18.28 18.40 22.59  19.07 20.38  39.47    10
-#    fr 18.34 18.70 21.48  19.37 20.83  34.67    10
-#    pt 18.60 18.92 19.25  19.19 19.59  20.14    10
+# expr min lq mean median uq max neval +# en 17.66 17.86 18.50 18.49 19.14 19.36 10 +# de 19.03 19.38 19.64 19.49 20.01 20.42 10 +# nl 24.40 25.23 30.77 25.78 41.94 44.93 10 +# es 19.18 19.22 23.30 19.53 21.34 39.20 10 +# it 19.02 19.24 23.53 19.57 20.35 50.89 10 +# fr 19.28 19.33 19.87 19.57 20.19 21.25 10 +# pt 18.89 19.14 19.77 19.67 20.21 20.99 10

Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png index 88637bdf..938b62bf 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/articles/index.html b/docs/articles/index.html index dbdcbf53..b8b53703 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 diff --git a/docs/authors.html b/docs/authors.html index 6665c28b..c4e81532 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 diff --git a/docs/index.html b/docs/index.html index 76320f6d..a5a1bb68 100644 --- a/docs/index.html +++ b/docs/index.html @@ -42,7 +42,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 @@ -302,7 +302,7 @@

WHO Collaborating Centre for Drug Statistics Methodology

This package contains all ~450 antimicrobial drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD, oral and IV) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission.

-

NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See \url{https://www.whocc.no/copyright_disclaimer/}.

+

NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.

Read more about the data from WHOCC in our manual.

diff --git a/docs/news/index.html b/docs/news/index.html index 939281f7..9575cea0 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035
@@ -225,16 +225,20 @@ -
+

-AMR 0.7.1.9034 Unreleased +AMR 0.7.1.9035 Unreleased

Breaking

  • Function freq() has moved to a new package, clean (CRAN link). Creating frequency tables is actually not the scope of this package (never was) and this function has matured a lot over the last two years. Therefore, a new package was created for data cleaning and checking and it perfectly fits the freq() function. The clean package is available on CRAN and will be installed automatically when updating the AMR package, that now imports it. In a later stage, the skewness() and kurtosis() functions will be moved to the clean package too.
  • -
  • Determination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN". They can be included with the new parameter include_unknown: first_isolates(..., include_unknown = TRUE). For WHONET users, this means that all records with organism code "con" (contamination) will be excluded at default, since as.mo("con") = "UNKNOWN".
  • +
  • +

    Determination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN". They can be included with the new parameter include_unknown:

    +
    first_isolate(..., include_unknown = TRUE)
    +

    For WHONET users, this means that all records with organism code "con" (contamination) will be excluded at default, since as.mo("con") = "UNKNOWN". The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.

    +
@@ -243,29 +247,29 @@ @@ -286,14 +290,14 @@
  • Added tibble printing support for classes rsi, mic, ab and mo. When using tibbles containing antibiotic columns, values S will print in green, values I will print in yellow and values R will print in red:

    -
    (run this on your own console, as this page does not support colour printing)
    -tibble(mo = sample(AMR::microorganisms$fullname, 10),
    -       drug1 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE, 
    -                             prob = c(0.6, 0.1, 0.3))),
    -       drug2 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE,
    -                             prob = c(0.6, 0.1, 0.3))),
    -       drug3 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE,
    -                             prob = c(0.6, 0.1, 0.3))))
    +
    # (run this on your own console, as this page does not support colour printing)
    +tibble(mo = sample(AMR::microorganisms$fullname, 10),
    +       drug1 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE, 
    +                             prob = c(0.6, 0.1, 0.3))),
    +       drug2 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE,
    +                             prob = c(0.6, 0.1, 0.3))),
    +       drug3 = as.rsi(sample(c("S", "I", "R"), 10, replace = TRUE,
    +                             prob = c(0.6, 0.1, 0.3))))
  • Removed class atc - using as.atc() is now deprecated in favour of ab_atc() and this will return a character, not the atc class anymore
  • Removed deprecated functions abname(), ab_official(), atc_name(), atc_official(), atc_property(), atc_tradenames(), atc_trivial_nl() @@ -312,7 +316,7 @@
  • Fix for using mo_* functions where the coercion uncertainties and failures would not be available through mo_uncertainties() and mo_failures() anymore
  • Deprecated the country parameter of mdro() in favour of the already existing guideline parameter to support multiple guidelines within one country
  • The name of RIF is now Rifampicin instead of Rifampin
  • -
  • The antibiotics data set is now sorted by name
  • +
  • The antibiotics data set is now sorted by name and all cephalosporines now have their generation between brackets
  • Speed improvement for guess_ab_col() which is now 30 times faster for antibiotic abbreviations

  • @@ -334,14 +338,14 @@

    All these lead to the microbial ID of E. coli:

    - +
  • Function mo_info() as an analogy to ab_info(). The mo_info() prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
  • Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism

  • @@ -463,14 +467,14 @@ Please
    septic_patients %>% 
    -  freq(age) %>% 
    -  boxplot()
    -# grouped boxplots:
    -septic_patients %>% 
    -  group_by(hospital_id) %>% 
    -  freq(age) %>%
    -  boxplot()
    + @@ -555,32 +559,32 @@ This data is updated annually - check the included version with the new function
  • New filters for antimicrobial classes. Use these functions to filter isolates on results in one of more antibiotics from a specific class:

    - +

    The antibiotics data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics data set. For example:

    - +
  • All ab_* functions are deprecated and replaced by atc_* functions:

    - + These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
  • New functions set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
  • Support for the upcoming dplyr version 0.8.0
  • @@ -592,20 +596,20 @@ These functions use as.atc()New function age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
  • New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

    -
    x <- resistance_predict(septic_patients, col_ab = "amox")
    -plot(x)
    -ggplot_rsi_predict(x)
    +
    x <- resistance_predict(septic_patients, col_ab = "amox")
    +plot(x)
    +ggplot_rsi_predict(x)
  • Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and fasten filtering on data sets with antimicrobial results, e.g.:

    - +

    is equal to:

    -
    septic_patients %>%
    -  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    -  filter(only_firsts == TRUE) %>%
    -  select(-only_firsts)
    +
    septic_patients %>%
    +  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    +  filter(only_firsts == TRUE) %>%
    +  select(-only_firsts)
  • New function availability() to check the number of available (non-empty) results in a data.frame
  • @@ -634,33 +638,33 @@ These functions use as.atc()
  • Now handles incorrect spelling, like i instead of y and f instead of ph:

    - +
  • Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE is equal to uncertainty level 2. Run ?as.mo for more info about these levels.

    -
    # equal:
    -as.mo(..., allow_uncertain = TRUE)
    -as.mo(..., allow_uncertain = 2)
    -
    -# also equal:
    -as.mo(..., allow_uncertain = FALSE)
    -as.mo(..., allow_uncertain = 0)
    +
    # equal:
    +as.mo(..., allow_uncertain = TRUE)
    +as.mo(..., allow_uncertain = 2)
    +
    +# also equal:
    +as.mo(..., allow_uncertain = FALSE)
    +as.mo(..., allow_uncertain = 0)
    Using as.mo(..., allow_uncertain = 3) could lead to very unreliable results.
  • Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
  • All microbial IDs that found are now saved to a local file ~/.Rhistory_mo. Use the new function clean_mo_history() to delete this file, which resets the algorithms.
  • Incoercible results will now be considered ‘unknown’, MO code UNKNOWN. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:

    - +
  • Fix for vector containing only empty values
  • Finds better results when input is in other languages
  • @@ -706,19 +710,19 @@ Using as.mo(..., allow_uncertain = 3)
  • Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:

    - +
  • Header info is now available as a list, with the header function
  • The parameter header is now set to TRUE at default, even for markdown
  • @@ -793,10 +797,10 @@ Using as.mo(..., allow_uncertain = 3)Fewer than 3 characters as input for as.mo will return NA
  • Function as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached

    -
    as.mo("E. species")        # B_ESCHR
    -mo_fullname("E. spp.")     # "Escherichia species"
    -as.mo("S. spp")            # B_STPHY
    -mo_fullname("S. species")  # "Staphylococcus species"
    +
    as.mo("E. species")        # B_ESCHR
    +mo_fullname("E. spp.")     # "Escherichia species"
    +as.mo("S. spp")            # B_STPHY
    +mo_fullname("S. species")  # "Staphylococcus species"
  • Added parameter combine_IR (TRUE/FALSE) to functions portion_df and count_df, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
  • Fix for portion_*(..., as_percent = TRUE) when minimal number of isolates would not be met
  • @@ -809,15 +813,15 @@ Using as.mo(..., allow_uncertain = 3)
  • Support for grouping variables, test with:

    - +
  • Support for (un)selecting columns:

    - +
  • Check for hms::is.hms
  • @@ -897,18 +901,18 @@ Using as.mo(..., allow_uncertain = 3)

    They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

    -
    mo_gramstain("E. coli")
    -# [1] "Gram negative"
    -mo_gramstain("E. coli", language = "de") # German
    -# [1] "Gramnegativ"
    -mo_gramstain("E. coli", language = "es") # Spanish
    -# [1] "Gram negativo"
    -mo_fullname("S. group A", language = "pt") # Portuguese
    -# [1] "Streptococcus grupo A"
    +
    mo_gramstain("E. coli")
    +# [1] "Gram negative"
    +mo_gramstain("E. coli", language = "de") # German
    +# [1] "Gramnegativ"
    +mo_gramstain("E. coli", language = "es") # Spanish
    +# [1] "Gram negativo"
    +mo_fullname("S. group A", language = "pt") # Portuguese
    +# [1] "Streptococcus grupo A"

    Furthermore, former taxonomic names will give a note about the current taxonomic name:

    - +
  • Functions count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates
  • @@ -1233,7 +1237,7 @@ Using as.mo(..., allow_uncertain = 3)

    Contents

    @@ -267,7 +267,7 @@ ab -

    an antibiotic code, generated with as.ab

    +

    an antimicrobial code, generated with as.ab

    guideline @@ -279,7 +279,7 @@ threshold -

    maximum fraction of x that is allowed to fail transformation, see Examples

    +

    maximum fraction of invalid antimicrobial interpretations of x, see Examples

    @@ -290,7 +290,7 @@

    Details

    Run unique(AMR::rsi_translation$guideline) for a list of all supported guidelines.

    -

    After using as.rsi, you can use eucast_rules to (1) apply inferred susceptibility and resistance based on results of other antibiotics and (2) apply intrinsic resistance based on taxonomic properties of a microorganism.

    +

    After using as.rsi, you can use eucast_rules to (1) apply inferred susceptibility and resistance based on results of other antimicrobials and (2) apply intrinsic resistance based on taxonomic properties of a microorganism.

    The function is.rsi.eligible returns TRUE when a columns contains at most 5% invalid antimicrobial interpretations (not S and/or I and/or R), and FALSE otherwise. The threshold of 5% can be set with the threshold parameter.

    Interpretation of S, I and R

    diff --git a/docs/reference/catalogue_of_life.html b/docs/reference/catalogue_of_life.html index a7352ec1..11c04ccb 100644 --- a/docs/reference/catalogue_of_life.html +++ b/docs/reference/catalogue_of_life.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9033 + 0.7.1.9035
    diff --git a/docs/reference/ggplot_rsi.html b/docs/reference/ggplot_rsi.html index dcab20f1..7241202f 100644 --- a/docs/reference/ggplot_rsi.html +++ b/docs/reference/ggplot_rsi.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9033 + 0.7.1.9035 diff --git a/docs/reference/index.html b/docs/reference/index.html index 4c744dc2..de5910f1 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 diff --git a/docs/reference/like.html b/docs/reference/like.html index 954e87c7..b5d0978f 100644 --- a/docs/reference/like.html +++ b/docs/reference/like.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9029 + 0.7.1.9035 @@ -261,7 +261,7 @@

    Source

    -

    Idea from the like function from the data.table package, but made it case insensitive at default and let it support multiple patterns.

    +

    Idea from the like function from the data.table package, but made it case insensitive at default and let it support multiple patterns. Also, if the regex fails the first time, it tries again with perl = TRUE.

    Value

    diff --git a/docs/reference/microorganisms.codes.html b/docs/reference/microorganisms.codes.html index a05b208e..cdc1fe23 100644 --- a/docs/reference/microorganisms.codes.html +++ b/docs/reference/microorganisms.codes.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9034 + 0.7.1.9035 diff --git a/docs/reference/microorganisms.html b/docs/reference/microorganisms.html index 558e7bd1..64d3804d 100644 --- a/docs/reference/microorganisms.html +++ b/docs/reference/microorganisms.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9033 + 0.7.1.9035 diff --git a/docs/reference/microorganisms.old.html b/docs/reference/microorganisms.old.html index 71e3a485..379e0078 100644 --- a/docs/reference/microorganisms.old.html +++ b/docs/reference/microorganisms.old.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9033 + 0.7.1.9035 diff --git a/docs/reference/translate.html b/docs/reference/translate.html index 8d17761c..083b5d6f 100644 --- a/docs/reference/translate.html +++ b/docs/reference/translate.html @@ -47,7 +47,7 @@ - + @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9029 + 0.7.1.9035 @@ -230,7 +230,7 @@
    -

    For language-dependent output of AMR functions, like mo_fullname and mo_type.

    +

    For language-dependent output of AMR functions, like mo_name, mo_type and ab_name.

    @@ -239,9 +239,10 @@

    Details

    Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: https://gitlab.com/msberends/AMR/blob/master/data-raw/translations.tsv.

    +

    Currently supported languages can be found if running: unique(AMR:::translations_file$lang).

    Please suggest your own translations by creating a new issue on our repository.

    This file will be read by all functions where a translated output can be desired, like all mo_property functions (mo_fullname, mo_type, etc.).

    -

    The system language will be used at default, if supported, using get_locale. The system language can be overwritten with getOption("AMR_locale").

    +

    The system language will be used at default, if that language is supported. The system language can be overwritten with getOption("AMR_locale").

    Read more on our website!

    @@ -256,27 +257,27 @@ # with get_locale() # English -mo_fullname("CoNS", language = "en") +mo_name("CoNS", language = "en") #> "Coagulase-negative Staphylococcus (CoNS)" # German -mo_fullname("CoNS", language = "de") +mo_name("CoNS", language = "de") #> "Koagulase-negative Staphylococcus (KNS)" # Dutch -mo_fullname("CoNS", language = "nl") +mo_name("CoNS", language = "nl") #> "Coagulase-negatieve Staphylococcus (CNS)" # Spanish -mo_fullname("CoNS", language = "es") +mo_name("CoNS", language = "es") #> "Staphylococcus coagulasa negativo (SCN)" # Italian -mo_fullname("CoNS", language = "it") +mo_name("CoNS", language = "it") #> "Staphylococcus negativo coagulasi (CoNS)" # Portuguese -mo_fullname("CoNS", language = "pt") +mo_name("CoNS", language = "pt") #> "Staphylococcus coagulase negativo (CoNS)" # } diff --git a/index.md b/index.md index 7874f335..9bd80c57 100644 --- a/index.md +++ b/index.md @@ -130,7 +130,7 @@ Read more about the data from the Catalogue of Life [in our manual](./reference/ This package contains **all ~450 antimicrobial drugs** and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD, oral and IV) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the [Pharmaceuticals Community Register of the European Commission](http://ec.europa.eu/health/documents/community-register/html/atc.htm). -**NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See \url{https://www.whocc.no/copyright_disclaimer/}.** +**NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.** Read more about the data from WHOCC [in our manual](./reference/WHOCC.html). diff --git a/man/as.rsi.Rd b/man/as.rsi.Rd index 5751e176..1ad8b230 100755 --- a/man/as.rsi.Rd +++ b/man/as.rsi.Rd @@ -29,13 +29,13 @@ is.rsi.eligible(x, threshold = 0.05) \item{mo}{a microorganism code, generated with \code{\link{as.mo}}} -\item{ab}{an antibiotic code, generated with \code{\link{as.ab}}} +\item{ab}{an antimicrobial code, generated with \code{\link{as.ab}}} \item{guideline}{defaults to the latest included EUCAST guideline, run \code{unique(AMR::rsi_translation$guideline)} for all options} \item{col_mo}{column name of the unique IDs of the microorganisms (see \code{\link{mo}}), defaults to the first column of class \code{mo}. Values will be coerced using \code{\link{as.mo}}.} -\item{threshold}{maximum fraction of \code{x} that is allowed to fail transformation, see Examples} +\item{threshold}{maximum fraction of invalid antimicrobial interpretations of \code{x}, see Examples} } \value{ Ordered factor with new class \code{rsi} @@ -46,7 +46,7 @@ Interpret MIC values according to EUCAST or CLSI, or clean up existing RSI value \details{ Run \code{unique(AMR::rsi_translation$guideline)} for a list of all supported guidelines. -After using \code{as.rsi}, you can use \code{\link{eucast_rules}} to (1) apply inferred susceptibility and resistance based on results of other antibiotics and (2) apply intrinsic resistance based on taxonomic properties of a microorganism. +After using \code{as.rsi}, you can use \code{\link{eucast_rules}} to (1) apply inferred susceptibility and resistance based on results of other antimicrobials and (2) apply intrinsic resistance based on taxonomic properties of a microorganism. The function \code{is.rsi.eligible} returns \code{TRUE} when a columns contains at most 5\% invalid antimicrobial interpretations (not S and/or I and/or R), and \code{FALSE} otherwise. The threshold of 5\% can be set with the \code{threshold} parameter. } diff --git a/man/like.Rd b/man/like.Rd index 74d820fd..0ae9b091 100755 --- a/man/like.Rd +++ b/man/like.Rd @@ -5,7 +5,7 @@ \alias{\%like\%} \title{Pattern Matching} \source{ -Idea from the \href{https://github.com/Rdatatable/data.table/blob/master/R/like.R}{\code{like} function from the \code{data.table} package}, but made it case insensitive at default and let it support multiple patterns. +Idea from the \href{https://github.com/Rdatatable/data.table/blob/master/R/like.R}{\code{like} function from the \code{data.table} package}, but made it case insensitive at default and let it support multiple patterns. Also, if the regex fails the first time, it tries again with \code{perl = TRUE}. } \usage{ like(x, pattern) diff --git a/man/translate.Rd b/man/translate.Rd index 47945857..5426ee68 100644 --- a/man/translate.Rd +++ b/man/translate.Rd @@ -8,16 +8,18 @@ get_locale() } \description{ -For language-dependent output of AMR functions, like \code{\link{mo_fullname}} and \code{\link{mo_type}}. +For language-dependent output of AMR functions, like \code{\link{mo_name}}, \code{\link{mo_type}} and \code{\link{ab_name}}. } \details{ Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: \url{https://gitlab.com/msberends/AMR/blob/master/data-raw/translations.tsv}. +Currently supported languages can be found if running: \code{unique(AMR:::translations_file$lang)}. + Please suggest your own translations \href{https://gitlab.com/msberends/AMR/issues/new?issue[title]=Translation\%20suggestion}{by creating a new issue on our repository}. This file will be read by all functions where a translated output can be desired, like all \code{\link{mo_property}} functions (\code{\link{mo_fullname}}, \code{\link{mo_type}}, etc.). -The system language will be used at default, if supported, using \code{\link{get_locale}}. The system language can be overwritten with \code{\link{getOption}("AMR_locale")}. +The system language will be used at default, if that language is supported. The system language can be overwritten with \code{\link{getOption}("AMR_locale")}. } \section{Read more on our website!}{ @@ -30,26 +32,26 @@ On our website \url{https://msberends.gitlab.io/AMR} you can find \href{https:// # with get_locale() # English -mo_fullname("CoNS", language = "en") +mo_name("CoNS", language = "en") #> "Coagulase-negative Staphylococcus (CoNS)" # German -mo_fullname("CoNS", language = "de") +mo_name("CoNS", language = "de") #> "Koagulase-negative Staphylococcus (KNS)" # Dutch -mo_fullname("CoNS", language = "nl") +mo_name("CoNS", language = "nl") #> "Coagulase-negatieve Staphylococcus (CNS)" # Spanish -mo_fullname("CoNS", language = "es") +mo_name("CoNS", language = "es") #> "Staphylococcus coagulasa negativo (SCN)" # Italian -mo_fullname("CoNS", language = "it") +mo_name("CoNS", language = "it") #> "Staphylococcus negativo coagulasi (CoNS)" # Portuguese -mo_fullname("CoNS", language = "pt") +mo_name("CoNS", language = "pt") #> "Staphylococcus coagulase negativo (CoNS)" } diff --git a/tests/testthat/test-misc.R b/tests/testthat/test-misc.R index 951f4d48..e27493be 100755 --- a/tests/testthat/test-misc.R +++ b/tests/testthat/test-misc.R @@ -31,10 +31,6 @@ test_that("percentages works", { expect_equal(percent(0.0055), "0.6%") }) -test_that("size format works", { - expect_equal(size_humanreadable(123456), "121 kB") -}) - test_that("functions missing in older R versions work", { expect_equal(strrep("A", 5), "AAAAA") expect_equal(strrep(c("A", "B"), c(5, 2)), c("AAAAA", "BB")) diff --git a/tests/testthat/test-mo.R b/tests/testthat/test-mo.R index b8c76a56..85a07fe6 100644 --- a/tests/testthat/test-mo.R +++ b/tests/testthat/test-mo.R @@ -253,7 +253,7 @@ test_that("as.mo works", { rep(NA_character_, 3)) expect_equal(as.character(as.mo("con")), "UNKNOWN") expect_equal(as.character(as.mo("xxx")), NA_character_) - expect_equal(as.character(as.mo(c("xxx", "con"))), c(NA_character_, "UNKNOWN")) + expect_equal(as.character(as.mo(c("xxx", "con", "eco"))), c(NA_character_, "UNKNOWN", "B_ESCHR_COL")) expect_equal(as.character(as.mo(c("other", "none", "unknown"))), rep("UNKNOWN", 3)) @@ -269,5 +269,17 @@ test_that("as.mo works", { # very old MO codes (<= v0.5.0) expect_equal(as.character(as.mo("F_CCCCS_NEO")), "F_CRYPT_NEO") expect_equal(as.character(as.mo("F_CANDD_GLB")), "F_CANDD_GLA") + + # debug mode + expect_warning(as.mo("kshgcjkhsdgkshjdfsfvsdfv", debug = TRUE, allow_uncertain = 3)) + # ..coccus + expect_equal(as.character(as.mo(c("meningococ", "gonococ", "pneumococ"))), + c("B_NESSR_MEN", "B_NESSR_GON", "B_STRPT_PNE")) + # yeasts and fungi + expect_equal(suppressWarnings(as.character(as.mo(c("yeasts", "fungi")))), + c("F_YEAST", "F_FUNGUS")) + + # print tibble + expect_output(print(tibble(mo = as.mo("B_STRPT_PNE")))) }) diff --git a/vignettes/benchmarks.Rmd b/vignettes/benchmarks.Rmd index 7ba38e0d..1e82c708 100755 --- a/vignettes/benchmarks.Rmd +++ b/vignettes/benchmarks.Rmd @@ -110,7 +110,7 @@ Uncommon microorganisms take a lot more time than common microorganisms. To reli ### Repetitive results -Repetitive results are unique values that are present more than once. Unique values will only be calculated once by `as.mo()`. We will use `mo_fullname()` for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses `as.mo()` internally. +Repetitive results are unique values that are present more than once. Unique values will only be calculated once by `as.mo()`. We will use `mo_name()` for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses `as.mo()` internally. ```{r, message = FALSE} library(dplyr) @@ -132,7 +132,7 @@ length(x) n_distinct(x) # now let's see: -run_it <- microbenchmark(mo_fullname(x), +run_it <- microbenchmark(mo_name(x), times = 10) print(run_it, unit = "ms", signif = 3) ``` @@ -141,22 +141,22 @@ So transforming 500,000 values (!!) of `r n_distinct(x)` unique values only take ### Precalculated results -What about precalculated results? If the input is an already precalculated result of a helper function like `mo_fullname()`, it almost doesn't take any time at all (see 'C' below): +What about precalculated results? If the input is an already precalculated result of a helper function like `mo_name()`, it almost doesn't take any time at all (see 'C' below): ```{r} -run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"), - B = mo_fullname("S. aureus"), - C = mo_fullname("Staphylococcus aureus"), +run_it <- microbenchmark(A = mo_name("B_STPHY_AUR"), + B = mo_name("S. aureus"), + C = mo_name("Staphylococcus aureus"), times = 10) print(run_it, unit = "ms", signif = 3) ``` -So going from `mo_fullname("Staphylococcus aureus")` to `"Staphylococcus aureus"` takes `r format(round(run_it %>% filter(expr == "C") %>% pull(time) %>% median() / 1e9, 4), scientific = FALSE)` seconds - it doesn't even start calculating *if the result would be the same as the expected resulting value*. That goes for all helper functions: +So going from `mo_name("Staphylococcus aureus")` to `"Staphylococcus aureus"` takes `r format(round(run_it %>% filter(expr == "C") %>% pull(time) %>% median() / 1e9, 4), scientific = FALSE)` seconds - it doesn't even start calculating *if the result would be the same as the expected resulting value*. That goes for all helper functions: ```{r} run_it <- microbenchmark(A = mo_species("aureus"), B = mo_genus("Staphylococcus"), - C = mo_fullname("Staphylococcus aureus"), + C = mo_name("Staphylococcus aureus"), D = mo_family("Staphylococcaceae"), E = mo_order("Bacillales"), F = mo_class("Bacilli"), @@ -173,19 +173,19 @@ Of course, when running `mo_phylum("Firmicutes")` the function has zero knowledg When the system language is non-English and supported by this `AMR` package, some functions will have a translated result. This almost does't take extra time: ```{r} -mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system +mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system -mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system +mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system -mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system +mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system -run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"), - de = mo_fullname("CoNS", language = "de"), - nl = mo_fullname("CoNS", language = "nl"), - es = mo_fullname("CoNS", language = "es"), - it = mo_fullname("CoNS", language = "it"), - fr = mo_fullname("CoNS", language = "fr"), - pt = mo_fullname("CoNS", language = "pt"), +run_it <- microbenchmark(en = mo_name("CoNS", language = "en"), + de = mo_name("CoNS", language = "de"), + nl = mo_name("CoNS", language = "nl"), + es = mo_name("CoNS", language = "es"), + it = mo_name("CoNS", language = "it"), + fr = mo_name("CoNS", language = "fr"), + pt = mo_name("CoNS", language = "pt"), times = 10) print(run_it, unit = "ms", signif = 4) ```