diff --git a/DESCRIPTION b/DESCRIPTION index ec533542..0d45b61e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.7.1.9080 -Date: 2019-09-22 +Version: 0.7.1.9081 +Date: 2019-09-23 Title: Antimicrobial Resistance Analysis Authors@R: c( person(role = c("aut", "cre"), diff --git a/NEWS.md b/NEWS.md index 2f2680f5..a6c4e532 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,5 @@ -# AMR 0.7.1.9080 -Last updated: 22-Sep-2019 +# AMR 0.7.1.9081 +Last updated: 23-Sep-2019 ### Breaking * Determination of first isolates now **excludes** all 'unknown' microorganisms at default, i.e. microbial code `"UNKNOWN"`. They can be included with the new parameter `include_unknown`: @@ -25,17 +25,27 @@ * Function `freq()` has moved to a new package, [`clean`](https://github.com/msberends/clean) ([CRAN link](https://cran.r-project.org/package=clean)), since creating frequency tables actually does not fit the scope of this package. The `freq()` function still works, since it is re-exported from the `clean` package (which will be installed automatically upon updating this `AMR` package). ### New -* Function `bug_drug_combinations()` to quickly get a `data.frame` with the antimicrobial resistance of any bug-drug combination in a data set: +* Function `bug_drug_combinations()` to quickly get a `data.frame` with the antimicrobial resistance of any bug-drug combination in a data set. The columns with microorganism codes is guessed automatically and its input is transformed with `mo_shortname()` at default: ```r x <- bug_drug_combinations(example_isolates) - x # NOTE: Using column `mo` as input for `col_mo`. - #> ab mo S I R total - #> 1 AMC B_ESCHR_COLI 332 74 61 467 - #> 2 AMC B_KLBSL_PNMN 49 3 6 58 - #> 3 AMC B_PROTS_MRBL 28 7 1 36 - #> 4 AMC B_PSDMN_AERG 0 0 30 30 - #> 5 AMC B_STPHY_AURS 234 0 1 235 + x[1:5, ] + #> ab mo S I R total + #> 1 AMC CoNS 178 0 132 310 + #> 2 AMC E. coli 332 74 61 467 + #> 3 AMC K. pneumoniae 49 3 6 58 + #> 4 AMC P. 
aeruginosa 0 0 30 30 + #> 5 AMC P. mirabilis 28 7 1 36 + + # change the transformation with the FUN argument to anything you like: + x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain) + # NOTE: Using column `mo` as input for `col_mo`. + x[1:4, ] + #> ab mo S I R total + #> 1 AMC Gram-negative 469 89 174 732 + #> 2 AMC Gram-positive 873 2 272 1147 + #> 3 AMK Gram-negative 251 0 2 253 + #> 4 AMK Gram-positive 0 0 100 100 ``` You can format this to a printable format, ready for reporting or exporting to e.g. Excel with the base R `format()` function: ```r @@ -82,7 +92,8 @@ * Added support for *Blastocystis* * Added support for 5,000 new fungi * Added support for unknown yeasts and fungi - * Changed most microorganism IDs to improve readability. **IMPORTANT:** Because of these changes, the microorganism IDs have been changed to a slightly different format. Old microorganism IDs are still supported, but support will be dropped in a future version. Use `as.mo()` on your old codes to transform them to the new format. + * Changed most microorganism IDs to improve readability. For example, the old code `B_ENTRC_FAE` could have been both *E. faecalis* and *E. faecium*. Its new code is `B_ENTRC_FCLS` and *E. faecium* has become `B_ENTRC_FACM`. Also, the Latin character æ (ae) is now preserved at the start of each genus and species abbreviation. For example, the old code for *Aerococcus urinae* was `B_ARCCC_NAE`. This is now `B_AERCC_URIN`. + **IMPORTANT:** Old microorganism IDs are still supported, but support will be dropped in a future version. Use `as.mo()` on your old codes to transform them to the new format. Using functions from the `mo_*` family (like `mo_name()` and `mo_gramstain()`) on old codes, will throw a warning. 
* Renamed data set `septic_patients` to `example_isolates` * Function `eucast_rules()`: * Fixed a bug for *Yersinia pseudotuberculosis* diff --git a/R/bug_drug_combinations.R b/R/bug_drug_combinations.R index fb893da0..b51a8aad 100644 --- a/R/bug_drug_combinations.R +++ b/R/bug_drug_combinations.R @@ -25,12 +25,16 @@ #' @inheritParams eucast_rules #' @param combine_IR logical to indicate whether values R and I should be summed #' @param add_ab_group logical to indicate where the group of the antimicrobials must be included as a first column -#' @param ... argumments passed on to \code{\link{mo_name}} +#' @param FUN the function to call on the \code{mo} column to transform the microorganism IDs, defaults to \code{\link{mo_shortname}} +#' @param ... argumments passed on to \code{FUN} #' @inheritParams rsi_df +#' @inheritParams base::formatC #' @importFrom dplyr rename #' @importFrom tidyr spread #' @importFrom clean freq #' @details The function \code{format} calculates the resistance per bug-drug combination. Use \code{combine_IR = FALSE} (default) to test R vs. S+I and \code{combine_IR = TRUE} to test R+I vs. S. +#' +#' The language of the output can be overwritten with \code{options(AMR_locale)}, please see \link{translate}. #' @export #' @rdname bug_drug_combinations #' @source \strong{M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition}, 2014, \emph{Clinical and Laboratory Standards Institute (CLSI)}. \url{https://clsi.org/standards/products/microbiology/documents/m39/}. @@ -40,8 +44,21 @@ #' x <- bug_drug_combinations(example_isolates) #' x #' format(x) +#' +#' # Use FUN to change to transformation of microorganism codes +#' x <- bug_drug_combinations(example_isolates, +#' FUN = mo_gramstain) +#' +#' x <- bug_drug_combinations(example_isolates, +#' FUN = function(x) ifelse(x == "B_ESCHR_COLI", +#' "E. 
coli", +#' "Others")) #' } -bug_drug_combinations <- function(x, col_mo = NULL, minimum = 30) { +bug_drug_combinations <- function(x, + col_mo = NULL, + minimum = 30, + FUN = mo_shortname, + ...) { if (!is.data.frame(x)) { stop("`x` must be a data frame.", call. = FALSE) } @@ -56,7 +73,7 @@ bug_drug_combinations <- function(x, col_mo = NULL, minimum = 30) { } x <- x %>% - mutate(col_mo = x %>% pull(col_mo)) %>% + mutate(mo = x %>% pull(col_mo) %>% FUN(...)) %>% filter(mo %in% (clean::freq(mo) %>% filter(count >= minimum) %>% pull(item))) %>% @@ -76,31 +93,37 @@ bug_drug_combinations <- function(x, col_mo = NULL, minimum = 30) { #' @exportMethod format.bug_drug_combinations #' @export #' @rdname bug_drug_combinations -format.bug_drug_combinations <- function(x, combine_IR = FALSE, add_ab_group = TRUE, ...) { +format.bug_drug_combinations <- function(x, + combine_IR = FALSE, + add_ab_group = TRUE, + decimal.mark = getOption("OutDec"), + big.mark = ifelse(decimal.mark == ",", ".", ",")) { if (combine_IR == FALSE) { x$isolates <- x$R } else { x$isolates <- x$R + x$I } y <- x %>% - mutate(mo = mo_name(mo, ...), - txt = paste0(percent(isolates / total, force_zero = TRUE), - " (", trimws(format(isolates, big.mark = ",")), "/", - trimws(format(total, big.mark = ",")), ")")) %>% + mutate(txt = paste0(percent(isolates / total, force_zero = TRUE, decimal.mark = decimal.mark, big.mark = big.mark), + " (", trimws(format(isolates, big.mark = big.mark)), "/", + trimws(format(total, big.mark = big.mark)), ")")) %>% select(ab, mo, txt) %>% spread(mo, txt) %>% mutate_all(~ifelse(is.na(.), "", .)) %>% - mutate(ab = paste0(ab_name(ab), " (", as.ab(ab), ", ", ab_atc(ab), ")"), - ab_group = ab_group(ab)) %>% + mutate(ab_group = ab_group(ab), + ab = paste0(ab_name(ab), " (", as.ab(ab), ", ", ab_atc(ab), ")")) %>% select(ab_group, ab, everything()) %>% arrange(ab_group, ab) %>% mutate(ab_group = ifelse(ab_group != lag(ab_group) | is.na(lag(ab_group)), ab_group, "")) if (add_ab_group == 
FALSE) { - y <- y %>% select(-ab_group) + y <- y %>% select(-ab_group) %>% rename("Antibiotic" = ab) + colnames(y)[1] <- translate_AMR(colnames(y)[1], language = get_locale(), only_unknown = FALSE) + } else { + y <- y %>% rename("Group" = ab_group, + "Antibiotic" = ab) + colnames(y)[1:2] <- translate_AMR(colnames(y)[1:2], language = get_locale(), only_unknown = FALSE) } - y <- y %>% rename("Group" = ab_group, - "Antibiotic" = ab) y } diff --git a/R/freq.R b/R/freq.R index bd58d646..a0b36256 100755 --- a/R/freq.R +++ b/R/freq.R @@ -42,8 +42,8 @@ freq.mo <- function(x, ...) { decimal.mark = "."), " (", percent(sum(grams == "Gram-positive", na.rm = TRUE) / length(grams), force_zero = TRUE, round = 2), ")"), - `Unique genera` = n_distinct(mo_genus(x_noNA, language = NULL)), - `Unique species` = n_distinct(paste(mo_genus(x_noNA, language = NULL), + `Nr of genera` = n_distinct(mo_genus(x_noNA, language = NULL)), + `Nr of species` = n_distinct(paste(mo_genus(x_noNA, language = NULL), mo_species(x_noNA, language = NULL))))) } diff --git a/R/mo.R b/R/mo.R index 0348665e..00969922 100755 --- a/R/mo.R +++ b/R/mo.R @@ -65,59 +65,48 @@ #' Usually, any guess after the first try runs 80-95\% faster than the first try. #' # \emph{For now, learning only works per session. If R is closed or terminated, the algorithms reset. This might be resolved in a future version.} -#' +#' This resets with every update of this \code{AMR} package since results are saved to your local package library folder. +#' #' \strong{Intelligent rules} \cr -#' This function uses intelligent rules to help getting fast and logical results. It tries to find matches in this order: +#' The \code{as.mo()} function uses several coercion rules for fast and logical results. 
It assesses the input matching criteria in the following order: + #' \itemize{ -#' \item{Valid MO codes and full names: it first searches in already valid MO code and known genus/species combinations} -#' \item{Human pathogenic prevalence: it first searches in more prevalent microorganisms, then less prevalent ones (see \emph{Microbial prevalence of pathogens in humans} below)} -#' \item{Taxonomic kingdom: it first searches in Bacteria, then Fungi, then Protozoa, then Archaea, then others} -#' \item{Breakdown of input values: from here it starts to breakdown input values to find possible matches} +#' \item{Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;} +#' \item{Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;} +#' \item{Breakdown of input values to identify possible matches.} +#' } +#' +#' This will lead to the effect that e.g. \code{"E. coli"} (a highly prevalent microorganism found in humans) will return the microbial ID of \emph{Escherichia coli} and not \emph{Entamoeba coli} (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the \code{as.mo()} function can differentiate four levels of uncertainty to guess valid results: +#' +#' \itemize{ +#' \item{Uncertainty level 0: no additional rules are applied;} +#' \item{Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;} +#' \item{Uncertainty level 2: allow all of level 1, strip values between brackets, inverse the words of the input, strip off text elements from the end keeping at least two elements;} +#' \item{Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.} #' } #' -#' -#' A couple of effects because of these rules: -#' \itemize{ -#' \item{\code{"E. 
coli"} will return the ID of \emph{Escherichia coli} and not \emph{Entamoeba coli}, although the latter would alphabetically come first} -#' \item{\code{"H. influenzae"} will return the ID of \emph{Haemophilus influenzae} and not \emph{Haematobacter influenzae} for the same reason} -#' \item{Something like \code{"stau"} or \code{"S aur"} will return the ID of \emph{Staphylococcus aureus} and not \emph{Staphylococcus auricularis}} -#' } -#' This means that looking up human pathogenic microorganisms takes less time than looking up human non-pathogenic microorganisms. -#' -#' \strong{Uncertain results} \cr -#' The algorithm can additionally use three different levels of uncertainty to guess valid results. The default is \code{allow_uncertain = TRUE}, which is equal to uncertainty level 2. Using \code{allow_uncertain = FALSE} will skip all of these additional rules: -#' \itemize{ -#' \item{(uncertainty level 1): It tries to look for only matching genera, previously accepted (but now invalid) taxonomic names and misspelled input} -#' \item{(uncertainty level 2): It removed parts between brackets, strips off words from the end one by one and re-evaluates the input with all previous rules} -#' \item{(uncertainty level 3): It strips off words from the start one by one and tries any part of the name} -#' } -#' -#' You can also use e.g. \code{as.mo(..., allow_uncertain = 1)} to only allow up to level 1 uncertainty. -#' -#' Examples: +#' This leads to e.g.: +#' #' \itemize{ #' \item{\code{"Streptococcus group B (known as S. agalactiae)"}. The text between brackets will be removed and a warning will be thrown that the result \emph{Streptococcus group B} (\code{B_STRPT_GRPB}) needs review.} -#' \item{\code{"S. aureus - please mind: MRSA"}. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. 
Again, a warning will be thrown that the result \emph{Staphylococcus aureus} (\code{B_STPHY_AUR}) needs review.} -#' \item{\code{"Fluoroquinolone-resistant Neisseria gonorrhoeae"}. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result \emph{Neisseria gonorrhoeae} (\code{B_NESSR_GON}) needs review.} +#' \item{\code{"S. aureus - please mind: MRSA"}. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result \emph{Staphylococcus aureus} (\code{B_STPHY_AURS}) needs review.} +#' \item{\code{"Fluoroquinolone-resistant Neisseria gonorrhoeae"}. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result \emph{Neisseria gonorrhoeae} (\code{B_NESSR_GNRR}) needs review.} #' } #' -#' Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value. -#' -#' Use \code{mo_uncertainties()} to get a data.frame with all values that were coerced to a valid value, but with uncertainty. -#' -#' Use \code{mo_renamed()} to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name. +#' The level of uncertainty can be set using the argument \code{allow_uncertain}. The default is \code{allow_uncertain = TRUE}, which is equal to uncertainty level 2. Using \code{allow_uncertain = FALSE} is equal to uncertainty level 0 and will skip all rules. You can also use e.g. \code{as.mo(..., allow_uncertain = 1)} to only allow up to level 1 uncertainty. +#' +#' Use \code{mo_failures()} to get a vector with all values that could not be coerced to a valid value. \cr +#' Use \code{mo_uncertainties()} to get a \code{data.frame} with all values that were coerced to a valid value, but with uncertainty. 
\cr +#' Use \code{mo_renamed()} to get a \code{data.frame} with all values that could be coerced based on an old, previously accepted taxonomic name. #' #' \strong{Microbial prevalence of pathogens in humans} \cr -#' The intelligent rules take into account microbial prevalence of pathogens in humans. It uses three groups and all (sub)species are in only one group. These groups are: -#' \itemize{ -#' \item{1 (most prevalent): class is Gammaproteobacteria \strong{or} genus is one of: \emph{Enterococcus}, \emph{Staphylococcus}, \emph{Streptococcus}.} -#' \item{2: phylum is one of: Proteobacteria, Firmicutes, Actinobacteria, Sarcomastigophora \strong{or} genus is one of: \emph{Aspergillus}, \emph{Bacteroides}, \emph{Candida}, \emph{Capnocytophaga}, \emph{Chryseobacterium}, \emph{Cryptococcus}, \emph{Elisabethkingia}, \emph{Flavobacterium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Leptotrichia}, \emph{Mycoplasma}, \emph{Prevotella}, \emph{Rhodotorula}, \emph{Treponema}, \emph{Trichophyton}, \emph{Ureaplasma}.} -#' \item{3 (least prevalent): all others.} -#' } -#' -#' Group 1 contains all common Gram positives and Gram negatives, like all Enterobacteriaceae and e.g. \emph{Pseudomonas} and \emph{Legionella}. -#' -#' Group 2 contains probably less pathogenic microorganisms; all other members of phyla that were found in humans in the Northern Netherlands between 2001 and 2018. +#' The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the \code{prevalence} columns in the \code{\link{microorganisms}} and \code{\link{microorganisms.old}} data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence. 
+#' +#' Group 1 (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is \emph{Enterococcus}, \emph{Staphylococcus} or \emph{Streptococcus}. This group consequently contains all common Gram-negative bacteria, such as \emph{Pseudomonas} and \emph{Legionella} and all species within the order Enterobacteriales. +#' +#' Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is \emph{Aspergillus}, \emph{Bacteroides}, \emph{Candida}, \emph{Capnocytophaga}, \emph{Chryseobacterium}, \emph{Cryptococcus}, \emph{Elisabethkingia}, \emph{Flavobacterium}, \emph{Fusobacterium}, \emph{Giardia}, \emph{Leptotrichia}, \emph{Mycoplasma}, \emph{Prevotella}, \emph{Rhodotorula}, \emph{Treponema}, \emph{Trichophyton} or \emph{Ureaplasma}. +#' +#' Group 3 (least prevalent microorganisms) consists of all other microorganisms. #' @inheritSection catalogue_of_life Catalogue of Life # (source as a section here, so it can be inherited by other man pages:) #' @section Source: @@ -1722,7 +1711,7 @@ exec_as.mo <- function(x, } if (old_mo_warning == TRUE & property != "mo") { - warning("The input contained old microorganism IDs from previous versions of this package. Please use as.mo() on these old codes.\nSUPPORT FOR THIS WILL BE DROPPED IN A FUTURE VERSION.", call. = FALSE) + warning("The input contained old microorganism IDs from previous versions of this package.\nPlease use `as.mo()` on these old IDs to transform them to the new format.\nSUPPORT FOR THIS WILL BE DROPPED IN A FUTURE VERSION.", call. 
= FALSE) } x diff --git a/R/sysdata.rda b/R/sysdata.rda index 4f28c4f4..0bc537ee 100644 Binary files a/R/sysdata.rda and b/R/sysdata.rda differ diff --git a/data-raw/translations.tsv b/data-raw/translations.tsv index feba8456..58fa076b 100644 --- a/data-raw/translations.tsv +++ b/data-raw/translations.tsv @@ -52,6 +52,8 @@ nl biogroup biogroep FALSE FALSE nl vegetative vegetatief FALSE FALSE nl ([([ ]*?)group \\1groep FALSE FALSE nl ([([ ]*?)Group \\1Groep FALSE FALSE +nl antibiotic antibioticum FALSE FALSE +nl Antibiotic Antibioticum FALSE FALSE es Coagulase-negative Staphylococcus Staphylococcus coagulasa negativo FALSE FALSE es Coagulase-positive Staphylococcus Staphylococcus coagulasa positivo FALSE FALSE es Beta-haemolytic Streptococcus Streptococcus Beta-hemolítico FALSE FALSE diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index d82259ff..99f1528c 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -78,7 +78,7 @@
diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 5bbd3863..897c6bd0 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -1,35 +1,75 @@
Source: https://gitlab.com/msberends/AMR/blob/master/vignettes/benchmarks.Rmd
-One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its microorganism code B_STPHY_AURS
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
-S.aureus <- microbenchmark(
- as.mo("sau"), # WHONET code
- as.mo("stau"),
- as.mo("STAU"),
- as.mo("staaur"),
- as.mo("STAAUR"),
- as.mo("S. aureus"),
- as.mo("S aureus"),
- as.mo("Staphylococcus aureus"), # official taxonomic name
- as.mo("Staphylococcus aureus (MRSA)"), # additional text
- as.mo("Sthafilokkockus aaureuz"), # incorrect spelling
- as.mo("MRSA"), # Methicillin Resistant S. aureus
- as.mo("VISA"), # Vancomycin Intermediate S. aureus
- as.mo("VRSA"), # Vancomycin Resistant S. aureus
- as.mo(22242419), # Catalogue of Life ID
+S.aureus <- microbenchmark(
+ as.mo("sau"), # WHONET code
+ as.mo("stau"),
+ as.mo("STAU"),
+ as.mo("staaur"),
+ as.mo("STAAUR"),
+ as.mo("S. aureus"),
+ as.mo("S aureus"),
+ as.mo("Staphylococcus aureus"), # official taxonomic name
+ as.mo("Staphylococcus aureus (MRSA)"), # additional text
+ as.mo("Sthafilokkockus aaureuz"), # incorrect spelling
+ as.mo("MRSA"), # Methicillin Resistant S. aureus
+ as.mo("VISA"), # Vancomycin Intermediate S. aureus
+ as.mo("VRSA"), # Vancomycin Resistant S. aureus
+ as.mo(22242419), # Catalogue of Life ID
times = 10)
-print(S.aureus, unit = "ms", signif = 2)
+print(S.aureus, unit = "ms", signif = 2)
# Unit: milliseconds
-# expr min lq mean median uq max
-# as.mo("sau") 8.5 8.6 11 8.7 9.1 34
-# as.mo("stau") 31.0 31.0 39 31.0 56.0 58
-# as.mo("STAU") 31.0 34.0 39 34.0 35.0 60
-# as.mo("staaur") 8.5 8.7 15 8.9 9.6 67
-# as.mo("STAAUR") 8.6 8.7 14 9.0 9.9 36
-# as.mo("S. aureus") 23.0 23.0 40 25.0 26.0 180
-# as.mo("S aureus") 23.0 24.0 30 26.0 30.0 51
-# as.mo("Staphylococcus aureus") 28.0 28.0 31 29.0 29.0 51
-# as.mo("Staphylococcus aureus (MRSA)") 570.0 600.0 620 620.0 640.0 710
-# as.mo("Sthafilokkockus aaureuz") 280.0 310.0 320 320.0 330.0 340
-# as.mo("MRSA") 8.4 8.6 11 8.8 9.5 35
-# as.mo("VISA") 19.0 19.0 21 20.0 22.0 24
-# as.mo("VRSA") 19.0 19.0 27 23.0 41.0 46
-# as.mo(22242419) 18.0 18.0 22 21.0 22.0 43
+# expr min lq mean median uq max
+# as.mo("sau") 9.1 9.1 12.0 9.5 10.0 35
+# as.mo("stau") 31.0 32.0 37.0 33.0 34.0 58
+# as.mo("STAU") 31.0 32.0 34.0 34.0 35.0 37
+# as.mo("staaur") 8.6 9.1 9.7 9.8 10.0 11
+# as.mo("STAAUR") 8.7 8.9 17.0 9.4 12.0 57
+# as.mo("S. aureus") 23.0 24.0 34.0 26.0 46.0 54
+# as.mo("S aureus") 23.0 24.0 28.0 25.0 28.0 53
+# as.mo("Staphylococcus aureus") 29.0 29.0 31.0 30.0 32.0 34
+# as.mo("Staphylococcus aureus (MRSA)") 570.0 590.0 620.0 620.0 650.0 690
+# as.mo("Sthafilokkockus aaureuz") 310.0 320.0 350.0 330.0 340.0 480
+# as.mo("MRSA") 8.7 9.0 12.0 9.5 9.7 32
+# as.mo("VISA") 19.0 20.0 22.0 22.0 24.0 26
+# as.mo("VRSA") 19.0 20.0 28.0 22.0 43.0 48
+# as.mo(22242419) 18.0 19.0 25.0 22.0 23.0 41
# neval
# 10
# 10
@@ -249,137 +291,134 @@
# 10
# 10
# 10
-
+
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.
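As a quick check of the arithmetic above, the throughput implied by a per-value timing is 1000 divided by the time in milliseconds. A minimal base R sketch follows; the helper name `values_per_second` is hypothetical and not part of the AMR package:

```r
# Hypothetical helper (not an AMR function): convert a per-value timing in
# milliseconds to the number of input values processed per second
values_per_second <- function(ms) {
  1000 / ms
}

values_per_second(5)    # 200 values per second
values_per_second(100)  # 10 values per second
```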
To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS
), a bug probably never found before in humans:
-M.semesiae <- microbenchmark(as.mo("metsem"),
- as.mo("METSEM"),
- as.mo("M. semesiae"),
- as.mo("M. semesiae"),
- as.mo("Methanosarcina semesiae"),
+M.semesiae <- microbenchmark(as.mo("metsem"),
+ as.mo("METSEM"),
+ as.mo("M. semesiae"),
+ as.mo("M. semesiae"),
+ as.mo("Methanosarcina semesiae"),
times = 10)
-print(M.semesiae, unit = "ms", signif = 4)
+print(M.semesiae, unit = "ms", signif = 4)
# Unit: milliseconds
-# expr min lq mean median uq
-# as.mo("metsem") 1310.00 1340.0 1361.00 1358 1387.00
-# as.mo("METSEM") 1304.00 1320.0 1350.00 1341 1382.00
-# as.mo("M. semesiae") 1839.00 1968.0 1990.00 2006 2032.00
-# as.mo("M. semesiae") 1947.00 1978.0 2014.00 2019 2046.00
-# as.mo("Methanosarcina semesiae") 30.49 31.2 35.04 32 32.81
+# expr min lq mean median uq
+# as.mo("metsem") 1343.00 1379.00 1415.00 1404.00 1424.00
+# as.mo("METSEM") 1335.00 1356.00 1418.00 1410.00 1451.00
+# as.mo("M. semesiae") 1852.00 2045.00 2081.00 2107.00 2154.00
+# as.mo("M. semesiae") 1961.00 2037.00 2095.00 2085.00 2123.00
+# as.mo("Methanosarcina semesiae") 30.55 31.13 34.35 32.63 33.33
# max neval
-# 1401.00 10
-# 1411.00 10
-# 2049.00 10
-# 2088.00 10
-# 63.03 10
-That takes 15.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are almost fast - these are the most probable input from most data sets.
+# 1579.00 10
+# 1557.00 10
+# 2163.00 10
+# 2336.00 10
+# 54.12 10
+That takes 15.6 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are always fast - these are the most probable input from most data sets.
In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):
-
-In reality, the as.mo()
functions learns from its own output to speed up determinations for next times. In above figure, this effect was disabled to show the difference with the boxplot below - when you would use as.mo()
yourself:
-
+
+In reality, the as.mo()
function learns from its own output to speed up later determinations. In the figure above, this effect was disabled to show the difference with the boxplot below - when you would use as.mo()
yourself:
+
The highest outliers are the first runs. All subsequent determinations were done in only thousandths of a second.
Uncommon microorganisms take a lot more time than common microorganisms. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
-
-Repetitive results
-Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
-library(dplyr)
+Repetitive results
+Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
+library(dplyr)
# take all MO codes from the example_isolates data set
x <- example_isolates$mo %>%
# keep only the unique ones
- unique() %>%
+ unique() %>%
# pick 50 of them at random
- sample(50) %>%
+ sample(50) %>%
# paste that 10,000 times
- rep(10000) %>%
+ rep(10000) %>%
# scramble it
- sample()
+ sample()
# got indeed 50 times 10,000 = half a million?
-length(x)
+length(x)
# [1] 500000
# and how many unique values do we have?
-n_distinct(x)
+n_distinct(x)
# [1] 50
-# now let's see:
-run_it <- microbenchmark(mo_name(x),
+# now let's see:
+run_it <- microbenchmark(mo_name(x),
times = 10)
-print(run_it, unit = "ms", signif = 3)
+print(run_it, unit = "ms", signif = 3)
# Unit: milliseconds
# expr min lq mean median uq max neval
-# mo_name(x) 598 639 656 657 671 735 10
+# mo_name(x) 626 639 663 658 682 731 10
So transforming 500,000 values (!!) of 50 unique values only takes 0.66 seconds (657 ms). You only lose time on your unique input values.
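The idea that only unique values cost time can be sketched in base R as follows. This illustrates the general deduplication technique under stated assumptions; it is not the actual internal code of `as.mo()`, and `slow_lookup()` and `fast_apply()` are hypothetical names:

```r
# Hypothetical stand-in for an expensive per-value lookup; it counts its calls
n_calls <- 0
slow_lookup <- function(value) {
  n_calls <<- n_calls + 1
  toupper(value)
}

# Evaluate the lookup once per unique value, then map the results back
# onto the positions of the original (repetitive) input
fast_apply <- function(x, fun) {
  ux <- unique(x)
  res <- vapply(ux, fun, character(1), USE.NAMES = FALSE)
  res[match(x, ux)]
}

# 3 unique values, repeated 1,000 times each and scrambled
x <- sample(rep(c("e. coli", "s. aureus", "k. pneumoniae"), 1000))
out <- fast_apply(x, slow_lookup)

length(out)  # same length as the input
n_calls      # one call per unique value, not one per element
```

With this pattern, the cost grows with the number of distinct inputs rather than with the length of the vector, which matches the behaviour described in the benchmark.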
-
-Precalculated results
-What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
-run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"),
- B = mo_name("S. aureus"),
- C = mo_name("Staphylococcus aureus"),
+Precalculated results
+What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
+run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"),
+ B = mo_name("S. aureus"),
+ C = mo_name("Staphylococcus aureus"),
times = 10)
-print(run_it, unit = "ms", signif = 3)
+print(run_it, unit = "ms", signif = 3)
# Unit: milliseconds
-# expr min lq mean median uq max neval
-# A 6.150 6.340 9.110 6.370 6.400 33.700 10
-# B 22.000 22.200 22.900 22.300 22.400 28.300 10
-# C 0.691 0.784 0.783 0.795 0.802 0.814 10
-So going from mo_name("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
-run_it <- microbenchmark(A = mo_species("aureus"),
- B = mo_genus("Staphylococcus"),
- C = mo_name("Staphylococcus aureus"),
- D = mo_family("Staphylococcaceae"),
- E = mo_order("Bacillales"),
- F = mo_class("Bacilli"),
- G = mo_phylum("Firmicutes"),
- H = mo_kingdom("Bacteria"),
+# expr min lq mean median uq max neval
+# A 6.260 6.320 9.050 6.360 6.810 32.40 10
+# B 22.800 23.000 23.800 23.200 23.900 28.20 10
+# C 0.709 0.813 0.836 0.843 0.854 0.96 10
+So going from mo_name("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
+run_it <- microbenchmark(A = mo_species("aureus"),
+ B = mo_genus("Staphylococcus"),
+ C = mo_name("Staphylococcus aureus"),
+ D = mo_family("Staphylococcaceae"),
+ E = mo_order("Bacillales"),
+ F = mo_class("Bacilli"),
+ G = mo_phylum("Firmicutes"),
+ H = mo_kingdom("Bacteria"),
times = 10)
-print(run_it, unit = "ms", signif = 3)
+print(run_it, unit = "ms", signif = 3)
# Unit: milliseconds
# expr min lq mean median uq max neval
-# A 0.462 0.471 0.480 0.482 0.491 0.498 10
-# B 0.609 0.627 0.645 0.638 0.657 0.714 10
-# C 0.651 0.731 0.771 0.772 0.806 0.887 10
-# D 0.431 0.457 0.488 0.468 0.485 0.675 10
-# E 0.450 0.452 0.466 0.465 0.473 0.500 10
-# F 0.461 0.466 0.481 0.474 0.495 0.514 10
-# G 0.449 0.453 0.465 0.464 0.471 0.495 10
-# H 0.455 0.458 0.481 0.465 0.485 0.594 10
-Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
+# A 0.466 0.466 0.495 0.475 0.510 0.622 10
+# B 0.497 0.511 0.558 0.517 0.575 0.844 10
+# C 0.709 0.783 0.876 0.857 0.956 1.110 10
+# D 0.477 0.486 0.547 0.514 0.639 0.669 10
+# E 0.468 0.476 0.504 0.481 0.520 0.630 10
+# F 0.454 0.461 0.509 0.475 0.522 0.687 10
+# G 0.459 0.465 0.522 0.475 0.587 0.637 10
+# H 0.432 0.460 0.502 0.469 0.535 0.623 10
+Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
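The shortcut described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: `known_phyla`, `slow_search()` and `phylum_lookup()` are hypothetical names, and the reference vector stands in for the Catalogue of Life data.

```r
# Hypothetical reference values; the real package uses the Catalogue of Life.
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Hypothetical slow path standing in for a full taxonomic search.
slow_search <- function(x) {
  Sys.sleep(0.01)  # pretend this is expensive
  NA_character_
}

phylum_lookup <- function(x) {
  # Fast path: the input already is a valid phylum, so return it unchanged.
  if (x %in% known_phyla) {
    return(x)
  }
  # Otherwise fall back to the expensive search.
  slow_search(x)
}

phylum_lookup("Firmicutes")
```

Because the membership test is a single vector lookup, the fast path costs far less than any search would, which is consistent with the sub-millisecond timings in the benchmark.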
-
-Results in other languages
+Results in other languages
When the system language is non-English and supported by this AMR package, some functions will have a translated result. This takes almost no extra time:
-mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system
-# [1] "Coagulase-negative Staphylococcus (CoNS)"
+mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system
+# [1] "Coagulase-negative Staphylococcus (CoNS)"
-mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system
-# [1] "Staphylococcus coagulasa negativo (SCN)"
+mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system
+# [1] "Staphylococcus coagulasa negativo (SCN)"
-mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system
-# [1] "Coagulase-negatieve Staphylococcus (CNS)"
+mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system
+# [1] "Coagulase-negatieve Staphylococcus (CNS)"
-run_it <- microbenchmark(en = mo_name("CoNS", language = "en"),
- de = mo_name("CoNS", language = "de"),
- nl = mo_name("CoNS", language = "nl"),
- es = mo_name("CoNS", language = "es"),
- it = mo_name("CoNS", language = "it"),
- fr = mo_name("CoNS", language = "fr"),
- pt = mo_name("CoNS", language = "pt"),
+run_it <- microbenchmark(en = mo_name("CoNS", language = "en"),
+ de = mo_name("CoNS", language = "de"),
+ nl = mo_name("CoNS", language = "nl"),
+ es = mo_name("CoNS", language = "es"),
+ it = mo_name("CoNS", language = "it"),
+ fr = mo_name("CoNS", language = "fr"),
+ pt = mo_name("CoNS", language = "pt"),
times = 10)
-print(run_it, unit = "ms", signif = 4)
+print(run_it, unit = "ms", signif = 4)
# Unit: milliseconds
# expr min lq mean median uq max neval
-# en 17.93 18.18 19.34 18.76 19.02 26.27 10
-# de 19.44 19.63 22.03 19.80 20.23 41.83 10
-# nl 24.54 24.78 27.37 25.23 25.55 47.06 10
-# es 19.51 19.94 20.27 20.20 20.55 21.16 10
-# it 19.40 19.67 24.91 19.99 20.90 46.83 10
-# fr 19.24 19.45 22.53 19.80 20.17 47.71 10
-# pt 19.18 19.33 19.87 19.72 20.62 20.75 10
+# en 18.35 18.55 21.13 18.70 18.79 43.22 10
+# de 19.69 19.94 20.81 20.24 20.74 25.64 10
+# nl 25.28 25.42 28.05 25.55 26.59 48.83 10
+# es 19.77 19.95 22.83 20.29 20.76 46.03 10
+# it 19.81 19.88 20.19 20.13 20.55 20.85 10
+# fr 19.62 20.02 22.79 20.26 21.23 44.33 10
+# pt 20.05 20.37 23.19 20.67 21.46 44.96 10
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
This package contains the complete taxonomic tree of almost all 70,000 microorganisms from the authoritative and comprehensive Catalogue of Life (CoL, www.catalogueoflife.org). Use catalogue_of_life_version() to check which version of the CoL is included in this package.
Read more about which data from the Catalogue of Life is included in our manual.
It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). By installing this package, you teach R everything about microbiology that is needed for analysis. These functions all use intelligent rules to guess results that you would expect:
as.mo() to get a microbial ID. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNMN” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AURS”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using intelligent rules combined with the included Catalogue of Life data set. It only takes milliseconds to find results; please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).
as.ab() to get an antibiotic ID. Like microbial IDs, these IDs are also human readable based on those used by EARS-Net. For example, the ID of amoxicillin is AMX and the ID of gentamicin is GEN. The as.ab() function also uses intelligent rules to find results like accepting misspellings, trade names and abbreviations used in many laboratory systems. For instance, the values “Furabid”, “Furadantin”, “nitro” all return the ID of nitrofurantoin. To accomplish this, the package contains a database with most LIS codes, official names, trade names, ATC codes, defined daily doses (DDD) and drug categories of antibiotics.
as.rsi() to get antibiotic interpretations based on raw MIC values (in mg/L) or disk diffusion values (in mm), or transform existing values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.
as.mic() to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.
It enhances existing data and adds new data from data sets included in this package.
eucast_rules() to apply EUCAST expert rules to isolates (not the translation from MIC to RSI values, use as.rsi() for that).
first_isolate() to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
+eucast_rules() to apply EUCAST expert rules to isolates (not the translation from MIC to RSI values, use as.rsi() for that).
first_isolate() to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
mdro() (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
microorganisms contains the complete taxonomic tree of ~70,000 microorganisms. Furthermore, some colloquial names and all Gram stains are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus(), mo_family(), mo_gramstain() or even mo_phylum(). As they use as.mo() internally, they also use the same intelligent rules for determination. For example, mo_genus("MRSA") and mo_genus("S. aureus") will both return "Staphylococcus". They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.
antibiotics contains ~450 antimicrobial drugs with their EARS-Net code, ATC code, PubChem compound ID, official name, common LIS codes and DDDs of both oral and parenteral administration. It also contains all (thousands of) trade names found in PubChem. The function ab_atc() will return the ATC code of an antibiotic as defined by the WHO. Use functions like ab_name(), ab_group() and ab_tradenames() to look up values. The ab_* functions use as.ab() internally so they support the same intelligent rules to guess the most probable result. For example, ab_name("Fluclox"), ab_name("Floxapen") and ab_name("J01CF05") will all return "Flucloxacillin". These functions can again be used to add new variables to your data.
It analyses the data with convenient functions that use well-known methods.
portion_R(), portion_IR(), portion_I(), portion_SI() and portion_S() functions. Similarly, the number of isolates can be determined with the count_R(), count_IR(), count_I(), count_SI() and count_S() functions. All these functions can be used with the dplyr package (e.g. in conjunction with summarise())
geom_rsi(), a function made for the ggplot2 package
resistance_predict() function
-Last updated: 22-Sep-2019
+Last updated: 23-Sep-2019
Determination of first isolates now excludes all ‘unknown’ microorganisms by default, i.e. microbial code "UNKNOWN". They can be included with the new parameter include_unknown:
"con" (contamination) will be excluded by default, since as.mo("con") = "UNKNOWN". The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.
For code consistency, classes ab and mo will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA:
#> invalid factor level, NA generated
# how it now works similarly for classes 'mo' and 'ab':
-x <- as.mo("E. coli")
+x <- as.mo("E. coli")
x[1] <- "testvalue"
#> Warning message:
#> invalid microorganism code, NA generated
"testvalue" could never be understood by e.g. mo_name(), although the class would suggest a valid microbial code.
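How a class can survive assignment while rejecting invalid values can be sketched with a small S3 class in base R. This is a toy illustration under stated assumptions, not the package's code: the class name `code`, the constructor `as_code()` and the vector `valid_codes` are all hypothetical.

```r
# Hypothetical set of valid codes; the real package validates against its
# own microorganisms data set.
valid_codes <- c("B_ESCHR_COLI", "B_STPHY_AURS", "B_KLBSL_PNMN")

# Constructor for a toy class named 'code' (not the real 'mo' class).
as_code <- function(x) {
  x <- as.character(x)
  invalid <- !is.na(x) & !(x %in% valid_codes)
  if (any(invalid)) {
    warning("invalid code, NA generated", call. = FALSE)
    x[invalid] <- NA_character_
  }
  structure(x, class = "code")
}

# Assignment method: validate the new value first, then keep the class.
`[<-.code` <- function(x, i, value) {
  value <- unclass(as_code(value))
  y <- unclass(x)
  y[i] <- value
  structure(y, class = "code")
}

x <- as_code("B_ESCHR_COLI")
x[1] <- "testvalue"  # warns and stores NA instead of the invalid text
is.na(unclass(x))
class(x)
```

Validating inside the assignment method means every element of an object of that class is either a valid code or NA, which mirrors how base R treats invalid factor levels.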
-Function freq() has moved to a new package, clean (CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq() function still works, since it is re-exported from the clean package (which will be installed automatically upon updating this AMR package).
"testvalue" could never be understood by e.g. mo_name(), although the class would suggest a valid microbial code.
+Function freq() has moved to a new package, clean (CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq() function still works, since it is re-exported from the clean package (which will be installed automatically upon updating this AMR package).
"testvalue"
could never be
New
Function bug_drug_combinations() to quickly get a data.frame with the antimicrobial resistance of any bug-drug combination in a data set:
x <- bug_drug_combinations(example_isolates)
-x
-# NOTE: Using column `mo` as input for `col_mo`.
-#> ab mo S I R total
-#> 1 AMC B_ESCHR_COLI 332 74 61 467
-#> 2 AMC B_KLBSL_PNMN 49 3 6 58
-#> 3 AMC B_PROTS_MRBL 28 7 1 36
-#> 4 AMC B_PSDMN_AERG 0 0 30 30
-#> 5 AMC B_STPHY_AURS 234 0 1 235
Function bug_drug_combinations() to quickly get a data.frame with the antimicrobial resistance of any bug-drug combination in a data set. The column with microorganism codes is guessed automatically and its input is transformed with mo_shortname() by default:
x <- bug_drug_combinations(example_isolates)
+# NOTE: Using column `mo` as input for `col_mo`.
+x[1:5, ]
+#> ab mo S I R total
+#> 1 AMC CoNS 178 0 132 310
+#> 2 AMC E. coli 332 74 61 467
+#> 3 AMC K. pneumoniae 49 3 6 58
+#> 4 AMC P. aeruginosa 0 0 30 30
+#> 5 AMC P. mirabilis 28 7 1 36
+
+# change the transformation with the FUN argument to anything you like:
+x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
+# NOTE: Using column `mo` as input for `col_mo`.
+x[1:4, ]
+#> ab mo S I R total
+#> 1 AMC Gram-negative 469 89 174 732
+#> 2 AMC Gram-positive 873 2 272 1147
+#> 3 AMK Gram-negative 251 0 2 253
+#> 4 AMK Gram-positive 0 0 100 100
You can convert this to a printable format, ready for reporting or exporting to e.g. Excel, with the base R format() function:
also_single_tested
w
as.mo() (of which some led to additions to the microorganisms data set). Many thanks to all contributors that helped improve the algorithms.
+as.mo() (of which some led to additions to the microorganisms data set). Many thanks to all contributors that helped improve the algorithms.
also_single_tested
w
as.mo() on your old codes to transform them to the new format.
B_ENTRC_FAE could have been both E. faecalis and E. faecium. Its new code is B_ENTRC_FCLS and E. faecium has become B_ENTRC_FACM. Also, the Latin character æ (ae) is now preserved at the start of each genus and species abbreviation. For example, the old code for Aerococcus urinae was B_ARCCC_NAE. This is now B_AERCC_URIN.
IMPORTANT: Old microorganism IDs are still supported, but support will be dropped in a future version. Use as.mo() on your old codes to transform them to the new format. Using functions from the mo_* family (like mo_name() and mo_gramstain()) on old codes will throw a warning.
septic_patients to example_isolates
eucast_rules():
+eucast_rules():
eucast_rules(..., verbose = TRUE)) returns more informative and readable output
AMR:::get_column_abx())
atc - using as.atc() is now deprecated in favour of ab_atc() and this will return a character, not the atc class anymore
abname(), ab_official(), atc_name(), atc_official(), atc_property(), atc_tradenames(), atc_trivial_nl()
mo_shortname()
+mo_shortname()
mo_* functions where the coercion uncertainties and failures would not be available through mo_uncertainties() and mo_failures() anymore
country parameter of mdro() in favour of the already existing guideline parameter to support multiple guidelines within one country
name of RIF is now Rifampicin instead of Rifampin
antibiotics data set is now sorted by name and all cephalosporins now have their generation between brackets
guess_ab_col() which is now 30 times faster for antibiotic abbreviations
filter_ab_class() to be more reliable and to support 5th generation cephalosporins
availability() now uses portion_R() instead of portion_IR(), to comply with EUCAST insights
age() and age_groups() now have a na.rm parameter to remove empty values
p.symbol() to p_symbol() (the former is now deprecated and will be removed in a future version)
x in age_groups() will now introduce NAs and not return an error anymore
key_antibiotics() on foreign systems
also_single_tested
w
New
Function rsi_df() to transform a data.frame to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df() and portion_df() to immediately show resistance percentages and number of available isolates:
septic_patients %>%
select(AMX, CIP) %>%
- rsi_df()
+ rsi_df()
# antibiotic interpretation value isolates
# 1 Amoxicillin SI 0.4442636 546
# 2 Amoxicillin R 0.5557364 683
@@ -396,41 +406,41 @@ Since this is a major change, usage of the old also_single_tested
w
- UPEC (Uropathogenic E. coli)
All these lead to the microbial ID of E. coli:
-as.mo("UPEC")
+as.mo("UPEC")
# B_ESCHR_COL
-mo_name("UPEC")
+mo_name("UPEC")
# "Escherichia coli"
-mo_gramstain("EHEC")
+mo_gramstain("EHEC")
# "Gram-negative"
-- Function mo_info() as an analogy to ab_info(). The mo_info() prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
-Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism
+- Function mo_info() as an analogy to ab_info(). The mo_info() prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
+Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism
Changed
-- Column names of output count_df() and portion_df() are now lowercase
+- Column names of output count_df() and portion_df() are now lowercase
- Fixed bug in translation of microorganism names
- Fixed bug in determining taxonomic kingdoms
-- Algorithm improvements for as.ab() and as.mo() to understand even more severely misspelled input
-- Function as.ab() now allows spaces for coercing antibiotics names
+- Algorithm improvements for as.ab() and as.mo() to understand even more severely misspelled input
+- Function as.ab() now allows spaces for coercing antibiotics names
- Added ggplot2 methods for automatically determining the scale type of classes mo and ab
- Added names of object in the header in frequency tables, even when using pipes
-- Prevented "bacteria" from getting coerced by as.ab() because Bacterial is a brand name of trimethoprim (TMP)
-- Fixed a bug where setting an antibiotic would not work for eucast_rules() and mdro()
+ - Prevented "bacteria" from getting coerced by as.ab() because Bacterial is a brand name of trimethoprim (TMP)
+- Fixed a bug where setting an antibiotic would not work for eucast_rules() and mdro()
- Fixed a EUCAST rule for Staphylococci, where amikacin resistance would not be inferred from tobramycin
-- Removed latest_annual_release from the catalogue_of_life_version() function
+- Removed latest_annual_release from the catalogue_of_life_version() function
- Removed antibiotic code PVM1 from the antibiotics data set as this was a duplicate of PME
-- Fixed bug where not all old taxonomic names would be printed, when using a vector as input for as.mo()
+ - Fixed bug where not all old taxonomic names would be printed, when using a vector as input for as.mo()
- Manually added Trichomonas vaginalis from the kingdom of Protozoa, which is missing from the Catalogue of Life
- Small improvements to plot() and barplot() for MIC and RSI classes
-- Allow Catalogue of Life IDs to be coerced by as.mo()
+ - Allow Catalogue of Life IDs to be coerced by as.mo()
@@ -450,18 +460,18 @@ Since this is a major change, usage of the old also_single_tested
w
New
-- Support for translation of disk diffusion and MIC values to RSI values (i.e. antimicrobial interpretations). Supported guidelines are EUCAST (2011 to 2019) and CLSI (2011 to 2019). Use as.rsi() on an MIC value (created with as.mic()), a disk diffusion value (created with the new as.disk()) or on a complete data set containing columns with MIC or disk diffusion values.
-- Function mo_name() as alias of mo_fullname()
+ - Support for translation of disk diffusion and MIC values to RSI values (i.e. antimicrobial interpretations). Supported guidelines are EUCAST (2011 to 2019) and CLSI (2011 to 2019). Use as.rsi() on an MIC value (created with as.mic()), a disk diffusion value (created with the new as.disk()) or on a complete data set containing columns with MIC or disk diffusion values.
+- Function mo_name() as alias of mo_fullname()
-- Added guidelines of the WHO to determine multi-drug resistance (MDR) for TB (mdr_tb()) and added a new vignette about MDR. Read this tutorial here on our website.
+- Added guidelines of the WHO to determine multi-drug resistance (MDR) for TB (mdr_tb()) and added a new vignette about MDR. Read this tutorial here on our website.
first_isolate() where missing species would lead to incorrect FALSEs. This bug was not present in AMR v0.5.0, but was in v0.6.0 and v0.6.1.
eucast_rules() where antibiotics from WHONET software would not be recognised
antibiotics data set:
also_single_tested
w
Please create an issue in one of our repositories if you want additions in this file.
ggplot_rsi():
+ggplot_rsi():
colours to set the bar colours
title, subtitle, caption, x.title and y.title to set titles and axis descriptions
guess_ab_col()
+guess_ab_col()
microorganisms.old data set, which leads to finding better results when using the as.mo() function
portion_df() and count_df() this means that their new parameter combine_SI is TRUE by default. Our plotting function ggplot_rsi() also reflects this change since it uses count_df() internally.
age() function gained a new parameter exact to determine ages with decimals
guess_mo(), guess_atc(), EUCAST_rules(), interpretive_reading(), rsi()
freq()):
+freq()):
age_groups(), to let groups of fives and tens end with 100+ instead of 120+
freq() for when all values are NA
+age_groups(), to let groups of fives and tens end with 100+ instead of 120+
freq() for when all values are NA
first_isolate() for when dates are missing
guess_ab_col()
+first_isolate() for when dates are missing
guess_ab_col()
as.mo() now gently interprets any number of whitespace characters (like tabs) as one space
as.mo() now returns UNKNOWN for "con" (WHONET ID of ‘contamination’) and returns NA for "xxx" (WHONET ID of ‘no growth’)
as.mo()
+as.mo() now gently interprets any number of whitespace characters (like tabs) as one space
as.mo() now returns UNKNOWN for "con" (WHONET ID of ‘contamination’) and returns NA for "xxx" (WHONET ID of ‘no growth’)
as.mo()
microorganisms.codes and cleaned it up
Fix for mo_shortname() where species would not be determined correctly
eucast_rules() with verbose = TRUE
+eucast_rules() with verbose = TRUE
as.mo() to identify an MO code.
microorganisms data set now contains:
mo codes changed (e.g. Streptococcus changed from B_STRPTC to B_STRPT). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
mo_rank() for the taxonomic rank (genus, species, infraspecies, etc.)
mo_url() to get the direct URL of a species from the Catalogue of Life
first_isolate() and eucast_rules(), all parameters will be filled in automatically.
antibiotics data set now contains a column ears_net.
as.mo() now knows all WHONET species abbreviations too, because almost 2,000 microbial abbreviations were added to the microorganisms.codes data set.
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
-filter_aminoglycosides()
-filter_carbapenems()
-filter_cephalosporins()
-filter_1st_cephalosporins()
-filter_2nd_cephalosporins()
-filter_3rd_cephalosporins()
-filter_4th_cephalosporins()
-filter_fluoroquinolones()
-filter_glycopeptides()
-filter_macrolides()
-filter_tetracyclines()
filter_aminoglycosides()
+filter_carbapenems()
+filter_cephalosporins()
+filter_1st_cephalosporins()
+filter_2nd_cephalosporins()
+filter_3rd_cephalosporins()
+filter_4th_cephalosporins()
+filter_fluoroquinolones()
+filter_glycopeptides()
+filter_macrolides()
+filter_tetracyclines()
The antibiotics data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics data set. For example:
septic_patients %>% filter_glycopeptides(result = "R")
+septic_patients %>% filter_glycopeptides(result = "R")
# Filtering on glycopeptide antibacterials: any of `vanc` or `teic` is R
-septic_patients %>% filter_glycopeptides(result = "R", scope = "all")
+septic_patients %>% filter_glycopeptides(result = "R", scope = "all")
# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R
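The difference between the default scope and scope = "all" can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `filter_class()`, the toy data and the assumption that the default scope means "any" (as the first note above suggests) are all introduced here for the example.

```r
# Toy data with two glycopeptide columns; the values are made up.
isolates <- data.frame(
  id   = 1:4,
  vanc = c("R", "S", "R", "S"),
  teic = c("R", "R", "S", "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any or all of `cols` equal `result`.
filter_class <- function(data, cols, result = "R", scope = c("any", "all")) {
  scope <- match.arg(scope)
  hits <- data[, cols, drop = FALSE] == result  # logical matrix, one column per drug
  n_hits <- rowSums(hits, na.rm = TRUE)
  keep <- if (scope == "any") n_hits >= 1 else n_hits == length(cols)
  data[keep, , drop = FALSE]
}

filter_class(isolates, c("vanc", "teic"), scope = "any")$id  # 1 2 3
filter_class(isolates, c("vanc", "teic"), scope = "all")$id  # 1
```

Counting matches per row keeps both scopes in one code path; only the threshold differs.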
@@ -618,33 +628,33 @@ This data is updated annually - check the included version with the new function
ab_certe -> atc_certe()
ab_umcg -> atc_umcg()
ab_tradenames -> atc_tradenames()
as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
+set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
dplyr version 0.8.0
guess_ab_col() to find an antibiotic column in a table
mo_failures() to review values that could not be coerced to a valid MO code, using as.mo(). This latter function will now only show a maximum of 10 uncoerced values and will refer to mo_failures().
mo_uncertainties() to review values that could be coerced to a valid MO code using as.mo(), but with uncertainty.
mo_renamed() to get a list of all returned values from as.mo() that have had taxonomic renaming
age() to calculate the (patient's) age in years
age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():
x <- resistance_predict(septic_patients, col_ab = "amox")
+New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
+
+ggplot_rsi_predict(x)
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
septic_patients %>% filter_first_isolate(...)
+Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
+
+filter_first_isolate(septic_patients, ...)
is equal to:
availability()
to check the number of available (non-empty) results in a data.frame
+availability()
to check the number of available (non-empty) results in a data.frame
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
as.atc()
internally. The old atc_property
Changed
-- Function
eucast_rules()
:
+ - Function
eucast_rules()
:
- Updated EUCAST Clinical breakpoints to version 9.0 of 1 January 2019, the data set
septic_patients
now reflects these changes
- Fixed a critical bug where some rules that depend on previous applied rules would not be applied adequately
- Emphasised in manual that penicillin is meant as benzylpenicillin (ATC J01CE01)
-- New info is returned when running this function, stating exactly what has been changed or added. Use
eucast_rules(..., verbose = TRUE)
to get a data set with all changes per bug and drug combination.
+- New info is returned when running this function, stating exactly what has been changed or added. Use
eucast_rules(..., verbose = TRUE)
to get a data set with all changes per bug and drug combination.
- Removed data sets
microorganisms.oldDT
, microorganisms.prevDT
, microorganisms.unprevDT
and microorganismsDT
since they were no longer needed and only contained info already available in the microorganisms
data set
- Added 65 antibiotics to the
antibiotics
data set, from the Pharmaceuticals Community Register of the European Commission
- Removed columns
atc_group1_nl
and atc_group2_nl
from the antibiotics
data set
-- Functions
atc_ddd()
and atc_groups()
have been renamed atc_online_ddd()
and atc_online_groups()
. The old functions are deprecated and will be removed in a future version.
-- Function
guess_mo()
is now deprecated in favour of as.mo()
and will be removed in future versions
-- Function
guess_atc()
is now deprecated in favour of as.atc()
and will be removed in future versions
-- Improvements for
as.mo()
:
+ - Functions
atc_ddd()
and atc_groups()
have been renamed atc_online_ddd()
and atc_online_groups()
. The old functions are deprecated and will be removed in a future version.
+- Function
guess_mo()
is now deprecated in favour of as.mo()
and will be removed in future versions
+- Function
guess_atc()
is now deprecated in favour of as.atc()
and will be removed in future versions
+- Improvements for
as.mo()
:
-
Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
-
-
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
+Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
# equal:
-as.mo(..., allow_uncertain = TRUE)
-as.mo(..., allow_uncertain = 2)
+as.mo(..., allow_uncertain = TRUE)
+as.mo(..., allow_uncertain = 2)
# also equal:
-as.mo(..., allow_uncertain = FALSE)
-as.mo(..., allow_uncertain = 0)
-Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
+as.mo(..., allow_uncertain = FALSE)
+as.mo(..., allow_uncertain = 0)
as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
~/.Rhistory_mo
. Use the new function clear_mo_history()
to delete this file, which resets the algorithms.
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On foreign systems, properties of these will be translated to all languages already supported: German, Dutch, French, Italian, Spanish and Portuguese:
mo_genus("qwerty", language = "es")
+mo_genus("qwerty", language = "es")
# Warning:
# one unique value (^= 100.0%) could not be coerced and is considered 'unknown': "qwerty". Use mo_failures() to review it.
#> [1] "(género desconocido)"
@@ -712,7 +722,7 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
Console will return the percentage of uncoercible input
-Function first_isolate()
:
+ Function first_isolate()
:
- Fixed a bug where distances between dates would not be calculated correctly; in the
septic_patients
data set this yielded a difference of 0.15% more isolates
- Will now use a column named like “patid” for the patient ID (parameter
col_patientid
), when this parameter was left blank
@@ -724,38 +734,38 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
- A note to the manual pages of the
portion
functions, that low counts can influence the outcome and that the portion
functions may camouflage this, since they only return the portion (albeit being dependent on the minimum
parameter)
- Merged data sets
microorganisms.certe
and microorganisms.umcg
into microorganisms.codes
-- Function
mo_taxonomy()
now contains the kingdom too
-- Reduce false positives for
is.rsi.eligible()
using the new threshold
parameter
-- New colours for
scale_rsi_colours()
+ - Function
mo_taxonomy()
now contains the kingdom too
+- Reduce false positives for
is.rsi.eligible()
using the new threshold
parameter
+- New colours for
scale_rsi_colours()
- Summaries of class
mo
will now return the top 3 and the unique count, e.g. using summary(mo)
- Small text updates to summaries of class
rsi
and mic
-- Function
as.rsi()
:
+ - Function
as.rsi()
:
- Now gives a warning when inputting MIC values
- Now accepts high and low resistance:
"HIGH S"
will return S
-- Frequency tables (
freq()
function):
+ - Frequency tables (
freq()
function):
-
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
# Determine genus of microorganisms (mo) in `septic_patients` data set:
# OLD WAY
septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
# NEW WAY
septic_patients %>%
- freq(mo_genus(mo))
+ freq(mo_genus(mo))
# Even supports grouping variables:
septic_patients %>%
group_by(gender) %>%
- freq(mo_genus(mo))
+ freq(mo_genus(mo))
header
function
header
is now set to TRUE
at default, even for markdown
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
select()
on frequency tables
scale_y_percent()
now contains the limits
parameter
mdro()
, key_antibiotics()
and eucast_rules()
+scale_y_percent()
now contains the limits
parameter
mdro()
, key_antibiotics()
and eucast_rules()
resistance_predict()
function)
as.mic()
to support more values ending in (several) zeroes
resistance_predict()
function)
as.mic()
to support more values ending in (several) zeroes
%like%
, it will now return the call
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
as.mo
will return NA
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
also_single_tested
for portion_*
and count_*
functions to also include cases where not all antibiotics were tested but at least one of the tested antibiotics includes the target antimicrobial interpretation, see ?portion
+also_single_tested
for portion_*
and count_*
functions to also include cases where not all antibiotics were tested but at least one of the tested antibiotics includes the target antimicrobial interpretation, see ?portion
portion_*
functions now throw a warning when the total number of available isolates is below the parameter minimum
as.mo
, as.rsi
, as.mic
, as.atc
and freq
will not set package name as attribute anymore
freq()
:
+freq()
:
Support for grouping variables, test with:
+ freq(gender)
Support for (un)selecting columns:
hms::is.hms
@@ -934,16 +944,16 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
+mo_gramstain("E. coli")
# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
+mo_gramstain("E. coli", language = "de") # German
# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
+mo_gramstain("E. coli", language = "es") # Spanish
# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
+mo_fullname("S. group A", language = "pt") # Portuguese
# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
-mo_gramstain("Esc blattae")
+mo_gramstain("Esc blattae")
# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
# [1] "Gram negative"
@@ -956,15 +966,15 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
-as.mo("E. coli")
+as.mo("E. coli")
# [1] B_ESCHR_COL
-as.mo("MRSA")
+as.mo("MRSA")
# [1] B_STPHY_AUR
-as.mo("S group A")
+as.mo("S group A")
# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
thousands_of_E_colis <- rep("E. coli", 25000)
-microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s")
+microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s")
# Unit: seconds
# min median max neval
# 0.01817717 0.01843957 0.03878077 100
@@ -997,9 +1007,9 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
Added 163 trade names to the antibiotics
data set, it now contains 298 different trade names in total, e.g.:
ab_official("Bactroban")
# [1] "Mupirocin"
-ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
+ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
# [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin"
-ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
+ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
# [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
For first_isolate
, rows will be ignored when there’s no species available
@@ -1011,13 +1021,13 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
Support for quasiquotation in the functions series count_*
and portions_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
-septic_patients %>% select(amox, cipr) %>% count_IR()
+septic_patients %>% select(amox, cipr) %>% count_IR()
# which is the same as:
-septic_patients %>% count_IR(amox, cipr)
+septic_patients %>% count_IR(amox, cipr)
-septic_patients %>% portion_S(amcl)
-septic_patients %>% portion_S(amcl, gent)
-septic_patients %>% portion_S(amcl, gent, pita)
+septic_patients %>% portion_S(amcl)
+septic_patients %>% portion_S(amcl, gent)
+septic_patients %>% portion_S(amcl, gent, pita)
Edited ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
at default, but can be set to count_df
.
Fix for ggplot_rsi
when the ggplot2
package was not loaded
@@ -1032,11 +1042,11 @@ Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable
Support for types (classes) list and matrix for freq
+freq(my_matrix)
For lists, subsetting is possible:
my_list = list(age = septic_patients$age, gender = septic_patients$gender)
-my_list %>% freq(age)
-my_list %>% freq(gender)
+my_list %>% freq(age)
+my_list %>% freq(gender)
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
septic_patients %>% select(tobr, gent) %>% ggplot_rsi
will show portions of S, I and R immediately in a pretty plot
?ggplot_rsi
+?ggplot_rsi
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
rsi
(antimicrobial resistance) to use as input
table
to use as input: freq(table(x, y))
+table
to use as input: freq(table(x, y))
hist
and plot
to use a frequency table as input: hist(freq(df$age))
as.vector
, as.data.frame
, as_tibble
and format
freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
+freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
top_freq
function to return the top/below n items as vector
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
rsi
and mic
functions:
as.rsi("<=0.002; S")
will return S
+as.rsi("<=0.002; S")
will return S
as.mic("<=0.002; S")
will return <=0.002
+as.mic("<=0.002; S")
will return <=0.002
as.mic("<= 0.002")
now works
as.mic("<= 0.002")
now works
rsi
and mic
do not add the attribute package.version
anymore
"groups"
option for atc_property(..., property)
. It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups
is a convenient wrapper around this.
atc_property
as it requires the host set by url
to be responsive
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
BRMO
and MRGN
are wrappers for Dutch and German guidelines, respectively
"points"
or "keyantibiotics"
, see ?first_isolate
+"points"
or "keyantibiotics"
, see ?first_isolate
tibble
s and data.table
s
as.mo(..., allow_uncertain = 3)
could lead to very unreliable
@@ -305,41 +305,35 @@ A microorganism ID from this package (class: mo
) typically looks li
Self-learning algorithm
The as.mo()
function gains experience from previously determined microorganism IDs and learns from it. This drastically improves both speed and reliability. Use clear_mo_history()
to reset the algorithms. Only experience from your current AMR
package version is used. This is done because in the future the taxonomic tree (which is included in this package) may change for any organism and it consequently has to rebuild its knowledge.
Usually, any guess after the first try runs 80-95% faster than the first try.
-Intelligent rules
-This function uses intelligent rules to help getting fast and logical results. It tries to find matches in this order:
Valid MO codes and full names: it first searches in already valid MO code and known genus/species combinations
Human pathogenic prevalence: it first searches in more prevalent microorganisms, then less prevalent ones (see Microbial prevalence of pathogens in humans below)
Taxonomic kingdom: it first searches in Bacteria, then Fungi, then Protozoa, then Archaea, then others
Breakdown of input values: from here it starts to break down input values to find possible matches
This resets with every update of this AMR
package since results are saved to your local package library folder.
Intelligent rules
+The as.mo()
function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;
Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;
Breakdown of input values to identify possible matches.
A couple of effects because of these rules:
"E. coli"
will return the ID of Escherichia coli and not Entamoeba coli, although the latter would alphabetically come first
"H. influenzae"
will return the ID of Haemophilus influenzae and not Haematobacter influenzae for the same reason
Something like "stau"
or "S aur"
will return the ID of Staphylococcus aureus and not Staphylococcus auricularis
This means that looking up human pathogenic microorganisms takes less time than looking up human non-pathogenic microorganisms.
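The effect of the search order can be illustrated with a minimal base R sketch. It does not reproduce the internals of as.mo(); the lookup table, its prevalence values and the helper `first_match()` are hypothetical and exist only for this illustration.

```r
# Hypothetical lookup table; the prevalence values are illustrative only
candidates <- data.frame(
  fullname   = c("Entamoeba coli", "Escherichia coli"),
  kingdom    = c("Protozoa", "Bacteria"),
  prevalence = c(2, 1),
  stringsAsFactors = FALSE
)

# Kingdom search order as described above; any other kingdom sorts last
kingdom_order <- c("Bacteria", "Fungi", "Protozoa")

# Hypothetical helper: return the first candidate after sorting on
# prevalence group first and kingdom second
first_match <- function(genus_initial, species, lookup) {
  pattern <- paste0("^", genus_initial, "[a-z]+ ", species, "$")
  hits <- lookup[grepl(pattern, lookup$fullname), , drop = FALSE]
  if (nrow(hits) == 0) {
    return(NA_character_)
  }
  kingdom_rank <- match(hits$kingdom, kingdom_order,
                        nomatch = length(kingdom_order) + 1)
  hits <- hits[order(hits$prevalence, kingdom_rank), , drop = FALSE]
  hits$fullname[1]
}

first_match("E", "coli", candidates)
```

Although Entamoeba coli comes first alphabetically, the sketch returns Escherichia coli because it sits in the more prevalent group.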
-Uncertain results
-The algorithm can additionally use three different levels of uncertainty to guess valid results. The default is allow_uncertain = TRUE
, which is equal to uncertainty level 2. Using allow_uncertain = FALSE
will skip all of these additional rules:
(uncertainty level 1): It tries to look for only matching genera, previously accepted (but now invalid) taxonomic names and misspelled input
(uncertainty level 2): It removes parts between brackets, strips off words from the end one by one and re-evaluates the input with all previous rules
(uncertainty level 3): It strips off words from the start one by one and tries any part of the name
This will lead to the effect that e.g. "E. coli"
(a highly prevalent microorganism found in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a less prevalent microorganism in humans), although the latter would alphabetically come first. In addition, the as.mo()
function can differentiate four levels of uncertainty to guess valid results:
Uncertainty level 0: no additional rules are applied;
Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
Uncertainty level 2: allow all of level 1, strip values between brackets, reverse the order of the words in the input, strip off text elements from the end keeping at least two elements;
Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.
You can also use e.g. as.mo(..., allow_uncertain = 1)
to only allow up to level 1 uncertainty.
Examples:
This leads to e.g.:
+"Streptococcus group B (known as S. agalactiae)"
. The text between brackets will be removed and a warning will be thrown that the result Streptococcus group B (B_STRPT_GRPB
) needs review.
"S. aureus - please mind: MRSA"
. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result Staphylococcus aureus (B_STPHY_AUR
) needs review.
"Fluoroquinolone-resistant Neisseria gonorrhoeae"
. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result Neisseria gonorrhoeae (B_NESSR_GON
) needs review.
"S. aureus - please mind: MRSA"
. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result Staphylococcus aureus (B_STPHY_AURS
) needs review.
"Fluoroquinolone-resistant Neisseria gonorrhoeae"
. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result Neisseria gonorrhoeae (B_NESSR_GNRR
) needs review.
Use mo_failures()
to get a vector with all values that could not be coerced to a valid value.
Use mo_uncertainties()
to get a data.frame with all values that were coerced to a valid value, but with uncertainty.
Use mo_renamed()
to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name.
The level of uncertainty can be set using the argument allow_uncertain
. The default is allow_uncertain = TRUE
, which is equal to uncertainty level 2. Using allow_uncertain = FALSE
is equal to uncertainty level 0 and will skip all rules. You can also use e.g. as.mo(..., allow_uncertain = 1)
to only allow up to level 1 uncertainty.
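The mapping between the logical and numeric forms of allow_uncertain can be sketched as follows. The helper `uncertainty_level()` is hypothetical and not part of the AMR package; it only mirrors the equivalences stated above (TRUE is level 2, FALSE is level 0, numeric values 0 to 3 are used as given).

```r
# Hypothetical helper, not part of the AMR package: translate the
# allow_uncertain argument into a numeric uncertainty level (0 to 3)
uncertainty_level <- function(allow_uncertain = TRUE) {
  if (isTRUE(allow_uncertain)) {
    return(2L)  # the default, equal to level 2
  }
  if (identical(allow_uncertain, FALSE)) {
    return(0L)  # skip all additional rules
  }
  level <- as.integer(allow_uncertain)
  stopifnot(length(level) == 1, !is.na(level), level %in% 0:3)
  level
}

uncertainty_level(TRUE)
uncertainty_level(FALSE)
uncertainty_level(1)
```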
Use mo_failures()
to get a vector with all values that could not be coerced to a valid value.
+Use mo_uncertainties()
to get a data.frame
with all values that were coerced to a valid value, but with uncertainty.
+Use mo_renamed()
to get a data.frame
with all values that could be coerced based on an old, previously accepted taxonomic name.
Microbial prevalence of pathogens in humans
-The intelligent rules take into account microbial prevalence of pathogens in humans. It uses three groups and all (sub)species are in only one group. These groups are:
1 (most prevalent): class is Gammaproteobacteria or genus is one of: Enterococcus, Staphylococcus, Streptococcus.
2: phylum is one of: Proteobacteria, Firmicutes, Actinobacteria, Sarcomastigophora or genus is one of: Aspergillus, Bacteroides, Candida, Capnocytophaga, Chryseobacterium, Cryptococcus, Elisabethkingia, Flavobacterium, Fusobacterium, Giardia, Leptotrichia, Mycoplasma, Prevotella, Rhodotorula, Treponema, Trichophyton, Ureaplasma.
3 (least prevalent): all others.
Group 1 contains all common Gram positives and Gram negatives, like all Enterobacteriaceae and e.g. Pseudomonas and Legionella.
-Group 2 contains probably less pathogenic microorganisms; all other members of phyla that were found in humans in the Northern Netherlands between 2001 and 2018.
+The intelligent rules consider the prevalence of microorganisms in humans, divided into three groups, which is available as the
prevalence
columns in the microorganisms
and microorganisms.old
data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence.
+Group 1 (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is Enterococcus, Staphylococcus or Streptococcus. This group consequently contains all common Gram-negative bacteria, such as Pseudomonas and Legionella and all species within the order Enterobacteriales.
+Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is Aspergillus, Bacteroides, Candida, Capnocytophaga, Chryseobacterium, Cryptococcus, Elisabethkingia, Flavobacterium, Fusobacterium, Giardia, Leptotrichia, Mycoplasma, Prevotella, Rhodotorula, Treponema, Trichophyton or Ureaplasma.
+Group 3 (least prevalent microorganisms) consists of all other microorganisms.
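The three group definitions can be written down as a small base R sketch. The function `prevalence_group()` is hypothetical and only restates the rules above; in the package the result is already stored in the prevalence column. The taxonomic labels in the example input are supplied by hand for illustration.

```r
# Hypothetical helper that restates the grouping rules described above
prevalence_group <- function(phylum, class, genus) {
  group1_genera <- c("Enterococcus", "Staphylococcus", "Streptococcus")
  group2_phyla <- c("Proteobacteria", "Firmicutes", "Actinobacteria",
                    "Sarcomastigophora")
  group2_genera <- c("Aspergillus", "Bacteroides", "Candida",
                     "Capnocytophaga", "Chryseobacterium", "Cryptococcus",
                     "Elisabethkingia", "Flavobacterium", "Fusobacterium",
                     "Giardia", "Leptotrichia", "Mycoplasma", "Prevotella",
                     "Rhodotorula", "Treponema", "Trichophyton", "Ureaplasma")
  # Group 1 is checked first, then group 2; everything else is group 3
  ifelse(class == "Gammaproteobacteria" | genus %in% group1_genera, 1L,
         ifelse(phylum %in% group2_phyla | genus %in% group2_genera, 2L, 3L))
}

prevalence_group(
  phylum = c("Proteobacteria", "Ascomycota", "Firmicutes", "Euryarchaeota"),
  class  = c("Gammaproteobacteria", "Saccharomycetes", "Bacilli",
             "Methanobacteria"),
  genus  = c("Escherichia", "Candida", "Bacillus", "Methanobrevibacter")
)
```

In this example Escherichia falls into group 1 through its class, Candida into group 2 through its genus, Bacillus into group 2 through its phylum, and Methanobrevibacter into group 3.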
-bug_drug_combinations(x, col_mo = NULL, minimum = 30)
+bug_drug_combinations(x, col_mo = NULL, minimum = 30,
+  FUN = mo_shortname, ...)
 # S3 method for bug_drug_combinations
 format(x, combine_IR = FALSE,
-  add_ab_group = TRUE, ...)
+  add_ab_group = TRUE, decimal.mark = getOption("OutDec"),
+  big.mark = ifelse(decimal.mark == ",", ".", ","))
minimum | the minimum allowed number of available (tested) isolates. Any isolate count lower than |
+ ||
---|---|---|---|
FUN | +the function to call on the |
+ ||
... | +arguments passed on to |
+ ||
combine_IR | logical to indicate whether values R and I should be summed |
@@ -264,8 +274,15 @@
logical to indicate whether the group of the antimicrobials must be included as a first column |
|
... | -arguments passed on to |
+ decimal.mark | +the character to be used to indicate the numeric + decimal point. |
+
big.mark | +character; if not empty used as mark between every
+ |
The function format
calculates the resistance per bug-drug combination. Use combine_IR = FALSE
(default) to test R vs. S+I and combine_IR = TRUE
to test R+I vs. S.
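The two settings of combine_IR differ only in which counts go into the numerator. A minimal base R sketch, assuming the share is taken over all tested isolates; `resistance_share()` is a hypothetical helper, not the package's format() method, and the counts are the E. coli / AMC row from the NEWS example (S = 332, I = 74, R = 61).

```r
# Hypothetical helper: share of resistant isolates for one bug-drug combination
resistance_share <- function(S, I, R, combine_IR = FALSE) {
  total <- S + I + R
  # combine_IR = FALSE tests R vs. S+I; combine_IR = TRUE tests R+I vs. S
  numerator <- if (combine_IR) I + R else R
  numerator / total
}

resistance_share(S = 332, I = 74, R = 61)                     # 61 / 467
resistance_share(S = 332, I = 74, R = 61, combine_IR = TRUE)  # 135 / 467
```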
The language of the output can be overwritten with options(AMR_locale)
, please see translate.
The group
column in antibiotics
data set will be searched for ab_class
(case-insensitive). If no results are found, the atc_group1
and atc_group2
columns will be searched. Next, x
will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set.
The group
column in antibiotics
data set will be searched for ab_class
(case-insensitive). If no results are found, the atc_group1
and atc_group2
columns will be searched. Next, x
will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set.