diff --git a/DESCRIPTION b/DESCRIPTION
index ff87372b..6635ea05 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.8.0.9029
-Date: 2019-11-10
+Version: 0.8.0.9030
+Date: 2019-11-11
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(role = c("aut", "cre"),
@@ -47,7 +47,7 @@ Imports:
     microbenchmark,
     pillar,
     rlang (>= 0.3.1),
-    tidyr (>= 0.7.0)
+    tidyr (>= 1.0.0)
 Suggests:
     covr (>= 3.0.1),
     curl,
diff --git a/NAMESPACE b/NAMESPACE
index 9e49e241..d6109758 100755
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -321,8 +321,8 @@ importFrom(stats,glm)
 importFrom(stats,lm)
 importFrom(stats,pchisq)
 importFrom(stats,predict)
-importFrom(tidyr,gather)
-importFrom(tidyr,spread)
+importFrom(tidyr,pivot_longer)
+importFrom(tidyr,pivot_wider)
 importFrom(utils,browseURL)
 importFrom(utils,menu)
 importFrom(utils,read.csv)
diff --git a/NEWS.md b/NEWS.md
index b3afc516..dca3b445 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,8 +1,16 @@
-# AMR 0.8.0.9029
-Last updated: 10-Nov-2019
+# AMR 0.8.0.9030
+Last updated: 11-Nov-2019
 
 ### New
-* Functions `susceptibility()` and `resistance()` as aliases of `proportion_SI()` and `proportion_R()`, respectively. These functions were added to make it more clear that I should be considered susceptible and not resistant.
+* Functions `susceptibility()` and `resistance()` as aliases of `proportion_SI()` and `proportion_R()`, respectively. These functions were added to make it more clear that "I" should be considered susceptible and not resistant.
+  ```r
+  library(dplyr)
+  example_isolates %>%
+    group_by(bug = mo_name(mo)) %>%
+    summarise(amoxicillin = resistance(AMX),
+              amox_clav = resistance(AMC)) %>%
+    filter(!is.na(amoxicillin) | !is.na(amox_clav))
+  ```
 * Support for a new MDRO guideline: Magiorakos AP, Srinivasan A *et al.* "Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance." Clinical Microbiology and Infection (2012).
   * This is now the new default guideline for the `mdro()` function
   * The new Verbose mode (`mdro(...., verbose = TRUE)`) returns an informative data set where the reason for MDRO determination is given for every isolate, and a list of the resistant antimicrobial agents
diff --git a/R/bug_drug_combinations.R b/R/bug_drug_combinations.R
index bcdd3728..41c19b18 100644
--- a/R/bug_drug_combinations.R
+++ b/R/bug_drug_combinations.R
@@ -32,7 +32,7 @@
 #' @inheritParams rsi_df
 #' @inheritParams base::formatC
 #' @importFrom dplyr %>% rename group_by select mutate filter summarise ungroup
-#' @importFrom tidyr spread
+#' @importFrom tidyr pivot_longer
 #' @details The function \code{format} calculates the resistance per bug-drug combination. Use \code{combine_IR = FALSE} (default) to test R vs. S+I and \code{combine_IR = TRUE} to test R+I vs. S.
 #'
 #' The language of the output can be overwritten with \code{options(AMR_locale)}, please see \link{translate}.
@@ -80,7 +80,7 @@ bug_drug_combinations <- function(x,
                FUN(...)) %>%
     group_by(mo) %>%
     select_if(is.rsi) %>%
-    gather("ab", "value", -mo) %>%
+    pivot_longer(-mo, names_to = "ab") %>%
     group_by(mo, ab) %>%
     summarise(S = sum(value == "S", na.rm = TRUE),
               I = sum(value == "I", na.rm = TRUE),
@@ -93,7 +93,7 @@ bug_drug_combinations <- function(x,
 }
 
 #' @importFrom dplyr everything rename %>% ungroup group_by summarise mutate_all arrange everything lag
-#' @importFrom tidyr spread
+#' @importFrom tidyr pivot_wider
 #' @importFrom cleaner percentage
 #' @exportMethod format.bug_drug_combinations
 #' @export
@@ -135,7 +135,7 @@ format.bug_drug_combinations <- function(x,
     }
     ab_txt
   }
-  
+
   y <- x %>%
     mutate(ab = as.ab(ab),
            ab_txt = give_ab_name(ab = ab, format = translate_ab, language = language)) %>%
@@ -146,8 +146,9 @@ format.bug_drug_combinations <- function(x,
     mutate(txt = paste0(percentage(isolates / total, decimal.mark = decimal.mark, big.mark = big.mark),
                         " (", trimws(format(isolates, big.mark = big.mark)), "/",
                         trimws(format(total, big.mark = big.mark)), ")")) %>%
-    select(ab, ab_txt, mo, txt) %>%
-    spread(mo, txt) %>%
+    select(ab, ab_txt, mo, txt) %>%
+    arrange(mo) %>%
+    pivot_wider(names_from = mo, values_from = txt) %>%
     mutate_all(~ifelse(is.na(.), "", .)) %>%
     mutate(ab_group = ab_group(ab, language = language),
            ab_txt) %>%
diff --git a/R/resistance_predict.R b/R/resistance_predict.R
index b8c9faad..a65deb83 100755
--- a/R/resistance_predict.R
+++ b/R/resistance_predict.R
@@ -29,12 +29,13 @@
 #' @param year_every unit of sequence between lowest year found in the data and \code{year_max}
 #' @param minimum minimal amount of available isolates per year to include. Years containing less observations will be estimated by the model.
 #' @param model the statistical model of choice. This could be a generalised linear regression model with binomial distribution (i.e. using \code{\link{glm}(..., family = \link{binomial})}), assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance. See Details for all valid options.
-#' @param I_as_S a logical to indicate whether values \code{I} should be treated as \code{S} (will otherwise be treated as \code{R})
+#' @param I_as_S a logical to indicate whether values \code{I} should be treated as \code{S} (will otherwise be treated as \code{R}). The default, \code{TRUE}, follows the redefinition by EUCAST about the interpretation of I (increased exposure) in 2019, see section 'Interpretation of S, I and R' below.
 #' @param preserve_measurements a logical to indicate whether predictions of years that are actually available in the data should be overwritten by the original data. The standard errors of those years will be \code{NA}.
 #' @param info a logical to indicate whether textual analysis should be printed with the name and \code{\link{summary}} of the statistical model.
 #' @param main title of the plot
 #' @param ribbon a logical to indicate whether a ribbon should be shown (default) or error bars
 #' @param ... parameters passed on to functions
+#' @inheritSection as.rsi Interpretation of S, I and R
 #' @inheritParams first_isolate
 #' @inheritParams graphics::plot
 #' @details Valid options for the statistical model are:
@@ -59,6 +60,7 @@
 #' @export
 #' @importFrom stats predict glm lm
 #' @importFrom dplyr %>% pull mutate mutate_at n group_by_at summarise filter filter_at all_vars n_distinct arrange case_when n_groups transmute ungroup
+#' @importFrom tidyr pivot_wider
 #' @inheritSection AMR Read more on our website!
 #' @examples
 #' x <- resistance_predict(example_isolates, col_ab = "AMX", year_min = 2010, model = "binomial")
@@ -161,6 +163,7 @@ resistance_predict <- function(x,
   }
 
   year <- function(x) {
+    # don't depend on lubridate or so, would be overkill for only this function
     if (all(grepl("^[0-9]{4}$", x))) {
       x
     } else {
@@ -192,9 +195,12 @@ resistance_predict <- function(x,
   }
 
   colnames(df) <- c("year", "antibiotic", "observations")
+
   df <- df %>%
     filter(!is.na(antibiotic)) %>%
-    tidyr::spread(antibiotic, observations, fill = 0) %>%
+    pivot_wider(names_from = antibiotic,
+                values_from = observations,
+                values_fill = list(observations = 0)) %>%
     filter((R + S) >= minimum)
   df_matrix <- df %>%
     ungroup() %>%
diff --git a/R/rsi_calc.R b/R/rsi_calc.R
index 46750806..8623f774 100755
--- a/R/rsi_calc.R
+++ b/R/rsi_calc.R
@@ -167,8 +167,8 @@ rsi_calc <- function(...,
   }
 }
 
-#' @importFrom dplyr %>% summarise_if mutate select everything bind_rows
-#' @importFrom tidyr gather
+#' @importFrom dplyr %>% summarise_if mutate select everything bind_rows arrange
+#' @importFrom tidyr pivot_longer
 rsi_calc_df <- function(type, # "proportion" or "count"
                         data,
                         translate_ab = "name",
@@ -247,12 +247,13 @@ rsi_calc_df <- function(type, # "proportion" or "count"
   }
 
   res <- res %>%
-    gather(antibiotic, value, -interpretation, -data.groups) %>%
-    select(antibiotic, everything())
+    pivot_longer(-c(interpretation, data.groups), names_to = "antibiotic") %>%
+    select(antibiotic, everything()) %>%
+    arrange(antibiotic, interpretation)
 
   if (!translate_ab == FALSE) {
     res <- res %>% mutate(antibiotic = AMR::ab_property(antibiotic, property = translate_ab, language = language))
   }
 
-  res
+  as.data.frame(res, stringsAsFactors = FALSE)
 }
diff --git a/docs/404.html b/docs/404.html
index dea8d387..5927d9a0 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -84,7 +84,7 @@
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index 1e502d69..ba6220da 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -84,7 +84,7 @@ diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index b9b1a724..c6d9730e 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -41,7 +41,7 @@ @@ -187,7 +187,7 @@AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 10 November 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 11 November 2019.
Now, let’s start the cleaning and the analysis!
@@ -406,8 +406,8 @@
 #
 #      Item    Count   Percent   Cum. Count   Cum. Percent
 # ---  -----  -------  --------  -----------  -------------
-# 1    M       10,417    52.09%       10,417         52.09%
-# 2    F        9,583    47.92%       20,000        100.00%
+# 1    M       10,427    52.14%       10,427         52.14%
+# 2    F        9,573    47.87%       20,000        100.00%
 So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: only values M
and F
. From a researcher's perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -437,14 +437,14 @@
# Pasteurella multocida (no changes)
# Staphylococcus (no changes)
# Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (1,545 values changed)
+# Streptococcus pneumoniae (1,552 values changed)
# Viridans group streptococci (no changes)
#
# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,309 values changed)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,279 values changed)
# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
# Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,733 values changed)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,800 values changed)
# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
@@ -452,23 +452,23 @@
# Table 13: Interpretive rules for quinolones (no changes)
#
# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,194 values changed)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (121 values changed)
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,257 values changed)
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (132 values changed)
# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#
# --------------------------------------------------------------------------
-# EUCAST rules affected 6,489 out of 20,000 rows, making a total of 7,902 edits
+# EUCAST rules affected 6,599 out of 20,000 rows, making a total of 8,020 edits
# => added 0 test results
#
-# => changed 7,902 test results
-# - 118 test results changed from S to I
-# - 4,776 test results changed from S to R
-# - 1,063 test results changed from I to S
-# - 318 test results changed from I to R
-# - 1,603 test results changed from R to S
+# => changed 8,020 test results
+# - 119 test results changed from S to I
+# - 4,832 test results changed from S to R
+# - 1,096 test results changed from I to S
+# - 342 test results changed from I to R
+# - 1,607 test results changed from R to S
# - 24 test results changed from R to I
# --------------------------------------------------------------------------
#
@@ -497,8 +497,8 @@
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `date` as input for `col_date`.
# NOTE: Using column `patient_id` as input for `col_patient_id`.
-# => Found 5,696 first isolates (28.5% of total)
So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient P1, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient D2, sorted on date:
isolate | @@ -524,19 +524,19 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-26 | -P1 | +2010-02-14 | +D2 | B_ESCHR_COLI | -I | -S | -S | R | +S | +S | +S | TRUE |
2 | -2010-04-19 | -P1 | +2010-04-27 | +D2 | B_ESCHR_COLI | S | S | @@ -546,30 +546,30 @@||||||
3 | -2010-04-24 | -P1 | +2010-05-31 | +D2 | B_ESCHR_COLI | R | -R | -R | +S | +S | S | FALSE | |
4 | -2010-06-11 | -P1 | +2010-08-21 | +D2 | B_ESCHR_COLI | S | S | S | -R | +S | FALSE | ||
5 | -2010-11-24 | -P1 | +2010-09-21 | +D2 | B_ESCHR_COLI | S | S | @@ -579,8 +579,8 @@||||||
6 | -2010-12-11 | -P1 | +2010-10-04 | +D2 | B_ESCHR_COLI | R | S | @@ -590,8 +590,8 @@||||||
7 | -2010-12-23 | -P1 | +2010-10-11 | +D2 | B_ESCHR_COLI | S | S | @@ -601,10 +601,10 @@||||||
8 | -2011-01-14 | -P1 | +2010-11-16 | +D2 | B_ESCHR_COLI | -R | +S | S | S | S | @@ -612,26 +612,26 @@|||
9 | -2011-01-19 | -P1 | +2011-03-05 | +D2 | B_ESCHR_COLI | -R | -R | S | S | -FALSE | -|||
10 | -2011-01-26 | -P1 | -B_ESCHR_COLI | -R | -I | S | S | TRUE | |||||
10 | +2011-04-18 | +D2 | +B_ESCHR_COLI | +S | +S | +S | +S | +FALSE | +
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
isolate | @@ -662,20 +662,20 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-26 | -P1 | +2010-02-14 | +D2 | B_ESCHR_COLI | -I | -S | -S | R | +S | +S | +S | TRUE | TRUE |
2 | -2010-04-19 | -P1 | +2010-04-27 | +D2 | B_ESCHR_COLI | S | S | @@ -686,44 +686,44 @@|||||||
3 | -2010-04-24 | -P1 | +2010-05-31 | +D2 | B_ESCHR_COLI | R | -R | -R | +S | +S | S | FALSE | TRUE | |
4 | -2010-06-11 | -P1 | +2010-08-21 | +D2 | B_ESCHR_COLI | S | S | S | -R | +S | FALSE | TRUE | ||
5 | -2010-11-24 | -P1 | +2010-09-21 | +D2 | B_ESCHR_COLI | S | S | S | S | FALSE | -TRUE | +FALSE | ||
6 | -2010-12-11 | -P1 | +2010-10-04 | +D2 | B_ESCHR_COLI | R | S | @@ -734,8 +734,8 @@|||||||
7 | -2010-12-23 | -P1 | +2010-10-11 | +D2 | B_ESCHR_COLI | S | S | @@ -746,10 +746,10 @@|||||||
8 | -2011-01-14 | -P1 | +2010-11-16 | +D2 | B_ESCHR_COLI | -R | +S | S | S | S | @@ -758,35 +758,35 @@||||
9 | -2011-01-19 | -P1 | +2011-03-05 | +D2 | B_ESCHR_COLI | -R | -R | +S | +S | +S | +S | +TRUE | +TRUE | +|
10 | +2011-04-18 | +D2 | +B_ESCHR_COLI | +S | +S | S | S | FALSE | -TRUE | -|||||
10 | -2011-01-26 | -P1 | -B_ESCHR_COLI | -R | -I | -S | -S | -TRUE | -TRUE | +FALSE |
Instead of 2, now 10 isolates are flagged. In total, 76.2% of all isolates are marked ‘first weighted’ - 47.7% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 8 isolates are flagged. In total, 75.0% of all isolates are marked ‘first weighted’ - 46.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,241 isolates for analysis.
+So we end up with 15,009 isolates for analysis.
We can remove unneeded columns:
@@ -812,57 +812,9 @@Frequency table
Class: character
-Length: 15,241 (of which NA: 0 = 0%)
+Length: 15,009 (of which NA: 0 = 0%)
Unique: 4
Shortest: 16
Longest: 24
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
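The definitions above come down to simple proportions over the tested isolates. A minimal base R sketch of that arithmetic, assuming a hypothetical vector `amx` of interpretations; it only illustrates the calculation and is not the package's implementation of `resistance()` or `susceptibility()`:

```r
# Hypothetical interpretations for one antibiotic; NA means "not tested"
amx <- c("S", "S", "I", "R", "R", NA, "S", "R")

# Only tested isolates count towards the denominator
tested <- amx[!is.na(amx)]

# Resistance: proportion of R
resistance_sketch <- mean(tested == "R")

# Susceptibility: proportion of S and I together
susceptibility_sketch <- mean(tested %in% c("S", "I"))
```

Because every tested isolate is either R or one of S and I, the two proportions in this sketch always add up to 1.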
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -993,19 +993,19 @@ Longest: 24
Hospital A
-0.4637681
+0.4640823
Hospital B
-0.4776811
+0.4663609
Hospital C
-0.4811697
+0.4736130
Hospital D
-0.4694820
+0.4749499
@@ -1023,23 +1023,23 @@ Longest: 24
Hospital A
-0.4637681
-4554
+0.4640823
+4566
Hospital B
-0.4776811
-5399
+0.4663609
+5232
Hospital C
-0.4811697
-2257
+0.4736130
+2217
Hospital D
-0.4694820
-3031
+0.4749499
+2994
@@ -1059,27 +1059,27 @@ Longest: 24
Escherichia
-0.9212433
-0.8984591
-0.9927565
+0.9211982
+0.8896235
+0.9929834
Klebsiella
-0.8298677
-0.8897290
-0.9836169
+0.8239034
+0.8804832
+0.9809282
Staphylococcus
-0.9290305
-0.9258168
-0.9946438
+0.9188023
+0.9209603
+0.9932560
Streptococcus
-0.6067899
+0.5974978
0.0000000
-0.6067899
+0.5974978
@@ -1089,11 +1089,12 @@ Longest: 24
summarise("1. Amoxi/clav" = susceptibility(AMC),
"2. Gentamicin" = susceptibility(GEN),
"3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>%
- tidyr::gather("antibiotic", "S", -genus) %>%
- ggplot(aes(x = genus,
- y = S,
- fill = antibiotic)) +
- geom_col(position = "dodge2")
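The `gather()` call removed above has a direct `pivot_longer()` counterpart. A minimal sketch, assuming tidyr 1.0.0 or later is installed and using a small hypothetical data frame `wide` in place of the summarised isolates:

```r
library(tidyr)

# Hypothetical wide table: one susceptibility column per antibiotic
wide <- data.frame(genus = c("Escherichia", "Klebsiella"),
                   amc   = c(0.92, 0.82),
                   gen   = c(0.89, 0.88),
                   stringsAsFactors = FALSE)

# Old style: gather(wide, "antibiotic", "S", -genus)
# New style: name the key and value columns explicitly
long <- pivot_longer(wide, -genus,
                     names_to = "antibiotic",
                     values_to = "S")
```

One difference to keep in mind is row order: `gather()` stacks the input columns one after another, while `pivot_longer()` walks through the input row by row, which may be why an explicit `arrange()` is added elsewhere in this change.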
The next example uses the included example_isolates
, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. This data.frame
can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column FOS
) in hospital A and D. The input for the fisher.test()
can be retrieved with a transformation like this:
check_FOS <- example_isolates %>%
- filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
- select(hospital_id, FOS) %>% # select the hospitals and fosfomycin
- group_by(hospital_id) %>% # group on the hospitals
- count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id)
- tidyr::spread(hospital_id, value) %>% # transform output so A and D are columns
- select(A, D) %>% # and select these only
- as.matrix() # transform to good old matrix for fisher.test()
-
-check_FOS
-# A D
-# [1,] 25 77
-# [2,] 24 33
# use package 'tidyr' to pivot data;
+# it gets installed with this 'AMR' package
+library(tidyr)
+
+check_FOS <- example_isolates %>%
+ filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+ select(hospital_id, FOS) %>% # select the hospitals and fosfomycin
+ group_by(hospital_id) %>% # group on the hospitals
+ count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id)
+ pivot_wider(names_from = hospital_id, # transform output so A and D are columns
+ values_from = value) %>%
+ select(A, D) %>% # and only select these columns
+ as.matrix() # transform to a good old matrix for fisher.test()
+
+check_FOS
+# A D
+# [1,] 25 77
+# [2,] 24 33
We can apply the test now with:
# do Fisher's Exact Test
fisher.test(check_FOS)
@@ -1181,7 +1187,7 @@ Longest: 24
# sample estimates:
# odds ratio
# 0.4488318
As can be seen, the p value is 0.031, which means that the fosfomycin resistances found in hospital A and D are really different.
+As can be seen, the p value is 0.031, which means that the fosfomycin resistance found in hospitals A and D differs significantly at the conventional 0.05 level.
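The test can also be reproduced without the data set by typing in the counts printed for `check_FOS`. A minimal base R sketch, assuming the 0.05 threshold as the (conventional, not source-mandated) significance level:

```r
# Counts copied from the check_FOS output: columns are hospitals A and D
check_FOS <- matrix(c(25, 24, 77, 33),
                    nrow = 2,
                    dimnames = list(NULL, c("A", "D")))

# Fisher's Exact Test on the 2x2 table
result <- fisher.test(check_FOS)

# Compare the p value against the assumed 0.05 threshold
significant <- result$p.value < 0.05
```

`result$estimate` holds the odds ratio and `result$conf.int` its confidence interval, which say more about the size of the difference than the p value alone.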
SPSS.Rmd
R is extremely flexible.
-Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, gathering, grouping and summarising data, or drawing plots, is endless - with SPSS, SAS or Stata you are bound to their algorithms and format styles. They may be a bit flexible, but you can probably never create that very specific publication-ready plot without using other (paid) software. If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you could do this a lot less time in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as working with any statistical package, you will have to have knowledge about what you are doing (statistically) and what you are willing to accomplish.
+Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, arranging, grouping and summarising data, or drawing plots, is endless - with SPSS, SAS or Stata you are bound to their algorithms and format styles. They may be a bit flexible, but you can probably never create that very specific publication-ready plot without using other (paid) software. If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you could do this in a lot less time in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as with any statistical package, you will have to have knowledge about what you are doing (statistically) and what you want to accomplish.
R can be easily automated.
diff --git a/docs/articles/index.html b/docs/articles/index.html index a1f60b5b..2270e9a5 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -84,7 +84,7 @@ diff --git a/docs/authors.html b/docs/authors.html index 3299e539..881f4b86 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -84,7 +84,7 @@ diff --git a/docs/index.html b/docs/index.html index 59f76de3..6686dfef 100644 --- a/docs/index.html +++ b/docs/index.html @@ -45,7 +45,7 @@ diff --git a/docs/news/index.html b/docs/news/index.html index 9471058e..63b55043 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -84,7 +84,7 @@ @@ -231,16 +231,24 @@ -Last updated: 10-Nov-2019
+Last updated: 11-Nov-2019
susceptibility()
and resistance()
as aliases of proportion_SI()
and proportion_R()
, respectively. These functions were added to make it more clear that I should be considered susceptible and not resistant.Functions susceptibility()
and resistance()
as aliases of proportion_SI()
and proportion_R()
, respectively. These functions were added to make it more clear that “I” should be considered susceptible and not resistant.
mdro()
functionDetermination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN"
. They can be included with the new parameter include_unknown
:
"con"
(contamination) will be excluded by default, since as.mo("con") = "UNKNOWN"
. The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.For code consistency, classes ab
and mo
will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA
:
# how it works in base R:
-x <- factor("A")
-x[1] <- "B"
-#> Warning message:
-#> invalid factor level, NA generated
-
-# how it now works similarly for classes 'mo' and 'ab':
-x <- as.mo("E. coli")
-x[1] <- "testvalue"
-#> Warning message:
-#> invalid microorganism code, NA generated
# how it works in base R:
+x <- factor("A")
+x[1] <- "B"
+#> Warning message:
+#> invalid factor level, NA generated
+
+# how it now works similarly for classes 'mo' and 'ab':
+x <- as.mo("E. coli")
+x[1] <- "testvalue"
+#> Warning message:
+#> invalid microorganism code, NA generated
"testvalue"
could never be understood by e.g. mo_name()
, although the class would suggest a valid microbial code.freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).Renamed data set septic_patients
to example_isolates
"testvalue"
could never be
Function bug_drug_combinations()
to quickly get a data.frame
with the results of all bug-drug combinations in a data set. The column containing microorganism codes is guessed automatically and its input is transformed with mo_shortname()
by default:
x <- bug_drug_combinations(example_isolates)
-#> NOTE: Using column `mo` as input for `col_mo`.
-x[1:4, ]
-#> mo ab S I R total
-#> 1 A. baumannii AMC 0 0 3 3
-#> 2 A. baumannii AMK 0 0 0 0
-#> 3 A. baumannii AMP 0 0 3 3
-#> 4 A. baumannii AMX 0 0 3 3
-#> NOTE: Use 'format()' on this result to get a publicable/printable format.
-
-# change the transformation with the FUN argument to anything you like:
-x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
-#> NOTE: Using column `mo` as input for `col_mo`.
-x[1:4, ]
-#> mo ab S I R total
-#> 1 Gram-negative AMC 469 89 174 732
-#> 2 Gram-negative AMK 251 0 2 253
-#> 3 Gram-negative AMP 227 0 405 632
-#> 4 Gram-negative AMX 227 0 405 632
-#> NOTE: Use 'format()' on this result to get a publicable/printable format.
x <- bug_drug_combinations(example_isolates)
+#> NOTE: Using column `mo` as input for `col_mo`.
+x[1:4, ]
+#> mo ab S I R total
+#> 1 A. baumannii AMC 0 0 3 3
+#> 2 A. baumannii AMK 0 0 0 0
+#> 3 A. baumannii AMP 0 0 3 3
+#> 4 A. baumannii AMX 0 0 3 3
+#> NOTE: Use 'format()' on this result to get a publicable/printable format.
+
+# change the transformation with the FUN argument to anything you like:
+x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
+#> NOTE: Using column `mo` as input for `col_mo`.
+x[1:4, ]
+#> mo ab S I R total
+#> 1 Gram-negative AMC 469 89 174 732
+#> 2 Gram-negative AMK 251 0 2 253
+#> 3 Gram-negative AMP 227 0 405 632
+#> 4 Gram-negative AMX 227 0 405 632
+#> NOTE: Use 'format()' on this result to get a publicable/printable format.
You can format this to a printable format, ready for reporting or exporting to e.g. Excel with the base R format()
function:
Additional way to calculate co-resistance, i.e. when using multiple antimicrobials as input for portion_*
functions or count_*
functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested
(which defaults to FALSE
) replaces the old also_single_tested
and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion
and count
help pages), where the %SI is being determined:
# --------------------------------------------------------------------
-# only_all_tested = FALSE only_all_tested = TRUE
-# ----------------------- -----------------------
-# Drug A Drug B include as include as include as include as
-# numerator denominator numerator denominator
-# -------- -------- ---------- ----------- ---------- -----------
-# S or I S or I X X X X
-# R S or I X X X X
-# <NA> S or I X X - -
-# S or I R X X X X
-# R R - X - X
-# <NA> R - - - -
-# S or I <NA> X X - -
-# R <NA> - - - -
-# <NA> <NA> - - - -
-# --------------------------------------------------------------------
# --------------------------------------------------------------------
+# only_all_tested = FALSE only_all_tested = TRUE
+# ----------------------- -----------------------
+# Drug A Drug B include as include as include as include as
+# numerator denominator numerator denominator
+# -------- -------- ---------- ----------- ---------- -----------
+# S or I S or I X X X X
+# R S or I X X X X
+# <NA> S or I X X - -
+# S or I R X X X X
+# R R - X - X
+# <NA> R - - - -
+# S or I <NA> X X - -
+# R <NA> - - - -
+# <NA> <NA> - - - -
+# --------------------------------------------------------------------
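The two counting methods in the table can be written out as a small function. This is a minimal base R sketch of the table's logic for two drugs, not the package's own code; the function name `si_combined` and its arguments are hypothetical:

```r
# %SI of a two-drug combination under both counting methods
si_combined <- function(drug_a, drug_b, only_all_tested = FALSE) {
  si_a <- drug_a %in% c("S", "I")   # NA counts as "not S or I"
  si_b <- drug_b %in% c("S", "I")
  both_tested <- !is.na(drug_a) & !is.na(drug_b)
  any_si <- si_a | si_b

  if (only_all_tested) {
    # Only isolates tested for both drugs are considered at all
    numerator   <- any_si & both_tested
    denominator <- both_tested
  } else {
    # Any S or I counts; R + R is added to the denominator only
    numerator   <- any_si
    denominator <- any_si | (both_tested & !si_a & !si_b)
  }
  sum(numerator) / sum(denominator)
}

# The nine rows of the table, in the same order
drug_a <- c("S", "R", NA, "S", "R", NA, "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA, NA, NA)

si_false <- si_combined(drug_a, drug_b, only_all_tested = FALSE)  # 5 of 6 rows
si_true  <- si_combined(drug_a, drug_b, only_all_tested = TRUE)   # 3 of 4 rows
```

With one isolate per table row, the first method counts 5 of 6 isolates as covered and the second counts 3 of 4, which matches the X marks in the numerator and denominator columns.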
also_single_tested
will throw an informative error that it has been replaced by only_all_tested
.tibble
printing support for classes rsi
, mic
, disk
, ab
mo
. When using tibble
s containing antimicrobial columns, values S
will print in green, values I
will print in yellow and values R
will print in red. Microbial IDs (class mo
) will emphasise the genus and species, not the kingdom.
also_single_tested
w
Function rsi_df()
to transform a data.frame
to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df()
and portion_df()
to immediately show resistance percentages and number of available isolates:
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
@@ -463,12 +471,12 @@ Since this is a major change, usage of the oldalso_single_tested
w
All these lead to the microbial ID of E. coli:
-as.mo("UPEC")
-# B_ESCHR_COL
-mo_name("UPEC")
-# "Escherichia coli"
-mo_gramstain("EHEC")
-# "Gram-negative"
as.mo("UPEC")
+# B_ESCHR_COL
+mo_name("UPEC")
+# "Escherichia coli"
+mo_gramstain("EHEC")
+# "Gram-negative"
mo_info()
as an analogy to ab_info()
. The mo_info()
prints a list with the full taxonomy, authors, and the URL to the online database of a microorganismFunction mo_synonyms()
to get all previously accepted taxonomic names of a microorganism
septic_patients %>%
- freq(age) %>%
- boxplot()
-# grouped boxplots:
-septic_patients %>%
- group_by(hospital_id) %>%
- freq(age) %>%
- boxplot()
septic_patients %>%
+ freq(age) %>%
+ boxplot()
+# grouped boxplots:
+septic_patients %>%
+ group_by(hospital_id) %>%
+ freq(age) %>%
+ boxplot()
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
-filter_aminoglycosides()
-filter_carbapenems()
-filter_cephalosporins()
-filter_1st_cephalosporins()
-filter_2nd_cephalosporins()
-filter_3rd_cephalosporins()
-filter_4th_cephalosporins()
-filter_fluoroquinolones()
-filter_glycopeptides()
-filter_macrolides()
-filter_tetracyclines()
filter_aminoglycosides()
+filter_carbapenems()
+filter_cephalosporins()
+filter_1st_cephalosporins()
+filter_2nd_cephalosporins()
+filter_3rd_cephalosporins()
+filter_4th_cephalosporins()
+filter_fluoroquinolones()
+filter_glycopeptides()
+filter_macrolides()
+filter_tetracyclines()
The antibiotics
data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set. For example:
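The column matching described above can be sketched in base R as follows. The `lookup` table stands in for the antibiotics data set, and the helpers `columns_in_class()` and `filter_class()` are hypothetical; filtering on the result "R" is also an assumption made only for this illustration.

```r
# Hypothetical lookup table standing in for the package's antibiotics data set
lookup <- data.frame(code  = c("GEN", "TOB", "CIP"),
                     name  = c("gentamicin", "tobramycin", "ciprofloxacin"),
                     class = c("aminoglycosides", "aminoglycosides", "fluoroquinolones"),
                     stringsAsFactors = FALSE)

# Find the columns of `df` whose name matches a code or official name in `class`
columns_in_class <- function(df, class) {
  hits <- lookup[lookup$class == class, ]
  known <- tolower(c(hits$code, hits$name))
  names(df)[tolower(names(df)) %in% known]
}

# Keep rows that have `result` in at least one column of the class
filter_class <- function(df, class, result = "R") {
  cols <- columns_in_class(df, class)
  if (length(cols) == 0) return(df[0, , drop = FALSE])
  keep <- rowSums(df[cols] == result, na.rm = TRUE) > 0
  df[keep, , drop = FALSE]
}

# Toy data, made up for illustration
isolates <- data.frame(id = 1:3,
                       GEN = c("R", "S", "S"),
                       tobramycin = c("S", "S", "R"),
                       CIP = c("R", "R", "S"),
                       stringsAsFactors = FALSE)
matched_cols <- columns_in_class(isolates, "aminoglycosides")
filtered <- filter_class(isolates, "aminoglycosides")
print(filtered)
```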
All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property -> atc_property()
-ab_name -> atc_name()
-ab_official -> atc_official()
-ab_trivial_nl -> atc_trivial_nl()
-ab_certe -> atc_certe()
-ab_umcg -> atc_umcg()
-ab_tradenames -> atc_tradenames()
ab_property -> atc_property()
+ab_name -> atc_name()
+ab_official -> atc_official()
+ab_trivial_nl -> atc_trivial_nl()
+ab_certe -> atc_certe()
+ab_umcg -> atc_umcg()
+ab_tradenames -> atc_tradenames()
as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functions
dplyr
version 0.8.0
as.atc()
internally. The old atc_property
New function age_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
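As a rough illustration of what splitting ages into groups involves, the base R sketch below uses `cut()`. The helper `split_ages()` and its break points are assumptions chosen for this example, not the package's `age_groups()` implementation or its defaults.

```r
# Hypothetical helper: split ages into labelled groups with base R.
# The breaks (0-11, 12-74, 75+) are illustrative assumptions.
split_ages <- function(age,
                       breaks = c(0, 12, 75, Inf),
                       labels = c("0-11", "12-74", "75+")) {
  # right = FALSE closes each interval on the left: [0, 12), [12, 75), [75, Inf)
  cut(age, breaks = breaks, labels = labels, right = FALSE)
}

ages <- c(3, 12, 40, 74, 75, 90)
groups <- split_ages(ages)
print(table(groups))
```

The resulting factor can then serve as a grouping variable when resistance is summarised per age group.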
New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
-
+
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
-
+
is equal to:
-
+
New function availability()
to check the number of available (non-empty) results in a data.frame
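A minimal base R sketch of counting available results per column is shown below. The helper `count_available()` and its output columns are hypothetical, and "empty" is assumed here to mean `NA` or an empty string.

```r
# Hypothetical helper: per column, count non-empty results and their share.
# "Empty" is assumed to mean NA or "".
count_available <- function(df) {
  available <- vapply(df,
                      function(x) sum(!is.na(x) & as.character(x) != ""),
                      integer(1))
  data.frame(column = names(df),
             available = unname(available),
             pct_available = round(100 * unname(available) / nrow(df), 1),
             stringsAsFactors = FALSE)
}

# Toy data, made up for illustration
toy_results <- data.frame(AMX = c("R", "S", NA, ""),
                          CIP = c("S", "S", "I", NA),
                          stringsAsFactors = FALSE)
avail <- count_available(toy_results)
print(avail)
```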
@@ -738,33 +746,33 @@ These functions use as.atc()
internally. The old atc_property
-
Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
-
+
-
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
-# equal:
-as.mo(..., allow_uncertain = TRUE)
-as.mo(..., allow_uncertain = 2)
-
-# also equal:
-as.mo(..., allow_uncertain = FALSE)
-as.mo(..., allow_uncertain = 0)
+# equal:
+as.mo(..., allow_uncertain = TRUE)
+as.mo(..., allow_uncertain = 2)
+
+# also equal:
+as.mo(..., allow_uncertain = FALSE)
+as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
- Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
- All microbial IDs that were found are now saved to a local file
~/.Rhistory_mo
. Use the new function clean_mo_history()
to delete this file, which resets the algorithms.
-
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On foreign systems, properties of these will be translated to all previously supported languages: German, Dutch, French, Italian, Spanish and Portuguese:
-
+
- Fix for vector containing only empty values
- Finds better results when input is in other languages
@@ -810,19 +818,19 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
-# Determine genus of microorganisms (mo) in `septic_patients` data set:
-# OLD WAY
-septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
-# NEW WAY
-septic_patients %>%
- freq(mo_genus(mo))
-
-# Even supports grouping variables:
-septic_patients %>%
- group_by(gender) %>%
- freq(mo_genus(mo))
+# Determine genus of microorganisms (mo) in `septic_patients` data set:
+# OLD WAY
+septic_patients %>%
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
+# NEW WAY
+septic_patients %>%
+ freq(mo_genus(mo))
+
+# Even supports grouping variables:
+septic_patients %>%
+ group_by(gender) %>%
+ freq(mo_genus(mo))
- Header info is now available as a list, with the
header
function
- The parameter
header
is now set to TRUE
by default, even for markdown
@@ -897,10 +905,10 @@ Using as.mo(..., allow_uncertain = 3)
Fewer than 3 characters as input for as.mo
will return NA
-
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
-
+
- Added parameter
combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
- Fix for
portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
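The effect of the `combine_IR` parameter mentioned in the list above can be illustrated with a small base R sketch. The helper `tabulate_rsi()` is hypothetical and only mimics the idea of merging I and R into a single non-susceptible category.

```r
# Hypothetical helper: tabulate interpretations, optionally merging I and R.
tabulate_rsi <- function(x, combine_IR = FALSE) {
  if (combine_IR) {
    # Merge "I" and "R" into one non-susceptible category "IR"
    x <- ifelse(x %in% c("I", "R"), "IR", x)
    lvls <- c("S", "IR")
  } else {
    lvls <- c("S", "I", "R")
  }
  # Missing results are dropped from the table
  table(factor(x, levels = lvls), useNA = "no")
}

results <- c("S", "S", "I", "R", "R", NA)
separate <- tabulate_rsi(results)
merged <- tabulate_rsi(results, combine_IR = TRUE)
print(separate)
print(merged)
```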
@@ -913,15 +921,15 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for grouping variables, test with:
-
+
-
Support for (un)selecting columns:
-
+
- Check for
hms::is.hms
@@ -1001,18 +1009,18 @@ Using as.mo(..., allow_uncertain = 3)
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
-# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
-# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
-# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
-# [1] "Streptococcus grupo A"
+mo_gramstain("E. coli")
+# [1] "Gram negative"
+mo_gramstain("E. coli", language = "de") # German
+# [1] "Gramnegativ"
+mo_gramstain("E. coli", language = "es") # Spanish
+# [1] "Gram negativo"
+mo_fullname("S. group A", language = "pt") # Portuguese
+# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
-mo_gramstain("Esc blattae")
-# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
-# [1] "Gram negative"
+mo_gramstain("Esc blattae")
+# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
+# [1] "Gram negative"
Functions count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
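Selective counting of this kind can be sketched in base R with a single helper. The function `count_rsi()` below is hypothetical and is not how the package implements its `count_*` functions.

```r
# Hypothetical helper: count isolates whose interpretation is in `classes`.
count_rsi <- function(x, classes) {
  sum(x %in% classes)  # NA never matches, so missing results are not counted
}

amx <- c("S", "I", "R", "R", NA, "S")
n_R  <- count_rsi(amx, "R")          # resistant only
n_IR <- count_rsi(amx, c("I", "R"))  # intermediate or resistant
n_SI <- count_rsi(amx, c("S", "I"))  # susceptible or intermediate
print(c(R = n_R, IR = n_IR, SI = n_SI))
```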
@@ -1023,18 +1031,18 @@ Using as.mo(..., allow_uncertain = 3)
-
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
-as.mo("E. coli")
-# [1] B_ESCHR_COL
-as.mo("MRSA")
-# [1] B_STPHY_AUR
-as.mo("S group A")
-# [1] B_STRPTC_GRA
+as.mo("E. coli")
+# [1] B_ESCHR_COL
+as.mo("MRSA")
+# [1] B_STPHY_AUR
+as.mo("S group A")
+# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
-
+
- Added parameter
reference_df
for as.mo
, so users can supply their own microbial IDs, name or codes as a reference table
- Renamed all previous references to
bactid
to mo
, like:
@@ -1062,12 +1070,12 @@ Using as.mo(..., allow_uncertain = 3)
Added three antimicrobial agents to the antibiotics
data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
-
Added 163 trade names to the antibiotics
data set; it now contains 298 different trade names in total, e.g.:
-
+
- For
first_isolate
, rows will be ignored when there’s no species available
- Function
ratio
is now deprecated and will be removed in a future release, as it is not really the scope of this package
@@ -1078,13 +1086,13 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for quasiquotation in the function series count_*
and portions_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
-
+
- Edited
ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
by default, but can be set to count_df
.
- Fix for
ggplot_rsi
when the ggplot2
package was not loaded
@@ -1098,12 +1106,12 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for types (classes) list and matrix for freq
-
+
For lists, subsetting is possible:
-
+
as.mo(..., allow_uncertain = 3)