diff --git a/DESCRIPTION b/DESCRIPTION
index 67a8e1d2..c438c08b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.5.0.9016
-Date: 2019-01-30
+Date: 2019-02-01
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/R/atc_online.R b/R/atc_online.R
index b8db697e..77bbad0e 100644
--- a/R/atc_online.R
+++ b/R/atc_online.R
@@ -79,20 +79,12 @@ atc_online_property <- function(atc_code,
     stop("Packages 'xml2', 'rvest' and 'curl' are required for this function")
   }
 
-  # check active network interface, from https://stackoverflow.com/a/5078002/4575331
-  has_internet <- function(url) {
-    # extract host from given url
-    # https://www.whocc.no/atc_ddd_index/ -> www.whocc.no
-    url <- url %>%
-      gsub("^(http://|https://)", "", .) %>%
-      strsplit('/', fixed = TRUE) %>%
-      unlist() %>%
-      .[1]
-    !is.null(curl::nslookup(url, error = FALSE))
+  if (!all(atc_code %in% AMR::antibiotics)) {
+    atc_code <- as.character(as.atc(atc_code))
   }
-  # check for connection using the ATC of amoxicillin
-  if (!curl::has_internet(url = url)) {
-    message("The URL could not be reached.")
+
+  if (!curl::has_internet()) {
+    message("There appears to be no internet connection.")
     return(rep(NA, length(atc_code)))
   }
 
diff --git a/R/freq.R b/R/freq.R
index 5fc966c9..34fe6013 100755
--- a/R/freq.R
+++ b/R/freq.R
@@ -610,7 +610,13 @@ format_header <- function(x, markdown = FALSE, decimal.mark = ".", big.mark = ",
 
   # class and mode
   if (is.null(header$columns)) {
+    if (markdown == TRUE) {
+      header$class <- paste0("`", header$class, "`")
+    }
     if (!header$mode %in% header$class) {
+      if (markdown == TRUE) {
+        header$mode <- paste0("`", header$mode, "`")
+      }
       header$class <- header$class %>% rev() %>% paste(collapse = " > ") %>% paste0(silver(paste0(" (", header$mode, ")")))
     } else {
       header$class <- header$class %>% rev() %>% paste(collapse = " > ")
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 09e907f5..9854b148 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -40,7 +40,7 @@
@@ -185,7 +185,7 @@AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 29 January 2019.
-Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 01 February 2019.
+For this tutorial, we will create fake demonstration data to work with.
You can skip to Cleaning the data if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:
2019-01-29 | +2019-02-01 | abcd | Escherichia coli | S | S |
2019-01-29 | +2019-02-01 | abcd | Escherichia coli | S | R |
2019-01-29 | +2019-02-01 | efgh | Escherichia coli | R | @@ -232,73 +232,73 @@
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs).
With additional columns (like a hospital name, the patient’s gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.
-To start with patients, we need a unique list of patients.
- +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we also set the gender of our patients, by putting the ID and the gender in a table:
patients_table <- data.frame(patient_id = patients,
+ gender = c(rep("M", 135),
+ rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
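The definition of `patients` is not visible in this hunk. A minimal sketch that reproduces the vector described above (260 IDs running from A1 to Z10) together with the gender table could look like this. The `lapply()` construction is an assumption; any approach that yields the same vector works equally well.

```r
# Assumed construction: combine every letter A-Z with the numbers 1 to 10
patients <- unlist(lapply(LETTERS, paste0, 1:10))

length(patients)   # 26 letters x 10 numbers
head(patients, 3)  # the first few IDs
tail(patients, 1)  # the last ID

# Same table as in the tutorial: first 135 IDs male, remaining 125 female
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))
table(patients_table$gender)
```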
Let’s pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018.
- +This dates
object now contains all days in our date range.
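The code that creates `dates` is not visible in this hunk either. Assuming both end points are included, a sequence of days in base R would be one way to build it:

```r
# Assumed construction: one Date per day, both end points included
dates <- seq(from = as.Date("2010-01-01"),
             to = as.Date("2018-01-01"),
             by = "day")

range(dates)   # first and last day of the range
length(dates)  # number of days that sample() can later draw from
```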
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
-bacteria <- c("Escherichia coli", "Staphylococcus aureus",
- "Streptococcus pneumoniae", "Klebsiella pneumoniae")
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+ "Streptococcus pneumoniae", "Klebsiella pneumoniae")
For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
- +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
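As a small illustration of the `prob` parameter, the sketch below defines `hospitals` and `ab_interpretations` and draws one antibiotic column. Both definitions are assumptions: the hospital names are taken from the table output in this tutorial, and the order S, I, R is inferred from the probability vectors used in the `data.frame()` call.

```r
# Assumed definitions; the names match the objects used in the tutorial
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
ab_interpretations <- c("S", "I", "R")

set.seed(42)  # only to make this illustration reproducible

# prob takes one weight per element of the first argument, in the same order
gent <- sample(ab_interpretations, 5000, replace = TRUE,
               prob = c(0.92, 0.00, 0.08))

# a weight of 0 means that "I" is never drawn
round(prop.table(table(factor(gent, levels = ab_interpretations))), 2)
```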
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
- patient_id = sample(patients, 5000, replace = TRUE),
- hospital = sample(hospitals, 5000, replace = TRUE, prob = c(0.30, 0.35, 0.15, 0.20)),
- bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
- amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
- amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.10, 0.15)),
- cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.80, 0.00, 0.20)),
- gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0.00, 0.08))
- )
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
+ patient_id = sample(patients, 5000, replace = TRUE),
+ hospital = sample(hospitals, 5000, replace = TRUE, prob = c(0.30, 0.35, 0.15, 0.20)),
+ bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
+ amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
+ amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.10, 0.15)),
+ cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.80, 0.00, 0.20)),
+ gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0.00, 0.08))
+ )
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
data <- data %>% left_join(patients_table)
The resulting data set contains 5,000 blood culture isolates. With the head()
function we can preview the first 6 values of this data set:
head(data)
date | @@ -313,153 +313,153 @@|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2015-02-23 | -C5 | -Hospital B | -Staphylococcus aureus | -R | -R | +2014-06-07 | +Y9 | +Hospital A | +Klebsiella pneumoniae | S | S | -M | +S | +S | +F |
2011-10-29 | -X2 | +2010-06-10 | +Z1 | +Hospital B | +Escherichia coli | +R | +S | +S | +S | +F | +|||||
2012-03-20 | +J6 | Hospital C | Streptococcus pneumoniae | +S | +S | +S | +S | +M | +|||||||
2016-10-31 | +M5 | +Hospital A | +Escherichia coli | +S | R | S | S | -S | -F | +M | |||||
2010-06-10 | -J5 | +2016-05-05 | +W8 | Hospital B | Escherichia coli | -S | -S | -S | -S | -M | -|||||
2013-11-09 | -U4 | -Hospital A | -Escherichia coli | R | S | S | S | F | |||||||
2010-10-12 | -C3 | +||||||||||||||
2016-03-10 | +G8 | Hospital A | -Staphylococcus aureus | +Streptococcus pneumoniae | +S | S | -I | S | S | M | |||||
2017-12-04 | -T3 | -Hospital C | -Escherichia coli | -R | -S | -S | -S | -F | -
Now, let’s start the cleaning and the analysis!
Use the frequency table function freq()
to look specifically for unique values in any variable. For example, for the gender
variable:
# Frequency table
-# Class: factor (numeric)
-# Levels: F, M
-# Length: 5,000 (of which NA: 0 = 0.00%)
+data %>% freq(gender) # this would be the same: freq(data$gender)
+# Frequency table of `gender` from a `data.frame` (5,000 x 9)
+# Class: factor (numeric)
+# Levels: F, M
+# Length: 5,000 (of which NA: 0 = 0.00%)
# Unique: 2
#
# Item Count Percent Cum. Count Cum. Percent
# --- ----- ------ -------- ----------- -------------
-# 1 M 2,586 51.7% 2,586 51.7%
-# 2 F 2,414 48.3% 5,000 100.0%
+# 1 M 2,622 52.4% 2,622 52.4%
+# 2 F 2,378 47.6% 5,000 100.0%
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M
and F
. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kinds of variables. The mutate_at()
will run the as.rsi()
function on defined variables:
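To show what cleaning S/I/R values means in practice, here is a base R sketch. The helper `clean_rsi()` is hypothetical and is not the implementation of `as.rsi()`; it only illustrates the idea of normalising input and applying the same function to several columns, as `mutate_at()` would.

```r
# Hypothetical helper, NOT the AMR implementation of as.rsi()
clean_rsi <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x[!x %in% c("S", "I", "R")] <- NA  # anything else becomes missing
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

# Toy input with lower case, stray spaces, an invalid value and a missing value
raw <- data.frame(amox = c("s", " R", "I", "unknown"),
                  gent = c("S", "S", "r ", NA),
                  stringsAsFactors = FALSE)

# apply the same cleaning to every antibiotic column
raw[c("amox", "gent")] <- lapply(raw[c("amox", "gent")], clean_rsi)
raw
```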
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column amox
) and amoxicillin/clavulanic acid (column amcl
) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The eucast_rules()
fixes this:
data <- eucast_rules(data, col_mo = "bacteria")
-#
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-#
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Enterobacteriales (Order) (no changes)
-# Staphylococcus (no changes)
-# Enterococcus (no changes)
-# Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (no changes)
-# Viridans group streptococci (no changes)
-# Haemophilus influenzae (no changes)
-# Moraxella catarrhalis (no changes)
-# Anaerobic Gram positives (no changes)
-# Anaerobic Gram negatives (no changes)
-# Pasteurella multocida (no changes)
-# Campylobacter jejuni and C. coli (no changes)
-# Aerococcus sanguinicola and A. urinae (no changes)
-# Kingella kingae (no changes)
-#
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 1: Intrinsic resistance in Enterobacteriaceae (324 changes)
-# Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
-# Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 4: Intrinsic resistance in Gram-positive bacteria (722 changes)
-# Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
-# Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
-# Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
-# Table 12: Interpretive rules for aminoglycosides (no changes)
-# Table 13: Interpretive rules for quinolones (no changes)
-#
-# Other rules
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (no changes)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (no changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
-#
-# => EUCAST rules affected 1,853 out of 5,000 rows -> changed 1,046 test results.
data <- eucast_rules(data, col_mo = "bacteria")
+#
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+#
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Enterobacteriales (Order) (no changes)
+# Staphylococcus (no changes)
+# Enterococcus (no changes)
+# Streptococcus groups A, B, C, G (no changes)
+# Streptococcus pneumoniae (no changes)
+# Viridans group streptococci (no changes)
+# Haemophilus influenzae (no changes)
+# Moraxella catarrhalis (no changes)
+# Anaerobic Gram positives (no changes)
+# Anaerobic Gram negatives (no changes)
+# Pasteurella multocida (no changes)
+# Campylobacter jejuni and C. coli (no changes)
+# Aerococcus sanguinicola and A. urinae (no changes)
+# Kingella kingae (no changes)
+#
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 1: Intrinsic resistance in Enterobacteriaceae (340 changes)
+# Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
+# Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
+# Table 4: Intrinsic resistance in Gram-positive bacteria (681 changes)
+# Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
+# Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
+# Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
+# Table 12: Interpretive rules for aminoglycosides (no changes)
+# Table 13: Interpretive rules for quinolones (no changes)
+#
+# Other rules
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (no changes)
+# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
+# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (no changes)
+# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
+# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
+#
+# => EUCAST rules affected 1,858 out of 5,000 rows -> changed 1,021 test results.
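The consistency correction described above (amox = S together with amcl = R is impossible) can be illustrated on toy data. This is only a sketch of the idea in base R; how `eucast_rules()` implements its rules internally is not shown in this diff, and the values below are made up.

```r
# Toy results; not taken from the tutorial data
toy <- data.frame(amox = c("S", "S", "R", "S"),
                  amcl = c("R", "S", "S", "R"),
                  stringsAsFactors = FALSE)

# rows with the impossible combination amox = S and amcl = R
impossible <- toy$amox == "S" & toy$amcl == "R"
sum(impossible)

# resistance to the combination implies resistance to amoxicillin alone
toy$amox[toy$amcl == "R"] <- "R"
toy
```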
Now that we have the microbial ID, we can add some taxonomic properties:
-data <- data %>%
- mutate(gramstain = mo_gramstain(bacteria),
- genus = mo_genus(bacteria),
- species = mo_species(bacteria))
data <- data %>%
+ mutate(gramstain = mo_gramstain(bacteria),
+ genus = mo_genus(bacteria),
+ species = mo_species(bacteria))
We also need to know which isolates we can actually use for analysis.
To conduct an analysis of antimicrobial resistance, you must only include the first isolate of every patient per episode (Hindler et al., Clin Infect Dis. 2007). If you would not do this, you could easily get an overestimate or underestimate of the resistance of an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all isolates would be overestimated, because you included this MRSA more than once. It would clearly be selection bias.
The Clinical and Laboratory Standards Institute (CLSI) describes this as follows:
@@ -467,22 +467,22 @@(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:
data <- data %>%
- mutate(first = first_isolate(.))
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `date` as input for `col_date`.
-# NOTE: Using column `patient_id` as input for `col_patient_id`.
-# => Found 2,926 first isolates (58.5% of total)
So only 58.5% is suitable for resistance analysis! We can now filter on is with the filter()
function, also from the dplyr
package:
data <- data %>%
+ mutate(first = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# => Found 2,956 first isolates (59.1% of total)
So only 59.1% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
data_1st <- data %>%
+ filter(first == TRUE)
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
data_1st <- data %>%
+ filter_first_isolate()
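The episode logic described above (an episode of one year, counted from every selected isolate) can be sketched in base R. The helper `flag_first()` is hypothetical and simplified; it is not the source of `first_isolate()`, and whether the package uses `>=` or `>` at the boundary is an assumption. The dates are those of patient K10 from the example table in this section.

```r
# Hypothetical helper, a simplified sketch and not the first_isolate() source
flag_first <- function(dates, episode_days = 365) {
  # dates: isolate dates of ONE patient and ONE species, sorted ascending
  first <- logical(length(dates))
  last_selected <- as.Date(NA)
  for (i in seq_along(dates)) {
    days_since <- as.numeric(difftime(dates[i], last_selected, units = "days"))
    if (is.na(last_selected) || days_since >= episode_days) {
      first[i] <- TRUE
      last_selected <- dates[i]  # restart counting from this isolate
    }
  }
  first
}

k10 <- as.Date(c("2010-10-23", "2011-03-17", "2011-08-12", "2012-02-24",
                 "2012-04-19", "2013-08-25", "2014-01-04", "2014-03-05",
                 "2014-03-11", "2014-06-20"))
data.frame(date = k10, first = flag_first(k10))
```

Under these assumptions the sketch flags isolates 1, 4 and 6, which agrees with the `first` column shown for this patient.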
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Imagine this data, sorted on date:
1 | -2010-08-27 | -X6 | +2010-10-23 | +K10 | B_ESCHR_COL | -S | -S | R | S | +S | +S | TRUE |
2 | -2010-12-12 | -X6 | +2011-03-17 | +K10 | B_ESCHR_COL | R | -S | R | S | +S | FALSE | |
3 | -2011-05-24 | -X6 | +2011-08-12 | +K10 | B_ESCHR_COL | -R | +S | S | S | S | @@ -532,19 +532,19 @@||
4 | -2011-09-03 | -X6 | +2012-02-24 | +K10 | B_ESCHR_COL | S | -S | R | S | +S | TRUE | |
5 | -2011-09-21 | -X6 | +2012-04-19 | +K10 | B_ESCHR_COL | S | S | @@ -554,41 +554,41 @@|||||
6 | -2011-10-31 | -X6 | +2013-08-25 | +K10 | B_ESCHR_COL | R | -S | R | S | -FALSE | +R | +TRUE |
7 | -2012-07-02 | -X6 | +2014-01-04 | +K10 | B_ESCHR_COL | -S | -S | +R | +R | S | S | FALSE |
8 | -2012-12-30 | -X6 | +2014-03-05 | +K10 | B_ESCHR_COL | +R | S | S | S | -S | -TRUE | +FALSE |
9 | -2013-03-15 | -X6 | +2014-03-11 | +K10 | B_ESCHR_COL | S | S | @@ -598,29 +598,29 @@|||||
10 | -2014-01-14 | -X6 | +2014-06-20 | +K10 | B_ESCHR_COL | S | -R | S | S | -TRUE | +S | +FALSE |
Only 4 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and show be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
Only 3 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
data <- data %>%
- mutate(keyab = key_antibiotics(.)) %>%
- mutate(first_weighted = first_isolate(.))
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `date` as input for `col_date`.
-# NOTE: Using column `patient_id` as input for `col_patient_id`.
-# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
-# [Criterion] Inclusion based on key antibiotics, ignoring I.
-# => Found 4,404 first weighted isolates (88.1% of total)
data <- data %>%
+ mutate(keyab = key_antibiotics(.)) %>%
+ mutate(first_weighted = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
+# [Criterion] Inclusion based on key antibiotics, ignoring I.
+# => Found 4,383 first weighted isolates (87.7% of total)
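The criterion reported in the output ("Inclusion based on key antibiotics, ignoring I") can be pictured as a comparison of two antibiogram strings. The helper below is hypothetical and not the implementation used by `key_antibiotics()` or `first_isolate()`; it assumes both strings have the same length and one character per antibiotic.

```r
# Hypothetical helper, NOT the key_antibiotics()/first_isolate() implementation
differs_ignoring_I <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  # only positions where both results are S or R are compared
  comparable <- a %in% c("S", "R") & b %in% c("S", "R")
  any(a[comparable] != b[comparable])
}

differs_ignoring_I("RSSS", "RRSS")  # S became R on the second position
differs_ignoring_I("SSSS", "SISS")  # only an I differs, which is ignored
```

The first call returns `TRUE` (a different strain is likely), the second returns `FALSE`.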
isolate | @@ -637,34 +637,34 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-08-27 | -X6 | +2010-10-23 | +K10 | B_ESCHR_COL | -S | -S | R | S | +S | +S | TRUE | TRUE | |
2 | -2010-12-12 | -X6 | +2011-03-17 | +K10 | B_ESCHR_COL | R | -S | R | S | +S | FALSE | TRUE | ||
3 | -2011-05-24 | -X6 | +2011-08-12 | +K10 | B_ESCHR_COL | -R | +S | S | S | S | @@ -673,20 +673,20 @@||||
4 | -2011-09-03 | -X6 | +2012-02-24 | +K10 | B_ESCHR_COL | S | -S | R | S | +S | TRUE | TRUE | ||
5 | -2011-09-21 | -X6 | +2012-04-19 | +K10 | B_ESCHR_COL | S | S | @@ -697,23 +697,23 @@|||||||
6 | -2011-10-31 | -X6 | +2013-08-25 | +K10 | B_ESCHR_COL | R | -S | R | S | -FALSE | +R | +TRUE | TRUE | |
7 | -2012-07-02 | -X6 | +2014-01-04 | +K10 | B_ESCHR_COL | -S | -S | +R | +R | S | S | FALSE | @@ -721,52 +721,52 @@||
8 | -2012-12-30 | -X6 | +2014-03-05 | +K10 | B_ESCHR_COL | +R | S | S | S | -S | -TRUE | +FALSE | TRUE | |
9 | -2013-03-15 | -X6 | +2014-03-11 | +K10 | B_ESCHR_COL | S | S | S | S | FALSE | -FALSE | +TRUE | ||
10 | -2014-01-14 | -X6 | +2014-06-20 | +K10 | B_ESCHR_COL | S | -R | S | S | -TRUE | -TRUE | +S | +FALSE | +FALSE |
Instead of 4, now 9 isolates are flagged. In total, 88.1% of all isolates are marked ‘first weighted’ - 146.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 3, now 9 isolates are flagged. In total, 87.7% of all isolates are marked ‘first weighted’ - 28.5 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
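The figure of 28.5 quoted above can be checked from the counts reported in the output. The sketch below uses only numbers printed in this tutorial and shows that the figure is the difference as a share of all isolates, which is not the same as the relative increase.

```r
n_total    <- 5000
n_first    <- 2956  # first isolates (59.1% of total)
n_weighted <- 4383  # first weighted isolates (87.7% of total)

# difference as a share of ALL isolates: the figure of 28.5 quoted above
diff_share <- round((n_weighted - n_first) / n_total * 100, 1)
diff_share

# relative increase compared with the number of first isolates
rel_increase <- round((n_weighted - n_first) / n_first * 100, 1)
rel_increase
```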
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 4,404 isolates for analysis.
+data_1st <- data %>%
+ filter_first_weighted_isolate()
So we end up with 4,383 isolates for analysis.
We can remove unneeded columns:
- +Now our data looks like:
- +head(data_1st)
date | @@ -785,55 +785,70 @@|||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2015-02-23 | -C5 | -Hospital B | -B_STPHY_AUR | -R | +2014-06-07 | +Y9 | +Hospital A | +B_KLBSL_PNE | R | S | S | -M | -Gram positive | -Staphylococcus | -aureus | +S | +F | +Gram negative | +Klebsiella | +pneumoniae | TRUE |
2011-10-29 | -X2 | +2010-06-10 | +Z1 | +Hospital B | +B_ESCHR_COL | +R | +S | +S | +S | +F | +Gram negative | +Escherichia | +coli | +TRUE | +|||||||
2012-03-20 | +J6 | Hospital C | B_STRPTC_PNE | -R | +S | S | S | R | -F | +M | Gram positive | Streptococcus | pneumoniae | TRUE | |||||||
2016-10-31 | +M5 | +Hospital A | +B_ESCHR_COL | +S | +R | +S | +S | +M | +Gram negative | +Escherichia | +coli | +TRUE | +|||||||||
2010-06-10 | -J5 | +2016-05-05 | +W8 | Hospital B | B_ESCHR_COL | -S | -S | -S | -S | -M | -Gram negative | -Escherichia | -coli | -TRUE | -|||||||
2013-11-09 | -U4 | -Hospital A | -B_ESCHR_COL | R | S | S | @@ -844,34 +859,19 @@coli | TRUE | |||||||||||||
2010-10-12 | -C3 | +||||||||||||||||||||
2016-03-10 | +G8 | Hospital A | -B_STPHY_AUR | -S | -I | +B_STRPTC_PNE | S | S | +S | +R | M | Gram positive | -Staphylococcus | -aureus | -TRUE | -||||||
2017-12-04 | -T3 | -Hospital C | -B_ESCHR_COL | -R | -S | -S | -S | -F | -Gram negative | -Escherichia | -coli | +Streptococcus | +pneumoniae | TRUE | |||||||
1 | Escherichia coli | -2,165 | -49.2% | -2,165 | -49.2% | +2,128 | +48.6% | +2,128 | +48.6% | ||||||||||||
2 | Staphylococcus aureus | -1,078 | -24.5% | -3,243 | -73.6% | +1,110 | +25.3% | +3,238 | +73.9% | ||||||||||||
3 | Streptococcus pneumoniae | -719 | -16.3% | -3,962 | -90.0% | +684 | +15.6% | +3,922 | +89.5% | ||||||||||||
4 | Klebsiella pneumoniae | -442 | -10.0% | -4,404 | +461 | +10.5% | +4,383 | 100.0% |
The functions portion_R
portion_IR
, portion_I
portion_SI
and portion_S
can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
data_1st %>% portion_IR(amox)
+# [1] 0.4832307
Or they can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
+ group_by(hospital) %>%
+ summarise(amoxicillin = portion_IR(amox))
hospital | @@ -955,27 +960,27 @@ Longest: 24||
---|---|---|
Hospital A | -0.4852484 | +0.4959169 |
Hospital B | -0.4594419 | +0.4690956 |
Hospital C | -0.4875346 | +0.5075529 |
Hospital D | -0.4583822 | +0.4695341 |
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi()
can be used, which works exactly like n_distinct()
from the dplyr
package. It counts all isolates available for every group (i.e. values S, I or R):
data_1st %>%
- group_by(hospital) %>%
- summarise(amoxicillin = portion_IR(amox),
- available = n_rsi(amox))
data_1st %>%
+ group_by(hospital) %>%
+ summarise(amoxicillin = portion_IR(amox),
+ available = n_rsi(amox))
hospital | @@ -985,32 +990,32 @@ Longest: 24||||
---|---|---|---|---|
Hospital A | -0.4852484 | -1288 | +0.4959169 | +1347 |
Hospital B | -0.4594419 | -1541 | +0.4690956 | +1537 |
Hospital C | -0.4875346 | -722 | +0.5075529 | +662 |
Hospital D | -0.4583822 | -853 | +0.4695341 | +837 |
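What the two summary columns express can be reproduced on toy data in base R. The helpers `share_IR()` and `n_available()` are hypothetical stand-ins written for this sketch; they are not `portion_IR()` and `n_rsi()` and ignore any extra arguments those functions may have.

```r
# Toy data, not the tutorial data set
toy <- data.frame(hospital = c("A", "A", "A", "B", "B", "B"),
                  amox = c("R", "S", "I", "S", NA, "R"),
                  stringsAsFactors = FALSE)

# hypothetical stand-ins for portion_IR() and n_rsi()
share_IR <- function(x) mean(x[!is.na(x)] %in% c("I", "R"))
n_available <- function(x) sum(x %in% c("S", "I", "R"))

# one value per hospital, comparable to group_by() + summarise()
ir <- sapply(split(toy$amox, toy$hospital), share_IR)
n  <- sapply(split(toy$amox, toy$hospital), n_available)
data.frame(hospital = names(ir), amoxicillin = ir, available = n)
```

Missing results are excluded from both the numerator and the denominator, so hospital B is evaluated on 2 isolates and not on 3.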
These functions can also be used to get the portion of multiple antibiotics, to calculate co-resistance very easily:
-data_1st %>%
- group_by(genus) %>%
- summarise(amoxicillin = portion_S(amcl),
- gentamicin = portion_S(gent),
- "amox + gent" = portion_S(amcl, gent))
data_1st %>%
+ group_by(genus) %>%
+ summarise(amoxicillin = portion_S(amcl),
+ gentamicin = portion_S(gent),
+ "amox + gent" = portion_S(amcl, gent))
genus | @@ -1021,99 +1026,99 @@ Longest: 24||||||
---|---|---|---|---|---|---|
Escherichia | -0.7247113 | -0.9205543 | -0.9764434 | +0.7481203 | +0.9163534 | +0.9765038 |
Klebsiella | -0.7714932 | -0.9049774 | -0.9841629 | +0.7722343 | +0.9240781 | +0.9913232 |
Staphylococcus | -0.7541744 | -0.9313544 | -0.9860853 | +0.7567568 | +0.9234234 | +0.9864865 |
Streptococcus | -0.7593880 | +0.7383041 | 0.0000000 | -0.7593880 | +0.7383041 |
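The table suggests how the combined column is calculated: for Streptococcus the gentamicin value is 0 and the combined value equals the single-drug value, which fits a definition of "susceptible to at least one of the two". Assuming that interpretation, a base R sketch on made-up results:

```r
# Made-up results for five isolates
amcl <- c("S", "R", "R", "S", "R")
gent <- c("R", "S", "R", "S", "S")

mean(amcl == "S")                # susceptible to the first drug
mean(gent == "S")                # susceptible to the second drug
mean(amcl == "S" | gent == "S")  # susceptible to at least one of the two
```

The three calls return 0.4, 0.6 and 0.8 respectively; only the third isolate is resistant to both drugs.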
To make a transition to the next part, let’s see how this difference could be plotted:
-data_1st %>%
- group_by(genus) %>%
- summarise("1. Amoxicillin" = portion_S(amcl),
- "2. Gentamicin" = portion_S(gent),
- "3. Amox + gent" = portion_S(amcl, gent)) %>%
- tidyr::gather("Antibiotic", "S", -genus) %>%
- ggplot(aes(x = genus,
- y = S,
- fill = Antibiotic)) +
- geom_col(position = "dodge2")
data_1st %>%
+ group_by(genus) %>%
+ summarise("1. Amoxicillin" = portion_S(amcl),
+ "2. Gentamicin" = portion_S(gent),
+ "3. Amox + gent" = portion_S(amcl, gent)) %>%
+ tidyr::gather("Antibiotic", "S", -genus) %>%
+ ggplot(aes(x = genus,
+ y = S,
+ fill = Antibiotic)) +
+ geom_col(position = "dodge2")
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
ggplot(data = a_data_set,
- mapping = aes(x = year,
- y = value)) +
- geom_col() +
- labs(title = "A title",
- subtitle = "A subtitle",
- x = "My X axis",
- y = "My Y axis")
-
-ggplot(a_data_set,
- aes(year, value) +
- geom_bar()
ggplot(data = a_data_set,
+ mapping = aes(x = year,
+ y = value)) +
+ geom_col() +
+ labs(title = "A title",
+ subtitle = "A subtitle",
+ x = "My X axis",
+ y = "My Y axis")
+
+ggplot(a_data_set,
+ aes(year, value)) +
+ geom_col()
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or portion_df()
and shows results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE
to have the antibiotic codes (amox, amcl, cipr, gent) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
# group the data on `genus`
-ggplot(data_1st %>% group_by(genus)) +
- # create bars with genus on x axis
- # it looks for variables with class `rsi`,
- # of which we have 4 (earlier created with `as.rsi`)
- geom_rsi(x = "genus") +
- # split plots on antibiotic
- facet_rsi(facet = "Antibiotic") +
- # make R red, I yellow and S green
- scale_rsi_colours() +
- # show percentages on y axis
- scale_y_percent(breaks = 0:4 * 25) +
- # turn 90 degrees, make it bars instead of columns
- coord_flip() +
- # add labels
- labs(title = "Resistance per genus and antibiotic",
- subtitle = "(this is fake data)") +
- # and print genus in italic to follow our convention
- # (is now y axis because we turned the plot)
- theme(axis.text.y = element_text(face = "italic"))
# group the data on `genus`
+ggplot(data_1st %>% group_by(genus)) +
+ # create bars with genus on x axis
+ # it looks for variables with class `rsi`,
+ # of which we have 4 (earlier created with `as.rsi`)
+ geom_rsi(x = "genus") +
+ # split plots on antibiotic
+ facet_rsi(facet = "Antibiotic") +
+ # make R red, I yellow and S green
+ scale_rsi_colours() +
+ # show percentages on y axis
+ scale_y_percent(breaks = 0:4 * 25) +
+ # turn 90 degrees, make it bars instead of columns
+ coord_flip() +
+ # add labels
+ labs(title = "Resistance per genus and antibiotic",
+ subtitle = "(this is fake data)") +
+ # and print genus in italic to follow our convention
+ # (is now y axis because we turned the plot)
+ theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
data_1st %>%
- group_by(genus) %>%
- ggplot_rsi(x = "genus",
- facet = "Antibiotic",
- breaks = 0:4 * 25,
- datalabels = FALSE) +
- coord_flip()
data_1st %>%
+ group_by(genus) %>%
+ ggplot_rsi(x = "genus",
+ facet = "Antibiotic",
+ breaks = 0:4 * 25,
+ datalabels = FALSE) +
+ coord_flip()
The next example uses the included septic_patients
, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This data.frame
can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column fosf
) in hospital A and D. The input for the final fisher.test()
will be this:
We can transform the data and apply the test in only a couple of lines:
-septic_patients %>%
- filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
- select(hospital_id, fosf) %>% # select the hospitals and fosfomycin
- group_by(hospital_id) %>% # group on the hospitals
- count_df(combine_IR = TRUE) %>% # count all isolates per group (hospital_id)
- tidyr::spread(hospital_id, Value) %>% # transform output so A and D are columns
- select(A, D) %>% # and select these only
- as.matrix() %>% # transform to good old matrix for fisher.test()
- fisher.test() # do Fisher's Exact Test
-#
-# Fisher's Exact Test for Count Data
-#
-# data: .
-# p-value = 0.03104
-# alternative hypothesis: true odds ratio is not equal to 1
-# 95 percent confidence interval:
-# 1.054283 4.735995
-# sample estimates:
-# odds ratio
-# 2.228006
septic_patients %>%
+ filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+ select(hospital_id, fosf) %>% # select the hospitals and fosfomycin
+ group_by(hospital_id) %>% # group on the hospitals
+ count_df(combine_IR = TRUE) %>% # count all isolates per group (hospital_id)
+ tidyr::spread(hospital_id, Value) %>% # transform output so A and D are columns
+ select(A, D) %>% # and select these only
+ as.matrix() %>% # transform to good old matrix for fisher.test()
+ fisher.test() # do Fisher's Exact Test
+#
+# Fisher's Exact Test for Count Data
+#
+# data: .
+# p-value = 0.03104
+# alternative hypothesis: true odds ratio is not equal to 1
+# 95 percent confidence interval:
+# 1.054283 4.735995
+# sample estimates:
+# odds ratio
+# 2.228006
As can be seen, the p value is 0.03, which is below the conventional significance level of 0.05: the difference in fosfomycin resistance between hospitals A and D is statistically significant.
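The decision step can also be written out explicitly. The sketch below uses only base R and a 2x2 matrix of made-up counts (they are not taken from `septic_patients`); the names `counts`, `alpha` and `significant` are illustrative assumptions.

```r
# Hypothetical counts, invented for illustration only
counts <- matrix(c(24, 25,    # hospital A: IR, S
                   33, 77),   # hospital D: IR, S
                 nrow = 2,
                 dimnames = list(result = c("IR", "S"),
                                 hospital = c("A", "D")))

# Fisher's Exact Test on the 2x2 table (base R, package 'stats')
result <- fisher.test(counts)

# Compare the p value with a preset significance level
alpha <- 0.05
significant <- result$p.value < alpha

result$p.value
significant
```

The matrix is filled column by column, so each column holds the IR and S counts of one hospital, which is the layout `fisher.test()` expects for a contingency table.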
This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:
- +install.packages("AMR")
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
The latest, unpublished development version can be installed with (caution: it may be unstable):
- +install.packages("devtools")
+devtools::install_gitlab("msberends/AMR")
The AMR
package basically does four important things:
It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). With this package, you teach R everything about microbiology that is needed for analysis. These functions all use artificial intelligence to guess results that you would expect:
+as.mo()
to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using artificial intelligence (AI) on the included ITIS data set, consisting of almost 20,000 microorganisms. It is very fast, please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).as.rsi()
to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.as.mic()
to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.as.atc()
to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values “Furabid”, “Furadantin”, “nitro” all return the ATC code of Nitrofurantoin.
It enhances existing data and adds new data from data sets included in this package.
+eucast_rules()
to apply EUCAST expert rules to isolates.first_isolate()
to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
@@ -306,9 +306,9 @@
microorganisms
contains the complete taxonomic tree of almost 20,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus()
, mo_family()
, mo_gramstain()
or even mo_phylum()
. As they use as.mo()
internally, they also use artificial intelligence. For example, mo_genus("MRSA")
and mo_genus("S. aureus")
will both return "Staphylococcus"
. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.antibiotics
contains almost 500 antimicrobial drugs with their ATC code, EARS-Net code, common LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains hundreds of trade names. Use functions like atc_name()
and atc_tradenames()
to look up values. The atc_*
functions use as.atc()
internally so they support AI to guess your expected result. For example, atc_name("Fluclox")
, atc_name("Floxapen")
and atc_name("J01CF05")
will all return "Flucloxacillin"
. These functions can again be used to add new variables to your data.It analyses the data with convenient functions that use well-known methods.
+portion_R()
, portion_IR()
, portion_I()
, portion_SI()
and portion_S()
functions. Similarly, the number of isolates can be determined with the count_R()
, count_IR()
, count_I()
, count_SI()
and count_S()
functions. All these functions can be used with the dplyr
package (e.g. in conjunction with summarise()
)geom_rsi()
, a function made for the ggplot2
packagekurtosis()
, skewness()
and create frequency tables with freq()
It teaches the user how to use all the above actions.
+septic_patients
. This data set contains:
@@ -329,8 +329,6 @@
as.mo()
to identify an MO code.first_isolate()
and eucast_rules()
, all parameters will be filled in automatically.antibiotics
data set now contains a column ears_net
.All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property -> atc_property()
-ab_name -> atc_name()
-ab_official -> atc_official()
-ab_trivial_nl -> atc_trivial_nl()
-ab_certe -> atc_certe()
-ab_umcg -> atc_umcg()
-ab_tradenames -> atc_tradenames()
as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.pkgdown
)
-ab_*
functions are deprecated and replaced by atc_*
functions: r ab_property -> atc_property() ab_name -> atc_name() ab_official -> atc_official() ab_trivial_nl -> atc_trivial_nl() ab_certe -> atc_certe() ab_umcg -> atc_umcg() ab_tradenames -> atc_tradenames()
These functions use as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.pkgdown
)set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functionsdplyr
version 0.8.0guess_ab_col()
to find an antibiotic column in a tableas.atc()
mo_renamed()
to get a list of all returned values from as.mo()
that have had taxonomic renamingage()
to calculate the (patients) age in yearsage_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
: r x <- resistance_predict(septic_patients, col_ab = "amox") plot(x) ggplot_rsi_predict(x)
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
is equal to:
- +filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.: r septic_patients %>% filter_first_isolate(...) # or filter_first_isolate(septic_patients, ...)
is equal to: r septic_patients %>% mutate(only_firsts = first_isolate(septic_patients, ...)) %>% filter(only_firsts == TRUE) %>% select(-only_firsts)
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
as.atc()
atc_ddd()
and atc_groups()
have been renamed atc_online_ddd()
and atc_online_groups()
. The old functions are deprecated and will be removed in a future version.guess_mo()
is now deprecated in favour of as.mo()
and will be removed in future versionsguess_atc()
is now deprecated in favour of as.atc()
and will be removed in future versionseucast_rules()
:
-eucast_rules()
:as.mo()
:
-as.mo()
:as.atc()
first_isolate()
:
-first_isolate()
:septic_patients
data set this yielded a difference of 0.15% more isolatescol_patientid
), when this parameter was left blankcol_keyantibiotics()
), when this parameter was left blankoutput_logical
, the function will now always return a logical valuefilter_specimen
to specimen_group
, although using filter_specimen
will still workportion
functions, that low counts can influence the outcome and that the portion
functions may camouflage this, since they only return the portion (albeit being dependent on the minimum
parameter)microorganisms.certe
and microorganisms.umcg
into microorganisms.codes
as.atc()
rsi
and mic
freq()
function):
-freq()
function):Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
-# Determine genus of microorganisms (mo) in `septic_patients` data set:
-# OLD WAY
-septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
-# NEW WAY
-septic_patients %>%
- freq(mo_genus(mo))
-
-# Even supports grouping variables:
-septic_patients %>%
- group_by(gender) %>%
- freq(mo_genus(mo))
# Determine genus of microorganisms (mo) in `septic_patients` data set:
+# OLD WAY
+septic_patients %>%
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
+# NEW WAY
+septic_patients %>%
+ freq(mo_genus(mo))
+
+# Even supports grouping variables:
+septic_patients %>%
+ group_by(gender) %>%
+ freq(mo_genus(mo))
header
functionheader
is now set to TRUE
at default, even for markdownas.atc()
droplevels
to exclude empty factor levels when input is a factorselect()
on frequency tablesscale_y_percent()
now contains the limits
parametermdro()
, key_antibiotics()
and eucast_rules()
resistance_predict()
function)as.mic()
to support more values ending in (several) zeroesFix for as.mic()
to support more values ending in (several) zeroes
as.atc()
EUCAST_rules
was renamed to eucast_rules
, the old function still exists as a deprecated functioneucast_rules
function:
-eucast_rules
function:rules
to specify which rules should be applied (expert rules, breakpoints, others or all)verbose
which can be set to TRUE
to get very specific messages about which columns and rows were affectedas.atc()
septic_patients
now reflects these changespipe
for piperacillin (J01CA12), also to the mdro
functionkingdom
to the microorganisms data set, and function mo_kingdom
to look up valuesas.mo
(and subsequently all mo_*
functions), as empty values will be ignored a priori
as.mo
will return NAFunction as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached r as.mo("E. species") # B_ESCHR mo_fullname("E. spp.") # "Escherichia species" as.mo("S. spp") # B_STPHY mo_fullname("S. species") # "Staphylococcus species"
combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be metas.atc()
portion_*
functions now throw a warning when the total number of available isolates is below the value of parameter minimum
as.mo
, as.rsi
, as.mic
, as.atc
and freq
will not set package name as attribute anymorefreq()
:
-freq()
:Support for grouping variables, test with:
- +septic_patients %>%
+ group_by(hospital_id) %>%
+ freq(gender)
Support for (un)selecting columns:
-septic_patients %>%
- freq(hospital_id) %>%
- select(-count, -cum_count) # only get item, percent, cum_percent
septic_patients %>%
+ freq(hospital_id) %>%
+ select(-count, -cum_count) # only get item, percent, cum_percent
hms::is.hms
as.atc()
na
, to choose which character to print for empty valuesheader
to turn the header info off (default when markdown = TRUE
)title
to manually set the title of the frequency table
first_isolate
now tries to find columns to use as input when parameters are left blankmdro
)as.atc()
ggplot_rsi
and scale_y_percent
have breaks
parameteras.mo
:
-as.mo
:"CRS"
-> Stenotrophomonas maltophilia
as.atc()
"MSSE"
-> Staphylococcus epidermidis
join
functionsis.rsi.eligible
, now 15-20 times fasterg.test
, when sum(x)
is below 1000 or any of the expected values is below 5, Fisher’s Exact Test will be suggestedas.atc()
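The rule of thumb for `g.test` described above can be sketched in base R as follows. The helper name `suggest_fisher` and its arguments are hypothetical, and this is not the package's own implementation; it only shows how expected counts follow from the row and column totals of a contingency table.

```r
# Hypothetical helper: TRUE when Fisher's Exact Test may be the better choice,
# i.e. when the total count is below 1000 or any expected count is below 5
suggest_fisher <- function(x, min_total = 1000, min_expected = 5) {
  x <- as.matrix(x)
  # expected count per cell = row total * column total / grand total
  expected <- outer(rowSums(x), colSums(x)) / sum(x)
  sum(x) < min_total || any(expected < min_expected)
}

# Made-up tables for illustration
small <- matrix(c(24, 25, 33, 77), nrow = 2)      # total of 159
large <- matrix(c(400, 300, 350, 450), nrow = 2)  # total of 1500

suggest_fisher(small)  # TRUE: the total is below 1000
suggest_fisher(large)  # FALSE: total is 1500, expected counts are 350 and 400
```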
New
microorganisms
now contains all microbial taxonomic data from ITIS (kingdoms Bacteria, Fungi and Protozoa), the Integrated Taxonomy Information System, available via https://itis.gov. The data set now contains more than 18,000 microorganisms with all known bacteria, fungi and protozoa according to ITIS with genus, species, subspecies, family, order, class, phylum and subkingdom. The new data set microorganisms.old
contains all previously known taxonomic names from those kingdoms.mo_property
:
-mo_property
:mo_phylum
, mo_class
, mo_order
, mo_family
, mo_genus
, mo_species
, mo_subspecies
mo_fullname
, mo_shortname
@@ -530,52 +475,22 @@ These functions use as.atc()
mo_ref
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
-# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
-# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
-# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
-# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
- -count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
+They also come with support for German, Dutch, French, Italian, Spanish and Portuguese: r mo_gramstain("E. coli") # [1] "Gram negative" mo_gramstain("E. coli", language = "de") # German # [1] "Gramnegativ" mo_gramstain("E. coli", language = "es") # Spanish # [1] "Gram negativo" mo_fullname("S. group A", language = "pt") # Portuguese # [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name: r mo_gramstain("Esc blattae") # Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010) # [1] "Gram negative"
count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolatescount_df
(which works like portion_df
) to get all counts of S, I and R of a data set with antibiotic columns, with support for grouped variablesis.rsi.eligible
to check for columns that have valid antimicrobial results, but do not have the rsi
class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using Artificial Intelligence (AI):
as.mo("E. coli")
-# [1] B_ESCHR_COL
-as.mo("MRSA")
-# [1] B_STPHY_AUR
-as.mo("S group A")
-# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
- +as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using Artificial Intelligence (AI): r as.mo("E. coli") # [1] B_ESCHR_COL as.mo("MRSA") # [1] B_STPHY_AUR as.mo("S group A") # [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items: r thousands_of_E_colis <- rep("E. coli", 25000) microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s") # Unit: seconds # min median max neval # 0.01817717 0.01843957 0.03878077 100
reference_df
for as.mo
, so users can supply their own microbial IDs, name or codes as a reference tablebactid
to mo
, like:
-bactid
to mo
, like:EUCAST_rules
, first_isolate
and key_antibiotics
microorganisms
and septic_patients
labels_rsi_count
to print datalabels on a RSI ggplot2
modelFunctions as.atc
and is.atc
to transform/look up antibiotic ATC codes as defined by the WHO. The existing function guess_atc
is now an alias of as.atc
.
ab_property
and its aliases: ab_name
, ab_tradenames
, ab_certe
, ab_umcg
and ab_trivial_nl
@@ -590,14 +505,7 @@ These functions use as.atc()
Changed
antibiotics
data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)Added 163 trade names to the antibiotics
data set, it now contains 298 different trade names in total, e.g.:
antibiotics
data set, it now contains 298 different trade names in total, e.g.: r ab_official("Bactroban") # [1] "Mupirocin" ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen")) # [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin" ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen")) # [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
first_isolate
, rows will be ignored when there’s no species availableratio
is now deprecated and will be removed in a future release, as it is not really the scope of this packageas.atc()
prevalence
column to the microorganisms
data setminimum
and as_percent
to portion_df
Support for quasiquotation in the functions series count_*
and portion_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
at default, but can be set to count_df
.ggplot_rsi
when the ggplot2
package was not loadedlabels_rsi_count
to ggplot_rsi
-geom_rsi
(and ggplot_rsi
) so you can set your own preferencesquote
to the freq
functiondiff
for frequency tablesfreq
) header of class character
-Support for types (classes) list and matrix for freq
For lists, subsetting is possible:
- -count_*
and portion_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
```r
septic_patients %>% select(amox, cipr) %>% count_IR()
# which is the same as:
septic_patients %>% count_IR(amox, cipr)

septic_patients %>% portion_S(amcl)
septic_patients %>% portion_S(amcl, gent)
septic_patients %>% portion_S(amcl, gent, pita)
```
* Edited `ggplot_rsi` and `geom_rsi` so they can cope with `count_df`. The new `fun` parameter has value `portion_df` at default, but can be set to `count_df`.
* Fix for `ggplot_rsi` when the `ggplot2` package was not loaded
* Added datalabels function `labels_rsi_count` to `ggplot_rsi`
* Added possibility to set any parameter to `geom_rsi` (and `ggplot_rsi`) so you can set your own preferences
* Fix for joins, where predefined suffixes would not be honoured
* Added parameter `quote` to the `freq` function
* Added generic function `diff` for frequency tables
* Added longest and shortest character length in the frequency table (`freq`) header of class `character`
* Support for types (classes) list and matrix for `freq`
```r
my_matrix = with(septic_patients, matrix(c(age, gender), ncol = 2))
freq(my_matrix)
```
For lists, subsetting is possible:
```r
my_list = list(age = septic_patients$age, gender = septic_patients$gender)
my_list %>% freq(age)
my_list %>% freq(gender)
```
as.atc()
Newrsi_df
was removed in favour of new functions portion_R
, portion_IR
, portion_I
, portion_SI
and portion_S
to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi
function. The old function still works, but is deprecated.
-rsi_df
was removed in favour of new functions portion_R
, portion_IR
, portion_I
, portion_SI
and portion_S
to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi
function. The old function still works, but is deprecated.portion_df
to get all portions of S, I and R of a data set with antibiotic columns, with support for grouped variablesggplot2
-geom_rsi
, facet_rsi
, scale_y_percent
, scale_rsi_colours
and theme_rsi
ggplot_rsi
to apply all above functions on a data set:
@@ -679,32 +554,22 @@ These functions use as.atc()
as.bactid
and is.bactid
to transform/look up microbial IDs.
guess_bactid
is now an alias of as.bactid
kurtosis
and skewness
that are lacking in base R - they are generic functions and have support for vectors, data.frames and matricesg.test
to perform the χ²-distributed G-test, whose usage is the same as that of chisq.test
ratio
to transform a vector of values to a preset ratioratio
to transform a vector of values to a preset ratioratio(c(10, 500, 10), ratio = "1:2:1")
would return 130, 260, 130
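The arithmetic behind this example can be reproduced with a few lines of base R. The function name `preset_ratio` is a hypothetical stand-in, assuming the total of the input is simply redistributed according to the parts of the ratio; the package function may handle edge cases differently.

```r
# Hypothetical sketch: redistribute the total of x according to a "a:b:c" ratio
preset_ratio <- function(x, ratio) {
  # split "1:2:1" into the numeric parts 1, 2, 1
  parts <- as.numeric(strsplit(ratio, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) == length(x))
  # each element gets its share of the total
  sum(x) * parts / sum(parts)
}

preset_ratio(c(10, 500, 10), "1:2:1")
# [1] 130 260 130
```

The total of the input is 520 and the ratio has 4 parts, so one part equals 130.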
%in%
or %like%
(and give them keyboard shortcuts), or to view the datasets that come with this packagep.symbol
to transform p values to their related symbols: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
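A mapping like this one can be sketched with `cut()` from base R. The name `p_symbol_sketch` is hypothetical, and the handling of values that fall exactly on a boundary (each boundary goes to the lower interval here) is an assumption, not necessarily what `p.symbol` does.

```r
# Hypothetical sketch: map p values to significance symbols
p_symbol_sketch <- function(p) {
  symbols <- cut(p,
                 breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                 labels = c("***", "**", "*", ".", " "),
                 include.lowest = TRUE)  # so that p = 0 is included
  as.character(symbols)
}

p_symbol_sketch(c(0.0005, 0.03104, 0.07, 0.5))
# [1] "***" "*"   "."   " "
```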
clipboard_import
and clipboard_export
as helper functions to quickly copy and paste from/to software like Excel and SPSS. These functions use the clipr
package, but are a little altered to also support headless Linux servers (so you can use it in RStudio Server)freq
):
-freq
):rsi
(antimicrobial resistance) to use as inputtable
to use as input: freq(table(x, y))
@@ -719,8 +584,6 @@ These functions use as.atc()
options(max.print.freq = n)
where n is your preset valueas.atc()
microorganisms
dataset (especially for Salmonella) and the column bactid
now has the new class "bactid"
rsi
and mic
functions:
-rsi
and mic
functions:as.rsi("<=0.002; S")
will return S
as.mic("<=0.002; S")
will return <=0.002
as.mic("<= 0.002")
now worksrsi
and mic
do not add the attribute package.version
anymore"groups"
option for atc_property(..., property)
. It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups
is a convenient wrapper around this.atc_property
as it requires the host set by url
to be responsivefirst_isolate
algorithm to exclude isolates where bacteria ID or genus is unavailable924b62
) from the dplyr
package v0.7.5 and aboveguess_bactid
(now called as.bactid
)
-guess_bactid
(now called as.bactid
)yourdata %>% select(genus, species) %>% as.bactid()
now also worksas.atc()
as.atc()
n_rsi
to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise
, see ?rsin_rsi
to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise
, see ?rsiguess_bactid
to determine the ID of a microorganism based on genus/species or known abbreviations like MRSAguess_atc
to determine the ATC of an antibiotic based on name, trade name, or known abbreviationsfreq
to create frequency tables, with additional info in a headerMDRO
to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.
-MDRO
to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.BRMO
and MRGN
are wrappers for Dutch and German guidelines, respectively"points"
or "keyantibiotics"
, see ?first_isolate
tibble
s and data.table
s