diff --git a/DESCRIPTION b/DESCRIPTION
index 3298eaede..3b568c517 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,5 +1,5 @@
 Package: AMR
-Version: 1.3.0.9000
+Version: 1.3.0.9001
 Date: 2020-08-10
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
diff --git a/NEWS.md b/NEWS.md
index 034004e4f..4fbfc5953 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 1.3.0.9000
+# AMR 1.3.0.9001
 ## Last updated: 10 August 2020
 
 ### Changed
diff --git a/R/rsi.R b/R/rsi.R
index db5718aa7..371b4baa1 100755
--- a/R/rsi.R
+++ b/R/rsi.R
@@ -110,7 +110,7 @@
 #' library(dplyr)
 #' df %>% mutate_at(vars(AMP:TOB), as.rsi)
 #' df %>% mutate(across(AMP:TOB), as.rsi)
-
+#'
 #' df %>%
 #' mutate_at(vars(AMP:TOB), as.rsi, mo = "E. coli")
 #'
diff --git a/docs/404.html b/docs/404.html
index 1fc44158d..ec81be3ce 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -81,7 +81,7 @@
@@ -248,7 +248,7 @@ Content not found. Please use links in the navbar.
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index ff5e96566..3c51005c9 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -81,7 +81,7 @@
@@ -496,7 +496,7 @@ END OF TERMS AND CONDITIONS
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index aa04f346b..c2eb21d46 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -39,7 +39,7 @@
@@ -186,7 +186,7 @@
vignettes/AMR.Rmd
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 30 July 2020.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 10 August 2020.
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by RStudio. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
We will also use the cleaner
package, which can be used for cleaning data and creating frequency tables.
-library(dplyr)
-library(ggplot2)
-library(AMR)
-library(cleaner)
+library(dplyr)
+library(ggplot2)
+library(AMR)
+library(cleaner)
 
 # (if not yet installed, install with:)
-# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner"))
+# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner"))
To start with patients, we need a unique list of patients.
- +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we also set the gender of our patients, by putting the ID and the gender in a table:
-patients_table <- data.frame(patient_id = patients,
-                             gender = c(rep("M", 135),
-                                        rep("F", 125)))
+patients_table <- data.frame(patient_id = patients,
+                             gender = c(rep("M", 135),
+                                        rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
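A minimal base R sketch of this step, for readers who want to reproduce it outside the vignette. The construction of `patients` is an assumption (the original call did not survive in this diff); any expression that yields the IDs `A1` to `Z10` would serve.

```r
# Assumed construction: combine every letter with the numbers 1 to 10
patients <- unlist(lapply(LETTERS, paste0, 1:10))

# Same table as in the vignette: first 135 IDs male, remaining 125 female
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)),
                             stringsAsFactors = FALSE)

length(patients)              # number of patient IDs
table(patients_table$gender)  # number of rows per gender
```

Because `rep()` assigns gender purely by position, the split depends on the order of `patients`, not on any property of the IDs themselves.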
Let’s pretend that our data consists of blood culture isolates from between 1 January 2010 and 1 January 2018.
- +This dates
object now contains all days in our date range.
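The call that created `dates` is not visible in this diff. One way to build such a vector in base R, shown here only as a sketch under that assumption, is `seq()` on two `Date` values:

```r
# Assumed construction: one element per day, both end points included
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")

range(dates)   # first and last day of the range
length(dates)  # number of days available for sampling
```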
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
-bacteria <- c("Escherichia coli", "Staphylococcus aureus",
-              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
For completeness, we can also add the hospital where each patient was admitted, and we need to define valid antimicrobial results for our randomisation:
- +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
-sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
-                   patient_id = sample(patients, size = sample_size, replace = TRUE),
-                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
-                                     prob = c(0.30, 0.35, 0.15, 0.20)),
-                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
-                                     prob = c(0.50, 0.25, 0.15, 0.10)),
-                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                prob = c(0.60, 0.05, 0.35)),
-                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                prob = c(0.75, 0.10, 0.15)),
-                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                prob = c(0.80, 0.00, 0.20)),
-                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                prob = c(0.92, 0.00, 0.08)))
+sample_size <- 20000
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+                   patient_id = sample(patients, size = sample_size, replace = TRUE),
+                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
+                                     prob = c(0.30, 0.35, 0.15, 0.20)),
+                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+                                     prob = c(0.50, 0.25, 0.15, 0.10)),
+                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                prob = c(0.60, 0.05, 0.35)),
+                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                prob = c(0.75, 0.10, 0.15)),
+                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                prob = c(0.80, 0.00, 0.20)),
+                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                prob = c(0.92, 0.00, 0.08)))
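To see what the `prob` argument does in isolation, here is a small sketch. The definition of `ab_interpretations` is an assumption (its defining line is not visible in this diff); the order S, I, R is inferred from the zero counts for I under CIP and GEN in the tables of this page. The seed is arbitrary and only makes the draw repeatable.

```r
set.seed(42)  # arbitrary seed, only for repeatability

# Assumed definition, order S, I, R
ab_interpretations <- c("S", "I", "R")

# Same weights as the CIP column above: 80% S, 0% I, 20% R
draw <- sample(ab_interpretations, size = 1000, replace = TRUE,
               prob = c(0.80, 0.00, 0.20))

# Observed shares; a category with weight 0 can never be drawn
prop.table(table(factor(draw, levels = ab_interpretations)))
```

The weights are matched to the values by position, so reordering `ab_interpretations` without reordering `prob` would silently change the simulated resistance rates.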
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
data <- data %>% left_join(patients_table)
+data <- data %>% left_join(patients_table) +
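What the join does can be sketched without dplyr. The example below is not `left_join()` itself but a base R lookup with `match()` that gives the same result for a single added column; the two small data frames are made up for illustration.

```r
# Hypothetical miniature versions of the two tables
patients_table <- data.frame(patient_id = c("A1", "A2", "B1"),
                             gender = c("M", "F", "M"),
                             stringsAsFactors = FALSE)
isolates <- data.frame(patient_id = c("B1", "A1", "A1", "A2"),
                       AMX = c("S", "R", "S", "I"),
                       stringsAsFactors = FALSE)

# For every isolate, find the row of its patient and copy the gender
row_in_lookup <- match(isolates$patient_id, patients_table$patient_id)
isolates$gender <- patients_table$gender[row_in_lookup]

isolates
```

As with a left join, every isolate row is kept and rows keep their order; a patient ID missing from the lookup table would get `NA` for `gender`.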
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 rows of this data set:
head(data)
+head(data) +
date | @@ -336,70 +354,70 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2017-06-10 | -J6 | -Hospital B | +2014-08-12 | +M9 | +Hospital D | +Staphylococcus aureus | +R | +S | +S | +S | +M | +||
2013-03-13 | +W1 | +Hospital A | Escherichia coli | I | -R | -S | -S | -M | -|||||
2013-08-26 | -H10 | -Hospital A | -Escherichia coli | -S | -S | -R | -S | -M | -|||||
2016-02-09 | -S7 | -Hospital D | -Escherichia coli | -R | S | R | S | F | |||||
2015-12-02 | +L5 | +Hospital D | +Staphylococcus aureus | +R | +S | +R | +S | +M | +|||||
2013-06-27 | -A5 | +2017-09-11 | +O7 | Hospital B | -Escherichia coli | -S | -S | -S | -S | -M | -|||
2017-03-03 | -Q4 | -Hospital D | -Escherichia coli | +Streptococcus pneumoniae | R | S | S | S | F | ||||
2014-06-03 | +K4 | +Hospital C | +Streptococcus pneumoniae | +S | +I | +S | +S | +M | +|||||
2015-07-13 | -I5 | -Hospital A | +2010-07-19 | +W4 | +Hospital B | Staphylococcus aureus | S | S | -R | S | -M | +S | +F |
We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
For example, for the gender
variable:
data %>% freq(gender)
+data %>% freq(gender) +
Frequency table
Class: character
Length: 20,000
@@ -432,16 +452,16 @@ Longest: 1
So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M
and F
. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kinds of variables. The mutate_at()
function will run the as.rsi()
function on defined variables:
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
function fixes this:
data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
+data <- eucast_rules(data, col_mo = "bacteria", rules = "all") +
Now that we have the microbial ID, we can add some taxonomic properties:
-data <- data %>%
-  mutate(gramstain = mo_gramstain(bacteria),
-         genus = mo_genus(bacteria),
-         species = mo_species(bacteria))
+data <- data %>%
+  mutate(gramstain = mo_gramstain(bacteria),
+         genus = mo_genus(bacteria),
+         species = mo_species(bacteria))
(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function. It uses an episode length of one year by default (this can be changed by the user) and restarts counting days after every selected isolate. This new variable can easily be added to our data:
data <- data %>% - mutate(first = first_isolate(.)) +-+data <- data %>% + mutate(first = first_isolate(.)) # NOTE: Using column `bacteria` as input for `col_mo`. # NOTE: Using column `date` as input for `col_date`. -# NOTE: Using column `patient_id` as input for `col_patient_id`.So only 28.2% is suitable for resistance analysis! We can now filter on it with the
-filter()
function, also from thedplyr
package:+# NOTE: Using column `patient_id` as input for `col_patient_id`. +data_1st <- data %>% - filter(first == TRUE)
So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
+data_1st <- data %>% + filter(first == TRUE) +
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
-data_1st <- data %>%
-  filter_first_isolate()
+data_1st <- data %>%
+  filter_first_isolate()
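The episode logic described above can be sketched in a few lines of base R. This is a simplified illustration, not the implementation of `first_isolate()`: the function name `first_isolate_sketch` is hypothetical, and treating an isolate exactly 365 days after the last selected one as a new episode is an assumption. The ten dates are those of a single patient with one species, as listed in the patient example of this page.

```r
# Hypothetical helper: TRUE for the first isolate of each episode,
# per patient and species, counting days from the last selected isolate
first_isolate_sketch <- function(dates, patient, species, episode_days = 365) {
  is_first <- logical(length(dates))
  # one group of row indices per patient/species combination
  groups <- split(seq_along(dates), list(patient, species), drop = TRUE)
  for (idx in groups) {
    idx <- idx[order(dates[idx])]   # walk through the isolates chronologically
    last_first <- NA_real_          # day number of the last selected isolate
    for (i in idx) {
      day <- as.numeric(dates[i])
      if (is.na(last_first) || day - last_first >= episode_days) {
        is_first[i] <- TRUE
        last_first <- day           # restart counting from this isolate
      }
    }
  }
  is_first
}

dates <- as.Date(c("2010-02-03", "2010-02-04", "2010-04-05", "2010-06-15",
                   "2010-08-28", "2010-09-03", "2010-11-14", "2011-02-14",
                   "2011-03-06", "2011-03-09"))
flags <- first_isolate_sketch(dates,
                              patient = rep("A7", 10),
                              species = rep("B_ESCHR_COLI", 10))
flags
```

Only the isolates from 2010-02-03 and 2011-02-14 are flagged, because every other isolate falls within a year of the previously selected one.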
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient Q3, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient A7, sorted on date:
isolate | @@ -507,8 +541,8 @@ Longest: 1|||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-02-06 | -Q3 | +2010-02-03 | +A7 | B_ESCHR_COLI | S | S | @@ -518,10 +552,10 @@ Longest: 1||||
2 | -2010-04-09 | -Q3 | +2010-02-04 | +A7 | B_ESCHR_COLI | -S | +R | S | S | S | @@ -529,19 +563,30 @@ Longest: 1|
3 | -2010-07-03 | -Q3 | +2010-04-05 | +A7 | B_ESCHR_COLI | S | S | -R | +S | S | FALSE |
4 | -2010-07-31 | -Q3 | +2010-06-15 | +A7 | +B_ESCHR_COLI | +S | +S | +S | +S | +FALSE | +|
5 | +2010-08-28 | +A7 | B_ESCHR_COLI | S | S | @@ -549,10 +594,21 @@ Longest: 1R | FALSE | ||||
6 | +2010-09-03 | +A7 | +B_ESCHR_COLI | +S | +S | +R | +S | +FALSE | +|||
5 | -2010-10-26 | -Q3 | +7 | +2010-11-14 | +A7 | B_ESCHR_COLI | S | S | @@ -561,53 +617,31 @@ Longest: 1FALSE | ||
6 | -2010-12-11 | -Q3 | +8 | +2011-02-14 | +A7 | B_ESCHR_COLI | -S | -S | -S | -R | -FALSE | -
7 | -2011-03-20 | -Q3 | -B_ESCHR_COLI | -R | +I | S | S | S | TRUE | ||
8 | -2011-04-08 | -Q3 | -B_ESCHR_COLI | -S | -S | -S | -S | -FALSE | -|||
9 | -2011-04-24 | -Q3 | +2011-03-06 | +A7 | B_ESCHR_COLI | R | S | -S | +R | S | FALSE |
10 | -2011-05-08 | -Q3 | +2011-03-09 | +A7 | B_ESCHR_COLI | S | S | @@ -619,15 +653,17 @@ Longest: 1
Only 2 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:
data <- data %>% - mutate(keyab = key_antibiotics(.)) %>% - mutate(first_weighted = first_isolate(.)) ++# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this. ++data <- data %>% + mutate(keyab = key_antibiotics(.)) %>% + mutate(first_weighted = first_isolate(.)) # NOTE: Using column `bacteria` as input for `col_mo`. # NOTE: more than one result was found for item 1: amoxicillin/clavulanic acid, azidocillin # NOTE: Using column `bacteria` as input for `col_mo`. # NOTE: Using column `date` as input for `col_date`. # NOTE: Using column `patient_id` as input for `col_patient_id`. -# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
isolate | @@ -644,8 +680,8 @@ Longest: 1|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-02-06 | -Q3 | +2010-02-03 | +A7 | B_ESCHR_COLI | S | S | @@ -656,32 +692,44 @@ Longest: 1||||||
2 | -2010-04-09 | -Q3 | +2010-02-04 | +A7 | B_ESCHR_COLI | -S | +R | S | S | S | FALSE | -FALSE | +TRUE |
3 | -2010-07-03 | -Q3 | +2010-04-05 | +A7 | B_ESCHR_COLI | S | S | -R | +S | S | FALSE | TRUE | |
4 | -2010-07-31 | -Q3 | +2010-06-15 | +A7 | +B_ESCHR_COLI | +S | +S | +S | +S | +FALSE | +FALSE | +||
5 | +2010-08-28 | +A7 | B_ESCHR_COLI | S | S | @@ -690,46 +738,22 @@ Longest: 1FALSE | TRUE | ||||||
5 | -2010-10-26 | -Q3 | -B_ESCHR_COLI | -S | -S | -S | -S | -FALSE | -TRUE | -||||
6 | -2010-12-11 | -Q3 | +2010-09-03 | +A7 | B_ESCHR_COLI | S | S | -S | R | +S | FALSE | TRUE | |
7 | -2011-03-20 | -Q3 | -B_ESCHR_COLI | -R | -S | -S | -S | -TRUE | -TRUE | -||||
8 | -2011-04-08 | -Q3 | +2010-11-14 | +A7 | B_ESCHR_COLI | S | S | @@ -738,22 +762,34 @@ Longest: 1FALSE | TRUE | ||||
8 | +2011-02-14 | +A7 | +B_ESCHR_COLI | +I | +S | +S | +S | +TRUE | +TRUE | +||||
9 | -2011-04-24 | -Q3 | +2011-03-06 | +A7 | B_ESCHR_COLI | R | S | -S | +R | S | FALSE | TRUE | |
10 | -2011-05-08 | -Q3 | +2011-03-09 | +A7 | B_ESCHR_COLI | S | S | @@ -764,16 +800,22 @@ Longest: 1
Instead of 2, now 9 isolates are flagged. In total, 78.8% of all isolates are marked ‘first weighted’ - 50.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 9 isolates are flagged. In total, 79.0% of all isolates are marked ‘first weighted’ - 50.5 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
data_1st <- data %>% - filter_first_weighted_isolate()
So we end up with 15,769 isolates for analysis.
++data_1st <- data %>% + filter_first_weighted_isolate() +
So we end up with 15,794 isolates for analysis.
We can remove unneeded columns:
- +Now our data looks like:
-head(data_1st)
+head(data_1st) +
1 | -2017-06-10 | -J6 | -Hospital B | -B_ESCHR_COLI | -R | -R | -S | -S | -M | -Gram-negative | -Escherichia | -coli | -TRUE | -||
2 | -2013-08-26 | -H10 | -Hospital A | -B_ESCHR_COLI | -S | -S | -R | -S | -M | -Gram-negative | -Escherichia | -coli | -TRUE | -||
3 | -2016-02-09 | -S7 | +2014-08-12 | +M9 | Hospital D | -B_ESCHR_COLI | -R | -S | -R | -S | -F | -Gram-negative | -Escherichia | -coli | -TRUE | -
4 | -2013-06-27 | -A5 | -Hospital B | -B_ESCHR_COLI | -S | -S | -S | -S | -M | -Gram-negative | -Escherichia | -coli | -TRUE | -||
5 | -2017-03-03 | -Q4 | -Hospital D | -B_ESCHR_COLI | -R | -S | -S | -S | -F | -Gram-negative | -Escherichia | -coli | -TRUE | -||
7 | -2015-08-02 | -N2 | -Hospital B | B_STPHY_AURS | R | S | @@ -904,6 +866,86 @@ Longest: 1aureus | TRUE | |||||||
2 | +2013-03-13 | +W1 | +Hospital A | +B_ESCHR_COLI | +I | +S | +R | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +||
3 | +2015-12-02 | +L5 | +Hospital D | +B_STPHY_AURS | +R | +S | +R | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +||
4 | +2017-09-11 | +O7 | +Hospital B | +B_STRPT_PNMN | +R | +R | +S | +R | +F | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +||
5 | +2014-06-03 | +K4 | +Hospital C | +B_STRPT_PNMN | +S | +S | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +||
7 | +2011-02-14 | +B9 | +Hospital A | +B_ESCHR_COLI | +R | +S | +S | +S | +M | +Gram-negative | +Escherichia | +coli | +TRUE | +
Time for the analysis!
@@ -918,13 +960,17 @@ Longest: 1 Dispersion of speciesTo just get an idea how the species are distributed, create a frequency table with our freq()
function. We created the genus
and species
column earlier based on the microbial ID. With paste()
, we can concatenate them together.
The freq()
function can be used like the base R language was intended:
Or can be used like the dplyr
way, which is easier to read:
data_1st %>% freq(genus, species)
+data_1st %>% freq(genus, species) +
Frequency table
Class: character
-Length: 15,769
-Available: 15,769 (100%, NA: 0 = 0%)
+Length: 15,794
+Available: 15,794 (100%, NA: 0 = 0%)
Unique: 4
Shortest: 16
Longest: 24
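The header above can be cross-checked with base R alone. The sketch below does not use `freq()`; it rebuilds the combined genus/species vector from the per-species totals in the bug/drug tables of this page (updated values) and tabulates it with `table()`.

```r
# Per-species totals taken from the bug/drug tables on this page
counts <- c("Escherichia coli"         = 7828,
            "Staphylococcus aureus"    = 3925,
            "Streptococcus pneumoniae" = 2399,
            "Klebsiella pneumoniae"    = 1642)

# One element per isolate, as paste(genus, species) would give
species_full <- rep(names(counts), times = counts)

freq_tab <- sort(table(species_full), decreasing = TRUE)
data.frame(item    = names(freq_tab),
           count   = as.vector(freq_tab),
           percent = round(100 * as.vector(freq_tab) / sum(freq_tab), 1))

length(species_full)                # compare with "Length" above
range(nchar(unique(species_full)))  # compare with "Shortest" and "Longest"
```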
If you want to get a quick glance at the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations()
function:
data_1st %>% - bug_drug_combinations() %>% - head() # show first 6 rows
+data_1st %>% + bug_drug_combinations() %>% + head() # show first 6 rows +
# NOTE: Using column `bacteria` as input for `col_mo`.
E. coli | AMX | -3769 | -276 | -3924 | -7969 | +3722 | +247 | +3859 | +7828 |
E. coli | AMC | -6242 | -307 | -1420 | -7969 | +6127 | +293 | +1408 | +7828 |
E. coli | CIP | -6044 | +5959 | 0 | -1925 | -7969 | +1869 | +7828 | |
E. coli | GEN | -7149 | +7073 | 0 | -820 | -7969 | +755 | +7828 | |
K. pneumoniae | AMX | 0 | 0 | -1542 | -1542 | +1642 | +1642 | ||
K. pneumoniae | AMC | -1209 | -69 | -264 | -1542 | +1304 | +58 | +280 | +1642 |
Using Tidyverse selections, you can also select columns based on the antibiotic class they are in:
-data_1st %>% - select(bacteria, fluoroquinolones()) %>% - bug_drug_combinations()
+data_1st %>% + select(bacteria, fluoroquinolones()) %>% + bug_drug_combinations() +
# Selecting fluoroquinolones: `CIP` (ciprofloxacin)
# NOTE: Using column `bacteria` as input for `col_mo`.
E. coli | CIP | -6044 | +5959 | 0 | -1925 | -7969 | +1869 | +7828 |
K. pneumoniae | CIP | -1182 | +1237 | 0 | -360 | -1542 | +405 | +1642 |
S. aureus | CIP | -3010 | +3006 | 0 | -937 | -3947 | +919 | +3925 |
S. pneumoniae | CIP | -1777 | +1801 | 0 | -534 | -2311 | +598 | +2399 |
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
data_1st %>% resistance(AMX) -# [1] 0.5378908
+data_1st %>% resistance(AMX) +# [1] 0.5372293 +
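The definition used here (resistance as the share of R, susceptibility as the share of S and I) is easy to restate in base R. The two helper names below are hypothetical and are not the package functions; the counts are the E. coli amoxicillin counts from the bug/drug table on this page (updated values).

```r
# E. coli / AMX counts from the bug/drug table: S = 3722, I = 247, R = 3859
amx <- c(rep("S", 3722), rep("I", 247), rep("R", 3859))

# Hypothetical helpers: proportions among the available (non-missing) results
proportion_R_sketch  <- function(x) mean(x[!is.na(x)] == "R")
proportion_SI_sketch <- function(x) mean(x[!is.na(x)] %in% c("S", "I"))

res <- proportion_R_sketch(amx)   # resistance
sus <- proportion_SI_sketch(amx)  # susceptibility
c(resistance = res, susceptibility = sus)
```

Under this definition the two values always add up to 1 for the same set of isolates, since every available result is either R or one of S and I.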
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>% - group_by(hospital) %>% - summarise(amoxicillin = resistance(AMX))
+data_1st %>% + group_by(hospital) %>% + summarise(amoxicillin = resistance(AMX)) +
# `summarise()` ungrouping output (override with `.groups` argument)
Hospital A | -0.5322241 | +0.5349377 |
Hospital B | -0.5324628 | +0.5382268 |
Hospital C | -0.5388535 | +0.5455691 |
Hospital D | -0.5552030 | +0.5325387 |
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi()
can be used, which works exactly like n_distinct()
from the dplyr
package. It counts all isolates available for every group (i.e. values S, I or R):
data_1st %>% - group_by(hospital) %>% - summarise(amoxicillin = resistance(AMX), - available = n_rsi(AMX))
+data_1st %>% + group_by(hospital) %>% + summarise(amoxicillin = resistance(AMX), + available = n_rsi(AMX)) +
# `summarise()` ungrouping output (override with `.groups` argument)
Hospital A | -0.5322241 | -4748 | +0.5349377 | +4737 |
Hospital B | -0.5324628 | -5514 | +0.5382268 | +5572 |
Hospital C | -0.5388535 | -2355 | +0.5455691 | +2381 |
Hospital D | -0.5552030 | -3152 | +0.5325387 | +3104 |
These functions can also be used to get the proportion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:
-data_1st %>% - group_by(genus) %>% - summarise(amoxiclav = susceptibility(AMC), - gentamicin = susceptibility(GEN), - amoxiclav_genta = susceptibility(AMC, GEN))
+data_1st %>% + group_by(genus) %>% + summarise(amoxiclav = susceptibility(AMC), + gentamicin = susceptibility(GEN), + amoxiclav_genta = susceptibility(AMC, GEN)) +
# `summarise()` ungrouping output (override with `.groups` argument)
Escherichia | -0.8218095 | -0.8971013 | -0.9858201 | +0.8201329 | +0.9035514 | +0.9853091 |
Klebsiella | -0.8287938 | -0.8949416 | -0.9883268 | +0.8294762 | +0.9025579 | +0.9829476 |
Staphylococcus | -0.8198632 | -0.9191791 | -0.9863187 | +0.8219108 | +0.9141401 | +0.9844586 |
Streptococcus | -0.5374297 | +0.5414756 | 0.0000000 | -0.5374297 | +0.5414756 |
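The combination column can be read as "covered by at least one of the two drugs". The sketch below works under that assumption, which the Streptococcus row supports (gentamicin susceptibility 0, combination equal to amoxicillin/clavulanic acid alone); it is a plain base R illustration on five made-up isolates, not the package's code.

```r
# Five hypothetical isolates with results for two drugs
amc <- c("S", "R", "R", "I", "R")
gen <- c("R", "S", "R", "S", "S")

# S and I both count as susceptible, as in the text above
is_susceptible <- function(x) x %in% c("S", "I")

# Assumed rule: an isolate is covered if at least one drug works
combined <- is_susceptible(amc) | is_susceptible(gen)

c(amoxiclav       = mean(is_susceptible(amc)),
  gentamicin      = mean(is_susceptible(gen)),
  amoxiclav_genta = mean(combined))
```

With this rule the combined value can never be lower than either single-drug value, which matches the pattern in the table.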
To make a transition to the next part, let’s see how this difference could be plotted:
-data_1st %>% - group_by(genus) %>% - summarise("1. Amoxi/clav" = susceptibility(AMC), - "2. Gentamicin" = susceptibility(GEN), - "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>% ++ tidyr::pivot_longer(-genus, names_to = "antibiotic") %>% + ggplot(aes(x = genus, + y = value, + fill = antibiotic)) + + geom_col(position = "dodge2") +# `summarise()` ungrouping output (override with `.groups` argument) ++data_1st %>% + group_by(genus) %>% + summarise("1. Amoxi/clav" = susceptibility(AMC), + "2. Gentamicin" = susceptibility(GEN), + "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>% # pivot_longer() from the tidyr package "lengthens" data: - tidyr::pivot_longer(-genus, names_to = "antibiotic") %>% - ggplot(aes(x = genus, - y = value, - fill = antibiotic)) + - geom_col(position = "dodge2") -# `summarise()` ungrouping output (override with `.groups` argument)
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
ggplot(data = a_data_set, - mapping = aes(x = year, - y = value)) + - geom_col() + - labs(title = "A title", - subtitle = "A subtitle", - x = "My X axis", - y = "My Y axis") ++ggplot(a_data_set) + + geom_bar(aes(year)) ++ggplot(data = a_data_set, + mapping = aes(x = year, + y = value)) + + geom_col() + + labs(title = "A title", + subtitle = "A subtitle", + x = "My X axis", + y = "My Y axis") # or as short as: -ggplot(a_data_set) + - geom_bar(aes(year))
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or proportion_df()
and shows the results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE
to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
# group the data on `genus` -ggplot(data_1st %>% group_by(genus)) + ++ theme(axis.text.y = element_text(face = "italic")) ++# group the data on `genus` +ggplot(data_1st %>% group_by(genus)) + # create bars with genus on x axis # it looks for variables with class `rsi`, # of which we have 4 (earlier created with `as.rsi`) - geom_rsi(x = "genus") + + geom_rsi(x = "genus") + # split plots on antibiotic - facet_rsi(facet = "antibiotic") + + facet_rsi(facet = "antibiotic") + # set colours to the R/SI interpretations - scale_rsi_colours() + + scale_rsi_colours() + # show percentages on y axis - scale_y_percent(breaks = 0:4 * 25) + + scale_y_percent(breaks = 0:4 * 25) + # turn 90 degrees, to make it bars instead of columns - coord_flip() + + coord_flip() + # add labels - labs(title = "Resistance per genus and antibiotic", - subtitle = "(this is fake data)") + + labs(title = "Resistance per genus and antibiotic", + subtitle = "(this is fake data)") + # and print genus in italic to follow our convention # (is now y axis because we turned the plot) - theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
data_1st %>% - group_by(genus) %>% - ggplot_rsi(x = "genus", - facet = "antibiotic", - breaks = 0:4 * 25, - datalabels = FALSE) + - coord_flip()
+data_1st %>% + group_by(genus) %>% + ggplot_rsi(x = "genus", + facet = "antibiotic", + breaks = 0:4 * 25, + datalabels = FALSE) + + coord_flip() +
The next example uses the example_isolates
data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column FOS
) in hospital A and D. The input for the fisher.test()
can be retrieved with a transformation like this:
# use package 'tidyr' to pivot data: -library(tidyr) ++# [2,] 24 33 ++# use package 'tidyr' to pivot data: +library(tidyr) -check_FOS <- example_isolates %>% - filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D - select(hospital_id, FOS) %>% # select the hospitals and fosfomycin - group_by(hospital_id) %>% # group on the hospitals - count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id) - pivot_wider(names_from = hospital_id, # transform output so A and D are columns - values_from = value) %>% - select(A, D) %>% # and only select these columns +check_FOS <- example_isolates %>% + filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D + select(hospital_id, FOS) %>% # select the hospitals and fosfomycin + group_by(hospital_id) %>% # group on the hospitals + count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id) + pivot_wider(names_from = hospital_id, # transform output so A and D are columns + values_from = value) %>% + select(A, D) %>% # and only select these columns as.matrix() # transform to a good old matrix for fisher.test() -check_FOS +check_FOS # A D # [1,] 25 77 -# [2,] 24 33
We can apply the test now with:
-# do Fisher's Exact Test -fisher.test(check_FOS) ++# 0.4488318 ++# do Fisher's Exact Test +fisher.test(check_FOS) # # Fisher's Exact Test for Count Data # @@ -1308,7 +1379,8 @@ Longest: 24 # 0.2111489 0.9485124 # sample estimates: # odds ratio -# 0.4488318
As can be seen, the p value is 0.031, which means that fosfomycin resistance differs significantly (at the conventional 0.05 level) between isolates from patients in hospitals A and D.
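The test itself only needs the 2 x 2 matrix of counts, so it can be reproduced without the pivoting pipeline. The sketch below enters the printed `check_FOS` values by hand and extracts the p value and odds ratio from the result object.

```r
# Counts as printed above: column A = 25, 24; column D = 77, 33
check_FOS <- matrix(c(25, 24, 77, 33),
                    nrow = 2,
                    dimnames = list(NULL, c("A", "D")))

result <- fisher.test(check_FOS)

result$p.value   # two-sided p value
result$estimate  # conditional maximum likelihood estimate of the odds ratio
result$conf.int  # 95% confidence interval for the odds ratio
```

Note that `matrix()` fills column by column, so the first two numbers form column A and the last two form column D.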
vignettes/EUCAST.Rmd
EUCAST.Rmd
These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.
Sometimes, laboratory data can still contain such strains reported as susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this; they can be applied using eucast_rules()
:
oops <- data.frame(mo = c("Klebsiella", ++# 2 Escherichia S ++oops <- data.frame(mo = c("Klebsiella", "Escherichia"), - ampicillin = "S") -oops + ampicillin = "S") +oops # mo ampicillin # 1 Klebsiella S # 2 Escherichia S -eucast_rules(oops, info = FALSE) +eucast_rules(oops, info = FALSE) # mo ampicillin # 1 Klebsiella R -# 2 Escherichia S
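Conceptually, such a correction is a lookup: for every organism/drug pair known to be intrinsically resistant, overwrite the reported result with R. The toy sketch below illustrates only that idea; the `intrinsic` table is hypothetical, contains just the one pair mentioned in the text, and is unrelated to how `eucast_rules()` stores or applies the actual EUCAST rules.

```r
oops <- data.frame(mo = c("Klebsiella", "Escherichia"),
                   ampicillin = "S",
                   stringsAsFactors = FALSE)

# Hypothetical one-row rule table: organism and the drug it is resistant to
intrinsic <- data.frame(mo = "Klebsiella",
                        drug = "ampicillin",
                        stringsAsFactors = FALSE)

fixed <- oops
for (k in seq_len(nrow(intrinsic))) {
  hit <- fixed$mo == intrinsic$mo[k]      # rows of the affected organism
  fixed[hit, intrinsic$drug[k]] <- "R"    # overwrite the impossible result
}
fixed
```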
EUCAST rules can not only be used for correction, they can also be used for filling in known resistance and susceptibility based on results of other antimicrobial drugs. This process is called interpretive reading and is part of the eucast_rules()
function as well:
data <- data.frame(mo = c("Staphylococcus aureus", +-+data <- data.frame(mo = c("Staphylococcus aureus", "Enterococcus faecalis", "Escherichia coli", "Klebsiella pneumoniae", "Pseudomonas aeruginosa"), - VAN = "-", # Vancomycin - AMX = "-", # Amoxicillin - COL = "-", # Colistin - CAZ = "-", # Ceftazidime - CXM = "-", # Cefuroxime - PEN = "S", # Penicillin G - FOX = "S", # Cefoxitin - stringsAsFactors = FALSE)+ VAN = "-", # Vancomycin + AMX = "-", # Amoxicillin + COL = "-", # Colistin + CAZ = "-", # Ceftazidime + CXM = "-", # Cefuroxime + PEN = "S", # Penicillin G + FOX = "S", # Cefoxitin + stringsAsFactors = FALSE) +data
+data
+
mo | @@ -300,7 +306,9 @@
---|
eucast_rules(data)
+eucast_rules(data) +
# Warning: Not all columns with antimicrobial results are of class <rsi>.
# Transform eligible columns to class <rsi> on beforehand: your_data %>% mutate_if(is.rsi.eligible, as.rsi)
For another example, I will create a data set to determine multi-drug resistant TB:
-# a helper function to get a random vector with values S, I and R ++my_TB_data <- data.frame(rifampicin = sample_rsi(), + isoniazid = sample_rsi(), + gatifloxacin = sample_rsi(), + ethambutol = sample_rsi(), + pyrazinamide = sample_rsi(), + moxifloxacin = sample_rsi(), + kanamycin = sample_rsi()) ++# a helper function to get a random vector with values S, I and R # with the probabilities 50% - 10% - 40% -sample_rsi <- function() { +sample_rsi <- function() { sample(c("S", "I", "R"), - size = 5000, - prob = c(0.5, 0.1, 0.4), - replace = TRUE) + size = 5000, + prob = c(0.5, 0.1, 0.4), + replace = TRUE) } -my_TB_data <- data.frame(rifampicin = sample_rsi(), - isoniazid = sample_rsi(), - gatifloxacin = sample_rsi(), - ethambutol = sample_rsi(), - pyrazinamide = sample_rsi(), - moxifloxacin = sample_rsi(), - kanamycin = sample_rsi())
Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:
-my_TB_data <- data.frame(RIF = sample_rsi(), - INH = sample_rsi(), - GAT = sample_rsi(), - ETH = sample_rsi(), - PZA = sample_rsi(), - MFX = sample_rsi(), - KAN = sample_rsi())
+my_TB_data <- data.frame(RIF = sample_rsi(), + INH = sample_rsi(), + GAT = sample_rsi(), + ETH = sample_rsi(), + PZA = sample_rsi(), + MFX = sample_rsi(), + KAN = sample_rsi()) +
The data set now looks like this:
-head(my_TB_data) ++# 4 S +# 5 S +# 6 R ++head(my_TB_data) # rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin -# 1 S S R R S I -# 2 R S R R R S -# 3 R S S S I S -# 4 S S I S R S -# 5 R I R S R S -# 6 S S S S R R +# 1 R R R R I I +# 2 S S S R S R +# 3 S I S S S S +# 4 S I S R R R +# 5 S S R S S R +# 6 S R S R R R # kanamycin -# 1 I -# 2 I +# 1 S +# 2 R # 3 S -# 4 R -# 5 R -# 6 S
We can now add the interpretation of MDR-TB to our data set. You can use:
-mdro(my_TB_data, guideline = "TB")
+mdro(my_TB_data, guideline = "TB") +
or its shortcut mdr_tb()
:
my_TB_data$mdr <- mdr_tb(my_TB_data) ++# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP) ++my_TB_data$mdr <- mdr_tb(my_TB_data) # NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis. # NOTE: Auto-guessing columns suitable for analysis...OK. -# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP)
Create a frequency table of the results:
-freq(my_TB_data$mdr)
+freq(my_TB_data$mdr) +
Frequency table
Class: factor > ordered (numeric)
Length: 5,000
@@ -347,40 +363,40 @@ Unique: 5
vignettes/PCA.Rmd
PCA.Rmd
For PCA, we need to transform our AMR data first. This is what the example_isolates
data set in this package looks like:
library(AMR) -library(dplyr) -glimpse(example_isolates) ++# $ RIF <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R… ++library(AMR) +library(dplyr) +glimpse(example_isolates) # Rows: 2,000 # Columns: 49 # $ date <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002… @@ -257,16 +258,18 @@ # $ CHL <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… # $ COL <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, … # $ MUP <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… -# $ RIF <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
Now to transform this to a data set with only resistance percentages per taxonomic order and genus:
-resistance_data <- example_isolates %>% - group_by(order = mo_order(mo), # group on anything, like order - genus = mo_genus(mo)) %>% # and genus as we do here - summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs - select(order, genus, AMC, CXM, CTX, - CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns ++# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA ++resistance_data <- example_isolates %>% + group_by(order = mo_order(mo), # group on anything, like order + genus = mo_genus(mo)) %>% # and genus as we do here + summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs + select(order, genus, AMC, CXM, CTX, + CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns -head(resistance_data) +head(resistance_data) # # A tibble: 6 x 10 # # Groups: order [2] # order genus AMC CXM CTX CAZ GEN TOB TMP SXT @@ -276,35 +279,46 @@ # 3 Actinomycetales Cutibacterium NA NA NA NA NA NA NA NA # 4 Actinomycetales Dermabacter NA NA NA NA NA NA NA NA # 5 Actinomycetales Micrococcus NA NA NA NA NA NA NA NA -# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA
The new pca()
function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:
pca_result <- pca(resistance_data) ++# Total observations available: 7. ++pca_result <- pca(resistance_data) # NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT. -# Total observations available: 7.
The result can be reviewed with the good old summary()
function:
summary(pca_result) ++# Cumulative Proportion 0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00 ++summary(pca_result) # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Standard deviation 2.154 1.6809 0.61305 0.33882 0.20755 0.03137 1.602e-16 # Proportion of Variance 0.580 0.3532 0.04698 0.01435 0.00538 0.00012 0.000e+00 -# Cumulative Proportion 0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00
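As a rough cross-check, the proportions in this summary can be recomputed from the standard deviations alone. The sketch below uses base R only and copies the printed (rounded) standard deviations by hand, so it agrees with the summary only up to rounding. The variable names are illustrative and not part of the package.

```r
# Standard deviations as printed by summary(pca_result) above (rounded values)
sdev <- c(2.154, 1.6809, 0.61305, 0.33882, 0.20755, 0.03137, 1.602e-16)

# The variance of each component is its squared standard deviation
variance <- sdev^2

# Proportion of variance per component, and the running total
proportion <- variance / sum(variance)
cumulative <- cumsum(proportion)

# Share of the variance explained by PC1 and PC2 together, in percent
round(100 * cumulative[2], 1)
```

The second element of `cumulative` corresponds to the PC2 column of the Cumulative Proportion row.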
Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance). We can create a so-called biplot with the base R biplot()
function, to see which antimicrobial resistances per drug explain the differences between microorganisms.
biplot(pca_result)
+biplot(pca_result) +
But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca()
function, that automatically adds the right labels and even groups:
ggplot_pca(pca_result)
+ggplot_pca(pca_result) +
You can also print an ellipse per group, and edit the appearance:
-ggplot_pca(pca_result, ellipse = TRUE) + - ggplot2::labs(title = "An AMR/PCA biplot!")
+ggplot_pca(pca_result, ellipse = TRUE) + + ggplot2::labs(title = "An AMR/PCA biplot!") +
vignettes/SPSS.Rmd
SPSS.Rmd
To demonstrate the first point:
-# not all values are valid MIC values: ++# [1] "J01CF05" ++# not all values are valid MIC values: as.mic(0.125) # Class <mic> # [1] 0.125 @@ -253,13 +254,13 @@ # [1] "Gram-negative" # Klebsiella is intrinsic resistant to amoxicllin, according to EUCAST: -klebsiella_test <- data.frame(mo = "klebsiella", - amox = "S", - stringsAsFactors = FALSE) -klebsiella_test # (our original data) +klebsiella_test <- data.frame(mo = "klebsiella", + amox = "S", + stringsAsFactors = FALSE) +klebsiella_test # (our original data) # mo amox # 1 klebsiella S -eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules) +eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules) # mo amox # 1 klebsiella R @@ -271,7 +272,8 @@ # [4] "fluclox" "flucloxacilina" "flucloxacillin" # [7] "flucloxacilline" "flucloxacillinum" "fluorochloroxacillin" ab_atc("floxapen") -# [1] "J01CF05"
If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
The difference is this:
-SPSS_data ++# # … with 4,193 more rows ++SPSS_data # # A tibble: 4,203 x 4 # v001 sex status statusage # <dbl> <dbl+lbl> <dbl+lbl> <dbl> @@ -303,7 +306,7 @@ # 10 10018 0 1 66.6 # # … with 4,193 more rows -as_factor(SPSS_data) +as_factor(SPSS_data) # # A tibble: 4,203 x 4 # v001 sex status statusage # <dbl> <fct> <fct> <dbl> @@ -317,67 +320,82 @@ # 8 10011 Male alive 73.1 # 9 10017 Male alive 56.7 # 10 10018 Female alive 66.6 -# # … with 4,193 more rows
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
# download and install the latest version: ++library(haven) ++# download and install the latest version: install.packages("haven") # load the package you just installed: -library(haven)
You can now import files as follows:
To read files from SPSS into R:
-# read any SPSS file based on file extension (best way): -read_spss(file = "path/to/file") ++read_por(file = "path/to/file") ++# read any SPSS file based on file extension (best way): +read_spss(file = "path/to/file") # read .sav or .zsav file: -read_sav(file = "path/to/file") +read_sav(file = "path/to/file") # read .por file: -read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
-# save as .sav file: -write_sav(data = yourdata, path = "path/to/file") ++write_sav(data = yourdata, path = "path/to/file", compress = TRUE) ++# save as .sav file: +write_sav(data = yourdata, path = "path/to/file") # save as compressed .zsav file: -write_sav(data = yourdata, path = "path/to/file", compress = TRUE)
To read files from SAS into R:
-# read .sas7bdat + .sas7bcat files: -read_sas(data_file = "path/to/file", catalog_file = NULL) ++read_xpt(file = "path/to/file") ++# read .sas7bdat + .sas7bcat files: +read_sas(data_file = "path/to/file", catalog_file = NULL) # read SAS transport files (version 5 and version 8): -read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
-# save as regular SAS file: -write_sas(data = yourdata, path = "path/to/file") ++write_xpt(data = yourdata, path = "path/to/file", version = 8) ++# save as regular SAS file: +write_sas(data = yourdata, path = "path/to/file") # the SAS transport format is an open format # (required for submission of the data to the FDA) -write_xpt(data = yourdata, path = "path/to/file", version = 8)
To read files from Stata into R:
-# read .dta file: -read_stata(file = "/path/to/file") ++read_dta(file = "/path/to/file") ++# read .dta file: +read_stata(file = "/path/to/file") # works exactly the same: -read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
-vignettes/WHONET.Rmd
WHONET.Rmd
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
-library(readxl) -data <- read_excel(path = "path/to/your/file.xlsx")
+library(readxl) +data <- read_excel(path = "path/to/your/file.xlsx") +
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
-library(dplyr) # part of tidyverse -library(ggplot2) # part of tidyverse -library(AMR) # this package -library(cleaner) # to create frequency tables
+library(dplyr) # part of tidyverse +library(ggplot2) # part of tidyverse +library(AMR) # this package +library(cleaner) # to create frequency tables +
We will have to transform some variables to simplify and automate the analysis:
mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly what the as.rsi()
function is for.# transform variables -data <- WHONET %>% ++ mutate_at(vars(AMP_ND10:CIP_EE), as.rsi) ++# transform variables +data <- WHONET %>% # get microbial ID based on given organism - mutate(mo = as.mo(Organism)) %>% + mutate(mo = as.mo(Organism)) %>% # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class - mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values are transformed successfully.
We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
-# our newly created `mo` variable, put in the mo_name() function -data %>% freq(mo_name(mo), nmax = 10)
+# our newly created `mo` variable, put in the mo_name() function +data %>% freq(mo_name(mo), nmax = 10) +
Frequency table
Class: character
Length: 500
@@ -328,9 +336,11 @@ Longest: 40
(omitted 27 entries, n = 56 [11.20%])
-# our transformed antibiotic columns ++data %>% freq(AMC_ND2) ++# our transformed antibiotic columns # amoxicillin/clavulanic acid (J01CR02) as an example -data %>% freq(AMC_ND2)
Frequency table
Class: factor > ordered > rsi (numeric)
Length: 500
@@ -378,10 +388,12 @@ Unique: 3
An easy ggplot
will already give a lot of information, using the included ggplot_rsi()
function:
data %>% - group_by(Country) %>% - select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>% - ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
+data %>% + group_by(Country) %>% + select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>% + ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE) +
vignettes/benchmarks.Rmd
benchmarks.Rmd
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
microbenchmark <- microbenchmark::microbenchmark -library(AMR) -library(dplyr)
+microbenchmark <- microbenchmark::microbenchmark +library(AMR) +library(dplyr) +
In the next test, we try to ‘coerce’ different input values into the microbial code of Staphylococcus aureus. Coercion is a computational process of forcing output based on an input. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties.
The actual result is the same every time: it returns its microorganism code B_STPHY_AURS
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
-S.aureus <- microbenchmark( ++# expr min lq mean median uq max neval +# as.mo("sau") 11.0 14 21 15 16 51 10 +# as.mo("stau") 170.0 170 190 190 210 240 10 +# as.mo("STAU") 160.0 170 180 180 200 210 10 +# as.mo("staaur") 11.0 13 19 14 18 48 10 +# as.mo("STAAUR") 11.0 13 22 17 37 40 10 +# as.mo("S. aureus") 15.0 15 24 17 26 56 10 +# as.mo("S aureus") 12.0 13 21 16 23 49 10 +# as.mo("Staphylococcus aureus") 9.8 13 21 14 15 65 10 +# as.mo("Staphylococcus aureus (MRSA)") 960.0 960 1100 980 1100 1400 10 +# as.mo("Sthafilokkockus aaureuz") 440.0 450 480 470 480 570 10 +# as.mo("MRSA") 12.0 14 22 15 17 86 10 +# as.mo("VISA") 15.0 18 25 19 40 42 10 +# as.mo("VRSA") 14.0 15 30 22 44 69 10 +# as.mo(22242419) 130.0 150 160 170 180 190 10 ++S.aureus <- microbenchmark( as.mo("sau"), # WHONET code as.mo("stau"), as.mo("STAU"), @@ -218,47 +221,50 @@ as.mo("VISA"), # Vancomycin Intermediate S. aureus as.mo("VRSA"), # Vancomycin Resistant S. aureus as.mo(22242419), # Catalogue of Life ID - times = 10) -print(S.aureus, unit = "ms", signif = 2) + times = 10) +print(S.aureus, unit = "ms", signif = 2) # Unit: milliseconds -# expr min lq mean median uq max neval -# as.mo("sau") 11 12 17 13 15 51 10 -# as.mo("stau") 150 160 170 170 190 200 10 -# as.mo("STAU") 160 160 180 190 190 210 10 -# as.mo("staaur") 12 13 23 15 20 68 10 -# as.mo("STAAUR") 11 12 20 16 18 44 10 -# as.mo("S. aureus") 11 13 29 17 44 84 10 -# as.mo("S aureus") 11 15 21 16 18 46 10 -# as.mo("Staphylococcus aureus") 11 13 16 13 15 41 10 -# as.mo("Staphylococcus aureus (MRSA)") 870 890 920 900 950 1100 10 -# as.mo("Sthafilokkockus aaureuz") 400 410 430 440 450 490 10 -# as.mo("MRSA") 13 13 17 14 16 40 10 -# as.mo("VISA") 14 17 25 19 36 46 10 -# as.mo("VRSA") 13 15 21 17 21 50 10 -# as.mo(22242419) 130 140 150 150 150 180 10
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In case of 100 milliseconds, this is only 10 input values per second.
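The conversion from time per input to throughput is a simple division. A minimal base R sketch follows; the helper name `inputs_per_second` is made up for illustration.

```r
# Convert a time per input value (in milliseconds) to input values per second
inputs_per_second <- function(ms) {
  1000 / ms
}

# The two cases mentioned in the text: 5 ms and 100 ms per input value
inputs_per_second(c(5, 100))
```

The same helper can be applied to any column of the benchmark table, for example the medians, to compare input formats by throughput instead of by time.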
To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside of this is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS
), a bug probably never found before in humans:
M.semesiae <- microbenchmark(as.mo("metsem"), ++# 10 ++M.semesiae <- microbenchmark(as.mo("metsem"), as.mo("METSEM"), as.mo("M. semesiae"), as.mo("M. semesiae"), as.mo("Methanosarcina semesiae"), - times = 10) -print(M.semesiae, unit = "ms", signif = 4) + times = 10) +print(M.semesiae, unit = "ms", signif = 4) # Unit: milliseconds -# expr min lq mean median uq max -# as.mo("metsem") 176.800 179.200 189.20 185.90 194.00 212.60 -# as.mo("METSEM") 164.400 170.800 193.00 188.20 211.10 243.00 -# as.mo("M. semesiae") 10.950 11.310 19.92 15.41 18.79 50.84 -# as.mo("M. semesiae") 11.560 11.860 17.66 14.15 16.96 50.76 -# as.mo("Methanosarcina semesiae") 9.408 9.669 18.03 14.12 15.24 42.57 +# expr min lq mean median uq max +# as.mo("metsem") 186.900 192.90 204.70 199.10 207.70 251.20 +# as.mo("METSEM") 175.500 199.70 215.20 218.20 232.00 240.40 +# as.mo("M. semesiae") 11.500 13.29 16.47 13.85 16.84 36.90 +# as.mo("M. semesiae") 11.690 11.94 16.81 14.40 15.75 42.76 +# as.mo("Methanosarcina semesiae") 9.688 10.28 14.55 11.99 13.72 39.41 # neval # 10 # 10 # 10 # 10 -# 10
Looking up arbitrary codes of less prevalent microorganisms costs the most time. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.
In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):
Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
# take all MO codes from the example_isolates data set -x <- example_isolates$mo %>% +-+# take all MO codes from the example_isolates data set +x <- example_isolates$mo %>% # keep only the unique ones - unique() %>% + unique() %>% # pick 50 of them at random - sample(50) %>% + sample(50) %>% # paste that 10,000 times - rep(10000) %>% + rep(10000) %>% # scramble it sample() - + # got indeed 50 times 10,000 = half a million? -length(x) +length(x) # [1] 500000 # and how many unique values do we have? -n_distinct(x) +n_distinct(x) # [1] 50 # now let's see: -run_it <- microbenchmark(mo_name(x), - times = 10) -print(run_it, unit = "ms", signif = 3) +run_it <- microbenchmark(mo_name(x), + times = 10) +print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# mo_name(x) 1720 1760 1820 1800 1830 1990 10So transforming 500,000 values (!!) of 50 unique values only takes 1.8 seconds. You only lose time on your unique input values.
+# mo_name(x) 1840 1870 1950 1940 1980 2140 10 +
So transforming 500,000 values (!!) of 50 unique values only takes 1.94 seconds. You only lose time on your unique input values.
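The general idea of paying only for unique input values can be sketched in base R. This is an illustration of the technique, not the package's actual implementation; `slow_lookup()` is a hypothetical stand-in for an expensive function such as a name lookup.

```r
# Hypothetical stand-in for an expensive, vectorised lookup function
slow_lookup <- function(v) {
  toupper(v)
}

# Evaluate `fun` once per distinct value, then map results back to all positions
lookup_unique <- function(x, fun) {
  ux <- unique(x)        # the distinct input values
  res <- fun(ux)         # the expensive part runs on these only
  res[match(x, ux)]      # expand to the original length and order
}

# 3,000 values, but only 3 unique ones
x <- sample(rep(c("a", "b", "c"), 1000))
identical(lookup_unique(x, slow_lookup), slow_lookup(x))
```

With this pattern, the cost of the expensive step grows with the number of distinct values rather than with the length of the input.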
What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"), - B = mo_name("S. aureus"), - C = mo_name("Staphylococcus aureus"), - times = 10) -print(run_it, unit = "ms", signif = 3) +-+run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"), + B = mo_name("S. aureus"), + C = mo_name("Staphylococcus aureus"), + times = 10) +print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# A 8.16 8.35 9.06 8.97 9.75 10.20 10 -# B 10.50 10.60 15.50 12.20 12.80 49.90 10 -# C 1.04 1.15 1.21 1.19 1.27 1.53 10So going from
-mo_name("Staphylococcus aureus")
to"Staphylococcus aureus"
takes 0.0012 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:+run_it <- microbenchmark(A = mo_species("aureus"), - B = mo_genus("Staphylococcus"), - C = mo_name("Staphylococcus aureus"), - D = mo_family("Staphylococcaceae"), - E = mo_order("Bacillales"), - F = mo_class("Bacilli"), - G = mo_phylum("Firmicutes"), - H = mo_kingdom("Bacteria"), - times = 10) -print(run_it, unit = "ms", signif = 3) +# A 8.17 8.49 9.32 9.32 9.90 10.90 10 +# B 10.90 11.80 16.30 13.20 14.70 45.60 10 +# C 1.06 1.22 1.32 1.28 1.44 1.57 10 +So going from
+mo_name("Staphylococcus aureus")
to"Staphylococcus aureus"
takes 0.0013 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:+# A 1.020 1.030 1.11 1.060 1.22 1.33 10 +# B 0.982 1.010 1.10 1.040 1.21 1.38 10 +# C 0.992 1.020 1.13 1.040 1.24 1.58 10 +# D 0.987 1.000 1.07 1.030 1.08 1.29 10 +# E 0.978 0.982 1.02 0.999 1.03 1.15 10 +# F 0.975 0.992 1.05 1.000 1.03 1.26 10 +# G 0.976 0.983 1.02 0.994 1.03 1.22 10 +# H 0.977 1.010 1.11 1.090 1.21 1.28 10 ++run_it <- microbenchmark(A = mo_species("aureus"), + B = mo_genus("Staphylococcus"), + C = mo_name("Staphylococcus aureus"), + D = mo_family("Staphylococcaceae"), + E = mo_order("Bacillales"), + F = mo_class("Bacilli"), + G = mo_phylum("Firmicutes"), + H = mo_kingdom("Bacteria"), + times = 10) +print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# A 0.948 0.971 1.14 1.020 1.39 1.52 10 -# B 0.968 1.040 1.22 1.190 1.41 1.56 10 -# C 0.979 1.020 1.31 1.260 1.58 1.66 10 -# D 0.964 1.010 1.24 1.190 1.45 1.83 10 -# E 0.977 0.995 1.15 1.030 1.40 1.45 10 -# F 0.878 0.982 1.11 1.010 1.37 1.43 10 -# G 0.929 0.961 1.18 1.000 1.43 1.58 10 -# H 0.901 0.967 1.09 0.998 1.35 1.40 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This takes almost no extra time:
mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system ++# expr min lq mean median uq max neval +# en 12.40 14.34 17.88 14.89 15.48 55.22 100 +# de 13.17 14.30 17.90 15.84 16.66 56.60 100 +# nl 17.14 19.86 24.99 20.78 21.70 64.66 100 +# es 13.43 15.29 17.65 15.93 16.59 54.38 100 +# it 13.33 14.83 18.35 15.68 16.36 57.61 100 +# fr 13.40 15.43 18.66 16.01 16.59 54.35 100 +# pt 13.47 15.33 18.93 16.15 16.84 57.28 100 ++mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system # [1] "Coagulase-negative Staphylococcus (CoNS)" -mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system +mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system # [1] "Staphylococcus coagulasa negativo (SCN)" -mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system +mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system # [1] "Coagulase-negatieve Staphylococcus (CNS)" -run_it <- microbenchmark(en = mo_name("CoNS", language = "en"), - de = mo_name("CoNS", language = "de"), - nl = mo_name("CoNS", language = "nl"), - es = mo_name("CoNS", language = "es"), - it = mo_name("CoNS", language = "it"), - fr = mo_name("CoNS", language = "fr"), - pt = mo_name("CoNS", language = "pt"), - times = 100) -print(run_it, unit = "ms", signif = 4) +run_it <- microbenchmark(en = mo_name("CoNS", language = "en"), + de = mo_name("CoNS", language = "de"), + nl = mo_name("CoNS", language = "nl"), + es = mo_name("CoNS", language = "es"), + it = mo_name("CoNS", language = "it"), + fr = mo_name("CoNS", language = "fr"), + pt = mo_name("CoNS", language = "pt"), + times = 100) +print(run_it, unit = "ms", signif = 4) # Unit: milliseconds -# expr min lq mean median uq max neval -# en 12.09 12.46 15.90 13.86 14.55 57.62 100 -# de 12.92 13.26 19.73 14.63 16.01 61.55 100 -# nl 16.53 17.00 20.26 17.64 19.93 57.54 100 -# es 12.98 13.28 18.27 14.76 15.64 179.30 100 -# it 12.92 13.15 19.20 14.08 
16.08 64.07 100 -# fr 12.99 13.21 17.81 13.59 15.71 67.97 100 -# pt 13.00 13.23 17.30 14.35 15.65 69.85 100
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
vignettes/resistance_predict.Rmd
resistance_predict.Rmd
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
library(dplyr) -library(ggplot2) -library(AMR) ++# install.packages(c("tidyverse", "AMR")) ++library(dplyr) +library(ggplot2) +library(AMR) # (if not yet installed, install with:) -# install.packages(c("tidyverse", "AMR"))
Our package contains a function resistance_predict()
, which takes the same input as functions for other AMR analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
-# resistance prediction of piperacillin/tazobactam (TZP): -resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial") - -# or: -example_isolates %>% - resistance_predict(col_ab = "TZP", - model "binomial") - -# to bind it to object 'predict_TZP' for example: -predict_TZP <- example_isolates %>% - resistance_predict(col_ab = "TZP", - model = "binomial")
# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
+
+# or:
+example_isolates %>%
+ resistance_predict(col_ab = "TZP",
+                     model = "binomial")
+
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- example_isolates %>%
+ resistance_predict(col_ab = "TZP",
+ model = "binomial")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# NOTE: Using column `date` as input for `col_date`.
This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
predict_TZP ++# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359 ++predict_TZP # year value se_min se_max observations observed estimated # 1 2002 0.20000000 NA NA 15 0.20000000 0.05616378 # 2 2003 0.06250000 NA NA 32 0.06250000 0.06163839 @@ -258,27 +261,36 @@ predict_TZP <- example_isolates %>% # 26 2027 0.41315710 0.3244399 0.5018743 NA NA 0.41315710 # 27 2028 0.43730688 0.3418075 0.5328063 NA NA 0.43730688 # 28 2029 0.46175755 0.3597639 0.5637512 NA NA 0.46175755 -# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359
The function plot
is available in base R, and can be extended by other packages so that its output depends on the type of input. We extended it to cope with resistance predictions:
plot(predict_TZP)
+plot(predict_TZP) +
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
ggplot_rsi_predict(predict_TZP)
+ggplot_rsi_predict(predict_TZP) +
++ggplot_rsi_predict(predict_TZP, ribbon = FALSE) ++ # choose for error bars instead of a ribbon -ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
-example_isolates %>% - filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% - resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% ++# NOTE: Using column `date` as input for `col_date`. ++example_isolates %>% + filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% + resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% ggplot_rsi_predict() -# NOTE: Using column `date` as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
You can define the model with the model
parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
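To make the idea of a binomial regression on yearly counts more concrete, here is a minimal base R sketch. It is not necessarily how resistance_predict() works internally. The counts are invented for illustration, and the `se_min`/`se_max` columns are simply the estimate minus and plus one standard error, which is an assumption about how such bounds could be derived.

```r
# Toy yearly counts of resistant (R) and susceptible (S) isolates,
# made up for illustration and not taken from example_isolates
df <- data.frame(year = 2010:2019,
                 R = c(1, 2, 2, 4, 5, 7, 9, 12, 14, 17),
                 S = c(49, 48, 48, 46, 45, 43, 41, 38, 36, 33))

# Generalised linear model with a binomial distribution (logit link)
fit <- glm(cbind(R, S) ~ year, family = binomial, data = df)

# Predict the resistant proportion for future years, with standard errors
new_years <- data.frame(year = 2020:2022)
pred <- predict(fit, newdata = new_years, type = "response", se.fit = TRUE)

out <- data.frame(year = new_years$year,
                  estimated = pred$fit,
                  se_min = pred$fit - pred$se.fit,
                  se_max = pred$fit + pred$se.fit)
out
```

Because the model is fitted on the logit scale, the estimated proportions always stay between 0 and 1, which a plain linear model does not guarantee.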
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:
-example_isolates %>% - filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% - resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% ++# NOTE: Using column `date` as input for `col_date`. ++example_isolates %>% + filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% + resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% ggplot_rsi_predict() -# NOTE: Using column `date` as input for `col_date`.
This seems more likely, doesn’t it?
The model itself is also available from the object, as an attribute
:
model <- attributes(predict_TZP)$model ++# year 0.09883005 0.02295317 4.305725 1.664395e-05 ++model <- attributes(predict_TZP)$model -summary(model)$family +summary(model)$family # # Family: binomial # Link function: logit -summary(model)$coefficients +summary(model)$coefficients # Estimate Std. Error z value Pr(>|z|) # (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05 -# year 0.09883005 0.02295317 4.305725 1.664395e-05
READ ALL VIGNETTES ON OUR WEBSITE
diff --git a/docs/authors.html b/docs/authors.html index 89ae47a19..2989f5037 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -81,7 +81,7 @@ @@ -310,7 +310,7 @@ diff --git a/docs/index.html b/docs/index.html index 222f31827..2644aa3f2 100644 --- a/docs/index.html +++ b/docs/index.html @@ -43,7 +43,7 @@ @@ -192,7 +192,7 @@July 2020
+Since you are one of our users, we would like to know how you use the package and what it brought you or your organisation. If you have a minute, please anonymously fill in this short questionnaire. Your valuable input will help to improve the package and its functionalities. You can answer the open questions in either English, Spanish, French, Dutch, or German. Thank you very much in advance!
PLEASE TAKE PART IN OUR SURVEY!
-Since you are one of our users, we would like to know how you use the package and what it brought you or your organisation. If you have a minute, please anonymously fill in this short questionnaire. Your valuable input will help to improve the package and its functionalities. You can answer the open questions in either English, Spanish, French, Dutch, or German. Thank you very much in advance!
Take me to the 5-min survey!
Take me to the 5-min survey!
This package is available here on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R from CRAN by using the command:
-install.packages("AMR")
+install.packages("AMR") +
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
The latest and unpublished development version can be installed from GitHub using:
-install.packages("remotes") -remotes::install_github("msberends/AMR")
+install.packages("remotes") +remotes::install_github("msberends/AMR") +
NEWS.md
- Support for using dplyr
’s across()
in as.rsi()
to interpret MIC values or disk zone diameters, that now also automatically determines the column with microorganism names or codes.
# until dplyr 1.0.0 -your_data %>% mutate_if(is.mic, as.rsi) -your_data %>% mutate_if(is.disk, as.rsi) ++ your_data %>% mutate(across(where(is.mic), as.rsi)) +your_data %>% mutate(across(where(is.disk), as.rsi)) ++# until dplyr 1.0.0 +your_data %>% mutate_if(is.mic, as.rsi) +your_data %>% mutate_if(is.disk, as.rsi) # since dplyr 1.0.0 - your_data %>% mutate(across(where(is.mic), as.rsi)) -your_data %>% mutate(across(where(is.disk), as.rsi))
Function ab_from_text()
to retrieve antimicrobial drug names, doses and forms of administration from clinical texts in e.g. health care records, which also corrects for misspelling since it uses as.ab()
internally
Tidyverse selection helpers for antibiotic classes, that help to select the columns of antibiotics that are of a specific antibiotic class, without the need to define the columns or antibiotic abbreviations. They can be used in any function that allows selection helpers, like dplyr::select()
and tidyr::pivot_longer()
:
library(dplyr)

# Columns 'IPM' and 'MEM' are in the example_isolates data set
example_isolates %>%
  select(carbapenems())
#> Selecting carbapenems: `IPM` (imipenem), `MEM` (meropenem)
Added mo_domain()
as an alias to mo_kingdom()
Added function filter_penicillins()
to filter isolates on a specific result in any column with a name in the antimicrobial ‘penicillins’ class (more specific: ATC subgroup Beta-lactam antibacterials, penicillins)
dplyr::all_of()
) now works again
Making this package independent of especially the tidyverse (e.g. packages dplyr
and tidyr
) tremendously increases sustainability in the long term, since tidyverse functions change quite often. Good for users, but hard for package maintainers. Most of our functions are replaced with versions that only rely on base R, which keeps this package fully functional for many years to come, without requiring a lot of maintenance to keep up with other packages anymore. Another upside is that this package can now be used with all versions of R since R-3.0.0 (April 2013). Our package is being used in settings where the resources are very limited. Fewer dependencies on newer software are helpful for such settings.
Negative effects of this change are:
freq()
that was borrowed from the cleaner
package was removed. Use cleaner::freq()
, or run library("cleaner")
before you use freq()
.
mo
or rsi
in a tibble will no longer be in colour and printing rsi
in a tibble will show the class <ord>
, not <rsi>
anymore. This is purely a visual effect.
mo_*
family (like mo_name()
and mo_gramstain()
) are noticeably slower when running on hundreds of thousands of rows.
mo
and ab
now both also inherit class character
, to support any data transformation. This change invalidates code that checks for class length == 1.
Fixed important floating point error for some MIC comparisons in EUCAST 2020 guideline
Interpretation from MIC values (and disk zones) to R/SI can now be used with mutate_at()
of the dplyr
package:
yourdata %>%
  mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = "E. coli")

yourdata %>%
  mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = .$mybacteria)
Added antibiotic abbreviations for a laboratory manufacturer (GLIMS) for cefuroxime, cefotaxime, ceftazidime, cefepime, cefoxitin and trimethoprim/sulfamethoxazole
Added uti
(as abbreviation of urinary tract infections) as parameter to as.rsi()
, so interpretation of MIC values and disk zones can be made dependent on isolates specifically from UTIs
Support for LOINC codes in the antibiotics
data set. Use ab_loinc()
to retrieve LOINC codes, or use a LOINC code for input in any ab_*
function:
Support for SNOMED CT codes in the microorganisms
data set. Use mo_snomed()
to retrieve SNOMED codes, or use a SNOMED code for input in any mo_*
function:
mo_snomed("S. aureus")
#> [1] 115329001 3092008 113961008
mo_name(115329001)
#> [1] "Staphylococcus aureus"
mo_gramstain(115329001)
#> [1] "Gram-positive"
If you were dependent on the old Enterobacteriaceae family e.g. by using in your code:
-if (mo_family(somebugs) == "Enterobacteriaceae") ...
+if (mo_family(somebugs) == "Enterobacteriaceae") ... +
then please adjust this to:
-if (mo_order(somebugs) == "Enterobacterales") ...
+if (mo_order(somebugs) == "Enterobacterales") ... +
Functions susceptibility()
and resistance()
as aliases of proportion_SI()
and proportion_R()
, respectively. These functions were added to make it more clear that “I” should be considered susceptible and not resistant.
library(dplyr) -example_isolates %>% - group_by(bug = mo_name(mo)) %>% - summarise(amoxicillin = resistance(AMX), - amox_clav = resistance(AMC)) %>% - filter(!is.na(amoxicillin) | !is.na(amox_clav))
+library(dplyr) +example_isolates %>% + group_by(bug = mo_name(mo)) %>% + summarise(amoxicillin = resistance(AMX), + amox_clav = resistance(AMC)) %>% + filter(!is.na(amoxicillin) | !is.na(amox_clav)) +
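The relation between the two aliases can be illustrated with a small base R sketch. The helper names `resistance_sketch()` and `susceptibility_sketch()` and the vector `amx` are hypothetical and only mimic the idea; the package functions apply additional checks (such as a minimum number of isolates) that are left out here.

```r
# Hypothetical results for one antibiotic; NA means 'not tested'
amx <- c("S", "S", "I", "R", "R", NA, "S", "R")

# Proportion of R among all tested isolates (sketch, not the package code)
resistance_sketch <- function(x) {
  sum(x == "R", na.rm = TRUE) / sum(!is.na(x))
}

# Proportion of S and I together, so "I" counts as susceptible
susceptibility_sketch <- function(x) {
  sum(x %in% c("S", "I")) / sum(!is.na(x))
}

resistance_sketch(amx)
susceptibility_sketch(amx)
```

Because every tested isolate is either R or S/I, the two proportions add up to 1 in this sketch.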
Support for a new MDRO guideline: Magiorakos AP, Srinivasan A et al. “Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance.” Clinical Microbiology and Infection (2012).
@@ -593,7 +609,8 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
More intelligent way of coping with some consonants like “l” and “r”
Added a score (a certainty percentage) to mo_uncertainties()
, that is calculated using the Levenshtein distance:
as.mo(c("Stafylococcus aureus",
        "staphylokok aureuz"))
#> Warning:
#> Results of two values were guessed with uncertainty. Use mo_uncertainties() to review them.

mo_uncertainties()
#> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%)
#> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%)
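To give an impression of how a Levenshtein distance can be turned into a score, here is a minimal base R sketch using `utils::adist()`. The function `levenshtein_score()` and its formula are assumptions for illustration only; they are not necessarily what `mo_uncertainties()` uses, so the resulting numbers will differ from the percentages shown above.

```r
# Hypothetical similarity score for a single input/match pair.
# NOTE: the formula is an assumption, not the package's own calculation.
levenshtein_score <- function(input, match) {
  # adist() returns a matrix of generalised Levenshtein distances
  distance <- as.vector(utils::adist(input, match, ignore.case = TRUE))
  # scale by the longer of the two strings, so identical strings score 1
  1 - distance / max(nchar(input), nchar(match))
}

levenshtein_score("Stafylococcus aureus", "Staphylococcus aureus")
levenshtein_score("staphylokok aureuz", "Staphylococcus aureus")
```

A score closer to 1 means fewer edits were needed to get from the input to the matched name.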
Determination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN"
. They can be included with the new parameter include_unknown
:
first_isolate(..., include_unknown = TRUE)
+first_isolate(..., include_unknown = TRUE) +
For WHONET users, this means that all records/isolates with organism code "con"
(contamination) will be excluded at default, since as.mo("con") = "UNKNOWN"
. The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.
For code consistency, classes ab
and mo
will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA
:
# how it works in base R:
x <- factor("A")
x[1] <- "B"
#> Warning message:
#> invalid factor level, NA generated

# how it now works similarly for classes 'mo' and 'ab':
x <- as.mo("E. coli")
x[1] <- "testvalue"
#> Warning message:
#> invalid microorganism code, NA generated
This is important, because a value like "testvalue"
could never be understood by e.g. mo_name()
, although the class would suggest a valid microbial code.
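The general mechanism behind this behaviour can be sketched in base R with an S3 assignment method. Everything below is hypothetical: the class name `code`, the helper `as_code()` and the two valid values are made up for illustration, and the package's own implementation for `mo` and `ab` may differ.

```r
# Hypothetical set of valid codes for a toy class named 'code'
valid_codes <- c("B_ESCHR_COL", "B_STPHY_AUR")

# Constructor: anything that is not a valid code becomes NA
as_code <- function(x) {
  x <- as.character(x)
  x[!x %in% valid_codes] <- NA_character_
  structure(x, class = c("code", "character"))
}

# Assignment method: keeps the class and turns invalid values into NA
`[<-.code` <- function(x, i, value) {
  value <- as.character(value)
  invalid <- !is.na(value) & !value %in% valid_codes
  if (any(invalid)) {
    warning("invalid code, NA generated", call. = FALSE)
    value[invalid] <- NA_character_
  }
  y <- unclass(x)   # drop the class so the default assignment is used
  y[i] <- value
  structure(y, class = class(x))
}

x <- as_code("B_ESCHR_COL")
x[1] <- "B_STPHY_AUR"   # valid, assigned silently
x[1] <- "testvalue"     # invalid, gives a warning and stores NA
```

After the last assignment the object still has class `code`, but the invalid value is stored as `NA`, analogous to what base R does for factors.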
Function freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).
Renamed data set septic_patients
to example_isolates
Function bug_drug_combinations()
to quickly get a data.frame
with the results of all bug-drug combinations in a data set. The column containing microorganism codes is guessed automatically and its input is transformed with mo_shortname()
at default:
x <- bug_drug_combinations(example_isolates) ++#> NOTE: Use 'format()' on this result to get a publicable/printable format. ++x <- bug_drug_combinations(example_isolates) #> NOTE: Using column `mo` as input for `col_mo`. -x[1:4, ] +x[1:4, ] #> mo ab S I R total #> 1 A. baumannii AMC 0 0 3 3 #> 2 A. baumannii AMK 0 0 0 0 @@ -689,45 +712,52 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/ #> NOTE: Use 'format()' on this result to get a publicable/printable format. # change the transformation with the FUN argument to anything you like: -x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain) +x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain) #> NOTE: Using column `mo` as input for `col_mo`. -x[1:4, ] +x[1:4, ] #> mo ab S I R total #> 1 Gram-negative AMC 469 89 174 732 #> 2 Gram-negative AMK 251 0 2 253 #> 3 Gram-negative AMP 227 0 405 632 #> 4 Gram-negative AMX 227 0 405 632 -#> NOTE: Use 'format()' on this result to get a publicable/printable format.
You can format this to a printable format, ready for reporting or exporting to e.g. Excel with the base R format()
function:
format(x, combine_IR = FALSE)
+format(x, combine_IR = FALSE) +
Additional way to calculate co-resistance, i.e. when using multiple antimicrobials as input for portion_*
functions or count_*
functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested
(which defaults to FALSE
) replaces the old also_single_tested
and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion
and count
help pages), where the %SI is being determined:
# --------------------------------------------------------------------
#                     only_all_tested = FALSE  only_all_tested = TRUE
#                     -----------------------  -----------------------
#  Drug A    Drug B   include as  include as   include as  include as
#                     numerator   denominator  numerator   denominator
# --------  --------  ----------  -----------  ----------  -----------
#  S or I    S or I       X            X            X            X
#    R       S or I       X            X            X            X
#   <NA>     S or I       X            X            -            -
#  S or I      R          X            X            X            X
#    R         R          -            X            -            X
#   <NA>       R          -            -            -            -
#  S or I     <NA>        X            X            -            -
#    R        <NA>        -            -            -            -
#   <NA>      <NA>        -            -            -            -
# --------------------------------------------------------------------
Since this is a major change, usage of the old also_single_tested
will throw an informative error that it has been replaced by only_all_tested
.
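The two counting methods in the table above can be reproduced with a few lines of base R. This is only a sketch: the function `combined_si()` and the example vectors are hypothetical, and the package functions contain more checks than shown here.

```r
# One hypothetical isolate for every row of the table, in the same order
drug_a <- c("S", "R", NA, "S", "R", NA, "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA, NA, NA)

# Sketch of the %SI of a combination therapy under both methods
combined_si <- function(a, b, only_all_tested = FALSE) {
  si_a <- a %in% c("S", "I")
  si_b <- b %in% c("S", "I")
  both_tested <- !is.na(a) & !is.na(b)
  if (only_all_tested) {
    # only isolates that were tested for both drugs are counted
    denominator <- both_tested
  } else {
    # also count isolates tested for one drug only, if that result is S or I
    denominator <- both_tested | si_a | si_b
  }
  numerator <- denominator & (si_a | si_b)
  sum(numerator) / sum(denominator)
}

combined_si(drug_a, drug_b)                          # 5 of 6 isolates
combined_si(drug_a, drug_b, only_all_tested = TRUE)  # 3 of 4 isolates
```

With `only_all_tested = FALSE`, an isolate tested for only one drug still counts when that single result is S or I, which is why the denominators differ between the two calls.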
tibble
printing support for classes rsi
, mic
, disk
, ab
mo
. When using tibble
s containing antimicrobial columns, values S
will print in green, values I
will print in yellow and values R
will print in red. Microbial IDs (class mo
) will emphasise the genus and species, not the kingdom.
Function rsi_df()
to transform a data.frame
to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df()
and portion_df()
to immediately show resistance percentages and number of available isolates:
septic_patients %>%
  select(AMX, CIP) %>%
  rsi_df()
#      antibiotic  interpretation      value  isolates
# 1   Amoxicillin              SI  0.4442636       546
# 2   Amoxicillin               R  0.5557364       683
# 3 Ciprofloxacin              SI  0.8381831      1181
# 4 Ciprofloxacin               R  0.1618169       228
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
@@ -829,12 +861,14 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
All these lead to the microbial ID of E. coli:
as.mo("UPEC")
# B_ESCHR_COL
mo_name("UPEC")
# "Escherichia coli"
mo_gramstain("EHEC")
# "Gram-negative"
Function mo_info()
as an analogy to ab_info()
. The mo_info()
prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
Function mo_synonyms()
to get all previously accepted taxonomic names of a microorganism
as.mo()
plot()
and barplot()
for MIC and RSI classes
as.mo()
age()
function gained a new parameter exact
to determine ages with decimals
guess_mo()
, guess_atc()
, EUCAST_rules()
, interpretive_reading()
, rsi()
freq()
):
+freq()
):
speed improvement for microbial IDs
fixed factor level names for R Markdown
when all values are unique it now shows a message instead of a warning
support for boxplots:
-age_groups()
, to let groups of fives and tens end with 100+ instead of 120+freq()
for when all values are NA
+freq()
for when all values are NA
first_isolate()
for when dates are missingguess_ab_col()
@@ -1025,7 +1061,8 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
-filter_aminoglycosides() ++filter_tetracyclines() ++filter_aminoglycosides() filter_carbapenems() filter_cephalosporins() filter_1st_cephalosporins() @@ -1035,23 +1072,28 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/ filter_fluoroquinolones() filter_glycopeptides() filter_macrolides() -filter_tetracyclines()
The antibiotics
data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set. For example:
septic_patients %>% filter_glycopeptides(result = "R")
# Filtering on glycopeptide antibacterials: any of `vanc` or `teic` is R
septic_patients %>% filter_glycopeptides(result = "R", scope = "all")
# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R
All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property   -> atc_property()
ab_name       -> atc_name()
ab_official   -> atc_official()
ab_trivial_nl -> atc_trivial_nl()
ab_certe      -> atc_certe()
ab_umcg       -> atc_umcg()
ab_tradenames -> atc_tradenames()
These functions use as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.
New functions set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functions
Support for the upcoming dplyr
version 0.8.0
New function age()
to calculate the (patient's) age in years
New function age_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
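The idea of splitting ages into groups can be sketched in base R with `cut()`. The split points and labels below are hypothetical and chosen only for illustration; they are not the predefined groups of `age_groups()`.

```r
# Hypothetical patient ages in years
ages <- c(3, 17, 18, 40, 64, 65, 90)

# Left-closed intervals: [0, 18), [18, 65) and [65, Inf)
groups <- cut(ages,
              breaks = c(0, 18, 65, Inf),
              right = FALSE,
              labels = c("0-17", "18-64", "65+"))

# Number of patients per age group
table(groups)
```

The resulting factor can then be used as a grouping variable, for example to calculate resistance per age group.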
New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
x <- resistance_predict(septic_patients, col_ab = "amox")
plot(x)
ggplot_rsi_predict(x)
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
septic_patients %>% filter_first_isolate(...)
# or
filter_first_isolate(septic_patients, ...)
is equal to:
-septic_patients %>% - mutate(only_firsts = first_isolate(septic_patients, ...)) %>% - filter(only_firsts == TRUE) %>% - select(-only_firsts)
+septic_patients %>% + mutate(only_firsts = first_isolate(septic_patients, ...)) %>% + filter(only_firsts == TRUE) %>% + select(-only_firsts) +
New function availability()
to check the number of available (non-empty) results in a data.frame
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
microorganisms.oldDT
, microorganisms.prevDT
, microorganisms.unprevDT
and microorganismsDT
since they were no longer needed and only contained info already available in the microorganisms
data setantibiotics
data set, from the Pharmaceuticals Community Register of the European Commissionatc_group1_nl
and atc_group2_nl
from the antibiotics
data setatc_ddd()
and atc_groups()
have been renamed atc_online_ddd()
and atc_online_groups()
. The old functions are deprecated and will be removed in a future version.
guess_mo()
is now deprecated in favour of as.mo()
and will be removed in future versionsguess_atc()
is now deprecated in favour of as.atc()
and will be removed in future versionsas.mo()
:
Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
# mo_fullname() uses as.mo() internally
mo_fullname("Sthafilokockus aaureuz")
#> [1] "Staphylococcus aureus"
mo_fullname("S. klossi")
#> [1] "Staphylococcus kloosii"
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
# equal:
as.mo(..., allow_uncertain = TRUE)
as.mo(..., allow_uncertain = 2)

# also equal:
as.mo(..., allow_uncertain = FALSE)
as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
All microbial IDs that were found are now saved to a local file ~/.Rhistory_mo
. Use the new function clean_mo_history()
to delete this file, which resets the algorithms.
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:
Fix for vector containing only empty values
Finds better results when input is in other languages
freq()
function):
+freq()
function):
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
# Determine genus of microorganisms (mo) in `septic_patients` data set:
# OLD WAY
septic_patients %>%
  mutate(genus = mo_genus(mo)) %>%
  freq(genus)
# NEW WAY
septic_patients %>%
  freq(mo_genus(mo))

# Even supports grouping variables:
septic_patients %>%
  group_by(gender) %>%
  freq(mo_genus(mo))
Header info is now available as a list, with the header
function
The parameter header
is now set to TRUE
at default, even for markdown
Fewer than 3 characters as input for as.mo
will return NA
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
as.mo("E. species")        # B_ESCHR
mo_fullname("E. spp.")     # "Escherichia species"
as.mo("S. spp")            # B_STPHY
mo_fullname("S. species")  # "Staphylococcus species"
Added parameter combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
Fix for portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
Using portion_*
functions now throws a warning when the total number of available isolates is below parameter minimum
Functions as.mo
, as.rsi
, as.mic
, as.atc
and freq
will not set package name as attribute anymore
Frequency tables - freq()
:
Support for grouping variables, test with:
-septic_patients %>% - group_by(hospital_id) %>% - freq(gender)
Support for (un)selecting columns:
-septic_patients %>% - freq(hospital_id) %>% - select(-count, -cum_count) # only get item, percent, cum_percent
Check for hms::is.hms
Now prints in markdown at default in non-interactive sessions
Removed diacritics from all authors (columns microorganisms$ref
and microorganisms.old$ref
) to comply with CRAN policy to only allow ASCII characters
Fix for mo_property
not working properly
Fix for eucast_rules
where some Streptococci would become ceftazidime R in EUCAST rule 4.5
Support for named vectors of class mo
, useful for top_freq()
ggplot_rsi
and scale_y_percent
have breaks
parameter
AI improvements for as.mo
:
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
mo_gramstain("E. coli")
# [1] "Gram negative"
mo_gramstain("E. coli", language = "de") # German
# [1] "Gramnegativ"
mo_gramstain("E. coli", language = "es") # Spanish
# [1] "Gram negativo"
mo_fullname("S. group A", language = "pt") # Portuguese
# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
mo_gramstain("Esc blattae")
# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
# [1] "Gram negative"
Functions count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
Function is.rsi.eligible
to check for columns that have valid antimicrobial results, but do not have the rsi
class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
as.mo("E. coli")
# [1] B_ESCHR_COL
as.mo("MRSA")
# [1] B_STPHY_AUR
as.mo("S group A")
# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
thousands_of_E_colis <- rep("E. coli", 25000)
microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s")
# Unit: seconds
#         min      median         max  neval
#  0.01817717  0.01843957  0.03878077    100
Added parameter reference_df
for as.mo
, so users can supply their own microbial IDs, name or codes as a reference table
Added three antimicrobial agents to the antibiotics
data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
Added 163 trade names to the antibiotics
data set, it now contains 298 different trade names in total, e.g.:
ab_official("Bactroban")
# [1] "Mupirocin"
ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
# [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin"
ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
# [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
For first_isolate
, rows will be ignored when there’s no species available
Function ratio
is now deprecated and will be removed in a future release, as it is not really the scope of this package
Added parameters minimum
and as_percent
to portion_df
Support for quasiquotation in the functions series count_*
and portions_*
, and n_rsi
. This allows checking more than 2 vectors or columns.
septic_patients %>% select(amox, cipr) %>% count_IR()
# which is the same as:
septic_patients %>% count_IR(amox, cipr)

septic_patients %>% portion_S(amcl)
septic_patients %>% portion_S(amcl, gent)
septic_patients %>% portion_S(amcl, gent, pita)
Edited ggplot_rsi
and geom_rsi
so they can cope with count_df
. The new fun
parameter has value portion_df
at default, but can be set to count_df
.
Fix for ggplot_rsi
when the ggplot2
package was not loaded
Added longest and shortest character length in the frequency table (freq
) header of class character
Support for types (classes) list and matrix for freq
For lists, subsetting is possible:
my_list = list(age = septic_patients$age, gender = septic_patients$gender)
my_list %>% freq(age)
my_list %>% freq(gender)
rsi
(antimicrobial resistance) to use as inputtable
to use as input: freq(table(x, y))
+table
to use as input: freq(table(x, y))
hist
and plot
to use a frequency table as input: hist(freq(df$age))
as.vector
, as.data.frame
, as_tibble
and format
freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
+freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
top_freq
function to return the top/below n items as vectorThese functions are so-called 'Deprecated'. They will be removed in a future release. Using the functions will give a warning with the name of the function it has been replaced by (if there is one).
-portion_R(...) +portion_R(...) -portion_IR(...) +portion_IR(...) -portion_I(...) +portion_I(...) -portion_SI(...) +portion_SI(...) -portion_S(...) +portion_S(...) -portion_df(...)+portion_df(...)
AMR
Package — AMR • AMR (for R)This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The data itself was based on our example_isolates data set.
-WHONET
+ WHONET
ab_from_text( - text, - type = c("drug", "dose", "administration"), - collapse = NULL, - translate_ab = FALSE, - thorough_search = NULL, - ... + text, + type = c("drug", "dose", "administration"), + collapse = NULL, + translate_ab = FALSE, + thorough_search = NULL, + ... )
Use these functions to return a specific property of an antibiotic from the antibiotics data set. All input values will be evaluated internally with as.ab()
.
ab_name(x, language = get_locale(), tolower = FALSE, ...) +ab_name(x, language = get_locale(), tolower = FALSE, ...) -ab_atc(x, ...) +ab_atc(x, ...) -ab_cid(x, ...) +ab_cid(x, ...) -ab_synonyms(x, ...) +ab_synonyms(x, ...) -ab_tradenames(x, ...) +ab_tradenames(x, ...) -ab_group(x, language = get_locale(), ...) +ab_group(x, language = get_locale(), ...) -ab_atc_group1(x, language = get_locale(), ...) +ab_atc_group1(x, language = get_locale(), ...) -ab_atc_group2(x, language = get_locale(), ...) +ab_atc_group2(x, language = get_locale(), ...) -ab_loinc(x, ...) +ab_loinc(x, ...) -ab_ddd(x, administration = "oral", units = FALSE, ...) +ab_ddd(x, administration = "oral", units = FALSE, ...) -ab_info(x, language = get_locale(), ...) +ab_info(x, language = get_locale(), ...) -ab_url(x, open = FALSE, ...) +ab_url(x, open = FALSE, ...) -ab_property(x, property = "name", language = get_locale(), ...)+ab_property(x, property = "name", language = get_locale(), ...)