vignettes/AMR.Rmd
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 08 December 2020.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 21 December 2020.
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by RStudio. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
We will also use the cleaner
package, that can be used for cleaning data and creating frequency tables.
-library(dplyr) ++# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner"))+library(dplyr) library(ggplot2) library(AMR) library(cleaner) # (if not yet installed, install with:) -# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner"))
To start with patients, we need a unique list of patients.
- +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we we also set the gender of our patients, by putting the ID and the gender in a table:
-patients_table <- data.frame(patient_id = patients, ++ rep("F", 125)))+patients_table <- data.frame(patient_id = patients, gender = c(rep("M", 135), - rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
Let’s pretend that our data consists of blood cultures isolates from between 1 January 2010 and 1 January 2018.
- +This dates
object now contains all days in our date range.
For this tutorial, we will uses four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
--bacteria <- c("Escherichia coli", "Staphylococcus aureus", - "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+ "Streptococcus pneumoniae", "Klebsiella pneumoniae")
For completeness, we can also add the hospital where the patients was admitted and we need to define valid antibmicrobial results for our randomisation:
- +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
-sample_size <- 20000 ++ prob = c(0.92, 0.00, 0.08)))+sample_size <- 20000 data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE), patient_id = sample(patients, size = sample_size, replace = TRUE), hospital = sample(hospitals, size = sample_size, replace = TRUE, @@ -331,13 +348,13 @@ CIP = sample(ab_interpretations, size = sample_size, replace = TRUE, prob = c(0.80, 0.00, 0.20)), GEN = sample(ab_interpretations, size = sample_size, replace = TRUE, - prob = c(0.92, 0.00, 0.08)))
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
-data <- data %>% left_join(patients_table)
+data <- data %>% left_join(patients_table)
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 rows of this data set:
-head(data)
+head(data)
date | @@ -352,42 +369,9 @@||||||||||
---|---|---|---|---|---|---|---|---|---|---|
2015-10-08 | -Y10 | -Hospital A | -Escherichia coli | -S | -S | -R | -S | -F | -||
2014-04-05 | -O10 | -Hospital C | -Escherichia coli | -R | -I | -S | -S | -F | -||
2012-07-06 | -H9 | -Hospital A | -Escherichia coli | -R | -S | -S | -S | -M | -||
2016-09-05 | -U3 | -Hospital C | +2013-07-24 | +S10 | +Hospital D | Staphylococcus aureus | R | S | @@ -395,28 +379,61 @@S | F |
2017-05-23 | -S5 | -Hospital C | -Escherichia coli | -I | -I | +|||||
2011-06-06 | +X7 | +Hospital B | +Streptococcus pneumoniae | R | S | +S | +S | F | ||
2013-11-30 | -J2 | -Hospital C | -Streptococcus pneumoniae | +|||||||
2010-08-03 | +E6 | +Hospital B | +Escherichia coli | +R | +I | S | S | +M | +||
2014-03-25 | +S8 | +Hospital B | +Escherichia coli | +S | R | S | +S | +F | +||
2010-09-19 | +J1 | +Hospital D | +Klebsiella pneumoniae | +I | +R | +S | +S | M | ||
2017-05-11 | +T3 | +Hospital A | +Escherichia coli | +S | +S | +S | +S | +F | +
Now, let’s start the cleaning and the analysis!
@@ -427,8 +444,8 @@ Cleaning the dataWe also created a package dedicated to data cleaning and checking, called the cleaner
package. It freq()
function can be used to create frequency tables.
For example, for the gender
variable:
-data %>% freq(gender)
+data %>% freq(gender)
Frequency table
Class: character
Length: 20,000
@@ -449,16 +466,16 @@ Longest: 1
So, we can draw at least two conclusions immediately. From a data scientists perspective, the data looks clean: only values M
and F
. From a researchers perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kind of variables. The mutate_at()
will run the as.rsi()
function on defined variables:
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
fixes this:
-data <- eucast_rules(data, col_mo = "bacteria", rules = "all") -# Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R
+data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
Now that we have the microbial ID, we can add some taxonomic properties:
--data <- data %>% ++ species = mo_species(bacteria))+data <- data %>% mutate(gramstain = mo_gramstain(bacteria), genus = mo_genus(bacteria), - species = mo_species(bacteria))
(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:
-data <- data %>% +-+data <- data %>% mutate(first = first_isolate(.)) # NOTE: Using column 'bacteria' as input for `col_mo`. # NOTE: Using column 'date' as input for `col_date`. -# NOTE: Using column 'patient_id' as input for `col_patient_id`.
So only 28.6% is suitable for resistance analysis! We can now filter on it with the
-filter()
function, also from thedplyr
package:+# NOTE: Using column 'patient_id' as input for `col_patient_id`.-data_1st <- data %>% - filter(first == TRUE)
So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
+data_1st <- data %>%
+ filter(first == TRUE)
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
-data_1st <- data %>% - filter_first_isolate()
+data_1st <- data %>%
+ filter_first_isolate()
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient A3, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient R5, sorted on date:
isolate | @@ -532,54 +548,54 @@ Longest: 1||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-04 | -A3 | +2010-02-05 | +R5 | B_ESCHR_COLI | -R | -R | +S | +S | R | S | TRUE | ||
2 | -2010-01-05 | -A3 | +2010-04-22 | +R5 | B_ESCHR_COLI | +S | +S | R | S | -S | -S | FALSE | ||
3 | -2010-02-07 | -A3 | +2010-05-16 | +R5 | B_ESCHR_COLI | -R | S | -R | +S | +S | S | FALSE | ||
4 | -2010-06-17 | -A3 | +2010-05-29 | +R5 | B_ESCHR_COLI | S | S | -S | +R | S | FALSE | |||
5 | -2010-07-23 | -A3 | +2010-06-16 | +R5 | B_ESCHR_COLI | -S | +R | S | S | S | @@ -587,8 +603,8 @@ Longest: 1||||
6 | -2010-11-16 | -A3 | +2010-10-06 | +R5 | B_ESCHR_COLI | S | S | @@ -598,54 +614,54 @@ Longest: 1|||||||
7 | -2010-12-05 | -A3 | +2010-12-22 | +R5 | B_ESCHR_COLI | -I | -I | +R | +S | S | S | FALSE | ||
8 | -2011-01-17 | -A3 | +2011-04-14 | +R5 | B_ESCHR_COLI | S | S | -R | +S | S | TRUE | |||
9 | -2011-01-26 | -A3 | +2011-11-27 | +R5 | B_ESCHR_COLI | S | S | S | -S | +R | FALSE | |||
10 | -2011-02-12 | -A3 | +2012-11-12 | +R5 | B_ESCHR_COLI | -R | -R | -R | S | -FALSE | +S | +S | +S | +TRUE |
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
Only 3 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
-data <- data %>% ++# col_keyantibiotics = FALSE to prevent this.+data <- data %>% mutate(keyab = key_antibiotics(.)) %>% mutate(first_weighted = first_isolate(.)) # NOTE: Using column 'bacteria' as input for `col_mo`. @@ -653,7 +669,7 @@ Longest: 1 # NOTE: Using column 'date' as input for `col_date`. # NOTE: Using column 'patient_id' as input for `col_patient_id`. # NOTE: Using column 'keyab' as input for `col_keyantibiotics`. Use -# col_keyantibiotics = FALSE to prevent this.
isolate | @@ -670,11 +686,11 @@ Longest: 1|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-04 | -A3 | +2010-02-05 | +R5 | B_ESCHR_COLI | -R | -R | +S | +S | R | S | TRUE | @@ -682,130 +698,129 @@ Longest: 1|||
2 | -2010-01-05 | -A3 | +2010-04-22 | +R5 | B_ESCHR_COLI | +S | +S | R | S | -S | -S | FALSE | -TRUE | +FALSE | |
3 | -2010-02-07 | -A3 | +2010-05-16 | +R5 | B_ESCHR_COLI | -R | S | -R | +S | +S | S | FALSE | TRUE | ||
4 | -2010-06-17 | -A3 | +2010-05-29 | +R5 | B_ESCHR_COLI | S | S | -S | +R | S | FALSE | TRUE | |||
5 | -2010-07-23 | -A3 | +2010-06-16 | +R5 | B_ESCHR_COLI | -S | +R | S | S | S | FALSE | -FALSE | +TRUE | ||
6 | -2010-11-16 | -A3 | +2010-10-06 | +R5 | B_ESCHR_COLI | S | S | S | S | FALSE | -FALSE | +TRUE | |||
7 | -2010-12-05 | -A3 | +2010-12-22 | +R5 | B_ESCHR_COLI | -I | -I | +R | +S | S | S | FALSE | -FALSE | +TRUE | |
8 | -2011-01-17 | -A3 | +2011-04-14 | +R5 | B_ESCHR_COLI | S | S | -R | +S | S | TRUE | TRUE | |||
9 | -2011-01-26 | -A3 | +2011-11-27 | +R5 | B_ESCHR_COLI | S | S | S | -S | +R | FALSE | TRUE | |||
10 | -2011-02-12 | -A3 | +2012-11-12 | +R5 | B_ESCHR_COLI | -R | -R | -R | S | -FALSE | +S | +S | +S | +TRUE | TRUE |
Instead of 2, now 7 isolates are flagged. In total, 79.1% of all isolates are marked ‘first weighted’ - 50.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 3, now 9 isolates are flagged. In total, 78.4% of all isolates are marked ‘first weighted’ - 50.1% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
-data_1st <- data %>% - filter_first_weighted_isolate()
So we end up with 15,822 isolates for analysis.
+
+data_1st <- data %>%
+ filter_first_weighted_isolate()
So we end up with 15,689 isolates for analysis.
We can remove unneeded columns:
- +Now our data looks like:
--head(data_1st)
+head(data_1st)
date | patient_id | hospital | @@ -838,101 +852,95 @@ Longest: 1||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | -2012-07-06 | -H9 | -Hospital A | -B_ESCHR_COLI | -R | -S | -S | -S | -M | -Gram-negative | -Escherichia | -coli | -TRUE | -|||||
4 | -2016-09-05 | -U3 | -Hospital C | -B_STPHY_AURS | -R | -S | -S | -S | -F | -Gram-positive | -Staphylococcus | -aureus | -TRUE | -|||||
5 | -2017-05-23 | -S5 | -Hospital C | -B_ESCHR_COLI | -I | -I | -R | -S | -F | -Gram-negative | -Escherichia | -coli | -TRUE | -|||||
8 | -2016-07-20 | -D2 | -Hospital B | -B_ESCHR_COLI | -R | -I | -S | -S | -M | -Gram-negative | -Escherichia | -coli | -TRUE | -|||||
9 | -2010-09-20 | -H10 | +2013-07-24 | +S10 | Hospital D | B_STPHY_AURS | R | -R | -R | S | -M | +S | +S | +F | Gram-positive | Staphylococcus | aureus | TRUE |
10 | -2012-03-30 | -Y4 | +2011-06-06 | +X7 | Hospital B | B_STRPT_PNMN | -S | -S | R | R | +S | +R | F | Gram-positive | Streptococcus | pneumoniae | TRUE | |
2010-08-03 | +E6 | +Hospital B | +B_ESCHR_COLI | +R | +I | +S | +S | +M | +Gram-negative | +Escherichia | +coli | +TRUE | +||||||
2014-03-25 | +S8 | +Hospital B | +B_ESCHR_COLI | +R | +R | +S | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +||||||
2010-09-19 | +J1 | +Hospital D | +B_KLBSL_PNMN | +R | +R | +S | +S | +M | +Gram-negative | +Klebsiella | +pneumoniae | +TRUE | +||||||
2017-05-11 | +T3 | +Hospital A | +B_ESCHR_COLI | +S | +S | +S | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +
Time for the analysis!
@@ -947,15 +955,15 @@ Longest: 1 Dispersion of speciesTo just get an idea how the species are distributed, create a frequency table with our freq()
function. We created the genus
and species
column earlier based on the microbial ID. With paste()
, we can concatenate them together.
The freq()
function can be used like the base R language was intended:
Or can be used like the dplyr
way, which is easier readable:
-data_1st %>% freq(genus, species)
+data_1st %>% freq(genus, species)
Frequency table
Class: character
-Length: 15,822
-Available: 15,822 (100%, NA: 0 = 0%)
+Length: 15,689
+Available: 15,689 (100%, NA: 0 = 0%)
Unique: 4
Shortest: 16
Longest: 24
If you want to get a quick glance of the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations()
function:
-data_1st %>% ++ head() # show first 6 rows+data_1st %>% bug_drug_combinations() %>% - head() # show first 6 rows
# NOTE: Using column 'bacteria' as input for `col_mo`.
E. coli | AMX | -3811 | -259 | -3865 | -7935 | +3784 | +256 | +3819 | +7859 |
E. coli | AMC | -6278 | -298 | -1359 | -7935 | +6190 | +308 | +1361 | +7859 |
E. coli | CIP | -6070 | +5948 | 0 | -1865 | -7935 | +1911 | +7859 | |
E. coli | GEN | -7138 | +7110 | 0 | -797 | -7935 | +749 | +7859 | |
K. pneumoniae | AMX | 0 | 0 | -1578 | -1578 | +1620 | +1620 | ||
K. pneumoniae | AMC | -1248 | -57 | -273 | -1578 | +1270 | +54 | +296 | +1620 |
Using Tidyverse selections, you can also select columns based on the antibiotic class they are in:
--data_1st %>% ++ bug_drug_combinations()+data_1st %>% select(bacteria, fluoroquinolones()) %>% - bug_drug_combinations()
# Selecting fluoroquinolones: 'CIP' (ciprofloxacin)
# NOTE: Using column 'bacteria' as input for `col_mo`.
E. coli | CIP | -6070 | +5948 | 0 | -1865 | -7935 | +1911 | +7859 |
K. pneumoniae | CIP | -1219 | +1232 | 0 | -359 | -1578 | +388 | +1620 |
S. aureus | CIP | -3015 | +2943 | 0 | -918 | -3933 | +876 | +3819 |
S. pneumoniae | CIP | -1798 | +1797 | 0 | -578 | -2376 | +594 | +2391 |
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
-data_1st %>% resistance(AMX) -# [1] 0.535457
+data_1st %>% resistance(AMX)
+# [1] 0.5359806
Or can be used in conjuction with group_by()
and summarise()
, both from the dplyr
package:
-data_1st %>% ++ summarise(amoxicillin = resistance(AMX))+data_1st %>% group_by(hospital) %>% - summarise(amoxicillin = resistance(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
Hospital A | -0.5309604 | +0.5285074 |
Hospital B | -0.5345539 | +0.5438369 |
Hospital C | -0.5403226 | +0.5389447 |
Hospital D | -0.5401970 | +0.5314924 |
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi()
can be used, which works exactly like n_distinct()
from the dplyr
package. It counts all isolates available for every group (i.e. values S, I or R):
-data_1st %>% ++ available = n_rsi(AMX))+data_1st %>% group_by(hospital) %>% summarise(amoxicillin = resistance(AMX), - available = n_rsi(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
Hospital A | -0.5309604 | -4748 | +0.5285074 | +4683 |
Hospital B | -0.5345539 | -5571 | +0.5438369 | +5395 |
Hospital C | -0.5403226 | -2356 | +0.5389447 | +2388 |
Hospital D | -0.5401970 | -3147 | +0.5314924 | +3223 |
These functions can also be used to get the proportion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:
--data_1st %>% ++ amoxiclav_genta = susceptibility(AMC, GEN))+data_1st %>% group_by(genus) %>% summarise(amoxiclav = susceptibility(AMC), gentamicin = susceptibility(GEN), - amoxiclav_genta = susceptibility(AMC, GEN))
# `summarise()` ungrouping output (override with `.groups` argument)
Escherichia | -0.8287335 | -0.8995589 | -0.9863894 | +0.8268228 | +0.9046953 | +0.9846036 |
Klebsiella | -0.8269962 | -0.8941698 | -0.9828897 | +0.8172840 | +0.8950617 | +0.9827160 |
Staphylococcus | -0.8245614 | -0.9211798 | -0.9839817 | +0.8292747 | +0.9193506 | +0.9832417 |
Streptococcus | -0.5366162 | +0.5411962 | 0.0000000 | -0.5366162 | +0.5411962 |
To make a transition to the next part, let’s see how this difference could be plotted:
--data_1st %>% ++# `summarise()` ungrouping output (override with `.groups` argument)+data_1st %>% group_by(genus) %>% summarise("1. Amoxi/clav" = susceptibility(AMC), "2. Gentamicin" = susceptibility(GEN), @@ -1255,15 +1263,15 @@ Longest: 24 y = value, fill = antibiotic)) + geom_col(position = "dodge2") -# `summarise()` ungrouping output (override with `.groups` argument)
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
-ggplot(data = a_data_set, ++ geom_bar(aes(year))+ggplot(data = a_data_set, mapping = aes(x = year, y = value)) + geom_col() + @@ -1274,16 +1282,16 @@ Longest: 24 # or as short as: ggplot(a_data_set) + - geom_bar(aes(year))
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or proportion_df()
and show results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE
to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
-# group the data on `genus` ++ theme(axis.text.y = element_text(face = "italic"))+# group the data on `genus` ggplot(data_1st %>% group_by(genus)) + # create bars with genus on x axis # it looks for variables with class `rsi`, @@ -1302,17 +1310,17 @@ Longest: 24 subtitle = "(this is fake data)") + # and print genus in italic to follow our convention # (is now y axis because we turned the plot) - theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
-data_1st %>% ++ coord_flip()+data_1st %>% group_by(genus) %>% ggplot_rsi(x = "genus", facet = "antibiotic", breaks = 0:4 * 25, datalabels = FALSE) + - coord_flip()
The next example uses the example_isolates
data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column FOS
) in hospital A and D. The input for the fisher.test()
can be retrieved with a transformation like this:
-# use package 'tidyr' to pivot data: ++# [2,] 24 33+# use package 'tidyr' to pivot data: library(tidyr) check_FOS <- example_isolates %>% @@ -1337,10 +1345,10 @@ Longest: 24 check_FOS # A D # [1,] 25 77 -# [2,] 24 33
We can apply the test now with:
--# do Fisher's Exact Test ++# 0.4488318+# do Fisher's Exact Test fisher.test(check_FOS) # # Fisher's Exact Test for Count Data @@ -1352,7 +1360,7 @@ Longest: 24 # 0.2111489 0.9485124 # sample estimates: # odds ratio -# 0.4488318
As can be seen, the p value is 0.031, which means that the fosfomycin resistance found in isolates from patients in hospital A and D are really different.
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.
Sometimes, laboratory data can still contain such strains with ampicillin being susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this, that can be applied using eucast_rules()
:
-oops <- data.frame(mo = c("Klebsiella", ++# 2 Escherichia S+oops <- data.frame(mo = c("Klebsiella", "Escherichia"), ampicillin = "S") oops @@ -227,19 +244,19 @@ eucast_rules(oops, info = FALSE) # mo ampicillin # 1 Klebsiella R -# 2 Escherichia S
A more convenient function is mo_is_intrinsic_resistant()
that uses the same guideline, but allows to check for one or more specific microorganisms or antibiotics:
-mo_is_intrinsic_resistant(c("Klebsiella", "Escherichia"), ++# [1] TRUE FALSE+mo_is_intrinsic_resistant(c("Klebsiella", "Escherichia"), "ampicillin") # [1] TRUE FALSE mo_is_intrinsic_resistant("Klebsiella", c("ampicillin", "kanamycin")) -# [1] TRUE FALSE
EUCAST rules can not only be used for correction, they can also be used for filling in known resistance and susceptibility based on results of other antimicrobials drugs. This process is called interpretive reading, is basically a form of imputation, and is part of the eucast_rules()
function as well:
-data <- data.frame(mo = c("Staphylococcus aureus", +-+data <- data.frame(mo = c("Staphylococcus aureus", "Enterococcus faecalis", "Escherichia coli", "Klebsiella pneumoniae", @@ -251,9 +268,9 @@ CXM = "-", # Cefuroxime PEN = "S", # Benzylenicillin FOX = "S", # Cefoxitin - stringsAsFactors = FALSE)
+ stringsAsFactors = FALSE)-data
+data
mo | @@ -318,8 +335,8 @@
---|
-eucast_rules(data)
+eucast_rules(data)
For another example, I will create a data set to determine multi-drug resistant TB:
--# a helper function to get a random vector with values S, I and R ++ kanamycin = sample_rsi())+# a helper function to get a random vector with values S, I and R # with the probabilities 50% - 10% - 40% sample_rsi <- function() { sample(c("S", "I", "R"), @@ -304,44 +321,44 @@ Unique: 2 ethambutol = sample_rsi(), pyrazinamide = sample_rsi(), moxifloxacin = sample_rsi(), - kanamycin = sample_rsi())
Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:
--my_TB_data <- data.frame(RIF = sample_rsi(), ++ KAN = sample_rsi())+my_TB_data <- data.frame(RIF = sample_rsi(), INH = sample_rsi(), GAT = sample_rsi(), ETH = sample_rsi(), PZA = sample_rsi(), MFX = sample_rsi(), - KAN = sample_rsi())
The data set now looks like this:
--head(my_TB_data) ++# 6 S+head(my_TB_data) # rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin -# 1 R I R S S S -# 2 S R I R R S -# 3 R S R R R R -# 4 R S I R I S -# 5 S R S R S R -# 6 R R S R S S +# 1 S R S R R S +# 2 S R S S S S +# 3 S S S S R S +# 4 S R S R S S +# 5 R S R S R S +# 6 S R R R S S # kanamycin -# 1 R +# 1 S # 2 S # 3 S # 4 S # 5 R -# 6 R
We can now add the interpretation of MDR-TB to our data set. You can use:
--mdro(my_TB_data, guideline = "TB")
+mdro(my_TB_data, guideline = "TB")
or its shortcut mdr_tb()
:
-my_TB_data$mdr <- mdr_tb(my_TB_data) ++# Mycobacterium tuberculosis.+my_TB_data$mdr <- mdr_tb(my_TB_data) # NOTE: No column found as input for `col_mo`, assuming all records contain -# Mycobacterium tuberculosis.
Create a frequency table of the results:
--freq(my_TB_data$mdr)
+freq(my_TB_data$mdr)
Frequency table
Class: factor > ordered (numeric)
Length: 5,000
@@ -361,40 +378,40 @@ Unique: 5
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
For PCA, we need to transform our AMR data first. This is what the example_isolates
data set in this package looks like:
-library(AMR) ++# $ RIF <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…+library(AMR) library(dplyr) glimpse(example_isolates) # Rows: 2,000 @@ -263,10 +280,10 @@ # $ CHL <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… # $ COL <rsi> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, … # $ MUP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… -# $ RIF <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
Now to transform this to a data set with only resistance percentages per taxonomic order and genus:
--resistance_data <- example_isolates %>% ++# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA+resistance_data <- example_isolates %>% group_by(order = mo_order(mo), # group on anything, like order genus = mo_genus(mo)) %>% # and genus as we do here summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs @@ -283,40 +300,40 @@ # 3 Actinomycetales Cutibacterium NA NA NA NA NA NA NA NA # 4 Actinomycetales Dermabacter NA NA NA NA NA NA NA NA # 5 Actinomycetales Micrococcus NA NA NA NA NA NA NA NA -# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA
The new pca()
function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:
-pca_result <- pca(resistance_data) ++# observations available: 7.+pca_result <- pca(resistance_data) # NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT. Total -# observations available: 7.
The result can be reviewed with the good old summary()
function:
-summary(pca_result) ++# Cumulative Proportion 0.580 0.9331 0.98012 0.99449 0.99988 1.00000 1.000e+00+summary(pca_result) # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Standard deviation 2.154 1.6807 0.61365 0.33902 0.20757 0.03136 1.733e-16 # Proportion of Variance 0.580 0.3531 0.04707 0.01437 0.00539 0.00012 0.000e+00 -# Cumulative Proportion 0.580 0.9331 0.98012 0.99449 0.99988 1.00000 1.000e+00
Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance. We can create a so-called biplot with the base R biplot()
function, to see which antimicrobial resistance per drug explain the difference per microorganism.
-biplot(pca_result)
+biplot(pca_result)
But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca()
function, that automatically adds the right labels and even groups:
-ggplot_pca(pca_result)
+ggplot_pca(pca_result)
You can also print an ellipse per group, and edit the appearance:
--ggplot_pca(pca_result, ellipse = TRUE) + - ggplot2::labs(title = "An AMR/PCA biplot!")
+ggplot_pca(pca_result, ellipse = TRUE) +
+ ggplot2::labs(title = "An AMR/PCA biplot!")
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
vignettes/SPSS.Rmd
SPSS.Rmd
To demonstrate the first point:
--# not all values are valid MIC values: ++# [1] "J01CF05"+# not all values are valid MIC values: as.mic(0.125) # Class <mic> # [1] 0.125 @@ -279,7 +296,7 @@ # [4] "fluclox" "flucloxacilina" "flucloxacillin" # [7] "flucloxacilline" "flucloxacillinum" "fluorochloroxacillin" ab_atc("floxapen") -# [1] "J01CF05"
If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
The difference is this:
--SPSS_data ++# # … with 4,193 more rows+SPSS_data # # A tibble: 4,203 x 4 # v001 sex status statusage # <dbl> <dbl+lbl> <dbl+lbl> <dbl> @@ -326,74 +343,74 @@ # 8 10011 Male alive 73.1 # 9 10017 Male alive 56.7 # 10 10018 Female alive 66.6 -# # … with 4,193 more rows
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
-# download and install the latest version: ++library(haven)+# download and install the latest version: install.packages("haven") # load the package you just installed: -library(haven)
You can now import files as follows:
To read files from SPSS into R:
--# read any SPSS file based on file extension (best way): ++read_por(file = "path/to/file")+# read any SPSS file based on file extension (best way): read_spss(file = "path/to/file") # read .sav or .zsav file: read_sav(file = "path/to/file") # read .por file: -read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
-To read files from SAS into R:
--# read .sas7bdat + .sas7bcat files: ++read_xpt(file = "path/to/file")+# read .sas7bdat + .sas7bcat files: read_sas(data_file = "path/to/file", catalog_file = NULL) # read SAS transport files (version 5 and version 8): -read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
--# save as regular SAS file: ++write_xpt(data = yourdata, path = "path/to/file", version = 8)+# save as regular SAS file: write_sas(data = yourdata, path = "path/to/file") # the SAS transport format is an open format # (required for submission of the data to the FDA) -write_xpt(data = yourdata, path = "path/to/file", version = 8)
To read files from Stata into R:
--# read .dta file: ++read_dta(file = "/path/to/file")+# read .dta file: read_stata(file = "/path/to/file") # works exactly the same: -read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
-Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
--library(readxl) -data <- read_excel(path = "path/to/your/file.xlsx")
+library(readxl)
+data <- read_excel(path = "path/to/your/file.xlsx")
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you did not yet did this. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
--library(dplyr) # part of tidyverse ++library(cleaner) # to create frequency tables+library(dplyr) # part of tidyverse library(ggplot2) # part of tidyverse library(AMR) # this package -library(cleaner) # to create frequency tables
We will have to transform some variables to simplify and automate the analysis:
mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the tranformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly where the as.rsi()
function is for.-# transform variables ++ mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)+# transform variables data <- WHONET %>% # get microbial ID based on given organism mutate(mo = as.mo(Organism)) %>% # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class - mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values are transformed succesfully.
We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
--# our newly created `mo` variable, put in the mo_name() function -data %>% freq(mo_name(mo), nmax = 10)
+# our newly created `mo` variable, put in the mo_name() function
+data %>% freq(mo_name(mo), nmax = 10)
Frequency table
Class: character
Length: 500
@@ -337,10 +354,10 @@ Longest: 40
(omitted 27 entries, n = 56 [11.20%])
--# our transformed antibiotic columns ++data %>% freq(AMC_ND2)+# our transformed antibiotic columns # amoxicillin/clavulanic acid (J01CR02) as an example -data %>% freq(AMC_ND2)
Frequency table
Class: factor > ordered > rsi (numeric)
Length: 500
@@ -391,11 +408,11 @@ Drug group: Beta-lactams/penicillins
An easy ggplot
will already give a lot of information, using the included ggplot_rsi()
function:
-data %>% ++ ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)+data %>% group_by(Country) %>% select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>% - ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
-microbenchmark <- microbenchmark::microbenchmark ++library(dplyr)+microbenchmark <- microbenchmark::microbenchmark library(AMR) -library(dplyr)
In the next test, we try to ‘coerce’ different input values into the microbial code of Staphylococcus aureus. Coercion is a computational process of forcing output based on an input. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties.
The actual result is the same every time: it returns its microorganism code B_STPHY_AURS
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
--S.aureus <- microbenchmark( ++# 10+S.aureus <- microbenchmark( as.mo("sau"), # WHONET code as.mo("stau"), as.mo("STAU"), @@ -227,20 +244,20 @@ times = 10) print(S.aureus, unit = "ms", signif = 2) # Unit: milliseconds -# expr min lq mean median uq max -# as.mo("sau") 13.0 13.0 17.0 15.0 16.0 43.0 -# as.mo("stau") 120.0 120.0 140.0 150.0 150.0 160.0 -# as.mo("STAU") 120.0 150.0 150.0 150.0 160.0 160.0 -# as.mo("staaur") 13.0 13.0 17.0 14.0 16.0 44.0 -# as.mo("STAAUR") 13.0 14.0 19.0 16.0 16.0 48.0 -# as.mo("S. aureus") 34.0 58.0 61.0 63.0 68.0 86.0 -# as.mo("S aureus") 33.0 34.0 46.0 36.0 63.0 63.0 -# as.mo("Staphylococcus aureus") 2.3 2.5 2.5 2.5 2.6 2.7 -# as.mo("Staphylococcus aureus (MRSA)") 980.0 980.0 1000.0 1000.0 1000.0 1100.0 -# as.mo("Sthafilokkockus aaureuz") 430.0 430.0 440.0 440.0 440.0 460.0 -# as.mo("MRSA") 13.0 14.0 18.0 16.0 17.0 43.0 -# as.mo("VISA") 21.0 22.0 23.0 22.0 24.0 25.0 -# as.mo("VRSA") 22.0 23.0 35.0 27.0 52.0 55.0 +# expr min lq mean median uq max +# as.mo("sau") 10.0 11.0 20.0 13.0 38.0 41.0 +# as.mo("stau") 100.0 110.0 120.0 110.0 140.0 140.0 +# as.mo("STAU") 100.0 110.0 130.0 140.0 140.0 160.0 +# as.mo("staaur") 10.0 12.0 23.0 14.0 38.0 58.0 +# as.mo("STAAUR") 10.0 11.0 18.0 13.0 14.0 42.0 +# as.mo("S. aureus") 26.0 30.0 46.0 44.0 58.0 78.0 +# as.mo("S aureus") 26.0 27.0 32.0 31.0 31.0 56.0 +# as.mo("Staphylococcus aureus") 2.0 2.4 2.6 2.7 2.8 3.1 +# as.mo("Staphylococcus aureus (MRSA)") 890.0 910.0 920.0 920.0 930.0 960.0 +# as.mo("Sthafilokkockus aaureuz") 380.0 390.0 400.0 390.0 400.0 430.0 +# as.mo("MRSA") 9.9 11.0 16.0 14.0 14.0 41.0 +# as.mo("VISA") 17.0 19.0 28.0 21.0 45.0 52.0 +# as.mo("VRSA") 17.0 19.0 26.0 21.0 22.0 53.0 # neval # 10 # 10 @@ -254,7 +271,7 @@ # 10 # 10 # 10 -# 10
In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 5 milliseconds means it can determine 200 input values per second. It case of 100 milliseconds, this is only 10 input values per second. It is clear that accepted taxonomic names are extremely fast, but some variations can take up to 500-1000 times as much time.
To improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
@@ -262,8 +279,8 @@Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
-# take all MO codes from the example_isolates data set +-+# take all MO codes from the example_isolates data set x <- example_isolates$mo %>% # and copy them a thousand times rep(1000) %>% @@ -284,27 +301,27 @@ print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# mo_name(x) 145 190 221 213 248 313 10
So getting official taxonomic names of 2,000,000 (!!) items consisting of 90 unique values only takes 0.213 seconds. You only lose time on your unique input values.
+# mo_name(x) 128 167 205 195 232 298 10
So getting official taxonomic names of 2,000,000 (!!) items consisting of 90 unique values only takes 0.195 seconds. You only lose time on your unique input values.
What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
-run_it <- microbenchmark(A = mo_name("STAAUR"), +-+run_it <- microbenchmark(A = mo_name("STAAUR"), B = mo_name("S. aureus"), C = mo_name("Staphylococcus aureus"), times = 10) print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# A 8.36 8.59 8.99 9.07 9.29 9.62 10 -# B 26.80 27.50 28.10 28.00 29.10 29.80 10 -# C 2.06 2.15 6.68 2.50 2.69 45.20 10
So going from
-mo_name("Staphylococcus aureus")
to"Staphylococcus aureus"
takes 0.0025 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:+-run_it <- microbenchmark(A = mo_species("aureus"), +# A 7.35 7.70 8.15 8.11 8.68 8.96 10 +# B 22.50 22.80 28.10 24.30 25.70 62.60 10 +# C 1.84 1.98 2.13 2.13 2.17 2.43 10So going from
+mo_name("Staphylococcus aureus")
to"Staphylococcus aureus"
takes 0.0021 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:+# A 1.60 1.63 1.74 1.70 1.83 2.02 10 +# B 1.37 1.40 1.62 1.54 1.74 2.07 10 +# C 1.39 1.45 1.63 1.65 1.77 1.86 10 +# D 1.37 1.40 1.58 1.51 1.62 2.27 10 +# E 1.44 1.64 1.70 1.66 1.81 2.13 10 +# F 1.38 1.42 1.57 1.58 1.67 1.81 10 +# G 1.38 1.62 1.64 1.63 1.70 1.82 10 +# H 1.44 1.65 1.84 1.82 1.99 2.60 10+run_it <- microbenchmark(A = mo_species("aureus"), B = mo_genus("Staphylococcus"), C = mo_name("Staphylococcus aureus"), D = mo_family("Staphylococcaceae"), @@ -316,22 +333,22 @@ print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# A 1.68 1.91 2.13 2.08 2.43 2.62 10 -# B 1.89 1.99 2.16 2.05 2.30 2.69 10 -# C 1.87 1.91 2.09 2.00 2.28 2.46 10 -# D 1.65 1.71 1.92 1.87 1.98 2.58 10 -# E 1.64 1.74 1.90 1.87 1.98 2.34 10 -# F 1.86 1.92 2.05 2.05 2.16 2.27 10 -# G 1.73 1.80 1.99 1.97 2.14 2.29 10 -# H 1.85 1.93 2.08 1.98 2.15 2.93 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This almost does’t take extra time:
-mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system ++# en 15.77 16.36 18.40 16.85 17.50 53.58 100 +# de 19.03 19.70 24.60 20.14 21.06 59.83 100 +# nl 30.93 32.03 36.58 32.80 33.78 85.59 100 +# es 18.78 19.62 24.49 19.98 20.60 58.73 100 +# it 18.80 19.50 22.68 19.92 20.89 63.99 100 +# fr 18.78 19.59 21.99 19.90 20.75 58.87 100 +# pt 18.79 19.65 23.04 20.08 20.82 59.30 100+mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system # [1] "Coagulase-negative Staphylococcus (CoNS)" mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system @@ -351,13 +368,13 @@ print(run_it, unit = "ms", signif = 4) # Unit: milliseconds # expr min lq mean median uq max neval -# en 16.16 17.85 22.42 18.54 19.61 68.28 100 -# de 19.55 21.38 24.37 21.96 22.95 61.55 100 -# nl 31.64 35.08 40.58 36.34 37.88 79.00 100 -# es 18.03 21.46 25.61 22.09 23.30 64.97 100 -# it 17.82 21.01 23.29 21.80 22.82 58.47 100 -# fr 18.19 21.29 25.63 21.93 23.20 60.48 100 -# pt 17.76 21.07 24.81 21.83 23.39 65.00 100
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
Our package contains a function resistance_predict()
, which takes the same input as functions for other AMR analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
-# resistance prediction of piperacillin/tazobactam (TZP):
-resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
-
-# or:
-example_isolates %>%
- resistance_predict(col_ab = "TZP",
- model "binomial")
-
-# to bind it to object 'predict_TZP' for example:
-predict_TZP <- example_isolates %>%
- resistance_predict(col_ab = "TZP",
- model = "binomial")
# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
+
+# or:
+example_isolates %>%
+ resistance_predict(col_ab = "TZP",
+ model "binomial")
+
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- example_isolates %>%
+ resistance_predict(col_ab = "TZP",
+ model = "binomial")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# NOTE: Using column 'date' as input for `col_date`.
This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
-predict_TZP ++# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359+predict_TZP # year value se_min se_max observations observed estimated # 1 2002 0.20000000 NA NA 15 0.20000000 0.05616378 # 2 2003 0.06250000 NA NA 32 0.06250000 0.06163839 @@ -265,31 +282,31 @@ # 26 2027 0.41315710 0.3244399 0.5018743 NA NA 0.41315710 # 27 2028 0.43730688 0.3418075 0.5328063 NA NA 0.43730688 # 28 2029 0.46175755 0.3597639 0.5637512 NA NA 0.46175755 -# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359
The function plot
is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
-plot(predict_TZP)
+plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
-ggplot_rsi_predict(predict_TZP)
+ggplot_rsi_predict(predict_TZP)
- ++ggplot_rsi_predict(predict_TZP, ribbon = FALSE)+# choose for error bars instead of a ribbon -ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
--example_isolates %>% ++# NOTE: Using column 'date' as input for `col_date`.+example_isolates %>% filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% ggplot_rsi_predict() -# NOTE: Using column 'date' as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
You can define the model with the model
parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:
--example_isolates %>% ++# NOTE: Using column 'date' as input for `col_date`.+example_isolates %>% filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% ggplot_rsi_predict() -# NOTE: Using column 'date' as input for `col_date`.
This seems more likely, doesn’t it?
The model itself is also available from the object, as an attribute
:
-model <- attributes(predict_TZP)$model ++# year 0.09883005 0.02295317 4.305725 1.664395e-05+model <- attributes(predict_TZP)$model summary(model)$family # @@ -350,7 +367,7 @@ summary(model)$coefficients # Estimate Std. Error z value Pr(>|z|) # (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05 -# year 0.09883005 0.02295317 4.305725 1.664395e-05
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.
Site built with pkgdown 1.6.1.
+Made with pkgdown 1.6.1, using preferably template.