diff --git a/DESCRIPTION b/DESCRIPTION index f7d59cd2..d82496bb 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -48,15 +48,15 @@ Imports: crayon (>= 1.3.0), data.table (>= 1.9.0), dplyr (>= 0.7.0), + ggplot2, hms, knitr (>= 1.0.0), + microbenchmark, rlang (>= 0.2.0), tidyr (>= 0.7.0) Suggests: covr (>= 3.0.1), curl, - ggplot2, - microbenchmark, readxl, rmarkdown, rstudioapi, diff --git a/R/catalogue_of_life.R b/R/catalogue_of_life.R index 30e9f626..480ab496 100755 --- a/R/catalogue_of_life.R +++ b/R/catalogue_of_life.R @@ -29,8 +29,8 @@ #' Included are: #' \itemize{ #' \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} -#' \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different species, of which most are not microbial. Including everything tremendously slows down our algortihms, and not all fungi fit the scope of this package. By only including the aforementioned taxonomic orders, the most relevant species are covered (like genera \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} -#' \item{All ~15,000 previously accepted names of inckuded (sub)species that have been taxonomically renamed} +#' \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. 
By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} +#' \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} #' \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} #' \item{The responsible author(s) and year of scientific publication} #' } diff --git a/README.md b/README.md index 8c69270c..ad6a632e 100755 --- a/README.md +++ b/README.md @@ -1,4 +1,5 @@ -# AMR (for R) +% AMR (for R) + ### Not a developer? Then please visit our website [https://msberends.gitlab.io/AMR](https://msberends.gitlab.io/AMR) to read about this package. diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 1e382202..839a1db4 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -1,75 +1,35 @@ - - -
- + + + + -Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 20 February 2019.
For this tutorial, we will create fake demonstration data to work with.
You can skip to Cleaning the data if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:
-date | patient_id | mo | amox | cipr | -
---|---|---|---|---|
2019-02-20 | @@ -283,75 +240,81 @@
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine) or the ward type (e.g. to filter on ICUs).
With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.
To start with patients, we need a unique list of patients.
- +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we also set the gender of our patients, by putting the ID and the gender in a table:
The first 135 patient IDs are now male, the other 125 are female.
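The code for this step is not visible in this excerpt. A minimal sketch that matches the description (260 IDs from `A1` to `Z10`, the first 135 male and the remaining 125 female) could look like the following. The exact construction with `lapply()` and `paste0()` is an assumption, as is the column name `gender`, which is inferred from the table headers further down.

```r
# Sketch only: one plausible way to build the objects described above.
# Combine each of the 26 letters with the numbers 1 to 10: A1 ... A10, B1 ... Z10.
patients <- unlist(lapply(LETTERS, paste0, 1:10))

# Put the ID and the gender in a table; the 135/125 split follows the text.
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135), rep("F", 125)),
                             stringsAsFactors = FALSE)
```

Because `lapply()` walks through `LETTERS` in order, all IDs for one letter are grouped together, so the male IDs run from `A1` up to partway through the letter N.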
Let’s pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018.
- +This dates
object now contains all days in our date range.
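As a hedged sketch of how such a `dates` object could be created with base R (the call below is an assumption, since the original code is not visible in this excerpt):

```r
# Sketch only: every calendar day from 1 January 2010 up to and including 1 January 2018.
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
```

`seq()` on `Date` objects includes both end points, so the vector also contains 1 January 2018 itself.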
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
- +For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
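The definitions themselves are not visible in this excerpt. A minimal sketch, using the object names that the later `sample()` calls rely on (`bacteria`, `hospitals`, `ab_interpretations`), might be as follows. The hospital labels are taken from the preview tables; the order `S`, `I`, `R` is an assumption inferred from the three-element `prob` vectors.

```r
# Sketch only: plausible definitions for the objects used in the sample() calls.
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
              "Streptococcus pneumoniae", "Klebsiella pneumoniae")

# Four hospitals, matching the four probabilities given for `hospital`.
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")

# Valid antimicrobial results; the order S, I, R is assumed.
ab_interpretations <- c("S", "I", "R")
```

The length of each vector has to match the length of the corresponding `prob` argument, otherwise `sample()` stops with an error.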
- +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
- patient_id = sample(patients, size = sample_size, replace = TRUE),
- hospital = sample(hospitals, size = sample_size, replace = TRUE,
- prob = c(0.30, 0.35, 0.15, 0.20)),
- bacteria = sample(bacteria, size = sample_size, replace = TRUE,
- prob = c(0.50, 0.25, 0.15, 0.10)),
- amox = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.60, 0.05, 0.35)),
- amcl = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.75, 0.10, 0.15)),
- cipr = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.80, 0.00, 0.20)),
- gent = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.92, 0.00, 0.08))
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+ patient_id = sample(patients, size = sample_size, replace = TRUE),
+ hospital = sample(hospitals, size = sample_size, replace = TRUE,
+ prob = c(0.30, 0.35, 0.15, 0.20)),
+ bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+ prob = c(0.50, 0.25, 0.15, 0.10)),
+ amox = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.60, 0.05, 0.35)),
+ amcl = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.75, 0.10, 0.15)),
+ cipr = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.80, 0.00, 0.20)),
+ gent = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.92, 0.00, 0.08))
)
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
The resulting data set contains 20,000 blood culture isolates (the value of sample_size). With the head()
function we can preview the first 6 values of this data set:
date | patient_id | hospital | @@ -361,73 +324,72 @@cipr | gent | gender | -||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|
2013-02-06 | -O10 | -Hospital B | -Streptococcus pneumoniae | +2014-07-28 | +F8 | +Hospital A | +Escherichia coli | +R | +R | +S | +S | +M |
2017-04-01 | +Y10 | +Hospital B | +Escherichia coli | +R | S | -I | S | S | F |
2015-04-18 | +J2 | +Hospital B | +Klebsiella pneumoniae | +S | +I | +R | +S | +M |
2010-01-25 | -W1 | +2017-08-08 | +Y6 | +Hospital C | +Escherichia coli | +R | +S | +S | +S | +F |
2010-12-27 | +B3 | Hospital B | Klebsiella pneumoniae | S | I | S | S | -F | -
2011-02-16 | -N10 | -Hospital C | -Escherichia coli | -S | -S | -S | -S | -F | -
2010-04-01 | -J4 | -Hospital D | -Escherichia coli | -S | -S | -S | -S | M |
2015-02-27 | -O2 | -Hospital B | -Escherichia coli | -S | +
2016-12-01 | +D4 | +Hospital D | +Staphylococcus aureus | +R | R | S | -S | -F | -
2013-03-19 | -E3 | -Hospital B | -Escherichia coli | -S | -I | -S | -S | +R | M |
isolate | date | patient_id | @@ -547,37 +512,36 @@cipr | gent | first | -|||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-19 | -C6 | +2010-01-30 | +J7 | B_ESCHR_COL | -R | S | -R | +S | +S | S | TRUE |
2 | -2010-03-29 | -C6 | +2010-01-30 | +J7 | B_ESCHR_COL | S | -R | -R | +S | +S | S | FALSE |
3 | -2010-05-27 | -C6 | +2010-03-30 | +J7 | B_ESCHR_COL | -S | +R | S | S | S | @@ -585,19 +549,19 @@||
4 | -2010-06-25 | -C6 | +2010-04-11 | +J7 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | |
5 | -2010-06-28 | -C6 | +2010-08-02 | +J7 | B_ESCHR_COL | S | R | @@ -607,76 +571,75 @@|||||
6 | -2010-07-31 | -C6 | +2010-10-14 | +J7 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | |
7 | -2010-08-07 | -C6 | +2010-11-02 | +J7 | B_ESCHR_COL | R | -S | -S | +I | +R | S | FALSE |
8 | -2010-11-13 | -C6 | +2011-02-10 | +J7 | B_ESCHR_COL | -R | -I | S | S | -FALSE | -||
9 | -2011-04-02 | -C6 | -B_ESCHR_COL | -R | -S | S | S | TRUE | ||||
10 | -2011-06-26 | -C6 | +||||||||||
9 | +2011-03-27 | +J7 | B_ESCHR_COL | -R | +S | S | S | S | FALSE | |||
10 | +2011-04-21 | +J7 | +B_ESCHR_COL | +R | +S | +R | +S | +FALSE | +
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
data <- data %>%
- mutate(keyab = key_antibiotics(.)) %>%
- mutate(first_weighted = first_isolate(.))
+ mutate(keyab = key_antibiotics(.)) %>%
+ mutate(first_weighted = first_isolate(.))
#> NOTE: Using column `bacteria` as input for `col_mo`.
#> NOTE: Using column `bacteria` as input for `col_mo`.
#> NOTE: Using column `date` as input for `col_date`.
#> NOTE: Using column `patient_id` as input for `col_patient_id`.
#> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
#> [Criterion] Inclusion based on key antibiotics, ignoring I.
-#> => Found 15,809 first weighted isolates (79.0% of total)
isolate | date | patient_id | @@ -687,39 +650,38 @@gent | first | first_weighted | -||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-19 | -C6 | +2010-01-30 | +J7 | B_ESCHR_COL | -R | S | -R | +S | +S | S | TRUE | TRUE | ||
2 | -2010-03-29 | -C6 | +2010-01-30 | +J7 | B_ESCHR_COL | S | -R | -R | +S | +S | S | FALSE | -TRUE | +FALSE | |
3 | -2010-05-27 | -C6 | +2010-03-30 | +J7 | B_ESCHR_COL | -S | +R | S | S | S | @@ -728,20 +690,20 @@|||||
4 | -2010-06-25 | -C6 | +2010-04-11 | +J7 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | -FALSE | +TRUE | ||
5 | -2010-06-28 | -C6 | +2010-08-02 | +J7 | B_ESCHR_COL | S | R | @@ -752,79 +714,78 @@||||||||
6 | -2010-07-31 | -C6 | +2010-10-14 | +J7 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | TRUE | |||
7 | -2010-08-07 | -C6 | +2010-11-02 | +J7 | B_ESCHR_COL | R | -S | -S | +I | +R | S | FALSE | TRUE | ||
8 | -2010-11-13 | -C6 | +2011-02-10 | +J7 | B_ESCHR_COL | -R | -I | S | S | -FALSE | -FALSE | +S | +S | +TRUE | +TRUE |
9 | -2011-04-02 | -C6 | +2011-03-27 | +J7 | B_ESCHR_COL | -R | S | S | S | -TRUE | -TRUE | +S | +FALSE | +FALSE | |
10 | -2011-06-26 | -C6 | +2011-04-21 | +J7 | B_ESCHR_COL | R | S | -S | +R | S | FALSE | -FALSE | +TRUE |
Instead of 2, now 7 isolates are flagged. In total, 79% of all isolates are marked ‘first weighted’ - 50.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
-As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
Instead of 2, now 8 isolates are flagged. In total, 78.9% of all isolates are marked ‘first weighted’ - 50.5% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,809 isolates for analysis.
+ filter_first_weighted_isolate() +So we end up with 15,784 isolates for analysis.
We can remove unneeded columns:
+ select(-c(first, keyab))Now our data looks like:
- -date | patient_id | @@ -839,32 +800,63 @@genus | species | first_weighted | -|||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | +2014-07-28 | +F8 | +Hospital A | +B_ESCHR_COL | +R | +R | +S | +S | +M | +Gram negative | +Escherichia | +coli | +TRUE | +||||||||||
2 | -2010-01-25 | -W1 | +2017-04-01 | +Y10 | +Hospital B | +B_ESCHR_COL | +R | +S | +S | +S | +F | +Gram negative | +Escherichia | +coli | +TRUE | +||||||||
3 | +2015-04-18 | +J2 | Hospital B | B_KLBSL_PNE | R | I | +R | S | -S | -F | +M | Gram negative | Klebsiella | pneumoniae | TRUE | ||||||||
3 | -2011-02-16 | -N10 | +4 | +2017-08-08 | +Y6 | Hospital C | B_ESCHR_COL | -S | +R | S | S | S | @@ -875,67 +867,35 @@TRUE | ||||||||||
4 | -2010-04-01 | -J4 | +6 | +2016-12-01 | +D4 | Hospital D | -B_ESCHR_COL | -S | -S | -S | +B_STPHY_AUR | +R | +R | S | +R | M | -Gram negative | -Escherichia | -coli | +Gram positive | +Staphylococcus | +aureus | TRUE |
5 | -2015-02-27 | -O2 | -Hospital B | -B_ESCHR_COL | -S | -R | -S | -S | -F | -Gram negative | -Escherichia | -coli | -TRUE | -||||||||||
7 | -2011-08-19 | -D6 | -Hospital B | -B_STPHY_AUR | +2017-03-30 | +N5 | +Hospital A | +B_ESCHR_COL | +S | S | -I | S | S | M | -Gram positive | -Staphylococcus | -aureus | -TRUE | -|||||
10 | -2017-01-12 | -P10 | -Hospital A | -B_STPHY_AUR | -S | -R | -S | -S | -F | -Gram positive | -Staphylococcus | -aureus | +Gram negative | +Escherichia | +coli | TRUE |