diff --git a/DESCRIPTION b/DESCRIPTION
index 8dbd4116..251378b0 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.5.0.9017
-Date: 2019-02-13
+Date: 2019-02-14
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/R/freq.R b/R/freq.R
index b21ba2ec..1f8f5aac 100755
--- a/R/freq.R
+++ b/R/freq.R
@@ -667,16 +667,17 @@ format_header <- function(x, markdown = FALSE, decimal.mark = ".", big.mark = ",
   header <- header[names(header) != "na_length"]
 
   # format all numeric values
-  header <- lapply(header, function(x)
-    if (is.numeric(x))
+  header <- lapply(header, function(x) {
+    if (is.numeric(x)) {
       if (any(x < 1000)) {
         format(round2(x, digits = digits), decimal.mark = decimal.mark, big.mark = big.mark)
       } else {
         format(x, digits = digits, decimal.mark = decimal.mark, big.mark = big.mark)
       }
-    else
+    } else {
       x
-  )
+    }
+  })
 
   # numeric values
   if (has_length == TRUE & any(x_class %in% c("double", "integer", "numeric", "raw", "single"))) {
diff --git a/data/microorganisms.codes.rda b/data/microorganisms.codes.rda
index 518404aa..a7f81ff2 100644
Binary files a/data/microorganisms.codes.rda and b/data/microorganisms.codes.rda differ
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 29aa831d..6bdc843a 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -185,7 +185,7 @@
AMR.Rmd
-Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 12 February 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 14 February 2019.
#> Frequency table of `gender` from a data.frame (20,000 x 9)
#>
#> Class: factor (numeric)
-#> Levels: F, M
#> Length: 20,000 (of which NA: 0 = 0.00%)
+#> Levels: 2: F, M
#> Unique: 2
#>
#> Item Count Percent Cum. Count Cum. Percent
#> --- ----- ------- -------- ----------- -------------
-#> 1 M 10,303 51.5% 10,303 51.5%
-#> 2 F 9,697 48.5% 20,000 100.0%
+#> 1 M 10,444 52.2% 10,444 52.2%
+#> 2 F 9,556 47.8% 20,000 100.0%
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:
data <- data %>%
@@ -436,10 +436,10 @@
#> Kingella kingae (no changes)
#>
#> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1: Intrinsic resistance in Enterobacteriaceae (1278 changes)
+#> Table 1: Intrinsic resistance in Enterobacteriaceae (1293 changes)
#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4: Intrinsic resistance in Gram-positive bacteria (2750 changes)
+#> Table 4: Intrinsic resistance in Gram-positive bacteria (2812 changes)
#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -455,9 +455,9 @@
#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#>
-#> => EUCAST rules affected 7,442 out of 20,000 rows
+#> => EUCAST rules affected 7,447 out of 20,000 rows
#> -> added 0 test results
-#> -> changed 4,028 test results (0 to S; 0 to I; 4,028 to R)
-So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:
+So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:
-Only 1 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
+Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
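The idea of weighing isolates by their antibiogram can be sketched in a few lines of base R. This is only an illustration and not the package's implementation: the helper `differs_on_key_ab()` is hypothetical, and it assumes each isolate's key antibiotic results are encoded as one string of S/I/R characters of equal length. The `ignore_I` argument mirrors the "ignoring I" criterion printed in the output further down.

```r
# Hypothetical helper (not part of the AMR package): decide whether two
# isolates differ on their key antibiotics and so may be different strains.
# Assumption: each antibiogram is a single string such as "SRSS".
differs_on_key_ab <- function(a, b, ignore_I = TRUE) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  stopifnot(length(a) == length(b))
  if (ignore_I) {
    # drop positions where either isolate tested intermediate
    keep <- a != "I" & b != "I"
    a <- a[keep]
    b <- b[keep]
  }
  any(a != b)
}

differs_on_key_ab("SRSS", "SSSS")                   # an S/R difference
differs_on_key_ab("SISS", "SSSS")                   # only an I difference
differs_on_key_ab("SISS", "SSSS", ignore_I = FALSE) # I now counts
```

Under these assumptions, a later isolate from the same patient would be kept when its key antibiotics differ from the previous one, which is why more isolates get flagged than under a purely time-based rule.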
If a column exists with a name like ‘key(…)ab’ the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
data <- data %>%
mutate(keyab = key_antibiotics(.)) %>%
@@ -630,7 +630,7 @@
#> NOTE: Using column `patient_id` as input for `col_patient_id`.
#> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
#> [Criterion] Inclusion based on key antibiotics, ignoring I.
-#> => Found 15,783 first weighted isolates (78.9% of total)
isolate | @@ -647,32 +647,44 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-15 | -V8 | +2010-06-20 | +Y4 | B_ESCHR_COL | S | -R | S | +R | S | TRUE | TRUE | ||
2 | -2010-02-08 | -V8 | +2010-07-31 | +Y4 | B_ESCHR_COL | S | S | -R | +S | S | FALSE | TRUE | ||
3 | -2010-02-24 | -V8 | +2010-08-26 | +Y4 | +B_ESCHR_COL | +S | +S | +S | +S | +FALSE | +FALSE | +|||
4 | +2010-12-11 | +Y4 | B_ESCHR_COL | R | S | @@ -681,22 +693,10 @@FALSE | TRUE | |||||||
4 | -2010-04-28 | -V8 | -B_ESCHR_COL | -R | -S | -S | -S | -FALSE | -FALSE | -|||||
5 | -2010-05-13 | -V8 | +2010-12-30 | +Y4 | B_ESCHR_COL | R | S | @@ -707,47 +707,47 @@|||||||
6 | -2010-05-14 | -V8 | +2011-04-02 | +Y4 | B_ESCHR_COL | +R | +I | S | -S | -S | -S | +R | FALSE | TRUE |
7 | -2010-08-04 | -V8 | +2011-04-06 | +Y4 | B_ESCHR_COL | +S | +S | +S | R | -S | -S | -S | FALSE | TRUE |
8 | -2010-09-12 | -V8 | +2011-04-07 | +Y4 | B_ESCHR_COL | S | S | -R | +S | S | FALSE | TRUE | ||
9 | -2010-10-03 | -V8 | +2011-05-28 | +Y4 | B_ESCHR_COL | R | -I | +S | S | S | FALSE | @@ -755,23 +755,23 @@|||
10 | -2011-01-09 | -V8 | +2011-09-09 | +Y4 | B_ESCHR_COL | -R | +S | S | R | S | -FALSE | +TRUE | TRUE |
-Instead of 1, now 8 isolates are flagged. In total, 78.9% of all isolates are marked ‘first weighted’ - 50.7% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
-So we end up with 15,783 isolates for analysis.
+So we end up with 15,866 isolates for analysis.
We can remove unneeded columns:
@@ -779,6 +779,7 @@date | patient_id | hospital | @@ -795,43 +796,30 @@||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2012-04-28 | -X6 | -Hospital B | -B_ESCHR_COL | -R | +1 | +2016-08-15 | +E9 | +Hospital C | +B_STPHY_AUR | S | R | S | -F | -Gram negative | -Escherichia | -coli | +S | +M | +Gram positive | +Staphylococcus | +aureus | TRUE |
2012-12-23 | -Z1 | -Hospital A | -B_ESCHR_COL | -I | -S | -R | -S | -F | -Gram negative | -Escherichia | -coli | -TRUE | -||||||||||
2011-05-27 | -G10 | -Hospital B | +2 | +2013-02-21 | +I8 | +Hospital D | B_STRPTC_PNE | S | -S | -S | +I | +R | R | M | Gram positive | @@ -839,45 +827,64 @@pneumoniae | TRUE | |||||
2012-08-19 | -Q1 | -Hospital D | +||||||||||||||||||||
3 | +2015-05-12 | +F4 | +Hospital B | B_ESCHR_COL | S | -R | S | S | -F | +S | +M | Gram negative | Escherichia | coli | TRUE | |||||||
2016-05-06 | -Y2 | -Hospital D | -B_KLBSL_PNE | +|||||||||||||||||||
5 | +2013-08-03 | +V8 | +Hospital C | +B_STRPTC_PNE | +S | +S | +S | R | -S | -S | -S | F | -Gram negative | -Klebsiella | +Gram positive | +Streptococcus | pneumoniae | TRUE | ||||
6 | +2017-10-01 | +V9 | +Hospital C | +B_STPHY_AUR | +S | +I | +S | +S | +F | +Gram positive | +Staphylococcus | +aureus | +TRUE | +|||||||||
2010-08-27 | -I8 | -Hospital A | +7 | +2014-06-20 | +E9 | +Hospital B | B_ESCHR_COL | +S | +S | R | S | -S | -S | M | Gram negative | Escherichia | @@ -901,9 +908,9 @@||||||
1 | Escherichia coli | -7,825 | -49.6% | -7,825 | -49.6% | +7,860 | +49.5% | +7,860 | +49.5% | |||||||||||||
2 | Staphylococcus aureus | -3,979 | -25.2% | -11,804 | -74.8% | +3,892 | +24.5% | +11,752 | +74.1% | |||||||||||||
3 | Streptococcus pneumoniae | -2,449 | -15.5% | -14,253 | -90.3% | +2,556 | +16.1% | +14,308 | +90.2% | |||||||||||||
4 | Klebsiella pneumoniae | -1,530 | -9.7% | -15,783 | +1,558 | +9.8% | +15,866 | 100.0% | ||||||||||||||
Hospital A | -0.4697326 | +0.4749633 | ||||||||||||||||||||
Hospital B | -0.4770958 | +0.4879713 | ||||||||||||||||||||
Hospital C | -0.4799670 | +0.4761905 | ||||||||||||||||||||
Hospital D | -0.4785082 | +0.4600062 |
EUCAST.Rmd
G_test.Rmd
WHONET.Rmd
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
-
+This package comes with an example data set WHONET. We will use it for this analysis.
First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
-library(dplyr) # part of tidyverse
-library(ggplot2) # part of tidyverse
-library(AMR) # this package
library(dplyr) # part of tidyverse
+library(ggplot2) # part of tidyverse
+library(AMR) # this package
We will have to transform some variables to simplify and automate the analysis:
mo
) using the ITIS reference data set, which contains all ~20,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly what the as.rsi() function is for.
-# transform variables
-data <- WHONET %>%
- # get microbial ID based on given organism
- mutate(mo = as.mo(Organism)) %>%
- # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
- mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
+# transform variables
+data <- WHONET %>%
+ # get microbial ID based on given organism
+ mutate(mo = as.mo(Organism)) %>%
+ # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
+ mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values are transformed successfully. Let’s check it though, with a couple of frequency tables:
-
-Frequency table of mo from a data.frame (500 x 54)
-Class: mo (character)
+
Frequency table of mo from a data.frame (500 x 54)
Class: mo (character)
Length: 500 (of which NA: 0 = 0.00%)
Unique: 39
Families: 9
@@ -324,16 +324,16 @@ Species: 36
(omitted 29 entries, n = 57 [11.4%])
-
-# our transformed antibiotic columns
-# amoxicillin/clavulanic acid (J01CR02) as an example
-data %>% freq(AMC_ND2)
Frequency table of AMC_ND2 from a data.frame (500 x 54)
-Class: factor > ordered > rsi (numeric)
-Levels: S < I < R
+
+# our transformed antibiotic columns
+# amoxicillin/clavulanic acid (J01CR02) as an example
+data %>% freq(AMC_ND2)
Frequency table of AMC_ND2 from a data.frame (500 x 54)
Class: factor > ordered > rsi (numeric)
Length: 500 (of which NA: 19 = 3.80%)
+Levels: 3: S < I < R
Unique: 3
-%IR: 25.99% (ratio S : IR = 1.0 : 0.4)
+%IR: 25.00% (ratio 2.8:1)
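The two %IR header lines can be related with some simple arithmetic. The sketch below is not taken from the package source. It assumes 125 isolates tested I or R, a count that is not printed in the output but is back-calculated from the reported percentages; the total length (500) and the NA count (19) come from the header itself. Under that assumption, the old percentage uses the non-missing results as denominator, the new one uses the full length, and both ratios describe the same S versus IR split in opposite directions.

```r
n_total <- 500L  # "Length" reported in the header
n_na    <- 19L   # NA count reported in the header
n_ir    <- 125L  # ASSUMPTION: back-calculated, not shown in the output

n_valid <- n_total - n_na  # results that are not NA
n_s     <- n_valid - n_ir  # isolates tested S

# %IR over non-missing results (matches the removed line)
pct_ir_valid <- sprintf("%.2f%%", 100 * n_ir / n_valid)
# %IR over all rows, NA included (matches the added line)
pct_ir_total <- sprintf("%.2f%%", 100 * n_ir / n_total)

# old notation "S : IR = 1.0 : x" and new notation "x:1"
ratio_old <- sprintf("1.0 : %.1f", n_ir / n_s)
ratio_new <- sprintf("%.1f:1", n_s / n_ir)

print(c(pct_ir_valid, pct_ir_total, ratio_old, ratio_new))
```

If the assumed count is right, this reproduces all four figures shown in the two header lines.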