diff --git a/DESCRIPTION b/DESCRIPTION
index 1c5d092e..b547216b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.7.0.9010
-Date: 2019-06-16
+Version: 0.7.0.9012
+Date: 2019-06-18
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
 person(
diff --git a/NAMESPACE b/NAMESPACE
index 42610ec1..c1329d57 100755
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -301,6 +301,7 @@ importFrom(graphics,axis)
 importFrom(graphics,barplot)
 importFrom(graphics,boxplot)
 importFrom(graphics,hist)
+importFrom(graphics,par)
 importFrom(graphics,plot)
 importFrom(graphics,points)
 importFrom(graphics,text)
diff --git a/NEWS.md b/NEWS.md
index 57ef0325..7143a761 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 0.7.0.9010
+# AMR 0.7.0.9012
 #### New
 * Function `rsi_df()` to transform a `data.frame` to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions `count_df()` and `portion_df()` to immediately show resistance percentages and number of available isolates:
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index 49765907..0f0f5d87 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -78,7 +78,7 @@
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 139af9fb..9543f9ee 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -40,7 +40,7 @@
@@ -192,7 +192,7 @@
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 15 June 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 18 June 2019.
Now, let’s start the cleaning and the analysis!
@@ -411,8 +411,8 @@
 #
 #      Item   Count   Percent   Cum. Count   Cum. Percent
 # ---  -----  ------  --------  -----------  -------------
-# 1    M      10,341  51.7%     10,341       51.7%
-# 2    F      9,659   48.3%     20,000       100.0%
+# 1    M      10,332  51.7%     10,332       51.7%
+# 2    F      9,668   48.3%     20,000       100.0%
So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M
and F
. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -442,14 +442,14 @@
# Pasteurella multocida (no new changes)
# Staphylococcus (no new changes)
# Streptococcus groups A, B, C, G (no new changes)
-# Streptococcus pneumoniae (1,478 new changes)
+# Streptococcus pneumoniae (1,428 new changes)
# Viridans group streptococci (no new changes)
#
# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,340 new changes)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,339 new changes)
# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no new changes)
# Table 03: Intrinsic resistance in other Gram-negative bacteria (no new changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,785 new changes)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,671 new changes)
# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no new changes)
# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no new changes)
# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no new changes)
@@ -457,24 +457,24 @@
# Table 13: Interpretive rules for quinolones (no new changes)
#
# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,164 new changes)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (105 new changes)
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,233 new changes)
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (92 new changes)
# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no new changes)
# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no new changes)
# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no new changes)
# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no new changes)
#
# --------------------------------------------------------------------------
-# EUCAST rules affected 6,519 out of 20,000 rows, making a total of 7,872 edits
+# EUCAST rules affected 6,456 out of 20,000 rows, making a total of 7,763 edits
# => added 0 test results
#
-# => changed 7,872 test results
-# - 122 test results changed from S to I
-# - 4,775 test results changed from S to R
-# - 1,060 test results changed from I to S
-# - 316 test results changed from I to R
-# - 1,581 test results changed from R to S
-# - 18 test results changed from R to I
+# => changed 7,763 test results
+# - 95 test results changed from S to I
+# - 4,674 test results changed from S to R
+# - 1,070 test results changed from I to S
+# - 305 test results changed from I to R
+# - 1,596 test results changed from R to S
+# - 23 test results changed from R to I
# --------------------------------------------------------------------------
#
# Use verbose = TRUE to get a data.frame with all specified edits instead.
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient V3, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient W10, sorted on date:
isolate | @@ -529,21 +529,21 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-06 | -V3 | +2010-01-10 | +W10 | B_ESCHR_COL | S | S | S | -R | +S | TRUE | ||
2 | -2010-07-24 | -V3 | +2010-04-21 | +W10 | B_ESCHR_COL | -R | +S | S | S | S | @@ -551,32 +551,32 @@|||
3 | -2010-07-26 | -V3 | +2010-05-14 | +W10 | B_ESCHR_COL | +S | +S | +S | R | +FALSE | +|||
4 | +2010-05-21 | +W10 | +B_ESCHR_COL | +S | S | S | S | FALSE | |||||
4 | -2011-05-06 | -V3 | -B_ESCHR_COL | -R | -S | -S | -S | -TRUE | -|||||
5 | -2011-06-04 | -V3 | +2010-06-09 | +W10 | B_ESCHR_COL | -S | +R | S | S | S | @@ -584,41 +584,41 @@|||
6 | -2011-07-22 | -V3 | +2010-06-19 | +W10 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | ||
7 | -2011-08-15 | -V3 | +2010-07-07 | +W10 | B_ESCHR_COL | -I | -I | +S | S | R | +S | FALSE | |
8 | -2011-09-20 | -V3 | +2010-07-10 | +W10 | B_ESCHR_COL | R | S | -R | +S | S | FALSE | ||
9 | -2012-03-26 | -V3 | +2010-08-12 | +W10 | B_ESCHR_COL | S | S | @@ -628,18 +628,18 @@||||||
10 | -2012-06-01 | -V3 | +2010-10-15 | +W10 | B_ESCHR_COL | -R | -I | S | S | -TRUE | +S | +S | +FALSE |
Only 3 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:
data <- data %>%
mutate(keyab = key_antibiotics(.)) %>%
@@ -650,7 +650,7 @@
# NOTE: Using column `patient_id` as input for `col_patient_id`.
# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
# [Criterion] Inclusion based on key antibiotics, ignoring I.
-# => Found 15,202 first weighted isolates (76.0% of total)
isolate | @@ -667,58 +667,58 @@|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-06 | -V3 | +2010-01-10 | +W10 | B_ESCHR_COL | S | S | S | -R | +S | TRUE | TRUE | |||
2 | -2010-07-24 | -V3 | +2010-04-21 | +W10 | B_ESCHR_COL | -R | +S | S | S | S | FALSE | -TRUE | +FALSE | ||
3 | -2010-07-26 | -V3 | +2010-05-14 | +W10 | B_ESCHR_COL | +S | +S | +S | R | -S | -S | -S | -FALSE | FALSE | +TRUE |
4 | -2011-05-06 | -V3 | +2010-05-21 | +W10 | B_ESCHR_COL | -R | S | S | S | -TRUE | +S | +FALSE | TRUE | ||
5 | -2011-06-04 | -V3 | +2010-06-09 | +W10 | B_ESCHR_COL | -S | +R | S | S | S | @@ -727,44 +727,44 @@|||||
6 | -2011-07-22 | -V3 | +2010-06-19 | +W10 | B_ESCHR_COL | S | S | -S | -S | -FALSE | -FALSE | -||||
7 | -2011-08-15 | -V3 | -B_ESCHR_COL | -I | -I | -S | R | +S | FALSE | TRUE | |||||
7 | +2010-07-07 | +W10 | +B_ESCHR_COL | +S | +S | +R | +S | +FALSE | +FALSE | +||||||
8 | -2011-09-20 | -V3 | +2010-07-10 | +W10 | B_ESCHR_COL | R | S | -R | +S | S | FALSE | TRUE | |||
9 | -2012-03-26 | -V3 | +2010-08-12 | +W10 | B_ESCHR_COL | S | S | @@ -775,23 +775,23 @@||||||||
10 | -2012-06-01 | -V3 | +2010-10-15 | +W10 | B_ESCHR_COL | -R | -I | S | S | -TRUE | -TRUE | +S | +S | +FALSE | +FALSE |
Instead of 3, now 8 isolates are flagged. In total, 76% of all isolates are marked ‘first weighted’ - 47.7% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 1, now 7 isolates are flagged. In total, 75.5% of all isolates are marked ‘first weighted’ - 47.1% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,202 isolates for analysis.
+So we end up with 15,099 isolates for analysis.
We can remove unneeded columns:
@@ -799,6 +799,7 @@date | patient_id | hospital | @@ -815,68 +816,9 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2010-06-23 | -R10 | -Hospital A | -B_STRPT_PNE | -S | -S | -S | -R | -F | -Gram-positive | -Streptococcus | -pneumoniae | -TRUE | -||||
2016-04-09 | -R1 | -Hospital D | -B_KLBSL_PNE | -R | -S | -S | -S | -F | -Gram-negative | -Klebsiella | -pneumoniae | -TRUE | -||||
2017-01-27 | -U10 | -Hospital A | -B_STRPT_PNE | -R | -R | -R | -R | -F | -Gram-positive | -Streptococcus | -pneumoniae | -TRUE | -||||
2017-05-23 | -C4 | -Hospital B | -B_KLBSL_PNE | -R | -S | -R | -S | -M | -Gram-negative | -Klebsiella | -pneumoniae | -TRUE | -||||
2015-03-27 | -W3 | +2 | +2013-01-14 | +S5 | Hospital B | B_ESCHR_COL | S | @@ -890,12 +832,29 @@TRUE | ||||||||
2014-06-14 | -L5 | -Hospital A | +3 | +2011-04-02 | +G2 | +Hospital B | B_ESCHR_COL | S | S | +R | +S | +M | +Gram-negative | +Escherichia | +coli | +TRUE | +
4 | +2011-09-22 | +B3 | +Hospital C | +B_ESCHR_COL | +R | +S | S | S | M | @@ -904,6 +863,54 @@coli | TRUE | |||||
5 | +2015-07-30 | +G10 | +Hospital A | +B_KLBSL_PNE | +R | +S | +S | +S | +M | +Gram-negative | +Klebsiella | +pneumoniae | +TRUE | +|||
6 | +2017-10-07 | +X3 | +Hospital A | +B_ESCHR_COL | +R | +S | +S | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +|||
7 | +2015-11-01 | +O1 | +Hospital B | +B_ESCHR_COL | +R | +S | +S | +S | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +
Time for the analysis!
@@ -921,9 +928,9 @@
Or can be used like the dplyr
way, which is more readable:
Frequency table of genus
and species
from data_1st
(15,202 x 13)
Frequency table of genus
and species
from data_1st
(15,099 x 13)
Columns: 2
-Length: 15,202 (of which NA: 0 = 0.00%)
+Length: 15,099 (of which NA: 0 = 0.00%)
Unique: 4
Shortest: 16
Longest: 24
The functions portion_S()
, portion_SI()
, portion_I()
, portion_IR()
and portion_R()
can be used to determine the portion of a specific antimicrobial outcome. As per the EUCAST guideline of 2019, we calculate resistance as the portion of R (portion_R()
) and susceptibility as the portion of S and I (portion_SI()
). These functions can be used on their own:
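As a minimal base-R sketch of what these proportions mean (this is not the package's implementation, and the vector `amx` is made-up illustration data), resistance is the share of R among the available results, and susceptibility is the share of S and I together:

```r
# Hypothetical interpretations for one antibiotic (illustration data only)
amx <- c("S", "S", "I", "R", "R", "S", NA, "S")

# Only isolates with an available result count towards the denominator
available <- amx[!is.na(amx)]

# Resistance: proportion of R
resistance <- mean(available == "R")

# Susceptibility (EUCAST 2019 definition): proportion of S and I together
susceptibility <- mean(available %in% c("S", "I"))

# The two proportions are complementary over the available results
resistance + susceptibility
```

Missing results are dropped before dividing, so the denominator is the number of tested isolates rather than the number of rows.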
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -990,19 +997,19 @@ Longest: 24
Hospital A
-0.4688237
+0.4754386
Hospital B
-0.4693374
+0.4673058
Hospital C
-0.4641460
+0.4625054
Hospital D
-0.4664644
+0.4617169
@@ -1020,23 +1027,23 @@ Longest: 24
Hospital A
-0.4688237
-4667
+0.4754386
+4560
Hospital B
-0.4693374
-5267
+0.4673058
+5215
Hospital C
-0.4641460
-2301
+0.4625054
+2307
Hospital D
-0.4664644
-2967
+0.4617169
+3017
@@ -1056,27 +1063,27 @@ Longest: 24
Escherichia
-0.9249165
-0.8956580
-0.9929192
+0.9237322
+0.8957806
+0.9952083
Klebsiella
-0.8413098
-0.8992443
-0.9911839
+0.8207196
+0.8957816
+0.9844913
Staphylococcus
-0.9169771
-0.9161788
-0.9949441
+0.9161273
+0.9180151
+0.9927184
Streptococcus
-0.6280051
+0.6160635
0.0000000
-0.6280051
+0.6160635
@@ -1087,36 +1094,36 @@ Longest: 24
"2. Gentamicin" = portion_SI(GEN),
"3. Amoxi/clav + genta" = portion_SI(AMC, GEN)) %>%
tidyr::gather("antibiotic", "S", -genus) %>%
- ggplot(aes(x = genus,
+ ggplot(aes(x = genus,
y = S,
fill = antibiotic)) +
- geom_col(position = "dodge2")
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
ggplot(data = a_data_set,
- mapping = aes(x = year,
+ggplot(data = a_data_set,
+ mapping = aes(x = year,
y = value)) +
- geom_col() +
- labs(title = "A title",
+ geom_col() +
+ labs(title = "A title",
subtitle = "A subtitle",
x = "My X axis",
y = "My Y axis")
# or as short as:
-ggplot(a_data_set) +
- geom_bar(aes(year))
+ggplot(a_data_set) +
+ geom_bar(aes(year))
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or portion_df()
and shows results in stacked bars. Its simplest and shortest example:
ggplot(data_1st) +
+
Omit the translate_ab = FALSE
to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
# group the data on `genus`
-ggplot(data_1st %>% group_by(genus)) +
+ggplot(data_1st %>% group_by(genus)) +
# create bars with genus on x axis
# it looks for variables with class `rsi`,
# of which we have 4 (earlier created with `as.rsi`)
@@ -1128,13 +1135,13 @@ Longest: 24
# show percentages on y axis
scale_y_percent(breaks = 0:4 * 25) +
# turn 90 degrees, to make it bars instead of columns
- coord_flip() +
+ coord_flip() +
# add labels
- labs(title = "Resistance per genus and antibiotic",
+ labs(title = "Resistance per genus and antibiotic",
subtitle = "(this is fake data)") +
# and print genus in italic to follow our convention
# (is now y axis because we turned the plot)
- theme(axis.text.y = element_text(face = "italic"))
+ theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
data_1st %>%
@@ -1143,7 +1150,7 @@ Longest: 24
facet = "antibiotic",
breaks = 0:4 * 25,
datalabels = FALSE) +
- coord_flip()
as.mo(..., allow_uncertain = 3)
Contents
These functions are meant to count isolates. Use the portion_*
functions to calculate microbial resistance.
The function n_rsi
is an alias of count_all
. They can be used to count all available isolates, i.e. where all input antibiotics have an available result (S, I or R). Their use is equal to n_distinct
. Their function is equal to count_S(...) + count_IR(...)
.
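The identity between counting all available isolates and adding the S count to the I/R count can be checked with a small base-R sketch. The vector `gen` is made-up illustration data and the variable names are hypothetical; the package functions themselves are not called here:

```r
# Hypothetical interpretations for one antibiotic (illustration data only)
gen <- c("S", "R", "I", "S", NA, "S", "R")

n_S   <- sum(gen == "S", na.rm = TRUE)   # isolates interpreted as S
n_IR  <- sum(gen %in% c("I", "R"))       # isolates interpreted as I or R
n_all <- sum(!is.na(gen))                # isolates with any available result

# All available isolates are either S or I/R
n_all == n_S + n_IR
```

The missing value is excluded from all three counts, which is why the equality holds.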
The function count_df
takes any variable from data
that has an "rsi"
class (created with as.rsi
) and counts the amounts of S, I and R. The resulting tidy data (see Source) data.frame
will have three rows (S/I/R) and a column for each variable with class "rsi"
.
The function rsi_df
works exactly like count_df
, but add the percentage of S, I and R.
The function rsi_df
works exactly like count_df
, but adds the percentage of S, I and R.
Remember that you should filter your table to let it contain only first isolates! Use first_isolate
to determine them in your data set.
These functions are not meant to count isolates, but to calculate the portion of resistance/susceptibility. Use the count
functions to count isolates. Low counts can influence the outcome - these portion
functions may camouflage this, since they only return the portion albeit being dependent on the minimum
parameter.
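To illustrate why a minimum matters, here is a rough base-R sketch of a proportion that is withheld when too few results are available. The helper `portion_r_sketch` is hypothetical and not part of the package, and the threshold of 30 is an assumption chosen for the example:

```r
# Hypothetical helper, not part of the AMR package: proportion of R that
# is only reported when enough results are available.
portion_r_sketch <- function(x, minimum = 30) {
  available <- x[!is.na(x)]
  if (length(available) < minimum) {
    return(NA_real_)  # too few isolates for a meaningful proportion
  }
  mean(available == "R")
}

few  <- c("R", "S", "R")                # 3 results only
many <- rep(c("R", "S", "S", "S"), 10)  # 40 results, a quarter of them R

portion_r_sketch(few)   # withheld, because 3 < 30
portion_r_sketch(many)  # reported, because 40 >= 30
```

With three results the raw proportion would look dramatic, yet it rests on almost no data; returning a missing value makes that visible instead of hiding it behind a percentage.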
The function portion_df
takes any variable from data
that has an "rsi"
class (created with as.rsi
) and calculates the portions R, I and S. The resulting tidy data (see Source) data.frame
will have three rows (S/I/R) and a column for each group and each variable with class "rsi"
.
The function rsi_df
works exactly like portion_df
, but add the number of isolates.
+
The function rsi_df
works exactly like portion_df
, but adds the number of isolates.
To calculate the probability (p) of susceptibility of one antibiotic, we use this formula: