diff --git a/DESCRIPTION b/DESCRIPTION
index 67a8e1d2..c438c08b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.5.0.9016
-Date: 2019-01-30
+Date: 2019-02-01
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/R/atc_online.R b/R/atc_online.R
index b8db697e..77bbad0e 100644
--- a/R/atc_online.R
+++ b/R/atc_online.R
@@ -79,20 +79,12 @@ atc_online_property <- function(atc_code,
     stop("Packages 'xml2', 'rvest' and 'curl' are required for this function")
   }
-  # check active network interface, from https://stackoverflow.com/a/5078002/4575331
-  has_internet <- function(url) {
-    # extract host from given url
-    # https://www.whocc.no/atc_ddd_index/ -> www.whocc.no
-    url <- url %>%
-      gsub("^(http://|https://)", "", .) %>%
-      strsplit('/', fixed = TRUE) %>%
-      unlist() %>%
-      .[1]
-    !is.null(curl::nslookup(url, error = FALSE))
+  if (!all(atc_code %in% AMR::antibiotics)) {
+    atc_code <- as.character(as.atc(atc_code))
   }
-  # check for connection using the ATC of amoxicillin
-  if (!curl::has_internet(url = url)) {
-    message("The URL could not be reached.")
+
+  if (!curl::has_internet()) {
+    message("There appears to be no internet connection.")
     return(rep(NA, length(atc_code)))
   }
diff --git a/R/freq.R b/R/freq.R
index 5fc966c9..34fe6013 100755
--- a/R/freq.R
+++ b/R/freq.R
@@ -610,7 +610,13 @@ format_header <- function(x, markdown = FALSE, decimal.mark = ".", big.mark = ",
   # class and mode
   if (is.null(header$columns)) {
+    if (markdown == TRUE) {
+      header$class <- paste0("`", header$class, "`")
+    }
     if (!header$mode %in% header$class) {
+      if (markdown == TRUE) {
+        header$mode <- paste0("`", header$mode, "`")
+      }
       header$class <- header$class %>% rev() %>% paste(collapse = " > ") %>% paste0(silver(paste0(" (", header$mode, ")")))
     } else {
       header$class <- header$class %>% rev() %>% paste(collapse = " > ")
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 09e907f5..9854b148 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -40,7 +40,7 @@
 AMR (for R)
-0.5.0.9015
+0.5.0.9016
@@ -185,7 +185,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

29 January 2019

+

01 February 2019

@@ -194,10 +194,10 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 29 January 2019.

-
-

-Introduction

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 01 February 2019.

+
+

+Introduction

For this tutorial, we will create fake demonstration data to work with.

You can skip to Cleaning the data if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:

@@ -210,21 +210,21 @@
-2019-01-29
+2019-02-01
abcd
Escherichia coli
S
S
-2019-01-29
+2019-02-01
abcd
Escherichia coli
S
R
-2019-01-29
+2019-02-01
efgh
Escherichia coli
R
@@ -232,73 +232,73 @@
-

Needed R packages

As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.

Our AMR package depends on these packages and even extends their use and functions.

- +
library(dplyr)
+library(ggplot2)
+library(AMR)
+
+# (if not yet installed, install with:)
+# install.packages(c("tidyverse", "AMR"))
-
-

-Creation of data

+
+
+

+Creation of data

We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs).

With additional columns (like a hospital name, the patient’s gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.

-
-

-Patients

+
+

+Patients

To start with patients, we need a unique list of patients.

-
patients <- unlist(lapply(LETTERS, paste0, 1:10))
+
patients <- unlist(lapply(LETTERS, paste0, 1:10))

The LETTERS object is available in R - it’s a vector with 26 characters: A to Z. The patients object we just created is now a vector of length 260, with values (patient IDs) varying from A1 to Z10. Now we also set the gender of our patients, by putting the ID and the gender in a table:

-
patients_table <- data.frame(patient_id = patients,
-                             gender = c(rep("M", 135),
-                                        rep("F", 125)))
+
patients_table <- data.frame(patient_id = patients,
+                             gender = c(rep("M", 135),
+                                        rep("F", 125)))

The first 135 patient IDs are now male, the other 125 are female.
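
As a quick sanity check, the construction above can be verified in base R. This is only a sketch that rebuilds the two objects exactly as defined in this tutorial and inspects them; it introduces no new names.

```r
# Rebuild the objects from the tutorial
patients <- unlist(lapply(LETTERS, paste0, 1:10))
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))

# 26 letters x 10 numbers gives 260 unique IDs
length(patients)
anyDuplicated(patients)        # 0 means every ID is unique

# The IDs run A1..A10, B1..B10, ..., so the male/female split
# falls inside the N block (N5 is the last male ID)
patients[c(1, 135, 136, 260)]

# Counts per gender
table(patients_table$gender)
```

Because the gender is assigned by position, it is tied to the letter of the patient ID. That is harmless for fake data, but worth knowing when interpreting the results later on.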

-
-

-Dates

+
+

+Dates

Let’s pretend that our data consists of blood cultures isolates from 1 January 2010 until 1 January 2018.

-
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
+
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")

This dates object now contains all days in our date range.

-

Microorganisms

For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:

-
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
-              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
-
-

-Other variables

+
+
+

+Other variables

For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:

-
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
-ab_interpretations <- c("S", "I", "R")
+
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
+ab_interpretations <- c("S", "I", "R")
-
-

-Put everything together

+
+

+Put everything together

Using the sample() function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob parameter.

-
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
-                   patient_id = sample(patients, 5000, replace = TRUE),
-                   hospital = sample(hospitals, 5000, replace = TRUE, prob = c(0.30, 0.35, 0.15, 0.20)),
-                   bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
-                   amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
-                   amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.10, 0.15)),
-                   cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.80, 0.00, 0.20)),
-                   gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0.00, 0.08))
-                   )
+
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
+                   patient_id = sample(patients, 5000, replace = TRUE),
+                   hospital = sample(hospitals, 5000, replace = TRUE, prob = c(0.30, 0.35, 0.15, 0.20)),
+                   bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
+                   amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
+                   amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.10, 0.15)),
+                   cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.80, 0.00, 0.20)),
+                   gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0.00, 0.08))
+                   )
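
To see what the prob parameter does, it can help to draw a single column and compare the observed shares with the requested ones. The following is a minimal sketch in base R; the call to set.seed() and the seed value are assumptions added here only to make the sketch reproducible, they are not part of the tutorial.

```r
ab_interpretations <- c("S", "I", "R")

# Assumption: a fixed seed, only so that this sketch gives the same draw every time
set.seed(42)

# Same call as used for the `amox` column above
amox <- sample(ab_interpretations, 5000, replace = TRUE,
               prob = c(0.60, 0.05, 0.35))

# Observed shares per outcome; with 5,000 draws these should be
# close to (but not exactly) 0.60, 0.05 and 0.35
round(prop.table(table(factor(amox, levels = ab_interpretations))), 3)
```

Without a fixed seed, every run (and every rebuild of this page) produces slightly different numbers, which is why the values in this tutorial change with each website update.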

Using the left_join() function from the dplyr package, we can ‘map’ the gender to the patient ID using the patients_table object we created earlier:

- +
data <- data %>% left_join(patients_table)

The resulting data set contains 5,000 blood culture isolates. With the head() function we can preview the first 6 values of this data set:

-
head(data)
+
head(data)
@@ -313,153 +313,153 @@ - - - - - - + + + + - + + + - - + + + + + + + + + + + + + + + + + + + + + + + + + - - + - - + + - - - - - - - - - - - - - - + + + - + + - - - - - - - - - - - -
date
2015-02-23C5Hospital BStaphylococcus aureusRR2014-06-07Y9Hospital AKlebsiella pneumoniae S SMSSF
2011-10-29X22010-06-10Z1Hospital BEscherichia coliRSSSF
2012-03-20J6 Hospital C Streptococcus pneumoniaeSSSSM
2016-10-31M5Hospital AEscherichia coliS R S SSFM
2010-06-10J52016-05-05W8 Hospital B Escherichia coliSSSSM
2013-11-09U4Hospital AEscherichia coli R S S S F
2010-10-12C3
2016-03-10G8 Hospital AStaphylococcus aureusStreptococcus pneumoniaeS SI S S M
2017-12-04T3Hospital CEscherichia coliRSSSF

Now, let’s start the cleaning and the analysis!

-
-

-Cleaning the data

+
+

+Cleaning the data

Use the frequency table function freq() to look specifically for unique values in any variable. For example, for the gender variable:

-
data %>% freq(gender) # this would be the same: freq(data$gender)
-
# Frequency table 
-# Class:   factor (numeric)  
-# Levels:  F, M  
-# Length:  5,000 (of which NA: 0 = 0.00%)  
+
data %>% freq(gender) # this would be the same: freq(data$gender)
+
# Frequency table of `gender` from a `data.frame` (5,000 x 9) 
+# Class:   factor (numeric)
+# Levels:  F, M
+# Length:  5,000 (of which NA: 0 = 0.00%)
 # Unique:  2
 # 
 #      Item    Count   Percent   Cum. Count   Cum. Percent
 # ---  -----  ------  --------  -----------  -------------
-# 1    M       2,586     51.7%        2,586          51.7%
-# 2    F       2,414     48.3%        5,000         100.0%
+# 1    M       2,622     52.4%        2,622          52.4%
+# 2    F       2,378     47.6%        5,000         100.0%

So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
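
For readers who want to see what such a frequency table boils down to, here is a rough base R sketch. The helper simple_freq() is hypothetical and written only for this illustration; it is not how freq() is implemented and it leaves out the header, the NA handling and the formatting.

```r
# Hypothetical helper: counts, percentages and cumulative values per item
simple_freq <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)   # most frequent item first
  n <- as.vector(counts)
  data.frame(item = names(counts),
             count = n,
             percent = n / sum(n),
             cum_count = cumsum(n),
             cum_percent = cumsum(n) / sum(n),
             stringsAsFactors = FALSE)
}

# Example with the gender split we defined for the 260 patients
gender <- c(rep("M", 135), rep("F", 125))
simple_freq(gender)
```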

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

-
data <- data %>%
-  mutate(bacteria = as.mo(bacteria))
+
data <- data %>%
+  mutate(bacteria = as.mo(bacteria))

We also want to transform the antibiotics, because in real-life data we don’t know if they are really clean. The as.rsi() function ensures reliability and reproducibility in these kinds of variables. The mutate_at() function will run as.rsi() on the defined variables:

-
data <- data %>%
-  mutate_at(vars(amox:gent), as.rsi)
+
data <- data %>%
+  mutate_at(vars(amox:gent), as.rsi)

Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules() function can also apply additional rules, like forcing ampicillin = R when amoxicillin/clavulanic acid = R.

Because the amoxicillin (column amox) and amoxicillin/clavulanic acid (column amcl) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The eucast_rules() fixes this:

-
data <- eucast_rules(data, col_mo = "bacteria")
-# 
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-# 
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Enterobacteriales (Order) (no changes)
-# Staphylococcus (no changes)
-# Enterococcus (no changes)
-# Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (no changes)
-# Viridans group streptococci (no changes)
-# Haemophilus influenzae (no changes)
-# Moraxella catarrhalis (no changes)
-# Anaerobic Gram positives (no changes)
-# Anaerobic Gram negatives (no changes)
-# Pasteurella multocida (no changes)
-# Campylobacter jejuni and C. coli (no changes)
-# Aerococcus sanguinicola and A. urinae (no changes)
-# Kingella kingae (no changes)
-# 
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 1:  Intrinsic resistance in Enterobacteriaceae (324 changes)
-# Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
-# Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 4:  Intrinsic resistance in Gram-positive bacteria (722 changes)
-# Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
-# Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
-# Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
-# Table 12: Interpretive rules for aminoglycosides (no changes)
-# Table 13: Interpretive rules for quinolones (no changes)
-# 
-# Other rules
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (no changes)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (no changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
-# 
-# => EUCAST rules affected 1,853 out of 5,000 rows -> changed 1,046 test results.
+
data <- eucast_rules(data, col_mo = "bacteria")
+# 
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# 
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Enterobacteriales (Order) (no changes)
+# Staphylococcus (no changes)
+# Enterococcus (no changes)
+# Streptococcus groups A, B, C, G (no changes)
+# Streptococcus pneumoniae (no changes)
+# Viridans group streptococci (no changes)
+# Haemophilus influenzae (no changes)
+# Moraxella catarrhalis (no changes)
+# Anaerobic Gram positives (no changes)
+# Anaerobic Gram negatives (no changes)
+# Pasteurella multocida (no changes)
+# Campylobacter jejuni and C. coli (no changes)
+# Aerococcus sanguinicola and A. urinae (no changes)
+# Kingella kingae (no changes)
+# 
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 1:  Intrinsic resistance in Enterobacteriaceae (340 changes)
+# Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
+# Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
+# Table 4:  Intrinsic resistance in Gram-positive bacteria (681 changes)
+# Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
+# Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
+# Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
+# Table 12: Interpretive rules for aminoglycosides (no changes)
+# Table 13: Interpretive rules for quinolones (no changes)
+# 
+# Other rules
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (no changes)
+# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
+# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (no changes)
+# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
+# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
+# 
+# => EUCAST rules affected 1,858 out of 5,000 rows -> changed 1,021 test results.
-
-

-Adding new variables

+
+

+Adding new variables

Now that we have the microbial ID, we can add some taxonomic properties:

-
data <- data %>% 
-  mutate(gramstain = mo_gramstain(bacteria),
-         genus = mo_genus(bacteria),
-         species = mo_species(bacteria))
-
-

-First isolates

+
data <- data %>% 
+  mutate(gramstain = mo_gramstain(bacteria),
+         genus = mo_genus(bacteria),
+         species = mo_species(bacteria))
+
+

+First isolates

We also need to know which isolates we can actually use for analysis.

To conduct an analysis of antimicrobial resistance, you must only include the first isolate of every patient per episode (Hindler et al., Clin Infect Dis. 2007). If you do not, you can easily overestimate or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures in the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin across all isolates would be overestimated, because this MRSA was included more than once. That would clearly be selection bias.

The Clinical and Laboratory Standards Institute (CLSI) defines this as follows:

@@ -467,22 +467,22 @@

(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4

This AMR package includes this methodology with the first_isolate() function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:

- -

So only 58.5% is suitable for resistance analysis! We can now filter on is with the filter() function, also from the dplyr package:

- +
data <- data %>% 
+  mutate(first = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# => Found 2,956 first isolates (59.1% of total)
+

So only 59.1% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+
data_1st <- data %>% 
+  filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

- +
data_1st <- data %>% 
+  filter_first_isolate()
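
The episode logic described above (an episode of one year, counted from every selected isolate) can be sketched in a few lines of base R. The helper flag_first_isolates() is hypothetical and only meant as an illustration for one patient and one species; it is not the implementation of first_isolate(), and whether the real function uses "at least 365 days" or "more than 365 days" is an assumption here.

```r
# Hypothetical helper, for ONE patient and ONE species.
# `dates` must be sorted ascending; `episode_days` is the episode length.
flag_first_isolates <- function(dates, episode_days = 365) {
  first <- logical(length(dates))
  last_selected <- NULL
  for (i in seq_along(dates)) {
    # select the very first isolate, and every isolate that falls
    # at least `episode_days` after the previously selected one
    if (is.null(last_selected) ||
        as.numeric(dates[i] - last_selected) >= episode_days) {
      first[i] <- TRUE
      last_selected <- dates[i]
    }
  }
  first
}

# The ten isolate dates of example patient K10 used further down this page
isolate_dates <- as.Date(c("2010-10-23", "2011-03-17", "2011-08-12",
                           "2012-02-24", "2012-04-19", "2013-08-25",
                           "2014-01-04", "2014-03-05", "2014-03-11",
                           "2014-06-20"))

data.frame(date = isolate_dates,
           first = flag_first_isolates(isolate_dates))
```

Note that the counter restarts at every selected isolate, not at the first isolate of the patient, so the episodes do not have to line up with calendar years.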
-
-

-First weighted isolates

+
+

+First weighted isolates

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Imagine this data, sorted on date:

@@ -499,32 +499,32 @@ - - + + - - + + - - + + - + - - + + - + @@ -532,19 +532,19 @@ - - + + - + - - + + @@ -554,41 +554,41 @@ - - + + - - + + - - + + - - + + - - + + + - - + - - + + @@ -598,29 +598,29 @@ - - + + - - + +
12010-08-27X62010-10-23K10 B_ESCHR_COLSS R SSS TRUE
22010-12-12X62011-03-17K10 B_ESCHR_COL RS R SS FALSE
32011-05-24X62011-08-12K10 B_ESCHR_COLRS S S S
42011-09-03X62012-02-24K10 B_ESCHR_COL SS R SS TRUE
52011-09-21X62012-04-19K10 B_ESCHR_COL S S
62011-10-31X62013-08-25K10 B_ESCHR_COL RS R SFALSERTRUE
72012-07-02X62014-01-04K10 B_ESCHR_COLSSRR S S FALSE
82012-12-30X62014-03-05K10 B_ESCHR_COLR S S SSTRUEFALSE
92013-03-15X62014-03-11K10 B_ESCHR_COL S S
102014-01-14X62014-06-20K10 B_ESCHR_COL SR S STRUESFALSE
-

Only 4 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and show be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

+

Only 3 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

If a column exists with a name like ‘key(…)ab’ the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:

- +
data <- data %>% 
+  mutate(keyab = key_antibiotics(.)) %>% 
+  mutate(first_weighted = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics  = FALSE to prevent this.
+# [Criterion] Inclusion based on key antibiotics, ignoring I.
+# => Found 4,383 first weighted isolates (87.7% of total)
@@ -637,34 +637,34 @@ - - + + - - + + - - + + - + - - + + - + @@ -673,20 +673,20 @@ - - + + - + - - + + @@ -697,23 +697,23 @@ - - + + - - + + - - + + - - + + @@ -721,52 +721,52 @@ - - + + + - - + - - + + - + - - + + - - - + + +
isolate
12010-08-27X62010-10-23K10 B_ESCHR_COLSS R SSS TRUE TRUE
22010-12-12X62011-03-17K10 B_ESCHR_COL RS R SS FALSE TRUE
32011-05-24X62011-08-12K10 B_ESCHR_COLRS S S S
42011-09-03X62012-02-24K10 B_ESCHR_COL SS R SS TRUE TRUE
52011-09-21X62012-04-19K10 B_ESCHR_COL S S
62011-10-31X62013-08-25K10 B_ESCHR_COL RS R SFALSERTRUE TRUE
72012-07-02X62014-01-04K10 B_ESCHR_COLSSRR S S FALSE
82012-12-30X62014-03-05K10 B_ESCHR_COLR S S SSTRUEFALSE TRUE
92013-03-15X62014-03-11K10 B_ESCHR_COL S S S S FALSEFALSETRUE
102014-01-14X62014-06-20K10 B_ESCHR_COL SR S STRUETRUESFALSEFALSE
-

Instead of 4, now 9 isolates are flagged. In total, 88.1% of all isolates are marked ‘first weighted’ - 146.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 3, now 9 isolates are flagged. In total, 87.7% of all isolates are marked ‘first weighted’ - 28.5 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

- -

So we end up with 4,404 isolates for analysis.

+
data_1st <- data %>% 
+  filter_first_weighted_isolate()
+

So we end up with 4,383 isolates for analysis.

We can remove unneeded columns:

- +
data_1st <- data_1st %>% 
+  select(-c(first, keyab))

Now our data looks like:

-
head(data_1st)
+
head(data_1st)
@@ -785,55 +785,70 @@ - - - - - + + + + - - - - + + + + + - - + + + + + + + + + + + + + + + + + - + - + + + + + + + + + + + + + + + + - - + + - - - - - - - - - - - - - - - @@ -844,34 +859,19 @@ - - - + + + - - - + + + - - - - - - - - - - - - - - - - - + + @@ -879,17 +879,21 @@

Time for the analysis!

-
+
+

+Analysing the data

+

You might want to start by getting an idea of how the data is distributed. It’s an important start, because it also decides how you will continue your analysis.

+

-Analysing the data

-

You might want to start by getting an idea of how the data is distributed. It’s an important start, because it also decides how you will continue your analysis. ## Dispersion of species To just get an idea how the species are distributed, create a frequency table with our freq() function. We created the genus and species column earlier based on the microbial ID. With paste(), we can concatenate them together.

+Dispersion of species +

To just get an idea how the species are distributed, create a frequency table with our freq() function. We created the genus and species column earlier based on the microbial ID. With paste(), we can concatenate them together.

The freq() function can be used like the base R language was intended:

-
freq(paste(data_1st$genus, data_1st$species))
+
freq(paste(data_1st$genus, data_1st$species))

Or it can be used the dplyr way, which is easier to read:

-
data_1st %>% freq(genus, species)
-

Frequency table
+

data_1st %>% freq(genus, species)
+

Frequency table of genus and species from a data.frame (4,383 x 13)
Columns: 2
-Length: 4,404 (of which NA: 0 = 0.00%)
+Length: 4,383 (of which NA: 0 = 0.00%)
Unique: 4

Shortest: 16
Longest: 24

@@ -906,47 +910,48 @@ Longest: 24

- - - - + + + + - - - - + + + + - - - - + + + + - - - + + +
date
2015-02-23C5Hospital BB_STPHY_AURR2014-06-07Y9Hospital AB_KLBSL_PNE R S SMGram positiveStaphylococcusaureusSFGram negativeKlebsiellapneumoniae TRUE
2011-10-29X22010-06-10Z1Hospital BB_ESCHR_COLRSSSFGram negativeEscherichiacoliTRUE
2012-03-20J6 Hospital C B_STRPTC_PNERS S S RFM Gram positive Streptococcus pneumoniae TRUE
2016-10-31M5Hospital AB_ESCHR_COLSRSSMGram negativeEscherichiacoliTRUE
2010-06-10J52016-05-05W8 Hospital B B_ESCHR_COLSSSSMGram negativeEscherichiacoliTRUE
2013-11-09U4Hospital AB_ESCHR_COL R S Scoli TRUE
2010-10-12C3
2016-03-10G8 Hospital AB_STPHY_AURSIB_STRPTC_PNE S SSR M Gram positiveStaphylococcusaureusTRUE
2017-12-04T3Hospital CB_ESCHR_COLRSSSFGram negativeEscherichiacoliStreptococcuspneumoniae TRUE
1
Escherichia coli
-2,165
-49.2%
-2,165
-49.2%
+2,128
+48.6%
+2,128
+48.6%
2
Staphylococcus aureus
-1,078
-24.5%
-3,243
-73.6%
+1,110
+25.3%
+3,238
+73.9%
3
Streptococcus pneumoniae
-719
-16.3%
-3,962
-90.0%
+684
+15.6%
+3,922
+89.5%
4
Klebsiella pneumoniae
-442
-10.0%
-4,404
+461
+10.5%
+4,383
100.0%
-
-

-Resistance percentages

+
+
+

+Resistance percentages

The functions portion_R, portion_IR, portion_I, portion_SI and portion_S can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:

- +
data_1st %>% portion_IR(amox)
+# [1] 0.4832307
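
Conceptually, such a portion is nothing more than the share of isolates with a given outcome among all isolates that have a result. The sketch below shows that idea in base R. The helper portion_ir_sketch() is hypothetical and the input is made up; the package functions may apply extra safeguards that are not shown here.

```r
# Hypothetical helper: share of I or R among the available (non-missing) results
portion_ir_sketch <- function(x) {
  x <- x[!is.na(x)]            # isolates without a result do not count
  mean(x %in% c("I", "R"))
}

# Made-up example: 6 isolates, 1 without a result, 3 of the remaining 5 are I or R
amox_example <- c("S", "R", "I", "S", NA, "R")
portion_ir_sketch(amox_example)
```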

Or can be used in conjunction with group_by() and summarise(), both from the dplyr package:

-
data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox))
+
data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = portion_IR(amox))
@@ -955,27 +960,27 @@ Longest: 24

hospital
Hospital A
-0.4852484
+0.4959169
Hospital B
-0.4594419
+0.4690956
Hospital C
-0.4875346
+0.5075529
Hospital D
-0.4583822
+0.4695341

Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi() can be used, which works exactly like n_distinct() from the dplyr package. It counts all isolates available for every group (i.e. values S, I or R):

-
data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox),
-            available = n_rsi(amox))
+
data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = portion_IR(amox),
+            available = n_rsi(amox))
@@ -985,32 +990,32 @@ Longest: 24

hospital
Hospital A
-0.4852484
-1288
+0.4959169
+1347
Hospital B
-0.4594419
-1541
+0.4690956
+1537
Hospital C
-0.4875346
-722
+0.5075529
+662
Hospital D
-0.4583822
-853
+0.4695341
+837

These functions can also be used to get the portion of multiple antibiotics, to calculate co-resistance very easily:

-
data_1st %>% 
-  group_by(genus) %>% 
-  summarise(amoxicillin = portion_S(amcl),
-            gentamicin = portion_S(gent),
-            "amox + gent" = portion_S(amcl, gent))
+
data_1st %>% 
+  group_by(genus) %>% 
+  summarise(amoxicillin = portion_S(amcl),
+            gentamicin = portion_S(gent),
+            "amox + gent" = portion_S(amcl, gent))
@@ -1021,99 +1026,99 @@ Longest: 24

genus
Escherichia
-0.7247113
-0.9205543
-0.9764434
+0.7481203
+0.9163534
+0.9765038
Klebsiella
-0.7714932
-0.9049774
-0.9841629
+0.7722343
+0.9240781
+0.9913232
Staphylococcus
-0.7541744
-0.9313544
-0.9860853
+0.7567568
+0.9234234
+0.9864865
Streptococcus
-0.7593880
+0.7383041
0.0000000
-0.7593880
+0.7383041
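
The combined column is never lower than either single column, and for Streptococcus (where gentamicin is 0) it equals the amoxicillin/clavulanic acid value. This suggests that the combination counts an isolate as susceptible when it is S to at least one of the two drugs. The sketch below illustrates that reading in base R. It is an interpretation of the numbers above, not a description of the package internals; the helper portion_s_any() and the example data are hypothetical, and dropping isolates with a missing result is an assumption.

```r
# Hypothetical helper: share of isolates susceptible to at least one of two drugs
portion_s_any <- function(ab1, ab2) {
  tested <- !is.na(ab1) & !is.na(ab2)   # assumption: both results must be available
  mean(ab1[tested] == "S" | ab2[tested] == "S")
}

# Made-up example with 4 isolates
amcl_example <- c("S", "R", "S", "R")
gent_example <- c("R", "S", "S", "R")

mean(amcl_example == "S")                   # susceptible to the first drug
mean(gent_example == "S")                   # susceptible to the second drug
portion_s_any(amcl_example, gent_example)   # susceptible to at least one
```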

To make a transition to the next part, let’s see how this difference could be plotted:

-
data_1st %>% 
-  group_by(genus) %>% 
-  summarise("1. Amoxicillin" = portion_S(amcl),
-            "2. Gentamicin" = portion_S(gent),
-            "3. Amox + gent" = portion_S(amcl, gent)) %>% 
-  tidyr::gather("Antibiotic", "S", -genus) %>%
-  ggplot(aes(x = genus,
-             y = S,
-             fill = Antibiotic)) +
-  geom_col(position = "dodge2")
+
data_1st %>% 
+  group_by(genus) %>% 
+  summarise("1. Amoxicillin" = portion_S(amcl),
+            "2. Gentamicin" = portion_S(gent),
+            "3. Amox + gent" = portion_S(amcl, gent)) %>% 
+  tidyr::gather("Antibiotic", "S", -genus) %>%
+  ggplot(aes(x = genus,
+             y = S,
+             fill = Antibiotic)) +
+  geom_col(position = "dodge2")

-
-

-Plots

+
+

+Plots

To show results in plots, most R users would nowadays use the ggplot2 package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:

-
ggplot(data = a_data_set,
-       mapping = aes(x = year,
-                     y = value)) +
-  geom_col() +
-  labs(title = "A title",
-       subtitle = "A subtitle",
-       x = "My X axis",
-       y = "My Y axis")
-
-ggplot(a_data_set,
-       aes(year, value) +
-  geom_bar()
+
ggplot(data = a_data_set,
+       mapping = aes(x = year,
+                     y = value)) +
+  geom_col() +
+  labs(title = "A title",
+       subtitle = "A subtitle",
+       x = "My X axis",
+       y = "My Y axis")
+
+ggplot(a_data_set,
+       aes(year, value)) +
+  geom_col()

The AMR package contains functions to extend this ggplot2 package, for example geom_rsi(). It automatically transforms data with count_df() or portion_df() and shows the results in stacked bars. Its simplest and shortest example:

-
ggplot(data_1st) +
-  geom_rsi(translate_ab = FALSE)
+
ggplot(data_1st) +
+  geom_rsi(translate_ab = FALSE)

Omit the translate_ab = FALSE to have the antibiotic codes (amox, amcl, cipr, gent) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).

If we group on e.g. the genus column and add some additional functions from our package, we can create this:

- +
# group the data on `genus`
+ggplot(data_1st %>% group_by(genus)) + 
+  # create bars with genus on x axis
+  # it looks for variables with class `rsi`,
+  # of which we have 4 (earlier created with `as.rsi`)
+  geom_rsi(x = "genus") + 
+  # split plots on antibiotic
+  facet_rsi(facet = "Antibiotic") +
+  # make R red, I yellow and S green
+  scale_rsi_colours() +
+  # show percentages on y axis
+  scale_y_percent(breaks = 0:4 * 25) +
+  # turn 90 degrees, make it bars instead of columns
+  coord_flip() +
+  # add labels
+  labs(title = "Resistance per genus and antibiotic", 
+       subtitle = "(this is fake data)") +
+  # and print genus in italic to follow our convention
+  # (is now y axis because we turned the plot)
+  theme(axis.text.y = element_text(face = "italic"))

To simplify this, we also created the ggplot_rsi() function, which combines almost all above functions:

- +
data_1st %>% 
+  group_by(genus) %>%
+  ggplot_rsi(x = "genus",
+             facet = "Antibiotic",
+             breaks = 0:4 * 25,
+             datalabels = FALSE) +
+  coord_flip()

-
-

-Using an independence test to compare resistance

+
+

+Independence test

The next example uses the included septic_patients, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This data.frame can be used to practice AMR analysis.

We will compare the resistance to fosfomycin (column fosf) in hospital A and D. The input for the final fisher.test() will be this:

@@ -1136,26 +1141,26 @@ Longest: 24

We can transform the data and apply the test in only a couple of lines:

-
septic_patients %>%
-  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
-  select(hospital_id, fosf) %>%            # select the hospitals and fosfomycin
-  group_by(hospital_id) %>%                # group on the hospitals
-  count_df(combine_IR = TRUE) %>%          # count all isolates per group (hospital_id)
-  tidyr::spread(hospital_id, Value) %>%    # transform output so A and D are columns
-  select(A, D) %>%                         # and select these only
-  as.matrix() %>%                          # transform to good old matrix for fisher.test()
-  fisher.test()                            # do Fisher's Exact Test
-# 
-#   Fisher's Exact Test for Count Data
-# 
-# data:  .
-# p-value = 0.03104
-# alternative hypothesis: true odds ratio is not equal to 1
-# 95 percent confidence interval:
-#  1.054283 4.735995
-# sample estimates:
-# odds ratio 
-#   2.228006
+
septic_patients %>%
+  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+  select(hospital_id, fosf) %>%            # select the hospitals and fosfomycin
+  group_by(hospital_id) %>%                # group on the hospitals
+  count_df(combine_IR = TRUE) %>%          # count all isolates per group (hospital_id)
+  tidyr::spread(hospital_id, Value) %>%    # transform output so A and D are columns
+  select(A, D) %>%                         # and select these only
+  as.matrix() %>%                          # transform to good old matrix for fisher.test()
+  fisher.test()                            # do Fisher's Exact Test
+# 
+#   Fisher's Exact Test for Count Data
+# 
+# data:  .
+# p-value = 0.03104
+# alternative hypothesis: true odds ratio is not equal to 1
+# 95 percent confidence interval:
+#  1.054283 4.735995
+# sample estimates:
+# odds ratio 
+#   2.228006

As can be seen, the p value is 0.03, which is below the conventional significance level of 0.05. This means that the fosfomycin resistance found in hospitals A and D differs significantly.
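
The same kind of test can be tried on any 2x2 table of counts. The following is a minimal base R sketch under stated assumptions: the counts are hypothetical and are not taken from septic_patients, and only fisher.test() from the stats package is used.

```r
# Hypothetical counts, NOT taken from septic_patients:
# rows = IR (non-susceptible) and S (susceptible), columns = hospitals A and D
counts <- matrix(c(30, 10,
                   10, 30),
                 nrow = 2,
                 dimnames = list(result = c("IR", "S"),
                                 hospital = c("A", "D")))

# Fisher's Exact Test on the 2x2 matrix
result <- fisher.test(counts)

# the p value and the conditional estimate of the odds ratio
result$p.value
result$estimate

# conventional decision rule at a significance level of 0.05
result$p.value < 0.05
```

A matrix is used here because fisher.test() expects a two-dimensional contingency table, which is also why the pipeline above ends with as.matrix().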

@@ -1166,12 +1171,34 @@ Longest: 24

Contents

diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index a7f40a3c..353cc3c4 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index 5718311c..433da064 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 5a2af150..80fdb87d 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index f9ede70a..66135f32 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/index.html b/docs/index.html index 93b69ec0..73a9c432 100644 --- a/docs/index.html +++ b/docs/index.html @@ -236,15 +236,15 @@

Latest released version

This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:

- +

It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.

Latest development version

The latest and unpublished development version can be installed with (precaution: may be unstable):

-
install.packages("devtools")
-devtools::install_gitlab("msberends/AMR")
+
install.packages("devtools")
+devtools::install_gitlab("msberends/AMR")
@@ -284,17 +284,17 @@ Overview of functions

The AMR package basically does four important things:

    -
  1. -

    It cleanses existing data by providing new classes for microoganisms, antibiotics and antimicrobial results (both S/I/R and MIC). With this package, you learn R everything about microbiology that is needed for analysis. These functions all use artificial intelligence to guess results that you would expect:

    +
  2. It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). With this package, you teach R everything about microbiology that is needed for analysis. These functions all use artificial intelligence to guess results that you would expect:
  3. +
  • Use as.mo() to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using artificial intelligence (AI) on the included ITIS data set, consisting of almost 20,000 microorganisms. It is very fast, please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).
  • Use as.rsi() to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.
  • Use as.mic() to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.
  • Use as.atc() to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values “Furabid”, “Furadantin” and “nitro” all return the ATC code of nitrofurantoin.
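
To illustrate how a combined MIC/RSI value such as “<=0.002; S” can yield “<=0.002” on the one hand and “S” on the other, here is a minimal base R sketch. The helper split_mic_rsi() is hypothetical and is not part of the AMR package; it only mimics the behaviour described above with regular expressions, and the real as.mic() and as.rsi() may work differently.

```r
# Hypothetical helper, NOT part of the AMR package
split_mic_rsi <- function(x) {
  # MIC part: an optional operator followed by a number at the start
  mic_pattern <- "^\\s*((<=|>=|<|>)?\\s*[0-9]+(\\.[0-9]+)?).*$"
  mic <- ifelse(grepl(mic_pattern, x, perl = TRUE),
                sub(mic_pattern, "\\1", x, perl = TRUE),
                NA_character_)
  # drop any space between operator and value, e.g. "<= 0.002"
  mic <- gsub("\\s+", "", mic, perl = TRUE)

  # RSI part: a standalone S, I or R at the end of the value
  rsi_pattern <- "^.*?\\b([SIR])\\s*$"
  rsi <- ifelse(grepl(rsi_pattern, x, perl = TRUE),
                sub(rsi_pattern, "\\1", x, perl = TRUE),
                NA_character_)

  data.frame(input = x, mic = mic, rsi = rsi, stringsAsFactors = FALSE)
}

split_mic_rsi(c("<=0.002; S", "<= 0.002", "R"))
```

Values without a recognisable MIC or interpretation are returned as NA in the respective column, so invalid input stays visible instead of being silently dropped.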
- -
  • -

    It enhances existing data and adds new data from data sets included in this package.

    +
      +
    1. It enhances existing data and adds new data from data sets included in this package.
    2. +
    -
  • -
  • -

    It analyses the data with convenient functions that use well-known methods.

    +
      +
    1. It analyses the data with convenient functions that use well-known methods.
    2. +
    -
  • -
  • -

    It teaches the user how to use all the above actions.

    +
      +
    1. It teaches the user how to use all the above actions.
    2. +
    • Aside from this website with many tutorials, the package itself contains extensive help pages with many examples for all functions.
    • It also contains an example data set called septic_patients. This data set contains: @@ -329,8 +329,6 @@
  • - -

    diff --git a/docs/news/index.html b/docs/news/index.html index b23ee952..905a6bcd 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -236,28 +236,13 @@
    • BREAKING: removed deprecated functions, parameters and references to ‘bactid’. Use as.mo() to identify an MO code.
    • -
    • Support for data from WHONET and EARS-Net (European Antimicrobial Resistance Surveillance Network): -
        +
      • Support for data from WHONET and EARS-Net (European Antimicrobial Resistance Surveillance Network):
      • Exported files from WHONET can be read and used in this package. For functions like first_isolate() and eucast_rules(), all parameters will be filled in automatically.
      • This package now knows all antibiotic abbreviations by EARS-Net (which are also being used by WHONET) - the antibiotics data set now contains a column ears_net.
      • -
      -
    • -
    • -

      All ab_* functions are deprecated and replaced by atc_* functions:

      - -These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
    • -
    • New website: https://msberends.gitlab.io/AMR (built with the great pkgdown) -
        +
      • All ab_* functions are deprecated and replaced by atc_* functions: ab_property -> atc_property(), ab_name -> atc_name(), ab_official -> atc_official(), ab_trivial_nl -> atc_trivial_nl(), ab_certe -> atc_certe(), ab_umcg -> atc_umcg(), ab_tradenames -> atc_tradenames(). These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
      • +
      • New website: https://msberends.gitlab.io/AMR (built with the great pkgdown)
      • Contains the complete manual of this package and all of its functions with an explanation of their parameters
      • Contains a comprehensive tutorial about how to conduct antimicrobial resistance analysis
      • -
      -
    • New functions set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
    • Support for the upcoming dplyr version 0.8.0
    • New function guess_ab_col() to find an antibiotic column in a table
    • @@ -265,24 +250,11 @@ These functions use as.atc()
    • New function mo_renamed() to get a list of all returned values from as.mo() that have had taxonomic renaming
    • New function age() to calculate the (patient's) age in years
    • New function age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
    • -
    • -

      New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

      -
      x <- resistance_predict(septic_patients, col_ab = "amox")
      -plot(x)
      -ggplot_rsi_predict(x)
      +
    • New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

      x <- resistance_predict(septic_patients, col_ab = "amox")
      plot(x)
      ggplot_rsi_predict(x)
    • -
    • -

      Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and fasten filtering on data sets with antimicrobial results, e.g.:

      - -

      is equal to:

      -
      septic_patients %>%
      -  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
      -  filter(only_firsts == TRUE) %>%
      -  select(-only_firsts)
      +
    • Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and speed up filtering on data sets with antimicrobial results, e.g.:

      septic_patients %>% filter_first_isolate(...)
      # or
      filter_first_isolate(septic_patients, ...)

      is equal to:

      septic_patients %>%
        mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
        filter(only_firsts == TRUE) %>%
        select(-only_firsts)
    • -
    • New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier readable) on our website: https://msberends.gitlab.io/AMR.

    • +
    • New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.

    @@ -294,16 +266,12 @@ These functions use as.atc()
  • Functions atc_ddd() and atc_groups() have been renamed atc_online_ddd() and atc_online_groups(). The old functions are deprecated and will be removed in a future version.
  • Function guess_mo() is now deprecated in favour of as.mo() and will be removed in future versions
  • Function guess_atc() is now deprecated in favour of as.atc() and will be removed in future versions
  • -
  • Function eucast_rules(): -
      +
    • Function eucast_rules():
    • Updated EUCAST Clinical breakpoints to version 9.0 of 1 January 2019
    • Fixed a critical bug where some rules that depend on previous applied rules would not be applied adequately
    • Emphasised in manual that penicillin is meant as benzylpenicillin (ATC J01CE01)
    • -
    -
  • -
  • Improvements for as.mo(): -
      +
    • Improvements for as.mo():
    • Fix for vector containing only empty values
    • Finds better results when input is in other languages
    • Better handling for subspecies
    • @@ -314,17 +282,12 @@ These functions use as.atc()
    • Progress bar will be shown when it takes more than 3 seconds to get results
    • Support for formatted console text
    • Console will return the percentage of uncoercable input
    • -
    -
  • -
  • Function first_isolate(): -
      +
    • Function first_isolate():
    • Fixed a bug where distances between dates would not be calculated right - in the septic_patients data set this yielded a difference of 0.15% more isolates
    • Will now use a column named like “patid” for the patient ID (parameter col_patientid), when this parameter was left blank
    • Will now use a column named like “key(…)ab” or “key(…)antibiotics” for the key antibiotics (parameter col_keyantibiotics()), when this parameter was left blank
    • Removed parameter output_logical, the function will now always return a logical value
    • Renamed parameter filter_specimen to specimen_group, although using filter_specimen will still work
    • -
    -
  • A note to the manual pages of the portion functions, that low counts can influence the outcome and that the portion functions may camouflage this, since they only return the portion (albeit being dependent on the minimum parameter)
  • Merged data sets microorganisms.certe and microorganisms.umcg into microorganisms.codes
  • @@ -337,23 +300,22 @@ These functions use as.atc()
  • Small text updates to summaries of class rsi and mic
  • -
  • Frequency tables (freq() function): - -
  • Function scale_y_percent() now contains the limits parameter
  • Automatic parameter filling for mdro(), key_antibiotics() and eucast_rules()
  • Updated examples for resistance prediction (resistance_predict() function)
  • -
  • Fix for as.mic() to support more values ending in (several) zeroes
  • +
  • Fix for as.mic() to support more values ending in (several) zeroes

  • @@ -409,8 +369,7 @@ These functions use as.atc()
  • EUCAST_rules was renamed to eucast_rules, the old function still exists as a deprecated function
  • -
  • Big changes to the eucast_rules function: -
      +
    • Big changes to the eucast_rules function:
    • Now also applies rules from the EUCAST ‘Breakpoint tables for bacteria’, version 8.1, 2018, http://www.eucast.org/clinical_breakpoints/ (see Source of the function)
    • New parameter rules to specify which rules should be applied (expert rules, breakpoints, others or all)
    • New parameter verbose which can be set to TRUE to get very specific messages about which columns and rows were affected
    • @@ -419,18 +378,11 @@ These functions use as.atc()
    • Data set septic_patients now reflects these changes
    • Added parameter pipe for piperacillin (J01CA12), also to the mdro function
    • Small fixes to EUCAST clinical breakpoint rules
    • -
    -
  • Added column kingdom to the microorganisms data set, and function mo_kingdom to look up values
  • Tremendous speed improvement for as.mo (and subsequently all mo_* functions), as empty values will be ignored a priori
  • Fewer than 3 characters as input for as.mo will return NA
  • -
  • -

    Function as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached

    -
    as.mo("E. species")        # B_ESCHR
    -mo_fullname("E. spp.")     # "Escherichia species"
    -as.mo("S. spp")            # B_STPHY
    -mo_fullname("S. species")  # "Staphylococcus species"
    +
  • Function as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached:

    as.mo("E. species")        # B_ESCHR
    mo_fullname("E. spp.")     # "Escherichia species"
    as.mo("S. spp")            # B_STPHY
    mo_fullname("S. species")  # "Staphylococcus species"
  • Added parameter combine_IR (TRUE/FALSE) to functions portion_df and count_df, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
  • Fix for portion_*(..., as_percent = TRUE) when minimal number of isolates would not be met
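
The effect of the combine_IR parameter described above can be sketched in base R. This is an illustration only, assuming a hypothetical vector of test results; it does not call count_df or portion_df and makes no claim about how they are implemented.

```r
# Hypothetical antimicrobial test results
results <- c("S", "I", "R", "S", "S", "R", "I", "S")

# merge I and R into one category, so only S vs. IR remains
combined <- ifelse(results %in% c("I", "R"), "IR", results)

# counts of susceptible vs. non-susceptible
counts <- table(factor(combined, levels = c("S", "IR")))
counts

# the same as proportions
prop.table(counts)
```

Using a factor with fixed levels keeps both categories in the output, even when one of them has a count of zero.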
  • @@ -439,19 +391,18 @@ These functions use as.atc()
  • Using portion_* functions now throws a warning when total available isolate is below parameter minimum
  • Functions as.mo, as.rsi, as.mic, as.atc and freq will not set package name as attribute anymore
  • -
  • Frequency tables - freq(): - -
  • first_isolate now tries to find columns to use as input when parameters are left blank
  • Improvements for MDRO algorithm (function mdro)
  • @@ -475,8 +424,7 @@ These functions use as.atc()
  • ggplot_rsi and scale_y_percent have breaks parameter
  • -
  • AI improvements for as.mo: -
      +
    • AI improvements for as.mo:
    • "CRS" -> Stenotrophomonas maltophilia
    • @@ -489,8 +437,6 @@ These functions use as.atc()
    • "MSSE" -> Staphylococcus epidermidis
    • -
    -
  • Fix for join functions
  • Speed improvement for is.rsi.eligible, now 15-20 times faster
  • In g.test, when sum(x) is below 1000 or any of the expected values is below 5, Fisher’s Exact Test will be suggested
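
For readers unfamiliar with the G-test, the following base R sketch shows the statistic for a contingency table, together with the rule of thumb from the note above. The function g_test_sketch() is hypothetical and is not the g.test function of this package; the counts are made up for illustration.

```r
# Hypothetical helper, NOT the g.test() function of the AMR package
g_test_sketch <- function(x) {
  x <- as.matrix(x)
  # expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(x), colSums(x)) / sum(x)
  # cells with an observed count of 0 contribute nothing to the sum
  nonzero <- x > 0
  g <- 2 * sum(x[nonzero] * log(x[nonzero] / expected[nonzero]))
  df <- (nrow(x) - 1) * (ncol(x) - 1)
  list(statistic = g,
       df = df,
       # G is compared against a chi-squared distribution
       p.value = pchisq(g, df = df, lower.tail = FALSE),
       # rule of thumb: small totals or small expected values
       suggest_fisher = sum(x) < 1000 || any(expected < 5))
}

# hypothetical 2x2 table of counts
counts <- matrix(c(30, 10,
                   10, 30), nrow = 2)
g_test_sketch(counts)
```

With a total of only 80 observations, the sketch flags that Fisher's Exact Test would be the more appropriate choice for this table.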
  • @@ -519,8 +465,7 @@ These functions use as.atc() New
    • The data set microorganisms now contains all microbial taxonomic data from ITIS (kingdoms Bacteria, Fungi and Protozoa), the Integrated Taxonomy Information System, available via https://itis.gov. The data set now contains more than 18,000 microorganisms with all known bacteria, fungi and protozoa according to ITIS with genus, species, subspecies, family, order, class, phylum and subkingdom. The new data set microorganisms.old contains all previously known taxonomic names from those kingdoms.
    • -
    • New functions based on the existing function mo_property: -
        +
      • New functions based on the existing function mo_property:
      • Taxonomic names: mo_phylum, mo_class, mo_order, mo_family, mo_genus, mo_species, mo_subspecies
      • Semantic names: mo_fullname, mo_shortname @@ -530,52 +475,22 @@ These functions use as.atc()
      • Author and year: mo_ref
      -

      They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

      -
      mo_gramstain("E. coli")
      -# [1] "Gram negative"
      -mo_gramstain("E. coli", language = "de") # German
      -# [1] "Gramnegativ"
      -mo_gramstain("E. coli", language = "es") # Spanish
      -# [1] "Gram negativo"
      -mo_fullname("S. group A", language = "pt") # Portuguese
      -# [1] "Streptococcus grupo A"
      -

      Furthermore, former taxonomic names will give a note about the current taxonomic name:

      - -
    • -
    • Functions count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates +

      They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

      mo_gramstain("E. coli")
      # [1] "Gram negative"
      mo_gramstain("E. coli", language = "de") # German
      # [1] "Gramnegativ"
      mo_gramstain("E. coli", language = "es") # Spanish
      # [1] "Gram negativo"
      mo_fullname("S. group A", language = "pt") # Portuguese
      # [1] "Streptococcus grupo A"

      +

      Furthermore, former taxonomic names will give a note about the current taxonomic name:

      mo_gramstain("Esc blattae")
      # Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
      # [1] "Gram negative"

        +
      • Functions count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates
      • Extra function count_df (which works like portion_df) to get all counts of S, I and R of a data set with antibiotic columns, with support for grouped variables
      • -
      -
    • Function is.rsi.eligible to check for columns that have valid antimicrobial results, but do not have the rsi class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
    • -
    • -

      Functions as.mo and is.mo as replacements for as.bactid and is.bactid (since the microoganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using Artificial Intelligence (AI):

      - -

      And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:

      - +
    • Functions as.mo and is.mo as replacements for as.bactid and is.bactid (since the microorganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using Artificial Intelligence (AI):

      as.mo("E. coli")
      # [1] B_ESCHR_COL
      as.mo("MRSA")
      # [1] B_STPHY_AUR
      as.mo("S group A")
      # [1] B_STRPTC_GRA

      And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:

      thousands_of_E_colis <- rep("E. coli", 25000)
      microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s")
      # Unit: seconds
      #         min     median        max neval
      #  0.01817717 0.01843957 0.03878077   100
    • Added parameter reference_df for as.mo, so users can supply their own microbial IDs, name or codes as a reference table
    • -
    • Renamed all previous references to bactid to mo, like: -
        +
      • Renamed all previous references to bactid to mo, like:
      • Column names inputs of EUCAST_rules, first_isolate and key_antibiotics
      • Column names of datasets microorganisms and septic_patients
      • All old syntaxes will still work with this version, but will throw warnings
      • -
      -
    • Function labels_rsi_count to print datalabels on a RSI ggplot2 model
    • Functions as.atc and is.atc to transform/look up antibiotic ATC codes as defined by the WHO. The existing function guess_atc is now an alias of as.atc.

    • Function ab_property and its aliases: ab_name, ab_tradenames, ab_certe, ab_umcg and ab_trivial_nl @@ -590,14 +505,7 @@ These functions use as.atc() Changed
      • Added three antimicrobial agents to the antibiotics data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
      • -
      • -

        Added 163 trade names to the antibiotics data set, it now contains 298 different trade names in total, e.g.:

        -
        ab_official("Bactroban")
        -# [1] "Mupirocin"
        -ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
        -# [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin"
        -ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
        -# [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
        +
      • Added 163 trade names to the antibiotics data set, it now contains 298 different trade names in total, e.g.:

        ab_official("Bactroban")
        # [1] "Mupirocin"
        ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
        # [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin"
        ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
        # [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
      • For first_isolate, rows will be ignored when there’s no species available
      • Function ratio is now deprecated and will be removed in a future release, as it is not really the scope of this package
      • @@ -606,36 +514,9 @@ These functions use as.atc()
      • Added prevalence column to the microorganisms data set
      • Added parameters minimum and as_percent to portion_df
      • -
      • -

        Support for quasiquotation in the functions series count_* and portions_*, and n_rsi. This allows to check for more than 2 vectors or columns.

        - -
      • -
      • Edited ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df at default, but can be set to count_df.
      • -
      • Fix for ggplot_rsi when the ggplot2 package was not loaded
      • -
      • Added datalabels function labels_rsi_count to ggplot_rsi -
      • -
      • Added possibility to set any parameter to geom_rsi (and ggplot_rsi) so you can set your own preferences
      • -
      • Fix for joins, where predefined suffices would not be honoured
      • -
      • Added parameter quote to the freq function
      • -
      • Added generic function diff for frequency tables
      • -
      • Added longest en shortest character length in the frequency table (freq) header of class character -
      • -
      • -

        Support for types (classes) list and matrix for freq

        -
        my_matrix = with(septic_patients, matrix(c(age, gender), ncol = 2))
        -freq(my_matrix)
        -

        For lists, subsetting is possible:

        -
        my_list = list(age = septic_patients$age, gender = septic_patients$gender)
        -my_list %>% freq(age)
        -my_list %>% freq(gender)
        -
      • +
      • Support for quasiquotation in the function series count_* and portion_*, and n_rsi. This allows checking more than 2 vectors or columns.

        septic_patients %>% select(amox, cipr) %>% count_IR()
        # which is the same as:
        septic_patients %>% count_IR(amox, cipr)

        septic_patients %>% portion_S(amcl)
        septic_patients %>% portion_S(amcl, gent)
        septic_patients %>% portion_S(amcl, gent, pita)

      • Edited ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df at default, but can be set to count_df.
      • Fix for ggplot_rsi when the ggplot2 package was not loaded
      • Added datalabels function labels_rsi_count to ggplot_rsi
      • Added possibility to set any parameter to geom_rsi (and ggplot_rsi) so you can set your own preferences
      • Fix for joins, where predefined suffixes would not be honoured
      • Added parameter quote to the freq function
      • Added generic function diff for frequency tables
      • Added longest and shortest character length in the frequency table (freq) header of class character
      • Support for types (classes) list and matrix for freq

        my_matrix = with(septic_patients, matrix(c(age, gender), ncol = 2))
        freq(my_matrix)

        For lists, subsetting is possible:

        my_list = list(age = septic_patients$age, gender = septic_patients$gender)
        my_list %>% freq(age)
        my_list %>% freq(gender)

    @@ -654,21 +535,15 @@ These functions use as.atc() New

    • -BREAKING: rsi_df was removed in favour of new functions portion_R, portion_IR, portion_I, portion_SI and portion_S to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi function. The old function still works, but is deprecated. -
        +BREAKING: rsi_df was removed in favour of new functions portion_R, portion_IR, portion_I, portion_SI and portion_S to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi function. The old function still works, but is deprecated.
      • New function portion_df to get all portions of S, I and R of a data set with antibiotic columns, with support for grouped variables
      • -
      -
    • -BREAKING: the methodology for determining first weighted isolates was changed. The antibiotics that are compared between isolates (call key antibiotics) to include more first isolates (afterwards called first weighted isolates) are now as follows: -
+BREAKING: the methodology for determining first weighted isolates was changed. The antibiotics that are compared between isolates (called key antibiotics) to include more first isolates (afterwards called first weighted isolates) are now as follows:
      • Universal: amoxicillin, amoxicillin/clavulanic acid, cefuroxime, piperacillin/tazobactam, ciprofloxacin, trimethoprim/sulfamethoxazole
      • Gram-positive: vancomycin, teicoplanin, tetracycline, erythromycin, oxacillin, rifampicin
      • Gram-negative: gentamicin, tobramycin, colistin, cefotaxime, ceftazidime, meropenem
      • -
      -
    • Support for ggplot2 -
        +
      • New functions geom_rsi, facet_rsi, scale_y_percent, scale_rsi_colours and theme_rsi
      • New wrapper function ggplot_rsi to apply all above functions on a data set: @@ -679,32 +554,22 @@ These functions use as.atc()
    • -
    - -
  • Determining bacterial ID: -
      +
    • Determining bacterial ID:
    • New functions as.bactid and is.bactid to transform/ look up microbial ID’s.
    • The existing function guess_bactid is now an alias of as.bactid
    • New Becker classification for Staphylococcus to categorise them into Coagulase Negative Staphylococci (CoNS) and Coagulase Positive Staphylococci (CoPS)
    • New Lancefield classification for Streptococcus to categorise them into Lancefield groups
    • -
    -
  • For convenience, new descriptive statistical functions kurtosis and skewness that are lacking in base R - they are generic functions and have support for vectors, data.frames and matrices
  • Function g.test to perform the Χ2 distributed G-test, which is used in the same way as chisq.test
  • -
  • -Function ratio to transform a vector of values to a preset ratio - -
  • Support for Addins menu in RStudio to quickly insert %in% or %like% (and give them keyboard shortcuts), or to view the datasets that come with this package
  • Function p.symbol to transform p values to their related symbols: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Functions clipboard_import and clipboard_export as helper functions to quickly copy and paste from/to software like Excel and SPSS. These functions use the clipr package, but are a little altered to also support headless Linux servers (so you can use it in RStudio Server)
  • -
  • New for frequency tables (function freq): -
      +
    • New for frequency tables (function freq):
    • A vignette to explain its usage
    • Support for rsi (antimicrobial resistance) to use as input
    • Support for table to use as input: freq(table(x, y)) @@ -719,8 +584,6 @@ These functions use as.atc()
    • Header of frequency tables now also shows Mean Absolute Deviation (MAD) and Interquartile Range (IQR)
    • Possibility to globally set the default for the amount of items to print, with options(max.print.freq = n) where n is your preset value
    -
  • -
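
The symbol legend listed above for p.symbol (0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1) can be reproduced in base R with cut(). The helper p_to_symbol() below is hypothetical and not the package's own function; in particular, treating each threshold as the upper, inclusive bound of its interval is an assumption.

```r
# Hypothetical helper, NOT the p.symbol() function of the AMR package
p_to_symbol <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   # assumption: intervals are closed on the right,
                   # and a p value of exactly 0 falls in the first one
                   include.lowest = TRUE))
}

p_to_symbol(c(0.0005, 0.03104, 0.2))
```

Values outside the range from 0 to 1 are returned as NA, which makes invalid p values easy to spot.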

    @@ -742,27 +605,21 @@ These functions use as.atc()
  • Small improvements to the microorganisms dataset (especially for Salmonella) and the column bactid now has the new class "bactid"
  • -
  • Combined MIC/RSI values will now be coerced by the rsi and mic functions: - -
  • Now possible to coerce MIC values with a space between operator and value, i.e. as.mic("<= 0.002") now works
  • Classes rsi and mic do not add the attribute package.version anymore
  • Added "groups" option for atc_property(..., property). It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups is a convenient wrapper around this.
  • Built-in host check for atc_property as it requires the host set by url to be responsive
  • Improved first_isolate algorithm to exclude isolates where bacteria ID or genus is unavailable
  • Fix for warning hybrid evaluation forced for row_number (924b62) from the dplyr package v0.7.5 and above
  • -
  • Support for empty values and for 1 or 2 columns as input for guess_bactid (now called as.bactid) -
      +
    • Support for empty values and for 1 or 2 columns as input for guess_bactid (now called as.bactid)
    • So yourdata %>% select(genus, species) %>% as.bactid() now also works
    • -
    -
  • Other small fixes
  • @@ -770,14 +627,11 @@ These functions use as.atc()

    Other

    @@ -792,17 +646,14 @@ These functions use as.atc()
    • Full support for Windows, Linux and macOS
    • Full support for old R versions, only R-3.0.0 (April 2013) or later is needed (needed packages may have other dependencies)
    • -
    • Function n_rsi to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise, see ?rsi
    • +
    • Function n_rsi to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise, see ?rsi
    • Function guess_bactid to determine the ID of a microorganism based on genus/species or known abbreviations like MRSA
    • Function guess_atc to determine the ATC of an antibiotic based on name, trade name, or known abbreviations
    • Function freq to create frequency tables, with additional info in a header
    • -
    • Function MDRO to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines. -
        +
      • Function MDRO to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.
      • Exceptional resistances defined by EUCAST are also supported instead of countries alone
      • Functions BRMO and MRGN are wrappers for Dutch and German guidelines, respectively
      • -
      -
    • New algorithm to determine weighted isolates, can now be "points" or "keyantibiotics", see ?first_isolate
    • New print format for tibbles and data.tables
    • diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index ac0898dd..6102d6c7 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -1,4 +1,4 @@ -pandoc: 2.3.1 +pandoc: 1.17.2 pkgdown: 1.3.0 pkgdown_sha: ~ articles: diff --git a/vignettes/AMR.Rmd b/vignettes/AMR.Rmd index 317231c7..56a16434 100755 --- a/vignettes/AMR.Rmd +++ b/vignettes/AMR.Rmd @@ -25,7 +25,7 @@ knitr::opts_chunk$set( **Note:** values on this page will change with every website update since they are based on randomly created values and the page was written in [RMarkdown](https://rmarkdown.rstudio.com/). However, the methodology remains unchanged. This page was generated on `r format(Sys.Date(), "%d %B %Y")`. -## Introduction +# Introduction For this tutorial, we will create fake demonstration data to work with. @@ -54,12 +54,12 @@ library(AMR) # install.packages(c("tidyverse", "AMR")) ``` -## Creation of data +# Creation of data We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs). With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too. -#### Patients +## Patients To start with patients, we need a unique list of patients. ```{r create patients} @@ -76,7 +76,7 @@ patients_table <- data.frame(patient_id = patients, The first 135 patient IDs are now male, the other 125 are female. -#### Dates +## Dates Let's pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018.
```{r create dates} @@ -93,7 +93,7 @@ bacteria <- c("Escherichia coli", "Staphylococcus aureus", "Streptococcus pneumoniae", "Klebsiella pneumoniae") ``` -#### Other variables +## Other variables For completeness, we can also add the hospital where the patient was admitted and we need to define valid antimicrobial results for our randomisation: ```{r create other} @@ -101,7 +101,7 @@ hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D") ab_interpretations <- c("S", "I", "R") ``` -#### Put everything together +## Put everything together Using the `sample()` function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the `prob` parameter. @@ -134,7 +134,7 @@ knitr::kable(head(data), align = "c") Now, let's start the cleaning and the analysis! -## Cleaning the data +# Cleaning the data Use the frequency table function `freq()` to look specifically for unique values in any variable. For example, for the `gender` variable: ```{r freq gender 1, eval = FALSE} @@ -168,7 +168,7 @@ Because the amoxicillin (column `amox`) and amoxicillin/clavulanic acid (column data <- eucast_rules(data, col_mo = "bacteria") ``` -## Adding new variables +# Adding new variables Now that we have the microbial ID, we can add some taxonomic properties: ```{r new taxo} @@ -178,7 +178,7 @@ data <- data %>% species = mo_species(bacteria)) ``` -### First isolates +## First isolates We also need to know which isolates we can *actually* use for analysis. To conduct an analysis of antimicrobial resistance, you must [only include the first isolate of every patient per episode](https://www.ncbi.nlm.nih.gov/pubmed/17304462) (Hindler *et al.*, Clin Infect Dis. 2007). If you do not do this, you could easily get an overestimate or underestimate of the resistance of an antibiotic.
Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all *S. aureus* isolates would be overestimated, because you included this MRSA more than once. It would clearly be [selection bias](https://en.wikipedia.org/wiki/Selection_bias). @@ -194,7 +194,7 @@ data <- data %>% mutate(first = first_isolate(.)) ``` -So only `r AMR:::percent(sum(data$first) / nrow(data))` is suitable for resistance analysis! We can now filter on is with the `filter()` function, also from the `dplyr` package: +So only `r AMR:::percent(sum(data$first) / nrow(data))` is suitable for resistance analysis! We can now filter on it with the `filter()` function, also from the `dplyr` package: ```{r 1st isolate filter} data_1st <- data %>% @@ -207,7 +207,7 @@ data_1st <- data %>% filter_first_isolate() ``` -### First *weighted* isolates +## First *weighted* isolates We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Imagine this data, sorted on date: ```{r, echo = FALSE, message = FALSE, warning = FALSE, results = 'asis'} @@ -226,7 +226,7 @@ weighted_df %>% knitr::kable(align = "c") ``` -Only `r sum(weighted_df$first)` isolates are marked as 'first' according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and show be included too. This is why we weigh isolates, based on their antibiogram. The `key_antibiotics()` function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user. +Only `r sum(weighted_df$first)` isolates are marked as 'first' according to CLSI guideline.
But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The `key_antibiotics()` function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user. If a column exists with a name like 'key(...)ab' the `first_isolate()` function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output: @@ -252,7 +252,7 @@ weighted_df2 %>% knitr::kable(align = "c") ``` -Instead of `r sum(weighted_df$first)`, now `r sum(weighted_df2$first_weighted)` isolates are flagged. In total, `r AMR:::percent(sum(data$first_weighted) / nrow(data))` of all isolates are marked 'first weighted' - `r AMR:::percent((sum(data$first_weighted) / nrow(data)) -- (sum(data$first) / nrow(data)))` more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline. +Instead of `r sum(weighted_df$first)`, now `r sum(weighted_df2$first_weighted)` isolates are flagged. In total, `r AMR:::percent(sum(data$first_weighted) / nrow(data))` of all isolates are marked 'first weighted' - `r AMR:::percent((sum(data$first_weighted) / nrow(data)) - (sum(data$first) / nrow(data)))` more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline. As with `filter_first_isolate()`, there's a shortcut for this new algorithm too: ```{r 1st isolate filter 3, results = 'hide', message = FALSE, warning = FALSE} @@ -280,8 +280,9 @@ knitr::kable(head(data_1st), align = "c") Time for the analysis! -## Analysing the data +# Analysing the data You might want to start by getting an idea of how the data is distributed. It's an important start, because it also decides how you will continue your analysis. 
+ ## Dispersion of species To just get an idea how the species are distributed, create a frequency table with our `freq()` function. We created the `genus` and `species` column earlier based on the microbial ID. With `paste()`, we can concatenate them together. @@ -301,7 +302,7 @@ data_1st %>% freq(genus, species, header = TRUE) ``` -### Resistance percentages +## Resistance percentages The functions `portion_R`, `portion_RI`, `portion_I`, `portion_IS` and `portion_S` can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own: @@ -371,7 +372,7 @@ data_1st %>% geom_col(position = "dodge2") ``` -### Plots +## Plots To show results in plots, most R users would nowadays use the `ggplot2` package. This package lets you create plots in layers. You can read more about it [on their website](https://ggplot2.tidyverse.org/). A quick example would look like these syntaxes: ```{r plot 2, eval = FALSE} @@ -433,7 +434,7 @@ data_1st %>% coord_flip() ``` -### Using an independence test to compare resistance +## Independence test The next example uses the included `septic_patients`, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This `data.frame` can be used to practice AMR analysis.