diff --git a/DESCRIPTION b/DESCRIPTION index 3298eaede..3b568c517 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: AMR -Version: 1.3.0.9000 +Version: 1.3.0.9001 Date: 2020-08-10 Title: Antimicrobial Resistance Analysis Authors@R: c( diff --git a/NEWS.md b/NEWS.md index 034004e4f..4fbfc5953 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,4 +1,4 @@ -# AMR 1.3.0.9000 +# AMR 1.3.0.9001 ## Last updated: 10 August 2020 ### Changed diff --git a/R/rsi.R b/R/rsi.R index db5718aa7..371b4baa1 100755 --- a/R/rsi.R +++ b/R/rsi.R @@ -110,7 +110,7 @@ #' library(dplyr) #' df %>% mutate_at(vars(AMP:TOB), as.rsi) #' df %>% mutate(across(AMP:TOB), as.rsi) - +#' #' df %>% #' mutate_at(vars(AMP:TOB), as.rsi, mo = "E. coli") #' diff --git a/docs/404.html b/docs/404.html index 1fc44158d..ec81be3ce 100644 --- a/docs/404.html +++ b/docs/404.html @@ -81,7 +81,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -248,7 +248,7 @@ Content not found. Please use links in the navbar.
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index ff5e96566..3c51005c9 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -81,7 +81,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -496,7 +496,7 @@ END OF TERMS AND CONDITIONS
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index aa04f346b..c2eb21d46 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/AMR.Rmd @@ -195,7 +195,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 30 July 2020.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 10 August 2020.

Introduction

@@ -226,21 +226,21 @@ -2020-07-30 +2020-08-10 abcd Escherichia coli S S -2020-07-30 +2020-08-10 abcd Escherichia coli S R -2020-07-30 +2020-08-10 efgh Escherichia coli R @@ -253,13 +253,15 @@ Needed R packages

As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by RStudio. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.

We will also use the cleaner package, that can be used for cleaning data and creating frequency tables.

-
library(dplyr)
-library(ggplot2)
-library(AMR)
-library(cleaner)
+
+library(dplyr)
+library(ggplot2)
+library(AMR)
+library(cleaner)
 
 # (if not yet installed, install with:)
-# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner"))
+# install.packages(c("dplyr", "ggplot2", "AMR", "cleaner")) +
@@ -271,57 +273,73 @@

Patients

To start with patients, we need a unique list of patients.

-
patients <- unlist(lapply(LETTERS, paste0, 1:10))
+
+patients <- unlist(lapply(LETTERS, paste0, 1:10))
+

The LETTERS object is available in R - it’s a vector with 26 characters: A to Z. The patients object we just created is now a vector of length 260, with values (patient IDs) varying from A1 to Z10. Now we we also set the gender of our patients, by putting the ID and the gender in a table:

-
patients_table <- data.frame(patient_id = patients,
-                             gender = c(rep("M", 135),
-                                        rep("F", 125)))
+
+patients_table <- data.frame(patient_id = patients,
+                             gender = c(rep("M", 135),
+                                        rep("F", 125)))
+

The first 135 patient IDs are now male, the other 125 are female.

Dates

Let’s pretend that our data consists of blood cultures isolates from between 1 January 2010 and 1 January 2018.

-
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
+
+dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
+

This dates object now contains all days in our date range.

Microorganisms

For this tutorial, we will uses four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:

-
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
-              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+
+bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+

Other variables

For completeness, we can also add the hospital where the patients was admitted and we need to define valid antibmicrobial results for our randomisation:

-
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
-ab_interpretations <- c("S", "I", "R")
+
+hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
+ab_interpretations <- c("S", "I", "R")
+

Put everything together

Using the sample() function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob parameter.

-
sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
-                   patient_id = sample(patients, size = sample_size, replace = TRUE),
-                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
-                                     prob = c(0.30, 0.35, 0.15, 0.20)),
-                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
-                                     prob = c(0.50, 0.25, 0.15, 0.10)),
-                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.60, 0.05, 0.35)),
-                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.75, 0.10, 0.15)),
-                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.80, 0.00, 0.20)),
-                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.92, 0.00, 0.08)))
+
+sample_size <- 20000
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+                   patient_id = sample(patients, size = sample_size, replace = TRUE),
+                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
+                                     prob = c(0.30, 0.35, 0.15, 0.20)),
+                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+                                     prob = c(0.50, 0.25, 0.15, 0.10)),
+                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.60, 0.05, 0.35)),
+                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.75, 0.10, 0.15)),
+                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.80, 0.00, 0.20)),
+                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.92, 0.00, 0.08)))
+

Using the left_join() function from the dplyr package, we can ‘map’ the gender to the patient ID using the patients_table object we created earlier:

-
data <- data %>% left_join(patients_table)
+
+data <- data %>% left_join(patients_table)
+

The resulting data set contains 20,000 blood culture isolates. With the head() function we can preview the first 6 rows of this data set:

-
head(data)
+
+head(data)
+
@@ -336,70 +354,70 @@ - - - + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + - - + + - - - - - - - - - - - - + + + + + + + + + + + + - - - + + + - - + +
date
2017-06-10J6Hospital B2014-08-12M9Hospital DStaphylococcus aureusRSSSM
2013-03-13W1Hospital A Escherichia coli IRSSM
2013-08-26H10Hospital AEscherichia coliSSRSM
2016-02-09S7Hospital DEscherichia coliR S R S F
2015-12-02L5Hospital DStaphylococcus aureusRSRSM
2013-06-27A52017-09-11O7 Hospital BEscherichia coliSSSSM
2017-03-03Q4Hospital DEscherichia coliStreptococcus pneumoniae R S S S F
2014-06-03K4Hospital CStreptococcus pneumoniaeSISSM
2015-07-13I5Hospital A2010-07-19W4Hospital B Staphylococcus aureus S SR SMSF
@@ -411,7 +429,9 @@ Cleaning the data

We also created a package dedicated to data cleaning and checking, called the cleaner package. It freq() function can be used to create frequency tables.

For example, for the gender variable:

-
data %>% freq(gender)
+
+data %>% freq(gender)
+

Frequency table

Class: character
Length: 20,000
@@ -432,16 +452,16 @@ Longest: 1

1 M -10,312 -51.56% -10,312 -51.56% +10,386 +51.93% +10,386 +51.93% 2 F -9,688 -48.44% +9,614 +48.07% 20,000 100.00% @@ -449,23 +469,31 @@ Longest: 1

So, we can draw at least two conclusions immediately. From a data scientists perspective, the data looks clean: only values M and F. From a researchers perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

-
data <- data %>%
-  mutate(bacteria = as.mo(bacteria))
+
+data <- data %>%
+  mutate(bacteria = as.mo(bacteria))
+

We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi() function ensures reliability and reproducibility in these kind of variables. The mutate_at() will run the as.rsi() function on defined variables:

-
data <- data %>%
-  mutate_at(vars(AMX:GEN), as.rsi)
+
+data <- data %>%
+  mutate_at(vars(AMX:GEN), as.rsi)
+

Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules() function can also apply additional rules, like forcing ampicillin = R when amoxicillin/clavulanic acid = R.

Because the amoxicillin (column AMX) and amoxicillin/clavulanic acid (column AMC) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules() fixes this:

-
data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
+
+data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
+

Adding new variables

Now that we have the microbial ID, we can add some taxonomic properties:

-
data <- data %>%
-  mutate(gramstain = mo_gramstain(bacteria),
-         genus = mo_genus(bacteria),
-         species = mo_species(bacteria))
+
+data <- data %>% 
+  mutate(gramstain = mo_gramstain(bacteria),
+         genus = mo_genus(bacteria),
+         species = mo_species(bacteria))
+

First isolates

@@ -476,22 +504,28 @@ Longest: 1

(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4

This AMR package includes this methodology with the first_isolate() function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:

-
data <- data %>%
-  mutate(first = first_isolate(.))
+
+data <- data %>% 
+  mutate(first = first_isolate(.))
 # NOTE: Using column `bacteria` as input for `col_mo`.
 # NOTE: Using column `date` as input for `col_date`.
-# NOTE: Using column `patient_id` as input for `col_patient_id`.
-

So only 28.2% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

-
data_1st <- data %>%
-  filter(first == TRUE)
+# NOTE: Using column `patient_id` as input for `col_patient_id`. +
+

So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+
+data_1st <- data %>% 
+  filter(first == TRUE)
+

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

-
data_1st <- data %>%
-  filter_first_isolate()
+
+data_1st <- data %>% 
+  filter_first_isolate()
+

First weighted isolates

-

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient Q3, sorted on date:

+

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient A7, sorted on date:

@@ -507,8 +541,8 @@ Longest: 1

- - + + @@ -518,10 +552,10 @@ Longest: 1

- - + + - + @@ -529,19 +563,30 @@ Longest: 1

- - + + - + - - + + + + + + + + + + + + + @@ -549,10 +594,21 @@ Longest: 1

+ + + + + + + + + + + - - - + + + @@ -561,53 +617,31 @@ Longest: 1

- - - + + + - - - - - - - - - - - - + - - - - - - - - - - - - - + + - + - - + + @@ -619,15 +653,17 @@ Longest: 1

isolate
12010-02-06Q32010-02-03A7 B_ESCHR_COLI S S
22010-04-09Q32010-02-04A7 B_ESCHR_COLISR S S S
32010-07-03Q32010-04-05A7 B_ESCHR_COLI S SRS S FALSE
42010-07-31Q32010-06-15A7B_ESCHR_COLISSSSFALSE
52010-08-28A7 B_ESCHR_COLI S S R FALSE
62010-09-03A7B_ESCHR_COLISSRSFALSE
52010-10-26Q372010-11-14A7 B_ESCHR_COLI S S FALSE
62010-12-11Q382011-02-14A7 B_ESCHR_COLISSSRFALSE
72011-03-20Q3B_ESCHR_COLIRI S S S TRUE
82011-04-08Q3B_ESCHR_COLISSSSFALSE
92011-04-24Q32011-03-06A7 B_ESCHR_COLI R SSR S FALSE
102011-05-08Q32011-03-09A7 B_ESCHR_COLI S S

Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

If a column exists with a name like ‘key(…)ab’ the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:

-
data <- data %>%
-  mutate(keyab = key_antibiotics(.)) %>%
-  mutate(first_weighted = first_isolate(.))
+
+data <- data %>% 
+  mutate(keyab = key_antibiotics(.)) %>% 
+  mutate(first_weighted = first_isolate(.))
 # NOTE: Using column `bacteria` as input for `col_mo`.
 # NOTE: more than one result was found for item 1: amoxicillin/clavulanic acid, azidocillin
 # NOTE: Using column `bacteria` as input for `col_mo`.
 # NOTE: Using column `date` as input for `col_date`.
 # NOTE: Using column `patient_id` as input for `col_patient_id`.
-# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
+# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this. +
@@ -644,8 +680,8 @@ Longest: 1

- - + + @@ -656,32 +692,44 @@ Longest: 1

- - + + - + - + - - + + - + - - + + + + + + + + + + + + + + @@ -690,46 +738,22 @@ Longest: 1

- - - - - - - - - - - - - - + + - + - - - - - - - - - - - - - - + + @@ -738,22 +762,34 @@ Longest: 1

+ + + + + + + + + + + + - - + + - + - - + + @@ -764,16 +800,22 @@ Longest: 1

isolate
12010-02-06Q32010-02-03A7 B_ESCHR_COLI S S
22010-04-09Q32010-02-04A7 B_ESCHR_COLISR S S S FALSEFALSETRUE
32010-07-03Q32010-04-05A7 B_ESCHR_COLI S SRS S FALSE TRUE
42010-07-31Q32010-06-15A7B_ESCHR_COLISSSSFALSEFALSE
52010-08-28A7 B_ESCHR_COLI S S FALSE TRUE
52010-10-26Q3B_ESCHR_COLISSSSFALSETRUE
62010-12-11Q32010-09-03A7 B_ESCHR_COLI S SS RS FALSE TRUE
72011-03-20Q3B_ESCHR_COLIRSSSTRUETRUE
82011-04-08Q32010-11-14A7 B_ESCHR_COLI S S FALSE TRUE
82011-02-14A7B_ESCHR_COLIISSSTRUETRUE
92011-04-24Q32011-03-06A7 B_ESCHR_COLI R SSR S FALSE TRUE
102011-05-08Q32011-03-09A7 B_ESCHR_COLI S S
-

Instead of 2, now 9 isolates are flagged. In total, 78.8% of all isolates are marked ‘first weighted’ - 50.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 2, now 9 isolates are flagged. In total, 79.0% of all isolates are marked ‘first weighted’ - 50.5% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

-
data_1st <- data %>%
-  filter_first_weighted_isolate()
-

So we end up with 15,769 isolates for analysis.

+
+data_1st <- data %>% 
+  filter_first_weighted_isolate()
+
+

So we end up with 15,794 isolates for analysis.

We can remove unneeded columns:

-
data_1st <- data_1st %>%
-  select(-c(first, keyab))
+
+data_1st <- data_1st %>% 
+  select(-c(first, keyab))
+

Now our data looks like:

-
head(data_1st)
+
+head(data_1st)
+
@@ -788,7 +830,7 @@ Longest: 1

-+ @@ -810,89 +852,9 @@ Longest: 1

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -904,6 +866,86 @@ Longest: 1

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
12017-06-10J6Hospital BB_ESCHR_COLIRRSSMGram-negativeEscherichiacoliTRUE
22013-08-26H10Hospital AB_ESCHR_COLISSRSMGram-negativeEscherichiacoliTRUE
32016-02-09S72014-08-12M9 Hospital DB_ESCHR_COLIRSRSFGram-negativeEscherichiacoliTRUE
42013-06-27A5Hospital BB_ESCHR_COLISSSSMGram-negativeEscherichiacoliTRUE
52017-03-03Q4Hospital DB_ESCHR_COLIRSSSFGram-negativeEscherichiacoliTRUE
72015-08-02N2Hospital B B_STPHY_AURS R S aureus TRUE
22013-03-13W1Hospital AB_ESCHR_COLIISRSFGram-negativeEscherichiacoliTRUE
32015-12-02L5Hospital DB_STPHY_AURSRSRSMGram-positiveStaphylococcusaureusTRUE
42017-09-11O7Hospital BB_STRPT_PNMNRRSRFGram-positiveStreptococcuspneumoniaeTRUE
52014-06-03K4Hospital CB_STRPT_PNMNSSSRMGram-positiveStreptococcuspneumoniaeTRUE
72011-02-14B9Hospital AB_ESCHR_COLIRSSSMGram-negativeEscherichiacoliTRUE

Time for the analysis!

@@ -918,13 +960,17 @@ Longest: 1

Dispersion of species

To just get an idea how the species are distributed, create a frequency table with our freq() function. We created the genus and species column earlier based on the microbial ID. With paste(), we can concatenate them together.

The freq() function can be used like the base R language was intended:

-
freq(paste(data_1st$genus, data_1st$species))
+
+freq(paste(data_1st$genus, data_1st$species))
+

Or can be used like the dplyr way, which is easier readable:

-
data_1st %>% freq(genus, species)
+
+data_1st %>% freq(genus, species)
+

Frequency table

Class: character
-Length: 15,769
-Available: 15,769 (100%, NA: 0 = 0%)
+Length: 15,794
+Available: 15,794 (100%, NA: 0 = 0%)
Unique: 4

Shortest: 16
Longest: 24

@@ -941,33 +987,33 @@ Longest: 24

1 Escherichia coli -7,969 -50.54% -7,969 -50.54% +7,828 +49.56% +7,828 +49.56% 2 Staphylococcus aureus -3,947 -25.03% -11,916 -75.57% +3,925 +24.85% +11,753 +74.41% 3 Streptococcus pneumoniae -2,311 -14.66% -14,227 -90.22% +2,399 +15.19% +14,152 +89.60% 4 Klebsiella pneumoniae -1,542 -9.78% -15,769 +1,642 +10.40% +15,794 100.00% @@ -977,9 +1023,11 @@ Longest: 24

Overview of different bug/drug combinations

If you want to get a quick glance of the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations() function:

-
data_1st %>%
-  bug_drug_combinations() %>%
-  head() # show first 6 rows
+
+data_1st %>% 
+  bug_drug_combinations() %>% 
+  head() # show first 6 rows
+
# NOTE: Using column `bacteria` as input for `col_mo`.
@@ -994,57 +1042,59 @@ Longest: 24

- - - - + + + + - - - - + + + + - + - - + + - + - - + + - - + + - - - - + + + +
E. coli AMX376927639247969372224738597828
E. coli AMC624230714207969612729314087828
E. coli CIP60445959 01925796918697828
E. coli GEN71497073 082079697557828
K. pneumoniae AMX 0 01542154216421642
K. pneumoniae AMC12096926415421304582801642

Using Tidyverse selections, you can also select columns based on the antibiotic class they are in:

-
data_1st %>%
-  select(bacteria, fluoroquinolones()) %>%
-  bug_drug_combinations()
+
+data_1st %>% 
+  select(bacteria, fluoroquinolones()) %>% 
+  bug_drug_combinations()
+
# Selecting fluoroquinolones: `CIP` (ciprofloxacin)
 # NOTE: Using column `bacteria` as input for `col_mo`.
@@ -1060,34 +1110,34 @@ Longest: 24

- + - - + + - + - - + + - + - - + + - + - - + +
E. coli CIP60445959 01925796918697828
K. pneumoniae CIP11821237 036015424051642
S. aureus CIP30103006 093739479193925
S. pneumoniae CIP17771801 053423115982399
@@ -1098,12 +1148,16 @@ Longest: 24

Resistance percentages

The functions resistance() and susceptibility() can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S(), proportion_SI(), proportion_I(), proportion_IR() and proportion_R() can be used to determine the proportion of a specific antimicrobial outcome.

As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R(), equal to resistance()) and susceptibility as the proportion of S and I (proportion_SI(), equal to susceptibility()). These functions can be used on their own:

-
data_1st %>% resistance(AMX)
-# [1] 0.5378908
+
+data_1st %>% resistance(AMX)
+# [1] 0.5372293
+

Or can be used in conjuction with group_by() and summarise(), both from the dplyr package:

-
data_1st %>%
-  group_by(hospital) %>%
-  summarise(amoxicillin = resistance(AMX))
+
+data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = resistance(AMX))
+
# `summarise()` ungrouping output (override with `.groups` argument)
@@ -1113,27 +1167,29 @@ Longest: 24

- + - + - + - +
Hospital A0.53222410.5349377
Hospital B0.53246280.5382268
Hospital C0.53885350.5455691
Hospital D0.55520300.5325387

Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi() can be used, which works exactly like n_distinct() from the dplyr package. It counts all isolates available for every group (i.e. values S, I or R):

-
data_1st %>%
-  group_by(hospital) %>%
-  summarise(amoxicillin = resistance(AMX),
-            available = n_rsi(AMX))
+
+data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = resistance(AMX),
+            available = n_rsi(AMX))
+
# `summarise()` ungrouping output (override with `.groups` argument)
@@ -1144,32 +1200,34 @@ Longest: 24

- - + + - - + + - - + + - - + +
Hospital A0.532224147480.53493774737
Hospital B0.532462855140.53822685572
Hospital C0.538853523550.54556912381
Hospital D0.555203031520.53253873104

These functions can also be used to get the proportion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:

-
data_1st %>%
-  group_by(genus) %>%
-  summarise(amoxiclav = susceptibility(AMC),
-            gentamicin = susceptibility(GEN),
-            amoxiclav_genta = susceptibility(AMC, GEN))
+
+data_1st %>% 
+  group_by(genus) %>% 
+  summarise(amoxiclav = susceptibility(AMC),
+            gentamicin = susceptibility(GEN),
+            amoxiclav_genta = susceptibility(AMC, GEN))
+
# `summarise()` ungrouping output (override with `.groups` argument)
@@ -1181,96 +1239,106 @@ Longest: 24

- - - + + + - - - + + + - - - + + + - + - +
Escherichia0.82180950.89710130.98582010.82013290.90355140.9853091
Klebsiella0.82879380.89494160.98832680.82947620.90255790.9829476
Staphylococcus0.81986320.91917910.98631870.82191080.91414010.9844586
Streptococcus0.53742970.5414756 0.00000000.53742970.5414756

To make a transition to the next part, let’s see how this difference could be plotted:

-
data_1st %>%
-  group_by(genus) %>%
-  summarise("1. Amoxi/clav" = susceptibility(AMC),
-            "2. Gentamicin" = susceptibility(GEN),
-            "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>%
+
+data_1st %>% 
+  group_by(genus) %>% 
+  summarise("1. Amoxi/clav" = susceptibility(AMC),
+            "2. Gentamicin" = susceptibility(GEN),
+            "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>% 
   # pivot_longer() from the tidyr package "lengthens" data:
-  tidyr::pivot_longer(-genus, names_to = "antibiotic") %>%
-  ggplot(aes(x = genus,
-             y = value,
-             fill = antibiotic)) +
-  geom_col(position = "dodge2")
-# `summarise()` ungrouping output (override with `.groups` argument)
+ tidyr::pivot_longer(-genus, names_to = "antibiotic") %>% + ggplot(aes(x = genus, + y = value, + fill = antibiotic)) + + geom_col(position = "dodge2") +# `summarise()` ungrouping output (override with `.groups` argument) +

Plots

To show results in plots, most R users would nowadays use the ggplot2 package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:

-
ggplot(data = a_data_set,
-       mapping = aes(x = year,
-                     y = value)) +
-  geom_col() +
-  labs(title = "A title",
-       subtitle = "A subtitle",
-       x = "My X axis",
-       y = "My Y axis")
+
+ggplot(data = a_data_set,
+       mapping = aes(x = year,
+                     y = value)) +
+  geom_col() +
+  labs(title = "A title",
+       subtitle = "A subtitle",
+       x = "My X axis",
+       y = "My Y axis")
 
 # or as short as:
-ggplot(a_data_set) +
-  geom_bar(aes(year))
+ggplot(a_data_set) + + geom_bar(aes(year)) +

The AMR package contains functions to extend this ggplot2 package, for example geom_rsi(). It automatically transforms data with count_df() or proportion_df() and show results in stacked bars. Its simplest and shortest example:

-
ggplot(data_1st) +
-  geom_rsi(translate_ab = FALSE)
+
+ggplot(data_1st) +
+  geom_rsi(translate_ab = FALSE)
+

Omit the translate_ab = FALSE to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).

If we group on e.g. the genus column and add some additional functions from our package, we can create this:

-
# group the data on `genus`
-ggplot(data_1st %>% group_by(genus)) +
+
+# group the data on `genus`
+ggplot(data_1st %>% group_by(genus)) + 
   # create bars with genus on x axis
   # it looks for variables with class `rsi`,
   # of which we have 4 (earlier created with `as.rsi`)
-  geom_rsi(x = "genus") +
+  geom_rsi(x = "genus") + 
   # split plots on antibiotic
-  facet_rsi(facet = "antibiotic") +
+  facet_rsi(facet = "antibiotic") +
   # set colours to the R/SI interpretations
-  scale_rsi_colours() +
+  scale_rsi_colours() +
   # show percentages on y axis
-  scale_y_percent(breaks = 0:4 * 25) +
+  scale_y_percent(breaks = 0:4 * 25) +
   # turn 90 degrees, to make it bars instead of columns
-  coord_flip() +
+  coord_flip() +
   # add labels
-  labs(title = "Resistance per genus and antibiotic",
-       subtitle = "(this is fake data)") +
+  labs(title = "Resistance per genus and antibiotic", 
+       subtitle = "(this is fake data)") +
   # and print genus in italic to follow our convention
   # (is now y axis because we turned the plot)
-  theme(axis.text.y = element_text(face = "italic"))
+ theme(axis.text.y = element_text(face = "italic")) +

To simplify this, we also created the ggplot_rsi() function, which combines almost all above functions:

-
data_1st %>%
-  group_by(genus) %>%
-  ggplot_rsi(x = "genus",
-             facet = "antibiotic",
-             breaks = 0:4 * 25,
-             datalabels = FALSE) +
-  coord_flip()
+
+data_1st %>% 
+  group_by(genus) %>%
+  ggplot_rsi(x = "genus",
+             facet = "antibiotic",
+             breaks = 0:4 * 25,
+             datalabels = FALSE) +
+  coord_flip()
+

@@ -1278,26 +1346,29 @@ Longest: 24

Independence test

The next example uses the example_isolates data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.

We will compare the resistance to fosfomycin (column FOS) in hospital A and D. The input for the fisher.test() can be retrieved with a transformation like this:

-
# use package 'tidyr' to pivot data:
-library(tidyr)
+
+# use package 'tidyr' to pivot data:
+library(tidyr)
 
-check_FOS <- example_isolates %>%
-  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
-  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
-  group_by(hospital_id) %>%                # group on the hospitals
-  count_df(combine_SI = TRUE) %>%          # count all isolates per group (hospital_id)
-  pivot_wider(names_from = hospital_id,    # transform output so A and D are columns
-              values_from = value) %>%
-  select(A, D) %>%                         # and only select these columns
+check_FOS <- example_isolates %>%
+  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
+  group_by(hospital_id) %>%                # group on the hospitals
+  count_df(combine_SI = TRUE) %>%          # count all isolates per group (hospital_id)
+  pivot_wider(names_from = hospital_id,    # transform output so A and D are columns
+              values_from = value) %>%     
+  select(A, D) %>%                         # and only select these columns
   as.matrix()                              # transform to a good old matrix for fisher.test()
 
-check_FOS
+check_FOS
 #       A  D
 # [1,] 25 77
-# [2,] 24 33
+# [2,] 24 33 +

We can apply the test now with:

-
# do Fisher's Exact Test
-fisher.test(check_FOS)
+
+# do Fisher's Exact Test
+fisher.test(check_FOS)                            
 # 
 #   Fisher's Exact Test for Count Data
 # 
@@ -1308,7 +1379,8 @@ Longest: 24

# 0.2111489 0.9485124 # sample estimates: # odds ratio -# 0.4488318
+# 0.4488318 +

As can be seen, the p value is 0.031, which means that the fosfomycin resistance found in isolates from patients in hospital A and D are really different.

@@ -1329,7 +1401,7 @@ Longest: 24

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index 9bba859b7..1fe308839 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index ca339004c..cc98996c1 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 2261caed3..085d67091 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index cda73c9ec..70fefce0c 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html index 9b8b2e08a..cc32fbd59 100644 --- a/docs/articles/EUCAST.html +++ b/docs/articles/EUCAST.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to apply EUCAST rules

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/EUCAST.Rmd @@ -209,33 +209,39 @@ Examples

These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.

Sometimes, laboratory data can still contain such strains with ampicillin being susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this, that can be applied using eucast_rules():

-
oops <- data.frame(mo = c("Klebsiella",
+
+oops <- data.frame(mo = c("Klebsiella", 
                           "Escherichia"),
-                   ampicillin = "S")
-oops
+                   ampicillin = "S")
+oops
 #            mo ampicillin
 # 1  Klebsiella          S
 # 2 Escherichia          S
 
-eucast_rules(oops, info = FALSE)
+eucast_rules(oops, info = FALSE)
 #            mo ampicillin
 # 1  Klebsiella          R
-# 2 Escherichia          S
+# 2 Escherichia S +

EUCAST rules can not only be used for correction, they can also be used for filling in known resistance and susceptibility based on results of other antimicrobials drugs. This process is called interpretive reading and is part of the eucast_rules() function as well:

-
data <- data.frame(mo = c("Staphylococcus aureus",
+
+data <- data.frame(mo = c("Staphylococcus aureus",
                           "Enterococcus faecalis",
                           "Escherichia coli",
                           "Klebsiella pneumoniae",
                           "Pseudomonas aeruginosa"),
-                   VAN = "-",       # Vancomycin
-                   AMX = "-",       # Amoxicillin
-                   COL = "-",       # Colistin
-                   CAZ = "-",       # Ceftazidime
-                   CXM = "-",       # Cefuroxime
-                   PEN = "S",       # Penicillin G
-                   FOX = "S",       # Cefoxitin
-                   stringsAsFactors = FALSE)
-
data
+ VAN = "-", # Vancomycin + AMX = "-", # Amoxicillin + COL = "-", # Colistin + CAZ = "-", # Ceftazidime + CXM = "-", # Cefuroxime + PEN = "S", # Penicillin G + FOX = "S", # Cefoxitin + stringsAsFactors = FALSE) +
+
+data
+
@@ -300,7 +306,9 @@
mo
-
eucast_rules(data)
+
+eucast_rules(data)
+
# Warning: Not all columns with antimicrobial results are of class <rsi>.
 # Transform eligible columns to class <rsi> on beforehand: your_data %>% mutate_if(is.rsi.eligible, as.rsi)
@@ -385,7 +393,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/MDR.html b/docs/articles/MDR.html index 689d6a41f..c7008e5f5 100644 --- a/docs/articles/MDR.html +++ b/docs/articles/MDR.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to determine multi-drug resistance (MDR)

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/MDR.Rmd @@ -234,16 +234,20 @@ Examples

The mdro() function always returns an ordered factor. For example, the output of the default guideline by Magiorakos et al. returns a factor with levels ‘Negative’, ‘MDR’, ‘XDR’ or ‘PDR’ in that order.

The next example uses the example_isolates data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis. If we test the MDR/XDR/PDR guideline on this data set, we get:

-
library(dplyr)   # to support pipes: %>%
-library(cleaner) # to create frequency tables
-
example_isolates %>%
-  mdro() %>%
+
+library(dplyr)   # to support pipes: %>%
+library(cleaner) # to create frequency tables
+
+
+example_isolates %>% 
+  mdro() %>% 
   freq() # show frequency table of the result
 # NOTE: Using column `mo` as input for `col_mo`.
 # NOTE: Auto-guessing columns suitable for analysis...OK.
 # NOTE: Reliability would be improved if these antimicrobial results would be available too: ceftaroline (CPT), fusidic acid (FUS), telavancin (TLV), daptomycin (DAP), quinupristin/dalfopristin (QDA), minocycline (MNO), gentamicin-high (GEH), streptomycin-high (STH), doripenem (DOR), levofloxacin (LVX), netilmicin (NET), ticarcillin/clavulanic acid (TCC), ertapenem (ETP), cefotetan (CTT), aztreonam (ATM), ampicillin/sulbactam (SAM), polymyxin B (PLB)
 # Warning in mdro(.): NA introduced for isolates where the available percentage of
-# antimicrobial classes was below 50% (set with `pct_required_classes`)
+# antimicrobial classes was below 50% (set with `pct_required_classes`) +

Frequency table

Class: factor > ordered (numeric)
Length: 2,000
@@ -279,55 +283,67 @@ Unique: 2

For another example, I will create a data set to determine multi-drug resistant TB:

-
# a helper function to get a random vector with values S, I and R
+
+# a helper function to get a random vector with values S, I and R
 # with the probabilities 50% - 10% - 40%
-sample_rsi <- function() {
+sample_rsi <- function() {
   sample(c("S", "I", "R"),
-         size = 5000,
-         prob = c(0.5, 0.1, 0.4),
-         replace = TRUE)
+         size = 5000,
+         prob = c(0.5, 0.1, 0.4),
+         replace = TRUE)
 }
 
-my_TB_data <- data.frame(rifampicin = sample_rsi(),
-                         isoniazid = sample_rsi(),
-                         gatifloxacin = sample_rsi(),
-                         ethambutol = sample_rsi(),
-                         pyrazinamide = sample_rsi(),
-                         moxifloxacin = sample_rsi(),
-                         kanamycin = sample_rsi())
+my_TB_data <- data.frame(rifampicin = sample_rsi(), + isoniazid = sample_rsi(), + gatifloxacin = sample_rsi(), + ethambutol = sample_rsi(), + pyrazinamide = sample_rsi(), + moxifloxacin = sample_rsi(), + kanamycin = sample_rsi()) +

Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:

-
my_TB_data <- data.frame(RIF = sample_rsi(),
-                         INH = sample_rsi(),
-                         GAT = sample_rsi(),
-                         ETH = sample_rsi(),
-                         PZA = sample_rsi(),
-                         MFX = sample_rsi(),
-                         KAN = sample_rsi())
+
+my_TB_data <- data.frame(RIF = sample_rsi(),
+                         INH = sample_rsi(),
+                         GAT = sample_rsi(),
+                         ETH = sample_rsi(),
+                         PZA = sample_rsi(),
+                         MFX = sample_rsi(),
+                         KAN = sample_rsi())
+

The data set now looks like this:

-
head(my_TB_data)
+
+head(my_TB_data)
 #   rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin
-# 1          S         S            R          R            S            I
-# 2          R         S            R          R            R            S
-# 3          R         S            S          S            I            S
-# 4          S         S            I          S            R            S
-# 5          R         I            R          S            R            S
-# 6          S         S            S          S            R            R
+# 1          R         R            R          R            I            I
+# 2          S         S            S          R            S            R
+# 3          S         I            S          S            S            S
+# 4          S         I            S          R            R            R
+# 5          S         S            R          S            S            R
+# 6          S         R            S          R            R            R
 #   kanamycin
-# 1         I
-# 2         I
+# 1         S
+# 2         R
 # 3         S
-# 4         R
-# 5         R
-# 6         S
+# 4 S +# 5 S +# 6 R +

We can now add the interpretation of MDR-TB to our data set. You can use:

-
mdro(my_TB_data, guideline = "TB")
+
+mdro(my_TB_data, guideline = "TB")
+

or its shortcut mdr_tb():

-
my_TB_data$mdr <- mdr_tb(my_TB_data)
+
+my_TB_data$mdr <- mdr_tb(my_TB_data)
 # NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis.
 # NOTE: Auto-guessing columns suitable for analysis...OK.
-# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP)
+# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP) +

Create a frequency table of the results:

-
freq(my_TB_data$mdr)
+
+freq(my_TB_data$mdr)
+

Frequency table

Class: factor > ordered (numeric)
Length: 5,000
@@ -347,40 +363,40 @@ Unique: 5

1 Mono-resistant -3215 -64.30% -3215 -64.30% +3229 +64.58% +3229 +64.58% 2 -Multi-drug-resistant -643 -12.86% -3858 -77.16% +Negative +674 +13.48% +3903 +78.06% 3 -Negative -637 -12.74% -4495 -89.90% +Multi-drug-resistant +616 +12.32% +4519 +90.38% 4 Poly-resistant -292 -5.84% -4787 -95.74% +285 +5.70% +4804 +96.08% 5 Extensively drug-resistant -213 -4.26% +196 +3.92% 5000 100.00% @@ -402,7 +418,7 @@ Unique: 5

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/PCA.html b/docs/articles/PCA.html index 4e5521cc6..9152e2128 100644 --- a/docs/articles/PCA.html +++ b/docs/articles/PCA.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to conduct principal component analysis (PCA) for AMR

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/PCA.Rmd @@ -204,9 +204,10 @@

Transforming

For PCA, we need to transform our AMR data first. This is what the example_isolates data set in this package looks like:

-
library(AMR)
-library(dplyr)
-glimpse(example_isolates)
+
+library(AMR)
+library(dplyr)
+glimpse(example_isolates)
 # Rows: 2,000
 # Columns: 49
 # $ date            <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002…
@@ -257,16 +258,18 @@
 # $ CHL             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
 # $ COL             <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, …
 # $ MUP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
-# $ RIF             <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
+# $ RIF <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R… +

Now to transform this to a data set with only resistance percentages per taxonomic order and genus:

-
resistance_data <- example_isolates %>%
-  group_by(order = mo_order(mo),       # group on anything, like order
-           genus = mo_genus(mo)) %>%   #  and genus as we do here
-  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
-  select(order, genus, AMC, CXM, CTX,
-         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns
+
+resistance_data <- example_isolates %>% 
+  group_by(order = mo_order(mo),       # group on anything, like order
+           genus = mo_genus(mo)) %>%   #  and genus as we do here
+  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
+  select(order, genus, AMC, CXM, CTX, 
+         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns
 
-head(resistance_data)
+head(resistance_data)
 # # A tibble: 6 x 10
 # # Groups:   order [2]
 #   order           genus            AMC   CXM   CTX   CAZ   GEN   TOB   TMP   SXT
@@ -276,35 +279,46 @@
 # 3 Actinomycetales Cutibacterium     NA    NA    NA    NA    NA    NA    NA    NA
 # 4 Actinomycetales Dermabacter       NA    NA    NA    NA    NA    NA    NA    NA
 # 5 Actinomycetales Micrococcus       NA    NA    NA    NA    NA    NA    NA    NA
-# 6 Actinomycetales Rothia            NA    NA    NA    NA    NA    NA    NA    NA
+# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA +

Perform principal component analysis

The new pca() function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:

-
pca_result <- pca(resistance_data)
+
+pca_result <- pca(resistance_data)
 # NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT.
-#       Total observations available: 7.
+# Total observations available: 7. +

The result can be reviewed with the good old summary() function:

-
summary(pca_result)
+
+summary(pca_result)
 # Importance of components:
 #                          PC1    PC2     PC3     PC4     PC5     PC6       PC7
 # Standard deviation     2.154 1.6809 0.61305 0.33882 0.20755 0.03137 1.602e-16
 # Proportion of Variance 0.580 0.3532 0.04698 0.01435 0.00538 0.00012 0.000e+00
-# Cumulative Proportion  0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00
+# Cumulative Proportion 0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00 +

Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance. We can create a so-called biplot with the base R biplot() function, to see which antimicrobial resistance per drug explain the difference per microorganism.

Plotting the results

-
biplot(pca_result)
+
+biplot(pca_result)
+

But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca() function, that automatically adds the right labels and even groups:

-
ggplot_pca(pca_result)
+
+ggplot_pca(pca_result)
+

You can also print an ellipse per group, and edit the appearance:

-
ggplot_pca(pca_result, ellipse = TRUE) +
-  ggplot2::labs(title = "An AMR/PCA biplot!")
+
+ggplot_pca(pca_result, ellipse = TRUE) +
+  ggplot2::labs(title = "An AMR/PCA biplot!")
+

@@ -324,7 +338,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html index fed8f1109..0d27a4af5 100644 --- a/docs/articles/SPSS.html +++ b/docs/articles/SPSS.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to import data from SPSS / SAS / Stata

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/SPSS.Rmd @@ -240,7 +240,8 @@

To demonstrate the first point:

-
# not all values are valid MIC values:
+
+# not all values are valid MIC values:
 as.mic(0.125)
 # Class <mic>
 # [1] 0.125
@@ -253,13 +254,13 @@
 # [1] "Gram-negative"
 
 # Klebsiella is intrinsic resistant to amoxicllin, according to EUCAST:
-klebsiella_test <- data.frame(mo = "klebsiella",
-                              amox = "S",
-                              stringsAsFactors = FALSE)
-klebsiella_test # (our original data)
+klebsiella_test <- data.frame(mo = "klebsiella", 
+                              amox = "S",
+                              stringsAsFactors = FALSE)
+klebsiella_test # (our original data)
 #           mo amox
 # 1 klebsiella    S
-eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)
+eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)
 #           mo amox
 # 1 klebsiella    R
 
@@ -271,7 +272,8 @@
 # [4] "fluclox"              "flucloxacilina"       "flucloxacillin"      
 # [7] "flucloxacilline"      "flucloxacillinum"     "fluorochloroxacillin"
 ab_atc("floxapen")
-# [1] "J01CF05"
+# [1] "J01CF05" +

@@ -287,7 +289,8 @@

If you want named variables to be imported as factors so it resembles SPSS more, use as_factor().

The difference is this:

-
SPSS_data
+
+SPSS_data
 # # A tibble: 4,203 x 4
 #     v001 sex       status    statusage
 #    <dbl> <dbl+lbl> <dbl+lbl>     <dbl>
@@ -303,7 +306,7 @@
 # 10 10018 0         1              66.6
 # # … with 4,193 more rows
 
-as_factor(SPSS_data)
+as_factor(SPSS_data)
 # # A tibble: 4,203 x 4
 #     v001 sex    status statusage
 #    <dbl> <fct>  <fct>      <dbl>
@@ -317,67 +320,82 @@
 #  8 10011 Male   alive       73.1
 #  9 10017 Male   alive       56.7
 # 10 10018 Female alive       66.6
-# # … with 4,193 more rows
+# # … with 4,193 more rows +

Base R

To import data from SPSS, SAS or Stata, you can use the great haven package yourself:

-
# download and install the latest version:
+
+# download and install the latest version:
 install.packages("haven")
 # load the package you just installed:
-library(haven)
+library(haven) +

You can now import files as follows:

SPSS

To read files from SPSS into R:

-
# read any SPSS file based on file extension (best way):
-read_spss(file = "path/to/file")
+
+# read any SPSS file based on file extension (best way):
+read_spss(file = "path/to/file")
 
 # read .sav or .zsav file:
-read_sav(file = "path/to/file")
+read_sav(file = "path/to/file")
 
 # read .por file:
-read_por(file = "path/to/file")
+read_por(file = "path/to/file") +

Do not forget about as_factor(), as mentioned above.

To export your R objects to the SPSS file format:

-
# save as .sav file:
-write_sav(data = yourdata, path = "path/to/file")
+
+# save as .sav file:
+write_sav(data = yourdata, path = "path/to/file")
 
 # save as compressed .zsav file:
-write_sav(data = yourdata, path = "path/to/file", compress = TRUE)
+write_sav(data = yourdata, path = "path/to/file", compress = TRUE) +

SAS

To read files from SAS into R:

-
# read .sas7bdat + .sas7bcat files:
-read_sas(data_file = "path/to/file", catalog_file = NULL)
+
+# read .sas7bdat + .sas7bcat files:
+read_sas(data_file = "path/to/file", catalog_file = NULL)
 
 # read SAS transport files (version 5 and version 8):
-read_xpt(file = "path/to/file")
+read_xpt(file = "path/to/file") +

To export your R objects to the SAS file format:

-
# save as regular SAS file:
-write_sas(data = yourdata, path = "path/to/file")
+
+# save as regular SAS file:
+write_sas(data = yourdata, path = "path/to/file")
 
 # the SAS transport format is an open format 
 # (required for submission of the data to the FDA)
-write_xpt(data = yourdata, path = "path/to/file", version = 8)
+write_xpt(data = yourdata, path = "path/to/file", version = 8) +

Stata

To read files from Stata into R:

-
# read .dta file:
-read_stata(file = "/path/to/file")
+
+# read .dta file:
+read_stata(file = "/path/to/file")
 
 # works exactly the same:
-read_dta(file = "/path/to/file")
+read_dta(file = "/path/to/file") +

To export your R objects to the Stata file format:

-
# save as .dta file, Stata version 14:
+
+# save as .dta file, Stata version 14:
 # (supports Stata v8 until v15 at the time of writing)
-write_dta(data = yourdata, path = "/path/to/file", version = 14)
+write_dta(data = yourdata, path = "/path/to/file", version = 14) +
@@ -398,7 +416,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html index d5e1eccc3..74d6b2a6d 100644 --- a/docs/articles/WHONET.html +++ b/docs/articles/WHONET.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to work with WHONET data

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/WHONET.Rmd @@ -200,34 +200,42 @@ Import of data

This tutorial assumes you already imported the WHONET data with e.g. the readxl package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.

An example syntax could look like this:

-
library(readxl)
-data <- read_excel(path = "path/to/your/file.xlsx")
+
+library(readxl)
+data <- read_excel(path = "path/to/your/file.xlsx")
+

This package comes with an example data set WHONET. We will use it for this analysis.

Preparation

First, load the relevant packages if you did not yet did this. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.

-
library(dplyr)   # part of tidyverse
-library(ggplot2) # part of tidyverse
-library(AMR)     # this package
-library(cleaner) # to create frequency tables
+
+library(dplyr)   # part of tidyverse
+library(ggplot2) # part of tidyverse
+library(AMR)     # this package
+library(cleaner) # to create frequency tables
+

We will have to transform some variables to simplify and automate the analysis:

-
# transform variables
-data <- WHONET %>%
+
+# transform variables
+data <- WHONET %>%
   # get microbial ID based on given organism
-  mutate(mo = as.mo(Organism)) %>%
+  mutate(mo = as.mo(Organism)) %>% 
   # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
-  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
+ mutate_at(vars(AMP_ND10:CIP_EE), as.rsi) +

No errors or warnings, so all values are transformed succesfully.

We also created a package dedicated to data cleaning and checking, called the cleaner package. Its freq() function can be used to create frequency tables.

So let’s check our data, with a couple of frequency tables:

-
# our newly created `mo` variable, put in the mo_name() function
-data %>% freq(mo_name(mo), nmax = 10)
+
+# our newly created `mo` variable, put in the mo_name() function
+data %>% freq(mo_name(mo), nmax = 10)
+

Frequency table

Class: character
Length: 500
@@ -328,9 +336,11 @@ Longest: 40

(omitted 27 entries, n = 56 [11.20%])

-
# our transformed antibiotic columns
+
+# our transformed antibiotic columns
 # amoxicillin/clavulanic acid (J01CR02) as an example
-data %>% freq(AMC_ND2)
+data %>% freq(AMC_ND2) +

Frequency table

Class: factor > ordered > rsi (numeric)
Length: 500
@@ -378,10 +388,12 @@ Unique: 3

A first glimpse at results

An easy ggplot will already give a lot of information, using the included ggplot_rsi() function:

-
data %>%
-  group_by(Country) %>%
-  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
-  ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
+
+data %>%
+  group_by(Country) %>%
+  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
+  ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
+

@@ -399,7 +411,7 @@ Unique: 3

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 05445cfe1..c9752ac77 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

Benchmarks

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/benchmarks.Rmd @@ -197,13 +197,16 @@

One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo() that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.

Using the microbenchmark package, we can review the calculation performance of this function. Its function microbenchmark() runs different input expressions independently of each other and measures their time-to-result.

-
microbenchmark <- microbenchmark::microbenchmark
-library(AMR)
-library(dplyr)
+
+microbenchmark <- microbenchmark::microbenchmark
+library(AMR)
+library(dplyr)
+

In the next test, we try to ‘coerce’ different input values into the microbial code of Staphylococcus aureus. Coercion is a computational process of forcing output based on an input. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties.

The actual result is the same every time: it returns its microorganism code B_STPHY_AURS (B stands for Bacteria, the taxonomic kingdom).

But the calculation time differs a lot:

-
S.aureus <- microbenchmark(
+
+S.aureus <- microbenchmark(
   as.mo("sau"), # WHONET code
   as.mo("stau"),
   as.mo("STAU"),
@@ -218,47 +221,50 @@
   as.mo("VISA"), # Vancomycin Intermediate S. aureus
   as.mo("VRSA"), # Vancomycin Resistant S. aureus
   as.mo(22242419), # Catalogue of Life ID
-  times = 10)
-print(S.aureus, unit = "ms", signif = 2)
+  times = 10)
+print(S.aureus, unit = "ms", signif = 2)
 # Unit: milliseconds
-#                                   expr min  lq mean median  uq  max neval
-#                           as.mo("sau")  11  12   17     13  15   51    10
-#                          as.mo("stau") 150 160  170    170 190  200    10
-#                          as.mo("STAU") 160 160  180    190 190  210    10
-#                        as.mo("staaur")  12  13   23     15  20   68    10
-#                        as.mo("STAAUR")  11  12   20     16  18   44    10
-#                     as.mo("S. aureus")  11  13   29     17  44   84    10
-#                      as.mo("S aureus")  11  15   21     16  18   46    10
-#         as.mo("Staphylococcus aureus")  11  13   16     13  15   41    10
-#  as.mo("Staphylococcus aureus (MRSA)") 870 890  920    900 950 1100    10
-#       as.mo("Sthafilokkockus aaureuz") 400 410  430    440 450  490    10
-#                          as.mo("MRSA")  13  13   17     14  16   40    10
-#                          as.mo("VISA")  14  17   25     19  36   46    10
-#                          as.mo("VRSA")  13  15   21     17  21   50    10
-#                        as.mo(22242419) 130 140  150    150 150  180    10
+# expr min lq mean median uq max neval +# as.mo("sau") 11.0 14 21 15 16 51 10 +# as.mo("stau") 170.0 170 190 190 210 240 10 +# as.mo("STAU") 160.0 170 180 180 200 210 10 +# as.mo("staaur") 11.0 13 19 14 18 48 10 +# as.mo("STAAUR") 11.0 13 22 17 37 40 10 +# as.mo("S. aureus") 15.0 15 24 17 26 56 10 +# as.mo("S aureus") 12.0 13 21 16 23 49 10 +# as.mo("Staphylococcus aureus") 9.8 13 21 14 15 65 10 +# as.mo("Staphylococcus aureus (MRSA)") 960.0 960 1100 980 1100 1400 10 +# as.mo("Sthafilokkockus aaureuz") 440.0 450 480 470 480 570 10 +# as.mo("MRSA") 12.0 14 22 15 17 86 10 +# as.mo("VISA") 15.0 18 25 19 40 42 10 +# as.mo("VRSA") 14.0 15 30 22 44 69 10 +# as.mo(22242419) 130.0 150 160 170 180 190 10 +

In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 5 milliseconds means it can determine 200 input values per second. It case of 100 milliseconds, this is only 10 input values per second.

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside of this is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS), a bug probably never found before in humans:

-
M.semesiae <- microbenchmark(as.mo("metsem"),
+
+M.semesiae <- microbenchmark(as.mo("metsem"),
                              as.mo("METSEM"),
                              as.mo("M. semesiae"),
                              as.mo("M.  semesiae"),
                              as.mo("Methanosarcina semesiae"),
-                             times = 10)
-print(M.semesiae, unit = "ms", signif = 4)
+                             times = 10)
+print(M.semesiae, unit = "ms", signif = 4)
 # Unit: milliseconds
-#                              expr     min      lq   mean median     uq    max
-#                   as.mo("metsem") 176.800 179.200 189.20 185.90 194.00 212.60
-#                   as.mo("METSEM") 164.400 170.800 193.00 188.20 211.10 243.00
-#              as.mo("M. semesiae")  10.950  11.310  19.92  15.41  18.79  50.84
-#             as.mo("M.  semesiae")  11.560  11.860  17.66  14.15  16.96  50.76
-#  as.mo("Methanosarcina semesiae")   9.408   9.669  18.03  14.12  15.24  42.57
+#                              expr     min     lq   mean median     uq    max
+#                   as.mo("metsem") 186.900 192.90 204.70 199.10 207.70 251.20
+#                   as.mo("METSEM") 175.500 199.70 215.20 218.20 232.00 240.40
+#              as.mo("M. semesiae")  11.500  13.29  16.47  13.85  16.84  36.90
+#             as.mo("M.  semesiae")  11.690  11.94  16.81  14.40  15.75  42.76
+#  as.mo("Methanosarcina semesiae")   9.688  10.28  14.55  11.99  13.72  39.41
 #  neval
 #     10
 #     10
 #     10
 #     10
-#     10
+# 10 +

Looking up arbitrary codes of less prevalent microorganisms costs the most time. Full names (like Methanosarcina semesiae) are always very fast and only take some thousands of seconds to coerce - they are the most probable input from most data sets.

In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):

@@ -267,102 +273,110 @@

Repetitive results

Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_name() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

-
# take all MO codes from the example_isolates data set
-x <- example_isolates$mo %>%
+
+# take all MO codes from the example_isolates data set
+x <- example_isolates$mo %>%
   # keep only the unique ones
-  unique() %>%
+  unique() %>%
   # pick 50 of them at random
-  sample(50) %>%
+  sample(50) %>%
   # paste that 10,000 times
-  rep(10000) %>%
+  rep(10000) %>%
   # scramble it
   sample()
-
+  
 # got indeed 50 times 10,000 = half a million?
-length(x)
+length(x)
 # [1] 500000
 
 # and how many unique values do we have?
-n_distinct(x)
+n_distinct(x)
 # [1] 50
 
 # now let's see:
-run_it <- microbenchmark(mo_name(x),
-                         times = 10)
-print(run_it, unit = "ms", signif = 3)
+run_it <- microbenchmark(mo_name(x),
+                         times = 10)
+print(run_it, unit = "ms", signif = 3)
 # Unit: milliseconds
 #        expr  min   lq mean median   uq  max neval
-#  mo_name(x) 1720 1760 1820   1800 1830 1990    10
-

So transforming 500,000 values (!!) of 50 unique values only takes 1.8 seconds. You only lose time on your unique input values.

+# mo_name(x) 1840 1870 1950 1940 1980 2140 10 +
+

So transforming 500,000 values (!!) of 50 unique values only takes 1.94 seconds. You only lose time on your unique input values.

Precalculated results

What about precalculated results? If the input is an already precalculated result of a helper function like mo_name(), it almost doesn’t take any time at all (see ‘C’ below):

-
run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"),
-                         B = mo_name("S. aureus"),
-                         C = mo_name("Staphylococcus aureus"),
-                         times = 10)
-print(run_it, unit = "ms", signif = 3)
+
+run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"),
+                         B = mo_name("S. aureus"),
+                         C = mo_name("Staphylococcus aureus"),
+                         times = 10)
+print(run_it, unit = "ms", signif = 3)
 # Unit: milliseconds
 #  expr   min    lq  mean median    uq   max neval
-#     A  8.16  8.35  9.06   8.97  9.75 10.20    10
-#     B 10.50 10.60 15.50  12.20 12.80 49.90    10
-#     C  1.04  1.15  1.21   1.19  1.27  1.53    10
-

So going from mo_name("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0012 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

-
run_it <- microbenchmark(A = mo_species("aureus"),
-                         B = mo_genus("Staphylococcus"),
-                         C = mo_name("Staphylococcus aureus"),
-                         D = mo_family("Staphylococcaceae"),
-                         E = mo_order("Bacillales"),
-                         F = mo_class("Bacilli"),
-                         G = mo_phylum("Firmicutes"),
-                         H = mo_kingdom("Bacteria"),
-                         times = 10)
-print(run_it, unit = "ms", signif = 3)
+#     A  8.17  8.49  9.32   9.32  9.90 10.90    10
+#     B 10.90 11.80 16.30  13.20 14.70 45.60    10
+#     C  1.06  1.22  1.32   1.28  1.44  1.57    10
+
+

So going from mo_name("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0013 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

+
+run_it <- microbenchmark(A = mo_species("aureus"),
+                         B = mo_genus("Staphylococcus"),
+                         C = mo_name("Staphylococcus aureus"),
+                         D = mo_family("Staphylococcaceae"),
+                         E = mo_order("Bacillales"),
+                         F = mo_class("Bacilli"),
+                         G = mo_phylum("Firmicutes"),
+                         H = mo_kingdom("Bacteria"),
+                         times = 10)
+print(run_it, unit = "ms", signif = 3)
 # Unit: milliseconds
 #  expr   min    lq mean median   uq  max neval
-#     A 0.948 0.971 1.14  1.020 1.39 1.52    10
-#     B 0.968 1.040 1.22  1.190 1.41 1.56    10
-#     C 0.979 1.020 1.31  1.260 1.58 1.66    10
-#     D 0.964 1.010 1.24  1.190 1.45 1.83    10
-#     E 0.977 0.995 1.15  1.030 1.40 1.45    10
-#     F 0.878 0.982 1.11  1.010 1.37 1.43    10
-#     G 0.929 0.961 1.18  1.000 1.43 1.58    10
-#     H 0.901 0.967 1.09  0.998 1.35 1.40    10
+# A 1.020 1.030 1.11 1.060 1.22 1.33 10 +# B 0.982 1.010 1.10 1.040 1.21 1.38 10 +# C 0.992 1.020 1.13 1.040 1.24 1.58 10 +# D 0.987 1.000 1.07 1.030 1.08 1.29 10 +# E 0.978 0.982 1.02 0.999 1.03 1.15 10 +# F 0.975 0.992 1.05 1.000 1.03 1.26 10 +# G 0.976 0.983 1.02 0.994 1.03 1.22 10 +# H 0.977 1.010 1.11 1.090 1.21 1.28 10 +

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.

Results in other languages

When the system language is non-English and supported by this AMR package, some functions will have a translated result. This almost does’t take extra time:

-
mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system
+
+mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system
 # [1] "Coagulase-negative Staphylococcus (CoNS)"
 
-mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system
+mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system
 # [1] "Staphylococcus coagulasa negativo (SCN)"
 
-mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system
+mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system
 # [1] "Coagulase-negatieve Staphylococcus (CNS)"
 
-run_it <- microbenchmark(en = mo_name("CoNS", language = "en"),
-                         de = mo_name("CoNS", language = "de"),
-                         nl = mo_name("CoNS", language = "nl"),
-                         es = mo_name("CoNS", language = "es"),
-                         it = mo_name("CoNS", language = "it"),
-                         fr = mo_name("CoNS", language = "fr"),
-                         pt = mo_name("CoNS", language = "pt"),
-                         times = 100)
-print(run_it, unit = "ms", signif = 4)
+run_it <- microbenchmark(en = mo_name("CoNS", language = "en"),
+                         de = mo_name("CoNS", language = "de"),
+                         nl = mo_name("CoNS", language = "nl"),
+                         es = mo_name("CoNS", language = "es"),
+                         it = mo_name("CoNS", language = "it"),
+                         fr = mo_name("CoNS", language = "fr"),
+                         pt = mo_name("CoNS", language = "pt"),
+                         times = 100)
+print(run_it, unit = "ms", signif = 4)
 # Unit: milliseconds
-#  expr   min    lq  mean median    uq    max neval
-#    en 12.09 12.46 15.90  13.86 14.55  57.62   100
-#    de 12.92 13.26 19.73  14.63 16.01  61.55   100
-#    nl 16.53 17.00 20.26  17.64 19.93  57.54   100
-#    es 12.98 13.28 18.27  14.76 15.64 179.30   100
-#    it 12.92 13.15 19.20  14.08 16.08  64.07   100
-#    fr 12.99 13.21 17.81  13.59 15.71  67.97   100
-#    pt 13.00 13.23 17.30  14.35 15.65  69.85   100
+# expr min lq mean median uq max neval +# en 12.40 14.34 17.88 14.89 15.48 55.22 100 +# de 13.17 14.30 17.90 15.84 16.66 56.60 100 +# nl 17.14 19.86 24.99 20.78 21.70 64.66 100 +# es 13.43 15.29 17.65 15.93 16.59 54.38 100 +# it 13.33 14.83 18.35 15.68 16.36 57.61 100 +# fr 13.40 15.43 18.66 16.01 16.59 54.35 100 +# pt 13.47 15.33 18.93 16.15 16.84 57.28 100 +

Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

@@ -380,7 +394,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png index 5e394fd22..d8682b8f2 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png index d2fab4535..4c7e50edc 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/articles/index.html b/docs/articles/index.html index cd41c3074..4c43cf47d 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -81,7 +81,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -263,7 +263,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html index 6f4926fa2..3e56fc221 100644 --- a/docs/articles/resistance_predict.html +++ b/docs/articles/resistance_predict.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0 + 1.3.0.9001 @@ -186,7 +186,7 @@

How to predict antimicrobial resistance

Matthijs S. Berends

-

30 July 2020

+

10 August 2020

Source: vignettes/resistance_predict.Rmd @@ -200,35 +200,38 @@ Needed R packages

As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.

Our AMR package depends on these packages and even extends their use and functions.

-
library(dplyr)
-library(ggplot2)
-library(AMR)
+
+library(dplyr)
+library(ggplot2)
+library(AMR)
 
 # (if not yet installed, install with:)
-# install.packages(c("tidyverse", "AMR"))
+# install.packages(c("tidyverse", "AMR")) +

Prediction analysis

Our package contains a function resistance_predict(), which takes the same input as functions for other AMR analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.

It is basically as easy as:

-
# resistance prediction of piperacillin/tazobactam (TZP):
-resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
-
-# or:
-example_isolates %>% 
-  resistance_predict(col_ab = "TZP",
-                     model  "binomial")
-
-# to bind it to object 'predict_TZP' for example:
-predict_TZP <- example_isolates %>% 
-  resistance_predict(col_ab = "TZP",
-                     model = "binomial")
+
# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
+
+# or:
+example_isolates %>% 
+  resistance_predict(col_ab = "TZP",
+                     model  "binomial")
+
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- example_isolates %>% 
+  resistance_predict(col_ab = "TZP",
+                     model = "binomial")

The function will look for a date column itself if col_date is not set.

When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE).

# NOTE: Using column `date` as input for `col_date`.

This text is only a printed summary - the actual result (output) of the function is a data.frame containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:

-
predict_TZP
+
+predict_TZP
 #    year      value    se_min    se_max observations   observed  estimated
 # 1  2002 0.20000000        NA        NA           15 0.20000000 0.05616378
 # 2  2003 0.06250000        NA        NA           32 0.06250000 0.06163839
@@ -258,27 +261,36 @@ predict_TZP <- example_isolates %>%
 # 26 2027 0.41315710 0.3244399 0.5018743           NA         NA 0.41315710
 # 27 2028 0.43730688 0.3418075 0.5328063           NA         NA 0.43730688
 # 28 2029 0.46175755 0.3597639 0.5637512           NA         NA 0.46175755
-# 29 2030 0.48639359 0.3782932 0.5944939           NA         NA 0.48639359
+# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359 +

The function plot is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:

-
plot(predict_TZP)
+
+plot(predict_TZP)
+

This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.

We also support the ggplot2 package with our custom function ggplot_rsi_predict() to create more appealing plots:

-
ggplot_rsi_predict(predict_TZP)
+
+ggplot_rsi_predict(predict_TZP)
+

-
+
+
 # choose for error bars instead of a ribbon
-ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
+ggplot_rsi_predict(predict_TZP, ribbon = FALSE) +

Choosing the right model

Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:

-
example_isolates %>%
-  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
-  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
+
+example_isolates %>%
+  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% 
   ggplot_rsi_predict()
-# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `date` as input for `col_date`. +

Vancomycin resistance could be 100% in ten years, but might also stay around 0%.

You can define the model with the model parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.

@@ -319,25 +331,29 @@ predict_TZP <- example_isolates %>%

For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:

-
example_isolates %>%
-  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
-  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
+
+example_isolates %>%
+  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
   ggplot_rsi_predict()
-# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `date` as input for `col_date`. +

This seems more likely, doesn’t it?

The model itself is also available from the object, as an attribute:

-
model <- attributes(predict_TZP)$model
+
+model <- attributes(predict_TZP)$model
 
-summary(model)$family
+summary(model)$family
 # 
 # Family: binomial 
 # Link function: logit
 
-summary(model)$coefficients
+summary(model)$coefficients
 #                  Estimate  Std. Error   z value     Pr(>|z|)
 # (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05
-# year           0.09883005  0.02295317  4.305725 1.664395e-05
+# year 0.09883005 0.02295317 4.305725 1.664395e-05 +
@@ -357,7 +373,7 @@ predict_TZP <- example_isolates %>%
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/articles/welcome_to_AMR.html b/docs/articles/welcome_to_AMR.html index b8f304725..2ddbd8837 100644 --- a/docs/articles/welcome_to_AMR.html +++ b/docs/articles/welcome_to_AMR.html @@ -39,7 +39,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -195,6 +195,7 @@ +

READ ALL VIGNETTES ON OUR WEBSITE

Welcome to the AMR package

@@ -243,7 +244,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/authors.html b/docs/authors.html index 89ae47a19..2989f5037 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -81,7 +81,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -310,7 +310,7 @@
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/index.html b/docs/index.html index 222f31827..2644aa3f2 100644 --- a/docs/index.html +++ b/docs/index.html @@ -43,7 +43,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -192,7 +192,7 @@

July 2020
PLEASE TAKE PART IN OUR SURVEY!
-Since you are one of our users, we would like to know how you use the package and what it brought you or your organisation. If you have a minute, please anonymously fill in this short questionnaire. Your valuable input will help to improve the package and its functionalities. You can answer the open questions in either English, Spanish, French, Dutch, or German. Thank you very much in advance!
Take me to the 5-min survey!

+Since you are one of our users, we would like to know how you use the package and what it brought you or your organisation. If you have a minute, please anonymously fill in this short questionnaire. Your valuable input will help to improve the package and its functionalities. You can answer the open questions in either English, Spanish, French, Dutch, or German. Thank you very much in advance!
Take me to the 5-min survey!

@@ -246,7 +246,9 @@ Since you are one of our users, we would like to know how you use the package an

Latest released version

This package is available here on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R from CRAN by using the command:

-
install.packages("AMR")
+
+install.packages("AMR")
+

It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.

Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.

@@ -254,8 +256,10 @@ Since you are one of our users, we would like to know how you use the package an

Latest development version

The latest and unpublished development version can be installed from GitHub using:

-
install.packages("remotes")
-remotes::install_github("msberends/AMR")
+
+install.packages("remotes") 
+remotes::install_github("msberends/AMR")
+
@@ -419,7 +423,7 @@ Since you are one of our users, we would like to know how you use the package an
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.5.1.9000.

diff --git a/docs/news/index.html b/docs/news/index.html index 25da3cc56..3983f88eb 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -81,7 +81,7 @@ AMR (for R) - 1.3.0.9000 + 1.3.0.9001 @@ -229,9 +229,9 @@ Source: NEWS.md -
-

-AMR 1.3.0.9000 Unreleased +
+

+AMR 1.3.0.9001 Unreleased

@@ -243,13 +243,15 @@
  • Support for using dplyr’s across() in as.rsi() to interpret MIC values or disk zone diameters, that now also automatically determines the column with microorganism names or codes.

    -
    # until dplyr 1.0.0
    -your_data %>% mutate_if(is.mic, as.rsi)
    -your_data %>% mutate_if(is.disk, as.rsi)
    +
    +# until dplyr 1.0.0
    +your_data %>% mutate_if(is.mic, as.rsi)
    +your_data %>% mutate_if(is.disk, as.rsi)
     
      # since dplyr 1.0.0
    - your_data %>% mutate(across(where(is.mic), as.rsi))
    -your_data %>% mutate(across(where(is.disk), as.rsi))
    + your_data %>% mutate(across(where(is.mic), as.rsi)) +your_data %>% mutate(across(where(is.disk), as.rsi)) +

@@ -266,12 +268,14 @@
  • Function ab_from_text() to retrieve antimicrobial drug names, doses and forms of administration from clinical texts in e.g. health care records, which also corrects for misspelling since it uses as.ab() internally

  • Tidyverse selection helpers for antibiotic classes, that help to select the columns of antibiotics that are of a specific antibiotic class, without the need to define the columns or antibiotic abbreviations. They can be used in any function that allows selection helpers, like dplyr::select() and tidyr::pivot_longer():

    -
    library(dplyr)
    +
    +library(dplyr)
     
     # Columns 'IPM' and 'MEM' are in the example_isolates data set
    -example_isolates %>%
    +example_isolates %>% 
       select(carbapenems())
    -#> Selecting carbapenems: `IPM` (imipenem), `MEM` (meropenem)
    +#> Selecting carbapenems: `IPM` (imipenem), `MEM` (meropenem) +
  • Added mo_domain() as an alias to mo_kingdom()

  • Added function filter_penicillins() to filter isolates on a specific result in any column with a name in the antimicrobial ‘penicillins’ class (more specific: ATC subgroup Beta-lactam antibacterials, penicillins)

  • @@ -294,7 +298,7 @@
    • 95% speed improvement by using other base R functions for calculation
    • Using unexisting columns wil now return an error instead of dropping them silently
    • -
    • Using variables for column names (as well as selectors like dplyr::all_of()) now works again
    • +
    • Using variables for column names (as well as selectors like dplyr::all_of()) now works again
  • @@ -349,7 +353,7 @@

    Making this package independent of especially the tidyverse (e.g. packages dplyr and tidyr) tremendously increases sustainability on the long term, since tidyverse functions change quite often. Good for users, but hard for package maintainers. Most of our functions are replaced with versions that only rely on base R, which keeps this package fully functional for many years to come, without requiring a lot of maintenance to keep up with other packages anymore. Another upside it that this package can now be used with all versions of R since R-3.0.0 (April 2013). Our package is being used in settings where the resources are very limited. Fewer dependencies on newer software is helpful for such settings.

    Negative effects of this change are:

      -
    • Function freq() that was borrowed from the cleaner package was removed. Use cleaner::freq(), or run library("cleaner") before you use freq().
    • +
    • Function freq() that was borrowed from the cleaner package was removed. Use cleaner::freq(), or run library("cleaner") before you use freq().
    • Printing values of class mo or rsi in a tibble will no longer be in colour and printing rsi in a tibble will show the class <ord>, not <rsi> anymore. This is purely a visual effect.
    • All functions from the mo_* family (like mo_name() and mo_gramstain()) are noticeably slower when running on hundreds of thousands of rows.
    • For developers: classes mo and ab now both also inherit class character, to support any data transformation. This change invalidates code that checks for class length == 1.
    • @@ -453,11 +457,13 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
    • Fixed important floating point error for some MIC comparisons in EUCAST 2020 guideline

    • Interpretation from MIC values (and disk zones) to R/SI can now be used with mutate_at() of the dplyr package:

      -
      yourdata %>%
      -  mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = "E. coli")
      +
      +yourdata %>% 
      +  mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = "E. coli")
       
      -yourdata %>%
      -  mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = .$mybacteria)
      +yourdata %>% + mutate_at(vars(antibiotic1:antibiotic25), as.rsi, mo = .$mybacteria) +
    • Added antibiotic abbreviations for a laboratory manufacturer (GLIMS) for cefuroxime, cefotaxime, ceftazidime, cefepime, cefoxitin and trimethoprim/sulfamethoxazole

    • Added uti (as abbreviation of urinary tract infections) as parameter to as.rsi(), so interpretation of MIC values and disk zones can be made dependent on isolates specifically from UTIs

    • @@ -480,21 +486,25 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
      • Support for LOINC codes in the antibiotics data set. Use ab_loinc() to retrieve LOINC codes, or use a LOINC code for input in any ab_* function:

        -
        ab_loinc("ampicillin")
        +
        +ab_loinc("ampicillin")
         #> [1] "21066-6" "3355-5"  "33562-0" "33919-2" "43883-8" "43884-6" "87604-5"
         ab_name("21066-6")
         #> [1] "Ampicillin"
         ab_atc("21066-6")
        -#> [1] "J01CA01"
        +#> [1] "J01CA01" +
      • Support for SNOMED CT codes in the microorganisms data set. Use mo_snomed() to retrieve SNOMED codes, or use a SNOMED code for input in any mo_* function:

        -
        mo_snomed("S. aureus")
        +
        +mo_snomed("S. aureus")
         #> [1] 115329001   3092008 113961008
         mo_name(115329001)
         #> [1] "Staphylococcus aureus"
         mo_gramstain(115329001)
        -#> [1] "Gram-positive"
        +#> [1] "Gram-positive" +
      @@ -552,9 +562,13 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
      • If you were dependent on the old Enterobacteriaceae family e.g. by using in your code:

        -
        if (mo_family(somebugs) == "Enterobacteriaceae") ...
        +
        +if (mo_family(somebugs) == "Enterobacteriaceae") ...
        +

        then please adjust this to:

        -
        if (mo_order(somebugs) == "Enterobacterales") ...
        +
        +if (mo_order(somebugs) == "Enterobacterales") ...
        +
      @@ -566,12 +580,14 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
      • Functions susceptibility() and resistance() as aliases of proportion_SI() and proportion_R(), respectively. These functions were added to make it more clear that “I” should be considered susceptible and not resistant.

        -
        library(dplyr)
        -example_isolates %>%
        -  group_by(bug = mo_name(mo)) %>%
        -  summarise(amoxicillin = resistance(AMX),
        -            amox_clav   = resistance(AMC)) %>%
        -  filter(!is.na(amoxicillin) | !is.na(amox_clav))
        +
        +library(dplyr)
        +example_isolates %>%
        +  group_by(bug = mo_name(mo)) %>% 
        +  summarise(amoxicillin = resistance(AMX),
        +            amox_clav   = resistance(AMC)) %>%
        +  filter(!is.na(amoxicillin) | !is.na(amox_clav))
        +
      • Support for a new MDRO guideline: Magiorakos AP, Srinivasan A et al. “Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance.” Clinical Microbiology and Infection (2012).

        @@ -593,7 +609,8 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
      • More intelligent way of coping with some consonants like “l” and “r”

      • Added a score (a certainty percentage) to mo_uncertainties(), that is calculated using the Levenshtein distance:

        -
        as.mo(c("Stafylococcus aureus",
        +
        +as.mo(c("Stafylococcus aureus",
                 "staphylokok aureuz"))
         #> Warning: 
         #> Results of two values were guessed with uncertainty. Use mo_uncertainties() to review them.
        @@ -602,7 +619,8 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
         
         mo_uncertainties()
         #> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%)
        -#> "staphylokok aureuz"   -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%)
        +#> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%) +
      @@ -650,25 +668,29 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
      • Determination of first isolates now excludes all ‘unknown’ microorganisms at default, i.e. microbial code "UNKNOWN". They can be included with the new parameter include_unknown:

        -
        first_isolate(..., include_unknown = TRUE)
        +
        +first_isolate(..., include_unknown = TRUE)
        +

        For WHONET users, this means that all records/isolates with organism code "con" (contamination) will be excluded at default, since as.mo("con") = "UNKNOWN". The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.

      • For code consistency, classes ab and mo will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA:

        -
        # how it works in base R:
        -x <- factor("A")
        -x[1] <- "B"
        +
        +# how it works in base R:
        +x <- factor("A")
        +x[1] <- "B"
         #> Warning message:
         #> invalid factor level, NA generated
         
         # how it now works similarly for classes 'mo' and 'ab':
        -x <- as.mo("E. coli")
        -x[1] <- "testvalue"
        +x <- as.mo("E. coli")
        +x[1] <- "testvalue"
         #> Warning message:
        -#> invalid microorganism code, NA generated
        +#> invalid microorganism code, NA generated +

        This is important, because a value like "testvalue" could never be understood by e.g. mo_name(), although the class would suggest a valid microbial code.

      • -
      • Function freq() has moved to a new package, clean (CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq() function still works, since it is re-exported from the clean package (which will be installed automatically upon updating this AMR package).

      • +
      • Function freq() has moved to a new package, clean (CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq() function still works, since it is re-exported from the clean package (which will be installed automatically upon updating this AMR package).

      • Renamed data set septic_patients to example_isolates

  • @@ -678,9 +700,10 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/

    @@ -804,14 +834,16 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/

    All these lead to the microbial ID of E. coli:

    -
    as.mo("UPEC")
    +
    +as.mo("UPEC")
     # B_ESCHR_COL
     mo_name("UPEC")
     # "Escherichia coli"
     mo_gramstain("EHEC")
    -# "Gram-negative"
    +# "Gram-negative" +
  • Function mo_info() as an analogy to ab_info(). The mo_info() prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism

  • Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism

  • @@ -862,7 +896,7 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • Fixed bug where not all old taxonomic names would be printed, when using a vector as input for as.mo()
  • Manually added Trichomonas vaginalis from the kingdom of Protozoa, which is missing from the Catalogue of Life
  • -
  • Small improvements to plot() and barplot() for MIC and RSI classes
  • +
  • Small improvements to plot() and barplot() for MIC and RSI classes
  • Allow Catalogue of Life IDs to be coerced by as.mo()
  • @@ -925,21 +959,23 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • The age() function gained a new parameter exact to determine ages with decimals
  • Removed deprecated functions guess_mo(), guess_atc(), EUCAST_rules(), interpretive_reading(), rsi()
  • -
  • Frequency tables (freq()): +
  • Frequency tables (freq()):
  • @@ -948,7 +984,7 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • Added ceftazidim intrinsic resistance to Streptococci
  • Changed default settings for age_groups(), to let groups of fives and tens end with 100+ instead of 120+
  • -
  • Fix for freq() for when all values are NA +
  • Fix for freq() for when all values are NA
  • Fix for first_isolate() for when dates are missing
  • Improved speed of guess_ab_col() @@ -1025,7 +1061,8 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • New filters for antimicrobial classes. Use these functions to filter isolates on results in one of more antibiotics from a specific class:

    -
    filter_aminoglycosides()
    +
    +filter_aminoglycosides()
     filter_carbapenems()
     filter_cephalosporins()
     filter_1st_cephalosporins()
    @@ -1035,23 +1072,28 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
     filter_fluoroquinolones()
     filter_glycopeptides()
     filter_macrolides()
    -filter_tetracyclines()
    +filter_tetracyclines() +

    The antibiotics data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics data set. For example:

    -
    septic_patients %>% filter_glycopeptides(result = "R")
    +
    +septic_patients %>% filter_glycopeptides(result = "R")
     # Filtering on glycopeptide antibacterials: any of `vanc` or `teic` is R
    -septic_patients %>% filter_glycopeptides(result = "R", scope = "all")
    -# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R
    +septic_patients %>% filter_glycopeptides(result = "R", scope = "all") +# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R +
  • All ab_* functions are deprecated and replaced by atc_* functions:

    -
    ab_property -> atc_property()
    -ab_name -> atc_name()
    -ab_official -> atc_official()
    -ab_trivial_nl -> atc_trivial_nl()
    -ab_certe -> atc_certe()
    -ab_umcg -> atc_umcg()
    -ab_tradenames -> atc_tradenames()
    -

    These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.

    +
    +ab_property -> atc_property()
    +ab_name -> atc_name()
    +ab_official -> atc_official()
    +ab_trivial_nl -> atc_trivial_nl()
    +ab_certe -> atc_certe()
    +ab_umcg -> atc_umcg()
    +ab_tradenames -> atc_tradenames()
    +
    +

    These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.

  • New functions set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions

  • Support for the upcoming dplyr version 0.8.0

  • @@ -1062,21 +1104,27 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • New function age() to calculate the (patients) age in years

  • New function age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.

  • -

    New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

    -
    x <- resistance_predict(septic_patients, col_ab = "amox")
    -plot(x)
    -ggplot_rsi_predict(x)
    +

    New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

    +
    +x <- resistance_predict(septic_patients, col_ab = "amox")
    +plot(x)
    +ggplot_rsi_predict(x)
    +
  • Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and fasten filtering on data sets with antimicrobial results, e.g.:

    -
    septic_patients %>% filter_first_isolate(...)
    +
    +septic_patients %>% filter_first_isolate(...)
     # or
    -filter_first_isolate(septic_patients, ...)
    +filter_first_isolate(septic_patients, ...) +

    is equal to:

    -
    septic_patients %>%
    -  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    -  filter(only_firsts == TRUE) %>%
    -  select(-only_firsts)
    +
    +septic_patients %>%
    +  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    +  filter(only_firsts == TRUE) %>%
    +  select(-only_firsts)
    +
  • New function availability() to check the number of available (non-empty) results in a data.frame

  • New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier readable) on our website: https://msberends.gitlab.io/AMR.

  • @@ -1097,40 +1145,46 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
  • Removed data sets microorganisms.oldDT, microorganisms.prevDT, microorganisms.unprevDT and microorganismsDT since they were no longer needed and only contained info already available in the microorganisms data set
  • Added 65 antibiotics to the antibiotics data set, from the Pharmaceuticals Community Register of the European Commission
  • Removed columns atc_group1_nl and atc_group2_nl from the antibiotics data set
  • -
  • Functions atc_ddd() and atc_groups() have been renamed atc_online_ddd() and atc_online_groups(). The old functions are deprecated and will be removed in a future version.
  • +
  • Functions atc_ddd() and atc_groups() have been renamed atc_online_ddd() and atc_online_groups(). The old functions are deprecated and will be removed in a future version.
  • Function guess_mo() is now deprecated in favour of as.mo() and will be removed in future versions
  • Function guess_atc() is now deprecated in favour of as.atc() and will be removed in future versions
  • Improvements for as.mo():
  • -
  • Frequency tables (freq() function): +
  • Frequency tables (freq() function): @@ -1529,13 +1607,13 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/