diff --git a/DESCRIPTION b/DESCRIPTION
index 1c4e30e7..14a7cf81 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.7.1.9010
-Date: 2019-07-09
+Version: 0.7.1.9012
+Date: 2019-07-10
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/NEWS.md b/NEWS.md
index a5375b27..aa4d72d3 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 0.7.1.9010
+# AMR 0.7.1.9012
 
 ### New
 * Additional way to calculate co-resistance, i.e. when using multiple antibiotics as input for `portion_*` functions or `count_*` functions. This can be used to determine the empiric susceptibily of a combination therapy. A new parameter `only_all_tested` (**which defaults to `FALSE`**) replaces the old `also_single_tested` and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the `portion` and `count` help pages), where the %SI is being determined:
diff --git a/R/mo_property.R b/R/mo_property.R
index 1e2829f9..56834003 100755
--- a/R/mo_property.R
+++ b/R/mo_property.R
@@ -151,9 +151,14 @@ mo_shortname <- function(x, language = get_locale(), ...) {
   x.mo <- AMR::as.mo(x, ...)
   metadata <- get_mo_failures_uncertainties_renamed()
 
+  replace_empty <- function(x) {
+    x[x == ""] <- "spp."
+    x
+  }
+
   # get first char of genus and complete species in English
-  shortnames <- paste0(substr(mo_genus(x.mo, language = NULL), 1, 1), ". ", mo_species(x.mo, language = NULL))
-
+  shortnames <- paste0(substr(mo_genus(x.mo, language = NULL), 1, 1), ". ", replace_empty(mo_species(x.mo, language = NULL)))
+
   # exceptions for Staphylococci
   shortnames[shortnames == "S. coagulase-negative" ] <- "CoNS"
   shortnames[shortnames == "S. coagulase-positive" ] <- "CoPS"
diff --git a/R/sysdata.rda b/R/sysdata.rda
index 43047d60..096a3d9b 100644
Binary files a/R/sysdata.rda and b/R/sysdata.rda differ
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index ce2d5e1f..415c8868 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -78,7 +78,7 @@
 AMR (for R)
-0.7.1.9010
+0.7.1.9012
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index a4c0afd1..f1537c11 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -40,7 +40,7 @@
 AMR (for R)
-0.7.1.9005
+0.7.1.9012
@@ -192,7 +192,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

01 July 2019

+

10 July 2019

@@ -201,7 +201,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 01 July 2019.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 10 July 2019.

Introduction

@@ -217,21 +217,21 @@
-2019-07-01
+2019-07-10
 abcd  Escherichia coli  S  S
-2019-07-01
+2019-07-10
 abcd  Escherichia coli  S  R
-2019-07-01
+2019-07-10
 efgh  Escherichia coli  R
@@ -244,12 +244,12 @@
 Needed R packages

As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.

Our AMR package depends on these packages and even extends their use and functions.

-
library(dplyr)
-library(ggplot2)
-library(AMR)
-
-# (if not yet installed, install with:)
-# install.packages(c("tidyverse", "AMR"))
+
library(dplyr)
+library(ggplot2)
+library(AMR)
+
+# (if not yet installed, install with:)
+# install.packages(c("tidyverse", "AMR"))
@@ -261,58 +261,58 @@

Patients

To start with patients, we need a unique list of patients.

-
patients <- unlist(lapply(LETTERS, paste0, 1:10))
+
patients <- unlist(lapply(LETTERS, paste0, 1:10))

The LETTERS object is available in R - it’s a vector with 26 characters: A to Z. The patients object we just created is now a vector of length 260, with values (patient IDs) varying from A1 to Z10. Now we also set the gender of our patients, by putting the ID and the gender in a table:

-
patients_table <- data.frame(patient_id = patients,
-                             gender = c(rep("M", 135),
-                                        rep("F", 125)))
+
patients_table <- data.frame(patient_id = patients,
+                             gender = c(rep("M", 135),
+                                        rep("F", 125)))

The first 135 patient IDs are now male, the other 125 are female.
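
As a quick sanity check, the two objects above can be inspected in base R. This is only a sketch that repeats the definitions from this page so it runs on its own; the checks themselves are our addition.

```r
# Rebuild the objects exactly as defined above
patients <- unlist(lapply(LETTERS, paste0, 1:10))
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))

# 26 letters x 10 numbers gives 260 patient IDs
length(patients)

# IDs run from A1 to Z10
patients[c(1, length(patients))]

# The male/female split falls inside the letter N (N5 is the last male ID)
patients[135]

# Number of patients per gender
table(patients_table$gender)
```

Because the IDs are generated letter by letter, gender is tied to the position in the alphabet. That is fine for fake data, but it is worth keeping in mind when interpreting results per patient.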

Dates

Let’s pretend that our data consists of blood culture isolates collected between 1 January 2010 and 1 January 2018.

-
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
+
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")

This dates object now contains all days in our date range.

Microorganisms

For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:

-
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
-              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+              "Streptococcus pneumoniae", "Klebsiella pneumoniae")

Other variables

For completeness, we can also add the hospital where each patient was admitted, and we need to define valid antimicrobial results for our randomisation:

-
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
-ab_interpretations <- c("S", "I", "R")
+
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
+ab_interpretations <- c("S", "I", "R")

Put everything together

Using the sample() function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob parameter.

-
sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
-                   patient_id = sample(patients, size = sample_size, replace = TRUE),
-                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
-                                     prob = c(0.30, 0.35, 0.15, 0.20)),
-                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
-                                     prob = c(0.50, 0.25, 0.15, 0.10)),
-                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.60, 0.05, 0.35)),
-                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.75, 0.10, 0.15)),
-                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.80, 0.00, 0.20)),
-                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
-                                 prob = c(0.92, 0.00, 0.08))
-                   )
+
sample_size <- 20000
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+                   patient_id = sample(patients, size = sample_size, replace = TRUE),
+                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
+                                     prob = c(0.30, 0.35, 0.15, 0.20)),
+                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+                                     prob = c(0.50, 0.25, 0.15, 0.10)),
+                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.60, 0.05, 0.35)),
+                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.75, 0.10, 0.15)),
+                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.80, 0.00, 0.20)),
+                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                                 prob = c(0.92, 0.00, 0.08))
+                   )

Using the left_join() function from the dplyr package, we can ‘map’ the gender to the patient ID using the patients_table object we created earlier:

-
data <- data %>% left_join(patients_table)
+
data <- data %>% left_join(patients_table)

The resulting data set contains 20,000 blood culture isolates. With the head() function we can preview the first 6 rows of this data set:

-
head(data)
+
head(data)
@@ -327,70 +327,70 @@ - - - - - - - - - - - - - + + - + - - - - - - - - - - - - - - - - + + + + - + + - - - - + + + + + + - - - - + + - - + + + + + + + + + + + + + - + + + + + + + + + + + +
date
2011-09-06Z5Hospital BEscherichia coliRSSSF
2015-03-21E72011-07-20X1 Hospital CEscherichia coliStaphylococcus aureus R I S SM
2010-08-11X6Hospital CEscherichia coliSSRS F
2012-06-16E10Hospital DStaphylococcus aureusR2017-12-18Q8Hospital BStreptococcus pneumoniae S S SMSF
2016-12-29J3Hospital CEscherichia coli2015-12-08X5Hospital BStaphylococcus aureusSS RSSSMRF
2010-04-09Q32011-12-08X3 Hospital BKlebsiella pneumoniaeSSSSF
2011-01-02E1Hospital A Streptococcus pneumoniae R S S SFM
2012-10-22K1Hospital CStreptococcus pneumoniaeSSSSM
@@ -401,7 +401,7 @@

Cleaning the data

Use the frequency table function freq() to look specifically for unique values in any variable. For example, for the gender variable:

-
data %>% freq(gender) # this would be the same: freq(data$gender)
+
data %>% freq(gender) # this would be the same: freq(data$gender)
# Frequency table of `gender` from `data` (20,000 x 9) 
 # 
 # Class:   factor (numeric)
@@ -411,82 +411,82 @@
 # 
 #      Item     Count   Percent   Cum. Count   Cum. Percent
 # ---  -----  -------  --------  -----------  -------------
-# 1    M       10,408     52.0%       10,408          52.0%
-# 2    F        9,592     48.0%       20,000         100.0%
+# 1    M       10,454     52.3%       10,454          52.3%
+# 2    F        9,546     47.7%       20,000         100.0%

So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M and F. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

-
data <- data %>%
-  mutate(bacteria = as.mo(bacteria))
+
data <- data %>%
+  mutate(bacteria = as.mo(bacteria))

We also want to transform the antibiotics, because in real-life data we don’t know if they are really clean. The as.rsi() function ensures reliability and reproducibility in this kind of variable. The mutate_at() function will run as.rsi() on the defined variables:

-
data <- data %>%
-  mutate_at(vars(AMX:GEN), as.rsi)
+
data <- data %>%
+  mutate_at(vars(AMX:GEN), as.rsi)

Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules() function can also apply additional rules, like forcing ampicillin = R when amoxicillin/clavulanic acid = R.

Because the amoxicillin (column AMX) and amoxicillin/clavulanic acid (column AMC) results in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules() function fixes this:

-
data <- eucast_rules(data, col_mo = "bacteria")
-# 
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-# http://eucast.org/
-# 
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Aerococcus sanguinicola (no new changes)
-# Aerococcus urinae (no new changes)
-# Anaerobic Gram-negatives (no new changes)
-# Anaerobic Gram-positives (no new changes)
-# Campylobacter coli (no new changes)
-# Campylobacter jejuni (no new changes)
-# Enterobacteriales (Order) (no new changes)
-# Enterococcus (no new changes)
-# Haemophilus influenzae (no new changes)
-# Kingella kingae (no new changes)
-# Moraxella catarrhalis (no new changes)
-# Pasteurella multocida (no new changes)
-# Staphylococcus (no new changes)
-# Streptococcus groups A, B, C, G (no new changes)
-# Streptococcus pneumoniae (1,443 new changes)
-# Viridans group streptococci (no new changes)
-# 
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,332 new changes)
-# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no new changes)
-# Table 03: Intrinsic resistance in other Gram-negative bacteria (no new changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,723 new changes)
-# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no new changes)
-# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no new changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no new changes)
-# Table 12: Interpretive rules for aminoglycosides (no new changes)
-# Table 13: Interpretive rules for quinolones (no new changes)
-# 
-# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,213 new changes)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (127 new changes)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no new changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no new changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no new changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no new changes)
-# 
-# --------------------------------------------------------------------------
-# EUCAST rules affected 6,513 out of 20,000 rows, making a total of 7,838 edits
-# => added 0 test results
-# 
-# => changed 7,838 test results
-#    - 115 test results changed from S to I
-#    - 4,719 test results changed from S to R
-#    - 1,077 test results changed from I to S
-#    - 335 test results changed from I to R
-#    - 1,573 test results changed from R to S
-#    - 19 test results changed from R to I
-# --------------------------------------------------------------------------
-# 
-# Use verbose = TRUE to get a data.frame with all specified edits instead.
+
data <- eucast_rules(data, col_mo = "bacteria")
+# 
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# http://eucast.org/
+# 
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Aerococcus sanguinicola (no new changes)
+# Aerococcus urinae (no new changes)
+# Anaerobic Gram-negatives (no new changes)
+# Anaerobic Gram-positives (no new changes)
+# Campylobacter coli (no new changes)
+# Campylobacter jejuni (no new changes)
+# Enterobacteriales (Order) (no new changes)
+# Enterococcus (no new changes)
+# Haemophilus influenzae (no new changes)
+# Kingella kingae (no new changes)
+# Moraxella catarrhalis (no new changes)
+# Pasteurella multocida (no new changes)
+# Staphylococcus (no new changes)
+# Streptococcus groups A, B, C, G (no new changes)
+# Streptococcus pneumoniae (1,481 new changes)
+# Viridans group streptococci (no new changes)
+# 
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,328 new changes)
+# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no new changes)
+# Table 03: Intrinsic resistance in other Gram-negative bacteria (no new changes)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,778 new changes)
+# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no new changes)
+# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no new changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no new changes)
+# Table 12: Interpretive rules for aminoglycosides (no new changes)
+# Table 13: Interpretive rules for quinolones (no new changes)
+# 
+# Other rules
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,245 new changes)
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (118 new changes)
+# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no new changes)
+# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no new changes)
+# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no new changes)
+# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no new changes)
+# 
+# --------------------------------------------------------------------------
+# EUCAST rules affected 6,581 out of 20,000 rows, making a total of 7,950 edits
+# => added 0 test results
+# 
+# => changed 7,950 test results
+#    - 120 test results changed from S to I
+#    - 4,812 test results changed from S to R
+#    - 1,089 test results changed from I to S
+#    - 317 test results changed from I to R
+#    - 1,588 test results changed from R to S
+#    - 24 test results changed from R to I
+# --------------------------------------------------------------------------
+# 
+# Use verbose = TRUE (on your original data) to get a data.frame with all specified edits instead.
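
To make the two non-EUCAST rules that changed results in the output above more concrete, here is a minimal base R sketch. It is not how eucast_rules() is implemented: the toy vectors are hypothetical, the amoxicillin column stands in for the ampicillin named in the rules (as on this page), and the order in which the rules are applied is an assumption based on the order in which they are listed.

```r
# Hypothetical toy results for four isolates
amx <- c("S", "S", "R", "I")  # amoxicillin
amc <- c("R", "S", "R", "R")  # amoxicillin/clavulanic acid

# Rule 1: amoxicillin/clavulanic acid = S where amoxicillin = S
amc[amx == "S"] <- "S"

# Rule 2: amoxicillin = R where amoxicillin/clavulanic acid = R
amx[amc == "R"] <- "R"

data.frame(AMX = amx, AMC = amc)
```

The first isolate shows the 'impossible' combination AMX = S and AMC = R being resolved, and the fourth shows an I result being changed to R.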

Adding new variables

Now that we have the microbial ID, we can add some taxonomic properties:

-
data <- data %>% 
-  mutate(gramstain = mo_gramstain(bacteria),
-         genus = mo_genus(bacteria),
-         species = mo_species(bacteria))
+
data <- data %>% 
+  mutate(gramstain = mo_gramstain(bacteria),
+         genus = mo_genus(bacteria),
+         species = mo_species(bacteria))

First isolates

@@ -497,23 +497,23 @@

(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4

This AMR package includes this methodology with the first_isolate() function. It uses an episode length of one year (which can be changed by the user) and starts counting days after every selected isolate. This new variable can easily be added to our data:

- -

So only 28.6% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

- + +

So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

- +
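
The episode logic behind first isolates can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name first_isolate_sketch(), its arguments and the choice to start a new episode once at least 365 days have passed since the last selected isolate are all assumptions made for this example. The dates are the ten isolate dates listed for patient B6 in this version of the page.

```r
# Sketch (assumption): flag the first isolate per patient and species,
# then flag a new one only when >= episode_days have passed since the
# last flagged isolate of that patient/species combination.
first_isolate_sketch <- function(dates, patient, species, episode_days = 365) {
  is_first <- logical(length(dates))
  groups <- split(seq_along(dates), list(patient, species), drop = TRUE)
  for (idx in groups) {
    idx <- idx[order(dates[idx])]        # walk through isolates by date
    last_selected <- as.Date(NA)
    for (i in idx) {
      if (is.na(last_selected) ||
          as.numeric(dates[i] - last_selected) >= episode_days) {
        is_first[i] <- TRUE
        last_selected <- dates[i]        # restart counting from this isolate
      }
    }
  }
  is_first
}

dates <- as.Date(c("2010-02-26", "2010-05-20", "2010-05-28", "2010-06-27",
                   "2010-09-06", "2010-10-16", "2010-10-20", "2010-11-01",
                   "2010-11-14", "2011-01-25"))
patient <- rep("B6", length(dates))
species <- rep("B_ESCHR_COL", length(dates))

flags <- first_isolate_sketch(dates, patient, species)
sum(flags)
```

All ten dates fall within one year of the first isolate, so only that isolate is flagged in this sketch.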

First weighted isolates

-

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient S7, sorted on date:

+

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient B6, sorted on date:

@@ -529,30 +529,30 @@ - - + + - + - - + + - - - + + + - - + + @@ -562,10 +562,10 @@ - - + + - + @@ -573,8 +573,8 @@ - - + + @@ -584,19 +584,19 @@ - - + + + - - + - - + + @@ -606,32 +606,32 @@ - - + + - + + - - - + + - + - - + + - + @@ -639,18 +639,18 @@
isolate
12010-01-28S72010-02-26B6 B_ESCHR_COL RI SR S TRUE
22010-02-07S72010-05-20B6 B_ESCHR_COLSSS RSSS FALSE
32010-03-16S72010-05-28B6 B_ESCHR_COL R S
42010-10-09S72010-06-27B6 B_ESCHR_COLSR S S S
52011-01-25S72010-09-06B6 B_ESCHR_COL R S
62011-02-16S72010-10-16B6 B_ESCHR_COLR S S SSTRUEFALSE
72011-02-24S72010-10-20B6 B_ESCHR_COL S S
82011-03-30S72010-11-01B6 B_ESCHR_COLRSS S RS FALSE
92011-04-25S72010-11-14B6 B_ESCHR_COL S S SRS FALSE
102011-05-06S72011-01-25B6 B_ESCHR_COLSR S S S
-

Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

+

Only 1 isolate is marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

If a column exists with a name like ‘key(…)ab’ the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:

- + @@ -667,118 +667,118 @@ - - + + - + - - + + - - - + + + - - + + - + - - + + - + - + - - + + - + - - + + + - - - + + - - + + - + - - + + - + + - - - + + - + - - + + - + @@ -787,18 +787,19 @@
isolate
12010-01-28S72010-02-26B6 B_ESCHR_COL RI SR S TRUE TRUE
22010-02-07S72010-05-20B6 B_ESCHR_COLSSS RSSS FALSE TRUE
32010-03-16S72010-05-28B6 B_ESCHR_COL R S S S FALSETRUEFALSE
42010-10-09S72010-06-27B6 B_ESCHR_COLSR S S S FALSETRUEFALSE
52011-01-25S72010-09-06B6 B_ESCHR_COL R S S S FALSETRUEFALSE
62011-02-16S72010-10-16B6 B_ESCHR_COLR S S SSTRUETRUEFALSEFALSE
72011-02-24S72010-10-20B6 B_ESCHR_COL S S S S FALSEFALSETRUE
82011-03-30S72010-11-01B6 B_ESCHR_COLRSS S RS FALSE TRUE
92011-04-25S72010-11-14B6 B_ESCHR_COL S S SRS FALSE TRUE
102011-05-06S72011-01-25B6 B_ESCHR_COLSR S S S
-

Instead of 2, now 9 isolates are flagged. In total, 75.5% of all isolates are marked ‘first weighted’ - 46.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 1, now 6 isolates are flagged. In total, 75% of all isolates are marked ‘first weighted’ - 46.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

- -

So we end up with 15,097 isolates for analysis.

+ +

So we end up with 15,003 isolates for analysis.

We can remove unneeded columns:

- +

Now our data looks like:

-
head(data_1st)
+
head(data_1st)
+ @@ -815,90 +816,96 @@ - - + + + - + + + + - - - - - - + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + - - - - + + - - - - + + + + + - + - - + + - - - + + + + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -918,12 +925,12 @@ Dispersion of species

To just get an idea how the species are distributed, create a frequency table with our freq() function. We created the genus and species column earlier based on the microbial ID. With paste(), we can concatenate them together.

The freq() function can be used in the base R style:

-
freq(paste(data_1st$genus, data_1st$species))
+
freq(paste(data_1st$genus, data_1st$species))

Or it can be used the dplyr way, which is easier to read:

-
data_1st %>% freq(genus, species)
-

Frequency table of genus and species from data_1st (15,097 x 13)

+
data_1st %>% freq(genus, species)
+

Frequency table of genus and species from data_1st (15,003 x 13)

Columns: 2
-Length: 15,097 (of which NA: 0 = 0.00%)
+Length: 15,003 (of which NA: 0 = 0.00%)
Unique: 4

Shortest: 16
Longest: 24

@@ -940,33 +947,33 @@ Longest: 24

- - - - + + + + - - - + + + - - - - + + + + - - - + + + @@ -976,12 +983,12 @@ Longest: 24

Resistance percentages

The functions portion_S(), portion_SI(), portion_I(), portion_IR() and portion_R() can be used to determine the portion of a specific antimicrobial outcome. As per the EUCAST guideline of 2019, we calculate resistance as the portion of R (portion_R()) and susceptibility as the portion of S and I (portion_SI()). These functions can be used on their own:

- +

Or can be used in conjunction with group_by() and summarise(), both from the dplyr package:

-
data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_R(AMX))
+
data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = portion_R(AMX))
date patient_id hospital
2011-09-06Z522017-12-18Q8 Hospital BB_ESCHR_COLB_STRPT_PNESSS RSSS FGram-negativeEscherichiacoliGram-positiveStreptococcuspneumoniae TRUE
2015-03-21E7Hospital CB_ESCHR_COLRISSMGram-negativeEscherichiacoliTRUE
2010-08-11X6Hospital CB_ESCHR_COLSSRSFGram-negativeEscherichiacoliTRUE
2012-06-16E10Hospital D32015-12-08X5Hospital B B_STPHY_AURSS RSSSMRF Gram-positive Staphylococcus aureus TRUE
2016-12-29J3Hospital CB_ESCHR_COL42011-12-08X3Hospital BB_KLBSL_PNE R S S SMF Gram-negativeEscherichiacoliKlebsiellapneumoniae TRUE
2010-04-09Q3Hospital B52011-01-02E1Hospital A B_STRPT_PNE R R S RFMGram-positiveStreptococcuspneumoniaeTRUE
62012-10-22K1Hospital CB_STRPT_PNESSSRMGram-positiveStreptococcuspneumoniaeTRUE
72016-07-14C5Hospital DB_STRPT_PNERRSRM Gram-positive Streptococcus pneumoniae
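-1  Escherichia coli          7,483  49.6%   7,483   49.6%
+1  Escherichia coli          7,251  48.3%   7,251   48.3%
-2  Staphylococcus aureus     3,673  24.3%  11,156   73.9%
+2  Staphylococcus aureus     3,834  25.6%  11,085   73.9%
-3  Streptococcus pneumoniae  2,306  15.3%  13,462   89.2%
+3  Streptococcus pneumoniae  2,328  15.5%  13,413   89.4%
-4  Klebsiella pneumoniae     1,635  10.8%  15,097  100.0%
+4  Klebsiella pneumoniae     1,590  10.6%  15,003  100.0%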
@@ -990,27 +997,27 @@ Longest: 24

 hospital    amoxicillin
-Hospital A  0.4810406
+Hospital A  0.4698715
-Hospital B  0.4714259
+Hospital B  0.4707917
-Hospital C  0.4625113
+Hospital C  0.4548753
-Hospital D  0.4753247
+Hospital D  0.4847403

Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi() function can be used, which works exactly like n_distinct() from the dplyr package. It counts all isolates available for every group (i.e. values S, I or R):

-
data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_R(AMX),
-            available = n_rsi(AMX))
+
data_1st %>% 
+  group_by(hospital) %>% 
+  summarise(amoxicillin = portion_R(AMX),
+            available = n_rsi(AMX))
@@ -1020,32 +1027,32 @@ Longest: 24

 hospital    amoxicillin  available
-Hospital A  0.4810406    4536
+Hospital A  0.4698715    4514
-Hospital B  0.4714259    5267
+Hospital B  0.4707917    5204
-Hospital C  0.4625113    2214
+Hospital C  0.4548753    2205
-Hospital D  0.4753247    3080
+Hospital D  0.4847403    3080
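
What the two columns above represent can be sketched in base R without the package. The helper names ending in _sketch and the toy data are hypothetical; the real portion_*() and n_rsi() functions may apply additional safeguards that this sketch ignores.

```r
# Hypothetical toy data: interpretations for one antibiotic in two hospitals
hospital <- c("A", "A", "A", "A", "B", "B", "B", "B")
amx      <- c("S", "R", "I", NA,  "R", "R", "S", "S")

# Only S, I and R count as available results; NA is ignored
is_valid <- function(x) x %in% c("S", "I", "R")

portion_R_sketch  <- function(x) sum(x %in% "R") / sum(is_valid(x))
portion_SI_sketch <- function(x) sum(x %in% c("S", "I")) / sum(is_valid(x))
n_rsi_sketch      <- function(x) sum(is_valid(x))

# One row per hospital, analogous to group_by() + summarise()
result <- data.frame(
  hospital    = sort(unique(hospital)),
  amoxicillin = as.numeric(tapply(amx, hospital, portion_R_sketch)),
  available   = as.numeric(tapply(amx, hospital, n_rsi_sketch))
)
result
```

Reporting the denominator next to the portion makes it clear how many isolates each percentage rests on.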

These functions can also be used to get the portion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:

- + @@ -1056,94 +1063,94 @@ Longest: 24

 genus
-Escherichia     0.9256982  0.8929574  0.9951891
+Escherichia     0.9233209  0.8939457  0.9942077
-Klebsiella      0.8214067  0.9033639  0.9865443
+Klebsiella      0.8314465  0.9150943  0.9874214
-Staphylococcus  0.9229513  0.9224068  0.9937381
+Staphylococcus  0.9238393  0.9160146  0.9921753
-Streptococcus   0.6153513  0.0000000  0.6153513
+Streptococcus   0.6121134  0.0000000  0.6121134
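
The idea behind the combination column can be sketched in base R: an isolate counts as susceptible to a combination when it is S or I for at least one of the drugs. This is a simplified sketch on hypothetical, fully tested toy data; how the package handles isolates that were tested for only one of the drugs (see the only_all_tested parameter in the NEWS) is not covered here.

```r
# Hypothetical toy results for five isolates, all tested for both drugs
amc <- c("S", "R", "R", "I", "R")  # amoxicillin/clavulanic acid
gen <- c("S", "S", "R", "R", "S")  # gentamicin

si <- function(x) x %in% c("S", "I")

# Susceptible to the combination if susceptible to at least one drug
combined <- si(amc) | si(gen)

c(amoxiclav   = mean(si(amc)),
  gentamicin  = mean(si(gen)),
  combination = mean(combined))
```

In this toy example the combination covers 4 of 5 isolates, more than either drug alone, which is the same pattern as in the table above.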

To make a transition to the next part, let’s see how this difference could be plotted:

-
data_1st %>% 
-  group_by(genus) %>% 
-  summarise("1. Amoxi/clav" = portion_SI(AMC),
-            "2. Gentamicin" = portion_SI(GEN),
-            "3. Amoxi/clav + genta" = portion_SI(AMC, GEN)) %>% 
-  tidyr::gather("antibiotic", "S", -genus) %>%
-  ggplot(aes(x = genus,
-             y = S,
-             fill = antibiotic)) +
-  geom_col(position = "dodge2")
+
data_1st %>% 
+  group_by(genus) %>% 
+  summarise("1. Amoxi/clav" = portion_SI(AMC),
+            "2. Gentamicin" = portion_SI(GEN),
+            "3. Amoxi/clav + genta" = portion_SI(AMC, GEN)) %>% 
+  tidyr::gather("antibiotic", "S", -genus) %>%
+  ggplot(aes(x = genus,
+             y = S,
+             fill = antibiotic)) +
+  geom_col(position = "dodge2")

Plots

To show results in plots, most R users would nowadays use the ggplot2 package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like this:

-
ggplot(data = a_data_set,
-       mapping = aes(x = year,
-                     y = value)) +
-  geom_col() +
-  labs(title = "A title",
-       subtitle = "A subtitle",
-       x = "My X axis",
-       y = "My Y axis")
-
-# or as short as:
-ggplot(a_data_set) +
-  geom_bar(aes(year))
+
ggplot(data = a_data_set,
+       mapping = aes(x = year,
+                     y = value)) +
+  geom_col() +
+  labs(title = "A title",
+       subtitle = "A subtitle",
+       x = "My X axis",
+       y = "My Y axis")
+
+# or as short as:
+ggplot(a_data_set) +
+  geom_bar(aes(year))

The AMR package contains functions to extend this ggplot2 package, for example geom_rsi(). It automatically transforms data with count_df() or portion_df() and shows the results in stacked bars. Its simplest and shortest example:

-
ggplot(data_1st) +
-  geom_rsi(translate_ab = FALSE)
+
ggplot(data_1st) +
+  geom_rsi(translate_ab = FALSE)

Omit the translate_ab = FALSE to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).

If we group on e.g. the genus column and add some additional functions from our package, we can create this:

- +

To simplify this, we also created the ggplot_rsi() function, which combines almost all of the above functions:

- +

@@ -1151,33 +1158,33 @@ Longest: 24

Independence test

The next example uses the included septic_patients, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This data.frame can be used to practice AMR analysis.

We will compare the resistance to fosfomycin (column FOS) in hospitals A and D. The input for the fisher.test() can be retrieved with a transformation like this:

-
check_FOS <- septic_patients %>%
-  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
-  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
-  group_by(hospital_id) %>%                # group on the hospitals
-  count_df(combine_SI = TRUE) %>%          # count all isolates per group (hospital_id)
-  tidyr::spread(hospital_id, value) %>%    # transform output so A and D are columns
-  select(A, D) %>%                         # and select these only
-  as.matrix()                              # transform to good old matrix for fisher.test()
-
-check_FOS
-#       A  D
-# [1,] 25 77
-# [2,] 24 33
+
check_FOS <- septic_patients %>%
+  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
+  group_by(hospital_id) %>%                # group on the hospitals
+  count_df(combine_SI = TRUE) %>%          # count all isolates per group (hospital_id)
+  tidyr::spread(hospital_id, value) %>%    # transform output so A and D are columns
+  select(A, D) %>%                         # and select these only
+  as.matrix()                              # transform to good old matrix for fisher.test()
+
+check_FOS
+#       A  D
+# [1,] 25 77
+# [2,] 24 33

We can apply the test now with:

- +

As can be seen, the p value is 0.031. Since this is below the conventional significance level of 0.05, the fosfomycin resistance found in hospitals A and D differs significantly.
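
As a minimal sketch, the test can be reproduced in base R from the counts printed above, without the package or the data set. The matrix is typed in by hand here, which is an assumption made for illustration; in the tutorial it comes from the check_FOS pipeline.

```r
# Counts copied from the check_FOS output shown above:
# columns are hospitals A and D, rows are the two count groups.
check_FOS <- matrix(c(25, 24, 77, 33),
                    nrow = 2,
                    dimnames = list(NULL, c("A", "D")))

# Fisher's exact test on the 2 x 2 table (stats package, shipped with R)
result <- fisher.test(check_FOS)

# the p value that is compared with the 0.05 significance level
result$p.value
```

The returned object is a list of class htest, so the odds ratio estimate and its confidence interval are available as result$estimate and result$conf.int as well.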

diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index 97295b95..4b81035a 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index bf1b2b19..e7b3a110 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 3f4fbeb6..f4358c5d 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index dc9c4aa3..59f73e6e 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html index 256d165e..087c2f29 100644 --- a/docs/articles/EUCAST.html +++ b/docs/articles/EUCAST.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

How to apply EUCAST rules

Matthijs S. Berends

-

01 July 2019

+

10 July 2019

diff --git a/docs/articles/MDR.html b/docs/articles/MDR.html index f59d29b5..aaebb3d6 100644 --- a/docs/articles/MDR.html +++ b/docs/articles/MDR.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

How to determine multi-drug resistance (MDR)

Matthijs S. Berends

-

01 July 2019

+

10 July 2019

@@ -208,57 +208,57 @@
  • “WIP-Richtlijn Bijzonder Resistente Micro-organismen (BRMO)”, by RIVM (Rijksinstituut voor de Volksgezondheid, the Netherlands National Institute for Public Health and the Environment)
  • As an example, I will make a data set to determine multi-drug resistant TB:

    -
    # a helper function to get a random vector with values S, I and R
    -# with the probabilities 50%-10%-40%
    -sample_rsi <- function() {
    -  sample(c("S", "I", "R"),
    -         size = 5000,
    -         prob = c(0.5, 0.1, 0.4),
    -         replace = TRUE)
    -}
    -
    -my_TB_data <- data.frame(rifampicin = sample_rsi(),
    -                         isoniazid = sample_rsi(),
    -                         gatifloxacin = sample_rsi(),
    -                         ethambutol = sample_rsi(),
    -                         pyrazinamide = sample_rsi(),
    -                         moxifloxacin = sample_rsi(),
    -                         kanamycin = sample_rsi())
    +
    # a helper function to get a random vector with values S, I and R
    +# with the probabilities 50%-10%-40%
    +sample_rsi <- function() {
    +  sample(c("S", "I", "R"),
    +         size = 5000,
    +         prob = c(0.5, 0.1, 0.4),
    +         replace = TRUE)
    +}
    +
    +my_TB_data <- data.frame(rifampicin = sample_rsi(),
    +                         isoniazid = sample_rsi(),
    +                         gatifloxacin = sample_rsi(),
    +                         ethambutol = sample_rsi(),
    +                         pyrazinamide = sample_rsi(),
    +                         moxifloxacin = sample_rsi(),
    +                         kanamycin = sample_rsi())

    Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:

    -
    my_TB_data <- data.frame(RIF = sample_rsi(),
    -                         INH = sample_rsi(),
    -                         GAT = sample_rsi(),
    -                         ETH = sample_rsi(),
    -                         PZA = sample_rsi(),
    -                         MFX = sample_rsi(),
    -                         KAN = sample_rsi())
    +
    my_TB_data <- data.frame(RIF = sample_rsi(),
    +                         INH = sample_rsi(),
    +                         GAT = sample_rsi(),
    +                         ETH = sample_rsi(),
    +                         PZA = sample_rsi(),
    +                         MFX = sample_rsi(),
    +                         KAN = sample_rsi())

    The data set looks like this now:

    -
    head(my_TB_data)
    -#   rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin
    -# 1          R         R            S          S            R            I
    -# 2          R         R            S          R            S            R
    -# 3          R         S            S          R            R            R
    -# 4          R         R            S          S            S            I
    -# 5          R         R            S          S            R            I
    -# 6          R         S            R          R            S            S
    -#   kanamycin
    -# 1         S
    -# 2         S
    -# 3         S
    -# 4         R
    -# 5         R
    -# 6         S
    +
    head(my_TB_data)
    +#   rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin
    +# 1          R         S            S          R            S            R
    +# 2          R         S            R          S            S            S
    +# 3          R         R            R          R            S            R
    +# 4          S         S            R          S            R            S
    +# 5          R         R            S          I            S            S
    +# 6          S         S            R          R            R            S
    +#   kanamycin
    +# 1         S
    +# 2         S
    +# 3         S
    +# 4         R
    +# 5         R
    +# 6         S

    We can now add the interpretation of MDR-TB to our data set:

    -
    my_TB_data$mdr <- mdr_tb(my_TB_data)
    -# NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis.
    -# Determining multidrug-resistant organisms (MDRO), according to:
    -# Guideline: Companion handbook to the WHO guidelines for the programmatic management of drug-resistant tuberculosis
    -# Version:   WHO/HTM/TB/2014.11
    -# Author:    WHO (World Health Organization)
    -# Source:    https://www.who.int/tb/publications/pmdt_companionhandbook/en/
    -# NOTE: Reliability might be improved if these antimicrobial results would be available too: CAP (capreomycin), RIB (rifabutin), RFP (rifapentine)
    +
    my_TB_data$mdr <- mdr_tb(my_TB_data)
    +# NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis.
    +# Determining multidrug-resistant organisms (MDRO), according to:
    +# Guideline: Companion handbook to the WHO guidelines for the programmatic management of drug-resistant tuberculosis
    +# Version:   WHO/HTM/TB/2014.11
    +# Author:    WHO (World Health Organization)
    +# Source:    https://www.who.int/tb/publications/pmdt_companionhandbook/en/
    +# NOTE: Reliability might be improved if these antimicrobial results would be available too: CAP (capreomycin), RIB (rifabutin), RFP (rifapentine)

    And review the result with a frequency table:

    -
    freq(my_TB_data$mdr)
    +
    freq(my_TB_data$mdr)

    Frequency table of mdr from my_TB_data (5,000 x 8)

    Class: factor > ordered (numeric)
    Length: 5,000 (of which NA: 0 = 0.00%)
    @@ -277,40 +277,40 @@ Unique: 5

    -1 Mono-resistance            3,222  64.4%  3,222   64.4%
    +1 Mono-resistance            3,273  65.5%  3,273   65.5%
    -2 Negative                     659  13.2%  3,881   77.6%
    +2 Negative                     687  13.7%  3,960   79.2%
    -3 Multidrug resistance         589  11.8%  4,470   89.4%
    +3 Multidrug resistance         569  11.4%  4,529   90.6%
    -4 Poly-resistance              313   6.3%  4,783   95.7%
    +4 Poly-resistance              277   5.5%  4,806   96.1%
    -5 Extensive drug resistance    217   4.3%  5,000  100.0%
    +5 Extensive drug resistance    194   3.9%  5,000  100.0%

    diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html index 35c47c15..67304bb9 100644 --- a/docs/articles/SPSS.html +++ b/docs/articles/SPSS.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

    How to import data from SPSS / SAS / Stata

    Matthijs S. Berends

    -

    01 July 2019

    +

    10 July 2019

    @@ -242,39 +242,39 @@

    If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as with any statistical package, you will need to know what you are doing (statistically) and what you want to accomplish.

    To demonstrate the first point:

    -
    # not all values are valid MIC values:
    -as.mic(0.125)
    -# Class 'mic'
    -# [1] 0.125
    -as.mic("testvalue")
    -# Class 'mic'
    -# [1] <NA>
    -
    -# the Gram stain is avaiable for all bacteria:
    -mo_gramstain("E. coli")
    -# [1] "Gram-negative"
    -
    -# Klebsiella is intrinsic resistant to amoxicllin, according to EUCAST:
    -klebsiella_test <- data.frame(mo = "klebsiella", 
    -                              amox = "S",
    -                              stringsAsFactors = FALSE)
    -klebsiella_test
    -#           mo amox
    -# 1 klebsiella    S
    -eucast_rules(klebsiella_test, info = FALSE)
    -#           mo amox
    -# 1 klebsiella    R
    -
    -# hundreds of trade names can be translated to a name, trade name or an ATC code:
    -ab_name("floxapen")
    -# [1] "Flucloxacillin"
    -ab_tradenames("floxapen")
    -#  [1] "Floxacillin"          "FLOXACILLIN"          "Floxapen"            
    -#  [4] "Floxapen sodium salt" "Fluclox"              "Flucloxacilina"      
    -#  [7] "Flucloxacillin"       "Flucloxacilline"      "Flucloxacillinum"    
    -# [10] "Fluorochloroxacillin"
    -ab_atc("floxapen")
    -# [1] "J01CF05"
    +
    # not all values are valid MIC values:
    +as.mic(0.125)
    +# Class 'mic'
    +# [1] 0.125
    +as.mic("testvalue")
    +# Class 'mic'
    +# [1] <NA>
    +
    +# the Gram stain is available for all bacteria:
    +mo_gramstain("E. coli")
    +# [1] "Gram-negative"
    +
    +# Klebsiella is intrinsically resistant to amoxicillin, according to EUCAST:
    +klebsiella_test <- data.frame(mo = "klebsiella", 
    +                              amox = "S",
    +                              stringsAsFactors = FALSE)
    +klebsiella_test
    +#           mo amox
    +# 1 klebsiella    S
    +eucast_rules(klebsiella_test, info = FALSE)
    +#           mo amox
    +# 1 klebsiella    R
    +
    +# hundreds of trade names can be translated to a name, trade name or an ATC code:
    +ab_name("floxapen")
    +# [1] "Flucloxacillin"
    +ab_tradenames("floxapen")
    +#  [1] "Floxacillin"          "FLOXACILLIN"          "Floxapen"            
    +#  [4] "Floxapen sodium salt" "Fluclox"              "Flucloxacilina"      
    +#  [7] "Flucloxacillin"       "Flucloxacilline"      "Flucloxacillinum"    
    +# [10] "Fluorochloroxacillin"
    +ab_atc("floxapen")
    +# [1] "J01CF05"

    @@ -290,97 +290,97 @@

    If you want named variables to be imported as factors so it resembles SPSS more, use as_factor().

    The difference is this:

    - +

    Base R

    To import data from SPSS, SAS or Stata, you can use the great haven package yourself:

    -
    # download and install the latest version:
    -install.packages("haven")
    -# load the package you just installed:
    -library(haven) 
    +
    # download and install the latest version:
    +install.packages("haven")
    +# load the package you just installed:
    +library(haven) 

    You can now import files as follows:

    SPSS

    To read files from SPSS into R:

    -
    # read any SPSS file based on file extension (best way):
    -read_spss(file = "path/to/file")
    -
    -# read .sav or .zsav file:
    -read_sav(file = "path/to/file")
    -
    -# read .por file:
    -read_por(file = "path/to/file")
    +
    # read any SPSS file based on file extension (best way):
    +read_spss(file = "path/to/file")
    +
    +# read .sav or .zsav file:
    +read_sav(file = "path/to/file")
    +
    +# read .por file:
    +read_por(file = "path/to/file")

    Do not forget about as_factor(), as mentioned above.

    To export your R objects to the SPSS file format:

    -
    # save as .sav file:
    -write_sav(data = yourdata, path = "path/to/file")
    -
    -# save as compressed .zsav file:
    -write_sav(data = yourdata, path = "path/to/file", compress = TRUE)
    +
    # save as .sav file:
    +write_sav(data = yourdata, path = "path/to/file")
    +
    +# save as compressed .zsav file:
    +write_sav(data = yourdata, path = "path/to/file", compress = TRUE)

    SAS

    To read files from SAS into R:

    -
    # read .sas7bdat + .sas7bcat files:
    -read_sas(data_file = "path/to/file", catalog_file = NULL)
    -
    -# read SAS transport files (version 5 and version 8):
    -read_xpt(file = "path/to/file")
    +
    # read .sas7bdat + .sas7bcat files:
    +read_sas(data_file = "path/to/file", catalog_file = NULL)
    +
    +# read SAS transport files (version 5 and version 8):
    +read_xpt(file = "path/to/file")

    To export your R objects to the SAS file format:

    -
    # save as regular SAS file:
    -write_sas(data = yourdata, path = "path/to/file")
    -
    -# the SAS transport format is an open format 
    -# (required for submission of the data to the FDA)
    -write_xpt(data = yourdata, path = "path/to/file", version = 8)
    +
    # save as regular SAS file:
    +write_sas(data = yourdata, path = "path/to/file")
    +
    +# the SAS transport format is an open format 
    +# (required for submission of the data to the FDA)
    +write_xpt(data = yourdata, path = "path/to/file", version = 8)

    Stata

    To read files from Stata into R:

    -
    # read .dta file:
    -read_stata(file = "/path/to/file")
    -
    -# works exactly the same:
    -read_dta(file = "/path/to/file")
    +
    # read .dta file:
    +read_stata(file = "/path/to/file")
    +
    +# works exactly the same:
    +read_dta(file = "/path/to/file")

    To export your R objects to the Stata file format:

    - +
    diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html index e4e054b8..ac5cd4e0 100644 --- a/docs/articles/WHONET.html +++ b/docs/articles/WHONET.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

    How to work with WHONET data

    Matthijs S. Berends

    -

    01 July 2019

    +

    10 July 2019

    @@ -206,31 +206,31 @@ Import of data

    This tutorial assumes you already imported the WHONET data with e.g. the readxl package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.

    An example syntax could look like this:

    -
    library(readxl)
    -data <- read_excel(path = "path/to/your/file.xlsx")
    +
    library(readxl)
    +data <- read_excel(path = "path/to/your/file.xlsx")

    This package comes with an example data set WHONET. We will use it for this analysis.

    Preparation

    First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.

    -
    library(dplyr)   # part of tidyverse
    -library(ggplot2) # part of tidyverse
    -library(AMR)     # this package
    +
    library(dplyr)   # part of tidyverse
    +library(ggplot2) # part of tidyverse
    +library(AMR)     # this package

    We will have to transform some variables to simplify and automate the analysis:

    -
    # transform variables
    -data <- WHONET %>%
    -  # get microbial ID based on given organism
    -  mutate(mo = as.mo(Organism)) %>% 
    -  # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
    -  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
    +
    # transform variables
    +data <- WHONET %>%
    +  # get microbial ID based on given organism
    +  mutate(mo = as.mo(Organism)) %>% 
    +  # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
    +  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)

    No errors or warnings, so all values are transformed successfully. Let’s check it though, with a couple of frequency tables:

    -
    # our newly created `mo` variable
    -data %>% freq(mo, nmax = 10)
    +
    # our newly created `mo` variable
    +data %>% freq(mo, nmax = 10)

    Frequency table of mo from data (500 x 54)

    Class: mo (character)
    Length: 500 (of which NA: 0 = 0.00%)
    @@ -331,10 +331,10 @@ Species: 38

    (omitted 29 entries, n = 57 [11.4%])

    -
    
    -# our transformed antibiotic columns
    -# amoxicillin/clavulanic acid (J01CR02) as an example
    -data %>% freq(AMC_ND2)
    +
    
    +# our transformed antibiotic columns
    +# amoxicillin/clavulanic acid (J01CR02) as an example
    +data %>% freq(AMC_ND2)

    Frequency table of AMC_ND2 from data (500 x 54)

    Class: factor > ordered > rsi (numeric)
    Length: 500 (of which NA: 19 = 3.80%)
    diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index d950825d..84caae51 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012

    @@ -192,7 +192,7 @@

    Benchmarks

    Matthijs S. Berends

    -

    01 July 2019

    +

    10 July 2019

    @@ -203,161 +203,161 @@

    One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo() that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.

    Using the microbenchmark package, we can review the calculation performance of this function. Its function microbenchmark() runs different input expressions independently of each other and measures their time-to-result.

    -
    library(microbenchmark)
    -library(AMR)
    +
    library(microbenchmark)
    +library(AMR)

    In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its MO code B_STPHY_AUR (B stands for Bacteria, the taxonomic kingdom).

    But the calculation time differs a lot:

    -
    S.aureus <- microbenchmark(as.mo("sau"),
    -                           as.mo("stau"),
    -                           as.mo("staaur"),
    -                           as.mo("STAAUR"),
    -                           as.mo("S. aureus"),
    -                           as.mo("S.  aureus"),
    -                           as.mo("Staphylococcus aureus"),
    -                           times = 10)
    -print(S.aureus, unit = "ms", signif = 2)
    -# Unit: milliseconds
    -#                            expr  min   lq mean median   uq max neval
    -#                    as.mo("sau") 18.0 18.0   22   18.0 18.0  61    10
    -#                   as.mo("stau") 65.0 65.0   70   66.0 66.0 110    10
    -#                 as.mo("staaur") 18.0 18.0   33   18.0 62.0  81    10
    -#                 as.mo("STAAUR") 18.0 18.0   18   18.0 18.0  19    10
    -#              as.mo("S. aureus") 52.0 52.0   61   52.0 53.0  97    10
    -#             as.mo("S.  aureus") 52.0 52.0   71   53.0 97.0 150    10
    -#  as.mo("Staphylococcus aureus")  8.1  8.1   14    8.1  8.2  63    10
    +
    S.aureus <- microbenchmark(as.mo("sau"),
    +                           as.mo("stau"),
    +                           as.mo("staaur"),
    +                           as.mo("STAAUR"),
    +                           as.mo("S. aureus"),
    +                           as.mo("S.  aureus"),
    +                           as.mo("Staphylococcus aureus"),
    +                           times = 10)
    +print(S.aureus, unit = "ms", signif = 2)
    +# Unit: milliseconds
    +#                            expr  min   lq mean median   uq max neval
    +#                    as.mo("sau")  8.5  8.7 12.0    8.9  9.4  26    10
    +#                   as.mo("stau") 31.0 32.0 42.0   33.0 34.0 120    10
    +#                 as.mo("staaur")  8.6  8.7 11.0    9.1  9.2  26    10
    +#                 as.mo("STAAUR")  8.7  9.1  9.3    9.2  9.4  11    10
    +#              as.mo("S. aureus") 23.0 23.0 30.0   24.0 40.0  46    10
    +#             as.mo("S.  aureus") 22.0 23.0 27.0   24.0 25.0  41    10
    +#  as.mo("Staphylococcus aureus")  3.9  4.0  5.7    4.1  4.4  20    10

    In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code), common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.
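
The conversion from time per call to throughput used in the paragraph above is plain arithmetic. It can be sketched as follows; the helper name ms_to_per_second is made up for this illustration and is not part of the package.

```r
# Convert a time per call in milliseconds to the number of calls per second.
# There are 1000 milliseconds in one second.
ms_to_per_second <- function(ms) {
  1000 / ms
}

ms_to_per_second(5)    # 200 input values per second
ms_to_per_second(100)  # 10 input values per second
```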

    To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of Thermus islandicus (B_THERMS_ISL), a bug probably never found before in humans:

    -
    T.islandicus <- microbenchmark(as.mo("theisl"),
    -                               as.mo("THEISL"),
    -                               as.mo("T. islandicus"),
    -                               as.mo("T.  islandicus"),
    -                               as.mo("Thermus islandicus"),
    -                               times = 10)
    -print(T.islandicus, unit = "ms", signif = 2)
    -# Unit: milliseconds
    -#                         expr min  lq mean median  uq max neval
    -#              as.mo("theisl") 390 390  420    440 440 440    10
    -#              as.mo("THEISL") 390 390  410    400 440 440    10
    -#       as.mo("T. islandicus") 210 210  230    220 250 270    10
    -#      as.mo("T.  islandicus") 210 210  240    260 260 280    10
    -#  as.mo("Thermus islandicus")  72  72   92     73 120 130    10
    -

    That takes 6.8 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are almost fast - these are the most probable input from most data sets.

    +
    T.islandicus <- microbenchmark(as.mo("theisl"),
    +                               as.mo("THEISL"),
    +                               as.mo("T. islandicus"),
    +                               as.mo("T.  islandicus"),
    +                               as.mo("Thermus islandicus"),
    +                               times = 10)
    +print(T.islandicus, unit = "ms", signif = 2)
    +# Unit: milliseconds
    +#                         expr min  lq mean median  uq max neval
    +#              as.mo("theisl") 290 310  320    320 320 350    10
    +#              as.mo("THEISL") 290 300  310    310 330 340    10
    +#       as.mo("T. islandicus") 140 140  150    150 160 170    10
    +#      as.mo("T.  islandicus") 140 140  150    150 170 180    10
    +#  as.mo("Thermus islandicus")  50  52   58     54  68  70    10
    +

    That takes 10.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are considerably faster, and these are the most probable input from most data sets.

    In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Thermus islandicus (which is very uncommon):

    -
    par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
    -
    -boxplot(microbenchmark(as.mo("Thermus islandicus"),
    -                       as.mo("Prevotella brevis"),
    -                       as.mo("Escherichia coli"),
    -                       as.mo("T. islandicus"),
    -                       as.mo("P. brevis"),
    -                       as.mo("E. coli"),
    -                       times = 10),
    -        horizontal = TRUE, las = 1, unit = "s", log = FALSE,
    -        xlab = "", ylab = "Time in seconds",
    -        main = "Benchmarks per prevalence")
    +
    par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
    +
    +boxplot(microbenchmark(as.mo("Thermus islandicus"),
    +                       as.mo("Prevotella brevis"),
    +                       as.mo("Escherichia coli"),
    +                       as.mo("T. islandicus"),
    +                       as.mo("P. brevis"),
    +                       as.mo("E. coli"),
    +                       times = 10),
    +        horizontal = TRUE, las = 1, unit = "s", log = FALSE,
    +        xlab = "", ylab = "Time in seconds",
    +        main = "Benchmarks per prevalence")

    Uncommon microorganisms take a lot more time than common microorganisms. To mitigate this and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.

    Repetitive results

    Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

    -
    library(dplyr)
    -# take all MO codes from the septic_patients data set
    -x <- septic_patients$mo %>%
    -  # keep only the unique ones
    -  unique() %>%
    -  # pick 50 of them at random
    -  sample(50) %>%
    -  # paste that 10,000 times
    -  rep(10000) %>%
    -  # scramble it
    -  sample()
    -  
    -# got indeed 50 times 10,000 = half a million?
    -length(x)
    -# [1] 500000
    -
    -# and how many unique values do we have?
    -n_distinct(x)
    -# [1] 50
    -
    -# now let's see:
    -run_it <- microbenchmark(mo_fullname(x),
    -                         times = 10)
    -print(run_it, unit = "ms", signif = 3)
    -# Unit: milliseconds
    -#            expr  min   lq mean median   uq  max neval
    -#  mo_fullname(x) 1050 1050 1100   1090 1120 1230    10
    -

    So transforming 500,000 values (!!) of 50 unique values only takes 1.09 seconds (1092 ms). You only lose time on your unique input values.

    +
    library(dplyr)
    +# take all MO codes from the septic_patients data set
    +x <- septic_patients$mo %>%
    +  # keep only the unique ones
    +  unique() %>%
    +  # pick 50 of them at random
    +  sample(50) %>%
    +  # paste that 10,000 times
    +  rep(10000) %>%
    +  # scramble it
    +  sample()
    +  
    +# got indeed 50 times 10,000 = half a million?
    +length(x)
    +# [1] 500000
    +
    +# and how many unique values do we have?
    +n_distinct(x)
    +# [1] 50
    +
    +# now let's see:
    +run_it <- microbenchmark(mo_fullname(x),
    +                         times = 10)
    +print(run_it, unit = "ms", signif = 3)
    +# Unit: milliseconds
    +#            expr min  lq mean median  uq max neval
    +#  mo_fullname(x) 611 628  643    635 650 714    10
    +

    So transforming 500,000 values (!!) of 50 unique values only takes 0.63 seconds (634 ms). You only lose time on your unique input values.
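
The idea of spending time only on unique input values can be sketched in base R as follows. This is an illustration of the general technique under stated assumptions, not the package's actual implementation: slow_lookup() and lookup_unique_only() are hypothetical names, and the upper-casing merely stands in for an expensive lookup.

```r
# Hypothetical stand-in for an expensive per-value lookup (not part of AMR)
slow_lookup <- function(values) {
  vapply(values, function(v) {
    Sys.sleep(0.001)  # simulate work for every value that is looked up
    toupper(v)
  }, character(1), USE.NAMES = FALSE)
}

# Look up each distinct value once, then map the results back to all positions
lookup_unique_only <- function(x, fun) {
  unique_values <- unique(x)
  unique_results <- fun(unique_values)
  unique_results[match(x, unique_values)]
}

set.seed(42)
x <- sample(c("esccol", "staaur", "klepne"), size = 3000, replace = TRUE)

# at most 3 expensive lookups instead of 3000
result <- lookup_unique_only(x, slow_lookup)
```

The match() call returns, for every element of x, its position in the vector of unique values, so indexing the unique results with it restores the original length and order.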

    Precalculated results

    What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname(), it takes almost no time at all (see ‘C’ below):

    -
    run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
    -                         B = mo_fullname("S. aureus"),
    -                         C = mo_fullname("Staphylococcus aureus"),
    -                         times = 10)
    -print(run_it, unit = "ms", signif = 3)
    -# Unit: milliseconds
    -#  expr   min    lq  mean median    uq    max neval
    -#     A 13.00 13.20 13.60  13.60 14.00  14.40    10
    -#     B 49.40 50.00 57.50  51.90 52.40 103.00    10
    -#     C  1.52  1.72  1.81   1.78  1.98   1.99    10
    -

    So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0018 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

    -
    run_it <- microbenchmark(A = mo_species("aureus"),
    -                         B = mo_genus("Staphylococcus"),
    -                         C = mo_fullname("Staphylococcus aureus"),
    -                         D = mo_family("Staphylococcaceae"),
    -                         E = mo_order("Bacillales"),
    -                         F = mo_class("Bacilli"),
    -                         G = mo_phylum("Firmicutes"),
    -                         H = mo_kingdom("Bacteria"),
    -                         times = 10)
    -print(run_it, unit = "ms", signif = 3)
    -# Unit: milliseconds
    -#  expr   min    lq  mean median    uq   max neval
    -#     A 0.612 0.623 0.685  0.653 0.789 0.814    10
    -#     B 0.556 0.575 0.680  0.671 0.689 0.958    10
    -#     C 1.520 1.710 1.800  1.820 1.950 1.970    10
    -#     D 0.547 0.665 0.723  0.688 0.811 0.997    10
    -#     E 0.490 0.541 0.633  0.629 0.748 0.756    10
    -#     F 0.482 0.569 0.612  0.590 0.663 0.756    10
    -#     G 0.551 0.558 0.601  0.586 0.632 0.735    10
    -#     H 0.494 0.564 0.595  0.575 0.608 0.757    10
    +
    run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
    +                         B = mo_fullname("S. aureus"),
    +                         C = mo_fullname("Staphylococcus aureus"),
    +                         times = 10)
    +print(run_it, unit = "ms", signif = 3)
    +# Unit: milliseconds
    +#  expr    min     lq   mean median    uq   max neval
    +#     A  6.730  7.030  8.030  7.750  8.72  9.73    10
    +#     B 22.400 23.000 27.100 23.600 27.10 46.00    10
    +#     C  0.835  0.877  0.978  0.925  1.12  1.18    10
    +

    So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0009 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

    +
    run_it <- microbenchmark(A = mo_species("aureus"),
    +                         B = mo_genus("Staphylococcus"),
    +                         C = mo_fullname("Staphylococcus aureus"),
    +                         D = mo_family("Staphylococcaceae"),
    +                         E = mo_order("Bacillales"),
    +                         F = mo_class("Bacilli"),
    +                         G = mo_phylum("Firmicutes"),
    +                         H = mo_kingdom("Bacteria"),
    +                         times = 10)
    +print(run_it, unit = "ms", signif = 3)
    +# Unit: milliseconds
    +#  expr   min    lq  mean median    uq   max neval
    +#     A 0.468 0.470 0.533  0.489 0.595 0.690    10
    +#     B 0.504 0.513 0.555  0.520 0.571 0.711    10
    +#     C 0.629 0.687 0.864  0.855 1.050 1.130    10
    +#     D 0.505 0.515 0.575  0.530 0.649 0.767    10
    +#     E 0.442 0.457 0.529  0.481 0.531 0.774    10
    +#     F 0.447 0.510 0.554  0.568 0.609 0.618    10
    +#     G 0.443 0.470 0.492  0.477 0.506 0.601    10
    +#     H 0.448 0.459 0.491  0.466 0.515 0.633    10

    Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
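
The shortcut described above can be sketched as an early return: if the input already is a valid result, hand it back without any lookup. This is only an illustration of the principle. The names known_phyla, lookup_phylum() and get_phylum() are hypothetical, the reference list is a tiny stand-in for the real taxonomic data, and the package's internals may work differently.

```r
# Hypothetical reference list of valid results (a tiny stand-in for the real data)
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Hypothetical slow path; here it only resolves one made-up abbreviation
lookup_phylum <- function(x) {
  if (identical(x, "firm")) "Firmicutes" else NA_character_
}

get_phylum <- function(x) {
  if (x %in% known_phyla) {
    return(x)  # the input already is a valid result, so skip the lookup
  }
  lookup_phylum(x)
}

get_phylum("Firmicutes")  # returned immediately
get_phylum("firm")        # goes through the slow path
```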

    Results in other languages

    When the system language is non-English and supported by this AMR package, some functions will return a translated result. This takes almost no extra time:

    -
    mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
    -# [1] "Coagulase-negative Staphylococcus (CoNS)"
    -
    -mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
    -# [1] "Staphylococcus coagulasa negativo (SCN)"
    -
    -mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
    -# [1] "Coagulase-negatieve Staphylococcus (CNS)"
    -
    -run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
    -                         de = mo_fullname("CoNS", language = "de"),
    -                         nl = mo_fullname("CoNS", language = "nl"),
    -                         es = mo_fullname("CoNS", language = "es"),
    -                         it = mo_fullname("CoNS", language = "it"),
    -                         fr = mo_fullname("CoNS", language = "fr"),
    -                         pt = mo_fullname("CoNS", language = "pt"),
    -                         times = 10)
    -print(run_it, unit = "ms", signif = 4)
    -# Unit: milliseconds
    -#  expr   min    lq  mean median    uq    max neval
    -#    en 43.00 43.12 45.51  44.82 44.89  56.61    10
    -#    de 46.47 46.99 52.11  47.57 48.11  93.77    10
    -#    nl 60.86 62.72 67.57  63.69 63.99 108.20    10
    -#    es 45.74 46.05 52.37  46.42 47.98 103.00    10
    -#    it 45.84 45.89 51.90  47.66 47.73  94.83    10
    -#    fr 45.97 46.92 47.44  47.76 47.86  48.49    10
    -#    pt 45.93 46.77 47.36  47.77 47.93  48.12    10
    +
    mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
    +# [1] "Coagulase-negative Staphylococcus (CoNS)"
    +
    +mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
    +# [1] "Staphylococcus coagulasa negativo (SCN)"
    +
    +mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
    +# [1] "Coagulase-negatieve Staphylococcus (CNS)"
    +
    +run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
    +                         de = mo_fullname("CoNS", language = "de"),
    +                         nl = mo_fullname("CoNS", language = "nl"),
    +                         es = mo_fullname("CoNS", language = "es"),
    +                         it = mo_fullname("CoNS", language = "it"),
    +                         fr = mo_fullname("CoNS", language = "fr"),
    +                         pt = mo_fullname("CoNS", language = "pt"),
    +                         times = 10)
    +print(run_it, unit = "ms", signif = 4)
    +# Unit: milliseconds
    +#  expr   min    lq  mean median    uq   max neval
    +#    en 19.29 20.39 22.42  20.64 21.26 38.98    10
    +#    de 22.03 22.40 23.78  23.08 23.60 31.74    10
    +#    nl 27.98 28.60 29.22  28.87 30.13 30.53    10
    +#    es 21.44 22.97 30.04  24.11 45.71 46.46    10
    +#    it 21.17 21.82 22.47  22.44 23.27 23.52    10
    +#    fr 20.75 21.58 24.10  22.04 22.41 42.96    10
    +#    pt 21.24 21.92 24.31  22.75 23.26 39.91    10

    Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

    diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png index e79a5f73..ae65b719 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/articles/freq.html b/docs/articles/freq.html index 3da235d3..4424b9cf 100644 --- a/docs/articles/freq.html +++ b/docs/articles/freq.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

    How to create frequency tables

    Matthijs S. Berends

    -

    01 July 2019

    +

    10 July 2019

    @@ -210,17 +210,17 @@

    Frequencies of one variable

    To only show and quickly review the content of one variable, you can just select this variable in various ways. Let’s say we want to get the frequencies of the gender variable of the septic_patients dataset:

    -
    # Any of these will work:
    -# freq(septic_patients$gender)
    -# freq(septic_patients[, "gender"])
    -
    -# Using tidyverse:
    -# septic_patients$gender %>% freq()
    -# septic_patients[, "gender"] %>% freq()
    -# septic_patients %>% freq("gender")
    -
    -# Probably the fastest and easiest:
    -septic_patients %>% freq(gender)  
    +
    # Any of these will work:
    +# freq(septic_patients$gender)
    +# freq(septic_patients[, "gender"])
    +
    +# Using tidyverse:
    +# septic_patients$gender %>% freq()
    +# septic_patients[, "gender"] %>% freq()
    +# septic_patients %>% freq("gender")
    +
    +# Probably the fastest and easiest:
    +septic_patients %>% freq(gender)  

    Frequency table of gender from septic_patients (2,000 x 49)

    Class: character
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -262,22 +262,22 @@ Longest: 1

    Frequencies of more than one variable

    Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.

    For illustration, we could add some more variables to the septic_patients dataset to learn about bacterial properties:

    -
    my_patients <- septic_patients %>% left_join_microorganisms()
    -# Joining, by = "mo"
    +
    my_patients <- septic_patients %>% left_join_microorganisms()
    +# Joining, by = "mo"

    Now all variables of the microorganisms dataset have been joined to the septic_patients dataset. The microorganisms dataset consists of the following variables:

    -
    colnames(microorganisms)
    -#  [1] "mo"         "col_id"     "fullname"   "kingdom"    "phylum"    
    -#  [6] "class"      "order"      "family"     "genus"      "species"   
    -# [11] "subspecies" "rank"       "ref"        "species_id" "source"    
    -# [16] "prevalence"
    +
    colnames(microorganisms)
    +#  [1] "mo"         "col_id"     "fullname"   "kingdom"    "phylum"    
    +#  [6] "class"      "order"      "family"     "genus"      "species"   
    +# [11] "subspecies" "rank"       "ref"        "species_id" "source"    
    +# [16] "prevalence"

    If we compare the dimensions between the old and new dataset, we can see that these 15 variables were added:

    -
    dim(septic_patients)
    -# [1] 2000   49
    -dim(my_patients)
    -# [1] 2000   64
    +
    dim(septic_patients)
    +# [1] 2000   49
    +dim(my_patients)
    +# [1] 2000   64

    So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

    -
    my_patients %>%
    -  freq(genus, species, nmax = 15)
    +
    my_patients %>%
    +  freq(genus, species, nmax = 15)

    Frequency table of genus and species from my_patients (2,000 x 64)

    Columns: 2
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -423,10 +423,10 @@ Longest: 34

    Frequencies of numeric values

    Frequency tables can be created of any input.

    For numeric values (integers, doubles, etc.), additional descriptive statistics are calculated and shown in the header:

    -
    # # get age distribution of unique patients
    -septic_patients %>% 
    -  distinct(patient_id, .keep_all = TRUE) %>% 
    -  freq(age, nmax = 5, header = TRUE)
    +
    # # get age distribution of unique patients
    +septic_patients %>% 
    +  distinct(patient_id, .keep_all = TRUE) %>% 
    +  freq(age, nmax = 5, header = TRUE)

    Frequency table of age from a data.frame (981 x 49)

    Class: numeric
    Length: 981 (of which NA: 0 = 0.00%)
    @@ -506,8 +506,8 @@ Outliers: 15 (unique count: 12)

    Frequencies of factors

    To sort frequencies of factors on their levels instead of item count, use the sort.count parameter.

    sort.count is TRUE by default. Compare this default behaviour…

    -
    septic_patients %>%
    -  freq(hospital_id)
    +
    septic_patients %>%
    +  freq(hospital_id)

    Frequency table of hospital_id from septic_patients (2,000 x 49)

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -558,8 +558,8 @@ Unique: 4

    … to this, where items are now sorted on factor levels:

    -
    septic_patients %>%
    -  freq(hospital_id, sort.count = FALSE)
    +
    septic_patients %>%
    +  freq(hospital_id, sort.count = FALSE)

    Frequency table of hospital_id from septic_patients (2,000 x 49)

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -610,8 +610,8 @@ Unique: 4

    All classes will be printed in the header. Variables with the new rsi class of this AMR package are actually ordered factors and have three classes (look at Class in the header):

    -
    septic_patients %>%
    -  freq(AMX, header = TRUE)
    +
    septic_patients %>%
    +  freq(AMX, header = TRUE)

    Frequency table of AMX from septic_patients (2,000 x 49)

    Class: factor > ordered > rsi (numeric)
    Length: 2,000 (of which NA: 771 = 38.55%)
    @@ -661,8 +661,8 @@ Group: Beta-lactams/penicillins

    Frequencies of dates

    Frequencies of dates will show the oldest and newest date in the data, and the number of days between them:

    -
    septic_patients %>%
    -  freq(date, nmax = 5, header = TRUE)
    +
    septic_patients %>%
    +  freq(date, nmax = 5, header = TRUE)

    Frequency table of date from septic_patients (2,000 x 49)

    Class: Date (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -728,11 +728,11 @@ Median: 31 July 2009 (47.39%)

    Assigning a frequency table to an object

    A frequency table is actually a regular data.frame, with the exception that it contains an additional class.

    -
    my_df <- septic_patients %>% freq(age)
    -class(my_df)
    +
    my_df <- septic_patients %>% freq(age)
    +class(my_df)

    [1] “freq” “data.frame”

    Because of this additional class, a frequency table prints like the examples above. But the object itself contains the complete table without a row limitation:

    -
    dim(my_df)
    +
    dim(my_df)

    [1] 74 5
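How an extra class can change printing while the object still holds the full table can be sketched in base R. The class name `my_freq` and its print method are hypothetical and only mimic the idea; they are not the package's own `freq` class.

```r
# Build a small frequency table by hand from a built-in data set.
counts <- as.data.frame(table(item = mtcars$cyl), responseName = "count")

# Add a hypothetical class in front of "data.frame".
class(counts) <- c("my_freq", class(counts))

# Hypothetical print method: show a short header and at most `nmax` rows.
print.my_freq <- function(x, nmax = 2, ...) {
  cat("Frequency table (", nrow(x), " unique items)\n", sep = "")
  print.data.frame(head(x, nmax), ...)
  invisible(x)
}

print(counts)  # the print method limits the rows that are shown
class(counts)  # the extra class comes first, followed by "data.frame"
dim(counts)    # the object itself still contains every row
```

Because the row limit lives in the print method only, any data.frame operation on the object still sees the complete table.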

    @@ -743,8 +743,8 @@ Median: 31 July 2009 (47.39%)

    Parameter na.rm

    With the na.rm parameter you can remove NA values from the frequency table (defaults to TRUE, but the number of NA values will always be shown in the header):

    -
    septic_patients %>%
    -  freq(AMX, na.rm = FALSE)
    +
    septic_patients %>%
    +  freq(AMX, na.rm = FALSE)

    Frequency table of AMX from septic_patients (2,000 x 49)

    Class: factor > ordered > rsi (numeric)
    Length: 2,000 (of which NA: 771 = 38.55%)
    @@ -803,8 +803,8 @@ Group: Beta-lactams/penicillins
    Parameter row.names

    A frequency table shows row indices. To remove them, use row.names = FALSE:

    -
    septic_patients %>%
    -  freq(hospital_id, row.names = FALSE)
    +
    septic_patients %>%
    +  freq(hospital_id, row.names = FALSE)

    Frequency table of hospital_id from septic_patients (2,000 x 49)

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    @@ -855,21 +855,21 @@ Unique: 4

    Parameter markdown

    The markdown parameter is TRUE by default in non-interactive sessions, like in reports created with R Markdown. This will always print all rows, unless nmax is set. Without markdown (like in regular R), a frequency table would print like:

    -
    septic_patients %>%
    -  freq(hospital_id, markdown = FALSE)
    -# Frequency table of `hospital_id` from `septic_patients` (2,000 x 49) 
    -# 
    -# Class:   factor (numeric)
    -# Length:  2,000 (of which NA: 0 = 0.00%)
    -# Levels:  4: A, B, C, D
    -# Unique:  4
    -# 
    -#      Item    Count   Percent   Cum. Count   Cum. Percent
    -# ---  -----  ------  --------  -----------  -------------
    -# 1    D         762     38.1%          762          38.1%
    -# 2    B         663     33.2%        1,425          71.2%
    -# 3    A         321     16.0%        1,746          87.3%
    -# 4    C         254     12.7%        2,000         100.0%
    +
    septic_patients %>%
    +  freq(hospital_id, markdown = FALSE)
    +# Frequency table of `hospital_id` from `septic_patients` (2,000 x 49) 
    +# 
    +# Class:   factor (numeric)
    +# Length:  2,000 (of which NA: 0 = 0.00%)
    +# Levels:  4: A, B, C, D
    +# Unique:  4
    +# 
    +#      Item    Count   Percent   Cum. Count   Cum. Percent
    +# ---  -----  ------  --------  -----------  -------------
    +# 1    D         762     38.1%          762          38.1%
    +# 2    B         663     33.2%        1,425          71.2%
    +# 3    A         321     16.0%        1,746          87.3%
    +# 4    C         254     12.7%        2,000         100.0%
    diff --git a/docs/articles/index.html b/docs/articles/index.html index 63a90a6a..6b620390 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9010 + 0.7.1.9012 diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html index 43c05208..707b0872 100644 --- a/docs/articles/resistance_predict.html +++ b/docs/articles/resistance_predict.html @@ -40,7 +40,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 @@ -192,7 +192,7 @@

    How to predict antimicrobial resistance

    Matthijs S. Berends

    -

    01 July 2019

    +

    10 July 2019

    @@ -206,28 +206,28 @@ Needed R packages

    As with many tasks in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing code and creating beautiful plots in R.

    Our AMR package depends on these packages and even extends their use and functions.

    -
    library(dplyr)
    -library(ggplot2)
    -library(AMR)
    -
    -# (if not yet installed, install with:)
    -# install.packages(c("tidyverse", "AMR"))
    +
    library(dplyr)
    +library(ggplot2)
    +library(AMR)
    +
    +# (if not yet installed, install with:)
    +# install.packages(c("tidyverse", "AMR"))

    Prediction analysis

    Our package contains a function resistance_predict(), which takes the same input as functions for other AMR analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.

    It is basically as easy as:

    -
    # resistance prediction of piperacillin/tazobactam (TZP):
    -resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")
    -
    -# or:
    -septic_patients %>% 
    -  resistance_predict(col_ab = "TZP")
    -
    -# to bind it to object 'predict_TZP' for example:
    -predict_TZP <- septic_patients %>% 
    -  resistance_predict(col_ab = "TZP")
    +
    # resistance prediction of piperacillin/tazobactam (TZP):
    +resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")
    +
    +# or:
    +septic_patients %>% 
    +  resistance_predict(col_ab = "TZP")
    +
    +# to bind it to object 'predict_TZP' for example:
    +predict_TZP <- septic_patients %>% 
    +  resistance_predict(col_ab = "TZP")

    The function will look for a date column itself if col_date is not set.

    When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE).

    # NOTE: Using column `date` as input for `col_date`.
    @@ -257,55 +257,55 @@
     # 
     # Number of Fisher Scoring iterations: 4

    This text is only a printed summary - the actual result (output) of the function is a data.frame containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:

    -
    predict_TZP
    -#    year      value    se_min    se_max observations   observed  estimated
    -# 1  2003 0.06250000        NA        NA           32 0.06250000 0.05486389
    -# 2  2004 0.08536585        NA        NA           82 0.08536585 0.06089002
    -# 3  2005 0.05000000        NA        NA           60 0.05000000 0.06753075
    -# 4  2006 0.05084746        NA        NA           59 0.05084746 0.07483801
    -# 5  2007 0.12121212        NA        NA           66 0.12121212 0.08286570
    -# 6  2008 0.04166667        NA        NA           72 0.04166667 0.09166918
    -# 7  2009 0.01639344        NA        NA           61 0.01639344 0.10130461
    -# 8  2010 0.05660377        NA        NA           53 0.05660377 0.11182814
    -# 9  2011 0.18279570        NA        NA           93 0.18279570 0.12329488
    -# 10 2012 0.30769231        NA        NA           65 0.30769231 0.13575768
    -# 11 2013 0.06896552        NA        NA           58 0.06896552 0.14926576
    -# 12 2014 0.10000000        NA        NA           60 0.10000000 0.16386307
    -# 13 2015 0.23636364        NA        NA           55 0.23636364 0.17958657
    -# 14 2016 0.22619048        NA        NA           84 0.22619048 0.19646431
    -# 15 2017 0.16279070        NA        NA           86 0.16279070 0.21451350
    -# 16 2018 0.23373852 0.2021578 0.2653193           NA         NA 0.23373852
    -# 17 2019 0.25412909 0.2168525 0.2914057           NA         NA 0.25412909
    -# 18 2020 0.27565854 0.2321869 0.3191302           NA         NA 0.27565854
    -# 19 2021 0.29828252 0.2481942 0.3483709           NA         NA 0.29828252
    -# 20 2022 0.32193804 0.2649008 0.3789753           NA         NA 0.32193804
    -# 21 2023 0.34654311 0.2823269 0.4107593           NA         NA 0.34654311
    -# 22 2024 0.37199700 0.3004860 0.4435080           NA         NA 0.37199700
    -# 23 2025 0.39818127 0.3193839 0.4769787           NA         NA 0.39818127
    -# 24 2026 0.42496142 0.3390173 0.5109056           NA         NA 0.42496142
    -# 25 2027 0.45218939 0.3593720 0.5450068           NA         NA 0.45218939
    -# 26 2028 0.47970658 0.3804212 0.5789920           NA         NA 0.47970658
    -# 27 2029 0.50734745 0.4021241 0.6125708           NA         NA 0.50734745
    +
    predict_TZP
    +#    year      value    se_min    se_max observations   observed  estimated
    +# 1  2003 0.06250000        NA        NA           32 0.06250000 0.05486389
    +# 2  2004 0.08536585        NA        NA           82 0.08536585 0.06089002
    +# 3  2005 0.05000000        NA        NA           60 0.05000000 0.06753075
    +# 4  2006 0.05084746        NA        NA           59 0.05084746 0.07483801
    +# 5  2007 0.12121212        NA        NA           66 0.12121212 0.08286570
    +# 6  2008 0.04166667        NA        NA           72 0.04166667 0.09166918
    +# 7  2009 0.01639344        NA        NA           61 0.01639344 0.10130461
    +# 8  2010 0.05660377        NA        NA           53 0.05660377 0.11182814
    +# 9  2011 0.18279570        NA        NA           93 0.18279570 0.12329488
    +# 10 2012 0.30769231        NA        NA           65 0.30769231 0.13575768
    +# 11 2013 0.06896552        NA        NA           58 0.06896552 0.14926576
    +# 12 2014 0.10000000        NA        NA           60 0.10000000 0.16386307
    +# 13 2015 0.23636364        NA        NA           55 0.23636364 0.17958657
    +# 14 2016 0.22619048        NA        NA           84 0.22619048 0.19646431
    +# 15 2017 0.16279070        NA        NA           86 0.16279070 0.21451350
    +# 16 2018 0.23373852 0.2021578 0.2653193           NA         NA 0.23373852
    +# 17 2019 0.25412909 0.2168525 0.2914057           NA         NA 0.25412909
    +# 18 2020 0.27565854 0.2321869 0.3191302           NA         NA 0.27565854
    +# 19 2021 0.29828252 0.2481942 0.3483709           NA         NA 0.29828252
    +# 20 2022 0.32193804 0.2649008 0.3789753           NA         NA 0.32193804
    +# 21 2023 0.34654311 0.2823269 0.4107593           NA         NA 0.34654311
    +# 22 2024 0.37199700 0.3004860 0.4435080           NA         NA 0.37199700
    +# 23 2025 0.39818127 0.3193839 0.4769787           NA         NA 0.39818127
    +# 24 2026 0.42496142 0.3390173 0.5109056           NA         NA 0.42496142
    +# 25 2027 0.45218939 0.3593720 0.5450068           NA         NA 0.45218939
    +# 26 2028 0.47970658 0.3804212 0.5789920           NA         NA 0.47970658
    +# 27 2029 0.50734745 0.4021241 0.6125708           NA         NA 0.50734745

    The function plot is available in base R, and can be extended by other packages to adapt the output to the type of input. We extended it to cope with resistance predictions:

    -
    plot(predict_TZP)
    +
    plot(predict_TZP)

    This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.

    We also support the ggplot2 package with our custom function ggplot_rsi_predict() to create more appealing plots:

    -
    ggplot_rsi_predict(predict_TZP)
    +
    ggplot_rsi_predict(predict_TZP)

    -
    
    -# choose for error bars instead of a ribbon
    -ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
    +
    
    +# choose for error bars instead of a ribbon
    +ggplot_rsi_predict(predict_TZP, ribbon = FALSE)

    Choosing the right model

    Resistance is not easily predicted; if we look at vancomycin resistance in Gram positives, the spread (i.e. standard error) is enormous:

    -
    septic_patients %>%
    -  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
    -  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>% 
    -  ggplot_rsi_predict()
    -# NOTE: Using column `date` as input for `col_date`.
    +
    septic_patients %>%
    +  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
    +  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>% 
    +  ggplot_rsi_predict()
    +# NOTE: Using column `date` as input for `col_date`.

    Vancomycin resistance could be 100% in ten years, but might also stay around 0%.

    You can define the model with the model parameter. The default model is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
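The difference between a binomial and a linear model can be sketched with base R alone. The counts below are derived from the piperacillin/tazobactam table shown earlier (observed proportion times number of observations, for 2003 to 2017). The object names are hypothetical, and this sketch is not claimed to reproduce what resistance_predict() does internally.

```r
# Yearly counts derived from the printed predict_TZP table (2003-2017).
tzp <- data.frame(
  year         = 2003:2017,
  observations = c(32, 82, 60, 59, 66, 72, 61, 53, 93, 65, 58, 60, 55, 84, 86),
  resistant    = c(2, 7, 3, 3, 8, 3, 1, 3, 17, 20, 4, 6, 13, 19, 14)
)
tzp$susceptible <- tzp$observations - tzp$resistant

# Generalised linear model with a binomial distribution (logit link).
binom_fit <- glm(cbind(resistant, susceptible) ~ year,
                 data = tzp, family = binomial)

# Ordinary linear model on the yearly proportions, for comparison.
linear_fit <- lm(I(resistant / observations) ~ year, data = tzp)

# Predict the resistant proportion for the next three years with both models.
future <- data.frame(year = 2018:2020)
binom_pred  <- predict(binom_fit, newdata = future, type = "response")
linear_pred <- predict(linear_fit, newdata = future)
```

The binomial model keeps predictions between 0 and 1 and weights each year by its number of observations, while the linear model treats every year equally and can in principle leave that range.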

    @@ -346,25 +346,25 @@

    For the vancomycin resistance in Gram positive bacteria, a linear model might be more appropriate since no (left half of a) binomial distribution is to be expected based on the observed years:

    -
    septic_patients %>%
    -  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
    -  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
    -  ggplot_rsi_predict()
    -# NOTE: Using column `date` as input for `col_date`.
    +
    septic_patients %>%
    +  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
    +  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
    +  ggplot_rsi_predict()
    +# NOTE: Using column `date` as input for `col_date`.

    This seems more likely, doesn’t it?

    The model itself is also available from the object, as an attribute:

    - +
    diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png index 12921bf2..43bc315b 100644 Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png index b2341c9d..55784d09 100644 Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png index b15c2247..722cfef8 100644 Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png differ diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png index 41041d74..210de02d 100644 Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png index 5ca2fafe..dfa1b0f1 100644 Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/docs/authors.html b/docs/authors.html index 7516492f..25c4783b 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -78,7 +78,7 @@ AMR (for R) 
- 0.7.1.9010 + 0.7.1.9012 diff --git a/docs/index.html b/docs/index.html index 36ab43d6..cce165d8 100644 --- a/docs/index.html +++ b/docs/index.html @@ -42,7 +42,7 @@ AMR (for R) - 0.7.1.9010 + 0.7.1.9012 @@ -190,9 +190,9 @@
    -
    +

    (TLDR - to find out how to conduct AMR analysis, please continue reading here to get started.)


    @@ -248,7 +248,7 @@

    Latest released version

    This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:

    - +

    It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.

    Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.

    @@ -256,8 +256,8 @@

    Latest development version

    The latest and unpublished development version can be installed with (precaution: may be unstable):

    -
    install.packages("devtools")
    -devtools::install_gitlab("msberends/AMR")
    +
    install.packages("devtools")
    +devtools::install_gitlab("msberends/AMR")
    @@ -299,9 +299,9 @@

    NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.

    Read more about the data from WHOCC in our manual.

    -
    +

    -WHONET / EARS-Net

    +WHONET / EARS-Net

    We support WHONET and EARS-Net data. Exported files from WHONET can be imported into R and can be analysed easily using this package. For education purposes, we created an example data set WHONET with the exact same structure as a WHONET export file. Furthermore, this package also contains a data set antibiotics with all EARS-Net antibiotic abbreviations, and knows almost all WHONET abbreviations for microorganisms. When using WHONET data as input for analysis, all input parameters will be set automatically.

    Read our tutorial about how to work with WHONET data here.

    diff --git a/docs/news/index.html b/docs/news/index.html index 0dd8aaad..2d75fc22 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9010 + 0.7.1.9012
    @@ -232,9 +232,9 @@
    -
    +

    -AMR 0.7.1.9010 Unreleased +AMR 0.7.1.9012 Unreleased

    @@ -242,29 +242,29 @@ @@ -288,9 +288,9 @@

    -
    +

    -AMR 0.7.1 2019-06-23 +AMR 0.7.1 2019-06-23

    @@ -298,14 +298,14 @@

    All these lead to the microbial ID of E. coli:

    - +
  • Function mo_info() as an analogy to ab_info(). The mo_info() prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
  • Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism

  • @@ -369,9 +369,9 @@

    -
    +

    -AMR 0.7.0 2019-06-03 +AMR 0.7.0 2019-06-03

    + @@ -466,9 +466,9 @@ Please +

    -AMR 0.6.1 2019-03-29 +AMR 0.6.1 2019-03-29

    @@ -480,9 +480,9 @@ Please +

    -AMR 0.6.0 2019-03-27 +AMR 0.6.0 2019-03-27

    New website!

    We’ve got a new website: https://msberends.gitlab.io/AMR (built with the great pkgdown)

    @@ -505,7 +505,7 @@ Please catalogue_of_life_version(). -
  • Due to this change, some mo codes changed (e.g. Streptococcus changed from B_STRPTC to B_STRPT). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
  • +
  • Due to this change, some mo codes changed (e.g. Streptococcus changed from B_STRPTC to B_STRPT). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
  • New function mo_rank() for the taxonomic rank (genus, species, infraspecies, etc.)
  • New function mo_url() to get the direct URL of a species from the Catalogue of Life
  • @@ -519,33 +519,33 @@ This data is updated annually - check the included version with the new function
  • New filters for antimicrobial classes. Use these functions to filter isolates on results in one of more antibiotics from a specific class:

    - +

    The antibiotics data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics data set. For example:

    - +
  • All ab_* functions are deprecated and replaced by atc_* functions:

    - -These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
  • + +These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
  • New functions set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
  • Support for the upcoming dplyr version 0.8.0
  • New function guess_ab_col() to find an antibiotic column in a table
  • @@ -556,20 +556,20 @@ These functions use as.atc()New function age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
  • New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():

    -
    x <- resistance_predict(septic_patients, col_ab = "amox")
    -plot(x)
    -ggplot_rsi_predict(x)
    +
    x <- resistance_predict(septic_patients, col_ab = "amox")
    +plot(x)
    +ggplot_rsi_predict(x)
  • Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and speed up filtering on data sets with antimicrobial results, e.g.:

    - +

    is equal to:

    -
    septic_patients %>%
    -  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    -  filter(only_firsts == TRUE) %>%
    -  select(-only_firsts)
    +
    septic_patients %>%
    +  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
    +  filter(only_firsts == TRUE) %>%
    +  select(-only_firsts)
  • New function availability() to check the number of available (non-empty) results in a data.frame
  • @@ -598,33 +598,33 @@ These functions use as.atc()
  • Now handles incorrect spelling, like i instead of y and f instead of ph:

    - +
  • Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE is equal to uncertainty level 2. Run ?as.mo for more info about these levels.

    -
    # equal:
    -as.mo(..., allow_uncertain = TRUE)
    -as.mo(..., allow_uncertain = 2)
    -
    -# also equal:
    -as.mo(..., allow_uncertain = FALSE)
    -as.mo(..., allow_uncertain = 0)
    +
    # equal:
    +as.mo(..., allow_uncertain = TRUE)
    +as.mo(..., allow_uncertain = 2)
    +
    +# also equal:
    +as.mo(..., allow_uncertain = FALSE)
    +as.mo(..., allow_uncertain = 0)
    Using as.mo(..., allow_uncertain = 3) could lead to very unreliable results.
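
    The equivalences listed above can be sketched in base R. This is only an illustration of the documented mapping (TRUE equals level 2, FALSE equals level 0, levels run from 0 to 3); the helper name `uncertainty_level()` is hypothetical and is not how `as.mo()` is necessarily implemented:

```r
# Hypothetical helper, NOT part of the AMR package: maps the documented
# allow_uncertain input onto one of the four uncertainty levels 0 to 3.
uncertainty_level <- function(allow_uncertain = TRUE) {
  if (isTRUE(allow_uncertain)) {
    return(2L)  # documented: TRUE is equal to uncertainty level 2
  }
  if (isFALSE(allow_uncertain)) {
    return(0L)  # documented: FALSE is equal to uncertainty level 0
  }
  # anything else must be a single number within the documented range
  level <- as.integer(allow_uncertain)
  stopifnot(length(level) == 1, !is.na(level), level >= 0L, level <= 3L)
  level
}

uncertainty_level(TRUE)   # same level as uncertainty_level(2)
uncertainty_level(FALSE)  # same level as uncertainty_level(0)
```

    Accepting both logical and numeric input keeps older calls with `allow_uncertain = TRUE` working while allowing a finer-grained choice.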
  • Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
  • All microbial IDs that were found are now saved to a local file ~/.Rhistory_mo. Use the new function clean_mo_history() to delete this file, which resets the algorithms.
  • Incoercible results will now be considered ‘unknown’, MO code UNKNOWN. On foreign systems, properties of these will be translated to all previously supported languages: German, Dutch, French, Italian, Spanish and Portuguese:

    - +
  • Fix for vector containing only empty values
  • Finds better results when input is in other languages
  • @@ -670,19 +670,19 @@ Using as.mo(..., allow_uncertain = 3)
  • Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:

    - +
  • Header info is now available as a list, with the header function
  • The parameter header is now set to TRUE by default, even for markdown
  • @@ -713,9 +713,9 @@ Using as.mo(..., allow_uncertain = 3)

    -
    +

    -AMR 0.5.0 2018-11-30 +AMR 0.5.0 2018-11-30

    @@ -757,10 +757,10 @@ Using as.mo(..., allow_uncertain = 3)
  • Fewer than 3 characters as input for as.mo will return NA
  • Function as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached

    -
    as.mo("E. species")        # B_ESCHR
    -mo_fullname("E. spp.")     # "Escherichia species"
    -as.mo("S. spp")            # B_STPHY
    -mo_fullname("S. species")  # "Staphylococcus species"
    +
    as.mo("E. species")        # B_ESCHR
    +mo_fullname("E. spp.")     # "Escherichia species"
    +as.mo("S. spp")            # B_STPHY
    +mo_fullname("S. species")  # "Staphylococcus species"
  • Added parameter combine_IR (TRUE/FALSE) to functions portion_df and count_df, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
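
    What merging I and R means can be shown with a few lines of base R. This is a sketch on a plain character vector, assuming interpretations coded as "S", "I" and "R"; the helper `merge_ir()` is hypothetical and does not reproduce the internals of portion_df or count_df:

```r
# Hypothetical helper, NOT part of the AMR package: merges the
# interpretations I and R into a single non-susceptible category "IR".
merge_ir <- function(x) {
  x[x %in% c("I", "R")] <- "IR"
  factor(x, levels = c("S", "IR"))
}

rsi_values <- c("S", "I", "R", "S", "R")

# count susceptible (S) versus non-susceptible (IR) results
counts <- table(merge_ir(rsi_values))
counts
```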
  • Fix for portion_*(..., as_percent = TRUE) when minimal number of isolates would not be met
  • @@ -773,17 +773,17 @@ Using as.mo(..., allow_uncertain = 3)
  • Support for grouping variables, test with:

    - +
  • Support for (un)selecting columns:

    - +
  • -
  • Check for hms::is.hms +
  • Check for hms::is.hms
  • Now prints in markdown by default in non-interactive sessions
  • No longer adds the factor level column and sorts factors on count again
  • @@ -840,9 +840,9 @@ Using as.mo(..., allow_uncertain = 3)

    -
    +

    -AMR 0.4.0 2018-10-01 +AMR 0.4.0 2018-10-01

    @@ -861,18 +861,18 @@ Using as.mo(..., allow_uncertain = 3)

    They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

    -
    mo_gramstain("E. coli")
    -# [1] "Gram negative"
    -mo_gramstain("E. coli", language = "de") # German
    -# [1] "Gramnegativ"
    -mo_gramstain("E. coli", language = "es") # Spanish
    -# [1] "Gram negativo"
    -mo_fullname("S. group A", language = "pt") # Portuguese
    -# [1] "Streptococcus grupo A"
    +
    mo_gramstain("E. coli")
    +# [1] "Gram negative"
    +mo_gramstain("E. coli", language = "de") # German
    +# [1] "Gramnegativ"
    +mo_gramstain("E. coli", language = "es") # Spanish
    +# [1] "Gram negativo"
    +mo_fullname("S. group A", language = "pt") # Portuguese
    +# [1] "Streptococcus grupo A"

    Furthermore, former taxonomic names will give a note about the current taxonomic name:

    - +
  • Functions count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates
  • @@ -975,9 +975,9 @@ Using as.mo(..., allow_uncertain = 3)
    -
    +

    -AMR 0.3.0 2018-08-14 +AMR 0.3.0 2018-08-14

    @@ -1082,7 +1082,7 @@ Using as.mo(..., allow_uncertain = 3) -
  • Now possible to coerce MIC values with a space between operator and value, i.e. as.mic("<= 0.002") now works
  • +
  • Now possible to coerce MIC values with a space between operator and value, i.e. as.mic("<= 0.002") now works
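
    One way to tolerate a space between operator and value is to strip it before parsing. The sketch below uses only base R; the helper `normalise_mic_input()` is hypothetical and only illustrates the idea, it is not the code used by as.mic():

```r
# Hypothetical helper, NOT part of the AMR package: removes whitespace
# between a leading comparison operator and the MIC value.
normalise_mic_input <- function(x) {
  gsub("^(<=|>=|<|>)[[:space:]]+", "\\1", x)
}

normalise_mic_input(c("<= 0.002", "> 16", ">=8", "0.5"))
```

    Inputs without an operator, or without a space after it, are returned unchanged.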
  • Classes rsi and mic do not add the attribute package.version anymore
  • Added "groups" option for atc_property(..., property). It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups is a convenient wrapper around this.
  • Built-in host check for atc_property, as it requires the host set by url to be responsive
  • @@ -1112,9 +1112,9 @@ Using as.mo(..., allow_uncertain = 3)

    -
    +

    -AMR 0.2.0 2018-05-03 +AMR 0.2.0 2018-05-03

    @@ -1170,9 +1170,9 @@ Using as.mo(..., allow_uncertain = 3)

    -
    +

    -AMR 0.1.1 2018-03-14 +AMR 0.1.1 2018-03-14

    -
    +

    -AMR 0.1.0 2018-02-22 +AMR 0.1.0 2018-02-22

    diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 9f781c72c..1c0ed4af 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -1,4 +1,4 @@ -pandoc: '2.6' +pandoc: 2.3.1 pkgdown: 1.3.0 pkgdown_sha: ~ articles: diff --git a/docs/reference/AMR-deprecated.html b/docs/reference/AMR-deprecated.html index 368607bf..f405ff51 100644 --- a/docs/reference/AMR-deprecated.html +++ b/docs/reference/AMR-deprecated.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9007 + 0.7.1.9012
    diff --git a/docs/reference/AMR.html b/docs/reference/AMR.html index 6559b61c..410cdd64 100644 --- a/docs/reference/AMR.html +++ b/docs/reference/AMR.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012
    diff --git a/docs/reference/ITIS.html b/docs/reference/ITIS.html deleted file mode 100644 index 12a08129..00000000 --- a/docs/reference/ITIS.html +++ /dev/null @@ -1,337 +0,0 @@ - - - - - - - - -ITIS: Integrated Taxonomic Information System — ITIS • AMR (for R) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -
    - - - -
    - -
    -
    - - -
    - -

    All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS).

    - -
    - - -

    ITIS

    - - -


    -This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

    -

    All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.

    -

    ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].

    - -

    Read more on our website!

    - - -


    -On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.

    - - -

    Examples

    -
    # NOT RUN {
    -# Get a note when a species was renamed
    -mo_shortname("Chlamydia psittaci")
    -# Note: 'Chlamydia psittaci' (Page, 1968) was renamed
    -#       'Chlamydophila psittaci' (Everett et al., 1999)
    -# [1] "C. psittaci"
    -
    -# Get any property from the entire taxonomic tree for all included species
    -mo_class("E. coli")
    -# [1] "Gammaproteobacteria"
    -
    -mo_family("E. coli")
    -# [1] "Enterobacteriaceae"
    -
    -mo_subkingdom("E. coli")
    -# [1] "Negibacteria"
    -
    -mo_gramstain("E. coli") # based on subkingdom
    -# [1] "Gram negative"
    -
    -mo_ref("E. coli")
    -# [1] "Castellani and Chalmers, 1919"
    -
    -# Do not get mistaken - the package only includes microorganisms
    -mo_phylum("C. elegans")
    -# [1] "Cyanobacteria"                   # Bacteria?!
    -mo_fullname("C. elegans")
    -# [1] "Chroococcus limneticus elegans"  # Because a microorganism was found
    -# }
    -
    - -
    - - -
    - - - - - - - - - diff --git a/docs/reference/WHOCC.html b/docs/reference/WHOCC.html index 6e064615..c6e6bae6 100644 --- a/docs/reference/WHOCC.html +++ b/docs/reference/WHOCC.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9009 + 0.7.1.9012
    diff --git a/docs/reference/WHONET.html b/docs/reference/WHONET.html index 205a3cd2..4046f624 100644 --- a/docs/reference/WHONET.html +++ b/docs/reference/WHONET.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9007 + 0.7.1.9012
    diff --git a/docs/reference/ab_property.html b/docs/reference/ab_property.html index 95626063..57c13872 100644 --- a/docs/reference/ab_property.html +++ b/docs/reference/ab_property.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012
    diff --git a/docs/reference/age.html b/docs/reference/age.html index af812507..d1d89fa6 100644 --- a/docs/reference/age.html +++ b/docs/reference/age.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012
    diff --git a/docs/reference/age_groups.html b/docs/reference/age_groups.html index f1b5113b..2bdafb01 100644 --- a/docs/reference/age_groups.html +++ b/docs/reference/age_groups.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012
    diff --git a/docs/reference/antibiotics.html b/docs/reference/antibiotics.html index ae1cf0e2..dbf39e76 100644 --- a/docs/reference/antibiotics.html +++ b/docs/reference/antibiotics.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9009 + 0.7.1.9012
    diff --git a/docs/reference/as.ab.html b/docs/reference/as.ab.html index 33f9e61a..7aaa7194 100644 --- a/docs/reference/as.ab.html +++ b/docs/reference/as.ab.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9009 + 0.7.1.9012
    diff --git a/docs/reference/as.disk.html b/docs/reference/as.disk.html index b2aa1c90..914e8e06 100644 --- a/docs/reference/as.disk.html +++ b/docs/reference/as.disk.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012
    diff --git a/docs/reference/as.mic.html b/docs/reference/as.mic.html index 0e02ee0b..2528e2e2 100644 --- a/docs/reference/as.mic.html +++ b/docs/reference/as.mic.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/as.rsi.html b/docs/reference/as.rsi.html index 19a2fc57..82a2fb27 100644 --- a/docs/reference/as.rsi.html +++ b/docs/reference/as.rsi.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9007 + 0.7.1.9012 diff --git a/docs/reference/atc_online.html b/docs/reference/atc_online.html index 761c1564..aa05ed26 100644 --- a/docs/reference/atc_online.html +++ b/docs/reference/atc_online.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/availability.html b/docs/reference/availability.html index 2d22deff..5cb0e25a 100644 --- a/docs/reference/availability.html +++ b/docs/reference/availability.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/count.html b/docs/reference/count.html index 1c916826..9f3ae52a 100644 --- a/docs/reference/count.html +++ b/docs/reference/count.html @@ -81,7 +81,7 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_ AMR (for R) - 0.7.1.9007 + 0.7.1.9012 diff --git a/docs/reference/eucast_rules.html b/docs/reference/eucast_rules.html index 2125b637..25ec02e6 100644 --- a/docs/reference/eucast_rules.html +++ b/docs/reference/eucast_rules.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9010 + 0.7.1.9012 diff --git a/docs/reference/extended-functions.html b/docs/reference/extended-functions.html index ce2e53b5..c2282751 100644 --- a/docs/reference/extended-functions.html +++ b/docs/reference/extended-functions.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/filter_ab_class.html b/docs/reference/filter_ab_class.html index 72788170..5bb4aa84 100644 --- a/docs/reference/filter_ab_class.html +++ b/docs/reference/filter_ab_class.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 
+ 0.7.1.9012 diff --git a/docs/reference/first_isolate.html b/docs/reference/first_isolate.html index 5a72e039..29fb5e61 100644 --- a/docs/reference/first_isolate.html +++ b/docs/reference/first_isolate.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/g.test.html b/docs/reference/g.test.html index d4d57e49..3208e959 100644 --- a/docs/reference/g.test.html +++ b/docs/reference/g.test.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/ggplot_rsi.html b/docs/reference/ggplot_rsi.html index 82f7b606..012a38ab 100644 --- a/docs/reference/ggplot_rsi.html +++ b/docs/reference/ggplot_rsi.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9007 + 0.7.1.9012 diff --git a/docs/reference/guess_ab_col.html b/docs/reference/guess_ab_col.html index ce86c053..142e741d 100644 --- a/docs/reference/guess_ab_col.html +++ b/docs/reference/guess_ab_col.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/index.html b/docs/reference/index.html index 24b8b760..9a6a54c8 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.7.1.9010 + 0.7.1.9012 diff --git a/docs/reference/itis.html b/docs/reference/itis.html deleted file mode 100644 index abe4b60b..00000000 --- a/docs/reference/itis.html +++ /dev/null @@ -1,316 +0,0 @@ - - - - - - - - -ITIS: Integrated Taxonomic Information System — itis • AMR (for R) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -
    - - - -
    - -
    -
    - - -
    - -

    All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS).

    - -
    - - -

    ITIS

    - - -


    -This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

    -

    All (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since all bacteria are classified into subkingdom Negibacteria or Posibacteria.

    -

    ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].

    - -

    Read more on our website!

    - - -


    -On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis and find the complete documentation of all functions, which reads a lot easier than in R.

    - - -

    Examples

    -
    # NOT RUN {
    -# Get a note when a species was renamed
    -mo_shortname("Chlamydia psittaci")
    -# Note: 'Chlamydia psittaci' (Page, 1968) was renamed 
    -#       'Chlamydophila psittaci' (Everett et al., 1999)
    -# [1] "C. psittaci"
    -
    -# Get any property from the entire taxonomic tree for all included species
    -mo_class("E. coli")
    -# [1] "Gammaproteobacteria"
    -
    -mo_family("E. coli")
    -# [1] "Enterobacteriaceae"
    -
    -mo_subkingdom("E. coli")
    -# [1] "Negibacteria"
    -
    -mo_gramstain("E. coli") # based on subkingdom
    -# [1] "Gram negative"
    -
    -mo_ref("E. coli")
    -# [1] "Castellani and Chalmers, 1919"
    -
    -# Do not get mistaken - the package only includes microorganisms
    -mo_phylum("C. elegans")
    -# [1] "Cyanobacteria"                   # Bacteria?!
    -mo_fullname("C. elegans")
    -# [1] "Chroococcus limneticus elegans"  # Because a microorganism was found
    -# }
    -
    - -
    - - -
    - - - - - - - - - diff --git a/docs/reference/join.html b/docs/reference/join.html index 81e39caf..0f87e79f 100644 --- a/docs/reference/join.html +++ b/docs/reference/join.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/key_antibiotics.html b/docs/reference/key_antibiotics.html index 114bde04..a48aeb53 100644 --- a/docs/reference/key_antibiotics.html +++ b/docs/reference/key_antibiotics.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/kurtosis.html b/docs/reference/kurtosis.html index 0d731157..771735a5 100644 --- a/docs/reference/kurtosis.html +++ b/docs/reference/kurtosis.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/like.html b/docs/reference/like.html index d7b74259..9e4e20a3 100644 --- a/docs/reference/like.html +++ b/docs/reference/like.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/mdro.html b/docs/reference/mdro.html index 7714b871..94fab294 100644 --- a/docs/reference/mdro.html +++ b/docs/reference/mdro.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9009 + 0.7.1.9012 diff --git a/docs/reference/mo_source.html b/docs/reference/mo_source.html index 0606ee24..cce98d60 100644 --- a/docs/reference/mo_source.html +++ b/docs/reference/mo_source.html @@ -81,7 +81,7 @@ This is the fastest way to have your organisation (or analysis) specific codes p AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/p.symbol.html b/docs/reference/p.symbol.html index 6bcaa8c4..722d4578 100644 --- a/docs/reference/p.symbol.html +++ b/docs/reference/p.symbol.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/portion.html b/docs/reference/portion.html index b86bb2f8..0ead9559 100644 --- a/docs/reference/portion.html +++ b/docs/reference/portion.html @@ -81,7 +81,7 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port AMR (for R) - 0.7.1.9007 
+ 0.7.1.9012 diff --git a/docs/reference/read.4D.html b/docs/reference/read.4D.html index c255e9d8..b49e4369 100644 --- a/docs/reference/read.4D.html +++ b/docs/reference/read.4D.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/resistance_predict.html b/docs/reference/resistance_predict.html index 6fea0ba0..2ca6a2d4 100644 --- a/docs/reference/resistance_predict.html +++ b/docs/reference/resistance_predict.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/rsi_translation.html b/docs/reference/rsi_translation.html index e96af83b..69f648aa 100644 --- a/docs/reference/rsi_translation.html +++ b/docs/reference/rsi_translation.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/septic_patients.html b/docs/reference/septic_patients.html index 98b03dce..85f4b704 100644 --- a/docs/reference/septic_patients.html +++ b/docs/reference/septic_patients.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9007 + 0.7.1.9012 diff --git a/docs/reference/skewness.html b/docs/reference/skewness.html index 2e7d66c1..9a765c78 100644 --- a/docs/reference/skewness.html +++ b/docs/reference/skewness.html @@ -81,7 +81,7 @@ When negative: the left tail is longer; the mass of the distribution is concentr AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/docs/reference/translate.html b/docs/reference/translate.html index 7c38140c..d372d531 100644 --- a/docs/reference/translate.html +++ b/docs/reference/translate.html @@ -80,7 +80,7 @@ AMR (for R) - 0.7.1.9005 + 0.7.1.9012 diff --git a/tests/testthat/test-mo_property.R b/tests/testthat/test-mo_property.R index 94dd4374..969948d4 100644 --- a/tests/testthat/test-mo_property.R +++ b/tests/testthat/test-mo_property.R @@ -50,6 +50,7 @@ test_that("mo_property works", { expect_equal(mo_year("Escherichia coli"), 1919) expect_equal(mo_shortname("Escherichia coli"), "E. coli") + expect_equal(mo_shortname("Escherichia"), "E. 
spp.") expect_equal(mo_shortname("Staphylococcus aureus"), "S. aureus") expect_equal(mo_shortname("Staphylococcus aureus", Becker = TRUE), "S. aureus") expect_equal(mo_shortname("Staphylococcus aureus", Becker = "all", language = "en"), "CoPS")
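
The behaviour checked by the new `mo_shortname("Escherichia")` expectation can be reproduced without the package. The sketch below copies the `replace_empty()` helper from the R/mo_property.R change in this diff; the `genus` and `species` vectors are made-up stand-ins for the output of `mo_genus()` and `mo_species()`:

```r
# Copied from the change to mo_shortname() in this diff:
# an empty species name is replaced by "spp."
replace_empty <- function(x) {
  x[x == ""] <- "spp."
  x
}

# Assumed example values: a genus with a species, and a genus-only input
genus   <- c("Escherichia", "Escherichia")
species <- c("coli", "")

# first character of the genus, followed by the (possibly replaced) species
shortnames <- paste0(substr(genus, 1, 1), ". ", replace_empty(species))
shortnames
```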