diff --git a/DESCRIPTION b/DESCRIPTION
index 1c4e30e7..14a7cf81 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.7.1.9010
-Date: 2019-07-09
+Version: 0.7.1.9012
+Date: 2019-07-10
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/NEWS.md b/NEWS.md
index a5375b27..aa4d72d3 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 0.7.1.9010
+# AMR 0.7.1.9012
 
 ### New
 * Additional way to calculate co-resistance, i.e. when using multiple antibiotics as input for `portion_*` functions or `count_*` functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter `only_all_tested` (**which defaults to `FALSE`**) replaces the old `also_single_tested` and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the `portion` and `count` help pages), where the %SI is being determined:
diff --git a/R/mo_property.R b/R/mo_property.R
index 1e2829f9..56834003 100755
--- a/R/mo_property.R
+++ b/R/mo_property.R
@@ -151,9 +151,14 @@ mo_shortname <- function(x, language = get_locale(), ...) {
   x.mo <- AMR::as.mo(x, ...)
   metadata <- get_mo_failures_uncertainties_renamed()
 
+  replace_empty <- function(x) {
+    x[x == ""] <- "spp."
+    x
+  }
+
   # get first char of genus and complete species in English
-  shortnames <- paste0(substr(mo_genus(x.mo, language = NULL), 1, 1), ". ", mo_species(x.mo, language = NULL))
-  
+  shortnames <- paste0(substr(mo_genus(x.mo, language = NULL), 1, 1), ". ", replace_empty(mo_species(x.mo, language = NULL)))
+
   # exceptions for Staphylococci
   shortnames[shortnames == "S. coagulase-negative" ] <- "CoNS"
   shortnames[shortnames == "S. coagulase-positive" ] <- "CoPS"
diff --git a/R/sysdata.rda b/R/sysdata.rda
index 43047d60..096a3d9b 100644
Binary files a/R/sysdata.rda and b/R/sysdata.rda differ
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index ce2d5e1f..415c8868 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -78,7 +78,7 @@
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index a4c0afd1..f1537c11 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -40,7 +40,7 @@ @@ -192,7 +192,7 @@AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 01 July 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 10 July 2019.
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
library(dplyr)
-library(ggplot2)
-library(AMR)
-
-# (if not yet installed, install with:)
-# install.packages(c("tidyverse", "AMR"))
library(dplyr)
+library(ggplot2)
+library(AMR)
+
+# (if not yet installed, install with:)
+# install.packages(c("tidyverse", "AMR"))
To start with patients, we need a unique list of patients.
- +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we also set the gender of our patients by putting the ID and the gender in a table:
The first 135 patient IDs are now male, the other 125 are female.
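The code chunks for these two steps are not visible in this hunk. A minimal sketch that matches the description (260 IDs running from `A1` to `Z10`, the first 135 of them male) might look like the following; the exact construction used in the vignette is an assumption.

```r
# Build 26 x 10 patient IDs: A1..A10, B1..B10, ..., Z1..Z10
patients <- unlist(lapply(LETTERS, paste0, 1:10))

# Attach a gender to every ID: the first 135 are male, the remaining 125 female
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135), rep("F", 125)))
```

The object names `patients` and `patients_table` are the ones the surrounding text and the later `sample()` call refer to.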
Let’s pretend that our data consists of blood cultures isolates from between 1 January 2010 and 1 January 2018.
- +This dates
object now contains all days in our date range.
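The chunk that creates `dates` is likewise not shown here. Assuming both end points of the range are included, a base R sketch would be:

```r
# One Date per day from 1 January 2010 up to and including 1 January 2018
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
length(dates)
```

Sampling from this vector with `replace = TRUE` then gives each isolate a random culture date within the range.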
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
- +For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
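The definitions themselves are not part of this hunk. The sketch below reconstructs plausible values from what is visible elsewhere: the four species named above, the hospital labels that appear in the preview table, and three interpretation levels. The ordering `S`, `I`, `R` is an assumption, chosen so that it lines up with the `prob` vectors passed to `sample()`.

```r
# Species to sample from (names as listed in the text)
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
              "Streptococcus pneumoniae", "Klebsiella pneumoniae")

# Hospital labels as they appear in the preview table
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")

# Valid antimicrobial interpretations (order assumed: S, I, R)
ab_interpretations <- c("S", "I", "R")
```

Each `prob` vector in the `sample()` call must have the same length as the vector it samples from, which is why `hospitals` and `bacteria` take four probabilities and `ab_interpretations` takes three.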
- +Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob
parameter.
sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
- patient_id = sample(patients, size = sample_size, replace = TRUE),
- hospital = sample(hospitals, size = sample_size, replace = TRUE,
- prob = c(0.30, 0.35, 0.15, 0.20)),
- bacteria = sample(bacteria, size = sample_size, replace = TRUE,
- prob = c(0.50, 0.25, 0.15, 0.10)),
- AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.60, 0.05, 0.35)),
- AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.75, 0.10, 0.15)),
- CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.80, 0.00, 0.20)),
- GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
- prob = c(0.92, 0.00, 0.08))
- )
sample_size <- 20000
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+ patient_id = sample(patients, size = sample_size, replace = TRUE),
+ hospital = sample(hospitals, size = sample_size, replace = TRUE,
+ prob = c(0.30, 0.35, 0.15, 0.20)),
+ bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+ prob = c(0.50, 0.25, 0.15, 0.10)),
+ AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.60, 0.05, 0.35)),
+ AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.75, 0.10, 0.15)),
+ CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.80, 0.00, 0.20)),
+ GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
+ prob = c(0.92, 0.00, 0.08))
+ )
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 values of this data set:
date | @@ -327,70 +327,70 @@||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2011-09-06 | -Z5 | -Hospital B | -Escherichia coli | -R | -S | -S | -S | -F | -||||||||
2015-03-21 | -E7 | +2011-07-20 | +X1 | Hospital C | -Escherichia coli | +Staphylococcus aureus | R | I | S | S | -M | -|||||
2010-08-11 | -X6 | -Hospital C | -Escherichia coli | -S | -S | -R | -S | F | ||||||||
2012-06-16 | -E10 | -Hospital D | -Staphylococcus aureus | -R | +2017-12-18 | +Q8 | +Hospital B | +Streptococcus pneumoniae | S | S | S | -M | +S | +F | ||
2016-12-29 | -J3 | -Hospital C | -Escherichia coli | +2015-12-08 | +X5 | +Hospital B | +Staphylococcus aureus | +S | +S | R | -S | -S | -S | -M | +R | +F |
2010-04-09 | -Q3 | +2011-12-08 | +X3 | Hospital B | +Klebsiella pneumoniae | +S | +S | +S | +S | +F | +||||||
2011-01-02 | +E1 | +Hospital A | Streptococcus pneumoniae | R | S | S | S | -F | +M | +|||||||
2012-10-22 | +K1 | +Hospital C | +Streptococcus pneumoniae | +S | +S | +S | +S | +M |
Use the frequency table function freq()
to look specifically for unique values in any variable. For example, for the gender
variable:
# Frequency table of `gender` from `data` (20,000 x 9)
#
# Class: factor (numeric)
@@ -411,82 +411,82 @@
#
# Item Count Percent Cum. Count Cum. Percent
# --- ----- ------- -------- ----------- -------------
-# 1 M 10,408 52.0% 10,408 52.0%
-# 2 F 9,592 48.0% 20,000 100.0%
+# 1 M 10,454 52.3% 10,454 52.3%
+# 2 F 9,546 47.7% 20,000 100.0%
So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M
and F
. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kinds of variables. The mutate_at()
will run the as.rsi()
function on defined variables:
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
fixes this:
data <- eucast_rules(data, col_mo = "bacteria")
-#
-# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
-# http://eucast.org/
-#
-# EUCAST Clinical Breakpoints (v9.0, 2019)
-# Aerococcus sanguinicola (no new changes)
-# Aerococcus urinae (no new changes)
-# Anaerobic Gram-negatives (no new changes)
-# Anaerobic Gram-positives (no new changes)
-# Campylobacter coli (no new changes)
-# Campylobacter jejuni (no new changes)
-# Enterobacteriales (Order) (no new changes)
-# Enterococcus (no new changes)
-# Haemophilus influenzae (no new changes)
-# Kingella kingae (no new changes)
-# Moraxella catarrhalis (no new changes)
-# Pasteurella multocida (no new changes)
-# Staphylococcus (no new changes)
-# Streptococcus groups A, B, C, G (no new changes)
-# Streptococcus pneumoniae (1,443 new changes)
-# Viridans group streptococci (no new changes)
-#
-# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,332 new changes)
-# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no new changes)
-# Table 03: Intrinsic resistance in other Gram-negative bacteria (no new changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,723 new changes)
-# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no new changes)
-# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no new changes)
-# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no new changes)
-# Table 12: Interpretive rules for aminoglycosides (no new changes)
-# Table 13: Interpretive rules for quinolones (no new changes)
-#
-# Other rules
-# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,213 new changes)
-# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (127 new changes)
-# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no new changes)
-# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no new changes)
-# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no new changes)
-# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no new changes)
-#
-# --------------------------------------------------------------------------
-# EUCAST rules affected 6,513 out of 20,000 rows, making a total of 7,838 edits
-# => added 0 test results
-#
-# => changed 7,838 test results
-# - 115 test results changed from S to I
-# - 4,719 test results changed from S to R
-# - 1,077 test results changed from I to S
-# - 335 test results changed from I to R
-# - 1,573 test results changed from R to S
-# - 19 test results changed from R to I
-# --------------------------------------------------------------------------
-#
-# Use verbose = TRUE to get a data.frame with all specified edits instead.
data <- eucast_rules(data, col_mo = "bacteria")
+#
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# http://eucast.org/
+#
+# EUCAST Clinical Breakpoints (v9.0, 2019)
+# Aerococcus sanguinicola (no new changes)
+# Aerococcus urinae (no new changes)
+# Anaerobic Gram-negatives (no new changes)
+# Anaerobic Gram-positives (no new changes)
+# Campylobacter coli (no new changes)
+# Campylobacter jejuni (no new changes)
+# Enterobacteriales (Order) (no new changes)
+# Enterococcus (no new changes)
+# Haemophilus influenzae (no new changes)
+# Kingella kingae (no new changes)
+# Moraxella catarrhalis (no new changes)
+# Pasteurella multocida (no new changes)
+# Staphylococcus (no new changes)
+# Streptococcus groups A, B, C, G (no new changes)
+# Streptococcus pneumoniae (1,481 new changes)
+# Viridans group streptococci (no new changes)
+#
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,328 new changes)
+# Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no new changes)
+# Table 03: Intrinsic resistance in other Gram-negative bacteria (no new changes)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,778 new changes)
+# Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no new changes)
+# Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no new changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no new changes)
+# Table 12: Interpretive rules for aminoglycosides (no new changes)
+# Table 13: Interpretive rules for quinolones (no new changes)
+#
+# Other rules
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (2,245 new changes)
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (118 new changes)
+# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no new changes)
+# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no new changes)
+# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no new changes)
+# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no new changes)
+#
+# --------------------------------------------------------------------------
+# EUCAST rules affected 6,581 out of 20,000 rows, making a total of 7,950 edits
+# => added 0 test results
+#
+# => changed 7,950 test results
+# - 120 test results changed from S to I
+# - 4,812 test results changed from S to R
+# - 1,089 test results changed from I to S
+# - 317 test results changed from I to R
+# - 1,588 test results changed from R to S
+# - 24 test results changed from R to I
+# --------------------------------------------------------------------------
+#
+# Use verbose = TRUE (on your original data) to get a data.frame with all specified edits instead.
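The two "Non-EUCAST" rules in the output above that involve amoxicillin and amoxicillin/clavulanic acid can be illustrated on a toy data frame. This is only a base R sketch of the logic, applied in the order the rules are printed; it is not how `eucast_rules()` is implemented, and treating the `AMX` column as the ampicillin input is an assumption.

```r
# Toy results; row 1 is the "impossible" combination AMX = S, AMC = R
toy <- data.frame(AMX = c("S", "R", "S", "I"),
                  AMC = c("R", "S", "S", "R"),
                  stringsAsFactors = FALSE)

fixed <- toy
# Rule 1: amoxicillin/clav acid = S where ampicillin = S
fixed$AMC[fixed$AMX == "S"] <- "S"
# Rule 2: ampicillin = R where amoxicillin/clav acid = R
fixed$AMX[fixed$AMC == "R"] <- "R"

fixed
```

After both steps no row combines `AMX = S` with `AMC = R`, which is the inconsistency the text describes.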
Now that we have the microbial ID, we can add some taxonomic properties:
-data <- data %>%
- mutate(gramstain = mo_gramstain(bacteria),
- genus = mo_genus(bacteria),
- species = mo_species(bacteria))
data <- data %>%
+ mutate(gramstain = mo_gramstain(bacteria),
+ genus = mo_genus(bacteria),
+ species = mo_species(bacteria))
(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:
data <- data %>%
- mutate(first = first_isolate(.))
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `date` as input for `col_date`.
-# NOTE: Using column `patient_id` as input for `col_patient_id`.
-# => Found 5,719 first isolates (28.6% of total)
So only 28.6% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
data <- data %>%
+ mutate(first = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# => Found 5,679 first isolates (28.4% of total)
So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient S7, sorted on date:
+We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient B6, sorted on date:
isolate | @@ -529,30 +529,30 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-28 | -S7 | +2010-02-26 | +B6 | B_ESCHR_COL | R | -I | S | +R | S | TRUE | ||
2 | -2010-02-07 | -S7 | +2010-05-20 | +B6 | B_ESCHR_COL | -S | -S | -S | R | +S | +S | +S | FALSE |
3 | -2010-03-16 | -S7 | +2010-05-28 | +B6 | B_ESCHR_COL | R | S | @@ -562,10 +562,10 @@||||||
4 | -2010-10-09 | -S7 | +2010-06-27 | +B6 | B_ESCHR_COL | -S | +R | S | S | S | @@ -573,8 +573,8 @@|||
5 | -2011-01-25 | -S7 | +2010-09-06 | +B6 | B_ESCHR_COL | R | S | @@ -584,19 +584,19 @@||||||
6 | -2011-02-16 | -S7 | +2010-10-16 | +B6 | B_ESCHR_COL | +R | S | S | S | -S | -TRUE | +FALSE | |
7 | -2011-02-24 | -S7 | +2010-10-20 | +B6 | B_ESCHR_COL | S | S | @@ -606,32 +606,32 @@||||||
8 | -2011-03-30 | -S7 | +2010-11-01 | +B6 | B_ESCHR_COL | -R | +S | +S | S | R | -S | FALSE | |
9 | -2011-04-25 | -S7 | +2010-11-14 | +B6 | B_ESCHR_COL | S | S | S | -R | +S | FALSE | ||
10 | -2011-05-06 | -S7 | +2011-01-25 | +B6 | B_ESCHR_COL | -S | +R | S | S | S | @@ -639,18 +639,18 @@
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
+Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics()
function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate()
function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
data <- data %>%
- mutate(keyab = key_antibiotics(.)) %>%
- mutate(first_weighted = first_isolate(.))
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `bacteria` as input for `col_mo`.
-# NOTE: Using column `date` as input for `col_date`.
-# NOTE: Using column `patient_id` as input for `col_patient_id`.
-# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
-# [Criterion] Inclusion based on key antibiotics, ignoring I.
-# => Found 15,097 first weighted isolates (75.5% of total)
data <- data %>%
+ mutate(keyab = key_antibiotics(.)) %>%
+ mutate(first_weighted = first_isolate(.))
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `bacteria` as input for `col_mo`.
+# NOTE: Using column `date` as input for `col_date`.
+# NOTE: Using column `patient_id` as input for `col_patient_id`.
+# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
+# [Criterion] Inclusion based on key antibiotics, ignoring I.
+# => Found 15,003 first weighted isolates (75.0% of total)
isolate | @@ -667,118 +667,118 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-28 | -S7 | +2010-02-26 | +B6 | B_ESCHR_COL | R | -I | S | +R | S | TRUE | TRUE | ||
2 | -2010-02-07 | -S7 | +2010-05-20 | +B6 | B_ESCHR_COL | -S | -S | -S | R | +S | +S | +S | FALSE | TRUE |
3 | -2010-03-16 | -S7 | +2010-05-28 | +B6 | B_ESCHR_COL | R | S | S | S | FALSE | -TRUE | +FALSE | ||
4 | -2010-10-09 | -S7 | +2010-06-27 | +B6 | B_ESCHR_COL | -S | +R | S | S | S | FALSE | -TRUE | +FALSE | |
5 | -2011-01-25 | -S7 | +2010-09-06 | +B6 | B_ESCHR_COL | R | S | S | S | FALSE | -TRUE | +FALSE | ||
6 | -2011-02-16 | -S7 | +2010-10-16 | +B6 | B_ESCHR_COL | +R | S | S | S | -S | -TRUE | -TRUE | +FALSE | +FALSE |
7 | -2011-02-24 | -S7 | +2010-10-20 | +B6 | B_ESCHR_COL | S | S | S | S | FALSE | -FALSE | +TRUE | ||
8 | -2011-03-30 | -S7 | +2010-11-01 | +B6 | B_ESCHR_COL | -R | +S | +S | S | R | -S | FALSE | TRUE | |
9 | -2011-04-25 | -S7 | +2010-11-14 | +B6 | B_ESCHR_COL | S | S | S | -R | +S | FALSE | TRUE | ||
10 | -2011-05-06 | -S7 | +2011-01-25 | +B6 | B_ESCHR_COL | -S | +R | S | S | S | @@ -787,18 +787,19 @@
Instead of 2, now 9 isolates are flagged. In total, 75.5% of all isolates are marked ‘first weighted’ - 46.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 1, now 6 isolates are flagged. In total, 75% of all isolates are marked ‘first weighted’ - 46.6 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
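The comparison above is a difference between two shares of the same 20,000 rows. A quick check using the counts printed by `first_isolate()` earlier on this page (variable names here are illustrative only):

```r
total          <- 20000
first_clsi     <- 5679    # "Found 5,679 first isolates"
first_weighted <- 15003   # "Found 15,003 first weighted isolates"

# Shares of all isolates, in percent
share_clsi     <- 100 * first_clsi / total
share_weighted <- 100 * first_weighted / total

# Difference between the two shares, in percentage points
diff_pp <- share_weighted - share_clsi
round(diff_pp, 1)
# [1] 46.6
```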
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,097 isolates for analysis.
+ +So we end up with 15,003 isolates for analysis.
We can remove unneeded columns:
- +Now our data looks like:
- +(omitted 29 entries, n = 57 [11.4%])
-
-# our transformed antibiotic columns
-# amoxicillin/clavulanic acid (J01CR02) as an example
-data %>% freq(AMC_ND2)
+# our transformed antibiotic columns
+# amoxicillin/clavulanic acid (J01CR02) as an example
+data %>% freq(AMC_ND2)
Frequency table of AMC_ND2
from data
(500 x 54)
Class: factor > ordered > rsi (numeric)
Length: 500 (of which NA: 19 = 3.80%)
diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html
index d950825d..84caae51 100644
--- a/docs/articles/benchmarks.html
+++ b/docs/articles/benchmarks.html
@@ -40,7 +40,7 @@
benchmarks.Rmd
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its MO code B_STPHY_AUR
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
-S.aureus <- microbenchmark(as.mo("sau"),
- as.mo("stau"),
- as.mo("staaur"),
- as.mo("STAAUR"),
- as.mo("S. aureus"),
- as.mo("S. aureus"),
- as.mo("Staphylococcus aureus"),
- times = 10)
-print(S.aureus, unit = "ms", signif = 2)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# as.mo("sau") 18.0 18.0 22 18.0 18.0 61 10
-# as.mo("stau") 65.0 65.0 70 66.0 66.0 110 10
-# as.mo("staaur") 18.0 18.0 33 18.0 62.0 81 10
-# as.mo("STAAUR") 18.0 18.0 18 18.0 18.0 19 10
-# as.mo("S. aureus") 52.0 52.0 61 52.0 53.0 97 10
-# as.mo("S. aureus") 52.0 52.0 71 53.0 97.0 150 10
-# as.mo("Staphylococcus aureus") 8.1 8.1 14 8.1 8.2 63 10
S.aureus <- microbenchmark(as.mo("sau"),
+ as.mo("stau"),
+ as.mo("staaur"),
+ as.mo("STAAUR"),
+ as.mo("S. aureus"),
+ as.mo("S. aureus"),
+ as.mo("Staphylococcus aureus"),
+ times = 10)
+print(S.aureus, unit = "ms", signif = 2)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# as.mo("sau") 8.5 8.7 12.0 8.9 9.4 26 10
+# as.mo("stau") 31.0 32.0 42.0 33.0 34.0 120 10
+# as.mo("staaur") 8.6 8.7 11.0 9.1 9.2 26 10
+# as.mo("STAAUR") 8.7 9.1 9.3 9.2 9.4 11 10
+# as.mo("S. aureus") 23.0 23.0 30.0 24.0 40.0 46 10
+# as.mo("S. aureus") 22.0 23.0 27.0 24.0 25.0 41 10
+# as.mo("Staphylococcus aureus") 3.9 4.0 5.7 4.1 4.4 20 10
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.
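The conversion from a per-call time to a throughput is simply 1,000 ms divided by the time per call. As a small sketch (the helper name `per_second` is made up for this example):

```r
# Convert a per-call time in milliseconds to calls per second
per_second <- function(ms) 1000 / ms

per_second(c(5, 100))
# [1] 200  10
```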
To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of Thermus islandicus (B_THERMS_ISL
), a bug probably never found before in humans:
T.islandicus <- microbenchmark(as.mo("theisl"),
- as.mo("THEISL"),
- as.mo("T. islandicus"),
- as.mo("T. islandicus"),
- as.mo("Thermus islandicus"),
- times = 10)
-print(T.islandicus, unit = "ms", signif = 2)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# as.mo("theisl") 390 390 420 440 440 440 10
-# as.mo("THEISL") 390 390 410 400 440 440 10
-# as.mo("T. islandicus") 210 210 230 220 250 270 10
-# as.mo("T. islandicus") 210 210 240 260 260 280 10
-# as.mo("Thermus islandicus") 72 72 92 73 120 130 10
That takes 6.8 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are almost fast - these are the most probable input from most data sets.
+T.islandicus <- microbenchmark(as.mo("theisl"),
+ as.mo("THEISL"),
+ as.mo("T. islandicus"),
+ as.mo("T. islandicus"),
+ as.mo("Thermus islandicus"),
+ times = 10)
+print(T.islandicus, unit = "ms", signif = 2)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# as.mo("theisl") 290 310 320 320 320 350 10
+# as.mo("THEISL") 290 300 310 310 330 340 10
+# as.mo("T. islandicus") 140 140 150 150 160 170 10
+# as.mo("T. islandicus") 140 140 150 150 170 180 10
+# as.mo("Thermus islandicus") 50 52 58 54 68 70 10
That takes 10.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are almost fast - these are the most probable input from most data sets.
In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Thermus islandicus (which is very uncommon):
-par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
-
-boxplot(microbenchmark(as.mo("Thermus islandicus"),
- as.mo("Prevotella brevis"),
- as.mo("Escherichia coli"),
- as.mo("T. islandicus"),
- as.mo("P. brevis"),
- as.mo("E. coli"),
- times = 10),
- horizontal = TRUE, las = 1, unit = "s", log = FALSE,
- xlab = "", ylab = "Time in seconds",
- main = "Benchmarks per prevalence")
par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
+
+boxplot(microbenchmark(as.mo("Thermus islandicus"),
+ as.mo("Prevotella brevis"),
+ as.mo("Escherichia coli"),
+ as.mo("T. islandicus"),
+ as.mo("P. brevis"),
+ as.mo("E. coli"),
+ times = 10),
+ horizontal = TRUE, las = 1, unit = "s", log = FALSE,
+ xlab = "", ylab = "Time in seconds",
+ main = "Benchmarks per prevalence")
Uncommon microorganisms take a lot more time than common microorganisms. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_fullname()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
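The general idea of computing each unique value only once can be sketched in base R as follows. This illustrates the technique only and makes no claim about how `as.mo()` does it internally; `slow_lookup()` is a hypothetical stand-in for an expensive function that also counts how many values it actually processes.

```r
# Hypothetical expensive lookup that records how many inputs it handled
calls <- 0
slow_lookup <- function(v) {
  calls <<- calls + length(v)
  toupper(v)
}

# 3,000 inputs, but only 3 distinct values
x <- rep(c("e. coli", "s. aureus", "k. pneumoniae"), times = 1000)

ux  <- unique(x)                        # the distinct inputs
res <- slow_lookup(ux)[match(x, ux)]    # compute once, map back to every position

calls
# [1] 3
```

The cost then scales with the number of distinct inputs rather than with the length of the vector, which is consistent with the benchmark that follows.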
library(dplyr)
-# take all MO codes from the septic_patients data set
-x <- septic_patients$mo %>%
- # keep only the unique ones
- unique() %>%
- # pick 50 of them at random
- sample(50) %>%
- # paste that 10,000 times
- rep(10000) %>%
- # scramble it
- sample()
-
-# got indeed 50 times 10,000 = half a million?
-length(x)
-# [1] 500000
-
-# and how many unique values do we have?
-n_distinct(x)
-# [1] 50
-
-# now let's see:
-run_it <- microbenchmark(mo_fullname(x),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# mo_fullname(x) 1050 1050 1100 1090 1120 1230 10
So transforming 500,000 values (!!) of 50 unique values only takes 1.09 seconds (1092 ms). You only lose time on your unique input values.
+library(dplyr)
+# take all MO codes from the septic_patients data set
+x <- septic_patients$mo %>%
+ # keep only the unique ones
+ unique() %>%
+ # pick 50 of them at random
+ sample(50) %>%
+ # paste that 10,000 times
+ rep(10000) %>%
+ # scramble it
+ sample()
+
+# got indeed 50 times 10,000 = half a million?
+length(x)
+# [1] 500000
+
+# and how many unique values do we have?
+n_distinct(x)
+# [1] 50
+
+# now let's see:
+run_it <- microbenchmark(mo_fullname(x),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# mo_fullname(x) 611 628 643 635 650 714 10
So transforming 500,000 values (!!) of 50 unique values only takes 0.63 seconds (634 ms). You only lose time on your unique input values.
What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname()
, it almost doesn’t take any time at all (see ‘C’ below):
run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
- B = mo_fullname("S. aureus"),
- C = mo_fullname("Staphylococcus aureus"),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# A 13.00 13.20 13.60 13.60 14.00 14.40 10
-# B 49.40 50.00 57.50 51.90 52.40 103.00 10
-# C 1.52 1.72 1.81 1.78 1.98 1.99 10
So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0018 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
run_it <- microbenchmark(A = mo_species("aureus"),
- B = mo_genus("Staphylococcus"),
- C = mo_fullname("Staphylococcus aureus"),
- D = mo_family("Staphylococcaceae"),
- E = mo_order("Bacillales"),
- F = mo_class("Bacilli"),
- G = mo_phylum("Firmicutes"),
- H = mo_kingdom("Bacteria"),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# A 0.612 0.623 0.685 0.653 0.789 0.814 10
-# B 0.556 0.575 0.680 0.671 0.689 0.958 10
-# C 1.520 1.710 1.800 1.820 1.950 1.970 10
-# D 0.547 0.665 0.723 0.688 0.811 0.997 10
-# E 0.490 0.541 0.633 0.629 0.748 0.756 10
-# F 0.482 0.569 0.612 0.590 0.663 0.756 10
-# G 0.551 0.558 0.601 0.586 0.632 0.735 10
-# H 0.494 0.564 0.595 0.575 0.608 0.757 10
run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
+ B = mo_fullname("S. aureus"),
+ C = mo_fullname("Staphylococcus aureus"),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# A 6.730 7.030 8.030 7.750 8.72 9.73 10
+# B 22.400 23.000 27.100 23.600 27.10 46.00 10
+# C 0.835 0.877 0.978 0.925 1.12 1.18 10
So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0009 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
run_it <- microbenchmark(A = mo_species("aureus"),
+ B = mo_genus("Staphylococcus"),
+ C = mo_fullname("Staphylococcus aureus"),
+ D = mo_family("Staphylococcaceae"),
+ E = mo_order("Bacillales"),
+ F = mo_class("Bacilli"),
+ G = mo_phylum("Firmicutes"),
+ H = mo_kingdom("Bacteria"),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# A 0.468 0.470 0.533 0.489 0.595 0.690 10
+# B 0.504 0.513 0.555 0.520 0.571 0.711 10
+# C 0.629 0.687 0.864 0.855 1.050 1.130 10
+# D 0.505 0.515 0.575 0.530 0.649 0.767 10
+# E 0.442 0.457 0.529 0.481 0.531 0.774 10
+# F 0.447 0.510 0.554 0.568 0.609 0.618 10
+# G 0.443 0.470 0.492 0.477 0.506 0.601 10
+# H 0.448 0.459 0.491 0.466 0.515 0.633 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
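The short-circuit described above can be sketched in base R as follows. The vector `known_phyla`, the lookup table `genus_to_phylum` and the function `phylum_sketch()` are assumptions made up for this illustration; the real package consults its own reference data.

```r
# Illustrative subset of valid result values (an assumption, not package data)
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Hypothetical lookup table standing in for the real, slower search
genus_to_phylum <- c(Staphylococcus = "Firmicutes",
                     Escherichia = "Proteobacteria")

phylum_sketch <- function(x) {
  out <- x
  # Input that already is a valid phylum is returned untouched
  needs_lookup <- !(x %in% known_phyla)
  # Only the remaining values go through the lookup
  out[needs_lookup] <- unname(genus_to_phylum[x[needs_lookup]])
  out
}

res <- phylum_sketch(c("Firmicutes", "Escherichia", "Staphylococcus"))
res
```

Values that are neither a known phylum nor present in the lookup table come back as `NA` in this sketch.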
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This takes almost no extra time:
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
-# [1] "Coagulase-negative Staphylococcus (CoNS)"
-
-mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
-# [1] "Staphylococcus coagulasa negativo (SCN)"
-
-mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
-# [1] "Coagulase-negatieve Staphylococcus (CNS)"
-
-run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
- de = mo_fullname("CoNS", language = "de"),
- nl = mo_fullname("CoNS", language = "nl"),
- es = mo_fullname("CoNS", language = "es"),
- it = mo_fullname("CoNS", language = "it"),
- fr = mo_fullname("CoNS", language = "fr"),
- pt = mo_fullname("CoNS", language = "pt"),
- times = 10)
-print(run_it, unit = "ms", signif = 4)
-# Unit: milliseconds
-# expr min lq mean median uq max neval
-# en 43.00 43.12 45.51 44.82 44.89 56.61 10
-# de 46.47 46.99 52.11 47.57 48.11 93.77 10
-# nl 60.86 62.72 67.57 63.69 63.99 108.20 10
-# es 45.74 46.05 52.37 46.42 47.98 103.00 10
-# it 45.84 45.89 51.90 47.66 47.73 94.83 10
-# fr 45.97 46.92 47.44 47.76 47.86 48.49 10
-# pt 45.93 46.77 47.36 47.77 47.93 48.12 10
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
+# [1] "Coagulase-negative Staphylococcus (CoNS)"
+
+mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
+# [1] "Staphylococcus coagulasa negativo (SCN)"
+
+mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
+# [1] "Coagulase-negatieve Staphylococcus (CNS)"
+
+run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
+ de = mo_fullname("CoNS", language = "de"),
+ nl = mo_fullname("CoNS", language = "nl"),
+ es = mo_fullname("CoNS", language = "es"),
+ it = mo_fullname("CoNS", language = "it"),
+ fr = mo_fullname("CoNS", language = "fr"),
+ pt = mo_fullname("CoNS", language = "pt"),
+ times = 10)
+print(run_it, unit = "ms", signif = 4)
+# Unit: milliseconds
+# expr min lq mean median uq max neval
+# en 19.29 20.39 22.42 20.64 21.26 38.98 10
+# de 22.03 22.40 23.78 23.08 23.60 31.74 10
+# nl 27.98 28.60 29.22 28.87 30.13 30.53 10
+# es 21.44 22.97 30.04 24.11 45.71 46.46 10
+# it 21.17 21.82 22.47 22.44 23.27 23.52 10
+# fr 20.75 21.58 24.10 22.04 22.41 42.96 10
+# pt 21.24 21.92 24.31 22.75 23.26 39.91 10
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
freq.Rmd
To only show and quickly review the content of one variable, you can just select this variable in various ways. Let’s say we want to get the frequencies of the gender
variable of the septic_patients
dataset:
# Any of these will work:
-# freq(septic_patients$gender)
-# freq(septic_patients[, "gender"])
-
-# Using tidyverse:
-# septic_patients$gender %>% freq()
-# septic_patients[, "gender"] %>% freq()
-# septic_patients %>% freq("gender")
-
-# Probably the fastest and easiest:
-septic_patients %>% freq(gender)
# Any of these will work:
+# freq(septic_patients$gender)
+# freq(septic_patients[, "gender"])
+
+# Using tidyverse:
+# septic_patients$gender %>% freq()
+# septic_patients[, "gender"] %>% freq()
+# septic_patients %>% freq("gender")
+
+# Probably the fastest and easiest:
+septic_patients %>% freq(gender)
Frequency table of gender
from septic_patients
(2,000 x 49)
Class: character
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -262,22 +262,22 @@ Longest: 1
Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.
For illustration, we could add some more variables to the septic_patients
dataset to learn about bacterial properties:
Now all variables of the microorganisms
dataset have been joined to the septic_patients
dataset. The microorganisms
dataset consists of the following variables:
colnames(microorganisms)
-# [1] "mo" "col_id" "fullname" "kingdom" "phylum"
-# [6] "class" "order" "family" "genus" "species"
-# [11] "subspecies" "rank" "ref" "species_id" "source"
-# [16] "prevalence"
colnames(microorganisms)
+# [1] "mo" "col_id" "fullname" "kingdom" "phylum"
+# [6] "class" "order" "family" "genus" "species"
+# [11] "subspecies" "rank" "ref" "species_id" "source"
+# [16] "prevalence"
If we compare the dimensions between the old and new dataset, we can see that these 15 variables were added:
- +So now the genus
and species
variables are available. A frequency table of these combined variables can be created like this:
Frequency table of genus
and species
from my_patients
(2,000 x 64)
Columns: 2
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -423,10 +423,10 @@ Longest: 34
Frequency tables can be created of any input.
For numeric values (like integers, doubles, etc.), additional descriptive statistics will be calculated and shown in the header:
-# # get age distribution of unique patients
-septic_patients %>%
- distinct(patient_id, .keep_all = TRUE) %>%
- freq(age, nmax = 5, header = TRUE)
# # get age distribution of unique patients
+septic_patients %>%
+ distinct(patient_id, .keep_all = TRUE) %>%
+ freq(age, nmax = 5, header = TRUE)
Frequency table of age
from a data.frame
(981 x 49)
Class: numeric
Length: 981 (of which NA: 0 = 0.00%)
@@ -506,8 +506,8 @@ Outliers: 15 (unique count: 12)
To sort frequencies of factors on their levels instead of item count, use the sort.count
parameter.
sort.count
is TRUE
by default. Compare this default behaviour…
Frequency table of hospital_id
from septic_patients
(2,000 x 49)
Class: factor (numeric)
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -558,8 +558,8 @@ Unique: 4
… to this, where items are now sorted on factor levels:
- +Frequency table of hospital_id
from septic_patients
(2,000 x 49)
Class: factor (numeric)
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -610,8 +610,8 @@ Unique: 4
All classes will be printed in the header. Variables with the new rsi
class of this AMR package are actually ordered factors and have three classes (look at Class
in the header):
Frequency table of AMX
from septic_patients
(2,000 x 49)
Class: factor > ordered > rsi (numeric)
Length: 2,000 (of which NA: 771 = 38.55%)
@@ -661,8 +661,8 @@ Group: Beta-lactams/penicillins
Frequencies of dates will show the oldest and newest date in the data, and the number of days between them:
- +Frequency table of date
from septic_patients
(2,000 x 49)
Class: Date (numeric)
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -728,11 +728,11 @@ Median: 31 July 2009 (47.39%)
A frequency table is actually a regular data.frame
, with the exception that it contains an additional class.
[1] “freq” “data.frame”
Because of this additional class, a frequency table prints like the examples above. But the object itself contains the complete table without a row limitation:
- +[1] 74 5
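How an additional class can change printing while the object stays a complete `data.frame` can be sketched in base R. The class name `freq_sketch` and its print method are hypothetical and only illustrate the mechanism; they are not the package's code.

```r
# Build a small frequency table with base R
values <- c("A", "B", "A", "C", "A", "B", "D")
tab <- as.data.frame(table(item = values), stringsAsFactors = FALSE)
tab <- tab[order(-tab$Freq), ]
rownames(tab) <- NULL

# Put an extra class in front of "data.frame"
class(tab) <- c("freq_sketch", "data.frame")

# Print method that shows only the first `nmax` rows
print.freq_sketch <- function(x, nmax = 2, ...) {
  shown <- head(as.data.frame(x), nmax)
  print(shown, ...)
  cat("[ omitted", nrow(x) - nrow(shown), "rows ]\n")
  invisible(x)
}

class(tab)
dim(tab)  # the object itself still holds every row
```

Because `"data.frame"` remains in the class vector, functions written for data frames keep working on the object.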
na.rm
With the na.rm
parameter you can remove NA
values from the frequency table (defaults to TRUE
, but the number of NA
values will always be shown in the header):
Frequency table of AMX
from septic_patients
(2,000 x 49)
Class: factor > ordered > rsi (numeric)
Length: 2,000 (of which NA: 771 = 38.55%)
@@ -803,8 +803,8 @@ Group: Beta-lactams/penicillins
Parameter row.names
A frequency table shows row indices. To remove them, use row.names = FALSE
:
Frequency table of hospital_id
from septic_patients
(2,000 x 49)
Class: factor (numeric)
Length: 2,000 (of which NA: 0 = 0.00%)
@@ -855,21 +855,21 @@ Unique: 4
markdown
The markdown
parameter is TRUE
by default in non-interactive sessions, like in reports created with R Markdown. This will always print all rows, unless nmax
is set. Without markdown (like in regular R), a frequency table would print like:
septic_patients %>%
- freq(hospital_id, markdown = FALSE)
-# Frequency table of `hospital_id` from `septic_patients` (2,000 x 49)
-#
-# Class: factor (numeric)
-# Length: 2,000 (of which NA: 0 = 0.00%)
-# Levels: 4: A, B, C, D
-# Unique: 4
-#
-# Item Count Percent Cum. Count Cum. Percent
-# --- ----- ------ -------- ----------- -------------
-# 1 D 762 38.1% 762 38.1%
-# 2 B 663 33.2% 1,425 71.2%
-# 3 A 321 16.0% 1,746 87.3%
-# 4 C 254 12.7% 2,000 100.0%
septic_patients %>%
+ freq(hospital_id, markdown = FALSE)
+# Frequency table of `hospital_id` from `septic_patients` (2,000 x 49)
+#
+# Class: factor (numeric)
+# Length: 2,000 (of which NA: 0 = 0.00%)
+# Levels: 4: A, B, C, D
+# Unique: 4
+#
+# Item Count Percent Cum. Count Cum. Percent
+# --- ----- ------ -------- ----------- -------------
+# 1 D 762 38.1% 762 38.1%
+# 2 B 663 33.2% 1,425 71.2%
+# 3 A 321 16.0% 1,746 87.3%
+# 4 C 254 12.7% 2,000 100.0%
resistance_predict.Rmd
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science: it allows for a very natural way of writing code and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
library(dplyr)
-library(ggplot2)
-library(AMR)
-
-# (if not yet installed, install with:)
-# install.packages(c("tidyverse", "AMR"))
library(dplyr)
+library(ggplot2)
+library(AMR)
+
+# (if not yet installed, install with:)
+# install.packages(c("tidyverse", "AMR"))
Our package contains a function resistance_predict()
, which takes the same input as the functions for other AMR analyses. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
-# resistance prediction of piperacillin/tazobactam (TZP):
-resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")
-
-# or:
-septic_patients %>%
- resistance_predict(col_ab = "TZP")
-
-# to bind it to object 'predict_TZP' for example:
-predict_TZP <- septic_patients %>%
- resistance_predict(col_ab = "TZP")
# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")
+
+# or:
+septic_patients %>%
+ resistance_predict(col_ab = "TZP")
+
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- septic_patients %>%
+ resistance_predict(col_ab = "TZP")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# NOTE: Using column `date` as input for `col_date`.
@@ -257,55 +257,55 @@
#
# Number of Fisher Scoring iterations: 4
This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
predict_TZP
-# year value se_min se_max observations observed estimated
-# 1 2003 0.06250000 NA NA 32 0.06250000 0.05486389
-# 2 2004 0.08536585 NA NA 82 0.08536585 0.06089002
-# 3 2005 0.05000000 NA NA 60 0.05000000 0.06753075
-# 4 2006 0.05084746 NA NA 59 0.05084746 0.07483801
-# 5 2007 0.12121212 NA NA 66 0.12121212 0.08286570
-# 6 2008 0.04166667 NA NA 72 0.04166667 0.09166918
-# 7 2009 0.01639344 NA NA 61 0.01639344 0.10130461
-# 8 2010 0.05660377 NA NA 53 0.05660377 0.11182814
-# 9 2011 0.18279570 NA NA 93 0.18279570 0.12329488
-# 10 2012 0.30769231 NA NA 65 0.30769231 0.13575768
-# 11 2013 0.06896552 NA NA 58 0.06896552 0.14926576
-# 12 2014 0.10000000 NA NA 60 0.10000000 0.16386307
-# 13 2015 0.23636364 NA NA 55 0.23636364 0.17958657
-# 14 2016 0.22619048 NA NA 84 0.22619048 0.19646431
-# 15 2017 0.16279070 NA NA 86 0.16279070 0.21451350
-# 16 2018 0.23373852 0.2021578 0.2653193 NA NA 0.23373852
-# 17 2019 0.25412909 0.2168525 0.2914057 NA NA 0.25412909
-# 18 2020 0.27565854 0.2321869 0.3191302 NA NA 0.27565854
-# 19 2021 0.29828252 0.2481942 0.3483709 NA NA 0.29828252
-# 20 2022 0.32193804 0.2649008 0.3789753 NA NA 0.32193804
-# 21 2023 0.34654311 0.2823269 0.4107593 NA NA 0.34654311
-# 22 2024 0.37199700 0.3004860 0.4435080 NA NA 0.37199700
-# 23 2025 0.39818127 0.3193839 0.4769787 NA NA 0.39818127
-# 24 2026 0.42496142 0.3390173 0.5109056 NA NA 0.42496142
-# 25 2027 0.45218939 0.3593720 0.5450068 NA NA 0.45218939
-# 26 2028 0.47970658 0.3804212 0.5789920 NA NA 0.47970658
-# 27 2029 0.50734745 0.4021241 0.6125708 NA NA 0.50734745
predict_TZP
+# year value se_min se_max observations observed estimated
+# 1 2003 0.06250000 NA NA 32 0.06250000 0.05486389
+# 2 2004 0.08536585 NA NA 82 0.08536585 0.06089002
+# 3 2005 0.05000000 NA NA 60 0.05000000 0.06753075
+# 4 2006 0.05084746 NA NA 59 0.05084746 0.07483801
+# 5 2007 0.12121212 NA NA 66 0.12121212 0.08286570
+# 6 2008 0.04166667 NA NA 72 0.04166667 0.09166918
+# 7 2009 0.01639344 NA NA 61 0.01639344 0.10130461
+# 8 2010 0.05660377 NA NA 53 0.05660377 0.11182814
+# 9 2011 0.18279570 NA NA 93 0.18279570 0.12329488
+# 10 2012 0.30769231 NA NA 65 0.30769231 0.13575768
+# 11 2013 0.06896552 NA NA 58 0.06896552 0.14926576
+# 12 2014 0.10000000 NA NA 60 0.10000000 0.16386307
+# 13 2015 0.23636364 NA NA 55 0.23636364 0.17958657
+# 14 2016 0.22619048 NA NA 84 0.22619048 0.19646431
+# 15 2017 0.16279070 NA NA 86 0.16279070 0.21451350
+# 16 2018 0.23373852 0.2021578 0.2653193 NA NA 0.23373852
+# 17 2019 0.25412909 0.2168525 0.2914057 NA NA 0.25412909
+# 18 2020 0.27565854 0.2321869 0.3191302 NA NA 0.27565854
+# 19 2021 0.29828252 0.2481942 0.3483709 NA NA 0.29828252
+# 20 2022 0.32193804 0.2649008 0.3789753 NA NA 0.32193804
+# 21 2023 0.34654311 0.2823269 0.4107593 NA NA 0.34654311
+# 22 2024 0.37199700 0.3004860 0.4435080 NA NA 0.37199700
+# 23 2025 0.39818127 0.3193839 0.4769787 NA NA 0.39818127
+# 24 2026 0.42496142 0.3390173 0.5109056 NA NA 0.42496142
+# 25 2027 0.45218939 0.3593720 0.5450068 NA NA 0.45218939
+# 26 2028 0.47970658 0.3804212 0.5789920 NA NA 0.47970658
+# 27 2029 0.50734745 0.4021241 0.6125708 NA NA 0.50734745
The function plot
is available in base R, and can be extended by other packages to adapt the output to the type of input. We extended its function to cope with resistance predictions:
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
Resistance is not easily predicted; if we look at vancomycin resistance in Gram positives, the spread (i.e. standard error) is enormous:
-septic_patients %>%
- filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
- resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>%
- ggplot_rsi_predict()
-# NOTE: Using column `date` as input for `col_date`.
septic_patients %>%
+ filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+ resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>%
+ ggplot_rsi_predict()
+# NOTE: Using column `date` as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
You can define the model with the model
parameter. The default model is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
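As a rough illustration of such a binomial regression, the sketch below fits a base R `glm()` to yearly counts reconstructed from the TZP table above (resistant isolates = observed proportion times observations). It is a sketch under assumptions, not necessarily what `resistance_predict()` does internally; in particular, treating all non-resistant isolates as a single group is an assumption made here.

```r
# Yearly counts reconstructed from the TZP output shown earlier
tzp <- data.frame(
  year = 2003:2017,
  observations = c(32, 82, 60, 59, 66, 72, 61, 53, 93, 65, 58, 60, 55, 84, 86),
  resistant = c(2, 7, 3, 3, 8, 3, 1, 3, 17, 20, 4, 6, 13, 19, 14)
)
tzp$not_resistant <- tzp$observations - tzp$resistant

# Binomial GLM on (resistant, not resistant) counts per year
fit <- glm(cbind(resistant, not_resistant) ~ year,
           family = binomial, data = tzp)

# Predicted resistance for future years, on the probability scale
future <- data.frame(year = 2018:2022)
pred <- predict(fit, newdata = future, type = "response", se.fit = TRUE)
data.frame(year = future$year, estimated = pred$fit, se = pred$se.fit)
```

The standard errors returned by `predict()` can be added to and subtracted from the estimate to obtain a band comparable to the `se_min` and `se_max` columns.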
For the vancomycin resistance in Gram positive bacteria, a linear model might be more appropriate since no (left half of a) binomial distribution is to be expected based on the observed years:
-septic_patients %>%
- filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
- resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
- ggplot_rsi_predict()
-# NOTE: Using column `date` as input for `col_date`.
septic_patients %>%
+ filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+ resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
+ ggplot_rsi_predict()
+# NOTE: Using column `date` as input for `col_date`.
This seems more likely, doesn’t it?
The model itself is also available from the object, as an attribute
:
(
This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:
- +It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
The latest and unpublished development version can be installed with (caution: may be unstable):
- +NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See \url{https://www.whocc.no/copyright_disclaimer/}.
Read more about the data from WHOCC in our manual.
We support WHONET and EARS-Net data. Exported files from WHONET can be imported into R and can be analysed easily using this package. For education purposes, we created an example data set WHONET
with the exact same structure as a WHONET export file. Furthermore, this package also contains a data set antibiotics
with all EARS-Net antibiotic abbreviations, and knows almost all WHONET abbreviations for microorganisms. When using WHONET data as input for analysis, all input parameters will be set automatically.
Read our tutorial about how to work with WHONET data here.
diff --git a/docs/news/index.html b/docs/news/index.html
index 0dd8aaad..2d75fc22 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -78,7 +78,7 @@
Additional way to calculate co-resistance, i.e. when using multiple antibiotics as input for portion_*
functions or count_*
functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested
(which defaults to FALSE
) replaces the old also_single_tested
and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion
and count
help pages), where the %SI is being determined:
# -------------------------------------------------------------------------
-# only_all_tested = FALSE only_all_tested = TRUE
-# Antibiotic Antibiotic ----------------------- -----------------------
-# A B include as include as include as include as
-# numerator denominator numerator denominator
-# ---------- ---------- ---------- ----------- ---------- -----------
-# S S X X X X
-# I S X X X X
-# R S X X X X
-# not tested S X X - -
-# S I X X X X
-# I I X X X X
-# R I X X X X
-# not tested I X X - -
-# S R X X X X
-# I R X X X X
-# R R - X - X
-# not tested R - - - -
-# S not tested X X - -
-# I not tested X X - -
-# R not tested - - - -
-# not tested not tested - - - -
-# -------------------------------------------------------------------------
# -------------------------------------------------------------------------
+# only_all_tested = FALSE only_all_tested = TRUE
+# Antibiotic Antibiotic ----------------------- -----------------------
+# A B include as include as include as include as
+# numerator denominator numerator denominator
+# ---------- ---------- ---------- ----------- ---------- -----------
+# S S X X X X
+# I S X X X X
+# R S X X X X
+# not tested S X X - -
+# S I X X X X
+# I I X X X X
+# R I X X X X
+# not tested I X X - -
+# S R X X X X
+# I R X X X X
+# R R - X - X
+# not tested R - - - -
+# S not tested X X - -
+# I not tested X X - -
+# R not tested - - - -
+# not tested not tested - - - -
+# -------------------------------------------------------------------------
Since this is a major change, usage of the old also_single_tested
will throw an informative error that it has been replaced by only_all_tested
.
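The two counting methods in the table above can be expressed as a small base R sketch for two antibiotics. The function `count_SI_sketch()` is hypothetical and only mirrors the table; it is not the package's implementation, and `NA` stands for "not tested".

```r
# a, b: character vectors with "S", "I", "R" or NA (not tested)
count_SI_sketch <- function(a, b, only_all_tested = FALSE) {
  is_SI <- function(x) x %in% c("S", "I")   # NA yields FALSE
  both_tested <- !is.na(a) & !is.na(b)
  any_SI <- is_SI(a) | is_SI(b)
  if (only_all_tested) {
    # Only isolates tested for both antibiotics are considered
    numerator <- any_SI & both_tested
    denominator <- both_tested
  } else {
    # Any S or I counts; R/R is added to the denominator only
    numerator <- any_SI
    denominator <- any_SI | (both_tested & a == "R" & b == "R")
  }
  c(numerator = sum(numerator), denominator = sum(denominator))
}

# All 16 combinations from the table, one isolate each
grid <- expand.grid(a = c("S", "I", "R", NA), b = c("S", "I", "R", NA),
                    stringsAsFactors = FALSE)
count_SI_sketch(grid$a, grid$b, only_all_tested = FALSE)
count_SI_sketch(grid$a, grid$b, only_all_tested = TRUE)
```

Dividing the numerator by the denominator gives the %SI for the chosen method.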
Function rsi_df()
to transform a data.frame
to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df()
and portion_df()
to immediately show resistance percentages and number of available isolates:
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
@@ -323,12 +323,12 @@All these lead to the microbial ID of E. coli:
-as.mo("UPEC")
-# B_ESCHR_COL
-mo_name("UPEC")
-# "Escherichia coli"
-mo_gramstain("EHEC")
-# "Gram-negative"
as.mo("UPEC")
+# B_ESCHR_COL
+mo_name("UPEC")
+# "Escherichia coli"
+mo_gramstain("EHEC")
+# "Gram-negative"
mo_info()
as an analogy to ab_info()
. The mo_info()
prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
Function mo_synonyms()
to get all previously accepted taxonomic names of a microorganism
septic_patients %>%
- freq(age) %>%
- boxplot()
-# grouped boxplots:
-septic_patients %>%
- group_by(hospital_id) %>%
- freq(age) %>%
- boxplot()
septic_patients %>%
+ freq(age) %>%
+ boxplot()
+# grouped boxplots:
+septic_patients %>%
+ group_by(hospital_id) %>%
+ freq(age) %>%
+ boxplot()
New website!
We’ve got a new website: https://msberends.gitlab.io/AMR (built with the great pkgdown
)
mo
codes changed (e.g. Streptococcus changed from B_STRPTC
to B_STRPT
). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
mo
codes changed (e.g. Streptococcus changed from B_STRPTC
to B_STRPT
). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
mo_rank()
for the taxonomic rank (genus, species, infraspecies, etc.)
mo_url()
to get the direct URL of a species from the Catalogue of Life
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
-filter_aminoglycosides()
-filter_carbapenems()
-filter_cephalosporins()
-filter_1st_cephalosporins()
-filter_2nd_cephalosporins()
-filter_3rd_cephalosporins()
-filter_4th_cephalosporins()
-filter_fluoroquinolones()
-filter_glycopeptides()
-filter_macrolides()
-filter_tetracyclines()
filter_aminoglycosides()
+filter_carbapenems()
+filter_cephalosporins()
+filter_1st_cephalosporins()
+filter_2nd_cephalosporins()
+filter_3rd_cephalosporins()
+filter_4th_cephalosporins()
+filter_fluoroquinolones()
+filter_glycopeptides()
+filter_macrolides()
+filter_tetracyclines()
The antibiotics
data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics
data set. For example:
All ab_*
functions are deprecated and replaced by atc_*
functions:
ab_property -> atc_property()
-ab_name -> atc_name()
-ab_official -> atc_official()
-ab_trivial_nl -> atc_trivial_nl()
-ab_certe -> atc_certe()
-ab_umcg -> atc_umcg()
-ab_tradenames -> atc_tradenames()
as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.ab_property -> atc_property()
+ab_name -> atc_name()
+ab_official -> atc_official()
+ab_trivial_nl -> atc_trivial_nl()
+ab_certe -> atc_certe()
+ab_umcg -> atc_umcg()
+ab_tradenames -> atc_tradenames()
as.atc()
internally. The old atc_property
has been renamed atc_online_property()
. This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc
or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo()
and e.g. mo_genus
.
set_mo_source()
and get_mo_source()
to use your own predefined MO codes as input for as.mo()
and consequently all mo_*
functions
dplyr
version 0.8.0
guess_ab_col()
to find an antibiotic column in a table
as.atc()
New function age_groups()
to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
New function ggplot_rsi_predict()
as well as the base R plot()
function can now be used for resistance prediction calculated with resistance_predict()
:
-
+
Functions filter_first_isolate()
and filter_first_weighted_isolate()
to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
-
+
is equal to:
-
+
New function availability()
to check the number of available (non-empty) results in a data.frame
@@ -598,33 +598,33 @@ These functions use as.atc()
Now handles incorrect spelling, like i
instead of y
and f
instead of ph
:
-
+
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE
is equal to uncertainty level 2. Run ?as.mo
for more info about these levels.
-# equal:
-as.mo(..., allow_uncertain = TRUE)
-as.mo(..., allow_uncertain = 2)
-
-# also equal:
-as.mo(..., allow_uncertain = FALSE)
-as.mo(..., allow_uncertain = 0)
+# equal:
+as.mo(..., allow_uncertain = TRUE)
+as.mo(..., allow_uncertain = 2)
+
+# also equal:
+as.mo(..., allow_uncertain = FALSE)
+as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3)
could lead to very unreliable results.
Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
All microbial IDs that were found are now saved to a local file ~/.Rhistory_mo
. Use the new function clean_mo_history()
to delete this file, which resets the algorithms.
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN
. On non-English systems, properties of these will be translated to all previously supported languages: German, Dutch, French, Italian, Spanish and Portuguese:
-
+
Fix for vector containing only empty values
Finds better results when input is in other languages
@@ -670,19 +670,19 @@ Using as.mo(..., allow_uncertain = 3)
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
-# Determine genus of microorganisms (mo) in `septic_patients` data set:
-# OLD WAY
-septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
-# NEW WAY
-septic_patients %>%
- freq(mo_genus(mo))
-
-# Even supports grouping variables:
-septic_patients %>%
- group_by(gender) %>%
- freq(mo_genus(mo))
+# Determine genus of microorganisms (mo) in `septic_patients` data set:
+# OLD WAY
+septic_patients %>%
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
+# NEW WAY
+septic_patients %>%
+ freq(mo_genus(mo))
+
+# Even supports grouping variables:
+septic_patients %>%
+ group_by(gender) %>%
+ freq(mo_genus(mo))
Header info is now available as a list, with the header
function
The parameter header
is now set to TRUE
by default, even for markdown
@@ -713,9 +713,9 @@ Using as.mo(..., allow_uncertain = 3)
as.mo(..., allow_uncertain = 3)
Fewer than 3 characters as input for as.mo
will return NA
Function as.mo
(and all mo_*
wrappers) now supports genus abbreviations with “species” attached
-
+
Added parameter combine_IR
(TRUE/FALSE) to functions portion_df
and count_df
, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
Fix for portion_*(..., as_percent = TRUE)
when minimal number of isolates would not be met
@@ -773,17 +773,17 @@ Using as.mo(..., allow_uncertain = 3)
Support for grouping variables, test with:
-
+
Support for (un)selecting columns:
-
+
-Check for hms::is.hms
+ Check for hms::is.hms
Now prints in markdown by default in non-interactive sessions
No longer adds the factor level column and sorts factors on count again
@@ -840,9 +840,9 @@ Using as.mo(..., allow_uncertain = 3)
as.mo(..., allow_uncertain = 3)
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
-# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
-# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
-# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
-# [1] "Streptococcus grupo A"
+mo_gramstain("E. coli")
+# [1] "Gram negative"
+mo_gramstain("E. coli", language = "de") # German
+# [1] "Gramnegativ"
+mo_gramstain("E. coli", language = "es") # Spanish
+# [1] "Gram negativo"
+mo_fullname("S. group A", language = "pt") # Portuguese
+# [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name:
-mo_gramstain("Esc blattae")
-# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
-# [1] "Gram negative"
+mo_gramstain("Esc blattae")
+# Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
+# [1] "Gram negative"
Functions count_R
, count_IR
, count_I
, count_SI
and count_S
to selectively count resistant or susceptible isolates
@@ -883,18 +883,18 @@ Using as.mo(..., allow_uncertain = 3)
-
Functions as.mo
and is.mo
as replacements for as.bactid
and is.bactid
(since the microorganisms
data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo
function determines microbial IDs using intelligent rules:
-as.mo("E. coli")
-# [1] B_ESCHR_COL
-as.mo("MRSA")
-# [1] B_STPHY_AUR
-as.mo("S group A")
-# [1] B_STRPTC_GRA
+as.mo("E. coli")
+# [1] B_ESCHR_COL
+as.mo("MRSA")
+# [1] B_STPHY_AUR
+as.mo("S group A")
+# [1] B_STRPTC_GRA
And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
-
+
- Added parameter reference_df for as.mo, so users can supply their own microbial IDs, names or codes as a reference table
- Renamed all previous references to bactid to mo, like:
@@ -922,12 +922,12 @@ Using as.mo(..., allow_uncertain = 3)
Added three antimicrobial agents to the antibiotics data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
-
Added 163 trade names to the antibiotics data set; it now contains 298 different trade names in total, e.g.:
-
+
- For first_isolate, rows will be ignored when there’s no species available
- Function ratio is now deprecated and will be removed in a future release, as it is outside the scope of this package
@@ -938,13 +938,13 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for quasiquotation in the function series count_* and portion_*, and in n_rsi. This allows checking more than two vectors or columns.
-
+
- Edited ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df by default, but can be set to count_df.
- Fix for ggplot_rsi when the ggplot2 package was not loaded
@@ -958,12 +958,12 @@ Using as.mo(..., allow_uncertain = 3)
-
Support for types (classes) list and matrix for freq
-
+
For lists, subsetting is possible:
-
+
as.mo(..., allow_uncertain = 3)
-Now possible to coerce MIC values with a space between operator and value, i.e. as.mic("<= 0.002") now works
+Now possible to coerce MIC values with a space between operator and value, i.e. as.mic("<= 0.002") now works
Classes rsi and mic do not add the attribute package.version anymore
Added "groups" option for atc_property(..., property). It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups is a convenient wrapper around this.
Built-in host check for atc_property as it requires the host set by url to be responsive
@@ -1112,9 +1112,9 @@ Using as.mo(..., allow_uncertain = 3)
as.mo(..., allow_uncertain = 3)
Added barplots for rsi and mic classes
as.mo(..., allow_uncertain = 3)
Contents
ITIS.Rd
All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS).
- -
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All ~20,000 (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all their ~2,500 previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since ITIS honours the taxonomic branching order of bacterial phyla according to Cavalier-Smith (2002), which defines that all bacteria are classified into either subkingdom Negibacteria or subkingdom Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
- -
-On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
# NOT RUN {
-# Get a note when a species was renamed
-mo_shortname("Chlamydia psittaci")
-# Note: 'Chlamydia psittaci' (Page, 1968) was renamed
-# 'Chlamydophila psittaci' (Everett et al., 1999)
-# [1] "C. psittaci"
-
-# Get any property from the entire taxonomic tree for all included species
-mo_class("E. coli")
-# [1] "Gammaproteobacteria"
-
-mo_family("E. coli")
-# [1] "Enterobacteriaceae"
-
-mo_subkingdom("E. coli")
-# [1] "Negibacteria"
-
-mo_gramstain("E. coli") # based on subkingdom
-# [1] "Gram negative"
-
-mo_ref("E. coli")
-# [1] "Castellani and Chalmers, 1919"
-
-# Do not get mistaken - the package only includes microorganisms
-mo_phylum("C. elegans")
-# [1] "Cyanobacteria" # Bacteria?!
-mo_fullname("C. elegans")
-# [1] "Chroococcus limneticus elegans" # Because a microorganism was found
-# }
-
itis.Rd
All taxonomic names of all microorganisms are included in this package, using the authoritative Integrated Taxonomic Information System (ITIS).
- -
-This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).
All (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since all bacteria are classified into subkingdom Negibacteria or Posibacteria.
-ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists [3].
- -
-On our website https://msberends.gitlab.io/AMR you can find a omprehensive tutorial about how to conduct AMR analysis and find the complete documentation of all functions, which reads a lot easier than in R.
# NOT RUN {
-# Get a note when a species was renamed
-mo_shortname("Chlamydia psittaci")
-# Note: 'Chlamydia psittaci' (Page, 1968) was renamed
-# 'Chlamydophila psittaci' (Everett et al., 1999)
-# [1] "C. psittaci"
-
-# Get any property from the entire taxonomic tree for all included species
-mo_class("E. coli")
-# [1] "Gammaproteobacteria"
-
-mo_family("E. coli")
-# [1] "Enterobacteriaceae"
-
-mo_subkingdom("E. coli")
-# [1] "Negibacteria"
-
-mo_gramstain("E. coli") # based on subkingdom
-# [1] "Gram negative"
-
-mo_ref("E. coli")
-# [1] "Castellani and Chalmers, 1919"
-
-# Do not get mistaken - the package only includes microorganisms
-mo_phylum("C. elegans")
-# [1] "Cyanobacteria" # Bacteria?!
-mo_fullname("C. elegans")
-# [1] "Chroococcus limneticus elegans" # Because a microorganism was found
-# }
-