diff --git a/DESCRIPTION b/DESCRIPTION index 2cbd24a6..5b271565 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.9.0.9005 -Date: 2019-12-21 +Version: 0.9.0.9006 +Date: 2019-12-22 Title: Antimicrobial Resistance Analysis Authors@R: c( person(role = c("aut", "cre"), diff --git a/NEWS.md b/NEWS.md index 3beea51d..e6a6bd6d 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,10 +1,13 @@ -# AMR 0.9.0.9005 -## Last updated: 21-Dec-2019 +# AMR 0.9.0.9006 +## Last updated: 22-Dec-2019 ### Changes * Speed improvement for `as.mo()` (and consequently all `mo_*` functions that use `as.mo()` internally), especially for the *G. species* format (G for genus), like *E. coli* and *K penumoniae* * Input values for `as.disk()` limited to a maximum of 50 millimeters +### Other +* Add a `CITATION` file + # AMR 0.9.0 ### Breaking diff --git a/R/globals.R b/R/globals.R index 8f0ac9d6..8513bb0d 100755 --- a/R/globals.R +++ b/R/globals.R @@ -35,6 +35,7 @@ globalVariables(c(".", "first_isolate_row_index", "fullname", "fullname_lower", + "g_species", "genus", "gramstain", "group", diff --git a/docs/404.html b/docs/404.html index 21f7adb5..56acff37 100644 --- a/docs/404.html +++ b/docs/404.html @@ -84,7 +84,7 @@ AMR (for R) - 0.9.0.9005 + 0.9.0.9006 diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index ea9b3b04..5ca7bd6b 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -84,7 +84,7 @@ AMR (for R) - 0.9.0.9005 + 0.9.0.9006 diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index ddceba46..7ef2a707 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -41,7 +41,7 @@ AMR (for R) - 0.9.0.9005 + 0.9.0.9006 @@ -187,7 +187,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

21 December 2019

+

22 December 2019

@@ -196,7 +196,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 21 December 2019.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 22 December 2019.

Introduction

@@ -227,21 +227,21 @@ -2019-12-21 +2019-12-22 abcd Escherichia coli S S -2019-12-21 +2019-12-22 abcd Escherichia coli S R -2019-12-21 +2019-12-22 efgh Escherichia coli R @@ -336,70 +336,70 @@ -2011-10-17 -V10 -Hospital D +2017-03-14 +J6 +Hospital B Staphylococcus aureus S +I +R S -S -S -F +M -2017-05-17 -R1 -Hospital B -Klebsiella pneumoniae -S -S -S -S -F - - -2015-07-05 -J5 -Hospital B -Escherichia coli -S +2015-04-15 +B9 +Hospital D +Staphylococcus aureus +R S S S M - -2012-01-01 -R10 -Hospital C -Escherichia coli -S -S -S -S -F - -2014-07-18 -T6 +2017-01-04 +Q9 Hospital A Escherichia coli S S +R +S +F + + +2015-05-23 +X2 +Hospital A +Streptococcus pneumoniae +R +S +R +S +F + + +2017-05-20 +X5 +Hospital C +Staphylococcus aureus +R +R S S F -2016-07-17 -K1 -Hospital C +2016-09-27 +N7 +Hospital A Escherichia coli -I -I +R S S -M +S +F @@ -421,8 +421,8 @@ # # Item Count Percent Cum. Count Cum. Percent # --- ----- ------- -------- ----------- ------------- -# 1 M 10,402 52.01% 10,402 52.01% -# 2 F 9,598 47.99% 20,000 100.00% +# 1 M 10,464 52.32% 10,464 52.32% +# 2 F 9,536 47.68% 20,000 100.00%

So, we can draw at least two conclusions immediately. From a data scientist’s perspective, the data looks clean: only values M and F. From a researcher’s perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

data <- data %>%
@@ -437,8 +437,8 @@
 # Other rules by this AMR package
 # Non-EUCAST: inherit amoxicillin results for unavailable ampicillin (no changes)
 # Non-EUCAST: inherit ampicillin results for unavailable amoxicillin (no changes)
-# Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S (2,972 values changed)
-# Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R (139 values changed)
+# Non-EUCAST: set amoxicillin/clav acid = S where ampicillin = S (3,027 values changed)
+# Non-EUCAST: set ampicillin = R where amoxicillin/clav acid = R (129 values changed)
 # Non-EUCAST: set piperacillin = R where piperacillin/tazobactam = R (no changes)
 # Non-EUCAST: set piperacillin/tazobactam = S where piperacillin = S (no changes)
 # Non-EUCAST: set trimethoprim = R where trimethoprim/sulfa = R (no changes)
@@ -463,14 +463,14 @@
 # Pasteurella multocida (no changes)
 # Staphylococcus (no changes)
 # Streptococcus groups A, B, C, G (no changes)
-# Streptococcus pneumoniae (959 values changed)
+# Streptococcus pneumoniae (1,039 values changed)
 # Viridans group streptococci (no changes)
 # 
 # EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-# Table 01: Intrinsic resistance in Enterobacteriaceae (1,333 values changed)
+# Table 01: Intrinsic resistance in Enterobacteriaceae (1,305 values changed)
 # Table 02: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
 # Table 03: Intrinsic resistance in other Gram-negative bacteria (no changes)
-# Table 04: Intrinsic resistance in Gram-positive bacteria (2,723 values changed)
+# Table 04: Intrinsic resistance in Gram-positive bacteria (2,835 values changed)
 # Table 08: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
 # Table 09: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
 # Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
@@ -478,15 +478,15 @@
 # Table 13: Interpretive rules for quinolones (no changes)
 # 
 # -------------------------------------------------------------------------------
-# EUCAST rules affected 6,551 out of 20,000 rows, making a total of 8,126 edits
+# EUCAST rules affected 6,667 out of 20,000 rows, making a total of 8,335 edits
 # => added 0 test results
 # 
-# => changed 8,126 test results
-#    - 115 test results changed from S to I
-#    - 4,709 test results changed from S to R
-#    - 1,179 test results changed from I to S
-#    - 330 test results changed from I to R
-#    - 1,793 test results changed from R to S
+# => changed 8,335 test results
+#    - 121 test results changed from S to I
+#    - 4,855 test results changed from S to R
+#    - 1,244 test results changed from I to S
+#    - 332 test results changed from I to R
+#    - 1,783 test results changed from R to S
 # -------------------------------------------------------------------------------
 # 
 # Use eucast_rules(..., verbose = TRUE) (on your original data) to get a data.frame with all specified edits instead.
@@ -514,8 +514,8 @@ # NOTE: Using column `bacteria` as input for `col_mo`. # NOTE: Using column `date` as input for `col_date`. # NOTE: Using column `patient_id` as input for `col_patient_id`. -# => Found 5,691 first isolates (28.5% of total)
-

So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+# => Found 5,713 first isolates (28.6% of total) +

So only 28.6% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

data_1st <- data %>% 
   filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

@@ -525,7 +525,7 @@

First weighted isolates

-

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient Y10, sorted on date:

+

We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient C4, sorted on date:

@@ -541,21 +541,21 @@ - - + + - + - - + + - + @@ -563,8 +563,8 @@ - - + + @@ -574,63 +574,19 @@ - - + + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + @@ -639,9 +595,20 @@ - - - + + + + + + + + + + + + + + @@ -649,9 +616,42 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
isolate
12010-06-10Y102010-02-20C4 B_ESCHR_COLI S SRS S TRUE
22010-09-27Y102010-03-07C4 B_ESCHR_COLISR S S S
32010-11-27Y102010-06-08C4 B_ESCHR_COLI S S
42010-12-06Y102010-10-24C4 B_ESCHR_COLI S SRS S FALSE
52011-02-18Y10B_ESCHR_COLIRSSSFALSE
62011-04-05Y10B_ESCHR_COLIRRSRFALSE
72011-05-29Y10B_ESCHR_COLIRSSRFALSE
82011-06-04Y10B_ESCHR_COLISSSSFALSE
92011-06-14Y102011-04-27C4 B_ESCHR_COLI S STRUE
102011-11-02Y1062011-05-13C4B_ESCHR_COLIRSSSFALSE
72011-12-08C4 B_ESCHR_COLI S SS FALSE
82012-05-22C4B_ESCHR_COLIISSSTRUE
92012-06-10C4B_ESCHR_COLISSSSFALSE
102012-08-14C4B_ESCHR_COLIRSSSFALSE
-

Only 2 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are clearly different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram negatives and 6 narrow-spectrum ones for Gram positives. These can be defined by the user.

+

Only 3 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are clearly different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram negatives and 6 narrow-spectrum ones for Gram positives. These can be defined by the user.

If a column exists with a name like ‘key(…)ab’, the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:

data <- data %>% 
   mutate(keyab = key_antibiotics(.)) %>% 
@@ -662,7 +662,7 @@
 # NOTE: Using column `patient_id` as input for `col_patient_id`.
 # NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
 # [Criterion] Inclusion based on key antibiotics, ignoring I
-# => Found 15,020 first weighted isolates (75.1% of total)
+# => Found 14,948 first weighted isolates (74.7% of total)
@@ -679,22 +679,22 @@ - - + + - + - - + + - + @@ -703,8 +703,8 @@ - - + + @@ -715,82 +715,82 @@ - - + + - - - - - - - - - - - - - - - - + + + + - - + + + + + + + + + + + + + + - - + + - - + + - - + + + + + + + + + + + + + + - - - - - - - - - - - - - + - - + + - + @@ -799,11 +799,11 @@
isolate
12010-06-10Y102010-02-20C4 B_ESCHR_COLI S SRS S TRUE TRUE
22010-09-27Y102010-03-07C4 B_ESCHR_COLISR S S S
32010-11-27Y102010-06-08C4 B_ESCHR_COLI S S
42010-12-06Y102010-10-24C4 B_ESCHR_COLI S SRSFALSEFALSE
52011-02-18Y10B_ESCHR_COLIRS S S FALSE TRUE
62011-04-05Y10
52011-04-27C4 B_ESCHR_COLIRRS S RSTRUETRUE
62011-05-13C4B_ESCHR_COLIRSSS FALSE TRUE
72011-05-29Y102011-12-08C4 B_ESCHR_COLIR S SRSS FALSE TRUE
82011-06-04Y102012-05-22C4B_ESCHR_COLIISSSTRUETRUE
92012-06-10C4 B_ESCHR_COLI S S S S FALSETRUE
92011-06-14Y10B_ESCHR_COLISSRSTRUETRUEFALSE
102011-11-02Y102012-08-14C4 B_ESCHR_COLISR S S S
-

Instead of 2, now 9 isolates are flagged. In total, 75.1% of all isolates are marked ‘first weighted’ - 46.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 3, now 9 isolates are flagged. In total, 74.7% of all isolates are marked ‘first weighted’ - 46.2% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

data_1st <- data %>% 
   filter_first_weighted_isolate()
-

So we end up with 15,020 isolates for analysis.

+

So we end up with 14,948 isolates for analysis.

We can remove unneeded columns:

data_1st <- data_1st %>% 
   select(-c(first, keyab))
@@ -811,7 +811,6 @@
head(data_1st)
- @@ -828,101 +827,95 @@ - - - + + - + + + - - - - - - + + + + - - - + + + + + + + + + + + + + + + + + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
date patient_id hospital
22017-05-17R12017-03-14J6 Hospital BB_KLBSL_PNMNB_STPHY_AURSSS R SSSFGram-negativeKlebsiellapneumoniaeMGram-positiveStaphylococcusaureus TRUE
92016-05-07N82015-04-15B9Hospital DB_STPHY_AURSRSSSMGram-positiveStaphylococcusaureusTRUE
2017-01-04Q9 Hospital A B_ESCHR_COLIR S S RS F Gram-negative Escherichia coli TRUE
112015-11-13U7Hospital BB_STPHY_AURSSSSSFGram-positiveStaphylococcusaureusTRUE
142011-08-27C10Hospital CB_STPHY_AURSSSRRMGram-positiveStaphylococcusaureusTRUE
152013-10-17R10Hospital BB_STPHY_AURSRSSSFGram-positiveStaphylococcusaureusTRUE
162011-10-01I62015-05-23X2 Hospital A B_STRPT_PNMNSS R RMRRF Gram-positive Streptococcus pneumoniae TRUE
2017-05-20X5Hospital CB_STPHY_AURSRRSSFGram-positiveStaphylococcusaureusTRUE
2016-09-27N7Hospital AB_ESCHR_COLIRSSSFGram-negativeEscherichiacoliTRUE

Time for the analysis!

@@ -942,7 +935,7 @@
data_1st %>% freq(genus, species)

Frequency table

Class: character
-Length: 15,020 (of which NA: 0 = 0%)
+Length: 14,948 (of which NA: 0 = 0%)
Unique: 4

Shortest: 16
Longest: 24

@@ -959,33 +952,33 @@ Longest: 24

1 Escherichia coli -7,478 -49.79% -7,478 -49.79% +7,408 +49.56% +7,408 +49.56% 2 Staphylococcus aureus -3,691 -24.57% -11,169 -74.36% +3,682 +24.63% +11,090 +74.19% 3 Streptococcus pneumoniae -2,306 -15.35% -13,475 -89.71% +2,373 +15.88% +13,463 +90.07% 4 Klebsiella pneumoniae -1,545 -10.29% -15,020 +1,485 +9.93% +14,948 100.00% @@ -997,7 +990,7 @@ Longest: 24

The functions resistance() and susceptibility() can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S(), proportion_SI(), proportion_I(), proportion_IR() and proportion_R() can be used to determine the proportion of a specific antimicrobial outcome.

As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R(), equal to resistance()) and susceptibility as the proportion of S and I (proportion_SI(), equal to susceptibility()). These functions can be used on their own:

data_1st %>% resistance(AMX)
-# [1] 0.4637816
+# [1] 0.4658817
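
As a rough illustration of the definition above, the following base R sketch computes the proportion of R and the proportion of S and I by hand. The vector `results` is made-up data, and `prop_R` and `prop_SI` are hypothetical names used only here; the package functions may apply additional checks that this sketch omits.

```r
# Hypothetical S/I/R results for one antibiotic (not taken from the example data).
results <- c("S", "R", "I", "R", "S", "S", NA, "R")

# Keep only interpretable results by dropping missing values.
tested <- results[!is.na(results)]

# Resistance: the share of R among all tested isolates.
prop_R <- mean(tested == "R")

# Susceptibility: the share of S and I together.
prop_SI <- mean(tested %in% c("S", "I"))

print(prop_R)
print(prop_SI)
```

Because every tested isolate is either R or one of S and I, the two proportions add up to 1.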

Or can be used in conjunction with group_by() and summarise(), both from the dplyr package:

data_1st %>% 
   group_by(hospital) %>% 
@@ -1010,19 +1003,19 @@ Longest: 24

Hospital A -0.4687360 +0.4639385 Hospital B -0.4664905 +0.4647623 Hospital C -0.4576649 +0.4596631 Hospital D -0.4561813 +0.4755767 @@ -1040,23 +1033,23 @@ Longest: 24

Hospital A -0.4687360 -4478 +0.4639385 +4423 Hospital B -0.4664905 -5297 +0.4647623 +5321 Hospital C -0.4576649 -2244 +0.4596631 +2256 Hospital D -0.4561813 -3001 +0.4755767 +2948 @@ -1076,27 +1069,27 @@ Longest: 24

Escherichia -0.9259160 -0.8915485 -0.9933137 +0.9298056 +0.8983531 +0.9950054 Klebsiella -0.9177994 -0.8990291 -0.9967638 +0.9313131 +0.8942761 +0.9932660 Staphylococcus -0.9244107 -0.9260363 -0.9943105 +0.9212385 +0.9166214 +0.9934818 Streptococcus -0.6183868 +0.6080910 0.0000000 -0.6183868 +0.6080910 diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index 7a459ed6..11b80307 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index 1e77ae7e..9095e6a9 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index d3e24f2b..cc3062b6 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index 4b21fb11..6c8711ff 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 74b073bd..bf4f0537 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -41,7 +41,7 @@ AMR (for R) - 0.9.0.9004 + 0.9.0.9006
@@ -187,7 +187,7 @@

Benchmarks

Matthijs S. Berends

-

20 December 2019

+

22 December 2019

@@ -221,36 +221,21 @@ times = 10) print(S.aureus, unit = "ms", signif = 2) # Unit: milliseconds -# expr min lq mean median uq max -# as.mo("sau") 9.7 10.0 33.0 11.0 35.0 110.0 -# as.mo("stau") 36.0 36.0 45.0 37.0 60.0 65.0 -# as.mo("STAU") 33.0 34.0 41.0 37.0 40.0 61.0 -# as.mo("staaur") 9.6 9.9 13.0 10.0 11.0 33.0 -# as.mo("STAAUR") 9.6 10.0 11.0 11.0 11.0 13.0 -# as.mo("S. aureus") 25.0 26.0 32.0 26.0 28.0 59.0 -# as.mo("S aureus") 25.0 26.0 33.0 26.0 33.0 55.0 -# as.mo("Staphylococcus aureus") 4.7 4.8 5.2 5.2 5.5 6.4 -# as.mo("Staphylococcus aureus (MRSA)") 620.0 640.0 690.0 660.0 680.0 840.0 -# as.mo("Sthafilokkockus aaureuz") 310.0 340.0 360.0 350.0 370.0 420.0 -# as.mo("MRSA") 9.8 10.0 13.0 11.0 12.0 35.0 -# as.mo("VISA") 21.0 21.0 29.0 22.0 27.0 60.0 -# as.mo("VRSA") 20.0 21.0 31.0 25.0 44.0 47.0 -# as.mo(22242419) 19.0 19.0 26.0 20.0 25.0 52.0 -# neval -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10 -# 10
+# expr min lq mean median uq max neval +# as.mo("sau") 9.1 9.3 15 9.8 13.0 34 10 +# as.mo("stau") 33.0 34.0 40 35.0 39.0 60 10 +# as.mo("STAU") 33.0 34.0 51 35.0 57.0 150 10 +# as.mo("staaur") 8.8 9.2 16 10.0 13.0 43 10 +# as.mo("STAAUR") 9.3 9.4 23 9.7 10.0 120 10 +# as.mo("S. aureus") 10.0 10.0 16 11.0 12.0 41 10 +# as.mo("S aureus") 10.0 10.0 28 12.0 35.0 110 10 +# as.mo("Staphylococcus aureus") 4.6 4.8 10 4.9 5.2 56 10 +# as.mo("Staphylococcus aureus (MRSA)") 660.0 670.0 700 680.0 710.0 770 10 +# as.mo("Sthafilokkockus aaureuz") 330.0 350.0 370 370.0 390.0 430 10 +# as.mo("MRSA") 9.2 9.2 14 9.4 9.5 35 10 +# as.mo("VISA") 20.0 20.0 26 21.0 23.0 45 10 +# as.mo("VRSA") 20.0 21.0 30 22.0 44.0 47 10 +# as.mo(22242419) 19.0 20.0 43 20.0 28.0 130 10

In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.
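
The conversion from a timing in milliseconds to a throughput in values per second can be checked with a short base R sketch. The name `ms_per_value` is hypothetical, and the two timings are simply the 5 ms and 100 ms figures quoted above.

```r
# Time per input value in milliseconds, as quoted in the text.
ms_per_value <- c(fast = 5, slow = 100)

# One second is 1000 milliseconds, so throughput is 1000 divided by the timing.
values_per_second <- 1000 / ms_per_value

print(values_per_second)
```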

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined more slowly. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS), a bug probably never found before in humans:

@@ -262,19 +247,19 @@ times = 10) print(M.semesiae, unit = "ms", signif = 4) # Unit: milliseconds -# expr min lq mean median uq -# as.mo("metsem") 1435.000 1473.000 1510.00 1500.000 1520.00 -# as.mo("METSEM") 1466.000 1516.000 1542.00 1541.000 1557.00 -# as.mo("M. semesiae") 2113.000 2173.000 2207.00 2187.000 2243.00 -# as.mo("M. semesiae") 2154.000 2192.000 2238.00 2215.000 2305.00 -# as.mo("Methanosarcina semesiae") 5.435 5.508 13.05 5.724 28.23 +# expr min lq mean median uq +# as.mo("metsem") 1469.000 1482.000 1515.000 1507.000 1545.000 +# as.mo("METSEM") 1435.000 1452.000 1490.000 1479.000 1520.000 +# as.mo("M. semesiae") 10.840 11.090 16.220 11.150 11.600 +# as.mo("M. semesiae") 10.670 10.820 20.140 11.180 38.040 +# as.mo("Methanosarcina semesiae") 5.138 5.185 7.838 5.366 5.493 # max neval -# 1651.00 10 -# 1632.00 10 -# 2301.00 10 -# 2332.00 10 -# 32.95 10 -

That takes 15.5 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.

+# 1574.00 10 +# 1563.00 10 +# 36.46 10 +# 44.17 10 +# 30.28 10 +

That takes 6.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.

In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):

The highest outliers are the first runs. All subsequent determinations took only thousandths of a second, because the as.mo() function learns from its own output to speed up later determinations.

@@ -311,8 +296,8 @@ print(run_it, unit = "ms", signif = 3) # Unit: milliseconds # expr min lq mean median uq max neval -# mo_name(x) 539 576 598 586 610 728 100 -

So transforming 500,000 values (!!) of 50 unique values only takes 0.59 seconds (586 ms). You only lose time on your unique input values.

+# mo_name(x) 548 581 605 593 611 735 100 +

So transforming 500,000 values (!!) of 50 unique values only takes 0.59 seconds (593 ms). You only lose time on your unique input values.
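
One way to see why only the unique input values cost time is the following base R sketch. It is not the package's implementation: `slow_lookup()` and `translate_unique()` are hypothetical helpers, and the lookup is a stand-in for any costly per-value coercion. The idea is to run the lookup once per unique value and map the results back with match().

```r
# Hypothetical stand-in for a costly per-value coercion.
slow_lookup <- function(x) {
  toupper(x)
}

# Apply `fun` to the unique values only, then expand back to the full length.
translate_unique <- function(x, fun) {
  ux <- unique(x)       # the distinct inputs
  res <- fun(ux)        # one lookup per distinct input
  res[match(x, ux)]     # map each original element to its result
}

set.seed(1)
x <- sample(c("e. coli", "s. aureus", "k. pneumoniae"), 500000, replace = TRUE)
out <- translate_unique(x, slow_lookup)

print(length(out))
print(unique(out))
```

With this pattern the cost of the lookup grows with the number of distinct inputs, not with the length of the vector.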

@@ -324,10 +309,10 @@ times = 10) print(run_it, unit = "ms", signif = 3) # Unit: milliseconds -# expr min lq mean median uq max neval -# A 6.460 6.61 6.940 6.720 6.940 8.14 10 -# B 25.400 25.80 30.700 26.300 27.600 61.60 10 -# C 0.653 0.67 0.784 0.806 0.842 0.90 10

+# expr min lq mean median uq max neval +# A 6.380 6.430 6.800 6.540 6.690 8.94 10 +# B 10.900 10.900 14.200 11.100 11.400 37.20 10 +# C 0.735 0.772 0.832 0.792 0.875 1.01 10

So going from mo_name("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

run_it <- microbenchmark(A = mo_species("aureus"),
                          B = mo_genus("Staphylococcus"),
@@ -341,14 +326,14 @@
 print(run_it, unit = "ms", signif = 3)
 # Unit: milliseconds
 #  expr   min    lq  mean median    uq   max neval
-#     A 0.429 0.450 0.458  0.453 0.469 0.493    10
-#     B 0.484 0.499 0.519  0.506 0.518 0.646    10
-#     C 0.674 0.721 0.777  0.796 0.816 0.853    10
-#     D 0.485 0.490 0.507  0.499 0.518 0.554    10
-#     E 0.439 0.453 0.462  0.460 0.476 0.482    10
-#     F 0.425 0.446 0.454  0.455 0.462 0.477    10
-#     G 0.447 0.450 0.464  0.457 0.471 0.508    10
-#     H 0.449 0.457 0.477  0.459 0.464 0.641    10
+# A 0.445 0.453 0.463 0.464 0.467 0.492 10 +# B 0.484 0.496 0.522 0.502 0.505 0.724 10 +# C 0.667 0.746 0.755 0.758 0.786 0.800 10 +# D 0.488 0.491 0.507 0.505 0.509 0.558 10 +# E 0.454 0.455 0.462 0.461 0.465 0.490 10 +# F 0.432 0.447 0.456 0.458 0.459 0.490 10 +# G 0.438 0.446 0.456 0.454 0.460 0.486 10 +# H 0.439 0.442 0.454 0.450 0.459 0.501 10

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
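
The short-circuit described above can be sketched in base R as follows. This is an assumption-laden illustration, not the package's code: `known_phyla` is a tiny made-up list, and `phylum_of()` and its `full_lookup` argument are hypothetical names.

```r
# Hypothetical list of valid results; the real package knows far more phyla.
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Return the input unchanged when it is already a valid result;
# only fall back to the (costly) full lookup otherwise.
phylum_of <- function(x, full_lookup) {
  if (x %in% known_phyla) {
    return(x)
  }
  full_lookup(x)
}

# The full lookup is never called here, so a failing one does no harm.
quick <- phylum_of("Firmicutes", function(x) stop("full lookup should not run"))
print(quick)
```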

@@ -375,13 +360,13 @@ print(run_it, unit = "ms", signif = 4) # Unit: milliseconds # expr min lq mean median uq max neval -# en 20.68 22.22 25.57 22.81 24.02 58.73 100 -# de 21.96 23.63 28.16 24.45 25.50 138.10 100 -# nl 27.91 29.85 33.99 31.06 32.41 63.72 100 -# es 22.06 23.81 28.02 24.51 25.63 57.01 100 -# it 21.62 23.79 29.36 24.79 27.55 67.42 100 -# fr 21.77 23.67 27.02 24.12 25.51 65.44 100 -# pt 21.68 23.59 26.42 24.03 25.23 54.05 100
+# en 21.02 22.35 26.62 22.93 23.58 55.03 100 +# de 22.22 23.82 29.38 24.32 25.30 60.12 100 +# nl 27.58 28.91 33.76 30.03 30.78 144.40 100 +# es 22.31 23.57 28.23 24.31 25.96 53.68 100 +# it 22.02 23.74 30.82 24.32 26.97 158.30 100 +# fr 22.38 23.39 28.85 24.29 25.71 143.30 100 +# pt 22.14 23.44 27.47 24.17 25.12 56.93 100

Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png index c075bd5e..8d3785ba 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-8-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-8-1.png index 75763973..25e39db4 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-8-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-8-1.png differ diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-9-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-9-1.png index 8c2dc24d..f88a9ad1 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-9-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-9-1.png differ diff --git a/docs/articles/index.html b/docs/articles/index.html index be934c10..f87f8691 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -84,7 +84,7 @@ AMR (for R) - 0.9.0.9005 + 0.9.0.9006 diff --git a/docs/authors.html b/docs/authors.html index aa58853e..ddd67235 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -6,7 +6,7 @@ -Authors • AMR (for R) +Citation and Authors • AMR (for R) @@ -50,7 +50,7 @@ - + @@ -71,7 +71,7 @@ -
+