diff --git a/DESCRIPTION b/DESCRIPTION
index 8dbd4116..251378b0 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.5.0.9017
-Date: 2019-02-13
+Date: 2019-02-14
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/R/freq.R b/R/freq.R
index b21ba2ec..1f8f5aac 100755
--- a/R/freq.R
+++ b/R/freq.R
@@ -667,16 +667,17 @@ format_header <- function(x, markdown = FALSE, decimal.mark = ".", big.mark = ",
   header <- header[names(header) != "na_length"]
 
   # format all numeric values
-  header <- lapply(header, function(x)
-    if (is.numeric(x))
+  header <- lapply(header, function(x) {
+    if (is.numeric(x)) {
       if (any(x < 1000)) {
         format(round2(x, digits = digits), decimal.mark = decimal.mark, big.mark = big.mark)
       } else {
         format(x, digits = digits, decimal.mark = decimal.mark, big.mark = big.mark)
       }
-    else
+    } else {
       x
-  )
+    }
+  })
 
   # numeric values
   if (has_length == TRUE & any(x_class %in% c("double", "integer", "numeric", "raw", "single"))) {
diff --git a/data/microorganisms.codes.rda b/data/microorganisms.codes.rda
index 518404aa..a7f81ff2 100644
Binary files a/data/microorganisms.codes.rda and b/data/microorganisms.codes.rda differ
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 29aa831d..6bdc843a 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -185,7 +185,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

12 February 2019

+

14 February 2019

@@ -194,7 +194,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 12 February 2019.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 14 February 2019.

Introduction

@@ -210,21 +210,21 @@ -2019-02-12 +2019-02-14 abcd Escherichia coli S S -2019-02-12 +2019-02-14 abcd Escherichia coli S R -2019-02-12 +2019-02-14 efgh Escherichia coli R @@ -320,70 +320,70 @@ -2012-04-28 -X6 -Hospital B -Escherichia coli -R +2016-08-15 +E9 +Hospital C +Staphylococcus aureus S R S -F +S +M -2012-12-23 -Z1 -Hospital A -Escherichia coli -I +2013-02-21 +I8 +Hospital D +Streptococcus pneumoniae S +I R S -F +M -2011-05-27 -G10 +2015-05-12 +F4 Hospital B +Escherichia coli +S +S +S +S +M + + +2013-09-20 +M3 +Hospital A +Escherichia coli +S +I +S +S +M + + +2013-08-03 +V8 +Hospital C Streptococcus pneumoniae S S S S -M - - -2012-08-19 -Q1 -Hospital D -Escherichia coli -S -R -S -S -F - - -2016-05-06 -Y2 -Hospital D -Klebsiella pneumoniae -S -S -S -S F -2010-08-27 -I8 -Hospital A -Escherichia coli -R +2017-10-01 +V9 +Hospital C +Staphylococcus aureus +S +I S S -S -M +F @@ -398,14 +398,14 @@
#> Frequency table of `gender` from a data.frame (20,000 x 9) 
 #> 
 #> Class:   factor (numeric)
-#> Levels:  F, M
 #> Length:  20,000 (of which NA: 0 = 0.00%)
+#> Levels:  2: F, M
 #> Unique:  2
 #> 
 #>      Item     Count   Percent   Cum. Count   Cum. Percent
 #> ---  -----  -------  --------  -----------  -------------
-#> 1    M       10,303     51.5%       10,303          51.5%
-#> 2    F        9,697     48.5%       20,000         100.0%
+#> 1 M 10,444 52.2% 10,444 52.2% +#> 2 F 9,556 47.8% 20,000 100.0%

So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

data <- data %>%
@@ -436,10 +436,10 @@
 #> Kingella kingae (no changes)
 #> 
 #> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1278 changes)
+#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1293 changes)
 #> Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
 #> Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2750 changes)
+#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2812 changes)
 #> Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
 #> Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
 #> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -455,9 +455,9 @@
 #> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
 #> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
 #> 
-#> => EUCAST rules affected 7,442 out of 20,000 rows
+#> => EUCAST rules affected 7,447 out of 20,000 rows
 #>    -> added 0 test results
-#>    -> changed 4,028 test results (0 to S; 0 to I; 4,028 to R)
+#> -> changed 4,105 test results (0 to S; 0 to I; 4,105 to R)

@@ -482,8 +482,8 @@ #> NOTE: Using column `bacteria` as input for `col_mo`. #> NOTE: Using column `date` as input for `col_date`. #> NOTE: Using column `patient_id` as input for `col_patient_id`. -#> => Found 5,652 first isolates (28.3% of total)

-

So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+#> => Found 5,692 first isolates (28.5% of total) +

So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

data_1st <- data %>% 
   filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

@@ -509,32 +509,32 @@ 1 -2010-01-15 -V8 +2010-06-20 +Y4 B_ESCHR_COL S -R S +R S TRUE 2 -2010-02-08 -V8 +2010-07-31 +Y4 B_ESCHR_COL S S -R +S S FALSE 3 -2010-02-24 -V8 +2010-08-26 +Y4 B_ESCHR_COL -R +S S S S @@ -542,8 +542,8 @@ 4 -2010-04-28 -V8 +2010-12-11 +Y4 B_ESCHR_COL R S @@ -553,8 +553,8 @@ 5 -2010-05-13 -V8 +2010-12-30 +Y4 B_ESCHR_COL R S @@ -564,62 +564,62 @@ 6 -2010-05-14 -V8 +2011-04-02 +Y4 B_ESCHR_COL +R +I S -S -S -S +R FALSE 7 -2010-08-04 -V8 +2011-04-06 +Y4 B_ESCHR_COL +S +S +S R -S -S -S FALSE 8 -2010-09-12 -V8 +2011-04-07 +Y4 B_ESCHR_COL S S -R +S S FALSE 9 -2010-10-03 -V8 +2011-05-28 +Y4 B_ESCHR_COL R -I +S S S FALSE 10 -2011-01-09 -V8 +2011-09-09 +Y4 B_ESCHR_COL -R +S S R S -FALSE +TRUE -

Only 1 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

+

Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.

If a column exists with a name like ‘key(…)ab’, the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:

data <- data %>% 
   mutate(keyab = key_antibiotics(.)) %>% 
@@ -630,7 +630,7 @@
 #> NOTE: Using column `patient_id` as input for `col_patient_id`.
 #> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics  = FALSE to prevent this.
 #> [Criterion] Inclusion based on key antibiotics, ignoring I.
-#> => Found 15,783 first weighted isolates (78.9% of total)
+#> => Found 15,866 first weighted isolates (79.3% of total) @@ -647,32 +647,44 @@ - - + + - + - - + + - + - - + + + + + + + + + + + + + + @@ -681,22 +693,10 @@ - - - - - - - - - - - - - - + + @@ -707,47 +707,47 @@ - - + + + + - - - + - - + + + + + - - - - - + + - + - - + + - + @@ -755,23 +755,23 @@ - - + + - + - +
isolate
12010-01-15V82010-06-20Y4 B_ESCHR_COL SR SR S TRUE TRUE
22010-02-08V82010-07-31Y4 B_ESCHR_COL S SRS S FALSE TRUE
32010-02-24V82010-08-26Y4B_ESCHR_COLSSSSFALSEFALSE
42010-12-11Y4 B_ESCHR_COL R SFALSE TRUE
42010-04-28V8B_ESCHR_COLRSSSFALSEFALSE
52010-05-13V82010-12-30Y4 B_ESCHR_COL R S
62010-05-14V82011-04-02Y4 B_ESCHR_COLRI SSSSR FALSE TRUE
72010-08-04V82011-04-06Y4 B_ESCHR_COLSSS RSSS FALSE TRUE
82010-09-12V82011-04-07Y4 B_ESCHR_COL S SRS S FALSE TRUE
92010-10-03V82011-05-28Y4 B_ESCHR_COL RIS S S FALSE
102011-01-09V82011-09-09Y4 B_ESCHR_COLRS S R SFALSETRUE TRUE
-

Instead of 1, now 8 isolates are flagged. In total, 78.9% of all isolates are marked ‘first weighted’ - 50.7% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

data_1st <- data %>% 
   filter_first_weighted_isolate()
-

So we end up with 15,783 isolates for analysis.

+

So we end up with 15,866 isolates for analysis.

We can remove unneeded columns:

data_1st <- data_1st %>% 
   select(-c(first, keyab))
@@ -779,6 +779,7 @@
head(data_1st)
+ @@ -795,43 +796,30 @@ - - - - - + + + + + - - - - + + + + + - - - - - - - - - - - - - - - - - - + + + + - - + + @@ -839,45 +827,64 @@ - - - - + + + + + - - + + - - - - - + + + + + + + + + - - - - - + + + + + + + + + + + + + + + + + + - - - + + + + + + - - @@ -901,9 +908,9 @@
freq(paste(data_1st$genus, data_1st$species))

Or it can be used the dplyr way, which is easier to read:

data_1st %>% freq(genus, species)
-

Frequency table of genus and species from a data.frame (15,783 x 13)

+

Frequency table of genus and species from a data.frame (15,866 x 13)

Columns: 2
-Length: 15,783 (of which NA: 0 = 0.00%)
+Length: 15,866 (of which NA: 0 = 0.00%)
Unique: 4

Shortest: 16
Longest: 24

@@ -920,33 +927,33 @@ Longest: 24

- - - - + + + + - - - - + + + + - - - - + + + + - - - + + + @@ -957,7 +964,7 @@ Longest: 24

Resistance percentages

The functions portion_R, portion_IR, portion_I, portion_SI and portion_S can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:

data_1st %>% portion_IR(amox)
-#> [1] 0.4756383
+#> [1] 0.4766797
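
To make the idea of a "portion" concrete, here is a minimal base R sketch of how the share of intermediate or resistant results could be computed. The helper name `proportion_ir` and the sample vector are assumptions made for illustration; this is not the package's implementation, which may apply additional safeguards.

```r
# Hypothetical helper, not part of the AMR package: share of isolates that
# tested intermediate (I) or resistant (R) among all tested isolates.
proportion_ir <- function(x) {
  x <- x[!is.na(x)]                 # untested isolates do not count
  if (length(x) == 0) {
    return(NA_real_)                # nothing tested, nothing to report
  }
  sum(x %in% c("I", "R")) / length(x)
}

# Made-up results for eight isolates, one of them untested
amox_results <- c("S", "R", "I", "S", NA, "R", "S", "S")
proportion_ir(amox_results)
```

Dividing by the number of tested isolates rather than by all rows keeps missing results from diluting the percentage.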

Or can be used in conjunction with group_by() and summarise(), both from the dplyr package:

data_1st %>% 
   group_by(hospital) %>% 
@@ -970,19 +977,19 @@ Longest: 24

- + - + - + - +
date patient_id hospital
2012-04-28X6Hospital BB_ESCHR_COLR12016-08-15E9Hospital CB_STPHY_AUR S R SFGram negativeEscherichiacoliSMGram positiveStaphylococcusaureus TRUE
2012-12-23Z1Hospital AB_ESCHR_COLISRSFGram negativeEscherichiacoliTRUE
2011-05-27G10Hospital B22013-02-21I8Hospital D B_STRPTC_PNE SSSIR R M Gram positivepneumoniae TRUE
2012-08-19Q1Hospital D
32015-05-12F4Hospital B B_ESCHR_COL SR S SFSM Gram negative Escherichia coli TRUE
2016-05-06Y2Hospital DB_KLBSL_PNE
52013-08-03V8Hospital CB_STRPTC_PNESSS RSSS FGram negativeKlebsiellaGram positiveStreptococcus pneumoniae TRUE
62017-10-01V9Hospital CB_STPHY_AURSISSFGram positiveStaphylococcusaureusTRUE
2010-08-27I8Hospital A72014-06-20E9Hospital B B_ESCHR_COLSS R SSS M Gram negative Escherichia
1 Escherichia coli -7,825 -49.6% -7,825 -49.6% +7,860 +49.5% +7,860 +49.5%
2 Staphylococcus aureus -3,979 -25.2% -11,804 -74.8% +3,892 +24.5% +11,752 +74.1%
3 Streptococcus pneumoniae -2,449 -15.5% -14,253 -90.3% +2,556 +16.1% +14,308 +90.2%
4 Klebsiella pneumoniae -1,530 -9.7% -15,783 +1,558 +9.8% +15,866 100.0%
Hospital A -0.4697326 +0.4749633
Hospital B -0.4770958 +0.4879713
Hospital C -0.4799670 +0.4761905
Hospital D -0.4785082 +0.4600062
@@ -1000,23 +1007,23 @@ Longest: 24

Hospital A -0.4697326 -4675 +0.4749633 +4773
Hospital B -0.4770958 -5523 +0.4879713 +5570
Hospital C -0.4799670 -2421 +0.4761905 +2310
Hospital D -0.4785082 -3164 +0.4600062 +3213
@@ -1036,27 +1043,27 @@ Longest: 24

Escherichia -0.7345687 -0.9051757 -0.9773802 +0.7249364 +0.8996183 +0.9725191
Klebsiella -0.7307190 -0.9071895 -0.9771242 +0.7291399 +0.9037227 +0.9762516
Staphylococcus -0.7353606 -0.9210857 -0.9816537 +0.7410072 +0.9229188 +0.9786742
Streptococcus -0.7268273 +0.7292645 0.0000000 -0.7268273 +0.7292645
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index f78b6151..b8ebcb6c 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 9b94b88c..33c83dcc 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index 430bdd63..9bfe6803 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index 13afa548..70b13bd3 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index bc96c8ae..84e39b40 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -185,7 +185,7 @@

How to apply EUCAST rules

Matthijs S. Berends

-

12 February 2019

+

14 February 2019

diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html index 7e04b8c9..439fdc60 100644 --- a/docs/articles/G_test.html +++ b/docs/articles/G_test.html @@ -70,7 +70,7 @@
  • - + Predict antimicrobial resistance @@ -91,14 +91,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic @@ -185,7 +185,7 @@

    How to use the G-test

    Matthijs S. Berends

    -

    09 February 2019

    +

    14 February 2019

    diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html index 73d65e61..7e0c1b40 100644 --- a/docs/articles/WHONET.html +++ b/docs/articles/WHONET.html @@ -70,7 +70,7 @@
  • - + Predict antimicrobial resistance @@ -91,14 +91,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic @@ -185,7 +185,7 @@

    How to work with WHONET data

    Matthijs S. Berends

    -

    09 February 2019

    +

    14 February 2019

    @@ -199,33 +199,33 @@
    Import of data

    This tutorial assumes you already imported the WHONET data with e.g. the readxl package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.

    An example syntax could look like this:

    -
    library(readxl)
    -data <- read_excel(path = "path/to/your/file.xlsx")
    +
    library(readxl)
    +data <- read_excel(path = "path/to/your/file.xlsx")

    This package comes with an example data set WHONET. We will use it for this analysis.

    Preparation

    First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.

    -
    library(dplyr)   # part of tidyverse
    -library(ggplot2) # part of tidyverse
    -library(AMR)     # this package
    +
    library(dplyr)   # part of tidyverse
    +library(ggplot2) # part of tidyverse
    +library(AMR)     # this package

    We will have to transform some variables to simplify and automate the analysis:

    • Microorganisms should be transformed to our own microorganism IDs (called an mo) using the ITIS reference data set, which contains all ~20,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo(). This function also recognises almost all WHONET abbreviations of microorganisms.
    • Antimicrobial results or interpretations have to be clean and valid. In other words, they should only contain values "S", "I" or "R". That is exactly what the as.rsi() function is for.
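
As a rough illustration of what "clean and valid" means for the second point, the base R sketch below normalises raw text to an ordered factor with the levels S < I < R. The helper name `clean_sir` and the sample input are assumptions; as.rsi() itself may recognise more input variants than this sketch does.

```r
# Hypothetical helper for illustration only; not the package's as.rsi().
clean_sir <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x[!x %in% c("S", "I", "R")] <- NA      # anything else becomes missing
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

# Made-up raw values with mixed case, stray spaces and invalid entries
raw <- c("s", " R", "I", "?", "", NA)
cleaned <- clean_sir(raw)
table(cleaned, useNA = "ifany")
```

Turning invalid entries into NA, instead of silently dropping them, keeps the vector the same length as the input so it can be stored back into the data set.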
    - +

    No errors or warnings, so all values are transformed successfully. Let’s check it though, with a couple of frequency tables:

    - -

    Frequency table of mo from a data.frame (500 x 54)
    -Class: mo (character)
    +

    +

    Frequency table of mo from a data.frame (500 x 54)

    +

    Class: mo (character)
    Length: 500 (of which NA: 0 = 0.00%)
    Unique: 39

    Families: 9
    @@ -324,16 +324,16 @@ Species: 36

    (omitted 29 entries, n = 57 [11.4%])

    - -

    Frequency table of AMC_ND2 from a data.frame (500 x 54)
    -Class: factor > ordered > rsi (numeric)
    -Levels: S < I < R
    +

    +

    Frequency table of AMC_ND2 from a data.frame (500 x 54)

    +

    Class: factor > ordered > rsi (numeric)
    Length: 500 (of which NA: 19 = 3.80%)
    +Levels: 3: S < I < R
    Unique: 3

    -

    %IR: 25.99% (ratio S : IR = 1.0 : 0.4)

    +

    %IR: 25.00% (ratio 2.8:1)

    diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html index 4c63ca08..6fb5fb0a 100644 --- a/docs/articles/atc_property.html +++ b/docs/articles/atc_property.html @@ -185,7 +185,7 @@

    How to get properties of an antibiotic

    Matthijs S. Berends

    -

    12 February 2019

    +

    14 February 2019

    diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 8d613930..c366169e 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -70,7 +70,7 @@
  • - + Predict antimicrobial resistance @@ -91,14 +91,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic @@ -185,7 +185,7 @@

    Benchmarks

    Matthijs S. Berends

    -

    09 February 2019

    +

    14 February 2019

    @@ -195,150 +195,160 @@

    One of the most important features of this package is the complete microbial taxonomic database, supplied by ITIS (https://www.itis.gov). We created a function as.mo() that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) and based on the taxonomic tree of ITIS.

    -

    Using the microbenchmark package, we can review the calculation performance of this function.

    -
    library(microbenchmark)
    -library(AMR)
    +

    Using the microbenchmark package, we can review the calculation performance of this function. Its function microbenchmark() evaluates the different input expressions independently of each other and runs every expression 100 times.

    +
    library(microbenchmark)
    +library(AMR)

    In the next test, we try to ‘coerce’ different input values for Staphylococcus aureus. The actual result is the same every time: it returns its MO code B_STPHY_AUR (B stands for Bacteria, the taxonomic kingdom).

    But the calculation time differs a lot. Here, the AI effect can be reviewed best:

    - -

    In the table above, all measurements are shown in milliseconds (thousandths of a second), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 6.9 milliseconds means it will determine roughly 144 input values per second. In case of 39.2 milliseconds, this is only 26 input values per second. The more an input value resembles a full name (like C, D and F), the faster the result will be found. In case of G, the input is already a valid MO code, so it takes almost no time at all (0.0001 seconds on our server).

    + +

    +

    In the table above, all measurements are shown in milliseconds (thousandths of a second), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 8 milliseconds means it can determine 125 input values per second. In case of 40 milliseconds, this is only 25 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR"), the input is already a valid MO code, so it takes almost no time at all (0.0002 seconds on our server).
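
The conversion from a duration per call to a throughput is plain arithmetic: one second is 1000 milliseconds, so the number of values per second is 1000 divided by the time per value. A small sketch, with `per_second` as a hypothetical helper name:

```r
# Convert a per-call duration in milliseconds to calls per second.
per_second <- function(ms) {
  1000 / ms
}

# The two rounded timings quoted above
per_second(c(8, 40))
```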

    To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far more slowly. See this example for the ID of Burkholderia nodosa (B_BRKHL_NOD):

    - -

    That takes up to 11 times as much time! A value of 158.4 milliseconds means it can only determine ~6 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance.

    + +

    +

    That takes up to 8 times as much time! A value of 145 milliseconds means it can only determine ~7 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance.

    To mitigate this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.

    Repetitive results

    Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) and uses as.mo() internally.

    - +

    So transforming 500,000 values (!) of 96 unique values only takes 0.12 seconds (120 ms). You only lose time on your unique input values.
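
The general technique behind this speed-up can be sketched in base R: run the expensive step once per unique value, then map the results back onto the full vector with match(). The function names `slow_lookup` and `lookup_unique_once` are hypothetical stand-ins, and this is only an assumption about the approach, not the internals of as.mo().

```r
# Hypothetical stand-in for an expensive per-value lookup.
slow_lookup <- function(x) {
  toupper(x)
}

# Evaluate `fun` on the unique values only, then expand the results.
lookup_unique_once <- function(x, fun) {
  ux <- unique(x)
  results <- fun(ux)          # one call over the unique values
  results[match(x, ux)]       # back to the original length and order
}

input <- sample(c("e. coli", "s. aureus", "k. pneumoniae"), 500000, replace = TRUE)
output <- lookup_unique_once(input, slow_lookup)
length(output)
```

With this pattern the cost grows with the number of distinct inputs rather than with the length of the vector, which matches the observation that time is only lost on unique input values.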

    Results of a tenfold - 5,000,000 values:

    - -

    Even the full names of 5 million values are calculated within a second.

    + +

    Even determining the full names of 5 million values is done within a second.

    Precalculated results

    What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname(), it almost doesn’t take any time at all (see ‘C’ below):

    - +

    So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0001 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

    - +

    Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known microorganisms (according to ITIS), it can just return the initial value immediately.

    Results in other languages

    When the system language is non-English and supported by this AMR package, some functions take a little while longer:

    - +

    Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

    diff --git a/docs/articles/freq.html b/docs/articles/freq.html index 45f79d3f..ca9f0361 100644 --- a/docs/articles/freq.html +++ b/docs/articles/freq.html @@ -70,7 +70,7 @@
  • - + Predict antimicrobial resistance @@ -91,14 +91,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic @@ -185,7 +185,7 @@

    How to create frequency tables

    Matthijs S. Berends

    -

    09 February 2019

    +

    14 February 2019

    @@ -203,9 +203,9 @@

    Frequencies of one variable

    To only show and quickly review the content of one variable, you can just select this variable in various ways. Let’s say we want to get the frequencies of the gender variable of the septic_patients dataset:

    - -

    Frequency table of gender from a data.frame (2,000 x 49)
    -Class: character (character)
    +

    +

    Frequency table of gender from a data.frame (2,000 x 49)

    +

    Class: character (character)
    Length: 2,000 (of which NA: 0 = 0.00%)
    Unique: 2

    Shortest: 1
    @@ -245,23 +245,23 @@ Longest: 1

    Frequencies of more than one variable

    Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.

    For illustration, we could add some more variables to the septic_patients dataset to learn about bacterial properties:

    - +

    Now all variables of the microorganisms dataset have been joined to the septic_patients dataset. The microorganisms dataset consists of the following variables:

    - +

    If we compare the dimensions between the old and new dataset, we can see that these 14 variables were added:

    -
    dim(septic_patients)
    -# [1] 2000   49
    -dim(my_patients)
    -# [1] 2000   63
    +
    dim(septic_patients)
    +# [1] 2000   49
    +dim(my_patients)
    +# [1] 2000   63

    So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

    -
    my_patients %>%
    -  freq(genus, species, nmax = 15)
    -

    Frequency table of genus and species from a data.frame (2,000 x 63)
    -Columns: 2
    +

    my_patients %>%
    +  freq(genus, species, nmax = 15)
    +

    Frequency table of genus and species from a data.frame (2,000 x 63)

    +

    Columns: 2
    Length: 2,000 (of which NA: 0 = 0.00%)
    Unique: 96

    Shortest: 12
    @@ -405,12 +405,12 @@ Longest: 34

    Frequencies of numeric values

    Frequency tables can be created of any input.

    In case of numeric values (like integers, doubles, etc.) additional descriptive statistics will be calculated and shown in the header:

    - -

    Frequency table of age from a data.frame (981 x 49)
    -Class: numeric (numeric)
    +

    +

    Frequency table of age from a data.frame (981 x 49)

    +

    Class: numeric (numeric)
    Length: 981 (of which NA: 0 = 0.00%)
    Unique: 73

    Mean: 71.08
    @@ -486,12 +486,12 @@ Outliers: 15 (unique count: 12)

    Frequencies of factors

    To sort frequencies of factors on factor level instead of item count, use the sort.count parameter.

    sort.count is TRUE by default. Compare this default behaviour…

    - -

    Frequency table of hospital_id from a data.frame (2,000 x 49)
    -Class: factor (numeric)
    -Levels: A, B, C, D
    +

    +

    Frequency table of hospital_id from a data.frame (2,000 x 49)

    +

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    +Levels: 4: A, B, C, D
    Unique: 4

  • @@ -538,12 +538,12 @@ Unique: 4

    … with this, where items are now sorted on factor level:

    -
    septic_patients %>%
    -  freq(hospital_id, sort.count = FALSE)
    -

    Frequency table of hospital_id from a data.frame (2,000 x 49)
    -Class: factor (numeric)
    -Levels: A, B, C, D
    +

    septic_patients %>%
    +  freq(hospital_id, sort.count = FALSE)
    +

    Frequency table of hospital_id from a data.frame (2,000 x 49)

    +

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    +Levels: 4: A, B, C, D
    Unique: 4

    @@ -590,14 +590,14 @@ Unique: 4

    All classes will be printed in the header (the default is FALSE when using markdown, as in this document). Variables with the new rsi class of this AMR package are actually ordered factors and have three classes (look at Class in the header):

    -
    septic_patients %>%
    -  freq(amox, header = TRUE)
    -

    Frequency table of amox from a data.frame (2,000 x 49)
    -Class: factor > ordered > rsi (numeric)
    -Levels: S < I < R
    +

    septic_patients %>%
    +  freq(amox, header = TRUE)
    +

    Frequency table of amox from a data.frame (2,000 x 49)

    +

    Class: factor > ordered > rsi (numeric)
    Length: 2,000 (of which NA: 771 = 38.55%)
    +Levels: 3: S < I < R
    Unique: 3

    -

    %IR: 55.82% (ratio S : IR = 1.0 : 1.3)

    +

    %IR: 34.30% (ratio 1:1.3)

    @@ -639,10 +639,10 @@ Unique: 3

    Frequencies of dates

    Frequencies of dates will show the oldest and newest date in the data, and the number of days between them:

    -
    septic_patients %>%
    -  freq(date, nmax = 5, header = TRUE)
    -

    Frequency table of date from a data.frame (2,000 x 49)
    -Class: Date (numeric)
    +

    septic_patients %>%
    +  freq(date, nmax = 5, header = TRUE)
    +

    Frequency table of date from a data.frame (2,000 x 49)

    +

    Class: Date (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    Unique: 1,140

    Oldest: 2 January 2002
    @@ -706,11 +706,11 @@ Median: 31 July 2009 (47.39%)

    Assigning a frequency table to an object

    A frequency table is actually a regular data.frame, with the exception that it contains an additional class.

    - +

    [1] “frequency_tbl” “data.frame”

    Because of this additional class, a frequency table prints like the examples above. But the object itself contains the complete table without a row limitation:

    -
    dim(my_df)
    +
    dim(my_df)

    [1] 74 5
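
How an extra class can change printing while leaving the data untouched can be sketched with base R's S3 system. The class name `my_freq_tbl` and its print method are hypothetical and written for this illustration; the package's own class is frequency_tbl and its print method is more elaborate.

```r
# Build a small count table as an ordinary data.frame (made-up data)
tbl <- as.data.frame(table(item = c("A", "B", "A", "C", "A", "B")),
                     responseName = "count", stringsAsFactors = FALSE)

# Prepend a hypothetical class; the object remains a data.frame
class(tbl) <- c("my_freq_tbl", class(tbl))

# Custom print method: show only the first n rows, most frequent first
print.my_freq_tbl <- function(x, n = 2, ...) {
  plain <- x
  class(plain) <- "data.frame"            # drop the extra class for printing
  plain <- plain[order(-plain$count), ]
  print(head(plain, n), row.names = FALSE)
  invisible(x)
}

print(tbl)                    # limited view through the custom method
dim(tbl)                      # the object still holds every row
inherits(tbl, "data.frame")   # and still behaves like a data.frame
```

Because the row limit lives in the print method only, functions such as dim() or any data.frame tooling see the complete table.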

    @@ -721,14 +721,14 @@ Median: 31 July 2009 (47.39%)

    Parameter na.rm

    With the na.rm parameter you can include NA values in the frequency table. It defaults to TRUE, which leaves them out of the table, although the number of NA values is always shown in the header:

    -
    septic_patients %>%
    -  freq(amox, na.rm = FALSE)
    -

    Frequency table of amox from a data.frame (2,000 x 49)
    -Class: factor > ordered > rsi (numeric)
    -Levels: S < I < R
    -Length: 2,771 (of which NA: 771 = 27.82%)
    +

    septic_patients %>%
    +  freq(amox, na.rm = FALSE)
    +

    Frequency table of amox from a data.frame (2,000 x 49)

    +

    Class: factor > ordered > rsi (numeric)
    +Length: 2,000 (of which NA: 771 = 38.55%)
    +Levels: 3: S < I < R
    Unique: 4

    -

    %IR: 34.30% (ratio S : IR = 1.0 : 1.3)

    +

    %IR: 34.30% (ratio 1:1.3)

    @@ -779,12 +779,12 @@ Unique: 4

    Parameter row.names

    The default frequency table shows row indices. To remove them, use row.names = FALSE:

    -
    septic_patients %>%
    -  freq(hospital_id, row.names = FALSE)
    -

    Frequency table of hospital_id from a data.frame (2,000 x 49)
    -Class: factor (numeric)
    -Levels: A, B, C, D
    +

    septic_patients %>%
    +  freq(hospital_id, row.names = FALSE)
    +

    Frequency table of hospital_id from a data.frame (2,000 x 49)

    +

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    +Levels: 4: A, B, C, D
    Unique: 4

    @@ -831,12 +831,12 @@ Unique: 4

    Parameter markdown

    The markdown parameter is TRUE by default in non-interactive sessions, like in reports created with R Markdown. This will always print all rows, unless nmax is set.

    -
    septic_patients %>%
    -  freq(hospital_id, markdown = TRUE)
    -

    Frequency table of hospital_id from a data.frame (2,000 x 49)
    -Class: factor (numeric)
    -Levels: A, B, C, D
    +

    septic_patients %>%
    +  freq(hospital_id, markdown = TRUE)
    +

    Frequency table of hospital_id from a data.frame (2,000 x 49)

    +

    Class: factor (numeric)
    Length: 2,000 (of which NA: 0 = 0.00%)
    +Levels: 4: A, B, C, D
    Unique: 4

    diff --git a/docs/articles/mo_property.html b/docs/articles/mo_property.html index 812e42df..3a1da9ed 100644 --- a/docs/articles/mo_property.html +++ b/docs/articles/mo_property.html @@ -185,7 +185,7 @@

    How to get properties of a microorganism

    Matthijs S. Berends

    -

    12 February 2019

    +

    14 February 2019

    diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html index b547d194..531e53c3 100644 --- a/docs/articles/resistance_predict.html +++ b/docs/articles/resistance_predict.html @@ -185,7 +185,7 @@

    How to predict antimicrobial resistance

    Matthijs S. Berends

    -

    12 February 2019

    +

    14 February 2019

    diff --git a/docs/reference/AMR-deprecated.html b/docs/reference/AMR-deprecated.html index fe2e50a2..f86c8d56 100644 --- a/docs/reference/AMR-deprecated.html +++ b/docs/reference/AMR-deprecated.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/AMR.html b/docs/reference/AMR.html index cdb35b88..8a29108d 100644 --- a/docs/reference/AMR.html +++ b/docs/reference/AMR.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/ITIS.html b/docs/reference/ITIS.html index e3d0f43a..12c02903 100644 --- a/docs/reference/ITIS.html +++ b/docs/reference/ITIS.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/WHOCC.html b/docs/reference/WHOCC.html index bc9fad61..42a5e086 100644 --- a/docs/reference/WHOCC.html +++ b/docs/reference/WHOCC.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/WHONET.html b/docs/reference/WHONET.html index ba9e5742..282ca212 100644 --- a/docs/reference/WHONET.html +++ b/docs/reference/WHONET.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/abname.html b/docs/reference/abname.html index ae38727e..2d5c5f6f 100644 --- a/docs/reference/abname.html +++ b/docs/reference/abname.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/age.html b/docs/reference/age.html index e0a379ca..9719d625 100644 --- a/docs/reference/age.html +++ b/docs/reference/age.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/age_groups.html b/docs/reference/age_groups.html index be948266..2cb26f00 100644 --- a/docs/reference/age_groups.html +++ b/docs/reference/age_groups.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/antibiotics.html b/docs/reference/antibiotics.html index 124556bb..9514f821 100644 --- a/docs/reference/antibiotics.html +++ b/docs/reference/antibiotics.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/as.atc.html b/docs/reference/as.atc.html index 0cf85eb3..23e15ff3 100644 --- a/docs/reference/as.atc.html +++ b/docs/reference/as.atc.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/as.mic.html b/docs/reference/as.mic.html index 5a92422c..fd4b3a76 100644 --- a/docs/reference/as.mic.html +++ b/docs/reference/as.mic.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/as.mo.html b/docs/reference/as.mo.html index 59158283..f5160d59 100644 --- a/docs/reference/as.mo.html +++ b/docs/reference/as.mo.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/as.rsi.html b/docs/reference/as.rsi.html index 9d17eda5..eaa246dc 100644 --- a/docs/reference/as.rsi.html +++ b/docs/reference/as.rsi.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/atc_online.html b/docs/reference/atc_online.html index d157add9..5307b3c6 100644 --- a/docs/reference/atc_online.html +++ b/docs/reference/atc_online.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/atc_property.html b/docs/reference/atc_property.html index af4bf8f1..8bc08031 100644 --- a/docs/reference/atc_property.html +++ b/docs/reference/atc_property.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/availability.html b/docs/reference/availability.html index f05cc130..762976e8 100644 --- a/docs/reference/availability.html +++ b/docs/reference/availability.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/count.html b/docs/reference/count.html index 62dccb13..566f7d3f 100644 --- a/docs/reference/count.html +++ b/docs/reference/count.html @@ -111,7 +111,7 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_
  • - + Predict antimicrobial resistance @@ -132,14 +132,14 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/eucast_rules.html b/docs/reference/eucast_rules.html index 63ca46f3..db29db09 100644 --- a/docs/reference/eucast_rules.html +++ b/docs/reference/eucast_rules.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/figures/benchmark_1.png b/docs/reference/figures/benchmark_1.png new file mode 100644 index 00000000..10613186 Binary files /dev/null and b/docs/reference/figures/benchmark_1.png differ diff --git a/docs/reference/figures/benchmark_2.png b/docs/reference/figures/benchmark_2.png new file mode 100644 index 00000000..a28381c1 Binary files /dev/null and b/docs/reference/figures/benchmark_2.png differ diff --git a/docs/reference/first_isolate.html b/docs/reference/first_isolate.html index 2fa42e85..6a933c76 100644 --- a/docs/reference/first_isolate.html +++ b/docs/reference/first_isolate.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/freq.html b/docs/reference/freq.html index c3fba94b..984ed3dd 100644 --- a/docs/reference/freq.html +++ b/docs/reference/freq.html @@ -111,7 +111,7 @@ top_freq can be used to get the top/bottom n items of a frequency table, with co
  • - + Predict antimicrobial resistance @@ -132,14 +132,14 @@ top_freq can be used to get the top/bottom n items of a frequency table, with co
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/g.test.html b/docs/reference/g.test.html index 5ec01421..2a424420 100644 --- a/docs/reference/g.test.html +++ b/docs/reference/g.test.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/get_locale.html b/docs/reference/get_locale.html index 00275c56..c1da1cbd 100644 --- a/docs/reference/get_locale.html +++ b/docs/reference/get_locale.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/ggplot_rsi.html b/docs/reference/ggplot_rsi.html index cf5b7f0d..2aaf9650 100644 --- a/docs/reference/ggplot_rsi.html +++ b/docs/reference/ggplot_rsi.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/guess_ab_col.html b/docs/reference/guess_ab_col.html index c39fc7ff..2b699ed8 100644 --- a/docs/reference/guess_ab_col.html +++ b/docs/reference/guess_ab_col.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/join.html b/docs/reference/join.html index 05b5404e..5624fb43 100644 --- a/docs/reference/join.html +++ b/docs/reference/join.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/key_antibiotics.html b/docs/reference/key_antibiotics.html index 5867bb58..72d175a6 100644 --- a/docs/reference/key_antibiotics.html +++ b/docs/reference/key_antibiotics.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/kurtosis.html b/docs/reference/kurtosis.html index 1b52aa61..2fa24e13 100644 --- a/docs/reference/kurtosis.html +++ b/docs/reference/kurtosis.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/like.html b/docs/reference/like.html index c6a2892a..4d035350 100644 --- a/docs/reference/like.html +++ b/docs/reference/like.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/mdro.html b/docs/reference/mdro.html index b57fb00f..603a767d 100644 --- a/docs/reference/mdro.html +++ b/docs/reference/mdro.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/microorganisms.codes.html b/docs/reference/microorganisms.codes.html index bf51b53a..4633a9d0 100644 --- a/docs/reference/microorganisms.codes.html +++ b/docs/reference/microorganisms.codes.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/microorganisms.html b/docs/reference/microorganisms.html index 60c349b7..90f7c5c0 100644 --- a/docs/reference/microorganisms.html +++ b/docs/reference/microorganisms.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/microorganisms.old.html b/docs/reference/microorganisms.old.html index 03e7f329..3e976e17 100644 --- a/docs/reference/microorganisms.old.html +++ b/docs/reference/microorganisms.old.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/mo_property.html b/docs/reference/mo_property.html index 69732bef..2098a29f 100644 --- a/docs/reference/mo_property.html +++ b/docs/reference/mo_property.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/mo_source.html b/docs/reference/mo_source.html index 23eb4857..7250c4f7 100644 --- a/docs/reference/mo_source.html +++ b/docs/reference/mo_source.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/p.symbol.html b/docs/reference/p.symbol.html index 3217086b..7f15d201 100644 --- a/docs/reference/p.symbol.html +++ b/docs/reference/p.symbol.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/portion.html b/docs/reference/portion.html index c91a1b0c..6b9d0cbe 100644 --- a/docs/reference/portion.html +++ b/docs/reference/portion.html @@ -111,7 +111,7 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port
  • - + Predict antimicrobial resistance @@ -132,14 +132,14 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/read.4D.html b/docs/reference/read.4D.html index 5fd56a25..6d6c0eb6 100644 --- a/docs/reference/read.4D.html +++ b/docs/reference/read.4D.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/resistance_predict.html b/docs/reference/resistance_predict.html index 3a4efff0..3745dd79 100644 --- a/docs/reference/resistance_predict.html +++ b/docs/reference/resistance_predict.html @@ -359,7 +359,7 @@ On our website https://msberends.gitla library(dplyr) x <- septic_patients %>% filter_first_isolate() %>% - filter(mo_genus(mo) == "Staphylococcus") %>% + filter(mo_genus(mo) == "Staphylococcus") %>% resistance_predict("peni") plot(x) @@ -373,27 +373,27 @@ On our website https://msberends.gitla if (!require(ggplot2)) { data <- septic_patients %>% - filter(mo == as.mo("E. coli")) %>% + filter(mo == as.mo("E. coli")) %>% resistance_predict(col_ab = "amox", col_date = "date", info = FALSE, minimum = 15) - ggplot(data, - aes(x = year)) + - geom_col(aes(y = value), + ggplot(data, + aes(x = year)) + + geom_col(aes(y = value), fill = "grey75") + - geom_errorbar(aes(ymin = se_min, + geom_errorbar(aes(ymin = se_min, ymax = se_max), colour = "grey50") + - scale_y_continuous(limits = c(0, 1), + scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1), labels = paste0(seq(0, 100, 10), "%")) + - labs(title = expression(paste("Forecast of amoxicillin resistance in ", + labs(title = expression(paste("Forecast of amoxicillin resistance in ", italic("E. coli"))), y = "%IR", x = "Year") + - theme_minimal(base_size = 13) + theme_minimal(base_size = 13) } # } diff --git a/docs/reference/rsi.html b/docs/reference/rsi.html index ba849e6d..f5503c0f 100644 --- a/docs/reference/rsi.html +++ b/docs/reference/rsi.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/septic_patients.html b/docs/reference/septic_patients.html index 8a75c3d2..fc4dbbd5 100644 --- a/docs/reference/septic_patients.html +++ b/docs/reference/septic_patients.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/skewness.html b/docs/reference/skewness.html index cd672f49..1b019af9 100644 --- a/docs/reference/skewness.html +++ b/docs/reference/skewness.html @@ -111,7 +111,7 @@ When negative: the left tail is longer; the mass of the distribution is concentr
  • - + Predict antimicrobial resistance @@ -132,14 +132,14 @@ When negative: the left tail is longer; the mass of the distribution is concentr
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/docs/reference/supplementary_data.html b/docs/reference/supplementary_data.html index d48f3a70..9745acc7 100644 --- a/docs/reference/supplementary_data.html +++ b/docs/reference/supplementary_data.html @@ -110,7 +110,7 @@
  • - + Predict antimicrobial resistance @@ -131,14 +131,14 @@
  • - + Get properties of a microorganism
  • - + Get properties of an antibiotic diff --git a/man/figures/benchmark_1.png b/man/figures/benchmark_1.png new file mode 100644 index 00000000..10613186 Binary files /dev/null and b/man/figures/benchmark_1.png differ diff --git a/man/figures/benchmark_2.png b/man/figures/benchmark_2.png new file mode 100644 index 00000000..a28381c1 Binary files /dev/null and b/man/figures/benchmark_2.png differ diff --git a/vignettes/benchmarks.Rmd b/vignettes/benchmarks.Rmd index cc408fb0..7c5d103a 100755 --- a/vignettes/benchmarks.Rmd +++ b/vignettes/benchmarks.Rmd @@ -25,9 +25,9 @@ knitr::opts_chunk$set( One of the most important features of this package is the complete microbial taxonomic database, supplied by ITIS (https://www.itis.gov). We created a function `as.mo()` that transforms any user input value to a valid microbial ID by using AI (Artificial Intelligence) and based on the taxonomic tree of ITIS. -Using the `microbenchmark` package, we can review the calculation performance of this function. +Using the `microbenchmark` package, we can review the calculation performance of this function. Its function `microbenchmark()` calculates different input expressions independently of each others and runs every expression 100 times. -```r +```{r, eval = FALSE} library(microbenchmark) library(AMR) ``` @@ -36,53 +36,65 @@ In the next test, we try to 'coerce' different input values for *Staphylococcus But the calculation time differs a lot. Here, the AI effect can be reviewed best: -```r -microbenchmark(A = as.mo("stau"), - B = as.mo("staaur"), - C = as.mo("S. aureus"), - D = as.mo("S. aureus"), - E = as.mo("STAAUR"), - F = as.mo("Staphylococcus aureus"), - G = as.mo("B_STPHY_AUR"), - times = 10, - unit = "ms") +```{r, eval = FALSE} +benchmark <- microbenchmark(as.mo("sau"), + as.mo("stau"), + as.mo("staaur"), + as.mo("S. aureus"), + as.mo("S. 
aureus"), + as.mo("STAAUR"), + as.mo("Staphylococcus aureus"), + as.mo("B_STPHY_AUR")) +print(benchmark, unit = "ms") # Unit: milliseconds -# expr min lq mean median uq max neval -# A 34.745551 34.798630 35.2596102 34.8994810 35.258325 38.067062 10 -# B 7.095386 7.125348 7.2219948 7.1613865 7.240377 7.495857 10 -# C 11.677114 11.733826 11.8304789 11.7715050 11.843756 12.317559 10 -# D 11.694435 11.730054 11.9859313 11.8775585 12.206371 12.750016 10 -# E 7.044402 7.117387 7.2271630 7.1923610 7.246104 7.742396 10 -# F 6.642326 6.778446 6.8988042 6.8753165 6.923577 7.513945 10 -# G 0.106788 0.131023 0.1351229 0.1357725 0.144014 0.146458 10 +# expr min lq mean median uq max neval +# as.mo("sau") 18.983141 19.121148 19.9676944 19.1967505 19.2871260 38.635012 100 +# as.mo("stau") 37.503863 37.692049 38.9856547 37.8244335 37.9851040 57.576107 100 +# as.mo("staaur") 18.945427 19.122579 19.6392560 19.2241285 19.3536140 38.687672 100 +# as.mo("S. aureus") 15.305229 15.471103 16.3477096 15.5545630 15.6689280 36.363005 100 +# as.mo("S. aureus") 15.308232 15.469881 16.5269706 15.5506870 15.6277560 42.155292 100 +# as.mo("STAAUR") 18.984049 19.117166 19.6104597 19.2219285 19.3161095 38.638783 100 +# as.mo("Staphylococcus aureus") 8.103546 8.198285 8.6422018 8.2636915 8.3200535 27.002527 100 +# as.mo("B_STPHY_AUR") 0.156236 0.196779 0.2017926 0.2035535 0.2115505 0.241861 100 + +par(mar = c(5, 15, 4, 2)) # set more space for left margin text (15) +boxplot(benchmark, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, 200), + main = expression(paste("Benchmark of ", italic("Staphylococcus aureus")))) ``` -In the table above, all measurements are shown in milliseconds (thousands of seconds), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 6.9 milliseconds means it will roughly determine 144 input values per second. It case of 39.2 milliseconds, this is only 26 input values per second. 
The more an input value resembles a full name (like C, D and F), the faster the result will be found. In case of G, the input is already a valid MO code, so it only almost takes no time at all (0.0001 seconds on our server). +![](../reference/figures/benchmark_1.png) + +In the table above, all measurements are shown in milliseconds (thousands of seconds), tested on a quite regular Linux server from 2007 (Core 2 Duo 2.7 GHz, 2 GB DDR2 RAM). A value of 8 milliseconds means it can determine 125 input values per second. It case of 40 milliseconds, this is only 25 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of `as.mo("B_STPHY_AUR")`, the input is already a valid MO code, so it only almost takes no time at all (0.0002 seconds on our server). To achieve this speed, the `as.mo` function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined far less faster. See this example for the ID of *Burkholderia nodosa* (`B_BRKHL_NOD`): -```r -microbenchmark(A = as.mo("buno"), - B = as.mo("burnod"), - C = as.mo("B. nodosa"), - D = as.mo("B. nodosa"), - E = as.mo("BURNOD"), - F = as.mo("Burkholderia nodosa"), - G = as.mo("B_BRKHL_NOD"), - times = 10, - unit = "ms") +```{r, eval = FALSE} +benchmark <- microbenchmark(as.mo("buno"), + as.mo("burnod"), + as.mo("B. nodosa"), + as.mo("B. 
nodosa"), + as.mo("BURNOD"), + as.mo("Burkholderia nodosa"), + as.mo("B_BRKHL_NOD")) +print(benchmark, unit = "ms") + # Unit: milliseconds -# expr min lq mean median uq max neval -# A 124.175427 124.474837 125.8610536 125.3750560 126.160945 131.485994 10 -# B 154.249713 155.364729 160.9077032 156.8738940 157.136183 197.315105 10 -# C 66.066571 66.162393 66.5538611 66.4488130 66.698077 67.623404 10 -# D 86.747693 86.918665 90.7831016 87.8149725 89.440982 116.767991 10 -# E 154.863827 155.208563 162.6535954 158.4062465 168.593785 187.378088 10 -# F 32.427028 32.638648 32.9929454 32.7860475 32.992813 34.674241 10 -# G 0.213155 0.216578 0.2369226 0.2338985 0.253734 0.285581 10 +# expr min lq mean median uq max neval +# as.mo("buno") 125.141333 125.8553210 129.5727691 126.3899910 127.0954925 194.51985 100 +# as.mo("burnod") 142.300359 144.1611750 147.0642288 144.6074960 145.5243025 176.91649 100 +# as.mo("B. nodosa") 81.530132 81.9360840 83.3915418 82.1852770 82.6848870 102.63184 100 +# as.mo("B. nodosa") 81.109547 81.9836805 84.7595894 82.3437825 82.8282705 110.67036 100 +# as.mo("BURNOD") 143.163527 143.9134485 148.7192688 144.5582580 145.7489115 314.92070 100 +# as.mo("Burkholderia nodosa") 36.226325 36.5499000 37.1309929 36.6581540 36.7551985 56.25597 100 +# as.mo("B_BRKHL_NOD") 0.172509 0.3038455 0.4806591 0.3078265 0.3121215 19.16173 100 + +boxplot(benchmark, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, 200), + main = expression(paste("Benchmark of ", italic("Burkholderia nodosa")))) ``` -That takes up to 11 times as much time! A value of 158.4 milliseconds means it can only determine ~6 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. +![](../reference/figures/benchmark_2.png) + +That takes up to 8 times as much time! 
A value of 145 milliseconds means it can only determine ~7 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: **repetitive results** and **already precalculated results**. @@ -90,7 +102,7 @@ To relieve this pitfall and further improve performance, two important calculati Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by `as.mo()`. We will use `mo_fullname()` for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) and uses `as.mo()` internally. -```r +```{r, eval = FALSE} library(dplyr) # take 500,000 random MO codes from the septic_patients data set x = septic_patients %>% @@ -118,19 +130,19 @@ So transforming 500,000 values (!) of 96 unique values only takes 0.12 seconds ( Results of a tenfold - 5,000,000 values: -```r +```{r, eval = FALSE} # Unit: milliseconds # expr min lq mean median uq max neval # X 882.9045 901.3011 1001.677 940.3421 1168.088 1226.846 10 ``` -Even the full names of 5 *Million* values are calculated within a second. +Even determining the full names of 5 *Million* values is done within a second. ### Precalculated results What about precalculated results? If the input is an already precalculated result of a helper function like `mo_fullname()`, it almost doesn't take any time at all (see 'C' below): -```r +```{r, eval = FALSE} microbenchmark(A = mo_fullname("B_STPHY_AUR"), B = mo_fullname("S. 
aureus"), C = mo_fullname("Staphylococcus aureus"), @@ -145,7 +157,7 @@ microbenchmark(A = mo_fullname("B_STPHY_AUR"), So going from `mo_fullname("Staphylococcus aureus")` to `"Staphylococcus aureus"` takes 0.0001 seconds - it doesn't even start calculating *if the result would be the same as the expected resulting value*. That goes for all helper functions: -```r +```{r, eval = FALSE} microbenchmark(A = mo_species("aureus"), B = mo_genus("Staphylococcus"), C = mo_fullname("Staphylococcus aureus"), @@ -176,7 +188,7 @@ Of course, when running `mo_phylum("Firmicutes")` the function has zero knowledg When the system language is non-English and supported by this `AMR` package, some functions take a little while longer: -```r +```{r, eval = FALSE} mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system # "Coagulase Negative Staphylococcus (CoNS)"