diff --git a/DESCRIPTION b/DESCRIPTION index dc718674..598c0391 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR -Version: 0.5.0.9025 -Date: 2019-03-26 +Version: 0.6.0 +Date: 2019-03-27 Title: Antimicrobial Resistance Analysis Authors@R: c( person( diff --git a/NEWS.md b/NEWS.md index 2d7bed2b..ff88a349 100755 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,4 @@ -# AMR 0.5.0.90xx -**Note: this is the development version, which will eventually be released as AMR 0.6.0.** +# AMR 0.6.0 **New website!** @@ -11,7 +10,7 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git #### New * **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo()` to identify an MO code. * Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains: - * All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses + * All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa * All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales (covering at least like all species of *Aspergillus*, *Candida*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*) * All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like *Strongyloides* and *Taenia*) * All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed @@ -122,7 +121,8 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git as.mo(..., allow_uncertain = 0) ``` Using `as.mo(..., allow_uncertain = 3)` could lead to very unreliable results. - * All microbial IDs that are found with zero uncertainty are now saved to a local file `~/.Rhistory_mo`. 
Use the new function `clean_mo_history()` to delete this file, which resets the algorithms. + * Implemented the latest publication of Becker *et al.* (2019) for categorising coagulase-negative *Staphylococci* + * All microbial IDs that were found are now saved to a local file `~/.Rhistory_mo`. Use the new function `clean_mo_history()` to delete this file, which resets the algorithms. * Incoercible results will now be considered 'unknown', MO code `UNKNOWN`. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese: ```r mo_genus("qwerty", language = "es") diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index a8fb96d7..3cf55fe5 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -78,7 +78,7 @@ AMR (for R) - 0.5.0.9025 + 0.6.0 diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 6c761450..d2cef62d 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9025 + 0.6.0 @@ -192,7 +192,7 @@

How to conduct AMR analysis

Matthijs S. Berends

-

26 March 2019

+

27 March 2019

@@ -201,7 +201,7 @@ -

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 26 March 2019.

+

Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 27 March 2019.

Introduction

@@ -217,21 +217,21 @@ -2019-03-26 +2019-03-27 abcd Escherichia coli S S -2019-03-26 +2019-03-27 abcd Escherichia coli S R -2019-03-26 +2019-03-27 efgh Escherichia coli R @@ -327,71 +327,71 @@ -2014-08-13 -C5 +2015-10-28 +A4 +Hospital B +Escherichia coli +S +R +S +S +M + + +2011-07-22 +Y8 +Hospital C +Streptococcus pneumoniae +S +S +S +S +F + + +2014-09-30 +N3 +Hospital B +Streptococcus pneumoniae +S +S +S +S +M + + +2013-02-14 +Z6 +Hospital A +Escherichia coli +S +S +R +S +F + + +2015-02-01 +D5 Hospital C Escherichia coli -R S -R +I +S S M -2017-11-08 -R4 -Hospital A -Escherichia coli -S -I -S -S -F - - -2015-01-27 -U9 -Hospital D -Klebsiella pneumoniae -S -S -S -S -F - - -2010-09-17 -R7 -Hospital A -Escherichia coli -R -I -R -S -F - - -2017-04-07 -Z10 +2013-12-28 +V6 Hospital B Staphylococcus aureus +R S -S -S +R S F - -2015-08-27 -C7 -Hospital A -Escherichia coli -S -S -S -S -M -

Now, let’s start the cleaning and the analysis!

@@ -411,8 +411,8 @@ #> #> Item Count Percent Cum. Count Cum. Percent #> --- ----- ------- -------- ----------- ------------- -#> 1 M 10,435 52.2% 10,435 52.2% -#> 2 F 9,565 47.8% 20,000 100.0% +#> 1 M 10,344 51.7% 10,344 51.7% +#> 2 F 9,656 48.3% 20,000 100.0%

So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: only values M and F. From a researcher's perspective: there are slightly more men. Nothing we didn’t already know.

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:

data <- data %>%
@@ -443,10 +443,10 @@
 #> Kingella kingae (no changes)
 #> 
 #> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1262 changes)
+#> Table 1:  Intrinsic resistance in Enterobacteriaceae (1342 changes)
 #> Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
 #> Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2756 changes)
+#> Table 4:  Intrinsic resistance in Gram-positive bacteria (2726 changes)
 #> Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
 #> Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
 #> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -462,9 +462,9 @@
 #> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
 #> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
 #> 
-#> => EUCAST rules affected 7,403 out of 20,000 rows
+#> => EUCAST rules affected 7,400 out of 20,000 rows
 #>    -> added 0 test results
-#>    -> changed 4,018 test results (0 to S; 0 to I; 4,018 to R)
+#>    -> changed 4,068 test results (0 to S; 0 to I; 4,068 to R)

@@ -489,8 +489,8 @@ #> NOTE: Using column `bacteria` as input for `col_mo`. #> NOTE: Using column `date` as input for `col_date`. #> NOTE: Using column `patient_id` as input for `col_patient_id`. -#> => Found 5,648 first isolates (28.2% of total)

-

So only 28.2% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+#> => Found 5,665 first isolates (28.3% of total) +

So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

data_1st <- data %>% 
   filter(first == TRUE)

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

@@ -516,111 +516,111 @@ 1 -2010-01-29 -P7 +2010-09-11 +L7 B_ESCHR_COL -S -S +R +I S S TRUE 2 -2010-05-18 -P7 +2010-11-07 +L7 B_ESCHR_COL S +S +S R -S -S FALSE 3 -2010-06-01 -P7 +2011-01-16 +L7 B_ESCHR_COL -R S R S +S FALSE 4 -2010-07-21 -P7 +2011-02-25 +L7 B_ESCHR_COL +R S -I S S FALSE 5 -2010-08-20 -P7 +2011-08-07 +L7 B_ESCHR_COL S -R +S S S FALSE 6 -2010-12-14 -P7 +2011-08-16 +L7 B_ESCHR_COL S -I S +R S FALSE 7 -2011-03-02 -P7 +2011-10-08 +L7 B_ESCHR_COL S -S -S R +S +S TRUE 8 -2011-03-14 -P7 +2011-10-26 +L7 B_ESCHR_COL -S -S R S +S +S FALSE 9 -2011-05-28 -P7 +2012-01-15 +L7 B_ESCHR_COL S I -S +R S FALSE 10 -2011-08-09 -P7 +2012-02-08 +L7 B_ESCHR_COL -I S -R +S +S S FALSE @@ -637,7 +637,7 @@ #> NOTE: Using column `patient_id` as input for `col_patient_id`. #> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this. #> [Criterion] Inclusion based on key antibiotics, ignoring I. -#> => Found 15,891 first weighted isolates (79.5% of total) +#> => Found 15,729 first weighted isolates (78.6% of total) @@ -654,11 +654,11 @@ - - + + - - + + @@ -666,35 +666,35 @@ - - + + + + - - - - + + - + - - + + + - @@ -702,83 +702,83 @@ - - + + - + - + - - + + - - - - - - - - - - - - - + + + + + + + + + + + + + - - + + - - + + - - + + - + - - + + - - + +
isolate
12010-01-29P72010-09-11L7 B_ESCHR_COLSSRI S S TRUE
22010-05-18P72010-11-07L7 B_ESCHR_COL SSS RSS FALSE TRUE
32010-06-01P72011-01-16L7 B_ESCHR_COLR S R SS FALSE TRUE
42010-07-21P72011-02-25L7 B_ESCHR_COLR SI S S FALSE
52010-08-20P72011-08-07L7 B_ESCHR_COL SRS S S FALSEFALSETRUE
62010-12-14P72011-08-16L7 B_ESCHR_COL SISSFALSEFALSE
72011-03-02P7B_ESCHR_COLSS S RSFALSETRUE
72011-10-08L7B_ESCHR_COLSRSS TRUE TRUE
82011-03-14P72011-10-26L7 B_ESCHR_COLSS R SSS FALSE TRUE
92011-05-28P72012-01-15L7 B_ESCHR_COL S ISR S FALSE TRUE
102011-08-09P72012-02-08L7 B_ESCHR_COLI SRSS S FALSE TRUE
-

Instead of 2, now 8 isolates are flagged. In total, 79.5% of all isolates are marked ‘first weighted’, 51.2 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.

+

Instead of 2, now 10 isolates are flagged. In total, 78.6% of all isolates are marked ‘first weighted’, 50.3 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
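The gap between the two shares is a difference in percentage points, not a relative increase. The following minimal R sketch shows that arithmetic. It uses only the counts reported in this vignette: 20,000 rows, 5,665 first isolates and 15,729 first weighted isolates.

```r
# Counts taken from the vignette output
n_total    <- 20000
n_first    <- 5665   # first isolates (CLSI guideline)
n_weighted <- 15729  # first weighted isolates (key antibiotics)

pct_first    <- 100 * n_first / n_total     # share of all isolates, in %
pct_weighted <- 100 * n_weighted / n_total  # share of all isolates, in %

# Absolute difference between the two shares: percentage points
round(pct_weighted - pct_first, 1)
#> [1] 50.3

# Relative increase in the number of isolates, for comparison
round(100 * (n_weighted / n_first - 1), 1)
#> [1] 177.7
```

The two numbers answer different questions. The first compares shares of the whole data set, while the second compares the two isolate counts directly.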

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

data_1st <- data %>% 
   filter_first_weighted_isolate()
-

So we end up with 15,891 isolates for analysis.

+

So we end up with 15,729 isolates for analysis.
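The idea behind the (non-weighted) first-isolate flag can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's first_isolate() implementation:

- It only groups by patient and microorganism code.
- It starts a new episode when more than 365 days have passed since the previous episode started.
- It ignores key antibiotics, specimen types and the other options of the real function.

The function name first_isolate_sketch() is hypothetical.

```r
# Hypothetical, simplified sketch of a "first isolate" flag (not AMR::first_isolate()).
# Assumption: an isolate is "first" when it is the earliest of its patient/microorganism
# combination, or when more than `episode_days` have passed since the last episode start.
first_isolate_sketch <- function(date, patient_id, mo, episode_days = 365) {
  ord <- order(patient_id, mo, date)   # process isolates chronologically per group
  is_first <- logical(length(date))    # all FALSE to start with
  last_key <- ""
  episode_start <- as.Date(NA)
  for (i in ord) {
    key <- paste(patient_id[i], mo[i], sep = "|")
    new_group <- key != last_key
    # `||` short-circuits, so the NA episode_start is never compared for a new group
    if (new_group || as.numeric(date[i] - episode_start) > episode_days) {
      is_first[i] <- TRUE
      episode_start <- date[i]
    }
    last_key <- key
  }
  is_first
}

# Dates of patient L7's E. coli isolates, as listed in this vignette
dates <- as.Date(c("2010-09-11", "2010-11-07", "2011-01-16", "2011-02-25",
                   "2011-08-07", "2011-08-16", "2011-10-08", "2011-10-26",
                   "2012-01-15", "2012-02-08"))
flags <- first_isolate_sketch(dates, rep("L7", 10), rep("B_ESCHR_COL", 10))
which(flags)
#> [1] 1 7
```

Rows 1 and 7 come out as first isolates, which agrees with the TRUE values for this patient in the first-isolate table of this vignette. On other data the package function can give different results because of the options this sketch leaves out.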

We can remove unneeded columns:

data_1st <- data_1st %>% 
   select(-c(first, keyab))
@@ -786,7 +786,6 @@
head(data_1st)
- @@ -803,15 +802,14 @@ - - - - + + + - + @@ -819,13 +817,42 @@ - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - + + @@ -835,14 +862,28 @@ - - - + + + + + + + + + + + + + + + + + + - - + @@ -850,54 +891,6 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
date patient_id hospital
12014-08-13C5Hospital C2015-10-28A4Hospital B B_ESCHR_COLR S R SS M Gram negative EscherichiaTRUE
42010-09-17R72011-07-22Y8Hospital CB_STRPT_PNESSSRFGram positiveStreptococcuspneumoniaeTRUE
2014-09-30N3Hospital BB_STRPT_PNESSSRMGram positiveStreptococcuspneumoniaeTRUE
2013-02-14Z6 Hospital A B_ESCHR_COLRISS R S FTRUE
52017-04-07Z102015-02-01D5Hospital CB_ESCHR_COLSISSMGram negativeEscherichiacoliTRUE
2013-12-28V6 Hospital B B_STPHY_AURR SSSR S F Gram positiveaureus TRUE
72012-04-03J2Hospital AB_ESCHR_COLRRRSMGram negativeEscherichiacoliTRUE
92017-09-09U3Hospital AB_ESCHR_COLRSSSFGram negativeEscherichiacoliTRUE
102015-12-21E1Hospital BB_ESCHR_COLSSSSMGram negativeEscherichiacoliTRUE

Time for the analysis!

@@ -915,9 +908,9 @@
freq(paste(data_1st$genus, data_1st$species))

Or it can be used the dplyr way, which is easier to read:

data_1st %>% freq(genus, species)
-

Frequency table of genus and species from a data.frame (15,891 x 13)

+

Frequency table of genus and species from a data.frame (15,729 x 13)

Columns: 2
-Length: 15,891 (of which NA: 0 = 0.00%)
+Length: 15,729 (of which NA: 0 = 0.00%)
Unique: 4

Shortest: 16
Longest: 24

@@ -934,33 +927,33 @@ Longest: 24

1 Escherichia coli -7,952 -50.0% -7,952 -50.0% +7,807 +49.6% +7,807 +49.6% 2 Staphylococcus aureus -3,895 -24.5% -11,847 +3,919 +24.9% +11,726 74.6% 3 Streptococcus pneumoniae -2,502 -15.7% -14,349 -90.3% +2,426 +15.4% +14,152 +90.0% 4 Klebsiella pneumoniae -1,542 -9.7% -15,891 +1,577 +10.0% +15,729 100.0% @@ -971,7 +964,7 @@ Longest: 24

Resistance percentages

The functions portion_S(), portion_SI(), portion_I(), portion_IR() and portion_R() can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:

data_1st %>% portion_IR(amox)
-#> [1] 0.4711472
+#> [1] 0.4796236
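Conceptually, a portion such as portion_IR() divides the number of I and R results by the number of available test results. The following base-R sketch makes that explicit. It assumes that missing values are simply left out. The package functions may apply extra rules, such as a minimum number of isolates, so treat this as an illustration only. portion_ir_sketch() and the example vector are hypothetical.

```r
# Hypothetical helper: share of I or R among the non-missing results
portion_ir_sketch <- function(x) {
  x <- x[!is.na(x)]                 # ignore missing test results
  sum(x %in% c("I", "R")) / length(x)
}

amox <- c("S", "R", "I", "S", NA, "R")
portion_ir_sketch(amox)
#> [1] 0.6
```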

Or they can be used in conjunction with group_by() and summarise(), both from the dplyr package:

data_1st %>% 
   group_by(hospital) %>% 
@@ -984,19 +977,19 @@ Longest: 24

Hospital A -0.4674370 +0.4841779 Hospital B -0.4698925 +0.4800215 Hospital C -0.4813574 +0.4663419 Hospital D -0.4712389 +0.4815057 @@ -1014,23 +1007,23 @@ Longest: 24

Hospital A -0.4674370 -4760 +0.4841779 +4835 Hospital B -0.4698925 -5580 +0.4800215 +5581 Hospital C -0.4813574 -2387 +0.4663419 +2258 Hospital D -0.4712389 -3164 +0.4815057 +3055 @@ -1050,27 +1043,27 @@ Longest: 24

Escherichia -0.7272384 -0.9034205 -0.9763581 +0.7376713 +0.8986807 +0.9754067 Klebsiella -0.7457847 -0.9014267 -0.9760052 +0.7305010 +0.8953710 +0.9727330 Staphylococcus -0.7245186 -0.9181001 -0.9756098 +0.7305435 +0.9275325 +0.9793315 Streptococcus -0.7234213 +0.7320692 0.0000000 -0.7234213 +0.7320692 diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png index bb00739e..b76ef17b 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png index 1eb0f5f2..a6b86eeb 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png index 8232c403..9bc0d924 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png index 5a888aaf..95f86455 100644 Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html index 47e067ad..5e05b9fd 100644 --- a/docs/articles/EUCAST.html +++ b/docs/articles/EUCAST.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0
@@ -192,7 +192,7 @@

How to apply EUCAST rules

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html index d7cbe252..9f07c8e3 100644 --- a/docs/articles/G_test.html +++ b/docs/articles/G_test.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0 @@ -192,7 +192,7 @@

How to use the G-test

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html index 9d08b222..30afa341 100644 --- a/docs/articles/SPSS.html +++ b/docs/articles/SPSS.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0 @@ -192,7 +192,7 @@

How to import data from SPSS / SAS / Stata

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

@@ -213,7 +213,7 @@

If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.

@@ -280,7 +280,7 @@

RStudio

-

To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also run RStudio Server, which is nothing less than the complete RStudio software available as a website (e.g. in your corporate network or at home).

+

To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also install RStudio Server on a private or corporate server, which brings nothing less than the complete RStudio software to you as a website (at home or at work).

To import a data file, just click Import Dataset in the Environment tab:

If additional packages are needed, RStudio will ask you whether they should be installed beforehand.

diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html index e3c39870..ff1d5e4b 100644 --- a/docs/articles/WHONET.html +++ b/docs/articles/WHONET.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0
@@ -192,7 +192,7 @@

How to work with WHONET data

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

@@ -234,10 +234,10 @@

Frequency table of mo from a data.frame (500 x 54)

Class: mo (character)
Length: 500 (of which NA: 0 = 0.00%)
-Unique: 37

+Unique: 39

Families: 10
Genera: 17
-Species: 35

+Species: 38

@@ -258,7 +258,7 @@ Species: 35

- + @@ -314,33 +314,34 @@ Species: 35

- - - - - - - - - - + + + + + + + + + +
2B_STPHYB_STPHY_CNS 74 14.8% 319
9B_STRPT81.6%44288.4%
10 B_ENTRB_CLO 5 1.0%44789.4%43987.8%
10B_ENTRC_COL40.8%44388.6%
-

(omitted 27 entries, n = 53 [10.6%])

+

(omitted 29 entries, n = 57 [11.4%])


 # our transformed antibiotic columns
 # amoxicillin/clavulanic acid (J01CR02) as an example
 data %>% freq(AMC_ND2)

Frequency table of AMC_ND2 from a data.frame (500 x 54)

+
# Warning: These values could not be coerced to a valid atc: "AMCND".

Class: factor > ordered > rsi (numeric)
Length: 500 (of which NA: 19 = 3.80%)
Levels: 3: S < I < R
Unique: 3

-

%IR: 25.00% (ratio 2.8:1)

+

%IR: 25.99%

diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html index bcc1f409..4f0f304b 100644 --- a/docs/articles/atc_property.html +++ b/docs/articles/atc_property.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0 @@ -192,7 +192,7 @@

How to get properties of an antibiotic

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html index 90cf473b..08ac4c56 100644 --- a/docs/articles/benchmarks.html +++ b/docs/articles/benchmarks.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9024 + 0.6.0 @@ -192,7 +192,7 @@

Benchmarks

Matthijs S. Berends

-

18 March 2019

+

27 March 2019

@@ -218,13 +218,13 @@
 print(S.aureus, unit = "ms", signif = 2)
 #> Unit: milliseconds
 #>                            expr  min   lq mean median   uq max neval
-#>                    as.mo("sau") 18.0 18.0   22   18.0 18.0  61    10
-#>                   as.mo("stau") 49.0 50.0   62   50.0 50.0 130    10
-#>                 as.mo("staaur") 18.0 18.0   27   18.0 18.0  66    10
-#>                 as.mo("STAAUR") 18.0 18.0   23   18.0 19.0  66    10
-#>              as.mo("S. aureus") 29.0 29.0   39   29.0 42.0  73    10
-#>             as.mo("S.  aureus") 29.0 29.0   38   29.0 31.0  72    10
-#>  as.mo("Staphylococcus aureus")  8.3  8.3   12    8.3  8.8  44    10
+#>                    as.mo("sau") 18.0 18.0   22   18.0 20.0  48    10
+#>                   as.mo("stau") 48.0 48.0   76   67.0 92.0 150    10
+#>                 as.mo("staaur") 18.0 18.0   24   19.0 22.0  62    10
+#>                 as.mo("STAAUR") 18.0 18.0   23   18.0 20.0  61    10
+#>              as.mo("S. aureus") 29.0 29.0   35   29.0 30.0  76    10
+#>             as.mo("S.  aureus") 29.0 29.0   37   29.0 30.0  79    10
+#>  as.mo("Staphylococcus aureus")  8.2  8.2   13    8.2  8.4  52    10

In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code), common laboratory codes or common full organism names like the last one. Full organism names are always preferred.
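The conversion used in that paragraph is 1,000 divided by the time per value in milliseconds. The small R sketch below applies it to a few median timings from the updated S. aureus benchmark table. values_per_second() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: 1 second = 1000 ms, so throughput = 1000 / time in ms
values_per_second <- function(time_ms) {
  1000 / time_ms
}

values_per_second(c(5, 100))
#> [1] 200  10

# Median timings (ms) of three inputs from the benchmark table
median_ms <- c(sau = 18, stau = 67, full_name = 8.2)
round(values_per_second(median_ms))
```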

To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is, of course, that less prevalent microorganisms will be determined more slowly. See this example for the ID of Thermus islandicus (B_THERMS_ISL), a bug probably never found before in humans:

T.islandicus <- microbenchmark(as.mo("theisl"),
@@ -236,12 +236,12 @@
 print(T.islandicus, unit = "ms", signif = 2)
 #> Unit: milliseconds
 #>                         expr min  lq mean median  uq max neval
-#>              as.mo("theisl") 470 470  490    470 510 520    10
-#>              as.mo("THEISL") 470 470  500    500 520 530    10
-#>       as.mo("T. islandicus")  74  74   84     75  77 130    10
-#>      as.mo("T.  islandicus")  74  74   84     74  75 120    10
-#>  as.mo("Thermus islandicus")  74  78  100    120 120 130    10
-

That takes 7.9 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are considerably faster, and these are the most probable input from most data sets.

+#>              as.mo("theisl") 460 460  500    510 510 560    10
+#>              as.mo("THEISL") 460 470  490    490 510 530    10
+#>       as.mo("T. islandicus")  74  74   84     75  78 120    10
+#>      as.mo("T.  islandicus")  74  75   88     75 120 120    10
+#>  as.mo("Thermus islandicus")  73  73   84     74  75 130    10
+

That takes 7.6 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Thermus islandicus) are considerably faster, and these are the most probable input from most data sets.

In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Thermus islandicus (which is very uncommon):

par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
 
@@ -257,6 +257,7 @@
         main = "Benchmarks per prevalence")

In reality, the as.mo() function learns from its own output to speed up subsequent determinations. In the figure above, this effect was disabled to show the difference with the boxplot below, which shows what happens when you use as.mo() yourself:

+
#> File /home/uscloud/.Rhistory_mo removed.

The highest outliers are the first runs. All subsequent determinations took only thousandths of a second.

Still, uncommon microorganisms take a lot more time than common microorganisms, especially the first time. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.

@@ -264,103 +265,103 @@

Repetitive results

Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo(). We will use mo_fullname() for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo() internally.

-
library(dplyr)
-# take all MO codes from the septic_patients data set
-x <- septic_patients$mo %>%
-  # keep only the unique ones
-  unique() %>%
-  # pick 50 of them at random
-  sample(50) %>%
-  # paste that 10,000 times
-  rep(10000) %>%
-  # scramble it
-  sample()
-  
-# got indeed 50 times 10,000 = half a million?
-length(x)
-#> [1] 500000
-
-# and how many unique values do we have?
-n_distinct(x)
-#> [1] 50
-
-# now let's see:
-run_it <- microbenchmark(mo_fullname(x),
-                         times = 10)
-print(run_it, unit = "ms", signif = 3)
-#> Unit: milliseconds
-#>            expr min  lq mean median  uq max neval
-#>  mo_fullname(x) 770 811  822    817 824 952    10
-

So transforming 500,000 values (!!) of 50 unique values only takes 0.82 seconds (816 ms). You only lose time on your unique input values.

+
library(dplyr)
+# take all MO codes from the septic_patients data set
+x <- septic_patients$mo %>%
+  # keep only the unique ones
+  unique() %>%
+  # pick 50 of them at random
+  sample(50) %>%
+  # paste that 10,000 times
+  rep(10000) %>%
+  # scramble it
+  sample()
+  
+# got indeed 50 times 10,000 = half a million?
+length(x)
+#> [1] 500000
+
+# and how many unique values do we have?
+n_distinct(x)
+#> [1] 50
+
+# now let's see:
+run_it <- microbenchmark(mo_fullname(x),
+                         times = 10)
+print(run_it, unit = "ms", signif = 3)
+#> Unit: milliseconds
+#>            expr min  lq mean median  uq  max neval
+#>  mo_fullname(x) 825 845  887    867 903 1080    10
+

So transforming 500,000 values (!!) of 50 unique values only takes 0.87 seconds (867 ms). You only lose time on your unique input values.
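The principle of calculating every unique value only once can be illustrated with unique() and match(). This is a sketch of the general technique, not the package's internal code. slow_lookup() is a hypothetical stand-in for an expensive per-value computation, and it counts how many values it was asked to process.

```r
# Hypothetical expensive function; `n_computed` counts the values it processed
n_computed <- 0
slow_lookup <- function(x) {
  n_computed <<- n_computed + length(x)
  paste0("result_for_", x)
}

# Compute once per distinct value, then expand back to the original order
lookup_unique <- function(x) {
  ux <- unique(x)
  results <- slow_lookup(ux)
  results[match(x, ux)]
}

set.seed(1)
x <- sample(c("sau", "eco", "kpn"), 500000, replace = TRUE)
y <- lookup_unique(x)

length(y)
#> [1] 500000
n_computed
#> [1] 3
```

With this approach, the cost scales with the number of distinct values rather than with the length of the input.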

Precalculated results

What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname(), it almost doesn’t take any time at all (see ‘C’ below):

- -

So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

- +

So going from mo_fullname("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:

+

Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
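The short-cut described here amounts to checking whether the input is already a valid result before doing any work. Below is a minimal sketch that assumes a small hard-coded list of phylum names; the package instead relies on its full taxonomic data. known_phyla, phylum_sketch() and slow_phylum_lookup() are hypothetical names.

```r
# Hypothetical reference list; a real implementation would use complete taxonomic data
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# Hypothetical expensive lookup; `n_lookups` counts how often it does real work
n_lookups <- 0
slow_phylum_lookup <- function(x) {
  n_lookups <<- n_lookups + length(x)
  rep("Firmicutes", length(x))   # placeholder answer for this illustration
}

phylum_sketch <- function(x) {
  out <- x
  needs_work <- !(x %in% known_phyla)   # values that are not already a phylum name
  if (any(needs_work)) {
    out[needs_work] <- slow_phylum_lookup(x[needs_work])
  }
  out
}

phylum_sketch("Firmicutes")   # returned as-is, no lookup
phylum_sketch("S. aureus")    # only this call triggers the lookup
n_lookups
#> [1] 1
```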

Results in other languages

When the system language is non-English and supported by this AMR package, some functions will have a translated result. This takes almost no extra time:

-
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
-#> [1] "Coagulase-negative Staphylococcus (CoNS)"
-
-mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
-#> [1] "Staphylococcus coagulasa negativo (CoNS)"
-
-mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
-#> [1] "Coagulase-negatieve Staphylococcus (CNS)"
-
-run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
-                         de = mo_fullname("CoNS", language = "de"),
-                         nl = mo_fullname("CoNS", language = "nl"),
-                         es = mo_fullname("CoNS", language = "es"),
-                         it = mo_fullname("CoNS", language = "it"),
-                         fr = mo_fullname("CoNS", language = "fr"),
-                         pt = mo_fullname("CoNS", language = "pt"),
-                         times = 10)
-print(run_it, unit = "ms", signif = 4)
-#> Unit: milliseconds
-#>  expr   min    lq  mean median    uq   max neval
-#>    en 19.22 19.33 20.42  19.58 19.84 28.13    10
-#>    de 31.28 31.62 41.16  32.79 34.86 75.79    10
-#>    nl 31.56 31.71 36.86  31.97 33.34 78.40    10
-#>    es 31.32 31.94 42.76  32.98 41.72 81.33    10
-#>    it 31.31 31.67 31.96  31.90 32.15 33.26    10
-#>    fr 31.09 31.43 37.49  32.53 33.73 75.53    10
-#>    pt 31.24 31.82 36.35  31.95 32.12 76.57    10
+
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
+#> [1] "Coagulase-negative Staphylococcus (CoNS)"
+
+mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
+#> [1] "Staphylococcus coagulasa negativo (SCN)"
+
+mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
+#> [1] "Coagulase-negatieve Staphylococcus (CNS)"
+
+run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
+                         de = mo_fullname("CoNS", language = "de"),
+                         nl = mo_fullname("CoNS", language = "nl"),
+                         es = mo_fullname("CoNS", language = "es"),
+                         it = mo_fullname("CoNS", language = "it"),
+                         fr = mo_fullname("CoNS", language = "fr"),
+                         pt = mo_fullname("CoNS", language = "pt"),
+                         times = 10)
+print(run_it, unit = "ms", signif = 4)
+#> Unit: milliseconds
+#>  expr   min    lq  mean median    uq   max neval
+#>    en 18.83 19.19 24.20  19.53 20.67 63.66    10
+#>    de 31.53 32.10 36.79  32.22 33.81 75.80    10
+#>    nl 31.24 31.78 32.22  32.10 32.24 33.43    10
+#>    es 31.40 32.07 45.82  33.08 75.31 76.95    10
+#>    it 31.17 31.95 36.82  32.09 32.19 79.48    10
+#>    fr 31.48 31.64 31.89  31.96 32.07 32.33    10
+#>    pt 31.27 31.66 36.80  32.06 32.34 80.53    10

Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.

diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png index d79e8873..882bc183 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png index 858c8cd7..d642d2bb 100644 Binary files a/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/benchmarks_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/articles/freq.html b/docs/articles/freq.html index 94897c26..d9147c73 100644 --- a/docs/articles/freq.html +++ b/docs/articles/freq.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0 @@ -192,7 +192,7 @@

How to create frequency tables

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

@@ -258,16 +258,17 @@ Longest: 1

colnames(microorganisms)
 #  [1] "mo"         "col_id"     "fullname"   "kingdom"    "phylum"    
 #  [6] "class"      "order"      "family"     "genus"      "species"   
-# [11] "subspecies" "rank"       "ref"        "species_id"
-

If we compare the dimensions between the old and new dataset, we can see that these 13 variables were added:

+# [11] "subspecies" "rank" "ref" "species_id" "source" +# [16] "prevalence" +

If we compare the dimensions between the old and new dataset, we can see that these 15 variables were added:

dim(septic_patients)
 # [1] 2000   49
 dim(my_patients)
-# [1] 2000   62
+# [1] 2000 64

So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

my_patients %>%
   freq(genus, species, nmax = 15)
-

Frequency table of genus and species from a data.frame (2,000 x 62)

+

Frequency table of genus and species from a data.frame (2,000 x 64)

Columns: 2
Length: 2,000 (of which NA: 0 = 0.00%)
Unique: 95

@@ -293,7 +294,7 @@ Longest: 34

- + @@ -604,7 +605,8 @@ Unique: 4

Length: 2,000 (of which NA: 771 = 38.55%)
Levels: 3: S < I < R
Unique: 3

-

%IR: 34.30% (ratio 1:1.3)

+

Drug: Amoxicillin
+%IR: 55.82%

2Staphylococcus coagulase negativeStaphylococcus coagulase-negative 313 15.7% 780
@@ -735,7 +737,8 @@ Median: 31 July 2009 (47.39%)

Length: 2,000 (of which NA: 771 = 38.55%)
Levels: 3: S < I < R
Unique: 4

-

%IR: 34.30% (ratio 1:1.3)

+

Drug: Amoxicillin
+%IR: 55.82%

diff --git a/docs/articles/index.html b/docs/articles/index.html index 9f777dc6..ffcbca60 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -78,7 +78,7 @@ AMR (for R) - 0.5.0.9025 + 0.6.0 diff --git a/docs/articles/mo_property.html b/docs/articles/mo_property.html index 1ba5e03b..a730030e 100644 --- a/docs/articles/mo_property.html +++ b/docs/articles/mo_property.html @@ -40,7 +40,7 @@ AMR (for R) - 0.5.0.9023 + 0.6.0 @@ -192,7 +192,7 @@

How to get properties of a microorganism

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html
index b98b3bf3..8a7b74d3 100644
--- a/docs/articles/resistance_predict.html
+++ b/docs/articles/resistance_predict.html
@@ -40,7 +40,7 @@
AMR (for R)
-0.5.0.9023
+0.6.0
@@ -192,7 +192,7 @@

How to predict antimicrobial resistance

Matthijs S. Berends

-

15 March 2019

+

27 March 2019

diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png
index 544c7be2..77dfb5f7 100644
Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-4-1.png differ
diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png
index 4e99fb8b..0d0b014a 100644
Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-1.png differ
diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png
index 6090dfa7..420a1eb2 100644
Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-5-2.png differ
diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png
index f968ce8b..6edd144d 100644
Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-6-1.png differ
diff --git a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png
index 4db59da9..8a19589f 100644
Binary files a/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png and b/docs/articles/resistance_predict_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/docs/authors.html b/docs/authors.html
index dac0b881..63ac49f5 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -78,7 +78,7 @@
AMR (for R)
-0.5.0.9025
+0.6.0
diff --git a/docs/extra.js b/docs/extra.js
index a8fbe436..876bc3fb 100644
--- a/docs/extra.js
+++ b/docs/extra.js
@@ -42,7 +42,7 @@ $( document ).ready(function() {
' Learn R reading this great book: R for Data Science.' + '

' + ' Click to read it online - it was published for free.' +
- ' ' +
+ ' ' +
' ' + '
' + '');
@@ -53,7 +53,7 @@ $( document ).ready(function() {
'

' + $('footer .copyright p').html().replace( "Developed by", 'AMR (for R). Developed at the University of Groningen.
Authors:') + '

' +
- '' +
+ '' +
'');
// doctoral titles of authors
diff --git a/docs/index.html b/docs/index.html
index 7712b63e..1a6a6a25 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -42,7 +42,7 @@
AMR (for R)
-0.5.0.9025
+0.6.0
@@ -197,7 +197,7 @@

(TLDR - to find out how to conduct AMR analysis, please continue reading here to get started.)


AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any data format, including WHONET/EARS-Net data.

-

After installing this package, R knows ~65,000 microorganisms and ~500 antibiotics by name and code, and knows all about valid RSI and MIC values.

+

After installing this package, R knows ~65,000 microorganisms and ~500 antibiotics by name and code, and knows all about valid RSI and MIC values.

Used to SPSS? Read our tutorial on how to import data from SPSS, SAS or Stata and learn in which ways R outclasses any of these statistical packages.

We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen, the Netherlands, and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and is free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation. Read the full license here.

This package can be used for:

@@ -275,7 +275,7 @@

This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (www.catalogueoflife.org).

Included are: