diff --git a/DESCRIPTION b/DESCRIPTION
index 2cbd24a6..5b271565 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
Package: AMR
-Version: 0.9.0.9005
-Date: 2019-12-21
+Version: 0.9.0.9006
+Date: 2019-12-22
Title: Antimicrobial Resistance Analysis
Authors@R: c(
person(role = c("aut", "cre"),
diff --git a/NEWS.md b/NEWS.md
index 3beea51d..e6a6bd6d 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,10 +1,13 @@
-# AMR 0.9.0.9005
-## Last updated: 21-Dec-2019
+# AMR 0.9.0.9006
+## Last updated: 22-Dec-2019
### Changes
* Speed improvement for `as.mo()` (and consequently all `mo_*` functions that use `as.mo()` internally), especially for the *G. species* format (G for genus), like *E. coli* and *K. pneumoniae*
* Input values for `as.disk()` limited to a maximum of 50 millimeters
+### Other
+* Add a `CITATION` file
+
# AMR 0.9.0
### Breaking
diff --git a/R/globals.R b/R/globals.R
index 8f0ac9d6..8513bb0d 100755
--- a/R/globals.R
+++ b/R/globals.R
@@ -35,6 +35,7 @@ globalVariables(c(".",
"first_isolate_row_index",
"fullname",
"fullname_lower",
+ "g_species",
"genus",
"gramstain",
"group",
diff --git a/docs/404.html b/docs/404.html
index 21f7adb5..56acff37 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -84,7 +84,7 @@
AMR (for R)
- 0.9.0.9005
+ 0.9.0.9006
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index ea9b3b04..5ca7bd6b 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -84,7 +84,7 @@
AMR (for R)
- 0.9.0.9005
+ 0.9.0.9006
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index ddceba46..7ef2a707 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -41,7 +41,7 @@
AMR (for R)
- 0.9.0.9005
+ 0.9.0.9006
@@ -187,7 +187,7 @@
How to conduct AMR analysis
Matthijs S. Berends
-
21 December 2019
+
22 December 2019
AMR.Rmd
@@ -196,7 +196,7 @@
-
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 21 December 2019.
+
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 22 December 2019.
So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: only values M and F. From a researcher's perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:
For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:
@@ -525,7 +525,7 @@
First weighted isolates
-
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient Y10, sorted on date:
+
We made a slight twist to the CLSI algorithm, to take into account the antimicrobial susceptibility profile. Have a look at all isolates of patient C4, sorted on date:
isolate
@@ -541,21 +541,21 @@
1
-
2010-06-10
-
Y10
+
2010-02-20
+
C4
B_ESCHR_COLI
S
S
-
R
+
S
S
TRUE
2
-
2010-09-27
-
Y10
+
2010-03-07
+
C4
B_ESCHR_COLI
-
S
+
R
S
S
S
@@ -563,8 +563,8 @@
3
-
2010-11-27
-
Y10
+
2010-06-08
+
C4
B_ESCHR_COLI
S
S
@@ -574,63 +574,19 @@
4
-
2010-12-06
-
Y10
+
2010-10-24
+
C4
B_ESCHR_COLI
S
S
-
R
+
S
S
FALSE
5
-
2011-02-18
-
Y10
-
B_ESCHR_COLI
-
R
-
S
-
S
-
S
-
FALSE
-
-
-
6
-
2011-04-05
-
Y10
-
B_ESCHR_COLI
-
R
-
R
-
S
-
R
-
FALSE
-
-
-
7
-
2011-05-29
-
Y10
-
B_ESCHR_COLI
-
R
-
S
-
S
-
R
-
FALSE
-
-
-
8
-
2011-06-04
-
Y10
-
B_ESCHR_COLI
-
S
-
S
-
S
-
S
-
FALSE
-
-
-
9
-
2011-06-14
-
Y10
+
2011-04-27
+
C4
B_ESCHR_COLI
S
S
@@ -639,9 +595,20 @@
TRUE
-
10
-
2011-11-02
-
Y10
+
6
+
2011-05-13
+
C4
+
B_ESCHR_COLI
+
R
+
S
+
S
+
S
+
FALSE
+
+
+
7
+
2011-12-08
+
C4
B_ESCHR_COLI
S
S
@@ -649,9 +616,42 @@
S
FALSE
+
+
8
+
2012-05-22
+
C4
+
B_ESCHR_COLI
+
I
+
S
+
S
+
S
+
TRUE
+
+
+
9
+
2012-06-10
+
C4
+
B_ESCHR_COLI
+
S
+
S
+
S
+
S
+
FALSE
+
+
+
10
+
2012-08-14
+
C4
+
B_ESCHR_COLI
+
R
+
S
+
S
+
S
+
FALSE
+
-
Only 2 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
+
Only 3 isolates are marked as ‘first’ according to CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 small spectrum for Gram negatives and 6 small spectrum for Gram positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’ the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
Instead of 2, now 9 isolates are flagged. In total, 75.1% of all isolates are marked ‘first weighted’ - 46.6% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+
Instead of 3, now 9 isolates are flagged. In total, 74.7% of all isolates are marked ‘first weighted’ - 46.2% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R(), equal to resistance()) and susceptibility as the proportion of S and I (proportion_SI(), equal to susceptibility()). These functions can be used on their own:
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second. The second input is the only one that has to be looked up thoroughly. All the others are known codes (the first one is a WHONET code) or common laboratory codes, or common full organism names like the last one. Full organism names are always preferred.
To achieve this speed, the as.mo function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS), a bug probably never found before in humans:
That takes 15.5 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.
That takes 6.2 times as much time on average. A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.
In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):
The highest outliers are the first times. All subsequent determinations were done in only a few thousandths of a second, because the as.mo() function learns from its own output to speed up later determinations.
So going from mo_name("Staphylococcus aureus") to "Staphylococcus aureus" takes 0.0008 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
Of course, when running mo_phylum("Firmicutes") the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes" too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
Berends MS, Luz CF, Friedrich AW, Sinha BNM, Albers CJ, Glasner C (2019).
+“AMR - An R Package for Working with Antimicrobial Resistance Data.”
+bioRxiv.
+doi: 10.1101/810622, https://doi.org/10.1101/810622.
+
+
@Article{,
+ title = {AMR - An R Package for Working with Antimicrobial Resistance Data},
+ author = {M S Berends and C F Luz and A W Friedrich and B N M Sinha and C J Albers and C Glasner},
+ journal = {bioRxiv},
+ publisher = {Cold Spring Harbor Laboratory},
+ year = {2019},
+ doi = {10.1101/810622},
+ url = {https://doi.org/10.1101/810622},
+}
+
Authors
diff --git a/docs/countries.png b/docs/countries.png
index fd6f4c8d..74b9202a 100644
Binary files a/docs/countries.png and b/docs/countries.png differ
diff --git a/docs/countries_large.png b/docs/countries_large.png
index 9bd54edf..ca358f6d 100644
Binary files a/docs/countries_large.png and b/docs/countries_large.png differ
diff --git a/docs/index.html b/docs/index.html
index 09b83360..42241438 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -45,7 +45,7 @@
AMR (for R)
- 0.9.0.9005
+ 0.9.0.9006
@@ -205,8 +205,8 @@ A methods paper about this package has been preprinted at bioRxiv (DOI: 10.1101/
- Used in almost 80 countries
- Since its first public release in early 2018, this package has been downloaded over 25,000 times from 79 countries (as of December 2019, CRAN logs). Click the map to enlarge.
+ Used in 80 countries
+ Since its first public release in early 2018, this package has been downloaded over 25,000 times from 80 countries (as of December 2019, CRAN logs). Click the map to enlarge.
@@ -410,6 +410,12 @@ A methods paper about this package has been preprinted at bioRxiv (DOI: 10.1101/
Input values for as.disk() limited to a maximum of 50 millimeters
+
+
+Other
+
+
Add a CITATION file
+
+
@@ -339,9 +346,9 @@
-
+
-Other
+Other
Rewrote the complete documentation to markdown format, to be able to use the very latest version of the great Roxygen2, released in November 2019. This tremendously improved the documentation quality, since the rewrite forced us to go over all texts again and make changes where needed.
Change dependency on clean to cleaner, as this package was renamed accordingly upon CRAN request
@@ -490,9 +497,9 @@ Since this is a major change, usage of the old also_single_tested w
Added Prof. Dr. Casper Albers as doctoral advisor and added Dr. Judith Fonville, Eric Hazenberg, Dr. Bart Meijer, Dr. Dennis Souverein and Annick Lenglet as contributors
Cleaned the coding style of every single syntax line in this package with the help of the lintr package
@@ -573,9 +580,9 @@ Since this is a major change, usage of the old also_single_tested w
-
+
-Other
+Other
Fixed a note thrown by CRAN tests
@@ -669,9 +676,9 @@ Please mo_shortname() where species would not be determined correctly
-
+
-Other
+Other
Support for R 3.6.0 and later by providing support for staged install
diff --git a/index.md b/index.md
index e0991cce..9bca014c 100644
--- a/index.md
+++ b/index.md
@@ -17,8 +17,8 @@ We created this package for both routine analysis and academic research (as part
- Used in almost 80 countries
- Since its first public release in early 2018, this package has been downloaded over 25,000 times from 79 countries (as of December 2019, CRAN logs). Click the map to enlarge.
+ Used in 80 countries
+ Since its first public release in early 2018, this package has been downloaded over 25,000 times from 80 countries (as of December 2019, CRAN logs). Click the map to enlarge.
#### Partners
diff --git a/inst/CITATION b/inst/CITATION
new file mode 100644
index 00000000..61aa5b7b
--- /dev/null
+++ b/inst/CITATION
@@ -0,0 +1,15 @@
+citHeader("To cite our AMR package in publications, please use:")
+
+citEntry(
+ entry = "Article",
+ title = "AMR - An R Package for Working with Antimicrobial Resistance Data",
+ author = "M S Berends and C F Luz and A W Friedrich and B N M Sinha and C J Albers and C Glasner",
+ journal = "bioRxiv",
+ publisher = "Cold Spring Harbor Laboratory",
+ year = 2019,
+ doi = "10.1101/810622",
+ url = "https://doi.org/10.1101/810622",
+ textVersion = "Berends MS, Luz CF et al. (2019). AMR - An R Package for Working with Antimicrobial Resistance Data. bioRxiv, https://doi.org/10.1101/810622"
+)
+
+citFooter("Many thanks for using our open-source method to work with microbial and antimicrobial data!")
diff --git a/pkgdown/logos/countries.png b/pkgdown/logos/countries.png
index fd6f4c8d..74b9202a 100644
Binary files a/pkgdown/logos/countries.png and b/pkgdown/logos/countries.png differ
diff --git a/pkgdown/logos/countries_large.png b/pkgdown/logos/countries_large.png
index 9bd54edf..ca358f6d 100644
Binary files a/pkgdown/logos/countries_large.png and b/pkgdown/logos/countries_large.png differ
diff --git a/vignettes/benchmarks.Rmd b/vignettes/benchmarks.Rmd
index fabce857..e67cd5bd 100755
--- a/vignettes/benchmarks.Rmd
+++ b/vignettes/benchmarks.Rmd
@@ -21,7 +21,7 @@ knitr::opts_chunk$set(
fig.width = 7.5,
fig.height = 4.5,
dpi = 75
-)
+)
options(AMR_disable_mo_history = FALSE)
```