diff --git a/DESCRIPTION b/DESCRIPTION index 1bda40fc..5b13cd75 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR Version: 0.5.0.9018 -Date: 2019-02-21 +Date: 2019-02-22 Title: Antimicrobial Resistance Analysis Authors@R: c( person( diff --git a/NAMESPACE b/NAMESPACE index 23c2c8f5..a4708cc3 100755 --- a/NAMESPACE +++ b/NAMESPACE @@ -120,6 +120,7 @@ export(mo_kingdom) export(mo_order) export(mo_phylum) export(mo_property) +export(mo_rank) export(mo_ref) export(mo_renamed) export(mo_shortname) @@ -268,5 +269,6 @@ importFrom(stats,mad) importFrom(stats,pchisq) importFrom(stats,predict) importFrom(stats,sd) +importFrom(utils,browseURL) importFrom(utils,browseVignettes) importFrom(utils,installed.packages) diff --git a/NEWS.md b/NEWS.md index fca8315b..79a22b84 100755 --- a/NEWS.md +++ b/NEWS.md @@ -18,8 +18,10 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git * All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed * The responsible author(s) and year of scientific publication - This data is updated annually - check the included version with `catalogue_of_life_version()`. + This data is updated annually - check the included version with `catalogue_of_life_version()`. * Due to this change, some `mo` codes changed (e.g. *Streptococcus* changed from `B_STRPTC` to `B_STRPT`). A translation table is used internally to support older microorganism IDs, so users will not notice this difference. +* New function `mo_rank()` for the taxonomic rank (genus, species, infraspecies, etc.) +* New function `mo_url()` to get the URL to the Catalogue of Life * Support for data from [WHONET](https://whonet.org/) and [EARS-Net](https://ecdc.europa.eu/en/about-us/partnerships-and-networks/disease-and-laboratory-networks/ears-net) (European Antimicrobial Resistance Surveillance Network): * Exported files from WHONET can be read and used in this package. For functions like `first_isolate()` and `eucast_rules()`, all parameters will be filled in automatically. * This package now knows all antibiotic abbrevations by EARS-Net (which are also being used by WHONET) - the `antibiotics` data set now contains a column `ears_net`. diff --git a/R/catalogue_of_life.R b/R/catalogue_of_life.R index 480ab496..89fa6507 100755 --- a/R/catalogue_of_life.R +++ b/R/catalogue_of_life.R @@ -29,7 +29,7 @@ #' Included are: #' \itemize{ #' \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} -#' \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} +#' \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} #' \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} #' \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} #' \item{The responsible author(s) and year of scientific publication} diff --git a/R/data.R b/R/data.R index 6e8d1840..45d43fca 100755 --- a/R/data.R +++ b/R/data.R @@ -134,7 +134,7 @@ #' #' A data set containing the microbial taxonomy of six kingdoms from the Catalogue of Life. MO codes can be looked up using \code{\link{as.mo}}. #' @inheritSection catalogue_of_life Catalogue of Life -#' @format A \code{\link{data.frame}} with 56,672 observations and 14 variables: +#' @format A \code{\link{data.frame}} with 57,158 observations and 14 variables: #' \describe{ #' \item{\code{mo}}{ID of microorganism as used by this package} #' \item{\code{col_id}}{Catalogue of Life ID} @@ -163,6 +163,7 @@ "microorganisms" catalogue_of_life <- list( + year = 2018, version = "Catalogue of Life: 2018 Annual Checklist", url = "http://www.catalogueoflife.org/annual-checklist/2018" ) @@ -175,6 +176,8 @@ catalogue_of_life <- list( catalogue_of_life_version <- function() { list(version = catalogue_of_life$version, url = catalogue_of_life$url, + # annual release always somewhere in March + is_latest_annual_release = Sys.Date() < as.Date(paste0(catalogue_of_life$year + 1, "-04-01")), no_of_species = nrow(AMR::microorganisms), no_of_synonyms = nrow(AMR::microorganisms.old)) } @@ -183,7 +186,7 @@ catalogue_of_life_version <- function() { #' #' A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by \code{\link{as.mo}}. #' @inheritSection catalogue_of_life Catalogue of Life -#' @format A \code{\link{data.frame}} with 14,506 observations and 4 variables: +#' @format A \code{\link{data.frame}} with 14,487 observations and 4 variables: #' \describe{ #' \item{\code{col_id}}{Catalogue of Life ID} #' \item{\code{tsn_new}}{New Catalogue of Life ID} diff --git a/R/mo_property.R b/R/mo_property.R index cc938e1f..49c4624f 100755 --- a/R/mo_property.R +++ b/R/mo_property.R @@ -26,6 +26,7 @@ #' @param property one of the column names of one of the \code{\link{microorganisms}} data set or \code{"shortname"} #' @param language language of the returned text, defaults to system language (see \code{\link{get_locale}}) and can also be set with \code{\link{getOption}("AMR_locale")}. Use \code{language = NULL} or \code{language = ""} to prevent translation. #' @param ... other parameters passed on to \code{\link{as.mo}} +#' @param open browse the URL using \code{\link[utils]{browseURL}} #' @details All functions will return the most recently known taxonomic property according to the Catalogue of Life, except for \code{mo_ref}, \code{mo_authors} and \code{mo_year}. This leads to the following results: #' \itemize{ #' \item{\code{mo_fullname("Chlamydia psittaci")} will return \code{"Chlamydophila psittaci"} (with a warning about the renaming)} @@ -44,14 +45,14 @@ #' @return \itemize{ #' \item{An \code{integer} in case of \code{mo_year}} #' \item{A \code{list} in case of \code{mo_taxonomy}} +#' \item{A named \code{character} in case of \code{mo_url}} #' \item{A \code{character} in all other cases} #' } #' @export #' @seealso \code{\link{microorganisms}} #' @inheritSection AMR Read more on our website! #' @examples -#' # All properties of Escherichia coli -#' ## taxonomic properties +#' ## taxonomic tree #' mo_kingdom("E. coli") # "Bacteria" #' mo_phylum("E. coli") # "Proteobacteria" #' mo_class("E. coli") # "Gammaproteobacteria" @@ -68,10 +69,12 @@ #' ## other properties #' mo_gramstain("E. coli") # "Gram negative" #' mo_type("E. coli") # "Bacteria" (equal to kingdom) +#' mo_rank("E. coli") # "species" +#' mo_url("E. coli") # get the direct url to the Catalogue of Life #' #' ## scientific reference -#' mo_ref("E. coli") # "Castellani and Chalmers, 1919" -#' mo_authors("E. coli") # "Castellani and Chalmers" +#' mo_ref("E. coli") # "Castellani et al., 1919" +#' mo_authors("E. coli") # "Castellani et al." #' mo_year("E. coli") # 1919 #' #' @@ -107,7 +110,7 @@ #' mo_shortname("S. pyo", Lancefield = TRUE) # "GAS" ('Group A streptococci') #' #' -#' # Language support for German, Dutch, Spanish, Portuguese, Italian and French +#' # language support for German, Dutch, Spanish, Portuguese, Italian and French #' mo_gramstain("E. coli", language = "de") # "Gramnegativ" #' mo_gramstain("E. coli", language = "nl") # "Gram-negatief" #' mo_gramstain("E. coli", language = "es") # "Gram negativo" @@ -125,7 +128,7 @@ #' language = "nl") # "Streptococcus groep A" #' #' -#' # Get a list with the complete taxonomy (kingdom to subspecies) +#' # get a list with the complete taxonomy (kingdom to subspecies) #' mo_taxonomy("E. coli") mo_fullname <- function(x, language = get_locale(), ...) { x <- mo_validate(x = x, property = "fullname", ...) @@ -259,9 +262,9 @@ mo_ref <- function(x, ...) { #' @export mo_authors <- function(x, ...) { x <- mo_validate(x = x, property = "ref", ...) - # remove last 4 digits and presumably the comma and space that preceed them + # remove last 4 digits and presumably the comma and space that preceeds them x[!is.na(x)] <- gsub(",? ?[0-9]{4}", "", x[!is.na(x)]) - x + suppressWarnings(x) } #' @rdname mo_property @@ -270,7 +273,13 @@ mo_year <- function(x, ...) { x <- mo_validate(x = x, property = "ref", ...) # get last 4 digits x[!is.na(x)] <- gsub(".*([0-9]{4})$", "\\1", x[!is.na(x)]) - as.integer(x) + suppressWarnings(as.integer(x)) +} + +#' @rdname mo_property +#' @export +mo_rank <- function(x, ...) { + mo_validate(x = x, property = "rank", ...) } #' @rdname mo_property @@ -288,10 +297,18 @@ mo_taxonomy <- function(x, ...) { } #' @rdname mo_property +#' @importFrom utils browseURL #' @export -mo_url <- function(x, ...) { +mo_url <- function(x, open = FALSE, ...) { u <- mo_validate(x = x, property = "species_id", ...) u[u != ""] <- paste0(catalogue_of_life$url, "/details/species/id/", u) + names(u) <- mo_fullname(x = x, ... = ...) + if (open == TRUE) { + if (length(u) > 1) { + warning("only the first URL will be opened, as `browseURL` only suports one string.") + } + browseURL(u[1L]) + } u } diff --git a/data/microorganisms.old.rda b/data/microorganisms.old.rda index 027d5334..b35d6827 100644 Binary files a/data/microorganisms.old.rda and b/data/microorganisms.old.rda differ diff --git a/data/microorganisms.rda b/data/microorganisms.rda index 50db4fbe..b1cd6266 100755 Binary files a/data/microorganisms.rda and b/data/microorganisms.rda differ diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 89199875..f9e3bbcb 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -192,7 +192,7 @@
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 21 February 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 22 February 2019.
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M
and F
. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -443,10 +443,10 @@
#> Kingella kingae (no changes)
#>
#> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1: Intrinsic resistance in Enterobacteriaceae (1333 changes)
+#> Table 1: Intrinsic resistance in Enterobacteriaceae (1284 changes)
#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4: Intrinsic resistance in Gram-positive bacteria (2733 changes)
+#> Table 4: Intrinsic resistance in Gram-positive bacteria (2790 changes)
#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -462,9 +462,9 @@
#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#>
-#> => EUCAST rules affected 7,452 out of 20,000 rows
+#> => EUCAST rules affected 7,321 out of 20,000 rows
#> -> added 0 test results
-#> -> changed 4,066 test results (0 to S; 0 to I; 4,066 to R)
So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.4% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
isolate | @@ -654,34 +654,34 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-14 | -O6 | +2010-01-10 | +X9 | B_ESCHR_COL | R | -I | -R | +S | +S | S | TRUE | TRUE | |
2 | -2010-02-22 | -O6 | +2010-04-18 | +X9 | B_ESCHR_COL | -S | -S | +R | +I | S | S | FALSE | -TRUE | +FALSE |
3 | -2010-04-01 | -O6 | +2010-07-02 | +X9 | B_ESCHR_COL | -S | +R | S | S | S | @@ -690,44 +690,32 @@||||
4 | -2010-04-25 | -O6 | +2010-09-21 | +X9 | B_ESCHR_COL | +R | S | -S | -S | -S | -FALSE | -FALSE | -||
5 | -2010-05-09 | -O6 | -B_ESCHR_COL | -S | -S | -S | -S | -FALSE | -FALSE | -|||||
6 | -2010-05-29 | -O6 | -B_ESCHR_COL | -S | -I | R | S | FALSE | TRUE | |||||
7 | -2010-06-27 | -O6 | +5 | +2010-09-22 | +X9 | +B_ESCHR_COL | +R | +S | +S | +S | +FALSE | +TRUE | +||
6 | +2010-10-06 | +X9 | B_ESCHR_COL | S | S | @@ -736,22 +724,34 @@FALSE | TRUE | |||||||
7 | +2010-10-14 | +X9 | +B_ESCHR_COL | +R | +S | +S | +S | +FALSE | +TRUE | +|||||
8 | -2010-06-27 | -O6 | +2011-01-09 | +X9 | B_ESCHR_COL | S | +I | S | -S | -S | -FALSE | +R | FALSE | +TRUE |
9 | -2011-02-01 | -O6 | +2011-03-31 | +X9 | B_ESCHR_COL | R | S | @@ -762,23 +762,23 @@|||||||
10 | -2011-03-20 | -O6 | +2011-03-31 | +X9 | B_ESCHR_COL | S | S | -S | +R | S | FALSE | TRUE |
Instead of 2, now 6 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.8% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,851 isolates for analysis.
+So we end up with 15,854 isolates for analysis.
We can remove unneeded columns:
@@ -803,10 +803,58 @@Or can be used like the dplyr
way, which is easier readable:
Frequency table of genus
and species
from a data.frame
(15,851 x 13)
Frequency table of genus
and species
from a data.frame
(15,854 x 13)
Columns: 2
-Length: 15,851 (of which NA: 0 = 0.00%)
+Length: 15,854 (of which NA: 0 = 0.00%)
Unique: 4
Shortest: 16
Longest: 24
The functions portion_R
, portion_RI
, portion_I
, portion_IS
and portion_S
can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
Or can be used in conjuction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -984,19 +984,19 @@ Longest: 24
Hospital A
-0.4722165
+0.4737395
Hospital B
-0.4788707
+0.4763709
Hospital C
-0.4670535
+0.4739257
Hospital D
-0.4859994
+0.4636854
@@ -1014,23 +1014,23 @@ Longest: 24
Hospital A
-0.4722165
-4841
+0.4737395
+4760
Hospital B
-0.4788707
-5490
+0.4763709
+5544
Hospital C
-0.4670535
-2413
+0.4739257
+2397
Hospital D
-0.4859994
-3107
+0.4636854
+3153
@@ -1050,27 +1050,27 @@ Longest: 24
Escherichia
-0.7324589
-0.9016936
-0.9759328
+0.7350341
+0.9051528
+0.9761303
Klebsiella
-0.7221195
-0.9081947
-0.9821319
+0.7274536
+0.9177719
+0.9781167
Staphylococcus
-0.7420746
-0.9163074
-0.9792037
+0.7432570
+0.9216285
+0.9788804
Streptococcus
-0.7203947
+0.7273819
0.0000000
-0.7203947
+0.7273819
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index c531478d..8765e873 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index fe381070..139f69f6 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index 9919bc8c..25e89498 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index 5154ee14..cb7bf92f 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 1ac9719f..082df783 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -192,7 +192,7 @@
How to apply EUCAST rules
Matthijs S. Berends
- 21 February 2019
+ 22 February 2019
EUCAST.Rmd
diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html
index 7298828d..0742e808 100644
--- a/docs/articles/G_test.html
+++ b/docs/articles/G_test.html
@@ -192,7 +192,7 @@
How to use the G-test
Matthijs S. Berends
- 21 February 2019
+ 22 February 2019
G_test.Rmd
diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html
index eb202bad..e6aea648 100644
--- a/docs/articles/WHONET.html
+++ b/docs/articles/WHONET.html
@@ -192,7 +192,7 @@
How to work with WHONET data
Matthijs S. Berends
- 21 February 2019
+ 22 February 2019
WHONET.Rmd
diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html
index 61e7491b..54772c9d 100644
--- a/docs/articles/atc_property.html
+++ b/docs/articles/atc_property.html
@@ -192,7 +192,7 @@
How to get properties of an antibiotic
Matthijs S. Berends
- 21 February 2019
+ 22 February 2019
atc_property.Rmd
diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html
index 8095f07d..7557f953 100644
--- a/docs/articles/benchmarks.html
+++ b/docs/articles/benchmarks.html
@@ -192,7 +192,7 @@
Benchmarks
Matthijs S. Berends
- 21 February 2019
+ 22 February 2019
benchmarks.Rmd
@@ -214,29 +214,18 @@
as.mo("S. aureus"),
as.mo("STAAUR"),
as.mo("Staphylococcus aureus"),
- as.mo("B_STPHY_AUR"),
- times = 10)
-print(S.aureus, unit = "ms", signif = 2)
-#> Unit: milliseconds
-#> expr min lq mean median uq max
-#> as.mo("sau") 100.00 100.00 110.00 100.00 100.00 160.00
-#> as.mo("stau") 140.00 140.00 170.00 160.00 190.00 200.00
-#> as.mo("staaur") 99.00 100.00 100.00 100.00 100.00 110.00
-#> as.mo("S. aureus") 64.00 64.00 65.00 65.00 66.00 67.00
-#> as.mo("S. aureus") 65.00 65.00 70.00 66.00 66.00 110.00
-#> as.mo("STAAUR") 97.00 98.00 100.00 100.00 100.00 100.00
-#> as.mo("Staphylococcus aureus") 35.00 35.00 36.00 36.00 37.00 38.00
-#> as.mo("B_STPHY_AUR") 0.34 0.47 0.52 0.48 0.56 0.89
-#> neval
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 10 milliseconds means it can determine 100 input values per second. It case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of as.mo("B_STPHY_AUR")
, the input is already a valid MO code, so it only almost takes no time at all (476 millionths of a second).
In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 10 milliseconds means it can determine 100 input values per second. It case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found.
To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of Mycoplasma leonicaptivi (B_MYCPL_LEO
), a bug probably never found before in humans:
M.leonicaptivi <- microbenchmark(as.mo("myle"),
as.mo("mycleo"),
@@ -244,134 +233,127 @@
as.mo("M. leonicaptivi"),
as.mo("MYCLEO"),
as.mo("Mycoplasma leonicaptivi"),
- as.mo("B_MYCPL_LEO"),
- times = 10)
-print(M.leonicaptivi, unit = "ms", signif = 2)
-#> Unit: milliseconds
-#> expr min lq mean median uq max
-#> as.mo("myle") 210.00 220.00 240.0 230.00 260.00 310
-#> as.mo("mycleo") 610.00 630.00 680.0 680.00 720.00 770
-#> as.mo("M. leonicaptivi") 370.00 370.00 390.0 390.00 410.00 410
-#> as.mo("M. leonicaptivi") 350.00 350.00 390.0 390.00 410.00 480
-#> as.mo("MYCLEO") 630.00 650.00 680.0 670.00 680.00 880
-#> as.mo("Mycoplasma leonicaptivi") 250.00 250.00 260.0 250.00 260.00 290
-#> as.mo("B_MYCPL_LEO") 0.35 0.43 5.6 0.69 0.75 50
-#> neval
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
-#> 10
That takes 4.7 times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance:
+ times = 10) +print(M.leonicaptivi, unit = "ms", signif = 3) +#> Unit: milliseconds +#> expr min lq mean median uq max neval +#> as.mo("myle") 141 142 162 142 142 299 10 +#> as.mo("mycleo") 479 481 520 525 530 634 10 +#> as.mo("M. leonicaptivi") 241 242 273 263 281 382 10 +#> as.mo("M. leonicaptivi") 239 241 268 282 283 299 10 +#> as.mo("MYCLEO") 487 520 525 524 528 601 10 +#> as.mo("Mycoplasma leonicaptivi") 152 156 183 174 200 261 10 +That takes 7.3 times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance.
+In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Mycoplasma leonicaptivi (which is very uncommon):
par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16)
-# highest value on y axis
-max_y_axis <- max(S.aureus$time, M.leonicaptivi$time, na.rm = TRUE) / 1e6
-
-boxplot(S.aureus, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
- main = expression(paste("Benchmark of ", italic("Staphylococcus aureus"))))
boxplot(M.leonicaptivi, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis),
- main = expression(paste("Benchmark of ", italic("Mycoplasma leonicaptivi"))))
To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
+Uncommon microorganisms take a lot more time than common microorganisms. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
Repetitive results mean that unique values are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_fullname()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
library(dplyr)
-# take 500,000 random MO codes from the septic_patients data set
-x = septic_patients %>%
- sample_n(500000, replace = TRUE) %>%
- pull(mo)
-
-# got the right length?
-length(x)
-#> [1] 500000
-
-# and how many unique values do we have?
-n_distinct(x)
-#> [1] 95
-
-# now let's see:
-run_it <- microbenchmark(mo_fullname(x),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-#> Unit: milliseconds
-#> expr min lq mean median uq max neval
-#> mo_fullname(x) 487 499 527 535 538 573 10
So transforming 500,000 values (!) of 95 unique values only takes 0.54 seconds (535 ms). You only lose time on your unique input values.
+library(dplyr)
+# take 500,000 random MO codes from the septic_patients data set
+x = septic_patients %>%
+ sample_n(500000, replace = TRUE) %>%
+ pull(mo)
+
+# got the right length?
+length(x)
+#> [1] 500000
+
+# and how many unique values do we have?
+n_distinct(x)
+#> [1] 95
+
+# now let's see:
+run_it <- microbenchmark(mo_fullname(x),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> mo_fullname(x) 400 405 463 441 533 558 10
So transforming 500,000 values (!) of 95 unique values only takes 0.44 seconds (441 ms). You only lose time on your unique input values.
What about precalculated results? If the input is an already precalculated result of a helper function like mo_fullname()
, it almost doesn’t take any time at all (see ‘C’ below):
run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
- B = mo_fullname("S. aureus"),
+run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
+ B = mo_fullname("S. aureus"),
+ C = mo_fullname("Staphylococcus aureus"),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> A 39.000 39.80 40.000 40.100 40.300 41.100 10
+#> B 24.400 24.70 25.000 24.900 25.200 25.600 10
+#> C 0.294 0.39 0.422 0.401 0.505 0.535 10
+So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0004 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
+run_it <- microbenchmark(A = mo_species("aureus"),
+ B = mo_genus("Staphylococcus"),
C = mo_fullname("Staphylococcus aureus"),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-#> Unit: milliseconds
-#> expr min lq mean median uq max neval
-#> A 65.500 66.100 66.200 66.300 66.500 66.700 10
-#> B 61.000 61.200 61.900 61.700 62.300 64.500 10
-#> C 0.329 0.335 0.461 0.527 0.551 0.556 10
-So going from mo_fullname("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0005 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
-run_it <- microbenchmark(A = mo_species("aureus"),
- B = mo_genus("Staphylococcus"),
- C = mo_fullname("Staphylococcus aureus"),
- D = mo_family("Staphylococcaceae"),
- E = mo_order("Bacillales"),
- F = mo_class("Bacilli"),
- G = mo_phylum("Firmicutes"),
- H = mo_kingdom("Bacteria"),
- times = 10)
-print(run_it, unit = "ms", signif = 3)
-#> Unit: milliseconds
-#> expr min lq mean median uq max neval
-#> A 0.288 0.372 0.440 0.418 0.481 0.662 10
-#> B 0.281 0.294 0.364 0.369 0.411 0.461 10
-#> C 0.390 0.493 0.563 0.550 0.645 0.731 10
-#> D 0.244 0.269 0.733 0.337 0.347 4.420 10
-#> E 0.283 0.344 0.368 0.363 0.410 0.434 10
-#> F 0.250 0.319 0.343 0.339 0.354 0.492 10
-#> G 0.286 0.329 0.363 0.340 0.392 0.496 10
-#> H 0.292 0.305 0.365 0.359 0.421 0.459 10
+ D = mo_family("Staphylococcaceae"),
+ E = mo_order("Bacillales"),
+ F = mo_class("Bacilli"),
+ G = mo_phylum("Firmicutes"),
+ H = mo_kingdom("Bacteria"),
+ times = 10)
+print(run_it, unit = "ms", signif = 3)
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> A 0.298 0.327 0.398 0.400 0.452 0.535 10
+#> B 0.251 0.287 0.339 0.344 0.377 0.436 10
+#> C 0.293 0.403 0.451 0.487 0.500 0.537 10
+#> D 0.250 0.262 0.300 0.277 0.336 0.395 10
+#> E 0.249 0.261 0.306 0.313 0.344 0.384 10
+#> F 0.273 0.283 0.325 0.326 0.338 0.420 10
+#> G 0.238 0.293 0.312 0.325 0.342 0.356 10
+#> H 0.250 0.262 0.304 0.316 0.337 0.358 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
too, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This almost does’t take extra time:
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
-#> [1] "Coagulase Negative Staphylococcus (CoNS)"
-
-mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
-#> [1] "Staphylococcus coagulasa negativo (CoNS)"
-
-mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
-#> [1] "Coagulase-negatieve Staphylococcus (CNS)"
-
-run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
- de = mo_fullname("CoNS", language = "de"),
- nl = mo_fullname("CoNS", language = "nl"),
- es = mo_fullname("CoNS", language = "es"),
- it = mo_fullname("CoNS", language = "it"),
- fr = mo_fullname("CoNS", language = "fr"),
- pt = mo_fullname("CoNS", language = "pt"),
- times = 10)
-print(run_it, unit = "ms", signif = 4)
-#> Unit: milliseconds
-#> expr min lq mean median uq max neval
-#> en 24.41 25.27 26.34 25.41 26.92 30.60 10
-#> de 35.53 35.76 36.76 35.98 37.20 41.19 10
-#> nl 34.51 35.55 39.93 35.60 40.15 69.76 10
-#> es 34.36 35.98 44.29 37.46 39.98 73.16 10
-#> it 35.78 36.22 37.44 36.75 38.70 40.78 10
-#> fr 35.45 35.71 36.09 35.79 36.15 37.93 10
-#> pt 35.10 35.44 44.61 35.76 39.68 77.27 10
mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system
+#> [1] "Coagulase Negative Staphylococcus (CoNS)"
+
+mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system
+#> [1] "Staphylococcus coagulasa negativo (CoNS)"
+
+mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
+#> [1] "Coagulase-negatieve Staphylococcus (CNS)"
+
+run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
+ de = mo_fullname("CoNS", language = "de"),
+ nl = mo_fullname("CoNS", language = "nl"),
+ es = mo_fullname("CoNS", language = "es"),
+ it = mo_fullname("CoNS", language = "it"),
+ fr = mo_fullname("CoNS", language = "fr"),
+ pt = mo_fullname("CoNS", language = "pt"),
+ times = 10)
+print(run_it, unit = "ms", signif = 4)
+#> Unit: milliseconds
+#> expr min lq mean median uq max neval
+#> en 10.78 11.11 11.15 11.14 11.30 11.41 10
+#> de 19.60 19.65 23.24 19.76 20.61 52.47 10
+#> nl 19.14 19.71 19.75 19.72 19.87 20.22 10
+#> es 19.64 19.73 28.36 20.60 25.91 64.67 10
+#> it 19.33 19.49 23.13 19.68 19.97 52.72 10
+#> fr 19.43 19.54 20.08 19.72 20.60 21.46 10
+#> pt 19.34 19.66 23.15 19.80 20.48 52.40 10
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
freq.Rmd
mo_property.Rmd
resistance_predict.Rmd
This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
-Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
+Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales.
-The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
+All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales.
+The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histoplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The responsible author(s) and year of scientific publication
It cleanses existing data by providing new classes for microoganisms, antibiotics and antimicrobial results (both S/I/R and MIC). By installing this package, you teach R everything about microbiology that is needed for analysis. These functions all use artificial intelligence to guess results that you would expect:
as.mo()
to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using artificial intelligence (AI) on the included ITIS data set, consisting of almost 20,000 microorganisms. It is very fast, please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).as.mo()
to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using artificial intelligence on the included Catalogue of Life data set, consisting of almost 60,000 microorganisms. It only takes milliseconds to find results, please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).as.rsi()
to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.as.mic()
to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.as.atc()
to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values “Furabid”, “Furadantin”, “nitro” all return the ATC code of Nitrofurantoine.mdro()
(abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.microorganisms
contains the complete taxonomic tree of almost 20,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus()
, mo_family()
, mo_gramstain()
or even mo_phylum()
. As they use as.mo()
internally, they also use artificial intelligence. For example, mo_genus("MRSA")
and mo_genus("S. aureus")
will both return "Staphylococcus"
. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.microorganisms
contains the complete taxonomic tree of almost 60,000 microorganisms. Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus()
, mo_family()
, mo_gramstain()
or even mo_phylum()
. As they use as.mo()
internally, they also use artificial intelligence. For example, mo_genus("MRSA")
and mo_genus("S. aureus")
will both return "Staphylococcus"
. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.antibiotics
contains almost 500 antimicrobial drugs with their ATC code, EARS-Net code, common LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains hundreds of trade names. Use functions like atc_name()
and atc_tradenames()
to look up values. The atc_*
functions use as.atc()
internally so they support AI to guess your expected result. For example, atc_name("Fluclox")
, atc_name("Floxapen")
and atc_name("J01CF05")
will all return "Flucloxacillin"
. These functions can again be used to add new variables to your data.All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales.
The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).The responsible author(s) and year of scientific publication
catalogue_of_life_version()
.
-The responsible author(s) and year of scientific publication
+This data is updated annually - check the included version withcatalogue_of_life_version()
.mo
codes changed (e.g. Streptococcus changed from B_STRPTC
to B_STRPT
). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.mo_rank()
for the taxonomic rank (genus, species, infraspecies, etc.)mo_url()
to get the URL to the Catalogue of Lifefirst_isolate()
and eucast_rules()
, all parameters will be filled in automatically.allow_uncertain = TRUE
(which is the default setting), i
This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). This data is updated annually - check the included version with catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
mo_fullname()
mo_shortname()
mo_subspecies()
mo_species()
mo_genus()
mo_family()
mo_order()
mo_class()
mo_phylum()
mo_kingdom()
mo_type()
mo_gramstain()
mo_ref()
mo_authors()
mo_year()
mo_taxonomy()
mo_url()
mo_property()
mo_fullname()
mo_shortname()
mo_subspecies()
mo_species()
mo_genus()
mo_family()
mo_order()
mo_class()
mo_phylum()
mo_kingdom()
mo_type()
mo_gramstain()
mo_ref()
mo_authors()
mo_year()
mo_rank()
mo_taxonomy()
mo_url()
mo_property()
Property of a microorganism
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
A data.frame
with 56,672 observations and 14 variables:
A data.frame
with 57,158 observations and 14 variables:
mo
ID of microorganism as used by this package
col_id
Catalogue of Life ID
fullname
Full name, like "Echerichia coli"
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
A data.frame
with 14,506 observations and 4 variables:
A data.frame
with 14,487 observations and 4 variables:
col_id
Catalogue of Life ID
tsn_new
New Catalogue of Life ID
fullname
Old taxonomic name of the microorganism
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
other parameters passed on to as.mo
browse the URL using browseURL
one of the column names of one of the microorganisms
data set or "shortname"
An integer
in case of mo_year
A list
in case of mo_taxonomy
A named character
in case of mo_url
A character
in all other cases
catalogue_of_life_version
.
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton).
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
# NOT RUN { -# All properties of Escherichia coli -## taxonomic properties +## taxonomic tree mo_kingdom("E. coli") # "Bacteria" mo_phylum("E. coli") # "Proteobacteria" mo_class("E. coli") # "Gammaproteobacteria" @@ -375,10 +381,12 @@ This package contains the complete taxonomic tree of almost all microorganisms f ## other properties mo_gramstain("E. coli") # "Gram negative" mo_type("E. coli") # "Bacteria" (equal to kingdom) +mo_rank("E. coli") # "species" +mo_url("E. coli") # get the direct url to the Catalogue of Life ## scientific reference -mo_ref("E. coli") # "Castellani and Chalmers, 1919" -mo_authors("E. coli") # "Castellani and Chalmers" +mo_ref("E. coli") # "Castellani et al., 1919" +mo_authors("E. coli") # "Castellani et al." mo_year("E. coli") # 1919 @@ -414,7 +422,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f mo_shortname("S. pyo", Lancefield = TRUE) # "GAS" ('Group A streptococci') -# Language support for German, Dutch, Spanish, Portuguese, Italian and French +# language support for German, Dutch, Spanish, Portuguese, Italian and French mo_gramstain("E. coli", language = "de") # "Gramnegativ" mo_gramstain("E. coli", language = "nl") # "Gram-negatief" mo_gramstain("E. coli", language = "es") # "Gram negativo" @@ -432,7 +440,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f language = "nl") # "Streptococcus groep A" -# Get a list with the complete taxonomy (kingdom to subspecies) +# get a list with the complete taxonomy (kingdom to subspecies) mo_taxonomy("E. coli") # }diff --git a/index.md b/index.md index 88ba4901..7ea44bbc 100644 --- a/index.md +++ b/index.md @@ -66,7 +66,7 @@ install.packages("AMR") It will be downloaded and installed automatically. For RStudio, click on the menu *Tools* > *Install Packages...* and then type in "AMR" and press Install. -*Note:* Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version. +**Note:** Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version. #### Latest development version @@ -92,9 +92,9 @@ Included are: * All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses -* All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. +* All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. - The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of *Aspergillus*, *Candida*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*). + The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of *Aspergillus*, *Candida*, *Cryptococcus*, *Histoplasma*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*). * All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed diff --git a/man/as.mo.Rd b/man/as.mo.Rd index 99e3005e..0f7bce55 100644 --- a/man/as.mo.Rd +++ b/man/as.mo.Rd @@ -118,7 +118,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/catalogue_of_life.Rd b/man/catalogue_of_life.Rd index 91b8fb96..a581c157 100644 --- a/man/catalogue_of_life.Rd +++ b/man/catalogue_of_life.Rd @@ -14,7 +14,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/catalogue_of_life_version.Rd b/man/catalogue_of_life_version.Rd index 992798ed..3e962783 100644 --- a/man/catalogue_of_life_version.Rd +++ b/man/catalogue_of_life_version.Rd @@ -17,7 +17,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/microorganisms.Rd b/man/microorganisms.Rd index 223f2795..86b59567 100755 --- a/man/microorganisms.Rd +++ b/man/microorganisms.Rd @@ -4,7 +4,7 @@ \name{microorganisms} \alias{microorganisms} \title{Data set with ~60,000 microorganisms} -\format{A \code{\link{data.frame}} with 56,672 observations and 14 variables: +\format{A \code{\link{data.frame}} with 57,158 observations and 14 variables: \describe{ \item{\code{mo}}{ID of microorganism as used by this package} \item{\code{col_id}}{Catalogue of Life ID} @@ -46,7 +46,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/microorganisms.codes.Rd b/man/microorganisms.codes.Rd index ff79772b..0d8f97cf 100644 --- a/man/microorganisms.codes.Rd +++ b/man/microorganisms.codes.Rd @@ -23,7 +23,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/microorganisms.old.Rd b/man/microorganisms.old.Rd index 9901406d..5336ea64 100644 --- a/man/microorganisms.old.Rd +++ b/man/microorganisms.old.Rd @@ -4,7 +4,7 @@ \name{microorganisms.old} \alias{microorganisms.old} \title{Data set with previously accepted taxonomic names} -\format{A \code{\link{data.frame}} with 14,506 observations and 4 variables: +\format{A \code{\link{data.frame}} with 14,487 observations and 4 variables: \describe{ \item{\code{col_id}}{Catalogue of Life ID} \item{\code{tsn_new}}{New Catalogue of Life ID} @@ -28,7 +28,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} diff --git a/man/mo_property.Rd b/man/mo_property.Rd index db0e42de..216424d8 100644 --- a/man/mo_property.Rd +++ b/man/mo_property.Rd @@ -17,6 +17,7 @@ \alias{mo_ref} \alias{mo_authors} \alias{mo_year} +\alias{mo_rank} \alias{mo_taxonomy} \alias{mo_url} \title{Property of a microorganism} @@ -51,9 +52,11 @@ mo_authors(x, ...) mo_year(x, ...) +mo_rank(x, ...) + mo_taxonomy(x, ...) -mo_url(x, ...) +mo_url(x, open = FALSE, ...) mo_property(x, property = "fullname", language = get_locale(), ...) } @@ -64,12 +67,15 @@ mo_property(x, property = "fullname", language = get_locale(), ...) \item{...}{other parameters passed on to \code{\link{as.mo}}} +\item{open}{browse the URL using \code{\link[utils]{browseURL}}} + \item{property}{one of the column names of one of the \code{\link{microorganisms}} data set or \code{"shortname"}} } \value{ \itemize{ \item{An \code{integer} in case of \code{mo_year}} \item{A \code{list} in case of \code{mo_taxonomy}} + \item{A named \code{character} in case of \code{mo_url}} \item{A \code{character} in all other cases} } } @@ -101,7 +107,7 @@ This package contains the complete taxonomic tree of almost all microorganisms f Included are: \itemize{ \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses} - \item{All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant (sub)species are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} + \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).} \item{All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed} \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies} \item{The responsible author(s) and year of scientific publication} @@ -127,8 +133,7 @@ On our website \url{https://msberends.gitlab.io/AMR} you can find \href{https:// } \examples{ -# All properties of Escherichia coli -## taxonomic properties +## taxonomic tree mo_kingdom("E. coli") # "Bacteria" mo_phylum("E. coli") # "Proteobacteria" mo_class("E. coli") # "Gammaproteobacteria" @@ -145,10 +150,12 @@ mo_shortname("E. coli") # "E. coli" ## other properties mo_gramstain("E. coli") # "Gram negative" mo_type("E. coli") # "Bacteria" (equal to kingdom) +mo_rank("E. coli") # "species" +mo_url("E. coli") # get the direct url to the Catalogue of Life ## scientific reference -mo_ref("E. coli") # "Castellani and Chalmers, 1919" -mo_authors("E. coli") # "Castellani and Chalmers" +mo_ref("E. coli") # "Castellani et al., 1919" +mo_authors("E. coli") # "Castellani et al." mo_year("E. coli") # 1919 @@ -184,7 +191,7 @@ mo_shortname("S. pyo") # "S. pyogenes" mo_shortname("S. pyo", Lancefield = TRUE) # "GAS" ('Group A streptococci') -# Language support for German, Dutch, Spanish, Portuguese, Italian and French +# language support for German, Dutch, Spanish, Portuguese, Italian and French mo_gramstain("E. coli", language = "de") # "Gramnegativ" mo_gramstain("E. coli", language = "nl") # "Gram-negatief" mo_gramstain("E. coli", language = "es") # "Gram negativo" @@ -202,7 +209,7 @@ mo_fullname("S. pyogenes", language = "nl") # "Streptococcus groep A" -# Get a list with the complete taxonomy (kingdom to subspecies) +# get a list with the complete taxonomy (kingdom to subspecies) mo_taxonomy("E. coli") } \seealso{ diff --git a/reproduction_of_microorganisms.R b/reproduction_of_microorganisms.R index e95a1a3d..3722ca09 100644 --- a/reproduction_of_microorganisms.R +++ b/reproduction_of_microorganisms.R @@ -2,8 +2,8 @@ # Data retrieved from Encyclopaedia of Life: # https://opendata.eol.org/dataset/catalogue-of-life/ -# unzip and extract taxon.tab, then: -taxon <- data.table::fread("taxon.tab") +# unzip and extract taxon.tab (around 1.5 GB), then: +taxon <- data.table::fread("Downloads/taxon.tab") # result is over 3.7M rows: library(dplyr) library(AMR) @@ -29,25 +29,45 @@ MOs <- taxon %>% !taxonRank %in% c("kingdom", "phylum", "superfamily", "class", "order", "family"), # not all fungi: Aspergillus, Candida, Trichphyton and Pneumocystis are the most important, # so only keep these orders from the fungi: - !(kingdom == "Fungi" & !order %in% c("Eurotiales", "Saccharomycetales", "Schizosaccharomycetales", "Onygenales", "Pneumocystales"))) %>% + !(kingdom == "Fungi" & !order %in% c("Eurotiales", "Saccharomycetales", "Schizosaccharomycetales", "Tremellales", "Onygenales", "Pneumocystales"))) %>% # remove text if it contains 'Not assigned' like phylum in viruses mutate_all(funs(gsub("Not assigned", "", .))) %>% - # only latest ref, not original authors - mutate(scientificNameAuthorship = trimws(gsub(".*[)] ", "", scientificNameAuthorship)), - scientificNameAuthorship = ifelse(grepl(" emend[. ]", scientificNameAuthorship, ignore.case = TRUE), - gsub("(.*)emend[. ]+(.*)", "\\2", scientificNameAuthorship, ignore.case = TRUE), - scientificNameAuthorship), - scientificNameAuthorship = gsub(".", "", scientificNameAuthorship, fixed = TRUE), - scientificNameAuthorship = gsub(",? et al", " et al.", scientificNameAuthorship, fixed = FALSE, ignore.case = TRUE), - scientificNameAuthorship = gsub("[()]", "", scientificNameAuthorship), - # year always preceded by comma - scientificNameAuthorship = gsub(" ([0-9]{4})$", ", \\1", scientificNameAuthorship), - scientificNameAuthorship = gsub(",,", ",", scientificNameAuthorship, fixed = TRUE), - # only first author with *et al.* - scientificNameAuthorship = gsub(",.*,", " et al.,", scientificNameAuthorship), - scientificNameAuthorship = gsub(" (and|&) .*,", " et al.,", scientificNameAuthorship), - scientificNameAuthorship = gsub(", [^0-9]+", ", ", scientificNameAuthorship), - scientificNameAuthorship = gsub(", $", "", scientificNameAuthorship) + # Transform 'Smith, Jones, 2011' to 'Smith et al., 2011': + mutate(authors2 = iconv(scientificNameAuthorship, from = "UTF-8", to = "ASCII//TRANSLIT"), + # remove leading and trailing brackets + authors2 = gsub("^[(](.*)[)]$", "\\1", authors2), + # only take part after brackets if there's a name + authors2 = ifelse(grepl(".*[)] [a-zA-Z]+.*", authors2), + gsub(".*[)] (.*)", "\\1", authors2), + authors2), + # get year from last 4 digits + lastyear = as.integer(gsub(".*([0-9]{4})$", "\\1", authors2)), + # can never be later than now + lastyear = ifelse(lastyear > as.integer(format(Sys.Date(), "%Y")), + NA, + lastyear), + # get authors without last year + authors = gsub("(.*)[0-9]{4}$", "\\1", authors2), + # remove nonsense characters from names + authors = gsub("[^a-zA-Z,'& -]", "", authors), + # remove trailing and leading spaces + authors = trimws(authors), + # only keep first author and replace all others by 'et al' + authors = gsub("(,| and| &| ex| emend\\.?) .*", " et al.", authors), + # et al. always with ending dot + authors = gsub(" et al\\.?", " et al.", authors), + authors = gsub(" ?,$", "", authors), + # don't start with 'sensu' or 'ehrenb' + authors = gsub("^(sensu|Ehrenb.?) ", "", authors, ignore.case = TRUE), + # no initials, only surname + authors = gsub("^([A-Z]+ )+", "", authors, ignore.case = FALSE), + # combine author and year if year is available + ref = ifelse(!is.na(lastyear), + paste0(authors, ", ", lastyear), + authors), + # fix beginning and ending + ref = gsub(", $", "", ref), + ref = gsub("^, ", "", ref) ) # remove non-ASCII characters (not allowed by CRAN) @@ -58,7 +78,7 @@ MOs <- MOs %>% # split old taxonomic names - they refer to a new `taxonID` with `acceptedNameUsageID` MOs.old <- MOs %>% filter(!is.na(acceptedNameUsageID), - scientificNameAuthorship != "") %>% + ref != "") %>% transmute(col_id = taxonID, col_id_new = acceptedNameUsageID, fullname = @@ -66,9 +86,9 @@ MOs.old <- MOs %>% gsub("(.*)[(].*", "\\1", stringr::str_replace( string = scientificName, - pattern = stringr::fixed(scientificNameAuthorship), + pattern = stringr::fixed(ref), replacement = ""))), - ref = scientificNameAuthorship) %>% + ref = ref) %>% filter(!is.na(fullname)) %>% distinct(fullname, .keep_all = TRUE) %>% arrange(col_id) @@ -88,7 +108,7 @@ MOs <- MOs %>% species = specificEpithet, subspecies = infraspecificEpithet, rank = taxonRank, - ref = scientificNameAuthorship, + ref = ref, species_id = gsub(".*/([a-f0-9]+)", "\\1", furtherInformationURL)) %>% distinct(fullname, .keep_all = TRUE) %>% filter(!grepl("unassigned", fullname, ignore.case = TRUE)) @@ -254,6 +274,7 @@ class(MOs$mo) <- "mo" saveRDS(MOs, "microorganisms.rds") saveRDS(MOs.old, "microorganisms.old.rds") + # on the server: # usethis::use_data(microorganisms, overwrite = TRUE) # usethis::use_data(microorganisms.old, overwrite = TRUE) diff --git a/vignettes/benchmarks.Rmd b/vignettes/benchmarks.Rmd index 8dc398f8..7fea7540 100755 --- a/vignettes/benchmarks.Rmd +++ b/vignettes/benchmarks.Rmd @@ -48,12 +48,11 @@ S.aureus <- microbenchmark(as.mo("sau"), as.mo("S. aureus"), as.mo("STAAUR"), as.mo("Staphylococcus aureus"), - as.mo("B_STPHY_AUR"), times = 10) -print(S.aureus, unit = "ms", signif = 2) +print(S.aureus, unit = "ms", signif = 3) ``` -In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 10 milliseconds means it can determine 100 input values per second. It case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of `as.mo("B_STPHY_AUR")`, the input is already a valid MO code, so it only almost takes no time at all (`r as.integer(S.aureus %>% filter(expr == 'as.mo("B_STPHY_AUR")') %>% pull(time) %>% median(na.rm = TRUE) / 1000)` millionths of a second). +In the table above, all measurements are shown in milliseconds (thousands of seconds). A value of 10 milliseconds means it can determine 100 input values per second. It case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. To achieve this speed, the `as.mo` function also takes into account the prevalence of human pathogenic microorganisms. The downside is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of *Mycoplasma leonicaptivi* (`B_MYCPL_LEO`), a bug probably never found before in humans: @@ -64,26 +63,30 @@ M.leonicaptivi <- microbenchmark(as.mo("myle"), as.mo("M. leonicaptivi"), as.mo("MYCLEO"), as.mo("Mycoplasma leonicaptivi"), - as.mo("B_MYCPL_LEO"), times = 10) -print(M.leonicaptivi, unit = "ms", signif = 2) +print(M.leonicaptivi, unit = "ms", signif = 3) ``` -That takes `r round(mean(M.leonicaptivi$time, na.rm = TRUE) / mean(S.aureus$time, na.rm = TRUE), 1)` times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance: +That takes `r round(mean(M.leonicaptivi$time, na.rm = TRUE) / mean(S.aureus$time, na.rm = TRUE), 1)` times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance. + +In the figure below, we compare *Escherichia coli* (which is very common) with *Prevotella brevis* (which is moderately common) and with *Mycoplasma leonicaptivi* (which is very uncommon): ```{r} par(mar = c(5, 16, 4, 2)) # set more space for left margin text (16) -# highest value on y axis -max_y_axis <- max(S.aureus$time, M.leonicaptivi$time, na.rm = TRUE) / 1e6 - -boxplot(S.aureus, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis), - main = expression(paste("Benchmark of ", italic("Staphylococcus aureus")))) -boxplot(M.leonicaptivi, horizontal = TRUE, las = 1, unit = "ms", log = FALSE, xlab = "", ylim = c(0, max_y_axis), - main = expression(paste("Benchmark of ", italic("Mycoplasma leonicaptivi")))) +boxplot(microbenchmark(as.mo("M. leonicaptivi"), + as.mo("Mycoplasma leonicaptivi"), + as.mo("P. brevis"), + as.mo("Prevotella brevis"), + as.mo("E. coli"), + as.mo("Escherichia coli"), + times = 50), + horizontal = TRUE, las = 1, unit = "s", log = FALSE, + xlab = "", ylab = "Time in seconds", + main = "Benchmarks per prevalence") ``` -To relieve this pitfall and further improve performance, two important calculations take almost no time at all: **repetitive results** and **already precalculated results**. +Uncommon microorganisms take a lot more time than common microorganisms. To relieve this pitfall and further improve performance, two important calculations take almost no time at all: **repetitive results** and **already precalculated results**. ### Repetitive results