(v1.3.0.9005) website update

2025-08-24 13:12:09 +02:00 · 2020-08-17 21:49:58 +02:00
parent dab017a50f
commit 818d0441e0
54 changed files with 13094 additions and 424 deletions
--- a/vignettes/EUCAST.Rmd
+++ b/vignettes/EUCAST.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "How to apply EUCAST rules"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output:
  rmarkdown::html_vignette:
    toc: true
--- a/vignettes/MDR.Rmd
+++ b/vignettes/MDR.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "How to determine multi-drug resistance (MDR)"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output: 
  rmarkdown::html_vignette:
    toc: true
--- a/vignettes/PCA.Rmd
+++ b/vignettes/PCA.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "How to conduct principal component analysis (PCA) for AMR"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output: 
  rmarkdown::html_vignette:
    toc: true
--- a/vignettes/SPSS.Rmd
+++ b/vignettes/SPSS.Rmd
@@ -33,7 +33,7 @@ As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major

 * **R is highly modular.**

-  The [official R network (CRAN)](https://cran.r-project.org/) features almost 14,000 packages at the time of writing, our `AMR` package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitHub. So there may even be a lot more than 14,000 packages out there.
+  The [official R network (CRAN)](https://cran.r-project.org/) features more than 16,000 packages at the time of writing, our `AMR` package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep their packages in their own public repository, like GitHub. So there may even be a lot more than 16,000 packages out there.
  
  Bottom line is, you can really extend it yourself or ask somebody to do this for you. Take for example our `AMR` package. Among other things, it adds reliable reference data to R to help you with the data cleaning and analysis. SPSS, SAS and Stata will never know what a valid MIC value is or what the Gram stain of *E. coli* is. Or that all species of *Klebsiella* are resistant to amoxicillin and that Floxapen^&reg;^ is a trade name of flucloxacillin. These facts and properties are often needed to clean existing data, which would be very inconvenient in a software package without reliable reference data. See below for a demonstration.
  
@@ -49,7 +49,7 @@ As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major
  
 * **R has a huge community.**

-  Many R users just ask questions on websites like [StackOverflow.com](https://stackoverflow.com), the largest online community for programmers. At the time of writing, more than [300,000 R-related questions](https://stackoverflow.com/questions/tagged/r?sort=votes) have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
+  Many R users just ask questions on websites like [StackOverflow.com](https://stackoverflow.com), the largest online community for programmers. At the time of writing, more than [360,000 R-related questions](https://stackoverflow.com/questions/tagged/r?sort=votes) have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
  
 * **R understands any data type, including SPSS/SAS/Stata.**

--- a/vignettes/WHONET.Rmd
+++ b/vignettes/WHONET.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "How to work with WHONET data"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output:
  rmarkdown::html_vignette:
    toc: true
--- a/vignettes/benchmarks.Rmd
+++ b/vignettes/benchmarks.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "Benchmarks"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output: 
  rmarkdown::html_vignette:
    toc: true
@@ -88,35 +86,7 @@ ggplot.bm(S.aureus)

 In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In case of 100 milliseconds, this is only 10 input values per second.

-To achieve this speed, the `as.mo` function also takes into account the prevalence of human pathogenic microorganisms. The downside of this is of course that less prevalent microorganisms will be determined less fast. See this example for the ID of *Methanosarcina semesiae* (`B_MTHNSR_SEMS`), a bug probably never found before in humans:
-
-```{r, warning=FALSE}
-M.semesiae <- microbenchmark(as.mo("metsem"),
-                             as.mo("METSEM"),
-                             as.mo("M. semesiae"),
-                             as.mo("M.  semesiae"),
-                             as.mo("Methanosarcina semesiae"),
-                             times = 10)
-print(M.semesiae, unit = "ms", signif = 4)
-```
-
-Looking up arbitrary codes of less prevalent microorganisms costs the most time. Full names (like *Methanosarcina semesiae*) are always very fast and only take some thousands of seconds to coerce - they are the most probable input from most data sets.
-
-In the figure below, we compare *Escherichia coli* (which is very common) with *Prevotella brevis* (which is moderately common) and with *Methanosarcina semesiae* (which is uncommon):
-
-```{r, echo = FALSE, fig.width=12}
-par(mar = c(5, 16, 4, 2))
-boxplot(microbenchmark(
-  as.mo("Meth. semesiae"),
-  as.mo("Prev. brevis"),
-  as.mo("Esc. coli"),
-  times = 100),
-        horizontal = TRUE, las = 1, unit = "s", log = TRUE,
-        xlab = "", ylab = "Time in seconds (log)",
-        main = "Benchmarks per prevalence")
-```
-
-Uncommon microorganisms take some more time than common microorganisms. To further improve performance, two important calculations take almost no time at all: **repetitive results** and **already precalculated results**.
+To improve performance, two important calculations take almost no time at all: **repetitive results** and **already precalculated results**.

 ### Repetitive results

--- a/vignettes/datasets.Rmd
+++ b/vignettes/datasets.Rmd
@@ -1,11 +1,11 @@
 ---
-title: "Data sets for download"
+title: "Data sets for download / own use"
 output: 
  rmarkdown::html_vignette:
    toc: true
-    toc_depth: 3
+    toc_depth: 2
 vignette: >
-  %\VignetteIndexEntry{Data sets for download}
+  %\VignetteIndexEntry{Data sets for download / own use}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
 editor_options: 
@@ -24,25 +24,25 @@ options(knitr.kable.NA = '')

 file_size <- function(...) {
  size_kb <- file.size(...) / 1024
-  if (size_kb > 500) {
-    paste(round(size_kb / 1024, 1), "MB")
+  if (size_kb < 100) {
+    paste(round(size_kb, 0), "kB")
  } else {
-    paste(round(size_kb, 1), "kB")
+    paste(round(size_kb / 1024, 1), "MB")
  }
 }

 structure_txt <- function(dataset) {
  paste0("A data set with ",
         format(nrow(dataset), big.mark = ","), " rows and ", 
-         ncol(dataset), " columns, containing the following column names:\n\n*",
+         ncol(dataset), " columns, containing the following column names:  \n*",
         paste0(colnames(dataset), collapse = ", "), "*.")
 }

 download_txt <- function(filename) {
-    msg <- paste0("Download the data set preferably in the software you use, so the data file already has the correct data structure. Below files were updated on ", 
-                trimws(format(file.mtime(paste0("../data/", filename, ".rda")), "%e %B %Y %H:%M:%S %Z")), ".")
-                github_base <- "https://github.com/msberends/AMR/raw/master/data-raw/"
-  gitlab_base <- "https://gitlab.com/msberends/AMR/-/raw/master/data-raw/"
+  msg <- paste0("It was last updated on ", 
+                trimws(format(file.mtime(paste0("../data/", filename, ".rda")), "%e %B %Y %H:%M:%S %Z")), 
+                ".\n\nDirect download links:  \n")
+  github_base <- "https://github.com/msberends/AMR/raw/master/data-raw/"
  filename <- paste0("../data-raw/", filename)
  txt <- paste0(filename, ".txt")
  rds <- paste0(filename, ".rds")
@@ -51,10 +51,7 @@ download_txt <- function(filename) {
  sas <- paste0(filename, ".dta")
  excel <- paste0(filename, ".xlsx")
  create_txt <- function(filename, type) {
-    paste0("* ", type, ": ",
-           "[from GitHub](", github_base, filename, "), ",
-           "[from GitLab](", gitlab_base, filename, ") ",
-           "(file size: ", file_size(filename), ")")
+    paste0("[", type, "](", github_base, filename, "), ", file_size(filename), " -- ")
  }

  if (file.exists(rds)) msg <- c(msg, create_txt(rds, "R file (.rds)"))
@@ -63,7 +60,8 @@ download_txt <- function(filename) {
  if (file.exists(stata)) msg <- c(msg, create_txt(stata, "Stata file (.dta)"))
  if (file.exists(sas)) msg <- c(msg, create_txt(sas, "SAS file (.sas)"))
  if (file.exists(txt)) msg <- c(msg, create_txt(txt, "Tab separated file (.txt)"))
-  paste0(msg, collapse = "\n\n")
+  msg[length(msg)] <- gsub(" --", ".", msg[length(msg)], fixed = TRUE)
+  paste0(msg, collapse = "")
 }

 library(AMR)
@@ -92,28 +90,30 @@ print_df <- function(x) {

 ```

-This package contains a lot of reference data sets that are all reliable, up-to-date and free to download. You can even use them outside of R, for example to train your laboratory information system (LIS) about intrinsic resistance! 
+This package contains a lot of reference data sets that are all reliable, up-to-date and free to download. You can even use them outside of R, for example to teach your laboratory information system (LIS) about intrinsic resistance! 

-We included them in our `AMR` package, but also automatically 'mirror' them to our public repository in different software formats. On this page, we explain how to download them and how the structure of the data sets look like. The tab separated files **allow for machine reading taxonomic data and EUCAST and CLSI interpretation guidelines**, which is almost impossible with the Excel and PDF files distributed by EUCAST and CLSI.
+We included them in our `AMR` package, but also automatically 'mirror' them to our public repository in different software formats. On this page, we explain how to download them and what the structure of the data sets looks like. The tab separated files **allow for machine reading taxonomic data and EUCAST and CLSI interpretation guidelines**, which is almost impossible with the Excel and PDF files distributed by EUCAST and CLSI. We also offer all data sets in formats for R, SPSS, SAS, Stata and Excel.

 *Note: Years and dates of updates mentioned on this page are from `AMR` package version `r utils::packageVersion("AMR")`, released online on `r format(utils::packageDate("AMR"), "%e %B %Y")`. **If you are reading this page from within R, please [visit our website](https://msberends.github.io/AMR/articles/datasets.html) for the latest update.***

-## Microorganisms
+## Microorganisms (currently accepted names)

 This data set is in R available as `microorganisms`, after you load the `AMR` package.

-#### Source
+`r download_txt("microorganisms")`
+
+### Source

 Our full taxonomy of microorganisms is based on the authoritative and comprehensive:

 * [Catalogue of Life](http://www.catalogueoflife.org) (included version: `r AMR:::catalogue_of_life$year`)
 * [List of Prokaryotic names with Standing in Nomenclature](https://lpsn.dsmz.de) (LPSN, included version: `r AMR:::catalogue_of_life$yearmonth_DSMZ`)

-#### Structure
+### Structure

 `r structure_txt(microorganisms)`

-Included per taxonomic kingdom:
+Included (sub)species per taxonomic kingdom:

 ```{r, echo = FALSE}
 microorganisms %>% 
@@ -125,13 +125,6 @@ microorganisms %>%
  print_df()
 ```

-
-#### Download
-
-`r download_txt("microorganisms")`
-
-#### Example
-
 Example rows when filtering on genus *Escherichia*:

 ```{r, echo = FALSE}
@@ -140,11 +133,41 @@ microorganisms %>%
  print_df()
 ```

+## Microorganisms (previously accepted names)
+
+This data set is in R available as `microorganisms.old`, after you load the `AMR` package.
+
+`r download_txt("microorganisms.old")`
+
+### Source
+
+This data set contains old, previously accepted taxonomic names. The data sources are the same as the `microorganisms` data set:
+
+* [Catalogue of Life](http://www.catalogueoflife.org) (included version: `r AMR:::catalogue_of_life$year`)
+* [List of Prokaryotic names with Standing in Nomenclature](https://lpsn.dsmz.de) (LPSN, included version: `r AMR:::catalogue_of_life$yearmonth_DSMZ`)
+
+### Structure
+
+`r structure_txt(microorganisms.old)`
+
+**Note:** remember that the 'ref' column contains the scientific reference to the old taxonomic entries, i.e. of column *fullname*. For the scientific reference of the new names, i.e. of column *fullname_new*, see the `microorganisms` data set.
+
+Example rows when filtering on *Escherichia*:
+
+```{r, echo = FALSE}
+microorganisms.old %>%
+  filter(fullname %like% "^Escherichia") %>% 
+  print_df()
+```
+
+
 ## Antibiotic agents

 This data set is in R available as `antibiotics`, after you load the `AMR` package.

-#### Source
+`r download_txt("antibiotics")`
+
+### Source

 This data set contains all EARS-Net and ATC codes gathered from WHO and WHONET, and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.

@@ -152,16 +175,10 @@ This data set contains all EARS-Net and ATC codes gathered from WHO and WHONET,
 * [PubChem by the US National Library of Medicine](https://pubchem.ncbi.nlm.nih.gov)
 * [WHONET software 2019](https://whonet.org)

-#### Structure
+### Structure

 `r structure_txt(antibiotics)`

-#### Download
-
-`r download_txt("antibiotics")`
-
-#### Example
-
 Example rows:

 ```{r, echo = FALSE}
@@ -175,23 +192,19 @@ antibiotics %>%

 This data set is in R available as `antivirals`, after you load the `AMR` package.

-#### Source
+`r download_txt("antivirals")`
+
+### Source

 This data set contains all ATC codes gathered from WHO and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.

 * [ATC/DDD index from WHO Collaborating Centre for Drug Statistics Methodology](https://www.whocc.no/atc_ddd_index/) (note: this may not be used for commercial purposes, but is freely available from the WHO CC website for personal use)
 * [PubChem by the US National Library of Medicine](https://pubchem.ncbi.nlm.nih.gov)

-#### Structure
+### Structure

 `r structure_txt(antivirals)`

-#### Download
-
-`r download_txt("antivirals")`
-
-#### Example
-
 Example rows:

 ```{r, echo = FALSE}
@@ -204,23 +217,19 @@ antivirals %>%

 This data set is in R available as `intrinsic_resistant`, after you load the `AMR` package.

-#### Source
+`r download_txt("intrinsic_resistant")`
+
+### Source

 This data set contains all defined intrinsic resistance by EUCAST of all bug-drug combinations. 

 The data set is based on 'EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes', version `r AMR:::EUCAST_VERSION_EXPERT_RULES`.

-#### Structure
+### Structure

 `r structure_txt(intrinsic_resistant)`

-#### Download
-
-`r download_txt("intrinsic_resistant")`
-
-#### Example
-
-Example rows:
+Example rows when filtering on *Klebsiella*:

 ```{r, echo = FALSE}
 intrinsic_resistant %>%
@@ -233,20 +242,16 @@ intrinsic_resistant %>%

 This data set is in R available as `rsi_translation`, after you load the `AMR` package.

-#### Source
+`r download_txt("rsi_translation")`
+
+### Source

 This data set contains interpretation rules for MIC values and disk diffusion diameters. Included guidelines are CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "CLSI")$guideline)))`) and EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "EUCAST")$guideline)))`).

-#### Structure
+### Structure

 `r structure_txt(rsi_translation)`

-#### Download
-
-`r download_txt("rsi_translation")`
-
-#### Example
-
 Example rows:

 ```{r, echo = FALSE}
--- a/vignettes/resistance_predict.Rmd
+++ b/vignettes/resistance_predict.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "How to predict antimicrobial resistance"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output: 
  rmarkdown::html_vignette:
    toc: true
--- a/vignettes/welcome_to_AMR.Rmd
+++ b/vignettes/welcome_to_AMR.Rmd
@@ -1,7 +1,5 @@
 ---
 title: "Welcome to the AMR package"
-author: "Matthijs S. Berends"
-date: '`r format(Sys.Date(), "%d %B %Y")`'
 output: 
  rmarkdown::html_vignette:
    toc: true