diff --git a/.gitignore b/.gitignore
index e2371544..9a2c3cb6 100755
--- a/.gitignore
+++ b/.gitignore
@@ -21,3 +21,5 @@ vignettes/*.R
 packrat/lib*/
 packrat/src/
 cran-comments.md
+data-raw/taxon.tab
+data-raw/DSMZ_bactnames.xlsx
diff --git a/DESCRIPTION b/DESCRIPTION
index 7dc38214..9f79848f 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,5 +1,5 @@
 Package: AMR
-Version: 0.7.1.9032
+Version: 0.7.1.9033
 Date: 2019-08-09
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
diff --git a/NEWS.md b/NEWS.md
index 75a89294..6f15c214 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 0.7.1.9032
+# AMR 0.7.1.9033
 
 ### Breaking
 * Function `freq()` has moved to a new package, [`clean`](https://github.com/msberends/clean) ([CRAN link](https://cran.r-project.org/package=clean)). Creating frequency tables is actually not the scope of this package (never was) and this function has matured a lot over the last two years. Therefore, a new package was created for data cleaning and checking and it perfectly fits the `freq()` function. The [`clean`](https://github.com/msberends/clean) package is available on CRAN and will be installed automatically when updating the `AMR` package, that now imports it. In a later stage, the `skewness()` and `kurtosis()` functions will be moved to the `clean` package too.
@@ -60,7 +60,8 @@
 * Some misspelled input were not understood
 * These new trivial names known to the field are now understood: meningococcus, gonococcus, pneumococcus
 * Added support for unknown yeasts and fungi
-* Added the newest taxonomic data from the IJSEM journal (now up to date until August 2019)
+* Updated the `microorganisms` data set to contain the latest taxonomic data from the IJSEM journal (now up to date until August 2019)
+* Added almost 5,000 new fungi to the `microorganisms` data set
 * Fix for using `mo_*` functions where the coercion uncertainties and failures would not be available through `mo_uncertainties()` and `mo_failures()` anymore
 * Deprecated the `country` parameter of `mdro()` in favour of the already existing `guideline` parameter to support multiple guidelines within one country
 * The `name` of `RIF` is now Rifampicin instead of Rifampin
diff --git a/R/catalogue_of_life.R b/R/catalogue_of_life.R
index 0ec29f8a..5c92d9e1 100755
--- a/R/catalogue_of_life.R
+++ b/R/catalogue_of_life.R
@@ -30,10 +30,10 @@
 #' @section Included taxa:
 #' Included are:
 #' \itemize{
-#' \item{All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa}
-#' \item{All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).}
-#' \item{All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like \emph{Strongyloides} and \emph{Taenia})}
-#' \item{All ~21,000 previously accepted names of included (sub)species that have been taxonomically renamed}
+#' \item{All ~61,000 (sub)species from the kingdoms of Archaea, Bacteria, Chromista and Protozoa}
+#' \item{All ~8,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Microascales, Mucorales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).}
+#' \item{All ~150 (sub)species from ~100 other relevant genera from the kingdom of Animalia (like \emph{Strongyloides} and \emph{Taenia})}
+#' \item{All ~23,000 previously accepted names of all included (sub)species (these were taxonomically renamed)}
 #' \item{The complete taxonomic tree of all included (sub)species: from kingdom to subspecies}
 #' \item{The responsible author(s) and year of scientific publication}
 #' }
diff --git a/R/data.R b/R/data.R
index 1974ab19..4bc5bc77 100755
--- a/R/data.R
+++ b/R/data.R
@@ -51,11 +51,11 @@
 #' @seealso \code{\link{microorganisms}}
 "antibiotics"
 
-#' Data set with ~65,000 microorganisms
+#' Data set with ~70,000 microorganisms
 #'
 #' A data set containing the microbial taxonomy of six kingdoms from the Catalogue of Life. MO codes can be looked up using \code{\link{as.mo}}.
 #' @inheritSection catalogue_of_life Catalogue of Life
-#' @format A \code{\link{data.frame}} with 68,260 observations and 16 variables:
+#' @format A \code{\link{data.frame}} with 69,854 observations and 16 variables:
 #' \describe{
 #' \item{\code{mo}}{ID of microorganism as used by this package}
 #' \item{\code{col_id}}{Catalogue of Life ID}
@@ -98,7 +98,7 @@ catalogue_of_life <- list(
 #'
 #' A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by \code{\link{as.mo}}.
#' @inheritSection catalogue_of_life Catalogue of Life -#' @format A \code{\link{data.frame}} with 21,743 observations and 4 variables: +#' @format A \code{\link{data.frame}} with 22,932 observations and 4 variables: #' \describe{ #' \item{\code{col_id}}{Catalogue of Life ID that was originally given} #' \item{\code{col_id_new}}{New Catalogue of Life ID that responds to an entry in the \code{\link{microorganisms}} data set} diff --git a/R/ggplot_rsi.R b/R/ggplot_rsi.R index 40b43ccb..c54050e3 100755 --- a/R/ggplot_rsi.R +++ b/R/ggplot_rsi.R @@ -182,8 +182,8 @@ ggplot_rsi <- function(data, title = NULL, subtitle = NULL, caption = NULL, - x.title = NULL, - y.title = NULL, + x.title = "Antimicrobial", + y.title = "Proportion", ...) { stopifnot_installed_package("ggplot2") diff --git a/data-raw/reproduction_of_microorganisms.R b/data-raw/reproduction_of_microorganisms.R index 79b18291..1b639428 100644 --- a/data-raw/reproduction_of_microorganisms.R +++ b/data-raw/reproduction_of_microorganisms.R @@ -11,10 +11,10 @@ library(dplyr) library(AMR) # unzip and extract taxon.tab (around 1.5 GB) from the CoL archive, then: -data_col <- data.table::fread("Downloads/taxon.tab") +data_col <- data.table::fread("data-raw/taxon.tab") # read the xlsx file from DSMZ (only around 2.5 MB): -data_dsmz <- readxl::read_xlsx("Downloads/DSMZ_bactnames.xlsx") +data_dsmz <- readxl::read_xlsx("data-raw/DSMZ_bactnames.xlsx") # the CoL data is over 3.7M rows: data_col %>% freq(kingdom) @@ -99,23 +99,25 @@ MOs <- data_total %>% # and not all fungi: Aspergillus, Candida, Trichphyton and Pneumocystis are the most important, # so only keep these orders from the fungi: & !(kingdom == "Fungi" - & !order %in% c("Eurotiales", "Mucorales", "Saccharomycetales", "Schizosaccharomycetales", "Tremellales", "Onygenales", "Pneumocystales")) + & !order %in% c("Eurotiales", "Microascales", "Mucorales", "Saccharomycetales", "Schizosaccharomycetales", "Tremellales", "Onygenales", "Pneumocystales")) ) # or the genus 
has to be one of the genera we found in our hospitals last decades (Northern Netherlands, 2002-2018) | genus %in% c("Absidia", "Acremonium", "Actinotignum", "Alternaria", "Anaerosalibacter", "Ancylostoma", "Anisakis", "Apophysomyces", "Arachnia", "Ascaris", "Aureobacterium", "Aureobasidium", "Balantidum", "Bilophilia", "Branhamella", "Brochontrix", "Brugia", "Calymmatobacterium", "Catabacter", "Cdc", "Chilomastix", "Chryseomonas", "Cladophialophora", "Cladosporium", "Clonorchis", "Cordylobia", "Curvularia", "Demodex", "Dermatobia", "Diphyllobothrium", "Dracunculus", "Echinococcus", - "Enterobius", "Euascomycetes", "Exophiala", "Fasciola", "Fusarium", "Hendersonula", "Hymenolepis", "Kloeckera", + "Enterobius", "Euascomycetes", "Exophiala", "Fasciola", "Fusarium", "Hendersonula", "Hymenolepis", "Hypomyces", "Kloeckera", "Koserella", "Larva", "Leishmania", "Lelliottia", "Loa", "Lumbricus", "Malassezia", "Metagonimus", "Molonomonas", - "Mucor", "Nattrassia", "Necator", "Novospingobium", "Onchocerca", "Opistorchis", "Paragonimus", "Paramyxovirus", + "Mucor", "Nattrassia", "Necator", "Nectria", "Novospingobium", "Onchocerca", "Opistorchis", "Paragonimus", "Paramyxovirus", "Pediculus", "Phoma", "Phthirus", "Pityrosporum", "Pseudallescheria", "Pulex", "Rhizomucor", "Rhizopus", "Rhodotorula", "Salinococcus", "Sanguibacteroides", "Schistosoma", "Scopulariopsis", "Scytalidium", "Sporobolomyces", "Stomatococcus", - "Strongyloides", "Syncephalastraceae", "Taenia", "Torulopsis", "Trichinella", "Trichobilharzia", "Trichomonas", + "Strongyloides", "Syncephalastraceae", "Taenia", "Torulopsis", "Trichinella", "Trichobilharzia", "Trichoderma", "Trichomonas", "Trichosporon", "Trichuris", "Trypanosoma", "Wuchereria") # or the taxonomic entry is old - the species was renamed | !is.na(col_id_new) - ) + ) %>% + # really no Plantae (e.g. 
Dracunculus exist both as worm and as plant) + filter(kingdom != "Plantae") # filter old taxonomic names so only the ones with an existing reference will be kept MOs <- MOs %>% @@ -222,7 +224,7 @@ MOs <- MOs %>% distinct(fullname, .keep_all = TRUE) # what characters are in the fullnames? -paste(unique(sort(unlist(strsplit(x = paste(MOs$fullname, collapse = ""), split = "")))), collapse = "") +table(sort(unlist(strsplit(x = paste(MOs$fullname, collapse = ""), split = "")))) # Add abbreviations so we can easily know which ones are which ones. # These will become valid and unique microbial IDs for the AMR package. @@ -522,12 +524,20 @@ MOs <- MOs %>% # everything distinct? sum(duplicated(MOs$mo)) +sum(duplicated(MOs$fullname)) colnames(MOs) # here we welcome the new ones: MOs %>% filter(!fullname %in% AMR::microorganisms$fullname) %>% View() # and the ones we lost: AMR::microorganisms %>% filter(!fullname %in% MOs$fullname) %>% View() +# and these IDs have changed: +MOs %>% + filter(fullname %in% AMR::microorganisms$fullname) %>% + left_join(AMR::microorganisms %>% select(mo, fullname), by = "fullname", suffix = c("_new", "_old")) %>% + filter(mo_new != mo_old) %>% + select(mo_old, mo_new, everything()) %>% + View() # set prevalence per species MOs <- MOs %>% diff --git a/data/microorganisms.old.rda b/data/microorganisms.old.rda index 674279f6..805070c3 100644 Binary files a/data/microorganisms.old.rda and b/data/microorganisms.old.rda differ diff --git a/data/microorganisms.rda b/data/microorganisms.rda index b5a2d375..871cb79e 100755 Binary files a/data/microorganisms.rda and b/data/microorganisms.rda differ diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html index f48faeb2..74853816 100644 --- a/docs/LICENSE-text.html +++ b/docs/LICENSE-text.html @@ -78,7 +78,7 @@
diff --git a/docs/articles/index.html b/docs/articles/index.html
index 9d070245..7b57cf45 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -78,7 +78,7 @@
diff --git a/docs/authors.html b/docs/authors.html
index 2f345f87..30b341f9 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -78,7 +78,7 @@
diff --git a/docs/index.html b/docs/index.html
index 99a91fb4..4daa6daf 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -42,7 +42,7 @@
@@ -189,10 +189,23 @@
(
AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any data format, including WHONET/EARS-Net data.
After installing this package, R knows ~65,000 microorganisms and ~450 antibiotics by name and code, and knows all about valid RSI and MIC values.
-Used to SPSS? Read our tutorial on how to import data from SPSS, SAS or Stata and learn in which ways R outclasses any of these statistical packages.
-We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen, the Netherlands, and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and is free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation. Read the full license here.
+AMR (for R)?
AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. Since its first public release in early 2018, this package has been downloaded over 20,000 times from more than 40 countries (source: CRAN logs, 2019).
After installing this package, R knows ~70,000 microorganisms (distinct microbial species) and ~450 antibiotics by name and code, and knows all about valid RSI and MIC values. It supports any data format, including WHONET/EARS-Net data.
+We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen, the Netherlands, and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and is free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation. Read the full license here.
+Used to SPSS? Read our tutorial on how to import data from SPSS, SAS or Stata.
+ +This package can be used for:
To find out how to conduct AMR analysis, please continue reading here to get started or click the links in the ‘How to’ menu.
This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life (www.catalogueoflife.org).
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa
All ~61,000 (sub)species from the kingdoms of Archaea, Bacteria, Chromista and Protozoa
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales.
+All ~8,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Microascales, Mucorales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales.
The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histoplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like Strongyloides and Taenia)
All ~21,000 previously accepted names of included (sub)species that have been taxonomically renamed
All ~150 (sub)species from ~100 other relevant genera from the kingdom of Animalia (like Strongyloides and Taenia)
All ~23,000 previously accepted names of all included (sub)species (these were taxonomically renamed)
The responsible author(s) and year of scientific publication
This data is updated annually - check the included version with catalogue_of_life_version().
mdro() (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
microorganisms contains the complete taxonomic tree of ~65,000 microorganisms. Furthermore, some colloquial names and all Gram stains are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus(), mo_family(), mo_gramstain() or even mo_phylum(). As they use as.mo() internally, they also use the same intelligent rules for determination. For example, mo_genus("MRSA") and mo_genus("S. aureus") will both return "Staphylococcus". They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.
microorganisms contains the complete taxonomic tree of ~70,000 microorganisms. Furthermore, some colloquial names and all Gram stains are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus(), mo_family(), mo_gramstain() or even mo_phylum(). As they use as.mo() internally, they also use the same intelligent rules for determination. For example, mo_genus("MRSA") and mo_genus("S. aureus") will both return "Staphylococcus". They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.
antibiotics contains ~450 antimicrobial drugs with their EARS-Net code, ATC code, PubChem compound ID, official name, common LIS codes and DDDs of both oral and parenteral administration. It also contains all (thousands of) trade names found in PubChem. The function ab_atc() will return the ATC code of an antibiotic as defined by the WHO. Use functions like ab_name(), ab_group() and ab_tradenames() to look up values. The ab_* functions use as.ab() internally so they support the same intelligent rules to guess the most probable result. For example, ab_name("Fluclox"), ab_name("Floxapen") and ab_name("J01CF05") will all return "Flucloxacillin". These functions can again be used to add new variables to your data.
Updated the microorganisms data set to contain the latest taxonomic data from the IJSEM journal (now up to date until August 2019)
Added almost 5,000 new fungi to the microorganisms data set
Fix for using mo_* functions where the coercion uncertainties and failures would not be available through mo_uncertainties() and mo_failures() anymore
Deprecated the country parameter of mdro() in favour of the already existing guideline parameter to support multiple guidelines within one country
The name of RIF is now Rifampicin instead of Rifampin
as.mo(..., allow_uncertain = 3)
Contents
Included are:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa
All ~3,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like Strongyloides and Taenia)
All ~21,000 previously accepted names of included (sub)species that have been taxonomically renamed
All ~61,000 (sub)species from the kingdoms of Archaea, Bacteria, Chromista and Protozoa
All ~8,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Microascales, Mucorales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package and including everything would tremendously slow down our algorithms too. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~150 (sub)species from ~100 other relevant genera from the kingdom of Animalia (like Strongyloides and Taenia)
All ~23,000 previously accepted names of all included (sub)species (these were taxonomically renamed)
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
Data set with ~65,000 microorganisms
Data set with ~70,000 microorganisms
microorganisms.Rd
A data.frame with 68,260 observations and 16 variables:
A data.frame with 69,854 observations and 16 variables:
mo
ID of microorganism as used by this package
col_id
Catalogue of Life ID
fullname
Full name, like "Escherichia coli"
A data.frame with 21,743 observations and 4 variables:
A data.frame with 22,932 observations and 4 variables:
col_id
Catalogue of Life ID that was originally given
col_id_new
New Catalogue of Life ID that responds to an entry in the microorganisms data set
fullname
Old full taxonomic name of the microorganism