mirror of
https://github.com/msberends/AMR.git
synced 2025-01-23 21:44:35 +01:00
microorganisms update, added Salmonella groups
parent 8da2467209
commit 5f3a7694aa
@ -1,6 +1,6 @
Package: AMR
Version: 1.8.2.9062
Date: 2022-12-12
Version: 1.8.2.9063
Date: 2022-12-16
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by
109
NEWS.md
@ -1,57 +1,94 @
# AMR 1.8.2.9062
# AMR 1.8.2.9063

This version will eventually become v2.0! We're happy to reach a new major milestone soon!
*(this beta version will eventually become v2.0! We're happy to reach a new major milestone soon!)*

### Breaking
* **Removed all species of the taxonomic kingdom Chromista** from the package. This was done for multiple reasons:
  * CRAN allows packages to be around 5 MB maximum; some packages are exempted, but this package is not one of them
  * Chromista are not relevant when it comes to antimicrobial resistance, and thus fall outside the primary scope of this package
  * Chromista are almost never clinically relevant, and thus fall outside the secondary scope of this package
* The `microorganisms` data set no longer relies on the Catalogue of Life, but now primarily on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and is supplemented with the Global Biodiversity Information Facility (GBIF). The structure of this data set has changed to include separate LPSN and GBIF identifiers. Almost all previous MO codes were retained. It contains over 1,000 taxonomic names from 2022 already.
* **The `microorganisms.old` data set was removed**, and all previously accepted names are now included in the `microorganisms` data set. A new column `status` contains `"accepted"` for currently accepted names and `"synonym"` for taxonomic synonyms (currently invalid names). All previously accepted names now have a microorganisms ID and - if available - an LPSN, GBIF and SNOMED CT identifier.
* The MO matching score algorithm (`mo_matching_score()`) now counts deletions and substitutions as 2 instead of 1, which impacts the outcome of `as.mo()` and any `mo_*()` function
* **Argument `combine_IR` has been removed** from this package (affecting functions `count_df()`, `proportion_df()`, and `rsi_df()` and some plotting functions), since it was replaced with `combine_SI` three years ago
* Interpretation **guidelines older than 10 years were removed**; the oldest EUCAST and CLSI guidelines now included are from 2013
* Using `units` in `ab_ddd(..., units = "...")` had been deprecated and is no longer supported. Use `ab_ddd_units()` instead.
This is a new major release of the AMR package, with great new additions but also some breaking changes for current users. These are all listed below.

**TL;DR**

* Microbiological taxonomy (`microorganisms` data set) updated to 2022 and now based on LPSN and GBIF
* Much improved algorithms to translate user input to valid taxonomy
* Clinical breakpoints added for EUCAST 2022 and CLSI 2022
* 20 new antibiotics added, and all DDDs and ATC codes updated
* Extended support for antiviral agents (`antivirals` data set), with many new functions
* Now available in 16 languages
* Many new interesting functions, such as `rsi_confidence_interval()` and `mean_amr_distance()`
* Hundreds of bug fixes

### New
* **EUCAST 2022 and CLSI 2022 guidelines** have been added for `as.rsi()`. EUCAST 2022 (v12.0) is now the new default guideline for all MIC and disk diffusion interpretations, and for `eucast_rules()` to apply EUCAST Expert Rules.
* Support for the following languages: Chinese, Greek, Japanese, Polish, Turkish and Ukrainian. We are very grateful for the valuable input by our colleagues from other countries. The `AMR` package is now available in 16 languages. The automatic language determination will give a note on systems in supported languages.
* **All new algorithm for `as.mo()`** (and thus all `mo_*()` functions) while still following our original set-up as described in our recently published JSS paper (DOI [10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)).

#### Interpretation of MIC and disk diffusion values

EUCAST 2022 and CLSI 2022 guidelines have been added for `as.rsi()`. EUCAST 2022 (v12.0) is now the new default guideline for all MIC and disk diffusion interpretations, and for `eucast_rules()` to apply EUCAST Expert Rules. The default guideline (EUCAST) can now be changed with the new `AMR_guideline` option, such as: `options(AMR_guideline = "CLSI 2020")`.

Interpretation guidelines older than 10 years were removed; the oldest EUCAST and CLSI guidelines now included are from 2013.

#### Supported languages

We added support for the following languages: Chinese, Greek, Japanese, Polish, Turkish and Ukrainian. All antibiotic names are now available in these languages, and the AMR package will automatically determine a supported language based on the user's system language.

We are very grateful for the valuable input by our colleagues from other countries. The `AMR` package is now available in 16 languages and, according to download stats, used in almost all countries in the world!

#### Microbiological taxonomy
The `microorganisms` data set no longer relies on the Catalogue of Life, but on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and is supplemented with the 'backbone taxonomy' from the Global Biodiversity Information Facility (GBIF). The structure of this data set has changed to include separate LPSN and GBIF identifiers. Almost all previous MO codes were retained. It contains over 1,400 taxonomic names from 2022.

We also made the following changes regarding the included taxonomy or microorganisms functions:
* Updated full microbiological taxonomy according to the latest daily LPSN data set (December 2022) and latest yearly GBIF taxonomy backbone (November 2022)
* Support for all 1,515 city-like serovars of *Salmonella*, such as *Salmonella* Goldcoast. Formally, these are serovars belonging to the *S. enterica* species, but they are reported with only the name of the genus and the city. For this reason, the serovars are in the `subspecies` column of the `microorganisms` data set and "enterica" is in the `species` column, but the full name does not contain the species name (*enterica*).
* All new algorithm for `as.mo()` (and thus all `mo_*()` functions) while still following our original set-up as described in our recently published JSS paper (DOI [10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)).
  * A new argument `keep_synonyms` makes it possible to *not* correct for updated taxonomy, in favour of the now deleted argument `allow_uncertain`
  * It is tremendously faster and generally returns more consistent results
  * Sequential coercion is now extremely fast as results are stored in the package environment, although coercion of unknown values must be run once per session. Previous results can be reset/removed with the new `mo_reset_session()` function.
* Support for microorganism codes of the ASIan Antimicrobial Resistance Surveillance Network (ASIARS-Net)
* **Extensive support for antiviral agents!** For the first time, the `AMR` package has extensive support for antiviral drugs and for working with their names, codes and other data in any way.
  * The `antivirals` data set has been extended with 18 new drugs (also from the [new J05AJ ATC group](https://www.whocc.no/atc_ddd_index/?code=J05AJ&showdescription=no)) and now also contains antiviral identifiers and LOINC codes
  * A new data type `av` (*antivirals*) has been added, which is functionally similar to `ab` for antibiotics
  * Functions `as.av()`, `av_name()`, `av_atc()`, `av_synonyms()`, `av_from_text()` have all been added as siblings to their `ab_*()` equivalents
* **Other new functions!**
  * Function `rsi_confidence_interval()` to add confidence intervals in AMR calculation. This is also included in `rsi_df()` and `proportion_df()`
  * Function `mean_amr_distance()` to calculate the mean AMR distance. The mean AMR distance is a normalised numeric value to compare AMR test results and can help to identify similar isolates, without comparing antibiograms by hand.
  * Function `rsi_interpretation_history()` to view the history of previous runs of `as.rsi()`. This returns a 'logbook' with the selected guideline, reference table and specific interpretation of each row in a data set on which `as.rsi()` was run.
  * Function `mo_current()` to get the currently valid taxonomic name of a microorganism
  * Function `add_custom_antimicrobials()` to add custom antimicrobial codes and names to the `AMR` package
* New and updated entries for the `antibiotics` data set
* The following **20 antibiotics have been added** (also includes the [new J01RA ATC group](https://www.whocc.no/atc_ddd_index/?code=J01RA&showdescription=no)): azithromycin/fluconazole/secnidazole (AFC), cefepime/amikacin (CFA), cefixime/ornidazole (CEO), ceftriaxone/beta-lactamase inhibitor (CEB), ciprofloxacin/metronidazole (CIM), ciprofloxacin/ornidazole (CIO), ciprofloxacin/tinidazole (CIT), furazidin (FUR), isoniazid/sulfamethoxazole/trimethoprim/pyridoxine (IST), lascufloxacin (LSC), levofloxacin/ornidazole (LEO), nemonoxacin (NEM), norfloxacin/metronidazole (NME), norfloxacin/tinidazole (NTI), ofloxacin/ornidazole (OOR), oteseconazole (OTE), rifampicin/ethambutol/isoniazid (REI), sarecycline (SRC), tetracycline/oleandomycin (TOL), and thioacetazone (TAT)
* The MO matching score algorithm (`mo_matching_score()`) now counts deletions and substitutions as 2 instead of 1, which impacts the outcome of `as.mo()` and any `mo_*()` function
* **Removed all species of the taxonomic kingdom Chromista** from the package. This was done for multiple reasons:
  * CRAN allows packages to be around 5 MB maximum; some packages are exempted, but this package is not one of them
  * Chromista are not relevant when it comes to antimicrobial resistance, and thus fall outside the primary scope of this package
  * Chromista are almost never clinically relevant, and thus fall outside the secondary scope of this package
* The `microorganisms.old` data set was removed, and all previously accepted names are now included in the `microorganisms` data set. A new column `status` contains `"accepted"` for currently accepted names and `"synonym"` for taxonomic synonyms (currently invalid names). All previously accepted names now have a microorganisms ID and - if available - an LPSN, GBIF and SNOMED CT identifier.

#### Antibiotic agents and selectors

The new function `add_custom_antimicrobials()` allows users to add custom antimicrobial codes and names to the `AMR` package.

The `antibiotics` data set was greatly updated:
* The following 20 antibiotics have been added (also includes the [new J01RA ATC group](https://www.whocc.no/atc_ddd_index/?code=J01RA&showdescription=no)): azithromycin/fluconazole/secnidazole (AFC), cefepime/amikacin (CFA), cefixime/ornidazole (CEO), ceftriaxone/beta-lactamase inhibitor (CEB), ciprofloxacin/metronidazole (CIM), ciprofloxacin/ornidazole (CIO), ciprofloxacin/tinidazole (CIT), furazidin (FUR), isoniazid/sulfamethoxazole/trimethoprim/pyridoxine (IST), lascufloxacin (LSC), levofloxacin/ornidazole (LEO), nemonoxacin (NEM), norfloxacin/metronidazole (NME), norfloxacin/tinidazole (NTI), ofloxacin/ornidazole (OOR), oteseconazole (OTE), rifampicin/ethambutol/isoniazid (REI), sarecycline (SRC), tetracycline/oleandomycin (TOL), and thioacetazone (TAT)
* Added some missing ATC codes
* Updated DDDs and PubChem Compound IDs
* Updated some antibiotic name spelling, now used by WHOCC (such as cephalexin -> cefalexin, and phenethicillin -> pheneticillin)
* Antibiotic code "CEI" for ceftolozane/tazobactam has been replaced with "CZT" to comply with EARS-Net and WHONET 2022. The old code will still work in all cases when using `as.ab()` or any of the `ab_*()` functions.
* Support for antimicrobial interpretation of anaerobic bacteria, by adding a 'placeholder' code `B_ANAER` to the `microorganisms` data set and add the breakpoints of anaerobics to the `rsi_interpretation` data set, which is used by `as.rsi()` when interpreting MIC and disk diffusion values
* Support for `data.frame`-enhancing R packages, more specifically: `data.table::data.table`, `janitor::tabyl`, `tibble::tibble`, and `tsibble::tsibble`. AMR package functions that have a data set as output (such as `rsi_df()` and `bug_drug_combinations()`), will now return the same data type as the input.
* All data sets in this package are **now exported as `tibble`**, instead of base R `data.frame`s. Older R versions are still supported.
* Our data sets are now also continually exported to **Apache Feather and Apache Parquet formats**. You can find more info [in this article on our website](https://msberends.github.io/AMR/articles/datasets.html).
* Support for using antibiotic selectors in scoped `dplyr` verbs (with or without `vars()`), such as in: `... %>% summarise_at(aminoglycosides(), resistance)`, see `resistance()`
* Support for antimicrobial interpretation of anaerobic bacteria, by adding a 'placeholder' code `B_ANAER` to the `microorganisms` data set and adding the breakpoints of anaerobics to the `rsi_interpretation` data set, which is used by `as.rsi()` for interpretation of MIC and disk diffusion values

Also, we added support for using antibiotic selectors in scoped `dplyr` verbs (with or without using `vars()`), such as in: `... %>% summarise_at(aminoglycosides(), resistance)`; please see `resistance()` for examples.

#### Antiviral agents

We have now added extensive support for antiviral agents! For the first time, the `AMR` package has extensive support for antiviral drugs and for working with their names, codes and other data in any way.

* The `antivirals` data set has been extended with 18 new drugs (also from the [new J05AJ ATC group](https://www.whocc.no/atc_ddd_index/?code=J05AJ&showdescription=no)) and now also contains antiviral identifiers and LOINC codes
* A new data type `av` (*antivirals*) has been added, which is functionally similar to `ab` for antibiotics
* Functions `as.av()`, `av_name()`, `av_atc()`, `av_synonyms()`, `av_from_text()` have all been added as siblings to their `ab_*()` equivalents

#### Other new functions

* Function `rsi_confidence_interval()` to add confidence intervals in AMR calculation. This is now also included in `rsi_df()` and `proportion_df()`.
* Function `mean_amr_distance()` to calculate the mean AMR distance. The mean AMR distance is a normalised numeric value to compare AMR test results and can help to identify similar isolates, without comparing antibiograms by hand.
* Function `rsi_interpretation_history()` to view the history of previous runs of `as.rsi()`. This returns a 'logbook' with the selected guideline, reference table and specific interpretation of each row in a data set on which `as.rsi()` was run.
* Function `mo_current()` to get the currently valid taxonomic name of a microorganism
* Function `add_custom_antimicrobials()` to add custom antimicrobial codes and names to the `AMR` package

### Changes
* Updated the microbiological taxonomy using the latest GBIF backbone (November 2022) and latest LPSN records (11 December 2022)

* Argument `combine_IR` has been removed from this package (affecting functions `count_df()`, `proportion_df()`, and `rsi_df()` and some plotting functions), since it was replaced with `combine_SI` three years ago
* Using `units` in `ab_ddd(..., units = "...")` had been deprecated for some time and is no longer supported. Use `ab_ddd_units()` instead.
* Support for `data.frame`-enhancing R packages, more specifically: `data.table::data.table`, `janitor::tabyl`, `tibble::tibble`, and `tsibble::tsibble`. AMR package functions that have a data set as output (such as `rsi_df()` and `bug_drug_combinations()`), will now return the same data type as the input.
* All data sets in this package are now `tibble`s, instead of base R `data.frame`s. Older R versions are still supported, even if they do not support `tibble`s.
* Our data sets are now also continually exported to **Apache Feather and Apache Parquet formats**. You can find more info [in this article on our website](https://msberends.github.io/AMR/articles/datasets.html).
* For `as.rsi()`:
  * Fixed certain EUCAST breakpoints for MIC values
  * Allow `NA` values (e.g. `as.rsi(as.disk(NA), ...)`)
  * Fix for bug-drug combinations with multiple breakpoints for different body sites
  * Interpretation from MIC and disk zones is now more informative about availability of breakpoints and more robust
  * The default guideline (EUCAST) can now be changed with `options(AMR_guideline = "...")`
* Removed the `as.integer()` method for MIC values, since MIC are not integer values and running `table()` on MIC values consequently failed for not being able to retrieve the level position (as that is how `as.integer()` on `factor`s normally works)
* Fixed determination of Gram stains (`mo_gramstain()`), since the taxonomic phyla Actinobacteria, Chloroflexi, Firmicutes, and Tenericutes have been renamed to respectively Actinomycetota, Chloroflexota, Bacillota, and Mycoplasmatota in 2021
* `droplevels()` on MIC will now return a common `factor` by default and will lose the `mic` class. Use `droplevels(..., as.mic = TRUE)` to keep the `mic` class.
@ -59,8 +96,6 @@ This version will eventually become v2.0! We're happy to reach a new major miles
* Fixes for reading in text files using `set_mo_source()`, which now also allows the source file to contain valid taxonomic names instead of only valid microorganism IDs of this package
* Fixed a bug for `mdro()` when using similar column names with the Magiorakos guideline
* Using any `random_*()` function (such as `random_mic()`) is now possible by directly calling the package without loading it first: `AMR::random_mic(10)`
* Added *Toxoplasma gondii* (`P_TXPL_GOND`) to the `microorganisms` data set, together with its genus, family, and order
* Changed value in column `prevalence` of the `microorganisms` data set from 3 to 2 for these genera: *Acholeplasma*, *Alistipes*, *Alloprevotella*, *Bergeyella*, *Borrelia*, *Brachyspira*, *Butyricimonas*, *Cetobacterium*, *Chlamydia*, *Chlamydophila*, *Deinococcus*, *Dysgonomonas*, *Elizabethkingia*, *Empedobacter*, *Haloarcula*, *Halobacterium*, *Halococcus*, *Myroides*, *Odoribacter*, *Ornithobacterium*, *Parabacteroides*, *Pedobacter*, *Phocaeicola*, *Porphyromonas*, *Riemerella*, *Sphingobacterium*, *Streptobacillus*, *Tenacibaculum*, *Terrimonas*, *Victivallis*, *Wautersiella*, *Weeksella*
* Extended support for the `vctrs` package, used internally by the tidyverse. This makes it possible to change values of class `mic`, `disk`, `rsi`, `mo` and `ab` in tibbles, and to use antibiotic selectors for selecting/filtering, e.g. `df[carbapenems() == "R", ]`
* Fix for using `info = FALSE` in `mdro()`
* For all interpretation guidelines using `as.rsi()` on amoxicillin, the rules for ampicillin will be used if amoxicillin rules are not available
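Review note on the `mo_matching_score()` entry in NEWS.md: counting deletions and substitutions as 2 while insertions stay at 1 amounts to a weighted edit distance. The sketch below illustrates only that weighting, using base R's `utils::adist()` and its `costs` argument. It is not the package's scoring formula, and the helper name `weighted_distance()` is hypothetical.

```r
# Hypothetical helper (not part of the AMR package): weighted edit distance
# between user input and a candidate name, with deletions and substitutions
# costing 2 and insertions costing 1.
weighted_distance <- function(input, candidate) {
  d <- utils::adist(
    input, candidate,
    costs = c(insertions = 1, deletions = 2, substitutions = 2)
  )
  as.integer(d[1, 1])
}

# Identical strings have distance zero
same <- weighted_distance("Escherichia coli", "Escherichia coli")

# One substituted character is charged the substitution cost
subst <- weighted_distance("Escherichia coli", "Escherichia cola")

# One character more or less is charged as an insertion in one direction
# and as a deletion in the other, so the two directions differ
one_char <- sort(c(
  weighted_distance("Escherichia col", "Escherichia coli"),
  weighted_distance("Escherichia coli", "Escherichia col")
))
```

With these weights the distance is no longer symmetric, which is one reason such a change can alter which candidate name comes out on top.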
3
R/data.R
@ -120,6 +120,7 @@
#' ### Manual additions
#' For convenience, some entries were added manually:
#'
#' - `r format_included_data_number(which(microorganisms$genus == "Salmonella" & microorganisms$species == "enterica" & microorganisms$source == "manually added"))` entries for the city-like serovars of *Salmonellae*
#' - 11 entries of *Streptococcus* (beta-haemolytic: groups A, B, C, D, F, G, H, K and unspecified; other: viridans, milleri)
#' - 2 entries of *Staphylococcus* (coagulase-negative (CoNS) and coagulase-positive (CoPS))
#' - 1 entry of *Blastocystis* (*B. hominis*), although it officially does not exist (Noel *et al.* 2005, PMID 15634993)
@ -140,6 +141,8 @@
#' * `r TAXONOMY_VERSION$GBIF$citation` Accessed from <`r TAXONOMY_VERSION$GBIF$url`> on `r documentation_date(TAXONOMY_VERSION$GBIF$accessed_date)`.
#'
#' * `r TAXONOMY_VERSION$SNOMED$citation` URL: <`r TAXONOMY_VERSION$SNOMED$url`>
#'
#' * Grimont *et al.*. Antigenic Formulae of the Salmonella Serovars, 2007, 9th Edition. WHO Collaborating Centre for Reference and Research on *Salmonella* (WHOCC-SALM).
#' @seealso [as.mo()], [mo_property()], [microorganisms.codes], [intrinsic_resistant]
#' @examples
#' microorganisms
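Review note on the serovar entries counted in the roxygen line above: NEWS.md says the serovar name is stored in the `subspecies` column, "enterica" in the `species` column, and the full name leaves the species out. The sketch below shows that layout with a toy data frame. The column names follow the NEWS.md description and the rows are illustrative, not taken from the package data.

```r
# Toy rows only; these are not copied from the `microorganisms` data set
serovars <- data.frame(
  genus = "Salmonella",
  species = "enterica",
  subspecies = c("Goldcoast", "Dublin"),
  source = "manually added",
  stringsAsFactors = FALSE
)

# Full name as reported in practice: genus plus serovar, without the species
serovars$fullname <- paste(serovars$genus, serovars$subspecies)

# The same filter idea as in the roxygen line: count the manually added
# Salmonella enterica rows
n_serovars <- length(which(serovars$genus == "Salmonella" &
  serovars$species == "enterica" &
  serovars$source == "manually added"))
```

Keeping "enterica" in `species` means taxonomic lookups still place the serovar under the right species even though the displayed name omits it.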
13
R/mo.R
@ -429,10 +429,12 @@ as.mo <- function(x,

  # Apply Lancefield ----
  if (isTRUE(Lancefield) || Lancefield == "all") {
    # (using `%like_case%` to also match subspecies)

    # group A - S. pyogenes
    out[out == "B_STRPT_PYGN"] <- "B_STRPT_GRPA"
    out[out %like_case% "^B_STRPT_PYGN(_|$)"] <- "B_STRPT_GRPA"
    # group B - S. agalactiae
    out[out == "B_STRPT_AGLC"] <- "B_STRPT_GRPB"
    out[out %like_case% "^B_STRPT_AGLC(_|$)"] <- "B_STRPT_GRPB"
    # group C - all subspecies within S. dysgalactiae and S. equi (such as S. equi zooepidemicus)
    out[out %like_case% "^B_STRPT_(DYSG|EQUI)(_|$)"] <- "B_STRPT_GRPC"
    if (Lancefield == "all") {
@ -441,12 +443,13 @@ as.mo <- function(x,
    }
    # group F - S. anginosus, incl. S. anginosus anginosus and S. anginosus whileyi
    out[out %like_case% "^B_STRPT_ANGN(_|$)"] <- "B_STRPT_GRPF"
    # group G - only S. dysgalactiae which is also group C, so ignore it here
    # group G - S. dysgalactiae and S. canis (though dysgalactiae is also group C and will be matched there)
    out[out %like_case% "^B_STRPT_(DYSG|CANS)(_|$)"] <- "B_STRPT_GRPG"
    # group H - S. sanguinis
    out[out == "B_STRPT_SNGN"] <- "B_STRPT_GRPH"
    out[out %like_case% "^B_STRPT_SNGN(_|$)"] <- "B_STRPT_GRPH"
    # group K - S. salivarius, incl. S. salivarius salivarius and S. salivarius thermophilus
    out[out %like_case% "^B_STRPT_SLVR(_|$)"] <- "B_STRPT_GRPK"
    # group L - only S. dysgalactiae which is also group C, so ignore it here
    # group L - only S. dysgalactiae which is also group C & G, so ignore it here
  }

  # All unknowns ----
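Review note on the switch from `==` to a pattern match in this hunk: the anchored pattern `^B_STRPT_PYGN(_|$)` matches the species code itself and any code that continues with an underscore, which is how subspecies get picked up. The sketch below reproduces that with base R `grepl()` as a stand-in for `%like_case%`, on the assumption that the operator is a case-sensitive regular-expression match. The subspecies-style codes are made up for illustration.

```r
# Base-R stand-in for `%like_case%` (assumption: case-sensitive regex match)
matches_code <- function(x, pattern) grepl(pattern, x, ignore.case = FALSE)

# Illustrative codes: the second mimics a subspecies suffix, the third shares
# the prefix but is a different (made-up) code
out <- c("B_STRPT_PYGN", "B_STRPT_PYGN_XXXX", "B_STRPT_PYGNX", "B_STRPT_AGLC")

# Old behaviour: exact comparison misses the suffixed code
exact <- out == "B_STRPT_PYGN"

# New behaviour: "(_|$)" requires either an underscore or the end of the string
by_pattern <- matches_code(out, "^B_STRPT_PYGN(_|$)")

out[by_pattern] <- "B_STRPT_GRPA"
```

The `(_|$)` tail is what keeps a code that merely shares the prefix from being regrouped by accident.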
BIN
R/sysdata.rda
Binary file not shown.
@ -8,7 +8,7 @@ This work was published in the Journal of Statistical Software (Volume 104(3); [

`AMR` is a free, open-source and independent R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. Our aim is to provide a standard for clean and reproducible antimicrobial resistance data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting. It is currently being used in over 175 countries.

After installing this package, R knows ~48,000 distinct microbial species and all ~600 antibiotic, antimycotic, and antiviral drugs by name and code (including ATC, WHONET/EARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data. Antimicrobial names and group names are available in English, Chinese, Danish, Dutch, French, German, Greek, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, and Ukrainian.
After installing this package, R knows ~52,000 distinct microbial species and all ~600 antibiotic, antimycotic, and antiviral drugs by name and code (including ATC, WHONET/EARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data. Antimicrobial names and group names are available in English, Chinese, Danish, Dutch, French, German, Greek, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, and Ukrainian.

This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice Foundation and University Medical Center Groningen. This R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation.
File diff suppressed because one or more lines are too long
@ -161,8 +161,8 @@ MO_CONS <- create_species_cons_cops("CoNS")
MO_COPS <- create_species_cons_cops("CoPS")
MO_STREP_ABCG <- AMR_env$MO_lookup$mo[which(AMR_env$MO_lookup$genus == "Streptococcus" &
  AMR_env$MO_lookup$species %in% c(
    "pyogenes", "agalactiae", "dysgalactiae", "equi", "anginosus", "sanguinis", "salivarius",
    "group A", "group B", "group C", "group D", "group F", "group G", "group H", "group K", "group L"
    "pyogenes", "agalactiae", "dysgalactiae", "equi", "canis",
    "group A", "group B", "group C", "group G"
  ))]
MO_FULLNAME_LOWER <- create_MO_fullname_lower()
MO_PREVALENT_GENERA <- c(
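Review note on the narrowed `MO_STREP_ABCG` definition: the lookup is a plain logical subset on genus and species. A sketch with a toy stand-in for the package-internal lookup table follows. The table name `lookup` and its rows are assumptions for illustration, and only the species vector is copied from the new lines of the hunk.

```r
# Toy stand-in for the internal lookup table; rows are illustrative
lookup <- data.frame(
  mo = c("B_STRPT_PYGN", "B_STRPT_CANS", "B_STRPT_SNGN", "B_STPHY_AURS"),
  genus = c("Streptococcus", "Streptococcus", "Streptococcus", "Staphylococcus"),
  species = c("pyogenes", "canis", "sanguinis", "aureus"),
  stringsAsFactors = FALSE
)

# Species vector as it reads after this commit
wanted <- c(
  "pyogenes", "agalactiae", "dysgalactiae", "equi", "canis",
  "group A", "group B", "group C", "group G"
)

# Keep the codes of streptococci whose species is in the wanted set
strep_abcg <- lookup$mo[which(lookup$genus == "Streptococcus" &
  lookup$species %in% wanted)]
```

In this toy table *S. sanguinis* drops out and *S. canis* is included, mirroring the change in the species list.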
@ -1 +1 @@
33ff38f7d52ae39f951f60b73db29630
ddbee37dd1dfa56d90d9b70ad2ed6308
Binary file not shown.
File diff suppressed because it is too large
Binary file not shown.
@ -1 +1 @@
9d0367aa37e7f7d6923caae506ea434d
88f3ce0e84bf8336f43cd6565215db1d
Binary file not shown.
File diff suppressed because one or more lines are too long
Binary file not shown.
@ -50,8 +50,7 @@ library(dplyr)
library(vroom) # to import files
library(rvest) # to scrape LPSN website
library(progress) # to show progress bars
library(AMR)
# also requires 'rvest' and 'progress' packages
devtools::load_all(".") # load AMR package

# Helper functions --------------------------------------------------------

@ -538,6 +537,16 @@ taxonomy <- taxonomy %>%
  arrange(fullname) %>%
  filter(fullname != "")

# get missing entries from existing microorganisms data set
taxonomy <- taxonomy %>%
  bind_rows(AMR::microorganisms %>%
    select(all_of(colnames(taxonomy))) %>%
    filter(!paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname),
      # these will be added later:
      source != "manually added")) %>%
  arrange(fullname) %>%
  filter(fullname != "")

# fix rank
table(taxonomy$rank, useNA = "always")
taxonomy <- taxonomy %>%
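Review note on the "get missing entries" step added in this hunk: it appends rows from the existing data set whose kingdom and full name combination is not yet in the new taxonomy, and skips manually added rows because those are handled later. Below is a base R sketch of the same idea without `dplyr`. Both toy tables and the variable name `existing` are made up for illustration.

```r
# Toy tables; column names mirror the script, values are made up
taxonomy <- data.frame(
  kingdom = c("Bacteria", "Bacteria"),
  fullname = c("Escherichia coli", "Nitrospira"),
  source = c("LPSN", "GBIF"),
  stringsAsFactors = FALSE
)
existing <- data.frame(
  kingdom = c("Bacteria", "Bacteria", "Bacteria"),
  fullname = c("Escherichia coli", "Old name", "Streptococcus group A"),
  source = c("LPSN", "GBIF", "manually added"),
  stringsAsFactors = FALSE
)

# Composite key: a row counts as present only if kingdom AND fullname match
key_new <- paste(taxonomy$kingdom, taxonomy$fullname)
key_old <- paste(existing$kingdom, existing$fullname)

# Rows to carry over: absent from the new taxonomy and not manually added
carry_over <- existing[!key_old %in% key_new & existing$source != "manually added", ]

# Append with the same column set, sort by name, drop empty names
combined <- rbind(taxonomy, carry_over[, colnames(taxonomy)])
combined <- combined[order(combined$fullname), ]
combined <- combined[combined$fullname != "", ]
```

Pasting kingdom and full name together avoids treating identical names in different kingdoms as the same entry.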
@ -554,26 +563,13 @@ taxonomy <- taxonomy %>%
  ))
table(taxonomy$rank, useNA = "always")

# get the latest upper taxonomy from LPSN to update the GBIF data
# (e.g., phylum above class "Bacilli" was still "Firmicutes", should be "Bacillota" in 2022)
for (k in unique(taxonomy$kingdom[taxonomy$kingdom != ""])) {
  message("Fixing GBIF taxonomy for kingdom ", k, "...")
  for (g in unique(taxonomy$genus[taxonomy$genus != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
    taxonomy$family[which(taxonomy$genus == g & taxonomy$kingdom == k)] <- taxonomy$family[which(taxonomy$genus == g & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
  }
  for (f in unique(taxonomy$family[taxonomy$family != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
    taxonomy$order[which(taxonomy$family == f & taxonomy$kingdom == k)] <- taxonomy$order[which(taxonomy$family == f & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
  }
  for (o in unique(taxonomy$order[taxonomy$order != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
    taxonomy$class[which(taxonomy$order == o & taxonomy$kingdom == k)] <- taxonomy$class[which(taxonomy$order == o & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
  }
  for (cc in unique(taxonomy$class[taxonomy$class != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
    taxonomy$phylum[which(taxonomy$class == cc & taxonomy$kingdom == k)] <- taxonomy$phylum[which(taxonomy$class == cc & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
  }
}

# Save intermediate results (0) -------------------------------------------

saveRDS(taxonomy, "data-raw/taxonomy0.rds")


# Add missing taxonomic entries -------------------------------------------
# Add missing and fix old taxonomic entries -------------------------------

# this part will make sure that the whole taxonomy of every included species exists, so no missing genera, classes, etc.

@ -639,19 +635,6 @@ taxonomy_all_missing %>% View()
taxonomy <- taxonomy %>%
  bind_rows(taxonomy_all_missing)

# we need to fix parent GBIF identifiers
taxonomy$gbif_parent[taxonomy$rank == "phylum" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$kingdom[taxonomy$rank == "phylum" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "class" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$phylum[taxonomy$rank == "class" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "order" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$class[taxonomy$rank == "order" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "family" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$order[taxonomy$rank == "family" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "genus" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$family[taxonomy$rank == "genus" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "species" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$genus[taxonomy$rank == "species" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
taxonomy$gbif_parent[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(paste(taxonomy$genus[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)], taxonomy$species[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)]), taxonomy$fullname)]

# these still have no record in our data set:
all(taxonomy$lpsn_parent %in% taxonomy$lpsn)
all(taxonomy$gbif_parent %in% taxonomy$gbif)

# fix for duplicate fullnames within a kingdom (such as Nitrospira which is the name of the genus AND its class)
|
||||
taxonomy <- taxonomy %>%
|
||||
mutate(rank_index = case_when(rank == "subspecies" ~ 1,
|
||||
@ -666,7 +649,7 @@ taxonomy <- taxonomy %>%
|
||||
group_by(kingdom, fullname) %>%
|
||||
mutate(fullname = if_else(row_number() > 1, fullname_rank, fullname)) %>%
|
||||
ungroup() %>%
|
||||
select(-fullname_rank) %>%
|
||||
select(-fullname_rank, -rank_index) %>%
|
||||
arrange(fullname)
|
||||
|
||||
# now also add missing species (requires combination with genus)
|
||||
@ -699,7 +682,7 @@ taxonomy <- taxonomy %>%
|
||||
filter(kingdom != "")
|
||||
|
||||
|
||||
# Save intermediate results -----------------------------------------------
|
||||
# Save intermediate results (1) -------------------------------------------
|
||||
|
||||
saveRDS(taxonomy, "data-raw/taxonomy1.rds")
|
||||
|
||||
@ -710,6 +693,9 @@ manually_added <- AMR::microorganisms %>%
|
||||
filter(source == "manually added", !paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname)) %>%
|
||||
select(fullname:subspecies, ref, source, rank)
|
||||
|
||||
manually_added <- manually_added %>%
|
||||
bind_rows(salmonellae)
|
||||
|
||||
# get latest taxonomy for those entries
|
||||
for (g in unique(manually_added$genus[manually_added$genus != "" & manually_added$genus %in% taxonomy$genus])) {
|
||||
manually_added$family[which(manually_added$genus == g)] <- taxonomy$family[which(taxonomy$genus == g & is.na(taxonomy$lpsn))][1]
|
||||
@ -723,14 +709,19 @@ for (o in unique(manually_added$order[manually_added$order != "" & manually_adde
|
||||
for (cc in unique(manually_added$class[manually_added$class != "" & manually_added$class %in% taxonomy$class])) {
|
||||
manually_added$phylum[which(manually_added$class == cc)] <- taxonomy$phylum[which(taxonomy$class == cc & is.na(taxonomy$lpsn))][1]
|
||||
}
|
||||
for (p in unique(manually_added$phylum[manually_added$phylum != "" & manually_added$phylum %in% taxonomy$phylum])) {
|
||||
manually_added$kingdom[which(manually_added$phylum == p)] <- taxonomy$kingdom[which(taxonomy$phylum == p & is.na(taxonomy$lpsn))][1]
|
||||
}
|
||||
|
||||
manually_added <- manually_added %>%
|
||||
mutate(
|
||||
status = "accepted",
|
||||
rank = ifelse(fullname %like% "unknown", "(unknown rank)", rank)
|
||||
)
|
||||
manually_added
|
||||
|
||||
taxonomy <- taxonomy %>%
|
||||
# here also the 'unknowns' are added, such as "(unknown fungus)"
|
||||
bind_rows(manually_added) %>%
|
||||
arrange(fullname)
|
||||
|
||||
@ -743,6 +734,49 @@ taxonomy <- taxonomy %>%
|
||||
mutate(ref = get_author_year(ref))
|
||||
|
||||
|
||||
# Get the latest upper taxonomy from LPSN for non-LPSN data ---------------
|
||||
|
||||
# (e.g., phylum above class "Bacilli" was still "Firmicutes", should be "Bacillota" in 2022)
|
||||
for (k in unique(taxonomy$kingdom[taxonomy$kingdom != ""])) {
|
||||
message("Fixing GBIF taxonomy for kingdom ", k, ".", appendLF = FALSE)
|
||||
i <- 0
|
||||
for (g in unique(taxonomy$genus[taxonomy$genus != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
|
||||
i <- i + 1
|
||||
if (i %% 50 == 0) message(".", appendLF = FALSE)
|
||||
taxonomy$family[which(taxonomy$genus == g & taxonomy$kingdom == k)] <- taxonomy$family[which(taxonomy$genus == g & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
|
||||
}
|
||||
for (f in unique(taxonomy$family[taxonomy$family != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
|
||||
i <- i + 1
|
||||
if (i %% 50 == 0) message(".", appendLF = FALSE)
|
||||
taxonomy$order[which(taxonomy$family == f & taxonomy$kingdom == k)] <- taxonomy$order[which(taxonomy$family == f & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
|
||||
}
|
||||
for (o in unique(taxonomy$order[taxonomy$order != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
|
||||
i <- i + 1
|
||||
if (i %% 50 == 0) message(".", appendLF = FALSE)
|
||||
taxonomy$class[which(taxonomy$order == o & taxonomy$kingdom == k)] <- taxonomy$class[which(taxonomy$order == o & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
|
||||
}
|
||||
for (cc in unique(taxonomy$class[taxonomy$class != "" & taxonomy$kingdom == k & taxonomy$source == "LPSN"])) {
|
||||
i <- i + 1
|
||||
if (i %% 50 == 0) message(".", appendLF = FALSE)
|
||||
taxonomy$phylum[which(taxonomy$class == cc & taxonomy$kingdom == k)] <- taxonomy$phylum[which(taxonomy$class == cc & taxonomy$kingdom == k & taxonomy$source == "LPSN")][1]
|
||||
}
|
||||
message("OK.")
|
||||
}
|
||||
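The loop above overwrites the upper ranks of every row with the value found in the LPSN rows of the same lower taxon. Below is a minimal sketch of that idea for one rank pair (class to phylum), assuming a made-up table: the data frame `tax_toy` and its rows are hypothetical, and only the Firmicutes/Bacillota case comes from the comment above.

```r
# Toy data: two rows share the class "Bacilli", but only the LPSN row
# carries the current phylum name. All rows are illustrative.
tax_toy <- data.frame(
  fullname = c("Bacilli", "Some GBIF-only genus", "Clostridia"),
  class    = c("Bacilli", "Bacilli", "Clostridia"),
  phylum   = c("Bacillota", "Firmicutes", "Bacillota"),
  source   = c("LPSN", "GBIF", "LPSN"),
  stringsAsFactors = FALSE
)

# For every class known to LPSN, copy the LPSN phylum to all rows of that class
for (cc in unique(tax_toy$class[tax_toy$class != "" & tax_toy$source == "LPSN"])) {
  lpsn_phylum <- tax_toy$phylum[which(tax_toy$class == cc & tax_toy$source == "LPSN")][1]
  tax_toy$phylum[which(tax_toy$class == cc)] <- lpsn_phylum
}

tax_toy$phylum
```

Taking element `[1]` of the LPSN matches means the first LPSN row wins when several LPSN rows disagree, as in the script itself.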
|
||||
# we need to fix parent GBIF identifiers
|
||||
taxonomy$gbif_parent[taxonomy$rank == "phylum" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$kingdom[taxonomy$rank == "phylum" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "class" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$phylum[taxonomy$rank == "class" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "order" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$class[taxonomy$rank == "order" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "family" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$order[taxonomy$rank == "family" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "genus" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$family[taxonomy$rank == "genus" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "species" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(taxonomy$genus[taxonomy$rank == "species" & !is.na(taxonomy$gbif)], taxonomy$fullname)]
|
||||
taxonomy$gbif_parent[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)] <- taxonomy$gbif[match(paste(taxonomy$genus[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)], taxonomy$species[taxonomy$rank == "subspecies" & !is.na(taxonomy$gbif)]), taxonomy$fullname)]
|
||||
|
||||
# these still have no record in our data set:
|
||||
all(taxonomy$lpsn_parent %in% taxonomy$lpsn)
|
||||
all(taxonomy$gbif_parent %in% taxonomy$gbif)
|
||||
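The parent-identifier fix above repeats one pattern per rank: select the rows of a given rank, look up the name of their parent taxon in `fullname` with `match()`, and take that row's GBIF identifier. Here is a hedged sketch of that pattern on toy data; the helper `set_gbif_parent()` and all identifiers are hypothetical and not part of the script.

```r
# Toy taxonomy table; the GBIF identifiers are made up for illustration
taxonomy_toy <- data.frame(
  fullname    = c("Bacteria", "Pseudomonadota", "Gammaproteobacteria"),
  rank        = c("kingdom", "phylum", "class"),
  kingdom     = c("Bacteria", "Bacteria", "Bacteria"),
  phylum      = c("", "Pseudomonadota", "Pseudomonadota"),
  gbif        = c("3", "10", "20"),
  gbif_parent = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: for all rows of `child_rank` that have a GBIF ID,
# set gbif_parent to the GBIF ID of the row whose fullname equals the
# value in `parent_col`
set_gbif_parent <- function(df, child_rank, parent_col) {
  rows <- which(df$rank == child_rank & !is.na(df$gbif))
  df$gbif_parent[rows] <- df$gbif[match(df[[parent_col]][rows], df$fullname)]
  df
}

taxonomy_toy <- set_gbif_parent(taxonomy_toy, "phylum", "kingdom")
taxonomy_toy <- set_gbif_parent(taxonomy_toy, "class", "phylum")
taxonomy_toy[, c("fullname", "gbif", "gbif_parent")]
```

The subspecies case in the script differs only in that the parent name is built with `paste(genus, species)` before the lookup. Because `match()` returns `NA` when the parent name is missing, such rows end up with `NA` as their parent identifier.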
|
||||
|
||||
# Add prevalence ----------------------------------------------------------
|
||||
|
||||
# update prevalence based on taxonomy (our own JSS paper: Berends MS et al. (2022), DOI 10.18637/jss.v104.i03)
|
||||
@ -768,43 +802,6 @@ taxonomy <- taxonomy %>%
|
||||
table(taxonomy$prevalence, useNA = "always")
|
||||
# (a lot will be removed further below)
|
||||
|
||||
|
||||
# Add old entries that must be kept ---------------------------------------
|
||||
|
||||
# these are bacteria that are now removed, but they have a renamed-to identifier pointing to a current record, so add them
|
||||
old_to_keep <- microorganisms %>%
|
||||
filter(!paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname),
|
||||
kingdom == "Bacteria",
|
||||
(gbif_renamed_to %in% taxonomy$gbif & !is.na(gbif_renamed_to)) | (lpsn_renamed_to %in% taxonomy$lpsn & !is.na(lpsn_renamed_to)),
|
||||
rank %in% c("genus", "species", "subspecies")) %>%
|
||||
select(-mo, -snomed)
|
||||
|
||||
taxonomy <- taxonomy %>%
|
||||
bind_rows(old_to_keep) %>%
|
||||
arrange(fullname)
|
||||
|
||||
# and these had prevalence = 1, why do they miss now?
|
||||
old_to_keep2 <- microorganisms %>%
|
||||
filter(!paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname),
|
||||
prevalence == 1,
|
||||
!is.na(gbif_parent) & gbif_parent %in% taxonomy$gbif,
|
||||
rank %in% c("genus", "species", "subspecies")) %>%
|
||||
select(-mo, -snomed)
|
||||
|
||||
taxonomy <- taxonomy %>%
|
||||
bind_rows(old_to_keep2) %>%
|
||||
arrange(fullname)
|
||||
|
||||
# strangely, Trichomonas is no longer in GBIF?
|
||||
old_to_keep3 <- microorganisms %>%
|
||||
filter(fullname %like% "^trichomona") %>%
|
||||
select(-mo, -snomed)
|
||||
|
||||
taxonomy <- taxonomy %>%
|
||||
filter(!fullname %in% old_to_keep3$fullname) %>%
|
||||
bind_rows(old_to_keep3) %>%
|
||||
arrange(fullname)
|
||||
|
||||
# fix rank
|
||||
taxonomy <- taxonomy %>%
|
||||
mutate(rank = case_when(
|
||||
@ -820,6 +817,11 @@ taxonomy <- taxonomy %>%
|
||||
))
|
||||
|
||||
|
||||
# Save intermediate results (2) -------------------------------------------
|
||||
|
||||
saveRDS(taxonomy, "data-raw/taxonomy2.rds")
|
||||
|
||||
|
||||
# Add microbial IDs -------------------------------------------------------
|
||||
|
||||
# MO codes in the AMR package have the form KINGDOM_GENUS_SPECIES_SUBSPECIES where all are abbreviated.
|
||||
@ -995,7 +997,7 @@ mo_genus <- mo_genus %>%
|
||||
mo_species <- taxonomy %>%
|
||||
filter(rank == "species") %>%
|
||||
distinct(kingdom, genus, species) %>%
|
||||
left_join(microorganisms %>%
|
||||
left_join(AMR::microorganisms %>%
|
||||
filter(rank == "species") %>%
|
||||
transmute(mo_species_old = gsub("^[A-Z]+_[A-Z]+_", "", as.character(mo)), kingdom, genus, species) %>%
|
||||
filter(mo_species_old %unlike% "-") %>%
|
||||
@ -1043,7 +1045,7 @@ mo_species <- mo_species %>%
|
||||
mo_subspecies <- taxonomy %>%
|
||||
filter(rank == "subspecies") %>%
|
||||
distinct(kingdom, genus, species, subspecies) %>%
|
||||
left_join(microorganisms %>%
|
||||
left_join(AMR::microorganisms %>%
|
||||
filter(rank %in% c("subspecies", "subsp.", "infraspecies")) %>%
|
||||
transmute(mo_subspecies_old = gsub("^[A-Z]+_[A-Z]+_[A-Z]+_", "", as.character(mo)), kingdom, genus, species, subspecies) %>%
|
||||
filter(mo_subspecies_old %unlike% "-") %>%
|
||||
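The two `gsub()` calls above strip leading parts from an MO code of the form KINGDOM_GENUS_SPECIES_SUBSPECIES. A small sketch of what each pattern does, using two codes chosen for illustration only:

```r
# Illustrative MO codes of the form KINGDOM_GENUS_SPECIES[_SUBSPECIES]
mo_codes <- c("B_ESCHR_COLI", "B_KLBSL_PNMN_RHNS")

# Drop the kingdom and genus parts, keeping everything from the species part on
species_part <- gsub("^[A-Z]+_[A-Z]+_", "", mo_codes)

# Drop the kingdom, genus and species parts; a code without a subspecies
# part does not match the pattern and is returned unchanged
subspecies_part <- gsub("^[A-Z]+_[A-Z]+_[A-Z]+_", "", mo_codes)

species_part
subspecies_part
```

Since a non-matching code is returned whole, it matters that the script filters on `rank` before applying each pattern.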
@ -1138,13 +1140,31 @@ taxonomy <- taxonomy %>%
|
||||
taxonomy %>% filter(mo %like% "__") %>% View()
|
||||
taxonomy <- taxonomy %>% filter(mo %unlike% "__")
|
||||
|
||||
# this must be empty of course
|
||||
taxonomy %>% filter(mo %in% .[duplicated(mo), "mo", drop = TRUE]) %>% View()
|
||||
|
||||
# Some integrity checks ---------------------------------------------------
|
||||
|
||||
# are mo codes unique?
|
||||
taxonomy %>% filter(mo %in% .[duplicated(mo), "mo", drop = TRUE])
|
||||
taxonomy <- taxonomy %>% distinct(mo, .keep_all = TRUE)
|
||||
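The uniqueness check and the `distinct(mo, .keep_all = TRUE)` call above can also be written in base R. This is a sketch under assumptions: the data frame `toy` is made up, and `!duplicated()` is used as a stand-in for `distinct()`; both keep the first row of each `mo` value.

```r
# Toy table with one duplicated MO code (rows are illustrative)
toy <- data.frame(
  mo       = c("B_ESCHR_COLI", "B_ESCHR_COLI", "B_KLBSL_PNMN"),
  fullname = c("Escherichia coli", "Escherichia coli", "Klebsiella pneumoniae"),
  stringsAsFactors = FALSE
)

# All rows whose mo code occurs more than once (for inspection)
dup_rows <- toy[toy$mo %in% toy$mo[duplicated(toy$mo)], ]

# Keep only the first row per mo code
deduplicated <- toy[!duplicated(toy$mo), ]
```

Inspecting `dup_rows` before deduplicating is worthwhile, since dropping all but the first row discards data without any warning.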
|
||||
# Save intermediate results -----------------------------------------------
|
||||
# are fullnames unique?
|
||||
taxonomy %>% filter(fullname %in% .[duplicated(fullname), "fullname", drop = TRUE])
|
||||
|
||||
saveRDS(taxonomy, "data-raw/taxonomy2.rds")
|
||||
# are all GBIFs available?
|
||||
taxonomy %>% filter(!gbif_parent %in% gbif) %>% count(rank)
|
||||
# try to find the right gbif IDs
|
||||
taxonomy$gbif_parent[which(!taxonomy$gbif_parent %in% taxonomy$gbif & taxonomy$rank == "species")] <- taxonomy$gbif[match(taxonomy$genus[which(!taxonomy$gbif_parent %in% taxonomy$gbif & taxonomy$rank == "species")], taxonomy$genus)]
|
||||
taxonomy$gbif_parent[which(!taxonomy$gbif_parent %in% taxonomy$gbif & taxonomy$rank == "class")] <- taxonomy$gbif[match(taxonomy$phylum[which(!taxonomy$gbif_parent %in% taxonomy$gbif & taxonomy$rank == "class")], taxonomy$phylum)]
|
||||
taxonomy %>% filter(!gbif_parent %in% gbif) %>% count(rank)
|
||||
|
||||
# are all LPSNs available?
|
||||
taxonomy %>% filter(!lpsn_parent %in% lpsn) %>% count(rank)
|
||||
# make GBIF refer to newest renaming according to LPSN
|
||||
taxonomy$gbif_renamed_to[which(!is.na(taxonomy$gbif_renamed_to) & !is.na(taxonomy$lpsn_renamed_to))] <- taxonomy$gbif[match(taxonomy$lpsn_renamed_to[which(!is.na(taxonomy$gbif_renamed_to) & !is.na(taxonomy$lpsn_renamed_to))], taxonomy$lpsn)]
|
||||
|
||||
# Save intermediate results (3) -------------------------------------------
|
||||
|
||||
saveRDS(taxonomy, "data-raw/taxonomy3.rds")
|
||||
|
||||
|
||||
# Remove unwanted taxonomic entries from Protozoa/Fungi -------------------
|
||||
@ -1176,13 +1196,13 @@ taxonomy <- taxonomy %>%
|
||||
|
||||
|
||||
message("\nCongratulations! The new taxonomic table will contain ", format(nrow(taxonomy), big.mark = ","), " rows.\n",
|
||||
"This was ", format(nrow(microorganisms), big.mark = ","), " rows.\n")
|
||||
"This was ", format(nrow(AMR::microorganisms), big.mark = ","), " rows.\n")
|
||||
|
||||
# these are the new ones:
|
||||
taxonomy %>% filter(!paste(kingdom, fullname) %in% paste(microorganisms$kingdom, microorganisms$fullname)) %>% View()
|
||||
taxonomy %>% filter(!paste(kingdom, fullname) %in% paste(AMR::microorganisms$kingdom, AMR::microorganisms$fullname)) %>% View()
|
||||
# these were removed:
|
||||
microorganisms %>% filter(!paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname)) %>% View()
|
||||
microorganisms %>% filter(!fullname %in% taxonomy$fullname) %>% View()
|
||||
AMR::microorganisms %>% filter(!paste(kingdom, fullname) %in% paste(taxonomy$kingdom, taxonomy$fullname)) %>% View()
|
||||
AMR::microorganisms %>% filter(!fullname %in% taxonomy$fullname) %>% View()
|
||||
|
||||
|
||||
# Add SNOMED CT -----------------------------------------------------------
|
||||
|
1584
data-raw/salmonellae.R
Normal file
File diff suppressed because it is too large
BIN
data-raw/taxonomy0.rds
Normal file
Binary file not shown.
Binary file not shown.
Binary file not shown.
BIN
data-raw/taxonomy3.rds
Normal file
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
2
index.md
@ -20,7 +20,7 @@ The `AMR` package is a [free and open-source](#copyright) R package with [zero d
|
||||
|
||||
This work was published in the Journal of Statistical Software (Volume 104(3); [DOI 10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)) and formed the basis of two PhD theses ([DOI 10.33612/diss.177417131](https://doi.org/10.33612/diss.177417131) and [DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)).
|
||||
|
||||
After installing this package, R knows [**~48,000 distinct microbial species**](./reference/microorganisms.html) (updated December 2022) and all [**~600 antibiotic, antimycotic and antiviral drugs**](./reference/antibiotics.html) by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). **It was designed to work in any setting, including those with very limited resources**. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the [University of Groningen](https://www.rug.nl), in collaboration with non-profit organisations [Certe Medical Diagnostics and Advice Foundation](https://www.certe.nl) and [University Medical Center Groningen](https://www.umcg.nl), and is being [actively and durably maintained](./news) by two public healthcare organisations in the Netherlands.
|
||||
After installing this package, R knows [**~52,000 distinct microbial species**](./reference/microorganisms.html) (updated December 2022) and all [**~600 antibiotic, antimycotic and antiviral drugs**](./reference/antibiotics.html) by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). **It was designed to work in any setting, including those with very limited resources**. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the [University of Groningen](https://www.rug.nl), in collaboration with non-profit organisations [Certe Medical Diagnostics and Advice Foundation](https://www.certe.nl) and [University Medical Center Groningen](https://www.umcg.nl), and is being [actively and durably maintained](./news) by two public healthcare organisations in the Netherlands.
|
||||
|
||||
##### Used in 175 countries, translated to 16 languages
|
||||
|
||||
|
@ -32,7 +32,7 @@ Welcome to the \code{AMR} package.
|
||||
|
||||
This work was published in the Journal of Statistical Software (Volume 104(3); \doi{10.18637/jss.v104.i03}) and formed the basis of two PhD theses (\doi{10.33612/diss.177417131} and \doi{10.33612/diss.192486375}).
|
||||
|
||||
After installing this package, \R knows ~48,000 distinct microbial species and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-NET, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
|
||||
After installing this package, \R knows ~52,000 distinct microbial species and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
|
||||
|
||||
This package is fully independent of any other \R package and works on Windows, macOS and Linux with all versions of \R since R-3.0.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen. This \R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation.
|
||||
|
||||
|
@ -5,7 +5,7 @@
|
||||
\alias{intrinsic_resistant}
|
||||
\title{Data Set with Bacterial Intrinsic Resistance}
|
||||
\format{
|
||||
A \link[tibble:tibble]{tibble} with 206,832 observations and 2 variables:
|
||||
A \link[tibble:tibble]{tibble} with 134,634 observations and 2 variables:
|
||||
\itemize{
|
||||
\item \code{mo}\cr Microorganism ID
|
||||
\item \code{ab}\cr Antibiotic ID
|
||||
|
@ -3,9 +3,9 @@
|
||||
\docType{data}
|
||||
\name{microorganisms}
|
||||
\alias{microorganisms}
|
||||
\title{Data Set with 48,050 Microorganisms}
|
||||
\title{Data Set with 52,144 Microorganisms}
|
||||
\format{
|
||||
A \link[tibble:tibble]{tibble} with 48,050 observations and 22 variables:
|
||||
A \link[tibble:tibble]{tibble} with 52,144 observations and 22 variables:
|
||||
\itemize{
|
||||
\item \code{mo}\cr ID of microorganism as used by this package
|
||||
\item \code{fullname}\cr Full name, like \code{"Escherichia coli"}. For the taxonomic ranks genus, species and subspecies, this is the 'pasted' text of genus, species, and subspecies. For all taxonomic ranks higher than genus, this is the name of the taxon.
|
||||
@ -29,6 +29,7 @@ A \link[tibble:tibble]{tibble} with 48,050 observations and 22 variables:
|
||||
\item Parte, AC \emph{et al.} (2020). \strong{List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ.} International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612; \doi{10.1099/ijsem.0.004332}. Accessed from \url{https://lpsn.dsmz.de} on 11 December, 2022.
|
||||
\item GBIF Secretariat (2022). GBIF Backbone Taxonomy. Checklist dataset \doi{10.15468/39omei}. Accessed from \url{https://www.gbif.org} on 11 December, 2022.
|
||||
\item Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name 'Microorganism', OID 2.16.840.1.114222.4.11.1009 (v12). URL: \url{https://phinvads.cdc.gov}
|
||||
\item Grimont \emph{et al.} Antigenic Formulae of the \emph{Salmonella} Serovars, 2007, 9th Edition. WHO Collaborating Centre for Reference and Research on \emph{Salmonella} (WHOCC-SALM).
|
||||
}
|
||||
}
|
||||
\usage{
|
||||
@ -46,11 +47,11 @@ For example, \emph{Staphylococcus pettenkoferi} was described for the first time
|
||||
|
||||
Included taxonomic data are:
|
||||
\itemize{
|
||||
\item All ~34,000 (sub)species from the kingdoms of Archaea and Bacteria
|
||||
\item ~6,900 (sub)species from the kingdom of Fungi. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package. Only relevant fungi are covered (such as all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histoplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).
|
||||
\item ~4,400 (sub)species from the kingdom of Protozoa
|
||||
\item ~1,100 (sub)species from ~40 other relevant genera from the kingdom of Animalia (such as \emph{Strongyloides} and \emph{Taenia})
|
||||
\item All ~9,100 previously accepted names of all included (sub)species (these were taxonomically renamed)
|
||||
\item All ~36,000 (sub)species from the kingdoms of Archaea and Bacteria
|
||||
\item ~7,900 (sub)species from the kingdom of Fungi. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, of which most are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package. Only relevant fungi are covered (such as all species of \emph{Aspergillus}, \emph{Candida}, \emph{Cryptococcus}, \emph{Histoplasma}, \emph{Pneumocystis}, \emph{Saccharomyces} and \emph{Trichophyton}).
|
||||
\item ~5,100 (sub)species from the kingdom of Protozoa
|
||||
\item ~1,400 (sub)species from ~40 other relevant genera from the kingdom of Animalia (such as \emph{Strongyloides} and \emph{Taenia})
|
||||
\item All ~9,800 previously accepted names of all included (sub)species (these were taxonomically renamed)
|
||||
\item The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
|
||||
\item The identifier of the parent taxa
|
||||
\item The year and first author of the related scientific publication
|
||||
@ -59,6 +60,7 @@ Included taxonomic data are:
|
||||
|
||||
For convenience, some entries were added manually:
|
||||
\itemize{
|
||||
\item ~1,500 entries for the city-like serovars of \emph{Salmonellae}
|
||||
\item 11 entries of \emph{Streptococcus} (beta-haemolytic: groups A, B, C, D, F, G, H, K and unspecified; other: viridans, milleri)
|
||||
\item 2 entries of \emph{Staphylococcus} (coagulase-negative (CoNS) and coagulase-positive (CoPS))
|
||||
\item 1 entry of \emph{Blastocystis} (\emph{B. hominis}), although it officially does not exist (Noel \emph{et al.} 2005, PMID 15634993)
|
||||
|
@ -3,9 +3,9 @@
|
||||
\docType{data}
|
||||
\name{microorganisms.codes}
|
||||
\alias{microorganisms.codes}
|
||||
\title{Data Set with 5,932 Common Microorganism Codes}
|
||||
\title{Data Set with 5,910 Common Microorganism Codes}
|
||||
\format{
|
||||
A \link[tibble:tibble]{tibble} with 5,932 observations and 2 variables:
|
||||
A \link[tibble:tibble]{tibble} with 5,910 observations and 2 variables:
|
||||
\itemize{
|
||||
\item \code{code}\cr Commonly used code of a microorganism
|
||||
\item \code{mo}\cr ID of the microorganism in the \link{microorganisms} data set
|
||||
|
Binary file not shown.
Before Width: | Height: | Size: 94 KiB After Width: | Height: | Size: 98 KiB |
File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 103 KiB After Width: | Height: | Size: 359 KiB |
Binary file not shown.
Before Width: | Height: | Size: 56 KiB |