Data sets for download / own use
-11 December 2022
+16 December 2022
Source: vignettes/datasets.Rmd
microorganisms: Full Microbial Taxonomy
-A data set with 48,050 rows and 22 columns, containing the following
+A data set with 52,144 rows and 22 columns, containing the following
column names:
mo, fullname, status, kingdom,
phylum, class, order, family,
genus, species, subspecies, rank,
@@ -196,33 +196,33 @@ column names:
mo, fullname, status, kingdom, …, gbif_renamed_to, prevalence and snomed.
This data set is available in R as microorganisms, after you load the AMR package.
-It was last updated on 11 December 2022 23:14:56 UTC. Find more info
+It was last updated on 16 December 2022 15:10:43 UTC. Find more info
about the structure of this data set here.
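The status column separates accepted names from synonyms, and the *_renamed_to columns (such as gbif_renamed_to) point to newer names. The base-R sketch below follows such links until it reaches an accepted name. This is similar in spirit to mo_current(), but it is not the package's implementation. The four-row table is a hypothetical stand-in. Its single renamed_to column is an assumed simplification of the real data set's columns, and current_name() is a hypothetical helper.

```r
# Hypothetical four-row stand-in for the microorganisms data set. The real
# data set has separate *_renamed_to columns (such as gbif_renamed_to);
# a single 'renamed_to' column is an assumed simplification.
taxa <- data.frame(
  fullname   = c("Enterobacter aerogenes", "Klebsiella aerogenes",
                 "Propionibacterium acnes", "Cutibacterium acnes"),
  status     = c("synonym", "accepted", "synonym", "accepted"),
  renamed_to = c("Klebsiella aerogenes", NA, "Cutibacterium acnes", NA),
  stringsAsFactors = FALSE
)

# Follow 'renamed_to' links until an accepted name is reached.
# Returns NA for unknown names, synonyms without a target, or overly long chains.
current_name <- function(name, ref, max_steps = 10) {
  for (step in seq_len(max_steps)) {
    row <- ref[ref$fullname == name, , drop = FALSE]
    if (nrow(row) == 0) return(NA_character_)       # name not in the table
    if (row$status[1] == "accepted") return(name)   # already current
    name <- row$renamed_to[1]                       # hop to the newer name
    if (is.na(name)) return(NA_character_)
  }
  NA_character_
}

current_name("Enterobacter aerogenes", taxa)
#> [1] "Klebsiella aerogenes"
```

The max_steps guard keeps a cyclic or very long chain of renamings from looping forever.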
Direct download links:
- Download as original
-R Data Structure (RDS) file (1.1 MB)
+R Data Structure (RDS) file (1.2 MB)
- Download as tab-separated
-text file (10.4 MB)
+text file (11.3 MB)
- Download as Microsoft
-Excel workbook (4.7 MB)
+Excel workbook (5 MB)
- Download as Apache
-Feather file (5 MB)
+Feather file (5.4 MB)
- Download as Apache
-Parquet file (2.4 MB)
+Parquet file (2.6 MB)
- Download as SAS
-data file (46.9 MB)
+data file (50.9 MB)
- Download as IBM
-SPSS Statistics data file (15.5 MB)
+SPSS Statistics data file (16.8 MB)
- Download as Stata
-DTA file (43.4 MB)
+DTA file (47.1 MB)
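The RDS and tab-separated downloads can be read back into R with base functions alone. The sketch below assumes the files were saved locally with .rds or .txt extensions. Feather/Parquet, Excel and SAS/SPSS/Stata files need add-on packages and are left out. read_amr_download() is a hypothetical helper, and the demo table is a tiny stand-in rather than the real data set.

```r
# Hypothetical helper: read a locally saved download back into R.
# Only the base-R formats are handled; Feather/Parquet, Excel and
# SAS/SPSS/Stata files need add-on packages and are not covered here.
read_amr_download <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    rds = readRDS(path),
    txt = ,                                   # falls through to 'tsv'
    tsv = utils::read.delim(path, stringsAsFactors = FALSE),
    stop("Unsupported file extension: ", ext)
  )
}

# Round trip with a tiny stand-in table (not the real data set).
demo <- data.frame(mo = c("B_GRAMP", "B_ABTRP"), ab = c("ATM", "COL"),
                   stringsAsFactors = FALSE)
txt_file <- tempfile(fileext = ".txt")
utils::write.table(demo, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
rds_file <- tempfile(fileext = ".rds")
saveRDS(demo, rds_file)

str(read_amr_download(txt_file))
```

An RDS file keeps the original column classes. A tab-separated file brings columns back as plain character or numeric vectors.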
NOTE: The exported files for SAS, SPSS and Stata contain only
the first 50 SNOMED codes per record, as their file size would otherwise
@@ -265,23 +265,23 @@ Set Name ‘Microoganism’, OID 2.16.840.1.114222.4.11.1009 (v12). URL:
-A data set with 206,832 rows and 2 columns, containing the following
+A data set with 134,634 rows and 2 columns, containing the following
column names: mo and ab.
This data set is available in R as intrinsic_resistant, after you load the AMR package.
-It was last updated on 11 December 2022 23:14:56 UTC. Find more info
+It was last updated on 16 December 2022 15:10:43 UTC. Find more info
about the structure of this data set here.
Direct download links:
-After installing this package, R knows ~48,000 distinct microbial
+After installing this package, R knows ~52,000 distinct microbial
species and all ~600 antibiotic, antimycotic and antiviral drugs by name
and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED
CT), and knows all about valid R/SI and MIC values. The integral
diff --git a/authors.html b/authors.html
index b19dc9c6a..465c3e669 100644
--- a/authors.html
+++ b/authors.html
@@ -1,5 +1,5 @@
This work was published in the Journal of Statistical Software (Volume 104(3); DOI 10.18637/jss.v104.i03) and formed the basis of two PhD theses (DOI 10.33612/diss.177417131 and DOI 10.33612/diss.192486375).
-After installing this package, R knows ~48,000 distinct microbial species (updated December 2022) and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice Foundation and University Medical Center Groningen, and is being actively and durably maintained by two public healthcare organisations in the Netherlands.
+After installing this package, R knows ~52,000 distinct microbial species (updated December 2022) and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice Foundation and University Medical Center Groningen, and is being actively and durably maintained by two public healthcare organisations in the Netherlands.
This version will eventually become v2.0! We’re happy to reach a new major milestone soon!
(this beta version will eventually become v2.0! We’re happy to reach a new major milestone soon!)
This is a new major release of the AMR package, with great new additions but also some breaking changes for current users. These are all listed below.
TL;DR
EUCAST 2022 and CLSI 2022 guidelines have been added for as.rsi().
Interpretation guidelines older than 10 years were removed; the oldest EUCAST and CLSI guidelines now included are from 2013.
We added support for the following languages: Chinese, Greek, Japanese, Polish, Turkish and Ukrainian. All antibiotic names are now available in these languages, and the AMR package will automatically determine a supported language based on the user's system language. We are very grateful for the valuable input from our colleagues in other countries.
We also made the following changes regarding the included taxonomy and microorganism functions:
* Updated full microbiological taxonomy according to the latest daily LPSN data set (December 2022) and latest yearly GBIF taxonomy backbone (November 2022)
* Support for all 1,515 city-like serovars of Salmonella, such as Salmonella Goldcoast. Formally, these are serovars belonging to the S. enterica species, but they are reported with only the name of the genus and the city. For this reason, the serovars are in the …
The new function …
Also, we added support for using antibiotic selectors in scoped dplyr verbs.
We have now added extensive support for antiviral agents!
For the first time, the AMR package has extensive support for antiviral drugs and for working with their names, codes and other data in any way.
-After installing this package, R knows ~48,000 distinct microbial species and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
+After installing this package, R knows ~52,000 distinct microbial species and all ~600 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen. This R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation.
This package can be used for:
* Reference for the taxonomy of microorganisms, since the package contains all microbial (sub)species from the List of Prokaryotic names with Standing in Nomenclature (LPSN) and the Global Biodiversity Information Facility (GBIF)
* Interpreting raw MIC and disk diffusion values, based on any CLSI or EUCAST guideline from the last 10 years
-A tibble with 206,832 observations and 2 variables:
+A tibble with 134,634 observations and 2 variables:
Animalia
-1,061
+1,379
Archaea
-1,291
+1,314
Bacteria
-34,398
+36,478
Fungi
-6,852
+7,901
@@ -1042,37 +1042,37 @@ diffusion diameters. Included guidelines are CLSI (2013-2022) and EUCAST
Protozoa
-4,447
+5,071
intrinsic_resistant: Intrinsic Bacterial Resistance
-
mo and ab.
This data set is available in R as intrinsic_resistant, after you load the AMR package.
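intrinsic_resistant is a two-column lookup of microorganism codes (mo) and antimicrobial codes (ab), so checking a bug-drug pair amounts to a membership test. The base-R sketch below holds only the ten rows printed in the reference page example (B_GRAMP and B_ABTRP). is_intrinsic_resistant() is a hypothetical helper, not a package function.

```r
# Stand-in for intrinsic_resistant: only the ten rows shown in the
# reference page example; the real data set has 134,634 rows.
intrinsic <- data.frame(
  mo = rep(c("B_GRAMP", "B_ABTRP"), each = 5),
  ab = rep(c("ATM", "COL", "NAL", "PLB", "TEM"), times = 2),
  stringsAsFactors = FALSE
)

# TRUE where the mo/ab pair is listed as intrinsic resistance.
# Vectorised: mo and ab are recycled to a common length by paste().
is_intrinsic_resistant <- function(mo, ab, ref = intrinsic) {
  paste(mo, ab, sep = "|") %in% paste(ref$mo, ref$ab, sep = "|")
}

is_intrinsic_resistant(c("B_GRAMP", "B_GRAMP"), c("COL", "AMX"))
#> [1]  TRUE FALSE
```

Combining both codes into one key keeps the check to a single %in% call rather than a join.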
+R Data Structure (RDS) file (78 kB)
+text file (5.1 MB)
+Excel workbook (1.3 MB)
+Feather file (1.2 MB)
+Parquet file (0.2 MB)
+data file (9.8 MB)
+SPSS Statistics data file (7.4 MB)
Source
diff --git a/articles/index.html b/articles/index.html
index 994381533..b7eea2cdb 100644
--- a/articles/index.html
+++ b/articles/index.html
@@ -1,5 +1,5 @@
-
The AMR package is a free and open-source R package with zero dependencies to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. Our aim is to provide a standard for clean and reproducible AMR data analysis that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
Used in 175 countries, translated to 16 languages
diff --git a/news/index.html b/news/index.html
index 8f6f22618..fc1520d6a 100644
--- a/news/index.html
+++ b/news/index.html
@@ -1,5 +1,5 @@
-AMR 1.8.2.9062
-Breaking
-microorganisms no longer relies on the Catalogue of Life, but now primarily on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and is supplemented with the Global Biodiversity Information Facility (GBIF). The structure of this data set has changed to include separate LPSN and GBIF identifiers. Almost all previous MO codes were retained. It contains over 1,000 taxonomic names from 2022 already.
The microorganisms.old data set was removed, and all previously accepted names are now included in the microorganisms data set. A new column status contains "accepted" for currently accepted names and "synonym" for taxonomic synonyms (currently invalid names). All previously accepted names now have a microorganisms ID and, if available, an LPSN, GBIF and SNOMED CT identifier.
The MO matching score algorithm (mo_matching_score()) now counts deletions and substitutions as 2 instead of 1, which impacts the outcome of as.mo() and any mo_*() function
combine_IR has been removed from this package (affecting functions count_df(), proportion_df(), and rsi_df() and some plotting functions), since it was replaced with combine_SI three years ago
units in ab_ddd(..., units = "...") had been deprecated and is now no longer supported. Use ab_ddd_units() instead.
New
-EUCAST 2022 and CLSI 2022 guidelines have been added for as.rsi(). EUCAST 2022 (v12.0) is now the new default guideline for all MIC and disk diffusion interpretations, and for eucast_rules() to apply EUCAST Expert Rules.
The AMR package is now available in 16 languages. The automatic language determination will give a note on systems in supported languages.
All-new algorithm for as.mo() (and thus all mo_*() functions) while still following our original set-up as described in our recently published JSS paper (DOI 10.18637/jss.v104.i03).
-A new argument keep_synonyms allows users to skip correcting for updated taxonomy; the argument allow_uncertain has been deleted in favour of it.
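The matching-score change in this changelog counts deletions and substitutions as 2, while insertions stay at 1. Base R's utils::adist() accepts per-operation costs, so the effect of such a weighting on a plain edit distance can be sketched as follows. This illustrates the weighting only. It is not the package's mo_matching_score() code, and the word pair is an arbitrary textbook example.

```r
# Unit-cost Levenshtein distance versus a weighting in which deletions and
# substitutions cost 2 and insertions cost 1. adist() returns a matrix;
# [1, 1] picks the single distance. Costs apply to turning the first
# string into the second, so swapping the arguments can change the result.
weights <- c(insertions = 1, deletions = 2, substitutions = 2)

adist("kitten", "sitting")[1, 1]                   # 3: 2 substitutions + 1 insertion
adist("kitten", "sitting", costs = weights)[1, 1]  # 5: 2 + 2 + 1
adist("sitting", "kitten", costs = weights)[1, 1]  # 6: 2 + 2 + 2 (a deletion instead)
```

With this weighting, an input with an extra character (which needs a deletion) scores as further away than one with a missing character (which needs an insertion).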
+AMR 1.8.2.9063
+microorganisms data set) updated to 2022 and now based on LPSN and GBIF
antivirals data set), with many new functions
rsi_confidence_interval() and mean_amr_distance()
mo_reset_session() function.
AMR package has extensive support for antiviral drugs and for working with their names, codes and other data in any way.
+New
+Interpretation of MIC and disk diffusion values
+EUCAST 2022 and CLSI 2022 guidelines have been added for as.rsi(). EUCAST 2022 (v12.0) is now the new default guideline for all MIC and disk diffusion interpretations, and for eucast_rules() to apply EUCAST Expert Rules. The default guideline (EUCAST) can now be changed with the new AMR_guideline option, such as: options(AMR_guideline = "CLSI 2020").
Supported languages
+The AMR package is now available in 16 languages and, according to download statistics, is used in almost all countries in the world!
Microbiological taxonomy
+microorganisms no longer relies on the Catalogue of Life, but on the List of Prokaryotic names with Standing in Nomenclature (LPSN) and is supplemented with the ‘backbone taxonomy’ from the Global Biodiversity Information Facility (GBIF). The structure of this data set has changed to include separate LPSN and GBIF identifiers. Almost all previous MO codes were retained. It contains over 1,400 taxonomic names from 2022.
subspecies column of the microorganisms data set and “enterica” is in the species column, but the full name does not contain the species name (enterica).
* All-new algorithm for as.mo() (and thus all mo_*() functions) while still following our original set-up as described in our recently published JSS paper (DOI 10.18637/jss.v104.i03).
* A new argument keep_synonyms allows users to skip correcting for updated taxonomy; the argument allow_uncertain has been deleted in favour of it.
* It is now tremendously faster and generally returns more consistent results.
* Sequential coercion is now extremely fast as results are stored in the package environment, although coercion of unknown values must be run once per session. Previous results can be reset/removed with the new mo_reset_session() function.
* Support for microorganism codes of the ASIan Antimicrobial Resistance Surveillance Network (ASIARS-Net)
* The MO matching score algorithm (mo_matching_score()) now counts deletions and substitutions as 2 instead of 1, which impacts the outcome of as.mo() and any mo_*() function
* Removed all species of the taxonomic kingdom Chromista from the package. This was done for multiple reasons:
  * CRAN allows packages to be around 5 MB maximum; some packages are exempted, but this package is not one of them
  * Chromista are not relevant when it comes to antimicrobial resistance, so they fall outside the primary scope of this package
  * Chromista are almost never clinically relevant, so they also fall outside the secondary scope of this package
* The microorganisms.old data set was removed, and all previously accepted names are now included in the microorganisms data set. A new column status contains "accepted" for currently accepted names and "synonym" for taxonomic synonyms (currently invalid names). All previously accepted names now have a microorganisms ID and, if available, an LPSN, GBIF and SNOMED CT identifier.
Antibiotic agents and selectors
+add_custom_antimicrobials() allows users to add custom antimicrobial codes and names to the AMR package.
The antibiotics data set was greatly updated:
* The following 20 antibiotics have been added (including the new J01RA ATC group): azithromycin/fluconazole/secnidazole (AFC), cefepime/amikacin (CFA), cefixime/ornidazole (CEO), ceftriaxone/beta-lactamase inhibitor (CEB), ciprofloxacin/metronidazole (CIM), ciprofloxacin/ornidazole (CIO), ciprofloxacin/tinidazole (CIT), furazidin (FUR), isoniazid/sulfamethoxazole/trimethoprim/pyridoxine (IST), lascufloxacin (LSC), levofloxacin/ornidazole (LEO), nemonoxacin (NEM), norfloxacin/metronidazole (NME), norfloxacin/tinidazole (NTI), ofloxacin/ornidazole (OOR), oteseconazole (OTE), rifampicin/ethambutol/isoniazid (REI), sarecycline (SRC), tetracycline/oleandomycin (TOL), and thioacetazone (TAT)
* Added some missing ATC codes
* Updated DDDs and PubChem Compound IDs
* Updated the spelling of some antibiotic names to the spelling now used by WHOCC (such as cephalexin -> cefalexin, and phenethicillin -> pheneticillin)
* Antibiotic code “CEI” for ceftolozane/tazobactam has been replaced with “CZT” to comply with EARS-Net and WHONET 2022. The old code will still work in all cases when using as.ab() or any of the ab_*() functions.
* Support for antimicrobial interpretation of anaerobic bacteria, by adding a ‘placeholder’ code B_ANAER to the microorganisms data set and adding the breakpoints of anaerobes to the rsi_interpretation data set, which is used by as.rsi() for interpretation of MIC and disk diffusion values
Support for using antibiotic selectors in scoped dplyr verbs (with or without using vars()), such as in: ... %>% summarise_at(aminoglycosides(), resistance); please see resistance() for examples.
Antiviral agents
+For the first time, the AMR package has extensive support for antiviral drugs and for working with their names, codes and other data in any way.
-antivirals data set has been extended with 18 new drugs (also from the new J05AJ ATC group) and now also contains antiviral identifiers and LOINC codes
av (antivirals) has been added, which is functionally similar to ab for antibiotics
as.av(), av_name(), av_atc(), av_synonyms(), av_from_text() have all been added as siblings to their ab_*() equivalents
rsi_confidence_interval() to add confidence intervals in AMR calculations. This is also included in rsi_df() and proportion_df()
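The idea behind rsi_confidence_interval(), a confidence interval around a resistance proportion, can be sketched with base R. This example assumes an exact (Clopper-Pearson) binomial interval from stats::binom.test(). Whether the package uses this method is not stated here, and resistance_ci() is a hypothetical helper.

```r
# Exact (Clopper-Pearson) two-sided interval for a resistance proportion.
# Assumption: illustrative only; not necessarily the package's method.
resistance_ci <- function(n_resistant, n_tested, conf_level = 0.95) {
  stopifnot(n_tested > 0, n_resistant >= 0, n_resistant <= n_tested)
  ci <- stats::binom.test(n_resistant, n_tested,
                          conf.level = conf_level)$conf.int
  c(proportion = n_resistant / n_tested, lower = ci[1], upper = ci[2])
}

# e.g. 12 resistant isolates out of 80 tested
resistance_ci(n_resistant = 12, n_tested = 80)
```

The exact interval stays within [0, 1] even at the extremes: with zero resistant isolates, the lower bound is exactly 0.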
-Other new functions
+
-rsi_confidence_interval() to add confidence intervals in AMR calculations. This is now also included in rsi_df() and proportion_df().
mean_amr_distance() to calculate the mean AMR distance. The mean AMR distance is a normalised numeric value to compare AMR test results and can help to identify similar isolates, without comparing antibiograms by hand.
rsi_interpretation_history() to view the history of previous runs of as.rsi(). This returns a ‘logbook’ with the selected guideline, reference table and specific interpretation of each row in a data set on which as.rsi() was run.
mo_current() to get the currently valid taxonomic name of a microorganism
add_custom_antimicrobials() to add custom antimicrobial codes and names to the AMR package
antibiotics data set
-as.ab() or any of the ab_*() functions.
B_ANAER to the microorganisms data set and add the breakpoints of anaerobes to the rsi_interpretation data set, which is used by as.rsi() when interpreting MIC and disk diffusion values
data.frame-enhancing R packages, more specifically: data.table::data.table, janitor::tabyl, tibble::tibble, and tsibble::tsibble. AMR package functions that have a data set as output (such as rsi_df() and bug_drug_combinations()) will now return the same data type as the input.
tibble, instead of base R data.frames. Older R versions are still supported.
dplyr verbs (with or without vars()), such as in: ... %>% summarise_at(aminoglycosides(), resistance), see resistance()
-Changes
-Changes
+combine_IR has been removed from this package (affecting functions count_df(), proportion_df(), and rsi_df() and some plotting functions), since it was replaced with combine_SI three years ago
units in ab_ddd(..., units = "...") had been deprecated for some time and is now no longer supported. Use ab_ddd_units() instead.
data.frame-enhancing R packages, more specifically: data.table::data.table, janitor::tabyl, tibble::tibble, and tsibble::tsibble. AMR package functions that have a data set as output (such as rsi_df() and bug_drug_combinations()) will now return the same data type as the input.
tibble, instead of base R data.frames. Older R versions are still supported, even if they do not support tibbles.
as.rsi():
NA values (e.g. as.rsi(as.disk(NA), ...))
options(AMR_guideline = "...")
-as.integer() method for MIC values, since MICs are not integer values and running table() on MIC values consequently failed because the level position could not be retrieved (as that is how as.integer() normally works on factors)
mo_gramstain()), since the taxonomic phyla Actinobacteria, Chloroflexi, Firmicutes, and Tenericutes were renamed in 2021 to Actinomycetota, Chloroflexota, Bacillota, and Mycoplasmatota, respectively
mdro() when using similar column names with the Magiorakos guideline
random_*() function (such as random_mic()) is now possible by directly calling the package without loading it first: AMR::random_mic(10)
P_TXPL_GOND) to the microorganisms data set, together with its genus, family, and order
prevalence of the microorganisms data set from 3 to 2 for these genera: Acholeplasma, Alistipes, Alloprevotella, Bergeyella, Borrelia, Brachyspira, Butyricimonas, Cetobacterium, Chlamydia, Chlamydophila, Deinococcus, Dysgonomonas, Elizabethkingia, Empedobacter, Haloarcula, Halobacterium, Halococcus, Myroides, Odoribacter, Ornithobacterium, Parabacteroides, Pedobacter, Phocaeicola, Porphyromonas, Riemerella, Sphingobacterium, Streptobacillus, Tenacibaculum, Terrimonas, Victivallis, Wautersiella, Weeksella
-vctrs package, used internally by the tidyverse. This allows changing values of class mic, disk, rsi, mo and ab in tibbles, and using antibiotic selectors for selecting/filtering, e.g. df[carbapenems() == "R", ]
info = FALSE in mdro()
@@ -234,7 +224,7 @@
as.rsi(), as.mic(), or as.disk() will now show the column name in the warning for invalid results
Other
+Other
styler package
microorganisms
- microorganisms.codes
antibiotics
antivirals
diff --git a/reference/intrinsic_resistant.html b/reference/intrinsic_resistant.html
index a87d8aed3..bcaff7b5c 100644
--- a/reference/intrinsic_resistant.html
+++ b/reference/intrinsic_resistant.html
@@ -1,5 +1,5 @@
-Format
- mo
Microorganism ID
mo
Microorganism ID
ab
Antibiotic ID
Examples
intrinsic_resistant
-#> # A tibble: 206,832 × 2
-#> mo ab
-#> <mo> <ab>
-#> 1 B_GRAMP ATM
-#> 2 B_GRAMP COL
-#> 3 B_GRAMP NAL
-#> 4 B_GRAMP PLB
-#> 5 B_GRAMP TEM
-#> 6 B_ABTRP ATM
-#> 7 B_ABTRP COL
-#> 8 B_ABTRP NAL
-#> 9 B_ABTRP PLB
-#> 10 B_ABTRP TEM
-#> # … with 206,822 more rows
+#> # A tibble: 134,634 × 2
+#> mo ab
+#> <mo> <chr>
+#> 1 B_GRAMP ATM
+#> 2 B_GRAMP COL
+#> 3 B_GRAMP NAL
+#> 4 B_GRAMP PLB
+#> 5 B_GRAMP TEM
+#> 6 B_ABTRP ATM
+#> 7 B_ABTRP COL
+#> 8 B_ABTRP NAL
+#> 9 B_ABTRP PLB
+#> 10 B_ABTRP TEM
+#> # … with 134,624 more rows