All reference data (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this AMR package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated text files that are machine-readable and suitable for import into any software program, such as laboratory information systems.
On this page, we explain how to download them and what the structure of the data sets looks like.
microorganisms: Full Microbial Taxonomy
A data set with 48,883 rows and 22 columns, containing the following column names:
mo, fullname, status, kingdom, phylum, class, order, family, genus, species, subspecies, rank, ref, source, lpsn, lpsn_parent, lpsn_renamed_to, gbif, gbif_parent, gbif_renamed_to, prevalence and snomed.
In R, this data set is available as microorganisms after you load the AMR package.
It was last updated on 29 October 2022 12:15:23 UTC. More information about its structure is available in its R help page (?microorganisms).
Direct download links:
- Download as original R Data Structure (RDS) file (1.1 MB)
- Download as tab-separated text file (10.6 MB)
- Download as Microsoft Excel workbook (4.8 MB)
- Download as Apache Feather file (5.1 MB)
- Download as Apache Parquet file (2.5 MB)
- Download as SAS data file (47.8 MB)
- Download as IBM SPSS Statistics data file (15.8 MB)
- Download as Stata DTA file (43.8 MB)
NOTE: The exported files for SAS, SPSS and Stata contain only the first 50 SNOMED codes per record, as their file size would otherwise exceed 100 MB, the file size limit of GitHub. We advise using R instead.
The tab-separated text file and Microsoft Excel workbook both contain all SNOMED codes as comma-separated values.
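Because several SNOMED codes share a single cell in the tab-separated export, software outside R has to split that cell after reading the file. The following is a minimal Python sketch using only the standard library. It assumes the file has a header row with the column names listed above, that fields are separated by tabs, that the file is UTF-8 encoded, and that multiple codes in a cell are separated by commas; the function name read_with_lists is hypothetical.

```python
import csv
import io
from typing import Iterable


def read_with_lists(lines: Iterable[str],
                    list_columns: tuple[str, ...] = ("snomed",)) -> list[dict]:
    """Read a tab-separated export and split comma-separated cells into lists.

    Assumptions: a header row, tab-separated fields, commas between values.
    Empty or missing cells become an empty list.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        for col in list_columns:
            raw = row.get(col) or ""
            row[col] = [value.strip() for value in raw.split(",") if value.strip()]
        rows.append(row)
    return rows


# Inline sample standing in for the downloaded file; with a real file you would
# pass open("microorganisms.txt", newline="", encoding="utf-8") instead
# (the file name is an assumption).
sample = ("mo\tfullname\tsnomed\n"
          "B_ESCHR_ALBR\tEscherichia albertii\t419388003\n"
          "B_ESCHR_BLTT\tEscherichia blattae\t\n")
rows = read_with_lists(io.StringIO(sample))
print(rows[0]["snomed"])  # ['419388003']
print(rows[1]["snomed"])  # []
```

The same function can split other multi-valued columns, such as the LOINC codes in the antibiotics export, by passing their names in list_columns.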
Source
This data set contains the full microbial taxonomy of five kingdoms from the List of Prokaryotic names with Standing in Nomenclature (LPSN) and the Global Biodiversity Information Facility (GBIF):
- Parte, AC et al. (2020). List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ. International Journal of Systematic and Evolutionary Microbiology, 70, 5607-5612. Accessed from https://lpsn.dsmz.de on 12 September 2022.
- GBIF Secretariat (26 November 2021). GBIF Backbone Taxonomy. Checklist dataset. Accessed from https://www.gbif.org on 12 September 2022.
- Public Health Information Network Vocabulary Access and Distribution System (PHIN VADS). US Edition of SNOMED CT from 1 September 2020. Value Set Name ‘Microorganism’, OID 2.16.840.1.114222.4.11.1009 (v12). URL: https://phinvads.cdc.gov
Example content
Included (sub)species per taxonomic kingdom:
Kingdom | Number of (sub)species |
---|---|
(unknown kingdom) | 5 |
Animalia | 1,524 |
Archaea | 1,237 |
Bacteria | 33,716 |
Fungi | 7,450 |
Protozoa | 4,951 |
Example rows when filtering on genus Escherichia:
mo | fullname | status | kingdom | phylum | class | order | family | genus | species | subspecies | rank | ref | source | lpsn | lpsn_parent | lpsn_renamed_to | gbif | gbif_parent | gbif_renamed_to | prevalence | snomed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
B_ESCHR | Escherichia | accepted | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia |  |  | genus | Castellani et al., 1919 | LPSN | 515602 | 482 |  | 3221780 | 4899 |  | 1 | 407310004, 407251000, 407281008, … |
B_ESCHR_ADCR | Escherichia adecarboxylata | synonym | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | adecarboxylata |  | species | Leclerc, 1962 | LPSN | 776052 | 515602 | 777447 |  |  |  | 1 |  |
B_ESCHR_ALBR | Escherichia albertii | accepted | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | albertii |  | species | Huys et al., 2003 | LPSN | 776053 | 515602 |  | 5427575 | 3221780 |  | 1 | 419388003 |
B_ESCHR_BLTT | Escherichia blattae | synonym | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | blattae |  | species | Burgess et al., 1973 | LPSN | 776056 | 515602 | 788468 |  |  |  | 1 |  |
B_ESCHR_COLI | Escherichia coli | accepted | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | coli |  | species | Castellani et al., 1919 | LPSN | 776057 | 515602 |  | 6110934 | 3221780 |  | 1 | 1095001000112106, 715307006, 737528008, … |
B_ESCHR_DYSN | Escherichia dysenteriae | accepted | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | dysenteriae |  | species |  | GBIF |  |  |  | 10862979 | 3221780 |  | 1 |  |
antibiotics: Antibiotic (+Antifungal) Drugs
A data set with 483 rows and 14 columns, containing the following column names:
ab, cid, name, group, atc, atc_group1, atc_group2, abbreviations, synonyms, oral_ddd, oral_units, iv_ddd, iv_units and loinc.
In R, this data set is available as antibiotics after you load the AMR package.
It was last updated on 30 October 2022 20:05:46 UTC. More information about its structure is available in its R help page (?antibiotics).
Direct download links:
- Download as original R Data Structure (RDS) file (39 kB)
- Download as tab-separated text file (0.1 MB)
- Download as Microsoft Excel workbook (66 kB)
- Download as Apache Feather file (0.1 MB)
- Download as Apache Parquet file (97 kB)
- Download as SAS data file (1.9 MB)
- Download as IBM SPSS Statistics data file (0.3 MB)
- Download as Stata DTA file (0.4 MB)
The tab-separated text file, the Microsoft Excel workbook, and the SAS, SPSS and Stata files all contain the ATC codes, common abbreviations, trade names and LOINC codes as comma-separated values.
Source
This data set contains all EARS-Net and ATC codes gathered from WHO and WHONET, and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.
- ATC/DDD index from WHO Collaborating Centre for Drug Statistics Methodology (note: this may not be used for commercial purposes, but is freely available from the WHO CC website for personal use)
- PubChem by the US National Library of Medicine
- WHONET software 2019
- LOINC (Logical Observation Identifiers Names and Codes)
Example content
ab | cid | name | group | atc | atc_group1 | atc_group2 | abbreviations | synonyms | oral_ddd | oral_units | iv_ddd | iv_units | loinc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AMK | 37768 | Amikacin | Aminoglycosides | D06AX12, J01GB06, S01AA21 | Aminoglycoside antibacterials | Other aminoglycosides | ak, ami, amik, … | amicacin, amikacillin, amikacin, … |  |  | 1.0 | g | 13546-7, 15098-7, 17798-0, … |
AMX | 33613 | Amoxicillin | Beta-lactams/penicillins | J01CA04 | Beta-lactam antibacterials, penicillins | Penicillins with extended spectrum | ac, amox, amx | actimoxi, amoclen, amolin, … | 1.5 | g | 3.0 | g | 16365-9, 25274-2, 3344-9, … |
AMC | 23665637 | Amoxicillin/clavulanic acid | Beta-lactams/penicillins | J01CR02 | Beta-lactam antibacterials, penicillins | Combinations of penicillins, incl. beta-lactamase inhibitors | a/c, amcl, aml, … | amocla, amoclan, amoclav, … | 1.5 | g | 3.0 | g | |
AMP | 6249 | Ampicillin | Beta-lactams/penicillins | J01CA01, S01AA19 | Beta-lactam antibacterials, penicillins | Penicillins with extended spectrum | am, amp, ampi | acillin, adobacillin, amblosin, … | 2.0 | g | 6.0 | g | 21066-6, 3355-5, 33562-0, … |
AZM | 447043 | Azithromycin | Macrolides/lincosamides | J01FA10, S01AA26 | Macrolides, lincosamides and streptogramins | Macrolides | az, azi, azit, … | aritromicina, aruzilina, azasite, … | 0.3 | g | 0.5 | g | 16420-2, 25233-8 |
PEN | 5904 | Benzylpenicillin | Beta-lactams/penicillins | J01CE01, S01AA14 | Combinations of antibacterials | Combinations of antibacterials | bepe, pen, peni, … | abbocillin, ayercillin, bencilpenicilina, … |  |  | 3.6 | g |  |
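The oral_ddd/oral_units and iv_ddd/iv_units columns allow drug use to be expressed as a number of DDDs, the usual metric of the WHO ATC/DDD methodology: the total amount administered divided by the DDD for that route of administration, with both in the same unit. The Python sketch below illustrates that arithmetic only; the function name and the unit table (grams and milligrams) are assumptions for illustration, not part of the data set or the AMR package.

```python
# Conversion factors to grams. Assumption: only "g" and "mg" are handled in
# this sketch; any other unit raises an error.
TO_GRAMS = {"g": 1.0, "mg": 0.001}


def number_of_ddds(amount: float, amount_unit: str, ddd: float, ddd_unit: str) -> float:
    """Return the total amount divided by the defined daily dose, in matching units."""
    if amount_unit not in TO_GRAMS or ddd_unit not in TO_GRAMS:
        raise ValueError(f"unsupported unit: {amount_unit!r} or {ddd_unit!r}")
    if ddd <= 0:
        raise ValueError("DDD must be positive")
    return (amount * TO_GRAMS[amount_unit]) / (ddd * TO_GRAMS[ddd_unit])


# Amoxicillin has an oral DDD of 1.5 g (oral_ddd = 1.5, oral_units = "g").
print(number_of_ddds(30, "g", 1.5, "g"))  # 20.0
# The same DDD with the amount given in milligrams.
print(number_of_ddds(4500, "mg", 1.5, "g"))
```

For example, the amoxicillin row above lists an oral DDD of 1.5 g, so 30 g of oral amoxicillin corresponds to 20 DDDs. Use the iv_ddd and iv_units columns instead for parenteral administration.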
antivirals: Antiviral Drugs
A data set with 120 rows and 11 columns, containing the following column names:
av, name, atc, cid, atc_group, synonyms, oral_ddd, oral_units, iv_ddd, iv_units and loinc.
In R, this data set is available as antivirals after you load the AMR package.
It was last updated on 13 November 2022 07:46:10 UTC. More information about its structure is available in its R help page (?antivirals).
Direct download links:
- Download as original R Data Structure (RDS) file (5 kB)
- Download as tab-separated text file (16 kB)
- Download as Microsoft Excel workbook (16 kB)
- Download as Apache Feather file (15 kB)
- Download as Apache Parquet file (13 kB)
- Download as SAS data file (84 kB)
- Download as IBM SPSS Statistics data file (30 kB)
- Download as Stata DTA file (73 kB)
The tab-separated text file, the Microsoft Excel workbook, and the SAS, SPSS and Stata files all contain the trade names and LOINC codes as comma-separated values.
Source
This data set contains all ATC codes gathered from WHO and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.
- ATC/DDD index from WHO Collaborating Centre for Drug Statistics Methodology (note: this may not be used for commercial purposes, but is freely available from the WHO CC website for personal use)
- PubChem by the US National Library of Medicine
- LOINC (Logical Observation Identifiers Names and Codes)
Example content
av | name | atc | cid | atc_group | synonyms | oral_ddd | oral_units | iv_ddd | iv_units | loinc |
---|---|---|---|---|---|---|---|---|---|---|
ABA | Abacavir | J05AF06 | 441300 | Nucleoside and nucleotide reverse transcriptase inhibitors | abacavir sulfate, avacavir, ziagen | 0.6 | g |  |  | 29113-8, 78772-1, 78773-9, … |
ACI | Aciclovir | J05AB01 | 135398513 | Nucleosides and nucleotides excl. reverse transcriptase inhibitors | acicloftal, aciclovier, aciclovirum, … | 4.0 | g | 4 | g | |
ADD | Adefovir dipivoxil | J05AF08 | 60871 | Nucleoside and nucleotide reverse transcriptase inhibitors | adefovir di, adefovir di ester, adefovir dipivoxyl, … | 10.0 | mg | |||
AME | Amenamevir | J05AX26 | 11397521 | Other antivirals | amenalief | 0.4 | g | |||
AMP | Amprenavir | J05AE05 | 65016 | Protease inhibitors | agenerase, carbamate, prozei | 1.2 | g |  |  | 29114-6, 31028-4, 78791-1 |
ASU | Asunaprevir | J05AP06 | 16076883 | Antivirals for treatment of HCV infections | sunvepra, sunvepratrade | 0.2 | g |
rsi_translation: Interpretation from MIC values / disk diameters to R/SI
A data set with 18,308 rows and 11 columns, containing the following column names:
guideline, method, site, mo, rank_index, ab, ref_tbl, disk_dose, breakpoint_S, breakpoint_R and uti.
In R, this data set is available as rsi_translation after you load the AMR package.
It was last updated on 29 October 2022 17:01:23 UTC. More information about its structure is available in its R help page (?rsi_translation).
Direct download links:
- Download as original R Data Structure (RDS) file (42 kB)
- Download as tab-separated text file (1.9 MB)
- Download as Microsoft Excel workbook (0.8 MB)
- Download as Apache Feather file (0.7 MB)
- Download as Apache Parquet file (87 kB)
- Download as SAS data file (3.6 MB)
- Download as IBM SPSS Statistics data file (2.3 MB)
- Download as Stata DTA file (3.4 MB)
Source
This data set contains interpretation rules for MIC values and disk diffusion diameters. Included guidelines are CLSI (2013-2022) and EUCAST (2013-2022).
Example content
guideline | method | site | mo | mo_name | rank_index | ab | ab_name | ref_tbl | disk_dose | breakpoint_S | breakpoint_R | uti |
---|---|---|---|---|---|---|---|---|---|---|---|---|
EUCAST 2022 | MIC |  | F_ASPRG_MGTS | Aspergillus fumigatus | 2 | AMB | Amphotericin B | Aspergillus |  | 1 | 1 | FALSE |
EUCAST 2022 | MIC |  | F_ASPRG_NIGR | Aspergillus niger | 2 | AMB | Amphotericin B | Aspergillus |  | 1 | 1 | FALSE |
EUCAST 2022 | MIC |  | F_CANDD_ALBC | Candida albicans | 2 | AMB | Amphotericin B | Candida |  | 1 | 1 | FALSE |
EUCAST 2022 | MIC |  | F_CANDD_DBLN | Candida dubliniensis | 2 | AMB | Amphotericin B | Candida |  | 1 | 1 | FALSE |
EUCAST 2022 | MIC |  | F_CANDD_GLBR | Candida glabrata | 2 | AMB | Amphotericin B | Candida |  | 1 | 1 | FALSE |
EUCAST 2022 | MIC |  | F_CANDD_KRUS | Candida krusei | 2 | AMB | Amphotericin B | Candida |  | 1 | 1 | FALSE |
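The breakpoint_S and breakpoint_R columns hold the clinical breakpoints per guideline, method and bug-drug combination. As a sketch of how such breakpoints are applied under the EUCAST convention (MIC: S if at most breakpoint_S, R if above breakpoint_R; disk diameter: S if at least breakpoint_S, R if below breakpoint_R; anything in between is I), the Python function below classifies a single value. It illustrates the convention only and is not the package's own as.rsi() implementation, which covers more cases such as CLSI guidelines. The function name is hypothetical, and the method labels are assumed to be spelled MIC and DISK as in the method column.

```python
def interpret_eucast(value: float, breakpoint_s: float, breakpoint_r: float,
                     method: str = "MIC") -> str:
    """Classify one MIC (mg/L) or disk zone diameter (mm) as "S", "I" or "R".

    EUCAST convention assumed here:
      MIC:  S if value <= breakpoint_S, R if value > breakpoint_R
      DISK: S if value >= breakpoint_S, R if value < breakpoint_R
    Anything between the two breakpoints is classified as "I".
    """
    if method == "MIC":
        if value <= breakpoint_s:
            return "S"
        if value > breakpoint_r:
            return "R"
    elif method == "DISK":
        if value >= breakpoint_s:
            return "S"
        if value < breakpoint_r:
            return "R"
    else:
        raise ValueError(f"unknown method: {method!r}")
    return "I"


# Aspergillus fumigatus / amphotericin B, EUCAST 2022: breakpoint_S = 1, breakpoint_R = 1.
for mic in (0.5, 1, 2):
    print(mic, interpret_eucast(mic, 1, 1))  # prints "0.5 S", "1 S", "2 R"
```

With equal breakpoints, as in the rows above, no MIC can fall in the I category; a hypothetical pair of breakpoint_S = 0.5 and breakpoint_R = 1 would classify an MIC of 1 as I.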
intrinsic_resistant: Intrinsic Bacterial Resistance
A data set with 134,659 rows and 2 columns, containing the following column names:
mo and ab.
In R, this data set is available as intrinsic_resistant after you load the AMR package.
It was last updated on 31 October 2022 10:19:06 UTC. More information about its structure is available in its R help page (?intrinsic_resistant).
Direct download links:
- Download as original R Data Structure (RDS) file (78 kB)
- Download as tab-separated text file (5.1 MB)
- Download as Microsoft Excel workbook (1.3 MB)
- Download as Apache Feather file (1.2 MB)
- Download as Apache Parquet file (0.2 MB)
- Download as SAS data file (9.8 MB)
- Download as IBM SPSS Statistics data file (7.4 MB)
- Download as Stata DTA file (9.5 MB)
Source
This data set contains all intrinsic resistance defined by EUCAST for all bug-drug combinations, and is based on ‘EUCAST Expert Rules’ and ‘EUCAST Intrinsic Resistance and Unusual Phenotypes’ v3.3 (2021).
Example content
Example rows when filtering on Enterobacter cloacae:
microorganism | antibiotic |
---|---|
Enterobacter cloacae | Acetylmidecamycin |
Enterobacter cloacae | Acetylspiramycin |
Enterobacter cloacae | Amoxicillin |
Enterobacter cloacae | Amoxicillin/clavulanic acid |
Enterobacter cloacae | Ampicillin |
Enterobacter cloacae | Ampicillin/sulbactam |
Enterobacter cloacae | Avoparcin |
Enterobacter cloacae | Azithromycin |
Enterobacter cloacae | Benzylpenicillin |
Enterobacter cloacae | Cadazolid |
Enterobacter cloacae | Cefadroxil |
Enterobacter cloacae | Cefalexin |
Enterobacter cloacae | Cefalotin |
Enterobacter cloacae | Cefazolin |
Enterobacter cloacae | Cefoxitin |
Enterobacter cloacae | Clarithromycin |
Enterobacter cloacae | Clindamycin |
Enterobacter cloacae | Cycloserine |
Enterobacter cloacae | Dalbavancin |
Enterobacter cloacae | Dirithromycin |
Enterobacter cloacae | Erythromycin |
Enterobacter cloacae | Flurithromycin |
Enterobacter cloacae | Fusidic acid |
Enterobacter cloacae | Gamithromycin |
Enterobacter cloacae | Josamycin |
Enterobacter cloacae | Kitasamycin |
Enterobacter cloacae | Lincomycin |
Enterobacter cloacae | Linezolid |
Enterobacter cloacae | Meleumycin |
Enterobacter cloacae | Midecamycin |
Enterobacter cloacae | Miocamycin |
Enterobacter cloacae | Nafithromycin |
Enterobacter cloacae | Norvancomycin |
Enterobacter cloacae | Oleandomycin |
Enterobacter cloacae | Oritavancin |
Enterobacter cloacae | Pirlimycin |
Enterobacter cloacae | Primycin |
Enterobacter cloacae | Pristinamycin |
Enterobacter cloacae | Quinupristin/dalfopristin |
Enterobacter cloacae | Ramoplanin |
Enterobacter cloacae | Rifampicin |
Enterobacter cloacae | Rokitamycin |
Enterobacter cloacae | Roxithromycin |
Enterobacter cloacae | Solithromycin |
Enterobacter cloacae | Spiramycin |
Enterobacter cloacae | Tedizolid |
Enterobacter cloacae | Teicoplanin |
Enterobacter cloacae | Telavancin |
Enterobacter cloacae | Telithromycin |
Enterobacter cloacae | Thiacetazone |
Enterobacter cloacae | Tildipirosin |
Enterobacter cloacae | Tilmicosin |
Enterobacter cloacae | Troleandomycin |
Enterobacter cloacae | Tulathromycin |
Enterobacter cloacae | Tylosin |
Enterobacter cloacae | Tylvalosin |
Enterobacter cloacae | Vancomycin |
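Because this data set is simply a list of (mo, ab) pairs, checking whether a bug-drug combination is intrinsically resistant comes down to a set-membership test. The following is a minimal Python sketch, assuming the tab-separated export with the header columns mo and ab. The function names are hypothetical, and B_ENTRBC_CLOC is used as an assumed code for Enterobacter cloacae; check the mo column of the microorganisms data set for the actual code.

```python
import csv
import io
from typing import Iterable


def load_intrinsic(lines: Iterable[str]) -> frozenset[tuple[str, str]]:
    """Build a set of (mo, ab) pairs from the tab-separated export."""
    reader = csv.DictReader(lines, delimiter="\t")
    return frozenset((row["mo"], row["ab"]) for row in reader)


def is_intrinsic_resistant(pairs: frozenset[tuple[str, str]], mo: str, ab: str) -> bool:
    """Return True if the bug-drug combination is listed as intrinsically resistant."""
    return (mo, ab) in pairs


# Small inline sample: amoxicillin (AMX), ampicillin (AMP) and azithromycin (AZM)
# for Enterobacter cloacae; "B_ENTRBC_CLOC" is an assumed microorganism code.
sample = ("mo\tab\n"
          "B_ENTRBC_CLOC\tAMX\n"
          "B_ENTRBC_CLOC\tAMP\n"
          "B_ENTRBC_CLOC\tAZM\n")
pairs = load_intrinsic(io.StringIO(sample))
print(is_intrinsic_resistant(pairs, "B_ENTRBC_CLOC", "AMX"))  # True
print(is_intrinsic_resistant(pairs, "B_ENTRBC_CLOC", "AMK"))  # False
```

A frozenset keeps each lookup at constant average time, which matters when checking many isolates against the roughly 135,000 pairs in the full data set.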
dosage: Dosage Guidelines from EUCAST
A data set with 169 rows and 9 columns, containing the following column names:
ab, name, type, dose, dose_times, administration, notes, original_txt and eucast_version.
In R, this data set is available as dosage after you load the AMR package.
It was last updated on 30 October 2022 20:05:46 UTC. More information about its structure is available in its R help page (?dosage).
Direct download links:
- Download as original R Data Structure (RDS) file (3 kB)
- Download as tab-separated text file (15 kB)
- Download as Microsoft Excel workbook (14 kB)
- Download as Apache Feather file (11 kB)
- Download as Apache Parquet file (7 kB)
- Download as SAS data file (52 kB)
- Download as IBM SPSS Statistics data file (23 kB)
- Download as Stata DTA file (44 kB)
Source
EUCAST breakpoints used in this package are based on the dosages in this data set.
The dosages currently included are those from ‘EUCAST Clinical Breakpoint Tables’ v11.0 (2021).
Example content
ab | name | type | dose | dose_times | administration | notes | original_txt | eucast_version |
---|---|---|---|---|---|---|---|---|
AMK | Amikacin | standard_dosage | 25-30 mg/kg | 1 | iv |  | 25-30 mg/kg x 1 iv | 11 |
AMX | Amoxicillin | high_dosage | 2 g | 6 | iv |  | 2 g x 6 iv | 11 |
AMX | Amoxicillin | standard_dosage | 1 g | 3 | iv |  | 1 g x 3-4 iv | 11 |
AMX | Amoxicillin | high_dosage | 0.75-1 g | 3 | oral |  | 0.75-1 g x 3 oral | 11 |
AMX | Amoxicillin | standard_dosage | 0.5 g | 3 | oral |  | 0.5 g x 3 oral | 11 |
AMX | Amoxicillin | uncomplicated_uti | 0.5 g | 3 | oral |  | 0.5 g x 3 oral | 11 |
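The structured columns dose, dose_times and administration mirror the free text in original_txt. Judging from the example rows, a frequency range such as "3-4" is stored as its lower bound in dose_times. The Python sketch below reproduces that mapping with a regular expression; the pattern and the function name parse_dosage are inferred from these six rows only and are assumptions, not how the data set was actually built.

```python
import re

# Pattern inferred from the example rows: "<dose> x <times>[-<max times>] <route>".
PATTERN = re.compile(r"^(?P<dose>.+?) x (?P<times>\d+)(?:-\d+)? (?P<route>\w+)$")


def parse_dosage(original_txt: str) -> dict:
    """Split an EUCAST dosage string into dose, dose_times and administration."""
    match = PATTERN.match(original_txt.strip())
    if match is None:
        raise ValueError(f"unrecognised dosage text: {original_txt!r}")
    return {
        "dose": match["dose"],
        "dose_times": int(match["times"]),  # lower bound when a range is given
        "administration": match["route"],
    }


print(parse_dosage("1 g x 3-4 iv"))
# {'dose': '1 g', 'dose_times': 3, 'administration': 'iv'}
```

Texts that do not follow this shape raise a ValueError rather than being guessed at.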
example_isolates: Example Data for Practice
A data set with 2,000 rows and 46 columns, containing the following column names:
date, patient, age, gender, ward, mo, PEN, OXA, FLC, AMX, AMC, AMP, TZP, CZO, FEP, CXM, FOX, CTX, CAZ, CRO, GEN, TOB, AMK, KAN, TMP, SXT, NIT, FOS, LNZ, CIP, MFX, VAN, TEC, TCY, TGC, DOX, ERY, CLI, AZM, IPM, MEM, MTR, CHL, COL, MUP and RIF.
In R, this data set is available as example_isolates after you load the AMR package.
It was last updated on 27 August 2022 18:49:37 UTC. More information about its structure is available in its R help page (?example_isolates).
Source
This data set contains randomised, fictitious data that nevertheless reflects reality and can be used to practise AMR data analysis.
Example content
date | patient | age | gender | ward | mo | PEN | OXA | FLC | AMX | AMC | AMP | TZP | CZO | FEP | CXM | FOX | CTX | CAZ | CRO | GEN | TOB | AMK | KAN | TMP | SXT | NIT | FOS | LNZ | CIP | MFX | VAN | TEC | TCY | TGC | DOX | ERY | CLI | AZM | IPM | MEM | MTR | CHL | COL | MUP | RIF |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2002-01-02 | A77334 | 65 | F | Clinical | B_ESCHR_COLI | R | I | I | R | R | R | R | R | R | R | R | R | R | |||||||||||||||||||||||||||
2002-01-03 | A77334 | 65 | F | Clinical | B_ESCHR_COLI | R | I | I | R | R | R | R | R | R | R | R | R | R | |||||||||||||||||||||||||||
2002-01-07 | 067927 | 45 | F | ICU | B_STPHY_EPDR | R | R | R | R | S | S | S | S | S | S | R | R | R | |||||||||||||||||||||||||||
2002-01-07 | 067927 | 45 | F | ICU | B_STPHY_EPDR | R | R | R | R | S | S | S | S | S | S | R | R | R | |||||||||||||||||||||||||||
2002-01-13 | 067927 | 45 | F | ICU | B_STPHY_EPDR | R | R | R | R | R | S | S | S | S | R | R | R | ||||||||||||||||||||||||||||
2002-01-13 | 067927 | 45 | F | ICU | B_STPHY_EPDR | R | R | R | R | R | S | S | S | S | R | R | R | R |
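Each antibiotic column holds one interpretation (R, S or I) per isolate, or no value when the drug was not tested. A typical first summary is the proportion of resistant isolates among those tested, R / (S + I + R). In R, the AMR package provides resistance() for this; the Python sketch below only illustrates the arithmetic, treating any value other than S, I or R (for example an empty cell or NA) as untested. The function name and the input values are hypothetical.

```python
from typing import Iterable, Optional


def proportion_resistant(values: Iterable[str]) -> Optional[float]:
    """Return R / (S + I + R), ignoring values that are not S, I or R.

    Returns None when no isolate was tested, to avoid dividing by zero.
    """
    tested = [v for v in values if v in {"S", "I", "R"}]
    if not tested:
        return None
    return sum(v == "R" for v in tested) / len(tested)


# Hypothetical column values; the empty string stands for an untested isolate.
column = ["R", "S", "", "I", "R"]
print(proportion_resistant(column))  # 0.5
```

Returning None instead of 0 keeps "no data" distinguishable from "no resistance".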
example_isolates_unclean: Example Data for Practice
A data set with 3,000 rows and 8 columns, containing the following column names:
patient_id, hospital, date, bacteria, AMX, AMC, CIP and GEN.
In R, this data set is available as example_isolates_unclean after you load the AMR package.
It was last updated on 27 August 2022 18:49:37 UTC. More information about its structure is available in its R help page (?example_isolates_unclean).