---
title: "Data sets for download / own use"
date: '`r format(Sys.Date(), "%d %B %Y")`'
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 1
vignette: >
  %\VignetteIndexEntry{Data sets for download / own use}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE, results = "markup"}
knitr::opts_chunk$set(
  warning = FALSE,
  collapse = TRUE,
  comment = "#",
  fig.width = 7.5,
  fig.height = 5
)

library(AMR)
library(dplyr)

options(knitr.kable.NA = "")

# Describe the dimensions and column names of a data set in one sentence
structure_txt <- function(dataset) {
  paste0(
    "A data set with ",
    format(nrow(dataset), big.mark = " "), " rows and ",
    ncol(dataset), " columns, containing the following column names: \n",
    AMR:::vector_or(colnames(dataset), quotes = "*", last_sep = " and ", sort = FALSE), "."
  )
}

# Build the "last updated" sentence and the list of download links for a data set
download_txt <- function(filename) {
  msg <- paste0(
    "It was last updated on ",
    trimws(format(file.mtime(paste0("../data/", filename, ".rda")), "%e %B %Y %H:%M:%S %Z", tz = "UTC")),
    ". Find more info about the contents, (scientific) source, and structure of this [data set here](https://msberends.github.io/AMR/reference/", ifelse(filename == "antivirals", "antimicrobials", filename), ".html).\n"
  )
  github_base <- "https://github.com/msberends/AMR/raw/main/data-raw/datasets/"
  local_filename <- paste0("../data-raw/datasets/", filename)

  # Local paths of every export format that may exist for this data set
  rds <- paste0(local_filename, ".rds")
  txt <- paste0(local_filename, ".txt")
  excel <- paste0(local_filename, ".xlsx")
  feather <- paste0(local_filename, ".feather")
  parquet <- paste0(local_filename, ".parquet")
  spss <- paste0(local_filename, ".sav")
  stata <- paste0(local_filename, ".dta")

  # One Markdown bullet per format: a download link, or a note that it is unavailable
  create_txt <- function(filename, type, software, exists) {
    if (isTRUE(exists)) {
      paste0(
        "* Download as [", software, "](",
        github_base, basename(filename), ") (",
        AMR:::formatted_filesize(filename), ") \n"
      )
    } else {
      paste0("* *(unavailable as ", software, ")*\n")
    }
  }

  if (any(
    file.exists(rds),
    file.exists(txt),
    file.exists(excel),
    file.exists(feather),
    file.exists(parquet),
    file.exists(spss),
    file.exists(stata)
  )) {
    msg <- c(
      msg, "\n**Direct download links:**\n\n",
      create_txt(rds, "rds", "original R Data Structure (RDS) file", file.exists(rds)),
      create_txt(txt, "txt", "tab-separated text file", file.exists(txt)),
      create_txt(excel, "xlsx", "Microsoft Excel workbook", file.exists(excel)),
      create_txt(feather, "feather", "Apache Feather file", file.exists(feather)),
      create_txt(parquet, "parquet", "Apache Parquet file", file.exists(parquet)),
      create_txt(spss, "sav", "IBM SPSS Statistics data file", file.exists(spss)),
      create_txt(stata, "dta", "Stata DTA file", file.exists(stata))
    )
  }
  paste0(msg, collapse = "")
}

# Print the first rows of a data set as a table, abbreviating list columns
print_df <- function(x, rows = 6) {
  x %>%
    as.data.frame(stringsAsFactors = FALSE) %>%
    head(n = rows) %>%
    mutate_all(function(x) {
      if (is.list(x)) {
        sapply(x, function(y) {
          if (length(y) > 3) {
            paste0(paste(y[1:3], collapse = ", "), ", ...")
          } else if (length(y) == 0 || all(is.na(y))) {
            ""
          } else {
            paste(y, collapse = ", ")
          }
        })
      } else {
        x
      }
    }) %>%
    knitr::kable(align = "c")
}
```

All reference data (about microorganisms, antimicrobials, SIR interpretation, EUCAST rules, etc.) in this `AMR` package are reliable, up to date, and freely available. We continually export our data sets to formats for use in R, MS Excel, Apache Feather, Apache Parquet, SPSS, and Stata. We also provide tab-separated text files that are machine-readable and suitable for input in any software program, such as laboratory information systems.

> If you are working in Python, be sure to use our [AMR for Python](https://msberends.github.io/AMR/articles/AMR_for_Python.html) package. It makes all relevant AMR data sets natively available in Python.

## `microorganisms`: Full Microbial Taxonomy

`r structure_txt(microorganisms)`

This data set is available in R as `microorganisms` after you load the `AMR` package.
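Outside of the package, the tab-separated export can be read with base R alone. The following is a minimal sketch rather than an official recipe: it writes a tiny stand-in file to a temporary location and reads it back, splitting a comma-separated column into a list column. The column names `mo`, `fullname` and `snomed` mirror the real data set, but the rows are made-up placeholders; for real use, point `read.delim()` at the downloaded text file instead.

```r
# Stand-in for a downloaded export; these rows are placeholders, not real data
tmp <- tempfile(fileext = ".txt")
writeLines(
  c(
    "mo\tfullname\tsnomed",
    "B_EXMPL_ONE\tExample one\t111,222,333",
    "B_EXMPL_TWO\tExample two\t"
  ),
  tmp
)

# Read every column as character so codes are not converted to numbers;
# empty cells become NA
x <- read.delim(tmp,
  sep = "\t", quote = "", na.strings = "",
  colClasses = "character", stringsAsFactors = FALSE
)

# Split the comma-separated column into a list column;
# records without any code get an empty character vector
x$snomed <- lapply(strsplit(x$snomed, ",", fixed = TRUE), function(v) {
  if (length(v) == 1 && is.na(v)) character(0) else v
})

# Number of codes per record
lengths(x$snomed)
```

Reading all columns as character is a deliberate choice in this sketch: it avoids silent type conversion, at the cost of converting numeric columns by hand afterwards.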
`r download_txt("microorganisms")`

**NOTE: The exported files for SPSS and Stata contain only the first 50 SNOMED codes per record, as their file size would otherwise exceed 100 MB, the file size limit of GitHub.** The file structures and compression techniques of these formats are very inefficient. Our advice: use R instead. It's free and much better in many ways. The tab-separated text file and Microsoft Excel workbook both contain all SNOMED codes as comma-separated values.

**Example content**

Included (sub)species per taxonomic kingdom:

```{r, echo = FALSE}
microorganisms %>%
  count(kingdom) %>%
  mutate(n = format(n, big.mark = " ")) %>%
  setNames(c("Kingdom", "Number of (sub)species")) %>%
  print_df()
```

First 6 rows when filtering on genus *Escherichia*:

```{r, echo = FALSE}
microorganisms %>%
  filter(genus == "Escherichia") %>%
  print_df()
```

## `antimicrobials`: Antibiotic and Antifungal Drugs

`r structure_txt(antimicrobials)`

This data set is available in R as `antimicrobials` after you load the `AMR` package.

`r download_txt("antimicrobials")`

The tab-separated text, Microsoft Excel, SPSS, and Stata files all contain the ATC codes, common abbreviations, trade names, and LOINC codes as comma-separated values.

**Example content**

```{r, echo = FALSE}
antimicrobials %>%
  filter(ab %in% colnames(example_isolates)) %>%
  print_df()
```

## `clinical_breakpoints`: Interpretation from MIC values & disk diameters to SIR

`r structure_txt(clinical_breakpoints)`

This data set is available in R as `clinical_breakpoints` after you load the `AMR` package.
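To illustrate what a single row of such a table encodes, the sketch below applies one made-up breakpoint pair to a few MIC values. The column names `breakpoint_S` and `breakpoint_R` are assumed to match the data set, the numbers are placeholders, and the rule used (S at or below the S breakpoint, R above the R breakpoint, I in between) follows the EUCAST-style convention; other guidelines may compare differently. For real interpretation, the package's own functions should be preferred over this hand-rolled comparison.

```r
# Hypothetical breakpoint row; the values are placeholders, not taken from any guideline
bp <- data.frame(ab = "EXAMPLE", breakpoint_S = 2, breakpoint_R = 8)

# MIC values (mg/L) to interpret
mic <- c(0.5, 2, 4, 8, 16)

# EUCAST-style rule: S if MIC <= S breakpoint, R if MIC > R breakpoint, otherwise I
sir <- ifelse(mic <= bp$breakpoint_S, "S",
  ifelse(mic > bp$breakpoint_R, "R", "I")
)

data.frame(mic, sir)
```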
`r download_txt("clinical_breakpoints")`

**Example content**

```{r, echo = FALSE}
clinical_breakpoints %>%
  mutate(mo_name = mo_name(mo, language = NULL), .after = mo) %>%
  mutate(ab_name = ab_name(ab, language = NULL), .after = ab) %>%
  print_df()
```

## `microorganisms.groups`: Species Groups and Microbiological Complexes

`r structure_txt(microorganisms.groups)`

This data set is available in R as `microorganisms.groups` after you load the `AMR` package.

`r download_txt("microorganisms.groups")`

**Example content**

```{r, echo = FALSE}
microorganisms.groups %>%
  print_df()
```

## `intrinsic_resistant`: Intrinsic Bacterial Resistance

`r structure_txt(intrinsic_resistant)`

This data set is available in R as `intrinsic_resistant` after you load the `AMR` package.

`r download_txt("intrinsic_resistant")`

**Example content**

Example rows when filtering on *Enterobacter cloacae*:

```{r, echo = FALSE}
intrinsic_resistant %>%
  transmute(
    microorganism = mo_name(mo),
    antibiotic = ab_name(ab)
  ) %>%
  filter(microorganism == "Enterobacter cloacae") %>%
  arrange(antibiotic) %>%
  print_df(rows = Inf)
```

## `dosage`: Dosage Guidelines from EUCAST

`r structure_txt(dosage)`

This data set is available in R as `dosage` after you load the `AMR` package.

`r download_txt("dosage")`

**Example content**

```{r, echo = FALSE}
dosage %>%
  print_df()
```

## `example_isolates`: Example Data for Practice

`r structure_txt(example_isolates)`

This data set is available in R as `example_isolates` after you load the `AMR` package.

`r download_txt("example_isolates")`

**Example content**

```{r, echo = FALSE}
example_isolates %>%
  print_df()
```

## `example_isolates_unclean`: Example Data for Practice

`r structure_txt(example_isolates_unclean)`

This data set is available in R as `example_isolates_unclean` after you load the `AMR` package.
`r download_txt("example_isolates_unclean")`

**Example content**

```{r, echo = FALSE}
example_isolates_unclean %>%
  print_df()
```

## `microorganisms.codes`: Common Laboratory Codes

`r structure_txt(microorganisms.codes)`

This data set is available in R as `microorganisms.codes` after you load the `AMR` package.

`r download_txt("microorganisms.codes")`

**Example content**

```{r, echo = FALSE}
microorganisms.codes %>%
  print_df()
```

## `antivirals`: Antiviral Drugs

`r structure_txt(antivirals)`

This data set is available in R as `antivirals` after you load the `AMR` package.

`r download_txt("antivirals")`

The tab-separated text, Microsoft Excel, SPSS, and Stata files all contain the trade names and LOINC codes as comma-separated values.

**Example content**

```{r, echo = FALSE}
antivirals %>%
  print_df()
```