mirror of https://github.com/msberends/AMR.git synced 2025-07-09 03:22:00 +02:00

Feather and Parquet files

2022-08-26 22:25:15 +02:00
parent 4da32e3d40
commit 3864ab2fb8
48 changed files with 188 additions and 175 deletions

vignettes/AMR_intro.png (binary file, 125 KiB, not shown)

@ -13,7 +13,7 @@ editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE, results = 'markup'}
```{r setup, include = FALSE, results = "markup"}
knitr::opts_chunk$set(
warning = FALSE,
collapse = TRUE,
@ -40,30 +40,41 @@ download_txt <- function(filename) {
". Find more info about the structure of this data set [here](https://msberends.github.io/AMR/reference/", ifelse(filename == "antivirals", "antibiotics", filename), ".html).\n")
github_base <- "https://github.com/msberends/AMR/raw/main/data-raw/"
filename <- paste0("../data-raw/", filename)
txt <- paste0(filename, ".txt")
rds <- paste0(filename, ".rds")
txt <- paste0(filename, ".txt")
excel <- paste0(filename, ".xlsx")
feather <- paste0(filename, ".feather")
parquet <- paste0(filename, ".parquet")
sas <- paste0(filename, ".sas")
spss <- paste0(filename, ".sav")
stata <- paste0(filename, ".dta")
sas <- paste0(filename, ".sas")
excel <- paste0(filename, ".xlsx")
create_txt <- function(filename, type, software) {
paste0("* Download as [", software, " file](", github_base, filename, ") (", AMR:::formatted_filesize(filename), ") \n")
create_txt <- function(filename, type, software, exists) {
if (isTRUE(exists)) {
paste0("* Download as [", software, "](", github_base, filename, ") (",
AMR:::formatted_filesize(filename), ") \n")
} else {
paste0("* *(unavailable as ", software, ")*\n")
}
}
if (any(file.exists(rds),
file.exists(excel),
file.exists(txt),
file.exists(excel),
file.exists(feather),
file.exists(parquet),
file.exists(sas),
file.exists(spss),
file.exists(stata))) {
msg <- c(msg, "\n**Direct download links:**\n\n")
msg <- c(msg, "\n**Direct download links:**\n\n",
create_txt(rds, "rds", "original R Data Structure (RDS) file", file.exists(rds)),
create_txt(txt, "txt", "tab-separated text file", file.exists(txt)),
create_txt(excel, "xlsx", "Microsoft Excel workbook", file.exists(excel)),
create_txt(feather, "feather", "Apache Feather file", file.exists(feather)),
create_txt(parquet, "parquet", "Apache Parquet file", file.exists(parquet)),
create_txt(sas, "sas", "SAS data file", file.exists(sas)),
create_txt(spss, "sav", "IBM SPSS Statistics data file", file.exists(spss)),
create_txt(stata, "dta", "Stata DTA file", file.exists(stata)))
}
if (file.exists(rds)) msg <- c(msg, create_txt(rds, "rds", "R"))
if (file.exists(excel)) msg <- c(msg, create_txt(excel, "xlsx", "Excel"))
if (file.exists(txt)) msg <- c(msg, create_txt(txt, "txt", "plain text"))
if (file.exists(sas)) msg <- c(msg, create_txt(sas, "sas", "SAS"))
if (file.exists(spss)) msg <- c(msg, create_txt(spss, "sav", "SPSS"))
if (file.exists(stata)) msg <- c(msg, create_txt(stata, "dta", "Stata"))
paste0(msg, collapse = "")
}
@ -87,14 +98,13 @@ print_df <- function(x, rows = 6) {
}) %>%
knitr::kable(align = "c")
}
```
All reference data (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this `AMR` package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply tab separated files that are machine-readable and suitable for input in any software program, such as laboratory information systems.
All reference data (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this `AMR` package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated text files that are machine-readable and suitable for input in any software program, such as laboratory information systems.
On this page, we explain how to download them and how the structure of the data sets look like.
## Microorganisms (currently accepted names)
## `microorganisms`: Microbial Taxonomy (currently accepted names)
`r structure_txt(microorganisms)`
@ -102,6 +112,8 @@ This data set is in R available as `microorganisms`, after you load the `AMR` pa
`r download_txt("microorganisms")`
**NOTE: The exported files for Excel, SAS, SPSS and Stata contain only the first 50 SNOMED codes per record, as their file size would otherwise exceed 100 MB; the file size limit of GitHub.** Advice? Use R instead.
### Source
Our full taxonomy of microorganisms is based on the authoritative and comprehensive:
@ -130,7 +142,7 @@ microorganisms %>%
print_df()
```
## Microorganisms (previously accepted names)
## `microorganisms.old`: Microbial Taxonomy (previously accepted names)
`r structure_txt(microorganisms.old)`
@ -158,7 +170,7 @@ microorganisms.old %>%
```
## Antibiotic agents
## `antibiotics`: Antibiotic Agents
`r structure_txt(antibiotics)`
@ -183,7 +195,7 @@ antibiotics %>%
```
## Antiviral agents
## `antivirals`: Antiviral Agents
`r structure_txt(antivirals)`
@ -205,7 +217,7 @@ antivirals %>%
print_df()
```
## Interpretation from MIC values / disk diameters to R/SI
## `rsi_translation`: Interpretation from MIC values / disk diameters to R/SI
`r structure_txt(rsi_translation)`
@ -227,7 +239,7 @@ rsi_translation %>%
```
## Intrinsic bacterial resistance
## `intrinsic_resistant`: Intrinsic Bacterial Resistance
`r structure_txt(intrinsic_resistant)`
@ -253,7 +265,7 @@ intrinsic_resistant %>%
```
## Dosage guidelines from EUCAST
## `dosage`: Dosage Guidelines from EUCAST
`r structure_txt(dosage)`
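
The reworked `create_txt()` in this file now takes an `exists` flag and returns either a download bullet or an explicit "unavailable" placeholder. The following is a minimal sketch of that behaviour that runs with base R only. It assumes a hypothetical `format_size()` helper in place of the internal `AMR:::formatted_filesize()` (whose output format is not shown in this diff), and the demo file names are made up for illustration.

```r
# Sketch of the reworked create_txt() from this commit, base R only.
# Assumption: format_size() is a hypothetical stand-in for the internal
# AMR:::formatted_filesize(); it simply reports the size in bytes.
github_base <- "https://github.com/msberends/AMR/raw/main/data-raw/"

format_size <- function(path) {
  paste(file.size(path), "bytes")
}

create_txt <- function(filename, type, software, exists) {
  if (isTRUE(exists)) {
    # File was exported: emit a Markdown bullet with link and file size
    paste0("* Download as [", software, "](", github_base, filename, ") (",
           format_size(filename), ") \n")
  } else {
    # File is missing: emit an explicit placeholder instead of skipping it
    paste0("* *(unavailable as ", software, ")*\n")
  }
}

# Demo in a temporary directory: one format exists, the other does not.
# The file names and contents are illustrative, not real AMR exports.
old_wd <- setwd(tempdir())
writeLines("mo\tfullname", "amr_demo.txt")
present <- create_txt("amr_demo.txt", "txt", "tab-separated text file",
                      file.exists("amr_demo.txt"))
absent <- create_txt("amr_demo.parquet", "parquet", "Apache Parquet file",
                     file.exists("amr_demo.parquet"))
unlink("amr_demo.txt")
setwd(old_wd)

cat(present, absent, sep = "")
```

Compared with the old version, which silently dropped missing formats, every format now gets a line in the list, so readers can see at a glance which exports are absent for a given data set.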


@ -22,15 +22,19 @@ knitr::opts_chunk$set(
)
```
Note: to keep the package size as small as possible, we only included this vignette on CRAN. You can read more vignettes on our website about how to conduct AMR data analysis, determine MDRO's, find explanation of EUCAST rules, and much more: <https://msberends.github.io/AMR/articles/>.
Note: to keep the package size as small as possible, we only included this vignette on CRAN. You can read more vignettes on our website about how to conduct AMR data analysis, determine MDROs, find explanation of EUCAST rules, and much more: <https://msberends.github.io/AMR/articles/>.
----
`AMR` is a free, open-source and independent R package (see [Copyright](https://msberends.github.io/AMR/#copyright)) to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. **Our aim is to provide a standard** for clean and reproducible antimicrobial resistance data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
The `AMR` package is a [free and open-source](https://msberends.github.io/AMR/#copyright) R package with [zero dependencies](https://en.wikipedia.org/wiki/Dependency_hell) to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. **Our aim is to provide a standard** for clean and reproducible AMR data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
```{r, echo = FALSE, out.width = "555px"}
knitr::include_graphics("AMR_intro.png")
```
After installing this package, R knows `r AMR:::format_included_data_number(AMR::microorganisms)` distinct microbial species and all `r AMR:::format_included_data_number(rbind(AMR::antibiotics[, "atc", drop = FALSE], AMR::antivirals[, "atc", drop = FALSE]))` antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
The `AMR` package is available in Danish, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish and Swedish. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
The `AMR` package is available in English, Chinese, Danish, Dutch, French, German, Greek, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Turkish and Ukrainian. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). **It was designed to work in any setting, including those with very limited resources**. Since its first public release in early 2018, this package has been downloaded from more than 175 countries.
@ -56,3 +60,9 @@ This package can be used for:
All reference data sets (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this `AMR` package are publicly and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply flat files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please find [all download links on our website](https://msberends.github.io/AMR/articles/datasets.html), which is automatically updated with every code change.
This R package was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the [University of Groningen](https://www.rug.nl), in collaboration with non-profit organisations [Certe Medical Diagnostics and Advice Foundation](https://www.certe.nl) and [University Medical Center Groningen](https://www.umcg.nl). This R package formed the basis of two PhD theses ([DOI 10.33612/diss.177417131](https://doi.org/10.33612/diss.177417131) and [DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)) but is actively and durably maintained (see [changelog)](https://msberends.github.io/AMR/news/index.html)) by two public healthcare organisations in the Netherlands.
----
<small>
This AMR package for R is free, open-source software and licensed under the [GNU General Public License v2.0 (GPL-2)](https://msberends.github.io/AMR/LICENSE-text.html). These requirements are consequently legally binding: modifications must be released under the same license when distributing the package, changes made to the code must be documented, source code must be made available when the package is distributed, and a copy of the license and copyright notice must be included with the package.
</small>
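
This commit adds Apache Feather and Apache Parquet to the export formats. As a rough sketch of how files in these formats can be written and read back in R, the example below uses the `arrow` package. This is an assumption: the diff does not show which package produced the exports, and `arrow` is not a dependency of `AMR`. The toy data frame is illustrative and is not taken from the real data sets.

```r
# Assumption: the 'arrow' package is installed; the diff does not say
# which tool wrote the exported .feather and .parquet files.
toy <- data.frame(
  ab = c("AMX", "CIP"),                      # illustrative codes
  name = c("Amoxicillin", "Ciprofloxacin"),  # illustrative names
  stringsAsFactors = FALSE
)

roundtrip_ok <- NA  # stays NA when 'arrow' is not available
if (requireNamespace("arrow", quietly = TRUE)) {
  feather_file <- tempfile(fileext = ".feather")
  parquet_file <- tempfile(fileext = ".parquet")

  # Write the same data frame in both formats
  arrow::write_feather(toy, feather_file)
  arrow::write_parquet(toy, parquet_file)

  # Read both files back and convert to plain data frames
  from_feather <- as.data.frame(arrow::read_feather(feather_file))
  from_parquet <- as.data.frame(arrow::read_parquet(parquet_file))

  roundtrip_ok <- isTRUE(all.equal(from_feather, toy)) &&
    isTRUE(all.equal(from_parquet, toy))
}
```

A downloaded export could be read the same way by passing its local path to `arrow::read_feather()` or `arrow::read_parquet()`.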