mirror of
https://github.com/msberends/AMR.git
new website, freq updates
```diff
@@ -1,11 +1,11 @@
 ---
-title: "Introduction to the AMR package"
+title: "The AMR package - How to conduct AMR analysis"
 author: "Matthijs S. Berends"
 output:
   rmarkdown::html_vignette:
-    toc: false
+    toc: true
 vignette: >
-  %\VignetteIndexEntry{Introduction to the AMR package}
+  %\VignetteIndexEntry{The AMR package - How to conduct AMR analysis}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
```

````diff
@@ -17,81 +17,5 @@ knitr::opts_chunk$set(
 )
 ```
````

This R package was intended **to make microbial epidemiology easier**. Most functions have extensive help pages to get you started.

The `AMR` package basically does four important things:

1. It **cleanses existing data** by transforming it into reproducible and well-defined *classes*, making the most efficient use of R. These functions all use artificial intelligence to guess the results you would expect:

   * Use `as.mo` to get an ID of a microorganism. The IDs are human-readable for the trained eye: the ID of *Klebsiella pneumoniae* is "B_KLBSL_PNE" (B stands for Bacteria) and the ID of *S. aureus* is "B_STPHY_AUR". The function accepts almost any text that looks like the name or code of a microorganism, such as "E. coli", "esco" and "esccol". Even `as.mo("MRSA")` will return the ID of *S. aureus*. Moreover, it can group all coagulase-negative and coagulase-positive *Staphylococci*, and can transform *Streptococci* into Lancefield groups. To find bacteria based on your input, it uses artificial intelligence to look up values in the included ITIS data, consisting of more than 18,000 microorganisms.
   * Use `as.rsi` to transform values into valid antimicrobial results. It produces only S, I or R based on your input and warns about invalid values. Even values like "<=0.002; S" (combined MIC/RSI) will result in "S".
   * Use `as.mic` to cleanse your MIC values. It produces a so-called factor (called *ordinal* in SPSS) with valid MIC values as levels. A value like "<=0.002; S" (combined MIC/RSI) will result in "<=0.002".
   * Use `as.atc` to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values "Furabid", "Furadantin" and "nitro" all return the ATC code of nitrofurantoin.

2. It **enhances existing data** and **adds new data** from data sets included in this package.

   * Use `EUCAST_rules` to apply [EUCAST expert rules to isolates](http://www.eucast.org/expert_rules_and_intrinsic_resistance/).
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
   * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account the key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
   * The data set `microorganisms` contains the complete taxonomic tree of more than 18,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set, like `mo_genus`, `mo_family`, `mo_gramstain` or even `mo_phylum`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. They also support German, Dutch, French, Italian, Spanish and Portuguese. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_name` and `ab_tradenames` to look up values. The `ab_*` functions use `as.atc` internally, so they support AI to guess your expected result. For example, `ab_name("Fluclox")`, `ab_name("Floxapen")` and `ab_name("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

3. It **analyses the data** with convenient functions that use well-known methods.

   * Calculate the resistance (and even co-resistance) of microbial isolates with the `portion_R`, `portion_IR`, `portion_I`, `portion_SI` and `portion_S` functions. Similarly, the *number* of isolates can be determined with the `count_R`, `count_IR`, `count_I`, `count_SI` and `count_S` functions. All these functions can be used [with the `dplyr` package](https://dplyr.tidyverse.org/#usage) (e.g. in conjunction with [`summarise`](https://dplyr.tidyverse.org/reference/summarise.html)).
   * Plot AMR results with `geom_rsi`, a function made for the `ggplot2` package.
   * Predict antimicrobial resistance for the coming years using logistic regression models with the `resistance_predict` function.
   * Conduct descriptive statistics to enhance base R: calculate kurtosis and skewness, and create frequency tables.

4. It **teaches the user** how to use all the above actions.

   * The package contains extensive help pages with many examples.
   * It also contains an example data set called `septic_patients`. This data set contains:
     * 2,000 blood culture isolates from anonymised septic patients between 2001 and 2017 in the Northern Netherlands
     * Results of 40 antibiotics (each antibiotic in its own column) with a total of 38,414 antimicrobial results
     * Real and genuine data

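To make the cleaning and analysis steps above more concrete, here is a simplified base R sketch of two of them: splitting a combined value such as "<=0.002; S" into an interpretation and a MIC, and calculating the proportion of resistant results. The helpers `extract_rsi()`, `extract_mic()` and `proportion_of()` are hypothetical and are not part of the `AMR` package. They assume that a proportion is taken over all non-missing S/I/R results, and they skip the validation and warnings that `as.rsi` and `as.mic` provide. In actual analyses, use `as.rsi`, `as.mic` and the `portion_*` functions.

```r
# Hypothetical helper: take the first standalone S, I or R from each value.
# A simplified sketch of the idea behind as.rsi, not the AMR implementation.
extract_rsi <- function(x) {
  x <- toupper(x)
  pos <- as.integer(regexpr("\\b[SIR]\\b", x, perl = TRUE))
  out <- rep(NA_character_, length(x))
  hit <- !is.na(pos) & pos > 0
  out[hit] <- substr(x[hit], pos[hit], pos[hit])
  # ordered factor S < I < R; anything unrecognised stays NA
  factor(out, levels = c("S", "I", "R"), ordered = TRUE)
}

# Hypothetical helper: keep only a leading MIC such as "<=0.002" or "16".
# Returns plain character strings, whereas as.mic returns a factor.
extract_mic <- function(x) {
  pattern <- "^\\s*([<>=]*)\\s*([0-9]*\\.?[0-9]+).*$"
  out <- rep(NA_character_, length(x))
  ok <- !is.na(x) & grepl(pattern, x, perl = TRUE)
  out[ok] <- sub(pattern, "\\1\\2", x[ok], perl = TRUE)
  out
}

# Hypothetical helper: proportion of results in `categories`
# among all non-missing results (e.g. "R", or c("I", "R"))
proportion_of <- function(rsi, categories) {
  valid <- rsi[!is.na(rsi)]
  if (length(valid) == 0) return(NA_real_)
  mean(valid %in% categories)
}

# Made-up raw laboratory values
raw <- c("<=0.002; S", "16; R", "2; I", "R", "0.5")

rsi <- extract_rsi(raw)          # S, R, I, R, NA
mic <- extract_mic(raw)          # "<=0.002", "16", "2", NA, "0.5"
proportion_of(rsi, "R")          # share of R results
proportion_of(rsi, c("I", "R"))  # share of I or R results
```

Returning an ordered factor keeps S, I and R in their natural order, so sorting and comparisons behave as expected.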
### ITIS

This package contains the **complete microbial taxonomic data** (with all seven taxonomic ranks - from subkingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

All (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens.

ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists.

**Get a note when a species was renamed**
```r
mo_shortname("Chlamydia psittaci")
# Note: 'Chlamydia psittaci' (Page, 1968) was renamed 'Chlamydophila psittaci' (Everett et al., 1999)
# [1] "C. psittaci"
```

**Get any property from the entire taxonomic tree for all included species**
```r
mo_class("E. coli")
# [1] "Gammaproteobacteria"

mo_family("E. coli")
# [1] "Enterobacteriaceae"

mo_ref("E. coli")
# [1] "Castellani and Chalmers, 1919"
```

**Do not be mistaken - the package only includes microorganisms**
```r
mo_phylum("C. elegans")
# [1] "Cyanobacteria" # Bacteria?!
mo_fullname("C. elegans")
# [1] "Chroococcus limneticus elegans" # Because a microorganism was found
```

----

```{r, echo = FALSE}
# this will print "2018" in 2018, and "2018-yyyy" after 2018.
yrs <- paste(unique(c(2018, format(Sys.Date(), "%Y"))), collapse = "-")
```
AMR, (c) `r yrs`, `r packageDescription("AMR")$URL`

Licensed under the [GNU General Public License v2.0](https://github.com/msberends/AMR/blob/master/LICENSE).

This page will soon be updated.

```diff
@@ -1,171 +0,0 @@
```

```yaml
---
title: "Creating Frequency Tables"
author: "Matthijs S. Berends"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Creating Frequency Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```

```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
library(dplyr)
library(AMR)
```

## Introduction

Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables are pasted into one variable, which forces a univariate distribution. We take the `septic_patients` dataset (included in this AMR package) as an example.

## Frequencies of one variable

To quickly review the content of a single variable, you can select it in various ways. Let's say we want the frequencies of the `gender` variable of the `septic_patients` dataset:
```{r, echo = TRUE}
septic_patients %>% freq(gender)
```
This immediately shows the class of the variable, its length and availability (i.e. the number of `NA` values), the number of unique values and (most importantly) that among septic patients, men are more prevalent than women.

## Frequencies of more than one variable

Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.

For illustration, we could add some more variables to the `septic_patients` dataset to learn about bacterial properties:
```{r, echo = TRUE, results = 'hide'}
my_patients <- septic_patients %>% left_join_microorganisms()
```
Now all variables of the `microorganisms` dataset have been joined to the `septic_patients` dataset. The `microorganisms` dataset consists of the following variables:
```{r, echo = TRUE}
colnames(microorganisms)
```

If we compare the dimensions of the old and new dataset, we can see that `r ncol(my_patients) - ncol(septic_patients)` variables were added:
```{r, echo = TRUE}
dim(septic_patients)
dim(my_patients)
```

So now the `genus` and `species` variables are available. A frequency table of these combined variables can be created like this:
```{r, echo = TRUE}
my_patients %>% freq(genus, species)
```

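Pasting several variables into one, so that the table stays univariate, can be sketched in base R. This is an illustration under assumptions: the values are joined with a single space and sorted from most to least frequent, and the separator and column names that `freq` uses may differ. The small data frame is made up and is not taken from `septic_patients`.

```r
# Made-up example data (not taken from septic_patients)
isolates <- data.frame(
  genus   = c("Escherichia", "Staphylococcus", "Escherichia",
              "Klebsiella", "Staphylococcus", "Escherichia"),
  species = c("coli", "aureus", "coli", "pneumoniae", "epidermidis", "coli"),
  stringsAsFactors = FALSE
)

# Paste both variables into one, so the distribution stays univariate
combined <- paste(isolates$genus, isolates$species)

# Count each combination and sort from most to least frequent
counts <- sort(table(combined), decreasing = TRUE)

# Small frequency table with counts, proportions and cumulative values
freq_table <- data.frame(
  item    = names(counts),
  count   = as.integer(counts),
  percent = as.numeric(counts) / sum(counts),
  stringsAsFactors = FALSE
)
freq_table$cum_count   <- cumsum(freq_table$count)
freq_table$cum_percent <- cumsum(freq_table$percent)
freq_table
```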
## Frequencies of numeric values

Frequency tables can be created for any input.

For numeric values (like integers, doubles, etc.), additional descriptive statistics are calculated and shown in the header:

```{r, echo = TRUE}
# get age distribution of unique patients
septic_patients %>%
  distinct(patient_id, .keep_all = TRUE) %>%
  freq(age, nmax = 5)
```

So the following properties are determined, where `NA` values are always ignored:

* **Mean**

* **Standard deviation**

* **Coefficient of variation** (CV), the standard deviation divided by the mean

* **Tukey's five-number summary** (min, Q1, median, Q3, max)

* **Coefficient of quartile variation** (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1), using `quantile` with `type = 6` as the quantile algorithm to comply with SPSS standards

* **Outliers** (total count and unique count)

For example, the frequency table above quickly shows that the median age of patients is `r septic_patients %>% distinct(patient_id, .keep_all = TRUE) %>% pull(age) %>% median(na.rm = TRUE)`.

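The statistics listed above can be recomputed with base R, for example to double-check a header. This is a minimal sketch on made-up ages. It uses `fivenum()` for Tukey's five numbers (whose hinges can differ slightly from `type = 6` quartiles), `quantile(type = 6)` for the CQV as described above, and assumes Tukey's 1.5 * IQR rule from `boxplot.stats()` for outliers, which may not be the exact rule `freq` applies.

```r
# Made-up ages, including one missing value and one high value
age <- c(23, 45, 51, 60, 62, 67, 70, 74, 81, 95, 140, NA)

x <- age[!is.na(age)]      # NA values are ignored for every statistic

mean_x <- mean(x)
sd_x   <- sd(x)            # sample standard deviation
cv     <- sd_x / mean_x    # coefficient of variation

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)

# Coefficient of quartile variation, with SPSS-style quantiles (type = 6)
q   <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
cqv <- (q[2] - q[1]) / (q[2] + q[1])

# Outliers, assuming Tukey's 1.5 * IQR rule as applied by boxplot.stats()
outliers <- boxplot.stats(x)$out

list(mean = mean_x, sd = sd_x, cv = cv, fivenum = five, cqv = cqv,
     outliers_total = length(outliers),
     outliers_unique = length(unique(outliers)))
```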
## Frequencies of factors

By default, frequencies of factors are sorted by factor level instead of item count. This can be changed with the `sort.count` parameter. Frequency tables of factors always show the factor level as an additional last column.

`sort.count` is `TRUE` by default, except for factors. Compare this default behaviour...

```{r, echo = TRUE}
septic_patients %>%
  freq(hospital_id)
```

... with this, where items are now sorted by count:

```{r, echo = TRUE}
septic_patients %>%
  freq(hospital_id, sort.count = TRUE)
```

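The difference between the two sort orders can be reproduced with base R `table()`, which always follows the level order of a factor. A minimal sketch with made-up hospital codes; it mirrors the idea behind `sort.count` rather than the `freq` code itself.

```r
# Made-up hospital IDs stored as a factor with an explicit level order
hospital <- factor(c("B", "A", "B", "C", "B", "D", "A"),
                   levels = c("A", "B", "C", "D"))

# Sorted by factor level (the default for factors)
by_level <- table(hospital)

# Sorted by count, most frequent item first
by_count <- sort(table(hospital), decreasing = TRUE)

names(by_level)
names(by_count)
```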
All classes are printed in the header. Variables of the new `rsi` class of this AMR package are actually ordered factors and therefore have three classes (look at `Class` in the header):

```{r, echo = TRUE}
septic_patients %>%
  select(amox) %>%
  freq()
```

## Frequencies of dates

Frequencies of dates show the oldest and newest date in the data, and the number of days between them:

```{r, echo = TRUE}
septic_patients %>%
  select(date) %>%
  freq(nmax = 5)
```

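The date summary described above (the oldest date, the newest date and the number of days in between) comes down to a few base R calls. A sketch with made-up dates; it assumes the number of days is the plain difference between the newest and oldest date, which may not match exactly how `freq` counts it.

```r
# Made-up sample dates, including a missing value
dates <- as.Date(c("2012-03-01", "2002-01-15", "2017-11-30", NA, "2009-06-09"))

oldest <- min(dates, na.rm = TRUE)
newest <- max(dates, na.rm = TRUE)

# Number of days between the oldest and the newest date
span_days <- as.integer(difftime(newest, oldest, units = "days"))

c(oldest = format(oldest), newest = format(newest), days = span_days)
```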
## Assigning a frequency table to an object

A frequency table is actually a regular `data.frame`, except that it has an additional class.

```{r, echo = TRUE}
my_df <- septic_patients %>% freq(age)
class(my_df)
```

Because of this additional class, a frequency table prints like the examples above. But the object itself contains the complete table without a row limitation:

```{r, echo = TRUE}
dim(my_df)
```

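The mechanism described in this section, a regular `data.frame` with one extra class whose print method limits the output, can be sketched with S3 classes in base R. The class name `my_freq` and its print method are hypothetical and only illustrate the idea; the class that `freq` actually adds is the one shown by `class(my_df)` above.

```r
# A plain data.frame with a hypothetical extra class prepended
tbl <- data.frame(item = letters[1:20], count = 20:1)
class(tbl) <- c("my_freq", class(tbl))

# Hypothetical print method: show only the first rows as a preview
print.my_freq <- function(x, nmax = 5, ...) {
  cat("Showing", min(nmax, nrow(x)), "of", nrow(x), "rows\n")
  plain <- x
  class(plain) <- "data.frame"   # drop the extra class to avoid recursion
  print(head(plain, nmax), ...)
  invisible(x)
}

tbl          # printed through print.my_freq: 5 rows only
class(tbl)   # the extra class comes first, then "data.frame"
dim(tbl)     # the object itself still holds all 20 rows
```

Because only the print method is overridden, every ordinary `data.frame` operation (`nrow`, `$`, subsetting) keeps working on the full object.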
## Additional parameters

### Parameter `na.rm`
The `na.rm` parameter defaults to `TRUE`, which leaves `NA` values out of the table (they are always reported in the header). Use `na.rm = FALSE` to include `NA` values in the frequency table:

```{r, echo = TRUE}
septic_patients %>%
  freq(amox, na.rm = FALSE)
```

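In base R, the same choice between leaving `NA` values out and listing them as their own row is made with the `useNA` argument of `table()`. A small sketch with made-up results, included only to illustrate the behaviour described for `na.rm`; it does not call `freq`.

```r
# Made-up susceptibility results with missing values
results <- factor(c("S", "R", NA, "S", "I", NA, "S"), levels = c("S", "I", "R"))

# NA values left out of the table (comparable to na.rm = TRUE)
table(results)

# NA values counted as their own row (comparable to na.rm = FALSE)
table(results, useNA = "ifany")
```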
### Parameter `row.names`
By default, frequency tables show row indices. To remove them, use `row.names = FALSE`:

```{r, echo = TRUE}
septic_patients %>%
  freq(hospital_id, row.names = FALSE)
```

### Parameter `markdown`
The `markdown` parameter can be used in reports created with R Markdown. With `markdown = TRUE`, all rows are always printed:

```{r, echo = TRUE}
septic_patients %>%
  freq(hospital_id, markdown = TRUE)
```

----

```{r, echo = FALSE}
# this will print "2018" in 2018, and "2018-yyyy" after 2018.
yrs <- paste(unique(c(2018, format(Sys.Date(), "%Y"))), collapse = "-")
```
AMR, (c) `r yrs`, `r packageDescription("AMR")$URL`

Licensed under the [GNU General Public License v2.0](https://github.com/msberends/AMR/blob/master/LICENSE).