mirror of
https://github.com/msberends/AMR.git
synced 2025-01-13 12:51:38 +01:00
website update
parent
2e4d703338
commit
68baf058cd
5
R/freq.R
@ -18,7 +18,8 @@
|
||||
|
||||
#' Frequency table
|
||||
#'
|
||||
#' Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. \code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
|
||||
#' Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. The best practice is: \code{data \%>\% freq(var)}.\cr
|
||||
#' \code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
|
||||
#' @param x vector of any class or a \code{\link{data.frame}}, \code{\link{tibble}} (may contain a grouping variable) or \code{\link{table}}
|
||||
#' @param ... up to nine different columns of \code{x} when \code{x} is a \code{data.frame} or \code{tibble}, to calculate frequencies from - see Examples
|
||||
#' @param sort.count sort on count, i.e. frequencies. This will be \code{TRUE} at default for everything except when using grouping variables.
|
||||
@ -30,7 +31,7 @@
|
||||
#' @param quote a logical value indicating whether or not strings should be printed with surrounding quotes
|
||||
#' @param header a logical value indicating whether an informative header should be printed
|
||||
#' @param title text to show above the frequency table; by default it tries to coerce this from the variables passed to \code{x}
|
||||
#' @param na a character string to should be used to show empty (\code{NA}) values (only useful when \code{na.rm = FALSE})
|
||||
#' @param na a character string that should be used to show empty (\code{NA}) values (only useful when \code{na.rm = FALSE})
|
||||
#' @param droplevels a logical value indicating whether in factors empty levels should be dropped
|
||||
#' @param sep a character string to separate the terms when selecting multiple columns
|
||||
#' @inheritParams base::format
|
||||
|
@ -54,7 +54,7 @@ header(f, property = NULL)
|
||||
|
||||
\item{title}{text to show above the frequency table; by default it tries to coerce this from the variables passed to \code{x}}
|
||||
|
||||
\item{na}{a character string to should be used to show empty (\code{NA}) values (only useful when \code{na.rm = FALSE})}
|
||||
\item{na}{a character string that should be used to show empty (\code{NA}) values (only useful when \code{na.rm = FALSE})}
|
||||
|
||||
\item{droplevels}{a logical value indicating whether in factors empty levels should be dropped}
|
||||
|
||||
@ -78,7 +78,8 @@ header(f, property = NULL)
|
||||
A \code{data.frame} (with an additional class \code{"frequency_tbl"}) with five columns: \code{item}, \code{count}, \code{percent}, \code{cum_count} and \code{cum_percent}.
|
||||
}
|
||||
\description{
|
||||
Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. \code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
|
||||
Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. The best practice is: \code{data \%>\% freq(var)}.\cr
|
||||
\code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
|
||||
}
|
||||
\details{
|
||||
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, so it forces a univariate distribution. This package also has a vignette available to explain the use of this function further, run \code{browseVignettes("AMR")} to read it.
|
||||
|
@ -52,7 +52,7 @@ library(ggplot2) # for appealing plots
|
||||
```
|
||||
|
||||
## Creation of data
|
||||
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patients ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs).
|
||||
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs).
|
||||
|
||||
With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.
|
||||
|
||||
@ -63,7 +63,15 @@ To start with patients, we need a unique list of patients.
|
||||
patients <- unlist(lapply(LETTERS, paste0, 1:10))
|
||||
```
|
||||
|
||||
The `LETTERS` object is available in R - it's a vector with 26 characters: `A` to `Z`. The `patients` object we just created is now a vector of length `r length(patients)`, with values (patient IDs) varying from ``r patients[1]`` to ``r patients[length(patients)]``.
|
||||
The `LETTERS` object is available in R - it's a vector with 26 characters: `A` to `Z`. The `patients` object we just created is now a vector of length `r length(patients)`, with values (patient IDs) varying from ``r patients[1]`` to ``r patients[length(patients)]``. Now we also set the gender of our patients, by putting the ID and the gender in a table:
|
||||
|
||||
```{r create gender}
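# rep() repeats a value as separate elements; strrep() would return one long string.
# The ID column is named patient_id so it can be matched to `data` later on.
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))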
```
|
||||
|
||||
The first 135 patient IDs are now male, the other 125 are female.
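
As a quick sanity check, the sketch below rebuilds the lookup table and counts the patients per gender. It uses `rep()`, which repeats a value as separate vector elements, whereas `strrep()` would return a single long string. The column name `patient_id` is an assumption, chosen to match the column used later in `data`.

```r
# Rebuild the 260 patient IDs: A1..A10, B1..B10, ..., Z1..Z10
patients <- unlist(lapply(LETTERS, paste0, 1:10))

# rep() yields one element per patient; strrep() would yield one string
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135), rep("F", 125)),
                             stringsAsFactors = FALSE)

# Count patients per gender
gender_counts <- table(patients_table$gender)
print(gender_counts)
```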
|
||||
|
||||
#### Dates
|
||||
Let's pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018.
|
||||
@ -83,10 +91,9 @@ bacteria <- c("Escherichia coli", "Staphylococcus aureus",
|
||||
```
|
||||
|
||||
#### Other variables
|
||||
For completeness, we can also add the patients gender, the hospital where the patients was admitted and all valid antibmicrobial results:
|
||||
For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
|
||||
|
||||
```{r create other}
|
||||
genders <- c("M", "F")
|
||||
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
|
||||
ab_interpretations <- c("S", "I", "R")
|
||||
```
|
||||
@ -98,8 +105,6 @@ Using the `sample()` function, we can randomly select items from all objects we
|
||||
```{r merge data}
|
||||
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
|
||||
patient_id = sample(patients, 5000, replace = TRUE),
|
||||
# gender - add slightly more men:
|
||||
gender = sample(genders, 5000, replace = TRUE, prob = c(0.55, 0.45)),
|
||||
hospital = sample(hospitals, 5000, replace = TRUE),
|
||||
bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
|
||||
amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.6, 0.05, 0.35)),
|
||||
@ -109,6 +114,12 @@ data <- data.frame(date = sample(dates, 5000, replace = TRUE),
|
||||
)
|
||||
```
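
To illustrate how the `prob` argument of `sample()` shapes the simulated results, here is a minimal sketch for a single antibiotic column. The seed is a hypothetical addition, only there to make the sketch reproducible.

```r
set.seed(42)  # hypothetical seed, only for reproducibility of this sketch

ab_interpretations <- c("S", "I", "R")
amox <- sample(ab_interpretations, 5000, replace = TRUE,
               prob = c(0.6, 0.05, 0.35))

# Observed proportions should be close to the requested probabilities
observed <- prop.table(table(factor(amox, levels = ab_interpretations)))
print(round(observed, 3))
```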
|
||||
|
||||
Using the `left_join()` function from the `dplyr` package, we can 'map' the gender to the patient ID using the `patients_table` object we created earlier:
|
||||
|
||||
```{r merge data 2, message = FALSE, warning = FALSE}
|
||||
data <- data %>% left_join(patients_table)
|
||||
```
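
Note that `left_join()` without a `by` argument joins on the column names both tables share, so the lookup table needs an ID column with the same name as in `data`. The same mapping can be sketched in base R with `match()`; the small objects below are stand-ins for illustration, not the tutorial's actual data.

```r
# Small stand-in objects; the real `data` and `patients_table` are much larger
patients_table <- data.frame(patient_id = c("A1", "A2", "B1"),
                             gender = c("M", "M", "F"),
                             stringsAsFactors = FALSE)
data <- data.frame(patient_id = c("B1", "A1", "A1", "A2"),
                   amox = c("S", "R", "S", "I"),
                   stringsAsFactors = FALSE)

# match() gives, for every isolate, the row of its patient in the lookup table
idx <- match(data$patient_id, patients_table$patient_id)
data$gender <- patients_table$gender[idx]
print(data)
```

Unlike a join, this keeps the row order of `data` untouched and adds exactly one column.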
|
||||
|
||||
The resulting data set contains 5,000 blood culture isolates. With the `head()` function we can preview the first 6 rows of this data set:
|
||||
```{r preview data set 1, echo = TRUE, results = 'hide'}
|
||||
head(data)
|
||||
@ -166,12 +177,12 @@ data <- data %>%
|
||||
### First isolates
|
||||
We also need to know which isolates we can *actually* use for analysis.
|
||||
|
||||
To conduct an analysis of antimicrobial resistance, you [must only include the first isolate of every patient per episode](https://www.ncbi.nlm.nih.gov/pubmed/17304462). If you would not do this, you could easily get an overestimate or underestimate of the resistance of an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all \emph{S. aureus} isolates would be overestimated, because you included this MRSA more than once. It would clearly be \href{https://en.wikipedia.org/wiki/Selection_bias}{selection bias}.
|
||||
To conduct an analysis of antimicrobial resistance, you must [only include the first isolate of every patient per episode](https://www.ncbi.nlm.nih.gov/pubmed/17304462) (Hindler *et al.*, Clin Infect Dis. 2007). If you do not, you can easily overestimate or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures in the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The oxacillin resistance percentage of all *S. aureus* isolates would be overestimated, because you included this MRSA more than once. It would clearly be [selection bias](https://en.wikipedia.org/wiki/Selection_bias).
|
||||
|
||||
The Clinical and Laboratory Standards Institute (CLSI) defines this as follows:
|
||||
|
||||
> *(...) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, **only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype)**. The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.*
|
||||
Chapter 6.4, M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. https://clsi.org/standards/products/microbiology/documents/m39/
|
||||
<br>Chapter 6.4, M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. https://clsi.org/standards/products/microbiology/documents/m39/
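
The rule quoted above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the implementation used by the `AMR` package: the helper `mark_first_isolates()` and its `episode_days` argument are hypothetical, and the episode is assumed to restart from the last flagged isolate.

```r
# Hypothetical helper, NOT the AMR package's first_isolate() implementation.
# An isolate is flagged when it is the first of its species for a patient, or
# when more than `episode_days` have passed since the last flagged isolate.
mark_first_isolates <- function(df, episode_days = 365) {
  df <- df[order(df$patient_id, df$bacteria, df$date), ]
  df$first <- FALSE
  last_key <- NULL
  last_date <- as.Date(NA)
  for (i in seq_len(nrow(df))) {
    key <- paste(df$patient_id[i], df$bacteria[i], sep = "|")
    is_new_group <- is.null(last_key) || !identical(key, last_key)
    if (is_new_group || as.numeric(df$date[i] - last_date) > episode_days) {
      df$first[i] <- TRUE
      last_date <- df$date[i]  # the episode restarts at this isolate
    }
    last_key <- key
  }
  df
}

isolates <- data.frame(
  patient_id = c("A1", "A1", "A1", "B2"),
  bacteria = "Staphylococcus aureus",
  date = as.Date(c("2010-01-01", "2010-01-15", "2011-03-01", "2010-06-01")),
  stringsAsFactors = FALSE
)
result <- mark_first_isolates(isolates)
print(result)
```

In this toy data, the second isolate of patient `A1` falls within the episode of the first one and is not flagged, while the third one is more than a year later and is flagged again.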
|
||||
|
||||
This `AMR` package includes this methodology with the `first_isolate()` function. It adopts the episode of a year (can be changed by user) and it starts counting days after every selected isolate. This new variable can easily be added to our data:
|
||||
```{r 1st isolate}
|
||||