mirror of
https://github.com/msberends/AMR.git
synced 2025-07-09 23:41:55 +02:00
resistance predict update
This commit is contained in:
@ -17,9 +17,9 @@ editor_options:
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#",
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 4.5
|
||||
fig.height = 5
|
||||
)
|
||||
```
|
||||
|
||||
@ -106,14 +106,21 @@ ab_interpretations <- c("S", "I", "R")
|
||||
Using the `sample()` function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the `prob` parameter.
|
||||
|
||||
```{r merge data}
|
||||
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
|
||||
patient_id = sample(patients, 5000, replace = TRUE),
|
||||
hospital = sample(hospitals, 5000, replace = TRUE, prob = c(0.30, 0.35, 0.15, 0.20)),
|
||||
bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
|
||||
amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
|
||||
amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.10, 0.15)),
|
||||
cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.80, 0.00, 0.20)),
|
||||
gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0.00, 0.08))
|
||||
sample_size <- 20000
|
||||
data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
|
||||
patient_id = sample(patients, size = sample_size, replace = TRUE),
|
||||
hospital = sample(hospitals, size = sample_size, replace = TRUE,
|
||||
prob = c(0.30, 0.35, 0.15, 0.20)),
|
||||
bacteria = sample(bacteria, size = sample_size, replace = TRUE,
|
||||
prob = c(0.50, 0.25, 0.15, 0.10)),
|
||||
amox = sample(ab_interpretations, size = sample_size, replace = TRUE,
|
||||
prob = c(0.60, 0.05, 0.35)),
|
||||
amcl = sample(ab_interpretations, size = sample_size, replace = TRUE,
|
||||
prob = c(0.75, 0.10, 0.15)),
|
||||
cipr = sample(ab_interpretations, size = sample_size, replace = TRUE,
|
||||
prob = c(0.80, 0.00, 0.20)),
|
||||
gent = sample(ab_interpretations, size = sample_size, replace = TRUE,
|
||||
prob = c(0.92, 0.00, 0.08))
|
||||
)
|
||||
```
|
||||
|
||||
@ -124,6 +131,7 @@ data <- data %>% left_join(patients_table)
|
||||
```
|
||||
|
||||
The resulting data set contains 5,000 blood culture isolates. With the `head()` function we can preview the first 6 values of this data set:
|
||||
|
||||
```{r preview data set 1, eval = FALSE}
|
||||
head(data)
|
||||
```
|
||||
@ -148,6 +156,7 @@ data %>% freq(gender, markdown = FALSE, header = TRUE)
|
||||
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values `M` and `F`. From a researcher perspective: there are slightly more men. Nothing we didn't already know.
|
||||
|
||||
The data is already quite clean, but we still need to transform some variables. The `bacteria` column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The `mutate()` function of the `dplyr` package makes this really easy:
|
||||
|
||||
```{r transform mo 1}
|
||||
data <- data %>%
|
||||
mutate(bacteria = as.mo(bacteria))
|
||||
@ -202,6 +211,7 @@ data_1st <- data %>%
|
||||
```
|
||||
|
||||
For future use, the above two syntaxes can be shortened with the `filter_first_isolate()` function:
|
||||
|
||||
```{r 1st isolate filter 2, eval = FALSE}
|
||||
data_1st <- data %>%
|
||||
filter_first_isolate()
|
||||
@ -263,6 +273,7 @@ data_1st <- data %>%
|
||||
So we end up with `r format(nrow(data_1st), big.mark = ",")` isolates for analysis.
|
||||
|
||||
We can remove unneeded columns:
|
||||
|
||||
```{r}
|
||||
data_1st <- data_1st %>%
|
||||
select(-c(first, keyab))
|
||||
@ -359,6 +370,7 @@ data_1st %>%
|
||||
```
|
||||
|
||||
To make a transition to the next part, let's see how this difference could be plotted:
|
||||
|
||||
```{r plot 1}
|
||||
data_1st %>%
|
||||
group_by(genus) %>%
|
||||
@ -391,6 +403,7 @@ ggplot(a_data_set,
|
||||
```
|
||||
|
||||
The `AMR` package contains functions to extend this `ggplot2` package, for example `geom_rsi()`. It automatically transforms data with `count_df()` or `portion_df()` and show results in stacked bars. Its simplest and shortest example:
|
||||
|
||||
```{r plot 3}
|
||||
ggplot(data_1st) +
|
||||
geom_rsi(translate_ab = FALSE)
|
||||
@ -424,6 +437,7 @@ ggplot(data_1st %>% group_by(genus)) +
|
||||
```
|
||||
|
||||
To simplify this, we also created the `ggplot_rsi()` function, which combines almost all above functions:
|
||||
|
||||
```{r plot 5}
|
||||
data_1st %>%
|
||||
group_by(genus) %>%
|
||||
|
@ -17,7 +17,7 @@ editor_options:
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#",
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 4.5
|
||||
)
|
||||
|
@ -16,7 +16,7 @@ editor_options:
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#"
|
||||
comment = "#>"
|
||||
)
|
||||
# set to original language (English)
|
||||
Sys.setlocale(locale = "C")
|
@ -16,8 +16,10 @@ editor_options:
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#"
|
||||
comment = "#>"
|
||||
)
|
||||
# set to original language (English)
|
||||
Sys.setlocale(locale = "C")
|
||||
```
|
||||
|
||||
*(will be available soon)*
|
||||
|
@ -16,9 +16,9 @@ editor_options:
|
||||
```{r setup, include = FALSE, results = 'markup'}
|
||||
knitr::opts_chunk$set(
|
||||
collapse = TRUE,
|
||||
comment = "#",
|
||||
comment = "#>",
|
||||
fig.width = 7.5,
|
||||
fig.height = 4.5
|
||||
fig.height = 4.75
|
||||
)
|
||||
```
|
||||
|
||||
@ -37,7 +37,7 @@ library(AMR)
|
||||
```
|
||||
|
||||
## Prediction analysis
|
||||
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR analysis](./articles/AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
|
||||
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR analysis](./AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
|
||||
|
||||
It is basically as easy as:
|
||||
```{r, eval = FALSE}
|
||||
@ -53,27 +53,36 @@ predict_pita <- septic_patients %>%
|
||||
resistance_predict(col_ab = "pita")
|
||||
```
|
||||
|
||||
The function will look for a date column itself if `col_date` is not set.
|
||||
|
||||
When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.
|
||||
|
||||
```{r, echo = FALSE}
|
||||
predict_pita <- septic_patients %>%
|
||||
resistance_predict(col_ab = "pita")
|
||||
```
|
||||
|
||||
The function will look for a data column itself if `col_date` is not set. The result is nothing more than a `data.frame`, containing the years, number of observations, actual observed resistance, the estimated resistance and the standard error below and above the estimation:
|
||||
This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
|
||||
|
||||
```{r}
|
||||
predict_pita
|
||||
```
|
||||
|
||||
The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
|
||||
|
||||
```{r}
|
||||
```{r, fig.height = 5.5}
|
||||
plot(predict_pita)
|
||||
```
|
||||
|
||||
We also support the `ggplot2` package with the function `ggplot_rsi_predict()`:
|
||||
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
|
||||
|
||||
We also support the `ggplot2` package with our custom function `ggplot_rsi_predict()` to create more appealing plots:
|
||||
|
||||
```{r}
|
||||
library(ggplot2)
|
||||
ggplot_rsi_predict(predict_pita)
|
||||
|
||||
# choose for error bars instead of a ribbon
|
||||
ggplot_rsi_predict(predict_pita, ribbon = FALSE)
|
||||
```
|
||||
|
||||
### Choosing the right model
|
||||
@ -84,7 +93,7 @@ Resistance is not easily predicted; if we look at vancomycin resistance in Gram
|
||||
septic_patients %>%
|
||||
filter(mo_gramstain(mo) == "Gram positive") %>%
|
||||
resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE) %>%
|
||||
plot()
|
||||
ggplot_rsi_predict()
|
||||
```
|
||||
|
||||
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
|
||||
@ -99,13 +108,22 @@ Valid values are:
|
||||
| `"loglin"` or `"poisson"` | `glm(..., family = poisson)` | Generalised linear model with poisson distribution |
|
||||
| `"lin"` or `"linear"` | `lm()` | Linear model |
|
||||
|
||||
For the vancomycin resistance in Gram positive bacteria, a linear model might be more appropriate since no (left half of a) binomial distribution is to be expected based on observed years:
|
||||
For the vancomycin resistance in Gram positive bacteria, a linear model might be more appropriate since no (left half of a) binomial distribution is to be expected based on the observed years:
|
||||
|
||||
```{r}
|
||||
septic_patients %>%
|
||||
filter(mo_gramstain(mo) == "Gram positive") %>%
|
||||
resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE, model = "linear") %>%
|
||||
plot()
|
||||
ggplot_rsi_predict()
|
||||
```
|
||||
|
||||
This seems more likely, doesn't it?
|
||||
|
||||
The model itself is also available from the object, as an `attribute`:
|
||||
```{r}
|
||||
model <- attributes(predict_pita)$model
|
||||
|
||||
summary(model)$family
|
||||
|
||||
summary(model)$coefficients
|
||||
```
|
Reference in New Issue
Block a user