--- title: "How to predict antimicrobial resistance" author: "Matthijs S. Berends" date: '`r format(Sys.Date(), "%d %B %Y")`' output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{How to predict antimicrobial resistance} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE, results = 'markup'} knitr::opts_chunk$set( collapse = TRUE, comment = "#", fig.width = 7.5, fig.height = 4.5 ) ``` ## Needed R packages As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the [tidyverse packages](https://www.tidyverse.org) [`dplyr`](https://dplyr.tidyverse.org/) and [`ggplot2`](https://ggplot2.tidyverse.org) by [Dr Hadley Wickham](https://www.linkedin.com/in/hadleywickham/). The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R. Our `AMR` package depends on these packages and even extends their use and functions. ```{r lib packages, message = FALSE} library(dplyr) library(ggplot2) library(AMR) # (if not yet installed, install with:) # install.packages(c("tidyverse", "AMR")) ``` ## Prediction analysis Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR analysis](./articles/AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance. It is basically as easy as: ```{r, eval = FALSE} # resistance prediction of piperacillin/tazobactam (pita): resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "pita") # or: septic_patients %>% resistance_predict(col_ab = "pita") # to bind it to object 'predict_pita' for example: predict_pita <- septic_patients %>% resistance_predict(col_ab = "pita") ``` ```{r, echo = FALSE} predict_pita <- septic_patients %>% resistance_predict(col_ab = "pita") ``` The function will look for a data column itself if `col_date` is not set. The result is nothing more than a `data.frame`, containing the years, number of observations, actual observed resistance, the estimated resistance and the standard error below and above the estimation: ```{r} predict_pita ``` The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions: ```{r} plot(predict_pita) ``` We also support the `ggplot2` package with the function `ggplot_rsi_predict()`: ```{r} library(ggplot2) ggplot_rsi_predict(predict_pita) ``` ### Choosing the right model Resistance is not easily predicted; if we look at vancomycin resistance in Gram positives, the spread (i.e. standard error) is enormous: ```{r} septic_patients %>% filter(mo_gramstain(mo) == "Gram positive") %>% resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE) %>% plot() ``` Vancomycin resistance could be 100% in ten years, but might also stay around 0%. You can define the model with the `model` parameter. The default model is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance. Valid values are: | Input values | Function used by R | Type of model | |----------------------------------------|-------------------------------|-----------------------------------------------------| | `"binomial"` or `"binom"` or `"logit"` | `glm(..., family = binomial)` | Generalised linear model with binomial distribution | | `"loglin"` or `"poisson"` | `glm(..., family = poisson)` | Generalised linear model with poisson distribution | | `"lin"` or `"linear"` | `lm()` | Linear model | For the vancomycin resistance in Gram positive bacteria, a linear model might be more appropriate since no (left half of a) binomial distribution is to be expected based on observed years: ```{r} septic_patients %>% filter(mo_gramstain(mo) == "Gram positive") %>% resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE, model = "linear") %>% plot() ``` This seems more likely, doesn't it?