---
title: "How to predict antimicrobial resistance"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{How to predict antimicrobial resistance}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.width = 7.5,
  fig.height = 4.75
)
```

## Needed R packages
As with many tasks in R, we need some additional packages for AMR data analysis. Our package works closely together with the [tidyverse packages](https://www.tidyverse.org) [`dplyr`](https://dplyr.tidyverse.org/) and [`ggplot2`](https://ggplot2.tidyverse.org) by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science: it allows for a very natural way of writing code and creating beautiful plots in R.

Our `AMR` package depends on these packages and even extends their use and functions.

```{r lib packages, message = FALSE}
library(dplyr)
library(ggplot2)
library(AMR)

# (if not yet installed, install with:)
# install.packages(c("tidyverse", "AMR"))
```

## Prediction analysis
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR data analysis](./AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.

It is basically as easy as:
```{r, eval = FALSE}
# resistance prediction of piperacillin/tazobactam (TZP):
resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")

# or:
example_isolates %>% 
  resistance_predict(col_ab = "TZP",
                     model = "binomial")

# to bind it to object 'predict_TZP' for example:
predict_TZP <- example_isolates %>% 
  resistance_predict(col_ab = "TZP",
                     model = "binomial")
```

The function will look for a date column itself if `col_date` is not set.

When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.

```{r, echo = FALSE}
predict_TZP <- example_isolates %>% 
  resistance_predict(col_ab = "TZP", model = "binomial")
```

This text is only a printed summary. The actual result (output) of the function is a `data.frame` that contains, for each year, the number of observations, the observed resistance, the estimated resistance, and the lower and upper bounds of the standard error around the estimate:

```{r}
predict_TZP
```
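
To make the idea of "cases per year plus a regression model" more concrete, here is a minimal base R sketch of the general approach: count resistant and susceptible isolates per year, fit a binomial regression on those counts, and extrapolate to future years. It uses simulated data and hypothetical object names (`isolates`, `per_year`, `fit`, `future`); it only illustrates the technique and is not necessarily how `resistance_predict()` works internally.

```{r}
# simulated isolates (hypothetical data): resistance rises slowly over time
set.seed(42)
isolates <- data.frame(year = sample(2010:2019, 2000, replace = TRUE))
isolates$resistant <- rbinom(nrow(isolates), size = 1,
                             prob = plogis(-3 + 0.2 * (isolates$year - 2010)))
isolates$susceptible <- 1 - isolates$resistant

# step 1: count resistant and susceptible isolates per year
per_year <- aggregate(cbind(resistant, susceptible) ~ year,
                      data = isolates, FUN = sum)

# step 2: binomial regression of the yearly counts on the year
fit <- glm(cbind(resistant, susceptible) ~ year,
           family = binomial, data = per_year)

# step 3: predict observed and future years, with standard errors
future <- data.frame(year = 2010:2025)
pred <- predict(fit, newdata = future, type = "response", se.fit = TRUE)
future$estimated <- pred$fit
future$se_min <- pmax(pred$fit - pred$se.fit, 0)  # keep within 0-1
future$se_max <- pmin(pred$fit + pred$se.fit, 1)
future
```

The result has one row per year with an estimate and an interval around it, which is the same kind of information the `data.frame` above provides.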

The function `plot` is available in base R, and can be extended by other packages to adapt the output to the type of input. We extended it to cope with resistance predictions:

```{r, fig.height = 5.5}
plot(predict_TZP)
```

This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
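
How a package can extend `plot` may be sketched with R's S3 system: `plot()` is a generic function that dispatches on the class of its first argument, so defining a function named `plot.<class>()` is enough. The class name `my_prediction` and the columns below are hypothetical and serve only as an illustration; they are not the class or columns used by the `AMR` package.

```{r}
# a small data.frame with a custom (hypothetical) class on top
pred <- data.frame(year  = 2015:2020,
                   value = c(0.10, 0.12, 0.15, 0.19, 0.24, 0.30))
class(pred) <- c("my_prediction", class(pred))

# S3 method: plot() calls this function for objects of class 'my_prediction'
plot.my_prediction <- function(x, ...) {
  # x$year is a plain numeric vector, so this inner call uses plot.default()
  plot(x$year, x$value, type = "b", ylim = c(0, 1),
       xlab = "Year", ylab = "Predicted proportion", ...)
  invisible(x)
}

plot(pred)
```

Because the method is selected by class, users keep calling plain `plot()` and do not need to remember a separate function name.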

We also support the `ggplot2` package with our custom function `ggplot_rsi_predict()` to create more appealing plots:

```{r}
ggplot_rsi_predict(predict_TZP)

# choose error bars instead of a ribbon
ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
```

### Choosing the right model

Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:

```{r}
example_isolates %>%
  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% 
  ggplot_rsi_predict()
```

Vancomycin resistance could be 100% in ten years, but might also stay around 0%. 

You can define the model with the `model` argument. The model chosen above is a generalised linear regression model using a binomial distribution, which assumes that a period of zero resistance is followed by a period of slowly but steadily increasing resistance.

Valid values are:

| Input values                           | Function used by R            | Type of model                                       |
|----------------------------------------|-------------------------------|-----------------------------------------------------|
| `"binomial"` or `"binom"` or `"logit"` | `glm(..., family = binomial)` | Generalised linear model with binomial distribution |
| `"loglin"` or `"poisson"`              | `glm(..., family = poisson)`  | Generalised linear model with Poisson distribution  |
| `"lin"` or `"linear"`                  | `lm()`                        | Linear model                                        |
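
As a rough illustration of how the three rows of the table differ, the base R sketch below fits each type of model to the same simulated yearly counts and extrapolates five and ten years ahead. The data and object names are hypothetical, and the exact formulas used inside `resistance_predict()` may differ.

```{r}
# simulated yearly counts (hypothetical data), 100 isolates per year
yearly <- data.frame(year = 2010:2019,
                     R = c(2, 3, 3, 5, 6, 8, 11, 13, 17, 21),
                     S = c(98, 97, 97, 95, 94, 92, 89, 87, 83, 79))
yearly$prop <- yearly$R / (yearly$R + yearly$S)

# one fit per model type from the table
fit_binomial <- glm(cbind(R, S) ~ year, family = binomial, data = yearly)
fit_poisson  <- glm(R ~ year, family = poisson, data = yearly)
fit_linear   <- lm(prop ~ year, data = yearly)

# extrapolate 5 and 10 years beyond the last observed year
new_years <- data.frame(year = c(2024, 2029))
comparison <- data.frame(
  year     = new_years$year,
  binomial = predict(fit_binomial, new_years, type = "response"),
  # predicted count divided by the (assumed constant) 100 isolates per year
  poisson  = predict(fit_poisson, new_years, type = "response") / 100,
  linear   = predict(fit_linear, new_years)
)
comparison
```

Only the binomial model is guaranteed to stay between 0 and 1; the Poisson and linear models are not bounded that way, which is worth keeping in mind when extrapolating far ahead.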

For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:

```{r}
example_isolates %>%
  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
  ggplot_rsi_predict()
```

This seems more likely, doesn't it?

The model itself is also available from the object, as an `attribute`:
```{r}
model <- attributes(predict_TZP)$model

summary(model)$family

summary(model)$coefficients
```
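
For readers less familiar with attributes: they are named pieces of metadata attached to an R object, and `attr(x, "model")` is an equivalent way of writing `attributes(x)$model`. The sketch below shows the mechanism with a hypothetical data.frame `res` and the built-in `cars` data set; it is unrelated to the AMR data.

```{r}
# fit a simple model and build a small result table from it
fit <- lm(dist ~ speed, data = cars)
res <- data.frame(speed = c(10, 20))
res$predicted <- predict(fit, newdata = res)

# store the model as an attribute, then retrieve it in two equivalent ways
attr(res, "model") <- fit
identical(attributes(res)$model, attr(res, "model"))

# the retrieved object is a regular model that works with the usual tools
coef(attr(res, "model"))
```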