mirror of https://github.com/msberends/AMR.git synced 2024-12-25 06:06:12 +01:00

(v2.1.1.9122) fix documentation

dr. M.S. (Matthijs) Berends 2024-12-20 10:52:44 +01:00
parent 15fc72fc66
commit 2e31ec19c3
13 changed files with 160 additions and 77 deletions


@ -1,6 +1,6 @@
Package: AMR Package: AMR
Version: 2.1.1.9121 Version: 2.1.1.9122
Date: 2024-12-19 Date: 2024-12-20
Title: Antimicrobial Resistance Data Analysis Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR) Description: Functions to simplify and standardise antimicrobial resistance (AMR)
data analysis and to work with microbial and antimicrobial properties by data analysis and to work with microbial and antimicrobial properties by


@ -1,4 +1,4 @@
# AMR 2.1.1.9121 # AMR 2.1.1.9122
*(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)* *(this beta version will eventually become v3.0. We're happy to reach a new major milestone soon, which will be all about the new One Health support! Install this beta using [the instructions here](https://msberends.github.io/AMR/#latest-development-version).)*


@ -1,6 +1,6 @@
Metadata-Version: 2.1 Metadata-Version: 2.1
Name: AMR Name: AMR
Version: 2.1.1.9121 Version: 2.1.1.9122
Summary: A Python wrapper for the AMR R package Summary: A Python wrapper for the AMR R package
Home-page: https://github.com/msberends/AMR Home-page: https://github.com/msberends/AMR
Author: Matthijs Berends Author: Matthijs Berends

Binary file not shown.

Binary file not shown.


@ -2,7 +2,7 @@ from setuptools import setup, find_packages
setup( setup(
name='AMR', name='AMR',
version='2.1.1.9121', version='2.1.1.9122',
packages=find_packages(), packages=find_packages(),
install_requires=[ install_requires=[
'rpy2', 'rpy2',


@ -40,7 +40,6 @@
#' @inheritParams proportion #' @inheritParams proportion
#' @param nrow (when using `facet`) number of rows #' @param nrow (when using `facet`) number of rows
#' @param colours a named vactor with colour to be used for filling. The default colours are colour-blind friendly. #' @param colours a named vactor with colour to be used for filling. The default colours are colour-blind friendly.
#' @param aesthetics aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"
#' @param datalabels show datalabels using [labels_sir_count()] #' @param datalabels show datalabels using [labels_sir_count()]
#' @param datalabels.size size of the datalabels #' @param datalabels.size size of the datalabels
#' @param datalabels.colour colour of the datalabels #' @param datalabels.colour colour of the datalabels


@ -42,7 +42,10 @@
#' @param colours_SIR colours to use for filling in the bars, must be a vector of three values (in the order S, I and R). The default colours are colour-blind friendly. #' @param colours_SIR colours to use for filling in the bars, must be a vector of three values (in the order S, I and R). The default colours are colour-blind friendly.
#' @param language language to be used to translate 'Susceptible', 'Increased exposure'/'Intermediate' and 'Resistant' - the default is system language (see [get_AMR_locale()]) and can be overwritten by setting the package option [`AMR_locale`][AMR-options], e.g. `options(AMR_locale = "de")`, see [translate]. Use `language = NULL` or `language = ""` to prevent translation. #' @param language language to be used to translate 'Susceptible', 'Increased exposure'/'Intermediate' and 'Resistant' - the default is system language (see [get_AMR_locale()]) and can be overwritten by setting the package option [`AMR_locale`][AMR-options], e.g. `options(AMR_locale = "de")`, see [translate]. Use `language = NULL` or `language = ""` to prevent translation.
#' @param expand a [logical] to indicate whether the range on the x axis should be expanded between the lowest and highest value. For MIC values, intermediate values will be factors of 2 starting from the highest MIC value. For disk diameters, the whole diameter range will be filled. #' @param expand a [logical] to indicate whether the range on the x axis should be expanded between the lowest and highest value. For MIC values, intermediate values will be factors of 2 starting from the highest MIC value. For disk diameters, the whole diameter range will be filled.
#' @param aesthetics aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"
#' @inheritParams as.sir #' @inheritParams as.sir
#' @inheritParams ggplot_sir
#' @inheritParams proportion
#' @details #' @details
#' The interpretation of "I" will be named "Increased exposure" for all EUCAST guidelines since 2019, and will be named "Intermediate" in all other cases. #' The interpretation of "I" will be named "Increased exposure" for all EUCAST guidelines since 2019, and will be named "Intermediate" in all other cases.
#' #'
@ -80,7 +83,7 @@
#' plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl") #' plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl")
#' #'
#' #'
#' # Plotting using scale_x_mic() #' # Plotting using scale_x_mic() ---------------------------------------------
#' \donttest{ #' \donttest{
#' if (require("ggplot2")) { #' if (require("ggplot2")) {
#' mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")), #' mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
@ -120,6 +123,25 @@
#' if (require("ggplot2")) { #' if (require("ggplot2")) {
#' autoplot(some_sir_values) #' autoplot(some_sir_values)
#' } #' }
#'
#' # Plotting using scale_y_percent() -----------------------------------------
#' if (require("ggplot2")) {
#' p <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
#' counts = c(1, 1, 2, 2, 3, 3)),
#' aes(mics, counts / sum(counts))) +
#' geom_col()
#' print(p)
#'
#' p2 <- p +
#' scale_y_percent() +
#' theme_sir()
#' print(p2)
#'
#' p +
#' scale_y_percent(breaks = seq(from = 0, to = 1, by = 0.1),
#' limits = c(0, 1)) +
#' theme_sir()
#' }
#' } #' }
NULL NULL
@ -954,7 +976,7 @@ facet_sir <- function(facet = c("interpretation", "antibiotic"), nrow = NULL) {
#' @rdname plot #' @rdname plot
#' @export #' @export
scale_y_percent <- function(breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1), limits = NULL) { scale_y_percent <- function(breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1), limits = c(0, NA)) {
stop_ifnot_installed("ggplot2") stop_ifnot_installed("ggplot2")
meet_criteria(breaks, allow_class = c("numeric", "integer", "function")) meet_criteria(breaks, allow_class = c("numeric", "integer", "function"))
meet_criteria(limits, allow_class = c("numeric", "integer"), has_length = 2, allow_NULL = TRUE, allow_NA = TRUE) meet_criteria(limits, allow_class = c("numeric", "integer"), has_length = 2, allow_NULL = TRUE, allow_NA = TRUE)
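
Note on the `limits` change above: the new default `c(0, NA)` reads as "fix the lower bound at 0 and leave the upper bound to the data", in line with the `limits` documentation added in this commit. The sketch below illustrates that behaviour with plain `ggplot2::scale_y_continuous()`. It assumes that `scale_y_percent()` passes `limits` on to a continuous y scale, which this diff does not show, and the data frame `df` is made up for illustration.

```r
library(ggplot2)

# Hypothetical data: three proportions, none of them near zero
df <- data.frame(group = c("a", "b", "c"), share = c(0.35, 0.40, 0.25))

p <- ggplot(df, aes(group, share)) +
  geom_point()

# Old default (limits = NULL): the y range is taken from the data alone
p_null <- p + scale_y_continuous(limits = NULL)

# New default (limits = c(0, NA)): lower bound fixed at 0,
# NA leaves the upper bound at the existing maximum
p_zero <- p + scale_y_continuous(limits = c(0, NA))

# Inspect the limits each scale ends up with after training on the data
lims_null <- layer_scales(p_null)$y$get_limits()
lims_zero <- layer_scales(p_zero)$y$get_limits()
print(lims_null)
print(lims_zero)
```

For bar charts built with `geom_col()` the bars already start at 0, so the difference is most visible with geoms such as points or lines, where an axis that starts at the lowest observed value can exaggerate differences between proportions.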


@ -1,5 +1,5 @@
This files contains all context you must know about the AMR package for R. This files contains all context you must know about the AMR package for R.
First and foremost, you are trained on version 2.1.1.9121. Remember this whenever someone asks which AMR package version you're at. First and foremost, you are trained on version 2.1.1.9122. Remember this whenever someone asks which AMR package version you're at.
-------------------------------- --------------------------------
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'NAMESPACE': THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'NAMESPACE':
@ -5448,8 +5448,6 @@ geom_sir(
\item{y.title}{text to show as y axis description} \item{y.title}{text to show as y axis description}
\item{...}{other arguments passed on to \code{\link[=geom_sir]{geom_sir()}} or, in case of \code{\link[=scale_sir_colours]{scale_sir_colours()}}, named values to set colours. The default colours are colour-blind friendly, while maintaining the convention that e.g. 'susceptible' should be green and 'resistant' should be red. See \emph{Examples}.} \item{...}{other arguments passed on to \code{\link[=geom_sir]{geom_sir()}} or, in case of \code{\link[=scale_sir_colours]{scale_sir_colours()}}, named values to set colours. The default colours are colour-blind friendly, while maintaining the convention that e.g. 'susceptible' should be green and 'resistant' should be red. See \emph{Examples}.}
\item{aesthetics}{aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"}
} }
\description{ \description{
Use these functions to create bar plots for AMR data analysis. All functions rely on \link[ggplot2:ggplot]{ggplot2} functions. Use these functions to create bar plots for AMR data analysis. All functions rely on \link[ggplot2:ggplot]{ggplot2} functions.
@ -7545,7 +7543,7 @@ facet_sir(facet = c("interpretation", "antibiotic"), nrow = NULL)
scale_y_percent( scale_y_percent(
breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1), breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1),
limits = NULL limits = c(0, NA)
) )
scale_sir_colours( scale_sir_colours(
@ -7597,6 +7595,28 @@ labels_sir_count(
\item{include_PKPD}{a \link{logical} to indicate that PK/PD clinical breakpoints must be applied as a last resort - the default is \code{TRUE}. Can also be set with the package option \code{\link[=AMR-options]{AMR_include_PKPD}}.} \item{include_PKPD}{a \link{logical} to indicate that PK/PD clinical breakpoints must be applied as a last resort - the default is \code{TRUE}. Can also be set with the package option \code{\link[=AMR-options]{AMR_include_PKPD}}.}
\item{breakpoint_type}{the type of breakpoints to use, either "ECOFF", "animal", or "human". ECOFF stands for Epidemiological Cut-Off values. The default is \code{"human"}, which can also be set with the package option \code{\link[=AMR-options]{AMR_breakpoint_type}}. If \code{host} is set to values of veterinary species, this will automatically be set to \code{"animal"}.} \item{breakpoint_type}{the type of breakpoints to use, either "ECOFF", "animal", or "human". ECOFF stands for Epidemiological Cut-Off values. The default is \code{"human"}, which can also be set with the package option \code{\link[=AMR-options]{AMR_breakpoint_type}}. If \code{host} is set to values of veterinary species, this will automatically be set to \code{"animal"}.}
\item{facet}{variable to split plots by, either \code{"interpretation"} (default) or \code{"antibiotic"} or a grouping variable}
\item{nrow}{(when using \code{facet}) number of rows}
\item{breaks}{a \link{numeric} vector of positions}
\item{limits}{a \link{numeric} vector of length two providing limits of the scale, use \code{NA} to refer to the existing minimum or maximum}
\item{aesthetics}{aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"}
\item{position}{position adjustment of bars, either \code{"fill"}, \code{"stack"} or \code{"dodge"}}
\item{translate_ab}{a column name of the \link{antibiotics} data set to translate the antibiotic abbreviations to, using \code{\link[=ab_property]{ab_property()}}}
\item{minimum}{the minimum allowed number of available (tested) isolates. Any isolate count lower than \code{minimum} will return \code{NA} with a warning. The default number of \code{30} isolates is advised by the Clinical and Laboratory Standards Institute (CLSI) as best practice, see \emph{Source}.}
\item{combine_SI}{a \link{logical} to indicate whether all values of S, SDD, and I must be merged into one, so the output only consists of S+SDD+I vs. R (susceptible vs. resistant) - the default is \code{TRUE}}
\item{datalabels.size}{size of the datalabels}
\item{datalabels.colour}{colour of the datalabels}
} }
\value{ \value{
The \code{autoplot()} functions return a \code{\link[ggplot2:ggplot]{ggplot}} model that is extendible with any \code{ggplot2} function. The \code{autoplot()} functions return a \code{\link[ggplot2:ggplot]{ggplot}} model that is extendible with any \code{ggplot2} function.
@ -7641,7 +7661,7 @@ plot(some_disk_values, mo = "Escherichia coli", ab = "cipro")
plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl") plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl")
# Plotting using scale_x_mic() # Plotting using scale_x_mic() ---------------------------------------------
\donttest{ \donttest{
if (require("ggplot2")) { if (require("ggplot2")) {
mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")), mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
@ -7681,6 +7701,25 @@ if (require("ggplot2")) {
if (require("ggplot2")) { if (require("ggplot2")) {
autoplot(some_sir_values) autoplot(some_sir_values)
} }
# Plotting using scale_y_percent() -----------------------------------------
if (require("ggplot2")) {
p <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
counts = c(1, 1, 2, 2, 3, 3)),
aes(mics, counts / sum(counts))) +
geom_col()
print(p)
p2 <- p +
scale_y_percent() +
theme_sir()
print(p2)
p +
scale_y_percent(breaks = seq(from = 0, to = 1, by = 0.1),
limits = c(0, 1)) +
theme_sir()
}
} }
} }
@ -8912,13 +8951,13 @@ THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/AMR_with_tidymodels.Rm
--- ---
title: "`AMR` with `tidymodels`" title: "AMR with tidymodels"
output: output:
rmarkdown::html_vignette: rmarkdown::html_vignette:
toc: true toc: true
toc_depth: 3 toc_depth: 3
vignette: > vignette: >
%\VignetteIndexEntry{`AMR` with `tidymodels`} %\VignetteIndexEntry{AMR with tidymodels}
%\VignetteEncoding{UTF-8} %\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown} %\VignetteEngine{knitr::rmarkdown}
editor_options: editor_options:
@ -8935,22 +8974,20 @@ knitr::opts_chunk$set(
) )
``` ```
> This page was entirely written by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a ChatGPT manually-trained model able to answer any question about the AMR package.
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset. Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
By leveraging the power of `tidymodels` and the `AMR` package, we'll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams. By leveraging the power of `tidymodels` and the `AMR` package, we'll build a reproducible machine learning workflow to predict the Gramstain of the microorganism to two important antibiotic classes: aminoglycosides and beta-lactams.
---
### **Objective** ### **Objective**
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will: Our goal is to build a predictive model using the `tidymodels` framework to determine the Gramstain of the microorganism based on microbial data. We will:
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`. 1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
2. Define a logistic regression model for prediction. 2. Define a logistic regression model for prediction.
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model. 3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
---
### **Data Preparation** ### **Data Preparation**
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package. We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
@ -8976,26 +9013,21 @@ data <- example_isolates %>%
# get Gramstain of microorganisms # get Gramstain of microorganisms
mo = as.factor(mo_gramstain(mo))) %>% mo = as.factor(mo_gramstain(mo))) %>%
# drop NAs - the ones without a Gramstain (fungi, etc.) # drop NAs - the ones without a Gramstain (fungi, etc.)
drop_na() # %>% drop_na()
# Cefepime is not reliable
#select(-FEP)
``` ```
**Explanation:** **Explanation:**
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes. - `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
- `drop_na()` ensures the model receives complete cases for training. - `drop_na()` ensures the model receives complete cases for training.
---
### **Defining the Workflow** ### **Defining the Workflow**
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting. We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
#### 1. Preprocessing with a Recipe #### 1. Preprocessing with a Recipe
We create a recipe to preprocess the data for modelling. This includes: We create a recipe to preprocess the data for modelling.
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
```{r} ```{r}
# Define the recipe for data preprocessing # Define the recipe for data preprocessing
@ -9005,8 +9037,11 @@ resistance_recipe
``` ```
**Explanation:** **Explanation:**
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model. - `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and all other columns as predictors.
- `step_corr()` removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.
Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically.
#### 2. Specifying the Model #### 2. Specifying the Model
@ -9020,6 +9055,7 @@ logistic_model
``` ```
**Explanation:** **Explanation:**
- `logistic_reg()` sets up a logistic regression model. - `logistic_reg()` sets up a logistic regression model.
- `set_engine("glm")` specifies the use of R's built-in GLM engine. - `set_engine("glm")` specifies the use of R's built-in GLM engine.
@ -9032,11 +9068,8 @@ We bundle the recipe and model together into a `workflow`, which organizes the e
resistance_workflow <- workflow() %>% resistance_workflow <- workflow() %>%
add_recipe(resistance_recipe) %>% # Add the preprocessing recipe add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
add_model(logistic_model) # Add the logistic regression model add_model(logistic_model) # Add the logistic regression model
resistance_workflow
``` ```
---
### **Training and Evaluating the Model** ### **Training and Evaluating the Model**
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance. To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
@ -9051,14 +9084,15 @@ testing_data <- testing(data_split) # Testing set
# Fit the workflow to the training data # Fit the workflow to the training data
fitted_workflow <- resistance_workflow %>% fitted_workflow <- resistance_workflow %>%
fit(training_data) # Train the model fit(training_data) # Train the model
fitted_workflow
``` ```
**Explanation:** **Explanation:**
- `initial_split()` splits the data into training and testing sets. - `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set. - `fit()` trains the workflow on the training set.
Notice how in `fit()`, the antibiotic selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
Next, we evaluate the model on the testing data. Next, we evaluate the model on the testing data.
```{r} ```{r}
@ -9082,10 +9116,11 @@ metrics
``` ```
**Explanation:** **Explanation:**
- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and AUC.
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like: - `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and kappa.
It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy based on AMR results of aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
```{r} ```{r}
predictions %>% predictions %>%
@ -9093,16 +9128,12 @@ predictions %>%
autoplot() autoplot()
``` ```
---
### **Conclusion** ### **Conclusion**
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance. In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly. This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
---
THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/EUCAST.Rmd': THE PART HEREAFTER CONTAINS CONTENTS FROM FILE 'vignettes/EUCAST.Rmd':


@ -86,8 +86,6 @@ geom_sir(
\item{y.title}{text to show as y axis description} \item{y.title}{text to show as y axis description}
\item{...}{other arguments passed on to \code{\link[=geom_sir]{geom_sir()}} or, in case of \code{\link[=scale_sir_colours]{scale_sir_colours()}}, named values to set colours. The default colours are colour-blind friendly, while maintaining the convention that e.g. 'susceptible' should be green and 'resistant' should be red. See \emph{Examples}.} \item{...}{other arguments passed on to \code{\link[=geom_sir]{geom_sir()}} or, in case of \code{\link[=scale_sir_colours]{scale_sir_colours()}}, named values to set colours. The default colours are colour-blind friendly, while maintaining the convention that e.g. 'susceptible' should be green and 'resistant' should be red. See \emph{Examples}.}
\item{aesthetics}{aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"}
} }
\description{ \description{
Use these functions to create bar plots for AMR data analysis. All functions rely on \link[ggplot2:ggplot]{ggplot2} functions. Use these functions to create bar plots for AMR data analysis. All functions rely on \link[ggplot2:ggplot]{ggplot2} functions.


@ -123,7 +123,7 @@ facet_sir(facet = c("interpretation", "antibiotic"), nrow = NULL)
scale_y_percent( scale_y_percent(
breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1), breaks = function(x) seq(0, max(x, na.rm = TRUE), 0.1),
limits = NULL limits = c(0, NA)
) )
scale_sir_colours( scale_sir_colours(
@ -175,6 +175,28 @@ labels_sir_count(
\item{include_PKPD}{a \link{logical} to indicate that PK/PD clinical breakpoints must be applied as a last resort - the default is \code{TRUE}. Can also be set with the package option \code{\link[=AMR-options]{AMR_include_PKPD}}.} \item{include_PKPD}{a \link{logical} to indicate that PK/PD clinical breakpoints must be applied as a last resort - the default is \code{TRUE}. Can also be set with the package option \code{\link[=AMR-options]{AMR_include_PKPD}}.}
\item{breakpoint_type}{the type of breakpoints to use, either "ECOFF", "animal", or "human". ECOFF stands for Epidemiological Cut-Off values. The default is \code{"human"}, which can also be set with the package option \code{\link[=AMR-options]{AMR_breakpoint_type}}. If \code{host} is set to values of veterinary species, this will automatically be set to \code{"animal"}.} \item{breakpoint_type}{the type of breakpoints to use, either "ECOFF", "animal", or "human". ECOFF stands for Epidemiological Cut-Off values. The default is \code{"human"}, which can also be set with the package option \code{\link[=AMR-options]{AMR_breakpoint_type}}. If \code{host} is set to values of veterinary species, this will automatically be set to \code{"animal"}.}
\item{facet}{variable to split plots by, either \code{"interpretation"} (default) or \code{"antibiotic"} or a grouping variable}
\item{nrow}{(when using \code{facet}) number of rows}
\item{breaks}{a \link{numeric} vector of positions}
\item{limits}{a \link{numeric} vector of length two providing limits of the scale, use \code{NA} to refer to the existing minimum or maximum}
\item{aesthetics}{aesthetics to apply the colours to - the default is "fill" but can also be (a combination of) "alpha", "colour", "fill", "linetype", "shape" or "size"}
\item{position}{position adjustment of bars, either \code{"fill"}, \code{"stack"} or \code{"dodge"}}
\item{translate_ab}{a column name of the \link{antibiotics} data set to translate the antibiotic abbreviations to, using \code{\link[=ab_property]{ab_property()}}}
\item{minimum}{the minimum allowed number of available (tested) isolates. Any isolate count lower than \code{minimum} will return \code{NA} with a warning. The default number of \code{30} isolates is advised by the Clinical and Laboratory Standards Institute (CLSI) as best practice, see \emph{Source}.}
\item{combine_SI}{a \link{logical} to indicate whether all values of S, SDD, and I must be merged into one, so the output only consists of S+SDD+I vs. R (susceptible vs. resistant) - the default is \code{TRUE}}
\item{datalabels.size}{size of the datalabels}
\item{datalabels.colour}{colour of the datalabels}
} }
\value{ \value{
The \code{autoplot()} functions return a \code{\link[ggplot2:ggplot]{ggplot}} model that is extendible with any \code{ggplot2} function. The \code{autoplot()} functions return a \code{\link[ggplot2:ggplot]{ggplot}} model that is extendible with any \code{ggplot2} function.
@ -219,7 +241,7 @@ plot(some_disk_values, mo = "Escherichia coli", ab = "cipro")
plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl") plot(some_disk_values, mo = "Escherichia coli", ab = "cipro", language = "nl")
# Plotting using scale_x_mic() # Plotting using scale_x_mic() ---------------------------------------------
\donttest{ \donttest{
if (require("ggplot2")) { if (require("ggplot2")) {
mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")), mic_plot <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
@ -259,5 +281,24 @@ if (require("ggplot2")) {
if (require("ggplot2")) { if (require("ggplot2")) {
autoplot(some_sir_values) autoplot(some_sir_values)
} }
# Plotting using scale_y_percent() -----------------------------------------
if (require("ggplot2")) {
p <- ggplot(data.frame(mics = as.mic(c(0.25, "<=4", 4, 8, 32, ">=32")),
counts = c(1, 1, 2, 2, 3, 3)),
aes(mics, counts / sum(counts))) +
geom_col()
print(p)
p2 <- p +
scale_y_percent() +
theme_sir()
print(p2)
p +
scale_y_percent(breaks = seq(from = 0, to = 1, by = 0.1),
limits = c(0, 1)) +
theme_sir()
}
} }
} }


@ -1,11 +1,11 @@
--- ---
title: "`AMR` with `tidymodels`" title: "AMR with tidymodels"
output: output:
rmarkdown::html_vignette: rmarkdown::html_vignette:
toc: true toc: true
toc_depth: 3 toc_depth: 3
vignette: > vignette: >
%\VignetteIndexEntry{`AMR` with `tidymodels`} %\VignetteIndexEntry{AMR with tidymodels}
%\VignetteEncoding{UTF-8} %\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown} %\VignetteEngine{knitr::rmarkdown}
editor_options: editor_options:
@ -22,22 +22,20 @@ knitr::opts_chunk$set(
) )
``` ```
> This page was entirely written by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a ChatGPT manually-trained model able to answer any question about the AMR package.
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset. Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset.
By leveraging the power of `tidymodels` and the `AMR` package, we'll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams. By leveraging the power of `tidymodels` and the `AMR` package, we'll build a reproducible machine learning workflow to predict the Gramstain of the microorganism to two important antibiotic classes: aminoglycosides and beta-lactams.
---
### **Objective** ### **Objective**
Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will: Our goal is to build a predictive model using the `tidymodels` framework to determine the Gramstain of the microorganism based on microbial data. We will:
1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`. 1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
2. Define a logistic regression model for prediction. 2. Define a logistic regression model for prediction.
3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model. 3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.
---
### **Data Preparation** ### **Data Preparation**
We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package. We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
@ -63,26 +61,21 @@ data <- example_isolates %>%
    # get Gram stain of microorganisms
    mo = as.factor(mo_gramstain(mo))) %>%
  # drop NAs - the ones without a Gram stain (fungi, etc.)
  drop_na()
```
**Explanation:**
- `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
- `drop_na()` ensures the model receives complete cases for training.
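To make the role of the selectors more concrete, the following minimal sketch lists which columns of `example_isolates` each selector picks up, and how many isolates have a Gram stain at all. The variable names (`aminoglycoside_cols`, `betalactam_cols`, `gram_counts`) are our own and only serve as illustration; the exact columns returned may differ between versions of the `AMR` package.

```r
library(AMR)
library(dplyr)

# Column names picked up by each antibiotic selector
aminoglycoside_cols <- example_isolates %>%
  select(aminoglycosides()) %>%
  colnames()
betalactam_cols <- example_isolates %>%
  select(betalactams()) %>%
  colnames()

aminoglycoside_cols
betalactam_cols

# Gram stain per isolate; organisms without a Gram stain show up as NA
gram_counts <- table(mo_gramstain(example_isolates$mo), useNA = "ifany")
gram_counts
```

The `NA` group in `gram_counts` corresponds to the rows that `drop_na()` removes once `mo` has been converted to a Gram stain.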
---
### **Defining the Workflow**
We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.
#### 1. Preprocessing with a Recipe
We create a recipe to preprocess the data for modelling.
```{r}
# Define the recipe for data preprocessing
@ -92,8 +85,11 @@ resistance_recipe
```
**Explanation:**

- `recipe(mo ~ ., data = data)` takes the `mo` column as the outcome and all other columns as predictors.
- `step_corr()` removes predictors (i.e., antibiotic columns) whose absolute correlation with another predictor exceeds 0.9.

Notice how the recipe contains just the antibiotic selector functions: there is no need to name the columns explicitly.
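The effect of `step_corr()` is easiest to see on a small made-up dataset. The sketch below is not the recipe from this post: it uses the `recipes` selector `all_numeric_predictors()` instead of the antibiotic selectors, and the data frame `toy` with columns `x1`, `x2`, `x3` is purely hypothetical. Two of the three predictors are nearly identical, so one of them should be dropped at a threshold of 0.9.

```r
library(recipes)
library(dplyr)

set.seed(42)
n <- 100
x1 <- rnorm(n)

# Hypothetical data: x2 is a near-duplicate of x1, x3 is independent noise
toy <- data.frame(
  outcome = factor(sample(c("Gram-negative", "Gram-positive"), n, replace = TRUE)),
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.01),
  x3 = rnorm(n)
)

# Same structure as above: outcome ~ all other columns, then a correlation filter
toy_recipe <- recipe(outcome ~ ., data = toy) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

# prep() estimates which columns to remove; bake() applies that to the data
baked <- toy_recipe %>%
  prep() %>%
  bake(new_data = NULL)

names(baked)
```

Only one of `x1` and `x2` should remain in `baked`, while `x3` and the outcome are kept.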
#### 2. Specifying the Model
@ -107,6 +103,7 @@ logistic_model
```
**Explanation:**

- `logistic_reg()` sets up a logistic regression model.
- `set_engine("glm")` specifies the use of R's built-in GLM engine.
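To our understanding, the `"glm"` engine hands the actual fitting to `stats::glm()` with a binomial family; running `translate()` on the model specification should show the exact call. As a rough illustration of what that amounts to, here is a base R sketch on made-up data. The data frame `toy` and its columns are assumptions for this example only.

```r
set.seed(1)
n <- 200

# Hypothetical data: one numeric predictor and a two-level outcome
toy <- data.frame(x = rnorm(n))
toy$outcome <- factor(
  ifelse(toy$x + rnorm(n) > 0, "Gram-positive", "Gram-negative")
)

# With a factor outcome, glm() models the probability of the second level
levels(toy$outcome)

# Logistic regression: a GLM with a binomial family (logit link by default)
toy_glm <- glm(outcome ~ x, data = toy, family = binomial())

# Predicted probabilities of the second level, turned into class labels
predicted_prob <- predict(toy_glm, type = "response")
predicted_class <- ifelse(
  predicted_prob > 0.5,
  levels(toy$outcome)[2],
  levels(toy$outcome)[1]
)

# Share of correctly classified rows (on the training data itself)
toy_accuracy <- mean(predicted_class == toy$outcome)
toy_accuracy
```

Working through `logistic_reg()` rather than calling `glm()` directly keeps the model interchangeable: another engine can be swapped in without touching the rest of the workflow.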
@ -119,11 +116,8 @@ We bundle the recipe and model together into a `workflow`, which organizes the e
resistance_workflow <- workflow() %>%
  add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
  add_model(logistic_model) # Add the logistic regression model
```
---
### **Training and Evaluating the Model**
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
@ -138,14 +132,15 @@ testing_data <- testing(data_split) # Testing set
# Fit the workflow to the training data
fitted_workflow <- resistance_workflow %>%
  fit(training_data) # Train the model
```
**Explanation:**

- `initial_split()` splits the data into training and testing sets.
- `fit()` trains the workflow on the training set.
Notice that `fit()` calls the antibiotic selector functions again internally: they are stored in the recipe, so they are evaluated when the workflow is trained.
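The same split, fit and predict sequence can be tried end to end on made-up data. This is a minimal sketch under our own assumptions: the tibble `toy`, its columns, and the 80/20 split proportion are hypothetical and not taken from this post, and `all_numeric_predictors()` stands in for the antibiotic selectors.

```r
library(tidymodels)

set.seed(123)
n <- 300

# Hypothetical data: two numeric predictors and a two-level outcome
toy <- tibble(x1 = rnorm(n), x2 = rnorm(n)) %>%
  mutate(outcome = factor(if_else(x1 - x2 + rnorm(n) > 0, "yes", "no")))

# Split into training and testing sets (proportion chosen for illustration)
toy_split <- initial_split(toy, prop = 0.8)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# The recipe only stores its steps here; nothing is estimated yet
toy_recipe <- recipe(outcome ~ ., data = toy_train) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

toy_workflow <- workflow() %>%
  add_recipe(toy_recipe) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# fit() prepares the recipe on the training data and then trains the model
toy_fit <- fit(toy_workflow, data = toy_train)

# Predictions for unseen rows go through the same preprocessing
toy_predictions <- predict(toy_fit, new_data = toy_test)
head(toy_predictions)
```

Because the recipe travels with the workflow, the preprocessing learned on the training set is reapplied automatically when predicting on the testing set.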
Next, we evaluate the model on the testing data.
```{r}
@ -169,10 +164,11 @@ metrics
```
**Explanation:**

- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and kappa.

It appears we can predict the Gram stain with an accuracy of `r round(metrics$.estimate[1], 3)`, based on the AMR results for aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
```{r}
predictions %>%
@ -180,12 +176,8 @@ predictions %>%
  autoplot()
```
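For readers who want to see what the two metrics mentioned earlier measure, accuracy and Cohen's kappa can be computed by hand from a confusion matrix. The sketch below uses made-up labels, not the predictions from this post, and plain base R instead of `yardstick`.

```r
# Hypothetical truth and predictions for 100 isolates
truth <- factor(c(rep("Gram-negative", 50), rep("Gram-positive", 50)))
estimate <- factor(
  c(rep("Gram-negative", 45), rep("Gram-positive", 5),   # truly Gram-negative
    rep("Gram-positive", 40), rep("Gram-negative", 10)), # truly Gram-positive
  levels = levels(truth)
)

confusion <- table(truth = truth, estimate = estimate)
n_total <- sum(confusion)

# Accuracy: share of predictions on the diagonal of the confusion matrix
accuracy <- sum(diag(confusion)) / n_total

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(confusion) * colSums(confusion)) / n_total^2

# Cohen's kappa: accuracy corrected for chance agreement
cohen_kappa <- (accuracy - expected) / (1 - expected)

accuracy    # 0.85
cohen_kappa # 0.7
```

Kappa is lower than accuracy here because half of the agreement would already be expected by chance with two equally sized classes.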
---
### **Conclusion**
In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
---