(v2.1.1.9122) fix documentation

2025-07-11 04:21:52 +02:00 · 2024-12-20 10:52:44 +01:00
parent 15fc72fc66
commit 2e31ec19c3
13 changed files with 160 additions and 77 deletions
--- a/vignettes/AMR_with_tidymodels.Rmd
+++ b/vignettes/AMR_with_tidymodels.Rmd
@@ -1,11 +1,11 @@
 ---
-title: "`AMR` with `tidymodels`"
+title: "AMR with tidymodels"
 output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
 vignette: >
-  %\VignetteIndexEntry{`AMR` with `tidymodels`}
+  %\VignetteIndexEntry{AMR with tidymodels}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
 editor_options: 
@@ -22,22 +22,20 @@ knitr::opts_chunk$set(
 )
 ```

+> This page was written entirely by our [AMR for R Assistant](https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant), a manually trained ChatGPT model that can answer any question about the AMR package.
+
 Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The `AMR` R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like `aminoglycosides()` and `betalactams()`. In this post, we will explore how to use the `tidymodels` framework to predict resistance patterns in the `example_isolates` dataset. 

-By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict resistance to two important antibiotic classes: aminoglycosides and beta-lactams.
-
---
+By leveraging the power of `tidymodels` and the `AMR` package, we’ll build a reproducible machine learning workflow to predict the Gram stain of a microorganism from its resistance results for two important antibiotic classes: aminoglycosides and beta-lactams.

 ### **Objective**

-Our goal is to build a predictive model using the `tidymodels` framework to determine resistance patterns based on microbial data. We will:
+Our goal is to build a predictive model using the `tidymodels` framework to determine the Gram stain of the microorganism based on its antimicrobial resistance results. We will:

 1. Preprocess data using the selector functions `aminoglycosides()` and `betalactams()`.
 2. Define a logistic regression model for prediction.
 3. Use a structured `tidymodels` workflow to preprocess, train, and evaluate the model.

---
-
 ### **Data Preparation**

 We begin by loading the required libraries and preparing the `example_isolates` dataset from the `AMR` package.
@@ -63,26 +61,21 @@ data <- example_isolates %>%
          # get Gramstain of microorganisms
          mo = as.factor(mo_gramstain(mo))) %>%
  # drop NAs - the ones without a Gramstain (fungi, etc.)
-  drop_na() # %>%
-  # Cefepime is not reliable
-  #select(-FEP)
+  drop_na()
 ```

 **Explanation:**
+
 - `aminoglycosides()` and `betalactams()` dynamically select columns for antibiotics in these classes.
 - `drop_na()` ensures the model receives complete cases for training.

---
-
 ### **Defining the Workflow**

 We now define the `tidymodels` workflow, which consists of three steps: preprocessing, model specification, and fitting.

 #### 1. Preprocessing with a Recipe

-We create a recipe to preprocess the data for modelling. This includes:
- Encoding resistance results (`S`, `I`, `R`) as binary (resistant or not resistant).
- Converting microbial organism names (`mo`) into numerical features using one-hot encoding.
+We create a recipe to preprocess the data for modelling.

 ```{r}
 # Define the recipe for data preprocessing
@@ -92,8 +85,11 @@ resistance_recipe
 ```

 **Explanation:**
- `step_mutate()` transforms resistance results (`R`) into binary variables (TRUE/FALSE).
- `step_dummy()` converts categorical organism (`mo`) names into one-hot encoded numerical features, making them compatible with the model.
+
+- `recipe(mo ~ ., data = data)` takes the `mo` column as the outcome and all other columns as predictors.
+- `step_corr()` removes predictors (i.e., antibiotic columns) whose absolute correlation with another predictor exceeds 90%.
+
+Notice how the recipe contains just the antibiotic selector functions, so there is no need to name the columns explicitly.

 #### 2. Specifying the Model

@@ -107,6 +103,7 @@ logistic_model
 ```

 **Explanation:**
+
 - `logistic_reg()` sets up a logistic regression model.
 - `set_engine("glm")` specifies the use of R's built-in GLM engine.

@@ -119,11 +116,8 @@ We bundle the recipe and model together into a `workflow`, which organizes the e
 resistance_workflow <- workflow() %>%
  add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
  add_model(logistic_model) # Add the logistic regression model
-resistance_workflow
 ```

---
-
 ### **Training and Evaluating the Model**

 To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
@@ -138,14 +132,15 @@ testing_data <- testing(data_split)   # Testing set
 # Fit the workflow to the training data
 fitted_workflow <- resistance_workflow %>%
  fit(training_data) # Train the model
-
-fitted_workflow
 ```

 **Explanation:**
+
 - `initial_split()` splits the data into training and testing sets.
 - `fit()` trains the workflow on the training set.

+Notice that `fit()` calls the antibiotic selector functions again internally: they are stored in the recipe, so they are re-evaluated on the training data.
+
 Next, we evaluate the model on the testing data.

 ```{r}
@@ -169,10 +164,11 @@ metrics
 ```

 **Explanation:**
- `predict()` generates predictions on the testing set.
- `metrics()` computes evaluation metrics like accuracy and AUC.

-It appears we can predict the Gram based on AMR results with a `r round(metrics$.estimate[1], 3)` accuracy. The ROC curve looks like:
+- `predict()` generates predictions on the testing set.
+- `metrics()` computes evaluation metrics like accuracy and kappa.
+
+It appears we can predict the Gram stain with an accuracy of `r round(metrics$.estimate[1], 3)` based on AMR results for aminoglycoside and beta-lactam antibiotics. The ROC curve looks like this:

 ```{r}
 predictions %>%
@@ -180,12 +176,8 @@ predictions %>%
  autoplot()
 ```

---
-
 ### **Conclusion**

 In this post, we demonstrated how to build a machine learning pipeline with the `tidymodels` framework and the `AMR` package. By combining selector functions like `aminoglycosides()` and `betalactams()` with `tidymodels`, we efficiently prepared data, trained a model, and evaluated its performance.

 This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
-
---