<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>AMR with tidymodels • AMR (for R)</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png">
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/Lato-0.4.9/font.css" rel="stylesheet">
<link href="../deps/Fira_Code-0.4.9/font.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<script src="../extra.js"></script><meta property="og:title" content="AMR with tidymodels">
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar navbar-expand-lg fixed-top bg-primary" data-bs-theme="dark" aria-label="Site navigation"><div class="container">
<a class="navbar-brand me-2" href="../index.html">AMR (for R)</a>
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">2.1.1.9123</small>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-how-to" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true"><span class="fa fa-question-circle"></span> How to</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-how-to">
<li><a class="dropdown-item" href="../articles/AMR.html"><span class="fa fa-directions"></span> Conduct AMR Analysis</a></li>
<li><a class="dropdown-item" href="../reference/antibiogram.html"><span class="fa fa-file-prescription"></span> Generate Antibiogram (Trad./Syndromic/WISCA)</a></li>
<li><a class="dropdown-item" href="../articles/resistance_predict.html"><span class="fa fa-dice"></span> Predict Antimicrobial Resistance</a></li>
<li><a class="dropdown-item" href="../articles/datasets.html"><span class="fa fa-database"></span> Download Data Sets for Own Use</a></li>
<li><a class="dropdown-item" href="../articles/AMR_with_tidymodels.html"><span class="fa fa-square-root-variable"></span> Use AMR for Predictive Modelling (tidymodels)</a></li>
<li><a class="dropdown-item" href="../reference/AMR-options.html"><span class="fa fa-gear"></span> Set User- Or Team-specific Package Settings</a></li>
<li><a class="dropdown-item" href="../articles/PCA.html"><span class="fa fa-compress"></span> Conduct Principal Component Analysis for AMR</a></li>
<li><a class="dropdown-item" href="../articles/MDR.html"><span class="fa fa-skull-crossbones"></span> Determine Multi-Drug Resistance (MDR)</a></li>
<li><a class="dropdown-item" href="../articles/WHONET.html"><span class="fa fa-globe-americas"></span> Work with WHONET Data</a></li>
<li><a class="dropdown-item" href="../articles/EUCAST.html"><span class="fa fa-exchange-alt"></span> Apply Eucast Rules</a></li>
<li><a class="dropdown-item" href="../reference/mo_property.html"><span class="fa fa-bug"></span> Get Taxonomy of a Microorganism</a></li>
<li><a class="dropdown-item" href="../reference/ab_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antibiotic Drug</a></li>
<li><a class="dropdown-item" href="../reference/av_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antiviral Drug</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../articles/AMR_for_Python.html"><span class="fa fab fa-python"></span> AMR for Python</a></li>
<li class="nav-item"><a class="nav-link" href="../reference/index.html"><span class="fa fa-book-open"></span> Manual</a></li>
<li class="nav-item"><a class="nav-link" href="../authors.html"><span class="fa fa-users"></span> Authors</a></li>
</ul>
<ul class="navbar-nav">
<li class="nav-item"><a class="nav-link" href="../news/index.html"><span class="fa far fa-newspaper"></span> Changelog</a></li>
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/msberends/AMR"><span class="fa fab fa-github"></span> Source Code</a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<img src="../logo.svg" class="logo" alt=""><h1>AMR with tidymodels</h1>
<small class="dont-index">Source: <a href="https://github.com/msberends/AMR/blob/main/vignettes/AMR_with_tidymodels.Rmd" class="external-link"><code>vignettes/AMR_with_tidymodels.Rmd</code></a></small>
<div class="d-none name"><code>AMR_with_tidymodels.Rmd</code></div>
</div>
<blockquote>
<p>This page was entirely written by our <a href="https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant" class="external-link">AMR for R
Assistant</a>, a manually trained ChatGPT model able to answer any
question about the AMR package.</p>
</blockquote>
<p>Antimicrobial resistance (AMR) is a global health crisis, and
understanding resistance patterns is crucial for managing effective
treatments. The <code>AMR</code> R package provides robust tools for
analysing AMR data, including convenient antibiotic selector functions
like <code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>. In
this post, we will explore how to use the <code>tidymodels</code>
framework to predict resistance patterns in the
<code>example_isolates</code> dataset.</p>
<p>By leveraging the power of <code>tidymodels</code> and the
<code>AMR</code> package, we'll build a reproducible machine learning
workflow to predict the Gram stain of a microorganism from its test
results for two important antibiotic classes: aminoglycosides and
beta-lactams.</p>
<div class="section level3">
<h3 id="objective">
<strong>Objective</strong><a class="anchor" aria-label="anchor" href="#objective"></a>
</h3>
<p>Our goal is to build a predictive model using the
<code>tidymodels</code> framework to determine the Gram stain of the
microorganism based on its antibiotic test results. We will:</p>
<ol style="list-style-type: decimal">
<li>Preprocess data using the selector functions
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>.</li>
<li>Define a logistic regression model for prediction.</li>
<li>Use a structured <code>tidymodels</code> workflow to preprocess,
train, and evaluate the model.</li>
</ol>
</div>
<div class="section level3">
<h3 id="data-preparation">
<strong>Data Preparation</strong><a class="anchor" aria-label="anchor" href="#data-preparation"></a>
</h3>
<p>We begin by loading the required libraries and preparing the
<code>example_isolates</code> dataset from the <code>AMR</code>
package.</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Load required libraries</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://tidymodels.tidymodels.org" class="external-link">tidymodels</a></span><span class="op">)</span> <span class="co"># For machine learning workflows, and data manipulation (dplyr, tidyr, ...)</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Attaching packages</span> ────────────────────────────────────── tidymodels 1.2.0 ──</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">broom </span> 1.0.7 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">recipes </span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">dials </span> 1.3.0 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">rsample </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">dplyr </span> 1.1.4 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tibble </span> 3.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">ggplot2 </span> 3.5.1 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tidyr </span> 1.3.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">infer </span> 1.0.7 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tune </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">modeldata </span> 1.4.0 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">workflows </span> 1.1.4</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">parsnip </span> 1.2.1 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">workflowsets</span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">purrr </span> 1.0.2 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">yardstick </span> 1.3.1</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">purrr</span>::<span style="color: #00BB00;">discard()</span> masks <span style="color: #0000BB;">scales</span>::discard()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">filter()</span> masks <span style="color: #0000BB;">stats</span>::filter()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">lag()</span> masks <span style="color: #0000BB;">stats</span>::lag()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">recipes</span>::<span style="color: #00BB00;">step()</span> masks <span style="color: #0000BB;">stats</span>::step()</span></span>
<span><span class="co">#&gt; <span style="color: #0000BB;"></span> Dig deeper into tidy modeling with R at <span style="color: #00BB00;">https://www.tmwr.org</span></span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://msberends.github.io/AMR/">AMR</a></span><span class="op">)</span> <span class="co"># For AMR data analysis</span></span>
<span></span>
<span><span class="co"># Load the example_isolates dataset</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/data.html" class="external-link">data</a></span><span class="op">(</span><span class="st">"example_isolates"</span><span class="op">)</span> <span class="co"># Preloaded dataset with AMR results</span></span>
<span></span>
<span><span class="co"># Select relevant columns for prediction</span></span>
<span><span class="va">data</span> <span class="op">&lt;-</span> <span class="va">example_isolates</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># select AB results dynamically</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html" class="external-link">select</a></span><span class="op">(</span><span class="va">mo</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># replace NAs with NI (not-interpretable)</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html" class="external-link">mutate</a></span><span class="op">(</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="op">~</span><span class="fu">replace_na</span><span class="op">(</span><span class="va">.x</span>, <span class="st">"NI"</span><span class="op">)</span><span class="op">)</span>,</span>
<span> <span class="co"># convert SIR columns to integers</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="va">as.integer</span><span class="op">)</span>,</span>
<span> <span class="co"># get Gramstain of microorganisms</span></span>
<span> mo <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/factor.html" class="external-link">as.factor</a></span><span class="op">(</span><span class="fu"><a href="../reference/mo_property.html">mo_gramstain</a></span><span class="op">(</span><span class="va">mo</span><span class="op">)</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># drop NAs - the ones without a Gramstain (fungi, etc.)</span></span>
<span> <span class="fu">drop_na</span><span class="op">(</span><span class="op">)</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>
dynamically select columns for antibiotics in these classes.</li>
<li>
<code>drop_na()</code> removes rows with missing values (here, isolates
without a Gram stain), so the model receives complete cases for
training.</li>
</ul>
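<p>To see what the <code>NA</code> replacement and the integer
conversion do to a single column, here is a minimal sketch on a plain
base R factor rather than a real <code>sir</code> vector. The level order
<code>S</code>, <code>SDD</code>, <code>I</code>, <code>R</code>,
<code>NI</code> is an assumption made for illustration; check
<code>levels()</code> on your own data before relying on it.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Illustration only: a base R factor standing in for a SIR column.
# ASSUMPTION: this level order mirrors the one used by the AMR package.
sir_levels &lt;- c("S", "SDD", "I", "R", "NI")
toy_sir &lt;- factor(c("S", "R", NA, "I"), levels = sir_levels)

# Replace missing results with "NI", as replace_na() does above
toy_sir[is.na(toy_sir)] &lt;- "NI"

# as.integer() returns the position of each value in the level order
toy_int &lt;- as.integer(toy_sir)
toy_int
#&gt; [1] 1 4 5 3</code></pre></div>
<p>Under this assumption the model treats the codes as plain numbers, so
<code>NI</code> becomes the largest value rather than a separate
category. Keep that in mind when interpreting coefficients.</p>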
</div>
<div class="section level3">
<h3 id="defining-the-workflow">
<strong>Defining the Workflow</strong><a class="anchor" aria-label="anchor" href="#defining-the-workflow"></a>
</h3>
<p>We now define the <code>tidymodels</code> workflow, which consists of
three steps: preprocessing, model specification, and bundling both into
a workflow.</p>
<div class="section level4">
<h4 id="preprocessing-with-a-recipe">1. Preprocessing with a Recipe<a class="anchor" aria-label="anchor" href="#preprocessing-with-a-recipe"></a>
</h4>
<p>We create a recipe to preprocess the data for modelling.</p>
<div class="sourceCode" id="cb2"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Define the recipe for data preprocessing</span></span>
<span><span class="va">resistance_recipe</span> <span class="op">&lt;-</span> <span class="fu">recipe</span><span class="op">(</span><span class="va">mo</span> <span class="op">~</span> <span class="va">.</span>, data <span class="op">=</span> <span class="va">data</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">step_corr</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>, threshold <span class="op">=</span> <span class="fl">0.9</span><span class="op">)</span></span>
<span><span class="va">resistance_recipe</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;">──</span> <span style="font-weight: bold;">Recipe</span> <span style="color: #00BBBB;">──────────────────────────────────────────────────────────────────────</span></span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Inputs</span></span>
<span><span class="co">#&gt; Number of variables by role</span></span>
<span><span class="co">#&gt; outcome: 1</span></span>
<span><span class="co">#&gt; predictor: 20</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Operations</span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;"></span> Correlation filter on: <span style="color: #0000BB;">c(aminoglycosides(), betalactams())</span></span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>recipe(mo ~ ., data = data)</code> will take the
<code>mo</code> column as outcome and all other columns as
predictors.</li>
<li>
<code>step_corr()</code> removes predictors (i.e., antibiotic
columns) whose absolute correlation with another predictor exceeds
0.9.</li>
</ul>
<p>Notice how the recipe contains just the antibiotic selector functions;
there is no need to name the columns individually.</p>
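<p>As a rough illustration of what a correlation filter looks for, the
sketch below computes absolute pairwise correlations on a small made-up
data frame (the columns <code>a</code>, <code>b</code> and <code>c</code>
are hypothetical) and lists the pairs above the 0.9 threshold. It uses
base R's <code>cor()</code> and is not the implementation behind
<code>step_corr()</code>, which additionally decides which column of each
pair to drop.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Hypothetical toy data: b rises almost linearly with a, c does not
toy &lt;- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 11),
  c = c(5, 1, 4, 2, 3)
)

# Absolute pairwise correlations; zero out the diagonal and lower
# triangle so that each pair of columns is considered only once
abs_cor &lt;- abs(cor(toy))
abs_cor[lower.tri(abs_cor, diag = TRUE)] &lt;- 0

# Pairs of columns whose absolute correlation exceeds the threshold
high &lt;- which(abs_cor &gt; 0.9, arr.ind = TRUE)
high_pairs &lt;- data.frame(
  var1 = rownames(abs_cor)[high[, "row"]],
  var2 = colnames(abs_cor)[high[, "col"]]
)
high_pairs</code></pre></div>
<p>In this toy data only the pair <code>a</code>/<code>b</code> is
flagged, so a filter with <code>threshold = 0.9</code> would keep just
one of those two columns.</p>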
</div>
<div class="section level4">
<h4 id="specifying-the-model">2. Specifying the Model<a class="anchor" aria-label="anchor" href="#specifying-the-model"></a>
</h4>
<p>We define a logistic regression model since predicting the Gram stain
(Gram-negative or Gram-positive) is a binary classification task.</p>
<div class="sourceCode" id="cb3"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Specify a logistic regression model</span></span>
<span><span class="va">logistic_model</span> <span class="op">&lt;-</span> <span class="fu">logistic_reg</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">set_engine</span><span class="op">(</span><span class="st">"glm"</span><span class="op">)</span> <span class="co"># Use the Generalized Linear Model engine</span></span>
<span><span class="va">logistic_model</span></span>
<span><span class="co">#&gt; Logistic Regression Model Specification (classification)</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Computational engine: glm</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>logistic_reg()</code> sets up a logistic regression
model.</li>
<li>
<code>set_engine("glm")</code> specifies the use of R's built-in GLM
engine.</li>
</ul>
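<p>For readers who want to see roughly what this engine does underneath,
the following sketch fits a logistic regression directly with
<code>glm()</code> and <code>family = binomial</code>. It uses the
built-in <code>mtcars</code> data purely as a stand-in (not AMR data),
and the 0.5 cut-off is an assumption chosen for illustration.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Stand-in example on built-in data (not AMR data):
# predict transmission type (am: 0 or 1) from car weight (wt)
toy_fit &lt;- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities that am equals 1
toy_prob &lt;- predict(toy_fit, type = "response")

# ASSUMPTION: classify with a 0.5 cut-off
toy_class &lt;- ifelse(toy_prob &gt; 0.5, 1, 0)

# Share of correctly classified rows in the data used for fitting
toy_accuracy &lt;- mean(toy_class == mtcars$am)
toy_accuracy</code></pre></div>
<p>The accuracy here is measured on the same rows that were used for
fitting, so it is optimistic; the workflow on this page evaluates on a
held-out testing set instead.</p>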
</div>
<div class="section level4">
<h4 id="building-the-workflow">3. Building the Workflow<a class="anchor" aria-label="anchor" href="#building-the-workflow"></a>
</h4>
<p>We bundle the recipe and model together into a <code>workflow</code>,
which organizes the entire modeling process.</p>
<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Combine the recipe and model into a workflow</span></span>
<span><span class="va">resistance_workflow</span> <span class="op">&lt;-</span> <span class="fu">workflow</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">add_recipe</span><span class="op">(</span><span class="va">resistance_recipe</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span> <span class="co"># Add the preprocessing recipe</span></span>
<span> <span class="fu">add_model</span><span class="op">(</span><span class="va">logistic_model</span><span class="op">)</span> <span class="co"># Add the logistic regression model</span></span></code></pre></div>
</div>
</div>
<div class="section level3">
<h3 id="training-and-evaluating-the-model">
<strong>Training and Evaluating the Model</strong><a class="anchor" aria-label="anchor" href="#training-and-evaluating-the-model"></a>
</h3>
<p>To train the model, we split the data into training and testing sets.
Then, we fit the workflow on the training set and evaluate its
performance.</p>
<div class="sourceCode" id="cb5"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Split data into training and testing sets</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/Random.html" class="external-link">set.seed</a></span><span class="op">(</span><span class="fl">123</span><span class="op">)</span> <span class="co"># For reproducibility</span></span>
<span><span class="va">data_split</span> <span class="op">&lt;-</span> <span class="fu">initial_split</span><span class="op">(</span><span class="va">data</span>, prop <span class="op">=</span> <span class="fl">0.8</span><span class="op">)</span> <span class="co"># 80% training, 20% testing</span></span>
<span><span class="va">training_data</span> <span class="op">&lt;-</span> <span class="fu">training</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Training set</span></span>
<span><span class="va">testing_data</span> <span class="op">&lt;-</span> <span class="fu">testing</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Testing set</span></span>
<span></span>
<span><span class="co"># Fit the workflow to the training data</span></span>
<span><span class="va">fitted_workflow</span> <span class="op">&lt;-</span> <span class="va">resistance_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">fit</span><span class="op">(</span><span class="va">training_data</span><span class="op">)</span> <span class="co"># Train the model</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>initial_split()</code> splits the data into training and
testing sets.</li>
<li>
<code>fit()</code> trains the workflow on the training set.</li>
</ul>
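<p>The idea behind an 80/20 split can be sketched with base R alone. The
snippet below works on ten hypothetical row numbers; it is not how
<code>initial_split()</code> is implemented, and the rows it selects will
generally differ from those chosen by <code>rsample</code>.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Illustration only: split ten hypothetical rows 80/20
set.seed(123) # For reproducibility
n_rows &lt;- 10
all_rows &lt;- seq_len(n_rows)

# Draw 80% of the row numbers at random, without replacement
train_rows &lt;- sample(all_rows, size = floor(0.8 * n_rows))

# The remaining rows form the testing set
test_rows &lt;- setdiff(all_rows, train_rows)

length(train_rows)
#&gt; [1] 8
length(test_rows)
#&gt; [1] 2</code></pre></div>
<p>Every row ends up in exactly one of the two sets, which is what keeps
the evaluation on the testing set independent of the training step.</p>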
<p>Notice how the antibiotic selector functions are called again inside
<code>fit()</code>: they are stored in the recipe, so they are
re-evaluated against the training data when the recipe is prepared.</p>
<p>Next, we evaluate the model on the testing data.</p>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Make predictions on the testing set</span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Generate predictions</span></span>
<span><span class="va">probabilities</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span>, type <span class="op">=</span> <span class="st">"prob"</span><span class="op">)</span> <span class="co"># Generate probabilities</span></span>
<span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">probabilities</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Combine with true labels</span></span>
<span></span>
<span><span class="va">predictions</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 394 × 24</span></span></span>
<span><span class="co">#&gt; .pred_class `.pred_Gram-negative` `.pred_Gram-positive` mo GEN TOB</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 1</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 2</span> Gram-positive 3.17<span style="color: #949494;">e</span><span style="color: #BB0000;">- 8</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 5 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 3</span> Gram-negative 9.99<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 1.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 3</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 4</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 5</span> Gram-negative 9.46<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 5.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 2</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 6</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 7</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 1 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 8</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 9</span> Gram-negative 1 <span style="color: #949494;">e</span>+ 0 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> Gram-n… 1 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">10</span> Gram-positive 6.05<span style="color: #949494;">e</span><span style="color: #BB0000;">-11</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 384 more rows</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 18 more variables: AMK &lt;int&gt;, KAN &lt;int&gt;, PEN &lt;int&gt;, OXA &lt;int&gt;, FLC &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># AMX &lt;int&gt;, AMC &lt;int&gt;, AMP &lt;int&gt;, TZP &lt;int&gt;, CZO &lt;int&gt;, FEP &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># CXM &lt;int&gt;, FOX &lt;int&gt;, CTX &lt;int&gt;, CAZ &lt;int&gt;, CRO &lt;int&gt;, IPM &lt;int&gt;, MEM &lt;int&gt;</span></span></span>
<span></span>
<span><span class="co"># Evaluate model performance</span></span>
<span><span class="va">metrics</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">metrics</span><span class="op">(</span>truth <span class="op">=</span> <span class="va">mo</span>, estimate <span class="op">=</span> <span class="va">.pred_class</span><span class="op">)</span> <span class="co"># Calculate performance metrics</span></span>
<span></span>
<span><span class="va">metrics</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 2 × 3</span></span></span>
<span><span class="co">#&gt; .metric .estimator .estimate</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">1</span> accuracy binary 0.995</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">2</span> kap binary 0.989</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict()</a></code> generates predictions on the testing
set.</li>
<li>
<code>metrics()</code> computes evaluation metrics like accuracy and
kappa.</li>
</ul>
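<p>As a rough illustration of what these two metrics measure, the sketch
below recomputes accuracy and Cohen's kappa by hand in base R. The labels
are hypothetical toy values, not the data from this article, and the object
names (<code>truth</code>, <code>estimate</code>, <code>kappa_value</code>)
are chosen for illustration only.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Hypothetical toy labels for illustration only (not the article's data)
lvls &lt;- c("Gram-negative", "Gram-positive")
truth &lt;- factor(lvls[c(1, 1, 1, 2, 2, 2, 2, 1)], levels = lvls)
estimate &lt;- factor(lvls[c(1, 1, 2, 2, 2, 2, 1, 1)], levels = lvls)

# Confusion matrix: rows are the true classes, columns the predicted classes
cm &lt;- table(truth, estimate)
n &lt;- sum(cm)

# Accuracy: share of observations on the diagonal (correct predictions)
accuracy &lt;- sum(diag(cm)) / n

# Agreement expected by chance, derived from the row and column totals
expected &lt;- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: accuracy corrected for chance agreement
kappa_value &lt;- (accuracy - expected) / (1 - expected)

accuracy
kappa_value</code></pre></div>
<p>With these toy labels, accuracy is 0.75 and kappa is 0.5. Kappa is
lower because half of the agreement would be expected by chance alone.</p>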
<p>The model predicts the Gram stain with an accuracy of 0.995 on the
testing set, using only the AMR results of aminoglycosides and beta-lactam
antibiotics. The ROC curve looks like this:</p>
<div class="sourceCode" id="cb7"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">roc_curve</span><span class="op">(</span><span class="va">mo</span>, <span class="va">`.pred_Gram-negative`</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/autoplot.html" class="external-link">autoplot</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
<p><img src="AMR_with_tidymodels_files/figure-html/unnamed-chunk-7-1.png" width="720"></p>
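<p>To make explicit what such a curve plots, here is a minimal base R
sketch that computes ROC points by hand: for each threshold on the predicted
probability, it takes the true positive rate and the false positive rate.
The scores and labels are hypothetical toy values, and this sketches the
general idea rather than how <code>roc_curve()</code> is implemented
internally.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R"># Hypothetical toy data: 1 = Gram-negative (the event), 0 = Gram-positive
truth &lt;- c(1, 1, 0, 1, 0, 0)
score &lt;- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1) # predicted probability of the event

# Use every observed score as a threshold, from strictest to most lenient
thresholds &lt;- sort(unique(score), decreasing = TRUE)

# True positive rate (sensitivity) and false positive rate (1 - specificity)
tpr &lt;- sapply(thresholds, function(t) mean(score[truth == 1] &gt;= t))
fpr &lt;- sapply(thresholds, function(t) mean(score[truth == 0] &gt;= t))

# Prepend the (0, 0) corner, then integrate with the trapezoidal rule
fpr_all &lt;- c(0, fpr)
tpr_all &lt;- c(0, tpr)
auc &lt;- sum(diff(fpr_all) * (head(tpr_all, -1) + tail(tpr_all, -1)) / 2)

data.frame(threshold = thresholds, tpr = tpr, fpr = fpr)
auc</code></pre></div>
<p>A curve that hugs the top-left corner, with an area close to 1, means
that the predicted probabilities separate the two classes well at almost any
threshold.</p>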
</div>
<div class="section level3">
<h3 id="conclusion">
<strong>Conclusion</strong><a class="anchor" aria-label="anchor" href="#conclusion"></a>
</h3>
<p>In this post, we demonstrated how to build a machine learning
pipeline with the <code>tidymodels</code> framework and the
<code>AMR</code> package. By combining selector functions like
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code> with
<code>tidymodels</code>, we efficiently prepared data, trained a model,
and evaluated its performance.</p>
<p>This workflow is extensible to other antibiotic classes and
resistance patterns, empowering users to analyse AMR data systematically
and reproducibly.</p>
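<p>As a sketch of how the same data preparation could be pointed at other
antibiotic classes, the example below swaps in the
<code>fluoroquinolones()</code> and <code>tetracyclines()</code> selectors.
It assumes that the <code>AMR</code>, <code>dplyr</code> and
<code>tidyr</code> packages are installed and uses the
<code>example_isolates</code> data set that ships with <code>AMR</code>. The
object name <code>other_data</code> is hypothetical, and whether these
classes predict the Gram stain well has not been tested here.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">library(AMR)
library(dplyr)
library(tidyr)

# Assumption: same preparation steps as before, with different selectors
other_data &lt;- example_isolates %&gt;%
  # keep the microorganism code and the columns of the chosen classes
  select(mo, fluoroquinolones(), tetracyclines()) %&gt;%
  mutate(
    # outcome: Gram stain derived from the microorganism code
    mo = as.factor(mo_gramstain(mo)),
    # predictors: SIR results converted to integers
    across(where(is.sir), as.integer)
  ) %&gt;%
  # drop rows with missing values in any column
  drop_na()

glimpse(other_data)</code></pre></div>
<p>The resulting data frame could then go through the same splitting,
training and evaluation steps as the original one.</p>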
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>
</div>
<div class="pkgdown-footer-right">
<p><a target="_blank" href="https://www.rug.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_rug.svg" style="max-width: 150px;"></a><a target="_blank" href="https://www.umcg.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_umcg.svg" style="max-width: 150px;"></a></p>
</div>
</footer>
</div>
</body>
</html>