mirror of https://github.com/msberends/AMR.git synced 2024-12-25 20:06:12 +01:00
AMR/articles/AMR_with_tidymodels.html
2024-12-20 10:03:24 +00:00


<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>AMR with tidymodels • AMR (for R)</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png">
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/Lato-0.4.9/font.css" rel="stylesheet">
<link href="../deps/Fira_Code-0.4.9/font.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<script src="../extra.js"></script><meta property="og:title" content="AMR with tidymodels">
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar navbar-expand-lg fixed-top bg-primary" data-bs-theme="dark" aria-label="Site navigation"><div class="container">
<a class="navbar-brand me-2" href="../index.html">AMR (for R)</a>
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">2.1.1.9122</small>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-how-to" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true"><span class="fa fa-question-circle"></span> How to</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-how-to">
<li><a class="dropdown-item" href="../articles/AMR.html"><span class="fa fa-directions"></span> Conduct AMR Analysis</a></li>
<li><a class="dropdown-item" href="../reference/antibiogram.html"><span class="fa fa-file-prescription"></span> Generate Antibiogram (Trad./Syndromic/WISCA)</a></li>
<li><a class="dropdown-item" href="../articles/resistance_predict.html"><span class="fa fa-dice"></span> Predict Antimicrobial Resistance</a></li>
<li><a class="dropdown-item" href="../articles/datasets.html"><span class="fa fa-database"></span> Download Data Sets for Own Use</a></li>
<li><a class="dropdown-item" href="../articles/AMR_with_tidymodels.html"><span class="fa fa-square-root-variable"></span> Use AMR for Predictive Modelling (tidymodels)</a></li>
<li><a class="dropdown-item" href="../reference/AMR-options.html"><span class="fa fa-gear"></span> Set User- Or Team-specific Package Settings</a></li>
<li><a class="dropdown-item" href="../articles/PCA.html"><span class="fa fa-compress"></span> Conduct Principal Component Analysis for AMR</a></li>
<li><a class="dropdown-item" href="../articles/MDR.html"><span class="fa fa-skull-crossbones"></span> Determine Multi-Drug Resistance (MDR)</a></li>
<li><a class="dropdown-item" href="../articles/WHONET.html"><span class="fa fa-globe-americas"></span> Work with WHONET Data</a></li>
<li><a class="dropdown-item" href="../articles/EUCAST.html"><span class="fa fa-exchange-alt"></span> Apply Eucast Rules</a></li>
<li><a class="dropdown-item" href="../reference/mo_property.html"><span class="fa fa-bug"></span> Get Taxonomy of a Microorganism</a></li>
<li><a class="dropdown-item" href="../reference/ab_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antibiotic Drug</a></li>
<li><a class="dropdown-item" href="../reference/av_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antiviral Drug</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../articles/AMR_for_Python.html"><span class="fa fab fa-python"></span> AMR for Python</a></li>
<li class="nav-item"><a class="nav-link" href="../reference/index.html"><span class="fa fa-book-open"></span> Manual</a></li>
<li class="nav-item"><a class="nav-link" href="../authors.html"><span class="fa fa-users"></span> Authors</a></li>
</ul>
<ul class="navbar-nav">
<li class="nav-item"><a class="nav-link" href="../news/index.html"><span class="fa far fa-newspaper"></span> Changelog</a></li>
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/msberends/AMR"><span class="fa fab fa-github"></span> Source Code</a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<img src="../logo.svg" class="logo" alt=""><h1>AMR with tidymodels</h1>
<small class="dont-index">Source: <a href="https://github.com/msberends/AMR/blob/main/vignettes/AMR_with_tidymodels.Rmd" class="external-link"><code>vignettes/AMR_with_tidymodels.Rmd</code></a></small>
<div class="d-none name"><code>AMR_with_tidymodels.Rmd</code></div>
</div>
<blockquote>
<p>This page was entirely written by our <a href="https://chatgpt.com/g/g-M4UNLwFi5-amr-for-r-assistant" class="external-link">AMR for R
Assistant</a>, a ChatGPT manually-trained model able to answer any
question about the AMR package.</p>
</blockquote>
<p>Antimicrobial resistance (AMR) is a global health crisis, and
understanding resistance patterns is crucial for managing effective
treatments. The <code>AMR</code> R package provides robust tools for
analysing AMR data, including convenient antibiotic selector functions
like <code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>. In
this post, we will explore how to use the <code>tidymodels</code>
framework to build a predictive model from the susceptibility results in the
<code>example_isolates</code> dataset.</p>
<p>By leveraging the power of <code>tidymodels</code> and the
<code>AMR</code> package, we'll build a reproducible machine learning
workflow that predicts the Gram stain of a microorganism from its
susceptibility results for two important antibiotic classes:
aminoglycosides and beta-lactams.</p>
<div class="section level3">
<h3 id="objective">
<strong>Objective</strong><a class="anchor" aria-label="anchor" href="#objective"></a>
</h3>
<p>Our goal is to build a predictive model using the
<code>tidymodels</code> framework to determine the Gram stain of a
microorganism based on its antimicrobial susceptibility results. We will:</p>
<ol style="list-style-type: decimal">
<li>Preprocess data using the selector functions
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>.</li>
<li>Define a logistic regression model for prediction.</li>
<li>Use a structured <code>tidymodels</code> workflow to preprocess,
train, and evaluate the model.</li>
</ol>
</div>
<div class="section level3">
<h3 id="data-preparation">
<strong>Data Preparation</strong><a class="anchor" aria-label="anchor" href="#data-preparation"></a>
</h3>
<p>We begin by loading the required libraries and preparing the
<code>example_isolates</code> dataset from the <code>AMR</code>
package.</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Load required libraries</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://tidymodels.tidymodels.org" class="external-link">tidymodels</a></span><span class="op">)</span> <span class="co"># For machine learning workflows, and data manipulation (dplyr, tidyr, ...)</span></span>
<span><span class="co">#&gt; Error in get(paste0(generic, ".", class), envir = get_method_env()) : </span></span>
<span><span class="co">#&gt; object 'type_sum.accel' not found</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Attaching packages</span> ────────────────────────────────────── tidymodels 1.2.0 ──</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">broom </span> 1.0.7 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">recipes </span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">dials </span> 1.3.0 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">rsample </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">dplyr </span> 1.1.4 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tibble </span> 3.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">ggplot2 </span> 3.5.1 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tidyr </span> 1.3.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">infer </span> 1.0.7 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">tune </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">modeldata </span> 1.4.0 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">workflows </span> 1.1.4</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">parsnip </span> 1.2.1 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">workflowsets</span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;"></span> <span style="color: #0000BB;">purrr </span> 1.0.2 <span style="color: #00BB00;"></span> <span style="color: #0000BB;">yardstick </span> 1.3.1</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">purrr</span>::<span style="color: #00BB00;">discard()</span> masks <span style="color: #0000BB;">scales</span>::discard()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">filter()</span> masks <span style="color: #0000BB;">stats</span>::filter()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">lag()</span> masks <span style="color: #0000BB;">stats</span>::lag()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;"></span> <span style="color: #0000BB;">recipes</span>::<span style="color: #00BB00;">step()</span> masks <span style="color: #0000BB;">stats</span>::step()</span></span>
<span><span class="co">#&gt; <span style="color: #0000BB;"></span> Use <span style="color: #00BB00;">tidymodels_prefer()</span> to resolve common conflicts.</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://msberends.github.io/AMR/">AMR</a></span><span class="op">)</span> <span class="co"># For AMR data analysis</span></span>
<span></span>
<span><span class="co"># Load the example_isolates dataset</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/data.html" class="external-link">data</a></span><span class="op">(</span><span class="st">"example_isolates"</span><span class="op">)</span> <span class="co"># Preloaded dataset with AMR results</span></span>
<span></span>
<span><span class="co"># Select relevant columns for prediction</span></span>
<span><span class="va">data</span> <span class="op">&lt;-</span> <span class="va">example_isolates</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># select AB results dynamically</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html" class="external-link">select</a></span><span class="op">(</span><span class="va">mo</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># replace NAs with NI (not-interpretable)</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html" class="external-link">mutate</a></span><span class="op">(</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="op">~</span><span class="fu">replace_na</span><span class="op">(</span><span class="va">.x</span>, <span class="st">"NI"</span><span class="op">)</span><span class="op">)</span>,</span>
<span> <span class="co"># convert SIR columns to integers</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="va">as.integer</span><span class="op">)</span>,</span>
<span> <span class="co"># get Gram stain of microorganisms</span></span>
<span> mo <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/factor.html" class="external-link">as.factor</a></span><span class="op">(</span><span class="fu"><a href="../reference/mo_property.html">mo_gramstain</a></span><span class="op">(</span><span class="va">mo</span><span class="op">)</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># drop NAs - the ones without a Gram stain (fungi, etc.)</span></span>
<span> <span class="fu">drop_na</span><span class="op">(</span><span class="op">)</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>
dynamically select columns for antibiotics in these classes.</li>
<li>
<code>drop_na()</code> ensures the model receives complete cases for
training.</li>
</ul>
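<p>To make the integer conversion concrete, here is a minimal base R sketch
that mimics what happens to a single SIR column. It uses a plain factor as a
stand-in for the <code>sir</code> class, and the level order
<code>S, SDD, I, R, NI</code> is an assumption about the installed
<code>AMR</code> version; the names <code>sir_levels</code>,
<code>toy_sir</code> and <code>toy_int</code> are hypothetical.</p>

```r
# Stand-in for one SIR column: a plain factor with an ASSUMED level order.
# The real sir class in your AMR version may define its levels differently.
sir_levels <- c("S", "SDD", "I", "R", "NI")
toy_sir <- factor(c("S", NA, "R", "I"), levels = sir_levels)

# Replace missing results with "NI", as the pipeline above does
toy_sir[is.na(toy_sir)] <- "NI"

# Converting a factor to integer returns the position of each level
toy_int <- as.integer(toy_sir)
print(toy_int)
```

<p>Under this assumed level order, untested results end up with the highest
integer code, so the model treats "not tested" as an ordinary numeric value.
Whether that encoding is appropriate depends on the analysis.</p>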
</div>
<div class="section level3">
<h3 id="defining-the-workflow">
<strong>Defining the Workflow</strong><a class="anchor" aria-label="anchor" href="#defining-the-workflow"></a>
</h3>
<p>We now define the <code>tidymodels</code> workflow, which consists of
three steps: preprocessing, model specification, and fitting.</p>
<div class="section level4">
<h4 id="preprocessing-with-a-recipe">1. Preprocessing with a Recipe<a class="anchor" aria-label="anchor" href="#preprocessing-with-a-recipe"></a>
</h4>
<p>We create a recipe to preprocess the data for modelling.</p>
<div class="sourceCode" id="cb2"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Define the recipe for data preprocessing</span></span>
<span><span class="va">resistance_recipe</span> <span class="op">&lt;-</span> <span class="fu">recipe</span><span class="op">(</span><span class="va">mo</span> <span class="op">~</span> <span class="va">.</span>, data <span class="op">=</span> <span class="va">data</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">step_corr</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>, threshold <span class="op">=</span> <span class="fl">0.9</span><span class="op">)</span></span>
<span><span class="va">resistance_recipe</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;">──</span> <span style="font-weight: bold;">Recipe</span> <span style="color: #00BBBB;">──────────────────────────────────────────────────────────────────────</span></span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Inputs</span></span>
<span><span class="co">#&gt; Number of variables by role</span></span>
<span><span class="co">#&gt; outcome: 1</span></span>
<span><span class="co">#&gt; predictor: 20</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Operations</span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;"></span> Correlation filter on: <span style="color: #0000BB;">c(aminoglycosides(), betalactams())</span></span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>recipe(mo ~ ., data = data)</code> will take the
<code>mo</code> column as outcome and all other columns as
predictors.</li>
<li>
<code>step_corr()</code> removes predictors (i.e., antibiotic
columns) whose absolute correlation with another predictor exceeds 0.9.</li>
</ul>
<p>Notice how the recipe contains just the antibiotic selector functions
- no need to define the columns specifically.</p>
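<p>The idea behind a correlation filter can be sketched in base R on toy
data. This is only an illustration of flagging highly correlated pairs; it is
not the algorithm <code>recipes</code> uses to decide which column of a pair
to drop. The data frame <code>toy</code> and its columns are hypothetical.</p>

```r
# Toy predictors: b is nearly a copy of a, c is unrelated noise
set.seed(1)
a <- rnorm(100)
toy <- data.frame(
  a = a,
  b = a + rnorm(100, sd = 0.1),
  c = rnorm(100)
)

# Absolute pairwise correlations; keep only the lower triangle
# so that every pair is inspected once
cors <- abs(cor(toy))
cors[upper.tri(cors, diag = TRUE)] <- 0

# Pairs above the same 0.9 threshold used in the recipe
high <- which(cors > 0.9, arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]]
)
print(flagged)
```

<p>Here only the pair <code>b</code> and <code>a</code> is flagged, so a filter
would keep one of the two and leave <code>c</code> untouched.</p>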
</div>
<div class="section level4">
<h4 id="specifying-the-model">2. Specifying the Model<a class="anchor" aria-label="anchor" href="#specifying-the-model"></a>
</h4>
<p>We define a logistic regression model since predicting the Gram stain
(Gram-negative versus Gram-positive) is a binary classification task.</p>
<div class="sourceCode" id="cb3"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Specify a logistic regression model</span></span>
<span><span class="va">logistic_model</span> <span class="op">&lt;-</span> <span class="fu">logistic_reg</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">set_engine</span><span class="op">(</span><span class="st">"glm"</span><span class="op">)</span> <span class="co"># Use the Generalized Linear Model engine</span></span>
<span><span class="va">logistic_model</span></span>
<span><span class="co">#&gt; Logistic Regression Model Specification (classification)</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Computational engine: glm</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>logistic_reg()</code> sets up a logistic regression
model.</li>
<li>
<code>set_engine("glm")</code> specifies the use of R's built-in GLM
engine.</li>
</ul>
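<p>For intuition, the kind of model the <code>glm</code> engine fits can be
sketched directly with <code>stats::glm()</code> and a binomial family. The
data below are simulated and purely illustrative: <code>x</code> is a
hypothetical numeric predictor and <code>y</code> a two-level outcome.</p>

```r
# Simulated toy data: larger x makes "Gram-positive" more likely
set.seed(42)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "Gram-positive", "Gram-negative"))

# With a factor outcome, glm() models the probability of the second level
# (here "Gram-positive", because factor levels are sorted alphabetically)
toy_fit <- glm(y ~ x, family = binomial)

# Predicted probability of "Gram-positive" at a low and a high value of x
p <- predict(toy_fit, newdata = data.frame(x = c(-2, 2)), type = "response")
print(round(p, 3))
```

<p>The fitted probability rises with <code>x</code>, which is the behaviour
the simulation was built to produce.</p>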
</div>
<div class="section level4">
<h4 id="building-the-workflow">3. Building the Workflow<a class="anchor" aria-label="anchor" href="#building-the-workflow"></a>
</h4>
<p>We bundle the recipe and model together into a <code>workflow</code>,
which organizes the entire modeling process.</p>
<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Combine the recipe and model into a workflow</span></span>
<span><span class="va">resistance_workflow</span> <span class="op">&lt;-</span> <span class="fu">workflow</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">add_recipe</span><span class="op">(</span><span class="va">resistance_recipe</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span> <span class="co"># Add the preprocessing recipe</span></span>
<span> <span class="fu">add_model</span><span class="op">(</span><span class="va">logistic_model</span><span class="op">)</span> <span class="co"># Add the logistic regression model</span></span></code></pre></div>
</div>
</div>
<div class="section level3">
<h3 id="training-and-evaluating-the-model">
<strong>Training and Evaluating the Model</strong><a class="anchor" aria-label="anchor" href="#training-and-evaluating-the-model"></a>
</h3>
<p>To train the model, we split the data into training and testing sets.
Then, we fit the workflow on the training set and evaluate its
performance.</p>
<div class="sourceCode" id="cb5"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Split data into training and testing sets</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/Random.html" class="external-link">set.seed</a></span><span class="op">(</span><span class="fl">123</span><span class="op">)</span> <span class="co"># For reproducibility</span></span>
<span><span class="va">data_split</span> <span class="op">&lt;-</span> <span class="fu">initial_split</span><span class="op">(</span><span class="va">data</span>, prop <span class="op">=</span> <span class="fl">0.8</span><span class="op">)</span> <span class="co"># 80% training, 20% testing</span></span>
<span><span class="va">training_data</span> <span class="op">&lt;-</span> <span class="fu">training</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Training set</span></span>
<span><span class="va">testing_data</span> <span class="op">&lt;-</span> <span class="fu">testing</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Testing set</span></span>
<span></span>
<span><span class="co"># Fit the workflow to the training data</span></span>
<span><span class="va">fitted_workflow</span> <span class="op">&lt;-</span> <span class="va">resistance_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">fit</span><span class="op">(</span><span class="va">training_data</span><span class="op">)</span> <span class="co"># Train the model</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>initial_split()</code> splits the data into training and
testing sets.</li>
<li>
<code>fit()</code> trains the workflow on the training set.</li>
</ul>
<p>Notice how in <code>fit()</code>, the antibiotic selector functions
are called again internally. This happens because the selectors are stored in
the recipe and are evaluated when the recipe is applied to the training data.</p>
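<p>Conceptually, an 80/20 random split amounts to sampling row indices
without replacement. The base R sketch below shows this on a hypothetical toy
data frame; it does not reproduce the exact rows or rounding that
<code>initial_split()</code> would give.</p>

```r
# Toy data with an id column so the two sets can be checked for overlap
set.seed(123)
toy <- data.frame(id = 1:50, value = rnorm(50))

# Draw 80% of the row indices at random for training
train_idx <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# Every row lands in exactly one of the two sets
print(c(train = nrow(toy_train), test = nrow(toy_test)))
```

<p>If the two Gram stain classes are imbalanced, the <code>strata</code>
argument of <code>initial_split()</code> may be worth considering, so that
both sets keep similar class proportions.</p>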
<p>Next, we evaluate the model on the testing data.</p>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Make predictions on the testing set</span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Generate predictions</span></span>
<span><span class="va">probabilities</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span>, type <span class="op">=</span> <span class="st">"prob"</span><span class="op">)</span> <span class="co"># Generate probabilities</span></span>
<span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">probabilities</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Combine with true labels</span></span>
<span></span>
<span><span class="va">predictions</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 394 × 24</span></span></span>
<span><span class="co">#&gt; .pred_class `.pred_Gram-negative` `.pred_Gram-positive` mo GEN TOB</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 1</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 2</span> Gram-positive 3.17<span style="color: #949494;">e</span><span style="color: #BB0000;">- 8</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 5 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 3</span> Gram-negative 9.99<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 1.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 3</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 4</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 5</span> Gram-negative 9.46<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 5.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 2</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 6</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 7</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 1 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 8</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 9</span> Gram-negative 1 <span style="color: #949494;">e</span>+ 0 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> Gram-n… 1 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">10</span> Gram-positive 6.05<span style="color: #949494;">e</span><span style="color: #BB0000;">-11</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 384 more rows</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 18 more variables: AMK &lt;int&gt;, KAN &lt;int&gt;, PEN &lt;int&gt;, OXA &lt;int&gt;, FLC &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># AMX &lt;int&gt;, AMC &lt;int&gt;, AMP &lt;int&gt;, TZP &lt;int&gt;, CZO &lt;int&gt;, FEP &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># CXM &lt;int&gt;, FOX &lt;int&gt;, CTX &lt;int&gt;, CAZ &lt;int&gt;, CRO &lt;int&gt;, IPM &lt;int&gt;, MEM &lt;int&gt;</span></span></span>
<span></span>
<span><span class="co"># Evaluate model performance</span></span>
<span><span class="va">metrics</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">metrics</span><span class="op">(</span>truth <span class="op">=</span> <span class="va">mo</span>, estimate <span class="op">=</span> <span class="va">.pred_class</span><span class="op">)</span> <span class="co"># Calculate performance metrics</span></span>
<span></span>
<span><span class="va">metrics</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 2 × 3</span></span></span>
<span><span class="co">#&gt; .metric .estimator .estimate</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">1</span> accuracy binary 0.995</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">2</span> kap binary 0.989</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict()</a></code> generates predictions on the testing
set.</li>
<li>
<code>metrics()</code> computes evaluation metrics such as accuracy
and Cohen's kappa (reported as <code>kap</code>).</li>
</ul>
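<p>To make the two reported metrics concrete, the sketch below computes
accuracy and Cohen's kappa by hand in base R. The labels are hypothetical
toy values, not the article's data, and the variable names
(<code>truth</code>, <code>estimate</code>, <code>cm</code>,
<code>kap</code>) are chosen for illustration only.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Toy truth and predicted labels (hypothetical values, not the article's data)
truth    &lt;- factor(c("Gram-negative", "Gram-negative", "Gram-positive",
                     "Gram-positive", "Gram-positive", "Gram-negative"))
estimate &lt;- factor(c("Gram-negative", "Gram-positive", "Gram-positive",
                     "Gram-positive", "Gram-positive", "Gram-negative"),
                   levels = levels(truth))

# Confusion matrix: rows are the truth, columns are the predictions
cm &lt;- table(truth, estimate)
n  &lt;- sum(cm)

# Accuracy: share of observations on the diagonal of the confusion matrix
accuracy &lt;- sum(diag(cm)) / n

# Cohen's kappa: observed agreement corrected for chance agreement
expected &lt;- sum(rowSums(cm) * colSums(cm)) / n^2
kap      &lt;- (accuracy - expected) / (1 - expected)

accuracy
kap</code></pre></div>
<p>Kappa is lower than accuracy whenever part of the agreement could be
expected by chance, which is why both are worth reading together when the
classes are imbalanced.</p>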
<p>On the testing set, the model predicts the Gram stain from the AMR
results of aminoglycosides and beta-lactam antibiotics with an accuracy
of 0.995. The ROC curve looks like this:</p>
<div class="sourceCode" id="cb7"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">roc_curve</span><span class="op">(</span><span class="va">mo</span>, <span class="va">`.pred_Gram-negative`</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/autoplot.html" class="external-link">autoplot</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
<p><img src="AMR_with_tidymodels_files/figure-html/unnamed-chunk-7-1.png" width="720"></p>
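<p>A ROC curve is often summarised by the area under it (AUC). As a
minimal sketch of what that number means, the base R code below computes
the AUC with the rank-sum (Mann-Whitney) identity on hypothetical toy
values. The names <code>prob_neg</code> and <code>is_event</code> are
assumptions made for this illustration. In a <code>tidymodels</code>
workflow, <code>roc_auc()</code> from <code>yardstick</code> would
normally be used instead, with the same truth and probability columns as
<code>roc_curve()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Toy data (hypothetical values): true class and the predicted probability
# of the event class "Gram-negative"
truth    &lt;- factor(c("Gram-negative", "Gram-negative", "Gram-negative",
                     "Gram-positive", "Gram-positive", "Gram-positive"))
prob_neg &lt;- c(0.95, 0.80, 0.40, 0.60, 0.20, 0.05)

is_event &lt;- truth == "Gram-negative"
n_event  &lt;- sum(is_event)
n_other  &lt;- sum(!is_event)

# AUC: the share of (event, non-event) pairs in which the event received
# the higher predicted probability, computed from the ranks
ranks &lt;- rank(prob_neg)
auc   &lt;- (sum(ranks[is_event]) - n_event * (n_event + 1) / 2) /
  (n_event * n_other)

auc</code></pre></div>
<p>An AUC of 1 means every Gram-negative isolate was ranked above every
Gram-positive isolate; 0.5 corresponds to random ranking.</p>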
</div>
<div class="section level3">
<h3 id="conclusion">
<strong>Conclusion</strong><a class="anchor" aria-label="anchor" href="#conclusion"></a>
</h3>
<p>In this post, we demonstrated how to build a machine learning
pipeline with the <code>tidymodels</code> framework and the
<code>AMR</code> package. By combining selector functions like
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code> with
<code>tidymodels</code>, we efficiently prepared data, trained a model,
and evaluated its performance.</p>
<p>This workflow is extensible to other antibiotic classes and
resistance patterns, empowering users to analyse AMR data systematically
and reproducibly.</p>
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>
</div>
<div class="pkgdown-footer-right">
<p><a target="_blank" href="https://www.rug.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_rug.svg" style="max-width: 150px;"></a><a target="_blank" href="https://www.umcg.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_umcg.svg" style="max-width: 150px;"></a></p>
</div>
</footer>
</div>
</body>
</html>