mirror of https://github.com/msberends/AMR.git synced 2024-12-26 06:46:11 +01:00
AMR/articles/AMR_with_tidymodels.html
2024-12-19 19:25:10 +00:00

<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>AMR with tidymodels • AMR (for R)</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png">
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/Lato-0.4.9/font.css" rel="stylesheet">
<link href="../deps/Fira_Code-0.4.9/font.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<script src="../extra.js"></script><meta property="og:title" content="AMR with tidymodels">
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar navbar-expand-lg fixed-top bg-primary" data-bs-theme="dark" aria-label="Site navigation"><div class="container">
<a class="navbar-brand me-2" href="../index.html">AMR (for R)</a>
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">2.1.1.9121</small>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-how-to" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true"><span class="fa fa-question-circle"></span> How to</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-how-to">
<li><a class="dropdown-item" href="../articles/AMR.html"><span class="fa fa-directions"></span> Conduct AMR Analysis</a></li>
<li><a class="dropdown-item" href="../reference/antibiogram.html"><span class="fa fa-file-prescription"></span> Generate Antibiogram (Trad./Syndromic/WISCA)</a></li>
<li><a class="dropdown-item" href="../articles/resistance_predict.html"><span class="fa fa-dice"></span> Predict Antimicrobial Resistance</a></li>
<li><a class="dropdown-item" href="../articles/datasets.html"><span class="fa fa-database"></span> Download Data Sets for Own Use</a></li>
<li><a class="dropdown-item" href="../articles/AMR_with_tidymodels.html"><span class="fa fa-square-root-variable"></span> Use AMR for Predictive Modelling (tidymodels)</a></li>
<li><a class="dropdown-item" href="../reference/AMR-options.html"><span class="fa fa-gear"></span> Set User- Or Team-specific Package Settings</a></li>
<li><a class="dropdown-item" href="../articles/PCA.html"><span class="fa fa-compress"></span> Conduct Principal Component Analysis for AMR</a></li>
<li><a class="dropdown-item" href="../articles/MDR.html"><span class="fa fa-skull-crossbones"></span> Determine Multi-Drug Resistance (MDR)</a></li>
<li><a class="dropdown-item" href="../articles/WHONET.html"><span class="fa fa-globe-americas"></span> Work with WHONET Data</a></li>
<li><a class="dropdown-item" href="../articles/EUCAST.html"><span class="fa fa-exchange-alt"></span> Apply Eucast Rules</a></li>
<li><a class="dropdown-item" href="../reference/mo_property.html"><span class="fa fa-bug"></span> Get Taxonomy of a Microorganism</a></li>
<li><a class="dropdown-item" href="../reference/ab_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antibiotic Drug</a></li>
<li><a class="dropdown-item" href="../reference/av_property.html"><span class="fa fa-capsules"></span> Get Properties of an Antiviral Drug</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../articles/AMR_for_Python.html"><span class="fa fab fa-python"></span> AMR for Python</a></li>
<li class="nav-item"><a class="nav-link" href="../reference/index.html"><span class="fa fa-book-open"></span> Manual</a></li>
<li class="nav-item"><a class="nav-link" href="../authors.html"><span class="fa fa-users"></span> Authors</a></li>
</ul>
<ul class="navbar-nav">
<li class="nav-item"><a class="nav-link" href="../news/index.html"><span class="fa far fa-newspaper"></span> Changelog</a></li>
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/msberends/AMR"><span class="fa fab fa-github"></span> Source Code</a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<img src="../logo.svg" class="logo" alt=""><h1><code>AMR</code> with <code>tidymodels</code></h1>
<small class="dont-index">Source: <a href="https://github.com/msberends/AMR/blob/main/vignettes/AMR_with_tidymodels.Rmd" class="external-link"><code>vignettes/AMR_with_tidymodels.Rmd</code></a></small>
<div class="d-none name"><code>AMR_with_tidymodels.Rmd</code></div>
</div>
<p>Antimicrobial resistance (AMR) is a global health crisis, and
understanding resistance patterns is crucial for choosing effective
treatments. The <code>AMR</code> R package provides robust tools for
analysing AMR data, including convenient antibiotic selector functions
like <code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>. In
this post, we will explore how to use the <code>tidymodels</code>
framework to build a predictive model from the resistance patterns in the
<code>example_isolates</code> dataset.</p>
<p>By leveraging the power of <code>tidymodels</code> and the
<code>AMR</code> package, we'll build a reproducible machine learning
workflow that predicts the Gram stain of a microorganism from its
resistance results for two important antibiotic classes:
aminoglycosides and beta-lactams.</p>
<hr>
<div class="section level3">
<h3 id="objective">
<strong>Objective</strong><a class="anchor" aria-label="anchor" href="#objective"></a>
</h3>
<p>Our goal is to build a predictive model using the
<code>tidymodels</code> framework to determine the Gram stain of a
microorganism based on its resistance results. We will:</p>
<ol style="list-style-type: decimal">
<li>Preprocess data using the selector functions
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code>.</li>
<li>Define a logistic regression model for prediction.</li>
<li>Use a structured <code>tidymodels</code> workflow to preprocess,
train, and evaluate the model.</li>
</ol>
<hr>
</div>
<div class="section level3">
<h3 id="data-preparation">
<strong>Data Preparation</strong><a class="anchor" aria-label="anchor" href="#data-preparation"></a>
</h3>
<p>We begin by loading the required libraries and preparing the
<code>example_isolates</code> dataset from the <code>AMR</code>
package.</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Load required libraries</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://tidymodels.tidymodels.org" class="external-link">tidymodels</a></span><span class="op">)</span> <span class="co"># For machine learning workflows, and data manipulation (dplyr, tidyr, ...)</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Attaching packages</span> ────────────────────────────────────── tidymodels 1.2.0 ──</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">broom </span> 1.0.7 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">recipes </span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">dials </span> 1.3.0 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">rsample </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">dplyr </span> 1.1.4 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">tibble </span> 3.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">ggplot2 </span> 3.5.1 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">tidyr </span> 1.3.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">infer </span> 1.0.7 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">tune </span> 1.2.1</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">modeldata </span> 1.4.0 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">workflows </span> 1.1.4</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">parsnip </span> 1.2.1 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">workflowsets</span> 1.1.0</span></span>
<span><span class="co">#&gt; <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">purrr </span> 1.0.2 <span style="color: #00BB00;">✔</span> <span style="color: #0000BB;">yardstick </span> 1.3.1</span></span>
<span><span class="co">#&gt; ── <span style="font-weight: bold;">Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;">✖</span> <span style="color: #0000BB;">purrr</span>::<span style="color: #00BB00;">discard()</span> masks <span style="color: #0000BB;">scales</span>::discard()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;">✖</span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">filter()</span> masks <span style="color: #0000BB;">stats</span>::filter()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;">✖</span> <span style="color: #0000BB;">dplyr</span>::<span style="color: #00BB00;">lag()</span> masks <span style="color: #0000BB;">stats</span>::lag()</span></span>
<span><span class="co">#&gt; <span style="color: #BB0000;">✖</span> <span style="color: #0000BB;">recipes</span>::<span style="color: #00BB00;">step()</span> masks <span style="color: #0000BB;">stats</span>::step()</span></span>
<span><span class="co">#&gt; <span style="color: #0000BB;"></span> Use <span style="color: #00BB00;">tidymodels_prefer()</span> to resolve common conflicts.</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html" class="external-link">library</a></span><span class="op">(</span><span class="va"><a href="https://msberends.github.io/AMR/">AMR</a></span><span class="op">)</span> <span class="co"># For AMR data analysis</span></span>
<span></span>
<span><span class="co"># Load the example_isolates dataset</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/data.html" class="external-link">data</a></span><span class="op">(</span><span class="st">"example_isolates"</span><span class="op">)</span> <span class="co"># Preloaded dataset with AMR results</span></span>
<span></span>
<span><span class="co"># Select relevant columns for prediction</span></span>
<span><span class="va">data</span> <span class="op">&lt;-</span> <span class="va">example_isolates</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># select AB results dynamically</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html" class="external-link">select</a></span><span class="op">(</span><span class="va">mo</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># replace NAs with NI (not-interpretable)</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html" class="external-link">mutate</a></span><span class="op">(</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="op">~</span><span class="fu">replace_na</span><span class="op">(</span><span class="va">.x</span>, <span class="st">"NI"</span><span class="op">)</span><span class="op">)</span>,</span>
<span> <span class="co"># convert SIR columns to integer codes</span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/across.html" class="external-link">across</a></span><span class="op">(</span><span class="fu"><a href="https://tidyselect.r-lib.org/reference/where.html" class="external-link">where</a></span><span class="op">(</span><span class="va">is.sir</span><span class="op">)</span>,</span>
<span> <span class="va">as.integer</span><span class="op">)</span>,</span>
<span> <span class="co"># get Gramstain of microorganisms</span></span>
<span> mo <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/factor.html" class="external-link">as.factor</a></span><span class="op">(</span><span class="fu"><a href="../reference/mo_property.html">mo_gramstain</a></span><span class="op">(</span><span class="va">mo</span><span class="op">)</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="co"># drop NAs - the ones without a Gramstain (fungi, etc.)</span></span>
<span> <span class="fu">drop_na</span><span class="op">(</span><span class="op">)</span> <span class="co"># %&gt;%</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span>
<span> <span class="co"># Cefepime is not reliable</span></span>
<span> <span class="co">#select(-FEP)</span></span></code></pre></div>
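<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and
<code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code> dynamically select the columns for antibiotics in
these classes.</li>
<li>Missing test results are recoded as <code>NI</code>, and the SIR
columns are then converted to integers so that the predictors are
numeric.</li>
<li>
<code>drop_na()</code> ensures the model receives complete cases for
training; here it removes the isolates without a Gram stain.</li>
</ul>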
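<p>Before modelling, it can help to check what the selectors pick up
and how the integer conversion maps the SIR values. The snippet below is
a minimal sketch that only assumes the <code>example_isolates</code>
data set shipped with <code>AMR</code>; the object names
<code>amino_cols</code> and <code>sir_levels</code> are made up for
illustration, and the level order may differ between <code>AMR</code>
versions.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(AMR)
library(dplyr)

# Columns picked up by the selector (also reported in the message above)
amino_cols &lt;- example_isolates %&gt;%
  select(aminoglycosides()) %&gt;%
  colnames()
amino_cols

# Level order of the SIR values; as.integer() returns the position in this order
sir_levels &lt;- levels(example_isolates$GEN)
sir_levels</code></pre></div>
<p>Because <code>as.integer()</code> returns the position of each value
in <code>sir_levels</code>, the integer codes in the prepared data can be
read back against this vector.</p>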
<hr>
</div>
<div class="section level3">
<h3 id="defining-the-workflow">
<strong>Defining the Workflow</strong><a class="anchor" aria-label="anchor" href="#defining-the-workflow"></a>
</h3>
<p>We now define the <code>tidymodels</code> workflow, which consists of
three steps: preprocessing, model specification, and combining both into
a workflow.</p>
<div class="section level4">
<h4 id="preprocessing-with-a-recipe">1. Preprocessing with a Recipe<a class="anchor" aria-label="anchor" href="#preprocessing-with-a-recipe"></a>
</h4>
<p>We create a recipe to preprocess the data for modelling. This
includes:</p>
<ul>
<li>Declaring the Gram stain (<code>mo</code>) as the outcome and all
selected antibiotic columns as predictors.</li>
<li>Removing predictors that are highly correlated with other
predictors, using <code>step_corr()</code> with a threshold of 0.9.</li>
</ul>
<div class="sourceCode" id="cb2"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Define the recipe for data preprocessing</span></span>
<span><span class="va">resistance_recipe</span> <span class="op">&lt;-</span> <span class="fu">recipe</span><span class="op">(</span><span class="va">mo</span> <span class="op">~</span> <span class="va">.</span>, data <span class="op">=</span> <span class="va">data</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">step_corr</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="fu"><a href="../reference/antibiotic_class_selectors.html">aminoglycosides</a></span><span class="op">(</span><span class="op">)</span>, <span class="fu"><a href="../reference/antibiotic_class_selectors.html">betalactams</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span>, threshold <span class="op">=</span> <span class="fl">0.9</span><span class="op">)</span></span>
<span><span class="va">resistance_recipe</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;">──</span> <span style="font-weight: bold;">Recipe</span> <span style="color: #00BBBB;">──────────────────────────────────────────────────────────────────────</span></span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Inputs</span></span>
<span><span class="co">#&gt; Number of variables by role</span></span>
<span><span class="co">#&gt; outcome: 1</span></span>
<span><span class="co">#&gt; predictor: 20</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Operations</span></span>
<span><span class="co">#&gt; <span style="color: #00BBBB;"></span> Correlation filter on: <span style="color: #0000BB;">c(aminoglycosides(), betalactams())</span></span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>recipe(mo ~ ., data = data)</code> takes the <code>mo</code>
column as the outcome and all other columns as predictors.</li>
<li>
<code>step_corr()</code> removes predictors (here, antibiotic columns)
whose absolute correlation with another predictor exceeds the threshold
of 0.9.</li>
</ul>
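<p>To see which columns a correlation filter would drop, the recipe can
be prepared and inspected before it goes into a workflow. The following
is a minimal sketch on hypothetical toy data (<code>toy</code>,
<code>x1</code> to <code>x3</code> and <code>y</code> are made-up names,
not part of <code>example_isolates</code>), in which <code>x2</code> is
almost a copy of <code>x1</code>.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(tidymodels)

# Hypothetical toy data: x2 is almost a copy of x1, x3 is only moderately related
toy &lt;- tibble(
  y  = factor(c("a", "b", "a", "b", "a", "b")),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 4, 6, 8, 10, 12.5),
  x3 = c(3, 1, 2, 6, 4, 5)
)

toy_recipe &lt;- recipe(y ~ ., data = toy) %&gt;%
  step_corr(all_numeric_predictors(), threshold = 0.9)

# prep() estimates the step; tidy() lists the columns it will drop
toy_prepped &lt;- prep(toy_recipe, training = toy)
removed &lt;- tidy(toy_prepped, number = 1)
removed

# bake() returns the processed training data
baked &lt;- bake(toy_prepped, new_data = NULL)
colnames(baked)</code></pre></div>
<p>The same <code>prep()</code> and <code>tidy()</code> calls should
work on a recipe built from real resistance data; which antibiotic
columns get removed then depends on the data used for
preparation.</p>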
</div>
<div class="section level4">
<h4 id="specifying-the-model">2. Specifying the Model<a class="anchor" aria-label="anchor" href="#specifying-the-model"></a>
</h4>
<p>We define a logistic regression model since the outcome, Gram-negative
versus Gram-positive, makes this a binary classification task.</p>
<div class="sourceCode" id="cb3"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Specify a logistic regression model</span></span>
<span><span class="va">logistic_model</span> <span class="op">&lt;-</span> <span class="fu">logistic_reg</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">set_engine</span><span class="op">(</span><span class="st">"glm"</span><span class="op">)</span> <span class="co"># Use the Generalized Linear Model engine</span></span>
<span><span class="va">logistic_model</span></span>
<span><span class="co">#&gt; Logistic Regression Model Specification (classification)</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Computational engine: glm</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>logistic_reg()</code> sets up a logistic regression model.</li>
<li>
<code>set_engine("glm")</code> specifies the use of R's built-in GLM
engine.</li>
</ul>
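<p>A model specification does not fit anything yet; it only describes
what will be fitted. As a small sketch, <code>translate()</code> from
<code>parsnip</code> prints the fitting template behind the
specification, which should correspond to the <code>stats::glm()</code>
call with a binomial family that appears in the fitted workflow
output.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(tidymodels)

logistic_model &lt;- logistic_reg() %&gt;%
  set_engine("glm")

# translate() prints the model-fitting template that parsnip will fill in
translate(logistic_model)</code></pre></div>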
</div>
<div class="section level4">
<h4 id="building-the-workflow">3. Building the Workflow<a class="anchor" aria-label="anchor" href="#building-the-workflow"></a>
</h4>
<p>We bundle the recipe and model together into a <code>workflow</code>,
which organizes the entire modeling process.</p>
<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Combine the recipe and model into a workflow</span></span>
<span><span class="va">resistance_workflow</span> <span class="op">&lt;-</span> <span class="fu">workflow</span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">add_recipe</span><span class="op">(</span><span class="va">resistance_recipe</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span> <span class="co"># Add the preprocessing recipe</span></span>
<span> <span class="fu">add_model</span><span class="op">(</span><span class="va">logistic_model</span><span class="op">)</span> <span class="co"># Add the logistic regression model</span></span>
<span><span class="va">resistance_workflow</span></span>
<span><span class="co">#&gt; ══ Workflow ════════════════════════════════════════════════════════════════════</span></span>
<span><span class="co">#&gt; <span style="font-style: italic;">Preprocessor:</span> Recipe</span></span>
<span><span class="co">#&gt; <span style="font-style: italic;">Model:</span> logistic_reg()</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Preprocessor ────────────────────────────────────────────────────────────────</span></span>
<span><span class="co">#&gt; 1 Recipe Step</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; • step_corr()</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span></span>
<span><span class="co">#&gt; Logistic Regression Model Specification (classification)</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Computational engine: glm</span></span></code></pre></div>
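<p>The parts bundled in a workflow can be pulled out again, which is
useful for checking what was actually added. This is a minimal sketch on
hypothetical toy data (<code>toy</code>, <code>toy_workflow</code> and
the column names are made up so that the example runs on its own); the
<code>extract_*()</code> helpers come from the <code>workflows</code>
package.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(tidymodels)

# Hypothetical two-class toy data, only to make the sketch runnable on its own
toy &lt;- tibble(
  y  = factor(rep(c("a", "b"), times = 10)),
  x1 = rep(c(0.1, 0.9, 0.4, 0.6), times = 5),
  x2 = seq_len(20)
)

toy_workflow &lt;- workflow() %&gt;%
  add_recipe(recipe(y ~ ., data = toy)) %&gt;%
  add_model(logistic_reg() %&gt;% set_engine("glm"))

# Pull the individual parts back out of the workflow
toy_pre  &lt;- extract_preprocessor(toy_workflow)
toy_spec &lt;- extract_spec_parsnip(toy_workflow)
class(toy_pre)
class(toy_spec)</code></pre></div>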
<hr>
</div>
</div>
<div class="section level3">
<h3 id="training-and-evaluating-the-model">
<strong>Training and Evaluating the Model</strong><a class="anchor" aria-label="anchor" href="#training-and-evaluating-the-model"></a>
</h3>
<p>To train the model, we split the data into training and testing sets.
Then, we fit the workflow on the training set and evaluate its
performance.</p>
<div class="sourceCode" id="cb5"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Split data into training and testing sets</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/Random.html" class="external-link">set.seed</a></span><span class="op">(</span><span class="fl">123</span><span class="op">)</span> <span class="co"># For reproducibility</span></span>
<span><span class="va">data_split</span> <span class="op">&lt;-</span> <span class="fu">initial_split</span><span class="op">(</span><span class="va">data</span>, prop <span class="op">=</span> <span class="fl">0.8</span><span class="op">)</span> <span class="co"># 80% training, 20% testing</span></span>
<span><span class="va">training_data</span> <span class="op">&lt;-</span> <span class="fu">training</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Training set</span></span>
<span><span class="va">testing_data</span> <span class="op">&lt;-</span> <span class="fu">testing</span><span class="op">(</span><span class="va">data_split</span><span class="op">)</span> <span class="co"># Testing set</span></span>
<span></span>
<span><span class="co"># Fit the workflow to the training data</span></span>
<span><span class="va">fitted_workflow</span> <span class="op">&lt;-</span> <span class="va">resistance_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">fit</span><span class="op">(</span><span class="va">training_data</span><span class="op">)</span> <span class="co"># Train the model</span></span>
<span><span class="co">#&gt; For aminoglycosides() using columns 'GEN' (gentamicin), 'TOB'</span></span>
<span><span class="co">#&gt; (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)</span></span>
<span><span class="co">#&gt; For betalactams() using columns 'PEN' (benzylpenicillin), 'OXA'</span></span>
<span><span class="co">#&gt; (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'</span></span>
<span><span class="co">#&gt; (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'</span></span>
<span><span class="co">#&gt; (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'</span></span>
<span><span class="co">#&gt; (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),</span></span>
<span><span class="co">#&gt; 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)</span></span>
<span></span>
<span><span class="va">fitted_workflow</span></span>
<span><span class="co">#&gt; ══ Workflow [trained] ══════════════════════════════════════════════════════════</span></span>
<span><span class="co">#&gt; <span style="font-style: italic;">Preprocessor:</span> Recipe</span></span>
<span><span class="co">#&gt; <span style="font-style: italic;">Model:</span> logistic_reg()</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Preprocessor ────────────────────────────────────────────────────────────────</span></span>
<span><span class="co">#&gt; 1 Recipe Step</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; • step_corr()</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Call: stats::glm(formula = ..y ~ ., family = stats::binomial, data = data)</span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Coefficients:</span></span>
<span><span class="co">#&gt; (Intercept) GEN TOB AMK KAN PEN </span></span>
<span><span class="co">#&gt; 101.11641 -3.69738 4.55879 1.86703 -23.37497 -0.57182 </span></span>
<span><span class="co">#&gt; OXA FLC AMC AMP TZP CZO </span></span>
<span><span class="co">#&gt; -4.68575 -11.69742 0.79748 -1.56197 0.87667 -2.28424 </span></span>
<span><span class="co">#&gt; FEP CXM FOX CAZ CRO IPM </span></span>
<span><span class="co">#&gt; -0.19847 0.02659 10.32455 10.27248 0.97321 -0.93096 </span></span>
<span><span class="co">#&gt; MEM </span></span>
<span><span class="co">#&gt; -0.88753 </span></span>
<span><span class="co">#&gt; </span></span>
<span><span class="co">#&gt; Degrees of Freedom: 1573 Total (i.e. Null); 1555 Residual</span></span>
<span><span class="co">#&gt; Null Deviance: 2071 </span></span>
<span><span class="co">#&gt; Residual Deviance: 74.91 AIC: 112.9</span></span></code></pre></div>
<p><strong>Explanation:</strong></p>
<ul>
<li>
<code>initial_split()</code> splits the data into training and testing
sets.</li>
<li>
<code>fit()</code> trains the workflow on the training set.</li>
</ul>
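<p>The split above is purely random. If the two outcome classes are
imbalanced, a stratified split is one option for keeping the class mix
similar in both sets. The sketch below is a suggestion rather than part
of the workflow above, and it uses hypothetical toy data
(<code>toy</code> with a made-up predictor <code>x</code>) so that it
runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(tidymodels)

set.seed(123)
# Hypothetical imbalanced outcome: 150 Gram-negative, 50 Gram-positive
toy &lt;- tibble(
  mo = factor(rep(c("Gram-negative", "Gram-positive"), times = c(150, 50))),
  x  = rnorm(200)
)

# strata = mo samples within each class, so both sets keep a similar class mix
toy_split &lt;- initial_split(toy, prop = 0.8, strata = mo)
toy_train &lt;- training(toy_split)
toy_test  &lt;- testing(toy_split)

prop.table(table(toy_train$mo))
prop.table(table(toy_test$mo))</code></pre></div>
<p>On the real data, the equivalent call would be
<code>initial_split(data, prop = 0.8, strata = mo)</code>.</p>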
<p>Next, we evaluate the model on the testing data.</p>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Make predictions on the testing set</span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Generate predictions</span></span>
<span><span class="va">probabilities</span> <span class="op">&lt;-</span> <span class="va">fitted_workflow</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict</a></span><span class="op">(</span><span class="va">testing_data</span>, type <span class="op">=</span> <span class="st">"prob"</span><span class="op">)</span> <span class="co"># Generate probabilities</span></span>
<span></span>
<span><span class="va">predictions</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">probabilities</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html" class="external-link">bind_cols</a></span><span class="op">(</span><span class="va">testing_data</span><span class="op">)</span> <span class="co"># Combine with true labels</span></span>
<span></span>
<span><span class="va">predictions</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 394 × 24</span></span></span>
<span><span class="co">#&gt; .pred_class `.pred_Gram-negative` `.pred_Gram-positive` mo GEN TOB</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span> <span style="color: #949494; font-style: italic;">&lt;fct&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span> <span style="color: #949494; font-style: italic;">&lt;int&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 1</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 2</span> Gram-positive 3.17<span style="color: #949494;">e</span><span style="color: #BB0000;">- 8</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 5 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 3</span> Gram-negative 9.99<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 1.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 3</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 4</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 5</span> Gram-negative 9.46<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 5.42<span style="color: #949494;">e</span><span style="color: #BB0000;">- 2</span> Gram-n… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 6</span> Gram-positive 1.07<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> 8.93<span style="color: #949494;">e</span><span style="color: #BB0000;">- 1</span> Gram-p… 5 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 7</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 1 5</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 8</span> Gram-positive 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> 1 <span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;"> 9</span> Gram-negative 1 <span style="color: #949494;">e</span>+ 0 2.22<span style="color: #949494;">e</span><span style="color: #BB0000;">-16</span> Gram-n… 1 1</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">10</span> Gram-positive 6.05<span style="color: #949494;">e</span><span style="color: #BB0000;">-11</span> 1.00<span style="color: #949494;">e</span>+ 0 Gram-p… 4 4</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 384 more rows</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># 18 more variables: AMK &lt;int&gt;, KAN &lt;int&gt;, PEN &lt;int&gt;, OXA &lt;int&gt;, FLC &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># AMX &lt;int&gt;, AMC &lt;int&gt;, AMP &lt;int&gt;, TZP &lt;int&gt;, CZO &lt;int&gt;, FEP &lt;int&gt;,</span></span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># CXM &lt;int&gt;, FOX &lt;int&gt;, CTX &lt;int&gt;, CAZ &lt;int&gt;, CRO &lt;int&gt;, IPM &lt;int&gt;, MEM &lt;int&gt;</span></span></span>
<span></span>
<span><span class="co"># Evaluate model performance</span></span>
<span><span class="va">metrics</span> <span class="op">&lt;-</span> <span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">metrics</span><span class="op">(</span>truth <span class="op">=</span> <span class="va">mo</span>, estimate <span class="op">=</span> <span class="va">.pred_class</span><span class="op">)</span> <span class="co"># Calculate performance metrics</span></span>
<span></span>
<span><span class="va">metrics</span></span>
<span><span class="co">#&gt; <span style="color: #949494;"># A tibble: 2 × 3</span></span></span>
<span><span class="co">#&gt; .metric .estimator .estimate</span></span>
<span><span class="co">#&gt; <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;chr&gt;</span> <span style="color: #949494; font-style: italic;">&lt;dbl&gt;</span></span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">1</span> accuracy binary 0.995</span></span>
<span><span class="co">#&gt; <span style="color: #BCBCBC;">2</span> kap binary 0.989</span></span></code></pre></div>
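<p><strong>Explanation:</strong></p>
<ul>
<li>
<code><a href="https://rdrr.io/r/stats/predict.html" class="external-link">predict()</a></code> generates predicted classes and, with <code>type = "prob"</code>, class probabilities for the testing set.</li>
<li>
<code>metrics()</code> computes default evaluation metrics; for the class predictions used here, these are accuracy and Cohen's kappa (<code>kap</code>).</li>
</ul>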
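<p>Summary metrics do not show which class the few errors fall into. As a minimal sketch, assuming the <code>predictions</code> tibble created above and that <code>yardstick</code> is attached (it is part of <code>tidymodels</code>), a confusion matrix could be produced with <code>conf_mat()</code>:</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">library(yardstick) # already attached if tidymodels is loaded

# Cross-tabulate predicted against true Gram stain on the testing set;
# assumes the `predictions` tibble created above
predictions %&gt;%
  conf_mat(truth = mo, estimate = .pred_class)</code></pre></div>
<p>The off-diagonal cells of the resulting table count the misclassified isolates per class.</p>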
<p>It appears we can predict the Gram stain of a microorganism from its AMR
results with an accuracy of 0.995 on the testing set. The ROC curve looks like:</p>
<div class="sourceCode" id="cb7"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">predictions</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu">roc_curve</span><span class="op">(</span><span class="va">mo</span>, <span class="va">`.pred_Gram-negative`</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html" class="external-link">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/autoplot.html" class="external-link">autoplot</a></span><span class="op">(</span><span class="op">)</span></span></code></pre></div>
<p><img src="AMR_with_tidymodels_files/figure-html/unnamed-chunk-7-1.png" width="720"></p>
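<p>The curve can also be summarised as a single number, the area under the ROC curve (AUC). This is a sketch under the same assumptions as the plot above: it reuses the <code>predictions</code> tibble and relies on <code>yardstick</code> treating the first factor level of <code>mo</code> (assumed here to be <code>"Gram-negative"</code>) as the event of interest.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">library(yardstick) # already attached if tidymodels is loaded

# Area under the ROC curve for the Gram-negative class probability;
# by default yardstick uses the first factor level of `mo` as the event
predictions %&gt;%
  roc_auc(mo, `.pred_Gram-negative`)</code></pre></div>
<p>If the factor levels were ordered differently, passing <code>event_level = "second"</code> would be one way to keep the probability column and the event aligned.</p>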
<hr>
</div>
<div class="section level3">
<h3 id="conclusion">
<strong>Conclusion</strong><a class="anchor" aria-label="anchor" href="#conclusion"></a>
</h3>
<p>In this article, we demonstrated how to build a machine learning
pipeline with the <code>tidymodels</code> framework and the
<code>AMR</code> package. By combining selector functions like
<code><a href="../reference/antibiotic_class_selectors.html">aminoglycosides()</a></code> and <code><a href="../reference/antibiotic_class_selectors.html">betalactams()</a></code> with
<code>tidymodels</code>, we efficiently prepared data, trained a model,
and evaluated its performance.</p>
<p>This workflow is extensible to other antibiotic classes and
resistance patterns, empowering users to analyse AMR data systematically
and reproducibly.</p>
<hr>
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p><code>AMR</code> (for R). Free and open-source, licenced under the <a target="_blank" href="https://github.com/msberends/AMR/blob/main/LICENSE" class="external-link">GNU General Public License version 2.0 (GPL-2)</a>.<br>Developed at the <a target="_blank" href="https://www.rug.nl" class="external-link">University of Groningen</a> and <a target="_blank" href="https://www.umcg.nl" class="external-link">University Medical Center Groningen</a> in The Netherlands.</p>
</div>
<div class="pkgdown-footer-right">
<p><a target="_blank" href="https://www.rug.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_rug.svg" style="max-width: 150px;"></a><a target="_blank" href="https://www.umcg.nl" class="external-link"><img src="https://github.com/msberends/AMR/raw/main/pkgdown/assets/logo_umcg.svg" style="max-width: 150px;"></a></p>
</div>
</footer>
</div>
</body>
</html>