mirror of
https://github.com/msberends/AMR.git
synced 2025-07-20 21:33:14 +02:00
(v3.0.0.9003) eucast_rules fix, new tidymodels integration
122
man/amr-tidymodels.Rd
Normal file
@@ -0,0 +1,122 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/tidymodels.R
+\name{amr-tidymodels}
+\alias{amr-tidymodels}
+\alias{all_mic}
+\alias{all_mic_predictors}
+\alias{all_sir}
+\alias{all_sir_predictors}
+\alias{step_mic_log2}
+\alias{step_sir_numeric}
+\title{AMR Extensions for Tidymodels}
+\usage{
+all_mic()
+
+all_mic_predictors()
+
+all_sir()
+
+all_sir_predictors()
+
+step_mic_log2(recipe, ..., role = NA, trained = FALSE, columns = NULL,
+  skip = FALSE, id = recipes::rand_id("mic_log2"))
+
+step_sir_numeric(recipe, ..., role = NA, trained = FALSE, columns = NULL,
+  skip = FALSE, id = recipes::rand_id("sir_numeric"))
+}
+\arguments{
+\item{recipe}{A recipe object. The step will be added to the sequence of
+operations for this recipe.}
+
+\item{...}{One or more selector functions to choose variables for this step.
+See \code{\link[recipes:selections]{selections()}} for more details.}
+
+\item{role}{Not used by this step since no new variables are created.}
+
+\item{trained}{A logical to indicate if the quantities for preprocessing have
+been estimated.}
+
+\item{skip}{A logical. Should the step be skipped when the recipe is baked by
+\code{\link[recipes:bake]{bake()}}? While all operations are baked when \code{\link[recipes:prep]{prep()}} is run, some
+operations may not be able to be conducted on new data (e.g. processing the
+outcome variable(s)). Care should be taken when using \code{skip = TRUE} as it
+may affect the computations for subsequent operations.}
+
+\item{id}{A character string that is unique to this step to identify it.}
+}
+\description{
+This family of functions allows using AMR-specific data types such as \verb{<mic>} and \verb{<sir>} inside \code{tidymodels} pipelines.
+}
+\details{
+You can read more in our online \href{https://amr-for-r.org/articles/AMR_with_tidymodels.html}{AMR with tidymodels introduction}.
+
+Tidyselect helpers include:
+\itemize{
+\item \code{\link[=all_mic]{all_mic()}} and \code{\link[=all_mic_predictors]{all_mic_predictors()}} to select \verb{<mic>} columns
+\item \code{\link[=all_sir]{all_sir()}} and \code{\link[=all_sir_predictors]{all_sir_predictors()}} to select \verb{<sir>} columns
+}
+
+Pre-processing pipeline steps include:
+\itemize{
+\item \code{\link[=step_mic_log2]{step_mic_log2()}} to convert MIC columns to numeric (via \code{as.numeric()}) and apply a log2 transform, to be used with \code{\link[=all_mic_predictors]{all_mic_predictors()}}
+\item \code{\link[=step_sir_numeric]{step_sir_numeric()}} to convert SIR columns to numeric (via \code{as.numeric()}), to be used with \code{\link[=all_sir_predictors]{all_sir_predictors()}}: \code{"S"} = 1, \code{"I"}/\code{"SDD"} = 2, \code{"R"} = 3. All other values are rendered \code{NA}. Keep this in mind for further processing, especially if the model does not allow for \code{NA} values.
+}
+
+These steps integrate with \code{recipes::recipe()} and work like standard preprocessing steps. They are useful for preparing data for modelling, especially with classification models.
+}
+\examples{
+library(tidymodels)
+
+# The below approach formed the basis for this paper: DOI 10.3389/fmicb.2025.1582703
+# Presence of ESBL genes was predicted based on raw MIC values.
+
+
+# example data set in the AMR package
+esbl_isolates
+
+# Prepare a binary outcome and convert to ordered factor
+data <- esbl_isolates \%>\%
+  mutate(esbl = factor(esbl, levels = c(FALSE, TRUE), ordered = TRUE))
+
+# Split into training and testing sets
+split <- initial_split(data)
+training_data <- training(split)
+testing_data <- testing(split)
+
+# Create and prep a recipe with MIC log2 transformation
+mic_recipe <- recipe(esbl ~ ., data = training_data) \%>\%
+  # Optionally remove non-predictive variables
+  remove_role(genus, old_role = "predictor") \%>\%
+  # Apply the log2 transformation to all MIC predictors
+  step_mic_log2(all_mic_predictors()) \%>\%
+  prep()
+
+# View prepped recipe
+mic_recipe
+
+# Apply the recipe to training and testing data
+out_training <- bake(mic_recipe, new_data = NULL)
+out_testing <- bake(mic_recipe, new_data = testing_data)
+
+# Fit a logistic regression model
+fitted <- logistic_reg(mode = "classification") \%>\%
+  set_engine("glm") \%>\%
+  fit(esbl ~ ., data = out_training)
+
+# Generate predictions on the test set
+predictions <- predict(fitted, out_testing) \%>\%
+  bind_cols(out_testing)
+
+# Evaluate predictions using standard classification metrics
+our_metrics <- metric_set(accuracy, kap, ppv, npv)
+metrics <- our_metrics(predictions, truth = esbl, estimate = .pred_class)
+
+# Show performance:
+# - negative predictive value (NPV) of ~98\%
+# - positive predictive value (PPV) of ~94\%
+metrics
+}
+\seealso{
+\code{\link[recipes:recipe]{recipes::recipe()}}, \code{\link[=as.mic]{as.mic()}}, \code{\link[=as.sir]{as.sir()}}
+}
+\keyword{internal}
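The two preprocessing steps documented in the file above can be illustrated without any packages. The following is a minimal base R sketch of the arithmetic only, assuming MIC values are already plain positive numbers and SIR results are plain character strings. The helper names `mic_log2_sketch()` and `sir_numeric_sketch()` are hypothetical and are not part of the AMR package; the real steps work on `<mic>` and `<sir>` columns inside a recipe.

```r
# Hypothetical helpers for illustration only; not part of the AMR package.

# log2-transform MIC values (assumed here to be plain positive numbers)
mic_log2_sketch <- function(x) {
  log2(as.numeric(x))
}

# Map SIR interpretations to numbers as documented:
# "S" = 1, "I"/"SDD" = 2, "R" = 3, anything else becomes NA
sir_numeric_sketch <- function(x) {
  lookup <- c(S = 1, I = 2, SDD = 2, R = 3)
  unname(lookup[as.character(x)])
}

mic_values <- c(0.25, 1, 8, 64)
sir_values <- c("S", "I", "SDD", "R", "NI", NA)

mic_log2_out <- mic_log2_sketch(mic_values)
sir_numeric_out <- sir_numeric_sketch(sir_values)

print(mic_log2_out)
print(sir_numeric_out)
```

The `NA` results for `"NI"` and missing input mirror the documented warning: models that do not accept `NA` need an imputation or filtering step after this conversion.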
@@ -247,7 +247,7 @@ To determine which isolates are multi-drug resistant, be sure to run \code{\link

 The function \code{\link[=is.sir]{is.sir()}} detects if the input contains class \code{sir}. If the input is a \link{data.frame} or \link{list}, it iterates over all columns/items and returns a \link{logical} vector.

-The base R function \code{\link[=as.double]{as.double()}} can be used to retrieve quantitative values from a \code{sir} object: \code{"S"} = 1, \code{"I"}/\code{"SDD"} = 2, \code{"R"} = 3. All other values are rendered \code{NA} . \strong{Note:} Do not use \code{as.integer()}, since that (because of how R works internally) will return the factor level indices, and not these aforementioned quantitative values.
+The base R function \code{\link[=as.double]{as.double()}} can be used to retrieve quantitative values from a \code{sir} object: \code{"S"} = 1, \code{"I"}/\code{"SDD"} = 2, \code{"R"} = 3. All other values are rendered \code{NA}. \strong{Note:} Do not use \code{as.integer()}, since that (because of how R works internally) will return the factor level indices, and not these aforementioned quantitative values.

 The function \code{\link[=is_sir_eligible]{is_sir_eligible()}} returns \code{TRUE} when a column contains at most 5\% potentially invalid antimicrobial interpretations, and \code{FALSE} otherwise. The threshold of 5\% can be set with the \code{threshold} argument. If the input is a \link{data.frame}, it iterates over all columns and returns a \link{logical} vector.
 }
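The note about `as.integer()` in the hunk above follows from how R stores factors. The sketch below uses a plain base R factor as a stand-in for a `sir` vector; the level order `c("S", "SDD", "I", "R", "NI")` is an assumption made for illustration and is not taken from this diff.

```r
# Plain factor standing in for a <sir> vector; the level order is an assumption.
x <- factor(c("S", "I", "SDD", "R"), levels = c("S", "SDD", "I", "R", "NI"))

# as.integer() on a factor returns the position of each value in levels(x),
# which depends on the level order rather than on clinical meaning
level_index <- as.integer(x)

# An explicit lookup reproduces the documented values: S = 1, I/SDD = 2, R = 3
lookup <- c(S = 1, SDD = 2, I = 2, R = 3, NI = NA)
quantitative <- unname(lookup[as.character(x)])

print(level_index)
print(quantitative)
```

With this level order, `"I"` has level index 3 and `"R"` has level index 4, while their documented quantitative values are 2 and 3, so the two conversions disagree.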
27
man/esbl_isolates.Rd
Normal file
@@ -0,0 +1,27 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/data.R
+\docType{data}
+\name{esbl_isolates}
+\alias{esbl_isolates}
+\title{Data Set with 500 ESBL Isolates}
+\format{
+A \link[tibble:tibble]{tibble} with 500 observations and 19 variables:
+\itemize{
+\item \code{esbl}\cr Logical indicator if the isolate is ESBL-producing
+\item \code{genus}\cr Genus of the microorganism
+\item \code{AMC:COL}\cr MIC values for 17 antimicrobial agents, transformed to class \code{\link{mic}} (see \code{\link[=as.mic]{as.mic()}})
+}
+}
+\usage{
+esbl_isolates
+}
+\description{
+A data set containing 500 microbial isolates with MIC values of common antibiotics and a binary \code{esbl} column for extended-spectrum beta-lactamase (ESBL) production. This data set contains randomised fictitious data but reflects reality and can be used to practise AMR-related machine learning, e.g., classification modelling with \href{https://amr-for-r.org/articles/AMR_with_tidymodels.html}{tidymodels}.
+}
+\details{
+See our \link[=amr-tidymodels]{tidymodels integration} for an example using this data set.
+}
+\examples{
+esbl_isolates
+}
+\keyword{datasets}
@@ -7,19 +7,25 @@
 \alias{random_sir}
 \title{Random MIC Values/Disk Zones/SIR Generation}
 \usage{
-random_mic(size = NULL, mo = NULL, ab = NULL, ...)
+random_mic(size = NULL, mo = NULL, ab = NULL, skew = "right",
+  severity = 1, ...)

-random_disk(size = NULL, mo = NULL, ab = NULL, ...)
+random_disk(size = NULL, mo = NULL, ab = NULL, skew = "left",
+  severity = 1, ...)

 random_sir(size = NULL, prob_SIR = c(0.33, 0.33, 0.33), ...)
 }
 \arguments{
 \item{size}{Desired size of the returned vector. If used in a \link{data.frame} call or \code{dplyr} verb, will get the current (group) size if left blank.}

-\item{mo}{Any \link{character} that can be coerced to a valid microorganism code with \code{\link[=as.mo]{as.mo()}}.}
+\item{mo}{Any \link{character} that can be coerced to a valid microorganism code with \code{\link[=as.mo]{as.mo()}}. Can be the same length as \code{size}.}

 \item{ab}{Any \link{character} that can be coerced to a valid antimicrobial drug code with \code{\link[=as.ab]{as.ab()}}.}

+\item{skew}{Direction of skew for MIC or disk values, either \code{"right"} or \code{"left"}. A left-skewed distribution has the majority of the data on the right.}
+
+\item{severity}{Skew severity; higher values will increase the skewedness. Default is \code{1}; use \code{0} to prevent skewedness.}
+
 \item{...}{Ignored, only in place to allow future extensions.}

 \item{prob_SIR}{A vector of length 3: the probabilities for "S" (1st value), "I" (2nd value) and "R" (3rd value).}
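The `prob_SIR` argument above is a vector of three probabilities, for "S", "I" and "R" in that order. A minimal sketch of that idea with base R `sample()` follows. The function name `random_sir_sketch()` is hypothetical and this is not the AMR implementation; in particular, the real `random_sir()` returns an object of class `sir` rather than a character vector.

```r
# Hypothetical stand-in for random_sir(); not the AMR implementation.
random_sir_sketch <- function(size, prob_SIR = c(0.33, 0.33, 0.33)) {
  stopifnot(length(prob_SIR) == 3)
  # sample() rescales the weights internally, so they need not sum to exactly 1
  sample(c("S", "I", "R"), size = size, replace = TRUE, prob = prob_SIR)
}

set.seed(1)
mostly_resistant <- random_sir_sketch(500, prob_SIR = c(0.1, 0.1, 0.8))
print(table(mostly_resistant))
```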
@@ -31,17 +37,25 @@ class \code{mic} for \code{\link[=random_mic]{random_mic()}} (see \code{\link[=a
 These functions can be used for generating random MIC values and disk diffusion diameters, for AMR data analysis practice. By providing a microorganism and antimicrobial drug, the generated results will reflect reality as much as possible.
 }
 \details{
-The base \R function \code{\link[=sample]{sample()}} is used for generating values.
-
-Generated values are based on the EUCAST 2025 guideline as implemented in the \link{clinical_breakpoints} data set. To create specific generated values per bug or drug, set the \code{mo} and/or \code{ab} argument.
+Internally, MIC and disk zone values are sampled based on clinical breakpoints defined in the \link{clinical_breakpoints} data set. To create specific generated values per bug or drug, set the \code{mo} and/or \code{ab} argument. The MICs are sampled on a log2 scale and disks linearly, using weighted probabilities. The weights are based on the \code{skew} and \code{severity} arguments:
+\itemize{
+\item \code{skew = "right"} places more emphasis on lower MIC or higher disk values.
+\item \code{skew = "left"} places more emphasis on higher MIC or lower disk values.
+\item \code{severity} controls the exponential bias applied.
+}
 }
 \examples{
 random_mic(25)
 random_disk(25)
 random_sir(25)

+# add more skewedness, make more realistic by setting a bug and/or drug:
+disks <- random_disk(100, severity = 2, mo = "Escherichia coli", ab = "CIP")
+plot(disks)
+# `plot()` and `ggplot2::autoplot()` allow for coloured bars if `mo` and `ab` are set
+plot(disks, mo = "Escherichia coli", ab = "CIP", guideline = "CLSI 2025")
+
 \donttest{
-# make the random generation more realistic by setting a bug and/or drug:
 random_mic(25, "Klebsiella pneumoniae") # range 0.0625-64
 random_mic(25, "Klebsiella pneumoniae", "meropenem") # range 0.0625-16
 random_mic(25, "Streptococcus pneumoniae", "meropenem") # range 0.0625-4
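The `\details` text in the hunk above describes weighted sampling of MICs on a log2 scale, with `skew` and `severity` shaping the weights. The sketch below shows one way such weights could be built in base R. The exponential formula, the candidate range and the function name `sample_skewed_mic()` are all assumptions for illustration; the diff does not show the actual AMR implementation, and disk zones (where the documented direction is reversed) are left out.

```r
# Hypothetical sketch; the weighting formula is an assumption, not the AMR implementation.
sample_skewed_mic <- function(candidates, size, skew = c("right", "left"), severity = 1) {
  skew <- match.arg(skew)
  candidates <- sort(candidates)
  # Relative position of each candidate, from 0 (lowest MIC) to 1 (highest MIC)
  position <- seq(0, 1, length.out = length(candidates))
  # "right" puts more weight on lower MICs, "left" on higher MICs
  if (skew == "right") position <- 1 - position
  # Exponential bias; severity = 0 gives equal weights (no skew)
  weights <- exp(severity * position)
  idx <- sample.int(length(candidates), size = size, replace = TRUE,
                    prob = weights / sum(weights))
  candidates[idx]
}

set.seed(42)
mic_candidates <- 2^(-4:6)  # evenly spaced on a log2 scale: 0.0625 to 64

mics_right <- sample_skewed_mic(mic_candidates, 1000, skew = "right", severity = 3)
mics_left <- sample_skewed_mic(mic_candidates, 1000, skew = "left", severity = 3)

print(median(mics_right))
print(median(mics_left))
```

Sampling indices with `sample.int()` rather than passing the values to `sample()` avoids the base R behaviour where a single numeric candidate is expanded to `1:x`.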