mirror of
https://github.com/msberends/AMR.git
synced 2026-05-14 01:10:45 +02:00
* Add parallel computing support to antibiogram() and wisca() (#281)
  For WISCA: simulations are distributed across (group, chunk) job pairs via future.apply::future_lapply(), keeping all workers active even when the regimen count is smaller than nbrOfWorkers(). Sequential fallback with progress ticker is preserved when parallel = FALSE or workers = 1.
  For grouped antibiograms: each group is processed by a separate worker, mirroring the row-batch approach in as.sir().
  Same gate pattern as as.sir() (PR #280): requires a non-sequential future::plan() to be active; auto-upgrades to parallel = TRUE when a parallel plan is detected; throws an informative error otherwise.
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
* Fix version to 3.0.1.9055 and update CLAUDE.md version formula
  Uses origin/${defaultbranch} (with a fetch) instead of the local branch ref so the commit count is never stale after a merge.
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
* Fix non-ASCII characters in antibiogram.R
  Replace en/em dashes and non-breaking spaces with ASCII equivalents to satisfy R CMD check portability requirement.
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
* Update auto-generated Rd files after documentation rebuild
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
* Move parallel gate to top of antibiogram.default() like sir.R
  The gate was inside the wisca==TRUE block, so parallel=TRUE with a sequential plan was silently ignored for non-WISCA antibiograms. Now the gate runs unconditionally at the top of the function, identical to the as.sir() pattern: error on explicit parallel=TRUE with sequential plan, auto-upgrade when a non-sequential plan is already active.
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
* Fix parallel WISCA returning all NA; strengthen tests; add sequential hint
  Bug: lapply() over a factor yields length-1 factor elements (integer codes), while for() over a factor yields character strings. The job list stored j\$group as a factor integer, but the reassembly loop compared it with identical(j\$group, g) where g was character -- always FALSE, so no simulation chunks were ever assembled and coverage stayed NA throughout.
  Fix: convert unique_groups to character before building jobs so both the job list and the reassembly loop use the same type.
  Tests: replaced na.rm = TRUE guards with explicit anyNA() checks so the test suite would have caught the all-NA result immediately.
  Also adds a sequential-mode performance hint (analogous to sir.R lines 1116-1127) when simulations >= 500 and >= 3 regimens.
  https://claude.ai/code/session_01FC43syPbzhGmKgrrVNHjnF
---------
Co-authored-by: Claude <noreply@anthropic.com>
91 lines
3.7 KiB
R
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pca.R
\name{pca}
\alias{pca}
\title{Principal Component Analysis (for AMR)}
\usage{
pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
  rank. = NULL)
}
\arguments{
\item{x}{A \link{data.frame} containing \link{numeric} columns.}

\item{...}{Columns of \code{x} to be selected for PCA. These can be unquoted, since quasiquotation is supported.}

\item{retx}{a logical value indicating whether the rotated variables
should be returned.}

\item{center}{a logical value indicating whether the variables
should be shifted to be zero centered. Alternatively, a vector of
length equal to the number of columns of \code{x} can be supplied.
The value is passed to \code{scale}.}

\item{scale.}{a logical value indicating whether the variables should
be scaled to have unit variance before the analysis takes
place. The default here is \code{TRUE}, since in general scaling is
advisable; \code{\link[=prcomp]{prcomp()}} itself defaults to
\code{FALSE} for consistency with S. Alternatively, a vector of length
equal to the number of columns of \code{x} can be supplied. The
value is passed to \code{\link{scale}}.}

\item{tol}{a value indicating the magnitude below which components
should be omitted. (Components are omitted if their
standard deviations are less than or equal to \code{tol} times the
standard deviation of the first component.) With the default null
setting, no components are omitted (unless \code{rank.} is specified
as less than \code{min(dim(x))}). Other settings for \code{tol} could be
\code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which
would omit essentially constant components.}

\item{rank.}{optionally, a number specifying the maximal rank, i.e.,
maximal number of principal components to be used. Can be set as an
alternative or in addition to \code{tol}, useful notably when the
desired rank is considerably smaller than the dimensions of the matrix.}
}
\value{
An object of classes \link{pca} and \link{prcomp}
}
\description{
Performs a principal component analysis (PCA) on a data set, automatically determining the groups and labels for later plotting and automatically keeping only suitable (i.e. non-empty and numeric) variables.
}
\details{
The \code{\link[=pca]{pca()}} function takes a \link{data.frame} as input and performs the actual PCA with the \R function \code{\link[=prcomp]{prcomp()}}.

The result of the \code{\link[=pca]{pca()}} function is a \link{prcomp} object, with an additional attribute \code{non_numeric_cols} which is a vector with the column names of all columns that do not contain \link{numeric} values. These are probably the groups and labels, and will be used by \code{\link[=ggplot_pca]{ggplot_pca()}}.
}
\examples{
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.

\donttest{
if (require("dplyr")) {
  # calculate the resistance per group first
  resistance_data <- example_isolates \%>\%
    group_by(
      order = mo_order(mo), # group on anything, such as order
      genus = mo_genus(mo)
    ) \%>\% # and genus, as we do here;
    filter(n() >= 30) \%>\% # keep only groups with at least 30 results
    summarise_if(is.sir, resistance) # then get resistance of all drugs

  # now conduct PCA for certain antimicrobial drugs
  pca_result <- resistance_data \%>\%
    pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

  pca_result
  summary(pca_result)
  # old base R plotting method:
  biplot(pca_result)
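
  # A minimal sketch, added for illustration and not part of the original
  # example: according to the Details section, the result carries an
  # attribute `non_numeric_cols` describing the non-numeric columns
  # (probably the groups and labels). Assuming that description holds,
  # it can be inspected with base R's attr():
  attr(pca_result, "non_numeric_cols")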
}

# new ggplot2 plotting method using this package:
if (require("dplyr") && require("ggplot2")) {
  ggplot_pca(pca_result)
}
if (require("dplyr") && require("ggplot2")) {
  ggplot_pca(pca_result) +
    scale_colour_viridis_d() +
    labs(title = "Title here")
}
}
}