% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pca.R
\name{pca}
\alias{pca}
\title{Principal Component Analysis (for AMR)}
\usage{
pca(
  x,
  ...,
  retx = TRUE,
  center = TRUE,
  scale. = TRUE,
  tol = NULL,
  rank. = NULL
)
}
\arguments{
\item{x}{a \link{data.frame} containing \link{numeric} columns}
\item{...}{columns of \code{x} to be selected for PCA; these can be unquoted, since quasiquotation is supported.}
\item{retx}{a logical value indicating whether the rotated variables
should be returned.}
\item{center}{a logical value indicating whether the variables
should be shifted to be zero centered. Alternatively, a vector of
length equal to the number of columns of \code{x} can be supplied.
The value is passed to \code{\link{scale}}.}
\item{scale.}{a logical value indicating whether the variables should
be scaled to have unit variance before the analysis takes
place. The default here is \code{TRUE}, since scaling is advisable in
general; note that \code{\link[=prcomp]{prcomp()}} itself defaults to
\code{FALSE} for consistency with S. Alternatively, a vector of length
equal to the number of columns of \code{x} can be supplied. The
value is passed to \code{\link{scale}}.}
\item{tol}{a value indicating the magnitude below which components
should be omitted. (Components are omitted if their
standard deviations are less than or equal to \code{tol} times the
standard deviation of the first component.) With the default null
setting, no components are omitted (unless \code{rank.} is specified
as less than \code{min(dim(x))}). Other settings for \code{tol} could be
\code{tol = 0} or \code{tol = sqrt(.Machine$double.eps)}, which
would omit essentially constant components.}
\item{rank.}{optionally, a number specifying the maximal rank, i.e.,
the maximal number of principal components to be used. Can be set as
an alternative or in addition to \code{tol}; this is notably useful when the
desired rank is considerably smaller than the dimensions of the matrix.}
}
\value{
An object of classes \link{pca} and \link{prcomp}
}
\description{
Performs a principal component analysis (PCA) on a data set. Groups and labels for later plotting are determined automatically, and only suitable (i.e. non-empty and numeric) variables are kept.
}
\details{
The \code{\link[=pca]{pca()}} function takes a \link{data.frame} as input and performs the actual PCA with the \R function \code{\link[=prcomp]{prcomp()}}.

The result of the \code{\link[=pca]{pca()}} function is a \link{prcomp} object with an additional attribute, \code{non_numeric_cols}, which is a vector containing the names of all columns that do not contain \link{numeric} values. These are probably the groups and labels, and will be used by \code{\link[=ggplot_pca]{ggplot_pca()}}.
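
As a rough illustration only (this is not the package's implementation; the toy data frame \code{df} and the object name \code{res} are hypothetical), the behaviour described above can be sketched in base \R by running \code{prcomp()} on the numeric columns and storing the remaining column names as an attribute:

\preformatted{
# hypothetical toy data: one grouping column, three numeric columns
df <- data.frame(
  genus = c("A", "A", "B", "B", "C", "C"),
  v1 = c(0.10, 0.15, 0.40, 0.45, 0.80, 0.85),
  v2 = c(0.20, 0.10, 0.50, 0.55, 0.70, 0.90),
  v3 = c(0.90, 0.80, 0.50, 0.40, 0.20, 0.10)
)

# keep only numeric columns for the actual PCA
is_num <- vapply(df, is.numeric, logical(1))
res <- stats::prcomp(df[, is_num, drop = FALSE], center = TRUE, scale. = TRUE)

# remember the non-numeric columns (assumed here to be groups/labels)
attr(res, "non_numeric_cols") <- names(df)[!is_num]

attr(res, "non_numeric_cols") # "genus"
}

With a real \code{pca()} result, the same attribute can be inspected with \code{attr(pca_result, "non_numeric_cols")}.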
}
\examples{
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.

\donttest{
if (require("dplyr")) {
  # calculate the resistance per group first
  resistance_data <- example_isolates \%>\%
    group_by(
      order = mo_order(mo), # group on anything, like order
      genus = mo_genus(mo)  # and genus, as we do here
    ) \%>\%
    filter(n() >= 30) \%>\% # keep only groups with at least 30 results
    summarise_if(is.sir, resistance) # then get resistance of all drugs

  # now conduct PCA for certain antimicrobial drugs
  pca_result <- resistance_data \%>\%
    pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

  pca_result
  summary(pca_result)

  # old base R plotting method:
  biplot(pca_result)

  # new ggplot2 plotting method using this package:
  if (require("ggplot2")) {
    ggplot_pca(pca_result)

    ggplot_pca(pca_result) +
      scale_colour_viridis_d() +
      labs(title = "Title here")
  }
}
}
}