% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggplot_pca.R
\name{ggplot_pca}
\alias{ggplot_pca}
\title{PCA Biplot with \code{ggplot2}}
\source{
The \code{\link[=ggplot_pca]{ggplot_pca()}} function is based on the \code{ggbiplot()} function from the \code{ggbiplot} package by Vince Vu, as found on GitHub: \url{https://github.com/vqv/ggbiplot} (retrieved: 2 March 2020, their latest commit: \href{https://github.com/vqv/ggbiplot/commit/7325e880485bea4c07465a0304c470608fffb5d9}{\code{7325e88}}; 12 February 2015).
As required by their GPL-2 licence, which demands documentation of code changes, the following changes were made to the source code:
\enumerate{
\item Rewrote the code to remove the dependency on packages \code{plyr}, \code{scales} and \code{grid}
\item Parametrised more options, like arrow and ellipse settings
\item Hardened all input possibilities by defining the exact type of user input for every argument
\item Added the total amount of explained variance as a caption to the plot
\item Cleaned all syntax based on the \code{lintr} package, fixed grammatical errors and added integrity checks
\item Updated the documentation
}
}
\usage{
ggplot_pca(
x,
choices = 1:2,
scale = 1,
pc.biplot = TRUE,
labels = NULL,
labels_textsize = 3,
labels_text_placement = 1.5,
groups = NULL,
ellipse = TRUE,
ellipse_prob = 0.68,
ellipse_size = 0.5,
ellipse_alpha = 0.5,
points_size = 2,
points_alpha = 0.25,
arrows = TRUE,
arrows_colour = "darkblue",
arrows_size = 0.5,
arrows_textsize = 3,
arrows_textangled = TRUE,
arrows_alpha = 0.75,
base_textsize = 10,
...
)
}
\arguments{
\item{x}{an object returned by \code{\link[=pca]{pca()}}, \code{\link[=prcomp]{prcomp()}} or \code{\link[=princomp]{princomp()}}}
\item{choices}{
a length-2 vector specifying the components to plot. Only the default
is a biplot in the strict sense.
}
\item{scale}{
The variables are scaled by \code{lambda ^ scale} and the
observations are scaled by \code{lambda ^ (1-scale)} where
\code{lambda} are the singular values as computed by
\code{\link[stats]{princomp}}. Normally \code{0 <= scale <= 1}, and a warning
will be issued if the specified \code{scale} is outside this range.
}
\item{pc.biplot}{
If \code{TRUE}, use what Gabriel (1971) refers to as a "principal component
biplot", with \code{lambda = 1} and observations scaled up by \code{sqrt(n)} and
variables scaled down by \code{sqrt(n)}. Then inner products between
variables approximate covariances, and distances between observations
approximate Mahalanobis distance.
}
\item{labels}{an optional vector of labels for the observations. If set, the labels will be placed below their respective points. When using the \code{\link[=pca]{pca()}} function as input for \code{x}, this will be determined automatically based on the attribute \code{non_numeric_cols}, see \code{\link[=pca]{pca()}}.}
\item{labels_textsize}{the size of the text used for the labels}
\item{labels_text_placement}{adjustment factor for the placement of the variable names (\verb{>=1} means further away from the arrow heads)}
\item{groups}{an optional vector of groups for the labels, with the same length as \code{labels}. If set, the points and labels will be coloured according to these groups. When using the \code{\link[=pca]{pca()}} function as input for \code{x}, this will be determined automatically based on the attribute \code{non_numeric_cols}, see \code{\link[=pca]{pca()}}.}
\item{ellipse}{a \link{logical} to indicate whether a normal data ellipse should be drawn for each group (set with \code{groups})}
\item{ellipse_prob}{the size of the ellipse, expressed as the normal probability that it covers}
\item{ellipse_size}{the size of the ellipse line}
\item{ellipse_alpha}{the alpha (transparency) of the ellipse line}
\item{points_size}{the size of the points}
\item{points_alpha}{the alpha (transparency) of the points}
\item{arrows}{a \link{logical} to indicate whether arrows should be drawn}
\item{arrows_colour}{the colour of the arrows and their text}
\item{arrows_size}{the size (thickness) of the arrow lines}
\item{arrows_textsize}{the size of the text at the end of the arrows}
\item{arrows_textangled}{a \link{logical} to indicate whether the text at the end of the arrows should be angled}
\item{arrows_alpha}{the alpha (transparency) of the arrows and their text}
\item{base_textsize}{the text size for all plot elements except the labels and arrows}
\item{...}{arguments passed on to functions}
}
\description{
Produces a \code{ggplot2} variant of a so-called \href{https://en.wikipedia.org/wiki/Biplot}{biplot} for PCA (principal component analysis) that is more flexible and more appealing than the base \R \code{\link[=biplot]{biplot()}} function.
}
\details{
The colours for labels and points can be changed by adding another scale layer for colour, such as \code{scale_colour_viridis_d()} or \code{scale_colour_brewer()}.
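
The scaling described for the \code{scale} and \code{pc.biplot} arguments can be sketched in base \R as below. This is a minimal illustration under stated assumptions, not the internal code of \code{ggplot_pca()}: it assumes a \code{\link[=prcomp]{prcomp()}} result, whose \code{sdev} equals the singular values of the centred (and possibly scaled) data divided by \code{sqrt(n - 1)}, and \code{biplot_coords()} is a hypothetical helper that is not part of this package. With \code{scale = 0} the observations equal the PCA scores and the variables equal the loadings; with \code{scale = 1} the observations become the unit-length left singular vectors.

\preformatted{
# Hypothetical helper (not part of AMR): biplot coordinates for a prcomp() fit.
# Variables are scaled by lambda^scale, observations by lambda^(1 - scale),
# where lambda holds the singular values of the centred data.
biplot_coords <- function(pc, choices = 1:2, scale = 1, pc_biplot = FALSE) {
  n <- nrow(pc$x)
  lambda <- pc$sdev * sqrt(n - 1)        # undo the sqrt(n - 1) divisor of prcomp()
  u <- sweep(pc$x, 2, lambda, FUN = "/") # left singular vectors (unit-length columns)
  obs <- sweep(u[, choices, drop = FALSE], 2, lambda[choices]^(1 - scale), FUN = "*")
  vars <- sweep(pc$rotation[, choices, drop = FALSE], 2, lambda[choices]^scale, FUN = "*")
  if (pc_biplot) {
    obs <- obs * sqrt(n)                 # observations scaled up by sqrt(n)
    vars <- vars / sqrt(n)               # variables scaled down by sqrt(n)
  }
  list(observations = obs, variables = vars)
}

coords <- biplot_coords(prcomp(USArrests, scale. = TRUE))
head(coords$observations)
}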
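
The \code{ellipse_prob} argument sets the normal probability that each group's data ellipse is meant to cover. A minimal sketch of one common construction, assumed here rather than taken from the internal code, stretches a unit circle with the Cholesky factor of the group's 2 x 2 covariance matrix and scales it by the square root of the chi-squared quantile with 2 degrees of freedom, so that every point on the ellipse lies at the same Mahalanobis distance from the group mean. The helper \code{data_ellipse()} is hypothetical and not part of this package.

\preformatted{
# Hypothetical helper (not part of AMR): points on a normal data ellipse
# that covers probability 'prob' of a bivariate normal fitted to (x, y).
data_ellipse <- function(x, y, prob = 0.68, n_points = 100) {
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- cbind(cos(theta), sin(theta))           # unit circle
  sigma <- stats::var(cbind(x, y))                  # 2 x 2 covariance matrix
  radius <- sqrt(stats::qchisq(prob, df = 2))       # Mahalanobis radius for 'prob'
  ell <- crossprod(t(circle), chol(sigma)) * radius # matrix product of circle and chol(sigma)
  sweep(ell, 2, c(mean(x), mean(y)), FUN = "+")     # centre on the group mean
}

set.seed(1)
pts <- data_ellipse(rnorm(50), rnorm(50), prob = 0.68)
}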
}
\examples{
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
\donttest{
if (require("dplyr")) {
  # calculate the resistance per group first
  resistance_data <- example_isolates \%>\%
    group_by(
      order = mo_order(mo), # group on anything, like order
      genus = mo_genus(mo)  # and genus, as we do here
    ) \%>\%
    filter(n() >= 30) \%>\% # keep only groups with at least 30 isolates
    summarise_if(is.sir, resistance) # then get the resistance of all drugs

  # now conduct PCA for certain antimicrobial drugs
  pca_result <- resistance_data \%>\%
    pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

  summary(pca_result)

  # old base R plotting method:
  biplot(pca_result, main = "Base R biplot")

  # new ggplot2 plotting method using this package:
  if (require("ggplot2")) {
    ggplot_pca(pca_result) +
      labs(title = "ggplot2 biplot")
  }
  if (require("ggplot2")) {
    # still extensible with any ggplot2 function
    ggplot_pca(pca_result) +
      scale_colour_viridis_d() +
      labs(title = "ggplot2 biplot")
  }
}
}
}