Produces a ggplot2 variant of a so-called biplot for PCA (principal component analysis), but is more flexible and more appealing than the base R biplot() function.
ggplot_pca(
  x,
  choices = 1:2,
  scale = 1,
  pc.biplot = TRUE,
  labels = NULL,
  labels_textsize = 3,
  labels_text_placement = 1.5,
  groups = NULL,
  ellipse = TRUE,
  ellipse_prob = 0.68,
  ellipse_size = 0.5,
  ellipse_alpha = 0.5,
  points_size = 2,
  points_alpha = 0.25,
  arrows = TRUE,
  arrows_colour = "darkblue",
  arrows_size = 0.5,
  arrows_textsize = 3,
  arrows_textangled = TRUE,
  arrows_alpha = 0.75,
  base_textsize = 10,
  ...
)
The ggplot_pca() function is based on the ggbiplot() function from the ggbiplot package by Vince Vu, as found on GitHub: https://github.com/vqv/ggbiplot (retrieved: 2 March 2020, their latest commit: 7325e88; 12 February 2015).
As their GPL-2 licence requires code changes to be documented, the changes made to the source code were:
- Rewrote the code to remove the dependency on the packages plyr, scales and grid
- Parametrised more options, like arrow and ellipse settings
- Hardened all input possibilities by defining the exact type of user input for every argument
- Added the total amount of explained variance as a caption in the plot
- Cleaned all syntax based on the lintr package, fixed grammatical errors and added integrity checks
- Updated documentation
x: an object returned by pca(), prcomp() or princomp()
choices: length 2 vector specifying the components to plot. Only the default is a biplot in the strict sense.
scale: The variables are scaled by lambda ^ scale and the observations are scaled by lambda ^ (1-scale), where lambda are the singular values as computed by princomp. Normally 0 <= scale <= 1, and a warning will be issued if the specified scale is outside this range.
pc.biplot: If true, use what Gabriel (1971) refers to as a "principal component biplot", with lambda = 1 and observations scaled up by sqrt(n) and variables scaled down by sqrt(n). Then inner products between variables approximate covariances and distances between observations approximate Mahalanobis distance.
labels: an optional vector of labels for the observations. If set, the labels will be placed below their respective points. When using the pca() function as input for x, this will be determined automatically based on the attribute non_numeric_cols, see pca().
labels_textsize: the size of the text used for the labels
labels_text_placement: adjustment factor for the placement of the variable names (>=1 means further away from the arrow head)
groups: an optional vector of groups for the labels, with the same length as labels. If set, the points and labels will be coloured according to these groups. When using the pca() function as input for x, this will be determined automatically based on the attribute non_numeric_cols, see pca().
ellipse: a logical to indicate whether a normal data ellipse should be drawn for each group (set with groups)
ellipse_prob: statistical size of the ellipse in normal probability
ellipse_size: the size of the ellipse line
ellipse_alpha: the alpha (transparency) of the ellipse line
points_size: the size of the points
points_alpha: the alpha (transparency) of the points
arrows: a logical to indicate whether arrows should be drawn
arrows_colour: the colour of the arrows and their text
arrows_size: the size (thickness) of the arrow lines
arrows_textsize: the size of the text at the end of the arrows
arrows_textangled: a logical to indicate whether the text at the end of the arrows should be angled
arrows_alpha: the alpha (transparency) of the arrows and their text
base_textsize: the text size for all plot elements except the labels and arrows
...: arguments passed on to functions
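The scaling described for scale and pc.biplot can be illustrated with base R alone. The following is a minimal sketch that follows the arithmetic of base R's biplot() method for prcomp objects; that ggplot_pca() computes its coordinates in exactly the same way is an assumption, and the variable names (pca_fit, lam, obs_coords, var_coords, ellipse_radius) are chosen for illustration only. The last line shows the usual radius of a bivariate normal data ellipse for a given probability, which is presumably what ellipse_prob controls.

```r
# Sketch of the scaling arithmetic, modelled on base R's biplot.prcomp().
# Assumption: ggplot_pca() may differ in detail from what is shown here.
pca_fit <- prcomp(USArrests, scale. = TRUE)  # USArrests is a built-in data set

choices   <- 1:2    # components to plot
scale     <- 1      # exponent applied to lambda
pc.biplot <- TRUE   # "principal component biplot" in the sense of Gabriel (1971)

n   <- nrow(pca_fit$x)
lam <- pca_fit$sdev[choices] * sqrt(n)       # lambda for the chosen components
if (scale < 0 || scale > 1) warning("'scale' is outside [0, 1]")
lam <- if (scale != 0) lam^scale else rep(1, length(choices))
if (pc.biplot) lam <- lam / sqrt(n)

# observations are divided by lam, variables are multiplied by lam
obs_coords <- t(t(pca_fit$x[, choices]) / lam)
var_coords <- t(t(pca_fit$rotation[, choices]) * lam)

# radius of a bivariate normal data ellipse covering 68% probability
ellipse_radius <- sqrt(qchisq(0.68, df = 2))
```

With scale = 1 and pc.biplot = TRUE, the observation coordinates in this sketch have unit standard deviation on each plotted component, which is why distances between observations approximate the Mahalanobis distance.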
The colours for labels and points can be changed by adding another scale layer for colour, such as scale_colour_viridis_d() or scale_colour_brewer().
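As a hedged illustration of adding such a scale layer, the sketch below uses a prcomp() object built from R's built-in USArrests data and colours the points by the built-in state.region factor. It assumes that the AMR and ggplot2 packages are installed; the choice of data, the "Dark2" palette and the object names pca_fit and p are illustrative and not taken from the package documentation.

```r
# Assumes the AMR and ggplot2 packages are installed; skipped otherwise.
if (requireNamespace("AMR", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  pca_fit <- prcomp(USArrests, scale. = TRUE)

  p <- AMR::ggplot_pca(
    pca_fit,
    labels = rownames(USArrests),        # one label per observation (state)
    groups = as.character(state.region)  # region per state, same length as labels
  ) +
    ggplot2::scale_colour_brewer(palette = "Dark2")  # replaces the default colours

  print(p)
}
```

The groups are passed as a character vector here to stay on the safe side of the input checks; whether a factor would also be accepted is not stated in this documentation.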
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, an argument will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
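The gradual deprecation described above could look roughly like the following sketch. The function plot_points() and its arguments are hypothetical and only serve to show the pattern; they are not part of this package.

```r
# Hypothetical function, not part of any package: `points_size` replaces
# the deprecated `size` argument.
plot_points <- function(points_size = 2, size = NULL) {
  if (!is.null(size)) {
    # stage 1: the old argument keeps working, but the user is informed
    message("Argument `size` is deprecated; please use `points_size` instead.")
    # stage 2 (a later release) would replace message() with stop()
    points_size <- size
  }
  points_size
}

plot_points(size = 3)  # still works, but emits the deprecation message
```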
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
# See ?pca for more info about Principal Component Analysis (PCA).
library(AMR)  # provides example_isolates, pca() and ggplot_pca()

if (require("dplyr")) {
  # resistance per Staphylococcus species, used as input for the PCA
  pca_model <- example_isolates %>%
    filter(mo_genus(mo) == "Staphylococcus") %>%
    group_by(species = mo_shortname(mo)) %>%
    summarise_if(is.rsi, resistance) %>%
    pca(FLC, AMC, CXM, GEN, TOB, TMP, SXT, CIP, TEC, TCY, ERY)

  # old (base R)
  biplot(pca_model)

  # new
  ggplot_pca(pca_model)

  # new, with a different colour scale and a title
  if (require("ggplot2")) {
    ggplot_pca(pca_model) +
      scale_colour_viridis_d() +
      labs(title = "Title here")
  }
}