Produces a ggplot2 variant of a so-called biplot for PCA (principal component analysis) that is more flexible and more appealing than the base R biplot() function.
ggplot_pca(
  x,
  choices = 1:2,
  scale = TRUE,
  pc.biplot = TRUE,
  labels = NULL,
  labels_textsize = 3,
  labels_text_placement = 1.5,
  groups = NULL,
  ellipse = TRUE,
  ellipse_prob = 0.68,
  ellipse_size = 0.5,
  ellipse_alpha = 0.5,
  points_size = 2,
  points_alpha = 0.25,
  arrows = TRUE,
  arrows_colour = "darkblue",
  arrows_size = 0.5,
  arrows_textsize = 3,
  arrows_alpha = 0.75,
  base_textsize = 10,
  ...
)
Argument | Description |
---|---|
x | an object returned by |
choices | length 2 vector specifying the components to plot. Only the default is a biplot in the strict sense. |
scale | The variables are scaled by |
pc.biplot | If true, use what Gabriel (1971) refers to as a "principal component biplot", with |
labels | an optional vector of labels for the observations. If set, the labels will be placed below their respective points. When using the |
labels_textsize | the size of the text used for the labels |
labels_text_placement | adjustment factor for the placement of the variable names ( |
groups | an optional vector of groups for the labels, with the same length as |
ellipse | a logical to indicate whether a normal data ellipse should be drawn for each group (set with |
ellipse_prob | statistical size of the ellipse in normal probability |
ellipse_size | the size of the ellipse line |
ellipse_alpha | the alpha (transparency) of the ellipse line |
points_size | the size of the points |
points_alpha | the alpha (transparency) of the points |
arrows | a logical to indicate whether arrows should be drawn |
arrows_colour | the colour of the arrows and their text |
arrows_size | the size (thickness) of the arrow lines |
arrows_textsize | the size of the text at the end of the arrows |
arrows_alpha | the alpha (transparency) of the arrows and their text |
base_textsize | the text size for all plot elements except the labels and arrows |
... | Parameters passed on to functions |
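To make the meaning of ellipse_prob more concrete, the following is a minimal base R sketch of one common way to construct a normal data ellipse that covers a given probability of a bivariate normal distribution. The helper data_ellipse() is hypothetical and not part of the AMR package, and the construction shown here is an assumption about the general technique, not a statement of how ggplot_pca() draws its ellipses internally.

```r
# Hypothetical helper (not part of AMR): boundary points of a normal data
# ellipse that covers `prob` of a bivariate normal distribution fitted to `xy`.
data_ellipse <- function(xy, prob = 0.68, n = 100) {
  centre <- colMeans(xy)
  sigma <- stats::cov(xy)
  # Radius of the circle (in standardised space) that holds `prob` of the mass
  radius <- sqrt(stats::qchisq(prob, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  # Map the circle onto the ellipse using the Cholesky factor of the covariance
  ell <- radius * circle %*% chol(sigma)
  # Shift the ellipse to the centre of the data
  sweep(ell, 2, centre, "+")
}

set.seed(1)
xy <- cbind(rnorm(500), rnorm(500))
ell_68 <- data_ellipse(xy, prob = 0.68)  # same probability as the default
ell_95 <- data_ellipse(xy, prob = 0.95)  # a wider ellipse
```

Under this construction, a larger ellipse_prob gives a larger ellipse: every point on the boundary has the same squared Mahalanobis distance to the group centre, equal to the chi-squared quantile for that probability with 2 degrees of freedom.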
The ggplot_pca() function is based on the ggbiplot() function from the ggbiplot package by Vince Vu, as found on GitHub: https://github.com/vqv/ggbiplot (retrieved: 2 March 2020, their latest commit: 7325e88; 12 February 2015).
As their GPL-2 licence requires code changes to be documented, the changes made to the source code were:
Rewritten code to remove the dependency on packages plyr, scales and grid
Parametrised more options, like arrow and ellipse settings
Added total amount of explained variance as a caption in the plot
Cleaned all syntax based on the lintr package and added integrity checks
Updated documentation
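As an illustration of the explained-variance caption mentioned in the list above, the sketch below shows how the total explained variance of two chosen components could be computed from a base R prcomp() object. The iris dataset, the variable names and the caption wording are assumptions made for this example only; the actual caption produced by ggplot_pca() may be worded differently.

```r
# Fit a PCA with base R on the four numeric columns of the built-in iris data
pca_fit <- stats::prcomp(iris[, 1:4], scale. = TRUE)

# Proportion of variance explained by each principal component
explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Total for the two plotted components (mirrors the default `choices = 1:2`)
choices <- 1:2
total <- sum(explained[choices])

# Hypothetical caption text, for illustration only
caption <- paste0("Total explained variance: ", round(100 * total, 1), "%")
print(caption)
```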
The colours for labels and points can be changed by adding another scale layer for colour, like scale_colour_viridis_d() or scale_colour_brewer().
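A short sketch of adding such a colour scale layer follows. It reuses the PCA model from the example on this page and assumes that the AMR, dplyr and ggplot2 packages are installed, and that the installed AMR version still provides is.rsi() as used in that example. The "Set1" palette is an arbitrary choice for illustration.

```r
library(AMR)
library(dplyr)
library(ggplot2)

# Same PCA model as in the example on this page
pca_model <- example_isolates %>%
  filter(mo_genus(mo) == "Staphylococcus") %>%
  group_by(species = mo_shortname(mo)) %>%
  summarise_if(is.rsi, resistance) %>%
  pca(FLC, AMC, CXM, GEN, TOB, TMP, SXT, CIP, TEC, TCY, ERY)

# Add a discrete viridis colour scale on top of the default plot
p_viridis <- ggplot_pca(pca_model) + scale_colour_viridis_d()

# Alternatively, use a ColorBrewer palette (palette chosen arbitrarily)
p_brewer <- ggplot_pca(pca_model) + scale_colour_brewer(palette = "Set1")

print(p_viridis)
```

Because the result is a regular ggplot object, other layers such as themes or titles can be added with + in the same way.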
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. We will strive to maintain backward compatibility, but the function needs wider usage and more extensive testing in order to optimise the underlying code.
# `example_isolates` is a dataset available in the AMR package.
# See ?example_isolates.

if (FALSE) {
  # See ?pca for more info about Principal Component Analysis (PCA).
  library(dplyr)
  pca_model <- example_isolates %>%
    filter(mo_genus(mo) == "Staphylococcus") %>%
    group_by(species = mo_shortname(mo)) %>%
    summarise_if(is.rsi, resistance) %>%
    pca(FLC, AMC, CXM, GEN, TOB, TMP, SXT, CIP, TEC, TCY, ERY)

  # old
  biplot(pca_model)

  # new
  ggplot_pca(pca_model)
}