
# PCA Biplot with `ggplot2`
Produces a `ggplot2` variant of a
[biplot](https://en.wikipedia.org/wiki/Biplot) for PCA (principal
component analysis) that is more flexible and more visually appealing
than the base R [`biplot()`](https://rdrr.io/r/stats/biplot.html) function.
## Usage
``` r
ggplot_pca(x, choices = 1:2, scale = 1, pc.biplot = TRUE,
  labels = NULL, labels_textsize = 3, labels_text_placement = 1.5,
  groups = NULL, ellipse = TRUE, ellipse_prob = 0.68,
  ellipse_size = 0.5, ellipse_alpha = 0.5, points_size = 2,
  points_alpha = 0.25, arrows = TRUE, arrows_colour = "darkblue",
  arrows_size = 0.5, arrows_textsize = 3, arrows_textangled = TRUE,
  arrows_alpha = 0.75, base_textsize = 10, ...)
```
## Source
The `ggplot_pca()` function is based on the `ggbiplot()` function from
the `ggbiplot` package by Vince Vu, as found on GitHub:
<https://github.com/vqv/ggbiplot> (retrieved: 2 March 2020, their latest
commit:
[`7325e88`](https://github.com/vqv/ggbiplot/commit/7325e880485bea4c07465a0304c470608fffb5d9);
12 February 2015).
Their GPL-2 licence requires that code changes be documented. The
changes made to the source code were:
1. Rewritten code to remove the dependency on packages `plyr`, `scales`
and `grid`
2. Parametrised more options, like arrow and ellipse settings
3. Hardened all input possibilities by defining the exact type of user
input for every argument
4. Added total amount of explained variance as a caption in the plot
5. Cleaned all syntax based on the `lintr` package, fixed grammatical
errors and added integrity checks
6. Updated documentation
## Arguments
- x:
An object returned by
[`pca()`](https://amr-for-r.org/reference/pca.md),
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html) or
[`princomp()`](https://rdrr.io/r/stats/princomp.html).
- choices:
A vector of length 2 specifying the components to plot. Only the default
is a biplot in the strict sense.
- scale:
The variables are scaled by `lambda ^ scale` and the observations are
scaled by `lambda ^ (1-scale)` where `lambda` are the singular values
as computed by [`princomp`](https://rdrr.io/r/stats/princomp.html).
Normally `0 <= scale <= 1`, and a warning will be issued if the
specified `scale` is outside this range.
- pc.biplot:
If `TRUE`, use what Gabriel (1971) refers to as a "principal component
biplot", with `lambda = 1`, observations scaled up by `sqrt(n)` and
variables scaled down by `sqrt(n)`. Inner products between variables
then approximate covariances, and distances between observations
approximate Mahalanobis distances.
- labels:
An optional vector of labels for the observations. If set, the labels
will be placed below their respective points. When using the
[`pca()`](https://amr-for-r.org/reference/pca.md) function as input
for `x`, this will be determined automatically based on the attribute
`non_numeric_cols`, see
[`pca()`](https://amr-for-r.org/reference/pca.md).
- labels_textsize:
The size of the text used for the labels.
- labels_text_placement:
Adjustment factor for the placement of the variable names (`>=1` means
further away from the arrow head).
- groups:
An optional vector of groups for the labels, with the same length as
`labels`. If set, the points and labels will be coloured according to
these groups. When using the
[`pca()`](https://amr-for-r.org/reference/pca.md) function as input
for `x`, this will be determined automatically based on the attribute
`non_numeric_cols`, see
[`pca()`](https://amr-for-r.org/reference/pca.md).
- ellipse:
A [logical](https://rdrr.io/r/base/logical.html) to indicate whether a
normal data ellipse should be drawn for each group (set with
`groups`).
- ellipse_prob:
Statistical size of the ellipse in normal probability.
- ellipse_size:
The size of the ellipse line.
- ellipse_alpha:
The alpha (transparency) of the ellipse line.
- points_size:
The size of the points.
- points_alpha:
The alpha (transparency) of the points.
- arrows:
A [logical](https://rdrr.io/r/base/logical.html) to indicate whether
arrows should be drawn.
- arrows_colour:
The colour of the arrows and their text.
- arrows_size:
The size (thickness) of the arrow lines.
- arrows_textsize:
The size of the text at the end of the arrows.
- arrows_textangled:
A [logical](https://rdrr.io/r/base/logical.html) to indicate whether the
text at the end of the arrows should be angled.
- arrows_alpha:
The alpha (transparency) of the arrows and their text.
- base_textsize:
The text size for all plot elements except the labels and arrows.
- ...:
Arguments passed on to functions.
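
The scaling described for `scale` and `pc.biplot` can be illustrated with a minimal base R sketch. It uses the textbook singular value decomposition rather than the internals of `ggplot_pca()`, which are not shown here. The helper `biplot_coords()` is hypothetical, and the factor `sqrt(n - 1)` (instead of `sqrt(n)`) is an assumption chosen so that the results match [`cov()`](https://rdrr.io/r/stats/cor.html) exactly. The identities hold exactly when all components are kept; with only the two plotted components they become approximations.

```r
# Sketch of biplot scaling; base R only, `USArrests` is a built-in data set
X <- scale(USArrests, center = TRUE, scale = TRUE)
n <- nrow(X)
sv <- svd(X) # X = U %*% diag(lambda) %*% t(V)
lambda <- sv$d # singular values

# Hypothetical helper (not part of AMR): coordinates for a given `scale`
biplot_coords <- function(scale = 1) {
  list(
    observations = sv$u %*% diag(lambda^(1 - scale)),
    variables = sv$v %*% diag(lambda^scale)
  )
}

# Whatever `scale` is, observations %*% t(variables) rebuilds X
for (s in c(0, 0.5, 1)) {
  co <- biplot_coords(s)
  print(all.equal(co$observations %*% t(co$variables), X,
    check.attributes = FALSE
  ))
}

# "Principal component biplot": observations scaled up and variables
# scaled down by the same factor (assumption: sqrt(n - 1), see above)
obs_pc <- sv$u * sqrt(n - 1)
var_pc <- sv$v %*% diag(lambda) / sqrt(n - 1)

# Inner products between variables give the covariance matrix of X
cov_from_biplot <- var_pc %*% t(var_pc)

# The squared distance between two observations equals their squared
# Mahalanobis distance
d2_biplot <- sum((obs_pc[1, ] - obs_pc[2, ])^2)
d2_mahal <- mahalanobis(X[1, ], X[2, ], cov(X))
```

Because the product of the two coordinate sets does not depend on `scale`, the argument only shifts emphasis between a faithful picture of the observations and a faithful picture of the variables.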
## Details
The colours for labels and points can be changed by adding another scale
layer for colour, such as
[`scale_colour_viridis_d()`](https://ggplot2.tidyverse.org/reference/scale_viridis.html)
and
[`scale_colour_brewer()`](https://ggplot2.tidyverse.org/reference/scale_brewer.html).
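
As background for `ellipse` and `ellipse_prob`, the following is a minimal base R sketch of a normal data ellipse, assuming the usual chi-squared construction; whether `ggplot_pca()` computes it exactly this way is not stated here. The helper `data_ellipse()` is hypothetical and not a function from AMR or `ggplot2`.

```r
# Sketch of a normal data ellipse at a given probability (base R only).
# `data_ellipse()` is a hypothetical helper for illustration.
data_ellipse <- function(xy, prob = 0.68, n_points = 100) {
  mu <- colMeans(xy)
  sigma <- var(xy)
  # radius in Mahalanobis units: chi-squared quantile, 2 degrees of freedom
  radius <- sqrt(qchisq(prob, df = 2))
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- cbind(cos(theta), sin(theta))
  # stretch and rotate the unit circle with the Cholesky factor of the
  # covariance matrix, then shift it to the mean
  sweep(radius * circle %*% chol(sigma), 2, mu, "+")
}

# Scores on the first two principal components of a built-in data set
pc <- prcomp(USArrests, scale. = TRUE)
scores <- pc$x[, 1:2]
ell <- data_ellipse(scores, prob = 0.68)

# Share of observations that fall inside the ellipse
inside <- mahalanobis(scores, colMeans(scores), var(scores)) <=
  qchisq(0.68, df = 2)
mean(inside)
```

Every point on the resulting outline lies at the same Mahalanobis distance from the group mean, so a larger `prob` gives a larger ellipse around the same centre.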
## Examples
``` r
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
# \donttest{
if (require("dplyr")) {
  # calculate the resistance per group first
  resistance_data <- example_isolates %>%
    group_by(
      order = mo_order(mo), # group on anything, like order
      genus = mo_genus(mo)
    ) %>% # and genus as we do here;
    filter(n() >= 30) %>% # keep only groups with at least 30 results
    summarise_if(is.sir, resistance) # then get resistance of all drugs

  # now conduct PCA for certain antimicrobial drugs
  pca_result <- resistance_data %>%
    pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

  summary(pca_result)

  # old base R plotting method:
  biplot(pca_result, main = "Base R biplot")

  # new ggplot2 plotting method using this package:
  if (require("ggplot2")) {
    ggplot_pca(pca_result) +
      labs(title = "ggplot2 biplot")
  }
  if (require("ggplot2")) {
    # can still be extended with any ggplot2 function
    ggplot_pca(pca_result) +
      scale_colour_viridis_d() +
      labs(title = "ggplot2 biplot")
  }
}
#> Warning: There were 73 warnings in `summarise()`.
#> The first warning was:
#> In argument: `PEN = (function (..., minimum = 30, as_percent = FALSE,
#> only_all_tested = FALSE) ...`.
#> In group 5: `order = "Lactobacillales"` `genus = "Enterococcus"`.
#> Caused by warning:
#> ! Introducing NA: only 14 results available for PEN in group: order =
#> "Lactobacillales", genus = "Enterococcus" (`minimum` = 30).
#> Run `dplyr::last_dplyr_warnings()` to see the 72 remaining warnings.
#> Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
#> "TMP", and "TOB". Total observations available: 7.
#> Groups (n=4, named as 'order'):
#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
#>
# }
```