1
0
mirror of https://github.com/msberends/AMR.git synced 2025-12-15 23:10:28 +01:00
Files
AMR/reference/pca.md
2025-11-24 10:42:21 +00:00

139 lines
4.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Principal Component Analysis (for AMR)
Performs a principal component analysis (PCA) based on a data set with
automatic determination for afterwards plotting the groups and labels,
and automatic filtering on only suitable (i.e. non-empty and numeric)
variables.
## Usage
``` r
pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
rank. = NULL)
```
## Arguments
- x:
A [data.frame](https://rdrr.io/r/base/data.frame.html) containing
[numeric](https://rdrr.io/r/base/numeric.html) columns.
- ...:
Columns of `x` to be selected for PCA, can be unquoted since it
supports quasiquotation.
- retx:
a logical value indicating whether the rotated variables should be
returned.
- center:
a logical value indicating whether the variables should be shifted to
be zero centered. Alternately, a vector of length equal the number of
columns of `x` can be supplied. The value is passed to `scale`.
- scale.:
a logical value indicating whether the variables should be scaled to
have unit variance before the analysis takes place. The default is
`FALSE` for consistency with S, but in general scaling is advisable.
Alternatively, a vector of length equal the number of columns of `x`
can be supplied. The value is passed to
[`scale`](https://rdrr.io/r/base/scale.html).
- tol:
a value indicating the magnitude below which components should be
omitted. (Components are omitted if their standard deviations are less
than or equal to `tol` times the standard deviation of the first
component.) With the default null setting, no components are omitted
(unless `rank.` is specified less than `min(dim(x))`.). Other settings
for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`,
which would omit essentially constant components.
- rank.:
optionally, a number specifying the maximal rank, i.e., maximal number
of principal components to be used. Can be set as alternative or in
addition to `tol`, useful notably when the desired rank is
considerably smaller than the dimensions of the matrix.
## Value
An object of classes pca and
[prcomp](https://rdrr.io/r/stats/prcomp.html)
## Details
The `pca()` function takes a
[data.frame](https://rdrr.io/r/base/data.frame.html) as input and
performs the actual PCA with the R function
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html).
The result of the `pca()` function is a
[prcomp](https://rdrr.io/r/stats/prcomp.html) object, with an additional
attribute `non_numeric_cols` which is a vector with the column names of
all columns that do not contain
[numeric](https://rdrr.io/r/base/numeric.html) values. These are
probably the groups and labels, and will be used by
[`ggplot_pca()`](https://amr-for-r.org/reference/ggplot_pca.md).
## Examples
``` r
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
# \donttest{
if (require("dplyr")) {
# calculate the resistance per group first
resistance_data <- example_isolates %>%
group_by(
order = mo_order(mo), # group on anything, like order
genus = mo_genus(mo)
) %>% # and genus as we do here;
filter(n() >= 30) %>% # filter on only 30 results per group
summarise_if(is.sir, resistance) # then get resistance of all drugs
# now conduct PCA for certain antimicrobial drugs
pca_result <- resistance_data %>%
pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)
pca_result
summary(pca_result)
# old base R plotting method:
biplot(pca_result)
}
#> Warning: There were 73 warnings in `summarise()`.
#> The first warning was:
#> In argument: `PEN = (function (..., minimum = 30, as_percent = FALSE,
#> only_all_tested = FALSE) ...`.
#> In group 5: `order = "Lactobacillales"` `genus = "Enterococcus"`.
#> Caused by warning:
#> ! Introducing NA: only 14 results available for PEN in group: order =
#> "Lactobacillales", genus = "Enterococcus" (`minimum` = 30).
#> Run `dplyr::last_dplyr_warnings()` to see the 72 remaining warnings.
#> Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
#> "TMP", and "TOB". Total observations available: 7.
#> Groups (n=4, named as 'order'):
#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
#>
# new ggplot2 plotting method using this package:
if (require("dplyr") && require("ggplot2")) {
ggplot_pca(pca_result)
}
if (require("dplyr") && require("ggplot2")) {
ggplot_pca(pca_result) +
scale_colour_viridis_d() +
labs(title = "Title here")
}
# }
```