mirror of
https://github.com/msberends/AMR.git
synced 2025-12-15 23:10:28 +01:00
139 lines
4.4 KiB
Markdown
139 lines
4.4 KiB
Markdown
# Principal Component Analysis (for AMR)
|
||
|
||
Performs a principal component analysis (PCA) based on a data set with
|
||
automatic determination for afterwards plotting the groups and labels,
|
||
and automatic filtering on only suitable (i.e. non-empty and numeric)
|
||
variables.
|
||
|
||
## Usage
|
||
|
||
``` r
|
||
pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
|
||
rank. = NULL)
|
||
```
|
||
|
||
## Arguments
|
||
|
||
- x:
|
||
|
||
A [data.frame](https://rdrr.io/r/base/data.frame.html) containing
|
||
[numeric](https://rdrr.io/r/base/numeric.html) columns.
|
||
|
||
- ...:
|
||
|
||
Columns of `x` to be selected for PCA, can be unquoted since it
|
||
supports quasiquotation.
|
||
|
||
- retx:
|
||
|
||
a logical value indicating whether the rotated variables should be
|
||
returned.
|
||
|
||
- center:
|
||
|
||
a logical value indicating whether the variables should be shifted to
|
||
be zero centered. Alternately, a vector of length equal the number of
|
||
columns of `x` can be supplied. The value is passed to `scale`.
|
||
|
||
- scale.:
|
||
|
||
a logical value indicating whether the variables should be scaled to
|
||
have unit variance before the analysis takes place. The default is
|
||
`FALSE` for consistency with S, but in general scaling is advisable.
|
||
Alternatively, a vector of length equal the number of columns of `x`
|
||
can be supplied. The value is passed to
|
||
[`scale`](https://rdrr.io/r/base/scale.html).
|
||
|
||
- tol:
|
||
|
||
a value indicating the magnitude below which components should be
|
||
omitted. (Components are omitted if their standard deviations are less
|
||
than or equal to `tol` times the standard deviation of the first
|
||
component.) With the default null setting, no components are omitted
|
||
(unless `rank.` is specified less than `min(dim(x))`.). Other settings
|
||
for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`,
|
||
which would omit essentially constant components.
|
||
|
||
- rank.:
|
||
|
||
optionally, a number specifying the maximal rank, i.e., maximal number
|
||
of principal components to be used. Can be set as alternative or in
|
||
addition to `tol`, useful notably when the desired rank is
|
||
considerably smaller than the dimensions of the matrix.
|
||
|
||
## Value
|
||
|
||
An object of classes pca and
|
||
[prcomp](https://rdrr.io/r/stats/prcomp.html)
|
||
|
||
## Details
|
||
|
||
The `pca()` function takes a
|
||
[data.frame](https://rdrr.io/r/base/data.frame.html) as input and
|
||
performs the actual PCA with the R function
|
||
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html).
|
||
|
||
The result of the `pca()` function is a
|
||
[prcomp](https://rdrr.io/r/stats/prcomp.html) object, with an additional
|
||
attribute `non_numeric_cols` which is a vector with the column names of
|
||
all columns that do not contain
|
||
[numeric](https://rdrr.io/r/base/numeric.html) values. These are
|
||
probably the groups and labels, and will be used by
|
||
[`ggplot_pca()`](https://amr-for-r.org/reference/ggplot_pca.md).
|
||
|
||
## Examples
|
||
|
||
``` r
|
||
# `example_isolates` is a data set available in the AMR package.
|
||
# See ?example_isolates.
|
||
|
||
# \donttest{
|
||
if (require("dplyr")) {
|
||
# calculate the resistance per group first
|
||
resistance_data <- example_isolates %>%
|
||
group_by(
|
||
order = mo_order(mo), # group on anything, like order
|
||
genus = mo_genus(mo)
|
||
) %>% # and genus as we do here;
|
||
filter(n() >= 30) %>% # filter on only 30 results per group
|
||
summarise_if(is.sir, resistance) # then get resistance of all drugs
|
||
|
||
# now conduct PCA for certain antimicrobial drugs
|
||
pca_result <- resistance_data %>%
|
||
pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)
|
||
|
||
pca_result
|
||
summary(pca_result)
|
||
# old base R plotting method:
|
||
biplot(pca_result)
|
||
}
|
||
#> Warning: There were 73 warnings in `summarise()`.
|
||
#> The first warning was:
|
||
#> ℹ In argument: `PEN = (function (..., minimum = 30, as_percent = FALSE,
|
||
#> only_all_tested = FALSE) ...`.
|
||
#> ℹ In group 5: `order = "Lactobacillales"` `genus = "Enterococcus"`.
|
||
#> Caused by warning:
|
||
#> ! Introducing NA: only 14 results available for PEN in group: order =
|
||
#> "Lactobacillales", genus = "Enterococcus" (`minimum` = 30).
|
||
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 72 remaining warnings.
|
||
#> ℹ Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
|
||
#> "TMP", and "TOB". Total observations available: 7.
|
||
#> Groups (n=4, named as 'order'):
|
||
#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
|
||
#>
|
||
|
||
|
||
# new ggplot2 plotting method using this package:
|
||
if (require("dplyr") && require("ggplot2")) {
|
||
ggplot_pca(pca_result)
|
||
}
|
||
|
||
if (require("dplyr") && require("ggplot2")) {
|
||
ggplot_pca(pca_result) +
|
||
scale_colour_viridis_d() +
|
||
labs(title = "Title here")
|
||
}
|
||
|
||
# }
|
||
```
|