mirror of
https://github.com/msberends/AMR.git
synced 2025-12-18 20:20:23 +01:00
Built site for AMR@3.0.1.9003: ba30b08
This commit is contained in:
138
reference/pca.md
Normal file
138
reference/pca.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# Principal Component Analysis (for AMR)
|
||||
|
||||
Performs a principal component analysis (PCA) based on a data set with
|
||||
automatic determination for afterwards plotting the groups and labels,
|
||||
and automatic filtering on only suitable (i.e. non-empty and numeric)
|
||||
variables.
|
||||
|
||||
## Usage
|
||||
|
||||
``` r
|
||||
pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
|
||||
rank. = NULL)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- x:
|
||||
|
||||
A [data.frame](https://rdrr.io/r/base/data.frame.html) containing
|
||||
[numeric](https://rdrr.io/r/base/numeric.html) columns.
|
||||
|
||||
- ...:
|
||||
|
||||
Columns of `x` to be selected for PCA, can be unquoted since it
|
||||
supports quasiquotation.
|
||||
|
||||
- retx:
|
||||
|
||||
a logical value indicating whether the rotated variables should be
|
||||
returned.
|
||||
|
||||
- center:
|
||||
|
||||
a logical value indicating whether the variables should be shifted to
|
||||
be zero centered. Alternately, a vector of length equal the number of
|
||||
columns of `x` can be supplied. The value is passed to `scale`.
|
||||
|
||||
- scale.:
|
||||
|
||||
a logical value indicating whether the variables should be scaled to
|
||||
have unit variance before the analysis takes place. The default is
|
||||
`FALSE` for consistency with S, but in general scaling is advisable.
|
||||
Alternatively, a vector of length equal the number of columns of `x`
|
||||
can be supplied. The value is passed to
|
||||
[`scale`](https://rdrr.io/r/base/scale.html).
|
||||
|
||||
- tol:
|
||||
|
||||
a value indicating the magnitude below which components should be
|
||||
omitted. (Components are omitted if their standard deviations are less
|
||||
than or equal to `tol` times the standard deviation of the first
|
||||
component.) With the default null setting, no components are omitted
|
||||
(unless `rank.` is specified less than `min(dim(x))`.). Other settings
|
||||
for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`,
|
||||
which would omit essentially constant components.
|
||||
|
||||
- rank.:
|
||||
|
||||
optionally, a number specifying the maximal rank, i.e., maximal number
|
||||
of principal components to be used. Can be set as alternative or in
|
||||
addition to `tol`, useful notably when the desired rank is
|
||||
considerably smaller than the dimensions of the matrix.
|
||||
|
||||
## Value
|
||||
|
||||
An object of classes pca and
|
||||
[prcomp](https://rdrr.io/r/stats/prcomp.html)
|
||||
|
||||
## Details
|
||||
|
||||
The `pca()` function takes a
|
||||
[data.frame](https://rdrr.io/r/base/data.frame.html) as input and
|
||||
performs the actual PCA with the R function
|
||||
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html).
|
||||
|
||||
The result of the `pca()` function is a
|
||||
[prcomp](https://rdrr.io/r/stats/prcomp.html) object, with an additional
|
||||
attribute `non_numeric_cols` which is a vector with the column names of
|
||||
all columns that do not contain
|
||||
[numeric](https://rdrr.io/r/base/numeric.html) values. These are
|
||||
probably the groups and labels, and will be used by
|
||||
[`ggplot_pca()`](https://amr-for-r.org/reference/ggplot_pca.md).
|
||||
|
||||
## Examples
|
||||
|
||||
``` r
|
||||
# `example_isolates` is a data set available in the AMR package.
|
||||
# See ?example_isolates.
|
||||
|
||||
# \donttest{
|
||||
if (require("dplyr")) {
|
||||
# calculate the resistance per group first
|
||||
resistance_data <- example_isolates %>%
|
||||
group_by(
|
||||
order = mo_order(mo), # group on anything, like order
|
||||
genus = mo_genus(mo)
|
||||
) %>% # and genus as we do here;
|
||||
filter(n() >= 30) %>% # filter on only 30 results per group
|
||||
summarise_if(is.sir, resistance) # then get resistance of all drugs
|
||||
|
||||
# now conduct PCA for certain antimicrobial drugs
|
||||
pca_result <- resistance_data %>%
|
||||
pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)
|
||||
|
||||
pca_result
|
||||
summary(pca_result)
|
||||
# old base R plotting method:
|
||||
biplot(pca_result)
|
||||
}
|
||||
#> Warning: There were 73 warnings in `summarise()`.
|
||||
#> The first warning was:
|
||||
#> ℹ In argument: `PEN = (function (..., minimum = 30, as_percent = FALSE,
|
||||
#> only_all_tested = FALSE) ...`.
|
||||
#> ℹ In group 5: `order = "Lactobacillales"` `genus = "Enterococcus"`.
|
||||
#> Caused by warning:
|
||||
#> ! Introducing NA: only 14 results available for PEN in group: order =
|
||||
#> "Lactobacillales", genus = "Enterococcus" (`minimum` = 30).
|
||||
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 72 remaining warnings.
|
||||
#> ℹ Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
|
||||
#> "TMP", and "TOB". Total observations available: 7.
|
||||
#> Groups (n=4, named as 'order'):
|
||||
#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
|
||||
#>
|
||||
|
||||
|
||||
# new ggplot2 plotting method using this package:
|
||||
if (require("dplyr") && require("ggplot2")) {
|
||||
ggplot_pca(pca_result)
|
||||
}
|
||||
|
||||
if (require("dplyr") && require("ggplot2")) {
|
||||
ggplot_pca(pca_result) +
|
||||
scale_colour_viridis_d() +
|
||||
labs(title = "Title here")
|
||||
}
|
||||
|
||||
# }
|
||||
```
|
||||
Reference in New Issue
Block a user