mirror of https://github.com/msberends/AMR.git synced 2025-12-18 20:20:23 +01:00

Built site for AMR@3.0.1.9003: ba30b08

This commit is contained in:
github-actions
2025-11-24 10:42:21 +00:00
parent 7d16891987
commit 141fc468f8
161 changed files with 21798 additions and 313 deletions

138
reference/pca.md Normal file

@@ -0,0 +1,138 @@
# Principal Component Analysis (for AMR)
Performs a principal component analysis (PCA) on a data set. Groups and
labels are determined automatically for later plotting, and the input is
automatically filtered to only suitable (i.e. non-empty and numeric)
variables.
## Usage
``` r
pca(x, ..., retx = TRUE, center = TRUE, scale. = TRUE, tol = NULL,
  rank. = NULL)
```
## Arguments
- x:
A [data.frame](https://rdrr.io/r/base/data.frame.html) containing
[numeric](https://rdrr.io/r/base/numeric.html) columns.
- ...:
Columns of `x` to be selected for PCA. These can be unquoted, since
quasiquotation is supported.
- retx:
a logical value indicating whether the rotated variables should be
returned.
- center:
a logical value indicating whether the variables should be shifted to
be zero centered. Alternatively, a vector of length equal to the number of
columns of `x` can be supplied. The value is passed to `scale`.
- scale.:
a logical value indicating whether the variables should be scaled to
have unit variance before the analysis takes place. In
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html) the default is
`FALSE` for consistency with S, but `pca()` defaults to `TRUE` (see
Usage), since scaling is generally advisable. Alternatively, a vector
of length equal to the number of columns of `x` can be supplied. The
value is passed to [`scale`](https://rdrr.io/r/base/scale.html).
- tol:
a value indicating the magnitude below which components should be
omitted. (Components are omitted if their standard deviations are less
than or equal to `tol` times the standard deviation of the first
component.) With the default null setting, no components are omitted
(unless `rank.` is specified as less than `min(dim(x))`). Other settings
for `tol` could be `tol = 0` or `tol = sqrt(.Machine$double.eps)`,
which would omit essentially constant components.
- rank.:
optionally, a number specifying the maximal rank, i.e., maximal number
of principal components to be used. Can be set as alternative or in
addition to `tol`, useful notably when the desired rank is
considerably smaller than the dimensions of the matrix.
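The effect of `scale.`, `rank.` and `tol` can be sketched with base R's `prcomp()`, the function that `pca()` is documented to call. This is only an illustration: the built-in `USArrests` data set is an arbitrary choice, and it is assumed (from the shared argument names) that `pca()` forwards these arguments unchanged.

```r
# Base R only; USArrests is a built-in data set chosen for illustration.
data(USArrests)

# Without scaling, the column with the largest variance (Assault)
# dominates the first principal component.
fit_raw    <- prcomp(USArrests, center = TRUE, scale. = FALSE)
fit_scaled <- prcomp(USArrests, center = TRUE, scale. = TRUE)
round(fit_raw$rotation[, "PC1"], 2)
round(fit_scaled$rotation[, "PC1"], 2)

# rank. caps the number of components that are returned.
fit_rank2 <- prcomp(USArrests, center = TRUE, scale. = TRUE, rank. = 2)
ncol(fit_rank2$x)

# tol drops components whose standard deviation is at most
# tol times the standard deviation of the first component.
fit_tol <- prcomp(USArrests, center = TRUE, scale. = TRUE, tol = 0.5)
ncol(fit_tol$rotation)
```

Comparing the two rounded loading vectors shows why scaling is usually advisable when variables are measured on very different scales.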
## Value
An object of classes `pca` and
[prcomp](https://rdrr.io/r/stats/prcomp.html).
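Because the returned object inherits from `prcomp`, its components can be inspected in the usual way. The sketch below uses a plain `prcomp()` fit on the built-in `USArrests` data as a stand-in (an assumption made so the example runs without the AMR package); a real `pca()` result would additionally carry the `pca` class.

```r
# Stand-in for a pca() result: a plain prcomp object from base R.
fit <- prcomp(USArrests, center = TRUE, scale. = TRUE)

class(fit)   # class vector of the fit
names(fit)   # list components of a prcomp object

# Proportion of variance explained by each component,
# derived from the component standard deviations.
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
round(var_explained, 3)
round(cumsum(var_explained), 3)

# Scores of the first few observations on the principal components
# (available because retx = TRUE is the default).
head(fit$x, 3)
```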
## Details
The `pca()` function takes a
[data.frame](https://rdrr.io/r/base/data.frame.html) as input and
performs the actual PCA with the R function
[`prcomp()`](https://rdrr.io/r/stats/prcomp.html).
The result of the `pca()` function is a
[prcomp](https://rdrr.io/r/stats/prcomp.html) object, with an additional
attribute `non_numeric_cols` which is a vector with the column names of
all columns that do not contain
[numeric](https://rdrr.io/r/base/numeric.html) values. These are
probably the groups and labels, and will be used by
[`ggplot_pca()`](https://amr-for-r.org/reference/ggplot_pca.md).
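The behaviour described above can be approximated in a few lines of base R. The helper `pca_sketch()` below is hypothetical and is not the AMR implementation: it only mirrors the documented steps (keep numeric, non-empty columns, run `prcomp()`, record the names of the non-numeric columns), and the handling of incomplete rows is an assumption.

```r
# Hypothetical helper, NOT the AMR implementation.
pca_sketch <- function(x) {
  stopifnot(is.data.frame(x))

  # Identify numeric columns, then drop those that are entirely NA.
  is_num <- vapply(x, is.numeric, logical(1))
  is_empty <- vapply(x, function(col) all(is.na(col)), logical(1))
  num_data <- x[, is_num & !is_empty, drop = FALSE]

  # Assumption: rows with missing values are dropped, because
  # prcomp() cannot handle NA in its default method.
  num_data <- num_data[stats::complete.cases(num_data), , drop = FALSE]

  out <- stats::prcomp(num_data, center = TRUE, scale. = TRUE)

  # Record the names of the non-numeric columns, as described in the text.
  attr(out, "non_numeric_cols") <- names(x)[!is_num]
  class(out) <- c("pca", class(out))
  out
}

res <- pca_sketch(iris)
attr(res, "non_numeric_cols")
class(res)
```

With `iris`, the only non-numeric column is the `Species` factor, which is the kind of grouping variable a plotting function could then use for colouring.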
## Examples
``` r
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
# \donttest{
if (require("dplyr")) {
  # calculate the resistance per group first
  resistance_data <- example_isolates %>%
    group_by(
      order = mo_order(mo), # group on anything, like order
      genus = mo_genus(mo)  # and genus as we do here
    ) %>%
    filter(n() >= 30) %>% # keep only groups with at least 30 results
    summarise_if(is.sir, resistance) # then get resistance of all drugs

  # now conduct PCA for certain antimicrobial drugs
  pca_result <- resistance_data %>%
    pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

  pca_result
  summary(pca_result)

  # old base R plotting method:
  biplot(pca_result)
}
#> Warning: There were 73 warnings in `summarise()`.
#> The first warning was:
#> In argument: `PEN = (function (..., minimum = 30, as_percent = FALSE,
#> only_all_tested = FALSE) ...`.
#> In group 5: `order = "Lactobacillales"` `genus = "Enterococcus"`.
#> Caused by warning:
#> ! Introducing NA: only 14 results available for PEN in group: order =
#> "Lactobacillales", genus = "Enterococcus" (`minimum` = 30).
#> Run `dplyr::last_dplyr_warnings()` to see the 72 remaining warnings.
#> Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
#> "TMP", and "TOB". Total observations available: 7.
#> Groups (n=4, named as 'order'):
#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
#>
# new ggplot2 plotting method using this package:
if (require("dplyr") && require("ggplot2")) {
  ggplot_pca(pca_result)
}
if (require("dplyr") && require("ggplot2")) {
  ggplot_pca(pca_result) +
    scale_colour_viridis_d() +
    labs(title = "Title here")
}
# }
```