Principal Component Analysis (for AMR) — pca
< div class = "row" >
< div class = "col-md-9 contents" >
< div class = "page-header" >
Principal Component Analysis (for AMR)
< div class = "ref-description" >
Performs a principal component analysis (PCA) on a data set. The groups and labels needed for later plotting are determined automatically, and the input is automatically filtered to suitable (i.e. non-empty and numeric) variables.
< / div >
pca(
x,
...,
retx = TRUE,
center = TRUE,
scale. = TRUE,
tol = NULL,
rank. = NULL
)
Arguments
< table class = "ref-arguments" >
< colgroup > < col class = "name" / > < col class = "desc" / > < / colgroup >
< tr >
< th > x< / th >
< td > < p > a < a href = 'https://rdrr.io/r/base/data.frame.html' > data.frame< / a > containing numeric columns< / p > < / td >
< / tr >
< tr >
< th > ...< / th >
< td > < p > columns of < code > x< / code > to be selected for PCA. Column names can be unquoted, since quasiquotation is supported.< / p > < / td >
< / tr >
< tr >
< th > retx< / th >
< td > < p > a logical value indicating whether the rotated variables
should be returned.< / p > < / td >
< / tr >
< tr >
< th > center< / th >
< td > < p > a logical value indicating whether the variables
should be shifted to be zero centered. Alternatively, a vector of
length equal to the number of columns of < code > x< / code > can be supplied.
The value is passed to < code > scale< / code > .< / p > < / td >
< / tr >
< tr >
< th > scale.< / th >
< td > < p > a logical value indicating whether the variables should
be scaled to have unit variance before the analysis takes
place. The default here is < code > TRUE< / code > , as shown in the usage above; in < code > prcomp()< / code > the default is < code > FALSE< / code > for consistency with S, but
in general scaling is advisable. Alternatively, a vector of length
equal to the number of columns of < code > x< / code > can be supplied. The
value is passed to < code > < a href = 'https://rdrr.io/r/base/scale.html' > scale< / a > < / code > .< / p > < / td >
< / tr >
< tr >
< th > tol< / th >
< td > < p > a value indicating the magnitude below which components
should be omitted. (Components are omitted if their
standard deviations are less than or equal to < code > tol< / code > times the
standard deviation of the first component.) With the default null
setting, no components are omitted (unless < code > rank.< / code > is specified as
less than < code > < a href = 'https://rdrr.io/r/base/Extremes.html' > min(dim(x))< / a > < / code > ). Other settings for < code > tol< / code > could be
< code > tol = 0< / code > or < code > tol = sqrt(.Machine$double.eps)< / code > , which
would omit essentially constant components.< / p > < / td >
< / tr >
< tr >
< th > rank.< / th >
< td > < p > optionally, a number specifying the maximal rank, i.e.,
maximal number of principal components to be used. Can be set as
alternative or in addition to < code > tol< / code > , useful notably when the
desired rank is considerably smaller than the dimensions of the matrix.< / p > < / td >
< / tr >
< / table >
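The effect of scale., tol and rank. can be explored directly with stats::prcomp(), the function that pca() relies on for the actual computation. The following is a minimal sketch, not taken from the AMR documentation: it assumes the built-in USArrests data set purely for illustration and uses base R only.

```r
# Base R only; USArrests ships with R's 'datasets' package (illustrative choice)
x <- USArrests  # 50 rows, 4 numeric columns

# Without scaling, the column with the largest variance dominates PC1;
# with scale. = TRUE every column is first rescaled to unit variance
unscaled <- prcomp(x, center = TRUE, scale. = FALSE)
scaled   <- prcomp(x, center = TRUE, scale. = TRUE)

# Name of the variable with the largest absolute loading on PC1 (unscaled)
dominant <- names(which.max(abs(unscaled$rotation[, 1])))

# rank. caps the number of returned components
top2 <- prcomp(x, scale. = TRUE, rank. = 2)
ncol(top2$x)

# tol drops components whose standard deviation is at most
# tol times the standard deviation of the first component
trimmed <- prcomp(x, scale. = TRUE, tol = 0.5)
ncol(trimmed$rotation)
```

Comparing summary(unscaled) with summary(scaled) shows how strongly the choice of scale. can change the proportion of variance attributed to each component.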
Value
< p > An object of classes pca and < a href = 'https://rdrr.io/r/stats/prcomp.html' > prcomp< / a > < / p >
Details
The pca() function takes a data.frame as input and performs the actual PCA with the R function prcomp().

< p > The result of the < code > pca()< / code > function is a < a href = 'https://rdrr.io/r/stats/prcomp.html' > prcomp< / a > object, with an additional attribute < code > non_numeric_cols< / code > which is a vector with the column names of all columns that do not contain numeric values. These are probably the groups and labels, and will be used by < code > < a href = 'ggplot_pca.html' > ggplot_pca()< / a > < / code > .< / p >
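The behaviour described above (keep only usable numeric columns, run prcomp(), remember the names of the non-numeric columns) can be sketched in a few lines of base R. This is a hypothetical illustration only and not the AMR source code: the function name pca_sketch, the handling of incomplete rows, and the use of the built-in iris data set are all assumptions made for this example.

```r
# Hypothetical helper, NOT the AMR implementation: it only mimics the
# behaviour documented above for illustration purposes
pca_sketch <- function(x) {
  stopifnot(is.data.frame(x))

  # which columns are numeric, and which of those are non-empty
  is_num <- vapply(x, is.numeric, logical(1))
  usable <- is_num & vapply(x, function(col) !all(is.na(col)), logical(1))

  # prcomp() cannot handle missing values, so drop incomplete rows
  # (an assumption of this sketch)
  num <- x[, usable, drop = FALSE]
  num <- num[stats::complete.cases(num), , drop = FALSE]

  out <- stats::prcomp(num, center = TRUE, scale. = TRUE)

  # store the names of the non-numeric columns (likely groups and labels)
  attr(out, "non_numeric_cols") <- names(x)[!is_num]
  class(out) <- c("pca", class(out))
  out
}

res <- pca_sketch(iris)
class(res)
attr(res, "non_numeric_cols")
```

Because the result still inherits from prcomp, generic functions such as summary() and biplot() continue to work on it, while a plotting function can read the extra attribute to find candidate grouping columns.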
Maturing lifecycle
< p > < img src = 'figures/lifecycle_maturing.svg' style = margin-bottom:5px / > < br / >
The < a href = 'lifecycle.html' > lifecycle< / a > of this function is < strong > maturing< / strong > . The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome < a href = 'https://github.com/msberends/AMR/issues' > to suggest changes at our repository< / a > or < a href = 'AMR.html' > write us an email (see section 'Contact Us')< / a > .< / p >
Examples
# `example_isolates` is a dataset available in the AMR package.
# See ?example_isolates.
if (FALSE) {
< span class = 'co' > # calculate the resistance per group first< / span >
< span class = 'fu' > < a href = 'https://rdrr.io/r/base/library.html' > library< / a > < / span > (< span class = 'no' > dplyr< / span > )
< span class = 'no' > resistance_data< / span > < span class = 'kw' > &lt;-< / span > < span class = 'no' > example_isolates< / span > < span class = 'kw' > %&gt;%< / span >
< span class = 'fu' > < a href = 'https://dplyr.tidyverse.org/reference/group_by.html' > group_by< / a > < / span > (< span class = 'kw' > order< / span > < span class = 'kw' > =< / span > < span class = 'fu' > < a href = 'mo_property.html' > mo_order< / a > < / span > (< span class = 'no' > mo< / span > ), < span class = 'co' > # group on anything, like order< / span >
< span class = 'kw' > genus< / span > < span class = 'kw' > =< / span > < span class = 'fu' > < a href = 'mo_property.html' > mo_genus< / a > < / span > (< span class = 'no' > mo< / span > )) < span class = 'kw' > %&gt;%< / span > < span class = 'co' > # and genus as we do here< / span >
< span class = 'fu' > < a href = 'https://dplyr.tidyverse.org/reference/summarise_all.html' > summarise_if< / a > < / span > (< span class = 'no' > is.rsi< / span > , < span class = 'no' > resistance< / span > ) < span class = 'co' > # then get resistance of all drugs< / span >
< span class = 'co' > # now conduct PCA for certain antimicrobial agents< / span >
< span class = 'no' > pca_result< / span > < span class = 'kw' > &lt;-< / span > < span class = 'no' > resistance_data< / span > < span class = 'kw' > %&gt;%< / span >
< span class = 'fu' > pca< / span > (< span class = 'no' > AMC< / span > , < span class = 'no' > CXM< / span > , < span class = 'no' > CTX< / span > , < span class = 'no' > CAZ< / span > , < span class = 'no' > GEN< / span > , < span class = 'no' > TOB< / span > , < span class = 'no' > TMP< / span > , < span class = 'no' > SXT< / span > )
< span class = 'no' > pca_result< / span >
< span class = 'fu' > < a href = 'https://rdrr.io/r/base/summary.html' > summary< / a > < / span > (< span class = 'no' > pca_result< / span > )
< span class = 'fu' > < a href = 'https://rdrr.io/r/stats/biplot.html' > biplot< / a > < / span > (< span class = 'no' > pca_result< / span > )
ggplot_pca(pca_result) # a new and convenient plot function
}< / pre >
