---
title: "How to conduct principal component analysis (PCA) for AMR"
author: "Matthijs S. Berends"
date: '`r format(Sys.Date(), "%d %B %Y")`'
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{How to conduct principal component analysis (PCA) for AMR}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.width = 7.5,
  fig.height = 4.5,
  dpi = 100
)
```

**NOTE: This page will be updated soon, as the `pca()` function is currently being developed.**

# Introduction

# Transforming

For PCA, we need to transform our AMR data first. This is what the `example_isolates` data set in this package looks like:

```{r, message = FALSE}
library(AMR)
library(dplyr)
glimpse(example_isolates)
```

Now we transform this into a data set that contains only resistance percentages per taxonomic order and genus:

```{r, warning = FALSE}
resistance_data <- example_isolates %>%
  group_by(order = mo_order(mo),       # group on anything, like order
           genus = mo_genus(mo)) %>%   # and genus as we do here
  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
  select(order, genus, AMC, CXM, CTX,
         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns

head(resistance_data)
```

# Perform principal component analysis

The new `pca()` function automatically keeps only the rows that contain numeric values in all selected variables, so we now only need to run:

```{r pca}
pca_result <- pca(resistance_data)
```

The result can be reviewed with the good old `summary()` function:

```{r}
summary(pca_result)
```

```{r, echo = FALSE}
proportion_of_variance <- summary(pca_result)$importance[2, ]
```

Good news. The first two components explain a total of `r cleaner::percentage(sum(proportion_of_variance[1:2]))` of the variance (see the PC1 and PC2 values of the *Proportion of Variance*).
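
The *Proportion of Variance* row can also be read programmatically. The following is a minimal sketch using only base R: it runs `prcomp()` on the built-in `USArrests` data set as a stand-in for AMR data, and the object name `pca_sketch` is purely illustrative. It assumes that the `summary()` of a `pca()` result has the same layout as that of a `prcomp()` result.

```r
# Stand-in data: the built-in USArrests data set (an assumption, not AMR data)
pca_sketch <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Row 2 of the importance matrix holds the "Proportion of Variance"
proportion_of_variance <- summary(pca_sketch)$importance[2, ]

# The same proportions computed by hand from the component standard deviations
manual_proportion <- pca_sketch$sdev^2 / sum(pca_sketch$sdev^2)

# Share of the total variance explained by PC1 and PC2 together
explained_by_first_two <- sum(proportion_of_variance[1:2])
explained_by_first_two
```

The manual calculation shows where the numbers come from: each component's variance is its squared standard deviation, divided by the total variance of all components.
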
We can create a so-called biplot with the base R `biplot()` function to see which antimicrobial resistances explain the differences between microorganisms.

# Plotting the results

```{r}
biplot(pca_result)
```

But this plot does not show what the points represent. Perhaps this works better with our new `ggplot_pca()` function, which automatically adds the right labels and even groups:

```{r}
ggplot_pca(pca_result)
```

You can also print an ellipse per group, and edit the appearance:

```{r}
ggplot_pca(pca_result, ellipse = TRUE) +
  ggplot2::labs(title = "An AMR/PCA biplot!")
```
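
To see what a biplot is built from, it can help to look at the underlying numbers. The sketch below again uses base R `prcomp()` on the built-in `USArrests` data set as a stand-in (an assumption, not AMR data; `pca_sketch`, `scores` and `loadings` are illustrative names). Up to scaling, a biplot draws the scores as points and the loadings as arrows.

```r
# Stand-in data: the built-in USArrests data set (an assumption, not AMR data)
pca_sketch <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores: one row per observation, plotted as the points of a biplot
scores <- pca_sketch$x[, 1:2]

# Loadings: one row per variable, plotted as the arrows of a biplot
loadings <- pca_sketch$rotation[, 1:2]

# Variables ordered by the absolute size of their loading on PC1
sort(abs(loadings[, "PC1"]), decreasing = TRUE)

head(scores)
```

In a resistance data set, the variables would be the antimicrobial drugs, so the loadings indicate which drugs contribute most to each component.
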