---
title: "How to conduct principal component analysis (PCA) for AMR"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{How to conduct principal component analysis (PCA) for AMR}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.width = 7.5,
  fig.height = 4.5,
  dpi = 100
)
```

**NOTE: This page will be updated soon, as the pca() function is currently being developed.**

# Introduction

# Transforming

For PCA, we need to transform our AMR data first. This is what the `example_isolates` data set in this package looks like:

```{r, message = FALSE}
library(AMR)
library(dplyr)
glimpse(example_isolates)
```

Next, we transform this into a data set that contains only the resistance percentages per taxonomic order and genus:

```{r, warning = FALSE}
resistance_data <- example_isolates %>%
  group_by(order = mo_order(mo),       # group on anything, like order
           genus = mo_genus(mo)) %>%   #  and genus as we do here
  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
  select(order, genus, AMC, CXM, CTX,
         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns

head(resistance_data)
```
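
To make the summarised values concrete: each cell is the share of resistant isolates within one group. The base R sketch below computes this on a small made-up data set. The data frame `toy` and the helper `prop_resistant()` are hypothetical and not part of the AMR package, and the helper ignores details such as the minimum number of isolates that `resistance()` requires.

```r
# Made-up example data: one row per isolate, "R" = resistant,
# "S" = susceptible, NA = not tested
toy <- data.frame(
  genus = c("Escherichia", "Escherichia", "Escherichia", "Escherichia",
            "Klebsiella", "Klebsiella", "Klebsiella"),
  AMC = c("R", "S", "S", NA, "R", "R", "S"),
  GEN = c("S", "S", "R", "S", "S", NA, "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of "R" among tested (non-missing) results
prop_resistant <- function(x) {
  tested <- x[!is.na(x)]
  if (length(tested) == 0) return(NA_real_)
  mean(tested == "R")
}

# One row per genus, one column per drug
toy_resistance <- aggregate(toy[, c("AMC", "GEN")],
                            by = list(genus = toy$genus),
                            FUN = prop_resistant)
toy_resistance
```

Untested isolates are dropped from the denominator here, so a drug that was rarely tested in a group yields a proportion based on only a few results.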

# Perform principal component analysis

The new `pca()` function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:

```{r pca}
pca_result <- pca(resistance_data)
```
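
As a rough illustration of what this step involves, the base R sketch below keeps the numeric columns, drops incomplete rows, and runs `stats::prcomp()` on the remainder. It assumes that `pca()` behaves like a thin wrapper around `prcomp()` with centring and scaling; that is an assumption made for illustration, not a description of the package internals. The data frame `toy` holds random numbers, not real resistance data.

```r
set.seed(42)

# Synthetic stand-in for resistance_data: random proportions, not real data
toy <- data.frame(
  genus = paste0("genus_", 1:8),
  AMC = runif(8),
  GEN = runif(8),
  TMP = runif(8)
)
toy$GEN[3] <- NA  # a group without a usable value for one drug

# Keep only the numeric columns and the rows that are complete in all of them
is_num <- vapply(toy, is.numeric, logical(1))
complete <- complete.cases(toy[, is_num])
toy_num <- toy[complete, is_num]

# Centre and scale each variable, then compute the principal components
toy_pca <- prcomp(toy_num, center = TRUE, scale. = TRUE)

dim(toy_pca$x)  # one row per retained group, one column per component
```

Scaling matters when the variables have very different spreads; without it, the drug with the largest variance would dominate the first component.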

The result can be reviewed with the good old `summary()` function:

```{r}
summary(pca_result)
```
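
The *Proportion of Variance* row of such a summary can also be computed by hand from the standard deviations of the components. A minimal sketch on synthetic data, assuming the object returned by `pca()` carries an `sdev` element like a regular `prcomp` result (the object `toy_pca` is made up for this example):

```r
set.seed(1)

# Synthetic numeric data, used only to have something to decompose
toy_pca <- prcomp(matrix(rnorm(60), ncol = 3), scale. = TRUE)

# Variance explained by each component is its squared standard deviation,
# divided by the total variance over all components
explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
names(explained) <- colnames(toy_pca$x)

explained          # proportion of variance per component
cumsum(explained)  # cumulative proportion
```

Up to rounding, these values should match the second and third rows of `summary(toy_pca)$importance`.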

```{r, echo = FALSE}
proportion_of_variance <- summary(pca_result)$importance[2, ]
```

Good news. The first two components explain a total of `r cleaner::percentage(sum(proportion_of_variance[1:2]))` of the variance (see the PC1 and PC2 values of the *Proportion of Variance* row). We can create a so-called biplot with the base R `biplot()` function to see which drug resistances explain the differences between microorganisms.

# Plotting the results

```{r}
biplot(pca_result)
```
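
A biplot combines two pieces of a PCA result: the scores (one point per observation) and the loadings (one arrow per variable). The sketch below extracts both from a plain `prcomp` object built on synthetic data. It assumes the `pca()` result exposes the same `x` and `rotation` elements, and it shows the raw values before any rescaling that a plotting function may apply.

```r
set.seed(7)

# Synthetic data with named rows and columns (not real resistance data)
toy <- matrix(runif(30), ncol = 3,
              dimnames = list(paste0("group_", 1:10), c("AMC", "GEN", "TMP")))
toy_pca <- prcomp(toy, scale. = TRUE)

# Points in a biplot: the observations expressed in component space
scores <- toy_pca$x[, 1:2]

# Arrows in a biplot: how strongly each variable loads on PC1 and PC2
loadings <- toy_pca$rotation[, 1:2]

head(scores, 3)
loadings
```

Variables whose arrows point in a similar direction tend to vary together across the observations.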

But we can't tell what the points represent. This works better with our new `ggplot_pca()` function, which automatically adds the right labels and even groups:

```{r}
ggplot_pca(pca_result)
```

You can also draw an ellipse per group and edit the appearance:

```{r}
ggplot_pca(pca_result, ellipse = TRUE) +
  ggplot2::labs(title = "An AMR/PCA biplot!")
```