(v1.0.1.9000) first PCA implementation

2025-08-24 13:12:09 +02:00 · 2020-03-07 21:48:21 +01:00
parent f444c24ed3
commit fa0d9c58d9
40 changed files with 2224 additions and 172 deletions
--- a/vignettes/EUCAST.Rmd
+++ b/vignettes/EUCAST.Rmd
@@ -71,8 +71,8 @@ data
 knitr::kable(data, align = "lccccccc")
 ```
 ```{r, eval = FALSE}
-eucast_rules(data, info = FALSE)
+eucast_rules(data)
 ```
 ```{r, echo = FALSE, message = FALSE}
-knitr::kable(eucast_rules(data, info = FALSE), align = "lccccccc")
+knitr::kable(eucast_rules(data), align = "lccccccc")
 ```
--- a/vignettes/PCA.Rmd
+++ b/vignettes/PCA.Rmd
@@ -0,0 +1,91 @@
+---
+title: "How to conduct principal component analysis (PCA) for AMR"
+author: "Matthijs S. Berends"
+date: '`r format(Sys.Date(), "%d %B %Y")`'
+output: 
+  rmarkdown::html_vignette:
+    toc: true
+    toc_depth: 3
+vignette: >
+  %\VignetteIndexEntry{How to conduct principal component analysis (PCA) for AMR}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+editor_options: 
+  chunk_output_type: console
+---
+
+```{r setup, include = FALSE, results = 'markup'}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#",
+  fig.width = 7.5,
+  fig.height = 4.5,
+  dpi = 100
+)
+```
+
+**NOTE: This page will be updated soon, as the `pca()` function is still under development.**
+
+# Introduction
+
+# Transforming
+
+For PCA, we need to transform our AMR data first. This is what the `example_isolates` data set in this package looks like:
+
+```{r, message = FALSE}
+library(AMR)
+library(dplyr)
+glimpse(example_isolates)
+```
+
+Now we transform this into a data set that contains only resistance percentages per taxonomic order and genus:
+
+```{r, warning = FALSE}
+resistance_data <- example_isolates %>% 
+  group_by(order = mo_order(mo),       # group on anything, like order
+           genus = mo_genus(mo)) %>%   #  and genus as we do here
+  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
+  select(order, genus, AMC, CXM, CTX, 
+         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns
+
+head(resistance_data)
+```
+
+# Perform principal component analysis
+
+The new `pca()` function automatically filters on rows that contain numeric values in all selected variables, so all we need to do now is:
+
+```{r pca}
+pca_result <- pca(resistance_data)
+```
+
+The result can be reviewed with the good old `summary()` function:
+
+```{r}
+summary(pca_result)
+```
+
+```{r, echo = FALSE}
+proportion_of_variance <- summary(pca_result)$importance[2, ]
+```
+
+Good news. The first two components explain a total of `r cleaner::percentage(sum(proportion_of_variance[1:2]))` of the variance (see the PC1 and PC2 values of the *Proportion of Variance*). We can create a so-called biplot with the base R `biplot()` function to see which drugs' resistance values explain the differences between the microorganisms.
+
+# Plotting the results
+
+```{r}
+biplot(pca_result)
+```
+
+However, this plot does not show what the points represent. This may work better with the new `ggplot_pca()` function, which automatically adds the right labels and even groups:
+
+```{r}
+ggplot_pca(pca_result)
+```
+
+You can also draw an ellipse per group and edit the appearance:
+
+```{r}
+ggplot_pca(pca_result, ellipse = TRUE) +
+  ggplot2::labs(title = "An AMR/PCA biplot!")
+```
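The *Proportion of Variance* that `summary()` reports can be reproduced by hand, which makes the statement that the first two components explain a given share of the variance easier to interpret. Below is a minimal sketch in base R. It assumes that `pca()` returns an object like the one from `stats::prcomp()`, which the use of `summary(pca_result)$importance` and `biplot()` in the vignette suggests but does not confirm. It mimics the row-filtering step with `complete.cases()` instead of calling `pca()`, and the data frame `resistance_toy` and its values are hypothetical.

```r
# Minimal sketch in base R (stats::prcomp); NOT the AMR pca() implementation.
# Assumption: pca() behaves like prcomp() on the numeric columns after
# dropping rows with missing values.

# Hypothetical resistance proportions, for illustration only
resistance_toy <- data.frame(
  genus = c("A", "B", "C", "D", "E", "F"),
  AMC   = c(0.10, 0.25, 0.40, NA,   0.05, 0.30),
  CXM   = c(0.12, 0.20, 0.45, 0.50, 0.08, 0.28),
  GEN   = c(0.02, 0.10, 0.15, 0.20, 0.01, 0.05),
  TOB   = c(0.03, 0.09, 0.18, 0.22, 0.02, 0.07)
)

# Keep only numeric columns, then only rows without missing values
numeric_cols <- vapply(resistance_toy, is.numeric, logical(1))
complete     <- resistance_toy[complete.cases(resistance_toy[, numeric_cols]), ]

# PCA on centred and scaled variables (a choice made for this sketch)
pca_toy <- prcomp(complete[, numeric_cols], center = TRUE, scale. = TRUE)

# Proportion of Variance: each component's variance divided by the total
prop_var <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)

# summary() stores the same values, rounded to 5 decimals, in row 2
summary(pca_toy)$importance[2, ]
round(prop_var, 5)
```

Each element of `prop_var` is one component's share of the total variance, so `sum(prop_var[1:2])` corresponds to the quantity the vignette passes to `cleaner::percentage()`.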
--- a/vignettes/benchmarks.Rmd
+++ b/vignettes/benchmarks.Rmd
@@ -112,9 +112,9 @@ In the figure below, we compare *Escherichia coli* (which is very common) with *
 ```{r, echo = FALSE, fig.width=12}
 par(mar = c(5, 16, 4, 2))
 boxplot(microbenchmark(
-  as.mo("M. semesiae"),
-  as.mo("P. brevis"),
-  as.mo("E. coli"),
+  as.mo("Meth. semesiae"),
+  as.mo("Prev. brevis"),
+  as.mo("Esc. coli"),
  times = 10),
        horizontal = TRUE, las = 1, unit = "s", log = TRUE,
        xlab = "", ylab = "Time in seconds (log)",