mirror of
https://github.com/msberends/AMR.git
synced 2025-07-09 16:22:05 +02:00
(v1.0.1.9000) first PCA implementation
@@ -71,8 +71,8 @@ data
 knitr::kable(data, align = "lccccccc")
 ```
 ```{r, eval = FALSE}
-eucast_rules(data, info = FALSE)
+eucast_rules(data)
 ```
 ```{r, echo = FALSE, message = FALSE}
-knitr::kable(eucast_rules(data, info = FALSE), align = "lccccccc")
+knitr::kable(eucast_rules(data), align = "lccccccc")
 ```
91
vignettes/PCA.Rmd
Executable file
@@ -0,0 +1,91 @@
---
title: "How to conduct principal component analysis (PCA) for AMR"
author: "Matthijs S. Berends"
date: '`r format(Sys.Date(), "%d %B %Y")`'
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{How to conduct principal component analysis (PCA) for AMR}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.width = 7.5,
  fig.height = 4.5,
  dpi = 100
)
```

**NOTE: This page will be updated soon, as the `pca()` function is currently being developed.**

# Introduction

# Transforming

For PCA, we need to transform our AMR data first. This is what the `example_isolates` data set in this package looks like:

```{r, message = FALSE}
library(AMR)
library(dplyr)
glimpse(example_isolates)
```

Now we transform this into a data set that contains only the resistance percentages per taxonomic order and genus:

```{r, warning = FALSE}
resistance_data <- example_isolates %>%
  group_by(order = mo_order(mo),       # group on anything, like order
           genus = mo_genus(mo)) %>%   # and genus as we do here
  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
  select(order, genus, AMC, CXM, CTX,
         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns

head(resistance_data)
```
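
To make the transformation above more concrete, here is a minimal base R sketch of the same idea: the share of resistant isolates per group and per drug. It is only an illustration on made-up data. The `isolates` data frame and the `prop_resistant()` helper are hypothetical names, and the real `resistance()` function of this package may apply additional rules that are not reproduced here.

```r
# Toy data (hypothetical, not from the AMR package): one row per isolate,
# with interpretations "R" (resistant) or "S" (susceptible); NA means not tested
isolates <- data.frame(
  genus = c("Escherichia", "Escherichia", "Escherichia", "Klebsiella", "Klebsiella"),
  AMC   = c("R", "S", "S", "R", "R"),
  GEN   = c("S", "S", NA, "R", "S"),
  stringsAsFactors = FALSE
)

# Proportion of resistant isolates among those that were tested (non-missing)
prop_resistant <- function(x) {
  mean(x == "R", na.rm = TRUE)
}

# One row per genus, one column per drug
toy_resistance <- aggregate(isolates[c("AMC", "GEN")],
                            by = list(genus = isolates$genus),
                            FUN = prop_resistant)
toy_resistance
```

The result has the same shape as `resistance_data`: grouping columns first, followed by one numeric column per drug, which is the layout a PCA needs.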

# Perform principal component analysis

The new `pca()` function automatically keeps only the rows that contain numeric values in all selected variables, so all we need to do is:

```{r pca}
pca_result <- pca(resistance_data)
```
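
For readers who want to see what such row filtering amounts to, the following is a rough base R sketch on made-up data. It is not the implementation of `pca()`. The `toy` data frame is hypothetical, and centring and scaling with `stats::prcomp()` is a choice made for this sketch.

```r
# Toy data (hypothetical): two grouping columns and three numeric columns
toy <- data.frame(
  order = c("A", "A", "B", "B", "C"),
  genus = c("g1", "g2", "g3", "g4", "g5"),
  AMC   = c(0.10, 0.25, 0.60, NA,   0.40),
  GEN   = c(0.05, 0.10, 0.30, 0.20, 0.15),
  TMP   = c(0.30, 0.35, 0.70, 0.50, 0.45)
)

# Keep only the numeric columns, then only the rows without missing values
numeric_cols <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]
complete     <- stats::complete.cases(numeric_cols)

# Centre and scale so that every drug contributes on the same scale
toy_pca <- stats::prcomp(numeric_cols[complete, ], center = TRUE, scale. = TRUE)

# The row with a missing AMC value was dropped before the analysis
nrow(toy_pca$x)
```

Dropping incomplete rows matters because `stats::prcomp()` cannot handle missing values, and groups with too few tested isolates can easily end up with them.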

The result can be reviewed with the good old `summary()` function:

```{r}
summary(pca_result)
```
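
The *Proportion of Variance* row of such a summary is the variance of each component divided by the total variance. A minimal base R sketch shows the calculation. It uses `stats::prcomp()` on the built-in `USArrests` data instead of `pca()` on AMR data, so the numbers are unrelated to the result above, and the object names are chosen for illustration only.

```r
# Base R example on the built-in USArrests data (not AMR data)
example_pca <- stats::prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each component is its squared standard deviation
component_variance <- example_pca$sdev^2
proportion <- component_variance / sum(component_variance)

# The same numbers as the "Proportion of Variance" row of summary(), up to rounding
reported <- summary(example_pca)$importance["Proportion of Variance", ]

# Share of the variance captured by the first two components together
sum(proportion[1:2])
```

When the first two components capture most of the variance, a two-dimensional plot of those components loses little information.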

```{r, echo = FALSE}
proportion_of_variance <- summary(pca_result)$importance[2, ]
```

Good news. The first two components explain a total of `r cleaner::percentage(sum(proportion_of_variance[1:2]))` of the variance (see the PC1 and PC2 values of the *Proportion of Variance*). We can create a so-called biplot with the base R `biplot()` function to see which drugs' resistance explains the differences between microorganisms.

# Plotting the results

```{r}
biplot(pca_result)
```

But we can't see what the points represent. Perhaps this works better with the new `ggplot_pca()` function, which automatically adds the right labels and even groups:

```{r}
ggplot_pca(pca_result)
```

You can also draw an ellipse per group and edit the appearance:

```{r}
ggplot_pca(pca_result, ellipse = TRUE) +
  ggplot2::labs(title = "An AMR/PCA biplot!")
```
@@ -112,9 +112,9 @@ In the figure below, we compare *Escherichia coli* (which is very common) with *
 ```{r, echo = FALSE, fig.width=12}
 par(mar = c(5, 16, 4, 2))
 boxplot(microbenchmark(
-  as.mo("M. semesiae"),
-  as.mo("P. brevis"),
-  as.mo("E. coli"),
+  as.mo("Meth. semesiae"),
+  as.mo("Prev. brevis"),
+  as.mo("Esc. coli"),
   times = 10),
   horizontal = TRUE, las = 1, unit = "s", log = TRUE,
   xlab = "", ylab = "Time in seconds (log)",