Import of data

This tutorial assumes you already imported the WHONET data with e.g. the readxl package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.

An example syntax could look like this:

library(readxl)
data <- read_excel(path = "path/to/your/file.xlsx")

This package comes with an example data set WHONET. We will use it for this analysis.

Preparation

First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.

library(dplyr)   # part of tidyverse
library(ggplot2) # part of tidyverse
library(AMR)     # this package

We will have to transform some variables to simplify and automate the analysis:

  • Microorganisms should be transformed to our own microorganism IDs (called an mo) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo(). This function also recognises almost all WHONET abbreviations of microorganisms.
  • Antimicrobial results or interpretations have to be clean and valid. In other words, they should only contain the values "S", "I" or "R". That is exactly what the as.rsi() function is for.
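
The two transformations above can be sketched as follows. This is a minimal sketch, not necessarily the exact code behind this tutorial: it assumes the example WHONET data set has a column named Organism and that its antibiotic columns span AMP_ND10 to CIP_EE, so adjust both to your own export.

```r
library(dplyr)
library(AMR)

# WHONET is the example data set shipped with the AMR package
data <- WHONET %>%
  # look up a microorganism ID for each value in the Organism column
  # (assumption: the organism column is called Organism)
  mutate(mo = as.mo(Organism)) %>%
  # convert the antibiotic columns to the rsi class
  # (assumption: they span AMP_ND10 to CIP_EE in the example data)
  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
```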

No errors or warnings, so all values were transformed successfully.

We also created a package dedicated to data cleaning and checking, called the cleaner package. It gets automatically installed with the AMR package. For its freq() function to create frequency tables, you don’t even need to load it yourself as it is available through the AMR package as well.

So let’s check our data, with a couple of frequency tables:
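
Calls along the following lines could produce such tables. This is a sketch under assumptions: example_data is a hypothetical stand-alone object built here so the snippet runs on its own, and AMC_ND2 is picked only as an example antibiotic column, since the text does not say which column the second table shows.

```r
library(dplyr)
library(AMR)

# Hypothetical stand-alone object, so the sketch does not depend on
# earlier steps; column names are assumptions about the WHONET data
example_data <- WHONET %>%
  mutate(mo = as.mo(Organism),
         AMC_ND2 = as.rsi(AMC_ND2))

# microorganism names, limited to the ten most frequent entries
example_data %>% freq(mo_name(mo), nmax = 10)

# S/I/R results of one antibiotic column (assumed example column)
example_data %>% freq(AMC_ND2)
```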

Frequency table

Class: character
Length: 500 (of which NA: 0 = 0%)
Unique: 39

Shortest: 11
Longest: 40

    Item                                      Count  Percent  Cum. Count  Cum. Percent
 1  Escherichia coli                            245    49.0%         245         49.0%
 2  Coagulase-negative Staphylococcus (CoNS)     74    14.8%         319         63.8%
 3  Staphylococcus epidermidis                   38     7.6%         357         71.4%
 4  Streptococcus pneumoniae                     31     6.2%         388         77.6%
 5  Staphylococcus hominis                       21     4.2%         409         81.8%
 6  Proteus mirabilis                             9     1.8%         418         83.6%
 7  Enterococcus faecium                          8     1.6%         426         85.2%
 8  Staphylococcus capitis                        8     1.6%         434         86.8%
 9  Enterobacter cloacae                          5     1.0%         439         87.8%
10  Enterococcus columbae                         4     0.8%         443         88.6%

(omitted 29 entries, n = 57 [11.40%])

Frequency table

Class: factor > ordered > rsi (numeric)
Length: 500 (of which NA: 19 = 3.8%)
Levels: 3: S < I < R
Unique: 3

%SI: 78.6%

    Item  Count  Percent  Cum. Count  Cum. Percent
 1  S       356   74.01%         356        74.01%
 2  R       103   21.41%         459        95.43%
 3  I        22    4.57%         481       100.00%
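
The percentages in this second table are taken over the non-missing values only, which can be checked by hand. A minimal base R sketch, using only the counts printed above (the variable names are illustrative):

```r
# Counts copied from the frequency table above
counts <- c(S = 356, R = 103, I = 22)
n_total <- 500              # length of the column, including NA
n_valid <- sum(counts)      # non-missing results
n_na <- n_total - n_valid   # missing results

# Percentages use the non-missing total as the denominator
percent <- round(100 * counts / n_valid, 2)
cum_percent <- round(100 * cumsum(counts) / n_valid, 2)

# %SI is the share of results that are either S or I
pct_si <- round(100 * (counts[["S"]] + counts[["I"]]) / n_valid, 1)

print(data.frame(count = counts, percent, cum_percent))
print(pct_si)
```

With 481 valid results out of 500, the 19 missing values account for the 3.8% NA reported in the table header, and S plus I together give the reported %SI of 78.6%.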

A first glimpse at results

An easy ggplot will already give a lot of information, using the included ggplot_rsi() function:

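data %>%
  # keep the country and four example antibiotic columns
  group_by(Country) %>%
  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
  # plot the S/I/R proportions per antibiotic, one facet per country
  ggplot_rsi(translate_ab = "ab", facet = "Country", datalabels = FALSE)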