For PCA, we need to transform our AMR data first. This is what the example_isolates
data set in this package looks like:
-library(AMR) -library(dplyr) -glimpse(example_isolates) +library(AMR) +library(dplyr) +glimpse(example_isolates) # Rows: 2,000 # Columns: 49 # $ date <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002… @@ -263,18 +264,17 @@ # $ CHL <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… # $ COL <rsi> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, … # $ MUP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… -# $ RIF <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R… -
Now to transform this to a data set with only resistance percentages per taxonomic order and genus:
-resistance_data <- example_isolates %>% - group_by(order = mo_order(mo), # group on anything, like order - genus = mo_genus(mo)) %>% # and genus as we do here - summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs - select(order, genus, AMC, CXM, CTX, - CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns +resistance_data <- example_isolates %>% + group_by(order = mo_order(mo), # group on anything, like order + genus = mo_genus(mo)) %>% # and genus as we do here + summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs + select(order, genus, AMC, CXM, CTX, + CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns -head(resistance_data) +head(resistance_data) # # A tibble: 6 x 10 # # Groups: order [2] # order genus AMC CXM CTX CAZ GEN TOB TMP SXT @@ -284,46 +284,40 @@ # 3 Actinomycetales Cutibacterium NA NA NA NA NA NA NA NA # 4 Actinomycetales Dermabacter NA NA NA NA NA NA NA NA # 5 Actinomycetales Micrococcus NA NA NA NA NA NA NA NA -# 6 Actinomycetales Rothia NA NA NA NA NA NA NA NA -
The new pca()
function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:
-pca_result <- pca(resistance_data) +pca_result <- pca(resistance_data) # NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT. -# Total observations available: 7. -
The result can be reviewed with the good old summary()
function:
-summary(pca_result) +summary(pca_result) # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Standard deviation 2.154 1.6807 0.61365 0.33902 0.20757 0.03136 1.733e-16 # Proportion of Variance 0.580 0.3531 0.04707 0.01437 0.00539 0.00012 0.000e+00 -# Cumulative Proportion 0.580 0.9331 0.98012 0.99449 0.99988 1.00000 1.000e+00 -
Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance. We can create a so-called biplot with the base R biplot()
function, to see which antimicrobial resistance per drug explain the difference per microorganism.
But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca()
function, that automatically adds the right labels and even groups:
-ggplot_pca(pca_result) -
You can also print an ellipse per group, and edit the appearance:
-ggplot_pca(pca_result, ellipse = TRUE) + - ggplot2::labs(title = "An AMR/PCA biplot!") -
vignettes/SPSS.Rmd
SPSS.Rmd
To demonstrate the first point:
# not all values are valid MIC values: -as.mic(0.125) +as.mic(0.125) # Class <mic> # [1] 0.125 -as.mic("testvalue") +as.mic("testvalue") # Class <mic> # [1] <NA> # the Gram stain is avaiable for all bacteria: -mo_gramstain("E. coli") +mo_gramstain("E. coli") # [1] "Gram-negative" # Klebsiella is intrinsic resistant to amoxicllin, according to EUCAST: -klebsiella_test <- data.frame(mo = "klebsiella", - amox = "S", - stringsAsFactors = FALSE) -klebsiella_test # (our original data) +klebsiella_test <- data.frame(mo = "klebsiella", + amox = "S", + stringsAsFactors = FALSE) +klebsiella_test # (our original data) # mo amox # 1 klebsiella S -eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules) +eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules) # mo amox # 1 klebsiella R # hundreds of trade names can be translated to a name, trade name or an ATC code: -ab_name("floxapen") +ab_name("floxapen") # [1] "Flucloxacillin" -ab_tradenames("floxapen") +ab_tradenames("floxapen") # [1] "floxacillin" "floxapen" "floxapen sodium salt" # [4] "fluclox" "flucloxacilina" "flucloxacillin" # [7] "flucloxacilline" "flucloxacillinum" "fluorochloroxacillin" -ab_atc("floxapen") -# [1] "J01CF05" -
If additional packages are needed, RStudio will ask you if they should be installed on beforehand.
In the the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:
-If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
The difference is this:
-SPSS_data +SPSS_data # # A tibble: 4,203 x 4 # v001 sex status statusage # <dbl> <dbl+lbl> <dbl+lbl> <dbl> @@ -313,7 +313,7 @@ # 10 10018 0 1 66.6 # # … with 4,193 more rows -as_factor(SPSS_data) +as_factor(SPSS_data) # # A tibble: 4,203 x 4 # v001 sex status statusage # <dbl> <fct> <fct> <dbl> @@ -327,8 +327,7 @@ # 8 10011 Male alive 73.1 # 9 10017 Male alive 56.7 # 10 10018 Female alive 66.6 -# # … with 4,193 more rows -
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
# download and install the latest version: -install.packages("haven") +install.packages("haven") # load the package you just installed: -library(haven) -
You can now import files as follows:
To read files from SPSS into R:
# read any SPSS file based on file extension (best way): -read_spss(file = "path/to/file") +read_spss(file = "path/to/file") # read .sav or .zsav file: -read_sav(file = "path/to/file") +read_sav(file = "path/to/file") # read .por file: -read_por(file = "path/to/file") -
Do not forget about as_factor()
, as mentioned above.
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
# save as .sav file: -write_sav(data = yourdata, path = "path/to/file") +write_sav(data = yourdata, path = "path/to/file") # save as compressed .zsav file: -write_sav(data = yourdata, path = "path/to/file", compress = TRUE) -
To read files from SAS into R:
# read .sas7bdat + .sas7bcat files: -read_sas(data_file = "path/to/file", catalog_file = NULL) +read_sas(data_file = "path/to/file", catalog_file = NULL) # read SAS transport files (version 5 and version 8): -read_xpt(file = "path/to/file") -
To export your R objects to the SAS file format:
# save as regular SAS file: -write_sas(data = yourdata, path = "path/to/file") +write_sas(data = yourdata, path = "path/to/file") # the SAS transport format is an open format # (required for submission of the data to the FDA) -write_xpt(data = yourdata, path = "path/to/file", version = 8) -
To read files from Stata into R:
# read .dta file: -read_stata(file = "/path/to/file") +read_stata(file = "/path/to/file") # works exactly the same: -read_dta(file = "/path/to/file") -
To export your R objects to the Stata file format:
# save as .dta file, Stata version 14: # (supports Stata v8 until v15 at the time of writing) -write_dta(data = yourdata, path = "/path/to/file", version = 14) -
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
-library(readxl) -data <- read_excel(path = "path/to/your/file.xlsx") -
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you did not yet did this. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
-library(dplyr) # part of tidyverse -library(ggplot2) # part of tidyverse -library(AMR) # this package -library(cleaner) # to create frequency tables -
We will have to transform some variables to simplify and automate the analysis:
mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the tranformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms.# transform variables -data <- WHONET %>% +data <- WHONET %>% # get microbial ID based on given organism - mutate(mo = as.mo(Organism)) %>% + mutate(mo = as.mo(Organism)) %>% # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class - mutate_at(vars(AMP_ND10:CIP_EE), as.rsi) -
No errors or warnings, so all values are transformed succesfully.
We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
# our newly created `mo` variable, put in the mo_name() function -data %>% freq(mo_name(mo), nmax = 10) -
Frequency table
Class: character
Length: 500
@@ -344,15 +341,16 @@ Longest: 40
# our transformed antibiotic columns # amoxicillin/clavulanic acid (J01CR02) as an example -data %>% freq(AMC_ND2) -
Frequency table
Class: factor > ordered > rsi (numeric)
Length: 500
Levels: 3: S < I < R
Available: 481 (96.2%, NA: 19 = 3.8%)
Unique: 3
%SI: 78.59%
+Drug: Amoxicillin/clavulanic acid (AMC, J01CR02)
+Drug group: Beta-lactams/penicillins
+%SI: 78.59%
@@ -395,11 +393,10 @@ Unique: 3 A first glimpse at results |
---|
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:
-example_isolates %>% - filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>% - resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% - ggplot_rsi_predict() -# NOTE: Using column `date` as input for `col_date`. -
This seems more likely, doesn’t it?
The model itself is also available from the object, as an attribute
:
-model <- attributes(predict_TZP)$model +model <- attributes(predict_TZP)$model -summary(model)$family +summary(model)$family # # Family: binomial # Link function: logit -summary(model)$coefficients +summary(model)$coefficients # Estimate Std. Error z value Pr(>|z|) # (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05 -# year 0.09883005 0.02295317 4.305725 1.664395e-05 -
NEWS.md
- is_gram_negative()
and is_gram_positive()
as wrappers around mo_gramstain()
. They always return TRUE
or FALSE
, thus always return FALSE
for species outside the taxonomic kingdom of Bacteria.%not_like%
and %like_perl%
as wrappers around %like%
.%not_like%
and %not_like_case%
as wrappers around %like%
and %like_case%
. The RStudio addin to insert the text " %like% " as provided in this package now iterates over all like variants. So if you have defined the keyboard shortcut Ctrl/Cmd + L to this addin, it will first insert %like%
and by pressing it again it will be replaced with %not_like%
, etc.p_symbol()
that not really fits the scope of this package. It will be removed in a future version. See here for the source code to preserve it.as.rsi()
on a data.framereference_df
in as.mo()
and mo_*()
functions that contain old microbial codes (from previous package versions)Making this package independent of especially the tidyverse (e.g. packages dplyr
and tidyr
) tremendously increases sustainability on the long term, since tidyverse functions change quite often. Good for users, but hard for package maintainers. Most of our functions are replaced with versions that only rely on base R, which keeps this package fully functional for many years to come, without requiring a lot of maintenance to keep up with other packages anymore. Another upside it that this package can now be used with all versions of R since R-3.0.0 (April 2013). Our package is being used in settings where the resources are very limited. Fewer dependencies on newer software is helpful for such settings.
Negative effects of this change are:
freq()
that was borrowed from the cleaner
package was removed. Use cleaner::freq()
, or run library("cleaner")
before you use freq()
.freq()
that was borrowed from the cleaner
package was removed. Use cleaner::freq()
, or run library("cleaner")
before you use freq()
.mo
or rsi
in a tibble will no longer be in colour and printing rsi
in a tibble will show the class <ord>
, not <rsi>
anymore. This is purely a visual effect.mo_*
family (like mo_name()
and mo_gramstain()
) are noticeably slower when running on hundreds of thousands of rows.mo
and ab
now both also inherit class character
, to support any data transformation. This change invalidates code that checks for class length == 1.This is important, because a value like "testvalue"
could never be understood by e.g. mo_name()
, although the class would suggest a valid microbial code.
Function freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).
Function freq()
has moved to a new package, clean
(CRAN link), since creating frequency tables actually does not fit the scope of this package. The freq()
function still works, since it is re-exported from the clean
package (which will be installed automatically upon updating this AMR
package).
Renamed data set septic_patients
to example_isolates
age()
function gained a new parameter exact
to determine ages with decimalsguess_mo()
, guess_atc()
, EUCAST_rules()
, interpretive_reading()
, rsi()
freq()
):
+freq()
):
speed improvement for microbial IDs
fixed factor level names for R Markdown
support for boxplots:
age_groups()
, to let groups of fives and tens end with 100+ instead of 120+freq()
for when all values are NA
+freq()
for when all values are NA
first_isolate()
for when dates are missingguess_ab_col()
@@ -1315,7 +1317,7 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/
freq()
function):
+freq()
function):
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
@@ -1324,15 +1326,15 @@ This works for all drug combinations, such as ampicillin/sulbactam, ceftazidime/ # OLD WAY septic_patients %>% mutate(genus = mo_genus(mo)) %>% - freq(genus) + freq(genus) # NEW WAY septic_patients %>% - freq(mo_genus(mo)) + freq(mo_genus(mo)) # Even supports grouping variables: septic_patients %>% group_by(gender) %>% - freq(mo_genus(mo))Header info is now available as a list, with the header
function
The parameter header
is now set to TRUE
at default, even for markdown
Using portion_*
functions now throws a warning when total available isolate is below parameter minimum
Functions as.mo
, as.rsi
, as.mic
, as.atc
and freq
will not set package name as attribute anymore
Frequency tables - freq()
:
Frequency tables - freq()
:
Support for grouping variables, test with:
septic_patients %>% group_by(hospital_id) %>% - freq(gender)
Support for (un)selecting columns:
Check for hms::is.hms
Check for hms::is.hms
Now prints in markdown at default in non-interactive sessions
No longer adds the factor level column and sorts factors on count again
Support for class difftime
Removed diacritics from all authors (columns microorganisms$ref
and microorganisms.old$ref
) to comply with CRAN policy to only allow ASCII characters
Fix for mo_property
not working properly
Fix for eucast_rules
where some Streptococci would become ceftazidime R in EUCAST rule 4.5
Support for named vectors of class mo
, useful for top_freq()
Support for named vectors of class mo
, useful for top_freq()
ggplot_rsi
and scale_y_percent
have breaks
parameter
AI improvements for as.mo
:
Support for types (classes) list and matrix for freq
For lists, subsetting is possible:
my_list = list(age = septic_patients$age, gender = septic_patients$gender) -my_list %>% freq(age) -my_list %>% freq(gender)
rsi
(antimicrobial resistance) to use as inputtable
to use as input: freq(table(x, y))
+table
to use as input: freq(table(x, y))
hist
and plot
to use a frequency table as input: hist(freq(df$age))
as.vector
, as.data.frame
, as_tibble
and format
freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
+freq(mydata, mycolumn)
is the same as mydata %>% freq(mycolumn)
top_freq
function to return the top/below n items as vectorFilter isolates on results in specific antimicrobial classes. This makes it easy to filter on isolates that were tested for e.g. any aminoglycoside, or to filter on carbapenem-resistant isolates without the need to specify the drugs.
-filter_ab_class(x, ab_class, result = NULL, scope = "any", ...) +filter_ab_class(x, ab_class, result = NULL, scope = "any", ...) -filter_aminoglycosides(x, result = NULL, scope = "any", ...) +filter_aminoglycosides(x, result = NULL, scope = "any", ...) -filter_carbapenems(x, result = NULL, scope = "any", ...) +filter_carbapenems(x, result = NULL, scope = "any", ...) -filter_cephalosporins(x, result = NULL, scope = "any", ...) +filter_cephalosporins(x, result = NULL, scope = "any", ...) -filter_1st_cephalosporins(x, result = NULL, scope = "any", ...) +filter_1st_cephalosporins(x, result = NULL, scope = "any", ...) -filter_2nd_cephalosporins(x, result = NULL, scope = "any", ...) +filter_2nd_cephalosporins(x, result = NULL, scope = "any", ...) -filter_3rd_cephalosporins(x, result = NULL, scope = "any", ...) +filter_3rd_cephalosporins(x, result = NULL, scope = "any", ...) -filter_4th_cephalosporins(x, result = NULL, scope = "any", ...) +filter_4th_cephalosporins(x, result = NULL, scope = "any", ...) -filter_5th_cephalosporins(x, result = NULL, scope = "any", ...) +filter_5th_cephalosporins(x, result = NULL, scope = "any", ...) -filter_fluoroquinolones(x, result = NULL, scope = "any", ...) +filter_fluoroquinolones(x, result = NULL, scope = "any", ...) -filter_glycopeptides(x, result = NULL, scope = "any", ...) +filter_glycopeptides(x, result = NULL, scope = "any", ...) -filter_macrolides(x, result = NULL, scope = "any", ...) +filter_macrolides(x, result = NULL, scope = "any", ...) -filter_penicillins(x, result = NULL, scope = "any", ...) +filter_penicillins(x, result = NULL, scope = "any", ...) -filter_tetracyclines(x, result = NULL, scope = "any", ...)+filter_tetracyclines(x, result = NULL, scope = "any", ...)
-
+
|
- Pattern Matching |
+ Pattern matching with keyboard shortcut |
||
diff --git a/docs/reference/key_antibiotics.html b/docs/reference/key_antibiotics.html
index 14e314da..89c5d3be 100644
--- a/docs/reference/key_antibiotics.html
+++ b/docs/reference/key_antibiotics.html
@@ -82,7 +82,7 @@
diff --git a/docs/reference/like.html b/docs/reference/like.html
index 9459a30f..e7d875a9 100644
--- a/docs/reference/like.html
+++ b/docs/reference/like.html
@@ -6,7 +6,7 @@
-
@@ -246,7 +246,11 @@
x %like% pattern
-x %like_case% pattern
+x %not_like% pattern
+
+x %like_case% pattern
+
+x %not_like_case% pattern
Arguments
|
diff --git a/docs/reference/lifecycle.html b/docs/reference/lifecycle.html
index e6a5443d..2372b3f7 100644
--- a/docs/reference/lifecycle.html
+++ b/docs/reference/lifecycle.html
@@ -84,7 +84,7 @@ This page contains a section for every lifecycle (with text borrowed from the af
@@ -295,7 +295,7 @@ The lifecycle of this function is questioning. This function mi