From 2e4d703338678b4e660de1f08edc21bd5a05c22b Mon Sep 17 00:00:00 2001 From: "Matthijs S. Berends" Date: Mon, 31 Dec 2018 01:48:53 +0100 Subject: [PATCH] eucast rules fix, 1st isolate fix, website update --- DESCRIPTION | 2 +- NEWS.md | 35 +- R/eucast_rules.R | 45 +- R/first_isolate.R | 49 +- R/freq.R | 6 +- README.md | 4 +- docs/articles/AMR.html | 707 ++++++++++- docs/articles/freq.html | 1434 ---------------------- docs/extra.css | 29 + docs/index.html | 56 + docs/news/index.html | 119 +- docs/pkgdown.yml | 3 +- docs/reference/ab_property.html | 8 +- docs/reference/abname.html | 8 +- docs/reference/age.html | 2 +- docs/reference/age_groups.html | 16 +- docs/reference/antibiotics.html | 6 +- docs/reference/as.atc.html | 10 +- docs/reference/as.mic.html | 4 +- docs/reference/as.mo.html | 20 +- docs/reference/as.rsi.html | 8 +- docs/reference/atc_property.html | 2 +- docs/reference/count.html | 28 +- docs/reference/eucast_rules.html | 2 +- docs/reference/first_isolate.html | 24 +- docs/reference/freq.html | 16 +- docs/reference/get_locale.html | 2 +- docs/reference/ggplot_rsi.html | 50 +- docs/reference/join.html | 8 +- docs/reference/key_antibiotics.html | 16 +- docs/reference/kurtosis.html | 2 +- docs/reference/like.html | 6 +- docs/reference/mdro.html | 4 +- docs/reference/microorganisms.certe.html | 6 +- docs/reference/microorganisms.html | 6 +- docs/reference/microorganisms.old.html | 4 +- docs/reference/microorganisms.umcg.html | 4 +- docs/reference/mo_failures.html | 4 +- docs/reference/mo_property.html | 28 +- docs/reference/mo_renamed.html | 4 +- docs/reference/portion.html | 58 +- docs/reference/read.4D.html | 2 +- docs/reference/resistance_predict.html | 12 +- docs/reference/rsi.html | 4 +- docs/reference/septic_patients.html | 22 +- docs/reference/skewness.html | 2 +- docs/reference/supplementary_data.html | 2 +- index.md | 41 + pkgdown/extra.css | 29 + tests/testthat/test-first_isolate.R | 22 +- vignettes/AMR.Rmd | 252 +++- 51 files changed, 1473 
insertions(+), 1760 deletions(-) delete mode 100644 docs/articles/freq.html diff --git a/DESCRIPTION b/DESCRIPTION index 72f4246c..27cc5349 100755 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: AMR Version: 0.5.0.9008 -Date: 2018-12-30 +Date: 2018-12-31 Title: Antimicrobial Resistance Analysis Authors@R: c( person( diff --git a/NEWS.md b/NEWS.md index 306e4577..f2d7bdee 100755 --- a/NEWS.md +++ b/NEWS.md @@ -2,15 +2,15 @@ **Note: this is the development version, which will eventually be released as AMR 0.6.0.** #### New -* **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo` to identify an MO code. +* **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo()` to identify an MO code. * New website: https://msberends.gitlab.io/AMR (built with the great [`pkgdown`](https://pkgdown.r-lib.org/)) * Contains the complete manual of this package and all of its functions with an explanation of their parameters * Support for [`dplyr`](https://dplyr.tidyverse.org) version 0.8.0 -* Function `mo_failures` to review values that could not be coerced to a valid MO code, using `as.mo`. This latter function will now only show a maximum of 25 uncoerced values. -* Function `mo_renamed` to get a list of all returned values from `as.mo` that have had taxonomic renaming -* Function `age` to calculate the (patients) age in years -* Function `age_groups` to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group. -* Functions `filter_first_isolate` and `filter_first_weighted_isolate()` to shorten and fasten filtering on data sets with antimicrobial results, e.g.: +* Function `mo_failures()` to review values that could not be coerced to a valid MO code, using `as.mo()`. This latter function will now only show a maximum of 25 uncoerced values. 
+* Function `mo_renamed()` to get a list of all returned values from `as.mo()` that have had taxonomic renaming +* Function `age()` to calculate the (patients) age in years +* Function `age_groups()` to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group. +* Functions `filter_first_isolate()` and `filter_first_weighted_isolate()` to shorten and fasten filtering on data sets with antimicrobial results, e.g.: ```r septic_patients %>% filter_first_isolate() # or @@ -25,7 +25,8 @@ ``` #### Changed -* Improvements for `as.mo`: +* Fixed a critical bug in `eucast_rules()` where some rules that depend on previous applied rules would not be applied adequately +* Improvements for `as.mo()`: * Finds better results when input is in other languages * Better handling for subspecies * Better handling for *Salmonellae* @@ -33,18 +34,18 @@ * Manual now contains more info about the algorithms * Progress bar will be shown when it takes more than 3 seconds to get results * Support for formatted console text -* Function `first_isolate`: +* Function `first_isolate()`: + * Fixed a bug where distances between dates would not be calculated right - in the `septic_patients` data set this yielded a difference of 0.15% more isolates * Will now use a column named like "patid" for the patient ID (parameter `col_patientid`), when this parameter was left blank - * Will now use a column named like "key(...)ab" or "key(...)antibiotics" for the key antibiotics (parameter `col_keyantibiotics`), when this parameter was left blank + * Will now use a column named like "key(...)ab" or "key(...)antibiotics" for the key antibiotics (parameter `col_keyantibiotics()`), when this parameter was left blank * Removed parameter `output_logical`, the function will now always return a logical value * Renamed parameter `filter_specimen` to `specimen_group`, although using `filter_specimen` will still work * A note 
to the manual pages of the `portion` functions, that low counts can influence the outcome and that the `portion` functions may camouflage this, since they only return the portion (albeit being dependent on the `minimum` parameter) -* Function `mo_taxonomy` now contains the kingdom too -* Function `first_isolate` will now use a column named like "patid" for the patient ID, when this parameter was left blank -* Reduce false positives for `is.rsi.eligible` +* Function `mo_taxonomy()` now contains the kingdom too +* Reduce false positives for `is.rsi.eligible()` * Summaries of class `mo` will now return the top 3 and the unique count, e.g. using `summary(mo)` * Small text updates to summaries of class `rsi` and `mic` -* Frequency tables (`freq` function): +* Frequency tables (`freq()` function): * Header info is now available as a list, with the `header` function * Added header info for class `mo` to show unique count of families, genera and species * Now honours the `decimal.mark` setting, which just like `format` defaults to `getOption("OutDec")` @@ -52,10 +53,10 @@ * Fix for header text where all observations are `NA` * New parameter `droplevels` to exclude empty factor levels when input is a factor * Factor levels will be in header when present in input data -* Function `scale_y_percent` now contains the `limits` parameter -* Automatic parameter filling for `mdro`, `key_antibiotics` and `eucast_rules` -* Updated examples for resistance prediction (`resistance_predict` function) -* Fix for `as.mic` to support more values ending in (several) zeroes +* Function `scale_y_percent()` now contains the `limits` parameter +* Automatic parameter filling for `mdro()`, `key_antibiotics()` and `eucast_rules()` +* Updated examples for resistance prediction (`resistance_predict()` function) +* Fix for `as.mic()` to support more values ending in (several) zeroes #### Other * Updated licence text to emphasise GPL 2.0 and that this is an R package. 
diff --git a/R/eucast_rules.R b/R/eucast_rules.R index 38167f36..68fcc056 100755 --- a/R/eucast_rules.R +++ b/R/eucast_rules.R @@ -348,7 +348,7 @@ eucast_rules <- function(tbl, # helper function for editing the table edit_rsi <- function(to, rule, rows, cols) { - cols <- cols[!is.na(cols)] + cols <- unique(cols[!is.na(cols)]) if (length(rows) > 0 & length(cols) > 0) { before <- as.character(unlist(as.list(tbl_original[rows, cols]))) tryCatch( @@ -367,6 +367,10 @@ eucast_rules <- function(tbl, stop(e, call. = FALSE) } ) + suppressMessages( + suppressWarnings( + tbl[rows, cols] <<- to + )) after <- as.character(unlist(as.list(tbl_original[rows, cols]))) amount_changed <<- amount_changed + sum(before != after, na.rm = TRUE) amount_affected_rows <<- unique(c(amount_affected_rows, rows)) @@ -404,27 +408,14 @@ eucast_rules <- function(tbl, # join to microorganisms data set tbl <- tbl %>% mutate_at(vars(col_mo), as.mo) %>% - left_join_microorganisms(by = col_mo, suffix = c("_oldcols", "")) - - # antibiotic classes - aminoglycosides <- c(tobr, gent, kana, neom, neti, siso) - tetracyclines <- c(doxy, mino, tetr) # since EUCAST v3.1 tige(cycline) is set apart - polymyxins <- c(poly, coli) - macrolides <- c(eryt, azit, roxi, clar) # since EUCAST v3.1 clinda is set apart - glycopeptides <- c(vanc, teic) - streptogramins <- c(qida, pris) # should officially also be quinupristin/dalfopristin - cephalosporins <- c(cfep, cfot, cfox, cfra, cfta, cftr, cfur, czol) - carbapenems <- c(erta, imip, mero) - aminopenicillins <- c(ampi, amox) - ureidopenicillins <- c(pipe, pita, azlo, mezl) - fluoroquinolones <- c(oflo, cipr, norf, levo, moxi) - all_betalactam <- c(aminopenicillins, ureidopenicillins, cephalosporins, carbapenems, amcl, oxac, clox, peni) + left_join_microorganisms(by = col_mo, suffix = c("_oldcols", "")) %>% + as.data.frame(stringsAsFactors = FALSE) if (info == TRUE) { - cat("Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)\n") + cat("\nRules 
by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)\n") } - # since ampicillin ^= amoxicillin, get the first from the latter (not in original table) + # since ampicillin ^= amoxicillin, get the first from the latter (not in original EUCAST table) if (!is.na(ampi) & !is.na(amox)) { if (verbose == TRUE) { cat(bgGreen("\n VERBOSE: transforming", @@ -440,8 +431,26 @@ eucast_rules <- function(tbl, tbl[which(tbl[, amox] == "S" & !tbl[, ampi] %in% c("S", "I", "R")), ampi] <- "S" tbl[which(tbl[, amox] == "I" & !tbl[, ampi] %in% c("S", "I", "R")), ampi] <- "I" tbl[which(tbl[, amox] == "R" & !tbl[, ampi] %in% c("S", "I", "R")), ampi] <- "R" + } else if (is.na(ampi) & !is.na(amox)) { + # ampicillin column is missing, but amoxicillin is available + message(blue(paste0("NOTE: Using column `", bold(amox), "` as input for ampicillin (J01CA01) since many EUCAST rules depend on it."))) + ampi <- amox } + # antibiotic classes + aminoglycosides <- c(tobr, gent, kana, neom, neti, siso) + tetracyclines <- c(doxy, mino, tetr) # since EUCAST v3.1 tige(cycline) is set apart + polymyxins <- c(poly, coli) + macrolides <- c(eryt, azit, roxi, clar) # since EUCAST v3.1 clinda is set apart + glycopeptides <- c(vanc, teic) + streptogramins <- c(qida, pris) # should officially also be quinupristin/dalfopristin + cephalosporins <- c(cfep, cfot, cfox, cfra, cfta, cftr, cfur, czol) + carbapenems <- c(erta, imip, mero) + aminopenicillins <- c(ampi, amox) + ureidopenicillins <- c(pipe, pita, azlo, mezl) + fluoroquinolones <- c(oflo, cipr, norf, levo, moxi) + all_betalactam <- c(aminopenicillins, ureidopenicillins, cephalosporins, carbapenems, amcl, oxac, clox, peni) + if (any(c("all", "breakpoints") %in% rules)) { # BREAKPOINTS ------------------------------------------------------------- diff --git a/R/first_isolate.R b/R/first_isolate.R index 1b78ddc3..6f5815fd 100755 --- a/R/first_isolate.R +++ b/R/first_isolate.R @@ -380,7 +380,7 @@ first_isolate <- function(tbl, ) } - # 
suppress warnings because dplyr want us to use library(dplyr) when using filter(row_number()) + # suppress warnings because dplyr wants us to use library(dplyr) when using filter(row_number()) suppressWarnings( scope.size <- tbl %>% filter( @@ -391,17 +391,46 @@ first_isolate <- function(tbl, nrow() ) + identify_new_year = function(x, episode_days) { + # I asked on StackOverflow: + # https://stackoverflow.com/questions/42122245/filter-one-row-every-year + if (length(x) == 1) { + return(TRUE) + } + indices = integer(0) + start = x[1] + ind = 1 + indices[ind] = ind + for (i in 2:length(x)) { + if (as.numeric(x[i] - start >= episode_days)) { + ind = ind + 1 + indices[ind] = i + start = x[i] + } + } + result <- rep(FALSE, length(x)) + result[indices] <- TRUE + return(result) + } + # Analysis of first isolate ---- all_first <- tbl %>% mutate(other_pat_or_mo = if_else(patient_id == lag(patient_id) & genus == lag(genus) & species == lag(species), FALSE, - TRUE), - days_diff = 0) %>% - mutate(days_diff = if_else(other_pat_or_mo == FALSE, - (date_lab - lag(date_lab)) + lag(days_diff), - 0)) + TRUE)) %>% #, + # days_diff = 0) %>% + # mutate(days_diff = if_else(other_pat_or_mo == FALSE, + # as.integer((date_lab - lag(date_lab)) + lag(days_diff)), + # as.integer(0))) %>% + # mutate(r = days_diff) %>% + group_by_at(vars(patient_id, + genus, + species)) %>% + mutate(more_than_episode_ago = identify_new_year(x = date_lab, + episode_days = episode_days)) %>% + ungroup() weighted.notice <- '' if (!is.null(col_keyantibiotics)) { @@ -436,13 +465,12 @@ first_isolate <- function(tbl, between(row_number(), row.start, row.end) & genus != "" & species != "" - & (other_pat_or_mo - | days_diff >= episode_days - | key_ab_other), + & (other_pat_or_mo | more_than_episode_ago | key_ab_other), TRUE, FALSE)) ) } else { + # no key antibiotics # suppress warnings because dplyr want us to use library(dplyr) when using filter(row_number()) suppressWarnings( all_first <- all_first %>% @@ -452,8 +480,7 
@@ first_isolate <- function(tbl, between(row_number(), row.start, row.end) & genus != "" & species != "" - & (other_pat_or_mo - | days_diff >= episode_days), + & (other_pat_or_mo | more_than_episode_ago), TRUE, FALSE)) ) diff --git a/R/freq.R b/R/freq.R index c71b1115..077ebefa 100755 --- a/R/freq.R +++ b/R/freq.R @@ -573,9 +573,9 @@ format_header <- function(x, markdown = FALSE, decimal.mark = ".", big.mark = ", n_levels_list <- c(n_levels_list[1:5], "...") } if (header$ordered == TRUE) { - n_levels_list <- paste0(header$levels, collapse = " < ") + n_levels_list <- paste0(n_levels_list, collapse = " < ") } else { - n_levels_list <- paste0(header$levels, collapse = ", ") + n_levels_list <- paste0(n_levels_list, collapse = ", ") } header$levels <- n_levels_list header <- header[names(header) != "ordered"] @@ -824,7 +824,7 @@ print.frequency_tbl <- function(x, } } else if (opt$tbl_format == "markdown") { # do print title as caption in markdown - cat("\n", title, sep = "") + cat("\n", title, " ", sep = "") # two trailing spaces for markdown } if (NROW(x) == 0) { diff --git a/README.md b/README.md index f5de69cd..8c4a9b93 100755 --- a/README.md +++ b/README.md @@ -56,10 +56,10 @@ The `AMR` package basically does four important things: 2. It **enhances existing data** and **adds new data** from data sets included in this package. - * Use `EUCAST_rules` to apply [EUCAST expert rules to isolates](http://www.eucast.org/expert_rules_and_intrinsic_resistance/). + * Use `eucast_rules` to apply [EUCAST expert rules to isolates](http://www.eucast.org/expert_rules_and_intrinsic_resistance/). * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute). * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. 
This takes into account key antibiotics of every strain and compares them. - * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported. + * Use `mdro` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported. * The data set `microorganisms` contains the complete taxonomic tree of more than 18,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family`, `mo_gramstain` or even `mo_phylum`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data. * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_name` and `ab_tradenames` to look up values. The `ab_*` functions use `as.atc` internally so they support AI to guess your expected result. For example, `ab_name("Fluclox")`, `ab_name("Floxapen")` and `ab_name("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data. 
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html index 405f5495..3ae90fe5 100644 --- a/docs/articles/AMR.html +++ b/docs/articles/AMR.html @@ -126,10 +126,715 @@ -

This page will soon be updated.

+

Note: values on this page will be regenerated with every website update since it is written in RMarkdown, so actual results will change over time. However, the methodology remains unchanged. This page was generated on 31 December 2018.

+
+

+Introduction

+

(work in progress)

+
+
+

+Tutorial

+

For this tutorial, we will create fake demonstration data to work with.

+

You can skip to Cleaning the data if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:

| date       | patient_id | mo               | amox | cipr |
|:-----------|:-----------|:-----------------|:-----|:-----|
| 2018-12-31 | abcd       | Escherichia coli | S    | S    |
| 2018-12-31 | abcd       | Escherichia coli | S    | R    |
| 2018-12-31 | efgh       | Escherichia coli | R    | S    |
+
+

+Needed R packages

+

As with many tasks in R, we need some additional packages for AMR analysis. The most important one is dplyr, which tremendously improves the way we work with data - it allows for a very natural way of writing code in R. Another important dependency is ggplot2. This package can be used to create beautiful plots in R.

+

Our AMR package depends on these packages and even extends their use and functions.

+
library(dplyr)   # the data science package
+library(AMR)     # this package, to simplify and automate AMR analysis
+library(ggplot2) # for appealing plots
+
+
+

+Creation of data

+

We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, the name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine) and the ward type (e.g. to filter on ICUs).

+

With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.

+
+

+Patients

+

To start with patients, we need a unique list of patients.

+
patients <- unlist(lapply(LETTERS, paste0, 1:10))
+

The LETTERS object is available in R - it’s a vector with 26 characters: A to Z. The patients object we just created is now a vector of length 260, with values (patient IDs) varying from A1 to Z10.
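A quick check of that claim, as a minimal sketch in base R (nothing here is specific to the AMR package):

```r
# recreate the patient IDs exactly as above
patients <- unlist(lapply(LETTERS, paste0, 1:10))

length(patients)   # number of patient IDs: 26 letters x 10 numbers
head(patients, 3)  # first IDs: letter A with numbers 1, 2 and 3
tail(patients, 1)  # last ID: letter Z with number 10
```

Because lapply() walks through the letters one by one, the IDs are ordered per letter (A1 to A10, then B1 to B10, and so on).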

+
+
+

+Dates

+

Let’s pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018.

+
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")
+

This dates object now contains all days in our date range.

+
+
+

+Microorganisms

+

For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:

+
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+
+
+

+Other variables

+

For completeness, we can also add the patient's gender, the hospital where the patient was admitted and all valid antimicrobial results:

+
genders <- c("M", "F")
+hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")
+ab_interpretations <- c("S", "I", "R")
+
+
+

+Put everything together

+

Using the sample() function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the prob parameter.

+
data <- data.frame(date = sample(dates, 5000, replace = TRUE),
+                   patient_id = sample(patients, 5000, replace = TRUE),
+                   # gender - add slightly more men:
+                   gender = sample(genders, 5000, replace = TRUE, prob = c(0.55, 0.45)),
+                   hospital = sample(hospitals, 5000, replace = TRUE),
+                   bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)),
+                   amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.6, 0.05, 0.35)),
+                   amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.1, 0.15)),
+                   cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.8, 0, 0.2)),
+                   gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0, 0.07))
+                   )
+

The resulting data set contains 5,000 blood culture isolates. With the head() function we can preview the first 6 rows of this data set:

+
head(data)
| date       | patient_id | gender | hospital   | bacteria                 | amox | amcl | cipr | gent |
|:-----------|:-----------|:-------|:-----------|:-------------------------|:-----|:-----|:-----|:-----|
| 2017-01-24 | M8         | F      | Hospital D | Streptococcus pneumoniae | I    | R    | S    | S    |
| 2016-12-18 | J6         | M      | Hospital A | Escherichia coli         | S    | S    | S    | S    |
| 2015-06-29 | E1         | M      | Hospital C | Escherichia coli         | R    | R    | S    | S    |
| 2013-02-28 | B1         | M      | Hospital C | Escherichia coli         | R    | S    | S    | S    |
| 2013-05-19 | N8         | M      | Hospital A | Streptococcus pneumoniae | R    | S    | R    | S    |
| 2014-04-02 | M2         | M      | Hospital D | Staphylococcus aureus    | S    | S    | R    | S    |
+
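A side note on the prob parameter used above, as a minimal base R sketch: the weights given to sample() do not have to sum to 1, because R rescales them, and a weight of 0 means that value is never drawn. This is why the gent column can use c(0.92, 0, 0.07). The seed and the object name draws are assumptions for illustration only.

```r
set.seed(123)  # arbitrary seed, only to make this illustration reproducible
ab_interpretations <- c("S", "I", "R")

# same weights as the gent column above: they sum to 0.99, not 1
draws <- sample(ab_interpretations, 5000, replace = TRUE,
                prob = c(0.92, 0, 0.07))

table(draws)                  # "I" does not occur, since its weight is 0
round(mean(draws == "S"), 2)  # close to 0.92 / 0.99
```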

Now, let’s start the cleaning and the analysis!

+
+
+
+

+Cleaning the data

+

Use the frequency table function freq() to look specifically for unique values in any variable. For example, for the gender variable:

+
data %>% freq(gender) # this would be the same: freq(data$gender)
+
# Frequency table of `gender` 
+# Class:   factor (numeric)  
+# Levels:  F, M  
+# Length:  5,000 (of which NA: 0 = 0.00%)  
+# Unique:  2
+# 
+#      Item    Count   Percent   Cum. Count   Cum. Percent
+# ---  -----  ------  --------  -----------  -------------
+# 1    M       2,773     55.5%        2,773          55.5%
+# 2    F       2,227     44.5%        5,000         100.0%
+

So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.

+

The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate function of the dplyr package makes this really easy:

+
data <- data %>%
+  mutate(bacteria = as.mo(bacteria))
+

We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi() function ensures reliability and reproducibility in these kinds of variables. The mutate_at() function will run the as.rsi() function on the defined variables:

+
data <- data %>%
+  mutate_at(vars(amox:gent), as.rsi)
+

Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules() function can also apply additional rules, like forcing ampicillin = R when amoxicillin/clavulanic acid = R.

+

Because the amoxicillin (column amox) and amoxicillin/clavulanic acid (column amcl) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The eucast_rules() function fixes this:

+
data <- eucast_rules(data, col_mo = "bacteria")
+# 
+# Rules by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)
+# 
+# EUCAST Clinical Breakpoints (v8.1, 2018)
+# Enterobacteriales (Order) (no changes)
+# Staphylococcus (no changes)
+# Enterococcus (no changes)
+# Streptococcus groups A, B, C, G (no changes)
+# Streptococcus pneumoniae (386 changes)
+# Viridans group streptococci (no changes)
+# Haemophilus influenzae (no changes)
+# Moraxella catarrhalis (no changes)
+# Anaerobic Gram positives (no changes)
+# Anaerobic Gram negatives (no changes)
+# Pasteurella multocida (no changes)
+# Campylobacter jejuni and C. coli (no changes)
+# Aerococcus sanguinicola and A. urinae (no changes)
+# Kingella kingae (no changes)
+# 
+# EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
+# Table 1:  Intrinsic resistance in Enterobacteriaceae (342 changes)
+# Table 2:  Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
+# Table 3:  Intrinsic resistance in other Gram-negative bacteria (no changes)
+# Table 4:  Intrinsic resistance in Gram-positive bacteria (705 changes)
+# Table 8:  Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
+# Table 9:  Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
+# Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
+# Table 11: Interpretive rules for macrolides, lincosamides, and streptogramins (no changes)
+# Table 12: Interpretive rules for aminoglycosides (no changes)
+# Table 13: Interpretive rules for quinolones (no changes)
+# 
+# Other rules
+# Non-EUCAST: ampicillin = R where amoxicillin/clav acid = R (364 changes)
+# Non-EUCAST: piperacillin = R where piperacillin/tazobactam = R (no changes)
+# Non-EUCAST: trimethoprim = R where trimethoprim/sulfa = R (no changes)
+# Non-EUCAST: amoxicillin/clav acid = S where ampicillin = S (211 changes)
+# Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
+# Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
+# 
+# => EUCAST rules affected 4,626 out of 5,000 rows -> changed 2,008 test results.
+
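To make the two 'Other rules' about aminopenicillins concrete, here is a minimal base R sketch on toy data. It is not how eucast_rules() is implemented; the data frame toy and its values are made up for illustration, and only the rule logic is taken from the output above.

```r
# toy data, invented for this illustration
toy <- data.frame(amox = c("S", "S", "R", "I"),
                  amcl = c("R", "S", "S", "R"),
                  stringsAsFactors = FALSE)

# rule: ampicillin/amoxicillin = R where amoxicillin/clavulanic acid = R
toy$amox[toy$amcl == "R"] <- "R"

# rule: amoxicillin/clavulanic acid = S where ampicillin/amoxicillin = S
toy$amcl[toy$amox == "S"] <- "S"

# the impossible combination amox = S with amcl = R no longer occurs
any(toy$amox == "S" & toy$amcl == "R")
```

The order matters in this sketch: the resistance rule runs first, so a row with amox = S and amcl = R ends up fully resistant instead of fully susceptible.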
+
+

+Adding new variables

+

Now that we have the microbial ID, we can add some taxonomic properties:

+
data <- data %>% 
+  mutate(gramstain = mo_gramstain(bacteria),
+         family = mo_family(bacteria))
+
+

+First isolates

+

We also need to know which isolates we can actually use for analysis.

+

To conduct an analysis of antimicrobial resistance, you should only include the first isolate of every patient per episode. If you do not, you could easily overestimate or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures in the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all isolates would be overestimated, because you included this MRSA more than once. It would clearly be selection bias.

+

The Clinical and Laboratory Standards Institute (CLSI) defines this as follows:

+
+

(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded. Chapter 6.4, M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. https://clsi.org/standards/products/microbiology/documents/m39/

+
+

This AMR package implements this methodology in the first_isolate() function. It uses an episode length of one year (this can be changed by the user) and starts counting days again after every selected isolate. The resulting variable can easily be added to our data:

+ +

So only 58.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:

+ +

For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:

+ +
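The episode logic described above (one year, counting again after every selected isolate) can be sketched in base R as follows. This is a simplified illustration for a single patient and a single species, not the package's actual code; the function name flag_episode_starts is hypothetical. The dates are those of patient S6 in the example table of the next section.

```r
# hypothetical helper: TRUE marks the isolate that starts a new episode
flag_episode_starts <- function(dates, episode_days = 365) {
  flagged <- logical(length(dates))
  if (length(dates) == 0) {
    return(flagged)
  }
  flagged[1] <- TRUE     # the earliest isolate always starts an episode
  start <- dates[1]
  for (i in seq_along(dates)[-1]) {
    # count days since the last *selected* isolate, not since the previous row
    if (as.numeric(dates[i] - start) >= episode_days) {
      flagged[i] <- TRUE
      start <- dates[i]  # restart counting from this isolate
    }
  }
  flagged
}

# isolates of one patient with one species, sorted by date
dates_s6 <- as.Date(c("2010-07-19", "2010-10-13", "2010-12-24", "2011-01-02",
                      "2011-01-23", "2011-05-16", "2011-10-13", "2012-03-25",
                      "2012-09-01", "2012-10-04"))

which(flag_episode_starts(dates_s6))  # isolates 1 and 7 are flagged
```

Counting from the last selected isolate, instead of adding up the gaps between consecutive rows, keeps the result independent of how many isolates lie in between.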
+
+

+First weighted isolates

+

We made a slight twist to the CLSI algorithm, to take into account antimicrobial results. Imagine this data, sorted on date:

| isolate | date       | patient_id | bacteria    | amox | amcl | cipr | gent | first |
|:--------|:-----------|:-----------|:------------|:-----|:-----|:-----|:-----|:------|
| 1       | 2010-07-19 | S6         | B_ESCHR_COL | S    | S    | S    | S    | TRUE  |
| 2       | 2010-10-13 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE |
| 3       | 2010-12-24 | S6         | B_ESCHR_COL | R    | I    | S    | S    | FALSE |
| 4       | 2011-01-02 | S6         | B_ESCHR_COL | R    | I    | S    | R    | FALSE |
| 5       | 2011-01-23 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE |
| 6       | 2011-05-16 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE |
| 7       | 2011-10-13 | S6         | B_ESCHR_COL | R    | S    | S    | S    | TRUE  |
| 8       | 2012-03-25 | S6         | B_ESCHR_COL | R    | I    | S    | S    | FALSE |
| 9       | 2012-09-01 | S6         | B_ESCHR_COL | R    | S    | S    | S    | FALSE |
| 10      | 2012-10-04 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE |
+

Only 2 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are absolutely different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad spectrum ones, 6 narrow spectrum ones for Gram negatives and 6 narrow spectrum ones for Gram positives. These can be defined by the user.

+

If a column exists with a name like ‘key(…)ab’, the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:

| isolate | date       | patient_id | bacteria    | amox | amcl | cipr | gent | first | first_weighted |
|:--------|:-----------|:-----------|:------------|:-----|:-----|:-----|:-----|:------|:---------------|
| 1       | 2010-07-19 | S6         | B_ESCHR_COL | S    | S    | S    | S    | TRUE  | TRUE           |
| 2       | 2010-10-13 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE | FALSE          |
| 3       | 2010-12-24 | S6         | B_ESCHR_COL | R    | I    | S    | S    | FALSE | TRUE           |
| 4       | 2011-01-02 | S6         | B_ESCHR_COL | R    | I    | S    | R    | FALSE | TRUE           |
| 5       | 2011-01-23 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE | TRUE           |
| 6       | 2011-05-16 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE | FALSE          |
| 7       | 2011-10-13 | S6         | B_ESCHR_COL | R    | S    | S    | S    | TRUE  | TRUE           |
| 8       | 2012-03-25 | S6         | B_ESCHR_COL | R    | I    | S    | S    | FALSE | FALSE          |
| 9       | 2012-09-01 | S6         | B_ESCHR_COL | R    | S    | S    | S    | FALSE | FALSE          |
| 10      | 2012-10-04 | S6         | B_ESCHR_COL | S    | S    | S    | S    | FALSE | TRUE           |
+

Instead of 2, now 6 isolates are flagged. In total, 86.4% of all isolates are marked ‘first weighted’ - 28.1 percentage points more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
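The weighting can be illustrated with a simplified base R sketch. It is an assumption-laden toy, not the package's implementation: it compares only the four antibiotic columns of the table above (not 18 key antibiotics), compares each isolate with the previous one only, and counts a change between S and R as a difference while ignoring I. The helper name differs_from_previous is hypothetical. Under these assumptions it reproduces the first_weighted column of the table.

```r
# antibiograms of patient S6 (amox, amcl, cipr, gent), in date order
antibiograms <- c("SSSS", "SSSS", "RISS", "RISR", "SSSS",
                  "SSSS", "RSSS", "RISS", "RSSS", "SSSS")

# 'first' column according to the CLSI guideline, copied from the table
first_clsi <- c(TRUE, FALSE, FALSE, FALSE, FALSE,
                FALSE, TRUE, FALSE, FALSE, FALSE)

# hypothetical helper: did any antibiotic change between S and R
# compared with the previous isolate? Changes involving I are ignored.
differs_from_previous <- function(ab) {
  chars <- strsplit(ab, "")
  out <- logical(length(ab))  # the first isolate has no predecessor: FALSE
  for (i in seq_along(ab)[-1]) {
    prev <- chars[[i - 1]]
    curr <- chars[[i]]
    out[i] <- any((prev == "S" & curr == "R") | (prev == "R" & curr == "S"))
  }
  out
}

first_weighted <- first_clsi | differs_from_previous(antibiograms)
which(first_weighted)  # isolates 1, 3, 4, 5, 7 and 10
```

Isolate 8 shows why I is ignored here: the only change relative to isolate 7 is amcl going from S to I, which this sketch does not treat as a different strain.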

+

As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

+ +

So we end up with 4,318 isolates for analysis.

+

We can remove unneeded columns:

+ +

Now our data looks like:

+
head(data_1st)
   date        patient_id  gender  hospital    bacteria      amox  amcl  cipr  gent  gramstain      family              first_weighted
1  2017-01-24  M8          F       Hospital D  B_STRPTC_PNE  I     I     S     R     Gram positive  Streptococcaceae    TRUE
2  2016-12-18  J6          M       Hospital A  B_ESCHR_COL   S     S     S     S     Gram negative  Enterobacteriaceae  TRUE
3  2015-06-29  E1          M       Hospital C  B_ESCHR_COL   R     R     S     S     Gram negative  Enterobacteriaceae  TRUE
4  2013-02-28  B1          M       Hospital C  B_ESCHR_COL   R     S     S     S     Gram negative  Enterobacteriaceae  TRUE
6  2014-04-02  M2          M       Hospital D  B_STPHY_AUR   S     S     R     S     Gram positive  Staphylococcaceae   TRUE
7  2015-11-12  E2          M       Hospital B  B_KLBSL_PNE   R     S     R     S     Gram negative  Enterobacteriaceae  TRUE
+

Time for the analysis!

+
+
+
+

+Analysing the data

+

(work in progress)

+
+
diff --git a/docs/articles/freq.html b/docs/articles/freq.html deleted file mode 100644 index 50697aab..00000000 --- a/docs/articles/freq.html +++ /dev/null @@ -1,1434 +0,0 @@ - - - - - - - -Creating Frequency Tables • AMR (for R) - - - - - - - - - - - - - - - - - - - -
-
- - - -
-
- - - - -
-

-Introduction

-

Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the freq function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, so it forces a univariate distribution. We take the septic_patients dataset (included in this AMR package) as example.

-
-
-

-Frequencies of one variable

-

To only show and quickly review the content of one variable, you can just select this variable in various ways. Let’s say we want to get the frequencies of the gender variable of the septic_patients dataset:

- -

Frequency table of gender

   Item  Count  Percent  Cum. Count  Cum. Percent
1  M     1,031  51.6%    1,031       51.6%
2  F     969    48.5%    2,000       100.0%
-

This immediately shows the class of the variable, its length and availability (i.e. the number of NA values), the number of unique values and (most importantly) that among septic patients men are more prevalent than women.

-
-
-

-Frequencies of more than one variable

-

Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.
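The idea of pasting several variables into one can be sketched in base R. This is an illustration of the principle only, not the code behind freq(); the vectors and the freq_tbl name are made up for the example.

```r
# Two variables that together describe one case per element
genus   <- c("Escherichia", "Staphylococcus", "Escherichia",
             "Staphylococcus", "Escherichia")
species <- c("coli", "aureus", "coli", "epidermidis", "coli")

# Paste them into a single variable, then count as usual
item   <- paste(genus, species)
counts <- sort(table(item), decreasing = TRUE)

# Build a table with the same columns as the frequency tables shown here
freq_tbl <- data.frame(
  item  = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)
freq_tbl$percent     <- round(100 * freq_tbl$count / sum(freq_tbl$count), 1)
freq_tbl$cum_count   <- cumsum(freq_tbl$count)
freq_tbl$cum_percent <- round(100 * freq_tbl$cum_count / sum(freq_tbl$count), 1)

print(freq_tbl)
```

Because the combined value is a single character vector, the result stays a univariate frequency table however many variables are pasted together.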

-

For illustration, we could add some more variables to the septic_patients dataset to learn about bacterial properties:

- -

Now all variables of the microorganisms dataset have been joined to the septic_patients dataset. The microorganisms dataset consists of the following variables:

- -

If we compare the dimensions between the old and new dataset, we can see that these 14 variables were added:

-
dim(septic_patients)
-[1] 2000   49
-dim(my_patients)
-[1] 2000   63
-

So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

-
my_patients %>% freq(genus, species)
-

Frequency table of genus and species


ItemCountPercentCum. CountCum. Percent
1Escherichia coli46723.4%46723.4%
2Staphylococcus coagulase negative31315.7%78039.0%
3Staphylococcus aureus23511.8%1,01550.7%
4Staphylococcus epidermidis1748.7%1,18959.5%
5Streptococcus pneumoniae1175.9%1,30665.3%
6Staphylococcus hominis814.1%1,38769.4%
7Klebsiella pneumoniae582.9%1,44572.3%
8Enterococcus faecalis392.0%1,48474.2%
9Proteus mirabilis361.8%1,52076.0%
10Pseudomonas aeruginosa301.5%1,55077.5%
11Serratia marcescens251.3%1,57578.8%
12Enterobacter cloacae231.2%1,59879.9%
13Enterococcus faecium211.1%1,61981.0%
14Staphylococcus capitis211.1%1,64082.0%
15Bacteroides fragilis201.0%1,66083.0%
16Enterococcus species201.0%1,68084.0%
17Streptococcus group B180.9%1,69884.9%
18Klebsiella oxytoca160.8%1,71485.7%
19Streptococcus pyogenes160.8%1,73086.5%
20Streptococcus dysgalactiae140.7%1,74487.2%
21Streptococcus group A130.7%1,75787.9%
22Streptococcus mitis130.7%1,77088.5%
23Streptococcus salivarius120.6%1,78289.1%
24Streptococcus agalactiae110.6%1,79389.7%
25Streptococcus species110.6%1,80490.2%
26Corynebacterium species100.5%1,81490.7%
27Streptococcus bovis100.5%1,82491.2%
28Clostridium difficile90.5%1,83391.7%
29Haemophilus influenzae80.4%1,84192.1%
30Candida albicans70.4%1,84892.4%
31Staphylococcus haemolyticus70.4%1,85592.8%
32Streptococcus constellatus70.4%1,86293.1%
33Candida glabrata60.3%1,86893.4%
34Citrobacter freundii60.3%1,87493.7%
35Corynebacterium striatum60.3%1,88094.0%
36Morganella morganii60.3%1,88694.3%
37Streptococcus anginosus60.3%1,89294.6%
38Streptococcus oralis60.3%1,89894.9%
39Acinetobacter baumannii30.2%1,90195.1%
40Acinetobacter species30.2%1,90495.2%
41Citrobacter koseri30.2%1,90795.4%
42Clostridium perfringens30.2%1,91095.5%
43Clostridium septicum30.2%1,91395.7%
44Enterobacter aerogenes30.2%1,91695.8%
45Gemella haemolysans30.2%1,91996.0%
46Micrococcus luteus30.2%1,92296.1%
47Micrococcus species30.2%1,92596.3%
48Salmonella enterica30.2%1,92896.4%
49Staphylococcus warneri30.2%1,93196.6%
50Streptococcus equi30.2%1,93496.7%
51Streptococcus group C30.2%1,93796.9%
52Streptococcus group G30.2%1,94097.0%
53Streptococcus intermedius30.2%1,94397.2%
54Streptococcus parasanguinis30.2%1,94697.3%
55Aerococcus urinae20.1%1,94897.4%
56Candida tropicalis20.1%1,95097.5%
57Citrobacter species20.1%1,95297.6%
58Enterococcus avium20.1%1,95497.7%
59Pantoea agglomerans20.1%1,95697.8%
60Pantoea species20.1%1,95897.9%
61Proteus vulgaris20.1%1,96098.0%
62Staphylococcus cohnii20.1%1,96298.1%
63Staphylococcus lugdunensis20.1%1,96498.2%
64Staphylococcus schleiferi20.1%1,96698.3%
65Stenotrophomonas maltophilia20.1%1,96898.4%
66Streptococcus mutans20.1%1,97098.5%
67Actinomyces odontolyticus10.1%1,97198.6%
68Campylobacter jejuni10.1%1,97298.6%
69Candida lusitaniae10.1%1,97398.7%
70Clostridium novyi10.1%1,97498.7%
71Corynebacterium tuberculostearicum10.1%1,97598.8%
72Dermabacter hominis10.1%1,97698.8%
73Eikenella corrodens10.1%1,97798.9%
74Enterococcus casseliflavus10.1%1,97898.9%
75Escherichia vulneris10.1%1,97999.0%
76Fusobacterium species10.1%1,98099.0%
77Globicatella sanguinis10.1%1,98199.1%
78Granulicatella adiacens10.1%1,98299.1%
79Haemophilus parainfluenzae10.1%1,98399.2%
80Hafnia alvei10.1%1,98499.2%
81Lactobacillus delbrueckii10.1%1,98599.3%
82Leuconostoc species10.1%1,98699.3%
83Listeria monocytogenes10.1%1,98799.4%
84Neisseria meningitidis10.1%1,98899.4%
85Neisseria sicca10.1%1,98999.5%
86Paenibacillus durus10.1%1,99099.5%
87Propionibacterium acnes10.1%1,99199.6%
88Proteus penneri10.1%1,99299.6%
89Rothia mucilaginosa10.1%1,99399.7%
90Sphingobacterium spiritivorum10.1%1,99499.7%
91Sphingomonas paucimobilis10.1%1,99599.8%
92Streptococcus equinus10.1%1,99699.8%
93Streptococcus gordonii10.1%1,99799.9%
94Streptococcus infantarius10.1%1,99899.9%
95Streptococcus sanguinis10.1%1,999100.0%
96Veillonella parvula10.1%2,000100.0%
-
-
-

-Frequencies of numeric values

-

Frequency tables can be created of any input.

-

For numeric values (integers, doubles, etc.), additional descriptive statistics are calculated and shown in the header. When frequency tables are rendered automatically (as here in markdown), add header = TRUE to also show the header in markdown reports:

- -

Frequency table of age

-

Class: numeric

-

Length: 981 (of which NA: 0 = 0.00%)

-

Unique: 73

-

Mean: 71

-

Std. dev.: 14 (CV: 0.2, MAD: 13)

-

Five-Num: 14 | 63 | 74 | 82 | 97 (IQR: 19, CQV: 0.13)

-

Outliers: 15 (unique count: 12)

   Item  Count  Percent  Cum. Count  Cum. Percent
1  83    44     4.5%     44          4.5%
2  76    43     4.4%     87          8.9%
3  75    37     3.8%     124         12.6%
4  82    33     3.4%     157         16.0%
5  78    32     3.3%     189         19.3%
-

(omitted 68 entries, n = 792 [80.7%])

-

So the following properties are determined, where NA values are always ignored:

-
    -
  • Mean

  • -
  • Standard deviation

  • -
  • Coefficient of variation (CV), the standard deviation divided by the mean

  • -
  • Five numbers of Tukey (min, Q1, median, Q3, max)

  • -
  • Coefficient of quartile variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1) using quantile with type = 6 as quantile algorithm to comply with SPSS standards

  • -
  • Outliers (total count and unique count)

  • -
-

So for example, the above frequency table quickly shows the median age of patients being 74.
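The header statistics listed above can be computed with base R alone. The following is a minimal sketch on a made-up vector, assuming NA values are dropped first as described; it does not claim to be how freq() computes them internally.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, NA)
x <- x[!is.na(x)]                      # NA values are ignored

x_mean <- mean(x)
x_sd   <- sd(x)
cv     <- x_sd / x_mean                # coefficient of variation

five_num <- fivenum(x)                 # Tukey: min, Q1, median, Q3, max

# Quartiles with type = 6, the SPSS-compatible quantile algorithm
q   <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
cqv <- (q[2] - q[1]) / (q[2] + q[1])   # coefficient of quartile variation

cat("Mean:", x_mean, " SD:", round(x_sd, 2),
    " CV:", round(cv, 2), " CQV:", round(cqv, 2), "\n")
```

For this vector, the type 6 quartiles are 2 and 6, so the CQV is (6 - 2) / (6 + 2) = 0.5.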

-
-
-

-Frequencies of factors

-

Frequencies of factors can be sorted on factor level instead of item count with the sort.count parameter.

-

Default behaviour:

- -

Frequency table of hospital_id

   Item  Count  Percent  Cum. Count  Cum. Percent
1  D     762    38.1%    762         38.1%
2  B     663    33.2%    1,425       71.3%
3  A     321    16.1%    1,746       87.3%
4  C     254    12.7%    2,000       100.0%
-

Sorting on item instead of count:

-
septic_patients %>%
-  freq(hospital_id, sort.count = FALSE)
-

Frequency table of hospital_id

   Item  Count  Percent  Cum. Count  Cum. Percent
1  A     321    16.1%    321         16.1%
2  B     663    33.2%    984         49.2%
3  C     254    12.7%    1,238       61.9%
4  D     762    38.1%    2,000       100.0%
-

All classes will be printed into the header. Variables with the new rsi class of this AMR package are actually ordered factors and have three classes (look at Class in the header):

-
septic_patients %>%
-  freq(amox, header = TRUE)
-

Frequency table of amox

-

Class: factor > ordered > rsi (numeric)

-

Levels: S < I < R

-

Length: 2,000 (of which NA: 828 = 41.40%)

-

Unique: 3

   Item  Count  Percent  Cum. Count  Cum. Percent
1  R     683    58.3%    683         58.3%
2  S     486    41.5%    1,169       99.7%
3  I     3      0.3%     1,172       100.0%
-
-
-

-Frequencies of dates

-

Frequencies of dates will show the oldest and newest date in the data, and the amount of days between them:

-
septic_patients %>%
-  freq(date, nmax = 5, header = TRUE)
-

Frequency table of date

-

Class: Date (numeric)

-

Length: 2,000 (of which NA: 0 = 0.00%)

-

Unique: 1,140

-

Oldest: 2 januari 2002

-

Newest: 28 december 2017 (+5,839)

-

Median: 31 juli 2009 (~47%)

   Item        Count  Percent  Cum. Count  Cum. Percent
1  2016-05-21  10     0.5%     10          0.5%
2  2004-11-15  8      0.4%     18          0.9%
3  2013-07-29  8      0.4%     26          1.3%
4  2017-06-12  8      0.4%     34          1.7%
5  2015-11-19  7      0.4%     41          2.1%
-

(omitted 1,135 entries, n = 1,959 [98.0%])

-
-
-

-Assigning a frequency table to an object

-

A frequency table is actually a regular data.frame, with the exception that it contains an additional class.

- - -

Because of this additional class, a frequency table prints like the examples above. But the object itself contains the complete table without a row limitation:

-
dim(my_df)
-[1] 74  5
-
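How an extra class changes printing while the object stays a complete data.frame can be sketched in base R. The class name demo_freq and its print method are hypothetical and only mimic the behaviour described above; the counts are taken from the hospital_id example.

```r
# A plain data.frame holding the complete table
freq_tbl <- data.frame(item  = c("D", "B", "A", "C"),
                       count = c(762L, 663L, 321L, 254L),
                       stringsAsFactors = FALSE)

# Add an extra class in front of "data.frame"
class(freq_tbl) <- c("demo_freq", class(freq_tbl))

# Hypothetical print method: shows only the first `nmax` rows
print.demo_freq <- function(x, nmax = 2, ...) {
  shown <- head(as.data.frame(x), nmax)
  print(shown)
  omitted <- nrow(x) - nrow(shown)
  if (omitted > 0) cat("(omitted", omitted, "entries)\n")
  invisible(x)
}

print(freq_tbl)   # limited output through the custom method
dim(freq_tbl)     # the object itself still holds every row
```

Only the print method truncates the output; functions such as dim() or nrow() still see the full table.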
-
-

-Additional parameters

-
-

-Parameter na.rm -

-

The na.rm parameter defaults to TRUE, which leaves NA values out of the table (they are always counted in the header). Set na.rm = FALSE to include NA values in the frequency table:

- -

Frequency table of amox

   Item  Count  Percent  Cum. Count  Cum. Percent
1  NA    828    41.4%    828         41.4%
2  R     683    34.2%    1,511       75.6%
3  S     486    24.3%    1,997       99.9%
4  I     3      0.2%     2,000       100.0%
-
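The effect of keeping or dropping NA values can be sketched with base R's table(), which has a comparable switch in its useNA argument. The vector below is made up for illustration and this is not the freq() implementation.

```r
amox <- factor(c("R", "S", NA, "R", "I", NA), levels = c("S", "I", "R"))

without_na <- table(amox)                   # NA values are left out
with_na    <- table(amox, useNA = "ifany")  # NA values get their own row

print(without_na)
print(with_na)
```

Including NA changes the denominator, so every percentage in the table shifts, as the two amox tables in this document show (R: 58.3% without NA, 34.2% with NA).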
-
-

-Parameter row.names -

-

The default frequency table shows row indices. To remove them, use row.names = FALSE:

-
septic_patients %>%
-  freq(hospital_id, row.names = FALSE)
-

Frequency table of hospital_id

Item  Count  Percent  Cum. Count  Cum. Percent
D     762    38.1%    762         38.1%
B     663    33.2%    1,425       71.3%
A     321    16.1%    1,746       87.3%
C     254    12.7%    2,000       100.0%
-
-

AMR, (c) 2018, https://msberends.gitlab.io/AMR,https://gitlab.com/msberends/AMR

-

Licensed under the GNU General Public License v2.0.

-
-
-
- - - -
- - - -
- - - - - diff --git a/docs/extra.css b/docs/extra.css index 5c0cf18c..b0969727 100644 --- a/docs/extra.css +++ b/docs/extra.css @@ -28,10 +28,24 @@ pre, code { font-weight: bold; background-color: transparent; } +pre { + font-size: 90% !important; +} li, p { line-height: 1.5; } +/* slightly smaller blockquote */ +blockquote { + font-size: 98%; +} + +/* 2nd list in navigation should be smaller */ +#tocnav li li { + font-size: 90%; + margin-left: 5px; +} + /* new element with dotted underline */ help { border-bottom: 1px dotted; @@ -60,3 +74,18 @@ help { #navbar .fas { margin-right: 5px; } + +/* tables */ +.table { + font-size: 90%; +} +.table td { + padding: 4px !important; +} +thead { + border-top: 2px solid black; + border-bottom: 2px solid black; +} +tbody { + border-bottom: 2px solid black; +} diff --git a/docs/index.html b/docs/index.html index 13370fe0..05bfb7a0 100644 --- a/docs/index.html +++ b/docs/index.html @@ -169,6 +169,62 @@

Get started

To find out how to conduct AMR analysis, please continue reading here to get started or click the button ‘Get Started’ in the top menu.

+ +
+

+Short introduction

+

+

This package contains the complete microbial taxonomic data (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov).

+

All (sub)species from the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package, as well as all previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since all bacteria are classified into subkingdom Negibacteria or Posibacteria. ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists.

+

The AMR package basically does four important things:

+
    +
  1. +

    It cleanses existing data, by transforming it to reproducible and profound classes, making the most efficient use of R. These functions all use artificial intelligence to guess results that you would expect:

    +
      +
    • Use as.mo() to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” and “esccol”. Even as.mo("MRSA") will return the ID of S. aureus. Moreover, it can group all coagulase negative and positive Staphylococci, and can transform Streptococci into Lancefield groups. To find bacteria based on your input, it uses Artificial Intelligence to look up values in the included ITIS data, consisting of more than 18,000 microorganisms.
    • +
    • Use as.rsi() to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.
    • +
    • Use as.mic() to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.
    • +
    • Use as.atc() to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values “Furabid”, “Furadantin”, “nitro” all return the ATC code of Nitrofurantoin.
    • +
    +
  2. +
  3. +

    It enhances existing data and adds new data from data sets included in this package.

    +
      +
    • Use eucast_rules() to apply EUCAST expert rules to isolates.
    • +
    • Use first_isolate() to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute). +
        +
      • You can also identify first weighted isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
      • +
      +
    • +
    • Use mdro() (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
    • +
    • The data set microorganisms contains the complete taxonomic tree of more than 18,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus(), mo_family(), mo_gramstain() or even mo_phylum(). As they use as.mo() internally, they also use artificial intelligence. For example, mo_genus("MRSA") and mo_genus("S. aureus") will both return "Staphylococcus". They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.
    • +
    • The data set antibiotics contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like ab_name() and ab_tradenames() to look up values. The ab_* functions use as.atc() internally so they support AI to guess your expected result. For example, ab_name("Fluclox"), ab_name("Floxapen") and ab_name("J01CF05") will all return "Flucloxacillin". These functions can again be used to add new variables to your data.
    • +
    +
  4. +
  5. +

    It analyses the data with convenient functions that use well-known methods.

    + +
  6. +
  7. +

    It teaches the user how to use all the above actions.

    +
      +
    • The package contains extensive help pages with many examples.
    • +
    • It also contains an example data set called septic_patients. This data set contains: +
        +
      • 2,000 blood culture isolates from anonymised septic patients between 2001 and 2017 in the Northern Netherlands
      • +
      • Results of 40 antibiotics (each antibiotic in its own column) with a total of 38,414 antimicrobial results
      • +
      • Real and genuine data
      • +
      +
    • +
    +
  8. +
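The cleansing of combined MIC/RSI values described for as.rsi() and as.mic() in the list above can be illustrated with a base R sketch. The helper split_mic_rsi() is hypothetical and far simpler than the package functions; it assumes the two parts are separated by a semicolon.

```r
# Hypothetical helper: split values like "<=0.002; S" into MIC and S/I/R
split_mic_rsi <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)
  mic <- trimws(vapply(parts, function(p) p[1], character(1)))
  rsi <- vapply(parts,
                function(p) if (length(p) > 1) p[2] else NA_character_,
                character(1))
  rsi <- toupper(trimws(rsi))
  # Anything that is not S, I or R is treated as invalid
  rsi[!rsi %in% c("S", "I", "R")] <- NA_character_
  data.frame(mic = mic, rsi = rsi, stringsAsFactors = FALSE)
}

res <- split_mic_rsi(c("<=0.002; S", "8; R", "0.5"))
print(res)
```

A value without an interpretation, such as "0.5", keeps its MIC part and gets a missing S/I/R value.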

diff --git a/docs/news/index.html b/docs/news/index.html index 5ce3f2b1..954e4c2d 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -168,25 +168,25 @@ New

They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

-
mo_gramstain("E. coli")
+
 

Furthermore, former taxonomic names will give a note about the current taxonomic name:

-
mo_gramstain("Esc blattae")
+
 
@@ -423,15 +424,15 @@
 
 
  • Functions as.mo and is.mo as replacements for as.bactid and is.bactid (since the microorganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using Artificial Intelligence (AI):

    -
    as.mo("E. coli")
    +
     

    And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:

    @@ -462,11 +463,11 @@
  • Added three antimicrobial agents to the antibiotics data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
  • Added 163 trade names to the antibiotics data set, it now contains 298 different trade names in total, e.g.:

    -
    ab_official("Bactroban")
    +
    ab_official("Bactroban")
     # [1] "Mupirocin"
    -ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
    +ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
     # [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin"
    -ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
    +ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
     # [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
  • For first_isolate, rows will be ignored when there’s no species available
  • @@ -478,13 +479,13 @@
  • Support for quasiquotation in the functions series count_* and portions_*, and n_rsi. This allows to check for more than 2 vectors or columns.

    -
  • Edited ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df at default, but can be set to count_df.
  • Fix for ggplot_rsi when the ggplot2 package was not loaded
  • @@ -499,11 +500,11 @@
  • Support for types (classes) list and matrix for freq

    my_matrix = with(septic_patients, matrix(c(age, gender), ncol = 2))
    -freq(my_matrix)
    +freq(my_matrix)
  • For lists, subsetting is possible:

    my_list = list(age = septic_patients$age, gender = septic_patients$gender)
    -my_list %>% freq(age)
    -my_list %>% freq(gender)
    +my_list %>% freq(age) +my_list %>% freq(gender)
    @@ -545,7 +546,7 @@ @@ -566,7 +567,7 @@
  • Function ratio to transform a vector of values to a preset ratio
  • Support for Addins menu in RStudio to quickly insert %in% or %like% (and give them keyboard shortcuts), or to view the datasets that come with this package
  • @@ -577,13 +578,13 @@ diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 7d0e471f..4fecc46c 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -1,7 +1,8 @@ pandoc: 2.3.1 pkgdown: 1.3.0 pkgdown_sha: ~ -articles: [] +articles: + AMR: AMR.html urls: reference: https://msberends.gitlab.io/reference article: https://msberends.gitlab.io/articles diff --git a/docs/reference/ab_property.html b/docs/reference/ab_property.html index 97b9e9e7..c19b6499 100644 --- a/docs/reference/ab_property.html +++ b/docs/reference/ab_property.html @@ -163,7 +163,7 @@
    -

    Use these functions to return a specific property of an antibiotic from the antibiotics data set, based on their ATC code. Get such a code with as.atc.

    +

    Use these functions to return a specific property of an antibiotic from the antibiotics data set, based on their ATC code. Get such a code with as.atc.

    @@ -188,11 +188,11 @@ x -

    a (vector of a) valid atc code or any text that can be coerced to a valid atc with as.atc

    +

    a (vector of a) valid atc code or any text that can be coerced to a valid atc with as.atc

    property -

    one of the column names of one of the antibiotics data set, like "atc" and "official"

    +

    one of the column names of one of the antibiotics data set, like "atc" and "official"

    language @@ -206,7 +206,7 @@

    See also

    -

    antibiotics

    +

    Examples

    diff --git a/docs/reference/abname.html b/docs/reference/abname.html index d77f8b64..c364a262 100644 --- a/docs/reference/abname.html +++ b/docs/reference/abname.html @@ -163,7 +163,7 @@
    -

    Convert antibiotic codes to a (trivial) antibiotic name or ATC code, or vice versa. This uses the data from antibiotics.

    +

    Convert antibiotic codes to a (trivial) antibiotic name or ATC code, or vice versa. This uses the data from antibiotics.

    @@ -179,7 +179,7 @@ from, to -

    type to transform from and to. See antibiotics for its column names. With from = "guess" the from will be guessed from "atc", "certe" and "umcg". When using to = "atc", the ATC code will be searched using as.atc.

    +

    type to transform from and to. See antibiotics for its column names. With from = "guess" the from will be guessed from "atc", "certe" and "umcg". When using to = "atc", the ATC code will be searched using as.atc.

    textbetween @@ -193,11 +193,11 @@

    Source

    -

    antibiotics

    +

    antibiotics

    Details

    -

    The ab_property functions are faster and more concise, but do not support concatenated strings, like abname("AMCL+GENT").

    +

    The ab_property functions are faster and more concise, but do not support concatenated strings, like abname("AMCL+GENT").

    Examples

    diff --git a/docs/reference/age.html b/docs/reference/age.html index eb34b42a..8c5ec6d7 100644 --- a/docs/reference/age.html +++ b/docs/reference/age.html @@ -188,7 +188,7 @@

    See also

    -

    age_groups to split ages into groups

    +

    age_groups to split ages into groups

    diff --git a/docs/reference/age_groups.html b/docs/reference/age_groups.html index 3cfcdd2d..aaee4aaf 100644 --- a/docs/reference/age_groups.html +++ b/docs/reference/age_groups.html @@ -174,7 +174,7 @@ x -

    age, e.g. calculated with age

    +

    age, e.g. calculated with age

    split_at @@ -201,7 +201,7 @@

    See also

    -

    age to determine ages based on one or more reference dates

    +

    age to determine ages based on one or more reference dates

    Examples

    @@ -230,13 +230,13 @@ # resistance of ciprofloxacine per age group library(dplyr) septic_patients %>% - mutate(first_isolate = first_isolate(.)) %>% - filter(first_isolate == TRUE, - mo == as.mo("E. coli")) %>% - group_by(age_group = age_groups(age)) %>% - select(age_group, + mutate(first_isolate = first_isolate(.)) %>% + filter(first_isolate == TRUE, + mo == as.mo("E. coli")) %>% + group_by(age_group = age_groups(age)) %>% + select(age_group, cipr) %>% - ggplot_rsi(x = "age_group") + ggplot_rsi(x = "age_group") # } diff --git a/docs/reference/as.atc.html b/docs/reference/as.atc.html index 82bbf371..4bfdb41f 100644 --- a/docs/reference/as.atc.html +++ b/docs/reference/as.atc.html @@ -163,7 +163,7 @@
    -

    Use this function to determine the ATC code of one or more antibiotics. The data set antibiotics will be searched for abbreviations, official names and trade names.

    +

    Use this function to determine the ATC code of one or more antibiotics. The data set antibiotics will be searched for abbreviations, official names and trade names.

    @@ -188,13 +188,13 @@

    Details

    -

    Use the ab_property functions to get properties based on the returned ATC code, see Examples.

    +

    Use the ab_property functions to get properties based on the returned ATC code, see Examples.

    In the ATC classification system, the active substances are classified in a hierarchy with five different levels. The system has fourteen main anatomical/pharmacological groups or 1st levels. Each ATC main group is divided into 2nd levels which could be either pharmacological or therapeutic groups. The 3rd and 4th levels are chemical, pharmacological or therapeutic subgroups and the 5th level is the chemical substance. The 2nd, 3rd and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups. Source: https://www.whocc.no/atc/structure_and_principles/

    See also

    -

    antibiotics for the dataframe that is being used to determine ATCs.

    +

    antibiotics for the dataframe that is being used to determine ATCs.

    Examples

    @@ -212,8 +212,8 @@ # Use ab_* functions to get a specific property based on an ATC code Cipro <- as.atc("cipro") # returns `J01MA02` -ab_official(Cipro) # returns "Ciprofloxacin" -ab_umcg(Cipro) # returns "CIPR", the code used in the UMCG +ab_official(Cipro) # returns "Ciprofloxacin" +ab_umcg(Cipro) # returns "CIPR", the code used in the UMCG # } diff --git a/docs/reference/atc_property.html b/docs/reference/atc_property.html index 6aef6af5..a0a26ec3 100644 --- a/docs/reference/atc_property.html +++ b/docs/reference/atc_property.html @@ -232,7 +232,7 @@

    Examples

    # NOT RUN {
     # What's the ATC of amoxicillin?
    -guess_atc("Amoxicillin")
    +guess_atc("Amoxicillin")
     # [1] "J01CA04"
     
     # oral DDD (Defined Daily Dose) of amoxicillin
    diff --git a/docs/reference/count.html b/docs/reference/count.html
    index 9dc60832..b2849e8d 100644
    --- a/docs/reference/count.html
    +++ b/docs/reference/count.html
    @@ -191,7 +191,7 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_
         
         
           ...
    -      

    one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with as.rsi if needed.

    +

    one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with as.rsi if needed.

    also_single_tested @@ -199,11 +199,11 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_ data -

    a data.frame containing columns with class rsi (see as.rsi)

    +

    a data.frame containing columns with class rsi (see as.rsi)

    translate_ab -

    a column name of the antibiotics data set to translate the antibiotic abbreviations to, using abname. This can be set with getOption("get_antibiotic_names").

    +

    a column name of the antibiotics data set to translate the antibiotic abbreviations to, using abname. This can be set with getOption("get_antibiotic_names").

    combine_IR @@ -221,13 +221,13 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_

    Details

    -

    These functions are meant to count isolates. Use the portion_* functions to calculate microbial resistance.

    -

    n_rsi is an alias of count_all. They can be used to count all available isolates, i.e. where all input antibiotics have an available result (S, I or R). Their use is equal to n_distinct. Their function is equal to count_S(...) + count_IR(...).

    -

    count_df takes any variable from data that has an "rsi" class (created with as.rsi) and counts the amounts of R, I and S. The resulting tidy data (see Source) data.frame will have three rows (S/I/R) and a column for each variable with class "rsi".

    +

    These functions are meant to count isolates. Use the portion_* functions to calculate microbial resistance.

    +

    n_rsi is an alias of count_all. They can be used to count all available isolates, i.e. where all input antibiotics have an available result (S, I or R). Their use is equal to n_distinct. Their function is equal to count_S(...) + count_IR(...).

    +

    count_df takes any variable from data that has an "rsi" class (created with as.rsi) and counts the amounts of R, I and S. The resulting tidy data (see Source) data.frame will have three rows (S/I/R) and a column for each variable with class "rsi".

    See also

    -

    portion_* to calculate microbial resistance and susceptibility.

    +

    portion_* to calculate microbial resistance and susceptibility.

    Examples

    @@ -251,17 +251,17 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_ # calculate back to count e.g. non-susceptible isolates. # This results in the same: count_IR(septic_patients$amox) -portion_IR(septic_patients$amox) * n_rsi(septic_patients$amox) +portion_IR(septic_patients$amox) * n_rsi(septic_patients$amox) library(dplyr) septic_patients %>% - group_by(hospital_id) %>% - summarise(R = count_R(cipr), + group_by(hospital_id) %>% + summarise(R = count_R(cipr), I = count_I(cipr), S = count_S(cipr), n1 = count_all(cipr), # the actual total; sum of all three n2 = n_rsi(cipr), # same - analogous to n_distinct - total = n()) # NOT the amount of tested isolates! + total = n()) # NOT the amount of tested isolates! # Count co-resistance between amoxicillin/clav acid and gentamicin, # so we can see that combination therapy does a lot more than mono therapy. @@ -279,13 +279,13 @@ count_R and count_IR can be used to count resistant isolates, count_S and count_ # Get portions S/I/R immediately of all rsi columns septic_patients %>% - select(amox, cipr) %>% + select(amox, cipr) %>% count_df(translate = FALSE) # It also supports grouping variables septic_patients %>% - select(hospital_id, amox, cipr) %>% - group_by(hospital_id) %>% + select(hospital_id, amox, cipr) %>% + group_by(hospital_id) %>% count_df(translate = FALSE) # }
    diff --git a/docs/reference/eucast_rules.html b/docs/reference/eucast_rules.html index 6ab01938..d9174890 100644 --- a/docs/reference/eucast_rules.html +++ b/docs/reference/eucast_rules.html @@ -199,7 +199,7 @@ col_mo -

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

    +

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

     info
    diff --git a/docs/reference/first_isolate.html b/docs/reference/first_isolate.html
    index 2ec70f00..04ab01bd 100644
    --- a/docs/reference/first_isolate.html
    +++ b/docs/reference/first_isolate.html
    @@ -198,7 +198,7 @@
     col_mo
    -

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

    +

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

     col_testcode
    @@ -214,7 +214,7 @@
     col_keyantibiotics
    -

    column name of the key antibiotics to determine first weighted isolates, see key_antibiotics. Defaults to the first column that starts with 'key' followed by 'ab' or 'antibiotics' (case insensitive). Use col_keyantibiotics = FALSE to prevent this.

    +

    column name of the key antibiotics to determine first weighted isolates, see key_antibiotics. Defaults to the first column that starts with 'key' followed by 'ab' or 'antibiotics' (case insensitive). Use col_keyantibiotics = FALSE to prevent this.

     episode_days
    @@ -291,7 +291,7 @@ To conduct an analysis of antimicrobial resistance, you should only include the

    See also

    -

    key_antibiotics

    +

    Examples

    @@ -302,11 +302,11 @@ To conduct an analysis of antimicrobial resistance, you should only include the
     library(dplyr)
     # Filter on first isolates:
     septic_patients %>%
    -  mutate(first_isolate = first_isolate(.,
    +  mutate(first_isolate = first_isolate(.,
                                            col_date = "date",
                                            col_patient_id = "patient_id",
                                            col_mo = "mo")) %>%
    -  filter(first_isolate == TRUE)
    +  filter(first_isolate == TRUE)
     
     # Which can be shortened to:
     septic_patients %>%
    @@ -317,15 +317,15 @@ To conduct an analysis of antimicrobial resistance, you should only include the
     
     # Now let's see if first isolates matter:
     A <- septic_patients %>%
    -  group_by(hospital_id) %>%
    -  summarise(count = n_rsi(gent),            # gentamicin availability
    -            resistance = portion_IR(gent))  # gentamicin resistance
    +  group_by(hospital_id) %>%
    +  summarise(count = n_rsi(gent),            # gentamicin availability
    +            resistance = portion_IR(gent))  # gentamicin resistance
     
     B <- septic_patients %>%
       filter_first_weighted_isolate() %>%       # the 1st isolate filter
    -  group_by(hospital_id) %>%
    -  summarise(count = n_rsi(gent),            # gentamicin availability
    -            resistance = portion_IR(gent))  # gentamicin resistance
    +  group_by(hospital_id) %>%
    +  summarise(count = n_rsi(gent),            # gentamicin availability
    +            resistance = portion_IR(gent))  # gentamicin resistance
     
     # Have a look at A and B.
     # B is more reliable because every isolate is only counted once.
    @@ -337,7 +337,7 @@ To conduct an analysis of antimicrobial resistance, you should only include the
     # }# NOT RUN {
     # set key antibiotics to a new variable
    -tbl$keyab <- key_antibiotics(tbl)
    +tbl$keyab <- key_antibiotics(tbl)
     
     tbl$first_isolate <- first_isolate(tbl)
     
    diff --git a/docs/reference/freq.html b/docs/reference/freq.html
    index bbeabe99..5f6bed31 100644
    --- a/docs/reference/freq.html
    +++ b/docs/reference/freq.html
    @@ -317,34 +317,34 @@
     
     # you could also use `select` or `pull` to get your variables
     septic_patients %>%
    -  filter(hospital_id == "A") %>%
    -  select(mo) %>%
    +  filter(hospital_id == "A") %>%
    +  select(mo) %>%
       freq()
     
     # multiple selected variables will be pasted together
     septic_patients %>%
       left_join_microorganisms %>%
    -  filter(hospital_id == "A") %>%
    +  filter(hospital_id == "A") %>%
       freq(genus, species)
     
     # group a variable and analyse another
     septic_patients %>%
    -  group_by(hospital_id) %>%
    +  group_by(hospital_id) %>%
       freq(gender)
     
     # get top 10 bugs of hospital A as a vector
     septic_patients %>%
    -  filter(hospital_id == "A") %>%
    +  filter(hospital_id == "A") %>%
       freq(mo) %>%
       top_freq(10)
     
     # save frequency table to an object
     years <- septic_patients %>%
    -  mutate(year = format(date, "%Y")) %>%
    +  mutate(year = format(date, "%Y")) %>%
       freq(year)
     
    @@ -395,11 +395,11 @@
     # only get selected columns
     septic_patients %>%
       freq(hospital_id) %>%
    -  select(item, percent)
    +  select(item, percent)
     
     septic_patients %>%
       freq(hospital_id) %>%
    -  select(-count, -cum_count)
    +  select(-count, -cum_count)
     
     # check differences between frequency tables
    diff --git a/docs/reference/get_locale.html b/docs/reference/get_locale.html
    index b19d7c1f..cd3df960 100644
    --- a/docs/reference/get_locale.html
    +++ b/docs/reference/get_locale.html
    @@ -163,7 +163,7 @@
    -

    Determines the system language to be used for language-dependent output of AMR functions, like mo_gramstain and mo_type.

    +

    Determines the system language to be used for language-dependent output of AMR functions, like mo_gramstain and mo_type.

    diff --git a/docs/reference/ggplot_rsi.html b/docs/reference/ggplot_rsi.html
    index 8a068a95..bca73c4b 100644
    --- a/docs/reference/ggplot_rsi.html
    +++ b/docs/reference/ggplot_rsi.html
    @@ -193,11 +193,11 @@
     data
    -

    a data.frame with column(s) of class "rsi" (see as.rsi)

    +

    a data.frame with column(s) of class "rsi" (see as.rsi)

     position
    -

    position adjustment of bars, either "fill" (default when fun is count_df), "stack" (default when fun is portion_df) or "dodge"

    +

    position adjustment of bars, either "fill" (default when fun is count_df), "stack" (default when fun is portion_df) or "dodge"

     x
    @@ -221,11 +221,11 @@
     translate_ab
    -

    a column name of the antibiotics data set to translate the antibiotic abbreviations into, using abname. Default behaviour is to translate to official names according to the WHO. Use translate_ab = FALSE to disable translation.

    +

    a column name of the antibiotics data set to translate the antibiotic abbreviations into, using abname. Default behaviour is to translate to official names according to the WHO. Use translate_ab = FALSE to disable translation.

     fun
    -

    function to transform data, either count_df (default) or portion_df

    +

    function to transform data, either count_df (default) or portion_df

     nrow
    @@ -251,9 +251,9 @@

    Details

    -

    By default, the names of antibiotics will be shown on the plots using abname. This can be set with the option get_antibiotic_names (a logical value), so change it e.g. to FALSE with options(get_antibiotic_names = FALSE).

    +

    By default, the names of antibiotics will be shown on the plots using abname. This can be set with the option get_antibiotic_names (a logical value), so change it e.g. to FALSE with options(get_antibiotic_names = FALSE).

    The functions
    -geom_rsi will take any variable from the data that has an rsi class (created with as.rsi) using fun (count_df by default, can also be portion_df) and will plot bars with the percentage R, I and S. The default behaviour is to have the bars stacked and to have the different antibiotics on the x axis.

    +geom_rsi will take any variable from the data that has an rsi class (created with as.rsi) using fun (count_df by default, can also be portion_df) and will plot bars with the percentage R, I and S. The default behaviour is to have the bars stacked and to have the different antibiotics on the x axis.

    facet_rsi creates 2d plots (by default based on S/I/R) using facet_wrap.

    scale_y_percent transforms the y axis to a 0 to 100% range using scale_continuous.

    scale_rsi_colours sets colours to the bars: green for S, yellow for I and red for R, using scale_brewer.

    @@ -268,7 +268,7 @@
     library(ggplot2)
     
     # get antimicrobial results for drugs against a UTI:
    -ggplot(septic_patients %>% select(amox, nitr, fosf, trim, cipr)) +
    +ggplot(septic_patients %>% select(amox, nitr, fosf, trim, cipr)) +
       geom_rsi()
     
     # prettify the plot using some additional functions:
    @@ -282,17 +282,17 @@
     
     # or better yet, simplify this using the wrapper function - a single command:
     septic_patients %>%
    -  select(amox, nitr, fosf, trim, cipr) %>%
    +  select(amox, nitr, fosf, trim, cipr) %>%
       ggplot_rsi()
     
     # get only portions and no counts:
     septic_patients %>%
    -  select(amox, nitr, fosf, trim, cipr) %>%
    +  select(amox, nitr, fosf, trim, cipr) %>%
       ggplot_rsi(fun = portion_df)
     
     # add other ggplot2 parameters as you like:
     septic_patients %>%
    -  select(amox, nitr, fosf, trim, cipr) %>%
    +  select(amox, nitr, fosf, trim, cipr) %>%
       ggplot_rsi(width = 0.5,
                  colour = "black",
                  size = 1,
    @@ -301,25 +301,25 @@
     
     # resistance of ciprofloxacine per age group
     septic_patients %>%
    -  mutate(first_isolate = first_isolate(.)) %>%
    -  filter(first_isolate == TRUE,
    -         mo == as.mo("E. coli")) %>%
    +  mutate(first_isolate = first_isolate(.)) %>%
    +  filter(first_isolate == TRUE,
    +         mo == as.mo("E. coli")) %>%
       # `age_group` is also a function of this package:
    -  group_by(age_group = age_groups(age)) %>%
    -  select(age_group,
    +  group_by(age_group = age_groups(age)) %>%
    +  select(age_group,
              cipr) %>%
       ggplot_rsi(x = "age_group")
     # }# NOT RUN {
     
     # for colourblind mode, use divergent colours from the viridis package:
     septic_patients %>%
    -  select(amox, nitr, fosf, trim, cipr) %>%
    +  select(amox, nitr, fosf, trim, cipr) %>%
       ggplot_rsi() + scale_fill_viridis_d()
     
     # it also supports groups (don't forget to use the group var on `x` or `facet`):
     septic_patients %>%
    -  select(hospital_id, amox, nitr, fosf, trim, cipr) %>%
    -  group_by(hospital_id) %>%
    +  select(hospital_id, amox, nitr, fosf, trim, cipr) %>%
    +  group_by(hospital_id) %>%
       ggplot_rsi(x = hospital_id,
                  facet = Antibiotic,
                  nrow = 1) +
    @@ -329,22 +329,22 @@
     # genuine analysis: check 2 most prevalent microorganisms
     septic_patients %>%
       # create new bacterial ID's, with all CoNS under the same group (Becker et al.)
    -  mutate(mo = as.mo(mo, Becker = TRUE)) %>%
    +  mutate(mo = as.mo(mo, Becker = TRUE)) %>%
       # filter on top three bacterial ID's
    -  filter(mo %in% top_freq(freq(.$mo), 3)) %>%
    +  filter(mo %in% top_freq(freq(.$mo), 3)) %>%
       # determine first isolates
    -  mutate(first_isolate = first_isolate(.,
    +  mutate(first_isolate = first_isolate(.,
                                            col_date = "date",
                                            col_patient_id = "patient_id",
                                            col_mo = "mo")) %>%
       # filter on first isolates
    -  filter(first_isolate == TRUE) %>%
    +  filter(first_isolate == TRUE) %>%
       # get short MO names (like "E. coli")
    -  mutate(mo = mo_shortname(mo, Becker = TRUE)) %>%
    +  mutate(mo = mo_shortname(mo, Becker = TRUE)) %>%
       # select this short name and some antiseptic drugs
    -  select(mo, cfur, gent, cipr) %>%
    +  select(mo, cfur, gent, cipr) %>%
       # group by MO
    -  group_by(mo) %>%
    +  group_by(mo) %>%
       # plot the thing, putting MOs on the facet
       ggplot_rsi(x = Antibiotic,
                  facet = mo,
    diff --git a/docs/reference/join.html b/docs/reference/join.html
    index aee46a33..bac48017 100644
    --- a/docs/reference/join.html
    +++ b/docs/reference/join.html
    @@ -163,7 +163,7 @@
    -

    Join the dataset microorganisms easily to an existing table or character vector.

    +

    Join the dataset microorganisms easily to an existing table or character vector.

    @@ -188,7 +188,7 @@
     by
    -

    a variable to join by - if left empty will search for a column with class mo (created with as.mo) or will be "mo" if that column name exists in x, could otherwise be a column name of x with values that exist in microorganisms$mo (like by = "bacteria_id"), or another column in microorganisms (but then it should be named, like by = c("my_genus_species" = "fullname"))

    +

    a variable to join by - if left empty will search for a column with class mo (created with as.mo) or will be "mo" if that column name exists in x, could otherwise be a column name of x with values that exist in microorganisms$mo (like by = "bacteria_id"), or another column in microorganisms (but then it should be named, like by = c("my_genus_species" = "fullname"))

     suffix
    @@ -207,7 +207,7 @@

    Examples

    # NOT RUN {
    -left_join_microorganisms(as.mo("K. pneumoniae"))
    +left_join_microorganisms(as.mo("K. pneumoniae"))
     left_join_microorganisms("B_KLBSL_PNE")
     
     library(dplyr)
    @@ -216,7 +216,7 @@
     df <- data.frame(date = seq(from = as.Date("2018-01-01"),
                                 to = as.Date("2018-01-07"),
                                 by = 1),
    -                 bacteria = as.mo(c("S. aureus", "MRSA", "MSSA", "STAAUR",
    +                 bacteria = as.mo(c("S. aureus", "MRSA", "MSSA", "STAAUR",
                                         "E. coli", "E. coli", "E. coli")),
                      stringsAsFactors = FALSE)
     colnames(df)
    diff --git a/docs/reference/key_antibiotics.html b/docs/reference/key_antibiotics.html
    index bf4c020d..1919773d 100644
    --- a/docs/reference/key_antibiotics.html
    +++ b/docs/reference/key_antibiotics.html
    @@ -163,7 +163,7 @@
     
         
    -

    These functions can be used to determine first isolates (see first_isolate). Using key antibiotics to determine first isolates is more reliable than without key antibiotics. These selected isolates will then be called first weighted isolates.

    +

    These functions can be used to determine first isolates (see first_isolate). Using key antibiotics to determine first isolates is more reliable than without key antibiotics. These selected isolates will then be called first weighted isolates.

    @@ -187,7 +187,7 @@
     col_mo
    -

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

    +

    column name of the unique IDs of the microorganisms (see mo), defaults to the first column of class mo. Values will be coerced using as.mo.

     universal_1, universal_2, universal_3, universal_4, universal_5, universal_6
    @@ -233,7 +233,7 @@

    Details

    -

    The function key_antibiotics returns a character vector with 12 antibiotic results for every isolate. These isolates can then be compared using key_antibiotics_equal, to check if two isolates have generally the same antibiogram. Missing and invalid values are replaced with a dot ("."). The first_isolate function only uses this function on the same microbial species from the same patient. Using this, an MRSA will be included after a susceptible S. aureus (MSSA) found within the same episode (see episode parameter of first_isolate). Without key antibiotic comparison it wouldn't.

    +

    The function key_antibiotics returns a character vector with 12 antibiotic results for every isolate. These isolates can then be compared using key_antibiotics_equal, to check if two isolates have generally the same antibiogram. Missing and invalid values are replaced with a dot ("."). The first_isolate function only uses this function on the same microbial species from the same patient. Using this, an MRSA will be included after a susceptible S. aureus (MSSA) found within the same episode (see episode parameter of first_isolate). Without key antibiotic comparison it wouldn't.

    By default, the antibiotics that are used for Gram positive bacteria are (column names):
    "amox", "amcl", "cfur", "pita", "cipr", "trsu" (the first six are universal), "vanc", "teic", "tetr", "eryt", "oxac", "rifa".

    By default, the antibiotics that are used for Gram negative bacteria are (column names):
    @@ -251,7 +251,7 @@

    See also

    -

    first_isolate

    +

    Examples

    @@ -261,12 +261,12 @@
     library(dplyr)
     # set key antibiotics to a new variable
     my_patients <- septic_patients %>%
    -  mutate(keyab = key_antibiotics(.)) %>%
    -  mutate(
    +  mutate(keyab = key_antibiotics(.)) %>%
    +  mutate(
         # now calculate first isolates
    -    first_regular = first_isolate(., col_keyantibiotics = FALSE),
    +    first_regular = first_isolate(., col_keyantibiotics = FALSE),
         # and first WEIGHTED isolates
    -    first_weighted = first_isolate(., col_keyantibiotics = "keyab")
    +    first_weighted = first_isolate(., col_keyantibiotics = "keyab")
       )
     
     # Check the difference, in this data set it results in 7% more isolates:
    diff --git a/docs/reference/kurtosis.html b/docs/reference/kurtosis.html
    index 82f6e9e8..7bcbecdb 100644
    --- a/docs/reference/kurtosis.html
    +++ b/docs/reference/kurtosis.html
    @@ -193,7 +193,7 @@

    See also

    -

    skewness

    +
    diff --git a/docs/reference/like.html b/docs/reference/like.html
    index cdcdeda6..24cecf58 100644
    --- a/docs/reference/like.html
    +++ b/docs/reference/like.html
    @@ -228,9 +228,9 @@
     # get frequencies of bacteria whose name start with 'Ent' or 'ent'
     library(dplyr)
     septic_patients %>%
    -  left_join_microorganisms() %>%
    -  filter(genus %like% '^ent') %>%
    -  freq(genus, species)
    +  left_join_microorganisms() %>%
    +  filter(genus %like% '^ent') %>%
    +  freq(genus, species)
     # }
    diff --git a/docs/reference/microorganisms.certe.html b/docs/reference/microorganisms.certe.html
    index 0ee93698..105a42b2 100644
    --- a/docs/reference/microorganisms.certe.html
    +++ b/docs/reference/microorganisms.certe.html
    @@ -163,7 +163,7 @@
    -

    A data set containing all bacteria codes of Certe MMB. These codes can be joined to data with an ID from microorganisms$mo (using left_join_microorganisms). GLIMS codes can also be translated to valid MOs with guess_mo.

    +

    A data set containing all bacteria codes of Certe MMB. These codes can be joined to data with an ID from microorganisms$mo (using left_join_microorganisms). GLIMS codes can also be translated to valid MOs with guess_mo.

    @@ -173,12 +173,12 @@

    A data.frame with 2,665 observations and 2 variables:

    certe

    Code of microorganism according to Certe MMB

    -
    mo

    Code of microorganism in microorganisms

    +
    mo

    Code of microorganism in microorganisms

    See also

    -

    as.mo microorganisms

    +
    diff --git a/docs/reference/microorganisms.html b/docs/reference/microorganisms.html
    index 9150aa4b..d3803b57 100644
    --- a/docs/reference/microorganisms.html
    +++ b/docs/reference/microorganisms.html
    @@ -163,7 +163,7 @@
    -

    A data set containing the complete microbial taxonomy of the kingdoms Bacteria, Fungi and Protozoa. MO codes can be looked up using as.mo.

    +

    A data set containing the complete microbial taxonomy of the kingdoms Bacteria, Fungi and Protozoa. MO codes can be looked up using as.mo.

    @@ -185,7 +185,7 @@
    subkingdom

    Taxonomic subkingdom of the microorganism as found in ITIS, see Source

    kingdom

    Taxonomic kingdom of the microorganism as found in ITIS, see Source

    gramstain

    Gram of microorganism, like "Gram negative"

    -
    prevalence

    An integer based on estimated prevalence of the microorganism in humans. Used internally by as.mo, otherwise quite meaningless. It has a value of 25 for manually added items and a value of 1000 for all unprevalent microorganisms whose genus was somewhere in the top 250 (with another species).

    +
    prevalence

    An integer based on estimated prevalence of the microorganism in humans. Used internally by as.mo, otherwise quite meaningless. It has a value of 25 for manually added items and a value of 1000 for all unprevalent microorganisms whose genus was somewhere in the top 250 (with another species).

    ref

    Author(s) and year of concerning publication as found in ITIS, see Source

    @@ -203,7 +203,7 @@ This package contains the complete microbial taxonomic data (wi

    See also

    -

    as.mo mo_property microorganisms.umcg

    +
    diff --git a/docs/reference/microorganisms.old.html b/docs/reference/microorganisms.old.html
    index 66928578..787704dc 100644
    --- a/docs/reference/microorganisms.old.html
    +++ b/docs/reference/microorganisms.old.html
    @@ -163,7 +163,7 @@
    -

    A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by as.mo.

    +

    A data set containing old (previously valid or accepted) taxonomic names according to ITIS. This data set is used internally by as.mo.

    @@ -192,7 +192,7 @@ This package contains the complete microbial taxonomic data (wi

    See also

    -

    as.mo mo_property microorganisms

    +
    diff --git a/docs/reference/microorganisms.umcg.html b/docs/reference/microorganisms.umcg.html
    index e17e9ef4..081b91d6 100644
    --- a/docs/reference/microorganisms.umcg.html
    +++ b/docs/reference/microorganisms.umcg.html
    @@ -163,7 +163,7 @@
    -

    A data set containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from microorganisms$mo (using left_join_microorganisms). GLIMS codes can also be translated to valid MOs with guess_mo.

    +

    A data set containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from microorganisms$mo (using left_join_microorganisms). GLIMS codes can also be translated to valid MOs with guess_mo.

    @@ -178,7 +178,7 @@

    See also

    -

    as.mo microorganisms.certe microorganisms

    +
    diff --git a/docs/reference/mo_failures.html b/docs/reference/mo_failures.html
    index 47fd4c94..ebd64bf2 100644
    --- a/docs/reference/mo_failures.html
    +++ b/docs/reference/mo_failures.html
    @@ -163,7 +163,7 @@
    -

    Returns a vector of all failed attempts to coerce values to a valid MO code with as.mo.

    +

    Returns a vector of all failed attempts to coerce values to a valid MO code with as.mo.

    @@ -171,7 +171,7 @@

    See also

    -

    as.mo

    +
    diff --git a/docs/reference/mo_property.html b/docs/reference/mo_property.html
    index 46fe045d..bcc59e01 100644
    --- a/docs/reference/mo_property.html
    +++ b/docs/reference/mo_property.html
    @@ -163,19 +163,19 @@
    -

    Use these functions to return a specific property of a microorganism from the microorganisms data set. All input values will be evaluated internally with as.mo.

    +

    Use these functions to return a specific property of a microorganism from the microorganisms data set. All input values will be evaluated internally with as.mo.

    -
    mo_fullname(x, language = get_locale(), ...)
    +    
    mo_fullname(x, language = get_locale(), ...)
     
    -mo_shortname(x, language = get_locale(), ...)
    +mo_shortname(x, language = get_locale(), ...)
     
    -mo_subspecies(x, language = get_locale(), ...)
    +mo_subspecies(x, language = get_locale(), ...)
     
    -mo_species(x, language = get_locale(), ...)
    +mo_species(x, language = get_locale(), ...)
     
    -mo_genus(x, language = get_locale(), ...)
    +mo_genus(x, language = get_locale(), ...)
     
     mo_family(x, ...)
     
    @@ -189,9 +189,9 @@
     
     mo_kingdom(x, ...)
     
    -mo_type(x, language = get_locale(), ...)
    +mo_type(x, language = get_locale(), ...)
     
    -mo_gramstain(x, language = get_locale(), ...)
    +mo_gramstain(x, language = get_locale(), ...)
     
     mo_TSN(x, ...)
     
    @@ -203,26 +203,26 @@
     
     mo_taxonomy(x, ...)
     
    -mo_property(x, property = "fullname", language = get_locale(), ...)
    +mo_property(x, property = "fullname", language = get_locale(), ...)

    Arguments

     x
    -any (vector of) text that can be coerced to a valid microorganism code with as.mo
    +any (vector of) text that can be coerced to a valid microorganism code with as.mo
     language
    -language of the returned text, defaults to system language (see get_locale) and can also be set with getOption("AMR_locale"). Use language = NULL or language = "" to prevent translation.
    +language of the returned text, defaults to system language (see get_locale) and can also be set with getOption("AMR_locale"). Use language = NULL or language = "" to prevent translation.
     ...
    -other parameters passed on to as.mo
    +other parameters passed on to as.mo
     property
    -one of the column names of the microorganisms data set, or "shortname"
    +one of the column names of the microorganisms data set, or "shortname"

    @@ -266,7 +266,7 @@ This package contains the complete microbial taxonomic data (wi

    See also

    -

    microorganisms

    +

    Examples

    diff --git a/docs/reference/mo_renamed.html b/docs/reference/mo_renamed.html
    index 6eb3562f..1c2d9758 100644
    --- a/docs/reference/mo_renamed.html
    +++ b/docs/reference/mo_renamed.html
    @@ -163,7 +163,7 @@
    -

    Returns a vector of all renamed items of the last coercion to valid MO codes with as.mo.

    +

    Returns a vector of all renamed items of the last coercion to valid MO codes with as.mo.

    @@ -171,7 +171,7 @@

    See also

    -

    as.mo

    +
    diff --git a/docs/reference/portion.html b/docs/reference/portion.html
    index a48d0f8a..1e6fab8d 100644
    --- a/docs/reference/portion.html
    +++ b/docs/reference/portion.html
    @@ -192,7 +192,7 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port
     ...
    -

    one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with as.rsi if needed. Use multiple columns to calculate (the lack of) co-resistance: the probability that one of two drugs has a resistant or susceptible result. See Examples.

    +

    one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with as.rsi if needed. Use multiple columns to calculate (the lack of) co-resistance: the probability that one of two drugs has a resistant or susceptible result. See Examples.

     minimum
    @@ -208,11 +208,11 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port
     data
    -

    a data.frame containing columns with class rsi (see as.rsi)

    +

    a data.frame containing columns with class rsi (see as.rsi)

     translate_ab
    -

    a column name of the antibiotics data set to translate the antibiotic abbreviations to, using abname. This can be set with getOption("get_antibiotic_names").

    +

    a column name of the antibiotics data set to translate the antibiotic abbreviations to, using abname. This can be set with getOption("get_antibiotic_names").

     combine_IR
    @@ -231,10 +231,10 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port

    Details

    -

    Remember that you should filter your table to let it contain only first isolates! Use first_isolate to determine them in your data set.

    -

    These functions are not meant to count isolates, but to calculate the portion of resistance/susceptibility. Use the count functions to count isolates. Low counts can influence the outcome - these portion functions may camouflage this, since they only return the portion albeit being dependent on the minimum parameter.

    -

    portion_df takes any variable from data that has an "rsi" class (created with as.rsi) and calculates the portions R, I and S. The resulting tidy data (see Source) data.frame will have three rows (S/I/R) and a column for each variable with class "rsi".

    -

    The old rsi function is still available for backwards compatibility but is deprecated.

    +

    Remember that you should filter your table to let it contain only first isolates! Use first_isolate to determine them in your data set.

    +

    These functions are not meant to count isolates, but to calculate the portion of resistance/susceptibility. Use the count functions to count isolates. Low counts can influence the outcome - these portion functions may camouflage this, since they only return the portion albeit being dependent on the minimum parameter.

    +

    portion_df takes any variable from data that has an "rsi" class (created with as.rsi) and calculates the portions R, I and S. The resulting tidy data (see Source) data.frame will have three rows (S/I/R) and a column for each variable with class "rsi".

    +

    The old rsi function is still available for backwards compatibility but is deprecated.

    To calculate the probability (p) of susceptibility of one antibiotic, we use this formula:

    @@ -250,7 +250,7 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port

    See also

    -

    count_* to count resistant and susceptible isolates.

    +

    count_* to count resistant and susceptible isolates.

    Examples

    @@ -274,58 +274,58 @@ portion_R and portion_IR can be used to calculate resistance, portion_S and port
     septic_patients %>% portion_SI(amox)
     
     septic_patients %>%
    -  group_by(hospital_id) %>%
    -  summarise(p = portion_S(cipr),
    -            n = n_rsi(cipr)) # n_rsi works like n_distinct in dplyr
    +  group_by(hospital_id) %>%
    +  summarise(p = portion_S(cipr),
    +            n = n_rsi(cipr)) # n_rsi works like n_distinct in dplyr
     
     septic_patients %>%
    -  group_by(hospital_id) %>%
    -  summarise(R = portion_R(cipr, as_percent = TRUE),
    +  group_by(hospital_id) %>%
    +  summarise(R = portion_R(cipr, as_percent = TRUE),
                 I = portion_I(cipr, as_percent = TRUE),
                 S = portion_S(cipr, as_percent = TRUE),
    -            n = n_rsi(cipr), # works like n_distinct in dplyr
    -            total = n())     # NOT the amount of tested isolates!
    +            n = n_rsi(cipr), # works like n_distinct in dplyr
    +            total = n())     # NOT the amount of tested isolates!
     
     # Calculate co-resistance between amoxicillin/clav acid and gentamicin,
     # so we can see that combination therapy does a lot more than mono therapy:
     septic_patients %>% portion_S(amcl)       # S = 67.1%
    -septic_patients %>% count_all(amcl)       # n = 1576
    +septic_patients %>% count_all(amcl)       # n = 1576
     
     septic_patients %>% portion_S(gent)       # S = 74.0%
    -septic_patients %>% count_all(gent)       # n = 1855
    +septic_patients %>% count_all(gent)       # n = 1855
     
     septic_patients %>% portion_S(amcl, gent) # S = 92.0%
    -septic_patients %>% count_all(amcl, gent) # n = 1517
    +septic_patients %>% count_all(amcl, gent) # n = 1517
     
     septic_patients %>%
    -  group_by(hospital_id) %>%
    -  summarise(cipro_p = portion_S(cipr, as_percent = TRUE),
    -            cipro_n = count_all(cipr),
    +  group_by(hospital_id) %>%
    +  summarise(cipro_p = portion_S(cipr, as_percent = TRUE),
    +            cipro_n = count_all(cipr),
                 genta_p = portion_S(gent, as_percent = TRUE),
    -            genta_n = count_all(gent),
    +            genta_n = count_all(gent),
                 combination_p = portion_S(cipr, gent, as_percent = TRUE),
    -            combination_n = count_all(cipr, gent))
    +            combination_n = count_all(cipr, gent))
     
     # Get portions S/I/R immediately of all rsi columns
     septic_patients %>%
    -  select(amox, cipr) %>%
    +  select(amox, cipr) %>%
       portion_df(translate = FALSE)
     
     # It also supports grouping variables
     septic_patients %>%
    -  select(hospital_id, amox, cipr) %>%
    -  group_by(hospital_id) %>%
    +  select(hospital_id, amox, cipr) %>%
    +  group_by(hospital_id) %>%
       portion_df(translate = FALSE)
     
     # }# NOT RUN {
     
     # calculate current empiric combination therapy of Helicobacter gastritis:
     my_table %>%
    -  filter(first_isolate == TRUE,
    +  filter(first_isolate == TRUE,
              genus == "Helicobacter") %>%
    -  summarise(p = portion_S(amox, metr), # amoxicillin with metronidazole
    -            n = count_all(amox, metr))
    +  summarise(p = portion_S(amox, metr), # amoxicillin with metronidazole
    +            n = count_all(amox, metr))
     # }
    diff --git a/docs/reference/supplementary_data.html b/docs/reference/supplementary_data.html
    index ad8bf49e..dd10ec46 100644
    --- a/docs/reference/supplementary_data.html
    +++ b/docs/reference/supplementary_data.html
    @@ -163,7 +163,7 @@
    -

    These data.tables are transformed from the microorganisms and microorganisms data sets to improve speed of as.mo. They are meant for internal use only, and are only mentioned here for reference.

    +

    These data.tables are transformed from the microorganisms and microorganisms data sets to improve speed of as.mo. They are meant for internal use only, and are only mentioned here for reference.

    diff --git a/index.md b/index.md index d6dc1674..5a74ccf3 100644 --- a/index.md +++ b/index.md @@ -56,6 +56,47 @@ It will be downloaded and installed automatically. To find out how to conduct AMR analysis, please [continue reading here to get started](./articles/AMR.html) or click the button 'Get Started' in the top menu. +### Short introduction + + + +This package contains the **complete microbial taxonomic data** (with all nine taxonomic ranks - from kingdom to subspecies) from the publicly available Integrated Taxonomic Information System (ITIS, https://www.itis.gov). + +All (sub)species from **the taxonomic kingdoms Bacteria, Fungi and Protozoa are included in this package**, as well as all previously accepted names known to ITIS. Furthermore, the responsible authors and year of publication are available. This allows users to use authoritative taxonomic information for their data analysis on any microorganism, not only human pathogens. It also helps to quickly determine the Gram stain of bacteria, since all bacteria are classified into subkingdom Negibacteria or Posibacteria. ITIS is a partnership of U.S., Canadian, and Mexican agencies and taxonomic specialists. + +The `AMR` package basically does four important things: + +1. It **cleanses existing data**, by transforming it to reproducible and profound *classes*, making the most efficient use of R. These functions all use artificial intelligence to guess results that you would expect: + + * Use `as.mo()` to get an ID of a microorganism. The IDs are human readable for the trained eye - the ID of *Klebsiella pneumoniae* is "B_KLBSL_PNE" (B stands for Bacteria) and the ID of *S. aureus* is "B_STPHY_AUR". The function takes almost any text as input that looks like the name or code of a microorganism like "E. coli", "esco" and "esccol". Even `as.mo("MRSA")` will return the ID of *S. aureus*. 
Moreover, it can group all coagulase negative and positive *Staphylococci*, and can transform *Streptococci* into Lancefield groups. To find bacteria based on your input, it uses Artificial Intelligence to look up values in the included ITIS data, consisting of more than 18,000 microorganisms. + * Use `as.rsi()` to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like "<=0.002; S" (combined MIC/RSI) will result in "S". + * Use `as.mic()` to cleanse your MIC values. It produces a so-called factor (called *ordinal* in SPSS) with valid MIC values as levels. A value like "<=0.002; S" (combined MIC/RSI) will result in "<=0.002". + * Use `as.atc()` to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values "Furabid", "Furadantin" and "nitro" all return the ATC code of nitrofurantoin. + +2. It **enhances existing data** and **adds new data** from data sets included in this package. + + * Use `eucast_rules()` to apply [EUCAST expert rules to isolates](http://www.eucast.org/expert_rules_and_intrinsic_resistance/). + * Use `first_isolate()` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute). + * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them. + * Use `mdro()` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
+ * The data set `microorganisms` contains the complete taxonomic tree of more than 18,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus()`, `mo_family()`, `mo_gramstain()` or even `mo_phylum()`. As they use `as.mo()` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data. + * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_name()` and `ab_tradenames()` to look up values. The `ab_*` functions use `as.atc()` internally so they support AI to guess your expected result. For example, `ab_name("Fluclox")`, `ab_name("Floxapen")` and `ab_name("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data. + +3. It **analyses the data** with convenient functions that use well-known methods. + + * Calculate the resistance (and even co-resistance) of microbial isolates with the `portion_R()`, `portion_IR()`, `portion_I()`, `portion_SI()` and `portion_S()` functions. Similarly, the *number* of isolates can be determined with the `count_R()`, `count_IR()`, `count_I()`, `count_SI()` and `count_S()` functions. All these functions can be used [with the `dplyr` package](https://dplyr.tidyverse.org/#usage) (e.g. 
in conjunction with [`summarise`](https://dplyr.tidyverse.org/reference/summarise.html)) + * Plot AMR results with `geom_rsi()`, a function made for the `ggplot2` package + * Predict antimicrobial resistance for the coming years using logistic regression models with the `resistance_predict()` function + * Conduct descriptive statistics to enhance base R: calculate `kurtosis()`, `skewness()` and create frequency tables with `freq()` + +4. It **teaches the user** how to use all the above actions. + + * The package contains extensive help pages with many examples. + * It also contains an example data set called `septic_patients`. This data set contains: + * 2,000 blood culture isolates from anonymised septic patients between 2001 and 2017 in the Northern Netherlands + * Results of 40 antibiotics (each antibiotic in its own column) with a total of 38,414 antimicrobial results + * Real and genuine data + ---- diff --git a/pkgdown/extra.css b/pkgdown/extra.css index 5c0cf18c..b0969727 100644 --- a/pkgdown/extra.css +++ b/pkgdown/extra.css @@ -28,10 +28,24 @@ pre, code { font-weight: bold; background-color: transparent; } +pre { + font-size: 90% !important; +} li, p { line-height: 1.5; } +/* slightly smaller blockquote */ +blockquote { + font-size: 98%; +} + +/* 2nd list in navigation should be smaller */ +#tocnav li li { + font-size: 90%; + margin-left: 5px; +} + /* new element with dotted underline */ help { border-bottom: 1px dotted; @@ -60,3 +74,18 @@ help { #navbar .fas { margin-right: 5px; } + +/* tables */ +.table { + font-size: 90%; +} +.table td { + padding: 4px !important; +} +thead { + border-top: 2px solid black; + border-bottom: 2px solid black; +} +tbody { + border-bottom: 2px solid black; +} diff --git a/tests/testthat/test-first_isolate.R b/tests/testthat/test-first_isolate.R index 80196198..71ee0d60 100755 --- a/tests/testthat/test-first_isolate.R +++ b/tests/testthat/test-first_isolate.R @@ -19,7 +19,7 @@ context("first_isolate.R") test_that("first
isolates work", { - # septic_patients contains 1315 out of 2000 first isolates + # septic_patients contains 1317 out of 2000 first isolates expect_equal( sum( first_isolate(tbl = septic_patients, @@ -28,9 +28,9 @@ test_that("first isolates work", { col_mo = "mo", info = TRUE), na.rm = TRUE), - 1315) + 1317) - # septic_patients contains 1411 out of 2000 first *weighted* isolates + # septic_patients contains 1413 out of 2000 first *weighted* isolates expect_equal( suppressWarnings( sum( @@ -43,7 +43,7 @@ test_that("first isolates work", { type = "keyantibiotics", info = TRUE), na.rm = TRUE)), - 1411) + 1413) # should be same for tibbles expect_equal( suppressWarnings( @@ -57,8 +57,8 @@ test_that("first isolates work", { type = "keyantibiotics", info = TRUE), na.rm = TRUE)), - 1411) - # and 1435 when not ignoring I + 1413) + # and 1436 when not ignoring I expect_equal( suppressWarnings( sum( @@ -71,8 +71,8 @@ test_that("first isolates work", { type = "keyantibiotics", info = TRUE), na.rm = TRUE)), - 1435) - # and 1416 when using points + 1436) + # and 1417 when using points expect_equal( suppressWarnings( sum( @@ -84,9 +84,9 @@ test_that("first isolates work", { type = "points", info = TRUE), na.rm = TRUE)), - 1416) + 1417) - # septic_patients contains 1161 out of 2000 first non-ICU isolates + # septic_patients contains 1163 out of 2000 first non-ICU isolates expect_equal( sum( first_isolate(septic_patients, @@ -97,7 +97,7 @@ test_that("first isolates work", { info = TRUE, icu_exclude = TRUE), na.rm = TRUE), - 1161) + 1163) # set 1500 random observations to be of specimen type 'Urine' random_rows <- sample(x = 1:2000, size = 1500, replace = FALSE) diff --git a/vignettes/AMR.Rmd b/vignettes/AMR.Rmd index 171932d5..60d71977 100755 --- a/vignettes/AMR.Rmd +++ b/vignettes/AMR.Rmd @@ -6,8 +6,10 @@ output: toc: true vignette: > %\VignetteIndexEntry{The AMR package - How to conduct AMR analysis} - %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} + 
%\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console --- ```{r setup, include = FALSE, results = 'markup'} @@ -15,7 +17,253 @@ knitr::opts_chunk$set( collapse = TRUE, comment = "#" ) +# set to original language (English) +Sys.setlocale(locale = "C") ``` +**Note:** values on this page will be regenerated with every website update since it is written in [RMarkdown](https://rmarkdown.rstudio.com/), so actual results will change over time. However, the methodology remains unchanged. This page was generated on `r format(Sys.Date(), "%d %B %Y")`. -This page will soon be updated. +# Introduction +(work in progress) + +# Tutorial +For this tutorial, we will create fake demonstration data to work with. + +You can skip to [Cleaning the data](#cleaning-the-data) if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this: + +```{r example table, echo = FALSE, results = 'asis'} +knitr::kable(dplyr::tibble(date = Sys.Date(), + patient_id = c("abcd", "abcd", "efgh"), + mo = "Escherichia coli", + amox = c("S", "S", "R"), + cipr = c("S", "R", "S")), + align = "c") +``` + +## Needed R packages +As with many uses in R, we need some additional packages for AMR analysis. The most important one is [`dplyr`](https://dplyr.tidyverse.org/), which tremendously improves the way we work with data - it allows for a very natural way of writing syntaxes in R. Another important dependency is [`ggplot2`](https://ggplot2.tidyverse.org/). This package can be used to create beautiful plots in R. + +Our `AMR` package depends on these packages and even extends their use and functions. + +```{r lib packages, message = FALSE} +library(dplyr) # the data science package +library(AMR) # this package, to simplify and automate AMR analysis +library(ggplot2) # for appealing plots +``` + +## Creation of data +We will create some fake example data to use for analysis. 
For antimicrobial resistance analysis, we need at least: a patient ID, the name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine) or the ward type (e.g. to filter on ICUs). + +With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too. + +#### Patients +To start with patients, we need a unique list of patients. + +```{r create patients} +patients <- unlist(lapply(LETTERS, paste0, 1:10)) +``` + +The `LETTERS` object is available in R - it's a vector with 26 characters: `A` to `Z`. The `patients` object we just created is now a vector of length `r length(patients)`, with values (patient IDs) varying from ``r patients[1]`` to ``r patients[length(patients)]``. + +#### Dates +Let's pretend that our data consists of blood culture isolates from 1 January 2010 until 1 January 2018. + +```{r create dates} +dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day") +``` + +This `dates` object now contains all days in our date range. + +#### Microorganisms +For this tutorial, we will use four different microorganisms: *Escherichia coli*, *Staphylococcus aureus*, *Streptococcus pneumoniae*, and *Klebsiella pneumoniae*: + +```{r mo} +bacteria <- c("Escherichia coli", "Staphylococcus aureus", + "Streptococcus pneumoniae", "Klebsiella pneumoniae") +``` + +#### Other variables +For completeness, we can also add the patient's gender, the hospital where the patient was admitted and all valid antimicrobial results: + +```{r create other} +genders <- c("M", "F") +hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D") +ab_interpretations <- c("S", "I", "R") +``` + +#### Put everything together + +Using the `sample()` function, we can randomly select items from all objects we defined earlier.
To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results with the `prob` parameter. + +```{r merge data} +data <- data.frame(date = sample(dates, 5000, replace = TRUE), + patient_id = sample(patients, 5000, replace = TRUE), + # gender - add slightly more men: + gender = sample(genders, 5000, replace = TRUE, prob = c(0.55, 0.45)), + hospital = sample(hospitals, 5000, replace = TRUE), + bacteria = sample(bacteria, 5000, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10)), + amox = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.6, 0.05, 0.35)), + amcl = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.75, 0.1, 0.15)), + cipr = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.8, 0, 0.2)), + gent = sample(ab_interpretations, 5000, replace = TRUE, prob = c(0.92, 0, 0.07)) + ) +``` + +The resulting data set contains 5,000 blood culture isolates. With the `head()` function we can preview the first 6 rows of this data set: +```{r preview data set 1, echo = TRUE, results = 'hide'} +head(data) +``` + +```{r preview data set 2, echo = FALSE, results = 'asis'} +knitr::kable(head(data), align = "c") +``` + +Now, let's start the cleaning and the analysis! + +## Cleaning the data +Use the frequency table function `freq()` to look specifically for unique values in any variable. For example, for the `gender` variable: + +```{r freq gender 1, echo = TRUE, results = 'hide'} +data %>% freq(gender) # this would be the same: freq(data$gender) +``` + +```{r freq gender 2, echo = FALSE, results = 'markup'} +data %>% freq(gender, markdown = FALSE, header = TRUE) +``` + +So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: only values `M` and `F`. From a researcher's perspective: there are slightly more men. Nothing we didn't already know. + +The data is already quite clean, but we still need to transform some variables.
The `bacteria` column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The `mutate()` function of the `dplyr` package makes this really easy: +```{r transform mo 1} +data <- data %>% + mutate(bacteria = as.mo(bacteria)) +``` + +We also want to transform the antibiotics, because in real-life data we don't know if they are really clean. The `as.rsi()` function ensures reliability and reproducibility in these kinds of variables. The `mutate_at()` function will run the `as.rsi()` function on defined variables: + +```{r transform abx} +data <- data %>% + mutate_at(vars(amox:gent), as.rsi) +``` + +Finally, we will apply [EUCAST rules](http://www.eucast.org/expert_rules_and_intrinsic_resistance/) on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the `eucast_rules()` function can also apply additional rules, like forcing ampicillin = R when amoxicillin/clavulanic acid = R. + +Because the amoxicillin (column `amox`) and amoxicillin/clavulanic acid (column `amcl`) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The `eucast_rules()` function fixes this: + +```{r eucast, warning = FALSE, message = FALSE} +data <- eucast_rules(data, col_mo = "bacteria") +``` + +## Adding new variables +Now that we have the microbial ID, we can add some taxonomic properties: + +```{r new taxo} +data <- data %>% + mutate(gramstain = mo_gramstain(bacteria), + family = mo_family(bacteria)) +``` + +### First isolates +We also need to know which isolates we can *actually* use for analysis. + +To conduct an analysis of antimicrobial resistance, you [must only include the first isolate of every patient per episode](https://www.ncbi.nlm.nih.gov/pubmed/17304462).
If you did not do this, you could easily get an overestimate or underestimate of the resistance of an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all *S. aureus* isolates would be overestimated, because you included this MRSA more than once. It would clearly be [selection bias](https://en.wikipedia.org/wiki/Selection_bias). + +The Clinical and Laboratory Standards Institute (CLSI) defines this as follows: + +> *(...) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, **only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype)**. The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.* +Chapter 6.4, M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. https://clsi.org/standards/products/microbiology/documents/m39/ + +This `AMR` package includes this methodology with the `first_isolate()` function. It adopts an episode length of one year (this can be changed by the user) and it starts counting days after every selected isolate. This new variable can easily be added to our data: +```{r 1st isolate} +data <- data %>% + mutate(first = first_isolate(.)) +``` + +So only `r AMR:::percent(sum(data$first) / nrow(data))` is suitable for resistance analysis!
We can now filter on it with the `filter()` function, also from the `dplyr` package: + +```{r 1st isolate filter} +data_1st <- data %>% + filter(first == TRUE) +``` + +For future use, the above two steps can be shortened with the `filter_first_isolate()` function: +```{r 1st isolate filter 2, results = 'hide', message = FALSE} +data_1st <- data %>% + filter_first_isolate() +``` + +### First *weighted* isolates +We made a slight twist to the CLSI algorithm, to take into account antimicrobial results. Imagine this data, sorted on date: + +```{r, echo = FALSE, message = FALSE, warning = FALSE, results = 'asis'} +weighted_df <- data %>% + filter(bacteria == as.mo("E. coli")) %>% + # only most prevalent patient + filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>% + arrange(date) %>% + select(date, patient_id, bacteria, amox:gent, first) %>% + # maximum of 10 rows + .[1:min(10, nrow(.)),] %>% + mutate(isolate = row_number()) %>% + select(isolate, everything()) + +weighted_df %>% + knitr::kable(align = "c") +``` + +Only `r sum(weighted_df$first)` isolates are marked as 'first' according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are clearly different strains and should be included too. This is why we weigh isolates, based on their antibiogram. The `key_antibiotics()` function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram negatives and 6 narrow-spectrum ones for Gram positives. These can be defined by the user. + +If a column exists with a name like 'key(...)ab' the `first_isolate()` function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below: + +```{r 1st weighted} +data <- data %>% + mutate(keyab = key_antibiotics(.)) %>% + mutate(first_weighted = first_isolate(.)) +``` + +```{r, echo = FALSE, message = FALSE, warning = FALSE, results = 'asis'} +weighted_df2 <- data %>% + filter(bacteria == as.mo("E.
coli")) %>% + # only most prevalent patient + filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>% + arrange(date) %>% + select(date, patient_id, bacteria, amox:gent, first, first_weighted) %>% + # maximum of 10 rows + .[1:min(10, nrow(.)),] %>% + mutate(isolate = row_number()) %>% + select(isolate, everything()) + +weighted_df2 %>% + knitr::kable(align = "c") +``` + +Instead of `r sum(weighted_df$first)`, now `r sum(weighted_df2$first_weighted)` isolates are flagged. In total, `r AMR:::percent(sum(data$first_weighted) / nrow(data))` of all isolates are marked 'first weighted' - `r AMR:::percent((sum(data$first_weighted) / nrow(data)) - (sum(data$first) / nrow(data)))` more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline. + +As with `filter_first_isolate()`, there's a shortcut for this new algorithm too: +```{r 1st isolate filter 3, results = 'hide', message = FALSE, warning = FALSE} +data_1st <- data %>% + filter_first_weighted_isolate() +``` + +So we end up with `r format(nrow(data_1st), big.mark = ",")` isolates for analysis. + +We can remove unneeded columns: +```{r} +data_1st <- data_1st %>% + select(-first, -keyab) +``` + +Now our data looks like: + +```{r preview data set 3, echo = TRUE, results = 'hide'} +head(data_1st) +``` + +```{r preview data set 4, echo = FALSE, results = 'asis'} +knitr::kable(head(data_1st), align = "c") +``` + +Time for the analysis! + +## Analysing the data +(work in progress)