# ==================================================================== #
# TITLE                                                                #
# Antimicrobial Resistance (AMR) Analysis                              #
#                                                                      #
# SOURCE                                                               #
# https://gitlab.com/msberends/AMR                                     #
#                                                                      #
# LICENCE                                                              #
# (c) 2019 Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl)  #
#                                                                      #
# This R package is free software; you can freely use and distribute   #
# it for both personal and commercial purposes under the terms of the  #
# GNU General Public License version 2.0 (GNU GPL-2), as published by  #
# the Free Software Foundation.                                        #
#                                                                      #
# This R package was created for academic research and was publicly    #
# released in the hope that it will be useful, but it comes WITHOUT    #
# ANY WARRANTY OR LIABILITY.                                           #
# Visit our website for more info: https://msberends.gitlab.io/AMR.    #
# ==================================================================== #
#' Calculate resistance of isolates
#'
#' @description These functions can be used to calculate the (co-)resistance of microbial isolates (i.e. percentage of S, SI, I, IR or R). All functions support quasiquotation with pipes, can be used in \code{dplyr}'s \code{\link[dplyr]{summarise}} and support grouped variables, see \emph{Examples}.
#'
#' \code{portion_R} and \code{portion_IR} can be used to calculate resistance, \code{portion_S} and \code{portion_SI} can be used to calculate susceptibility.\cr
#' @param ... one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with \code{\link{as.rsi}} if needed. Use multiple columns to calculate (the lack of) co-resistance: the probability that one of two drugs has a resistant or susceptible result. See \emph{Examples}.
#' @param minimum the minimum allowed number of available (tested) isolates. Any isolate count lower than \code{minimum} will return \code{NA} with a warning. The default number of \code{30} isolates is advised by the Clinical and Laboratory Standards Institute (CLSI) as best practice, see Source.
#' @param as_percent a logical to indicate whether the output must be returned as a percentage with a \% sign (a character). A value of \code{0.123456} will then be returned as \code{"12.3\%"}.
#' @param also_single_tested a logical to indicate whether, for combination therapies, observations should also be included where not all antibiotics were tested, but at least one of the tested antibiotics contains a target interpretation (e.g. S in case of \code{portion_S} and R in case of \code{portion_R}). \strong{This could lead to selection bias.}
#' @param data a \code{data.frame} containing columns with class \code{rsi} (see \code{\link{as.rsi}})
#' @param translate_ab a column name of the \code{\link{antibiotics}} data set to translate the antibiotic abbreviations to, using \code{\link{ab_property}}
#' @inheritParams ab_property
#' @param combine_SI a logical to indicate whether all values of S and I must be merged into one, so the output only consists of S+I vs. R (susceptible vs. resistant). This used to be the parameter \code{combine_IR}, but this now follows the redefinition by EUCAST of the interpretation of I (increased exposure) in 2019, see section 'Interpretation of S, I and R' below. Default is \code{TRUE}.
#' @param combine_IR a logical to indicate whether all values of I and R must be merged into one, so the output only consists of S vs. I+R (susceptible vs. non-susceptible). This is outdated, see parameter \code{combine_SI}.
#' @inheritSection as.rsi Interpretation of S, I and R
#' @details \strong{Remember that you should filter your table to let it contain only first isolates!} Use \code{\link{first_isolate}} to determine them in your data set.
#'
#' These functions are not meant to count isolates, but to calculate the portion of resistance/susceptibility. Use the \code{\link[AMR]{count}} functions to count isolates. \emph{Low counts can influence the outcome - these \code{portion} functions may camouflage this, since they only return the portion, albeit dependent on the \code{minimum} parameter.}
#'
#' The function \code{portion_df} takes any variable from \code{data} that has an \code{"rsi"} class (created with \code{\link{as.rsi}}) and calculates the portions R, I and S. The resulting \emph{tidy data} (see Source) \code{data.frame} will have three rows (S/I/R) and a column for each group and each variable with class \code{"rsi"}.
#'
#' The function \code{rsi_df} works exactly like \code{portion_df}, but adds the number of isolates.
#' \if{html}{
#   (created with https://www.latex4technics.com/)
#'   \cr\cr
#'   To calculate the probability (\emph{p}) of susceptibility of one antibiotic, we use this formula:
#'   \out{<div style="text-align: center;">}\figure{combi_therapy_2.png}\out{</div>}
#'   To calculate the probability (\emph{p}) of susceptibility of multiple antibiotics (i.e. combination therapy), we need to check whether at least one of them has a susceptible result (as numerator) and count all cases where all antibiotics were tested (as denominator). \cr
#'   \cr
#'   For two antibiotics:
#'   \out{<div style="text-align: center;">}\figure{combi_therapy_2.png}\out{</div>}
#'   \cr
#'   For three antibiotics:
#'   \out{<div style="text-align: center;">}\figure{combi_therapy_2.png}\out{</div>}
#'   \cr
#'   And so on.
#' }
#'
#' @source \strong{M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition}, 2014, \emph{Clinical and Laboratory Standards Institute (CLSI)}. \url{https://clsi.org/standards/products/microbiology/documents/m39/}.
#'
#' Wickham H. \strong{Tidy Data.} The Journal of Statistical Software, vol. 59, 2014. \url{http://vita.had.co.nz/papers/tidy-data.html}
#' @seealso \code{\link[AMR]{count}_*} to count resistant and susceptible isolates.
#' @keywords resistance susceptibility rsi_df rsi antibiotics isolate isolates
#' @return Double or, when \code{as_percent = TRUE}, a character.
#' @rdname portion
#' @name portion
#' @export
#' @inheritSection AMR Read more on our website!
#' @examples
#' # septic_patients is a data set available in the AMR package. It is true, genuine data.
#' ?septic_patients
#'
#' # Calculate resistance
#' portion_R(septic_patients$AMX)
#' portion_IR(septic_patients$AMX)
#'
#' # Or susceptibility
#' portion_S(septic_patients$AMX)
#' portion_SI(septic_patients$AMX)
#'
#' # Do the above with pipes:
#' library(dplyr)
#' septic_patients %>% portion_R(AMX)
#' septic_patients %>% portion_IR(AMX)
#' septic_patients %>% portion_S(AMX)
#' septic_patients %>% portion_SI(AMX)
#'
#' septic_patients %>%
#'   group_by(hospital_id) %>%
#'   summarise(p = portion_S(CIP),
#'             n = n_rsi(CIP)) # n_rsi works like n_distinct in dplyr
#'
#' septic_patients %>%
#'   group_by(hospital_id) %>%
#'   summarise(R = portion_R(CIP, as_percent = TRUE),
#'             I = portion_I(CIP, as_percent = TRUE),
#'             S = portion_S(CIP, as_percent = TRUE),
#'             n1 = count_all(CIP), # the actual total; sum of all three
#'             n2 = n_rsi(CIP),     # same - analogous to n_distinct
#'             total = n())         # NOT the number of tested isolates!
#'
#' # Calculate co-resistance between amoxicillin/clavulanic acid and gentamicin,
#' # so we can see that combination therapy does a lot more than mono therapy:
#' septic_patients %>% portion_S(AMC)  # S = 71.4%
#' septic_patients %>% count_all(AMC)  # n = 1879
#'
#' septic_patients %>% portion_S(GEN)  # S = 74.0%
#' septic_patients %>% count_all(GEN)  # n = 1855
#'
#' septic_patients %>% portion_S(AMC, GEN) # S = 92.3%
#' septic_patients %>% count_all(AMC, GEN) # n = 1798
#'
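#' # A minimal base R sketch of the combination formula described in Details,
#' # using two made-up vectors (hypothetical data, and not necessarily how the
#' # package computes this internally):
#' drug_a <- c("S", "R", "R", NA, "S", "R")
#' drug_b <- c("R", "S", "R", "S", NA, "R")
#' tested_both <- !is.na(drug_a) & !is.na(drug_b) # denominator: both drugs tested
#' any_S <- drug_a == "S" | drug_b == "S"         # at least one susceptible result
#' sum(any_S & tested_both, na.rm = TRUE) / sum(tested_both) # 2 / 4 = 0.5
#'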
#' # Using `also_single_tested` can be useful ...
#' septic_patients %>%
#'   portion_S(AMC, GEN,
#'             also_single_tested = TRUE) # S = 92.6%
#' # ... but can also lead to selection bias - the data only has 2,000 rows:
#' septic_patients %>%
#'   count_all(AMC, GEN,
#'             also_single_tested = TRUE) # n = 2555
#'
#' septic_patients %>%
#'   group_by(hospital_id) %>%
#'   summarise(cipro_p = portion_S(CIP, as_percent = TRUE),
#'             cipro_n = count_all(CIP),
#'             genta_p = portion_S(GEN, as_percent = TRUE),
#'             genta_n = count_all(GEN),
#'             combination_p = portion_S(CIP, GEN, as_percent = TRUE),
#'             combination_n = count_all(CIP, GEN))
#'
#' # Get the S/I/R portions of all rsi columns at once
#' septic_patients %>%
#'   select(AMX, CIP) %>%
#'   portion_df(translate_ab = FALSE)
#'
#' # It also supports grouping variables
#' septic_patients %>%
#'   select(hospital_id, AMX, CIP) %>%
#'   group_by(hospital_id) %>%
#'   portion_df(translate_ab = FALSE)
#'
#' \dontrun{
#'
#' # calculate current empiric combination therapy of Helicobacter gastritis:
#' my_table %>%
#'   filter(first_isolate == TRUE,
#'          genus == "Helicobacter") %>%
#'   summarise(p = portion_S(AMX, MTR),  # amoxicillin with metronidazole
#'             n = count_all(AMX, MTR))
#' }
portion_R <- function(...,
                      minimum = 30,
                      as_percent = FALSE,
                      also_single_tested = FALSE) {
  rsi_calc(...,
           type = "R",
           include_I = FALSE,
           minimum = minimum,
           as_percent = as_percent,
           also_single_tested = also_single_tested,
           only_count = FALSE)
}

#' @rdname portion
#' @export
portion_IR <- function(...,
                       minimum = 30,
                       as_percent = FALSE,
                       also_single_tested = FALSE) {
  rsi_calc(...,
           type = "R",
           include_I = TRUE,
           minimum = minimum,
           as_percent = as_percent,
           also_single_tested = also_single_tested,
           only_count = FALSE)
}

#' @rdname portion
#' @export
portion_I <- function(...,
                      minimum = 30,
                      as_percent = FALSE,
                      also_single_tested = FALSE) {
  rsi_calc(...,
           type = "I",
           include_I = FALSE,
           minimum = minimum,
           as_percent = as_percent,
           also_single_tested = also_single_tested,
           only_count = FALSE)
}

#' @rdname portion
#' @export
portion_SI <- function(...,
                       minimum = 30,
                       as_percent = FALSE,
                       also_single_tested = FALSE) {
  rsi_calc(...,
           type = "S",
           include_I = TRUE,
           minimum = minimum,
           as_percent = as_percent,
           also_single_tested = also_single_tested,
           only_count = FALSE)
}

#' @rdname portion
#' @export
portion_S <- function(...,
                      minimum = 30,
                      as_percent = FALSE,
                      also_single_tested = FALSE) {
  rsi_calc(...,
           type = "S",
           include_I = FALSE,
           minimum = minimum,
           as_percent = as_percent,
           also_single_tested = also_single_tested,
           only_count = FALSE)
}

#' @rdname portion
#' @importFrom dplyr %>% select_if bind_rows summarise_if mutate group_vars select everything
#' @export
portion_df <- function(data,
                       translate_ab = "name",
                       language = get_locale(),
                       minimum = 30,
                       as_percent = FALSE,
                       combine_SI = TRUE,
                       combine_IR = FALSE) {

  rsi_calc_df(type = "portion",
              data = data,
              translate_ab = translate_ab,
              language = language,
              minimum = minimum,
              as_percent = as_percent,
              combine_SI = combine_SI,
              combine_IR = combine_IR,
              combine_SI_missing = missing(combine_SI))
}