
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_episode.R
\name{get_episode}
\alias{get_episode}
\alias{is_new_episode}
\title{Determine (Clinical or Epidemic) Episodes}
\usage{
get_episode(x, episode_days = NULL, case_free_days = NULL, ...)

is_new_episode(x, episode_days = NULL, case_free_days = NULL, ...)
}
\arguments{
\item{x}{vector of dates (class \code{Date} or \code{POSIXt}), will be sorted internally to determine episodes}

\item{episode_days}{episode length in days, can also be less than a day or \code{Inf}, see \emph{Details}}

\item{case_free_days}{length in days of the case-free period after which a new episode will start, can also be less than a day or \code{Inf}, see \emph{Details}}

\item{...}{ignored, only in place to allow future extensions}
}
\value{
\itemize{
\item \code{\link[=get_episode]{get_episode()}}: an \link{integer} vector
\item \code{\link[=is_new_episode]{is_new_episode()}}: a \link{logical} vector
}
}
\description{
These functions determine which items in a vector can be considered (the start of) a new episode, based on the argument \code{episode_days} or \code{case_free_days}. This can be used to determine clinical episodes for any epidemiological analysis. The \code{\link[=get_episode]{get_episode()}} function returns the index number of the episode per group, while the \code{\link[=is_new_episode]{is_new_episode()}} function returns \code{TRUE} for the first occurrence of every \code{\link[=get_episode]{get_episode()}} index. Both absolute and relative episode determination are supported.
}
\details{
Episodes can be determined in two ways: absolute and relative.
\enumerate{
\item Absolute

This method uses \code{episode_days} to define an episode length in days, after which a new episode will start. A common use case in AMR data analysis is microbial epidemiology, for example episodes of \emph{S. aureus} bacteraemia in ICU patients. The episode length could then be 30 days, so that \emph{S. aureus} isolates found 30 or more days after the start of an ICU episode will be considered a different (or new) episode.

Thus, this method counts \strong{since the start of the previous episode}.
\item Relative

This method uses \code{case_free_days} to quantify the duration of (inter-epidemic) intervals, after which a new episode will start. A common use case is infectious disease epidemiology, for example norovirus outbreaks in a hospital. The case-free period could then be 14 days, so that new norovirus cases occurring 14 or more days after the last case will be considered a different (or new) episode.

Thus, this method counts \strong{since the last case in the previous episode}.
}

In a table:\tabular{ccc}{
   Date \tab Using \code{episode_days = 7} \tab Using \code{case_free_days = 7} \cr
   2023-01-01 \tab 1 \tab 1 \cr
   2023-01-02 \tab 1 \tab 1 \cr
   2023-01-05 \tab 1 \tab 1 \cr
   2023-01-08 \tab 2\code{*} \tab 1 \cr
   2023-02-21 \tab 3 \tab 2\code{**} \cr
   2023-02-22 \tab 3 \tab 2 \cr
   2023-02-23 \tab 3 \tab 2 \cr
   2023-02-24 \tab 3 \tab 2 \cr
   2023-03-01 \tab 4 \tab 2 \cr
}


\code{*} This marks the start of a new episode, because 8 January 2023 is 7 days after the start of the previous episode (1 January 2023), which reaches the \code{episode_days = 7} threshold. \cr
\code{**} This marks the start of a new episode, because 21 February 2023 is 44 days after the last case in the previous episode (8 January 2023), which exceeds the \code{case_free_days = 7} threshold.
\subsection{Difference between \code{get_episode()} and \code{is_new_episode()}}{

The \code{\link[=get_episode]{get_episode()}} function returns the index number of the episode, so all cases/patients/isolates in the first episode will have the number 1, all cases/patients/isolates in the second episode will have the number 2, etc.

The \code{\link[=is_new_episode]{is_new_episode()}} function returns \code{TRUE} for every new \code{\link[=get_episode]{get_episode()}} index, and is thus equal to \code{!duplicated(get_episode(...))}.

For example, when setting \code{episode_days = 365} (the absolute method explained above) and determining episodes per patient, this is how the two functions differ:\tabular{cccc}{
   patient \tab date \tab \code{get_episode()} \tab \code{is_new_episode()} \cr
   A \tab 2019-01-01 \tab 1 \tab TRUE \cr
   A \tab 2019-03-01 \tab 1 \tab FALSE \cr
   A \tab 2021-01-01 \tab 2 \tab TRUE \cr
   B \tab 2008-01-01 \tab 1 \tab TRUE \cr
   B \tab 2008-01-01 \tab 1 \tab FALSE \cr
   C \tab 2020-01-01 \tab 1 \tab TRUE \cr
}

}

\subsection{Other}{

The \code{\link[=first_isolate]{first_isolate()}} function is a wrapper around the \code{\link[=is_new_episode]{is_new_episode()}} function, but is more efficient for data sets containing microorganism codes or names and allows for different isolate selection methods.

The \code{dplyr} package is not required for these functions to work, but these episode functions do support \link[dplyr:group_by]{variable grouping} and work conveniently inside \code{dplyr} verbs such as \code{\link[dplyr:filter]{filter()}}, \code{\link[dplyr:mutate]{mutate()}} and \code{\link[dplyr:summarise]{summarise()}}.
}
}
\examples{
# difference between absolute and relative determination of episodes:
x <- data.frame(dates = as.Date(c(
  "2021-01-01",
  "2021-01-02",
  "2021-01-05",
  "2021-01-08",
  "2021-02-21",
  "2021-02-22",
  "2021-02-23",
  "2021-02-24",
  "2021-03-01",
  "2021-03-01"
)))
x$absolute <- get_episode(x$dates, episode_days = 7)
x$relative <- get_episode(x$dates, case_free_days = 7)
x
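
# A sketch contrasting the two methods on a hypothetical chain of cases
# (`cases` is illustration data), each 5 days after the previous one.
# With episode_days = 7 an episode is capped at 7 days from its start, so
# here a new episode should start every 10 days (1 Jan, 11 Jan, 21 Jan);
# with case_free_days = 7 no gap reaches 7 days, so all cases should stay
# in a single episode:
cases <- as.Date(c(
  "2023-01-01", "2023-01-06", "2023-01-11",
  "2023-01-16", "2023-01-21", "2023-01-26"
))
get_episode(cases, episode_days = 7) # counts since the start of the episode
get_episode(cases, case_free_days = 7) # counts since the last case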


# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates
df <- example_isolates[sample(seq_len(2000), size = 100), ]

get_episode(df$date, episode_days = 60) # indices
is_new_episode(df$date, episode_days = 60) # TRUE/FALSE
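
# a quick check of the equivalence described in Details: for the same
# arguments, is_new_episode() should equal !duplicated(get_episode())
identical(
  is_new_episode(example_isolates$date, episode_days = 60),
  !duplicated(get_episode(example_isolates$date, episode_days = 60))
)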

# filter on results from the third 60-day episode only, using base R
df[which(get_episode(df$date, 60) == 3), ]
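
# dplyr is not required for grouped episodes; a base R sketch using
# split() and unsplit() on the hypothetical patients from the table in
# Details (`patients` and the helper `per_patient()` are illustrations only)
patients <- data.frame(
  patient = c("A", "A", "A", "B", "B", "C"),
  date = as.Date(c(
    "2019-01-01", "2019-03-01", "2021-01-01",
    "2008-01-01", "2008-01-01", "2020-01-01"
  ))
)
per_patient <- function(FUN) {
  # apply FUN to the dates of each patient, then restore the original row order
  unsplit(
    lapply(split(patients$date, patients$patient), FUN, episode_days = 365),
    patients$patient
  )
}
patients$episode <- per_patient(get_episode)
patients$new_episode <- per_patient(is_new_episode)
patients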

# the functions also work for less than a day, e.g. to include one per hour:
get_episode(
  c(
    Sys.time(),
    Sys.time() + 60 * 60
  ),
  episode_days = 1 / 24
)
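
# a gap of exactly `episode_days` already starts a new episode, as with
# 8 January 2023 in the table in Details; in this sketch, 2023-01-07
# (6 days after the start) should stay in episode 1, while 2023-01-08
# (7 days after the start) should start episode 2:
get_episode(as.Date(c("2023-01-01", "2023-01-07", "2023-01-08")), episode_days = 7)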

\donttest{
if (require("dplyr")) {
  # is_new_episode() can also be used in dplyr verbs to determine patient
  # episodes based on any (combination of) grouping variables:
  df \%>\%
    mutate(condition = sample(
      x = c("A", "B", "C"),
      size = 100,
      replace = TRUE
    )) \%>\%
    group_by(patient, condition) \%>\%
    mutate(new_episode = is_new_episode(date, 365)) \%>\%
    select(patient, date, condition, new_episode) \%>\%
    arrange(patient, condition, date)
}

if (require("dplyr")) {
  df \%>\%
    group_by(ward, patient) \%>\%
    transmute(date,
      patient,
      new_index = get_episode(date, 60),
      new_logical = is_new_episode(date, 60)
    ) \%>\%
    arrange(patient, ward, date)
}

if (require("dplyr")) {
  df \%>\%
    group_by(ward) \%>\%
    summarise(
      n_patients = n_distinct(patient),
      n_episodes_365 = sum(is_new_episode(date, episode_days = 365)),
      n_episodes_60 = sum(is_new_episode(date, episode_days = 60)),
      n_episodes_30 = sum(is_new_episode(date, episode_days = 30))
    )
}

# grouping on patients and microorganisms leads to the same
# results as first_isolate() when using 'episode-based':
if (require("dplyr")) {
  x <- df \%>\%
    filter_first_isolate(
      include_unknown = TRUE,
      method = "episode-based"
    )

  y <- df \%>\%
    group_by(patient, mo) \%>\%
    filter(is_new_episode(date, 365)) \%>\%
    ungroup()

  identical(x, y)
}

# but is_new_episode() has a lot more flexibility than first_isolate(),
# since you can now group on anything that seems relevant:
if (require("dplyr")) {
  df \%>\%
    group_by(patient, mo, ward) \%>\%
    mutate(flag_episode = is_new_episode(date, 365)) \%>\%
    select(group_vars(.), flag_episode)
}
}
}
\seealso{
\code{\link[=first_isolate]{first_isolate()}}
}