% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/freq.R
\name{freq}
\alias{freq}
\alias{frequency_tbl}
\alias{top_freq}
\alias{print.frequency_tbl}
\title{Frequency table}
\usage{
frequency_tbl(x, ..., sort.count = TRUE,
  nmax = getOption("max.print.freq"), na.rm = TRUE, row.names = TRUE,
  markdown = FALSE, digits = 2, sep = " ")

freq(x, ..., sort.count = TRUE, nmax = getOption("max.print.freq"),
  na.rm = TRUE, row.names = TRUE, markdown = FALSE, digits = 2,
  sep = " ")

top_freq(f, n)

\method{print}{frequency_tbl}(x, nmax = getOption("max.print.freq",
  default = 15), ...)
}
\arguments{
\item{x}{vector of any class or a \code{\link{data.frame}}, \code{\link{tibble}} or \code{\link{table}}}

\item{...}{up to nine different columns of \code{x} when \code{x} is a \code{data.frame} or \code{tibble}, to calculate frequencies from - see Examples}

\item{sort.count}{sort on count, i.e. frequencies. This will be \code{TRUE} by default for everything except factors.}

\item{nmax}{number of rows to print. The default, \code{15}, uses \code{\link{getOption}("max.print.freq")}. Use \code{nmax = 0}, \code{nmax = Inf}, \code{nmax = NULL} or \code{nmax = NA} to print all rows.}

\item{na.rm}{a logical value indicating whether \code{NA} values should be removed from the frequency table. The header will always print the number of \code{NA}s.}

\item{row.names}{a logical value indicating whether row indices should be printed as \code{1:nrow(x)}}

\item{markdown}{print table in markdown format (this forces \code{nmax = NA})}

\item{digits}{how many significant digits are to be used for numeric values in the header (not for the items themselves, which depend on \code{\link{getOption}("digits")})}

\item{sep}{a character string to separate the terms when selecting multiple columns}

\item{f}{a frequency table}

\item{n}{number of top \emph{n} items to return; use \code{-n} for the bottom \emph{n} items. It will include more than \code{n} rows if there are ties.}
}
\value{
A \code{data.frame} with an additional class \code{"frequency_tbl"}
}
\description{
Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. \code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
}
\details{
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the \code{freq} function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, which forces a univariate distribution. This package also has a vignette that explains the use of this function further; run \code{browseVignettes("AMR")} to read it.

For numeric values of any class, these additional values will all be calculated with \code{na.rm = TRUE} and shown in the header:
\itemize{
  \item{Mean, using \code{\link[base]{mean}}}
  \item{Standard Deviation, using \code{\link[stats]{sd}}}
  \item{Coefficient of Variation (CV), the standard deviation divided by the mean}
  \item{Median Absolute Deviation (MAD), using \code{\link[stats]{mad}}}
  \item{Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), using \code{\link[stats]{fivenum}}}
  \item{Interquartile Range (IQR) calculated as \code{Q3 - Q1} using the Tukey Five-Number Summaries, i.e. \strong{not} using the \code{\link[stats]{quantile}} function}
  \item{Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion), calculated as \code{(Q3 - Q1) / (Q3 + Q1)} using the Tukey Five-Number Summaries}
  \item{Outliers (total count and unique count), using \code{\link[grDevices]{boxplot.stats}}}
}

For dates and times of any class, these additional values will be calculated with \code{na.rm = TRUE} and shown in the header:
\itemize{
  \item{Oldest, using \code{\link{min}}}
  \item{Newest, using \code{\link{max}}, with difference between newest and oldest}
  \item{Median, using \code{\link[stats]{median}}, with percentage since oldest}
}


The function \code{top_freq} uses \code{\link[dplyr]{top_n}} internally and will include more than \code{n} rows if there are ties.
}
\examples{
library(dplyr)

# these all give the same result:
freq(septic_patients$hospital_id)
freq(septic_patients[, "hospital_id"])
septic_patients$hospital_id \%>\% freq()
septic_patients[, "hospital_id"] \%>\% freq()
septic_patients \%>\% freq("hospital_id")
septic_patients \%>\% freq(hospital_id)  #<- easiest to remember when you're used to tidyverse

# you could also use `select` or `pull` to get your variables
septic_patients \%>\%
  filter(hospital_id == "A") \%>\%
  select(bactid) \%>\%
  freq()

# multiple selected variables will be pasted together
septic_patients \%>\%
  left_join_microorganisms \%>\%
  filter(hospital_id == "A") \%>\%
  freq(genus, species)
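
# a rough base R sketch of what "pasted together" means for multiple columns;
# this is an illustration only, not the package's internal code, and `bugs`
# is a made-up data.frame used purely for demonstration
bugs <- data.frame(genus = c("Escherichia", "Escherichia", "Staphylococcus"),
                   species = c("coli", "coli", "aureus"),
                   stringsAsFactors = FALSE)
# combine both columns into one variable, separated like the `sep` argument
pasted <- paste(bugs$genus, bugs$species, sep = " ")
# count the combined values, giving a univariate distribution
table(pasted)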

# get top 10 bugs of hospital A as a vector
septic_patients \%>\%
  filter(hospital_id == "A") \%>\%
  freq(bactid) \%>\%
  top_freq(10)
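
# ties: a small sketch with made-up counts, calling dplyr::top_n() directly
# (the function that Details says top_freq() uses internally); `tied` is a
# hypothetical data.frame, not part of the package
tied <- data.frame(item = c("a", "b", "c", "d"),
                   count = c(5, 3, 3, 1),
                   stringsAsFactors = FALSE)
# asking for the top 2 keeps every row tied for second place as well
top2 <- dplyr::top_n(tied, 2, count)
nrow(top2)  # more than 2, because "b" and "c" share the same count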

# save frequency table to an object
years <- septic_patients \%>\%
  mutate(year = format(date, "\%Y")) \%>\%
  freq(year)

# show only the top 5
years \%>\% print(nmax = 5)

# save to an object with formatted percentages
years <- format(years)

# print a histogram of numeric values
septic_patients \%>\%
  freq(age) \%>\%
  hist()  # prettier: ggplot(septic_patients, aes(age)) + geom_histogram()

# or print all points to a regular plot
septic_patients \%>\%
  freq(age) \%>\%
  plot()

# transform to a data.frame or tibble
septic_patients \%>\%
  freq(age) \%>\%
  as.data.frame()

# or transform (back) to a vector
septic_patients \%>\%
  freq(age) \%>\%
  as.vector()

identical(septic_patients \%>\%
            freq(age) \%>\%
            as.vector() \%>\%
            sort(),
          sort(septic_patients$age)) # TRUE

# it also supports `table` objects:
table(septic_patients$sex,
      septic_patients$age) \%>\%
  freq(sep = " **sep** ")

\dontrun{
# send frequency table to clipboard (e.g. for pasting in Excel)
septic_patients \%>\%
  freq(age) \%>\%
  format() \%>\%       # this will format the percentages
  clipboard_export()
}
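
# a minimal base R sketch of the header statistics described in Details;
# this only mirrors the documented formulas and is not the package's own
# code, and `x` is a made-up numeric vector
x <- c(2, 4, 4, 5, 7, 9, 12, 15, 40)
tukey <- fivenum(x)              # minimum, Q1, median, Q3, maximum
q1 <- tukey[2]
q3 <- tukey[4]
iqr <- q3 - q1                   # IQR from the Tukey hinges, not quantile()
cqv <- (q3 - q1) / (q3 + q1)     # coefficient of quartile variation
cv <- sd(x) / mean(x)            # coefficient of variation
mad(x)                           # median absolute deviation (scaled by default)
outliers <- boxplot.stats(x)$out # values beyond 1.5 times the hinge spread
c(IQR = iqr, CQV = cqv, CV = cv)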
}
\keyword{freq}
\keyword{frequency}
\keyword{summarise}
\keyword{summary}