% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/freq.R
\name{freq}
\alias{freq}
\alias{frequency_tbl}
\alias{top_freq}
\title{Frequency table}
\usage{
frequency_tbl(x, ..., sort.count = TRUE, nmax = getOption("max.print.freq"),
na.rm = TRUE, row.names = TRUE, markdown = FALSE, digits = 2,
sep = " ")
freq(x, ..., sort.count = TRUE, nmax = getOption("max.print.freq"),
na.rm = TRUE, row.names = TRUE, markdown = FALSE, digits = 2,
sep = " ")
top_freq(f, n)
}
\arguments{
\item{x}{vector of items, or a \code{data.frame}}
\item{...}{up to nine different columns of \code{x} to calculate frequencies from; see Examples}
\item{sort.count}{sort on count, i.e. frequencies. Use \code{FALSE} to sort alphabetically on item.}
\item{nmax}{number of rows to print. The default, \code{15}, is taken from \code{\link{getOption}("max.print.freq")}. Use \code{nmax = 0}, \code{nmax = NULL} or \code{nmax = NA} to print all rows.}
\item{na.rm}{a logical value indicating whether \code{NA} values should be removed from the frequency table. The header will always print the number of \code{NA}s.}
\item{row.names}{a logical value indicating whether row indices should be printed as \code{1:nrow(x)}}
\item{markdown}{print table in markdown format (this forces \code{nmax = NA})}
\item{digits}{number of significant digits to use for numeric values in the header (not for the items themselves; those depend on \code{\link{getOption}("digits")})}
\item{sep}{a character string to separate the terms when selecting multiple columns}
\item{f}{a frequency table}
\item{n}{number of top \emph{n} items to return; use \code{-n} for the bottom \emph{n} items. More than \code{n} rows are returned if there are ties.}
}
\value{
A \code{data.frame} with an additional class \code{"frequency_tbl"}
}
\description{
Create a frequency table of a vector with items or a data frame. Supports quasiquotation and markdown for reports. \code{top_freq} can be used to get the top/bottom \emph{n} items of a frequency table, with counts as names.
}
\details{
A vignette about this function is also available; run \code{browseVignettes("AMR")} to read it.
For numeric values of any class, these additional values will be calculated and shown in the header:
\itemize{
\item{Mean, using \code{\link[base]{mean}}}
\item{Standard deviation, using \code{\link[stats]{sd}}}
\item{Tukey's five-number summary (min, Q1, median, Q3, max), using \code{\link[stats]{fivenum}}}
\item{Outliers (total count and unique count), using \code{\link{boxplot.stats}}}
\item{Coefficient of variation (CV), the standard deviation divided by the mean}
\item{Coefficient of quartile variation (CQV, sometimes called coefficient of dispersion), calculated as \code{(Q3 - Q1) / (Q3 + Q1)} using \code{\link{quantile}} with \code{type = 6} as quantile algorithm to comply with SPSS standards}
}
For dates and times of any class, these additional values will be calculated and shown in the header:
\itemize{
\item{Oldest, using \code{\link[base]{min}}}
\item{Newest, using \code{\link[base]{max}}, with difference between newest and oldest}
\item{Median, using \code{\link[stats]{median}}, with percentage since oldest}
}
The function \code{top_freq} uses \code{\link[dplyr]{top_n}} internally and will include more than \code{n} rows if there are ties.
}
\examples{
library(dplyr)
# these all give the same result:
freq(septic_patients$hospital_id)
freq(septic_patients[, "hospital_id"])
septic_patients$hospital_id \%>\% freq()
septic_patients[, "hospital_id"] \%>\% freq()
septic_patients \%>\% freq("hospital_id")
septic_patients \%>\% freq(hospital_id) # <- easiest to remember when used to tidyverse
# you could use `select`...
septic_patients \%>\%
filter(hospital_id == "A") \%>\%
select(bactid) \%>\%
freq()
# ... or use `freq` to select the column directly
septic_patients \%>\%
filter(hospital_id == "A") \%>\%
freq(bactid)
# select multiple columns; they will be pasted together
septic_patients \%>\%
left_join_microorganisms \%>\%
filter(hospital_id == "A") \%>\%
freq(genus, species)
# save frequency table to an object
years <- septic_patients \%>\%
mutate(year = format(date, "\%Y")) \%>\%
freq(year)
years \%>\% pull(item)
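# a minimal sketch of the CV and CQV formulas described in Details, using base R only;
# `x` is a hypothetical numeric vector, and this is an illustration of the formulas,
# not necessarily how freq() computes the header values internally
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
q <- stats::quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
cv <- stats::sd(x) / mean(x)          # coefficient of variation
cqv <- (q[2] - q[1]) / (q[2] + q[1])  # coefficient of quartile variation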
# get top 10 bugs of hospital A as a vector
septic_patients \%>\%
filter(hospital_id == "A") \%>\%
freq(bactid) \%>\%
top_freq(10)
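# a sketch of the tie behaviour mentioned in Details, calling dplyr::top_n() directly
# on a hypothetical data.frame `tied` (an assumption for illustration, not a real frequency table):
# asking for the top 2 returns 3 rows, because "b" and "c" share the second-highest count
tied <- data.frame(item = c("a", "b", "c", "d"), count = c(5, 3, 3, 1))
top_tied <- dplyr::top_n(tied, 2, count)
nrow(top_tied)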
}
\keyword{freq}
\keyword{frequency}
\keyword{summarise}
\keyword{summary}