Create a frequency table of a vector with items or a data.frame. Supports quasiquotation and markdown for reports. Best practice is: data %>% freq(var).
top_freq can be used to get the top/bottom n items of a frequency table, with counts as names.

frequency_tbl(x, ..., sort.count = TRUE,
  nmax = getOption("max.print.freq"), na.rm = TRUE, row.names = TRUE,
  markdown = !interactive(), digits = 2, quote = FALSE,
  header = !markdown, title = NULL, na = "<NA>", droplevels = TRUE,
  sep = " ", decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."))

freq(x, ..., sort.count = TRUE, nmax = getOption("max.print.freq"),
  na.rm = TRUE, row.names = TRUE, markdown = !interactive(),
  digits = 2, quote = FALSE, header = !markdown, title = NULL,
  na = "<NA>", droplevels = TRUE, sep = " ",
  decimal.mark = getOption("OutDec"), big.mark = ifelse(decimal.mark !=
  ",", ",", "."))

top_freq(f, n)

header(f, property = NULL)

# S3 method for frequency_tbl
print(x, nmax = getOption("max.print.freq",
  default = 15), markdown = !interactive(), header = !markdown,
  decimal.mark = getOption("OutDec"), big.mark = ifelse(decimal.mark !=
  ",", ",", "."), ...)

Arguments

x

vector of any class or a data.frame, tibble (may contain a grouping variable) or table

...

up to nine different columns of x when x is a data.frame or tibble, to calculate frequencies from - see Examples. Also supports quasiquotation.

sort.count

sort on count, i.e. frequencies. This is TRUE by default, except when grouping variables are used.

nmax

number of rows to print. The default is taken from getOption("max.print.freq"), which is 15 unless changed. Use nmax = 0, nmax = Inf, nmax = NULL or nmax = NA to print all rows.

na.rm

a logical value indicating whether NA values should be removed from the frequency table. The header (if set) will always print the number of NAs.

row.names

a logical value indicating whether row indices should be printed as 1:nrow(x)

markdown

a logical value indicating whether the frequency table should be printed in markdown format. This will print all rows (except when nmax is defined) and is the default behaviour in non-interactive R sessions (such as when knitting R Markdown files).

digits

how many significant digits are to be used for numeric values in the header (not for the items themselves, that depends on getOption("digits"))

quote

a logical value indicating whether or not strings should be printed with surrounding quotes

header

a logical value indicating whether an informative header should be printed

title

text to show above the frequency table; by default, it is derived from the variables passed to x

na

a character string that should be used to show empty (NA) values (only useful when na.rm = FALSE)

droplevels

a logical value indicating whether empty levels of factors should be dropped

sep

a character string to separate the terms when selecting multiple columns

decimal.mark

used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.

big.mark

used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.

f

a frequency table

n

number of top n items to return, use -n for the bottom n items. It will include more than n rows if there are ties.

property

name of a property in the header; when set, the value of that property is returned directly

Value

A data.frame (with an additional class "frequency_tbl") with five columns: item, count, percent, cum_count and cum_percent.
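
As a rough illustration of how these five columns relate to each other, the following base R sketch rebuilds them by hand for a small made-up vector. It does not call freq itself, the input data are hypothetical, and storing percent as a proportion between 0 and 1 is an assumption rather than something this page states.

```r
# Hypothetical input vector, for illustration only
x <- c("A", "B", "A", "C", "A", "B")

# Count each unique item and sort on count (highest first)
counts <- sort(table(x), decreasing = TRUE)

tbl <- data.frame(
  item = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

# Assumption: percentages are kept as proportions between 0 and 1
tbl$percent <- tbl$count / sum(tbl$count)
tbl$cum_count <- cumsum(tbl$count)
tbl$cum_percent <- tbl$cum_count / sum(tbl$count)

tbl
```

The cumulative columns always end at the total number of observations and at 1, which is a quick sanity check for any frequency table.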

Details

Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, which forces a univariate distribution. This package also has a vignette that explains the use of this function further; run browseVignettes("AMR") to read it.
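
The pasting of multiple variables can be pictured with plain base R. This is only a sketch of the idea, not the internal implementation of freq; the data frame is made up, and the separator corresponds to the sep argument (default " ").

```r
# Hypothetical data, for illustration only
df <- data.frame(
  genus = c("Escherichia", "Escherichia", "Staphylococcus"),
  species = c("coli", "coli", "aureus"),
  stringsAsFactors = FALSE
)

# Two columns become one variable, so the result is univariate again
combined <- paste(df$genus, df$species, sep = " ")

table(combined)
```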

For numeric values of any class, these additional values will all be calculated with na.rm = TRUE and shown in the header:

  • Mean, using mean

  • Standard Deviation, using sd

  • Coefficient of Variation (CV), the standard deviation divided by the mean

  • Median Absolute Deviation (MAD), using mad

  • Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), using fivenum

  • Interquartile Range (IQR) calculated as Q3 - Q1 using the Tukey Five-Number Summaries, i.e. not using the quantile function

  • Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1) using the Tukey Five-Number Summaries

  • Outliers (total count and unique count), using boxplot.stats
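
The statistics listed above can be reproduced with base R alone. The sketch below uses a made-up numeric vector and hypothetical object names (stats, five); it shows the formulas named in the list and is not taken from the source of freq.

```r
# Hypothetical values, for illustration only; NA is removed first (na.rm = TRUE)
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 40, NA)
x <- x[!is.na(x)]

# Tukey five-number summary: minimum, Q1, median, Q3, maximum
five <- fivenum(x)
q1 <- five[2]
q3 <- five[4]

# Outliers as defined by boxplot.stats (beyond 1.5 * IQR from the hinges)
out <- boxplot.stats(x)$out

stats <- list(
  mean = mean(x),
  sd = sd(x),
  cv = sd(x) / mean(x),            # coefficient of variation
  mad = mad(x),                    # stats::mad is the (scaled) median absolute deviation
  five_num = five,
  iqr = q3 - q1,                   # from fivenum, not from quantile()
  cqv = (q3 - q1) / (q3 + q1),     # coefficient of quartile variation
  outliers_total = length(out),
  outliers_unique = length(unique(out))
)

str(stats)
```

Because the IQR here comes from fivenum, it can differ slightly from IQR(x), which relies on quantile.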

For dates and times of any class, these additional values will be calculated with na.rm = TRUE and shown in the header:

  • Oldest, using min

  • Newest, using max, with difference between newest and oldest

  • Median, using median, with percentage since oldest
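
A minimal base R sketch of the date statistics, using made-up dates. The reading of "percentage since oldest" as the position of the median between the oldest and newest date is an assumption, and the object names are hypothetical.

```r
# Hypothetical dates, for illustration only
dates <- as.Date(c("2001-01-01", "2005-06-15", "2010-12-31", NA))

oldest <- min(dates, na.rm = TRUE)
newest <- max(dates, na.rm = TRUE)

# Difference between newest and oldest, as a difftime in days
span <- newest - oldest

med <- median(dates, na.rm = TRUE)

# Assumption: how far the median lies between oldest (0) and newest (1)
pct_since_oldest <- as.numeric(med - oldest) / as.numeric(span)

oldest
newest
span
med
pct_since_oldest
```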

In factors, all factor levels that do not occur in the input data will be dropped.

The function top_freq uses top_n internally and will include more than n rows if there are ties.
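
The effect of ties can be imitated in base R with a minimum-rank comparison, which is the same idea top_n relies on. The counts below are made up, and the sketch only mimics the documented behaviour (items returned with counts as names); it is not the source of top_freq.

```r
# Hypothetical counts per item, for illustration only
counts <- c(a = 10, b = 7, c = 7, d = 3)
n <- 2

# Tied values share the lowest rank, so both items with count 7 get rank 2
keep <- rank(-counts, ties.method = "min") <= n

# Return the items, with their counts as names
top <- names(counts)[keep]
names(top) <- counts[keep]

top
```

Asking for the top 2 yields three items here, because b and c are tied for second place.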

Read more on our website!


On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.

Examples

# NOT RUN {
library(dplyr)

# this all gives the same result:
freq(septic_patients$hospital_id)
freq(septic_patients[, "hospital_id"])
septic_patients$hospital_id %>% freq()
septic_patients[, "hospital_id"] %>% freq()
septic_patients %>% freq("hospital_id")
septic_patients %>% freq(hospital_id)  #<- easiest to remember (tidyverse)


# you could also use `select` or `pull` to get your variables
septic_patients %>%
  filter(hospital_id == "A") %>%
  select(mo) %>%
  freq()


# multiple selected variables will be pasted together
septic_patients %>%
  left_join_microorganisms %>%
  freq(genus, species)

# function calls are also supported, through quasiquotation
septic_patients %>%
  freq(mo_genus(mo), mo_species(mo))


# group a variable and analyse another
septic_patients %>%
  group_by(hospital_id) %>%
  freq(gender)


# get top 10 bugs of hospital A as a vector
septic_patients %>%
  filter(hospital_id == "A") %>%
  freq(mo) %>%
  top_freq(10)


# save frequency table to an object
years <- septic_patients %>%
  mutate(year = format(date, "%Y")) %>%
  freq(year)


# show only the top 5
years %>% print(nmax = 5)


# save to an object with formatted percentages
years <- format(years)


# print a histogram of numeric values
septic_patients %>%
  freq(age) %>%
  hist()


# or print all points to a regular plot
septic_patients %>%
  freq(age) %>%
  plot()


# transform to a data.frame or tibble
septic_patients %>%
  freq(age) %>%
  as.data.frame()


# or transform (back) to a vector
septic_patients %>%
  freq(age) %>%
  as.vector()

identical(septic_patients %>%
            freq(age) %>%
            as.vector() %>%
            sort(),
          sort(septic_patients$age)) # TRUE


# it also supports `table` objects
table(septic_patients$gender,
      septic_patients$age) %>%
  freq(sep = " **sep** ")


# only get selected columns
septic_patients %>%
  freq(hospital_id) %>%
  select(item, percent)

septic_patients %>%
  freq(hospital_id) %>%
  select(-count, -cum_count)


# check differences between frequency tables
diff(freq(septic_patients$trim),
     freq(septic_patients$trsu))
# }