freq.Rd
Create a frequency table of a vector with items or a data.frame. Supports quasiquotation and markdown for reports. Best practice is: data %>% freq(var).
top_freq can be used to get the top/bottom n items of a frequency table, with counts as names.
frequency_tbl(x, ..., sort.count = TRUE,
  nmax = getOption("max.print.freq"), na.rm = TRUE, row.names = TRUE,
  markdown = !interactive(), digits = 2, quote = FALSE, header = TRUE,
  title = NULL, na = "<NA>", droplevels = TRUE, sep = " ",
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."))

freq(x, ..., sort.count = TRUE,
  nmax = getOption("max.print.freq"), na.rm = TRUE, row.names = TRUE,
  markdown = !interactive(), digits = 2, quote = FALSE, header = TRUE,
  title = NULL, na = "<NA>", droplevels = TRUE, sep = " ",
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."))

top_freq(f, n)

header(f, property = NULL)

# S3 method for frequency_tbl
print(x, nmax = getOption("max.print.freq", default = 15),
  markdown = !interactive(), header = TRUE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."), ...)
x | vector of any class or a |
---|---|
... | up to nine different columns of |
sort.count | sort on count, i.e. frequencies. This will be |
nmax | number of rows to print. The default, |
na.rm | a logical value indicating whether |
row.names | a logical value indicating whether row indices should be printed as |
markdown | a logical value indicating whether the frequency table should be printed in markdown format. This will print all rows (except when |
digits | how many significant digits are to be used for numeric values in the header (not for the items themselves, that depends on |
quote | a logical value indicating whether or not strings should be printed with surrounding quotes |
header | a logical value indicating whether an informative header should be printed |
title | text to show above the frequency table; by default it tries to coerce one from the variables passed to |
na | a character string that should be used to show empty ( |
droplevels | a logical value indicating whether in factors empty levels should be dropped |
sep | a character string to separate the terms when selecting multiple columns |
decimal.mark | used for prettying (longish) numerical and complex sequences. Passed to |
big.mark | used for prettying (longish) numerical and complex sequences. Passed to |
f | a frequency table |
n | number of top n items to return, use -n for the bottom n items. It will include more than |
property | property in header to return this value directly |
A data.frame (with an additional class "frequency_tbl") with five columns: item, count, percent, cum_count and cum_percent.
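To make the five-column layout concrete, the sketch below rebuilds it with base R only. It is a hypothetical emulation, not the package's own code: the object name freq_sketch is invented, dropping NA values is an assumption about what na.rm = TRUE does, and storing percent as a proportion between 0 and 1 is also an assumption.

```r
# Hypothetical base-R emulation of the five-column layout; not the package's code.
x <- c("A", "B", "A", "C", "A", "B", NA)

x <- x[!is.na(x)]                            # assumption: mirrors na.rm = TRUE
counts <- sort(table(x), decreasing = TRUE)  # mirrors sort.count = TRUE

freq_sketch <- data.frame(
  item  = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)
# Assumption: percentages are stored as proportions (0 to 1)
freq_sketch$percent     <- freq_sketch$count / sum(freq_sketch$count)
freq_sketch$cum_count   <- cumsum(freq_sketch$count)
freq_sketch$cum_percent <- cumsum(freq_sketch$percent)

print(freq_sketch)
```

The cumulative columns follow directly from the sorted counts, so the last row always reaches the total count and a cumulative proportion of 1.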
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, which forces a univariate distribution. This package also has a vignette that explains the use of this function further; run browseVignettes("AMR") to read it.
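As a rough illustration of how several variables collapse into one, the base-R sketch below pastes two made-up vectors together with a single space, matching the default sep = " ". The data and the variable names are invented for the example, and this is not the package's internal code.

```r
# Two hypothetical variables describing the same three observations
genus   <- c("Escherichia", "Staphylococcus", "Escherichia")
species <- c("coli", "aureus", "coli")

# Paste them into one variable, separated like the default sep = " "
combined <- paste(genus, species, sep = " ")

# The result is a single (univariate) variable that can be tabulated
print(table(combined))
```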
For numeric values of any class, these additional values will all be calculated with na.rm = TRUE and shown in the header:
Mean, using mean
Standard Deviation, using sd
Coefficient of Variation (CV), the standard deviation divided by the mean
Median Absolute Deviation (MAD), using mad
Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), using fivenum
Interquartile Range (IQR), calculated as Q3 - Q1 using the Tukey Five-Number Summaries, i.e. not using the quantile function
Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1) using the Tukey Five-Number Summaries
Outliers (total count and unique count), using boxplot.stats
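The header statistics listed above can be reproduced by hand with base R. The sketch below uses a made-up numeric vector and invented variable names; it follows the formulas as described here and is not taken from the package source.

```r
# Hypothetical numeric sample with one missing value
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 40, NA)

m       <- mean(x, na.rm = TRUE)
s       <- sd(x, na.rm = TRUE)
cv      <- s / m                     # coefficient of variation
mad_val <- mad(x, na.rm = TRUE)      # median absolute deviation

five <- fivenum(x, na.rm = TRUE)     # minimum, Q1, median, Q3, maximum
q1 <- five[2]
q3 <- five[4]

iqr_tukey <- q3 - q1                 # IQR from Tukey's hinges, not quantile()
cqv       <- (q3 - q1) / (q3 + q1)   # coefficient of quartile variation

# Outliers according to boxplot.stats(), on the non-missing values
out <- boxplot.stats(x[!is.na(x)])$out
outliers_total  <- length(out)
outliers_unique <- length(unique(out))

print(five)
print(c(iqr = iqr_tukey, cqv = cqv, outliers = outliers_total))
```

Because fivenum works with Tukey's hinges, the IQR computed this way can differ from IQR(x), which relies on quantile.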
For dates and times of any class, these additional values will be calculated with na.rm = TRUE and shown in the header:
Oldest, using min
Newest, using max, with difference between newest and oldest
Median, using median, with percentage since oldest
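A minimal base-R sketch of the date statistics, using made-up dates and invented variable names. How the "percentage since oldest" is computed is not stated here, so the last step is only an assumption: it takes the position of the median within the span from oldest to newest.

```r
# Hypothetical dates with one missing value
d <- as.Date(c("2001-01-01", "2005-06-15", "2010-12-31", NA))

oldest <- min(d, na.rm = TRUE)
newest <- max(d, na.rm = TRUE)
span   <- newest - oldest            # difference between newest and oldest, in days

med <- median(d, na.rm = TRUE)

# Assumption: share of the total span that has elapsed at the median
pct_since_oldest <- as.numeric(med - oldest) / as.numeric(span)

print(oldest)
print(newest)
print(span)
print(med)
print(pct_since_oldest)
```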
For factors, all levels that do not occur in the input data will be dropped (see droplevels).
The function top_freq uses top_n internally and will include more than n rows if there are ties.
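The effect of ties can be shown without the package. The sketch below is a base-R emulation of the tie behaviour described above, not the actual top_freq or top_n code; the counts and names are made up.

```r
# Hypothetical counts per item; two items tie for second place
counts <- c(E.coli = 10, S.aureus = 7, K.pneumoniae = 7, P.aeruginosa = 3)
n <- 2

# Keep every item whose count is at least the n-th largest count,
# so tied items are all included
threshold <- sort(counts, decreasing = TRUE)[n]
top <- counts[counts >= threshold]

# Items as values with their counts as names
top_items <- setNames(names(top), top)

print(top_items)
```

Asking for the top 2 returns three items in this case, because the second and third items share the same count.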
On our website https://msberends.gitlab.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
# NOT RUN {
library(dplyr)

# this all gives the same result:
freq(septic_patients$hospital_id)
freq(septic_patients[, "hospital_id"])
septic_patients$hospital_id %>% freq()
septic_patients[, "hospital_id"] %>% freq()
septic_patients %>% freq("hospital_id")
septic_patients %>% freq(hospital_id) #<- easiest to remember (tidyverse)

# you could also use `select` or `pull` to get your variables
septic_patients %>%
  filter(hospital_id == "A") %>%
  select(mo) %>%
  freq()

# multiple selected variables will be pasted together
septic_patients %>%
  left_join_microorganisms %>%
  freq(genus, species)

# functions as quasiquotation are also supported
septic_patients %>%
  freq(mo_genus(mo), mo_species(mo))

# group a variable and analyse another
septic_patients %>%
  group_by(hospital_id) %>%
  freq(gender)

# get top 10 bugs of hospital A as a vector
septic_patients %>%
  filter(hospital_id == "A") %>%
  freq(mo) %>%
  top_freq(10)

# save frequency table to an object
years <- septic_patients %>%
  mutate(year = format(date, "%Y")) %>%
  freq(year)

# show only the top 5
years %>% print(nmax = 5)

# save to an object with formatted percentages
years <- format(years)

# print a histogram of numeric values
septic_patients %>%
  freq(age) %>%
  hist()

# or print all points to a regular plot
septic_patients %>%
  freq(age) %>%
  plot()

# transform to a data.frame or tibble
septic_patients %>%
  freq(age) %>%
  as.data.frame()

# or transform (back) to a vector
septic_patients %>%
  freq(age) %>%
  as.vector()

identical(septic_patients %>%
            freq(age) %>%
            as.vector() %>%
            sort(),
          sort(septic_patients$age)) # TRUE

# it also supports `table` objects
table(septic_patients$gender,
      septic_patients$age) %>%
  freq(sep = " **sep** ")

# only get selected columns
septic_patients %>%
  freq(hospital_id) %>%
  select(item, percent)

septic_patients %>%
  freq(hospital_id) %>%
  select(-count, -cum_count)

# check differences between frequency tables
diff(freq(septic_patients$trim),
     freq(septic_patients$trsu))
# }