freq.Rmd
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables are pasted into one variable, which forces a univariate distribution. We take the `septic_patients` dataset (included in this AMR package) as an example.
To quickly review the content of a single variable, you can select it in various ways. Let's say we want the frequencies of the `gender` variable of the `septic_patients` dataset:
Frequency table of gender
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | M | 1,031 | 51.6% | 1,031 | 51.6% |
2 | F | 969 | 48.5% | 2,000 | 100.0% |
In the console, the header of this table immediately shows the class of the variable, its length and availability (i.e. the number of `NA` values) and the number of unique values. Most importantly, the table itself shows that men are more prevalent than women among these septic patients.
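To make the meaning of the columns concrete, here is a rough base R sketch that computes Count, Percent, Cum. Count and Cum. Percent from a plain vector. The helper `freq_sketch()` is hypothetical and is not how the AMR package implements `freq`. It only assumes that `NA` values are dropped and that items are sorted on count.

```r
# freq_sketch() is a hypothetical helper for illustration, not part of AMR.
# It returns the same columns as the tables in this document, sorted on count.
freq_sketch <- function(x) {
  x <- x[!is.na(x)]                            # drop NA values first
  counts <- sort(table(x), decreasing = TRUE)  # one count per unique value
  n <- length(x)
  data.frame(
    item        = names(counts),
    count       = as.integer(counts),
    percent     = as.integer(counts) / n,
    cum_count   = cumsum(as.integer(counts)),
    cum_percent = cumsum(as.integer(counts)) / n,
    stringsAsFactors = FALSE
  )
}

# toy vector with the same gender counts as the table above
gender <- rep(c("M", "F"), times = c(1031, 969))
freq_sketch(gender)
```

Here percentages are kept as proportions between 0 and 1. Formatting them as "51.6%" is left to printing.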
Multiple variables are pasted into one variable so that individual cases can be reviewed, which keeps the frequency table univariate.
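Base R alone is enough to illustrate the pasting step. In this minimal sketch, the data frame and its values are made up. `paste()` joins the selected columns row by row, and a single vector is then counted, so the result stays univariate.

```r
# hypothetical toy data with two taxonomic columns
df <- data.frame(
  genus   = c("Escherichia", "Staphylococcus", "Escherichia", "Klebsiella"),
  species = c("coli", "aureus", "coli", "pneumoniae"),
  stringsAsFactors = FALSE
)

# paste the selected columns into one variable, separated by a space
combined <- do.call(paste, df[c("genus", "species")])
combined

# count the combined values: one dimension, one row per combination
sort(table(combined), decreasing = TRUE)
```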
For illustration, we can add some more variables to the `septic_patients` dataset to learn about bacterial properties.
Now all variables of the `microorganisms` dataset have been joined to the `septic_patients` dataset. The `microorganisms` dataset consists of the following variables:
```r
colnames(microorganisms)
```

```
[1] "mo" "tsn" "genus" "species" "subspecies"
[6] "fullname" "family" "order" "class" "phylum"
[11] "subkingdom" "kingdom" "gramstain" "prevalence" "ref"
```
If we compare the dimensions of the old and new datasets, we can see that 14 variables were added.
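Base R's `merge()` gives a way to sketch joining columns from a reference table on a shared key. Both small data frames and the codes `mo_1` and `mo_2` are hypothetical. They only mimic a patient table with an `mo` column and a reference table keyed on `mo`. Comparing `dim()` before and after the join shows that every reference column except the key is added.

```r
# hypothetical toy tables: patients with a microorganism code, and a
# reference table that describes each code
patients <- data.frame(
  patient_id = c(1, 2, 3),
  mo         = c("mo_1", "mo_2", "mo_1"),
  stringsAsFactors = FALSE
)
taxonomy <- data.frame(
  mo      = c("mo_1", "mo_2"),
  genus   = c("Escherichia", "Staphylococcus"),
  species = c("coli", "aureus"),
  stringsAsFactors = FALSE
)

# left join: keep every patient row and add the taxonomy columns by "mo"
joined <- merge(patients, taxonomy, by = "mo", all.x = TRUE)

dim(patients)  # 3 rows, 2 columns
dim(joined)    # 3 rows, 4 columns: the key "mo" is not duplicated
setdiff(colnames(joined), colnames(patients))  # "genus" "species"
```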
So now the `genus` and `species` variables are available. A frequency table of these combined variables can be created like this:
Frequency table of genus and species
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | Escherichia coli | 467 | 23.4% | 467 | 23.4% |
2 | Staphylococcus coagulase negative | 313 | 15.7% | 780 | 39.0% |
3 | Staphylococcus aureus | 235 | 11.8% | 1,015 | 50.7% |
4 | Staphylococcus epidermidis | 174 | 8.7% | 1,189 | 59.5% |
5 | Streptococcus pneumoniae | 117 | 5.9% | 1,306 | 65.3% |
6 | Staphylococcus hominis | 81 | 4.1% | 1,387 | 69.4% |
7 | Klebsiella pneumoniae | 58 | 2.9% | 1,445 | 72.3% |
8 | Enterococcus faecalis | 39 | 2.0% | 1,484 | 74.2% |
9 | Proteus mirabilis | 36 | 1.8% | 1,520 | 76.0% |
10 | Pseudomonas aeruginosa | 30 | 1.5% | 1,550 | 77.5% |
11 | Serratia marcescens | 25 | 1.3% | 1,575 | 78.8% |
12 | Enterobacter cloacae | 23 | 1.2% | 1,598 | 79.9% |
13 | Enterococcus faecium | 21 | 1.1% | 1,619 | 81.0% |
14 | Staphylococcus capitis | 21 | 1.1% | 1,640 | 82.0% |
15 | Bacteroides fragilis | 20 | 1.0% | 1,660 | 83.0% |
16 | Enterococcus species | 20 | 1.0% | 1,680 | 84.0% |
17 | Streptococcus group B | 18 | 0.9% | 1,698 | 84.9% |
18 | Klebsiella oxytoca | 16 | 0.8% | 1,714 | 85.7% |
19 | Streptococcus pyogenes | 16 | 0.8% | 1,730 | 86.5% |
20 | Streptococcus dysgalactiae | 14 | 0.7% | 1,744 | 87.2% |
21 | Streptococcus group A | 13 | 0.7% | 1,757 | 87.9% |
22 | Streptococcus mitis | 13 | 0.7% | 1,770 | 88.5% |
23 | Streptococcus salivarius | 12 | 0.6% | 1,782 | 89.1% |
24 | Streptococcus agalactiae | 11 | 0.6% | 1,793 | 89.7% |
25 | Streptococcus species | 11 | 0.6% | 1,804 | 90.2% |
26 | Corynebacterium species | 10 | 0.5% | 1,814 | 90.7% |
27 | Streptococcus bovis | 10 | 0.5% | 1,824 | 91.2% |
28 | Clostridium difficile | 9 | 0.5% | 1,833 | 91.7% |
29 | Haemophilus influenzae | 8 | 0.4% | 1,841 | 92.1% |
30 | Candida albicans | 7 | 0.4% | 1,848 | 92.4% |
31 | Staphylococcus haemolyticus | 7 | 0.4% | 1,855 | 92.8% |
32 | Streptococcus constellatus | 7 | 0.4% | 1,862 | 93.1% |
33 | Candida glabrata | 6 | 0.3% | 1,868 | 93.4% |
34 | Citrobacter freundii | 6 | 0.3% | 1,874 | 93.7% |
35 | Corynebacterium striatum | 6 | 0.3% | 1,880 | 94.0% |
36 | Morganella morganii | 6 | 0.3% | 1,886 | 94.3% |
37 | Streptococcus anginosus | 6 | 0.3% | 1,892 | 94.6% |
38 | Streptococcus oralis | 6 | 0.3% | 1,898 | 94.9% |
39 | Acinetobacter baumannii | 3 | 0.2% | 1,901 | 95.1% |
40 | Acinetobacter species | 3 | 0.2% | 1,904 | 95.2% |
41 | Citrobacter koseri | 3 | 0.2% | 1,907 | 95.4% |
42 | Clostridium perfringens | 3 | 0.2% | 1,910 | 95.5% |
43 | Clostridium septicum | 3 | 0.2% | 1,913 | 95.7% |
44 | Enterobacter aerogenes | 3 | 0.2% | 1,916 | 95.8% |
45 | Gemella haemolysans | 3 | 0.2% | 1,919 | 96.0% |
46 | Micrococcus luteus | 3 | 0.2% | 1,922 | 96.1% |
47 | Micrococcus species | 3 | 0.2% | 1,925 | 96.3% |
48 | Salmonella enterica | 3 | 0.2% | 1,928 | 96.4% |
49 | Staphylococcus warneri | 3 | 0.2% | 1,931 | 96.6% |
50 | Streptococcus equi | 3 | 0.2% | 1,934 | 96.7% |
51 | Streptococcus group C | 3 | 0.2% | 1,937 | 96.9% |
52 | Streptococcus group G | 3 | 0.2% | 1,940 | 97.0% |
53 | Streptococcus intermedius | 3 | 0.2% | 1,943 | 97.2% |
54 | Streptococcus parasanguinis | 3 | 0.2% | 1,946 | 97.3% |
55 | Aerococcus urinae | 2 | 0.1% | 1,948 | 97.4% |
56 | Candida tropicalis | 2 | 0.1% | 1,950 | 97.5% |
57 | Citrobacter species | 2 | 0.1% | 1,952 | 97.6% |
58 | Enterococcus avium | 2 | 0.1% | 1,954 | 97.7% |
59 | Pantoea agglomerans | 2 | 0.1% | 1,956 | 97.8% |
60 | Pantoea species | 2 | 0.1% | 1,958 | 97.9% |
61 | Proteus vulgaris | 2 | 0.1% | 1,960 | 98.0% |
62 | Staphylococcus cohnii | 2 | 0.1% | 1,962 | 98.1% |
63 | Staphylococcus lugdunensis | 2 | 0.1% | 1,964 | 98.2% |
64 | Staphylococcus schleiferi | 2 | 0.1% | 1,966 | 98.3% |
65 | Stenotrophomonas maltophilia | 2 | 0.1% | 1,968 | 98.4% |
66 | Streptococcus mutans | 2 | 0.1% | 1,970 | 98.5% |
67 | Actinomyces odontolyticus | 1 | 0.1% | 1,971 | 98.6% |
68 | Campylobacter jejuni | 1 | 0.1% | 1,972 | 98.6% |
69 | Candida lusitaniae | 1 | 0.1% | 1,973 | 98.7% |
70 | Clostridium novyi | 1 | 0.1% | 1,974 | 98.7% |
71 | Corynebacterium tuberculostearicum | 1 | 0.1% | 1,975 | 98.8% |
72 | Dermabacter hominis | 1 | 0.1% | 1,976 | 98.8% |
73 | Eikenella corrodens | 1 | 0.1% | 1,977 | 98.9% |
74 | Enterococcus casseliflavus | 1 | 0.1% | 1,978 | 98.9% |
75 | Escherichia vulneris | 1 | 0.1% | 1,979 | 99.0% |
76 | Fusobacterium species | 1 | 0.1% | 1,980 | 99.0% |
77 | Globicatella sanguinis | 1 | 0.1% | 1,981 | 99.1% |
78 | Granulicatella adiacens | 1 | 0.1% | 1,982 | 99.1% |
79 | Haemophilus parainfluenzae | 1 | 0.1% | 1,983 | 99.2% |
80 | Hafnia alvei | 1 | 0.1% | 1,984 | 99.2% |
81 | Lactobacillus delbrueckii | 1 | 0.1% | 1,985 | 99.3% |
82 | Leuconostoc species | 1 | 0.1% | 1,986 | 99.3% |
83 | Listeria monocytogenes | 1 | 0.1% | 1,987 | 99.4% |
84 | Neisseria meningitidis | 1 | 0.1% | 1,988 | 99.4% |
85 | Neisseria sicca | 1 | 0.1% | 1,989 | 99.5% |
86 | Paenibacillus durus | 1 | 0.1% | 1,990 | 99.5% |
87 | Propionibacterium acnes | 1 | 0.1% | 1,991 | 99.6% |
88 | Proteus penneri | 1 | 0.1% | 1,992 | 99.6% |
89 | Rothia mucilaginosa | 1 | 0.1% | 1,993 | 99.7% |
90 | Sphingobacterium spiritivorum | 1 | 0.1% | 1,994 | 99.7% |
91 | Sphingomonas paucimobilis | 1 | 0.1% | 1,995 | 99.8% |
92 | Streptococcus equinus | 1 | 0.1% | 1,996 | 99.8% |
93 | Streptococcus gordonii | 1 | 0.1% | 1,997 | 99.9% |
94 | Streptococcus infantarius | 1 | 0.1% | 1,998 | 99.9% |
95 | Streptococcus sanguinis | 1 | 0.1% | 1,999 | 100.0% |
96 | Veillonella parvula | 1 | 0.1% | 2,000 | 100.0% |
Frequency tables can be created from any input.
For numeric values (such as integers and doubles), additional descriptive statistics are calculated and shown in the header. When frequency tables are created automatically (as here in markdown), add `header = TRUE` to also show the header in markdown reports:
```r
library(AMR)
library(dplyr)

# get age distribution of unique patients
septic_patients %>%
  distinct(patient_id, .keep_all = TRUE) %>%
  freq(age, nmax = 5, header = TRUE)
```
Frequency table of age
Class: numeric
Length: 981 (of which NA: 0 = 0.00%)
Unique: 73
Mean: 71
Std. dev.: 14 (CV: 0.2, MAD: 13)
Five-Num: 14 | 63 | 74 | 82 | 97 (IQR: 19, CQV: 0.13)
Outliers: 15 (unique count: 12)
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | 83 | 44 | 4.5% | 44 | 4.5% |
2 | 76 | 43 | 4.4% | 87 | 8.9% |
3 | 75 | 37 | 3.8% | 124 | 12.6% |
4 | 82 | 33 | 3.4% | 157 | 16.0% |
5 | 78 | 32 | 3.3% | 189 | 19.3% |
(omitted 68 entries, n = 792 [80.7%])
So the following properties are determined, where `NA` values are always ignored:

- Mean
- Standard deviation
- Coefficient of variation (CV), the standard deviation divided by the mean
- Median absolute deviation (MAD)
- Tukey's five numbers (min, Q1, median, Q3, max)
- Interquartile range (IQR)
- Coefficient of quartile variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1), using `quantile` with `type = 6` as the quantile algorithm to comply with SPSS standards
- Outliers (total count and unique count)
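The statistics listed above can be approximated in base R. The ages in this sketch are made up, and it does not necessarily match how `freq` computes its header. In particular, using `fivenum()` for Tukey's five numbers and `boxplot.stats()` for outliers are assumptions. The quartiles for the IQR and CQV use `quantile(type = 6)`, as stated above.

```r
# hypothetical ages, including one NA value
x <- c(74, 14, 97, 63, NA, 82, 70, 90, 74, 80)
x <- x[!is.na(x)]                     # NA values are ignored

mean_x <- mean(x)
sd_x   <- sd(x)
cv     <- sd_x / mean_x               # coefficient of variation
mad_x  <- mad(x)                      # median absolute deviation (scaled by 1.4826)

# SPSS-style quartiles (type 6), used for the IQR and the CQV
q   <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
iqr <- q[2] - q[1]
cqv <- (q[2] - q[1]) / (q[2] + q[1])  # coefficient of quartile variation

five <- fivenum(x)                    # min, lower hinge, median, upper hinge, max
# assumption: outliers lie beyond 1.5 times the hinge spread, as in boxplot()
outliers <- boxplot.stats(x)$out

round(c(mean = mean_x, sd = sd_x, cv = cv, mad = mad_x, iqr = iqr, cqv = cqv), 2)
five
c(total = length(outliers), unique = length(unique(outliers)))
```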
For example, the frequency table above quickly shows that the median age of these patients is 74.
Frequencies of factors can be sorted on factor level instead of item count with the `sort.count` parameter.
Default behaviour:
Frequency table of hospital_id
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | D | 762 | 38.1% | 762 | 38.1% |
2 | B | 663 | 33.2% | 1,425 | 71.3% |
3 | A | 321 | 16.1% | 1,746 | 87.3% |
4 | C | 254 | 12.7% | 2,000 | 100.0% |
Sorting on factor level instead of count (`sort.count = FALSE`):
Frequency table of hospital_id
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | A | 321 | 16.1% | 321 | 16.1% |
2 | B | 663 | 33.2% | 984 | 49.2% |
3 | C | 254 | 12.7% | 1,238 | 61.9% |
4 | D | 762 | 38.1% | 2,000 | 100.0% |
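A plain `table()` in base R can reproduce both orderings, which helps clarify the difference. This sketch rebuilds the factor from the counts in the tables above. Sorting on count uses `sort(..., decreasing = TRUE)`, while `table()` on a factor already follows the order of its levels.

```r
# rebuild hospital_id from the counts shown above (levels A, B, C, D)
hospital_id <- factor(rep(c("A", "B", "C", "D"), times = c(321, 663, 254, 762)))
counts <- table(hospital_id)

by_count <- sort(counts, decreasing = TRUE)  # highest count first
by_level <- counts[levels(hospital_id)]      # order of the factor levels

names(by_count)               # "D" "B" "A" "C"
names(by_level)               # "A" "B" "C" "D"
cumsum(as.integer(by_level))  # 321 984 1238 2000
```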
All classes are printed in the header. Variables with the new `rsi` class of this AMR package are actually ordered factors and have three classes (see `Class` in the header):
Frequency table of amox
Class: factor > ordered > rsi (numeric)
Levels: S < I < R
Length: 2,000 (of which NA: 828 = 41.40%)
Unique: 3
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | R | 683 | 58.3% | 683 | 58.3% |
2 | S | 486 | 41.5% | 1,169 | 99.7% |
3 | I | 3 | 0.3% | 1,172 | 100.0% |
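In base R terms, an ordered factor with the levels S < I < R can carry an extra class on top of `ordered` and `factor`, which is what the `Class` line reflects. This sketch uses a made-up class name, `my_rsi`, and does not reproduce AMR's own `rsi` class.

```r
# ordered factor with the interpretation levels S < I < R
amox <- factor(c("R", "S", "R", NA, "I", "S"),
               levels = c("S", "I", "R"), ordered = TRUE)

# prepend a hypothetical extra class; "my_rsi" is not AMR's rsi class
class(amox) <- c("my_rsi", class(amox))
class(amox)         # "my_rsi" "ordered" "factor"

# it still behaves as an ordered factor
is.ordered(amox)    # TRUE
amox[1] > amox[2]   # TRUE, because R comes after S
table(amox)         # counts in level order S, I, R (NA left out)
sum(is.na(amox))    # 1
```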
Frequencies of dates show the oldest and newest date in the data, the number of days between them, and the median date:
Frequency table of date
Class: Date (numeric)
Length: 2,000 (of which NA: 0 = 0.00%)
Unique: 1,140
Oldest: 2 January 2002
Newest: 28 December 2017 (+5,839)
Median: 31 July 2009 (~47%)
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | 2016-05-21 | 10 | 0.5% | 10 | 0.5% |
2 | 2004-11-15 | 8 | 0.4% | 18 | 0.9% |
3 | 2013-07-29 | 8 | 0.4% | 26 | 1.3% |
4 | 2017-06-12 | 8 | 0.4% | 34 | 1.7% |
5 | 2015-11-19 | 7 | 0.4% | 41 | 2.1% |
(omitted 1,135 entries, n = 1,959 [98.0%])
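Base R `Date` arithmetic is enough to sketch the date summary in this header. The toy input uses only the three dates printed above. That is enough to reproduce the span of 5,839 days and a median lying about 47% of the way from the oldest to the newest date. Reading the percentage as this relative position is an interpretation that matches these numbers; the package does not state it.

```r
# toy input: the oldest, median and newest dates from the header above
dates <- as.Date(c("2002-01-02", "2009-07-31", "2017-12-28", NA))
dates <- dates[!is.na(dates)]

oldest <- min(dates)
newest <- max(dates)
span   <- as.numeric(difftime(newest, oldest, units = "days"))  # 5839
med    <- median(dates)                                         # "2009-07-31"

# position of the median between oldest (0) and newest (1)
pos <- as.numeric(difftime(med, oldest, units = "days")) / span
round(100 * pos)                                                # 47
```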
A frequency table is actually a regular `data.frame`, except that it carries an additional class. Because of this additional class, a frequency table prints like the examples above, but the object itself contains the complete table without a row limitation.
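A base R sketch shows the effect of such an additional class: prepend a class to a `data.frame` and give it its own `print()` method. The class name `my_freq` and the row limit are hypothetical and do not reproduce AMR's actual print method. The printed output is shortened while the object keeps all rows.

```r
tbl <- data.frame(item  = c("D", "B", "A", "C"),
                  count = c(762L, 663L, 321L, 254L))
class(tbl) <- c("my_freq", class(tbl))   # hypothetical extra class

# print method for the extra class: show only the first n rows
print.my_freq <- function(x, n = 2, ...) {
  df <- as.data.frame(x)                 # drops "my_freq" again
  print(head(df, n), ...)
  if (nrow(df) > n) cat("(omitted", nrow(df) - n, "entries)\n")
  invisible(x)
}

print(tbl)          # first 2 rows plus "(omitted 2 entries)"
is.data.frame(tbl)  # TRUE
nrow(tbl)           # 4: the object still holds the complete table
```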
na.rm
Setting the `na.rm` parameter to `FALSE` (it defaults to `TRUE`) includes `NA` values in the frequency table. The number of `NA` values is always shown in the header, regardless of this setting:
```r
library(AMR)
library(dplyr)
septic_patients %>%
  freq(amox, na.rm = FALSE)
```
Warning: Factor `item` contains implicit NA, consider using
`forcats::fct_explicit_na`
Frequency table of amox
| | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | NA | 828 | 41.4% | 828 | 41.4% |
2 | R | 683 | 34.2% | 1,511 | 75.6% |
3 | S | 486 | 24.3% | 1,997 | 99.9% |
4 | I | 3 | 0.2% | 2,000 | 100.0% |
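Base R's `table()` makes the same distinction, which may help when reading this output. In this sketch with made-up values, `useNA = "ifany"` adds a row for `NA`. `addNA()` turns `NA` into an explicit factor level, comparable to what the warning suggests with `forcats::fct_explicit_na`.

```r
# made-up results with two missing values
amox <- factor(c("R", NA, "S", "R", NA, "I"), levels = c("S", "I", "R"))

table(amox)                   # S, I and R only: NA values are left out
table(amox, useNA = "ifany")  # an extra <NA> row with count 2
sum(is.na(amox))              # 2 missing values

explicit <- addNA(amox)       # NA becomes a regular factor level
levels(explicit)              # "S" "I" "R" NA
```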
row.names
By default, frequency tables show row indices. To remove them, use `row.names = FALSE`:
Frequency table of hospital_id
Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|
D | 762 | 38.1% | 762 | 38.1% |
B | 663 | 33.2% | 1,425 | 71.3% |
A | 321 | 16.1% | 1,746 | 87.3% |
C | 254 | 12.7% | 2,000 | 100.0% |
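For a plain `data.frame`, base R provides the same switch through the `row.names` argument of `print()`, as this small sketch with the hospital counts shows. It is only an analogy and does not describe how `freq` handles its own `row.names` parameter internally.

```r
tbl <- data.frame(item  = c("D", "B", "A", "C"),
                  count = c(762L, 663L, 321L, 254L))

print(tbl)                     # first column holds the row indices 1 to 4
print(tbl, row.names = FALSE)  # the same table without row indices
```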
AMR, (c) 2018, https://msberends.gitlab.io/AMR, https://gitlab.com/msberends/AMR
Licensed under the GNU General Public License v2.0.