Introduction

Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the freq function, you can create univariate frequency tables. Multiple variables are pasted into one variable, which forces a univariate distribution. We take the septic_patients dataset (included in this AMR package) as an example.

Frequencies of one variable

To quickly review the content of one variable, you can select that variable in various ways. Let's say we want to get the frequencies of the gender variable of the septic_patients dataset:

septic_patients %>% freq(gender)

Frequency table of gender

Item Count Percent Cum. Count Cum. Percent
1 M 1,031 51.6% 1,031 51.6%
2 F 969 48.5% 2,000 100.0%

This immediately shows the class of the variable, its length and availability (i.e. the number of NA values), the number of unique values and (most importantly) that men are more prevalent than women among septic patients.
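The columns of such a table can be reproduced by hand. The following is a minimal base R sketch, not the implementation of freq; the toy vector gender and the column names are made up for illustration.

```r
# Toy data (hypothetical, not taken from septic_patients)
gender <- c("M", "F", "M", "M", "F", "M", "F", "M")

# Count each value and sort by descending count
counts <- sort(table(gender), decreasing = TRUE)

# Build the table columns one by one
freq_tbl <- data.frame(
  item  = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)
freq_tbl$percent     <- freq_tbl$count / sum(freq_tbl$count)
freq_tbl$cum_count   <- cumsum(freq_tbl$count)
freq_tbl$cum_percent <- cumsum(freq_tbl$percent)

print(freq_tbl)
```

The cumulative columns are running totals of the count and percent columns, so the last row always ends at the sample size and at 100%.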

Frequencies of more than one variable

Multiple variables will be pasted into one variable to review individual cases, keeping a univariate frequency table.
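What "pasting into one variable" means can be sketched in base R. This is an illustration under assumptions, not the code of freq: the toy vectors are hypothetical, and the single space as separator is an assumption.

```r
# Toy data (hypothetical values for illustration)
genus   <- c("Escherichia", "Staphylococcus", "Escherichia", "Staphylococcus")
species <- c("coli", "aureus", "coli", "epidermidis")

# Paste the variables row by row into one combined variable
# (assumption: values are joined with a single space)
combined <- paste(genus, species)

# The combined variable can now be counted like any single variable
counts <- sort(table(combined), decreasing = TRUE)
print(counts)
```

Each row keeps its own combination of values, so the result is still a univariate table of individual cases rather than a cross-tabulation.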

For illustration, we could add some more variables to the septic_patients dataset to learn about bacterial properties:

Now all variables of the microorganisms dataset have been joined to the septic_patients dataset, and the result is available as my_patients.

If we compare the dimensions of the old and new datasets, we can see that 14 variables were added:

dim(septic_patients)
[1] 2000   49
dim(my_patients)
[1] 2000   63

So now the genus and species variables are available. A frequency table of these combined variables can be created like this:

my_patients %>% freq(genus, species)

Frequency table of genus and species

Item Count Percent Cum. Count Cum. Percent
1 Escherichia coli 467 23.4% 467 23.4%
2 Staphylococcus coagulase negative 313 15.7% 780 39.0%
3 Staphylococcus aureus 235 11.8% 1,015 50.7%
4 Staphylococcus epidermidis 174 8.7% 1,189 59.5%
5 Streptococcus pneumoniae 117 5.9% 1,306 65.3%
6 Staphylococcus hominis 81 4.1% 1,387 69.4%
7 Klebsiella pneumoniae 58 2.9% 1,445 72.3%
8 Enterococcus faecalis 39 2.0% 1,484 74.2%
9 Proteus mirabilis 36 1.8% 1,520 76.0%
10 Pseudomonas aeruginosa 30 1.5% 1,550 77.5%
11 Serratia marcescens 25 1.3% 1,575 78.8%
12 Enterobacter cloacae 23 1.2% 1,598 79.9%
13 Enterococcus faecium 21 1.1% 1,619 81.0%
14 Staphylococcus capitis 21 1.1% 1,640 82.0%
15 Bacteroides fragilis 20 1.0% 1,660 83.0%
16 Enterococcus species 20 1.0% 1,680 84.0%
17 Streptococcus group B 18 0.9% 1,698 84.9%
18 Klebsiella oxytoca 16 0.8% 1,714 85.7%
19 Streptococcus pyogenes 16 0.8% 1,730 86.5%
20 Streptococcus dysgalactiae 14 0.7% 1,744 87.2%
21 Streptococcus group A 13 0.7% 1,757 87.9%
22 Streptococcus mitis 13 0.7% 1,770 88.5%
23 Streptococcus salivarius 12 0.6% 1,782 89.1%
24 Streptococcus agalactiae 11 0.6% 1,793 89.7%
25 Streptococcus species 11 0.6% 1,804 90.2%
26 Corynebacterium species 10 0.5% 1,814 90.7%
27 Streptococcus bovis 10 0.5% 1,824 91.2%
28 Clostridium difficile 9 0.5% 1,833 91.7%
29 Haemophilus influenzae 8 0.4% 1,841 92.1%
30 Candida albicans 7 0.4% 1,848 92.4%
31 Staphylococcus haemolyticus 7 0.4% 1,855 92.8%
32 Streptococcus constellatus 7 0.4% 1,862 93.1%
33 Candida glabrata 6 0.3% 1,868 93.4%
34 Citrobacter freundii 6 0.3% 1,874 93.7%
35 Corynebacterium striatum 6 0.3% 1,880 94.0%
36 Morganella morganii 6 0.3% 1,886 94.3%
37 Streptococcus anginosus 6 0.3% 1,892 94.6%
38 Streptococcus oralis 6 0.3% 1,898 94.9%
39 Acinetobacter baumannii 3 0.2% 1,901 95.1%
40 Acinetobacter species 3 0.2% 1,904 95.2%
41 Citrobacter koseri 3 0.2% 1,907 95.4%
42 Clostridium perfringens 3 0.2% 1,910 95.5%
43 Clostridium septicum 3 0.2% 1,913 95.7%
44 Enterobacter aerogenes 3 0.2% 1,916 95.8%
45 Gemella haemolysans 3 0.2% 1,919 96.0%
46 Micrococcus luteus 3 0.2% 1,922 96.1%
47 Micrococcus species 3 0.2% 1,925 96.3%
48 Salmonella enterica 3 0.2% 1,928 96.4%
49 Staphylococcus warneri 3 0.2% 1,931 96.6%
50 Streptococcus equi 3 0.2% 1,934 96.7%
51 Streptococcus group C 3 0.2% 1,937 96.9%
52 Streptococcus group G 3 0.2% 1,940 97.0%
53 Streptococcus intermedius 3 0.2% 1,943 97.2%
54 Streptococcus parasanguinis 3 0.2% 1,946 97.3%
55 Aerococcus urinae 2 0.1% 1,948 97.4%
56 Candida tropicalis 2 0.1% 1,950 97.5%
57 Citrobacter species 2 0.1% 1,952 97.6%
58 Enterococcus avium 2 0.1% 1,954 97.7%
59 Pantoea agglomerans 2 0.1% 1,956 97.8%
60 Pantoea species 2 0.1% 1,958 97.9%
61 Proteus vulgaris 2 0.1% 1,960 98.0%
62 Staphylococcus cohnii 2 0.1% 1,962 98.1%
63 Staphylococcus lugdunensis 2 0.1% 1,964 98.2%
64 Staphylococcus schleiferi 2 0.1% 1,966 98.3%
65 Stenotrophomonas maltophilia 2 0.1% 1,968 98.4%
66 Streptococcus mutans 2 0.1% 1,970 98.5%
67 Actinomyces odontolyticus 1 0.1% 1,971 98.6%
68 Campylobacter jejuni 1 0.1% 1,972 98.6%
69 Candida lusitaniae 1 0.1% 1,973 98.7%
70 Clostridium novyi 1 0.1% 1,974 98.7%
71 Corynebacterium tuberculostearicum 1 0.1% 1,975 98.8%
72 Dermabacter hominis 1 0.1% 1,976 98.8%
73 Eikenella corrodens 1 0.1% 1,977 98.9%
74 Enterococcus casseliflavus 1 0.1% 1,978 98.9%
75 Escherichia vulneris 1 0.1% 1,979 99.0%
76 Fusobacterium species 1 0.1% 1,980 99.0%
77 Globicatella sanguinis 1 0.1% 1,981 99.1%
78 Granulicatella adiacens 1 0.1% 1,982 99.1%
79 Haemophilus parainfluenzae 1 0.1% 1,983 99.2%
80 Hafnia alvei 1 0.1% 1,984 99.2%
81 Lactobacillus delbrueckii 1 0.1% 1,985 99.3%
82 Leuconostoc species 1 0.1% 1,986 99.3%
83 Listeria monocytogenes 1 0.1% 1,987 99.4%
84 Neisseria meningitidis 1 0.1% 1,988 99.4%
85 Neisseria sicca 1 0.1% 1,989 99.5%
86 Paenibacillus durus 1 0.1% 1,990 99.5%
87 Propionibacterium acnes 1 0.1% 1,991 99.6%
88 Proteus penneri 1 0.1% 1,992 99.6%
89 Rothia mucilaginosa 1 0.1% 1,993 99.7%
90 Sphingobacterium spiritivorum 1 0.1% 1,994 99.7%
91 Sphingomonas paucimobilis 1 0.1% 1,995 99.8%
92 Streptococcus equinus 1 0.1% 1,996 99.8%
93 Streptococcus gordonii 1 0.1% 1,997 99.9%
94 Streptococcus infantarius 1 0.1% 1,998 99.9%
95 Streptococcus sanguinis 1 0.1% 1,999 100.0%
96 Veillonella parvula 1 0.1% 2,000 100.0%

Frequencies of numeric values

Frequency tables can be created from any input.

For numeric values (like integers, doubles, etc.), additional descriptive statistics are calculated and shown in the header. When creating frequency tables automatically (like here in markdown), add header = TRUE to also show the header in markdown reports:

Frequency table of age

Class: numeric

Length: 981 (of which NA: 0 = 0.00%)

Unique: 73

Mean: 71

Std. dev.: 14 (CV: 0.2, MAD: 13)

Five-Num: 14 | 63 | 74 | 82 | 97 (IQR: 19, CQV: 0.13)

Outliers: 15 (unique count: 12)

Item Count Percent Cum. Count Cum. Percent
1 83 44 4.5% 44 4.5%
2 76 43 4.4% 87 8.9%
3 75 37 3.8% 124 12.6%
4 82 33 3.4% 157 16.0%
5 78 32 3.3% 189 19.3%

(omitted 68 entries, n = 792 [80.7%])

The following properties are determined; NA values are always ignored:

  • Mean

  • Standard deviation

  • Coefficient of variation (CV), the standard deviation divided by the mean

  • Five numbers of Tukey (min, Q1, median, Q3, max)

  • Coefficient of quartile variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1), where the quartiles are taken from quantile with type = 6 as the quantile algorithm, to comply with SPSS standards

  • Outliers (total count and unique count)
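The properties listed above can be approximated in base R. This is a sketch under stated assumptions rather than the code of freq: the toy ages are hypothetical, and the outlier rule (that of grDevices::boxplot.stats) is an assumption, since the text does not say how outliers are defined.

```r
# Toy data (hypothetical ages); NA values are dropped first
age <- c(14, 63, 70, 74, 76, 82, 97, NA)
x <- age[!is.na(age)]

m  <- mean(x)
s  <- sd(x)
cv <- s / m  # coefficient of variation: standard deviation divided by mean

# Quartiles with type = 6, the algorithm named in the list above
q    <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 6, names = FALSE)
five <- c(min(x), q, max(x))  # min, Q1, median, Q3, max

# Coefficient of quartile variation: (Q3 - Q1) / (Q3 + Q1)
cqv <- (q[3] - q[1]) / (q[3] + q[1])

# Assumption: outliers follow the boxplot rule of boxplot.stats()
out <- grDevices::boxplot.stats(x)$out

print(five)
print(cqv)
```

With these toy values the quartiles are 63, 74 and 82, so the CQV is 19 / 145, or about 0.13.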

For example, the header of the frequency table above quickly shows that the median age of patients is 74.

Frequencies of factors

Frequencies of factors can be sorted on factor level instead of item count with the sort.count parameter.

Default behaviour (sorting on count):

septic_patients %>%
  freq(hospital_id)

Frequency table of hospital_id

Item Count Percent Cum. Count Cum. Percent
1 D 762 38.1% 762 38.1%
2 B 663 33.2% 1,425 71.3%
3 A 321 16.1% 1,746 87.3%
4 C 254 12.7% 2,000 100.0%

Sorting on item instead of count:

septic_patients %>%
  freq(hospital_id, sort.count = FALSE)

Frequency table of hospital_id

Item Count Percent Cum. Count Cum. Percent
1 A 321 16.1% 321 16.1%
2 B 663 33.2% 984 49.2%
3 C 254 12.7% 1,238 61.9%
4 D 762 38.1% 2,000 100.0%
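The difference between the two orderings can be sketched in base R. The toy factor below is hypothetical, and the snippet only mirrors the idea behind sort.count, not its implementation.

```r
# Toy factor (hypothetical data) with levels in the order A, B, C
hospital <- factor(c("B", "C", "B", "A", "C", "B"), levels = c("A", "B", "C"))

# table() returns the counts in factor level order
counts <- table(hospital)

by_level <- counts                           # level order: A, B, C
by_count <- sort(counts, decreasing = TRUE)  # most frequent item first

print(by_level)
print(by_count)
```

Sorting on level keeps related categories next to each other, while sorting on count puts the most common items at the top.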

All classes are printed in the header. Variables with the new rsi class of this AMR package are actually ordered factors and have three classes (look at Class in the header):

septic_patients %>%
  freq(amox, header = TRUE)

Frequency table of amox

Class: factor > ordered > rsi (numeric)

Levels: S < I < R

Length: 2,000 (of which NA: 828 = 41.40%)

Unique: 3

Item Count Percent Cum. Count Cum. Percent
1 R 683 58.3% 683 58.3%
2 S 486 41.5% 1,169 99.7%
3 I 3 0.3% 1,172 100.0%

Frequencies of dates

Frequencies of dates will show the oldest and newest date in the data, and the number of days between them:

septic_patients %>%
  freq(date, nmax = 5, header = TRUE)

Frequency table of date

Class: Date (numeric)

Length: 2,000 (of which NA: 0 = 0.00%)

Unique: 1,140

Oldest: 2 January 2002

Newest: 28 December 2017 (+5,839)

Median: 31 July 2009 (~47%)

Item Count Percent Cum. Count Cum. Percent
1 2016-05-21 10 0.5% 10 0.5%
2 2004-11-15 8 0.4% 18 0.9%
3 2013-07-29 8 0.4% 26 1.3%
4 2017-06-12 8 0.4% 34 1.7%
5 2015-11-19 7 0.4% 41 2.1%

(omitted 1,135 entries, n = 1,959 [98.0%])
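The date properties in the header can be computed with base R Date arithmetic. This is an illustrative sketch with three hypothetical dates, not the code of freq.

```r
# Toy dates (hypothetical; only three values for illustration)
dates <- as.Date(c("2002-01-02", "2009-07-31", "2017-12-28"))

oldest <- min(dates)
newest <- max(dates)

# Subtracting two Date objects gives the difference in days
days_between <- as.integer(newest - oldest)

# The median of a Date vector is itself a Date
median_date <- median(dates)

print(days_between)
print(median_date)
```

Printing a Date with full month names via format() follows the locale of the R session, which is why month names can appear in another language.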

Assigning a frequency table to an object

A frequency table is actually a regular data.frame, with the exception that it carries an additional class.

Because of this additional class, a frequency table prints like the examples above. The object itself (here called my_df) still contains the complete table without a row limitation:

dim(my_df)
[1] 74  5
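How an additional class can change printing while the data stay complete can be sketched with base R S3 classes. The class name my_freq, the nmax argument of the print method and the toy table are hypothetical; the class that freq actually uses may differ.

```r
# Build a plain data.frame and put an extra class in front of "data.frame".
# The class name "my_freq" is hypothetical and used only for this sketch.
tbl <- data.frame(item = letters[1:20], count = 20:1, stringsAsFactors = FALSE)
class(tbl) <- c("my_freq", class(tbl))

# Print method for the extra class: show only the first nmax rows
print.my_freq <- function(x, nmax = 5, ...) {
  shown <- utils::head(as.data.frame(x), nmax)
  print(shown, ...)
  omitted <- nrow(x) - nrow(shown)
  if (omitted > 0) cat("(omitted", omitted, "entries)\n")
  invisible(x)
}

print(tbl)  # prints 5 rows and a note about the omitted entries
dim(tbl)    # the object itself still holds all 20 rows
```

Because the object still inherits from data.frame, functions such as dim, nrow or subsetting keep working on the full table.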

Additional parameters

Parameter na.rm

The na.rm parameter defaults to TRUE, which leaves NA values out of the table rows (their count is always shown in the header). Set na.rm = FALSE to include NA values in the frequency table:

septic_patients %>%
  freq(amox, na.rm = FALSE)

Frequency table of amox

Item Count Percent Cum. Count Cum. Percent
1 <NA> 828 41.4% 828 41.4%
2 R 683 34.2% 1,511 75.6%
3 S 486 24.3% 1,997 99.9%
4 I 3 0.2% 2,000 100.0%
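The effect of including NA values can be sketched with base R's table function, which has a comparable switch (useNA). The toy vector is hypothetical, and this is an analogy rather than the code behind na.rm.

```r
# Toy data (hypothetical) with missing values
amox <- c("R", "S", NA, "R", NA, NA, "I")

# By default, table() drops NA values
without_na <- table(amox)

# useNA = "always" adds a separate count for NA
with_na <- table(amox, useNA = "always")

# Percentages change because the denominator now includes the NA values
print(prop.table(without_na))
print(prop.table(with_na))
```

This is why the percentage of R drops from 58.3% to 34.2% in the tables above once the 828 NA values are counted as well.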

Parameter row.names

By default, frequency tables show row indices. To remove them, use row.names = FALSE:

septic_patients %>%
  freq(hospital_id, row.names = FALSE)

Frequency table of hospital_id

Item Count Percent Cum. Count Cum. Percent
D 762 38.1% 762 38.1%
B 663 33.2% 1,425 71.3%
A 321 16.1% 1,746 87.3%
C 254 12.7% 2,000 100.0%

AMR, (c) 2018, https://msberends.gitlab.io/AMR, https://gitlab.com/msberends/AMR

Licensed under the GNU General Public License v2.0.