styled, unit test fix

2025-08-24 13:12:09 +02:00 · 2022-08-28 10:31:50 +02:00
parent 4cb1db4554
commit 4d050aef7c
147 changed files with 10897 additions and 8169 deletions
--- a/vignettes/AMR.Rmd
+++ b/vignettes/AMR.Rmd
@@ -48,13 +48,16 @@ For this tutorial, we will create fake demonstration data to work with.
 You can skip to [Cleaning the data](#cleaning-the-data) if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:

 ```{r example table, echo = FALSE, results = 'asis'}
-knitr::kable(data.frame(date = Sys.Date(),
-                        patient_id = c("abcd", "abcd", "efgh"),
-                        mo = "Escherichia coli", 
-                        AMX = c("S", "S", "R"),
-                        CIP = c("S", "R", "S"),
-                        stringsAsFactors = FALSE), 
-             align = "c")
+knitr::kable(data.frame(
+  date = Sys.Date(),
+  patient_id = c("abcd", "abcd", "efgh"),
+  mo = "Escherichia coli",
+  AMX = c("S", "S", "R"),
+  CIP = c("S", "R", "S"),
+  stringsAsFactors = FALSE
+),
+align = "c"
+)
 ``` 

 ## Needed R packages
@@ -87,9 +90,13 @@ patients <- unlist(lapply(LETTERS, paste0, 1:10))
 The `LETTERS` object is built into R and is a vector of the 26 uppercase letters, `A` to `Z`. The `patients` object we just created is now a vector of length `r length(patients)`, with values (patient IDs) varying from ``r patients[1]`` to ``r patients[length(patients)]``. Now we also set the gender of our patients by putting the ID and the gender in a table:

 ```{r create gender}
-patients_table <- data.frame(patient_id = patients,
-                             gender = c(rep("M", 135),
-                                        rep("F", 125)))
+patients_table <- data.frame(
+  patient_id = patients,
+  gender = c(
+    rep("M", 135),
+    rep("F", 125)
+  )
+)
 ```

 The first 135 patient IDs are now male, the other 125 are female.
@@ -107,8 +114,10 @@ This `dates` object now contains all days in our date range.
 For this tutorial, we will use four different microorganisms: *Escherichia coli*, *Staphylococcus aureus*, *Streptococcus pneumoniae*, and *Klebsiella pneumoniae*:

 ```{r mo}
-bacteria <- c("Escherichia coli", "Staphylococcus aureus",
-              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
+bacteria <- c(
+  "Escherichia coli", "Staphylococcus aureus",
+  "Streptococcus pneumoniae", "Klebsiella pneumoniae"
+)
 ```

 ## Put everything together
@@ -117,20 +126,27 @@ Using the `sample()` function, we can randomly select items from all objects we

 ```{r merge data}
 sample_size <- 20000
-data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
-                   patient_id = sample(patients, size = sample_size, replace = TRUE),
-                   hospital = sample(c("Hospital A",
-                                       "Hospital B",
-                                       "Hospital C",
-                                       "Hospital D"),
-                                     size = sample_size, replace = TRUE,
-                                     prob = c(0.30, 0.35, 0.15, 0.20)),
-                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
-                                     prob = c(0.50, 0.25, 0.15, 0.10)),
-                   AMX = random_rsi(sample_size, prob_RSI = c(0.35, 0.60, 0.05)),
-                   AMC = random_rsi(sample_size, prob_RSI = c(0.15, 0.75, 0.10)),
-                   CIP = random_rsi(sample_size, prob_RSI = c(0.20, 0.80, 0.00)),
-                   GEN = random_rsi(sample_size, prob_RSI = c(0.08, 0.92, 0.00)))
+data <- data.frame(
+  date = sample(dates, size = sample_size, replace = TRUE),
+  patient_id = sample(patients, size = sample_size, replace = TRUE),
+  hospital = sample(c(
+    "Hospital A",
+    "Hospital B",
+    "Hospital C",
+    "Hospital D"
+  ),
+  size = sample_size, replace = TRUE,
+  prob = c(0.30, 0.35, 0.15, 0.20)
+  ),
+  bacteria = sample(bacteria,
+    size = sample_size, replace = TRUE,
+    prob = c(0.50, 0.25, 0.15, 0.10)
+  ),
+  AMX = random_rsi(sample_size, prob_RSI = c(0.35, 0.60, 0.05)),
+  AMC = random_rsi(sample_size, prob_RSI = c(0.15, 0.75, 0.10)),
+  CIP = random_rsi(sample_size, prob_RSI = c(0.20, 0.80, 0.00)),
+  GEN = random_rsi(sample_size, prob_RSI = c(0.08, 0.92, 0.00))
+)
 ```

 Using the `left_join()` function from the `dplyr` package, we can 'map' the gender to the patient ID using the `patients_table` object we created earlier:
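For readers who do not use `dplyr`, the same mapping can be sketched in base R with `merge()`. This is a minimal illustration on toy tables whose IDs and values are made up, not the tutorial's own code; `all.x = TRUE` keeps every isolate, including isolates without a matching patient, as `left_join()` does.

```r
# Hypothetical toy tables (not the tutorial's generated data)
patients_toy <- data.frame(
  patient_id = c("A1", "A2", "B1"),
  gender = c("M", "M", "F")
)
isolates_small <- data.frame(
  patient_id = c("B1", "A1", "A1", "C9"),
  AMX = c("S", "R", "S", "S")
)

# Base R counterpart of left_join(isolates_small, patients_toy, by = "patient_id"):
# all.x = TRUE keeps unmatched isolates, whose gender becomes NA
mapped <- merge(isolates_small, patients_toy, by = "patient_id", all.x = TRUE)
mapped
```

Unlike `left_join()`, `merge()` sorts the result by the key column, so the original row order is not preserved.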
@@ -192,10 +208,12 @@ data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
 Now that we have the microbial ID, we can add some taxonomic properties:

 ```{r new taxo}
-data <- data %>% 
-  mutate(gramstain = mo_gramstain(bacteria),
-         genus = mo_genus(bacteria),
-         species = mo_species(bacteria))
+data <- data %>%
+  mutate(
+    gramstain = mo_gramstain(bacteria),
+    genus = mo_genus(bacteria),
+    species = mo_species(bacteria)
+  )
 ```

 ## First isolates
@@ -213,21 +231,21 @@ This `AMR` package includes this methodology with the `first_isolate()` function
 The outcome of the function can easily be added to our data:

 ```{r 1st isolate}
-data <- data %>% 
+data <- data %>%
  mutate(first = first_isolate(info = TRUE))
 ```

 So only `r percentage(sum(data$first) / nrow(data))` of the isolates is suitable for resistance analysis! We can now filter on it with the `filter()` function, also from the `dplyr` package:

 ```{r 1st isolate filter}
-data_1st <- data %>% 
+data_1st <- data %>%
  filter(first == TRUE)
 ```

 For future use, the two steps above can be combined into a single step:

 ```{r 1st isolate filter 2}
-data_1st <- data %>% 
+data_1st <- data %>%
  filter_first_isolate()
 ```
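To make the idea of a "first isolate" more concrete, here is a simplified base R sketch. It assumes a fixed episode length of 365 days per patient and bacterium: an isolate counts as "first" when it is the earliest one for that combination, or when it lies more than 365 days after the start of the current episode. The function name `first_isolate_sketch()` is hypothetical; the package's `first_isolate()` supports more options than this sketch.

```r
# Hypothetical, simplified first-isolate flag (not AMR's first_isolate()):
# per patient and bacterium, an isolate is "first" when it opens a new episode,
# i.e. it is the earliest one or lies more than `episode_days` after the
# start of the current episode.
first_isolate_sketch <- function(df, episode_days = 365) {
  ord <- order(df$patient_id, df$bacteria, df$date)
  sorted <- df[ord, ]
  first <- logical(nrow(sorted))
  prev_key <- NA_character_
  episode_start <- as.Date(NA)
  for (i in seq_len(nrow(sorted))) {
    key <- paste(sorted$patient_id[i], sorted$bacteria[i], sep = "|")
    new_key <- is.na(prev_key) || key != prev_key
    if (new_key || as.numeric(sorted$date[i] - episode_start) > episode_days) {
      first[i] <- TRUE
      episode_start <- sorted$date[i]
    }
    prev_key <- key
  }
  # map the flags back to the original row order
  result <- logical(nrow(df))
  result[ord] <- first
  result
}

isolates_toy <- data.frame(
  patient_id = c("p1", "p2", "p1", "p1"),
  bacteria = "E. coli",
  date = as.Date(c("2020-06-01", "2020-02-01", "2020-01-01", "2021-03-01"))
)
isolates_toy$first <- first_isolate_sketch(isolates_toy)
isolates_toy[isolates_toy$first, ] # the rows that would be kept for analysis
```

Here the isolate of patient `p1` from June 2020 is dropped, because it falls within 365 days of that patient's January 2020 isolate; the March 2021 isolate opens a new episode.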

@@ -261,7 +279,7 @@ Or can be used like the `dplyr` way, which is more readable:
 data_1st %>% freq(genus, species)
 ```
 ```{r freq 2b, results = 'asis', echo = FALSE}
-data_1st %>% 
+data_1st %>%
  freq(genus, species, header = TRUE)
 ```

@@ -270,45 +288,48 @@ data_1st %>%
 Using [tidyverse selections](https://tidyselect.r-lib.org/reference/language.html), you can also select or filter columns based on the antibiotic class they are in:

 ```{r bug_drg 2a, eval = FALSE}
-data_1st %>% 
+data_1st %>%
  filter(any(aminoglycosides() == "R"))
 ```

 ```{r bug_drg 2b, echo = FALSE, results = 'asis'}
-knitr::kable(data_1st %>% 
-               filter(any(aminoglycosides() == "R")) %>% 
-               head(),
-             align = "c")
+knitr::kable(data_1st %>%
+  filter(any(aminoglycosides() == "R")) %>%
+  head(),
+align = "c"
+)
 ```

 If you want a quick glance at the number of isolates in different bug/drug combinations, you can use the `bug_drug_combinations()` function:

 ```{r bug_drg 1a, eval = FALSE}
-data_1st %>% 
-  bug_drug_combinations() %>% 
+data_1st %>%
+  bug_drug_combinations() %>%
  head() # show first 6 rows
 ```

 ```{r bug_drg 1b, echo = FALSE, results = 'asis'}
-knitr::kable(data_1st %>% 
-               bug_drug_combinations() %>% 
-               head(),
-             align = "c")
+knitr::kable(data_1st %>%
+  bug_drug_combinations() %>%
+  head(),
+align = "c"
+)
 ```


 ```{r bug_drg 3a, eval = FALSE}
-data_1st %>% 
-  select(bacteria, aminoglycosides()) %>% 
+data_1st %>%
+  select(bacteria, aminoglycosides()) %>%
  bug_drug_combinations()
 ```


 ```{r bug_drg 3b, echo = FALSE, results = 'asis'}
-knitr::kable(data_1st %>% 
-               select(bacteria, aminoglycosides()) %>% 
-               bug_drug_combinations(),
-             align = "c")
+knitr::kable(data_1st %>%
+  select(bacteria, aminoglycosides()) %>%
+  bug_drug_combinations(),
+align = "c"
+)
 ```

 This will only give you the crude numbers in the data. To calculate antimicrobial resistance in a more sensible way, which also accounts for groups with too few results, we use the `resistance()` and `susceptibility()` functions.
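As a rough mental model, a resistance proportion can be sketched as the share of R among isolates that have a result, returning `NA` when a group has fewer isolates than some minimum. The threshold of 30 below is an assumption modeled on the package's default; check `?resistance` for the actual arguments and behaviour. Both function names are hypothetical.

```r
# Hypothetical sketch, not the AMR implementation of resistance() or n_rsi()
resistance_sketch <- function(x, minimum = 30) {
  tested <- x[!is.na(x)] # only isolates with a result (S, I or R)
  if (length(tested) < minimum) {
    return(NA_real_) # too few isolates for a reliable estimate
  }
  sum(tested == "R") / length(tested) # only R counts as resistant
}

# counts the isolates with an available result, similar in spirit to n_rsi()
n_tested_sketch <- function(x) sum(!is.na(x))

amx <- c(rep("R", 12), rep("S", 26), rep("I", 2), NA, NA)
resistance_sketch(amx) # 12 / 40 = 0.3
n_tested_sketch(amx) # 40
resistance_sketch(amx[1:20]) # only 20 tested, so NA
```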
@@ -328,86 +349,98 @@ data_1st %>% resistance(AMX)
 Or can be used in conjunction with `group_by()` and `summarise()`, both from the `dplyr` package:

 ```{r, eval = FALSE}
-data_1st %>% 
-  group_by(hospital) %>% 
+data_1st %>%
+  group_by(hospital) %>%
  summarise(amoxicillin = resistance(AMX))
 ```
 ```{r, echo = FALSE}
-data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = resistance(AMX)) %>% 
+data_1st %>%
+  group_by(hospital) %>%
+  summarise(amoxicillin = resistance(AMX)) %>%
  knitr::kable(align = "c", big.mark = ",")
 ```

 Of course, it would be very convenient to know the number of isolates behind the percentages. For that purpose, use `n_rsi()`, which works inside `summarise()` just like `n_distinct()` from the `dplyr` package. It counts all isolates with an available result (S, I or R) in every group:

 ```{r, eval = FALSE}
-data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = resistance(AMX),
-            available = n_rsi(AMX))
+data_1st %>%
+  group_by(hospital) %>%
+  summarise(
+    amoxicillin = resistance(AMX),
+    available = n_rsi(AMX)
+  )
 ```
 ```{r, echo = FALSE}
-data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(amoxicillin = resistance(AMX),
-            available = n_rsi(AMX)) %>% 
+data_1st %>%
+  group_by(hospital) %>%
+  summarise(
+    amoxicillin = resistance(AMX),
+    available = n_rsi(AMX)
+  ) %>%
  knitr::kable(align = "c", big.mark = ",")
 ```

 These functions can also take multiple antibiotics, which makes it easy to calculate the empiric susceptibility of combination therapies:

 ```{r, eval = FALSE}
-data_1st %>% 
-  group_by(genus) %>% 
-  summarise(amoxiclav = susceptibility(AMC),
-            gentamicin = susceptibility(GEN),
-            amoxiclav_genta = susceptibility(AMC, GEN))
+data_1st %>%
+  group_by(genus) %>%
+  summarise(
+    amoxiclav = susceptibility(AMC),
+    gentamicin = susceptibility(GEN),
+    amoxiclav_genta = susceptibility(AMC, GEN)
+  )
 ```
 ```{r, echo = FALSE}
-data_1st %>% 
-  group_by(genus) %>% 
-  summarise(amoxiclav = susceptibility(AMC),
-            gentamicin = susceptibility(GEN),
-            amoxiclav_genta = susceptibility(AMC, GEN)) %>% 
+data_1st %>%
+  group_by(genus) %>%
+  summarise(
+    amoxiclav = susceptibility(AMC),
+    gentamicin = susceptibility(GEN),
+    amoxiclav_genta = susceptibility(AMC, GEN)
+  ) %>%
  knitr::kable(align = "c", big.mark = ",")
 ```
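The logic behind a combination such as `susceptibility(AMC, GEN)` can be sketched as follows: an isolate counts as covered when it is S or I to at least one of the drugs. To keep the sketch short, it only uses isolates with a result for every drug. That is a simplifying assumption; the package documents its own handling of partially tested isolates in its help page. The function name is hypothetical.

```r
# Hypothetical sketch, not AMR's susceptibility(): an isolate counts as covered
# by the combination when it is S or I to at least one of the drugs.
# Simplifying assumption: only isolates with a result for every drug are used.
combo_susceptibility_sketch <- function(...) {
  results <- data.frame(..., stringsAsFactors = FALSE)
  tested <- results[stats::complete.cases(results), , drop = FALSE]
  covered <- apply(tested, 1, function(row) any(row %in% c("S", "I")))
  mean(covered)
}

amc <- c("S", "R", "R", "I", "R", NA)
gen <- c("R", "S", "R", "R", "R", "S")

combo_susceptibility_sketch(amc) # 2 of 5 tested isolates: 0.4
combo_susceptibility_sketch(amc, gen) # 3 of 5 isolates tested for both: 0.6
```

The combination can never cover fewer isolates than either drug alone within the same set of isolates, which is why combination therapies look better in the table above.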

 Or, if you are curious about the resistance within certain antibiotic classes, use an antibiotic class selector such as `penicillins()`, which will automatically include the columns `AMX` and `AMC` of our data:

 ```{r, eval = FALSE}
-data_1st %>% 
+data_1st %>%
  # group by hospital
-  group_by(hospital) %>% 
+  group_by(hospital) %>%
  #                / -> select all penicillins in the data for calculation
  #                |              / -> use resistance() for all peni's per hospital
  #                |              |           / -> print as percentages
-  summarise(across(penicillins(), resistance, as_percent = TRUE)) %>% 
+  summarise(across(penicillins(), resistance, as_percent = TRUE)) %>%
  # format the antibiotic column names, using so-called snake case,
  # so 'Amoxicillin/clavulanic acid' becomes 'amoxicillin_clavulanic_acid'
  rename_with(set_ab_names, penicillins())
 ```
 ```{r, echo = FALSE, message = FALSE}
-data_1st %>% 
-  group_by(hospital) %>% 
-  summarise(across(penicillins(), resistance, as_percent = TRUE)) %>% 
-  rename_with(set_ab_names, penicillins()) %>% 
+data_1st %>%
+  group_by(hospital) %>%
+  summarise(across(penicillins(), resistance, as_percent = TRUE)) %>%
+  rename_with(set_ab_names, penicillins()) %>%
  knitr::kable(align = "lrr")
 ```

 To make a transition to the next part, let's see how differences in the previously calculated combination therapies could be plotted:

 ```{r plot 1}
-data_1st %>% 
-  group_by(genus) %>% 
-  summarise("1. Amoxi/clav" = susceptibility(AMC),
-            "2. Gentamicin" = susceptibility(GEN),
-            "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>% 
+data_1st %>%
+  group_by(genus) %>%
+  summarise(
+    "1. Amoxi/clav" = susceptibility(AMC),
+    "2. Gentamicin" = susceptibility(GEN),
+    "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)
+  ) %>%
  # pivot_longer() from the tidyr package "lengthens" data:
-  tidyr::pivot_longer(-genus, names_to = "antibiotic") %>% 
-  ggplot(aes(x = genus,
-             y = value,
-             fill = antibiotic)) +
+  tidyr::pivot_longer(-genus, names_to = "antibiotic") %>%
+  ggplot(aes(
+    x = genus,
+    y = value,
+    fill = antibiotic
+  )) +
  geom_col(position = "dodge2")
 ```

@@ -416,14 +449,20 @@ data_1st %>%
 To show results in plots, most R users nowadays use the `ggplot2` package. This package lets you create plots in layers. You can read more about it [on their website](https://ggplot2.tidyverse.org/). A quick example looks like this:

 ```{r plot 2, eval = FALSE}
-ggplot(data = a_data_set,
-       mapping = aes(x = year,
-                     y = value)) +
+ggplot(
+  data = a_data_set,
+  mapping = aes(
+    x = year,
+    y = value
+  )
+) +
  geom_col() +
-  labs(title = "A title",
-       subtitle = "A subtitle",
-       x = "My X axis",
-       y = "My Y axis")
+  labs(
+    title = "A title",
+    subtitle = "A subtitle",
+    x = "My X axis",
+    y = "My Y axis"
+  )

 # or as short as:
 ggplot(a_data_set) +
@@ -443,11 +482,11 @@ If we group on e.g. the `genus` column and add some additional functions from ou

 ```{r plot 4}
 # group the data on `genus`
-ggplot(data_1st %>% group_by(genus)) + 
+ggplot(data_1st %>% group_by(genus)) +
  # create bars with genus on x axis
  # it looks for variables with class `rsi`,
  # of which we have 4 (earlier created with `as.rsi`)
-  geom_rsi(x = "genus") + 
+  geom_rsi(x = "genus") +
  # split plots on antibiotic
  facet_rsi(facet = "antibiotic") +
  # set colours to the R/SI interpretations (colour-blind friendly)
@@ -457,8 +496,10 @@ ggplot(data_1st %>% group_by(genus)) +
  # turn 90 degrees, to make it bars instead of columns
  coord_flip() +
  # add labels
-  labs(title = "Resistance per genus and antibiotic", 
-       subtitle = "(this is fake data)") +
+  labs(
+    title = "Resistance per genus and antibiotic",
+    subtitle = "(this is fake data)"
+  ) +
  # and print genus in italic to follow our convention
  # (is now y axis because we turned the plot)
  theme(axis.text.y = element_text(face = "italic"))
@@ -467,12 +508,14 @@ ggplot(data_1st %>% group_by(genus)) +
 To simplify this, we also created the `ggplot_rsi()` function, which combines almost all of the functions above:

 ```{r plot 5}
-data_1st %>% 
+data_1st %>%
  group_by(genus) %>%
-  ggplot_rsi(x = "genus",
-             facet = "antibiotic",
-             breaks = 0:4 * 25,
-             datalabels = FALSE) +
+  ggplot_rsi(
+    x = "genus",
+    facet = "antibiotic",
+    breaks = 0:4 * 25,
+    datalabels = FALSE
+  ) +
  coord_flip()
 ```

@@ -527,9 +570,10 @@ And when using the `ggplot2` package, but now choosing the latest implemented CL

 ```{r disk_plots_mo_ab, message = FALSE, warning = FALSE}
 autoplot(disk_values,
-       mo = "E. coli",
-       ab = "cipro",
-       guideline = "CLSI")
+  mo = "E. coli",
+  ab = "cipro",
+  guideline = "CLSI"
+)
 ```

 ## Independence test
@@ -544,13 +588,15 @@ library(tidyr)

 check_FOS <- example_isolates %>%
  filter(ward %in% c("A", "D")) %>% # filter on only hospitals A and D
-  select(ward, FOS) %>%             # select the hospitals and fosfomycin
-  group_by(ward) %>%                # group on the hospitals
-  count_df(combine_SI = TRUE) %>%          # count all isolates per group (ward)
-  pivot_wider(names_from = ward,    # transform output so A and D are columns
-              values_from = value) %>%     
-  select(A, D) %>%                         # and only select these columns
-  as.matrix()                              # transform to a good old matrix for fisher.test()
+  select(ward, FOS) %>% # select the hospitals and fosfomycin
+  group_by(ward) %>% # group on the hospitals
+  count_df(combine_SI = TRUE) %>% # count all isolates per group (ward)
+  pivot_wider(
+    names_from = ward, # transform output so A and D are columns
+    values_from = value
+  ) %>%
+  select(A, D) %>% # and only select these columns
+  as.matrix() # transform to a good old matrix for fisher.test()

 check_FOS
 ```
@@ -559,7 +605,7 @@ We can apply the test now with:

 ```{r}
 # do Fisher's Exact Test
-fisher.test(check_FOS)                            
+fisher.test(check_FOS)
 ```

 As can be seen, the p value is `r round(fisher.test(check_FOS)$p.value, 3)`, which means that the fosfomycin resistance in isolates from patients in hospitals A and D differs significantly.
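The same test can be run on any 2x2 table of counts in base R. The counts below are made up for illustration and do not come from `example_isolates`. Fisher's exact test gives the probability, under independence of ward and result, of observing a table at least as extreme as this one; the 5% threshold used below is a common convention, not a rule from this package.

```r
# Hypothetical counts for illustration only (not the example_isolates data)
fos_counts <- matrix(
  c(
    25, 75, # ward A: 25 resistant, 75 susceptible/intermediate
    10, 90 # ward D: 10 resistant, 90 susceptible/intermediate
  ),
  nrow = 2,
  dimnames = list(result = c("R", "SI"), ward = c("A", "D"))
)
fos_counts

test <- fisher.test(fos_counts)
test$p.value # two-sided p value
test$p.value < 0.05 # TRUE would indicate a difference at the 5% level
```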
--- a/vignettes/AMR_intro.png
+++ b/vignettes/AMR_intro.png
--- a/vignettes/EUCAST.Rmd
+++ b/vignettes/EUCAST.Rmd
@@ -39,9 +39,13 @@ These rules can be used to discard impossible bug-drug combinations in your data
 Sometimes, laboratory data can still contain such strains reported as susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, *Klebsiella*). EUCAST expert rules solve this, and they can be applied using `eucast_rules()`:

 ```{r, warning = FALSE, message = FALSE}
-oops <- data.frame(mo = c("Klebsiella", 
-                          "Escherichia"),
-                   ampicillin = "S")
+oops <- data.frame(
+  mo = c(
+    "Klebsiella",
+    "Escherichia"
+  ),
+  ampicillin = "S"
+)
 oops

 eucast_rules(oops, info = FALSE)
@@ -50,29 +54,37 @@ eucast_rules(oops, info = FALSE)
 A more convenient function is `mo_is_intrinsic_resistant()`, which uses the same guideline but lets you check one or more specific microorganisms or antibiotics:

 ```{r, warning = FALSE, message = FALSE}
-mo_is_intrinsic_resistant(c("Klebsiella", "Escherichia"),
-                          "ampicillin")
+mo_is_intrinsic_resistant(
+  c("Klebsiella", "Escherichia"),
+  "ampicillin"
+)

-mo_is_intrinsic_resistant("Klebsiella",
-                          c("ampicillin", "kanamycin"))
+mo_is_intrinsic_resistant(
+  "Klebsiella",
+  c("ampicillin", "kanamycin")
+)
 ```
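Conceptually, correcting for intrinsic resistance is a lookup: for every bug-drug pair in a rule table, the reported result is overruled with R. The sketch below uses a one-row, hypothetical rule table containing only the *Klebsiella*/ampicillin pair discussed above; the real EUCAST expert rules cover far more combinations, and the helper name is made up.

```r
# Hypothetical one-row rule table; real EUCAST expert rules cover far more
# bug-drug combinations and are applied by eucast_rules().
intrinsic_rules <- data.frame(
  genus = "Klebsiella",
  drug = "ampicillin"
)

apply_intrinsic_sketch <- function(df, rules) {
  for (i in seq_len(nrow(rules))) {
    drug <- rules$drug[i]
    if (!drug %in% names(df)) next # drug not in this data set
    hit <- which(df$mo == rules$genus[i]) # rows of the intrinsically resistant genus
    df[[drug]][hit] <- "R" # overrule the reported result
  }
  df
}

oops <- data.frame(mo = c("Klebsiella", "Escherichia"), ampicillin = "S")
fixed <- apply_intrinsic_sketch(oops, intrinsic_rules)
fixed
```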

 EUCAST rules can be used not only for correction but also for filling in known resistance and susceptibility based on the results of other antimicrobial drugs. This process is called *interpretive reading*; it is basically a form of imputation and is part of the `eucast_rules()` function as well:

 ```{r, warning = FALSE, message = FALSE}
-data <- data.frame(mo = c("Staphylococcus aureus",
-                          "Enterococcus faecalis",
-                          "Escherichia coli",
-                          "Klebsiella pneumoniae",
-                          "Pseudomonas aeruginosa"),
-                   VAN = "-",       # Vancomycin
-                   AMX = "-",       # Amoxicillin
-                   COL = "-",       # Colistin
-                   CAZ = "-",       # Ceftazidime
-                   CXM = "-",       # Cefuroxime
-                   PEN = "S",       # Benzylpenicillin
-                   FOX = "S",       # Cefoxitin
-                   stringsAsFactors = FALSE)
+data <- data.frame(
+  mo = c(
+    "Staphylococcus aureus",
+    "Enterococcus faecalis",
+    "Escherichia coli",
+    "Klebsiella pneumoniae",
+    "Pseudomonas aeruginosa"
+  ),
+  VAN = "-", # Vancomycin
+  AMX = "-", # Amoxicillin
+  COL = "-", # Colistin
+  CAZ = "-", # Ceftazidime
+  CXM = "-", # Cefuroxime
+  PEN = "S", # Benzylenicillin
+  FOX = "S", # Cefoxitin
+  stringsAsFactors = FALSE
+)
 ```
 ```{r, eval = FALSE}
 data
--- a/vignettes/MDR.Rmd
+++ b/vignettes/MDR.Rmd
@@ -64,8 +64,10 @@ You can also use your own custom guideline. Custom guidelines can be set with th
 If you are familiar with `case_when()` of the `dplyr` package, you will recognise the input method to set your own rules. Rules must be set using what R considers to be the 'formula notation':

 ```{r}
-custom <- custom_mdro_guideline(CIP == "R" & age > 60 ~ "Elderly Type A",
-                                ERY == "R" & age > 60 ~ "Elderly Type B")
+custom <- custom_mdro_guideline(
+  CIP == "R" & age > 60 ~ "Elderly Type A",
+  ERY == "R" & age > 60 ~ "Elderly Type B"
+)
 ```

 If a row/an isolate matches the first rule, the value after the first `~` (in this case *'Elderly Type A'*) will be set as the MDRO value. Otherwise, the second rule will be tried, and so on. There is no maximum number of rules.
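The "first matching rule wins" behaviour can be sketched in base R by evaluating the two sides of each formula against the data. This is a hypothetical illustration of the principle described above, not the code behind `custom_mdro_guideline()` or `mdro()`; the function name and toy data are made up.

```r
# Hypothetical sketch of "the first matching rule wins";
# it is not the code behind custom_mdro_guideline() or mdro().
apply_rules_sketch <- function(df, ...) {
  rules <- list(...)
  out <- rep(NA_character_, nrow(df))
  for (rule in rules) {
    # left-hand side of the formula: the condition, evaluated within `df`
    condition <- eval(rule[[2]], envir = df, enclos = environment(rule))
    # right-hand side: the label to assign
    label <- eval(rule[[3]], envir = df, enclos = environment(rule))
    # only fill rows that match and that no earlier rule has claimed
    todo <- is.na(out) & !is.na(condition) & condition
    out[todo] <- label
  }
  out
}

toy_isolates <- data.frame(
  CIP = c("R", "S", "R", "S"),
  ERY = c("R", "R", "S", "S"),
  age = c(70, 65, 50, 80)
)
mdro_labels <- apply_rules_sketch(
  toy_isolates,
  CIP == "R" & age > 60 ~ "Elderly Type A",
  ERY == "R" & age > 60 ~ "Elderly Type B"
)
mdro_labels
```

The first row matches both rules but is labelled *Elderly Type A*, because that rule comes first.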
@@ -92,17 +94,17 @@ The `mdro()` function always returns an ordered `factor` for predefined guidelin
 The next example uses the `example_isolates` data set. This is a data set included with this package and contains full antibiograms of 2,000 microbial isolates. It reflects reality and can be used to practise AMR data analysis. If we test the MDR/XDR/PDR guideline on this data set, we get:

 ```{r, message = FALSE}
-library(dplyr)   # to support pipes: %>%
+library(dplyr) # to support pipes: %>%
 library(cleaner) # to create frequency tables
 ```
 ```{r, results = 'hide'}
-example_isolates %>% 
-  mdro() %>% 
+example_isolates %>%
+  mdro() %>%
  freq() # show frequency table of the result
 ```
 ```{r, echo = FALSE, results = 'asis', message = FALSE, warning = FALSE}
-example_isolates %>% 
-  mdro(info = FALSE) %>% 
+example_isolates %>%
+  mdro(info = FALSE) %>%
  freq() # show frequency table of the result
 ```

@@ -111,25 +113,29 @@ For another example, I will create a data set to determine multi-drug resistant
 ```{r}
 # random_rsi() is a helper function to generate
 # a random vector with values S, I and R
-my_TB_data <- data.frame(rifampicin = random_rsi(5000),
-                         isoniazid = random_rsi(5000),
-                         gatifloxacin = random_rsi(5000),
-                         ethambutol = random_rsi(5000),
-                         pyrazinamide = random_rsi(5000),
-                         moxifloxacin = random_rsi(5000),
-                         kanamycin = random_rsi(5000))
+my_TB_data <- data.frame(
+  rifampicin = random_rsi(5000),
+  isoniazid = random_rsi(5000),
+  gatifloxacin = random_rsi(5000),
+  ethambutol = random_rsi(5000),
+  pyrazinamide = random_rsi(5000),
+  moxifloxacin = random_rsi(5000),
+  kanamycin = random_rsi(5000)
+)
 ```

 Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same way:

 ```{r, eval = FALSE}
-my_TB_data <- data.frame(RIF = random_rsi(5000),
-                         INH = random_rsi(5000),
-                         GAT = random_rsi(5000),
-                         ETH = random_rsi(5000),
-                         PZA = random_rsi(5000),
-                         MFX = random_rsi(5000),
-                         KAN = random_rsi(5000))
+my_TB_data <- data.frame(
+  RIF = random_rsi(5000),
+  INH = random_rsi(5000),
+  GAT = random_rsi(5000),
+  ETH = random_rsi(5000),
+  PZA = random_rsi(5000),
+  MFX = random_rsi(5000),
+  KAN = random_rsi(5000)
+)
 ```

 The data set now looks like this:
--- a/vignettes/PCA.Rmd
+++ b/vignettes/PCA.Rmd
@@ -39,12 +39,16 @@ glimpse(example_isolates)
 Now we transform this into a data set with only resistance percentages per taxonomic order and genus:

 ```{r, warning = FALSE}
-resistance_data <- example_isolates %>% 
-  group_by(order = mo_order(mo),       # group on anything, like order
-           genus = mo_genus(mo)) %>%   #  and genus as we do here
+resistance_data <- example_isolates %>%
+  group_by(
+    order = mo_order(mo), # group on anything, like order
+    genus = mo_genus(mo) # and genus as we do here
+  ) %>%
  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
-  select(order, genus, AMC, CXM, CTX, 
-         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns
+  select(
+    order, genus, AMC, CXM, CTX,
+    CAZ, GEN, TOB, TMP, SXT
+  ) # and select only relevant columns

 head(resistance_data)
 ```
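The `group_by()` plus `summarise_if()` step above has a straightforward base R counterpart in `aggregate()`. The sketch below uses hypothetical toy data with plain character columns instead of `rsi` columns, and unlike the package's `resistance()`, its helper `prop_r()` applies no minimum number of isolates per group.

```r
# Hypothetical toy data with plain character columns instead of `rsi` columns
isolates_toy <- data.frame(
  genus = c("Escherichia", "Escherichia", "Escherichia", "Klebsiella", "Klebsiella"),
  AMC = c("R", "S", "S", "R", "R"),
  GEN = c("S", "S", "R", "S", NA)
)

# proportion of R among isolates with a result (no minimum group size here)
prop_r <- function(x) mean(x[!is.na(x)] == "R")

# one row per genus, one column per drug
resistance_by_genus <- aggregate(
  isolates_toy[c("AMC", "GEN")],
  by = list(genus = isolates_toy$genus),
  FUN = prop_r
)
resistance_by_genus
```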
--- a/vignettes/SPSS.Rmd
+++ b/vignettes/SPSS.Rmd
@@ -80,9 +80,11 @@ as.mic("testvalue")
 mo_gramstain("E. coli")

 # Klebsiella is intrinsically resistant to amoxicillin, according to EUCAST:
-klebsiella_test <- data.frame(mo = "klebsiella", 
-                              amox = "S",
-                              stringsAsFactors = FALSE)
+klebsiella_test <- data.frame(
+  mo = "klebsiella",
+  amox = "S",
+  stringsAsFactors = FALSE
+)
 klebsiella_test # (our original data)
 eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)

@@ -153,7 +155,7 @@ To import data from SPSS, SAS or Stata, you can use the [great `haven` package](
 # download and install the latest version:
 install.packages("haven")
 # load the package you just installed:
-library(haven) 
+library(haven)
 ```

 You can now import files as follows:
@@ -203,7 +205,7 @@ To export your R objects to the SAS file format:
 # save as regular SAS file:
 write_sas(data = yourdata, path = "path/to/file")

-# the SAS transport format is an open format 
+# the SAS transport format is an open format
 # (required for submission of the data to the FDA)
 write_xpt(data = yourdata, path = "path/to/file", version = 8)
 ```
--- a/vignettes/WHONET.Rmd
+++ b/vignettes/WHONET.Rmd
@@ -39,9 +39,9 @@ This package comes with an [example data set `WHONET`](https://msberends.github.
 First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don't know it yet, I suggest you read about it on its website: https://www.tidyverse.org/.

 ```{r, message = FALSE}
-library(dplyr)   # part of tidyverse
+library(dplyr) # part of tidyverse
 library(ggplot2) # part of tidyverse
-library(AMR)     # this package
+library(AMR) # this package
 library(cleaner) # to create frequency tables
 ```

@@ -54,7 +54,7 @@ We will have to transform some variables to simplify and automate the analysis:
 # transform variables
 data <- WHONET %>%
  # get microbial ID based on given organism
-  mutate(mo = as.mo(Organism)) %>% 
+  mutate(mo = as.mo(Organism)) %>%
  # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
 ```
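For readers curious what such a conversion involves, the sketch below standardises existing S/I/R text into an ordered factor with levels S < I < R, and selects a column range the way `vars(AMP_ND10:CIP_EE)` does. It assumes the input already contains S/I/R interpretations; the package's `as.rsi()` does considerably more (for example, interpreting MIC and disk values), and the helper name `to_sir_factor()` and the toy data are hypothetical.

```r
# Hypothetical sketch: standardise existing S/I/R text into an ordered factor.
# AMR's as.rsi() does far more (e.g. it can interpret MIC and disk values).
to_sir_factor <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x[!x %in% c("S", "I", "R")] <- NA # anything else becomes missing
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

whonet_like <- data.frame(
  Organism = c("E. coli", "K. pneumoniae"),
  AMP_ND10 = c("r", "S "),
  CIP_EE = c("I", "unknown")
)

# base R equivalent of the column range vars(AMP_ND10:CIP_EE)
from <- match("AMP_ND10", names(whonet_like))
to <- match("CIP_EE", names(whonet_like))
cols <- names(whonet_like)[from:to]

whonet_like[cols] <- lapply(whonet_like[cols], to_sir_factor)
str(whonet_like)
```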
@@ -83,15 +83,16 @@ An easy `ggplot` will already give a lot of information, using the included `ggp
 data %>%
  group_by(Country) %>%
  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
-  ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
+  ggplot_rsi(translate_ab = "ab", facet = "Country", datalabels = FALSE)
 ```

 ```{r, echo = FALSE}
 # on very old and some new releases of R, this may lead to an error
 tryCatch(data %>%
-           group_by(Country) %>%
-           select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
-           ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE) %>%
-           print(),
-         error = function(e) base::invisible())
+  group_by(Country) %>%
+  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
+  ggplot_rsi(translate_ab = "ab", facet = "Country", datalabels = FALSE) %>%
+  print(),
+error = function(e) base::invisible()
+)
 ```
--- a/vignettes/benchmarks.Rmd.not
+++ b/vignettes/benchmarks.Rmd.not
@@ -33,19 +33,25 @@ ggplot.bm <- function(df) {
    summ <- tapply(.x, .f, .fun)
    factor(.f, levels = names(summ)[order(summ, decreasing = .desc)], ordered = is.ordered(.f))
  }
-  ggplot(df,
-         aes(x = reorder(expr, time, median), y = time / 1000 / 1000)) + 
-  stat_boxplot(geom = "errorbar", width = 0.5) +
-  geom_boxplot(outlier.alpha = 0) +
-  coord_flip() +
-  scale_y_continuous(trans = "log", breaks = c(1, 2, 5, 
-                                                10, 20, 50,
-                                                100, 200, 500,
-                                                1000, 2000, 5000)) +
-  labs(x = "Expression",
-       y = "Time in milliseconds (log scale)") +
-  theme_minimal() +
-  theme(axis.text.y = element_text(family = "mono"))
+  ggplot(
+    df,
+    aes(x = reorder(expr, time, median), y = time / 1000 / 1000)
+  ) +
+    stat_boxplot(geom = "errorbar", width = 0.5) +
+    geom_boxplot(outlier.alpha = 0) +
+    coord_flip() +
+    scale_y_continuous(trans = "log", breaks = c(
+      1, 2, 5,
+      10, 20, 50,
+      100, 200, 500,
+      1000, 2000, 5000
+    )) +
+    labs(
+      x = "Expression",
+      y = "Time in milliseconds (log scale)"
+    ) +
+    theme_minimal() +
+    theme(axis.text.y = element_text(family = "mono"))
 }
 ```

@@ -75,7 +81,8 @@ S.aureus <- microbenchmark(
  as.mo("Sthafilokkockus aaureuz"), # incorrect spelling
  as.mo("MRSA"), # Methicillin Resistant S. aureus
  as.mo("VISA"), # Vancomycin Intermediate S. aureus
-  times = 25)
+  times = 25
+)
 print(S.aureus, unit = "ms", signif = 2)
 ```
 ```{r, echo = FALSE}
@@ -95,7 +102,7 @@ To prove this, we will use `mo_name()` for testing - a helper function that retu

 ```{r, message = FALSE}
 # start with the example_isolates data set
-x <- example_isolates %>% 
+x <- example_isolates %>%
  # take all MO codes from the 'mo' column
  pull(mo) %>%
  # and copy them a thousand times
@@ -105,7 +112,7 @@ x <- example_isolates %>%

 # what do these values look like? They are of class <mo>:
 head(x)
-  
+
 # as the example_isolates data set has 2,000 rows, we should have 2 million items
 length(x)

@@ -114,7 +121,8 @@ n_distinct(x)

 # now let's see:
 run_it <- microbenchmark(mo_name(x),
-                         times = 10)
+  times = 10
+)
 print(run_it, unit = "ms", signif = 3)
 ```
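One general technique that explains why a very long vector with relatively few distinct values can be processed quickly is to compute the result once per distinct value and then map it back with `match()`. The sketch below illustrates that technique with a hypothetical stand-in function; it is not a description of how `mo_name()` works internally.

```r
# Hypothetical sketch of "compute each distinct value once, then map back";
# not a description of mo_name()'s internals.
slow_name <- function(codes) {
  # stand-in for an expensive per-value lookup
  vapply(codes, function(code) paste("name of", code), character(1),
    USE.NAMES = FALSE
  )
}

via_unique <- function(x, fun) {
  u <- unique(x)
  fun(u)[match(x, u)] # one computation per distinct value
}

x <- rep(c("E. coli", "S. aureus", "K. pneumoniae"), times = 10000)
names_x <- via_unique(x, slow_name) # calls slow_name() on 3 values only
length(names_x)
```

The cost then scales with the number of distinct values rather than with the length of the input, apart from the cheap `unique()` and `match()` steps.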

@@ -125,25 +133,29 @@ So getting official taxonomic names of `r format(length(x), big.mark = ",")` (!!
 What about precalculated results? If the input is an already precalculated result of a helper function such as `mo_name()`, it takes almost no time at all. In other words, if you run `mo_name()` on a valid taxonomic name, it will return the result immediately (see 'C' below):

 ```{r, warning=FALSE, message=FALSE}
-run_it <- microbenchmark(A = mo_name("STAAUR"),
-                         B = mo_name("S. aureus"),
-                         C = mo_name("Staphylococcus aureus"),
-                         times = 10)
+run_it <- microbenchmark(
+  A = mo_name("STAAUR"),
+  B = mo_name("S. aureus"),
+  C = mo_name("Staphylococcus aureus"),
+  times = 10
+)
 print(run_it, unit = "ms", signif = 3)
 ```

 So going from `mo_name("Staphylococcus aureus")` to `"Staphylococcus aureus"` takes `r format(round(run_it %>% filter(expr == "C") %>% pull(time) %>% median() / 1e9, 4), scientific = FALSE)` seconds: it doesn't even start calculating *if the result would be the same as the expected resulting value*. The same goes for all helper functions:

 ```{r}
-run_it <- microbenchmark(A = mo_species("aureus"),
-                         B = mo_genus("Staphylococcus"),
-                         C = mo_name("Staphylococcus aureus"),
-                         D = mo_family("Staphylococcaceae"),
-                         E = mo_order("Bacillales"),
-                         F = mo_class("Bacilli"),
-                         G = mo_phylum("Firmicutes"),
-                         H = mo_kingdom("Bacteria"),
-                         times = 10)
+run_it <- microbenchmark(
+  A = mo_species("aureus"),
+  B = mo_genus("Staphylococcus"),
+  C = mo_name("Staphylococcus aureus"),
+  D = mo_family("Staphylococcaceae"),
+  E = mo_order("Bacillales"),
+  F = mo_class("Bacilli"),
+  G = mo_phylum("Firmicutes"),
+  H = mo_kingdom("Bacteria"),
+  times = 10
+)
 print(run_it, unit = "ms", signif = 3)
 ```

@@ -163,17 +175,19 @@ mo_name(CoNS, language = "es") # or just mo_name(CoNS) on a Spanish system

 mo_name(CoNS, language = "nl") # or just mo_name(CoNS) on a Dutch system

-run_it <- microbenchmark(da = mo_name(CoNS, language = "da"),
-                         de = mo_name(CoNS, language = "de"),
-                         en = mo_name(CoNS, language = "en"),
-                         es = mo_name(CoNS, language = "es"),
-                         fr = mo_name(CoNS, language = "fr"),
-                         it = mo_name(CoNS, language = "it"),
-                         nl = mo_name(CoNS, language = "nl"),
-                         pt = mo_name(CoNS, language = "pt"),
-                         ru = mo_name(CoNS, language = "ru"),
-                         sv = mo_name(CoNS, language = "sv"),
-                         times = 100)
+run_it <- microbenchmark(
+  da = mo_name(CoNS, language = "da"),
+  de = mo_name(CoNS, language = "de"),
+  en = mo_name(CoNS, language = "en"),
+  es = mo_name(CoNS, language = "es"),
+  fr = mo_name(CoNS, language = "fr"),
+  it = mo_name(CoNS, language = "it"),
+  nl = mo_name(CoNS, language = "nl"),
+  pt = mo_name(CoNS, language = "pt"),
+  ru = mo_name(CoNS, language = "ru"),
+  sv = mo_name(CoNS, language = "sv"),
+  times = 100
+)
 print(run_it, unit = "ms", signif = 4)
 ```

--- a/vignettes/datasets.Rmd
+++ b/vignettes/datasets.Rmd
@@ -28,16 +28,20 @@ library(dplyr)
 options(knitr.kable.NA = "")

 structure_txt <- function(dataset) {
-  paste0("A data set with ",
-         format(nrow(dataset), big.mark = ","), " rows and ", 
-         ncol(dataset), " columns, containing the following column names:  \n",
-         AMR:::vector_or(colnames(dataset), quotes = "*", last_sep = " and ", sort = FALSE), ".")
+  paste0(
+    "A data set with ",
+    format(nrow(dataset), big.mark = ","), " rows and ",
+    ncol(dataset), " columns, containing the following column names:  \n",
+    AMR:::vector_or(colnames(dataset), quotes = "*", last_sep = " and ", sort = FALSE), "."
+  )
 }

 download_txt <- function(filename) {
-  msg <- paste0("It was last updated on ", 
-                trimws(format(file.mtime(paste0("../data/", filename, ".rda")), "%e %B %Y %H:%M:%S %Z", tz = "UTC")), 
-                ". Find more info about the structure of this data set [here](https://msberends.github.io/AMR/reference/", ifelse(filename == "antivirals", "antibiotics", filename), ".html).\n")
+  msg <- paste0(
+    "It was last updated on ",
+    trimws(format(file.mtime(paste0("../data/", filename, ".rda")), "%e %B %Y %H:%M:%S %Z", tz = "UTC")),
+    ". Find more info about the structure of this data set [here](https://msberends.github.io/AMR/reference/", ifelse(filename == "antivirals", "antibiotics", filename), ".html).\n"
+  )
  github_base <- "https://github.com/msberends/AMR/raw/main/data-raw/"
  filename <- paste0("../data-raw/", filename)
  rds <- paste0(filename, ".rds")
@@ -50,38 +54,44 @@ download_txt <- function(filename) {
  stata <- paste0(filename, ".dta")
  create_txt <- function(filename, type, software, exists) {
    if (isTRUE(exists)) {
-      paste0("* Download as [", software, "](", github_base, filename, ") (",
-             AMR:::formatted_filesize(filename), ")  \n")
+      paste0(
+        "* Download as [", software, "](", github_base, filename, ") (",
+        AMR:::formatted_filesize(filename), ")  \n"
+      )
    } else {
      paste0("* *(unavailable as ", software, ")*\n")
    }
  }
-  
-  if (any(file.exists(rds),
-          file.exists(txt),
-          file.exists(excel),
-          file.exists(feather),
-          file.exists(parquet),
-          file.exists(sas),
-          file.exists(spss),
-          file.exists(stata))) {
-    msg <- c(msg, "\n**Direct download links:**\n\n",
-             create_txt(rds, "rds", "original R Data Structure (RDS) file", file.exists(rds)),
-             create_txt(txt, "txt", "tab-separated text file", file.exists(txt)),
-             create_txt(excel, "xlsx", "Microsoft Excel workbook", file.exists(excel)),
-             create_txt(feather, "feather", "Apache Feather file", file.exists(feather)),
-             create_txt(parquet, "parquet", "Apache Parquet file", file.exists(parquet)),
-             create_txt(sas, "sas", "SAS data file", file.exists(sas)),
-             create_txt(spss, "sav", "IBM SPSS Statistics data file", file.exists(spss)),
-             create_txt(stata, "dta", "Stata DTA file", file.exists(stata)))
+
+  if (any(
+    file.exists(rds),
+    file.exists(txt),
+    file.exists(excel),
+    file.exists(feather),
+    file.exists(parquet),
+    file.exists(sas),
+    file.exists(spss),
+    file.exists(stata)
+  )) {
+    msg <- c(
+      msg, "\n**Direct download links:**\n\n",
+      create_txt(rds, "rds", "original R Data Structure (RDS) file", file.exists(rds)),
+      create_txt(txt, "txt", "tab-separated text file", file.exists(txt)),
+      create_txt(excel, "xlsx", "Microsoft Excel workbook", file.exists(excel)),
+      create_txt(feather, "feather", "Apache Feather file", file.exists(feather)),
+      create_txt(parquet, "parquet", "Apache Parquet file", file.exists(parquet)),
+      create_txt(sas, "sas", "SAS data file", file.exists(sas)),
+      create_txt(spss, "sav", "IBM SPSS Statistics data file", file.exists(spss)),
+      create_txt(stata, "dta", "Stata DTA file", file.exists(stata))
+    )
  }
  paste0(msg, collapse = "")
 }

 print_df <- function(x, rows = 6) {
-  x %>% 
-    as.data.frame(stringsAsFactors = FALSE) %>% 
-    head(n = rows) %>% 
+  x %>%
+    as.data.frame(stringsAsFactors = FALSE) %>%
+    head(n = rows) %>%
    mutate_all(function(x) {
      if (is.list(x)) {
        sapply(x, function(y) {
@@ -128,10 +138,10 @@ Our full taxonomy of microorganisms is based on the authoritative and comprehens
 Included (sub)species per taxonomic kingdom:

 ```{r, echo = FALSE}
-microorganisms %>% 
-  count(kingdom) %>% 
-  mutate(n = format(n, big.mark = ",")) %>% 
-  setNames(c("Kingdom", "Number of (sub)species")) %>% 
+microorganisms %>%
+  count(kingdom) %>%
+  mutate(n = format(n, big.mark = ",")) %>%
+  setNames(c("Kingdom", "Number of (sub)species")) %>%
  print_df()
 ```

@@ -139,7 +149,7 @@ Example rows when filtering on genus *Escherichia*:

 ```{r, echo = FALSE}
 microorganisms %>%
-  filter(genus == "Escherichia") %>% 
+  filter(genus == "Escherichia") %>%
  print_df()
 ```

@@ -166,7 +176,7 @@ Example rows when filtering on *Escherichia*:

 ```{r, echo = FALSE}
 microorganisms.old %>%
-  filter(fullname %like% "^Escherichia") %>% 
+  filter(fullname %like% "^Escherichia") %>%
  print_df()
 ```

@@ -191,7 +201,7 @@ This data set contains all EARS-Net and ATC codes gathered from WHO and WHONET,

 ```{r, echo = FALSE}
 antibiotics %>%
-  filter(ab %in% colnames(example_isolates)) %>% 
+  filter(ab %in% colnames(example_isolates)) %>%
  print_df()
 ```

@@ -233,9 +243,9 @@ This data set contains interpretation rules for MIC values and disk diffusion di
 ### Example content

 ```{r, echo = FALSE}
-rsi_translation %>% 
-  mutate(mo_name = mo_name(mo, language = NULL), .after = mo) %>% 
-  mutate(ab_name = ab_name(ab, language = NULL), .after = ab) %>% 
+rsi_translation %>%
+  mutate(mo_name = mo_name(mo, language = NULL), .after = mo) %>%
+  mutate(ab_name = ab_name(ab, language = NULL), .after = ab) %>%
  print_df()
 ```

@@ -258,9 +268,11 @@ Example rows when filtering on *Enterobacter cloacae*:

 ```{r, echo = FALSE}
 intrinsic_resistant %>%
-  transmute(microorganism = mo_name(mo),
-            antibiotic = ab_name(ab)) %>% 
-  filter(microorganism == "Enterobacter cloacae") %>% 
+  transmute(
+    microorganism = mo_name(mo),
+    antibiotic = ab_name(ab)
+  ) %>%
+  filter(microorganism == "Enterobacter cloacae") %>%
  arrange(antibiotic) %>%
  print_df(rows = Inf)
 ```
@@ -283,7 +295,7 @@ Currently included dosages in the data set are meant for: `r AMR:::format_eucast
 ### Example content

 ```{r, echo = FALSE}
-dosage %>% 
+dosage %>%
  print_df()
 ```

@@ -303,7 +315,7 @@ This data set contains randomised fictitious data, but reflects reality and can
 ### Example content

 ```{r, echo = FALSE}
-example_isolates %>% 
+example_isolates %>%
  print_df()
 ```

@@ -322,6 +334,6 @@ This data set contains randomised fictitious data, but reflects reality and can
 ### Example content

 ```{r, echo = FALSE}
-example_isolates_unclean %>% 
+example_isolates_unclean %>%
  print_df()
 ```
--- a/vignettes/resistance_predict.Rmd
+++ b/vignettes/resistance_predict.Rmd
@@ -43,14 +43,18 @@ It is basically as easy as:
 resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")

 # or:
-example_isolates %>% 
-  resistance_predict(col_ab = "TZP",
-                     model  "binomial")
+example_isolates %>%
+  resistance_predict(
+    col_ab = "TZP",
+    model = "binomial"
+  )

 # to bind it to object 'predict_TZP' for example:
-predict_TZP <- example_isolates %>% 
-  resistance_predict(col_ab = "TZP",
-                     model = "binomial")
+predict_TZP <- example_isolates %>%
+  resistance_predict(
+    col_ab = "TZP",
+    model = "binomial"
+  )
 ```

 The function will look for a date column itself if `col_date` is not set.
@@ -58,7 +62,7 @@ The function will look for a date column itself if `col_date` is not set.
 When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.

 ```{r, echo = FALSE, message = FALSE}
-predict_TZP <- example_isolates %>% 
+predict_TZP <- example_isolates %>%
  resistance_predict(col_ab = "TZP", model = "binomial")
 ```

@@ -92,7 +96,7 @@ Resistance is not easily predicted; if we look at vancomycin resistance in Gram-
 ```{r}
 example_isolates %>%
  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
-  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>% 
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
  ggplot_rsi_predict()
 ```

@@ -113,7 +117,7 @@ For the vancomycin resistance in Gram-positive bacteria, a linear model might be
 ```{r}
 example_isolates %>%
  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
-  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
  ggplot_rsi_predict()
 ```

--- a/vignettes/welcome_to_AMR.Rmd
+++ b/vignettes/welcome_to_AMR.Rmd
@@ -28,10 +28,6 @@ Note: to keep the package size as small as possible, we only included this vigne

 The `AMR` package is a [free and open-source](https://msberends.github.io/AMR/#copyright) R package with [zero dependencies](https://en.wikipedia.org/wiki/Dependency_hell) to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. **Our aim is to provide a standard** for clean and reproducible AMR data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.

-```{r, echo = FALSE, out.width = "555px"}
-knitr::include_graphics("AMR_intro.png")
-```
-
 After installing this package, R knows `r AMR:::format_included_data_number(AMR::microorganisms)` distinct microbial species and all `r AMR:::format_included_data_number(rbind(AMR::antibiotics[, "atc", drop = FALSE], AMR::antivirals[, "atc", drop = FALSE]))` antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.

 The `AMR` package is available in English, Chinese, Danish, Dutch, French, German, Greek, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Turkish and Ukrainian. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.