new antibiotics

2025-07-08 12:31:58 +02:00 · 2019-05-10 16:44:59 +02:00
parent 73f1ee1159
commit 68cc7ef0d0
147 changed files with 6228 additions and 4187 deletions
--- a/vignettes/AMR.Rmd
+++ b/vignettes/AMR.Rmd
@ -35,8 +35,8 @@ You can skip to [Cleaning the data](#cleaning-the-data) if you already have your
 knitr::kable(dplyr::tibble(date = Sys.Date(),
                           patient_id = c("abcd", "abcd", "efgh"),
                           mo = "Escherichia coli", 
-                           amox = c("S", "S", "R"),
-                           cipr = c("S", "R", "S")), 
+                           AMX = c("S", "S", "R"),
+                           CIP = c("S", "R", "S")), 
             align = "c")
 ``` 

@ -113,13 +113,13 @@ data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
                                     prob = c(0.30, 0.35, 0.15, 0.20)),
                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
                                     prob = c(0.50, 0.25, 0.15, 0.10)),
-                   amox = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                 prob = c(0.60, 0.05, 0.35)),
-                   amcl = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                 prob = c(0.75, 0.10, 0.15)),
-                   cipr = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                 prob = c(0.80, 0.00, 0.20)),
-                   gent = sample(ab_interpretations, size = sample_size, replace = TRUE,
+                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                 prob = c(0.92, 0.00, 0.08))
                   )
 ```
@ -166,12 +166,12 @@ We also want to transform the antibiotics, because in real life data we don't kn

 ```{r transform abx}
 data <- data %>%
-  mutate_at(vars(amox:gent), as.rsi)
+  mutate_at(vars(AMX:GEN), as.rsi)
 ```

 Finally, we will apply [EUCAST rules](http://www.eucast.org/expert_rules_and_intrinsic_resistance/) on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the `eucast_rules()` function can also apply additional rules, like forcing <help title="ATC: J01CA01">ampicillin</help> = R when <help title="ATC: J01CR02">amoxicillin/clavulanic acid</help> = R.

-Because the amoxicillin (column `amox`) and amoxicillin/clavulanic acid (column `amcl`) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The `eucast_rules()` fixes this:
+Because the amoxicillin (column `AMX`) and amoxicillin/clavulanic acid (column `AMC`) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The `eucast_rules()` fixes this:

 ```{r eucast, warning = FALSE, message = FALSE}
 data <- eucast_rules(data, col_mo = "bacteria")
@ -226,7 +226,7 @@ weighted_df <- data %>%
  # only most prevalent patient
  filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>% 
  arrange(date) %>%
-  select(date, patient_id, bacteria, amox:gent, first) %>% 
+  select(date, patient_id, bacteria, AMX:GEN, first) %>% 
  # maximum of 10 rows
  .[1:min(10, nrow(.)),] %>% 
  mutate(isolate = row_number()) %>% 
@ -240,7 +240,7 @@ Only `r sum(weighted_df$first)` isolates are marked as 'first' according to CLSI

 If a column exists with a name like 'key(...)ab' the `first_isolate()` function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output: 

-```{r 1st weighted}
+```{r 1st weighted, warning = FALSE}
 data <- data %>% 
  mutate(keyab = key_antibiotics(.)) %>% 
  mutate(first_weighted = first_isolate(.))
@ -252,7 +252,7 @@ weighted_df2 <- data %>%
  # only most prevalent patient
  filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>% 
  arrange(date) %>%
-  select(date, patient_id, bacteria, amox:gent, first, first_weighted) %>% 
+  select(date, patient_id, bacteria, AMX:GEN, first, first_weighted) %>% 
  # maximum of 10 rows
  .[1:min(10, nrow(.)),] %>% 
  mutate(isolate = row_number()) %>% 
@ -292,7 +292,7 @@ knitr::kable(head(data_1st), align = "c")
 Time for the analysis!

 # Analysing the data
-You might want to start by getting an idea of how the data is distributed. It's an important start, because it also decides how you will continue your analysis. 
+You might want to start by getting an idea of how the data is distributed. It's an important start, because it also decides how you will continue your analysis. Although this package contains a convenient function to make frequency tables, exploratory data analysis (EDA) is not the primary scope of this package. Use a package like [`DataExplorer`](https://cran.r-project.org/package=DataExplorer) for that, or read the free online book [Exploratory Data Analysis with R](https://bookdown.org/rdpeng/exdata/) by Roger D. Peng.

 ## Dispersion of species
 To just get an idea how the species are distributed, create a frequency table with our `freq()` function. We created the `genus` and `species` column earlier based on the microbial ID. With `paste()`, we can concatenate them together.
@ -318,7 +318,7 @@ data_1st %>%
 The functions `portion_S()`, `portion_SI()`, `portion_I()`, `portion_IR()` and `portion_R()` can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:

 ```{r}
-data_1st %>% portion_IR(amox)
+data_1st %>% portion_IR(AMX)
 ```

 Or can be used in conjuction with `group_by()` and `summarise()`, both from the `dplyr` package:
@ -326,12 +326,12 @@ Or can be used in conjuction with `group_by()` and `summarise()`, both from the
 ```{r, eval = FALSE}
 data_1st %>% 
  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox))
+  summarise(amoxicillin = portion_IR(AMX))
 ```
 ```{r, echo = FALSE}
 data_1st %>% 
  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox)) %>% 
+  summarise(amoxicillin = portion_IR(AMX)) %>% 
  knitr::kable(align = "c", big.mark = ",")
 ```

@ -340,14 +340,14 @@ Of course it would be very convenient to know the number of isolates responsible
 ```{r, eval = FALSE}
 data_1st %>% 
  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox),
-            available = n_rsi(amox))
+  summarise(amoxicillin = portion_IR(AMX),
+            available = n_rsi(AMX))
 ```
 ```{r, echo = FALSE}
 data_1st %>% 
  group_by(hospital) %>% 
-  summarise(amoxicillin = portion_IR(amox),
-            available = n_rsi(amox)) %>% 
+  summarise(amoxicillin = portion_IR(AMX),
+            available = n_rsi(AMX)) %>% 
  knitr::kable(align = "c", big.mark = ",")
 ```

@ -356,16 +356,16 @@ These functions can also be used to get the portion of multiple antibiotics, to
 ```{r, eval = FALSE}
 data_1st %>% 
  group_by(genus) %>% 
-  summarise(amoxiclav = portion_S(amcl),
-            gentamicin = portion_S(gent),
-            amoxiclav_genta = portion_S(amcl, gent))
+  summarise(amoxiclav = portion_S(AMC),
+            gentamicin = portion_S(GEN),
+            amoxiclav_genta = portion_S(AMC, GEN))
 ```
 ```{r, echo = FALSE}
 data_1st %>% 
  group_by(genus) %>% 
-  summarise(amoxiclav = portion_S(amcl),
-            gentamicin = portion_S(gent),
-            amoxiclav_genta = portion_S(amcl, gent)) %>% 
+  summarise(amoxiclav = portion_S(AMC),
+            gentamicin = portion_S(GEN),
+            amoxiclav_genta = portion_S(AMC, GEN)) %>% 
  knitr::kable(align = "c", big.mark = ",")
 ```

@ -374,9 +374,9 @@ To make a transition to the next part, let's see how this difference could be pl
 ```{r plot 1}
 data_1st %>% 
  group_by(genus) %>% 
-  summarise("1. Amoxi/clav" = portion_S(amcl),
-            "2. Gentamicin" = portion_S(gent),
-            "3. Amoxi/clav + gent" = portion_S(amcl, gent)) %>% 
+  summarise("1. Amoxi/clav" = portion_S(AMC),
+            "2. Gentamicin" = portion_S(GEN),
+            "3. Amoxi/clav + GEN" = portion_S(AMC, GEN)) %>% 
  tidyr::gather("Antibiotic", "S", -genus) %>%
  ggplot(aes(x = genus,
             y = S,
@ -409,7 +409,7 @@ ggplot(data_1st) +
  geom_rsi(translate_ab = FALSE)
 ```

-Omit the `translate_ab = FALSE` to have the antibiotic codes (amox, amcl, cipr, gent) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).
+Omit the `translate_ab = FALSE` to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).

 If we group on e.g. the `genus` column and add some additional functions from our package, we can create this:

@ -452,12 +452,12 @@ data_1st %>%

 The next example uses the included `septic_patients`, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This `data.frame` can be used to practice AMR analysis.

-We will compare the resistance to fosfomycin (column `fosf`) in hospital A and D. The input for the final `fisher.test()` will be this:
+We will compare the resistance to fosfomycin (column `FOS`) in hospital A and D. The input for the final `fisher.test()` will be this:

 ```{r, echo = FALSE, results = 'asis'}
 septic_patients %>%
  filter(hospital_id %in% c("A", "D")) %>%
-  select(hospital_id, fosf) %>%
+  select(hospital_id, FOS) %>%
  group_by(hospital_id) %>%
  count_df(combine_IR = TRUE) %>%
  tidyr::spread(hospital_id, Value) %>%
@ -472,7 +472,7 @@ We can transform the data and apply the test in only a couple of lines:
 ```{r}
 septic_patients %>%
  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
-  select(hospital_id, fosf) %>%            # select the hospitals and fosfomycin
+  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
  group_by(hospital_id) %>%                # group on the hospitals
  count_df(combine_IR = TRUE) %>%          # count all isolates per group (hospital_id)
  tidyr::spread(hospital_id, Value) %>%    # transform output so A and D are columns
--- a/vignettes/SPSS.Rmd
+++ b/vignettes/SPSS.Rmd
@ -64,7 +64,7 @@ As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major
  If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS - gaining even more functions and flexibility. And all R enthousiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated by R. It is really free.


-If you sometimes write syntaxes in SPSS to run a complete analysis or to 'automate' some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.
+If you sometimes write syntaxes in SPSS to run a complete analysis or to 'automate' some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as working with any statistical package, you will have to have knowledge about what you are doing (statistically) and what you are willing to accomplish.

 To demonstrate the first point:

@ -83,10 +83,10 @@ klebsiella_test <- data.frame(mo = "klebsiella",
 klebsiella_test
 eucast_rules(klebsiella_test, info = FALSE)

-# hundreds of trade names can be translated to an ATC or name:
-atc_name("floxapen")
-as.atc("floxapen")
-atc_tradenames("floxapen")
+# hundreds of trade names can be translated to a name, trade name or an ATC code:
+ab_name("floxapen")
+ab_tradenames("floxapen")
+ab_atc("floxapen")
 ```

 ## Import data from SPSS/SAS/Stata
--- a/vignettes/atc_property.Rmd
+++ b/vignettes/atc_property.Rmd
--- a/vignettes/benchmarks.Rmd
+++ b/vignettes/benchmarks.Rmd
@ -81,7 +81,7 @@ boxplot(microbenchmark(as.mo("Thermus islandicus"),
                       as.mo("E. coli"),
                       times = 10),
        horizontal = TRUE, las = 1, unit = "s", log = FALSE,
-        xlab = "", ylab = "Time in seconds", ylim = c(0, 0.5),
+        xlab = "", ylab = "Time in seconds",
        main = "Benchmarks per prevalence")
 ```

--- a/vignettes/resistance_predict.Rmd
+++ b/vignettes/resistance_predict.Rmd
@ -41,16 +41,16 @@ Our package contains a function `resistance_predict()`, which takes the same inp

 It is basically as easy as:
 ```{r, eval = FALSE}
-# resistance prediction of piperacillin/tazobactam (pita):
-resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "pita")
+# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")

 # or:
 septic_patients %>% 
-  resistance_predict(col_ab = "pita")
+  resistance_predict(col_ab = "TZP")

-# to bind it to object 'predict_pita' for example:
-predict_pita <- septic_patients %>% 
-  resistance_predict(col_ab = "pita")
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- septic_patients %>% 
+  resistance_predict(col_ab = "TZP")
 ```

 The function will look for a date column itself if `col_date` is not set.
@ -58,20 +58,20 @@ The function will look for a date column itself if `col_date` is not set.
 When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.

 ```{r, echo = FALSE}
-predict_pita <- septic_patients %>% 
-  resistance_predict(col_ab = "pita")
+predict_TZP <- septic_patients %>% 
+  resistance_predict(col_ab = "TZP")
 ```

 This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:

 ```{r}
-predict_pita
+predict_TZP
 ```

 The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:

 ```{r, fig.height = 5.5}
-plot(predict_pita)
+plot(predict_TZP)
 ```

 This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
@ -79,10 +79,10 @@ This is the fastest way to plot the result. It automatically adds the right axes
 We also support the `ggplot2` package with our custom function `ggplot_rsi_predict()` to create more appealing plots:

 ```{r}
-ggplot_rsi_predict(predict_pita)
+ggplot_rsi_predict(predict_TZP)

 # choose for error bars instead of a ribbon
-ggplot_rsi_predict(predict_pita, ribbon = FALSE)
+ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
 ```

 ### Choosing the right model
@ -92,7 +92,7 @@ Resistance is not easily predicted; if we look at vancomycin resistance in Gram
 ```{r}
 septic_patients %>%
  filter(mo_gramstain(mo) == "Gram positive") %>%
-  resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE) %>% 
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>% 
  ggplot_rsi_predict()
 ```

@ -113,7 +113,7 @@ For the vancomycin resistance in Gram positive bacteria, a linear model might be
 ```{r}
 septic_patients %>%
  filter(mo_gramstain(mo) == "Gram positive") %>%
-  resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE, model = "linear") %>% 
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>% 
  ggplot_rsi_predict()
 ```

@ -121,7 +121,7 @@ This seems more likely, doesn't it?

 The model itself is also available from the object, as an `attribute`:
 ```{r}
-model <- attributes(predict_pita)$model
+model <- attributes(predict_TZP)$model

 summary(model)$family