1
0
mirror of https://github.com/msberends/AMR.git synced 2025-07-08 12:31:58 +02:00

new antibiotics

This commit is contained in:
2019-05-10 16:44:59 +02:00
parent 73f1ee1159
commit 68cc7ef0d0
147 changed files with 6228 additions and 4187 deletions

View File

@ -35,8 +35,8 @@ You can skip to [Cleaning the data](#cleaning-the-data) if you already have your
knitr::kable(dplyr::tibble(date = Sys.Date(),
patient_id = c("abcd", "abcd", "efgh"),
mo = "Escherichia coli",
amox = c("S", "S", "R"),
cipr = c("S", "R", "S")),
AMX = c("S", "S", "R"),
CIP = c("S", "R", "S")),
align = "c")
```
@ -113,13 +113,13 @@ data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
prob = c(0.30, 0.35, 0.15, 0.20)),
bacteria = sample(bacteria, size = sample_size, replace = TRUE,
prob = c(0.50, 0.25, 0.15, 0.10)),
amox = sample(ab_interpretations, size = sample_size, replace = TRUE,
AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
prob = c(0.60, 0.05, 0.35)),
amcl = sample(ab_interpretations, size = sample_size, replace = TRUE,
AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
prob = c(0.75, 0.10, 0.15)),
cipr = sample(ab_interpretations, size = sample_size, replace = TRUE,
CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
prob = c(0.80, 0.00, 0.20)),
gent = sample(ab_interpretations, size = sample_size, replace = TRUE,
GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
prob = c(0.92, 0.00, 0.08))
)
```
@ -166,12 +166,12 @@ We also want to transform the antibiotics, because in real life data we don't kn
```{r transform abx}
data <- data %>%
mutate_at(vars(amox:gent), as.rsi)
mutate_at(vars(AMX:GEN), as.rsi)
```
Finally, we will apply [EUCAST rules](http://www.eucast.org/expert_rules_and_intrinsic_resistance/) on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the `eucast_rules()` function can also apply additional rules, like forcing <help title="ATC: J01CA01">ampicillin</help> = R when <help title="ATC: J01CR02">amoxicillin/clavulanic acid</help> = R.
Because the amoxicillin (column `amox`) and amoxicillin/clavulanic acid (column `amcl`) in our data were generated randomly, some rows will undoubtedly contain amox = S and amcl = R, which is technically impossible. The `eucast_rules()` fixes this:
Because the amoxicillin (column `AMX`) and amoxicillin/clavulanic acid (column `AMC`) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The `eucast_rules()` fixes this:
```{r eucast, warning = FALSE, message = FALSE}
data <- eucast_rules(data, col_mo = "bacteria")
@ -226,7 +226,7 @@ weighted_df <- data %>%
# only most prevalent patient
filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>%
arrange(date) %>%
select(date, patient_id, bacteria, amox:gent, first) %>%
select(date, patient_id, bacteria, AMX:GEN, first) %>%
# maximum of 10 rows
.[1:min(10, nrow(.)),] %>%
mutate(isolate = row_number()) %>%
@ -240,7 +240,7 @@ Only `r sum(weighted_df$first)` isolates are marked as 'first' according to CLSI
If a column exists with a name like 'key(...)ab' the `first_isolate()` function will automatically use it and determine the first weighted isolates. Mind the NOTEs in below output:
```{r 1st weighted}
```{r 1st weighted, warning = FALSE}
data <- data %>%
mutate(keyab = key_antibiotics(.)) %>%
mutate(first_weighted = first_isolate(.))
@ -252,7 +252,7 @@ weighted_df2 <- data %>%
# only most prevalent patient
filter(patient_id == top_freq(freq(., patient_id), 1)[1]) %>%
arrange(date) %>%
select(date, patient_id, bacteria, amox:gent, first, first_weighted) %>%
select(date, patient_id, bacteria, AMX:GEN, first, first_weighted) %>%
# maximum of 10 rows
.[1:min(10, nrow(.)),] %>%
mutate(isolate = row_number()) %>%
@ -292,7 +292,7 @@ knitr::kable(head(data_1st), align = "c")
Time for the analysis!
# Analysing the data
You might want to start by getting an idea of how the data is distributed. It's an important start, because it also decides how you will continue your analysis.
You might want to start by getting an idea of how the data is distributed. It's an important start, because it also decides how you will continue your analysis. Although this package contains a convenient function to make frequency tables, exploratory data analysis (EDA) is not the primary scope of this package. Use a package like [`DataExplorer`](https://cran.r-project.org/package=DataExplorer) for that, or read the free online book [Exploratory Data Analysis with R](https://bookdown.org/rdpeng/exdata/) by Roger D. Peng.
## Dispersion of species
To just get an idea how the species are distributed, create a frequency table with our `freq()` function. We created the `genus` and `species` column earlier based on the microbial ID. With `paste()`, we can concatenate them together.
@ -318,7 +318,7 @@ data_1st %>%
The functions `portion_S()`, `portion_SI()`, `portion_I()`, `portion_IR()` and `portion_R()` can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
```{r}
data_1st %>% portion_IR(amox)
data_1st %>% portion_IR(AMX)
```
Or can be used in conjuction with `group_by()` and `summarise()`, both from the `dplyr` package:
@ -326,12 +326,12 @@ Or can be used in conjuction with `group_by()` and `summarise()`, both from the
```{r, eval = FALSE}
data_1st %>%
group_by(hospital) %>%
summarise(amoxicillin = portion_IR(amox))
summarise(amoxicillin = portion_IR(AMX))
```
```{r, echo = FALSE}
data_1st %>%
group_by(hospital) %>%
summarise(amoxicillin = portion_IR(amox)) %>%
summarise(amoxicillin = portion_IR(AMX)) %>%
knitr::kable(align = "c", big.mark = ",")
```
@ -340,14 +340,14 @@ Of course it would be very convenient to know the number of isolates responsible
```{r, eval = FALSE}
data_1st %>%
group_by(hospital) %>%
summarise(amoxicillin = portion_IR(amox),
available = n_rsi(amox))
summarise(amoxicillin = portion_IR(AMX),
available = n_rsi(AMX))
```
```{r, echo = FALSE}
data_1st %>%
group_by(hospital) %>%
summarise(amoxicillin = portion_IR(amox),
available = n_rsi(amox)) %>%
summarise(amoxicillin = portion_IR(AMX),
available = n_rsi(AMX)) %>%
knitr::kable(align = "c", big.mark = ",")
```
@ -356,16 +356,16 @@ These functions can also be used to get the portion of multiple antibiotics, to
```{r, eval = FALSE}
data_1st %>%
group_by(genus) %>%
summarise(amoxiclav = portion_S(amcl),
gentamicin = portion_S(gent),
amoxiclav_genta = portion_S(amcl, gent))
summarise(amoxiclav = portion_S(AMC),
gentamicin = portion_S(GEN),
amoxiclav_genta = portion_S(AMC, GEN))
```
```{r, echo = FALSE}
data_1st %>%
group_by(genus) %>%
summarise(amoxiclav = portion_S(amcl),
gentamicin = portion_S(gent),
amoxiclav_genta = portion_S(amcl, gent)) %>%
summarise(amoxiclav = portion_S(AMC),
gentamicin = portion_S(GEN),
amoxiclav_genta = portion_S(AMC, GEN)) %>%
knitr::kable(align = "c", big.mark = ",")
```
@ -374,9 +374,9 @@ To make a transition to the next part, let's see how this difference could be pl
```{r plot 1}
data_1st %>%
group_by(genus) %>%
summarise("1. Amoxi/clav" = portion_S(amcl),
"2. Gentamicin" = portion_S(gent),
"3. Amoxi/clav + gent" = portion_S(amcl, gent)) %>%
summarise("1. Amoxi/clav" = portion_S(AMC),
"2. Gentamicin" = portion_S(GEN),
"3. Amoxi/clav + GEN" = portion_S(AMC, GEN)) %>%
tidyr::gather("Antibiotic", "S", -genus) %>%
ggplot(aes(x = genus,
y = S,
@ -409,7 +409,7 @@ ggplot(data_1st) +
geom_rsi(translate_ab = FALSE)
```
Omit the `translate_ab = FALSE` to have the antibiotic codes (amox, amcl, cipr, gent) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).
Omit the `translate_ab = FALSE` to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin and betalactamase inhibitor, ciprofloxacin, gentamicin).
If we group on e.g. the `genus` column and add some additional functions from our package, we can create this:
@ -452,12 +452,12 @@ data_1st %>%
The next example uses the included `septic_patients`, which is an anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This `data.frame` can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column `fosf`) in hospital A and D. The input for the final `fisher.test()` will be this:
We will compare the resistance to fosfomycin (column `FOS`) in hospital A and D. The input for the final `fisher.test()` will be this:
```{r, echo = FALSE, results = 'asis'}
septic_patients %>%
filter(hospital_id %in% c("A", "D")) %>%
select(hospital_id, fosf) %>%
select(hospital_id, FOS) %>%
group_by(hospital_id) %>%
count_df(combine_IR = TRUE) %>%
tidyr::spread(hospital_id, Value) %>%
@ -472,7 +472,7 @@ We can transform the data and apply the test in only a couple of lines:
```{r}
septic_patients %>%
filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
select(hospital_id, fosf) %>% # select the hospitals and fosfomycin
select(hospital_id, FOS) %>% # select the hospitals and fosfomycin
group_by(hospital_id) %>% # group on the hospitals
count_df(combine_IR = TRUE) %>% # count all isolates per group (hospital_id)
tidyr::spread(hospital_id, Value) %>% # transform output so A and D are columns

View File

@ -64,7 +64,7 @@ As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major
If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS - gaining even more functions and flexibility. And all R enthousiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated by R. It is really free.
If you sometimes write syntaxes in SPSS to run a complete analysis or to 'automate' some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.
If you sometimes write syntaxes in SPSS to run a complete analysis or to 'automate' some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as working with any statistical package, you will have to have knowledge about what you are doing (statistically) and what you are willing to accomplish.
To demonstrate the first point:
@ -83,10 +83,10 @@ klebsiella_test <- data.frame(mo = "klebsiella",
klebsiella_test
eucast_rules(klebsiella_test, info = FALSE)
# hundreds of trade names can be translated to an ATC or name:
atc_name("floxapen")
as.atc("floxapen")
atc_tradenames("floxapen")
# hundreds of trade names can be translated to a name, trade name or an ATC code:
ab_name("floxapen")
ab_tradenames("floxapen")
ab_atc("floxapen")
```
## Import data from SPSS/SAS/Stata

View File

@ -81,7 +81,7 @@ boxplot(microbenchmark(as.mo("Thermus islandicus"),
as.mo("E. coli"),
times = 10),
horizontal = TRUE, las = 1, unit = "s", log = FALSE,
xlab = "", ylab = "Time in seconds", ylim = c(0, 0.5),
xlab = "", ylab = "Time in seconds",
main = "Benchmarks per prevalence")
```

View File

@ -41,16 +41,16 @@ Our package contains a function `resistance_predict()`, which takes the same inp
It is basically as easy as:
```{r, eval = FALSE}
# resistance prediction of piperacillin/tazobactam (pita):
resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "pita")
# resistance prediction of piperacillin/tazobactam (TZP):
resistance_predict(tbl = septic_patients, col_date = "date", col_ab = "TZP")
# or:
septic_patients %>%
resistance_predict(col_ab = "pita")
resistance_predict(col_ab = "TZP")
# to bind it to object 'predict_pita' for example:
predict_pita <- septic_patients %>%
resistance_predict(col_ab = "pita")
# to bind it to object 'predict_TZP' for example:
predict_TZP <- septic_patients %>%
resistance_predict(col_ab = "TZP")
```
The function will look for a date column itself if `col_date` is not set.
@ -58,20 +58,20 @@ The function will look for a date column itself if `col_date` is not set.
When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.
```{r, echo = FALSE}
predict_pita <- septic_patients %>%
resistance_predict(col_ab = "pita")
predict_TZP <- septic_patients %>%
resistance_predict(col_ab = "TZP")
```
This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
```{r}
predict_pita
predict_TZP
```
The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
```{r, fig.height = 5.5}
plot(predict_pita)
plot(predict_TZP)
```
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
@ -79,10 +79,10 @@ This is the fastest way to plot the result. It automatically adds the right axes
We also support the `ggplot2` package with our custom function `ggplot_rsi_predict()` to create more appealing plots:
```{r}
ggplot_rsi_predict(predict_pita)
ggplot_rsi_predict(predict_TZP)
# choose for error bars instead of a ribbon
ggplot_rsi_predict(predict_pita, ribbon = FALSE)
ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
```
### Choosing the right model
@ -92,7 +92,7 @@ Resistance is not easily predicted; if we look at vancomycin resistance in Gram
```{r}
septic_patients %>%
filter(mo_gramstain(mo) == "Gram positive") %>%
resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE) %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE) %>%
ggplot_rsi_predict()
```
@ -113,7 +113,7 @@ For the vancomycin resistance in Gram positive bacteria, a linear model might be
```{r}
septic_patients %>%
filter(mo_gramstain(mo) == "Gram positive") %>%
resistance_predict(col_ab = "vanc", year_min = 2010, info = FALSE, model = "linear") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
ggplot_rsi_predict()
```
@ -121,7 +121,7 @@ This seems more likely, doesn't it?
The model itself is also available from the object, as an `attribute`:
```{r}
model <- attributes(predict_pita)$model
model <- attributes(predict_TZP)$model
summary(model)$family