mirror of https://github.com/msberends/AMR.git synced 2025-07-11 11:41:54 +02:00

Replace RSI with SIR

This commit is contained in:
Dr. Matthijs Berends
2023-01-21 23:47:20 +01:00
committed by GitHub
parent 24b12024ce
commit 98e62c9af2
127 changed files with 1746 additions and 1648 deletions

View File

@ -33,7 +33,7 @@ Conducting AMR data analysis unfortunately requires in-depth knowledge from diff
* Good questions (always start with those!)
* A thorough understanding of (clinical) epidemiology, to understand the clinical and epidemiological relevance and possible bias of results
* A thorough understanding of (clinical) microbiology/infectious diseases, to understand which microorganisms are causal to which infections and the implications of pharmaceutical treatment, as well as understanding intrinsic and acquired microbial resistance
* Experience with data analysis with microbiological tests and their results, to understand the determination and limitations of MIC values and their interpretations to RSI values
* Experience with data analysis with microbiological tests and their results, to understand the determination and limitations of MIC values and their interpretations to SIR values
* Availability of the biological taxonomy of microorganisms and probably normalisation factors for pharmaceuticals, such as defined daily doses (DDD)
* Available (inter-)national guidelines, and profound methods to apply them
@ -122,7 +122,7 @@ bacteria <- c(
## Put everything together
Using the `sample()` function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results, using the `random_rsi()` function.
Using the `sample()` function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results, using the `random_sir()` function.
```{r merge data}
sample_size <- 20000
@ -142,10 +142,10 @@ data <- data.frame(
size = sample_size, replace = TRUE,
prob = c(0.50, 0.25, 0.15, 0.10)
),
AMX = random_rsi(sample_size, prob_RSI = c(0.35, 0.60, 0.05)),
AMC = random_rsi(sample_size, prob_RSI = c(0.15, 0.75, 0.10)),
CIP = random_rsi(sample_size, prob_RSI = c(0.20, 0.80, 0.00)),
GEN = random_rsi(sample_size, prob_RSI = c(0.08, 0.92, 0.00))
AMX = random_sir(sample_size, prob_sir = c(0.35, 0.60, 0.05)),
AMC = random_sir(sample_size, prob_sir = c(0.15, 0.75, 0.10)),
CIP = random_sir(sample_size, prob_sir = c(0.20, 0.80, 0.00)),
GEN = random_sir(sample_size, prob_sir = c(0.08, 0.92, 0.00))
)
```
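To make the idea of weighted random S/I/R results concrete, here is a minimal base-R sketch. The helper `draw_sir()` and its probabilities are assumptions made for this illustration; it is not the package's `random_sir()` and does not return the package's own class.

```r
# draw_sir() is a hypothetical helper for illustration only;
# it is not part of the AMR package.
draw_sir <- function(size, prob = c(S = 0.60, I = 0.05, R = 0.35)) {
  # the probabilities must add up to 1
  stopifnot(abs(sum(prob) - 1) < 1e-8)
  sample(names(prob), size = size, replace = TRUE, prob = prob)
}

set.seed(123) # make the draw reproducible
amx <- draw_sir(20000)

# the observed shares should be close to the requested probabilities
round(prop.table(table(amx)), 2)
```

Naming the probabilities (`S`, `I`, `R`) avoids any ambiguity about the order in which they are given.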
@ -186,14 +186,14 @@ data <- data %>%
mutate(bacteria = as.mo(bacteria))
```
We also want to transform the antibiotics, because in real life data we don't know if they are really clean. The `as.rsi()` function ensures reliability and reproducibility in these kind of variables. The `is.rsi.eligible()` can check which columns are probably columns with R/SI test results. Using `mutate()` and `across()`, we can apply the transformation to the formal `<rsi>` class:
We also want to transform the antibiotics, because in real-life data we don't know if they are really clean. The `as.sir()` function ensures reliability and reproducibility in these kinds of variables. The `is_sir_eligible()` function can check which columns probably contain SIR test results. Using `mutate()` and `across()`, we can apply the transformation to the formal `<sir>` class:
```{r transform abx}
is.rsi.eligible(data)
colnames(data)[is.rsi.eligible(data)]
is_sir_eligible(data)
colnames(data)[is_sir_eligible(data)]
data <- data %>%
mutate(across(where(is.rsi.eligible), as.rsi))
mutate(across(where(is_sir_eligible), as.sir))
```
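As a rough idea of what cleaning such a column involves, the base-R sketch below normalises messy input to an ordered S/I/R factor. `clean_sir()` is a hypothetical helper written for this example; the package's own conversion function covers far more input variants than this.

```r
# clean_sir() is a hypothetical helper for illustration only;
# it only handles case and surrounding whitespace.
clean_sir <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # anything that is not exactly S, I or R becomes missing
  x[!x %in% c("S", "I", "R")] <- NA_character_
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

raw <- c("s", " R", "I ", "resistant", NA, "S")
cleaned <- clean_sir(raw)
cleaned
```

Values that cannot be recognised are set to `NA` rather than guessed, so they stay visible in later counts of available results.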
Finally, we will apply [EUCAST rules](https://www.eucast.org/expert_rules_and_intrinsic_resistance/) on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the `eucast_rules()` function can also apply additional rules, like forcing <help title="ATC: J01CA01">ampicillin</help> = R when <help title="ATC: J01CR02">amoxicillin/clavulanic acid</help> = R.
@ -360,14 +360,14 @@ data_1st %>%
knitr::kable(align = "c", big.mark = ",")
```
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the `n_rsi()` can be used, which works exactly like `n_distinct()` from the `dplyr` package. It counts all isolates available for every group (i.e. values S, I or R):
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the `n_sir()` function can be used, which works exactly like `n_distinct()` from the `dplyr` package. It counts all isolates available for every group (i.e. values S, I or R):
```{r, eval = FALSE}
data_1st %>%
group_by(hospital) %>%
summarise(
amoxicillin = resistance(AMX),
available = n_rsi(AMX)
available = n_sir(AMX)
)
```
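The relation between the percentage and the number of available isolates can be sketched in base R. The toy data and column names below are invented for this illustration, and the calculation ignores any minimum-count safeguards the package functions may apply.

```r
# Toy data, invented for illustration only
isolates <- data.frame(
  hospital = c("A", "A", "A", "B", "B", "B"),
  AMX = c("R", "S", NA, "R", "R", "I"),
  stringsAsFactors = FALSE
)

# per hospital: share of R among available results, and the number available
summary_per_hospital <- do.call(rbind, lapply(
  split(isolates, isolates$hospital),
  function(d) {
    # only S, I and R count as available; NA is excluded
    available <- sum(d$AMX %in% c("S", "I", "R"))
    data.frame(
      hospital = d$hospital[1],
      amoxicillin = sum(d$AMX == "R", na.rm = TRUE) / available,
      available = available
    )
  }
))
summary_per_hospital
```

Reporting the denominator next to the proportion shows at a glance whether a percentage rests on a handful of isolates or on thousands.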
```{r, echo = FALSE}
@ -375,7 +375,7 @@ data_1st %>%
group_by(hospital) %>%
summarise(
amoxicillin = resistance(AMX),
available = n_rsi(AMX)
available = n_sir(AMX)
) %>%
knitr::kable(align = "c", big.mark = ",")
```
@ -469,11 +469,11 @@ ggplot(a_data_set) +
geom_bar(aes(year))
```
The `AMR` package contains functions to extend this `ggplot2` package, for example `geom_rsi()`. It automatically transforms data with `count_df()` or `proportion_df()` and show results in stacked bars. Its simplest and shortest example:
The `AMR` package contains functions to extend this `ggplot2` package, for example `geom_sir()`. It automatically transforms data with `count_df()` or `proportion_df()` and shows the results in stacked bars. Its simplest and shortest example:
```{r plot 3}
ggplot(data_1st) +
geom_rsi(translate_ab = FALSE)
geom_sir(translate_ab = FALSE)
```
Omit the `translate_ab = FALSE` to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
@ -484,13 +484,13 @@ If we group on e.g. the `genus` column and add some additional functions from ou
# group the data on `genus`
ggplot(data_1st %>% group_by(genus)) +
# create bars with genus on x axis
# it looks for variables with class `rsi`,
# of which we have 4 (earlier created with `as.rsi`)
geom_rsi(x = "genus") +
# it looks for variables with class `sir`,
# of which we have 4 (earlier created with `as.sir`)
geom_sir(x = "genus") +
# split plots on antibiotic
facet_rsi(facet = "antibiotic") +
# set colours to the R/SI interpretations (colour-blind friendly)
scale_rsi_colours() +
facet_sir(facet = "antibiotic") +
# set colours to the SIR interpretations (colour-blind friendly)
scale_sir_colours() +
# show percentages on y axis
scale_y_percent(breaks = 0:4 * 25) +
# turn 90 degrees, to make it bars instead of columns
@ -505,12 +505,12 @@ ggplot(data_1st %>% group_by(genus)) +
theme(axis.text.y = element_text(face = "italic"))
```
To simplify this, we also created the `ggplot_rsi()` function, which combines almost all above functions:
To simplify this, we also created the `ggplot_sir()` function, which combines almost all of the functions above:
```{r plot 5}
data_1st %>%
group_by(genus) %>%
ggplot_rsi(
ggplot_sir(
x = "genus",
facet = "antibiotic",
breaks = 0:4 * 25,

View File

@ -111,16 +111,16 @@ example_isolates %>%
For another example, I will create a data set to determine multi-drug resistant TB:
```{r}
# random_rsi() is a helper function to generate
# random_sir() is a helper function to generate
# a random vector with values S, I and R
my_TB_data <- data.frame(
rifampicin = random_rsi(5000),
isoniazid = random_rsi(5000),
gatifloxacin = random_rsi(5000),
ethambutol = random_rsi(5000),
pyrazinamide = random_rsi(5000),
moxifloxacin = random_rsi(5000),
kanamycin = random_rsi(5000)
rifampicin = random_sir(5000),
isoniazid = random_sir(5000),
gatifloxacin = random_sir(5000),
ethambutol = random_sir(5000),
pyrazinamide = random_sir(5000),
moxifloxacin = random_sir(5000),
kanamycin = random_sir(5000)
)
```
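For orientation, the core criterion for multi-drug resistant TB is resistance to at least rifampicin and isoniazid. The base-R sketch below flags only that core criterion on invented toy data; it is an assumption-laden simplification and does not reproduce the full guideline categories that the package can determine.

```r
# Toy data, invented for illustration only
tb <- data.frame(
  rifampicin = c("R", "R", "S", "S", "R"),
  isoniazid  = c("R", "S", "R", "S", "R"),
  stringsAsFactors = FALSE
)

# flag isolates resistant to both rifampicin and isoniazid
tb$mdr <- tb$rifampicin == "R" & tb$isoniazid == "R"
table(tb$mdr)
```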
@ -128,13 +128,13 @@ Because all column names are automatically verified for valid drug names or code
```{r, eval = FALSE}
my_TB_data <- data.frame(
RIF = random_rsi(5000),
INH = random_rsi(5000),
GAT = random_rsi(5000),
ETH = random_rsi(5000),
PZA = random_rsi(5000),
MFX = random_rsi(5000),
KAN = random_rsi(5000)
RIF = random_sir(5000),
INH = random_sir(5000),
GAT = random_sir(5000),
ETH = random_sir(5000),
PZA = random_sir(5000),
MFX = random_sir(5000),
KAN = random_sir(5000)
)
```

View File

@ -44,7 +44,7 @@ resistance_data <- example_isolates %>%
order = mo_order(mo), # group on anything, like order
genus = mo_genus(mo)
) %>% # and genus as we do here
summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
summarise_if(is.sir, resistance) %>% # then get resistance of all drugs
select(
order, genus, AMC, CXM, CTX,
CAZ, GEN, TOB, TMP, SXT

View File

@ -48,15 +48,15 @@ library(cleaner) # to create frequency tables
We will have to transform some variables to simplify and automate the analysis:
* Microorganisms should be transformed to our own microorganism codes (called an `mo`) using [our Catalogue of Life reference data set](https://msberends.github.io/AMR/reference/catalogue_of_life), which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with `as.mo()`. This function also recognises almost all WHONET abbreviations of microorganisms.
* Antimicrobial results or interpretations have to be clean and valid. In other words, they should only contain values `"S"`, `"I"` or `"R"`. That is exactly where the `as.rsi()` function is for.
* Antimicrobial results or interpretations have to be clean and valid. In other words, they should only contain values `"S"`, `"I"` or `"R"`. That is exactly what the `as.sir()` function is for.
```{r}
# transform variables
data <- WHONET %>%
# get microbial ID based on given organism
mutate(mo = as.mo(Organism)) %>%
# transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
# transform everything from "AMP_ND10" to "CIP_EE" to the new `sir` class
mutate_at(vars(AMP_ND10:CIP_EE), as.sir)
```
No errors or warnings, so all values are transformed successfully.
@ -77,13 +77,13 @@ data %>% freq(AMC_ND2)
### A first glimpse at results
An easy `ggplot` will already give a lot of information, using the included `ggplot_rsi()` function:
An easy `ggplot` will already give a lot of information, using the included `ggplot_sir()` function:
```{r, eval = FALSE}
data %>%
group_by(Country) %>%
select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
ggplot_rsi(translate_ab = "ab", facet = "Country", datalabels = FALSE)
ggplot_sir(translate_ab = "ab", facet = "Country", datalabels = FALSE)
```
```{r, echo = FALSE}
@ -91,7 +91,7 @@ data %>%
tryCatch(data %>%
group_by(Country) %>%
select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
ggplot_rsi(translate_ab = "ab", facet = "Country", datalabels = FALSE) %>%
ggplot_sir(translate_ab = "ab", facet = "Country", datalabels = FALSE) %>%
print(),
error = function(e) base::invisible()
)

View File

@ -111,7 +111,7 @@ print_df <- function(x, rows = 6) {
}
```
All reference data (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this `AMR` package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated text files that are machine-readable and suitable for input in any software program, such as laboratory information systems.
All reference data (about microorganisms, antibiotics, SIR interpretation, EUCAST rules, etc.) in this `AMR` package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, MS Excel, Apache Feather, Apache Parquet, SPSS, SAS, and Stata. We also provide tab-separated text files that are machine-readable and suitable for input in any software program, such as laboratory information systems.
On this page, we explain how to download them and what the structure of the data sets looks like.
@ -209,22 +209,22 @@ antivirals %>%
print_df()
```
## `rsi_translation`: Interpretation from MIC values / disk diameters to R/SI
## `clinical_breakpoints`: Interpretation from MIC values & disk diameters to SIR
`r structure_txt(rsi_translation)`
`r structure_txt(clinical_breakpoints)`
This data set is in R available as `rsi_translation`, after you load the `AMR` package.
This data set is in R available as `clinical_breakpoints`, after you load the `AMR` package.
`r download_txt("rsi_translation")`
`r download_txt("clinical_breakpoints")`
### Source
This data set contains interpretation rules for MIC values and disk diffusion diameters. Included guidelines are CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "CLSI")$guideline)))`) and EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(rsi_translation, guideline %like% "EUCAST")$guideline)))`).
This data set contains interpretation rules for MIC values and disk diffusion diameters. Included guidelines are CLSI (`r min(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "CLSI")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "CLSI")$guideline)))`) and EUCAST (`r min(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`-`r max(as.integer(gsub("[^0-9]", "", subset(clinical_breakpoints, guideline %like% "EUCAST")$guideline)))`).
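To illustrate how such an interpretation rule works in principle, the base-R sketch below maps MIC values to S, I or R using two breakpoints. The helper `interpret_mic()`, the "S at or below, R above" convention and the breakpoint values are all assumptions made for this example; real breakpoints depend on guideline, microorganism and drug.

```r
# interpret_mic() and the breakpoint values are hypothetical,
# chosen only to illustrate the principle.
interpret_mic <- function(mic, breakpoint_S, breakpoint_R) {
  out <- ifelse(mic <= breakpoint_S, "S",
    ifelse(mic > breakpoint_R, "R", "I")
  )
  factor(out, levels = c("S", "I", "R"))
}

# MIC values in mg/L, interpreted with S <= 2 and R > 8
interpreted <- interpret_mic(c(0.25, 4, 16), breakpoint_S = 2, breakpoint_R = 8)
interpreted
```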
### Example content
```{r, echo = FALSE}
rsi_translation %>%
clinical_breakpoints %>%
mutate(mo_name = mo_name(mo, language = NULL), .after = mo) %>%
mutate(ab_name = ab_name(ab, language = NULL), .after = ab) %>%
print_df()

View File

@ -80,13 +80,13 @@ plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the `ggplot2` package with our custom function `ggplot_rsi_predict()` to create more appealing plots:
We also support the `ggplot2` package with our custom function `ggplot_sir_predict()` to create more appealing plots:
```{r}
ggplot_rsi_predict(predict_TZP)
ggplot_sir_predict(predict_TZP)
# choose error bars instead of a ribbon
ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
ggplot_sir_predict(predict_TZP, ribbon = FALSE)
```
### Choosing the right model
@ -97,7 +97,7 @@ Resistance is not easily predicted; if we look at vancomycin resistance in Gram-
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
ggplot_rsi_predict()
ggplot_sir_predict()
```
Vancomycin resistance could be 100% in ten years, but might remain very low.
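As a rough illustration of how a binomial model extrapolates resistance over time, the base-R sketch below fits `glm()` to invented yearly counts and computes an approximate 95% interval on the logit scale. The data, the column names and the interval method are assumptions for this example and are not necessarily what `resistance_predict()` does internally.

```r
# Invented yearly counts of resistant (R) and susceptible (S) isolates
yearly <- data.frame(
  year = 2010:2017,
  R = c(2, 3, 3, 5, 6, 8, 9, 12),
  S = c(98, 97, 97, 95, 94, 92, 91, 88)
)

# binomial (logistic) regression of resistance on year
fit <- glm(cbind(R, S) ~ year, family = binomial, data = yearly)

# predict ten years ahead on the logit scale, then back-transform
future <- data.frame(year = 2018:2027)
pred <- predict(fit, newdata = future, type = "link", se.fit = TRUE)
future$estimate <- plogis(pred$fit)
future$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
future$upper <- plogis(pred$fit + 1.96 * pred$se.fit)
future
```

The interval widens the further the prediction moves away from the observed years, which is why a forecast can span anything from very low to very high resistance.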
@ -118,7 +118,7 @@ For the vancomycin resistance in Gram-positive bacteria, a linear model might be
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
ggplot_rsi_predict()
ggplot_sir_predict()
```
The model itself is also available from the object, as an `attribute`:

View File

@ -30,7 +30,7 @@ The `AMR` package is a [free and open-source](https://msberends.github.io/AMR/#c
This work was published in the Journal of Statistical Software (Volume 104(3); [DOI 10.18637/jss.v104.i03](https://doi.org/10.18637/jss.v104.i03)) and formed the basis of two PhD theses ([DOI 10.33612/diss.177417131](https://doi.org/10.33612/diss.177417131) and [DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)).
After installing this package, R knows `r AMR:::format_included_data_number(AMR::microorganisms)` distinct microbial species and all `r AMR:::format_included_data_number(rbind(AMR::antibiotics[, "atc", drop = FALSE], AMR::antivirals[, "atc", drop = FALSE]))` antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data.
After installing this package, R knows `r AMR:::format_included_data_number(AMR::microorganisms)` distinct microbial species and all `r AMR:::format_included_data_number(rbind(AMR::antibiotics[, "atc", drop = FALSE], AMR::antivirals[, "atc", drop = FALSE]))` antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral breakpoint guidelines from CLSI and EUCAST are included from the last 10 years. It supports and can read any data format, including WHONET data.
The `AMR` package is available in English, Chinese, Danish, Dutch, French, German, Greek, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish, Turkish and Ukrainian. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
@ -52,10 +52,10 @@ This package can be used for:
* Applying EUCAST expert rules
* Getting SNOMED codes of a microorganism, or getting properties of a microorganism based on a SNOMED code
* Getting LOINC codes of an antibiotic, or getting properties of an antibiotic based on a LOINC code
* Machine reading the EUCAST and CLSI guidelines from 2011-2020 to translate MIC values and disk diffusion diameters to R/SI
* Machine reading the EUCAST and CLSI guidelines from 2011-2020 to translate MIC values and disk diffusion diameters to SIR
* Principal component analysis for AMR
All reference data sets (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this `AMR` package are publicly and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply flat files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please find [all download links on our website](https://msberends.github.io/AMR/articles/datasets.html), which is automatically updated with every code change.
All reference data sets (about microorganisms, antibiotics, SIR interpretation, EUCAST rules, etc.) in this `AMR` package are publicly and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply flat files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please find [all download links on our website](https://msberends.github.io/AMR/articles/datasets.html), which is automatically updated with every code change.
This R package was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the [University of Groningen](https://www.rug.nl), in collaboration with non-profit organisations [Certe Medical Diagnostics and Advice Foundation](https://www.certe.nl) and [University Medical Center Groningen](https://www.umcg.nl), and is being [actively and durably maintained](./news) by two public healthcare organisations in the Netherlands.