To calculate antimicrobial resistance in a more sensible way, also by
correcting for too few results, we use the resistance() and
susceptibility() functions.
@@ -1348,7 +1348,7 @@ categories.
labs(title ="MIC Distribution and SIR Interpretation", x ="Sample Groups", y ="MIC (mg/L)")
-
+
This plot provides an intuitive way to assess susceptibility patterns
across different groups while incorporating clinical breakpoints.
For a more straightforward and less manual approach,
@@ -1357,12 +1357,12 @@ extended by this package to directly plot MIC and disk diffusion
values:
# by providing `mo` and `ab`, colours will indicate the SIR interpretation:
autoplot(mic_values, mo = "K. pneumoniae", ab = "cipro", guideline = "EUCAST 2024")
-
+
Author: Dr. Matthijs Berends, 23rd Feb 2025
diff --git a/articles/AMR.md b/articles/AMR.md
new file mode 100644
index 000000000..516ecd23f
--- /dev/null
+++ b/articles/AMR.md
@@ -0,0 +1,951 @@
+# Conduct AMR data analysis
+
+**Note:** values on this page will change with every website update
+since they are based on randomly created values and the page was written
+in [R Markdown](https://rmarkdown.rstudio.com/). However, the
+methodology remains unchanged. This page was generated on 24 November
+2025.
+
+## Introduction
+
+Conducting AMR data analysis unfortunately requires in-depth knowledge
+from different scientific fields, which makes it hard to do right. At
+least, it requires:
+
+- Good questions (always start with those!) and reliable data
+- A thorough understanding of (clinical) epidemiology, to understand the
+ clinical and epidemiological relevance and possible bias of results
+- A thorough understanding of (clinical) microbiology/infectious
+ diseases, to understand which microorganisms are causal to which
+ infections and the implications of pharmaceutical treatment, as well
+ as understanding intrinsic and acquired microbial resistance
+- Experience with data analysis with microbiological tests and their
+ results, to understand the determination and limitations of MIC values
+ and their interpretations to SIR values
+- Availability of the biological taxonomy of microorganisms and probably
+ normalisation factors for pharmaceuticals, such as defined daily doses
+ (DDD)
+- Available (inter-)national guidelines, and sound methods to apply
+  them
+
+Of course, we cannot instantly provide you with knowledge and
+experience. But with this `AMR` package, we aimed at providing (1) tools
+to simplify antimicrobial resistance data cleaning, transformation and
+analysis, (2) methods to easily incorporate international guidelines and
+(3) scientifically reliable reference data, including the requirements
+mentioned above.
+
+The `AMR` package enables standardised and reproducible AMR data
+analysis, with the application of evidence-based rules, determination of
+first isolates, translation of various codes for microorganisms and
+antimicrobial agents, determination of (multi-drug) resistant
+microorganisms, and calculation of antimicrobial resistance, prevalence
+and future trends.
+
+## Preparation
+
+For this tutorial, we will create fake demonstration data to work with.
+
+You can skip to [Cleaning the data](#cleaning-the-data) if you already
+have your own data ready. If you start your analysis, try to make the
+structure of your data generally look like this:
+
+| date | patient_id | mo | AMX | CIP |
+|:----------:|:----------:|:----------------:|:---:|:---:|
+| 2025-11-24 | abcd | Escherichia coli | S | S |
+| 2025-11-24 | abcd | Escherichia coli | S | R |
+| 2025-11-24 | efgh | Escherichia coli | R | S |
+
+### Needed R packages
+
+As with many uses in R, we need some additional packages for AMR data
+analysis. Our package works closely together with the [tidyverse
+packages](https://www.tidyverse.org)
+[`dplyr`](https://dplyr.tidyverse.org/) and
+[`ggplot2`](https://ggplot2.tidyverse.org) by RStudio. The tidyverse
+tremendously improves the way we conduct data science - it allows for a
+very natural way of writing syntaxes and creating beautiful plots in R.
+
+We will also use the `cleaner` package, which can be used for cleaning
+data and creating frequency tables.
+
+``` r
+library(dplyr)
+library(ggplot2)
+library(AMR)
+
+# (if not yet installed, install with:)
+# install.packages(c("dplyr", "ggplot2", "AMR"))
+```
+
+The `AMR` package contains a data set `example_isolates_unclean`, which
+might look like data that users have extracted from their laboratory
+systems:
+
+``` r
+example_isolates_unclean
+#> # A tibble: 3,000 × 8
+#> patient_id hospital date bacteria AMX AMC CIP GEN
+#>
+#> 1 J3 A 2012-11-21 E. coli R I S S
+#> 2 R7 A 2018-04-03 K. pneumoniae R I S S
+#> 3 P3 A 2014-09-19 E. coli R S S S
+#> 4 P10 A 2015-12-10 E. coli S I S S
+#> 5 B7 A 2015-03-02 E. coli S S S S
+#> 6 W3 A 2018-03-31 S. aureus R S R S
+#> 7 J8 A 2016-06-14 E. coli R S S S
+#> 8 M3 A 2015-10-25 E. coli R S S S
+#> 9 J3 A 2019-06-19 E. coli S S S S
+#> 10 G6 A 2015-04-27 S. aureus S S S S
+#> # ℹ 2,990 more rows
+
+# we will use 'our_data' as the data set name for this tutorial
+our_data <- example_isolates_unclean
+```
+
+For AMR data analysis, we would like the microorganism column to contain
+valid, up-to-date taxonomy, and the antibiotic columns to be cleaned as
+SIR values as well.
+
+### Taxonomy of microorganisms
+
+With [`as.mo()`](https://amr-for-r.org/reference/as.mo.md), users can
+transform arbitrary microorganism names or codes to current taxonomy.
+The `AMR` package contains up-to-date taxonomic data. To be specific,
+currently included data were retrieved on 24 Jun 2024.
+
+The codes of the `AMR` package that come from
+[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) are short, but
+still human readable. More importantly,
+[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) supports all kinds
+of input:
+
+``` r
+as.mo("Klebsiella pneumoniae")
+#> Class 'mo'
+#> [1] B_KLBSL_PNMN
+as.mo("K. pneumoniae")
+#> Class 'mo'
+#> [1] B_KLBSL_PNMN
+as.mo("KLEPNE")
+#> Class 'mo'
+#> [1] B_KLBSL_PNMN
+as.mo("KLPN")
+#> Class 'mo'
+#> [1] B_KLBSL_PNMN
+```
+
+The first character in the above codes denotes their taxonomic kingdom,
+such as Bacteria (B), Fungi (F), and Protozoa (P).
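+
+As a small illustration of that prefix, the first character of a code
+can be compared with the outcome of `mo_kingdom()`. This is only a
+sketch; *Candida albicans* is an arbitrary fungal example that is not
+part of the tutorial data:
+
+``` r
+library(AMR)
+
+codes <- as.mo(c("K. pneumoniae", "Candida albicans"))
+
+# first character of each code: "B" for Bacteria, "F" for Fungi
+substr(as.character(codes), 1, 1)
+
+# the corresponding taxonomic kingdoms
+mo_kingdom(codes)
+```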
+
+The `AMR` package also contains functions to directly retrieve taxonomic
+properties, such as the name, genus, species, family, order, and even
+Gram-stain. They all start with `mo_` and they use
+[`as.mo()`](https://amr-for-r.org/reference/as.mo.md) internally, so
+that any arbitrary user input can still be used:
+
+``` r
+mo_family("K. pneumoniae")
+#> [1] "Enterobacteriaceae"
+mo_genus("K. pneumoniae")
+#> [1] "Klebsiella"
+mo_species("K. pneumoniae")
+#> [1] "pneumoniae"
+
+mo_gramstain("Klebsiella pneumoniae")
+#> [1] "Gram-negative"
+
+mo_ref("K. pneumoniae")
+#> [1] "Trevisan, 1887"
+
+mo_snomed("K. pneumoniae")
+#> [[1]]
+#> [1] "1098101000112102" "446870005" "1098201000112108" "409801009"
+#> [5] "56415008" "714315002" "713926009"
+```
+
+Now we can thus clean our data:
+
+``` r
+our_data$bacteria <- as.mo(our_data$bacteria, info = TRUE)
+#> ℹ Retrieved values from the `microorganisms.codes` data set for "ESCCOL",
+#> "KLEPNE", "STAAUR", and "STRPNE".
+#> ℹ Microorganism translation was uncertain for four microorganisms. Run
+#> `mo_uncertainties()` to review these uncertainties, or use
+#> `add_custom_microorganisms()` to add custom entries.
+```
+
+Apparently, there was some uncertainty about the translation to
+taxonomic codes. Let’s check this:
+
+``` r
+mo_uncertainties()
+#> Matching scores are based on the resemblance between the input and the full
+#> taxonomic name, and the pathogenicity in humans. See `?mo_matching_score`.
+#> Colour keys: 0.000-0.549 0.550-0.649 0.650-0.749 0.750-1.000
+#>
+#> --------------------------------------------------------------------------------
+#> "E. coli" -> Escherichia coli (B_ESCHR_COLI, 0.688)
+#> Also matched: Enterococcus crotali (0.650), Escherichia coli coli
+#> (0.643), Escherichia coli expressing (0.611), Enterobacter cowanii
+#> (0.600), Enterococcus columbae (0.595), Enterococcus camelliae (0.591),
+#> Enterococcus casseliflavus (0.577), Enterobacter cloacae cloacae
+#> (0.571), Enterobacter cloacae complex (0.571), and Enterobacter cloacae
+#> dissolvens (0.565)
+#> --------------------------------------------------------------------------------
+#> "K. pneumoniae" -> Klebsiella pneumoniae (B_KLBSL_PNMN, 0.786)
+#> Also matched: Klebsiella pneumoniae complex (0.707), Klebsiella
+#> pneumoniae ozaenae (0.707), Klebsiella pneumoniae pneumoniae (0.688),
+#> Klebsiella pneumoniae rhinoscleromatis (0.658), Klebsiella pasteurii
+#> (0.500), Klebsiella planticola (0.500), Kingella potus (0.400),
+#> Kluyveromyces pseudotropicale (0.386), Kluyveromyces pseudotropicalis
+#> (0.363), and Kosakonia pseudosacchari (0.361)
+#> --------------------------------------------------------------------------------
+#> "S. aureus" -> Staphylococcus aureus (B_STPHY_AURS, 0.690)
+#> Also matched: Staphylococcus aureus aureus (0.643), Staphylococcus
+#> argenteus (0.625), Staphylococcus aureus anaerobius (0.625),
+#> Staphylococcus auricularis (0.615), Salmonella Aurelianis (0.595),
+#> Salmonella Aarhus (0.588), Salmonella Amounderness (0.587),
+#> Staphylococcus argensis (0.587), Streptococcus australis (0.587), and
+#> Salmonella choleraesuis arizonae (0.562)
+#> --------------------------------------------------------------------------------
+#> "S. pneumoniae" -> Streptococcus pneumoniae (B_STRPT_PNMN, 0.750)
+#> Also matched: Streptococcus pseudopneumoniae (0.700), Streptococcus
+#> phocae salmonis (0.552), Serratia proteamaculans quinovora (0.545),
+#> Streptococcus pseudoporcinus (0.536), Staphylococcus piscifermentans
+#> (0.533), Staphylococcus pseudintermedius (0.532), Serratia
+#> proteamaculans proteamaculans (0.526), Streptococcus gallolyticus
+#> pasteurianus (0.526), Salmonella Portanigra (0.524), and Streptococcus
+#> periodonticum (0.519)
+#>
+#> Only the first 10 other matches of each record are shown. Run
+#> `print(mo_uncertainties(), n = ...)` to view more entries, or save
+#> `mo_uncertainties()` to an object.
+```
+
+That’s all good.
+
+### Antibiotic results
+
+The column with antibiotic test results must also be cleaned. The `AMR`
+package comes with three new data types to work with such test results:
+`mic` for minimal inhibitory concentrations (MIC), `disk` for disk
+diffusion diameters, and `sir` for SIR data that have been interpreted
+already. This package can also determine SIR values based on MIC or disk
+diffusion values, read more about that on the
+[`as.sir()`](https://amr-for-r.org/reference/as.sir.md) page.
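+
+A minimal sketch of these three classes follows. The input values, the
+microorganism and the antibiotic are made up for illustration. The
+interpretation depends on the guideline that is applied, so no outcome
+is shown here:
+
+``` r
+library(AMR)
+
+# MIC values; operators such as "<=" and ">=" are kept
+mic_values <- as.mic(c("<=0.25", "2", ">=32"))
+
+# disk diffusion diameters in millimetres
+disk_values <- as.disk(c(12, 22, 31))
+
+# results that have been interpreted already
+sir_values <- as.sir(c("S", "I", "R", "S"))
+
+# interpret the MIC values for ciprofloxacin in E. coli,
+# here according to the EUCAST 2024 guideline
+interpreted <- as.sir(mic_values,
+                      mo = "Escherichia coli",
+                      ab = "CIP",
+                      guideline = "EUCAST 2024")
+interpreted
+```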
+
+For now, we will just clean the SIR columns in our data using dplyr:
+
+``` r
+# method 1, be explicit about the columns:
+our_data <- our_data %>%
+ mutate_at(vars(AMX:GEN), as.sir)
+
+# method 2, let the AMR package determine the eligible columns
+our_data <- our_data %>%
+ mutate_if(is_sir_eligible, as.sir)
+
+# result:
+our_data
+#> # A tibble: 3,000 × 8
+#> patient_id hospital date bacteria AMX AMC CIP GEN
+#>
+#> 1 J3 A 2012-11-21 B_ESCHR_COLI R I S S
+#> 2 R7 A 2018-04-03 B_KLBSL_PNMN R I S S
+#> 3 P3 A 2014-09-19 B_ESCHR_COLI R S S S
+#> 4 P10 A 2015-12-10 B_ESCHR_COLI S I S S
+#> 5 B7 A 2015-03-02 B_ESCHR_COLI S S S S
+#> 6 W3 A 2018-03-31 B_STPHY_AURS R S R S
+#> 7 J8 A 2016-06-14 B_ESCHR_COLI R S S S
+#> 8 M3 A 2015-10-25 B_ESCHR_COLI R S S S
+#> 9 J3 A 2019-06-19 B_ESCHR_COLI S S S S
+#> 10 G6 A 2015-04-27 B_STPHY_AURS S S S S
+#> # ℹ 2,990 more rows
+```
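+
+The `mutate_at()` and `mutate_if()` verbs are superseded in recent
+`dplyr` versions. Assuming `dplyr` 1.0.0 or later, the same cleaning
+could be sketched with `across()`. The object name `cleaned` is only
+used here for illustration, and the sketch starts from the raw
+`example_isolates_unclean` data so that it runs on its own:
+
+``` r
+library(dplyr)
+library(AMR)
+
+# method 1 with across(): be explicit about the columns
+cleaned <- example_isolates_unclean %>%
+  mutate(across(AMX:GEN, as.sir))
+
+# method 2 with across(): let the AMR package determine the eligible columns
+cleaned <- example_isolates_unclean %>%
+  mutate(across(where(is_sir_eligible), as.sir))
+
+# check which columns now have the 'sir' class
+sapply(cleaned, is.sir)
+```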
+
+This is basically it for the cleaning, time to start the data inclusion.
+
+### First isolates
+
+We need to know which isolates we can *actually* use for analysis
+without repetition bias.
+
+To conduct an analysis of antimicrobial resistance, you must [only
+include the first isolate of every patient per
+episode](https://pubmed.ncbi.nlm.nih.gov/17304462/) (Hindler *et al.*,
+Clin Infect Dis. 2007). If you do not do this, you could easily get
+an overestimate or underestimate of the resistance of an antibiotic.
+Imagine that a patient was admitted with an MRSA and that it was found
+in 5 different blood cultures the following weeks (yes, some countries
+like the Netherlands have these blood drawing policies). The resistance
+percentage of oxacillin of all isolates would be overestimated, because
+you included this MRSA more than once. It would clearly be [selection
+bias](https://en.wikipedia.org/wiki/Selection_bias).
+
+The Clinical and Laboratory Standards Institute (CLSI) describes this as
+follows:
+
+> *(…) When preparing a cumulative antibiogram to guide clinical
+> decisions about empirical antimicrobial therapy of initial infections,
+> **only the first isolate of a given species per patient, per analysis
+> period (eg, one year) should be included, irrespective of body site,
+> antimicrobial susceptibility profile, or other phenotypical
+> characteristics (eg, biotype)**. The first isolate is easily
+> identified, and cumulative antimicrobial susceptibility test data
+> prepared using the first isolate are generally comparable to
+> cumulative antimicrobial susceptibility test data calculated by other
+> methods, providing duplicate isolates are excluded.*
+> [M39-A4 Analysis and Presentation of Cumulative Antimicrobial
+> Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter
+> 6.4](https://clsi.org/standards/products/microbiology/documents/m39/)
+
+This `AMR` package includes this methodology with the
+[`first_isolate()`](https://amr-for-r.org/reference/first_isolate.md)
+function and is able to apply the four different methods as defined by
+[Hindler *et al.* in
+2007](https://academic.oup.com/cid/article/44/6/867/364325):
+phenotype-based, episode-based, patient-based, isolate-based. The right
+method depends on your goals and analysis, but the default
+phenotype-based method is in any case the method to properly correct for
+most duplicate isolates. Read more about the methods on the
+[`first_isolate()`](https://amr-for-r.org/reference/first_isolate.md)
+page.
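+
+To get a feeling for how much the choice of method matters, the four
+methods can be compared on the included `example_isolates` data set.
+This is only a sketch; the counts depend on the data and are therefore
+not shown. The names `methods` and `counts` are chosen for illustration:
+
+``` r
+library(AMR)
+
+methods <- c("phenotype-based", "episode-based",
+             "patient-based", "isolate-based")
+
+# number of first isolates according to each method
+counts <- sapply(methods, function(m) {
+  sum(first_isolate(example_isolates, method = m, info = FALSE),
+      na.rm = TRUE)
+})
+counts
+```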
+
+The outcome of the function can easily be added to our data:
+
+``` r
+our_data <- our_data %>%
+ mutate(first = first_isolate(info = TRUE))
+#> ℹ Determining first isolates using an episode length of 365 days
+#> ℹ Using column 'bacteria' as input for `col_mo`.
+#> ℹ Using column 'date' as input for `col_date`.
+#> ℹ Using column 'patient_id' as input for `col_patient_id`.
+#> ℹ Basing inclusion on all antimicrobial results, using a points threshold
+#> of 2
+#> => Found 2,724 'phenotype-based' first isolates (90.8% of total where a
+#> microbial ID was available)
+```
+
+So only 91% is suitable for resistance analysis! We can now filter on it
+with the [`filter()`](https://dplyr.tidyverse.org/reference/filter.html)
+function, also from the `dplyr` package:
+
+``` r
+our_data_1st <- our_data %>%
+ filter(first == TRUE)
+```
+
+For future use, the above two syntaxes can be shortened:
+
+``` r
+our_data_1st <- our_data %>%
+ filter_first_isolate()
+```
+
+So we end up with 2 724 isolates for analysis. Now our data looks like:
+
+``` r
+our_data_1st
+#> # A tibble: 2,724 × 9
+#> patient_id hospital date bacteria AMX AMC CIP GEN first
+#>
+#> 1 J3 A 2012-11-21 B_ESCHR_COLI R I S S TRUE
+#> 2 R7 A 2018-04-03 B_KLBSL_PNMN R I S S TRUE
+#> 3 P3 A 2014-09-19 B_ESCHR_COLI R S S S TRUE
+#> 4 P10 A 2015-12-10 B_ESCHR_COLI S I S S TRUE
+#> 5 B7 A 2015-03-02 B_ESCHR_COLI S S S S TRUE
+#> 6 W3 A 2018-03-31 B_STPHY_AURS R S R S TRUE
+#> 7 M3 A 2015-10-25 B_ESCHR_COLI R S S S TRUE
+#> 8 J3 A 2019-06-19 B_ESCHR_COLI S S S S TRUE
+#> 9 G6 A 2015-04-27 B_STPHY_AURS S S S S TRUE
+#> 10 P4 A 2011-06-21 B_ESCHR_COLI S S S S TRUE
+#> # ℹ 2,714 more rows
+```
+
+Time for the analysis.
+
+## Analysing the data
+
+The base R [`summary()`](https://rdrr.io/r/base/summary.html) function
+gives a good first impression, as it comes with support for the new `mo`
+and `sir` classes that we now have in our data set:
+
+``` r
+summary(our_data_1st)
+#> patient_id hospital date
+#> Length:2724 Length:2724 Min. :2011-01-01
+#> Class :character Class :character 1st Qu.:2013-04-07
+#> Mode :character Mode :character Median :2015-06-03
+#> Mean :2015-06-09
+#> 3rd Qu.:2017-08-11
+#> Max. :2019-12-27
+#> bacteria AMX AMC
+#> Class :mo Class:sir Class:sir
+#> :0 %S :41.6% (n=1133) %S :52.6% (n=1432)
+#> Unique:4 %SDD : 0.0% (n=0) %SDD : 0.0% (n=0)
+#> #1 :B_ESCHR_COLI %I :16.4% (n=446) %I :12.2% (n=333)
+#> #2 :B_STPHY_AURS %R :42.0% (n=1145) %R :35.2% (n=959)
+#> #3 :B_STRPT_PNMN %NI : 0.0% (n=0) %NI : 0.0% (n=0)
+#> CIP GEN first
+#> Class:sir Class:sir Mode:logical
+#> %S :52.5% (n=1431) %S :61.0% (n=1661) TRUE:2724
+#> %SDD : 0.0% (n=0) %SDD : 0.0% (n=0)
+#> %I : 6.5% (n=176) %I : 3.0% (n=82)
+#> %R :41.0% (n=1117) %R :36.0% (n=981)
+#> %NI : 0.0% (n=0) %NI : 0.0% (n=0)
+
+glimpse(our_data_1st)
+#> Rows: 2,724
+#> Columns: 9
+#> $ patient_id "J3", "R7", "P3", "P10", "B7", "W3", "M3", "J3", "G6", "P4"…
+#> $ hospital "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
+#> $ date 2012-11-21, 2018-04-03, 2014-09-19, 2015-12-10, 2015-03-02…
+#> $ bacteria "B_ESCHR_COLI", "B_KLBSL_PNMN", "B_ESCHR_COLI", "B_ESCHR_COL…
+#> $ AMX R, R, R, S, S, R, R, S, S, S, S, R, S, S, R, R, R, R, S, R,…
+#> $ AMC I, I, S, I, S, S, S, S, S, S, S, S, S, S, S, S, S, R, S, S,…
+#> $ CIP S, S, S, S, S, R, S, S, S, S, S, S, S, S, S, S, S, S, S, S,…
+#> $ GEN S, S, S, S, S, S, S, S, S, S, S, R, S, S, S, S, S, S, S, S,…
+#> $ first TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
+
+# number of unique values per column:
+sapply(our_data_1st, n_distinct)
+#> patient_id hospital date bacteria AMX AMC CIP
+#> 260 3 1854 4 3 3 3
+#> GEN first
+#> 3 1
+```
+
+### Availability of species
+
+To just get an idea how the species are distributed, create a frequency
+table with [`count()`](https://amr-for-r.org/reference/count.md) based
+on the name of the microorganisms:
+
+``` r
+our_data %>%
+ count(mo_name(bacteria), sort = TRUE)
+#> # A tibble: 4 × 2
+#> `mo_name(bacteria)` n
+#>
+#> 1 Escherichia coli 1518
+#> 2 Staphylococcus aureus 730
+#> 3 Streptococcus pneumoniae 426
+#> 4 Klebsiella pneumoniae 326
+
+our_data_1st %>%
+ count(mo_name(bacteria), sort = TRUE)
+#> # A tibble: 4 × 2
+#> `mo_name(bacteria)` n
+#>
+#> 1 Escherichia coli 1321
+#> 2 Staphylococcus aureus 682
+#> 3 Streptococcus pneumoniae 402
+#> 4 Klebsiella pneumoniae 319
+```
+
+### Select and filter with antibiotic selectors
+
+Using so-called antibiotic class selectors, you can select or filter
+columns based on the antibiotic class that your antibiotic results are
+in:
+
+``` r
+our_data_1st %>%
+ select(date, aminoglycosides())
+#> ℹ For `aminoglycosides()` using column 'GEN' (gentamicin)
+#> # A tibble: 2,724 × 2
+#> date GEN
+#>
+#> 1 2012-11-21 S
+#> 2 2018-04-03 S
+#> 3 2014-09-19 S
+#> 4 2015-12-10 S
+#> 5 2015-03-02 S
+#> 6 2018-03-31 S
+#> 7 2015-10-25 S
+#> 8 2019-06-19 S
+#> 9 2015-04-27 S
+#> 10 2011-06-21 S
+#> # ℹ 2,714 more rows
+
+our_data_1st %>%
+ select(bacteria, betalactams())
+#> ℹ For `betalactams()` using columns 'AMX' (amoxicillin) and 'AMC'
+#> (amoxicillin/clavulanic acid)
+#> # A tibble: 2,724 × 3
+#> bacteria AMX AMC
+#>
+#> 1 B_ESCHR_COLI R I
+#> 2 B_KLBSL_PNMN R I
+#> 3 B_ESCHR_COLI R S
+#> 4 B_ESCHR_COLI S I
+#> 5 B_ESCHR_COLI S S
+#> 6 B_STPHY_AURS R S
+#> 7 B_ESCHR_COLI R S
+#> 8 B_ESCHR_COLI S S
+#> 9 B_STPHY_AURS S S
+#> 10 B_ESCHR_COLI S S
+#> # ℹ 2,714 more rows
+
+our_data_1st %>%
+ select(bacteria, where(is.sir))
+#> # A tibble: 2,724 × 5
+#> bacteria AMX AMC CIP GEN
+#>
+#> 1 B_ESCHR_COLI R I S S
+#> 2 B_KLBSL_PNMN R I S S
+#> 3 B_ESCHR_COLI R S S S
+#> 4 B_ESCHR_COLI S I S S
+#> 5 B_ESCHR_COLI S S S S
+#> 6 B_STPHY_AURS R S R S
+#> 7 B_ESCHR_COLI R S S S
+#> 8 B_ESCHR_COLI S S S S
+#> 9 B_STPHY_AURS S S S S
+#> 10 B_ESCHR_COLI S S S S
+#> # ℹ 2,714 more rows
+
+# filtering using AB selectors is also possible:
+our_data_1st %>%
+ filter(any(aminoglycosides() == "R"))
+#> ℹ For `aminoglycosides()` using column 'GEN' (gentamicin)
+#> # A tibble: 981 × 9
+#> patient_id hospital date bacteria AMX AMC CIP GEN first
+#>
+#> 1 J5 A 2017-12-25 B_STRPT_PNMN R S S R TRUE
+#> 2 X1 A 2017-07-04 B_STPHY_AURS R S S R TRUE
+#> 3 B3 A 2016-07-24 B_ESCHR_COLI S S S R TRUE
+#> 4 V7 A 2012-04-03 B_ESCHR_COLI S S S R TRUE
+#> 5 C9 A 2017-03-23 B_ESCHR_COLI S S S R TRUE
+#> 6 R1 A 2018-06-10 B_STPHY_AURS S S S R TRUE
+#> 7 S2 A 2013-07-19 B_STRPT_PNMN S S S R TRUE
+#> 8 P5 A 2019-03-09 B_STPHY_AURS S S S R TRUE
+#> 9 Q8 A 2019-08-10 B_STPHY_AURS S S S R TRUE
+#> 10 K5 A 2013-03-15 B_STRPT_PNMN S S S R TRUE
+#> # ℹ 971 more rows
+
+our_data_1st %>%
+ filter(all(betalactams() == "R"))
+#> ℹ For `betalactams()` using columns 'AMX' (amoxicillin) and 'AMC'
+#> (amoxicillin/clavulanic acid)
+#> # A tibble: 462 × 9
+#> patient_id hospital date bacteria AMX AMC CIP GEN first
+#>
+#> 1 M7 A 2013-07-22 B_STRPT_PNMN R R S S TRUE
+#> 2 R10 A 2013-12-20 B_STPHY_AURS R R S S TRUE
+#> 3 R7 A 2015-10-25 B_STPHY_AURS R R S S TRUE
+#> 4 R8 A 2019-10-25 B_STPHY_AURS R R S S TRUE
+#> 5 B6 A 2016-11-20 B_ESCHR_COLI R R R R TRUE
+#> 6 I7 A 2015-08-19 B_ESCHR_COLI R R S S TRUE
+#> 7 N3 A 2014-12-29 B_STRPT_PNMN R R R S TRUE
+#> 8 Q2 A 2019-09-22 B_ESCHR_COLI R R S S TRUE
+#> 9 X7 A 2011-03-20 B_ESCHR_COLI R R S R TRUE
+#> 10 V1 A 2018-08-07 B_STPHY_AURS R R S S TRUE
+#> # ℹ 452 more rows
+
+# even works in base R (since R 3.0):
+our_data_1st[all(betalactams() == "R"), ]
+#> ℹ For `betalactams()` using columns 'AMX' (amoxicillin) and 'AMC'
+#> (amoxicillin/clavulanic acid)
+#> # A tibble: 462 × 9
+#> patient_id hospital date bacteria AMX AMC CIP GEN first
+#>
+#> 1 M7 A 2013-07-22 B_STRPT_PNMN R R S S TRUE
+#> 2 R10 A 2013-12-20 B_STPHY_AURS R R S S TRUE
+#> 3 R7 A 2015-10-25 B_STPHY_AURS R R S S TRUE
+#> 4 R8 A 2019-10-25 B_STPHY_AURS R R S S TRUE
+#> 5 B6 A 2016-11-20 B_ESCHR_COLI R R R R TRUE
+#> 6 I7 A 2015-08-19 B_ESCHR_COLI R R S S TRUE
+#> 7 N3 A 2014-12-29 B_STRPT_PNMN R R R S TRUE
+#> 8 Q2 A 2019-09-22 B_ESCHR_COLI R R S S TRUE
+#> 9 X7 A 2011-03-20 B_ESCHR_COLI R R S R TRUE
+#> 10 V1 A 2018-08-07 B_STPHY_AURS R R S S TRUE
+#> # ℹ 452 more rows
+```
+
+### Generate antibiograms
+
+Since AMR v2.0 (March 2023), it is very easy to create different types
+of antibiograms, with support for 20 different languages.
+
+There are four antibiogram types, as proposed by Klinker *et al.* (2021,
+[DOI
+10.1177/20499361211011373](https://doi.org/10.1177/20499361211011373)),
+and they are all supported by the new
+[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
+function:
+
+1. **Traditional Antibiogram (TA)** e.g., for the susceptibility of
+   *Pseudomonas aeruginosa* to piperacillin/tazobactam (TZP)
+2. **Combination Antibiogram (CA)** e.g., for the additional
+   susceptibility of *Pseudomonas aeruginosa* to TZP + tobramycin
+   versus TZP alone
+3. **Syndromic Antibiogram (SA)** e.g., for the susceptibility of
+   *Pseudomonas aeruginosa* to TZP among respiratory specimens
+   (obtained among ICU patients only)
+4. **Weighted-Incidence Syndromic Combination Antibiogram (WISCA)**
+   e.g., for the susceptibility of *Pseudomonas aeruginosa* to TZP among
+   respiratory specimens (obtained among ICU patients only) for male
+   patients age \>=65 years with heart failure
+
+In this section, we show how to use the
+[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
+function to create any of the above antibiogram types. For starters,
+this is what the included `example_isolates` data set looks like:
+
+``` r
+example_isolates
+#> # A tibble: 2,000 × 46
+#> date patient age gender ward mo PEN OXA FLC AMX
+#>
+#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 3 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 4 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 5 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 6 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 7 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 8 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 9 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 10 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
+#> # ℹ 1,990 more rows
+#> # ℹ 36 more variables: AMC , AMP , TZP , CZO , FEP ,
+#> # CXM , FOX , CTX , CAZ , CRO , GEN ,
+#> # TOB , AMK , KAN , TMP , SXT , NIT ,
+#> # FOS , LNZ , CIP , MFX , VAN , TEC ,
+#> # TCY , TGC , DOX , ERY , CLI , AZM ,
+#> # IPM , MEM , MTR , CHL , COL , MUP , …
+```
+
+#### Traditional Antibiogram
+
+To create a traditional antibiogram, simply state which antibiotics
+should be used. The `antibiotics` argument in the
+[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
+function supports any (combination of) the previously mentioned
+antibiotic class selectors:
+
+``` r
+antibiogram(example_isolates,
+ antibiotics = c(aminoglycosides(), carbapenems()))
+#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
+#> (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
+#> ℹ For `carbapenems()` using columns 'IPM' (imipenem) and 'MEM' (meropenem)
+```
+
+| Pathogen | Amikacin | Gentamicin | Imipenem | Kanamycin | Meropenem | Tobramycin |
+|:-----------------|:---------------------|:--------------------|:---------------------|:----------------|:---------------------|:--------------------|
+| CoNS | 0% (0-8%,N=43) | 86% (82-90%,N=309) | 52% (37-67%,N=48) | 0% (0-8%,N=43) | 52% (37-67%,N=48) | 22% (12-35%,N=55) |
+| *E. coli* | 100% (98-100%,N=171) | 98% (96-99%,N=460) | 100% (99-100%,N=422) | NA | 100% (99-100%,N=418) | 97% (96-99%,N=462) |
+| *E. faecalis* | 0% (0-9%,N=39) | 0% (0-9%,N=39) | 100% (91-100%,N=38) | 0% (0-9%,N=39) | NA | 0% (0-9%,N=39) |
+| *K. pneumoniae* | NA | 90% (79-96%,N=58) | 100% (93-100%,N=51) | NA | 100% (93-100%,N=53) | 90% (79-96%,N=58) |
+| *P. aeruginosa* | NA | 100% (88-100%,N=30) | NA | 0% (0-12%,N=30) | NA | 100% (88-100%,N=30) |
+| *P. mirabilis* | NA | 94% (80-99%,N=34) | 94% (79-99%,N=32) | NA | NA | 94% (80-99%,N=34) |
+| *S. aureus* | NA | 99% (97-100%,N=233) | NA | NA | NA | 98% (92-100%,N=86) |
+| *S. epidermidis* | 0% (0-8%,N=44) | 79% (71-85%,N=163) | NA | 0% (0-8%,N=44) | NA | 51% (40-61%,N=89) |
+| *S. hominis* | NA | 92% (84-97%,N=80) | NA | NA | NA | 85% (74-93%,N=62) |
+| *S. pneumoniae* | 0% (0-3%,N=117) | 0% (0-3%,N=117) | NA | 0% (0-3%,N=117) | NA | 0% (0-3%,N=117) |
+
+Notice that the
+[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
+function automatically prints in the right format when using Quarto or R
+Markdown (such as this page), and even applies italics for taxonomic
+names (by using
+[`italicise_taxonomy()`](https://amr-for-r.org/reference/italicise_taxonomy.md)
+internally).
+
+It also uses the language of your OS if this is either English, Arabic,
+Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek,
+Hindi, Indonesian, Italian, Japanese, Korean, Norwegian, Polish,
+Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish,
+Ukrainian, Urdu, or Vietnamese. In this next example, we force the
+language to be Spanish using the `language` argument:
+
+``` r
+antibiogram(example_isolates,
+ mo_transform = "gramstain",
+ antibiotics = aminoglycosides(),
+ ab_transform = "name",
+ language = "es")
+#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
+#> (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
+```
+
+| Patógeno | Amikacina | Gentamicina | Kanamicina | Tobramicina |
+|:--------------|:-------------------|:--------------------|:----------------|:-------------------|
+| Gram negativo | 98% (96-99%,N=256) | 96% (95-98%,N=684) | 0% (0-10%,N=35) | 96% (94-97%,N=686) |
+| Gram positivo | 0% (0-1%,N=436) | 63% (60-66%,N=1170) | 0% (0-1%,N=436) | 34% (31-38%,N=665) |
+
+#### Combined Antibiogram
+
+To create a combined antibiogram, use antibiotic codes or names with a
+plus `+` character like this:
+
+``` r
+combined_ab <- antibiogram(example_isolates,
+ antibiotics = c("TZP", "TZP+TOB", "TZP+GEN"),
+ ab_transform = NULL)
+combined_ab
+```
+
+| Pathogen | TZP | TZP + GEN | TZP + TOB |
+|:-----------------|:---------------------|:---------------------|:---------------------|
+| CoNS | 30% (16-49%,N=33) | 97% (95-99%,N=274) | NA |
+| *E. coli* | 94% (92-96%,N=416) | 100% (98-100%,N=459) | 99% (97-100%,N=461) |
+| *K. pneumoniae* | 89% (77-96%,N=53) | 93% (83-98%,N=58) | 93% (83-98%,N=58) |
+| *P. aeruginosa* | NA | 100% (88-100%,N=30) | 100% (88-100%,N=30) |
+| *P. mirabilis* | NA | 100% (90-100%,N=34) | 100% (90-100%,N=34) |
+| *S. aureus* | NA | 100% (98-100%,N=231) | 100% (96-100%,N=91) |
+| *S. epidermidis* | NA | 100% (97-100%,N=128) | 100% (92-100%,N=46) |
+| *S. hominis* | NA | 100% (95-100%,N=74) | 100% (93-100%,N=53) |
+| *S. pneumoniae* | 100% (97-100%,N=112) | 100% (97-100%,N=112) | 100% (97-100%,N=112) |
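+
+The idea behind a combination can be sketched with the
+`susceptibility()` function, which accepts more than one antibiotic
+column. We assume here that an isolate counts as covered when it is
+susceptible to at least one of the agents. The rules for the denominator
+are described in the documentation of `susceptibility()` and
+`antibiogram()`, so the numbers need not be identical to the table above:
+
+``` r
+library(dplyr)
+library(AMR)
+
+combo <- example_isolates %>%
+  filter(mo_genus(mo) == "Escherichia") %>%
+  summarise(TZP = susceptibility(TZP),           # TZP alone
+            TZP_TOB = susceptibility(TZP, TOB))  # TZP and/or tobramycin
+combo
+```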
+
+#### Syndromic Antibiogram
+
+To create a syndromic antibiogram, the `syndromic_group` argument must
+be used. This can be any column in the data, or e.g. an
+[`ifelse()`](https://rdrr.io/r/base/ifelse.html) with calculations based
+on certain columns:
+
+``` r
+antibiogram(example_isolates,
+ antibiotics = c(aminoglycosides(), carbapenems()),
+ syndromic_group = "ward")
+#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
+#> (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
+#> ℹ For `carbapenems()` using columns 'IPM' (imipenem) and 'MEM' (meropenem)
+```
+
+| Syndromic Group | Pathogen | Amikacin | Gentamicin | Imipenem | Kanamycin | Meropenem | Tobramycin |
+|:----------------|:-----------------|:---------------------|:--------------------|:---------------------|:----------------|:---------------------|:--------------------|
+| Clinical | CoNS | NA | 89% (84-93%,N=205) | 57% (39-74%,N=35) | NA | 57% (39-74%,N=35) | 26% (12-45%,N=31) |
+| ICU | CoNS | NA | 79% (68-88%,N=73) | NA | NA | NA | NA |
+| Outpatient | CoNS | NA | 84% (66-95%,N=31) | NA | NA | NA | NA |
+| Clinical | *E. coli* | 100% (97-100%,N=104) | 98% (96-99%,N=297) | 100% (99-100%,N=266) | NA | 100% (99-100%,N=276) | 98% (96-99%,N=299) |
+| ICU | *E. coli* | 100% (93-100%,N=52) | 99% (95-100%,N=137) | 100% (97-100%,N=133) | NA | 100% (97-100%,N=118) | 96% (92-99%,N=137) |
+| Clinical | *K. pneumoniae* | NA | 92% (81-98%,N=51) | 100% (92-100%,N=44) | NA | 100% (92-100%,N=46) | 92% (81-98%,N=51) |
+| Clinical | *P. mirabilis* | NA | 100% (88-100%,N=30) | NA | NA | NA | 100% (88-100%,N=30) |
+| Clinical | *S. aureus* | NA | 99% (95-100%,N=150) | NA | NA | NA | 97% (89-100%,N=63) |
+| ICU | *S. aureus* | NA | 100% (95-100%,N=66) | NA | NA | NA | NA |
+| Clinical | *S. epidermidis* | NA | 82% (72-90%,N=79) | NA | NA | NA | 55% (39-70%,N=44) |
+| ICU | *S. epidermidis* | NA | 72% (60-82%,N=75) | NA | NA | NA | 41% (26-58%,N=41) |
+| Clinical | *S. hominis* | NA | 96% (85-99%,N=45) | NA | NA | NA | 94% (79-99%,N=31) |
+| Clinical | *S. pneumoniae* | 0% (0-5%,N=78) | 0% (0-5%,N=78) | NA | 0% (0-5%,N=78) | NA | 0% (0-5%,N=78) |
+| ICU | *S. pneumoniae* | 0% (0-12%,N=30) | 0% (0-12%,N=30) | NA | 0% (0-12%,N=30) | NA | 0% (0-12%,N=30) |
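+
+The [`ifelse()`](https://rdrr.io/r/base/ifelse.html) variant could look
+like the following sketch, in which the labels `"ICU"` and `"Non-ICU"`
+are chosen freely and the object name `icu_ab` is only illustrative:
+
+``` r
+library(AMR)
+
+# group all non-ICU wards together
+icu_ab <- antibiogram(example_isolates,
+                      antibiotics = aminoglycosides(),
+                      syndromic_group = ifelse(example_isolates$ward == "ICU",
+                                               "ICU", "Non-ICU"))
+icu_ab
+```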
+
+#### Weighted-Incidence Syndromic Combination Antibiogram (WISCA)
+
+To create a **Weighted-Incidence Syndromic Combination Antibiogram
+(WISCA)**, simply set `wisca = TRUE` in the
+[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
+function, or use the dedicated
+[`wisca()`](https://amr-for-r.org/reference/antibiogram.md) function.
+Unlike traditional antibiograms, WISCA provides syndrome-based
+susceptibility estimates, weighted by pathogen incidence and
+antimicrobial susceptibility patterns.
+
+``` r
+example_isolates %>%
+ wisca(antibiotics = c("TZP", "TZP+TOB", "TZP+GEN"),
+ minimum = 10) # Recommended threshold: ≥30
+```
+
+| Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
+|:------------------------|:-------------------------------------|:-------------------------------------|
+| 69.4% (64.3-74.3%) | 92.6% (91.1-93.9%) | 88.7% (85.8-91.2%) |
+
+WISCA uses a **Bayesian decision model** to integrate data from multiple
+pathogens, improving empirical therapy guidance, especially for
+low-incidence infections. It is **pathogen-agnostic**, meaning results
+are syndrome-based rather than stratified by microorganism.
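+
+The weighting idea can be sketched with plain point estimates. The
+numbers below are hypothetical and not taken from `example_isolates`,
+and the sketch leaves out the Bayesian part that WISCA uses to derive
+its intervals:
+
+``` r
+# Hypothetical incidence of pathogens within one syndrome (sums to 1)
+incidence <- c(E_coli = 0.60, K_pneumoniae = 0.25, P_aeruginosa = 0.15)
+
+# Hypothetical susceptibility of each pathogen to one regimen
+susceptibility <- c(E_coli = 0.95, K_pneumoniae = 0.90, P_aeruginosa = 0.40)
+
+# Syndrome-level coverage: incidence-weighted mean of the susceptibilities
+coverage <- sum(incidence * susceptibility)
+coverage
+```
+
+A frequent but fairly resistant pathogen therefore lowers the estimate
+more than a rare one with the same susceptibility.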
+
+For reliable results, ensure your data includes **only first isolates**
+(use
+[`first_isolate()`](https://amr-for-r.org/reference/first_isolate.md))
+and consider filtering for **the top *n* species** (use
+[`top_n_microorganisms()`](https://amr-for-r.org/reference/top_n_microorganisms.md)),
+as WISCA outcomes are most meaningful when based on robust incidence
+estimates.
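+
+A minimal sketch of that preparation, assuming the default settings of
+`filter_first_isolate()` (the filtering counterpart of
+`first_isolate()`) are acceptable for your data; the choice of
+`n = 10` is only illustrative:
+
+``` r
+library(AMR)
+library(dplyr)
+
+wisca_out <- example_isolates %>%
+  filter_first_isolate() %>%         # keep first isolates only
+  top_n_microorganisms(n = 10) %>%   # keep the 10 most common species
+  wisca(antibiotics = c("TZP", "TZP+TOB", "TZP+GEN"))
+
+wisca_out
+```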
+
+For **patient- or syndrome-specific WISCA**, run the function on a
+grouped `tibble`, i.e., using
+[`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html)
+first:
+
+``` r
+example_isolates %>%
+ top_n_microorganisms(n = 10) %>%
+ group_by(age_group = age_groups(age, c(25, 50, 75)),
+ gender) %>%
+ wisca(antibiotics = c("TZP", "TZP+TOB", "TZP+GEN"))
+```
+
+| age_group | gender | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
+|:----------|:-------|:------------------------|:-------------------------------------|:-------------------------------------|
+| 0-24 | F | 56.6% (25.2-83.9%) | 73.6% (48-91.6%) | 68.6% (42.9-89.5%) |
+| 0-24 | M | 60.3% (28.4-87.1%) | 79.7% (57.6-94.2%) | 60.1% (29.5-87.7%) |
+| 25-49 | F | 66.6% (45.6-85.5%) | 91.7% (84.6-96.7%) | 83% (67.9-94%) |
+| 25-49 | M | 56.4% (29.1-81.7%) | 89.2% (80.3-95.7%) | 72.4% (49.7-90%) |
+| 50-74 | F | 67.8% (55.8-80.1%) | 95.6% (93.2-97.5%) | 88.1% (80.4-94.6%) |
+| 50-74 | M | 66.2% (54.8-75.8%) | 95.2% (92.4-97.4%) | 84.4% (74.4-92.5%) |
+| 75+ | F | 71.7% (61-81.7%) | 96.6% (94.4-98.2%) | 90.6% (84.6-95.3%) |
+| 75+ | M | 72.9% (63.8-82%) | 96.6% (94.6-98.1%) | 92.8% (87.8-96.5%) |
+
+#### Plotting antibiograms
+
+Antibiograms can be plotted using
+[`autoplot()`](https://ggplot2.tidyverse.org/reference/autoplot.html)
+from the `ggplot2` package, since the `AMR` package provides an
+extension to that function:
+
+``` r
+autoplot(combined_ab)
+```
+
+
+
+To calculate antimicrobial resistance in a more sensible way, also by
+correcting for too few results, we use the
+[`resistance()`](https://amr-for-r.org/reference/proportion.md) and
+[`susceptibility()`](https://amr-for-r.org/reference/proportion.md)
+functions.
+
+### Resistance percentages
+
+The functions
+[`resistance()`](https://amr-for-r.org/reference/proportion.md) and
+[`susceptibility()`](https://amr-for-r.org/reference/proportion.md) can
+be used to calculate antimicrobial resistance or susceptibility. For
+more specific analyses, the functions
+[`proportion_S()`](https://amr-for-r.org/reference/proportion.md),
+[`proportion_SI()`](https://amr-for-r.org/reference/proportion.md),
+[`proportion_I()`](https://amr-for-r.org/reference/proportion.md),
+[`proportion_IR()`](https://amr-for-r.org/reference/proportion.md) and
+[`proportion_R()`](https://amr-for-r.org/reference/proportion.md) can be
+used to determine the proportion of a specific antimicrobial outcome.
+
+All these functions contain a `minimum` argument, denoting the minimum
+required number of test results for returning a value. These functions
+will otherwise return `NA`. The default is `minimum = 30`, following the
+[CLSI M39-A4
+guideline](https://clsi.org/standards/products/microbiology/documents/m39/)
+for applying microbial epidemiology.
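+
+The effect of `minimum` can be seen on a small, hypothetical vector of
+five results, which is below the default threshold:
+
+``` r
+library(AMR)
+
+# Five hypothetical test results: fewer than the default minimum of 30
+x <- as.sir(c("R", "S", "S", "I", "R"))
+
+resistance(x)               # too few results for the default minimum: NA
+resistance(x, minimum = 5)  # proportion of R among the 5 results
+```
+
+Lowering `minimum` is a deliberate choice: proportions based on very
+few isolates are unstable and should be reported with care.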
+
+As per the EUCAST guideline of 2019, we calculate resistance as the
+proportion of R
+([`proportion_R()`](https://amr-for-r.org/reference/proportion.md),
+equal to
+[`resistance()`](https://amr-for-r.org/reference/proportion.md)) and
+susceptibility as the proportion of S and I
+([`proportion_SI()`](https://amr-for-r.org/reference/proportion.md),
+equal to
+[`susceptibility()`](https://amr-for-r.org/reference/proportion.md)).
+These functions can be used on their own:
+
+``` r
+our_data_1st %>% resistance(AMX)
+#> [1] 0.4203377
+```
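+
+The equivalence between the function pairs can be checked directly.
+This sketch uses the `AMX` column of `example_isolates`, which is an
+illustrative choice:
+
+``` r
+library(AMR)
+
+x <- example_isolates$AMX
+
+# resistance() counts R; susceptibility() counts S and I
+all.equal(resistance(x), proportion_R(x))
+all.equal(susceptibility(x), proportion_SI(x))
+```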
+
+Or can be used in conjunction with
+[`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) and
+[`summarise()`](https://dplyr.tidyverse.org/reference/summarise.html),
+both from the `dplyr` package:
+
+``` r
+our_data_1st %>%
+ group_by(hospital) %>%
+ summarise(amoxicillin = resistance(AMX))
+#> # A tibble: 3 × 2
+#> hospital amoxicillin
+#>
+#> 1 A 0.340
+#> 2 B 0.551
+#> 3 C 0.370
+```
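+
+When summarising per group, it can help to report the number of
+available results next to each proportion. A sketch using `n_sir()` on
+`example_isolates`, grouped by `ward` (both chosen here only to keep
+the example self-contained):
+
+``` r
+library(AMR)
+library(dplyr)
+
+by_ward <- example_isolates %>%
+  group_by(ward) %>%
+  summarise(amoxicillin = resistance(AMX),
+            n = n_sir(AMX))   # number of available SIR results per group
+
+by_ward
+```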
+
+### Interpreting MIC and Disk Diffusion Values
+
+Minimal inhibitory concentration (MIC) values and disk diffusion
+diameters can be interpreted into clinical breakpoints (SIR) using
+[`as.sir()`](https://amr-for-r.org/reference/as.sir.md). Here’s an
+example with randomly generated MIC values for *Klebsiella pneumoniae*
+and ciprofloxacin:
+
+``` r
+set.seed(123)
+mic_values <- random_mic(100)
+sir_values <- as.sir(mic_values, mo = "K. pneumoniae", ab = "cipro", guideline = "EUCAST 2024")
+
+my_data <- tibble(MIC = mic_values, SIR = sir_values)
+my_data
+#> # A tibble: 100 × 2
+#> MIC SIR
+#>
+#> 1 <=0.0001 S
+#> 2 0.0160 S
+#> 3 >=8.0000 R
+#> 4 0.0320 S
+#> 5 0.0080 S
+#> 6 64.0000 R
+#> 7 0.0080 S
+#> 8 0.1250 S
+#> 9 0.0320 S
+#> 10 0.0002 S
+#> # ℹ 90 more rows
+```
+
+This allows direct interpretation according to EUCAST or CLSI
+breakpoints, facilitating automated AMR data processing.
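+
+Disk diffusion diameters follow the same pattern. In this sketch the
+zone diameters are hypothetical values in millimetres:
+
+``` r
+library(AMR)
+
+# Hypothetical zone diameters in mm
+disk_values <- as.disk(c(18, 22, 30))
+
+sir_from_disk <- as.sir(disk_values,
+                        mo = "K. pneumoniae",
+                        ab = "cipro",
+                        guideline = "EUCAST 2024")
+sir_from_disk
+```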
+
+### Plotting MIC and SIR Interpretations
+
+We can visualise MIC distributions and their SIR interpretations using
+`ggplot2`, using the new
+[`scale_y_mic()`](https://amr-for-r.org/reference/plot.md) for the
+y-axis and
+[`scale_colour_sir()`](https://amr-for-r.org/reference/plot.md) to
+colour-code SIR categories.
+
+``` r
+# add a group
+my_data$group <- rep(c("A", "B", "C", "D"), each = 25)
+
+ggplot(my_data,
+ aes(x = group, y = MIC, colour = SIR)) +
+ geom_jitter(width = 0.2, size = 2) +
+ geom_boxplot(fill = NA, colour = "grey40") +
+ scale_y_mic() +
+ scale_colour_sir() +
+ labs(title = "MIC Distribution and SIR Interpretation",
+ x = "Sample Groups",
+ y = "MIC (mg/L)")
+```
+
+
+
+This plot provides an intuitive way to assess susceptibility patterns
+across different groups while incorporating clinical breakpoints.
+
+For a more straightforward and less manual approach, `ggplot2`’s
+function
+[`autoplot()`](https://ggplot2.tidyverse.org/reference/autoplot.html)
+has been extended by this package to directly plot MIC and disk
+diffusion values:
+
+``` r
+autoplot(mic_values)
+```
+
+
+
+``` r
+
+# by providing `mo` and `ab`, colours will indicate the SIR interpretation:
+autoplot(mic_values, mo = "K. pneumoniae", ab = "cipro", guideline = "EUCAST 2024")
+```
+
+
+
+------------------------------------------------------------------------
+
+*Author: Dr. Matthijs Berends, 23rd Feb 2025*
diff --git a/articles/AMR_for_Python.html b/articles/AMR_for_Python.html
index addb946e1..65d08935e 100644
--- a/articles/AMR_for_Python.html
+++ b/articles/AMR_for_Python.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
diff --git a/articles/AMR_for_Python.md b/articles/AMR_for_Python.md
new file mode 100644
index 000000000..550ffccbd
--- /dev/null
+++ b/articles/AMR_for_Python.md
@@ -0,0 +1,217 @@
+# AMR for Python
+
+## Introduction
+
+The `AMR` package for R is a powerful tool for antimicrobial resistance
+(AMR) analysis. It provides extensive features for handling microbial
+and antimicrobial data. However, for those who work primarily in Python,
+we now have a more intuitive option available: the [`AMR` Python
+package](https://pypi.org/project/AMR/).
+
+This Python package is a wrapper around the `AMR` R package. It uses the
+`rpy2` package internally. Despite the need to have R installed, Python
+users can now easily work with AMR data directly through Python code.
+
+## Prerequisites
+
+This package was only tested with a [virtual environment
+(venv)](https://docs.python.org/3/library/venv.html). You can set up
+such an environment by running:
+
+``` bash
+# Linux and macOS:
+python -m venv /path/to/new/virtual/environment
+
+# Windows:
+python -m venv C:\path\to\new\virtual\environment
+```
+
+Then you can [activate the
+environment](https://docs.python.org/3/library/venv.html#how-venvs-work),
+after which the venv is ready to work with.
+
+## Install AMR
+
+1. Since the Python package is available on the official [Python
+ Package Index](https://pypi.org/project/AMR/), you can just run:
+
+ ``` bash
+ pip install AMR
+ ```
+
+2. Make sure you have R installed. There is **no need to install the
+ `AMR` R package**, as it will be installed automatically.
+
+ For Linux:
+
+ ``` bash
+ # Ubuntu / Debian
+ sudo apt install r-base
+ # Fedora:
+ sudo dnf install R
+ # CentOS/RHEL
+ sudo yum install R
+ ```
+
+ For macOS (using [Homebrew](https://brew.sh)):
+
+ ``` bash
+ brew install r
+ ```
+
+ For Windows, visit the [CRAN download
+ page](https://cran.r-project.org) to download and install R.
+
+## Examples of Usage
+
+### Cleaning Taxonomy
+
+Here’s an example that demonstrates how to clean microorganism and drug
+names using the `AMR` Python package:
+
+``` python
+import pandas as pd
+import AMR
+
+# Sample data
+data = {
+ "MOs": ['E. coli', 'ESCCOL', 'esco', 'Esche coli'],
+ "Drug": ['Cipro', 'CIP', 'J01MA02', 'Ciproxin']
+}
+df = pd.DataFrame(data)
+
+# Use AMR functions to clean microorganism and drug names
+df['MO_clean'] = AMR.mo_name(df['MOs'])
+df['Drug_clean'] = AMR.ab_name(df['Drug'])
+
+# Display the results
+print(df)
+```
+
+| MOs | Drug | MO_clean | Drug_clean |
+|------------|----------|------------------|---------------|
+| E. coli | Cipro | Escherichia coli | Ciprofloxacin |
+| ESCCOL | CIP | Escherichia coli | Ciprofloxacin |
+| esco | J01MA02 | Escherichia coli | Ciprofloxacin |
+| Esche coli | Ciproxin | Escherichia coli | Ciprofloxacin |
+
+#### Explanation
+
+- **mo_name**: This function standardises microorganism names. Here,
+ different variations of *Escherichia coli* (such as “E. coli”,
+ “ESCCOL”, “esco”, and “Esche coli”) are all converted into the
+ correct, standardised form, “Escherichia coli”.
+
+- **ab_name**: Similarly, this function standardises antimicrobial
+ names. The different representations of ciprofloxacin (e.g., “Cipro”,
+ “CIP”, “J01MA02”, and “Ciproxin”) are all converted to the standard
+ name, “Ciprofloxacin”.
+
+### Calculating AMR
+
+``` python
+import AMR
+import pandas as pd
+
+df = AMR.example_isolates
+result = AMR.resistance(df["AMX"])
+print(result)
+```
+
+ [0.59555556]
+
+### Generating Antibiograms
+
+One of the core functions of the `AMR` package is generating an
+antibiogram, a table that summarises the antimicrobial susceptibility of
+bacterial isolates. Here’s how you can generate an antibiogram from
+Python:
+
+``` python
+result2a = AMR.antibiogram(df[["mo", "AMX", "CIP", "TZP"]])
+print(result2a)
+```
+
+| Pathogen | Amoxicillin | Ciprofloxacin | Piperacillin/tazobactam |
+|----------------|----------------|---------------|-------------------------|
+| CoNS | 7% (10/142) | 73% (183/252) | 30% (10/33) |
+| E. coli | 50% (196/392) | 88% (399/456) | 94% (393/416) |
+| K. pneumoniae | 0% (0/58) | 96% (53/55) | 89% (47/53) |
+| P. aeruginosa | 0% (0/30) | 100% (30/30) | None |
+| P. mirabilis | None | 94% (34/36) | None |
+| S. aureus | 6% (8/131) | 90% (171/191) | None |
+| S. epidermidis | 1% (1/91) | 64% (87/136) | None |
+| S. hominis | None | 80% (56/70) | None |
+| S. pneumoniae | 100% (112/112) | None | 100% (112/112) |
+
+``` python
+result2b = AMR.antibiogram(df[["mo", "AMX", "CIP", "TZP"]], mo_transform = "gramstain")
+print(result2b)
+```
+
+| Pathogen | Amoxicillin | Ciprofloxacin | Piperacillin/tazobactam |
+|---------------|---------------|---------------|-------------------------|
+| Gram-negative | 36% (226/631) | 91% (621/684) | 88% (565/641) |
+| Gram-positive | 43% (305/703) | 77% (560/724) | 86% (296/345) |
+
+In these examples, we generate antibiograms for a selection of
+antibiotics. The second call groups the microorganisms by Gram stain
+through `mo_transform = "gramstain"`.
+
+### Taxonomic Data Sets Now in Python!
+
+As a Python user, you might like that the most important data sets of
+the `AMR` R package, `microorganisms`, `antimicrobials`,
+`clinical_breakpoints`, and `example_isolates`, are now available as
+regular Python data frames:
+
+``` python
+AMR.microorganisms
+```
+
+| mo | fullname | status | kingdom | gbif | gbif_parent | gbif_renamed_to | prevalence |
+|--------------|------------------------------------|----------|----------|----------|-------------|-----------------|------------|
+| B_GRAMN | (unknown Gram-negatives) | unknown | Bacteria | None | None | None | 2.0 |
+| B_GRAMP | (unknown Gram-positives) | unknown | Bacteria | None | None | None | 2.0 |
+| B_ANAER-NEG | (unknown anaerobic Gram-negatives) | unknown | Bacteria | None | None | None | 2.0 |
+| B_ANAER-POS | (unknown anaerobic Gram-positives) | unknown | Bacteria | None | None | None | 2.0 |
+| B_ANAER | (unknown anaerobic bacteria) | unknown | Bacteria | None | None | None | 2.0 |
+| … | … | … | … | … | … | … | … |
+| B_ZYMMN_POMC | Zymomonas pomaceae | accepted | Bacteria | 10744418 | 3221412 | None | 2.0 |
+| B_ZYMPH | Zymophilus | synonym | Bacteria | None | 9475166 | None | 2.0 |
+| B_ZYMPH_PCVR | Zymophilus paucivorans | synonym | Bacteria | None | None | None | 2.0 |
+| B_ZYMPH_RFFN | Zymophilus raffinosivorans | synonym | Bacteria | None | None | None | 2.0 |
+| F_ZYZYG | Zyzygomyces | unknown | Fungi | None | 7581 | None | 2.0 |
+
+``` python
+AMR.antimicrobials
+```
+
+| ab | cid | name | group | oral_ddd | oral_units | iv_ddd | iv_units |
+|-----|------------|-----------------------|--------------------------|----------|------------|--------|----------|
+| AMA | 4649.0 | 4-aminosalicylic acid | Antimycobacterials | 12.00 | g | NaN | None |
+| ACM | 6450012.0 | Acetylmidecamycin | Macrolides/lincosamides | NaN | None | NaN | None |
+| ASP | 49787020.0 | Acetylspiramycin | Macrolides/lincosamides | NaN | None | NaN | None |
+| ALS | 8954.0 | Aldesulfone sodium | Other antibacterials | 0.33 | g | NaN | None |
+| AMK | 37768.0 | Amikacin | Aminoglycosides | NaN | None | 1.0 | g |
+| … | … | … | … | … | … | … | … |
+| VIR | 11979535.0 | Virginiamycine | Other antibacterials | NaN | None | NaN | None |
+| VOR | 71616.0 | Voriconazole | Antifungals/antimycotics | 0.40 | g | 0.4 | g |
+| XBR | 72144.0 | Xibornol | Other antibacterials | NaN | None | NaN | None |
+| ZID | 77846445.0 | Zidebactam | Other antibacterials | NaN | None | NaN | None |
+| ZFD | NaN | Zoliflodacin | None | NaN | None | NaN | None |
+
+## Conclusion
+
+With the `AMR` Python package, Python users can now effortlessly call R
+functions from the `AMR` R package. This eliminates the need for complex
+`rpy2` configurations and provides a clean, easy-to-use interface for
+antimicrobial resistance analysis. The examples provided above
+demonstrate how this can be applied to typical workflows, such as
+standardising microorganism and antimicrobial names or calculating
+resistance.
+
+By just running `import AMR`, users can seamlessly integrate the robust
+features of the R `AMR` package into Python workflows.
+
+Whether you’re cleaning data or analysing resistance patterns, the `AMR`
+Python package makes it easy to work with AMR data in Python.
diff --git a/articles/AMR_with_tidymodels.html b/articles/AMR_with_tidymodels.html
index 4716db1e1..42d2fe8c8 100644
--- a/articles/AMR_with_tidymodels.html
+++ b/articles/AMR_with_tidymodels.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
@@ -413,7 +413,7 @@ ROC curve looks like this:
predictions%>%roc_curve(mo, `.pred_Gram-negative`)%>%autoplot()
-
+
@@ -677,7 +677,7 @@ sets.
x ="Year", y ="Resistance Proportion")+theme_minimal()
-
+
Additionally, we can visualise resistance trends in
ggplot2 and directly add linear models there:
@@ -691,7 +691,7 @@ sets.
formula =y~x, alpha =0.25)+theme_minimal()
-
+
diff --git a/articles/AMR_with_tidymodels.md b/articles/AMR_with_tidymodels.md
new file mode 100644
index 000000000..4db5a2065
--- /dev/null
+++ b/articles/AMR_with_tidymodels.md
@@ -0,0 +1,606 @@
+# AMR with tidymodels
+
+> This page was entirely written by our [AMR for R
+> Assistant](https://chat.amr-for-r.org), a manually trained ChatGPT
+> model that can answer questions about the `AMR` package.
+
+Antimicrobial resistance (AMR) is a global health crisis, and
+understanding resistance patterns is crucial for managing effective
+treatments. The `AMR` R package provides robust tools for analysing AMR
+data, including convenient antimicrobial selector functions like
+[`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+and
+[`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md).
+
+In this post, we will explore how to use the `tidymodels` framework to
+predict resistance patterns in the `example_isolates` dataset in three
+examples.
+
+This post contains the following examples:
+
+1. Using Antimicrobial Selectors
+2. Predicting ESBL Presence Using Raw MICs
+3. Predicting AMR Over Time
+
+## Example 1: Using Antimicrobial Selectors
+
+By leveraging the power of `tidymodels` and the `AMR` package, we’ll
+build a reproducible machine learning workflow to predict the Gram stain
+of a microorganism from its susceptibility results for two important
+antibiotic classes: aminoglycosides and beta-lactams.
+
+### **Objective**
+
+Our goal is to build a predictive model using the `tidymodels` framework
+to determine the Gram stain of the microorganism based on its
+antimicrobial susceptibility results. We will:
+
+1. Preprocess data using the selector functions
+ [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+ and
+ [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md).
+2. Define a logistic regression model for prediction.
+3. Use a structured `tidymodels` workflow to preprocess, train, and
+ evaluate the model.
+
+### **Data Preparation**
+
+We begin by loading the required libraries and preparing the
+`example_isolates` dataset from the `AMR` package.
+
+``` r
+# Load required libraries
+library(AMR) # For AMR data analysis
+library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...)
+```
+
+Prepare the data:
+
+``` r
+# Your data could look like this:
+example_isolates
+#> # A tibble: 2,000 × 46
+#> date patient age gender ward mo PEN OXA FLC AMX
+#>
+#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 3 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 4 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 5 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 6 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 7 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 8 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 9 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 10 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
+#> # ℹ 1,990 more rows
+#> # ℹ 36 more variables: AMC , AMP , TZP , CZO , FEP ,
+#> # CXM , FOX , CTX , CAZ , CRO , GEN ,
+#> # TOB , AMK , KAN , TMP , SXT , NIT ,
+#> # FOS , LNZ , CIP , MFX , VAN , TEC ,
+#> # TCY , TGC , DOX , ERY , CLI , AZM ,
+#> # IPM , MEM , MTR , CHL , COL , MUP , …
+
+# Select relevant columns for prediction
+data <- example_isolates %>%
+ # select AB results dynamically
+ select(mo, aminoglycosides(), betalactams()) %>%
+ # replace NAs with NI (not-interpretable)
+ mutate(across(where(is.sir),
+ ~replace_na(.x, "NI")),
+ # convert SIR columns to integers (their factor level codes)
+ across(where(is.sir),
+ as.integer),
+ # get Gramstain of microorganisms
+ mo = as.factor(mo_gramstain(mo))) %>%
+ # drop NAs - the ones without a Gramstain (fungi, etc.)
+ drop_na()
+#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
+#> (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
+#> ℹ For `betalactams()` using columns 'PEN' (benzylpenicillin), 'OXA'
+#> (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'
+#> (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'
+#> (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'
+#> (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),
+#> 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)
+```
+
+**Explanation:**
+
+- [`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+ and
+ [`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+ dynamically select columns for antimicrobials in these classes.
+- `drop_na()` ensures the model receives complete cases for training.
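+
+`as.integer()` returns the position of each value within the factor
+levels of the `sir` class. The exact level order may depend on the
+installed `AMR` version, so it is worth inspecting the levels before
+interpreting the integers. A small sketch with hypothetical values:
+
+``` r
+library(AMR)
+
+x <- as.sir(c("S", "I", "R", NA))
+
+levels(x)       # the SIR categories, in order
+as.integer(x)   # position of each value within levels(x); NA stays NA
+```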
+
+### **Defining the Workflow**
+
+We now define the `tidymodels` workflow, which consists of three steps:
+preprocessing, model specification, and fitting.
+
+#### 1. Preprocessing with a Recipe
+
+We create a recipe to preprocess the data for modelling.
+
+``` r
+# Define the recipe for data preprocessing
+resistance_recipe <- recipe(mo ~ ., data = data) %>%
+ step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9)
+resistance_recipe
+#>
+#> ── Recipe ──────────────────────────────────────────────────────────────────────
+#>
+#> ── Inputs
+#> Number of variables by role
+#> outcome: 1
+#> predictor: 20
+#>
+#> ── Operations
+#> • Correlation filter on: c(aminoglycosides(), betalactams())
+```
+
+For a recipe that includes at least one preprocessing operation, like we
+have with `step_corr()`, the necessary parameters can be estimated from
+a training set using `prep()`:
+
+``` r
+prep(resistance_recipe)
+#> ℹ For `aminoglycosides()` using columns 'GEN' (gentamicin), 'TOB'
+#> (tobramycin), 'AMK' (amikacin), and 'KAN' (kanamycin)
+#> ℹ For `betalactams()` using columns 'PEN' (benzylpenicillin), 'OXA'
+#> (oxacillin), 'FLC' (flucloxacillin), 'AMX' (amoxicillin), 'AMC'
+#> (amoxicillin/clavulanic acid), 'AMP' (ampicillin), 'TZP'
+#> (piperacillin/tazobactam), 'CZO' (cefazolin), 'FEP' (cefepime), 'CXM'
+#> (cefuroxime), 'FOX' (cefoxitin), 'CTX' (cefotaxime), 'CAZ' (ceftazidime),
+#> 'CRO' (ceftriaxone), 'IPM' (imipenem), and 'MEM' (meropenem)
+#>
+#> ── Recipe ──────────────────────────────────────────────────────────────────────
+#>
+#> ── Inputs
+#> Number of variables by role
+#> outcome: 1
+#> predictor: 20
+#>
+#> ── Training information
+#> Training data contained 1968 data points and no incomplete rows.
+#>
+#> ── Operations
+#> • Correlation filter on: AMX CTX | Trained
+```
+
+**Explanation:**
+
+- `recipe(mo ~ ., data = data)` will take the `mo` column as outcome and
+ all other columns as predictors.
+- `step_corr()` removes predictors (i.e., antibiotic columns) whose
+ absolute correlation with another predictor exceeds 0.9.
+
+Notice how the recipe contains just the antimicrobial selector
+functions - no need to define the columns specifically. In the
+preparation (retrieved with `prep()`) we can see that the columns or
+variables ‘AMX’ and ‘CTX’ were removed as they correlate too much with
+existing, other variables.
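+
+To see the correlation filter in isolation, here is a sketch on
+simulated data. The column names `x1`, `x2` and `x3` are hypothetical,
+and `x2` is constructed as a near copy of `x1`:
+
+``` r
+library(tidymodels)
+
+set.seed(1)
+toy <- tibble(
+  outcome = factor(rep(c("a", "b"), times = 50)),
+  x1 = rnorm(100),
+  x2 = x1 + rnorm(100, sd = 0.01),  # almost a copy of x1
+  x3 = rnorm(100)                   # unrelated to x1 and x2
+)
+
+toy_recipe <- recipe(outcome ~ ., data = toy) %>%
+  step_corr(all_numeric_predictors(), threshold = 0.9)
+
+# Columns that remain after the filter has been trained and applied
+remaining <- names(bake(prep(toy_recipe), new_data = NULL))
+remaining
+```
+
+Only one of the two highly correlated columns is kept, while the
+unrelated column is left untouched.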
+
+#### 2. Specifying the Model
+
+We define a logistic regression model, since predicting the Gram stain
+(Gram-negative or Gram-positive) is a binary classification task.
+
+``` r
+# Specify a logistic regression model
+logistic_model <- logistic_reg() %>%
+ set_engine("glm") # Use the Generalised Linear Model engine
+logistic_model
+#> Logistic Regression Model Specification (classification)
+#>
+#> Computational engine: glm
+```
+
+**Explanation:**
+
+- `logistic_reg()` sets up a logistic regression model.
+- `set_engine("glm")` specifies the use of R’s built-in GLM engine.
+
+#### 3. Building the Workflow
+
+We bundle the recipe and model together into a `workflow`, which
+organises the entire modelling process.
+
+``` r
+# Combine the recipe and model into a workflow
+resistance_workflow <- workflow() %>%
+ add_recipe(resistance_recipe) %>% # Add the preprocessing recipe
+ add_model(logistic_model) # Add the logistic regression model
+resistance_workflow
+#> ══ Workflow ════════════════════════════════════════════════════════════════════
+#> Preprocessor: Recipe
+#> Model: logistic_reg()
+#>
+#> ── Preprocessor ────────────────────────────────────────────────────────────────
+#> 1 Recipe Step
+#>
+#> • step_corr()
+#>
+#> ── Model ───────────────────────────────────────────────────────────────────────
+#> Logistic Regression Model Specification (classification)
+#>
+#> Computational engine: glm
+```
+
+### **Training and Evaluating the Model**
+
+To train the model, we split the data into training and testing sets.
+Then, we fit the workflow on the training set and evaluate its
+performance.
+
+``` r
+# Split data into training and testing sets
+set.seed(123) # For reproducibility
+data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing
+training_data <- training(data_split) # Training set
+testing_data <- testing(data_split) # Testing set
+
+# Fit the workflow to the training data
+fitted_workflow <- resistance_workflow %>%
+ fit(training_data) # Train the model
+```
+
+**Explanation:**
+
+- `initial_split()` splits the data into training and testing sets.
+- `fit()` trains the workflow on the training set.
+
+Notice how in `fit()`, the antimicrobial selector functions are called
+again internally. This happens because the selectors themselves, not
+the resulting column names, are stored in the recipe.
+
+Next, we evaluate the model on the testing data.
+
+``` r
+# Make predictions on the testing set
+predictions <- fitted_workflow %>%
+ predict(testing_data) # Generate predictions
+probabilities <- fitted_workflow %>%
+ predict(testing_data, type = "prob") # Generate probabilities
+
+predictions <- predictions %>%
+ bind_cols(probabilities) %>%
+ bind_cols(testing_data) # Combine with true labels
+
+predictions
+#> # A tibble: 394 × 24
+#> .pred_class `.pred_Gram-negative` `.pred_Gram-positive` mo GEN TOB
+#>
+#> 1 Gram-positive 1.07e- 1 8.93 e- 1 Gram-p… 5 5
+#> 2 Gram-positive 3.17e- 8 1.000e+ 0 Gram-p… 5 1
+#> 3 Gram-negative 9.99e- 1 1.42 e- 3 Gram-n… 5 5
+#> 4 Gram-positive 2.22e-16 1 e+ 0 Gram-p… 5 5
+#> 5 Gram-negative 9.46e- 1 5.42 e- 2 Gram-n… 5 5
+#> 6 Gram-positive 1.07e- 1 8.93 e- 1 Gram-p… 5 5
+#> 7 Gram-positive 2.22e-16 1 e+ 0 Gram-p… 1 5
+#> 8 Gram-positive 2.22e-16 1 e+ 0 Gram-p… 4 4
+#> 9 Gram-negative 1 e+ 0 2.22 e-16 Gram-n… 1 1
+#> 10 Gram-positive 6.05e-11 1.000e+ 0 Gram-p… 4 4
+#> # ℹ 384 more rows
+#> # ℹ 18 more variables: AMK , KAN , PEN , OXA , FLC ,
+#> # AMX , AMC , AMP , TZP , CZO , FEP ,
+#> # CXM , FOX , CTX , CAZ , CRO , IPM , MEM
+
+# Evaluate model performance
+metrics <- predictions %>%
+ metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics
+
+metrics
+#> # A tibble: 2 × 3
+#> .metric .estimator .estimate
+#>
+#> 1 accuracy binary 0.995
+#> 2 kap binary 0.989
+
+
+# To assess some other model properties, you can create your own `metrics()` function
+our_metrics <- metric_set(accuracy, kap, ppv, npv) # add Positive Predictive Value and Negative Predictive Value
+metrics2 <- predictions %>%
+ our_metrics(truth = mo, estimate = .pred_class) # run again on our `our_metrics()` function
+
+metrics2
+#> # A tibble: 4 × 3
+#> .metric .estimator .estimate
+#>
+#> 1 accuracy binary 0.995
+#> 2 kap binary 0.989
+#> 3 ppv binary 0.987
+#> 4 npv binary 1
+```
+
+**Explanation:**
+
+- [`predict()`](https://rdrr.io/r/stats/predict.html) generates
+ predictions on the testing set.
+- `metrics()` computes evaluation metrics like accuracy and kappa.
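+
+What accuracy and kappa measure can be sketched by hand in base R. The
+six labels below are hypothetical and unrelated to the model above:
+
+``` r
+truth    <- factor(c("neg", "neg", "pos", "pos", "pos", "neg"),
+                   levels = c("neg", "pos"))
+estimate <- factor(c("neg", "pos", "pos", "pos", "pos", "neg"),
+                   levels = c("neg", "pos"))
+
+tab <- table(truth, estimate)
+
+# Observed agreement, which is the accuracy
+p_observed <- sum(diag(tab)) / sum(tab)
+
+# Agreement expected by chance, from the row and column totals
+p_expected <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
+
+# Cohen's kappa: agreement beyond chance, scaled to a maximum of 1
+kappa <- (p_observed - p_expected) / (1 - p_expected)
+
+c(accuracy = p_observed, kap = kappa)
+```
+
+Kappa is lower than accuracy here because part of the agreement would
+also occur by chance.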
+
+It appears we can predict the Gram stain with 99.5% accuracy based on
+the AMR results of only aminoglycoside and beta-lactam antibiotics. The
+ROC curve looks like this:
+
+``` r
+predictions %>%
+ roc_curve(mo, `.pred_Gram-negative`) %>%
+ autoplot()
+```
+
+
+
+### **Conclusion**
+
+In this post, we demonstrated how to build a machine learning pipeline
+with the `tidymodels` framework and the `AMR` package. By combining
+selector functions like
+[`aminoglycosides()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+and
+[`betalactams()`](https://amr-for-r.org/reference/antimicrobial_selectors.md)
+with `tidymodels`, we efficiently prepared data, trained a model, and
+evaluated its performance.
+
+This workflow is extensible to other antimicrobial classes and
+resistance patterns, empowering users to analyse AMR data systematically
+and reproducibly.
+
+------------------------------------------------------------------------
+
+## Example 2: Predicting ESBL Presence Using Raw MICs
+
+In this second example, we demonstrate how to use `<mic>` columns
+directly in `tidymodels` workflows using AMR-specific recipe steps. This
+includes a transformation to `log2` scale using `step_mic_log2()`, which
+prepares MIC values for use in classification models.
+
+This approach and idea formed the basis for the publication [DOI:
+10.3389/fmicb.2025.1582703](https://doi.org/10.3389/fmicb.2025.1582703)
+to model the presence of extended-spectrum beta-lactamases (ESBL).
+
+> NOTE: THIS EXAMPLE WILL BE AVAILABLE IN A NEXT VERSION (#TODO)
+>
+> The new AMR package version will contain new tidymodels selectors such
+> as `step_mic_log2()`.
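+
+Until that step is available, the idea behind a `log2` transformation
+can be illustrated in base R. This is only a sketch on hypothetical
+MIC values stored as plain numbers, not the implementation of
+`step_mic_log2()`:
+
+``` r
+# Hypothetical MIC values in mg/L, as plain numbers
+mic <- c(0.25, 0.5, 1, 2, 8)
+
+# Doubling dilutions become evenly spaced on the log2 scale
+log2(mic)
+```
+
+Since MIC values are measured in doubling dilutions, each dilution step
+corresponds to one unit on this scale, which suits most models better
+than the raw concentrations.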
+
+------------------------------------------------------------------------
+
+## Example 3: Predicting AMR Over Time
+
+In this third example, we aim to predict antimicrobial resistance (AMR)
+trends over time using `tidymodels`. We will aggregate resistance to
+three antibiotics (amoxicillin `AMX`, amoxicillin-clavulanic acid `AMC`,
+and ciprofloxacin `CIP`) by year and Gram stain, and then model
+amoxicillin resistance based on these historical data.
+
+### **Objective**
+
+Our goal is to:
+
+1. Prepare the dataset by aggregating resistance data over time.
+2. Define a regression model to predict AMR trends.
+3. Use `tidymodels` to preprocess, train, and evaluate the model.
+
+### **Data Preparation**
+
+We start by transforming the `example_isolates` dataset into a
+structured time-series format.
+
+``` r
+# Load required libraries
+library(AMR)
+library(tidymodels)
+
+# Transform dataset
+data_time <- example_isolates %>%
+ top_n_microorganisms(n = 10) %>% # Filter on the top #10 species
+ mutate(year = as.integer(format(date, "%Y")), # Extract year from date
+ gramstain = mo_gramstain(mo)) %>% # Get the Gram stain
+ group_by(year, gramstain) %>%
+ summarise(across(c(AMX, AMC, CIP),
+ function(x) resistance(x, minimum = 0),
+ .names = "res_{.col}"),
+ .groups = "drop") %>%
+ filter(!is.na(res_AMX) & !is.na(res_AMC) & !is.na(res_CIP)) # Drop missing values
+#> ℹ Using column 'mo' as input for `col_mo`.
+
+data_time
+#> # A tibble: 32 × 5
+#> year gramstain res_AMX res_AMC res_CIP
+#>
+#> 1 2002 Gram-negative 1 0.105 0.0606
+#> 2 2002 Gram-positive 0.838 0.182 0.162
+#> 3 2003 Gram-negative 1 0.0714 0
+#> 4 2003 Gram-positive 0.714 0.244 0.154
+#> 5 2004 Gram-negative 0.464 0.0938 0
+#> 6 2004 Gram-positive 0.849 0.299 0.244
+#> 7 2005 Gram-negative 0.412 0.132 0.0588
+#> 8 2005 Gram-positive 0.882 0.382 0.154
+#> 9 2006 Gram-negative 0.379 0 0.1
+#> 10 2006 Gram-positive 0.778 0.333 0.353
+#> # ℹ 22 more rows
+```
+
+**Explanation:**
+
+- `mo_gramstain(mo)`: Determines the Gram stain of each microorganism.
+- [`resistance()`](https://amr-for-r.org/reference/proportion.md):
+ Converts AMR results into numeric values (proportion of resistant
+ isolates). Setting `minimum = 0` disables the default minimum of 30
+ results per group.
+- `group_by(year, gramstain)`: Aggregates resistance rates by year and
+ Gram stain.
+
+### **Defining the Workflow**
+
+We now define the modelling workflow, which consists of a preprocessing
+step, a model specification, and the fitting process.
+
+#### 1. Preprocessing with a Recipe
+
+``` r
+# Define the recipe
+resistance_recipe_time <- recipe(res_AMX ~ year + gramstain, data = data_time) %>%
+ step_dummy(gramstain, one_hot = TRUE) %>% # Convert categorical to numerical
+ step_normalize(year) %>% # Normalise year for better model performance
+ step_nzv(all_predictors()) # Remove near-zero variance predictors
+
+resistance_recipe_time
+#>
+#> ── Recipe ──────────────────────────────────────────────────────────────────────
+#>
+#> ── Inputs
+#> Number of variables by role
+#> outcome: 1
+#> predictor: 2
+#>
+#> ── Operations
+#> • Dummy variables from: gramstain
+#> • Centering and scaling for: year
+#> • Sparse, unbalanced variable filter on: all_predictors()
+```
+
+**Explanation:**
+
+- `step_dummy()`: Encodes the categorical variable `gramstain` as
+ numerical indicators.
+- `step_normalize()`: Normalises the `year` variable.
+- `step_nzv()`: Removes near-zero variance predictors.
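+
+To make these steps more concrete, the following base R sketch mimics
+one-hot encoding and normalisation on a small made-up data frame. It is
+not how `recipes` implements them internally, and the object `toy` is an
+assumption for illustration.
+
+``` r
+toy <- data.frame(
+  year = c(2002, 2003, 2004, 2005),
+  gramstain = factor(c("Gram-negative", "Gram-positive",
+                       "Gram-negative", "Gram-positive"))
+)
+
+# One-hot encoding: one indicator column per level (no intercept)
+dummies <- model.matrix(~ gramstain - 1, data = toy)
+
+# Normalisation: centre on the mean, scale by the standard deviation
+year_norm <- (toy$year - mean(toy$year)) / sd(toy$year)
+
+dummies
+year_norm
+```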
+
+#### 2. Specifying the Model
+
+We use a linear regression model to predict resistance trends.
+
+``` r
+# Define the linear regression model
+lm_model <- linear_reg() %>%
+ set_engine("lm") # Use linear regression
+
+lm_model
+#> Linear Regression Model Specification (regression)
+#>
+#> Computational engine: lm
+```
+
+**Explanation:**
+
+- `linear_reg()`: Defines a linear regression model.
+- `set_engine("lm")`: Uses R’s built-in linear regression engine.
+
+#### 3. Building the Workflow
+
+We combine the preprocessing recipe and model into a workflow.
+
+``` r
+# Create workflow
+resistance_workflow_time <- workflow() %>%
+ add_recipe(resistance_recipe_time) %>%
+ add_model(lm_model)
+
+resistance_workflow_time
+#> ══ Workflow ════════════════════════════════════════════════════════════════════
+#> Preprocessor: Recipe
+#> Model: linear_reg()
+#>
+#> ── Preprocessor ────────────────────────────────────────────────────────────────
+#> 3 Recipe Steps
+#>
+#> • step_dummy()
+#> • step_normalize()
+#> • step_nzv()
+#>
+#> ── Model ───────────────────────────────────────────────────────────────────────
+#> Linear Regression Model Specification (regression)
+#>
+#> Computational engine: lm
+```
+
+### **Training and Evaluating the Model**
+
+We split the data into training and testing sets, fit the model, and
+evaluate performance.
+
+``` r
+# Split the data
+set.seed(123)
+data_split_time <- initial_split(data_time, prop = 0.8)
+train_time <- training(data_split_time)
+test_time <- testing(data_split_time)
+
+# Train the model
+fitted_workflow_time <- resistance_workflow_time %>%
+ fit(train_time)
+
+# Make predictions
+predictions_time <- fitted_workflow_time %>%
+ predict(test_time) %>%
+ bind_cols(test_time)
+
+# Evaluate model
+metrics_time <- predictions_time %>%
+ metrics(truth = res_AMX, estimate = .pred)
+
+metrics_time
+#> # A tibble: 3 × 3
+#> .metric .estimator .estimate
+#>
+#> 1 rmse standard 0.0774
+#> 2 rsq standard 0.711
+#> 3 mae standard 0.0704
+```
+
+**Explanation:**
+
+- `initial_split()`: Splits data into training and testing sets.
+- `fit()`: Trains the workflow.
+- [`predict()`](https://rdrr.io/r/stats/predict.html): Generates
+ resistance predictions.
+- `metrics()`: Evaluates model performance.
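+
+For intuition, the three reported metrics can be computed by hand. The
+sketch below uses made-up values and assumes the usual definitions, with
+R² taken as the squared correlation between observed and predicted
+values.
+
+``` r
+truth    <- c(0.8, 0.4, 0.9, 0.5)   # made-up observed proportions
+estimate <- c(0.7, 0.5, 0.8, 0.6)   # made-up predictions
+
+rmse <- sqrt(mean((truth - estimate)^2))  # root mean squared error
+mae  <- mean(abs(truth - estimate))       # mean absolute error
+rsq  <- cor(truth, estimate)^2            # squared correlation
+
+c(rmse = rmse, rsq = rsq, mae = mae)
+```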
+
+### **Visualising Predictions**
+
+We plot resistance trends over time for amoxicillin.
+
+``` r
+library(ggplot2)
+
+# Plot actual vs predicted resistance over time
+ggplot(predictions_time, aes(x = year)) +
+ geom_point(aes(y = res_AMX, color = "Actual")) +
+ geom_line(aes(y = .pred, color = "Predicted")) +
+ labs(title = "Predicted vs Actual AMX Resistance Over Time",
+ x = "Year",
+ y = "Resistance Proportion") +
+ theme_minimal()
+```
+
+
+
+Additionally, we can visualise resistance trends in `ggplot2` and
+directly add linear models there:
+
+``` r
+ggplot(data_time, aes(x = year, y = res_AMX, color = gramstain)) +
+ geom_line() +
+ labs(title = "AMX Resistance Trends",
+ x = "Year",
+ y = "Resistance Proportion") +
+ # add a linear model directly in ggplot2:
+ geom_smooth(method = "lm",
+ formula = y ~ x,
+ alpha = 0.25) +
+ theme_minimal()
+```
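+
+The lines drawn by `geom_smooth(method = "lm")` correspond to one
+ordinary linear regression per colour group. A minimal base R sketch
+with made-up data (the data frame `toy_trend` is an assumption) fits
+these separately and extracts the yearly change:
+
+``` r
+toy_trend <- data.frame(
+  year = rep(2002:2005, times = 2),
+  gramstain = rep(c("Gram-negative", "Gram-positive"), each = 4),
+  res_AMX = c(0.90, 0.85, 0.80, 0.75,   # decreasing by 0.05 per year
+              0.70, 0.72, 0.74, 0.76)   # increasing by 0.02 per year
+)
+
+# Fit res_AMX ~ year separately for each Gram stain and keep the slope
+slopes <- sapply(split(toy_trend, toy_trend$gramstain), function(d) {
+  coef(lm(res_AMX ~ year, data = d))[["year"]]
+})
+
+slopes
+```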
+
+
+
+### **Conclusion**
+
+In this example, we demonstrated how to analyse AMR trends over time
+using `tidymodels`. By aggregating resistance rates by year and Gram
+stain, we built a predictive model to track changes in resistance to
+amoxicillin (`AMX`). Resistance to amoxicillin-clavulanic acid (`AMC`)
+and ciprofloxacin (`CIP`) was aggregated in the same way and can be
+modelled analogously.
+
+This method can be extended to other antibiotics and resistance
+patterns, providing valuable insights into AMR dynamics in healthcare
+settings.
diff --git a/articles/EUCAST.html b/articles/EUCAST.html
index 82a188ee1..cea7f1cd3 100644
--- a/articles/EUCAST.html
+++ b/articles/EUCAST.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
diff --git a/articles/EUCAST.md b/articles/EUCAST.md
new file mode 100644
index 000000000..a75e634ef
--- /dev/null
+++ b/articles/EUCAST.md
@@ -0,0 +1,126 @@
+# Apply EUCAST rules
+
+## Introduction
+
+What are EUCAST rules? The European Committee on Antimicrobial
+Susceptibility Testing (EUCAST) states [on their
+website](https://www.eucast.org/expert_rules_and_expected_phenotypes):
+
+> *EUCAST expert rules (see below) are a tabulated collection of expert
+> knowledge on interpretive rules, expected resistant phenotypes and
+> expected susceptible phenotypes which should be applied to
+> antimicrobial susceptibility testing in order to reduce testing,
+> reduce errors and make appropriate recommendations for reporting
+> particular resistances.*
+
+In Europe, a lot of medical microbiological laboratories already apply
+these rules ([Brown *et al.*,
+2015](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES2015.20.2.21008)).
+Our package features their latest insights on expected resistant
+phenotypes (v1.2, 2023).
+
+## Examples
+
+These rules can be used to discard improbable bug-drug combinations in
+your data. For example, *Klebsiella* produces beta-lactamase that
+prevents ampicillin (or amoxicillin) from working against it. In other
+words, practically every strain of *Klebsiella* is resistant to
+ampicillin.
+
+Sometimes, laboratory data can still contain such strains with
+*Klebsiella* being susceptible to ampicillin. This could be because an
+antibiogram is available before an identification is available, and the
+antibiogram is then not re-interpreted based on the identification. The
+[`eucast_rules()`](https://amr-for-r.org/reference/eucast_rules.md)
+function resolves this, by applying the latest ‘EUCAST Expected
+Resistant Phenotypes’ guideline:
+
+``` r
+oops <- tibble::tibble(
+ mo = c(
+ "Klebsiella pneumoniae",
+ "Escherichia coli"
+ ),
+ ampicillin = as.sir("S")
+)
+oops
+#> # A tibble: 2 × 2
+#> mo ampicillin
+#>
+#> 1 Klebsiella pneumoniae S
+#> 2 Escherichia coli S
+
+eucast_rules(oops, info = FALSE, overwrite = TRUE)
+#> # A tibble: 2 × 2
+#> mo ampicillin
+#>
+#> 1 Klebsiella pneumoniae R
+#> 2 Escherichia coli S
+```
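+
+Conceptually, this correction is a lookup against a table of expected
+resistant phenotypes. The base R sketch below illustrates the idea with
+a hypothetical one-row rule table containing only the
+*Klebsiella*/ampicillin combination mentioned above; it is not how
+`eucast_rules()` is implemented, and the names `rules` and
+`apply_rules()` are assumptions.
+
+``` r
+# Hypothetical rule table: genus and drug with expected resistance
+rules <- data.frame(genus = "Klebsiella", drug = "ampicillin",
+                    stringsAsFactors = FALSE)
+
+lab <- data.frame(
+  mo = c("Klebsiella pneumoniae", "Escherichia coli"),
+  ampicillin = c("S", "S"),
+  stringsAsFactors = FALSE
+)
+
+apply_rules <- function(data, rules) {
+  genus <- sub(" .*", "", data$mo)       # first word of the name
+  for (i in seq_len(nrow(rules))) {
+    hit <- genus == rules$genus[i]
+    data[hit, rules$drug[i]] <- "R"      # overwrite with expected R
+  }
+  data
+}
+
+apply_rules(lab, rules)
+```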
+
+A more convenient function is
+[`mo_is_intrinsic_resistant()`](https://amr-for-r.org/reference/mo_property.md),
+which uses the same guideline but lets you check one or more specific
+microorganisms or antimicrobials:
+
+``` r
+mo_is_intrinsic_resistant(
+ c("Klebsiella pneumoniae", "Escherichia coli"),
+ "ampicillin"
+)
+#> [1] TRUE FALSE
+
+mo_is_intrinsic_resistant(
+ "Klebsiella pneumoniae",
+ c("ampicillin", "kanamycin")
+)
+#> [1] TRUE FALSE
+```
+
+EUCAST rules can be used not only for correction, but also for filling
+in known resistance and susceptibility based on results of other
+antimicrobial drugs. This process is called *interpretive reading*, and
+is basically a form of imputation:
+
+``` r
+data <- tibble::tibble(
+ mo = c(
+ "Staphylococcus aureus",
+ "Enterococcus faecalis",
+ "Escherichia coli",
+ "Klebsiella pneumoniae",
+ "Pseudomonas aeruginosa"
+ ),
+ VAN = "-", # Vancomycin
+ AMX = "-", # Amoxicillin
+ COL = "-", # Colistin
+ CAZ = "-", # Ceftazidime
+ CXM = "-", # Cefuroxime
+ PEN = "S", # Benzylpenicillin
+ FOX = "S" # Cefoxitin
+)
+```
+
+``` r
+data
+```
+
+| mo | VAN | AMX | COL | CAZ | CXM | PEN | FOX |
+|:-----------------------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Staphylococcus aureus | \- | \- | \- | \- | \- | S | S |
+| Enterococcus faecalis | \- | \- | \- | \- | \- | S | S |
+| Escherichia coli | \- | \- | \- | \- | \- | S | S |
+| Klebsiella pneumoniae | \- | \- | \- | \- | \- | S | S |
+| Pseudomonas aeruginosa | \- | \- | \- | \- | \- | S | S |
+
+``` r
+eucast_rules(data, overwrite = TRUE)
+```
+
+| mo | VAN | AMX | COL | CAZ | CXM | PEN | FOX |
+|:-----------------------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Staphylococcus aureus | \- | S | R | R | S | S | S |
+| Enterococcus faecalis | \- | \- | R | R | R | S | R |
+| Escherichia coli | R | \- | \- | \- | \- | R | S |
+| Klebsiella pneumoniae | R | R | \- | \- | \- | R | S |
+| Pseudomonas aeruginosa | R | R | \- | \- | R | R | R |
diff --git a/articles/PCA.html b/articles/PCA.html
index 6643a764a..8168310c6 100644
--- a/articles/PCA.html
+++ b/articles/PCA.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
@@ -210,18 +210,18 @@ per drug explain the difference per microorganism.
But we can’t see the explanation of the points. Perhaps this works
better with our new ggplot_pca() function, that
automatically adds the right labels and even groups:
diff --git a/articles/PCA.md b/articles/PCA.md
new file mode 100644
index 000000000..741cc4899
--- /dev/null
+++ b/articles/PCA.md
@@ -0,0 +1,157 @@
+# Conduct principal component analysis (PCA) for AMR
+
+**NOTE: This page will be updated soon, as the pca() function is
+currently being developed.**
+
+## Introduction
+
+## Transforming
+
+For PCA, we need to transform our AMR data first. This is what the
+`example_isolates` data set in this package looks like:
+
+``` r
+library(AMR)
+library(dplyr)
+glimpse(example_isolates)
+#> Rows: 2,000
+#> Columns: 46
+#> $ date 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002-01-13, 2…
+#> $ patient "A77334", "A77334", "067927", "067927", "067927", "067927", "4…
+#> $ age 65, 65, 45, 45, 45, 45, 78, 78, 45, 79, 67, 67, 71, 71, 75, 50…
+#> $ gender "F", "F", "F", "F", "F", "F", "M", "M", "F", "F", "M", "M", "M…
+#> $ ward "Clinical", "Clinical", "ICU", "ICU", "ICU", "ICU", "Clinical"…
+#> $ mo "B_ESCHR_COLI", "B_ESCHR_COLI", "B_STPHY_EPDR", "B_STPHY_EPDR",…
+#> $ PEN R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, S,…
+#> $ OXA NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ FLC NA, NA, R, R, R, R, S, S, R, S, S, S, NA, NA, NA, NA, NA, R, R…
+#> $ AMX NA, NA, NA, NA, NA, NA, R, R, NA, NA, NA, NA, NA, NA, R, NA, N…
+#> $ AMC I, I, NA, NA, NA, NA, S, S, NA, NA, S, S, I, I, R, I, I, NA, N…
+#> $ AMP NA, NA, NA, NA, NA, NA, R, R, NA, NA, NA, NA, NA, NA, R, NA, N…
+#> $ TZP NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ CZO NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, NA,…
+#> $ FEP NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ CXM I, I, R, R, R, R, S, S, R, S, S, S, S, S, NA, S, S, R, R, S, S…
+#> $ FOX NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, NA,…
+#> $ CTX NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, S, S…
+#> $ CAZ NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, S, S, R, R, …
+#> $ CRO NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, S, S…
+#> $ GEN NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ TOB NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, S, S, NA, NA, NA…
+#> $ AMK NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ KAN NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ TMP R, R, S, S, R, R, R, R, S, S, NA, NA, S, S, S, S, S, R, R, R, …
+#> $ SXT R, R, S, S, NA, NA, NA, NA, S, S, NA, NA, S, S, S, S, S, NA, N…
+#> $ NIT NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R,…
+#> $ FOS NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ LNZ R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R, R, R, N…
+#> $ CIP NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, NA, S, S…
+#> $ MFX NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ VAN R, R, S, S, S, S, S, S, S, S, NA, NA, R, R, R, R, R, S, S, S, …
+#> $ TEC R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R, R, R, N…
+#> $ TCY R, R, S, S, S, S, S, S, S, I, S, S, NA, NA, I, R, R, S, I, R, …
+#> $ TGC NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R, R, S, NA…
+#> $ DOX NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R, R, S, NA…
+#> $ ERY R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R, R, R, S,…
+#> $ CLI R, R, NA, NA, NA, R, NA, NA, NA, NA, NA, NA, R, R, R, R, R, NA…
+#> $ AZM R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R, R, R, S,…
+#> $ IPM NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, S, S…
+#> $ MEM NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ MTR NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ CHL NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ COL NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, R, R, R, …
+#> $ MUP NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+#> $ RIF R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R, R, R, N…
+```
+
+Now to transform this to a data set with only resistance percentages per
+taxonomic order and genus:
+
+``` r
+resistance_data <- example_isolates %>%
+ group_by(
+ order = mo_order(mo), # group on anything, like order
+ genus = mo_genus(mo)
+ ) %>% # and genus as we do here
+ summarise_if(is.sir, resistance) %>% # then get resistance of all drugs
+ select(
+ order, genus, AMC, CXM, CTX,
+ CAZ, GEN, TOB, TMP, SXT
+ ) # and select only relevant columns
+
+head(resistance_data)
+#> # A tibble: 6 × 10
+#> # Groups: order [5]
+#> order genus AMC CXM CTX CAZ GEN TOB TMP SXT
+#>
+#> 1 (unknown order) (unknown ge… NA NA NA NA NA NA NA NA
+#> 2 Actinomycetales Schaalia NA NA NA NA NA NA NA NA
+#> 3 Bacteroidales Bacteroides NA NA NA NA NA NA NA NA
+#> 4 Campylobacterales Campylobact… NA NA NA NA NA NA NA NA
+#> 5 Caryophanales Gemella NA NA NA NA NA NA NA NA
+#> 6 Caryophanales Listeria NA NA NA NA NA NA NA NA
+```
+
+## Perform principal component analysis
+
+The new [`pca()`](https://amr-for-r.org/reference/pca.md) function will
+automatically filter on rows that contain numeric values in all selected
+variables, so we now only need to do:
+
+``` r
+pca_result <- pca(resistance_data)
+#> ℹ Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT",
+#> "TMP", and "TOB". Total observations available: 7.
+```
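+
+As a rough picture of what this filtering means, the base R sketch below
+keeps only complete rows of a small made-up table and passes them to
+`prcomp()`. The data and the use of `prcomp()` with centring and scaling
+are assumptions for illustration, not a description of the internals of
+`pca()`.
+
+``` r
+toy_res <- data.frame(
+  AMC = c(0.10, 0.35, NA,   0.60, 0.20),
+  CXM = c(0.05, 0.40, 0.30, 0.70, NA),
+  GEN = c(0.02, 0.10, 0.15, 0.30, 0.05)
+)
+
+# Keep only rows with a numeric value in every selected variable
+complete <- toy_res[complete.cases(toy_res), ]
+
+# Principal component analysis on centred and scaled variables
+toy_pca <- prcomp(complete, center = TRUE, scale. = TRUE)
+
+nrow(complete)
+summary(toy_pca)
+```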
+
+The result can be reviewed with the good old
+[`summary()`](https://rdrr.io/r/base/summary.html) function:
+
+``` r
+summary(pca_result)
+#> Groups (n=4, named as 'order'):
+#> [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
+#> Importance of components:
+#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
+#> Standard deviation 2.1539 1.6807 0.6138 0.33879 0.20808 0.03140 1.232e-16
+#> Proportion of Variance 0.5799 0.3531 0.0471 0.01435 0.00541 0.00012 0.000e+00
+#> Cumulative Proportion 0.5799 0.9330 0.9801 0.99446 0.99988 1.00000 1.000e+00
+```
+
+Good news. The first two components explain a total of 93.3% of the
+variance (see the PC1 and PC2 values of the *Proportion of Variance*).
+We can create a so-called biplot with the base R
+[`biplot()`](https://rdrr.io/r/stats/biplot.html) function, to see
+which drugs' resistance explains the differences between
+microorganisms.
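+
+These proportions follow directly from the standard deviations in the
+summary: the variance of each component is its squared standard
+deviation, divided by the total. As a quick check with the (rounded)
+values from the summary output:
+
+``` r
+# Standard deviations of PC1 to PC7, copied from the summary output
+sdev <- c(2.1539, 1.6807, 0.6138, 0.33879, 0.20808, 0.03140, 1.232e-16)
+
+prop_var <- sdev^2 / sum(sdev^2)   # proportion of variance per component
+cum_prop <- cumsum(prop_var)       # cumulative proportion
+
+round(prop_var[1:2], 4)
+round(cum_prop[2], 3)
+```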
+
+## Plotting the results
+
+``` r
+biplot(pca_result)
+```
+
+
+
+But we can’t see the explanation of the points. Perhaps this works
+better with our new
+[`ggplot_pca()`](https://amr-for-r.org/reference/ggplot_pca.md)
+function, that automatically adds the right labels and even groups:
+
+``` r
+ggplot_pca(pca_result)
+```
+
+
+
+You can also print an ellipse per group, and edit the appearance:
+
+``` r
+ggplot_pca(pca_result, ellipse = TRUE) +
+ ggplot2::labs(title = "An AMR/PCA biplot!")
+```
+
+
diff --git a/articles/WHONET.html b/articles/WHONET.html
index 6afab86d5..cabec7b01 100644
--- a/articles/WHONET.html
+++ b/articles/WHONET.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
@@ -311,7 +311,7 @@ using the included ggplot_sir()group_by(Country)%>%select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5)%>%ggplot_sir(translate_ab ="ab", facet ="Country", datalabels =FALSE)
-
+
diff --git a/articles/WHONET.md b/articles/WHONET.md
new file mode 100644
index 000000000..3b4e98ed2
--- /dev/null
+++ b/articles/WHONET.md
@@ -0,0 +1,137 @@
+# Work with WHONET data
+
+### Import of data
+
+This tutorial assumes you already imported the WHONET data with e.g. the
+[`readxl` package](https://readxl.tidyverse.org/). In RStudio, this can
+be done using the menu button ‘Import Dataset’ in the tab ‘Environment’.
+Choose the option ‘From Excel’ and select your exported file. Make sure
+date fields are imported correctly.
+
+An example syntax could look like this:
+
+``` r
+library(readxl)
+data <- read_excel(path = "path/to/your/file.xlsx")
+```
+
+This package comes with an [example data set
+`WHONET`](https://amr-for-r.org/reference/WHONET.html). We will use it
+for this analysis.
+
+### Preparation
+
+First, load the relevant packages if you have not done so yet. I use
+the tidyverse for all of my analyses. All of them. If you don’t know it
+yet, I suggest you read about it on the tidyverse website.
+
+``` r
+library(dplyr) # part of tidyverse
+library(ggplot2) # part of tidyverse
+library(AMR) # this package
+library(cleaner) # to create frequency tables
+```
+
+We will have to transform some variables to simplify and automate the
+analysis:
+
+- Microorganisms should be transformed to our own microorganism codes
+ (called an `mo`) using [our Catalogue of Life reference data
+ set](https://amr-for-r.org/reference/catalogue_of_life), which
+ contains all ~70,000 microorganisms from the taxonomic kingdoms
+ Bacteria, Fungi and Protozoa. We do the transformation with
+ [`as.mo()`](https://amr-for-r.org/reference/as.mo.md). This function
+ also recognises almost all WHONET abbreviations of microorganisms.
+- Antimicrobial results or interpretations have to be clean and valid.
+ In other words, they should only contain values `"S"`, `"I"` or `"R"`.
+ That is exactly what the
+ [`as.sir()`](https://amr-for-r.org/reference/as.sir.md) function is
+ for.
+
+``` r
+# transform variables
+data <- WHONET %>%
+ # get microbial ID based on given organism
+ mutate(mo = as.mo(Organism)) %>%
+ # transform everything from "AMP_ND10" to "CIP_EE" to the new `sir` class
+ mutate_at(vars(AMP_ND10:CIP_EE), as.sir)
+```
+
+No errors or warnings, so all values are transformed successfully.
+
+We also created a package dedicated to data cleaning and checking,
+called the `cleaner` package. Its
+[`freq()`](https://msberends.github.io/cleaner/reference/freq.html)
+function can be used to create frequency tables.
+
+So let’s check our data, with a couple of frequency tables:
+
+``` r
+# our newly created `mo` variable, put in the mo_name() function
+data %>% freq(mo_name(mo), nmax = 10)
+```
+
+**Frequency table**
+
+Class: character
+Length: 500
+Available: 500 (100%, NA: 0 = 0%)
+Unique: 38
+
+Shortest: 11
+Longest: 40
+
+| | Item | Count | Percent | Cum. Count | Cum. Percent |
+|:----|:-----------------------------------------|------:|--------:|-----------:|-------------:|
+| 1 | Escherichia coli | 245 | 49.0% | 245 | 49.0% |
+| 2 | Coagulase-negative Staphylococcus (CoNS) | 74 | 14.8% | 319 | 63.8% |
+| 3 | Staphylococcus epidermidis | 38 | 7.6% | 357 | 71.4% |
+| 4 | Streptococcus pneumoniae | 31 | 6.2% | 388 | 77.6% |
+| 5 | Staphylococcus hominis | 21 | 4.2% | 409 | 81.8% |
+| 6 | Proteus mirabilis | 9 | 1.8% | 418 | 83.6% |
+| 7 | Enterococcus faecium | 8 | 1.6% | 426 | 85.2% |
+| 8 | Staphylococcus capitis urealyticus | 8 | 1.6% | 434 | 86.8% |
+| 9 | Enterobacter cloacae | 5 | 1.0% | 439 | 87.8% |
+| 10 | Enterococcus columbae | 4 | 0.8% | 443 | 88.6% |
+
+(omitted 28 entries, n = 57 \[11.4%\])
+
+``` r
+# our transformed antibiotic columns
+# amoxicillin/clavulanic acid (J01CR02) as an example
+data %>% freq(AMC_ND2)
+```
+
+**Frequency table**
+
+Class: factor \> ordered \> sir (numeric)
+Length: 500
+Levels: 5: S \< SDD \< I \< R \< NI
+Available: 481 (96.2%, NA: 19 = 3.8%)
+Unique: 3
+
+Drug: Amoxicillin/clavulanic acid (AMC, J01CR02/QJ01CR02)
+Drug group: Beta-lactams/penicillins
+%SI: 78.59%
+
+| | Item | Count | Percent | Cum. Count | Cum. Percent |
+|:----|:-----|------:|--------:|-----------:|-------------:|
+| 1 | S | 356 | 74.01% | 356 | 74.01% |
+| 2 | R | 103 | 21.41% | 459 | 95.43% |
+| 3 | I | 22 | 4.57% | 481 | 100.00% |
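+
+The percentages in this table, including the `%SI` value in the header,
+can be reproduced from the counts alone. A small base R check, using the
+counts shown in the table:
+
+``` r
+counts <- c(S = 356, R = 103, I = 22)   # counts from the table
+
+percent     <- 100 * counts / sum(counts)
+cum_percent <- cumsum(percent)
+pct_SI      <- 100 * sum(counts[c("S", "I")]) / sum(counts)
+
+round(percent, 2)
+round(cum_percent, 2)
+round(pct_SI, 2)
+```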
+
+### A first glimpse at results
+
+An easy `ggplot` will already give a lot of information, using the
+included [`ggplot_sir()`](https://amr-for-r.org/reference/ggplot_sir.md)
+function:
+
+``` r
+data %>%
+ group_by(Country) %>%
+ select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
+ ggplot_sir(translate_ab = "ab", facet = "Country", datalabels = FALSE)
+```
+
+
diff --git a/articles/WISCA.html b/articles/WISCA.html
index b9e37608e..672c466c8 100644
--- a/articles/WISCA.html
+++ b/articles/WISCA.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
diff --git a/articles/WISCA.md b/articles/WISCA.md
new file mode 100644
index 000000000..66a5526de
--- /dev/null
+++ b/articles/WISCA.md
@@ -0,0 +1,252 @@
+# Estimating Empirical Coverage with WISCA
+
+> This explainer was largely written by our [AMR for R
+> Assistant](https://chat.amr-for-r.org), a ChatGPT manually-trained
+> model able to answer any question about the `AMR` package.
+
+## Introduction
+
+Clinical guidelines for empirical antimicrobial therapy require
+*probabilistic reasoning*: what is the chance that a regimen will cover
+the likely infecting organisms, before culture results are available?
+
+This is the purpose of **WISCA**, or **Weighted-Incidence Syndromic
+Combination Antibiogram**.
+
+WISCA is a Bayesian approach that integrates:
+
+- **Pathogen prevalence** (how often each species causes the syndrome),
+- **Regimen susceptibility** (how often a regimen works *if* the
+ pathogen is known),
+
+to estimate the **overall empirical coverage** of antimicrobial
+regimens, with quantified uncertainty.
+
+This vignette explains how WISCA works, why it is useful, and how to
+apply it using the `AMR` package.
+
+## Why traditional antibiograms fall short
+
+A standard antibiogram gives you:
+
+ Species → Antibiotic → Susceptibility %
+
+But clinicians don’t know the species *a priori*. They need to choose a
+regimen that covers the **likely pathogens**, without knowing which one
+is present.
+
+Traditional antibiograms calculate the susceptibility % as just the
+number of susceptible isolates divided by the total number of tested
+isolates. Therefore, traditional antibiograms:
+
+- Fragment information by organism,
+- Do not weight by real-world prevalence,
+- Do not account for combination therapy or sample size,
+- Do not provide uncertainty.
+
+## The idea of WISCA
+
+WISCA asks:
+
+> “What is the **probability** that this regimen **will cover** the
+> pathogen, given the syndrome?”
+
+This means combining two things:
+
+- **Incidence** of each pathogen in the syndrome,
+- **Susceptibility** of each pathogen to the regimen.
+
+We can write this as:
+
+$$\text{Coverage} = \sum\limits_{i}\left( \text{Incidence}_{i} \times \text{Susceptibility}_{i} \right)$$
+
+For example, suppose:
+
+- *E. coli* causes 60% of cases, and 90% of *E. coli* are susceptible to
+ a drug.
+- *Klebsiella* causes 40% of cases, and 70% of *Klebsiella* are
+ susceptible.
+
+Then:
+
+$$\text{Coverage} = (0.6 \times 0.9) + (0.4 \times 0.7) = 0.82$$
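+
+In R, this weighted sum is a one-liner. The vector names below are
+illustrative:
+
+``` r
+incidence      <- c(E_coli = 0.6, Klebsiella = 0.4)
+susceptibility <- c(E_coli = 0.9, Klebsiella = 0.7)
+
+coverage <- sum(incidence * susceptibility)
+coverage
+```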
+
+But in real data, incidence and susceptibility are **estimated from
+samples**, so they carry uncertainty. WISCA models this
+**probabilistically**, using conjugate Bayesian distributions.
+
+## The Bayesian engine behind WISCA
+
+### Pathogen incidence
+
+Let:
+
+- $K$ be the number of pathogens,
+- $\alpha = (1,1,\ldots,1)$ be a **Dirichlet** prior (uniform),
+- $n = \left( n_{1},\ldots,n_{K} \right)$ be the observed counts per
+ species.
+
+Then the posterior incidence is:
+
+$$p \sim \text{Dirichlet}\left( \alpha_{1} + n_{1},\ldots,\alpha_{K} + n_{K} \right)$$
+
+To simulate from this, we use:
+
+$$x_{i} \sim \text{Gamma}\left( \alpha_{i} + n_{i},\ 1 \right),\quad p_{i} = \frac{x_{i}}{\sum\limits_{j = 1}^{K}x_{j}}$$
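+
+A minimal base R sketch of this sampling step, assuming made-up counts
+for three pathogens and a uniform prior:
+
+``` r
+set.seed(42)
+n     <- c(60, 30, 10)         # made-up observed counts per species
+alpha <- rep(1, length(n))     # uniform Dirichlet prior
+
+# One Dirichlet draw: normalise independent Gamma variates
+x <- rgamma(length(n), shape = alpha + n, rate = 1)
+p <- x / sum(x)
+
+p
+sum(p)   # the simulated incidences sum to 1
+```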
+
+### Susceptibility
+
+Each pathogen–regimen pair has a prior and data:
+
+- Prior: $\text{Beta}\left( \alpha_{0},\beta_{0} \right)$, with default
+ $\alpha_{0} = \beta_{0} = 1$
+- Data: $S$ susceptible out of $N$ tested
+
+The $S$ category could also include values SDD (susceptible,
+dose-dependent) and I (intermediate \[CLSI\], or susceptible, increased
+exposure \[EUCAST\]).
+
+Then the posterior is:
+
+$$\theta \sim \text{Beta}\left( \alpha_{0} + S,\ \beta_{0} + N - S \right)$$
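+
+For example, with made-up data of 45 susceptible isolates out of 50
+tested and the default prior, the posterior mean and a 95% credible
+interval can be sketched in base R as follows:
+
+``` r
+alpha0 <- 1; beta0 <- 1    # default Beta(1, 1) prior
+S <- 45; N <- 50           # made-up data: 45 susceptible out of 50 tested
+
+shape1 <- alpha0 + S       # posterior first shape parameter
+shape2 <- beta0 + N - S    # posterior second shape parameter
+
+post_mean <- shape1 / (shape1 + shape2)
+cred_int  <- qbeta(c(0.025, 0.975), shape1, shape2)
+
+post_mean
+cred_int
+```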
+
+### Final coverage estimate
+
+Putting it together:
+
+1. Simulate pathogen incidence:
+ $p \sim \text{Dirichlet}\left( \alpha_{1} + n_{1},\ldots,\alpha_{K} + n_{K} \right)$
+2. Simulate susceptibility:
+ $\theta_{i} \sim \text{Beta}\left( \alpha_{0} + S_{i},\ \beta_{0} + N_{i} - S_{i} \right)$
+3. Combine:
+
+$$\text{Coverage} = \sum\limits_{i = 1}^{K}p_{i} \cdot \theta_{i}$$
+
+Repeat this simulation (e.g. 1000×) and summarise:
+
+- **Mean** = expected coverage
+- **Quantiles** = credible interval
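+
+Putting the three steps into code, a self-contained base R sketch could
+look like this. The function `simulate_coverage()` and its inputs are
+hypothetical and only illustrate the simulation described here, with
+uniform priors; this is not the implementation used by `wisca()`.
+
+``` r
+simulate_coverage <- function(n, S, N, simulations = 1000) {
+  K <- length(n)
+  replicate(simulations, {
+    x <- rgamma(K, shape = 1 + n, rate = 1)   # step 1: Dirichlet via Gamma
+    p <- x / sum(x)
+    theta <- rbeta(K, 1 + S, 1 + N - S)       # step 2: Beta posteriors
+    sum(p * theta)                            # step 3: weighted coverage
+  })
+}
+
+set.seed(123)
+coverage <- simulate_coverage(
+  n = c(60, 40),   # made-up isolate counts per pathogen
+  S = c(54, 28),   # made-up susceptible counts
+  N = c(60, 40)    # made-up tested counts
+)
+
+mean(coverage)                        # expected coverage
+quantile(coverage, c(0.025, 0.975))   # 95% credible interval
+```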
+
+## Practical use in the `AMR` package
+
+### Prepare data and simulate synthetic syndrome
+
+``` r
+library(AMR)
+data <- example_isolates
+
+# Structure of our data
+data
+#> # A tibble: 2,000 × 46
+#> date patient age gender ward mo PEN OXA FLC AMX
+#>
+#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
+#> 3 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 4 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 5 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 6 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 7 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 8 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
+#> 9 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
+#> 10 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
+#> # ℹ 1,990 more rows
+#> # ℹ 36 more variables: AMC , AMP , TZP , CZO , FEP ,
+#> # CXM , FOX , CTX , CAZ , CRO , GEN ,
+#> # TOB , AMK , KAN , TMP , SXT , NIT ,
+#> # FOS , LNZ , CIP , MFX , VAN , TEC ,
+#> # TCY , TGC , DOX , ERY , CLI , AZM ,
+#> # IPM , MEM , MTR , CHL , COL , MUP , …
+
+# Add a fake syndrome column
+data$syndrome <- ifelse(data$mo %like% "coli", "UTI", "No UTI")
+```
+
+### Basic WISCA antibiogram
+
+``` r
+wisca(data,
+ antimicrobials = c("AMC", "CIP", "GEN"))
+```
+
+| Amoxicillin/clavulanic acid | Ciprofloxacin | Gentamicin |
+|:----------------------------|:-----------------|:-------------------|
+| 73.7% (71.7-75.8%) | 77% (74.3-79.4%) | 72.8% (70.7-74.8%) |
+
+### Use combination regimens
+
+``` r
+wisca(data,
+ antimicrobials = c("AMC", "AMC + CIP", "AMC + GEN"))
+```
+
+| Amoxicillin/clavulanic acid | Amoxicillin/clavulanic acid + Ciprofloxacin | Amoxicillin/clavulanic acid + Gentamicin |
+|:----------------------------|:--------------------------------------------|:-----------------------------------------|
+| 73.8% (71.8-75.7%) | 87.5% (85.9-89%) | 89.7% (88.2-91.1%) |
+
+### Stratify by syndrome
+
+``` r
+wisca(data,
+ antimicrobials = c("AMC", "AMC + CIP", "AMC + GEN"),
+ syndromic_group = "syndrome")
+```
+
+| Syndromic Group | Amoxicillin/clavulanic acid | Amoxicillin/clavulanic acid + Ciprofloxacin | Amoxicillin/clavulanic acid + Gentamicin |
+|:----------------|:----------------------------|:--------------------------------------------|:-----------------------------------------|
+| No UTI | 70.1% (67.8-72.3%) | 85.2% (83.1-87.2%) | 87.1% (85.3-88.7%) |
+| UTI | 80.9% (77.7-83.8%) | 88.2% (85.7-90.5%) | 90.9% (88.7-93%) |
+
+The `AMR` package is available in 28 languages, which can all be used
+for the [`wisca()`](https://amr-for-r.org/reference/antibiogram.md)
+function too:
+
+``` r
+wisca(data,
+ antimicrobials = c("AMC", "AMC + CIP", "AMC + GEN"),
+ syndromic_group = gsub("UTI", "UCI", data$syndrome),
+ language = "Spanish")
+```
+
+| Grupo sindrómico | Amoxicilina/ácido clavulánico | Amoxicilina/ácido clavulánico + Ciprofloxacina | Amoxicilina/ácido clavulánico + Gentamicina |
+|:-----------------|:------------------------------|:-----------------------------------------------|:--------------------------------------------|
+| No UCI | 70% (67.8-72.4%) | 85.3% (83.3-87.2%) | 87% (85.3-88.8%) |
+| UCI | 80.9% (77.7-83.9%) | 88.2% (85.5-90.6%) | 90.9% (88.7-93%) |
+
+## Sensible defaults, which can be customised
+
+- `simulations = 1000`: number of Monte Carlo draws
+- `conf_interval = 0.95`: width of the credible interval around the
+ coverage estimate
+- `combine_SI = TRUE`: count “I” and “SDD” as susceptible
+
+## Limitations
+
+- It assumes your data are representative
+- No adjustment for patient-level covariates, although these could be
+ passed onto the `syndromic_group` argument
+- WISCA does not model resistance over time; you might want to use
+ `tidymodels` for that, for which we [wrote a basic
+ introduction](https://amr-for-r.org/articles/AMR_with_tidymodels.html)
+
+## Summary
+
+WISCA enables:
+
+- Empirical regimen comparison,
+- Syndrome-specific coverage estimation,
+- Fully probabilistic interpretation.
+
+It is available in the `AMR` package via either:
+
+``` r
+wisca(...)
+
+antibiogram(..., wisca = TRUE)
+```
+
+## Reference
+
+Bielicki, JA, et al. (2016). *Selecting appropriate empirical antibiotic
+regimens for paediatric bloodstream infections: application of a
+Bayesian decision model to local and pooled antimicrobial resistance
+surveillance data.* **J Antimicrob Chemother**. 71(3):794-802.
+
diff --git a/articles/datasets.html b/articles/datasets.html
index fc378feaf..7bc589768 100644
--- a/articles/datasets.html
+++ b/articles/datasets.html
@@ -30,7 +30,7 @@
AMR (for R)
- 3.0.1.9002
+ 3.0.1.9003
@@ -80,7 +80,7 @@
@@ -417,14 +417,14 @@ all SNOMED codes as comma separated values.
antimicrobials: Antibiotic and Antifungal Drugs
-
A data set with 496 rows and 14 columns, containing the following
+
A data set with 498 rows and 14 columns, containing the following
column names: ab, cid, name, group, atc,
atc_group1, atc_group2, abbreviations,
synonyms, oral_ddd, oral_units,
iv_ddd, iv_units, and loinc.
This data set is in R available as antimicrobials, after
you load the AMR package.
-
It was last updated on 1 September 2025 14:56:55 UTC. Find more info
+
It was last updated on 24 November 2025 10:24:02 UTC. Find more info
about the contents, (scientific) source, and structure of this data set
here.