mirror of https://github.com/msberends/AMR.git synced 2026-06-29 16:56:21 +02:00

improve top_n_microorganisms(): add property_for_each, fix property=NULL, enforce rank order (#297)

Matthijs Berends
2026-06-26 21:40:11 +02:00
committed by GitHub
parent 02bd9a71c1
commit f7d353361c
15 changed files with 143 additions and 95 deletions

View File

@@ -85,6 +85,27 @@ _pkgdown.yml # pkgdown website configuration
- `translate.R` — 28-language translation system - `translate.R` — 28-language translation system
- `ggplot_sir.R` / `ggplot_pca.R` / `plotting.R` — visualisation functions - `ggplot_sir.R` / `ggplot_pca.R` / `plotting.R` — visualisation functions
## Code Style
Follow the [tidyverse style guide](https://style.tidyverse.org/) precisely. Key rules:
- 2-space indentation; no tabs
- `<-` for assignment, not `=`
- Spaces around all binary operators and after commas; no spaces inside parentheses
- When a function call must break across lines, place the first argument on a new line indented by 2 spaces, and put the closing `)` on its own line — **never align arguments to the opening parenthesis** (no hanging/forced mid-line indentation)
```r
# good
stop_(
  "some long message part one ",
  "part two"
)
# bad — forces indentation to match the opening parenthesis
stop_("some long message part one ",
      "part two")
```
## Custom S3 Classes ## Custom S3 Classes
The package defines five S3 classes with full print/format/plot/vctrs support: The package defines five S3 classes with full print/format/plot/vctrs support:

View File

@@ -1,5 +1,5 @@
Package: AMR Package: AMR
Version: 3.0.1.9076 Version: 3.0.1.9077
Date: 2026-06-26 Date: 2026-06-26
Title: Antimicrobial Resistance Data Analysis Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR) Description: Functions to simplify and standardise antimicrobial resistance (AMR)

View File

@@ -1,4 +1,4 @@
# AMR 3.0.1.9076 # AMR 3.0.1.9077
Planned as v3.1.0, end of June 2026. Planned as v3.1.0, end of June 2026.
@@ -37,6 +37,7 @@ Planned as v3.1.0, end of June 2026.
* Fixed some EUCAST Expert Rules, mostly on *S. pneumoniae* * Fixed some EUCAST Expert Rules, mostly on *S. pneumoniae*
### Updated ### Updated
* `top_n_microorganisms()`: new `property_for_each` argument for sub-grouping within top *n* groups; rank ordering enforced (only lower taxonomic ranks allowed); fixed `property = NULL` not being accepted; inner filter now tracks original row indices to prevent cross-group contamination
* Taxonomic update for all microorganisms, now updated to June 2026 * Taxonomic update for all microorganisms, now updated to June 2026
* `mo_kingdom()` now returns the formal taxonomic kingdom; a one-time note per session explains the change when querying bacterial or archaeal records. * `mo_kingdom()` now returns the formal taxonomic kingdom; a one-time note per session explains the change when querying bacterial or archaeal records.
* `mo_taxonomy()` and `mo_info()` gained `domain` for the list output * `mo_taxonomy()` and `mo_info()` gained `domain` for the list output
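
The rank-ordering rule mentioned in the `top_n_microorganisms()` entry can be illustrated with a small base R sketch. This is not the package's code: `taxonomic_ranks` and `is_lower_rank()` are hypothetical names used only for illustration, and the sketch assumes that properties outside the rank list (such as `"fullname"`) are simply not checked.

```r
# Hypothetical helper, not part of the AMR package: returns TRUE when `each`
# sits at a strictly lower taxonomic rank than `top`.
taxonomic_ranks <- c(
  "domain", "kingdom", "phylum", "class", "order",
  "family", "genus", "species", "subspecies"
)

is_lower_rank <- function(top, each, ranks = taxonomic_ranks) {
  top_pos <- match(top, ranks)
  each_pos <- match(each, ranks)
  # properties that are not taxonomic ranks cannot be ordered, so they pass
  if (is.na(top_pos) || is.na(each_pos)) {
    return(TRUE)
  }
  each_pos > top_pos
}

is_lower_rank("family", "genus") # TRUE: genus is below family
is_lower_rank("genus", "genus") # FALSE: same rank
is_lower_rank("species", "family") # FALSE: family is above species
```

Under these assumptions, a call combining `property = "genus"` with `property_for_each = "family"` would be the kind of input that is rejected.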

Binary file not shown.

View File

@@ -126,7 +126,8 @@ step_mic_log2 <- function(
trained = FALSE, trained = FALSE,
columns = NULL, columns = NULL,
skip = FALSE, skip = FALSE,
id = recipes::rand_id("mic_log2")) { id = recipes::rand_id("mic_log2")
) {
recipes::add_step( recipes::add_step(
recipe, recipe,
step_mic_log2_new( step_mic_log2_new(
@@ -201,7 +202,8 @@ step_sir_numeric <- function(
trained = FALSE, trained = FALSE,
columns = NULL, columns = NULL,
skip = FALSE, skip = FALSE,
id = recipes::rand_id("sir_numeric")) { id = recipes::rand_id("sir_numeric")
) {
recipes::add_step( recipes::add_step(
recipe, recipe,
step_sir_numeric_new( step_sir_numeric_new(

View File

@@ -29,73 +29,88 @@
#' Filter Top *n* Microorganisms #' Filter Top *n* Microorganisms
#' #'
#' This function filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera. #' Filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
#' @param x A data frame containing microbial data. #' @param x A data frame containing microbial data.
#' @param n An integer specifying the maximum number of unique values of the `property` to include in the output. #' @param n A positive whole number specifying the maximum number of unique values of `property` to include in the output.
#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subpecies"`, the genus will be added to make sure each (sub)species still belongs to the right genus. #' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subspecies"`, the genus is prepended to ensure each name is unambiguous.
#' @param n_for_each An optional integer specifying the maximum number of rows to retain for each value of the selected property. If `NULL`, all rows within the top *n* groups will be included. #' @param n_for_each An optional positive whole number specifying the maximum number of distinct microorganism groups at the level of `property_for_each` to retain within each of the top *n* groups. Only used when `property_for_each` is also set.
#' @param property_for_each The microorganism property to use for sub-grouping within each top *n* group. Must be one of the column names of the [microorganisms] data set and at a strictly lower taxonomic rank than `property` (allowed order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). Defaults to `"species"`. Only relevant when `n_for_each` is set.
#' @param col_mo A character string indicating the column in `x` that contains microorganism names or codes. Defaults to the first column of class [`mo`]. Values will be coerced using [as.mo()]. #' @param col_mo A character string indicating the column in `x` that contains microorganism names or codes. Defaults to the first column of class [`mo`]. Values will be coerced using [as.mo()].
#' @param ... Additional arguments passed on to [mo_property()] when `property` is not `NULL`. #' @param ... Additional arguments passed on to [mo_property()] when `property` is not `NULL`.
#' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species. #' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data.
#' @export #' @export
#' @seealso [mo_property()], [as.mo()], [antibiogram()] #' @seealso [mo_property()], [as.mo()], [antibiogram()]
#' @examples #' @examples
#' # filter to the top 3 species: #' # filter to the top 3 species:
#' top_n_microorganisms(example_isolates, #' top_n_microorganisms(example_isolates, n = 3)
#' n = 3
#' )
#' #'
#' # filter to any species in the top 5 genera: #' # filter to any species in the top 5 genera:
#' top_n_microorganisms(example_isolates, #' top_n_microorganisms(example_isolates, n = 5, property = "genus")
#' n = 5, property = "genus"
#' )
#' #'
#' # filter to the top 3 species in each of the top 5 genera: #' # filter to the top 3 species in each of the top 5 genera:
#' top_n_microorganisms(example_isolates, #' top_n_microorganisms(example_isolates,
#' n = 5, property = "genus", n_for_each = 3 #' n = 5, property = "genus", n_for_each = 3
#' ) #' )
top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, col_mo = NULL, ...) { #'
#' # filter to the top 2 genera in each of the top 3 families:
#' top_n_microorganisms(example_isolates,
#' n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
#' )
top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, property_for_each = "species", col_mo = NULL, ...) {
meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0 meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
meet_criteria(n, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE) meet_criteria(n, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE)
meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms)) meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
meet_criteria(n_for_each, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE, allow_NULL = TRUE) meet_criteria(n_for_each, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE, allow_NULL = TRUE)
meet_criteria(property_for_each, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
meet_criteria(col_mo, allow_class = "character", has_length = 1, allow_NULL = TRUE, is_in = colnames(x)) meet_criteria(col_mo, allow_class = "character", has_length = 1, allow_NULL = TRUE, is_in = colnames(x))
if (is.null(col_mo)) { if (is.null(col_mo)) {
col_mo <- search_type_in_df(x = x, type = "mo", info = TRUE) col_mo <- search_type_in_df(x = x, type = "mo", info = TRUE)
stop_if(is.null(col_mo), "{.arg col_mo} must be set") stop_if(is.null(col_mo), "{.arg col_mo} must be set")
} }
x.bak <- x .taxonomic_ranks <- c("domain", "kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies")
if (!is.null(n_for_each) && !is.null(property) && !is.null(property_for_each)) {
prop_rank <- match(property, .taxonomic_ranks)
each_rank <- match(property_for_each, .taxonomic_ranks)
if (!is.na(prop_rank) && !is.na(each_rank) && each_rank <= prop_rank) {
stop_(
"`property_for_each` (\"", property_for_each, "\") must be at a lower ",
"taxonomic rank than `property` (\"", property, "\")"
)
}
}
x.bak <- x
x[, col_mo] <- as.mo(x[, col_mo, drop = TRUE], keep_synonyms = TRUE) x[, col_mo] <- as.mo(x[, col_mo, drop = TRUE], keep_synonyms = TRUE)
if (is.null(property)) { get_prop_val <- function(prop) {
x$prop_val <- x[[col_mo]] if (is.null(prop)) {
} else if (property == "species") { x[[col_mo]]
x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...)) } else if (prop == "species") {
} else if (property == "subspecies") { paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...))
x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...)) } else if (prop == "subspecies") {
paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...))
} else { } else {
x$prop_val <- mo_property(x[[col_mo]], property = property, ...) mo_property(x[[col_mo]], property = prop, ...)
}
} }
counts <- sort(table(x$prop_val), decreasing = TRUE)
n <- as.integer(n) x$prop_val <- get_prop_val(property)
if (length(counts) < n) { counts <- sort(table(x$prop_val), decreasing = TRUE)
n <- length(counts) n <- min(as.integer(n), length(counts))
} filtered_rows <- which(x$prop_val %in% names(counts)[seq_len(n)])
count_values <- names(counts)[seq_len(n)]
filtered_rows <- which(x$prop_val %in% count_values)
if (!is.null(n_for_each)) { if (!is.null(n_for_each)) {
n_for_each <- as.integer(n_for_each) n_for_each <- as.integer(n_for_each)
x$prop_val_each <- get_prop_val(property_for_each)
filtered_x <- x[filtered_rows, , drop = FALSE] filtered_x <- x[filtered_rows, , drop = FALSE]
filtered_x$.orig_row <- filtered_rows
filtered_rows <- do.call( filtered_rows <- do.call(
c, c,
lapply(split(filtered_x, filtered_x$prop_val), function(group) { lapply(split(filtered_x, filtered_x$prop_val), function(group) {
top_values <- names(sort(table(group[[col_mo]]), decreasing = TRUE)[seq_len(n_for_each)]) top_each <- names(sort(table(group$prop_val_each), decreasing = TRUE)[seq_len(n_for_each)])
top_values <- top_values[!is.na(top_values)] group$.orig_row[group$prop_val_each %in% top_each[!is.na(top_each)]]
which(x[[col_mo]] %in% top_values)
}) })
) )
} }
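
The two-step filter in this hunk (top *n* groups first, then the top `n_for_each` sub-groups within each group, returning original row positions) can be sketched in base R on toy data. This is a simplified illustration, not the package implementation: `isolates`, `top_rows()` and `orig_row` are made-up names, and the sketch works on plain `genus` and `species` columns instead of calling `as.mo()` or `mo_property()`.

```r
# Toy data; all names and values are made up for illustration.
isolates <- data.frame(
  genus = c("A", "A", "A", "A", "B", "B", "B", "C"),
  species = c("A x", "A x", "A x", "A y", "B x", "B x", "B z", "C w"),
  stringsAsFactors = FALSE
)

top_rows <- function(df, n, n_for_each) {
  # step 1: keep rows that belong to the n most frequent genera
  counts <- sort(table(df$genus), decreasing = TRUE)
  n <- min(as.integer(n), length(counts))
  keep <- which(df$genus %in% names(counts)[seq_len(n)])

  # step 2: within each kept genus, keep the n_for_each most frequent species
  sub <- df[keep, , drop = FALSE]
  sub$orig_row <- keep # remember each row's position in the original data
  rows <- lapply(split(sub, sub$genus), function(group) {
    top_each <- names(sort(table(group$species), decreasing = TRUE))
    top_each <- top_each[seq_len(min(n_for_each, length(top_each)))]
    group$orig_row[group$species %in% top_each]
  })
  sort(unlist(rows, use.names = FALSE))
}

# rows of the most frequent species in each of the two most frequent genera
top_rows(isolates, n = 2, n_for_each = 1)
```

Because each group carries its own `orig_row` values, the selection inside one group cannot pick up rows from another group, which is the point of the `.orig_row` column added in this commit.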

View File

@@ -11,6 +11,7 @@ knitr::opts_chunk$set(
# fig.path = "man/figures/README-", # fig.path = "man/figures/README-",
out.width = "100%" out.width = "100%"
) )
options(width = 100)
AMR:::reset_all_thrown_messages() AMR:::reset_all_thrown_messages()
``` ```

Binary file not shown.

Binary file not shown.

View File

@@ -13,6 +13,7 @@ knitr::opts_chunk$set(
fig.path = "pkgdown/assets/", fig.path = "pkgdown/assets/",
out.width = "100%" out.width = "100%"
) )
options(width = 100)
AMR:::reset_all_thrown_messages() AMR:::reset_all_thrown_messages()
``` ```

View File

@@ -27,12 +27,9 @@
<div style="display: flex; font-size: 0.8em;"> <div style="display: flex; font-size: 0.8em;">
<p style="text-align:left; width: 50%;"> <p style="text-align:left; width: 50%;">
<small><a href="https://amr-for-r.org/">amr-for-r.org</a></small> <small><a href="https://amr-for-r.org/">amr-for-r.org</a></small>
</p> </p>
<p style="text-align:right; width: 50%;"> <p style="text-align:right; width: 50%;">
<small><a href="https://doi.org/10.18637/jss.v104.i03" target="_blank">doi.org/10.18637/jss.v104.i03</a></small> <small><a href="https://doi.org/10.18637/jss.v104.i03" target="_blank">doi.org/10.18637/jss.v104.i03</a></small>
</p> </p>
@@ -64,7 +61,7 @@ formed the basis of two PhD theses ([DOI
[DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)). [DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)).
After installing this package, R knows [**~97 000 distinct microbial After installing this package, R knows [**~97 000 distinct microbial
species**](./reference/microorganisms.html) (updated May 2026) and all species**](./reference/microorganisms.html) (updated mei 2026) and all
[**~620 antimicrobial and antiviral [**~620 antimicrobial and antiviral
drugs**](./reference/antimicrobials.html) by name and code (including drugs**](./reference/antimicrobials.html) by name and code (including
ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all
@@ -175,11 +172,13 @@ example_isolates %>%
#> Using column mo as input for `mo_fullname()` #> Using column mo as input for `mo_fullname()`
#> Using column mo as input for `mo_is_gram_negative()` #> Using column mo as input for `mo_is_gram_negative()`
#> Using column mo as input for `mo_is_intrinsic_resistant()` #> Using column mo as input for `mo_is_intrinsic_resistant()`
#> Determining intrinsic resistance based on 'EUCAST Expected Resistant #> Determining intrinsic resistance based on 'EUCAST Expected
#> Phenotypes' v1.2 (2023). This note will be shown once per session. #> Resistant Phenotypes' v1.2 (2023). This note will be shown
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK #> once per session.
#> (amikacin), and KAN (kanamycin) #> For `aminoglycosides()` using columns GEN (gentamicin), TOB
#> For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem) #> (tobramycin), AMK (amikacin), and KAN (kanamycin)
#> For `carbapenems()` using columns IPM (imipenem) and MEM
#> (meropenem)
#> # A tibble: 35 × 7 #> # A tibble: 35 × 7
#> bacteria GEN TOB AMK KAN IPM MEM #> bacteria GEN TOB AMK KAN IPM MEM
#> <chr> <sir> <sir> <sir> <sir> <sir> <sir> #> <chr> <sir> <sir> <sir> <sir> <sir> <sir>
@@ -229,8 +228,8 @@ wisca(example_isolates,
``` ```
| Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin | | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---| |:------------------------|:-------------------------------------|:-------------------------------------|
| 69.9% (64.7-75.2%) | 93.7% (92.2-95.1%) | 89.8% (86.8-92.3%) | | 70% (64.7-75.2%) | 93.6% (92.2-95.1%) | 89.8% (87-92.5%) |
WISCA supports stratification by any clinical variable, so you can WISCA supports stratification by any clinical variable, so you can
generate syndrome-specific or ward-specific coverage estimates: generate syndrome-specific or ward-specific coverage estimates:
@@ -244,10 +243,10 @@ wisca(example_isolates,
``` ```
| Syndromic Group | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin | | Syndromic Group | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---|:---| |:----------------|:------------------------|:-------------------------------------|:-------------------------------------|
| Clinical | 74.6% (69-80.1%) | 93.6% (91.9-95.1%) | 90.5% (86.9-93%) | | Clinical | 74.6% (68.6-80.6%) | 93.7% (92.1-95.1%) | 90.4% (87-93.1%) |
| ICU | 57% (48.7-65.8%) | 86.7% (83.7-89.7%) | 82.8% (77.9-87.2%) | | ICU | 57% (48.6-65.7%) | 86.8% (83.6-89.8%) | 82.9% (78.1-87.3%) |
| Outpatient | 57.5% (46.5-68.7%) | 76.7% (70.6-82.4%) | 67.5% (57.2-76.7%) | | Outpatient | 56.9% (45.9-68.2%) | 76.7% (70.6-82.3%) | 68% (57.6-77.2%) |
**For AMR surveillance**, traditional antibiograms remain the right tool **For AMR surveillance**, traditional antibiograms remain the right tool
for tracking resistance per species over time: for tracking resistance per species over time:
@@ -256,11 +255,12 @@ for tracking resistance per species over time:
antibiogram(example_isolates, antibiogram(example_isolates,
mo_transform = "gramstain", mo_transform = "gramstain",
antimicrobials = c("AMC", carbapenems(), "TZP")) antimicrobials = c("AMC", carbapenems(), "TZP"))
#> For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem) #> For `carbapenems()` using columns IPM (imipenem) and MEM
#> (meropenem)
``` ```
| Pathogen | Amoxicillin/clavulanic acid | Imipenem | Meropenem | Piperacillin/tazobactam | | Pathogen | Amoxicillin/clavulanic acid | Imipenem | Meropenem | Piperacillin/tazobactam |
|:---|:---|:---|:---|:---| |:--------------|:----------------------------|:--------------------|:---------------------|:------------------------|
| Gram-negative | 76% (73-79%,N=726) | 99% (98-100%,N=631) | 100% (99-100%,N=626) | 88% (85-91%,N=641) | | Gram-negative | 76% (73-79%,N=726) | 99% (98-100%,N=631) | 100% (99-100%,N=626) | 88% (85-91%,N=641) |
| Gram-positive | 76% (74-79%,N=1138) | 81% (75-85%,N=257) | 77% (70-82%,N=203) | 86% (82-89%,N=345) | | Gram-positive | 76% (74-79%,N=1138) | 81% (75-85%,N=257) | 77% (70-82%,N=203) | 86% (82-89%,N=345) |
@@ -274,7 +274,7 @@ antibiogram(example_isolates,
``` ```
| Pathogen | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin | | Pathogen | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---|:---| |:--------------|:------------------------|:-------------------------------------|:-------------------------------------|
| Gram-negative | 88% (85-91%,N=641) | 99% (97-99%,N=691) | 98% (97-99%,N=693) | | Gram-negative | 88% (85-91%,N=641) | 99% (97-99%,N=691) | 98% (97-99%,N=693) |
| Gram-positive | 86% (82-89%,N=345) | 98% (96-98%,N=1044) | 95% (93-97%,N=550) | | Gram-positive | 86% (82-89%,N=345) | 98% (96-98%,N=1044) | 95% (93-97%,N=550) |
@@ -349,9 +349,10 @@ example_isolates %>%
summarise(across(c(GEN, TOB), summarise(across(c(GEN, TOB),
list(total_R = resistance, list(total_R = resistance,
conf_int = function(x) sir_confidence_interval(x, collapse = "-")))) conf_int = function(x) sir_confidence_interval(x, collapse = "-"))))
#> `resistance()` assumes the EUCAST guideline and thus considers the 'I' #> `resistance()` assumes the EUCAST guideline and thus
#> category susceptible. Set the `guideline` argument or the `AMR_guideline` #> considers the 'I' category susceptible. Set the `guideline`
#> option to either "CLSI" or "EUCAST", see `?AMR-options`. #> argument or the `AMR_guideline` option to either "CLSI" or
#> "EUCAST", see `?AMR-options`.
#> This message will be shown once per session. #> This message will be shown once per session.
#> # A tibble: 3 × 5 #> # A tibble: 3 × 5
#> ward GEN_total_R GEN_conf_int TOB_total_R TOB_conf_int #> ward GEN_total_R GEN_conf_int TOB_total_R TOB_conf_int
@@ -375,15 +376,16 @@ out <- example_isolates %>%
# calculate AMR using resistance(), over all aminoglycosides and polymyxins: # calculate AMR using resistance(), over all aminoglycosides and polymyxins:
summarise(across(c(aminoglycosides(), polymyxins()), summarise(across(c(aminoglycosides(), polymyxins()),
resistance)) resistance))
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK #> For `aminoglycosides()` using columns GEN (gentamicin), TOB
#> (amikacin), and KAN (kanamycin) #> (tobramycin), AMK (amikacin), and KAN (kanamycin)
#> For `polymyxins()` using column COL (colistin) #> For `polymyxins()` using column COL (colistin)
#> Warning: There was 1 warning in `summarise()`. #> Warning: There was 1 warning in `summarise()`.
#> In argument: `across(c(aminoglycosides(), polymyxins()), resistance)`. #> In argument: `across(c(aminoglycosides(), polymyxins()),
#> resistance)`.
#> In group 3: `ward = "Outpatient"`. #> In group 3: `ward = "Outpatient"`.
#> Caused by warning: #> Caused by warning:
#> ! Introducing NA: only 23 results available for KAN in group: ward = "Outpatient" #> ! Introducing NA: only 23 results available for KAN in group:
#> (whilst `minimum = 30`). #> ward = "Outpatient" (whilst `minimum = 30`).
out out
#> # A tibble: 3 × 6 #> # A tibble: 3 × 6
#> ward GEN TOB AMK KAN COL #> ward GEN TOB AMK KAN COL

View File

@@ -12,7 +12,7 @@ The \code{AMR} package is a peer-reviewed, \href{https://amr-for-r.org/#copyrigh
This work was published in the Journal of Statistical Software (Volume 104(3); \doi{10.18637/jss.v104.i03}) and formed the basis of two PhD theses (\doi{10.33612/diss.177417131} and \doi{10.33612/diss.192486375}). This work was published in the Journal of Statistical Software (Volume 104(3); \doi{10.18637/jss.v104.i03}) and formed the basis of two PhD theses (\doi{10.33612/diss.177417131} and \doi{10.33612/diss.192486375}).
After installing this package, R knows \href{https://amr-for-r.org/reference/microorganisms.html}{\strong{~97 000 distinct microbial species}} (updated May 2026) and all \href{https://amr-for-r.org/reference/antimicrobials.html}{\strong{~620 antimicrobial and antiviral drugs}} by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral clinical breakpoint guidelines from CLSI 2011-2026 and EUCAST 2011-2026 are included, even with epidemiological cut-off (ECOFF) values. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). \strong{It was designed to work in any setting, including those with very limited resources}. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the \href{https://www.rug.nl}{University of Groningen} and the \href{https://www.umcg.nl}{University Medical Center Groningen}. After installing this package, R knows \href{https://amr-for-r.org/reference/microorganisms.html}{\strong{~97 000 distinct microbial species}} (updated mei 2026) and all \href{https://amr-for-r.org/reference/antimicrobials.html}{\strong{~620 antimicrobial and antiviral drugs}} by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral clinical breakpoint guidelines from CLSI 2011-2026 and EUCAST 2011-2026 are included, even with epidemiological cut-off (ECOFF) values. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). \strong{It was designed to work in any setting, including those with very limited resources}. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the \href{https://www.rug.nl}{University of Groningen} and the \href{https://www.umcg.nl}{University Medical Center Groningen}.
The \code{AMR} package is available in English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages. The \code{AMR} package is available in English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
} }

View File

@@ -46,7 +46,7 @@ A list with class \code{"htest"} containing the following
\code{(observed - expected) / sqrt(expected)}.} \code{(observed - expected) / sqrt(expected)}.}
\item{stdres}{standardized residuals, \item{stdres}{standardized residuals,
\code{(observed - expected) / sqrt(V)}, where \code{V} is the \code{(observed - expected) / sqrt(V)}, where \code{V} is the
residual cell variance (Agresti, 2007, section 2.4.5 residual cell variance {(\if{html}{\out{<a href="#reference+chisq.test.Rd+R+3AAgresti+3A2007" class="citation">}}Agresti 2007\if{html}{\out{</a>}}, section 2.4.5)}
for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).} for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
} }
\description{ \description{

View File

@@ -59,8 +59,9 @@ ggplot_pca(
} }
\item{pc.biplot}{ \item{pc.biplot}{
If true, use what Gabriel (1971) refers to as a "principal component If true, use what {\if{html}{\cite{}\out{<a href="#reference+biplot.princomp.Rd+R+3AGabriel+3A1971" class="citation">}}Gabriel (1971)\if{html}{\out{</a>}}} refers to as a
biplot", with \code{lambda = 1} and observations scaled up by sqrt(n) and \dQuote{principal component biplot},
with \code{lambda = 1} and observations scaled up by sqrt(n) and
variables scaled down by sqrt(n). Then inner products between variables scaled down by sqrt(n). Then inner products between
variables approximate covariances and distances between observations variables approximate covariances and distances between observations
approximate Mahalanobis distance. approximate Mahalanobis distance.

View File

@@ -9,6 +9,7 @@ top_n_microorganisms(
n, n,
property = "species", property = "species",
n_for_each = NULL, n_for_each = NULL,
property_for_each = "species",
col_mo = NULL, col_mo = NULL,
... ...
) )
@@ -16,37 +17,40 @@ top_n_microorganisms(
\arguments{ \arguments{
\item{x}{A data frame containing microbial data.} \item{x}{A data frame containing microbial data.}
\item{n}{An integer specifying the maximum number of unique values of the \code{property} to include in the output.} \item{n}{A positive whole number specifying the maximum number of unique values of \code{property} to include in the output.}
\item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subpecies"}, the genus will be added to make sure each (sub)species still belongs to the right genus.} \item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subspecies"}, the genus is prepended to ensure each name is unambiguous.}
\item{n_for_each}{An optional integer specifying the maximum number of rows to retain for each value of the selected property. If \code{NULL}, all rows within the top \emph{n} groups will be included.} \item{n_for_each}{An optional positive whole number specifying the maximum number of distinct microorganism groups at the level of \code{property_for_each} to retain within each of the top \emph{n} groups. Only used when \code{property_for_each} is also set.}
\item{property_for_each}{The microorganism property to use for sub-grouping within each top \emph{n} group. Must be one of the column names of the \link{microorganisms} data set and at a strictly lower taxonomic rank than \code{property} (allowed order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). Defaults to \code{"species"}. Only relevant when \code{n_for_each} is set.}
\item{col_mo}{A character string indicating the column in \code{x} that contains microorganism names or codes. Defaults to the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.} \item{col_mo}{A character string indicating the column in \code{x} that contains microorganism names or codes. Defaults to the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
\item{...}{Additional arguments passed on to \code{\link[=mo_property]{mo_property()}} when \code{property} is not \code{NULL}.} \item{...}{Additional arguments passed on to \code{\link[=mo_property]{mo_property()}} when \code{property} is not \code{NULL}.}
} }
\description{ \description{
This function filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera. Filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
} }
\details{ \details{
This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species. This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data.
} }
\examples{ \examples{
# filter to the top 3 species: # filter to the top 3 species:
top_n_microorganisms(example_isolates, top_n_microorganisms(example_isolates, n = 3)
n = 3
)
# filter to any species in the top 5 genera: # filter to any species in the top 5 genera:
top_n_microorganisms(example_isolates, top_n_microorganisms(example_isolates, n = 5, property = "genus")
n = 5, property = "genus"
)
# filter to the top 3 species in each of the top 5 genera: # filter to the top 3 species in each of the top 5 genera:
top_n_microorganisms(example_isolates, top_n_microorganisms(example_isolates,
n = 5, property = "genus", n_for_each = 3 n = 5, property = "genus", n_for_each = 3
) )
# filter to the top 2 genera in each of the top 3 families:
top_n_microorganisms(example_isolates,
n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
)
} }
\seealso{ \seealso{
\code{\link[=mo_property]{mo_property()}}, \code{\link[=as.mo]{as.mo()}}, \code{\link[=antibiogram]{antibiogram()}} \code{\link[=mo_property]{mo_property()}}, \code{\link[=as.mo]{as.mo()}}, \code{\link[=antibiogram]{antibiogram()}}