mirror of https://github.com/msberends/AMR.git synced 2026-06-29 05:36:19 +02:00

improve top_n_microorganisms(): add property_for_each, fix property=NULL, enforce rank order (#297)

Matthijs Berends
2026-06-26 21:40:11 +02:00
committed by GitHub
parent 02bd9a71c1
commit f7d353361c
15 changed files with 143 additions and 95 deletions

View File

@@ -85,6 +85,27 @@ _pkgdown.yml # pkgdown website configuration
- `translate.R` — 28-language translation system
- `ggplot_sir.R` / `ggplot_pca.R` / `plotting.R` — visualisation functions
## Code Style
Follow the [tidyverse style guide](https://style.tidyverse.org/) precisely. Key rules:
- 2-space indentation; no tabs
- `<-` for assignment, not `=`
- Spaces around all binary operators and after commas; no spaces inside parentheses
- When a function call must break across lines, place the first argument on a new line indented by 2 spaces, and put the closing `)` on its own line — **never align arguments to the opening parenthesis** (no hanging/forced mid-line indentation)
```r
# good
stop_(
  "some long message part one ",
  "part two"
)
# bad — forces indentation to match the opening parenthesis
stop_("some long message part one ",
      "part two")
```
## Custom S3 Classes
The package defines five S3 classes with full print/format/plot/vctrs support:

View File

@@ -1,5 +1,5 @@
Package: AMR
Version: 3.0.1.9076
Version: 3.0.1.9077
Date: 2026-06-26
Title: Antimicrobial Resistance Data Analysis
Description: Functions to simplify and standardise antimicrobial resistance (AMR)

View File

@@ -1,4 +1,4 @@
# AMR 3.0.1.9076
# AMR 3.0.1.9077
Planned as v3.1.0, end of June 2026.
@@ -37,6 +37,7 @@ Planned as v3.1.0, end of June 2026.
* Fixed some EUCAST Expert Rules, mostly on *S. pneumoniae*
### Updated
* `top_n_microorganisms()`: new `property_for_each` argument for sub-grouping within top *n* groups; rank ordering enforced (only lower taxonomic ranks allowed); fixed `property = NULL` not being accepted; inner filter now tracks original row indices to prevent cross-group contamination
* Taxonomic update for all microorganisms, now updated to June 2026
* `mo_kingdom()` now returns the formal taxonomic kingdom; a one-time note per session explains the change when querying bacterial or archaeal records.
* `mo_taxonomy()` and `mo_info()` gained `domain` for the list output
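
The rank-order rule in the `top_n_microorganisms()` entry above can be illustrated with a minimal base R sketch. The helper name `is_lower_rank()` is hypothetical and not part of the AMR package; it only mirrors the comparison of rank positions, and it assumes that properties outside the taxonomic ladder (such as `"morphology"`) are left unchecked.

```r
# Hypothetical helper, not part of the AMR package: mirrors the rank-order
# rule using base R only.
taxonomic_ranks <- c(
  "domain", "kingdom", "phylum", "class", "order",
  "family", "genus", "species", "subspecies"
)

is_lower_rank <- function(property, property_for_each) {
  prop_rank <- match(property, taxonomic_ranks)
  each_rank <- match(property_for_each, taxonomic_ranks)
  # Assumption: non-taxonomic properties are not rank-checked
  if (is.na(prop_rank) || is.na(each_rank)) {
    return(TRUE)
  }
  # A later position in `taxonomic_ranks` means a lower taxonomic rank
  each_rank > prop_rank
}

is_lower_rank("family", "genus") # allowed: genus sits below family
is_lower_rank("genus", "genus") # not allowed: same rank
is_lower_rank("species", "genus") # not allowed: genus sits above species
```

In the package itself, a disallowed combination raises an error rather than returning `FALSE`.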

Binary file not shown.

View File

@@ -120,13 +120,14 @@ all_disk_predictors <- function() {
#' @rdname amr-tidymodels
#' @export
step_mic_log2 <- function(
recipe,
...,
role = NA,
trained = FALSE,
columns = NULL,
skip = FALSE,
id = recipes::rand_id("mic_log2")) {
recipe,
...,
role = NA,
trained = FALSE,
columns = NULL,
skip = FALSE,
id = recipes::rand_id("mic_log2")
) {
recipes::add_step(
recipe,
step_mic_log2_new(
@@ -195,13 +196,14 @@ tidy.step_mic_log2 <- function(x, ...) {
#' @rdname amr-tidymodels
#' @export
step_sir_numeric <- function(
recipe,
...,
role = NA,
trained = FALSE,
columns = NULL,
skip = FALSE,
id = recipes::rand_id("sir_numeric")) {
recipe,
...,
role = NA,
trained = FALSE,
columns = NULL,
skip = FALSE,
id = recipes::rand_id("sir_numeric")
) {
recipes::add_step(
recipe,
step_sir_numeric_new(

View File

@@ -29,73 +29,88 @@
#' Filter Top *n* Microorganisms
#'
#' This function filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
#' Filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
#' @param x A data frame containing microbial data.
#' @param n An integer specifying the maximum number of unique values of the `property` to include in the output.
#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subpecies"`, the genus will be added to make sure each (sub)species still belongs to the right genus.
#' @param n_for_each An optional integer specifying the maximum number of rows to retain for each value of the selected property. If `NULL`, all rows within the top *n* groups will be included.
#' @param n A positive whole number specifying the maximum number of unique values of `property` to include in the output.
#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subspecies"`, the genus is prepended to ensure each name is unambiguous.
#' @param n_for_each An optional positive whole number specifying the maximum number of distinct microorganism groups at the level of `property_for_each` to retain within each of the top *n* groups. Only used when `property_for_each` is also set.
#' @param property_for_each The microorganism property to use for sub-grouping within each top *n* group. Must be one of the column names of the [microorganisms] data set and at a strictly lower taxonomic rank than `property` (allowed order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). Defaults to `"species"`. Only relevant when `n_for_each` is set.
#' @param col_mo A character string indicating the column in `x` that contains microorganism names or codes. Defaults to the first column of class [`mo`]. Values will be coerced using [as.mo()].
#' @param ... Additional arguments passed on to [mo_property()] when `property` is not `NULL`.
#' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species.
#' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data.
#' @export
#' @seealso [mo_property()], [as.mo()], [antibiogram()]
#' @examples
#' # filter to the top 3 species:
#' top_n_microorganisms(example_isolates,
#' n = 3
#' )
#' top_n_microorganisms(example_isolates, n = 3)
#'
#' # filter to any species in the top 5 genera:
#' top_n_microorganisms(example_isolates,
#' n = 5, property = "genus"
#' )
#' top_n_microorganisms(example_isolates, n = 5, property = "genus")
#'
#' # filter to the top 3 species in each of the top 5 genera:
#' top_n_microorganisms(example_isolates,
#' n = 5, property = "genus", n_for_each = 3
#' )
top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, col_mo = NULL, ...) {
#'
#' # filter to the top 2 genera in each of the top 3 families:
#' top_n_microorganisms(example_isolates,
#' n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
#' )
top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, property_for_each = "species", col_mo = NULL, ...) {
meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
meet_criteria(n, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE)
meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms))
meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
meet_criteria(n_for_each, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE, allow_NULL = TRUE)
meet_criteria(property_for_each, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
meet_criteria(col_mo, allow_class = "character", has_length = 1, allow_NULL = TRUE, is_in = colnames(x))
if (is.null(col_mo)) {
col_mo <- search_type_in_df(x = x, type = "mo", info = TRUE)
stop_if(is.null(col_mo), "{.arg col_mo} must be set")
}
x.bak <- x
.taxonomic_ranks <- c("domain", "kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies")
if (!is.null(n_for_each) && !is.null(property) && !is.null(property_for_each)) {
prop_rank <- match(property, .taxonomic_ranks)
each_rank <- match(property_for_each, .taxonomic_ranks)
if (!is.na(prop_rank) && !is.na(each_rank) && each_rank <= prop_rank) {
stop_(
"`property_for_each` (\"", property_for_each, "\") must be at a lower ",
"taxonomic rank than `property` (\"", property, "\")"
)
}
}
x.bak <- x
x[, col_mo] <- as.mo(x[, col_mo, drop = TRUE], keep_synonyms = TRUE)
if (is.null(property)) {
x$prop_val <- x[[col_mo]]
} else if (property == "species") {
x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...))
} else if (property == "subspecies") {
x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...))
} else {
x$prop_val <- mo_property(x[[col_mo]], property = property, ...)
get_prop_val <- function(prop) {
if (is.null(prop)) {
x[[col_mo]]
} else if (prop == "species") {
paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...))
} else if (prop == "subspecies") {
paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...))
} else {
mo_property(x[[col_mo]], property = prop, ...)
}
}
counts <- sort(table(x$prop_val), decreasing = TRUE)
n <- as.integer(n)
if (length(counts) < n) {
n <- length(counts)
}
count_values <- names(counts)[seq_len(n)]
filtered_rows <- which(x$prop_val %in% count_values)
x$prop_val <- get_prop_val(property)
counts <- sort(table(x$prop_val), decreasing = TRUE)
n <- min(as.integer(n), length(counts))
filtered_rows <- which(x$prop_val %in% names(counts)[seq_len(n)])
if (!is.null(n_for_each)) {
n_for_each <- as.integer(n_for_each)
x$prop_val_each <- get_prop_val(property_for_each)
filtered_x <- x[filtered_rows, , drop = FALSE]
filtered_x$.orig_row <- filtered_rows
filtered_rows <- do.call(
c,
lapply(split(filtered_x, filtered_x$prop_val), function(group) {
top_values <- names(sort(table(group[[col_mo]]), decreasing = TRUE)[seq_len(n_for_each)])
top_values <- top_values[!is.na(top_values)]
which(x[[col_mo]] %in% top_values)
top_each <- names(sort(table(group$prop_val_each), decreasing = TRUE)[seq_len(n_for_each)])
group$.orig_row[group$prop_val_each %in% top_each[!is.na(top_each)]]
})
)
}
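
The row-index tracking in the hunk above can be tried on toy data without the AMR package. The sketch below is illustrative only: the data frame, its column names and its values are made up, and plain columns stand in for the values that the package derives through `get_prop_val()`.

```r
# Toy data; column names and values are made up for illustration.
x <- data.frame(
  genus = c("A", "A", "A", "A", "B", "B", "B", "C"),
  species = c("A x", "A x", "A y", "A z", "B x", "B x", "B y", "C x"),
  stringsAsFactors = FALSE
)

# Step 1: keep the rows that belong to the top 2 genera.
counts <- sort(table(x$genus), decreasing = TRUE)
filtered_rows <- which(x$genus %in% names(counts)[seq_len(2)])

# Step 2: within each kept genus, keep the top 1 species. Each group carries
# its original row numbers, so a match can only select rows of that group.
filtered_x <- x[filtered_rows, , drop = FALSE]
filtered_x$.orig_row <- filtered_rows
keep <- unlist(
  lapply(split(filtered_x, filtered_x$genus), function(group) {
    top_each <- names(sort(table(group$species), decreasing = TRUE)[seq_len(1)])
    group$.orig_row[group$species %in% top_each[!is.na(top_each)]]
  }),
  use.names = FALSE
)

x[sort(keep), ]
```

Here `keep` holds rows 1, 2, 5 and 6: the two "A x" rows and the two "B x" rows. Because each group returns only its own `.orig_row` values, a value that also occurs in another group cannot pull in rows from outside the group being processed.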

View File

@@ -11,6 +11,7 @@ knitr::opts_chunk$set(
# fig.path = "man/figures/README-",
out.width = "100%"
)
options(width = 100)
AMR:::reset_all_thrown_messages()
```

Binary file not shown.

Binary file not shown.

View File

@@ -13,6 +13,7 @@ knitr::opts_chunk$set(
fig.path = "pkgdown/assets/",
out.width = "100%"
)
options(width = 100)
AMR:::reset_all_thrown_messages()
```

View File

@@ -27,12 +27,9 @@
<div style="display: flex; font-size: 0.8em;">
<p style="text-align:left; width: 50%;">
<small><a href="https://amr-for-r.org/">amr-for-r.org</a></small>
</p>
<p style="text-align:right; width: 50%;">
<small><a href="https://doi.org/10.18637/jss.v104.i03" target="_blank">doi.org/10.18637/jss.v104.i03</a></small>
</p>
@@ -64,7 +61,7 @@ formed the basis of two PhD theses ([DOI
[DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)).
After installing this package, R knows [**~97 000 distinct microbial
species**](./reference/microorganisms.html) (updated May 2026) and all
species**](./reference/microorganisms.html) (updated mei 2026) and all
[**~620 antimicrobial and antiviral
drugs**](./reference/antimicrobials.html) by name and code (including
ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all
@@ -175,11 +172,13 @@ example_isolates %>%
#> Using column mo as input for `mo_fullname()`
#> Using column mo as input for `mo_is_gram_negative()`
#> Using column mo as input for `mo_is_intrinsic_resistant()`
#> Determining intrinsic resistance based on 'EUCAST Expected Resistant
#> Phenotypes' v1.2 (2023). This note will be shown once per session.
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK
#> (amikacin), and KAN (kanamycin)
#> For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem)
#> Determining intrinsic resistance based on 'EUCAST Expected
#> Resistant Phenotypes' v1.2 (2023). This note will be shown
#> once per session.
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB
#> (tobramycin), AMK (amikacin), and KAN (kanamycin)
#> For `carbapenems()` using columns IPM (imipenem) and MEM
#> (meropenem)
#> # A tibble: 35 × 7
#> bacteria GEN TOB AMK KAN IPM MEM
#> <chr> <sir> <sir> <sir> <sir> <sir> <sir>
@@ -229,8 +228,8 @@ wisca(example_isolates,
```
| Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---|
| 69.9% (64.7-75.2%) | 93.7% (92.2-95.1%) | 89.8% (86.8-92.3%) |
|:------------------------|:-------------------------------------|:-------------------------------------|
| 70% (64.7-75.2%) | 93.6% (92.2-95.1%) | 89.8% (87-92.5%) |
WISCA supports stratification by any clinical variable, so you can
generate syndrome-specific or ward-specific coverage estimates:
@@ -244,10 +243,10 @@ wisca(example_isolates,
```
| Syndromic Group | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---|:---|
| Clinical | 74.6% (69-80.1%) | 93.6% (91.9-95.1%) | 90.5% (86.9-93%) |
| ICU | 57% (48.7-65.8%) | 86.7% (83.7-89.7%) | 82.8% (77.9-87.2%) |
| Outpatient | 57.5% (46.5-68.7%) | 76.7% (70.6-82.4%) | 67.5% (57.2-76.7%) |
|:----------------|:------------------------|:-------------------------------------|:-------------------------------------|
| Clinical | 74.6% (68.6-80.6%) | 93.7% (92.1-95.1%) | 90.4% (87-93.1%) |
| ICU | 57% (48.6-65.7%) | 86.8% (83.6-89.8%) | 82.9% (78.1-87.3%) |
| Outpatient | 56.9% (45.9-68.2%) | 76.7% (70.6-82.3%) | 68% (57.6-77.2%) |
**For AMR surveillance**, traditional antibiograms remain the right tool
for tracking resistance per species over time:
@@ -256,13 +255,14 @@ for tracking resistance per species over time:
antibiogram(example_isolates,
mo_transform = "gramstain",
antimicrobials = c("AMC", carbapenems(), "TZP"))
#> For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem)
#> For `carbapenems()` using columns IPM (imipenem) and MEM
#> (meropenem)
```
| Pathogen | Amoxicillin/clavulanic acid | Imipenem | Meropenem | Piperacillin/tazobactam |
|:---|:---|:---|:---|:---|
| Gram-negative | 76% (73-79%,N=726) | 99% (98-100%,N=631) | 100% (99-100%,N=626) | 88% (85-91%,N=641) |
| Gram-positive | 76% (74-79%,N=1138) | 81% (75-85%,N=257) | 77% (70-82%,N=203) | 86% (82-89%,N=345) |
| Pathogen | Amoxicillin/clavulanic acid | Imipenem | Meropenem | Piperacillin/tazobactam |
|:--------------|:----------------------------|:--------------------|:---------------------|:------------------------|
| Gram-negative | 76% (73-79%,N=726) | 99% (98-100%,N=631) | 100% (99-100%,N=626) | 88% (85-91%,N=641) |
| Gram-positive | 76% (74-79%,N=1138) | 81% (75-85%,N=257) | 77% (70-82%,N=203) | 86% (82-89%,N=345) |
Combination antibiograms show the additional coverage gained by adding a
second agent, stratified by species:
@@ -273,10 +273,10 @@ antibiogram(example_isolates,
antimicrobials = c("TZP", "TZP+TOB", "TZP+GEN"))
```
| Pathogen | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:---|:---|:---|:---|
| Gram-negative | 88% (85-91%,N=641) | 99% (97-99%,N=691) | 98% (97-99%,N=693) |
| Gram-positive | 86% (82-89%,N=345) | 98% (96-98%,N=1044) | 95% (93-97%,N=550) |
| Pathogen | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
|:--------------|:------------------------|:-------------------------------------|:-------------------------------------|
| Gram-negative | 88% (85-91%,N=641) | 99% (97-99%,N=691) | 98% (97-99%,N=693) |
| Gram-positive | 86% (82-89%,N=345) | 98% (96-98%,N=1044) | 95% (93-97%,N=550) |
Like many other functions in this package, `antibiogram()` and `wisca()`
come with support for 28 languages that are often detected automatically
@@ -349,9 +349,10 @@ example_isolates %>%
summarise(across(c(GEN, TOB),
list(total_R = resistance,
conf_int = function(x) sir_confidence_interval(x, collapse = "-"))))
#> `resistance()` assumes the EUCAST guideline and thus considers the 'I'
#> category susceptible. Set the `guideline` argument or the `AMR_guideline`
#> option to either "CLSI" or "EUCAST", see `?AMR-options`.
#> `resistance()` assumes the EUCAST guideline and thus
#> considers the 'I' category susceptible. Set the `guideline`
#> argument or the `AMR_guideline` option to either "CLSI" or
#> "EUCAST", see `?AMR-options`.
#> This message will be shown once per session.
#> # A tibble: 3 × 5
#> ward GEN_total_R GEN_conf_int TOB_total_R TOB_conf_int
@@ -375,15 +376,16 @@ out <- example_isolates %>%
# calculate AMR using resistance(), over all aminoglycosides and polymyxins:
summarise(across(c(aminoglycosides(), polymyxins()),
resistance))
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK
#> (amikacin), and KAN (kanamycin)
#> For `aminoglycosides()` using columns GEN (gentamicin), TOB
#> (tobramycin), AMK (amikacin), and KAN (kanamycin)
#> For `polymyxins()` using column COL (colistin)
#> Warning: There was 1 warning in `summarise()`.
#> In argument: `across(c(aminoglycosides(), polymyxins()), resistance)`.
#> In argument: `across(c(aminoglycosides(), polymyxins()),
#> resistance)`.
#> In group 3: `ward = "Outpatient"`.
#> Caused by warning:
#> ! Introducing NA: only 23 results available for KAN in group: ward = "Outpatient"
#> (whilst `minimum = 30`).
#> ! Introducing NA: only 23 results available for KAN in group:
#> ward = "Outpatient" (whilst `minimum = 30`).
out
#> # A tibble: 3 × 6
#> ward GEN TOB AMK KAN COL

View File

@@ -12,7 +12,7 @@ The \code{AMR} package is a peer-reviewed, \href{https://amr-for-r.org/#copyrigh
This work was published in the Journal of Statistical Software (Volume 104(3); \doi{10.18637/jss.v104.i03}) and formed the basis of two PhD theses (\doi{10.33612/diss.177417131} and \doi{10.33612/diss.192486375}).
After installing this package, R knows \href{https://amr-for-r.org/reference/microorganisms.html}{\strong{~97 000 distinct microbial species}} (updated May 2026) and all \href{https://amr-for-r.org/reference/antimicrobials.html}{\strong{~620 antimicrobial and antiviral drugs}} by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral clinical breakpoint guidelines from CLSI 2011-2026 and EUCAST 2011-2026 are included, even with epidemiological cut-off (ECOFF) values. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). \strong{It was designed to work in any setting, including those with very limited resources}. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the \href{https://www.rug.nl}{University of Groningen} and the \href{https://www.umcg.nl}{University Medical Center Groningen}.
After installing this package, R knows \href{https://amr-for-r.org/reference/microorganisms.html}{\strong{~97 000 distinct microbial species}} (updated mei 2026) and all \href{https://amr-for-r.org/reference/antimicrobials.html}{\strong{~620 antimicrobial and antiviral drugs}} by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral clinical breakpoint guidelines from CLSI 2011-2026 and EUCAST 2011-2026 are included, even with epidemiological cut-off (ECOFF) values. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). \strong{It was designed to work in any setting, including those with very limited resources}. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the \href{https://www.rug.nl}{University of Groningen} and the \href{https://www.umcg.nl}{University Medical Center Groningen}.
The \code{AMR} package is available in English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
}

View File

@@ -46,7 +46,7 @@ A list with class \code{"htest"} containing the following
\code{(observed - expected) / sqrt(expected)}.}
\item{stdres}{standardized residuals,
\code{(observed - expected) / sqrt(V)}, where \code{V} is the
residual cell variance (Agresti, 2007, section 2.4.5
residual cell variance {(\if{html}{\out{<a href="#reference+chisq.test.Rd+R+3AAgresti+3A2007" class="citation">}}Agresti 2007\if{html}{\out{</a>}}, section 2.4.5)}
for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
}
\description{

View File

@@ -59,8 +59,9 @@ ggplot_pca(
}
\item{pc.biplot}{
If true, use what Gabriel (1971) refers to as a "principal component
biplot", with \code{lambda = 1} and observations scaled up by sqrt(n) and
If true, use what {\if{html}{\cite{}\out{<a href="#reference+biplot.princomp.Rd+R+3AGabriel+3A1971" class="citation">}}Gabriel (1971)\if{html}{\out{</a>}}} refers to as a
\dQuote{principal component biplot},
with \code{lambda = 1} and observations scaled up by sqrt(n) and
variables scaled down by sqrt(n). Then inner products between
variables approximate covariances and distances between observations
approximate Mahalanobis distance.

View File

@@ -9,6 +9,7 @@ top_n_microorganisms(
n,
property = "species",
n_for_each = NULL,
property_for_each = "species",
col_mo = NULL,
...
)
@@ -16,37 +17,40 @@ top_n_microorganisms(
\arguments{
\item{x}{A data frame containing microbial data.}
\item{n}{An integer specifying the maximum number of unique values of the \code{property} to include in the output.}
\item{n}{A positive whole number specifying the maximum number of unique values of \code{property} to include in the output.}
\item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subpecies"}, the genus will be added to make sure each (sub)species still belongs to the right genus.}
\item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subspecies"}, the genus is prepended to ensure each name is unambiguous.}
\item{n_for_each}{An optional integer specifying the maximum number of rows to retain for each value of the selected property. If \code{NULL}, all rows within the top \emph{n} groups will be included.}
\item{n_for_each}{An optional positive whole number specifying the maximum number of distinct microorganism groups at the level of \code{property_for_each} to retain within each of the top \emph{n} groups. Only used when \code{property_for_each} is also set.}
\item{property_for_each}{The microorganism property to use for sub-grouping within each top \emph{n} group. Must be one of the column names of the \link{microorganisms} data set and at a strictly lower taxonomic rank than \code{property} (allowed order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). Defaults to \code{"species"}. Only relevant when \code{n_for_each} is set.}
\item{col_mo}{A character string indicating the column in \code{x} that contains microorganism names or codes. Defaults to the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
\item{...}{Additional arguments passed on to \code{\link[=mo_property]{mo_property()}} when \code{property} is not \code{NULL}.}
}
\description{
This function filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
Filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
}
\details{
This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species.
This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data.
}
\examples{
# filter to the top 3 species:
top_n_microorganisms(example_isolates,
n = 3
)
top_n_microorganisms(example_isolates, n = 3)
# filter to any species in the top 5 genera:
top_n_microorganisms(example_isolates,
n = 5, property = "genus"
)
top_n_microorganisms(example_isolates, n = 5, property = "genus")
# filter to the top 3 species in each of the top 5 genera:
top_n_microorganisms(example_isolates,
n = 5, property = "genus", n_for_each = 3
)
# filter to the top 2 genera in each of the top 3 families:
top_n_microorganisms(example_isolates,
n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
)
}
\seealso{
\code{\link[=mo_property]{mo_property()}}, \code{\link[=as.mo]{as.mo()}}, \code{\link[=antibiogram]{antibiogram()}}