improve top_n_microorganisms(): add property_for_each, fix property=NULL, enforce rank order (#297)

2026-06-29 16:56:21 +02:00 · 2026-06-26 21:40:11 +02:00
parent 02bd9a71c1
commit f7d353361c
15 changed files with 143 additions and 95 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -85,6 +85,27 @@ _pkgdown.yml    # pkgdown website configuration
 - `translate.R` — 28-language translation system
 - `ggplot_sir.R` / `ggplot_pca.R` / `plotting.R` — visualisation functions
 ## Code Style
 Follow the [tidyverse style guide](https://style.tidyverse.org/) precisely. Key rules:
 - 2-space indentation; no tabs
 - `<-` for assignment, not `=`
 - Spaces around all binary operators and after commas; no spaces inside parentheses
 - When a function call must break across lines, place the first argument on a new line indented by 2 spaces, and put the closing `)` on its own line — **never align arguments to the opening parenthesis** (no hanging/forced mid-line indentation)
 ```r
 # good
 stop_(
  "some long message part one ",
  "part two"
 )
 # bad — forces indentation to match the opening parenthesis
 stop_("some long message part one ",
      "part two")
 ```
 ## Custom S3 Classes
 The package defines five S3 classes with full print/format/plot/vctrs support:
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,5 +1,5 @@
 Package: AMR
-Version: 3.0.1.9076
+Version: 3.0.1.9077
 Date: 2026-06-26
 Title: Antimicrobial Resistance Data Analysis
 Description: Functions to simplify and standardise antimicrobial resistance (AMR)
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,4 +1,4 @@
-# AMR 3.0.1.9076
+# AMR 3.0.1.9077
 Planned as v3.1.0, end of June 2026.
@@ -37,6 +37,7 @@ Planned as v3.1.0, end of June 2026.
 * Fixed some EUCAST Expert Rules, mostly on *S. pneumoniae*
 ### Updated
+* `top_n_microorganisms()`: new `property_for_each` argument for sub-grouping within the top *n* groups; when both `property` and `property_for_each` are taxonomic ranks, `property_for_each` must now be at a strictly lower rank; `property = NULL` is now accepted; the inner filter now tracks original row indices, so a top *n* group can no longer pull in rows that belong to another group
 * Taxonomic update for all microorganisms, now updated to June 2026
 * `mo_kingdom()` now returns the formal taxonomic kingdom; a one-time note per session explains the change when querying bacterial or archaeal records.
 * `mo_taxonomy()` and `mo_info()` gained `domain` for the list output
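The rank-order rule in the entry above can be sketched in base R as follows. `check_rank_order()` is a hypothetical helper, not part of the AMR API. It mirrors the check added to `top_n_microorganisms()`, which only raises an error when both properties are taxonomic ranks and `property_for_each` is not strictly below `property`. Unlike the package code, this sketch ignores `n_for_each`, which the package also requires to be set before it runs the check.

```r
# Hypothetical helper mirroring the rank check added to top_n_microorganisms();
# not part of the AMR package API.
taxonomic_ranks <- c(
  "domain", "kingdom", "phylum", "class", "order",
  "family", "genus", "species", "subspecies"
)

check_rank_order <- function(property, property_for_each) {
  # NULL on either side means there is nothing to compare
  if (is.null(property) || is.null(property_for_each)) {
    return(invisible(TRUE))
  }
  prop_rank <- match(property, taxonomic_ranks)
  each_rank <- match(property_for_each, taxonomic_ranks)
  # properties that are not taxonomic ranks (e.g. "oxygen_tolerance") match to
  # NA and are therefore not checked
  if (!is.na(prop_rank) && !is.na(each_rank) && each_rank <= prop_rank) {
    stop(
      "`property_for_each` (\"", property_for_each, "\") must be at a lower ",
      "taxonomic rank than `property` (\"", property, "\")",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_rank_order("genus", "species")          # passes silently
check_rank_order("oxygen_tolerance", "genus") # passes: not a rank, not checked
try(check_rank_order("species", "genus"))     # error: genus is above species
```

Equal ranks are rejected as well (`each_rank <= prop_rank`), since sub-grouping genera by genus would not split anything.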
--- a/R/sysdata.rda
+++ b/R/sysdata.rda
--- a/R/tidymodels.R
+++ b/R/tidymodels.R
@@ -126,7 +126,8 @@ step_mic_log2 <- function(
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
-    id = recipes::rand_id("mic_log2")) {
+  id = recipes::rand_id("mic_log2")
+) {
  recipes::add_step(
    recipe,
    step_mic_log2_new(
@@ -201,7 +202,8 @@ step_sir_numeric <- function(
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
-    id = recipes::rand_id("sir_numeric")) {
+  id = recipes::rand_id("sir_numeric")
+) {
  recipes::add_step(
    recipe,
    step_sir_numeric_new(
--- a/R/top_n_microorganisms.R
+++ b/R/top_n_microorganisms.R
@@ -29,73 +29,88 @@
 #' Filter Top *n* Microorganisms
 #'
-#' This function filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
+#' Filters a data set to include only the top *n* microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
 #' @param x A data frame containing microbial data.
-#' @param n An integer specifying the maximum number of unique values of the `property` to include in the output.
+#' @param n A positive whole number specifying the maximum number of unique values of `property` to include in the output.
-#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subpecies"`, the genus will be added to make sure each (sub)species still belongs to the right genus.
+#' @param property A character string indicating the microorganism property to use for filtering. Must be one of the column names of the [microorganisms] data set: `r vector_or(colnames(microorganisms), sort = FALSE, documentation = TRUE)`. If `NULL`, the raw values from `col_mo` will be used without transformation. When using `"species"` (default) or `"subspecies"`, the genus is prepended to ensure each name is unambiguous.
-#' @param n_for_each An optional integer specifying the maximum number of rows to retain for each value of the selected property. If `NULL`, all rows within the top *n* groups will be included.
+#' @param n_for_each An optional positive whole number specifying the maximum number of distinct values of `property_for_each` to retain within each of the top *n* groups. If `NULL` (default), all rows within the top *n* groups are included.
+#' @param property_for_each The microorganism property to use for sub-grouping within each top *n* group. Must be one of the column names of the [microorganisms] data set. When both `property` and `property_for_each` are taxonomic ranks, `property_for_each` must be at a strictly lower rank than `property` (rank order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). If `NULL`, the raw values from `col_mo` are used. Defaults to `"species"`. Only relevant when `n_for_each` is set.
 #' @param col_mo A character string indicating the column in `x` that contains microorganism names or codes. Defaults to the first column of class [`mo`]. Values will be coerced using [as.mo()].
 #' @param ... Additional arguments passed on to [mo_property()] when `property` is not `NULL`.
-#' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species.
+#' @details This function is useful for preprocessing data before creating [antibiograms][antibiogram()] or other analyses that require focused subsets of microbial data.
 #' @export
 #' @seealso [mo_property()], [as.mo()], [antibiogram()]
 #' @examples
 #' # filter to the top 3 species:
-#' top_n_microorganisms(example_isolates,
-#'   n = 3
-#' )
+#' top_n_microorganisms(example_isolates, n = 3)
 #'
 #' # filter to any species in the top 5 genera:
-#' top_n_microorganisms(example_isolates,
-#'   n = 5, property = "genus"
-#' )
+#' top_n_microorganisms(example_isolates, n = 5, property = "genus")
 #'
 #' # filter to the top 3 species in each of the top 5 genera:
 #' top_n_microorganisms(example_isolates,
 #'   n = 5, property = "genus", n_for_each = 3
 #' )
-top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, col_mo = NULL, ...) {
+#'
+#' # filter to the top 2 genera in each of the top 3 families:
+#' top_n_microorganisms(example_isolates,
+#'   n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
+#' )
+top_n_microorganisms <- function(x, n, property = "species", n_for_each = NULL, property_for_each = "species", col_mo = NULL, ...) {
  meet_criteria(x, allow_class = "data.frame") # also checks dimensions to be >0
  meet_criteria(n, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE)
-  meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms))
+  meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
  meet_criteria(n_for_each, allow_class = c("numeric", "integer"), has_length = 1, is_finite = TRUE, is_positive = TRUE, allow_NULL = TRUE)
+  meet_criteria(property_for_each, allow_class = "character", has_length = 1, is_in = colnames(AMR::microorganisms), allow_NULL = TRUE)
  meet_criteria(col_mo, allow_class = "character", has_length = 1, allow_NULL = TRUE, is_in = colnames(x))
  if (is.null(col_mo)) {
    col_mo <- search_type_in_df(x = x, type = "mo", info = TRUE)
    stop_if(is.null(col_mo), "{.arg col_mo} must be set")
  }
-  x.bak <- x
+  .taxonomic_ranks <- c("domain", "kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies")
+  if (!is.null(n_for_each) && !is.null(property) && !is.null(property_for_each)) {
+    prop_rank <- match(property, .taxonomic_ranks)
+    each_rank <- match(property_for_each, .taxonomic_ranks)
+    if (!is.na(prop_rank) && !is.na(each_rank) && each_rank <= prop_rank) {
+      stop_(
+        "`property_for_each` (\"", property_for_each, "\") must be at a lower ",
+        "taxonomic rank than `property` (\"", property, "\")"
+      )
+    }
+  }
+  x.bak <- x
  x[, col_mo] <- as.mo(x[, col_mo, drop = TRUE], keep_synonyms = TRUE)
-  if (is.null(property)) {
+  get_prop_val <- function(prop) {
-    x$prop_val <- x[[col_mo]]
+    if (is.null(prop)) {
-  } else if (property == "species") {
+      x[[col_mo]]
-    x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...))
+    } else if (prop == "species") {
-  } else if (property == "subspecies") {
+      paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...))
-    x$prop_val <- paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...))
+    } else if (prop == "subspecies") {
+      paste(mo_genus(x[[col_mo]], ...), mo_species(x[[col_mo]], ...), mo_subspecies(x[[col_mo]], ...))
    } else {
-    x$prop_val <- mo_property(x[[col_mo]], property = property, ...)
+      mo_property(x[[col_mo]], property = prop, ...)
    }
  }
  counts <- sort(table(x$prop_val), decreasing = TRUE)
-  n <- as.integer(n)
+  x$prop_val <- get_prop_val(property)
-  if (length(counts) < n) {
+  counts <- sort(table(x$prop_val), decreasing = TRUE)
-    n <- length(counts)
+  n <- min(as.integer(n), length(counts))
-  }
+  filtered_rows <- which(x$prop_val %in% names(counts)[seq_len(n)])
-  count_values <- names(counts)[seq_len(n)]
-  filtered_rows <- which(x$prop_val %in% count_values)
  if (!is.null(n_for_each)) {
    n_for_each <- as.integer(n_for_each)
+    x$prop_val_each <- get_prop_val(property_for_each)
    filtered_x <- x[filtered_rows, , drop = FALSE]
+    filtered_x$.orig_row <- filtered_rows
    filtered_rows <- do.call(
      c,
      lapply(split(filtered_x, filtered_x$prop_val), function(group) {
-        top_values <- names(sort(table(group[[col_mo]]), decreasing = TRUE)[seq_len(n_for_each)])
+        top_each <- names(sort(table(group$prop_val_each), decreasing = TRUE)[seq_len(n_for_each)])
-        top_values <- top_values[!is.na(top_values)]
+        group$.orig_row[group$prop_val_each %in% top_each[!is.na(top_each)]]
-        which(x[[col_mo]] %in% top_values)
      })
    )
  }
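The row-tracking fix in the hunk above can be illustrated with a small base-R sketch. `top_n_rows()` and its arguments are hypothetical: `top_vals` and `sub_vals` stand in for the values that `get_prop_val(property)` and `get_prop_val(property_for_each)` produce. The removed inner filter matched values against the whole data set (`which(x[[col_mo]] %in% top_values)`), so a value shared by several groups could return rows from outside the current group. Splitting row indices by group, as the new `.orig_row` column does, keeps each group's selection inside that group.

```r
# Hypothetical base-R sketch of two-level "top n" filtering that tracks row
# indices; top_n_rows() is not part of the AMR package.
# top_vals: values of `property` per row (e.g. genus)
# sub_vals: values of `property_for_each` per row (e.g. species)
top_n_rows <- function(top_vals, sub_vals, n, n_for_each = NULL) {
  # rank the top-level groups by frequency and keep at most n of them
  counts <- sort(table(top_vals), decreasing = TRUE)
  n <- min(as.integer(n), length(counts))
  rows <- which(top_vals %in% names(counts)[seq_len(n)])
  if (is.null(n_for_each)) {
    return(rows)
  }
  # split the row indices (not the values) by top-level group, so each group
  # can only ever return rows that belong to it
  per_group <- lapply(split(rows, top_vals[rows]), function(idx) {
    sub_counts <- sort(table(sub_vals[idx]), decreasing = TRUE)
    keep <- names(sub_counts)[seq_len(min(n_for_each, length(sub_counts)))]
    idx[sub_vals[idx] %in% keep]
  })
  # sorted here for readability; the package code may keep group order
  sort(unlist(per_group, use.names = FALSE))
}

top_vals <- c("A", "B", "A", "A", "B", "C")
sub_vals <- c("x", "x", "x", "z", "x", "x")
top_n_rows(top_vals, sub_vals, n = 2)                 # 1 2 3 4 5
top_n_rows(top_vals, sub_vals, n = 2, n_for_each = 1) # 1 2 3 5
```

Row 6 carries the same sub-value `"x"` as rows in groups A and B, but its group C is outside the top 2, so it is never returned. A filter that matched `"x"` against all rows, as the removed line did with `x[[col_mo]]`, would have pulled it back in.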
--- a/README.Rmd
+++ b/README.Rmd
@@ -11,6 +11,7 @@ knitr::opts_chunk$set(
  # fig.path = "man/figures/README-",
  out.width = "100%"
 )
 options(width = 100)
 AMR:::reset_all_thrown_messages()
 ```
--- a/data/antibiotics.rda
+++ b/data/antibiotics.rda
--- a/data/antimicrobials.rda
+++ b/data/antimicrobials.rda
--- a/index.Rmd
+++ b/index.Rmd
@@ -13,6 +13,7 @@ knitr::opts_chunk$set(
  fig.path = "pkgdown/assets/",
  out.width = "100%"
 )
 options(width = 100)
 AMR:::reset_all_thrown_messages()
 ```
--- a/index.md
+++ b/index.md
@@ -27,12 +27,9 @@
 <div style="display: flex; font-size: 0.8em;">
 <p style="text-align:left; width: 50%;">
 <small><a href="https://amr-for-r.org/">amr-for-r.org</a></small>
 </p>
 <p style="text-align:right; width: 50%;">
 <small><a href="https://doi.org/10.18637/jss.v104.i03" target="_blank">doi.org/10.18637/jss.v104.i03</a></small>
 </p>
@@ -64,7 +61,7 @@ formed the basis of two PhD theses ([DOI
 [DOI 10.33612/diss.192486375](https://doi.org/10.33612/diss.192486375)).
 After installing this package, R knows [**~97 000 distinct microbial
 species**](./reference/microorganisms.html) (updated May 2026) and all
 [**~620 antimicrobial and antiviral
 drugs**](./reference/antimicrobials.html) by name and code (including
 ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all
@@ -175,11 +172,13 @@ example_isolates %>%
 #> ℹ Using column mo as input for `mo_fullname()`
 #> ℹ Using column mo as input for `mo_is_gram_negative()`
 #> ℹ Using column mo as input for `mo_is_intrinsic_resistant()`
-#> ℹ Determining intrinsic resistance based on 'EUCAST Expected Resistant
+#> ℹ Determining intrinsic resistance based on 'EUCAST Expected
-#>   Phenotypes' v1.2 (2023). This note will be shown once per session.
+#>   Resistant Phenotypes' v1.2 (2023). This note will be shown
-#> ℹ For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK
+#>   once per session.
-#>   (amikacin), and KAN (kanamycin)
+#> ℹ For `aminoglycosides()` using columns GEN (gentamicin), TOB
-#> ℹ For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem)
+#>   (tobramycin), AMK (amikacin), and KAN (kanamycin)
+#> ℹ For `carbapenems()` using columns IPM (imipenem) and MEM
+#>   (meropenem)
 #> # A tibble: 35 × 7
 #>    bacteria                     GEN   TOB   AMK   KAN   IPM   MEM  
 #>    <chr>                        <sir> <sir> <sir> <sir> <sir> <sir>
@@ -229,8 +228,8 @@ wisca(example_isolates,
 ```
 | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
-|:---|:---|:---|
+|:------------------------|:-------------------------------------|:-------------------------------------|
-| 69.9% (64.7-75.2%) | 93.7% (92.2-95.1%) | 89.8% (86.8-92.3%) |
+| 70% (64.7-75.2%)        | 93.6% (92.2-95.1%)                   | 89.8% (87-92.5%)                     |
 WISCA supports stratification by any clinical variable, so you can
 generate syndrome-specific or ward-specific coverage estimates:
@@ -244,10 +243,10 @@ wisca(example_isolates,
 ```
 | Syndromic Group | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
-|:---|:---|:---|:---|
+|:----------------|:------------------------|:-------------------------------------|:-------------------------------------|
-| Clinical | 74.6% (69-80.1%) | 93.6% (91.9-95.1%) | 90.5% (86.9-93%) |
+| Clinical        | 74.6% (68.6-80.6%)      | 93.7% (92.1-95.1%)                   | 90.4% (87-93.1%)                     |
-| ICU | 57% (48.7-65.8%) | 86.7% (83.7-89.7%) | 82.8% (77.9-87.2%) |
+| ICU             | 57% (48.6-65.7%)        | 86.8% (83.6-89.8%)                   | 82.9% (78.1-87.3%)                   |
-| Outpatient | 57.5% (46.5-68.7%) | 76.7% (70.6-82.4%) | 67.5% (57.2-76.7%) |
+| Outpatient      | 56.9% (45.9-68.2%)      | 76.7% (70.6-82.3%)                   | 68% (57.6-77.2%)                     |
 **For AMR surveillance**, traditional antibiograms remain the right tool
 for tracking resistance per species over time:
@@ -256,11 +255,12 @@ for tracking resistance per species over time:
 antibiogram(example_isolates,
            mo_transform = "gramstain",
            antimicrobials = c("AMC", carbapenems(), "TZP"))
-#> ℹ For `carbapenems()` using columns IPM (imipenem) and MEM (meropenem)
+#> ℹ For `carbapenems()` using columns IPM (imipenem) and MEM
+#>   (meropenem)
 ```
 | Pathogen      | Amoxicillin/clavulanic acid | Imipenem            | Meropenem            | Piperacillin/tazobactam |
-|:---|:---|:---|:---|:---|
+|:--------------|:----------------------------|:--------------------|:---------------------|:------------------------|
 | Gram-negative | 76% (73-79%,N=726)          | 99% (98-100%,N=631) | 100% (99-100%,N=626) | 88% (85-91%,N=641)      |
 | Gram-positive | 76% (74-79%,N=1138)         | 81% (75-85%,N=257)  | 77% (70-82%,N=203)   | 86% (82-89%,N=345)      |
@@ -274,7 +274,7 @@ antibiogram(example_isolates,
 ```
 | Pathogen      | Piperacillin/tazobactam | Piperacillin/tazobactam + Gentamicin | Piperacillin/tazobactam + Tobramycin |
-|:---|:---|:---|:---|
+|:--------------|:------------------------|:-------------------------------------|:-------------------------------------|
 | Gram-negative | 88% (85-91%,N=641)      | 99% (97-99%,N=691)                   | 98% (97-99%,N=693)                   |
 | Gram-positive | 86% (82-89%,N=345)      | 98% (96-98%,N=1044)                  | 95% (93-97%,N=550)                   |
@@ -349,9 +349,10 @@ example_isolates %>%
  summarise(across(c(GEN, TOB),
                   list(total_R = resistance,
                        conf_int = function(x) sir_confidence_interval(x, collapse = "-"))))
-#> ℹ `resistance()` assumes the EUCAST guideline and thus considers the 'I'
+#> ℹ `resistance()` assumes the EUCAST guideline and thus
-#>   category susceptible. Set the `guideline` argument or the `AMR_guideline`
+#>   considers the 'I' category susceptible. Set the `guideline`
-#>   option to either "CLSI" or "EUCAST", see `?AMR-options`.
+#>   argument or the `AMR_guideline` option to either "CLSI" or
+#>   "EUCAST", see `?AMR-options`.
 #> ℹ This message will be shown once per session.
 #> # A tibble: 3 × 5
 #>   ward       GEN_total_R GEN_conf_int TOB_total_R TOB_conf_int
@@ -375,15 +376,16 @@ out <- example_isolates %>%
  # calculate AMR using resistance(), over all aminoglycosides and polymyxins:
  summarise(across(c(aminoglycosides(), polymyxins()),
            resistance))
-#> ℹ For `aminoglycosides()` using columns GEN (gentamicin), TOB (tobramycin), AMK
+#> ℹ For `aminoglycosides()` using columns GEN (gentamicin), TOB
-#>   (amikacin), and KAN (kanamycin)
+#>   (tobramycin), AMK (amikacin), and KAN (kanamycin)
 #> ℹ For `polymyxins()` using column COL (colistin)
 #> Warning: There was 1 warning in `summarise()`.
-#> ℹ In argument: `across(c(aminoglycosides(), polymyxins()), resistance)`.
+#> ℹ In argument: `across(c(aminoglycosides(), polymyxins()),
+#>   resistance)`.
 #> ℹ In group 3: `ward = "Outpatient"`.
 #> Caused by warning:
-#> ! Introducing NA: only 23 results available for KAN in group: ward = "Outpatient"
+#> ! Introducing NA: only 23 results available for KAN in group:
-#> (whilst `minimum = 30`).
+#> ward = "Outpatient" (whilst `minimum = 30`).
 out
 #> # A tibble: 3 × 6
 #>   ward         GEN   TOB   AMK   KAN   COL
--- a/man/AMR.Rd
+++ b/man/AMR.Rd
@@ -12,7 +12,7 @@ The \code{AMR} package is a peer-reviewed, \href{https://amr-for-r.org/#copyrigh
 This work was published in the Journal of Statistical Software (Volume 104(3); \doi{10.18637/jss.v104.i03}) and formed the basis of two PhD theses (\doi{10.33612/diss.177417131} and \doi{10.33612/diss.192486375}).
 After installing this package, R knows \href{https://amr-for-r.org/reference/microorganisms.html}{\strong{~97 000 distinct microbial species}} (updated May 2026) and all \href{https://amr-for-r.org/reference/antimicrobials.html}{\strong{~620 antimicrobial and antiviral drugs}} by name and code (including ATC, EARS-Net, ASIARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid SIR and MIC values. The integral clinical breakpoint guidelines from CLSI 2011-2026 and EUCAST 2011-2026 are included, even with epidemiological cut-off (ECOFF) values. It supports and can read any data format, including WHONET data. This package works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). \strong{It was designed to work in any setting, including those with very limited resources}. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the \href{https://www.rug.nl}{University of Groningen} and the \href{https://www.umcg.nl}{University Medical Center Groningen}.
 The \code{AMR} package is available in English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
 }
--- a/man/g.test.Rd
+++ b/man/g.test.Rd
@@ -46,7 +46,7 @@ A list with class \code{"htest"} containing the following
    \code{(observed - expected) / sqrt(expected)}.}
  \item{stdres}{standardized residuals,
    \code{(observed - expected) / sqrt(V)}, where \code{V} is the
-    residual cell variance (Agresti, 2007, section 2.4.5
+    residual cell variance {(\if{html}{\out{<a href="#reference+chisq.test.Rd+R+3AAgresti+3A2007" class="citation">}}Agresti 2007\if{html}{\out{</a>}}, section 2.4.5)}
    for the case where \code{x} is a matrix, \code{n * p * (1 - p)} otherwise).}
 }
 \description{
--- a/man/ggplot_pca.Rd
+++ b/man/ggplot_pca.Rd
@@ -59,8 +59,9 @@ ggplot_pca(
  }
 \item{pc.biplot}{
-    If true, use what Gabriel (1971) refers to as a "principal component
+    If true, use what {\if{html}{\cite{}\out{<a href="#reference+biplot.princomp.Rd+R+3AGabriel+3A1971" class="citation">}}Gabriel (1971)\if{html}{\out{</a>}}} refers to as a
-    biplot", with \code{lambda = 1} and observations scaled up by sqrt(n) and
+    \dQuote{principal component biplot},
+    with \code{lambda = 1} and observations scaled up by sqrt(n) and
    variables scaled down by sqrt(n).  Then inner products between
    variables approximate covariances and distances between observations
    approximate Mahalanobis distance.
--- a/man/top_n_microorganisms.Rd
+++ b/man/top_n_microorganisms.Rd
@@ -9,6 +9,7 @@ top_n_microorganisms(
  n,
  property = "species",
  n_for_each = NULL,
+  property_for_each = "species",
  col_mo = NULL,
  ...
 )
@@ -16,37 +17,40 @@ top_n_microorganisms(
 \arguments{
 \item{x}{A data frame containing microbial data.}
-\item{n}{An integer specifying the maximum number of unique values of the \code{property} to include in the output.}
+\item{n}{A positive whole number specifying the maximum number of unique values of \code{property} to include in the output.}
-\item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subpecies"}, the genus will be added to make sure each (sub)species still belongs to the right genus.}
+\item{property}{A character string indicating the microorganism property to use for filtering. Must be one of the column names of the \link{microorganisms} data set: \code{"mo"}, \code{"fullname"}, \code{"status"}, \code{"domain"}, \code{"kingdom"}, \code{"phylum"}, \code{"class"}, \code{"order"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"subspecies"}, \code{"rank"}, \code{"ref"}, \code{"oxygen_tolerance"}, \code{"morphology"}, \code{"source"}, \code{"lpsn"}, \code{"lpsn_parent"}, \code{"lpsn_renamed_to"}, \code{"mycobank"}, \code{"mycobank_parent"}, \code{"mycobank_renamed_to"}, \code{"gbif"}, \code{"gbif_parent"}, \code{"gbif_renamed_to"}, \code{"prevalence"}, or \code{"snomed"}. If \code{NULL}, the raw values from \code{col_mo} will be used without transformation. When using \code{"species"} (default) or \code{"subspecies"}, the genus is prepended to ensure each name is unambiguous.}
-\item{n_for_each}{An optional integer specifying the maximum number of rows to retain for each value of the selected property. If \code{NULL}, all rows within the top \emph{n} groups will be included.}
+\item{n_for_each}{An optional positive whole number specifying the maximum number of distinct values of \code{property_for_each} to retain within each of the top \emph{n} groups. If \code{NULL} (default), all rows within the top \emph{n} groups are included.}
+\item{property_for_each}{The microorganism property to use for sub-grouping within each top \emph{n} group. Must be one of the column names of the \link{microorganisms} data set. When both \code{property} and \code{property_for_each} are taxonomic ranks, \code{property_for_each} must be at a strictly lower rank than \code{property} (rank order: domain > kingdom > phylum > class > order > family > genus > species > subspecies). If \code{NULL}, the raw values from \code{col_mo} are used. Defaults to \code{"species"}. Only relevant when \code{n_for_each} is set.}
 \item{col_mo}{A character string indicating the column in \code{x} that contains microorganism names or codes. Defaults to the first column of class \code{\link{mo}}. Values will be coerced using \code{\link[=as.mo]{as.mo()}}.}
 \item{...}{Additional arguments passed on to \code{\link[=mo_property]{mo_property()}} when \code{property} is not \code{NULL}.}
 }
 \description{
-This function filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, or to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
+Filters a data set to include only the top \emph{n} microorganisms based on a specified property, such as taxonomic family or genus. For example, it can filter a data set to the top 3 species, to any species in the top 5 genera, or to the top 3 species in each of the top 5 genera.
 }
 \details{
-This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data. For example, it can filter a data set to only include isolates from the top 10 species.
+This function is useful for preprocessing data before creating \link[=antibiogram]{antibiograms} or other analyses that require focused subsets of microbial data.
 }
 \examples{
 # filter to the top 3 species:
-top_n_microorganisms(example_isolates,
-  n = 3
-)
+top_n_microorganisms(example_isolates, n = 3)
 # filter to any species in the top 5 genera:
-top_n_microorganisms(example_isolates,
-  n = 5, property = "genus"
-)
+top_n_microorganisms(example_isolates, n = 5, property = "genus")
 # filter to the top 3 species in each of the top 5 genera:
 top_n_microorganisms(example_isolates,
  n = 5, property = "genus", n_for_each = 3
 )
+# filter to the top 2 genera in each of the top 3 families:
+top_n_microorganisms(example_isolates,
+  n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
+)
 }
 \seealso{
 \code{\link[=mo_property]{mo_property()}}, \code{\link[=as.mo]{as.mo()}}, \code{\link[=antibiogram]{antibiogram()}}