1
0
mirror of https://github.com/msberends/AMR.git synced 2026-06-29 18:56:19 +02:00
Files
AMR/reference/top_n_microorganisms.md
2026-06-26 19:49:26 +00:00

190 lines
9.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Filter Top *n* Microorganisms
Filters a data set to include only the top *n* microorganisms based on a
specified property, such as taxonomic family or genus. For example, it
can filter a data set to the top 3 species, to any species in the top 5
genera, or to the top 3 species in each of the top 5 genera.
## Usage
``` r
top_n_microorganisms(
x,
n,
property = "species",
n_for_each = NULL,
property_for_each = "species",
col_mo = NULL,
...
)
```
## Arguments
- x:
A data frame containing microbial data.
- n:
A positive whole number specifying the maximum number of unique values
of `property` to include in the output.
- property:
A character string indicating the microorganism property to use for
filtering. Must be one of the column names of the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set: `"mo"`, `"fullname"`, `"status"`, `"domain"`, `"kingdom"`,
`"phylum"`, `"class"`, `"order"`, `"family"`, `"genus"`, `"species"`,
`"subspecies"`, `"rank"`, `"ref"`, `"oxygen_tolerance"`,
`"morphology"`, `"source"`, `"lpsn"`, `"lpsn_parent"`,
`"lpsn_renamed_to"`, `"mycobank"`, `"mycobank_parent"`,
`"mycobank_renamed_to"`, `"gbif"`, `"gbif_parent"`,
`"gbif_renamed_to"`, `"prevalence"`, or `"snomed"`. If `NULL`, the raw
values from `col_mo` will be used without transformation. When using
`"species"` (default) or `"subspecies"`, the genus is prepended to
ensure each name is unambiguous.
- n_for_each:
An optional positive whole number specifying the maximum number of
distinct microorganism groups at the level of `property_for_each` to
retain within each of the top *n* groups. Only used when
`property_for_each` is also set.
- property_for_each:
The microorganism property to use for sub-grouping within each top *n*
group. Must be one of the column names of the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set and at a strictly lower taxonomic rank than `property`
(allowed order: domain \> kingdom \> phylum \> class \> order \>
family \> genus \> species \> subspecies). Defaults to `"species"`.
Only relevant when `n_for_each` is set.
- col_mo:
A character string indicating the column in `x` that contains
microorganism names or codes. Defaults to the first column of class
[`mo`](https://amr-for-r.org/reference/as.mo.md). Values will be
coerced using [`as.mo()`](https://amr-for-r.org/reference/as.mo.md).
- ...:
Additional arguments passed on to
[`mo_property()`](https://amr-for-r.org/reference/mo_property.md) when
`property` is not `NULL`.
## Details
This function is useful for preprocessing data before creating
[antibiograms](https://amr-for-r.org/reference/antibiogram.md) or other
analyses that require focused subsets of microbial data.
## See also
[`mo_property()`](https://amr-for-r.org/reference/mo_property.md),
[`as.mo()`](https://amr-for-r.org/reference/as.mo.md),
[`antibiogram()`](https://amr-for-r.org/reference/antibiogram.md)
## Examples
``` r
# filter to the top 3 species:
top_n_microorganisms(example_isolates, n = 3)
#> # A tibble: 1,015 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 3 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 4 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 5 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 6 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 7 2002-02-03 481442 76 M ICU B_STPHY_CONS R NA S NA
#> 8 2002-02-14 067927 45 F ICU B_STPHY_CONS R NA R NA
#> 9 2002-02-14 067927 45 F ICU B_STPHY_CONS S NA S NA
#> 10 2002-02-21 A56499 64 M Clinical B_STPHY_CONS S NA S NA
#> # 1,005 more rows
#> # 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
# filter to any species in the top 5 genera:
top_n_microorganisms(example_isolates, n = 5, property = "genus")
#> # A tibble: 1,742 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 3 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 4 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 5 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 6 2002-01-13 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 7 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 8 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 9 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 10 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
#> # 1,732 more rows
#> # 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
# filter to the top 3 species in each of the top 5 genera:
top_n_microorganisms(example_isolates,
n = 5, property = "genus", n_for_each = 3
)
#> # A tibble: 1,497 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-02-21 4FC193 69 M Clinical B_ENTRC_FACM NA NA NA NA
#> 2 2002-04-08 130252 78 M ICU B_ENTRC_FCLS NA NA NA NA
#> 3 2002-06-23 798871 82 M Clinical B_ENTRC_FCLS NA NA NA NA
#> 4 2002-06-23 798871 82 M Clinical B_ENTRC_FCLS NA NA NA NA
#> 5 2003-04-20 6BC362 62 M ICU B_ENTRC NA NA NA NA
#> 6 2003-04-21 6BC362 62 M ICU B_ENTRC NA NA NA NA
#> 7 2003-08-13 F35553 52 M ICU B_ENTRC_FCLS NA NA NA NA
#> 8 2003-09-05 F35553 52 M ICU B_ENTRC NA NA NA NA
#> 9 2003-09-05 F35553 52 M ICU B_ENTRC_FCLS NA NA NA NA
#> 10 2003-09-28 1B0933 80 M Clinical B_ENTRC NA NA NA NA
#> # 1,487 more rows
#> # 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
# filter to the top 2 genera in each of the top 3 families:
top_n_microorganisms(example_isolates,
n = 3, property = "family", n_for_each = 2, property_for_each = "genus"
)
#> # A tibble: 1,659 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-03 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 3 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 4 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 5 2002-02-27 066895 85 F Clinical B_KLBSL_PNMN R NA NA R
#> 6 2002-02-27 066895 85 F Clinical B_KLBSL_PNMN R NA NA R
#> 7 2002-03-08 4FC193 69 M Clinical B_ESCHR_COLI R NA NA R
#> 8 2002-04-01 496896 46 F ICU B_ESCHR_COLI R NA NA NA
#> 9 2002-04-01 496896 46 F ICU B_ESCHR_COLI R NA NA NA
#> 10 2002-04-23 EE2510 69 F ICU B_ESCHR_COLI R NA NA NA
#> # 1,649 more rows
#> # 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
```