mirror of https://github.com/msberends/AMR.git synced 2025-12-15 23:10:28 +01:00
2025-11-24 10:42:21 +00:00

# Join [microorganisms](https://amr-for-r.org/reference/microorganisms.md) to a Data Set
Join the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set to an existing data set or to a
[character](https://rdrr.io/r/base/character.html) vector.
## Usage
``` r
inner_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
left_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
right_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
full_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
semi_join_microorganisms(x, by = NULL, ...)
anti_join_microorganisms(x, by = NULL, ...)
```
## Arguments
- x:
Existing data set to join, or
[character](https://rdrr.io/r/base/character.html) vector. In case of
a [character](https://rdrr.io/r/base/character.html) vector, the
resulting [data.frame](https://rdrr.io/r/base/data.frame.html) will
contain a column 'x' with these values.
- by:
The variable to join by. If left empty, the function searches `x` for a
column with class [`mo`](https://amr-for-r.org/reference/as.mo.md)
(created with [`as.mo()`](https://amr-for-r.org/reference/as.mo.md)), or
uses `"mo"` if a column with that name exists in `x`. Otherwise, it can
be a column name of `x` whose values exist in `microorganisms$mo` (such
as `by = "bacteria_id"`), or it can refer to another column in
[microorganisms](https://amr-for-r.org/reference/microorganisms.md), in
which case it must be named (like
`by = c("bacteria_id" = "fullname")`).
- suffix:
If there are non-joined duplicate variables in `x` and the
[microorganisms](https://amr-for-r.org/reference/microorganisms.md)
data set, these suffixes will be added to the output to disambiguate
them. Should be a
[character](https://rdrr.io/r/base/character.html) vector of length 2.
- ...:
Ignored, only in place to allow future extensions.
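
The two ways of passing `by` described above can be sketched as follows. This is a minimal illustration that assumes the AMR package is installed (it is skipped otherwise); the data frame `lab` and its columns `sample`, `bacteria_id` and `bacteria` are hypothetical names made up for this example.

``` r
# Assumption: the AMR package is installed; the example is skipped otherwise
if (requireNamespace("AMR", quietly = TRUE)) {
  library(AMR)

  # Hypothetical input: the organism column holds full names, not mo codes
  lab <- data.frame(
    sample = 1:2,
    bacteria_id = c("Escherichia coli", "Klebsiella pneumoniae"),
    stringsAsFactors = FALSE
  )

  # Named `by`: the name is the column in `lab`,
  # the value is the column in `microorganisms`
  by_fullname <- left_join_microorganisms(
    lab,
    by = c("bacteria_id" = "fullname")
  )

  # Unnamed `by`: the column in `lab` holds values of `microorganisms$mo`
  lab$bacteria <- as.mo(lab$bacteria_id)
  by_mo <- left_join_microorganisms(lab, by = "bacteria")
}
```

In the named form the join key is `fullname`, so the `mo` column of the microorganisms data set is expected to be among the added columns; in the unnamed form `bacteria` itself takes the role of `mo`.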
## Value
A [data.frame](https://rdrr.io/r/base/data.frame.html)
## Details
**Note:** Unlike the `*_join()` functions of `dplyr`,
[character](https://rdrr.io/r/base/character.html) vectors are supported,
and by default existing columns get the suffix `"2"` while the newly
joined columns get no suffix.
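
The suffix convention can be illustrated with base R alone. The sketch below uses [`merge()`](https://rdrr.io/r/base/merge.html) on two toy data frames; `isolates` and `lookup` are hypothetical names, and the code is not the package's internal implementation. It only mirrors the documented default `suffix = c("2", "")`.

``` r
# Toy data: both tables share the key `mo` and a non-key column `genus`
isolates <- data.frame(
  mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN"),
  genus = c("own label 1", "own label 2"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN"),
  genus = c("Escherichia", "Klebsiella"),
  stringsAsFactors = FALSE
)

# suffixes = c("2", "") mirrors the documented default `suffix`:
# the clashing column from `isolates` is renamed, the one from `lookup` is not
joined <- merge(
  isolates, lookup,
  by = "mo", all.x = TRUE,
  suffixes = c("2", "")
)
colnames(joined)
```

Here the pre-existing column comes back as `genus2`, while the column taken from `lookup` keeps the plain name `genus`.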
If the `dplyr` package is installed, its join functions will be used.
Otherwise, the much slower
[`merge()`](https://rdatatable.gitlab.io/data.table/reference/merge.html)
and [`interaction()`](https://rdrr.io/r/base/interaction.html) functions
from base R will be used.
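
As a rough picture of what the different join types mean without `dplyr`, the sketch below emulates a left, inner, semi and anti join with base R only. It is an illustration under assumptions, not the package's actual fallback code; `x`, `lookup` and the code `"NOT_IN_LOOKUP"` are made up for this example.

``` r
# Toy data: the third row of `x` has no match in `lookup`
x <- data.frame(
  id = 1:3,
  mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN", "NOT_IN_LOOKUP"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN"),
  genus = c("Escherichia", "Klebsiella"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of `x`, fill unmatched rows with NA
left <- merge(x, lookup, by = "mo", all.x = TRUE)

# Inner join: keep only rows with a match in both tables
inner <- merge(x, lookup, by = "mo")

# Semi join: rows of `x` that have a match, without adding columns
semi <- x[x$mo %in% lookup$mo, , drop = FALSE]

# Anti join: rows of `x` that have no match
anti <- x[!x$mo %in% lookup$mo, , drop = FALSE]
```

The semi and anti joins only filter `x`, which is why the corresponding functions above take no `suffix` argument.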
## Examples
``` r
left_join_microorganisms(as.mo("K. pneumoniae"))
#> # A tibble: 1 × 26
#> mo fullname status kingdom phylum class order family genus species
#> <mo> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 B_KLBSL_PNMN Klebsiell… accep… Bacter… Pseud… Gamm… Ente… Enter… Kleb… pneumo…
#> # 16 more variables: subspecies <chr>, rank <chr>, ref <chr>,
#> # oxygen_tolerance <chr>, source <chr>, lpsn <chr>, lpsn_parent <chr>,
#> # lpsn_renamed_to <chr>, mycobank <chr>, mycobank_parent <chr>,
#> # mycobank_renamed_to <chr>, gbif <chr>, gbif_parent <chr>,
#> # gbif_renamed_to <chr>, prevalence <dbl>, snomed <list>
left_join_microorganisms("B_KLBSL_PNMN")
#> # A tibble: 1 × 26
#> mo fullname status kingdom phylum class order family genus species
#> <mo> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 B_KLBSL_PNMN Klebsiell… accep… Bacter… Pseud… Gamm… Ente… Enter… Kleb… pneumo…
#> # 16 more variables: subspecies <chr>, rank <chr>, ref <chr>,
#> # oxygen_tolerance <chr>, source <chr>, lpsn <chr>, lpsn_parent <chr>,
#> # lpsn_renamed_to <chr>, mycobank <chr>, mycobank_parent <chr>,
#> # mycobank_renamed_to <chr>, gbif <chr>, gbif_parent <chr>,
#> # gbif_renamed_to <chr>, prevalence <dbl>, snomed <list>
df <- data.frame(
  date = seq(
    from = as.Date("2018-01-01"),
    to = as.Date("2018-01-07"),
    by = 1
  ),
  bacteria = as.mo(c(
    "S. aureus", "MRSA", "MSSA", "STAAUR",
    "E. coli", "E. coli", "E. coli"
  )),
  stringsAsFactors = FALSE
)
colnames(df)
#> [1] "date" "bacteria"
df_joined <- left_join_microorganisms(df, "bacteria")
colnames(df_joined)
#> [1] "date" "bacteria" "fullname"
#> [4] "status" "kingdom" "phylum"
#> [7] "class" "order" "family"
#> [10] "genus" "species" "subspecies"
#> [13] "rank" "ref" "oxygen_tolerance"
#> [16] "source" "lpsn" "lpsn_parent"
#> [19] "lpsn_renamed_to" "mycobank" "mycobank_parent"
#> [22] "mycobank_renamed_to" "gbif" "gbif_parent"
#> [25] "gbif_renamed_to" "prevalence" "snomed"
# \donttest{
if (require("dplyr")) {
  example_isolates %>%
    left_join_microorganisms() %>%
    colnames()
}
#> Joining, by = "mo"
#> [1] "date" "patient" "age"
#> [4] "gender" "ward" "mo"
#> [7] "PEN" "OXA" "FLC"
#> [10] "AMX" "AMC" "AMP"
#> [13] "TZP" "CZO" "FEP"
#> [16] "CXM" "FOX" "CTX"
#> [19] "CAZ" "CRO" "GEN"
#> [22] "TOB" "AMK" "KAN"
#> [25] "TMP" "SXT" "NIT"
#> [28] "FOS" "LNZ" "CIP"
#> [31] "MFX" "VAN" "TEC"
#> [34] "TCY" "TGC" "DOX"
#> [37] "ERY" "CLI" "AZM"
#> [40] "IPM" "MEM" "MTR"
#> [43] "CHL" "COL" "MUP"
#> [46] "RIF" "fullname" "status"
#> [49] "kingdom" "phylum" "class"
#> [52] "order" "family" "genus"
#> [55] "species" "subspecies" "rank"
#> [58] "ref" "oxygen_tolerance" "source"
#> [61] "lpsn" "lpsn_parent" "lpsn_renamed_to"
#> [64] "mycobank" "mycobank_parent" "mycobank_renamed_to"
#> [67] "gbif" "gbif_parent" "gbif_renamed_to"
#> [70] "prevalence" "snomed"
# }
```