# Join [microorganisms](https://amr-for-r.org/reference/microorganisms.md) to a Data Set Join the data set [microorganisms](https://amr-for-r.org/reference/microorganisms.md) easily to an existing data set or to a [character](https://rdrr.io/r/base/character.html) vector. ## Usage ``` r inner_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...) left_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...) right_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...) full_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...) semi_join_microorganisms(x, by = NULL, ...) anti_join_microorganisms(x, by = NULL, ...) ``` ## Arguments - x: Existing data set to join, or [character](https://rdrr.io/r/base/character.html) vector. In case of a [character](https://rdrr.io/r/base/character.html) vector, the resulting [data.frame](https://rdrr.io/r/base/data.frame.html) will contain a column 'x' with these values. - by: A variable to join by - if left empty will search for a column with class [`mo`](https://amr-for-r.org/reference/as.mo.md) (created with [`as.mo()`](https://amr-for-r.org/reference/as.mo.md)) or will be `"mo"` if that column name exists in `x`, could otherwise be a column name of `x` with values that exist in `microorganisms$mo` (such as `by = "bacteria_id"`), or another column in [microorganisms](https://amr-for-r.org/reference/microorganisms.md) (but then it should be named, like `by = c("bacteria_id" = "fullname")`). - suffix: If there are non-joined duplicate variables in `x` and `y`, these suffixes will be added to the output to disambiguate them. Should be a [character](https://rdrr.io/r/base/character.html) vector of length 2. - ...: Ignored, only in place to allow future extensions. ## Value a [data.frame](https://rdrr.io/r/base/data.frame.html) ## Details **Note:** As opposed to the `join()` functions of `dplyr`, [character](https://rdrr.io/r/base/character.html) vectors are supported and at default existing columns will get a suffix `"2"` and the newly joined columns will not get a suffix. If the `dplyr` package is installed, their join functions will be used. Otherwise, the much slower [`merge()`](https://rdrr.io/pkg/data.table/man/merge.html) and [`interaction()`](https://rdrr.io/r/base/interaction.html) functions from base R will be used. ## Examples ``` r left_join_microorganisms(as.mo("K. pneumoniae")) #> # A tibble: 1 × 28 #> mo fullname status domain kingdom phylum class order family genus #> #> 1 B_KLBSL_PNMN Klebsiella… accep… Bacte… Pseudo… Pseud… Gamm… Ente… Enter… Kleb… #> # ℹ 18 more variables: species , subspecies , rank , ref , #> # oxygen_tolerance , morphology , source , lpsn , #> # lpsn_parent , lpsn_renamed_to , mycobank , #> # mycobank_parent , mycobank_renamed_to , gbif , #> # gbif_parent , gbif_renamed_to , prevalence , snomed left_join_microorganisms("B_KLBSL_PNMN") #> # A tibble: 1 × 28 #> mo fullname status domain kingdom phylum class order family genus #> #> 1 B_KLBSL_PNMN Klebsiella… accep… Bacte… Pseudo… Pseud… Gamm… Ente… Enter… Kleb… #> # ℹ 18 more variables: species , subspecies , rank , ref , #> # oxygen_tolerance , morphology , source , lpsn , #> # lpsn_parent , lpsn_renamed_to , mycobank , #> # mycobank_parent , mycobank_renamed_to , gbif , #> # gbif_parent , gbif_renamed_to , prevalence , snomed df <- data.frame( date = seq( from = as.Date("2018-01-01"), to = as.Date("2018-01-07"), by = 1 ), bacteria = as.mo(c( "S. aureus", "MRSA", "MSSA", "STAAUR", "E. coli", "E. coli", "E. coli" )), stringsAsFactors = FALSE ) colnames(df) #> [1] "date" "bacteria" df_joined <- left_join_microorganisms(df, "bacteria") colnames(df_joined) #> [1] "date" "bacteria" "fullname" #> [4] "status" "domain" "kingdom" #> [7] "phylum" "class" "order" #> [10] "family" "genus" "species" #> [13] "subspecies" "rank" "ref" #> [16] "oxygen_tolerance" "morphology" "source" #> [19] "lpsn" "lpsn_parent" "lpsn_renamed_to" #> [22] "mycobank" "mycobank_parent" "mycobank_renamed_to" #> [25] "gbif" "gbif_parent" "gbif_renamed_to" #> [28] "prevalence" "snomed" # \donttest{ if (require("dplyr")) { example_isolates %>% left_join_microorganisms() %>% colnames() } #> Joining, by = "mo" #> [1] "date" "patient" "age" #> [4] "gender" "ward" "mo" #> [7] "PEN" "OXA" "FLC" #> [10] "AMX" "AMC" "AMP" #> [13] "TZP" "CZO" "FEP" #> [16] "CXM" "FOX" "CTX" #> [19] "CAZ" "CRO" "GEN" #> [22] "TOB" "AMK" "KAN" #> [25] "TMP" "SXT" "NIT" #> [28] "FOS" "LNZ" "CIP" #> [31] "MFX" "VAN" "TEC" #> [34] "TCY" "TGC" "DOX" #> [37] "ERY" "CLI" "AZM" #> [40] "IPM" "MEM" "MTR" #> [43] "CHL" "COL" "MUP" #> [46] "RIF" "fullname" "status" #> [49] "domain" "kingdom" "phylum" #> [52] "class" "order" "family" #> [55] "genus" "species" "subspecies" #> [58] "rank" "ref" "oxygen_tolerance" #> [61] "morphology" "source" "lpsn" #> [64] "lpsn_parent" "lpsn_renamed_to" "mycobank" #> [67] "mycobank_parent" "mycobank_renamed_to" "gbif" #> [70] "gbif_parent" "gbif_renamed_to" "prevalence" #> [73] "snomed" # } ```