Support for German and Spanish microorganism properties, cleanup

2026-06-19 10:56:24 +02:00 · 2018-09-04 11:33:30 +02:00
parent 5405002119
commit b388e3fee7
20 changed files with 256 additions and 157 deletions
--- a/4
+++ b/4
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.3.0.9006
-Date: 2018-09-01
+Version: 0.3.0.9007
+Date: 2018-09-04
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
    person(
--- a/2
+++ b/2
@@ -94,12 +94,10 @@ export(mo_family)
 export(mo_fullname)
 export(mo_genus)
 export(mo_gramstain)
-export(mo_gramstain_nl)
 export(mo_property)
 export(mo_species)
 export(mo_subspecies)
 export(mo_type)
-export(mo_type_nl)
 export(n_rsi)
 export(p.symbol)
 export(portion_I)
--- a/NEWS.md
+++ b/NEWS.md
@@ -10,7 +10,15 @@
  * Column names of datasets `microorganisms` and `septic_patients`
  * All old syntaxes will still work with this version, but will throw warnings
 * Functions `as.atc` and `is.atc` to transform/look up antibiotic ATC codes as defined by the WHO. The existing function `guess_atc` is now an alias of `as.atc`.
-* Aliases for existing function `mo_property`: `mo_family`, `mo_genus`, `mo_species`, `mo_subspecies`, `mo_fullname`, `mo_type`, `mo_gramstain`, `mo_aerobic`, `mo_type_nl` and `mo_gramstain_nl`
+* Aliases for existing function `mo_property`: `mo_family`, `mo_genus`, `mo_species`, `mo_subspecies`, `mo_fullname`, `mo_aerobic`, `mo_type` and `mo_gramstain`. The functions `mo_type` and `mo_gramstain` have a `language` parameter, with support for Spanish, German and Dutch:
+  ```r
+  mo_gramstain("E. coli")
+  # [1] "Negative rods"
+  mo_gramstain("E. coli", language = "de") # "de" = Deutsch / German
+  # [1] "Negative Staebchen"
+  mo_gramstain("E. coli", language = "es") # "es" = Español / Spanish
+  # [1] "Bacilos negativos"
+  ```
 * Function `ab_property` and its aliases: `ab_official`, `ab_tradenames`, `ab_certe`, `ab_umcg`, `ab_official_nl` and `ab_trivial_nl`
 * Introduction to AMR as a vignette

--- a/R/ab_property.R
+++ b/R/ab_property.R
@@ -36,7 +36,7 @@
 ab_property <- function(x, property = 'official') {
  property <- property[1]
  if (!property %in% colnames(antibiotics)) {
-    stop("invalid property: ", property, " - use a column name of `antibiotics`")
+    stop("invalid property: ", property, " - use a column name of the `antibiotics` data set")
  }
  if (!is.atc(x)) {
    x <- as.atc(x) # this will give a warning if x cannot be coerced
--- a/R/atc.R
+++ b/R/atc.R
@@ -17,9 +17,9 @@
 # ==================================================================== #


-#' Find ATC code based on antibiotic property
+#' Transform to ATC code
 #'
-#' Use this function to determine the ATC code of one or more antibiotics. The dataset \code{\link{antibiotics}} will be searched for abbreviations, official names and trade names.
+#' Use this function to determine the ATC code of one or more antibiotics. The data set \code{\link{antibiotics}} will be searched for abbreviations, official names and trade names.
 #' @param x character vector to determine \code{ATC} code
 #' @rdname as.atc
 #' @aliases atc
--- a/R/data.R
+++ b/R/data.R
@@ -123,7 +123,7 @@
 #' Data set with human pathogenic microorganisms
 #'
 #' A data set containing 2,664 (potential) human pathogenic microorganisms. MO codes can be looked up using \code{\link{guess_mo}}.
-#' @format A \code{\link{tibble}} with 2,664 observations and 12 variables:
+#' @format A \code{\link{tibble}} with 2,664 observations and 16 variables:
 #' \describe{
 #'   \item{\code{mo}}{ID of microorganism}
 #'   \item{\code{bactsys}}{Bactsyscode of microorganism}
@@ -132,11 +132,15 @@
 #'   \item{\code{species}}{Species name of microorganism, like \code{"coli"}}
 #'   \item{\code{subspecies}}{Subspecies name of bio-/serovar of microorganism, like \code{"EHEC"}}
 #'   \item{\code{fullname}}{Full name, like \code{"Escherichia coli (EHEC)"}}
+#'   \item{\code{aerobic}}{Logical indicating whether the bacterium is aerobic}
 #'   \item{\code{type}}{Type of microorganism, like \code{"Bacteria"} and \code{"Fungus/yeast"}}
 #'   \item{\code{gramstain}}{Gram stain of microorganism, like \code{"Negative rods"}}
-#'   \item{\code{aerobic}}{Logical whether bacteria is aerobic}
+#'   \item{\code{type_de}}{Type of microorganism in German, like \code{"Bakterien"} and \code{"Pilz/Hefe"}}
+#'   \item{\code{gramstain_de}}{Gram stain of microorganism in German, like \code{"Negative Staebchen"}}
 #'   \item{\code{type_nl}}{Type of microorganism in Dutch, like \code{"Bacterie"} and \code{"Schimmel/gist"}}
 #'   \item{\code{gramstain_nl}}{Gram stain of microorganism in Dutch, like \code{"Negatieve staven"}}
+#'   \item{\code{type_es}}{Type of microorganism in Spanish, like \code{"Bacteria"} and \code{"Hongo/levadura"}}
+#'   \item{\code{gramstain_es}}{Gram stain of microorganism in Spanish, like \code{"Bacilos negativos"}}
 #' }
 #  source MOLIS (LIS of Certe) - \url{https://www.certe.nl}
 # new <- microorganisms %>% filter(genus == "Bacteroides") %>% .[1,]
--- a/R/mo.R
+++ b/R/mo.R
@@ -19,15 +19,19 @@
 #' Transform to microorganism ID
 #'
 #' Use this function to determine a valid ID based on a genus (and species). This input can be a full name (like \code{"Staphylococcus aureus"}), an abbreviated name (like \code{"S. aureus"}), or just a genus. You could also \code{\link{select}} a genus and species column, see Examples.
-#' @param x a character vector or a dataframe with one or two columns
-#' @param Becker a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1]. This excludes \emph{Staphylococcus aureus} at default, use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".
-#' @param Lancefield a logical to indicate whether beta-haemolytic \emph{Streptococci} should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield [2]. These \emph{Streptococci} will be categorised in their first group, i.e. \emph{Streptococcus dysgalactiae} will be group C, although officially it was also categorised into groups G and L. Groups D and E will be ignored, since they are \emph{Enterococci}.
+#' @param x a character vector or a \code{data.frame} with one or two columns
+#' @param Becker a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1].
+#'
+#'   This excludes \emph{Staphylococcus aureus} by default; use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".
+#' @param Lancefield a logical to indicate whether beta-haemolytic \emph{Streptococci} should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield [2]. These \emph{Streptococci} will be categorised in their first group, e.g. \emph{Streptococcus dysgalactiae} will be group C, although officially it was also categorised into groups G and L.
+#'
+#'   This excludes \emph{Enterococci}, which belong to group D, by default; use \code{Lancefield = "all"} to also categorise all \emph{Enterococci} as group D.
 #' @rdname as.mo
 #' @aliases mo
 #' @keywords mo Becker becker Lancefield lancefield guess
 #' @details \code{guess_mo} is an alias of \code{as.mo}.
 #'
-#' Use the \code{\link{mo_property}} functions to get properties based on the returned mo, see Examples.
+#' Use the \code{\link{mo_property}} functions to get properties based on the returned code, see Examples.
 #'
 #' Some exceptions have been built in to get more logical results, based on prevalence of human pathogens. These are:
 #' \itemize{
@@ -39,10 +43,9 @@
 #' Moreover, this function also supports IDs based only on Gram stain, when the species is not known. \cr
 #' For example, \code{"Gram negative rods"} and \code{"GNR"} will both return the ID of a Gram negative rod: \code{GNR}.
 #' @source
-#' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \cr
-#'     \url{https://dx.doi.org/10.1128/CMR.00109-13} \cr
-#' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \cr
-#'     \url{https://dx.doi.org/10.1084/jem.57.4.571}
+#' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \url{https://dx.doi.org/10.1128/CMR.00109-13}
+#'
+#' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
 #' @export
 #' @importFrom dplyr %>% pull left_join
 #' @return Character (vector) with class \code{"mo"}. Unknown values will return \code{NA}.
@@ -63,7 +66,7 @@
 #' guess_mo("S. epidermidis")                 # will remain species: STAEPI
 #' guess_mo("S. epidermidis", Becker = TRUE)  # will not remain species: STACNS
 #'
-#' guess_mo("S. pyogenes")                    # will remain species: STCAGA
+#' guess_mo("S. pyogenes")                    # will remain species: STCPYO
 #' guess_mo("S. pyogenes", Lancefield = TRUE) # will not remain species: STCGRA
 #'
 #' # Use mo_* functions to get a specific property based on `mo`
@@ -177,10 +180,17 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
    if (tolower(x[i]) %like% 'coagulase negative'
        | tolower(x[i]) %like% 'cns'
        | tolower(x[i]) %like% 'cons') {
-      # coerce S. coagulase negative, also as CNS and CoNS
+      # coerce S. coagulase negative
      x[i] <- 'STACNS'
      next
    }
+    if (tolower(x[i]) %like% 'coagulase positive'
+        | tolower(x[i]) %like% 'cps'
+        | tolower(x[i]) %like% 'cops') {
+      # coerce S. coagulase positive
+      x[i] <- 'STACPS'
+      next
+    }

    # translate known trivial names to genus+species
    if (!is.na(x_trimmed[i])) {
@@ -204,7 +214,7 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
        next
      }
      if (toupper(x_trimmed[i]) %in% c('PISP', 'PRSP', 'VISP', 'VRSP')) {
-        # peni R, peni I, vanco I, vanco R: S. pneumoniae
+        # peni I, peni R, vanco I, vanco R: S. pneumoniae
        x[i] <- 'STCPNE'
        next
      }
@@ -327,7 +337,7 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
    }
  }

-  if (Lancefield == TRUE) {
+  if (Lancefield == TRUE | Lancefield == "all") {
    # group A
    x[x == "STCPYO"] <- "STCGRA" # S. pyogenes
    # group B
@@ -338,6 +348,9 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
                                              "zooepidemicus", "dysgalactiae")) %>%
      pull(mo)
     x[x %in% S_groupC] <- "STCGRC" # group C, e.g. S. dysgalactiae
+    if (Lancefield == "all") {
+      x[substr(x, 1, 3) == "ENC"] <- "STCGRD" # all Enterococci
+    }
    # group F
    x[x == "STCANG"] <- "STCGRF" # S. anginosus
    # group H
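The new `Lancefield = "all"` branch adds one rule to the existing recoding: every code starting with `ENC` (the *Enterococci*) becomes group D. The sketch below isolates that logic. The function name `recode_lancefield` and the example codes `"ENCFAE"` and `"ESCCOL"` are hypothetical, and only the group A rule and the new group D rule are reproduced. It uses `isTRUE()` and `identical()` instead of `==`; for `TRUE`, `FALSE` and `"all"` this behaves the same, but the condition always stays a single `TRUE`/`FALSE`.

```r
# Hypothetical standalone sketch of the Lancefield recoding in as.mo();
# only the group A rule and the new group D rule are reproduced here
recode_lancefield <- function(x, Lancefield = FALSE) {
  if (isTRUE(Lancefield) || identical(Lancefield, "all")) {
    # group A: S. pyogenes
    x[x == "STCPYO"] <- "STCGRA"
    if (identical(Lancefield, "all")) {
      # group D: every code starting with "ENC" (Enterococci)
      x[substr(x, 1, 3) == "ENC"] <- "STCGRD"
    }
  }
  x
}

# example codes (assumed, for illustration only)
codes <- c("STCPYO", "ENCFAE", "ESCCOL")
recode_lancefield(codes)                      # "STCPYO" "ENCFAE" "ESCCOL"
recode_lancefield(codes, Lancefield = TRUE)   # "STCGRA" "ENCFAE" "ESCCOL"
recode_lancefield(codes, Lancefield = "all")  # "STCGRA" "STCGRD" "ESCCOL"
```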
--- a/R/mo_property.R
+++ b/R/mo_property.R
@@ -18,59 +18,80 @@

 #' Property of a microorganism
 #'
-#' Use these functions to return a specific property of a microorganism from the \code{\link{microorganisms}} data set, based on their \code{mo}. Get such an ID with \code{\link{as.mo}}.
-#' @param x a (vector of a) valid \code{\link{mo}} or any text that can be coerced to a valid microorganism code with \code{\link{as.mo}}
+#' Use these functions to return a specific property of a microorganism from the \code{\link{microorganisms}} data set. All input values will be evaluated internally with \code{\link{as.mo}}.
+#' @param x any (vector of) text that can be coerced to a valid microorganism code with \code{\link{as.mo}}
 #' @param property one of the column names of the \code{\link{microorganisms}} data set, like \code{"mo"}, \code{"bactsys"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"fullname"}, \code{"gramstain"} and \code{"aerobic"}
+#' @inheritParams as.mo
+#' @param language language of the returned text, one of \code{"en"} (English), \code{"de"} (German), \code{"nl"} (Dutch) or \code{"es"} (Spanish)
+#' @source
+#' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \url{https://dx.doi.org/10.1128/CMR.00109-13}
+#'
+#' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
 #' @rdname mo_property
 #' @export
 #' @importFrom dplyr %>% left_join pull
 #' @seealso \code{\link{microorganisms}}
 #' @examples
 #' # All properties
-#' mo_family("E. coli")       # Enterobacteriaceae
-#' mo_genus("E. coli")        # Escherichia
-#' mo_species("E. coli")      # coli
-#' mo_subspecies("E. coli")   # <NA>
-#' mo_fullname("E. coli")     # Escherichia coli
-#' mo_type("E. coli")         # Bacteria
-#' mo_gramstain("E. coli")    # Negative rods
-#' mo_aerobic("E. coli")      # TRUE
-#' mo_type_nl("E. coli")      # Bacterie
-#' mo_gramstain_nl("E. coli") # Negatieve staven
+#' mo_family("E. coli")          # "Enterobacteriaceae"
+#' mo_genus("E. coli")           # "Escherichia"
+#' mo_species("E. coli")         # "coli"
+#' mo_subspecies("E. coli")      # <NA>
+#' mo_fullname("E. coli")        # "Escherichia coli"
+#' mo_type("E. coli")            # "Bacteria"
+#' mo_gramstain("E. coli")       # "Negative rods"
+#' mo_aerobic("E. coli")         # TRUE
+#'
+#' # language support for Spanish, German and Dutch
+#' mo_type("E. coli", "es")      # "Bacteria"
+#' mo_type("E. coli", "de")      # "Bakterien"
+#' mo_type("E. coli", "nl")      # "Bacterie"
+#' mo_gramstain("E. coli", "es") # "Bacilos negativos"
+#' mo_gramstain("E. coli", "de") # "Negative Staebchen"
+#' mo_gramstain("E. coli", "nl") # "Negatieve staven"
 #'
 #'
 #' # Abbreviations known in the field
-#' mo_genus("EHEC")           # Escherichia
-#' mo_species("EHEC")         # coli
-#' mo_subspecies("EHEC")      # EHEC
-#' mo_fullname("EHEC")        # Escherichia coli (EHEC)
+#' mo_genus("MRSA")              # "Staphylococcus"
+#' mo_species("MRSA")            # "aureus"
+#' mo_gramstain("MRSA")          # "Positive cocci"
 #'
-#' mo_genus("MRSA")           # Staphylococcus
-#' mo_species("MRSA")         # aureus
-#' mo_gramstain("MRSA")       # Positive cocci
-#'
-#' mo_genus("VISA")           # Staphylococcus
-#' mo_species("VISA")         # aureus
+#' mo_genus("VISA")              # "Staphylococcus"
+#' mo_species("VISA")            # "aureus"
 #'
 #'
 #' # Known subspecies
-#' mo_genus("doylei")         # Campylobacter
-#' mo_species("doylei")       # jejuni
-#' mo_fullname("doylei")      # Campylobacter jejuni (doylei)
+#' mo_genus("EHEC")              # "Escherichia"
+#' mo_species("EHEC")            # "coli"
+#' mo_subspecies("EHEC")         # "EHEC"
+#' mo_fullname("EHEC")           # "Escherichia coli (EHEC)"
+#'
+#' mo_genus("doylei")            # "Campylobacter"
+#' mo_species("doylei")          # "jejuni"
+#' mo_fullname("doylei")         # "Campylobacter jejuni (doylei)"
+#'
+#' mo_fullname("K. pneu rh")     # "Klebsiella pneumoniae (rhinoscleromatis)"
 #'
 #'
 #' # Anaerobic bacteria
-#' mo_genus("B. fragilis")    # Bacteroides
-#' mo_species("B. fragilis")  # fragilis
-#' mo_aerobic("B. fragilis")  # FALSE
-mo_property <- function(x, property = 'fullname') {
-  property <- property[1]
+#' mo_genus("B. fragilis")       # "Bacteroides"
+#' mo_species("B. fragilis")     # "fragilis"
+#' mo_aerobic("B. fragilis")     # FALSE
+#'
+#'
+#' # Becker classification, see ?as.mo
+#' mo_fullname("S. epidermidis")                 # "Staphylococcus epidermidis"
+#' mo_fullname("S. epidermidis", Becker = TRUE)  # "Coagulase Negative Staphylococcus (CoNS)"
+#'
+#' # Lancefield classification, see ?as.mo
+#' mo_fullname("S. pyogenes")                    # "Streptococcus pyogenes"
+#' mo_fullname("S. pyogenes", Lancefield = TRUE) # "Streptococcus group A"
+mo_property <- function(x, property = 'fullname', Becker = FALSE, Lancefield = FALSE) {
+  property <- tolower(property[1])
  if (!property %in% colnames(microorganisms)) {
-    stop("invalid property: ", property, " - use a column name of `microorganisms`")
-  }
-  if (!is.mo(x)) {
-    x <- as.mo(x) # this will give a warning if x cannot be coerced
+    stop("invalid property: ", property, " - use a column name of the `microorganisms` data set")
  }
+  x <- as.mo(x = x, Becker = Becker, Lancefield = Lancefield) # this will give a warning if x cannot be coerced
  suppressWarnings(
    data.frame(mo = x, stringsAsFactors = FALSE) %>%
      left_join(AMR::microorganisms, by = "mo") %>%
@@ -92,32 +113,32 @@ mo_genus <- function(x) {

 #' @rdname mo_property
 #' @export
-mo_species <- function(x) {
-  mo_property(x, "species")
+mo_species <- function(x, Becker = FALSE, Lancefield = FALSE) {
+  mo_property(x, "species", Becker = Becker, Lancefield = Lancefield)
 }

 #' @rdname mo_property
 #' @export
-mo_subspecies <- function(x) {
-  mo_property(x, "subspecies")
+mo_subspecies <- function(x, Becker = FALSE, Lancefield = FALSE) {
+  mo_property(x, "subspecies", Becker = Becker, Lancefield = Lancefield)
 }

 #' @rdname mo_property
 #' @export
-mo_fullname <- function(x) {
-  mo_property(x, "fullname")
+mo_fullname <- function(x, Becker = FALSE, Lancefield = FALSE) {
+  mo_property(x, "fullname", Becker = Becker, Lancefield = Lancefield)
 }

 #' @rdname mo_property
 #' @export
-mo_type <- function(x) {
-  mo_property(x, "type")
+mo_type <- function(x, language = "en") {
+  mo_property(x, paste0("type", checklang(language)))
 }

 #' @rdname mo_property
 #' @export
-mo_gramstain <- function(x) {
-  mo_property(x, "gramstain")
+mo_gramstain <- function(x, language = "en") {
+  mo_property(x, paste0("gramstain", checklang(language)))
 }

 #' @rdname mo_property
@@ -126,14 +147,15 @@ mo_aerobic <- function(x) {
  mo_property(x, "aerobic")
 }

-#' @rdname mo_property
-#' @export
-mo_type_nl <- function(x) {
-  mo_property(x, "type_nl")
-}
-
-#' @rdname mo_property
-#' @export
-mo_gramstain_nl <- function(x) {
-  mo_property(x, "gramstain_nl")
+checklang <- function(language) {
+  # returns the column suffix for the requested language, e.g. "_de"
+  language <- tolower(language[1])
+  supported <- c("en", "de", "nl", "es")
+  if (length(language) == 0 || is.na(language) || language %in% c("", "en")) {
+    return("") # NULL, NA, "" and "en" all mean English: no suffix
+  }
+  if (!language %in% supported) {
+    stop("invalid language: ", language, " - use one of ", paste0("'", sort(supported), "'", collapse = ", "), call. = FALSE)
+  }
+  paste0("_", language)
 }
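With this change, `mo_type` and `mo_gramstain` no longer need separate `_nl` variants: `checklang` turns a language code into a column suffix (`""` for English, otherwise `"_de"`, `"_nl"` or `"_es"`) and `mo_property` reads the matching column. The sketch below shows that lookup on its own. The one-row data frame `toy_mo` and the functions `lang_suffix` and `toy_gramstain` are hypothetical stand-ins (the row only reuses the *E. coli* Gram stain values quoted above), and base R `match()` replaces the `left_join()` that `mo_property` uses. In this sketch `NULL`, `NA` and `""` fall back to English.

```r
# Hypothetical sketch: language code -> column suffix -> column lookup
lang_suffix <- function(language = "en") {
  language <- tolower(language[1])
  supported <- c("en", "de", "nl", "es")
  if (length(language) == 0 || is.na(language) || language %in% c("", "en")) {
    return("")  # English columns have no suffix
  }
  if (!language %in% supported) {
    stop("invalid language: ", language, " - use one of ",
         paste0("'", sort(supported), "'", collapse = ", "), call. = FALSE)
  }
  paste0("_", language)
}

# toy stand-in for the microorganisms data set (one row only)
toy_mo <- data.frame(
  mo           = "ESCCOL",
  gramstain    = "Negative rods",
  gramstain_de = "Negative Staebchen",
  gramstain_nl = "Negatieve staven",
  gramstain_es = "Bacilos negativos",
  stringsAsFactors = FALSE
)

toy_gramstain <- function(mo, language = "en") {
  column <- paste0("gramstain", lang_suffix(language))
  toy_mo[match(mo, toy_mo$mo), column]
}

toy_gramstain("ESCCOL")        # "Negative rods"
toy_gramstain("ESCCOL", "DE")  # "Negative Staebchen"
toy_gramstain("ESCCOL", NULL)  # "Negative rods"
# lang_suffix("fr") stops with: invalid language: fr - use one of 'de', 'en', 'es', 'nl'
```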
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.mo` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of 2,664 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. As they use `as.mo` internally, they also use its artificial intelligence to recognise abbreviations and informal names. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. The functions `mo_type` and `mo_gramstain` can also return results in Spanish, German and Dutch. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. Just as the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally, so they also use AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.
--- a/data/microorganisms.rda
+++ b/data/microorganisms.rda
--- a/man/as.atc.Rd
+++ b/man/as.atc.Rd
@@ -5,7 +5,7 @@
 \alias{atc}
 \alias{guess_atc}
 \alias{is.atc}
-\title{Find ATC code based on antibiotic property}
+\title{Transform to ATC code}
 \usage{
 as.atc(x)

@@ -20,7 +20,7 @@ is.atc(x)
 Character (vector) with class \code{"atc"}. Unknown values will return \code{NA}.
 }
 \description{
-Use this function to determine the ATC code of one or more antibiotics. The dataset \code{\link{antibiotics}} will be searched for abbreviations, official names and trade names.
+Use this function to determine the ATC code of one or more antibiotics. The data set \code{\link{antibiotics}} will be searched for abbreviations, official names and trade names.
 }
 \details{
 Use the \code{\link{ab_property}} functions to get properties based on the returned ATC code, see Examples.
--- a/man/as.mo.Rd
+++ b/man/as.mo.Rd
@@ -7,10 +7,9 @@
 \alias{guess_mo}
 \title{Transform to microorganism ID}
 \source{
-[1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \cr
-    \url{https://dx.doi.org/10.1128/CMR.00109-13} \cr
-[2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \cr
-    \url{https://dx.doi.org/10.1084/jem.57.4.571}
+[1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \url{https://dx.doi.org/10.1128/CMR.00109-13}
+
+[2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
 }
 \usage{
 as.mo(x, Becker = FALSE, Lancefield = FALSE)
@@ -20,11 +19,15 @@ is.mo(x)
 guess_mo(x, Becker = FALSE, Lancefield = FALSE)
 }
 \arguments{
-\item{x}{a character vector or a dataframe with one or two columns}
+\item{x}{a character vector or a \code{data.frame} with one or two columns}

-\item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1]. This excludes \emph{Staphylococcus aureus} at default, use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".}
+\item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1].

-\item{Lancefield}{a logical to indicate whether beta-haemolytic \emph{Streptococci} should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield [2]. These \emph{Streptococci} will be categorised in their first group, i.e. \emph{Streptococcus dysgalactiae} will be group C, although officially it was also categorised into groups G and L. Groups D and E will be ignored, since they are \emph{Enterococci}.}
+  This excludes \emph{Staphylococcus aureus} by default; use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".}
+
+\item{Lancefield}{a logical to indicate whether beta-haemolytic \emph{Streptococci} should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield [2]. These \emph{Streptococci} will be categorised in their first group, e.g. \emph{Streptococcus dysgalactiae} will be group C, although officially it was also categorised into groups G and L.
+
+  This excludes \emph{Enterococci}, which belong to group D, by default; use \code{Lancefield = "all"} to also categorise all \emph{Enterococci} as group D.}
 }
 \value{
 Character (vector) with class \code{"mo"}. Unknown values will return \code{NA}.
@@ -35,7 +38,7 @@ Use this function to determine a valid ID based on a genus (and species). This i
 \details{
 \code{guess_mo} is an alias of \code{as.mo}.

-Use the \code{\link{mo_property}} functions to get properties based on the returned mo, see Examples.
+Use the \code{\link{mo_property}} functions to get properties based on the returned code, see Examples.

 Some exceptions have been built in to get more logical results, based on prevalence of human pathogens. These are:
 \itemize{
@@ -63,7 +66,7 @@ as.mo("VRSA") # Vancomycin Resistant S. aureus
 guess_mo("S. epidermidis")                 # will remain species: STAEPI
 guess_mo("S. epidermidis", Becker = TRUE)  # will not remain species: STACNS

-guess_mo("S. pyogenes")                    # will remain species: STCAGA
+guess_mo("S. pyogenes")                    # will remain species: STCPYO
 guess_mo("S. pyogenes", Lancefield = TRUE) # will not remain species: STCGRA

 # Use mo_* functions to get a specific property based on `mo`
--- a/man/microorganisms.Rd
+++ b/man/microorganisms.Rd
@@ -4,7 +4,7 @@
 \name{microorganisms}
 \alias{microorganisms}
 \title{Data set with human pathogenic microorganisms}
-\format{A \code{\link{tibble}} with 2,664 observations and 12 variables:
+\format{A \code{\link{tibble}} with 2,664 observations and 16 variables:
 \describe{
  \item{\code{mo}}{ID of microorganism}
  \item{\code{bactsys}}{Bactsyscode of microorganism}
@@ -13,11 +13,15 @@
  \item{\code{species}}{Species name of microorganism, like \code{"coli"}}
  \item{\code{subspecies}}{Subspecies name of bio-/serovar of microorganism, like \code{"EHEC"}}
 \item{\code{fullname}}{Full name, like \code{"Escherichia coli (EHEC)"}}
+  \item{\code{aerobic}}{Logical indicating whether the bacterium is aerobic}
 \item{\code{type}}{Type of microorganism, like \code{"Bacteria"} and \code{"Fungus/yeast"}}
 \item{\code{gramstain}}{Gram stain of microorganism, like \code{"Negative rods"}}
-  \item{\code{aerobic}}{Logical whether bacteria is aerobic}
+  \item{\code{type_de}}{Type of microorganism in German, like \code{"Bakterien"} and \code{"Pilz/Hefe"}}
+  \item{\code{gramstain_de}}{Gram stain of microorganism in German, like \code{"Negative Staebchen"}}
 \item{\code{type_nl}}{Type of microorganism in Dutch, like \code{"Bacterie"} and \code{"Schimmel/gist"}}
 \item{\code{gramstain_nl}}{Gram stain of microorganism in Dutch, like \code{"Negatieve staven"}}
+  \item{\code{type_es}}{Type of microorganism in Spanish, like \code{"Bacteria"} and \code{"Hongo/levadura"}}
+  \item{\code{gramstain_es}}{Gram stain of microorganism in Spanish, like \code{"Bacilos negativos"}}
 }}
 \usage{
 microorganisms
--- a/man/mo_property.Rd
+++ b/man/mo_property.Rd
@@ -10,78 +10,105 @@
 \alias{mo_type}
 \alias{mo_gramstain}
 \alias{mo_aerobic}
-\alias{mo_type_nl}
-\alias{mo_gramstain_nl}
 \title{Property of a microorganism}
+\source{
+[1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \url{https://dx.doi.org/10.1128/CMR.00109-13}
+
+[2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
+}
 \usage{
-mo_property(x, property = "fullname")
+mo_property(x, property = "fullname", Becker = FALSE,
+  Lancefield = FALSE)

 mo_family(x)

 mo_genus(x)

-mo_species(x)
+mo_species(x, Becker = FALSE, Lancefield = FALSE)

-mo_subspecies(x)
+mo_subspecies(x, Becker = FALSE, Lancefield = FALSE)

-mo_fullname(x)
+mo_fullname(x, Becker = FALSE, Lancefield = FALSE)

-mo_type(x)
+mo_type(x, language = "en")

-mo_gramstain(x)
+mo_gramstain(x, language = "en")

 mo_aerobic(x)
-
-mo_type_nl(x)
-
-mo_gramstain_nl(x)
 }
 \arguments{
-\item{x}{a (vector of a) valid \code{\link{mo}} or any text that can be coerced to a valid microorganism code with \code{\link{as.mo}}}
+\item{x}{any (vector of) text that can be coerced to a valid microorganism code with \code{\link{as.mo}}}

 \item{property}{one of the column names of the \code{\link{microorganisms}} data set, like \code{"mo"}, \code{"bactsys"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"fullname"}, \code{"gramstain"} and \code{"aerobic"}}
+
+\item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1].
+
+  This excludes \emph{Staphylococcus aureus} by default; use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".}
+
+\item{Lancefield}{a logical to indicate whether beta-haemolytic \emph{Streptococci} should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield [2]. These \emph{Streptococci} will be categorised in their first group, e.g. \emph{Streptococcus dysgalactiae} will be group C, although officially it was also categorised into groups G and L.
+
+  This excludes \emph{Enterococci}, which belong to group D, by default; use \code{Lancefield = "all"} to also categorise all \emph{Enterococci} as group D.}
+
+\item{language}{language of the returned text, one of \code{"en"} (English), \code{"de"} (German), \code{"nl"} (Dutch) or \code{"es"} (Spanish)}
 }
 \description{
-Use these functions to return a specific property of a microorganism from the \code{\link{microorganisms}} data set, based on their \code{mo}. Get such an ID with \code{\link{as.mo}}.
+Use these functions to return a specific property of a microorganism from the \code{\link{microorganisms}} data set. All input values will be evaluated internally with \code{\link{as.mo}}.
 }
 \examples{
 # All properties
-mo_family("E. coli")       # Enterobacteriaceae
-mo_genus("E. coli")        # Escherichia
-mo_species("E. coli")      # coli
-mo_subspecies("E. coli")   # <NA>
-mo_fullname("E. coli")     # Escherichia coli
-mo_type("E. coli")         # Bacteria
-mo_gramstain("E. coli")    # Negative rods
-mo_aerobic("E. coli")      # TRUE
-mo_type_nl("E. coli")      # Bacterie
-mo_gramstain_nl("E. coli") # Negatieve staven
+mo_family("E. coli")          # "Enterobacteriaceae"
+mo_genus("E. coli")           # "Escherichia"
+mo_species("E. coli")         # "coli"
+mo_subspecies("E. coli")      # <NA>
+mo_fullname("E. coli")        # "Escherichia coli"
+mo_type("E. coli")            # "Bacteria"
+mo_gramstain("E. coli")       # "Negative rods"
+mo_aerobic("E. coli")         # TRUE
+
+# language support for Spanish, German and Dutch
+mo_type("E. coli", "es")      # "Bacteria"
+mo_type("E. coli", "de")      # "Bakterien"
+mo_type("E. coli", "nl")      # "Bacterie"
+mo_gramstain("E. coli", "es") # "Bacilos negativos"
+mo_gramstain("E. coli", "de") # "Negative Staebchen"
+mo_gramstain("E. coli", "nl") # "Negatieve staven"


 # Abbreviations known in the field
-mo_genus("EHEC")           # Escherichia
-mo_species("EHEC")         # coli
-mo_subspecies("EHEC")      # EHEC
-mo_fullname("EHEC")        # Escherichia coli (EHEC)
+mo_genus("MRSA")              # "Staphylococcus"
+mo_species("MRSA")            # "aureus"
+mo_gramstain("MRSA")          # "Positive cocci"

-mo_genus("MRSA")           # Staphylococcus
-mo_species("MRSA")         # aureus
-mo_gramstain("MRSA")       # Positive cocci
-
-mo_genus("VISA")           # Staphylococcus
-mo_species("VISA")         # aureus
+mo_genus("VISA")              # "Staphylococcus"
+mo_species("VISA")            # "aureus"


 # Known subspecies
-mo_genus("doylei")         # Campylobacter
-mo_species("doylei")       # jejuni
-mo_fullname("doylei")      # Campylobacter jejuni (doylei)
+mo_genus("EHEC")              # "Escherichia"
+mo_species("EHEC")            # "coli"
+mo_subspecies("EHEC")         # "EHEC"
+mo_fullname("EHEC")           # "Escherichia coli (EHEC)"
+
+mo_genus("doylei")            # "Campylobacter"
+mo_species("doylei")          # "jejuni"
+mo_fullname("doylei")         # "Campylobacter jejuni (doylei)"
+
+mo_fullname("K. pneu rh")     # "Klebsiella pneumoniae (rhinoscleromatis)"


 # Anaerobic bacteria
-mo_genus("B. fragilis")    # Bacteroides
-mo_species("B. fragilis")  # fragilis
-mo_aerobic("B. fragilis")  # FALSE
+mo_genus("B. fragilis")       # "Bacteroides"
+mo_species("B. fragilis")     # "fragilis"
+mo_aerobic("B. fragilis")     # FALSE
+
+
+# Becker classification, see ?as.mo
+mo_fullname("S. epidermidis")                 # "Staphylococcus epidermidis"
+mo_fullname("S. epidermidis", Becker = TRUE)  # "Coagulase Negative Staphylococcus (CoNS)"
+
+# Lancefield classification, see ?as.mo
+mo_fullname("S. pyogenes")                    # "Streptococcus pyogenes"
+mo_fullname("S. pyogenes", Lancefield = TRUE) # "Streptococcus group A"
 }
 \seealso{
 \code{\link{microorganisms}}
--- a/tests/testthat/test-ab_property.R
+++ b/tests/testthat/test-ab_property.R
@@ -8,4 +8,7 @@ test_that("ab_property works", {
  expect_equal(ab_umcg("amox"), "AMOX")
  expect_equal(class(ab_tradenames("amox")), "character")
  expect_equal(class(ab_tradenames(c("amox", "amox"))), "list")
+  expect_equal(ab_atc("amox"), as.character(as.atc("amox")))
+
+  expect_error(ab_property("amox", "invalid property"))
 })
--- a/tests/testthat/test-deprecated.R
+++ b/tests/testthat/test-deprecated.R
@@ -16,9 +16,11 @@ test_that("deprecated functions work", {

  old_mo <- "ESCCOL"
  class(old_mo) <- "bactid"
+  df_oldmo <- data.frame(test = old_mo)
  # print
  expect_output(print(old_mo))
-  # test data.frame and pull
-  expect_equal(as.character(dplyr::pull(data.frame(test = old_mo), test)), "ESCCOL")
+  # test pull
+  library(dplyr)
+  expect_identical(df_oldmo %>% pull(test), old_mo)

 })
--- a/tests/testthat/test-mo.R
+++ b/tests/testthat/test-mo.R
@@ -12,7 +12,7 @@ test_that("as.mo works", {
  expect_equal(as.character(as.mo("klpn")), "KLEPNE")
  expect_equal(as.character(as.mo("Klebsiella")), "KLE")
  expect_equal(as.character(as.mo("K. pneu rhino")), "KLEPNERH") # K. pneumoniae subspp. rhinoscleromatis
-  expect_equal(as.character(as.mo("coagulase negative")), "STACNS")
+  expect_equal(as.character(as.mo("Bartonella")), "BAR")

  expect_equal(as.character(as.mo("P. aer")), "PSEAER") # not Pasteurella aerogenes

@@ -30,16 +30,21 @@ test_that("as.mo works", {
  expect_equal(as.character(as.mo("VISP")), "STCPNE")
  expect_equal(as.character(as.mo("VRSP")), "STCPNE")

+  expect_equal(as.character(as.mo("CNS")), "STACNS")
+  expect_equal(as.character(as.mo("CoNS")), "STACNS")
+  expect_equal(as.character(as.mo("CPS")), "STACPS")
+  expect_equal(as.character(as.mo("CoPS")), "STACPS")
+
  expect_identical(
    as.character(
      as.mo(c("stau",
-                     "STAU",
-                     "staaur",
-                     "S. aureus",
-                     "S aureus",
-                     "Staphylococcus aureus",
-                     "MRSA",
-                     "VISA"))),
+              "STAU",
+              "staaur",
+              "S. aureus",
+              "S aureus",
+              "Staphylococcus aureus",
+              "MRSA",
+              "VISA"))),
    rep("STAAUR", 8))

  # check for Becker classification
@@ -55,19 +60,23 @@ test_that("as.mo works", {
  expect_identical(as.character(guess_mo("STAAUR", Becker = "all")), "STACPS")

  # check for Lancefield classification
-  expect_identical(as.character(guess_mo("S. pyogenes", Lancefield = FALSE)), "STCPYO")
-  expect_identical(as.character(guess_mo("S. pyogenes", Lancefield = TRUE)),  "STCGRA")
-  expect_identical(as.character(guess_mo("STCPYO",      Lancefield = TRUE)),  "STCGRA")
-  expect_identical(as.character(guess_mo("S. agalactiae",  Lancefield = FALSE)),  "STCAGA")
-  expect_identical(as.character(guess_mo("S. agalactiae",  Lancefield = TRUE)),   "STCGRB") # group B
-  expect_identical(as.character(guess_mo("S. equisimilis", Lancefield = FALSE)),  "STCEQS")
-  expect_identical(as.character(guess_mo("S. equisimilis", Lancefield = TRUE)),   "STCGRC") # group C
-  expect_identical(as.character(guess_mo("S. anginosus",   Lancefield = FALSE)),  "STCANG")
-  expect_identical(as.character(guess_mo("S. anginosus",   Lancefield = TRUE)),   "STCGRF") # group F
-  expect_identical(as.character(guess_mo("S. sanguis",     Lancefield = FALSE)),  "STCSAN")
-  expect_identical(as.character(guess_mo("S. sanguis",     Lancefield = TRUE)),   "STCGRH") # group H
-  expect_identical(as.character(guess_mo("S. salivarius",  Lancefield = FALSE)),  "STCSAL")
-  expect_identical(as.character(guess_mo("S. salivarius",  Lancefield = TRUE)),   "STCGRK") # group K
+  expect_identical(as.character(guess_mo("S. pyogenes", Lancefield = FALSE)),    "STCPYO")
+  expect_identical(as.character(guess_mo("S. pyogenes", Lancefield = TRUE)),     "STCGRA")
+  expect_identical(as.character(guess_mo("STCPYO",      Lancefield = TRUE)),     "STCGRA") # group A
+  expect_identical(as.character(guess_mo("S. agalactiae",  Lancefield = FALSE)), "STCAGA")
+  expect_identical(as.character(guess_mo("S. agalactiae",  Lancefield = TRUE)),  "STCGRB") # group B
+  expect_identical(as.character(guess_mo("S. equisimilis", Lancefield = FALSE)), "STCEQS")
+  expect_identical(as.character(guess_mo("S. equisimilis", Lancefield = TRUE)),  "STCGRC") # group C
+  # Enterococci must only be influenced if Lancefield = "all"
+  expect_identical(as.character(guess_mo("E. faecium", Lancefield = FALSE)),     "ENCFAC")
+  expect_identical(as.character(guess_mo("E. faecium", Lancefield = TRUE)),      "ENCFAC")
+  expect_identical(as.character(guess_mo("E. faecium", Lancefield = "all")),     "STCGRD") # group D
+  expect_identical(as.character(guess_mo("S. anginosus",   Lancefield = FALSE)), "STCANG")
+  expect_identical(as.character(guess_mo("S. anginosus",   Lancefield = TRUE)),  "STCGRF") # group F
+  expect_identical(as.character(guess_mo("S. sanguis",     Lancefield = FALSE)), "STCSAN")
+  expect_identical(as.character(guess_mo("S. sanguis",     Lancefield = TRUE)),  "STCGRH") # group H
+  expect_identical(as.character(guess_mo("S. salivarius",  Lancefield = FALSE)), "STCSAL")
+  expect_identical(as.character(guess_mo("S. salivarius",  Lancefield = TRUE)),  "STCGRK") # group K

  library(dplyr)

--- a/tests/testthat/test-mo_property.R
+++ b/tests/testthat/test-mo_property.R
@@ -9,6 +9,12 @@ test_that("mo_property works", {
  expect_equal(mo_type("E. coli"), "Bacteria")
  expect_equal(mo_gramstain("E. coli"), "Negative rods")
  expect_equal(mo_aerobic("E. coli"), TRUE)
-  expect_equal(mo_type_nl("E. coli"), "Bacterie")
-  expect_equal(mo_gramstain_nl("E. coli"), "Negatieve staven")
+
+  expect_equal(mo_type("E. coli", language = "de"), "Bakterien")
+  expect_equal(mo_gramstain("E. coli", language = "de"), "Negative Staebchen")
+
+  expect_equal(mo_type("E. coli", language = "nl"), "Bacterie")
+  expect_equal(mo_gramstain("E. coli", language = "nl"), "Negatieve staven")
+
+  expect_error(mo_type("E. coli", language = "INVALID"))
 })
--- a/vignettes/.gitignore
+++ b/vignettes/.gitignore
@@ -1,4 +1,5 @@
 figure
 *.html
 *.md
+*.R
 rsconnect
--- a/vignettes/AMR.Rmd
+++ b/vignettes/AMR.Rmd
@@ -34,7 +34,7 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 2,650 microorganisms (2,207 bacteria, 285 fungi/yeasts, 153 parasites, 1 other). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.mo` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potentially human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables, for example, resistance analysis of different antibiotics per Gram stain. The package also contains functions to look up values in this data set, like `mo_genus`, `mo_family` or `mo_gramstain`. Since they use `as.mo` internally, they rely on its artificial intelligence to guess the intended microorganism. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. The functions `mo_type` and `mo_gramstain` can also return results in Spanish, German and Dutch. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.
@@ -52,7 +52,6 @@ This `AMR` package basically does four important things:
     * Results of 40 antibiotics (each antibiotic in its own column) with a total of 38,414 antimicrobial results
     * Real and genuine data

-
 ----
 ```{r, echo = FALSE}
 # this will print "2018" in 2018, and "2018-yyyy" after 2018.