support for Portuguese, language determination based on system locale

2026-02-01 16:32:54 +01:00 · 2018-09-08 16:06:47 +02:00
parent b8a6c9af19
commit 26f5be0033
19 changed files with 307 additions and 106 deletions
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.3.0.9007
-Date: 2018-09-04
+Date: 2018-09-08
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
    person(
--- a/NEWS.md
+++ b/NEWS.md
@@ -10,14 +10,16 @@
  * Column names of datasets `microorganisms` and `septic_patients`
  * All old syntaxes will still work with this version, but will throw warnings
 * Functions `as.atc` and `is.atc` to transform/look up antibiotic ATC codes as defined by the WHO. The existing function `guess_atc` is now an alias of `as.atc`.
-* Aliases for existing function `mo_property`: `mo_family`, `mo_genus`, `mo_species`, `mo_subspecies`, `mo_fullname`, `mo_shortname`, `mo_aerobic`, `mo_type` and `mo_gramstain`. The last two functions have a `language` parameter, with support for Spanish, German and Dutch:
+* Aliases for existing function `mo_property`: `mo_family`, `mo_genus`, `mo_species`, `mo_subspecies`, `mo_fullname`, `mo_shortname`, `mo_aerobic`, `mo_type` and `mo_gramstain`. They support German, Dutch, Spanish and Portuguese, and the language defaults to the system's locale:
  ```r
  mo_gramstain("E. coli")
  # [1] "Negative rods"
  mo_gramstain("E. coli", language = "de") # "de" = Deutsch / German
-  # [1] "Negative Staebchen"
+  # [1] "Negative Stäbchen"
  mo_gramstain("E. coli", language = "es") # "es" = Español / Spanish
  # [1] "Bacilos negativos"
+  mo_fullname("S. group A") # when run on a Portuguese system
+  # [1] "Streptococcus grupo A"
  ```
 * Function `ab_property` and its aliases: `ab_official`, `ab_tradenames`, `ab_certe`, `ab_umcg`, `ab_official_nl` and `ab_trivial_nl`
 * Introduction to AMR as a vignette
@@ -34,6 +36,7 @@
  ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen"))
  # [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
  ```
+* For `first_isolate`, rows will be ignored when there's no species available
 * Function `ratio` is now deprecated and will be removed in a future release, as it is not really the scope of this package
 * Fix for `as.mic` for values ending in zeroes after a real number
 * Tremendous speed improvement for `as.bactid` (now `as.mo`)
--- a/R/data.R
+++ b/R/data.R
@@ -123,7 +123,7 @@
 #' Data set with human pathogenic microorganisms
 #'
 #' A data set containing 2,669 (potential) human pathogenic microorganisms. MO codes can be looked up using \code{\link{guess_mo}}.
-#' @format A \code{\link{tibble}} with 2,669 observations and 16 variables:
+#' @format A \code{\link{tibble}} with 2,669 observations and 10 variables:
 #' \describe{
 #'   \item{\code{mo}}{ID of microorganism}
 #'   \item{\code{bactsys}}{Bactsyscode of microorganism}
@@ -135,12 +135,6 @@
 #'   \item{\code{aerobic}}{Logical whether bacteria is aerobic}
 #'   \item{\code{type}}{Type of microorganism, like \code{"Bacteria"} and \code{"Fungus/yeast"}}
 #'   \item{\code{gramstain}}{Gram of microorganism, like \code{"Negative rods"}}
-#'   \item{\code{type_de}}{Type of microorganism in German, like \code{"Bakterien"} and \code{"Pilz/Hefe"}}
-#'   \item{\code{gramstain_de}}{Gram of microorganism in German, like \code{"Negative Staebchen"}}
-#'   \item{\code{type_nl}}{Type of microorganism in Dutch, like \code{"Bacterie"} and \code{"Schimmel/gist"}}
-#'   \item{\code{gramstain_nl}}{Gram of microorganism in Dutch, like \code{"Negatieve staven"}}
-#'   \item{\code{type_es}}{Type of microorganism in Spanish, like \code{"Bacteria"} and \code{"Hongo/levadura"}}
-#'   \item{\code{gramstain_es}}{Gram of microorganism in Spanish, like \code{"Bacilos negativos"}}
 #' }
 #  source MOLIS (LIS of Certe) - \url{https://www.certe.nl}
 # new <- microorganisms %>% filter(genus == "Bacteroides") %>% .[1,]
--- a/R/first_isolate.R
+++ b/R/first_isolate.R
@@ -326,7 +326,8 @@ first_isolate <- function(tbl,
      filter(
        row_number() %>% between(row.start,
                                 row.end),
-        genus != '') %>%
+        genus != "",
+        species != "") %>%
      nrow()
  )

@@ -373,7 +374,8 @@ first_isolate <- function(tbl,
          real_first_isolate =
            if_else(
              between(row_number(), row.start, row.end)
-              & genus != ''
+              & genus != ""
+              & species != ""
              & (other_pat_or_mo
                 | days_diff >= episode_days
                 | key_ab_other),
@@ -388,7 +390,8 @@ first_isolate <- function(tbl,
          real_first_isolate =
            if_else(
              between(row_number(), row.start, row.end)
-              & genus != ''
+              & genus != ""
+              & species != ""
              & (other_pat_or_mo
                 | days_diff >= episode_days),
              TRUE,
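Review note: the hunks above add `species != ""` next to the existing `genus != ""` check, so rows without a species no longer count as candidate first isolates. A minimal base R sketch of that eligibility rule, assuming a made-up `isolates` data frame that is not part of the package:

```r
# Hypothetical toy data; only the genus/species columns matter here
isolates <- data.frame(
  genus   = c("Escherichia", "Staphylococcus", "", "Klebsiella"),
  species = c("coli", "", "", "pneumoniae"),
  stringsAsFactors = FALSE
)

# A row is eligible only when both genus and species are non-empty
eligible <- isolates$genus != "" & isolates$species != ""

# Number of rows that can still be considered as first isolates
n_eligible <- sum(eligible)
```

In this toy data the second row (genus only) is excluded under the new rule, whereas it would have been kept by the old `genus != ''` check alone.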
--- a/R/mo.R
+++ b/R/mo.R
@@ -125,6 +125,8 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
  x <- unique(x)

  x_backup <- x
+  # translate to English for supported languages of mo_property
+  x <- gsub("(Gruppe|gruppe|groep|grupo)", "group", x)
  # remove dots and other non-text in case of "E. coli" except spaces
  x <- gsub("[^a-zA-Z0-9 ]+", "", x)
  # but spaces before and after should be omitted
@@ -170,6 +172,11 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
      x[i] <- 'HAEINF'
      next
    }
+    if (tolower(x[i]) %like% '^c.*difficile$') {
+      # make sure "C. difficile" resolves to Clostridium difficile
+      x[i] <- 'CLODIF'
+      next
+    }
    if (tolower(x[i]) == '^st.*au$'
        | tolower(x[i]) == '^stau$'
        | tolower(x[i]) == '^staaur$') {
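Review note: the first hunk in this file maps translated "group" words back to English before the lookup, so translated output such as "Streptococcus Gruppe A" can be fed back into `as.mo`. A rough sketch of those normalisation steps, using a hypothetical helper `normalise_input` (not a package function). The last two lines show that `==` compares strings literally, so a pattern like `'^c.*difficile$'` only takes effect through a regex function:

```r
# Hypothetical helper mirroring the first normalisation steps in as.mo()
normalise_input <- function(x) {
  # map the supported translations of "group" back to English
  x <- gsub("(Gruppe|gruppe|groep|grupo)", "group", x)
  # drop dots and other non-alphanumeric characters, keeping spaces
  x <- gsub("[^a-zA-Z0-9 ]+", "", x)
  # strip leading and trailing whitespace
  trimws(x)
}

normalise_input("Streptococcus Gruppe A")  # "Streptococcus group A"
normalise_input(" S. grupo B ")            # "S group B"

# `==` is a literal comparison; regular expressions need grepl() or similar
tolower("C difficile") == "^c.*difficile$"       # FALSE
grepl("^c.*difficile$", tolower("C difficile"))  # TRUE
```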
--- a/R/mo_property.R
+++ b/R/mo_property.R
@@ -22,12 +22,13 @@
 #' @param x any (vector of) text that can be coerced to a valid microorganism code with \code{\link{as.mo}}
 #' @param property one of the column names of one of the \code{\link{microorganisms}} data set, like \code{"mo"}, \code{"bactsys"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"fullname"}, \code{"gramstain"} and \code{"aerobic"}
 #' @inheritParams as.mo
-#' @param language language of the returned text, either one of \code{"en"} (English), \code{"de"} (German) or \code{"nl"} (Dutch)
+#' @param language language of the returned text, defaults to the system's language. Either one of \code{"en"} (English), \code{"de"} (German), \code{"nl"} (Dutch), \code{"es"} (Spanish) or \code{"pt"} (Portuguese).
 #' @source
 #' [1] Becker K \emph{et al.} \strong{Coagulase-Negative Staphylococci}. 2014. Clin Microbiol Rev. 27(4): 870–926. \url{https://dx.doi.org/10.1128/CMR.00109-13}
 #'
 #' [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
 #' @rdname mo_property
+#' @name mo_property
 #' @return Character or logical (only \code{mo_aerobic})
 #' @export
 #' @importFrom dplyr %>% left_join pull
@@ -44,14 +45,6 @@
 #' mo_gramstain("E. coli")       # "Negative rods"
 #' mo_aerobic("E. coli")         # TRUE
 #'
-#' # language support for Spanish, German and Dutch
-#' mo_type("E. coli", "es")      # "Bakteria"
-#' mo_type("E. coli", "de")      # "Bakterien"
-#' mo_type("E. coli", "nl")      # "Bacterie"
-#' mo_gramstain("E. coli", "es") # "Bacilos negativos"
-#' mo_gramstain("E. coli", "de") # "Negative Staebchen"
-#' mo_gramstain("E. coli", "nl") # "Negatieve staven"
-#'
 #'
 #' # Abbreviations known in the field
 #' mo_genus("MRSA")              # "Staphylococcus"
@@ -95,26 +88,23 @@
 #' mo_fullname("S. pyo", Lancefield = TRUE)  # "Streptococcus group A"
 #' mo_shortname("S. pyo")                    # "S. pyogenes"
 #' mo_shortname("S. pyo", Lancefield = TRUE) # "GAS"
-mo_property <- function(x, property = 'fullname', Becker = FALSE, Lancefield = FALSE) {
-  property <- tolower(property[1])
-  if (!property %in% colnames(microorganisms)) {
-    stop("invalid property: ", property, " - use a column name of the `microorganisms` data set")
-  }
-  result1 <- as.mo(x = x, Becker = Becker, Lancefield = Lancefield) # this will give a warning if x cannot be coerced
-  result2 <- suppressWarnings(
-    data.frame(mo = result1, stringsAsFactors = FALSE) %>%
-      left_join(AMR::microorganisms, by = "mo") %>%
-      pull(property)
-  )
-  if (property != "aerobic") {
-    # will else not retain logical class
-    result2[x %in% c("", NA) | result2 %in% c("", NA, "(no MO)")] <- ""
-  }
-  result2
-}
-
-#' @rdname mo_property
-#' @export
+#'
+#'
+#' # Language support for German, Dutch, Spanish and Portuguese
+#' mo_type("E. coli", language = "de")       # "Bakterium"
+#' mo_type("E. coli", language = "nl")       # "Bacterie"
+#' mo_type("E. coli", language = "es")       # "Bacteria"
+#' mo_gramstain("E. coli", language = "de")  # "Negative Stäbchen"
+#' mo_gramstain("E. coli", language = "nl")  # "Negatieve staven"
+#' mo_gramstain("E. coli", language = "es")  # "Bacilos negativos"
+#' mo_gramstain("Giardia", language = "pt")  # "Parasitas"
+#'
+#' mo_fullname("S. pyo",
+#'             Lancefield = TRUE,
+#'             language = "de")              # "Streptococcus Gruppe A"
+#' mo_fullname("S. pyo",
+#'             Lancefield = TRUE,
+#'             language = "nl")              # "Streptococcus groep A"
 mo_family <- function(x) {
  mo_property(x, "family")
 }
@@ -127,34 +117,34 @@ mo_genus <- function(x) {

 #' @rdname mo_property
 #' @export
-mo_species <- function(x, Becker = FALSE, Lancefield = FALSE) {
-  mo_property(x, "species", Becker = Becker, Lancefield = Lancefield)
+mo_species <- function(x, Becker = FALSE, Lancefield = FALSE, language = NULL) {
+  mo_property(x, "species", Becker = Becker, Lancefield = Lancefield, language = language)
 }

 #' @rdname mo_property
 #' @export
-mo_subspecies <- function(x, Becker = FALSE, Lancefield = FALSE) {
-  mo_property(x, "subspecies", Becker = Becker, Lancefield = Lancefield)
+mo_subspecies <- function(x, Becker = FALSE, Lancefield = FALSE, language = NULL) {
+  mo_property(x, "subspecies", Becker = Becker, Lancefield = Lancefield, language = language)
 }

 #' @rdname mo_property
 #' @export
-mo_fullname <- function(x, Becker = FALSE, Lancefield = FALSE) {
-  mo_property(x, "fullname", Becker = Becker, Lancefield = Lancefield)
+mo_fullname <- function(x, Becker = FALSE, Lancefield = FALSE, language = NULL) {
+  mo_property(x, "fullname", Becker = Becker, Lancefield = Lancefield, language = language)
 }

 #' @rdname mo_property
 #' @export
-mo_shortname <- function(x, Becker = FALSE, Lancefield = FALSE) {
+mo_shortname <- function(x, Becker = FALSE, Lancefield = FALSE, language = NULL) {
  if (Becker %in% c(TRUE, "all") | Lancefield == TRUE) {
    res1 <- as.mo(x)
    res2 <- suppressWarnings(as.mo(x, Becker = Becker, Lancefield = Lancefield))
    res2_fullname <- mo_fullname(res2)
    res2_fullname[res2_fullname %like% "\\(CoNS\\)"] <- "CoNS"
    res2_fullname[res2_fullname %like% "\\(CoPS\\)"] <- "CoPS"
-    res2_fullname <- gsub("Streptococcus group (.*)",
-                          "G\\1S",
-                          res2_fullname) # turn "Streptococcus group A" to "GAS"
+    res2_fullname <- gsub("Streptococcus (group|gruppe|Gruppe|groep|grupo) (.)",
+                          "G\\2S",
+                          res2_fullname) # turn "Streptococcus group A" and "Streptococcus grupo A" to "GAS"
    res2_fullname[res2_fullname == mo_fullname(x)] <- paste0(substr(mo_genus(res2_fullname), 1, 1),
                                                             ". ",
                                                             suppressWarnings(mo_species(res2_fullname)))
@@ -170,20 +160,20 @@ mo_shortname <- function(x, Becker = FALSE, Lancefield = FALSE) {
    result <- paste0(substr(mo_genus(x), 1, 1), ". ", suppressWarnings(mo_species(x)))
  }
  result[result %in% c(". ")] <- ""
-  result
+  mo_translate(result, language = language)
 }


 #' @rdname mo_property
 #' @export
-mo_type <- function(x, language = "en") {
-  mo_property(x, paste0("type", checklang(language)))
+mo_type <- function(x, language = NULL) {
+  mo_property(x, "type", language = language)
 }

 #' @rdname mo_property
 #' @export
-mo_gramstain <- function(x, language = "en") {
-  mo_property(x, paste0("gramstain", checklang(language)))
+mo_gramstain <- function(x, language = NULL) {
+  mo_property(x, "gramstain", language = language)
 }

 #' @rdname mo_property
@@ -192,15 +182,127 @@ mo_aerobic <- function(x) {
  mo_property(x, "aerobic")
 }

-checklang <- function(language) {
-  language <- tolower(language[1])
-  supported <- c("en", "de", "nl", "es")
-  if (!language %in% c(NULL, "", supported)) {
-    stop("invalid language: ", language, " - use one of ", paste0("'", sort(supported), "'", collapse = ", "), call. = FALSE)
+#' @rdname mo_property
+#' @export
+mo_property <- function(x, property = 'fullname', Becker = FALSE, Lancefield = FALSE, language = NULL) {
+  property <- tolower(property[1])
+  if (!property %in% colnames(microorganisms)) {
+    stop("invalid property: ", property, " - use a column name of the `microorganisms` data set")
  }
-  if (language %in% c(NULL, "", "en")) {
-    ""
-  } else {
-    paste0("_", language)
+  result1 <- as.mo(x = x, Becker = Becker, Lancefield = Lancefield) # this will give a warning if x cannot be coerced
+  result2 <- suppressWarnings(
+    data.frame(mo = result1, stringsAsFactors = FALSE) %>%
+      left_join(AMR::microorganisms, by = "mo") %>%
+      pull(property)
+  )
+  if (property != "aerobic") {
+    # otherwise the `logical` class would not be retained
+    result2[x %in% c("", NA) | result2 %in% c("", NA, "(no MO)")] <- ""
+    result2 <- mo_translate(result2, language = language)
  }
+  result2
+}
+
+#' @importFrom dplyr %>% case_when
+mo_translate <- function(x, language) {
+  if (is.null(language)) {
+    language <- mo_getlangcode()
+  } else {
+    language <- tolower(language[1])
+  }
+  if (language %in% c("en", "")) {
+    return(x)
+  }
+
+  supported <- c("en", "de", "nl", "es", "pt")
+  if (!language %in% supported) {
+    stop("Unsupported language: '", language, "' - use one of ", paste0("'", sort(supported), "'", collapse = ", "), call. = FALSE)
+  }
+
+  case_when(
+    # German
+    language == "de" ~ x %>%
+      gsub("(no MO)",          "(kein MO)", ., fixed = TRUE) %>%
+      gsub("Negative rods",    "Negative St\u00e4bchen", ., fixed = TRUE) %>%
+      gsub("Negative cocci",   "Negative Kokken", ., fixed = TRUE) %>%
+      gsub("Positive rods",    "Positive St\u00e4bchen", ., fixed = TRUE) %>%
+      gsub("Positive cocci",   "Positive Kokken", ., fixed = TRUE) %>%
+      gsub("Parasites",        "Parasiten", ., fixed = TRUE) %>%
+      gsub("Fungi and yeasts", "Pilze und Hefen", ., fixed = TRUE) %>%
+      gsub("Bacteria",         "Bakterium", ., fixed = TRUE) %>%
+      gsub("Fungus/yeast",     "Pilz/Hefe", ., fixed = TRUE) %>%
+      gsub("Parasite\\b",      "Parasit", .) %>%
+      gsub("biogroup",         "Biogruppe", ., fixed = TRUE) %>%
+      gsub("biotype",          "Biotyp", ., fixed = TRUE) %>%
+      gsub("vegetative",       "vegetativ", ., fixed = TRUE) %>%
+      gsub("([([ ]*?)group",   "\\1Gruppe", .) %>%
+      gsub("([([ ]*?)Group",   "\\1Gruppe", .),
+
+    # Dutch
+    language == "nl" ~ x %>%
+      gsub("(no MO)",          "(geen MO)", ., fixed = TRUE) %>%
+      gsub("Negative rods",    "Negatieve staven", ., fixed = TRUE) %>%
+      gsub("Negative cocci",   "Negatieve kokken", ., fixed = TRUE) %>%
+      gsub("Positive rods",    "Positieve staven", ., fixed = TRUE) %>%
+      gsub("Positive cocci",   "Positieve kokken", ., fixed = TRUE) %>%
+      gsub("Parasites",        "Parasieten", ., fixed = TRUE) %>%
+      gsub("Fungi and yeasts", "Schimmels en gisten", ., fixed = TRUE) %>%
+      gsub("Bacteria",         "Bacterie", ., fixed = TRUE) %>%
+      gsub("Fungus/yeast",     "Schimmel/gist", ., fixed = TRUE) %>%
+      gsub("Parasite",         "Parasiet", ., fixed = TRUE) %>%
+      gsub("biogroup",         "biogroep", ., fixed = TRUE) %>%
+      # gsub("biotype",          "biotype", ., fixed = TRUE) %>%
+      gsub("vegetative",       "vegetatief", ., fixed = TRUE) %>%
+      gsub("([([ ]*?)group",   "\\1groep", .) %>%
+      gsub("([([ ]*?)Group",   "\\1Groep", .),
+
+    # Spanish
+    language == "es" ~ x %>%
+      gsub("(no MO)",          "(sin MO)", ., fixed = TRUE) %>%
+      gsub("Negative rods",    "Bacilos negativos", ., fixed = TRUE) %>%
+      gsub("Negative cocci",   "Cocos negativos", ., fixed = TRUE) %>%
+      gsub("Positive rods",    "Bacilos positivos", ., fixed = TRUE) %>%
+      gsub("Positive cocci",   "Cocos positivos", ., fixed = TRUE) %>%
+      gsub("Parasites",        "Par\u00e1sitos", ., fixed = TRUE) %>%
+      gsub("Fungi and yeasts", "Hongos y levaduras", ., fixed = TRUE) %>%
+      # gsub("Bacteria",         "Bacteria", ., fixed = TRUE) %>%
+      gsub("Fungus/yeast",     "Hongo/levadura", ., fixed = TRUE) %>%
+      gsub("Parasite",         "Par\u00e1sito", ., fixed = TRUE) %>%
+      gsub("biogroup",         "biogrupo", ., fixed = TRUE) %>%
+      gsub("biotype",          "biotipo", ., fixed = TRUE) %>%
+      gsub("vegetative",       "vegetativo", ., fixed = TRUE) %>%
+      gsub("([([ ]*?)group",   "\\1grupo", .) %>%
+      gsub("([([ ]*?)Group",   "\\1Grupo", .),
+
+    # Portuguese
+    language == "pt" ~ x %>%
+      gsub("(no MO)",          "(sem MO)", ., fixed = TRUE) %>%
+      gsub("Negative rods",    "Bacilos negativos", ., fixed = TRUE) %>%
+      gsub("Negative cocci",   "Cocos negativos", ., fixed = TRUE) %>%
+      gsub("Positive rods",    "Bacilos positivos", ., fixed = TRUE) %>%
+      gsub("Positive cocci",   "Cocos positivos", ., fixed = TRUE) %>%
+      gsub("Parasites",        "Parasitas", ., fixed = TRUE) %>%
+      gsub("Fungi and yeasts", "Cogumelos e leveduras", ., fixed = TRUE) %>%
+      gsub("Bacteria",         "Bact\u00e9ria", ., fixed = TRUE) %>%
+      gsub("Fungus/yeast",     "Cogumelo/levedura", ., fixed = TRUE) %>%
+      gsub("Parasite",         "Parasita", ., fixed = TRUE) %>%
+      gsub("biogroup",         "biogrupo", ., fixed = TRUE) %>%
+      gsub("biotype",          "bi\u00f3tipo", ., fixed = TRUE) %>%
+      gsub("vegetative",       "vegetativo", ., fixed = TRUE) %>%
+      gsub("([([ ]*?)group",   "\\1grupo", .) %>%
+      gsub("([([ ]*?)Group",   "\\1Grupo", .)
+  )
+
+}
+
+#' @importFrom dplyr case_when
+mo_getlangcode <- function() {
+  sys <- base::Sys.getlocale()
+  case_when(
+    sys %like% '(Deutsch|German|de_)'       ~ "de",
+    sys %like% '(Nederlands|Dutch|nl_)'     ~ "nl",
+    sys %like% '(Espa.ol|Spanish|es_)'      ~ "es",
+    sys %like% '(Portugu.s|Portuguese|pt_)' ~ "pt",
+    TRUE                                    ~ "en"
+  )
 }
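Review note: `mo_getlangcode` above derives the default language from `Sys.getlocale()` and falls back to English. The same idea can be sketched in base R without `case_when` or `%like%`. The helper name `guess_lang` and its `locale` argument are hypothetical and only here to make the logic testable with fixed strings:

```r
# Hypothetical helper, not part of the package: map a locale string to a
# two-letter language code, falling back to English
guess_lang <- function(locale = Sys.getlocale()) {
  # same patterns as in mo_getlangcode(), checked in the same order
  patterns <- c(
    de = "(Deutsch|German|de_)",
    nl = "(Nederlands|Dutch|nl_)",
    es = "(Espa.ol|Spanish|es_)",
    pt = "(Portugu.s|Portuguese|pt_)"
  )
  # one TRUE/FALSE per language; case-insensitive like %like%
  hits <- vapply(patterns, grepl, logical(1), x = locale, ignore.case = TRUE)
  if (any(hits)) names(patterns)[which(hits)[1]] else "en"
}

guess_lang("pt_BR.UTF-8")           # "pt"
guess_lang("German_Germany.1252")   # "de"
guess_lang("en_US.UTF-8")           # "en"
```

Taking the locale as an argument is a design choice for this sketch only; it lets the mapping be checked on any machine, regardless of the locale it actually runs under.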
--- a/R/zzz.R
+++ b/R/zzz.R
@@ -1,3 +1,49 @@
+# ==================================================================== #
+# TITLE                                                                #
+# Antimicrobial Resistance (AMR) Analysis                              #
+#                                                                      #
+# AUTHORS                                                              #
+# Berends MS (m.s.berends@umcg.nl), Luz CF (c.f.luz@umcg.nl)           #
+#                                                                      #
+# LICENCE                                                              #
+# This program is free software; you can redistribute it and/or modify #
+# it under the terms of the GNU General Public License version 2.0,    #
+# as published by the Free Software Foundation.                        #
+#                                                                      #
+# This program is distributed in the hope that it will be useful,      #
+# but WITHOUT ANY WARRANTY; without even the implied warranty of       #
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the        #
+# GNU General Public License for more details.                         #
+# ==================================================================== #
+
+#' The \code{AMR} Package
+#'
+#' Welcome to the \code{AMR} package. This page gives some additional contact information about the authors.
+#' @details
+#' This package was intended to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and work with antibiotic properties by using evidence-based methods.
+#'
+#' This package was created for academic research by PhD students of the Faculty of Medical Sciences of the University of Groningen and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG).
+#' @section Authors:
+#' Matthijs S. Berends[1,2], Christian F. Luz[1], Erwin E.A. Hassing[2], Corinna Glasner[1], Alex W. Friedrich[1], Bhanu Sinha[1] \cr
+#'
+#' [1] Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands - \url{rug.nl} \url{umcg.nl} \cr
+#' [2] Certe Medical Diagnostics & Advice, Groningen, the Netherlands - \url{certe.nl}
+#' @section Contact us:
+#' For suggestions, comments or questions, please contact us at:
+#'
+#' Matthijs S. Berends \cr
+#' m.s.berends [at] umcg [dot] nl \cr
+#' Department of Medical Microbiology, University of Groningen \cr
+#' University Medical Center Groningen \cr
+#' Post Office Box 30001 \cr
+#' 9700 RB Groningen
+#'
+#' If you have found a bug, please file a new issue at: \cr
+#' \url{https://github.com/msberends/AMR/issues}
+#' @name AMR
+#' @rdname AMR
+NULL
+
 .onLoad <- function(libname, pkgname) {
  backports::import(pkgname)
 }
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. Some functions can return results in Spanish, German and Dutch. These functions can be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. They also come with support for German, Dutch, Spanish and Portuguese. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.
@@ -384,11 +384,11 @@ septic_patients   # A tibble: 2,000 x 49

 # Dataset with ATC antibiotics codes, official names, trade names 
 # and DDDs (oral and parenteral)
-antibiotics       # A tibble: 420 x 18
+antibiotics       # A tibble: 423 x 18

 # Dataset with bacteria codes and properties like gram stain and 
 # aerobic/anaerobic
-microorganisms    # A tibble: 2,453 x 12
+microorganisms    # A tibble: 2,669 x 10
 ```

 ## Copyright
--- a/data/microorganisms.rda
+++ b/data/microorganisms.rda
--- a/data/septic_patients.rda
+++ b/data/septic_patients.rda
--- a/man/AMR.Rd
+++ b/man/AMR.Rd
@@ -0,0 +1,36 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/zzz.R
+\name{AMR}
+\alias{AMR}
+\title{The \code{AMR} Package}
+\description{
+Welcome to the \code{AMR} package. This page gives some additional contact information about the authors.
+}
+\details{
+This package was intended to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and work with antibiotic properties by using evidence-based methods.
+
+This package was created for academic research by PhD students of the Faculty of Medical Sciences of the University of Groningen and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG).
+}
+\section{Authors}{
+
+Matthijs S. Berends[1,2], Christian F. Luz[1], Erwin E.A. Hassing[2], Corinna Glasner[1], Alex W. Friedrich[1], Bhanu Sinha[1] \cr
+
+[1] Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands - \url{rug.nl} \url{umcg.nl} \cr
+[2] Certe Medical Diagnostics & Advice, Groningen, the Netherlands - \url{certe.nl}
+}
+
+\section{Contact us}{
+
+For suggestions, comments or questions, please contact us at:
+
+Matthijs S. Berends \cr
+m.s.berends [at] umcg [dot] nl \cr
+Department of Medical Microbiology, University of Groningen \cr
+University Medical Center Groningen \cr
+Post Office Box 30001 \cr
+9700 RB Groningen
+
+If you have found a bug, please file a new issue at: \cr
+\url{https://github.com/msberends/AMR/issues}
+}
+
--- a/man/as.mo.Rd
+++ b/man/as.mo.Rd
@@ -44,7 +44,6 @@ Some exceptions have been built in to get more logical results, based on prevale
 \itemize{
  \item{\code{"E. coli"} will return the ID of \emph{Escherichia coli} and not \emph{Entamoeba coli}, although the latter would alphabetically come first}
  \item{\code{"H. influenzae"} will return the ID of \emph{Haemophilus influenzae} and not \emph{Haematobacter influenzae}}
-  \item{Something like \code{"s pyo"} will return the ID of \emph{Streptococcus pyogenes} and not \emph{Actinomyes pyogenes}}
  \item{Something like \code{"p aer"} will return the ID of \emph{Pseudomonas aeruginosa} and not \emph{Pasteurella aerogenes}}
  \item{Something like \code{"stau"} or \code{"staaur"} will return the ID of \emph{Staphylococcus aureus} and not \emph{Staphylococcus auricularis}}
 }
--- a/man/microorganisms.Rd
+++ b/man/microorganisms.Rd
@@ -4,7 +4,7 @@
 \name{microorganisms}
 \alias{microorganisms}
 \title{Data set with human pathogenic microorganisms}
-\format{A \code{\link{tibble}} with 2,669 observations and 16 variables:
+\format{A \code{\link{tibble}} with 2,669 observations and 10 variables:
 \describe{
  \item{\code{mo}}{ID of microorganism}
  \item{\code{bactsys}}{Bactsyscode of microorganism}
@@ -16,12 +16,6 @@
  \item{\code{aerobic}}{Logical whether bacteria is aerobic}
  \item{\code{type}}{Type of microorganism, like \code{"Bacteria"} and \code{"Fungus/yeast"}}
  \item{\code{gramstain}}{Gram of microorganism, like \code{"Negative rods"}}
-  \item{\code{type_de}}{Type of microorganism in German, like \code{"Bakterien"} and \code{"Pilz/Hefe"}}
-  \item{\code{gramstain_de}}{Gram of microorganism in German, like \code{"Negative Staebchen"}}
-  \item{\code{type_nl}}{Type of microorganism in Dutch, like \code{"Bacterie"} and \code{"Schimmel/gist"}}
-  \item{\code{gramstain_nl}}{Gram of microorganism in Dutch, like \code{"Negatieve staven"}}
-  \item{\code{type_es}}{Type of microorganism in Spanish, like \code{"Bacteria"} and \code{"Hongo/levadura"}}
-  \item{\code{gramstain_es}}{Gram of microorganism in Spanish, like \code{"Bacilos negativos"}}
 }}
 \usage{
 microorganisms
--- a/man/mo_property.Rd
+++ b/man/mo_property.Rd
@@ -18,32 +18,30 @@
 [2] Lancefield RC \strong{A serological differentiation of human and other groups of hemolytic streptococci}. 1933. J Exp Med. 57(4): 571–95. \url{https://dx.doi.org/10.1084/jem.57.4.571}
 }
 \usage{
-mo_property(x, property = "fullname", Becker = FALSE,
-  Lancefield = FALSE)
-
 mo_family(x)

 mo_genus(x)

-mo_species(x, Becker = FALSE, Lancefield = FALSE)
+mo_species(x, Becker = FALSE, Lancefield = FALSE, language = NULL)

-mo_subspecies(x, Becker = FALSE, Lancefield = FALSE)
+mo_subspecies(x, Becker = FALSE, Lancefield = FALSE, language = NULL)

-mo_fullname(x, Becker = FALSE, Lancefield = FALSE)
+mo_fullname(x, Becker = FALSE, Lancefield = FALSE, language = NULL)

-mo_shortname(x, Becker = FALSE, Lancefield = FALSE)
+mo_shortname(x, Becker = FALSE, Lancefield = FALSE, language = NULL)

-mo_type(x, language = "en")
+mo_type(x, language = NULL)

-mo_gramstain(x, language = "en")
+mo_gramstain(x, language = NULL)

 mo_aerobic(x)
+
+mo_property(x, property = "fullname", Becker = FALSE,
+  Lancefield = FALSE, language = NULL)
 }
 \arguments{
 \item{x}{any (vector of) text that can be coerced to a valid microorganism code with \code{\link{as.mo}}}

-\item{property}{one of the column names of one of the \code{\link{microorganisms}} data set, like \code{"mo"}, \code{"bactsys"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"fullname"}, \code{"gramstain"} and \code{"aerobic"}}
-
 \item{Becker}{a logical to indicate whether \emph{Staphylococci} should be categorised into Coagulase Negative \emph{Staphylococci} ("CoNS") and Coagulase Positive \emph{Staphylococci} ("CoPS") instead of their own species, according to Karsten Becker \emph{et al.} [1].

  This excludes \emph{Staphylococcus aureus} at default, use \code{Becker = "all"} to also categorise \emph{S. aureus} as "CoPS".}
@@ -52,7 +50,9 @@ mo_aerobic(x)

  This excludes \emph{Enterococci} at default (who are in group D), use \code{Lancefield = "all"} to also categorise all \emph{Enterococci} as group D.}

-\item{language}{language of the returned text, either one of \code{"en"} (English), \code{"de"} (German) or \code{"nl"} (Dutch)}
+\item{language}{language of the returned text, defaults to the system's language. Either one of \code{"en"} (English), \code{"de"} (German), \code{"nl"} (Dutch), \code{"es"} (Spanish) or \code{"pt"} (Portuguese).}
+
+\item{property}{one of the column names of one of the \code{\link{microorganisms}} data set, like \code{"mo"}, \code{"bactsys"}, \code{"family"}, \code{"genus"}, \code{"species"}, \code{"fullname"}, \code{"gramstain"} and \code{"aerobic"}}
 }
 \value{
 Character or logical (only \code{mo_aerobic})
@@ -72,14 +72,6 @@ mo_type("E. coli")            # "Bacteria"
 mo_gramstain("E. coli")       # "Negative rods"
 mo_aerobic("E. coli")         # TRUE

-# language support for Spanish, German and Dutch
-mo_type("E. coli", "es")      # "Bakteria"
-mo_type("E. coli", "de")      # "Bakterien"
-mo_type("E. coli", "nl")      # "Bacterie"
-mo_gramstain("E. coli", "es") # "Bacilos negativos"
-mo_gramstain("E. coli", "de") # "Negative Staebchen"
-mo_gramstain("E. coli", "nl") # "Negatieve staven"
-

 # Abbreviations known in the field
 mo_genus("MRSA")              # "Staphylococcus"
@@ -123,6 +115,23 @@ mo_fullname("S. pyo")                     # "Streptococcus pyogenes"
 mo_fullname("S. pyo", Lancefield = TRUE)  # "Streptococcus group A"
 mo_shortname("S. pyo")                    # "S. pyogenes"
 mo_shortname("S. pyo", Lancefield = TRUE) # "GAS"
+
+
+# Language support for German, Dutch, Spanish and Portuguese
+mo_type("E. coli", language = "de")       # "Bakterium"
+mo_type("E. coli", language = "nl")       # "Bacterie"
+mo_type("E. coli", language = "es")       # "Bakteria"
+mo_gramstain("E. coli", language = "de")  # "Negative Stäbchen"
+mo_gramstain("E. coli", language = "nl")  # "Negatieve staven"
+mo_gramstain("E. coli", language = "es")  # "Bacilos negativos"
+mo_gramstain("Giardia", language = "pt")  # "Parasitas"
+
+mo_fullname("S. pyo",
+            Lancefield = TRUE,
+            language = "de")              # "Streptococcus Gruppe A"
+mo_fullname("S. pyo",
+            Lancefield = TRUE,
+            language = "nl")              # "Streptococcus groep A"
 }
 \seealso{
 \code{\link{microorganisms}}
--- a/tests/testthat/test-data.R
+++ b/tests/testthat/test-data.R
@@ -0,0 +1,7 @@
+context("data.R")
+
+test_that("data sets are valid", {
+  # IDs should always be unique
+  expect_identical(nrow(antibiotics), length(unique(antibiotics$atc)))
+  expect_identical(nrow(microorganisms), length(unique(microorganisms$mo)))
+})
--- a/tests/testthat/test-first_isolate.R
+++ b/tests/testthat/test-first_isolate.R
@@ -10,7 +10,7 @@ test_that("first isolates work", {
                    col_mo = "mo",
                    info = TRUE),
      na.rm = TRUE),
-    1331)
+    1330)

  # septic_patients contains 1426 out of 2000 first *weighted* isolates
  expect_equal(
@@ -24,7 +24,7 @@ test_that("first isolates work", {
                      type = "keyantibiotics",
                      info = TRUE),
        na.rm = TRUE)),
-    1426)
+    1425)
  # and 1449 when not ignoring I
  expect_equal(
    suppressWarnings(
@@ -38,7 +38,7 @@ test_that("first isolates work", {
                      type = "keyantibiotics",
                      info = TRUE),
        na.rm = TRUE)),
-    1449)
+    1448)
  # and 1430 when using points
  expect_equal(
    suppressWarnings(
@@ -64,7 +64,7 @@ test_that("first isolates work", {
                    info = TRUE,
                    icu_exclude = TRUE),
      na.rm = TRUE),
-    1176)
+    1175)

  # set 1500 random observations to be of specimen type 'Urine'
  random_rows <- sample(x = 1:2000, size = 1500, replace = FALSE)
--- a/tests/testthat/test-mo.R
+++ b/tests/testthat/test-mo.R
@@ -13,6 +13,7 @@ test_that("as.mo works", {
  expect_equal(as.character(as.mo("Klebsiella")), "KLE")
  expect_equal(as.character(as.mo("K. pneu rhino")), "KLEPNERH") # K. pneumoniae subspp. rhinoscleromatis
  expect_equal(as.character(as.mo("Bartonella")), "BAR")
+  expect_equal(as.character(as.mo("C. difficile")), "CLODIF")

  expect_equal(as.character(as.mo("S. pyo")), "STCPYO") # not Actinomyces pyogenes

--- a/tests/testthat/test-mo_property.R
+++ b/tests/testthat/test-mo_property.R
@@ -6,8 +6,8 @@ test_that("mo_property works", {
  expect_equal(mo_species("E. coli"), "coli")
  expect_equal(mo_subspecies("E. coli"), "")
  expect_equal(mo_fullname("E. coli"), "Escherichia coli")
-  expect_equal(mo_type("E. coli"), "Bacteria")
-  expect_equal(mo_gramstain("E. coli"), "Negative rods")
+  expect_equal(mo_type("E. coli", language = "en"), "Bacteria")
+  expect_equal(mo_gramstain("E. coli", language = "en"), "Negative rods")
  expect_equal(mo_aerobic("E. coli"), TRUE)

  expect_equal(mo_shortname("MRSA"), "S. aureus")
@@ -16,8 +16,8 @@ test_that("mo_property works", {
  expect_equal(mo_shortname("S. aga"), "S. agalactiae")
  expect_equal(mo_shortname("S. aga", Lancefield = TRUE), "GBS")

-  expect_equal(mo_type("E. coli", language = "de"), "Bakterien")
-  expect_equal(mo_gramstain("E. coli", language = "de"), "Negative Staebchen")
+  expect_equal(mo_type("E. coli", language = "de"), "Bakterium")
+  expect_equal(mo_gramstain("E. coli", language = "de"), "Negative St\u00e4bchen")

  expect_equal(mo_type("E. coli", language = "nl"), "Bacterie")
  expect_equal(mo_gramstain("E. coli", language = "nl"), "Negatieve staven")
--- a/vignettes/AMR.Rmd
+++ b/vignettes/AMR.Rmd
@@ -34,7 +34,7 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. Some functions can return results in Spanish, German and Dutch. These functions can be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potentially human-pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. As they use `as.mo` internally, they also use artificial intelligence. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. They can also return results in German, Dutch, Spanish and Portuguese. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.