new MOs, cleanup

2025-06-08 05:54:01 +02:00 · 2018-09-01 21:19:46 +02:00 · 2018-09-01 21:19:46 +02:00 · 75fe4d401f
commit 75fe4d401f
parent 5965d3c794
20 changed files with 166 additions and 179 deletions
--- a/2
+++ b/2
@ -1,6 +1,6 @@
 Package: AMR
 Version: 0.3.0.9006
-Date: 2018-08-31
+Date: 2018-09-01
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
    person(
--- a/NEWS.md
+++ b/NEWS.md
@ -15,7 +15,7 @@
 * Introduction to AMR as a vignette

 #### Changed
-* Added 182 microorganisms to the `microorganisms` data set, now *n* = 2,646 (2,207 bacteria, 285 fungi/yeasts, 153 parasites, 1 other)
+* Added 226 microorganisms to the `microorganisms` data set and removed the few viruses it contained, now *n* = 2,664 (2,225 bacteria, 285 fungi/yeasts, 153 parasites, 1 other)
 * Added three antimicrobial agents to the `antibiotics` data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
 * Added 163 trade names to the `antibiotics` data set, it now contains 298 different trade names in total, e.g.:
  ```r
@ -28,7 +28,7 @@
  ```
 * Function `ratio` is now deprecated and will be removed in a future release, as it is not really within the scope of this package
 * Fix for `as.mic` for values ending in zeroes after a real number
-* Huge speed improvement for `as.bactid` (now `as.mo`)
+* Tremendous speed improvement for `as.bactid` (now `as.mo`)
 * Added parameters `minimum` and `as_percent` to `portion_df`
 * Support for quasiquotation in the function series `count_*` and `portions_*`, and in `n_rsi`. This allows checking more than 2 vectors or columns.
  ```r
--- a/R/atc.R
+++ b/R/atc.R
@ -31,7 +31,7 @@
 #' In the ATC classification system, the active substances are classified in a hierarchy with five different levels.  The system has fourteen main anatomical/pharmacological groups or 1st levels. Each ATC main group is divided into 2nd levels which could be either pharmacological or therapeutic groups.  The 3rd and 4th levels are chemical, pharmacological or therapeutic subgroups and the 5th level is the chemical substance.  The 2nd, 3rd and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups.
 #'   Source: \url{https://www.whocc.no/atc/structure_and_principles/}
 #' @return Character (vector) with class \code{"atc"}. Unknown values will return \code{NA}.
-#' @seealso \code{\link{antibiotics}} for the dataframe that is being used to determine ATC's.
+#' @seealso \code{\link{antibiotics}} for the dataframe that is being used to determine ATCs.
 #' @examples
 #' # These examples all return "J01FA01", the ATC code of Erythromycin:
 #' as.atc("J01FA01")
--- a/R/data.R
+++ b/R/data.R
@ -16,10 +16,10 @@
 # GNU General Public License for more details.                         #
 # ==================================================================== #

-#' Dataset with 423 antibiotics
+#' Data set with 423 antibiotics
 #'
-#' A dataset containing all antibiotics with a J0 code and some other antimicrobial agents, with their DDD's. Except for trade names and abbreviations, all properties were downloaded from the WHO, see Source.
-#' @format A data.frame with 423 observations and 18 variables:
+#' A data set containing all antibiotics with a J0 code and some other antimicrobial agents, with their DDDs. Except for trade names and abbreviations, all properties were downloaded from the WHO, see Source.
+#' @format A \code{\link{tibble}} with 423 observations and 18 variables:
 #' \describe{
 #'   \item{\code{atc}}{ATC code, like \code{J01CR02}}
 #'   \item{\code{certe}}{Certe code, like \code{amcl}}
@ -120,10 +120,10 @@
 #
 "antibiotics"

-#' Dataset with ~2650 microorganisms
+#' Data set with human pathogenic microorganisms
 #'
-#' A dataset containing 2,646 microorganisms. MO codes of the UMCG can be looked up using \code{\link{microorganisms.umcg}}.
-#' @format A data.frame with 2,646 observations and 12 variables:
+#' A data set containing 2,664 (potential) human pathogenic microorganisms. MO codes can be looked up using \code{\link{guess_mo}}.
+#' @format A \code{\link{tibble}} with 2,664 observations and 12 variables:
 #' \describe{
 #'   \item{\code{mo}}{ID of microorganism}
 #'   \item{\code{bactsys}}{Bactsyscode of microorganism}
@ -151,10 +151,10 @@
 #' @seealso \code{\link{guess_mo}} \code{\link{antibiotics}} \code{\link{microorganisms.umcg}}
 "microorganisms"

-#' Translation table for UMCG with ~1100 microorganisms
+#' Translation table for UMCG with ~1,100 microorganisms
 #'
-#' A dataset containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from \code{\link{microorganisms}$mo} (using \code{\link{left_join_microorganisms}}). GLIMS codes can also be translated to valid \code{mo}'s with \code{\link{guess_mo}}.
-#' @format A data.frame with 1090 observations and 2 variables:
+#' A data set containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from \code{\link{microorganisms}$mo} (using \code{\link{left_join_microorganisms}}). GLIMS codes can also be translated to valid \code{MO}s with \code{\link{guess_mo}}.
+#' @format A \code{\link{tibble}} with 1,090 observations and 2 variables:
 #' \describe{
 #'   \item{\code{umcg}}{Code of microorganism according to UMCG MMB}
 #'   \item{\code{mo}}{Code of microorganism in \code{\link{microorganisms}}}
@ -163,10 +163,10 @@
 #' @seealso \code{\link{guess_mo}} \code{\link{microorganisms}}
 "microorganisms.umcg"

-#' Dataset with 2000 blood culture isolates of septic patients
+#' Data set with 2,000 blood culture isolates of septic patients
 #'
-#' An anonymised dataset containing 2000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This \code{data.frame} can be used to practice AMR analysis. For examples, press F1.
-#' @format A data.frame with 2000 observations and 49 variables:
+#' An anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This \code{data.frame} can be used to practice AMR analysis. For examples, press F1.
+#' @format A \code{\link{tibble}} with 2,000 observations and 49 variables:
 #' \describe{
 #'   \item{\code{date}}{date of receipt at the laboratory}
 #'   \item{\code{hospital_id}}{ID of the hospital, from A to D}
@ -185,13 +185,13 @@
 #' # PREPARATION #
 #' # ----------- #
 #'
-#' # Save this example dataset to an object, so we can edit it:
+#' # Save this example data set to an object, so we can edit it:
 #' my_data <- septic_patients
 #'
 #' # load the dplyr package to make data science A LOT easier
 #' library(dplyr)
 #'
-#' # Add first isolates to our dataset:
+#' # Add first isolates to our data set:
 #' my_data <- my_data %>%
 #'   mutate(first_isolates = first_isolate(my_data, "date", "patient_id", "mo"))
 #'
--- a/R/eucast.R
+++ b/R/eucast.R
@ -280,8 +280,10 @@ EUCAST_rules <- function(tbl,
  }

  # join to microorganisms data set
+  col_mo_original <- NULL
  if (!tbl %>% pull(col_mo) %>% is.mo()) {
-    warning("Improve integrity of the `", col_mo, "` column by transforming it with 'as.mo'.")
+    col_mo_original <- tbl %>% pull(col_mo)
+    tbl[, col_mo] <- as.mo(tbl[, col_mo])
  }
  tbl <- tbl %>% left_join_microorganisms(by = col_mo, suffix = c("_tempmicroorganisms", ""))

@ -685,6 +687,10 @@ EUCAST_rules <- function(tbl,
  tbl <- tbl %>% select(-c((tbl.ncol - microorganisms.ncol):tbl.ncol))
  # and remove added suffixes
  colnames(tbl) <- gsub("_tempmicroorganisms", "", colnames(tbl))
+  # restore old col_mo values if needed
+  if (!is.null(col_mo_original)) {
+    tbl[, col_mo] <- col_mo_original
+  }

  if (info == TRUE) {
    cat('Done.\n\nEUCAST Expert rules applied to',
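Note on the two hunks above: instead of warning about a non-`mo` column, the function now coerces the column for internal use and puts the caller's original values back at the end. A minimal base R sketch of that coerce-then-restore pattern follows. It assumes a hypothetical `normalise_code()` as a stand-in for `as.mo` and a hypothetical wrapper `with_coerced_column()`; neither exists in the package, and the sketch only holds when the inner step keeps the row order.

```r
# Hypothetical stand-in for as.mo(): trims whitespace and upper-cases codes.
normalise_code <- function(x) toupper(trimws(x))

# Coerce one column for internal use, run `fun`, then restore the caller's values.
# Assumes `fun` neither reorders nor drops rows.
with_coerced_column <- function(tbl, col, fun) {
  original <- tbl[[col]]                 # keep the values as supplied
  tbl[[col]] <- normalise_code(original) # work on the cleaned codes
  tbl <- fun(tbl)                        # e.g. join and apply rules
  tbl[[col]] <- original                 # hand back the input unchanged
  tbl
}

df <- data.frame(mo = c(" esccol ", "staaur"), amox = c("S", "R"),
                 stringsAsFactors = FALSE)
res <- with_coerced_column(df, "mo", function(d) {
  d$amox[d$mo == "ESCCOL"] <- "R"  # toy rule keyed on the cleaned code
  d
})
```

The toy rule matches on the cleaned code, while `res$mo` still holds the untidy input values.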
--- a/R/first_isolate.R
+++ b/R/first_isolate.R
@ -178,7 +178,7 @@ first_isolate <- function(tbl,

  if (!is.na(col_mo)) {
    if (!tbl %>% pull(col_mo) %>% is.mo()) {
-      warning("Improve integrity of the `", col_mo, "` column by transforming it with 'as.mo'.")
+      tbl[, col_mo] <- as.mo(tbl[, col_mo])
    }
    # join to microorganisms data set
    tbl <- tbl %>% left_join_microorganisms(by = col_mo)
@ -311,7 +311,7 @@ first_isolate <- function(tbl,
    if (info == TRUE) {
      message('No isolates found.')
    }
-    # NA's where genus is unavailable
+    # NAs where genus is unavailable
    tbl <- tbl %>%
      mutate(real_first_isolate = if_else(genus == '', NA, FALSE))
    if (output_logical == FALSE) {
@ -406,7 +406,7 @@ first_isolate <- function(tbl,
    all_first[which(all_first[, col_icu] == TRUE), 'real_first_isolate'] <- FALSE
  }

-  # NA's where genus is unavailable
+  # NAs where genus is unavailable
  all_first <- all_first %>%
    mutate(real_first_isolate = if_else(genus %in% c('', '(no MO)', NA), NA, real_first_isolate))

--- a/R/globals.R
+++ b/R/globals.R
@ -16,61 +16,41 @@
 # GNU General Public License for more details.                         #
 # ==================================================================== #

-globalVariables(c('abname',
-                  'Antibiotic',
-                  'Interpretation',
-                  'Percentage',
-                  'bind_rows',
-                  'element_blank',
-                  'element_line',
-                  'theme',
-                  'theme_minimal',
-                  'antibiotic',
-                  'antibiotics',
-                  'atc',
-                  'bactid',
-                  'C_chisq_sim',
-                  'certe',
-                  'cnt',
-                  'count',
-                  'Count',
-                  'counts',
-                  'cum_count',
-                  'cum_percent',
-                  'date_lab',
-                  'days_diff',
-                  'fctlvl',
-                  'first_isolate_row_index',
-                  'Freq',
-                  'fullname',
-                  'genus',
-                  'gramstain',
-                  'item',
-                  'key_ab',
-                  'key_ab_lag',
-                  'key_ab_other',
-                  'labs',
-                  'median',
-                  'mic',
-                  'MIC',
-                  'microorganisms',
-                  'mocode',
-                  'n',
-                  'na.omit',
-                  'observations',
-                  'official',
-                  'other_pat_or_mo',
-                  'Pasted',
-                  'patient_id',
-                  'quantile',
-                  'R',
-                  'real_first_isolate',
-                  'S',
-                  'septic_patients',
-                  'species',
-                  'umcg',
-                  'value',
-                  'values',
-                  'View',
-                  'y',
-                  '.'))
+globalVariables(c(".",
+                  "antibiotic",
+                  "Antibiotic",
+                  "antibiotics",
+                  "cnt",
+                  "count",
+                  "Count",
+                  "cum_count",
+                  "cum_percent",
+                  "date_lab",
+                  "days_diff",
+                  "fctlvl",
+                  "first_isolate_row_index",
+                  "Freq",
+                  "genus",
+                  "gramstain",
+                  "Interpretation",
+                  "item",
+                  "key_ab",
+                  "key_ab_lag",
+                  "key_ab_other",
+                  "median",
+                  "mic",
+                  "microorganisms",
+                  "mo",
+                  "n",
+                  "observations",
+                  "other_pat_or_mo",
+                  "Pasted",
+                  "patient_id",
+                  "Percentage",
+                  "R",
+                  "real_first_isolate",
+                  "S",
+                  "septic_patients",
+                  "species",
+                  "value",
+                  "y"))
--- a/R/key_antibiotics.R
+++ b/R/key_antibiotics.R
@ -140,6 +140,9 @@ key_antibiotics <- function(tbl,
                    GramNeg_4, GramNeg_5, GramNeg_6)
  gram_negative <- gram_negative[!is.na(gram_negative)]

+  if (!tbl %>% pull(col_mo) %>% is.mo()) {
+    tbl[, col_mo] <- as.mo(tbl[, col_mo])
+  }
  # join microorganisms
  tbl <- tbl %>% left_join_microorganisms(col_mo)

--- a/R/mo.R
+++ b/R/mo.R
@ -91,7 +91,6 @@
 #' }
 as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {

-
  if (NCOL(x) == 2) {
    # support tidyverse selection like: df %>% select(colA, colB)
    # paste these columns together
@ -131,74 +130,11 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
  x_species <- paste(x, 'species')
  # add start and stop regex
  x <- paste0('^', x, '$')
+  x_withspaces_all <- x_withspaces
  x_withspaces <- paste0('^', x_withspaces, '$')

  for (i in 1:length(x)) {

-    if (Becker == TRUE | Becker == "all") {
-      mo <- suppressWarnings(guess_mo(x_backup[i]))
-      if (mo %like% '^STA') {
-        # See Source. It's this figure:
-        # https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4187637/figure/F3/
-        species <- left_join_microorganisms(mo)$species
-        if (species %in% c("arlettae", "auricularis", "capitis",
-                           "caprae", "carnosus", "cohnii", "condimenti",
-                           "devriesei", "epidermidis", "equorum",
-                           "fleurettii", "gallinarum", "haemolyticus",
-                           "hominis", "jettensis", "kloosii", "lentus",
-                           "lugdunensis", "massiliensis", "microti",
-                           "muscae", "nepalensis", "pasteuri", "petrasii",
-                           "pettenkoferi", "piscifermentans", "rostri",
-                           "saccharolyticus", "saprophyticus", "sciuri",
-                           "stepanovicii", "simulans", "succinus",
-                           "vitulinus", "warneri", "xylosus")) {
-          x[i] <- "STACNS"
-          next
-        } else if ((Becker == "all"  & species == "aureus")
-                   | species %in% c("simiae", "agnetis", "chromogenes",
-                                    "delphini", "felis", "lutrae",
-                                    "hyicus", "intermedius",
-                                    "pseudintermedius", "pseudointermedius",
-                                    "schleiferi")) {
-          x[i] <- "STACPS"
-          next
-        }
-      }
-    }
-
-    if (Lancefield == TRUE) {
-      mo <- suppressWarnings(guess_mo(x_backup[i]))
-      if (mo %like% '^STC') {
-        # See Source
-        species <- left_join_microorganisms(mo)$species
-        if (species == "pyogenes") {
-          x[i] <- "STCGRA"
-          next
-        }
-        if (species == "agalactiae") {
-          x[i] <- "STCGRB"
-          next
-        }
-        if (species %in% c("equisimilis", "equi",
-                           "zooepidemicus", "dysgalactiae")) {
-          x[i] <- "STCGRC"
-          next
-        }
-        if (species == "anginosus") {
-          x[i] <- "STCGRF"
-          next
-        }
-        if (species == "sanguis") {
-          x[i] <- "STCGRH"
-          next
-        }
-        if (species == "salivarius") {
-          x[i] <- "STCGRK"
-          next
-        }
-      }
-    }
-
    if (identical(x_trimmed[i], "")) {
      # empty values
      x[i] <- NA
@ -206,12 +142,12 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
      next
    }
    if (x_backup[i] %in% AMR::microorganisms$mo) {
-      # is already a valid mo
+      # is already a valid MO code
      x[i] <- x_backup[i]
      next
    }
    if (x_trimmed[i] %in% AMR::microorganisms$mo) {
-      # is already a valid mo
+      # is already a valid MO code
      x[i] <- x_trimmed[i]
      next
    }
@ -303,6 +239,13 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
      next
    }

+    # try fullname without start and stop regex, to also find subspecies, like "K. pneu rhino"
+    found <- MOs[which(gsub("[\\(\\)]", "", MOs$fullname) %like% x_withspaces_all[i]),]$mo
+    if (length(found) > 0) {
+      x[i] <- found[1L]
+      next
+    }
+
    # search for GLIMS code
    found <- AMR::microorganisms.umcg[which(toupper(AMR::microorganisms.umcg$umcg) == toupper(x_trimmed[i])),]$mo
    if (length(found) > 0) {
@ -352,6 +295,57 @@ as.mo <- function(x, Becker = FALSE, Lancefield = FALSE) {
            call. = FALSE)
  }

+  if (Becker == TRUE | Becker == "all") {
+    # See Source. It's this figure:
+    # https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4187637/figure/F3/
+    CoNS <- MOs %>%
+      filter(genus == "Staphylococcus",
+             species %in% c("arlettae", "auricularis", "capitis",
+                            "caprae", "carnosus", "cohnii", "condimenti",
+                            "devriesei", "epidermidis", "equorum",
+                            "fleurettii", "gallinarum", "haemolyticus",
+                            "hominis", "jettensis", "kloosii", "lentus",
+                            "lugdunensis", "massiliensis", "microti",
+                            "muscae", "nepalensis", "pasteuri", "petrasii",
+                            "pettenkoferi", "piscifermentans", "rostri",
+                            "saccharolyticus", "saprophyticus", "sciuri",
+                            "stepanovicii", "simulans", "succinus",
+                            "vitulinus", "warneri", "xylosus")) %>%
+      pull(mo)
+    CoPS <- MOs %>%
+      filter(genus == "Staphylococcus",
+             species %in% c("simiae", "agnetis", "chromogenes",
+                            "delphini", "felis", "lutrae",
+                            "hyicus", "intermedius",
+                            "pseudintermedius", "pseudointermedius",
+                            "schleiferi")) %>%
+      pull(mo)
+    x[x %in% CoNS] <- "STACNS"
+    x[x %in% CoPS] <- "STACPS"
+    if (Becker == "all") {
+      x[x == "STAAUR"] <- "STACPS"
+    }
+  }
+
+  if (Lancefield == TRUE) {
+    # group A
+    x[x == "STCPYO"] <- "STCGRA" # S. pyogenes
+    # group B
+    x[x == "STCAGA"] <- "STCGRB" # S. agalactiae
+    # group C
+    S_groupC <- MOs %>% filter(genus == "Streptococcus",
+                               species %in% c("equisimilis", "equi",
+                                              "zooepidemicus", "dysgalactiae")) %>%
+      pull(mo)
+    x[x %in% S_groupC] <- "STCGRC" # S. equisimilis, S. equi, S. zooepidemicus, S. dysgalactiae
+    # group F
+    x[x == "STCANG"] <- "STCGRF" # S. anginosus
+    # group H
+    x[x == "STCSAN"] <- "STCGRH" # S. sanguis
+    # group K
+    x[x == "STCSAL"] <- "STCGRK" # S. salivarius
+  }
+
  # left join the found results to the original input values (x_input)
  df_found <- data.frame(input = as.character(unique(x_input)),
                         found = x,
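Note on the `R/mo.R` hunks above: the Becker and Lancefield grouping moves out of the per-element loop and becomes a set of vectorised replacements applied once to the whole result vector. The sketch below shows the idea in base R on a toy lookup table. It is an illustration under assumptions, not the package's code: only `STAAUR` and `STCPYO` appear in the diff, while the other codes and the helper `group_codes()` are hypothetical.

```r
# Toy lookup table; only STAAUR and STCPYO appear in the diff, the rest are illustrative.
MOs <- data.frame(
  mo      = c("STAEPI", "STAHOM", "STAAUR", "STCPYO", "ESCCOL"),
  genus   = c("Staphylococcus", "Staphylococcus", "Staphylococcus",
              "Streptococcus", "Escherichia"),
  species = c("epidermidis", "hominis", "aureus", "pyogenes", "coli"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: regroup already-resolved codes in one pass per group.
group_codes <- function(x, Becker = FALSE, Lancefield = FALSE) {
  if (isTRUE(Becker) || identical(Becker, "all")) {
    # coagulase-negative staphylococci (shortened species list)
    CoNS <- MOs$mo[MOs$genus == "Staphylococcus" &
                     MOs$species %in% c("epidermidis", "hominis")]
    x[x %in% CoNS] <- "STACNS"
    if (identical(Becker, "all")) {
      x[x %in% "STAAUR"] <- "STACPS"  # S. aureus joins the coagulase-positive group
    }
  }
  if (isTRUE(Lancefield)) {
    x[x %in% "STCPYO"] <- "STCGRA"    # group A: S. pyogenes
  }
  x
}

codes <- c("STAEPI", "STAAUR", "STCPYO", NA, "ESCCOL")
grouped <- group_codes(codes, Becker = "all", Lancefield = TRUE)
```

The sketch uses `%in%` throughout so that `NA` inputs never match a group and pass through unchanged.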
--- a/README.md
+++ b/README.md
@ -55,7 +55,7 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 2,650 microorganisms (2,207 bacteria, 285 fungi/yeasts, 153 parasites, 1 other). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.mo` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 3,000 potential human pathogenic microorganisms (bacteria, fungi/yeasts and parasites). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.mo` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.
@ -74,7 +74,7 @@ This `AMR` package basically does four important things:
     * Real and genuine data

 ## How to get it?
-All versions of this package [are published on CRAN](http://cran.r-project.org/package=AMR), the official R network with a peer-reviewed submission process.
+All stable versions of this package [are published on CRAN](http://cran.r-project.org/package=AMR), the official R network with a peer-reviewed submission process.

 ### Install from CRAN
 [![CRAN_Badge](https://www.r-pkg.org/badges/version/AMR)](http://cran.r-project.org/package=AMR) [![CRAN_Downloads](https://cranlogs.r-pkg.org/badges/grand-total/AMR)](http://cran.r-project.org/package=AMR)
@ -89,15 +89,14 @@ All versions of this package [are published on CRAN](http://cran.r-project.org/p
  - `install.packages("AMR")`

 ### Install from GitHub
-[![Last_Commit](https://img.shields.io/github/last-commit/msberends/AMR.svg)](https://github.com/msberends/AMR/commits/master)

-This is the latest development version. Although it may contain bugfixes and even new functions compared to the latest released version on CRAN, it is also subject to change and may be unstable or behave unexpectedly. Always consider this a beta version. All below 'badges' should be green.
+This is the latest **development version**. Although it may contain bugfixes and even new functions compared to the latest released version on CRAN, it is also subject to change and may be unstable or behave unexpectedly. Always consider this a beta version. All 'badges' below should be green:

-Development Test | Result
--- | :---:
-Works on Linux and macOS | [![Travis_Build](https://travis-ci.org/msberends/AMR.svg?branch=master)](https://travis-ci.org/msberends/AMR)
-Works on Windows | [![AppVeyor_Build](https://ci.appveyor.com/api/projects/status/github/msberends/AMR?branch=master&svg=true)](https://ci.appveyor.com/project/msberends/AMR)
-Syntax lines checked | [![Code_Coverage](https://codecov.io/gh/msberends/AMR/branch/master/graph/badge.svg)](https://codecov.io/gh/msberends/AMR)
+Development Test | Result | Reference
+--- | :---: | ---
+Works on Linux and macOS | [![Travis_Build](https://travis-ci.org/msberends/AMR.svg?branch=master)](https://travis-ci.org/msberends/AMR) | Checked by Travis CI, GmbH [[ref 1]](https://travis-ci.org/msberends/AMR) 
+Works on Windows | [![AppVeyor_Build](https://ci.appveyor.com/api/projects/status/github/msberends/AMR?branch=master&svg=true)](https://ci.appveyor.com/project/msberends/AMR) | Checked by Appveyor Systems Inc. [[ref 2]](https://ci.appveyor.com/project/msberends/AMR)
+Syntax lines checked | [![Code_Coverage](https://codecov.io/gh/msberends/AMR/branch/master/graph/badge.svg)](https://codecov.io/gh/msberends/AMR) | Checked by Codecov LLC [[ref 3]](https://codecov.io/gh/msberends/AMR)

 If so, try it with:
 ```r
--- a/data/microorganisms.rda
+++ b/data/microorganisms.rda
--- a/man/antibiotics.Rd
+++ b/man/antibiotics.Rd
@ -3,8 +3,8 @@
 \docType{data}
 \name{antibiotics}
 \alias{antibiotics}
-\title{Dataset with 423 antibiotics}
-\format{A data.frame with 423 observations and 18 variables:
+\title{Data set with 423 antibiotics}
+\format{A \code{\link{tibble}} with 423 observations and 18 variables:
 \describe{
  \item{\code{atc}}{ATC code, like \code{J01CR02}}
  \item{\code{certe}}{Certe code, like \code{amcl}}
@ -32,7 +32,7 @@
 antibiotics
 }
 \description{
-A dataset containing all antibiotics with a J0 code and some other antimicrobial agents, with their DDD's. Except for trade names and abbreviations, all properties were downloaded from the WHO, see Source.
+A data set containing all antibiotics with a J0 code and some other antimicrobial agents, with their DDDs. Except for trade names and abbreviations, all properties were downloaded from the WHO, see Source.
 }
 \seealso{
 \code{\link{microorganisms}}
--- a/man/as.atc.Rd
+++ b/man/as.atc.Rd
@ -45,6 +45,6 @@ ab_official(Cipro)       # returns "Ciprofloxacin"
 ab_umcg(Cipro)           # returns "CIPR", the code used in the UMCG
 }
 \seealso{
-\code{\link{antibiotics}} for the dataframe that is being used to determine ATC's.
+\code{\link{antibiotics}} for the dataframe that is being used to determine ATCs.
 }
 \keyword{atc}
--- a/man/microorganisms.Rd
+++ b/man/microorganisms.Rd
@ -3,8 +3,8 @@
 \docType{data}
 \name{microorganisms}
 \alias{microorganisms}
-\title{Dataset with ~2650 microorganisms}
-\format{A data.frame with 2,646 observations and 12 variables:
+\title{Data set with human pathogenic microorganisms}
+\format{A \code{\link{tibble}} with 2,664 observations and 12 variables:
 \describe{
  \item{\code{mo}}{ID of microorganism}
  \item{\code{bactsys}}{Bactsyscode of microorganism}
@ -23,7 +23,7 @@
 microorganisms
 }
 \description{
-A dataset containing 2,646 microorganisms. MO codes of the UMCG can be looked up using \code{\link{microorganisms.umcg}}.
+A data set containing 2,664 (potential) human pathogenic microorganisms. MO codes can be looked up using \code{\link{guess_mo}}.
 }
 \seealso{
 \code{\link{guess_mo}} \code{\link{antibiotics}} \code{\link{microorganisms.umcg}}
--- a/man/microorganisms.umcg.Rd
+++ b/man/microorganisms.umcg.Rd
@ -3,8 +3,8 @@
 \docType{data}
 \name{microorganisms.umcg}
 \alias{microorganisms.umcg}
-\title{Translation table for UMCG with ~1100 microorganisms}
-\format{A data.frame with 1090 observations and 2 variables:
+\title{Translation table for UMCG with ~1,100 microorganisms}
+\format{A \code{\link{tibble}} with 1,090 observations and 2 variables:
 \describe{
  \item{\code{umcg}}{Code of microorganism according to UMCG MMB}
  \item{\code{mo}}{Code of microorganism in \code{\link{microorganisms}}}
@ -13,7 +13,7 @@
 microorganisms.umcg
 }
 \description{
-A dataset containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from \code{\link{microorganisms}$mo} (using \code{\link{left_join_microorganisms}}). GLIMS codes can also be translated to valid \code{mo}'s with \code{\link{guess_mo}}.
+A data set containing all bacteria codes of UMCG MMB. These codes can be joined to data with an ID from \code{\link{microorganisms}$mo} (using \code{\link{left_join_microorganisms}}). GLIMS codes can also be translated to valid \code{MO}s with \code{\link{guess_mo}}.
 }
 \seealso{
 \code{\link{guess_mo}} \code{\link{microorganisms}}
--- a/man/septic_patients.Rd
+++ b/man/septic_patients.Rd
@ -3,8 +3,8 @@
 \docType{data}
 \name{septic_patients}
 \alias{septic_patients}
-\title{Dataset with 2000 blood culture isolates of septic patients}
-\format{A data.frame with 2000 observations and 49 variables:
+\title{Data set with 2,000 blood culture isolates of septic patients}
+\format{A \code{\link{tibble}} with 2,000 observations and 49 variables:
 \describe{
  \item{\code{date}}{date of receipt at the laboratory}
  \item{\code{hospital_id}}{ID of the hospital, from A to D}
@ -21,20 +21,20 @@
 septic_patients
 }
 \description{
-An anonymised dataset containing 2000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This \code{data.frame} can be used to practice AMR analysis. For examples, press F1.
+An anonymised data set containing 2,000 microbial blood culture isolates with their full antibiograms found in septic patients in 4 different hospitals in the Netherlands, between 2001 and 2017. It is true, genuine data. This \code{data.frame} can be used to practice AMR analysis. For examples, press F1.
 }
 \examples{
 # ----------- #
 # PREPARATION #
 # ----------- #

-# Save this example dataset to an object, so we can edit it:
+# Save this example data set to an object, so we can edit it:
 my_data <- septic_patients

 # load the dplyr package to make data science A LOT easier
 library(dplyr)

-# Add first isolates to our dataset:
+# Add first isolates to our data set:
 my_data <- my_data \%>\%
  mutate(first_isolates = first_isolate(my_data, "date", "patient_id", "mo"))

--- a/tests/testthat/test-eucast.R
+++ b/tests/testthat/test-eucast.R
@ -20,7 +20,6 @@ test_that("EUCAST rules work", {
                      "ENTAER"), # Enterobacter aerogenes
                  amox = "R",           # Amoxicillin
                  stringsAsFactors = FALSE)
-  expect_warning(EUCAST_rules(a, info = FALSE))
  expect_identical(suppressWarnings(EUCAST_rules(a, info = FALSE)), b)
  expect_identical(suppressWarnings(interpretive_reading(a, info = TRUE)), b)

--- a/tests/testthat/test-first_isolate.R
+++ b/tests/testthat/test-first_isolate.R
@ -124,10 +124,15 @@ test_that("first isolates work", {
                             col_date = "non-existing col",
                             col_mo = "mo"))

-  expect_warning(septic_patients %>%
+  # if mo is not an mo class, result should be the same
+  expect_identical(septic_patients %>%
                   mutate(mo = as.character(mo)) %>%
                   first_isolate(col_date = "date",
                                 col_mo = "mo",
-                                 col_patient_id = "patient_id"))
+                                 col_patient_id = "patient_id"),
+                   septic_patients %>%
+                     first_isolate(col_date = "date",
+                                   col_mo = "mo",
+                                   col_patient_id = "patient_id"))

 })
--- a/tests/testthat/test-mo.R
+++ b/tests/testthat/test-mo.R
@ -11,6 +11,7 @@ test_that("as.mo works", {
  expect_equal(as.character(as.mo(" ESCCOL ")), "ESCCOL")
  expect_equal(as.character(as.mo("klpn")), "KLEPNE")
  expect_equal(as.character(as.mo("Klebsiella")), "KLE")
+  expect_equal(as.character(as.mo("K. pneu rhino")), "KLEPNERH") # K. pneumoniae subspp. rhinoscleromatis
  expect_equal(as.character(as.mo("coagulase negative")), "STACNS")

  expect_equal(as.character(as.mo("P. aer")), "PSEAER") # not Pasteurella aerogenes
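Note on the new `"K. pneu rhino"` test: the diff adds a lookup on `fullname` without the `^`/`$` anchors so that abbreviated subspecies names can match. How `x_withspaces` is built is not shown in this commit, so the base R sketch below is only one plausible way to do it. The helper `abbrev_to_regex()` and the three full names are assumptions for illustration.

```r
# Hypothetical helper: turn abbreviated input into an unanchored regex in which
# every part may be the prefix of a word.
abbrev_to_regex <- function(x) {
  parts <- strsplit(trimws(x), "[. ]+")[[1]]  # "K. pneu rhino" -> "K" "pneu" "rhino"
  paste0(parts, "[a-z]*", collapse = " ")
}

# Illustrative full names; the real data set may format subspecies differently.
fullnames <- c("Klebsiella pneumoniae",
               "Klebsiella pneumoniae rhinoscleromatis",
               "Escherichia coli")

pattern <- abbrev_to_regex("K. pneu rhino")
hits <- fullnames[grepl(pattern, fullnames, ignore.case = TRUE)]
```

Because the pattern is unanchored, it can match anywhere in a name. That is presumably why the diff tries this lookup only after the anchored matches have failed.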
--- a/vignettes/AMR.Rmd
+++ b/vignettes/AMR.Rmd
@ -5,7 +5,7 @@ output:
  rmarkdown::html_vignette:
    toc: true
 vignette: >
-  %\VignetteIndexEntry{Creating Frequency Tables}
+  %\VignetteIndexEntry{Introduction to the AMR package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
 ---
@ -23,10 +23,10 @@ This `AMR` package basically does four important things:

 1. It **cleanses existing data**, by transforming it to reproducible and profound *classes*, making the most efficient use of R. These functions all use artificial intelligence to guess results that you would expect:

-   * Use `as.bactid` to get an ID of a microorganism. The IDs are quite obvious - the ID of *E. coli* is "ESCCOL" and the ID of *S. aureus* is "STAAUR". This `as.bactid` function takes almost any text as input that looks like the name or code of a microorganism like "E. coli", "esco" and "esccol". Even `as.bactid("MRSA")` will return the ID of *S. aureus*. Moreover, it can group all coagulase negative and positive *Staphylococci*, and can transform *Streptococci* into Lancefield groups. To find bacteria based on your input, this package contains a freely available database of ~2,650 different (potential) human pathogenic microorganisms.
+   * Use `as.mo` to get an ID of a microorganism. The IDs are quite obvious - the ID of *E. coli* is "ESCCOL" and the ID of *S. aureus* is "STAAUR". The function takes almost any text as input that looks like the name or code of a microorganism like "E. coli", "esco" and "esccol". Even `as.mo("MRSA")` will return the ID of *S. aureus*. Moreover, it can group all coagulase negative and positive *Staphylococci*, and can transform *Streptococci* into Lancefield groups. To find bacteria based on your input, this package contains a freely available database of ~2,650 different (potential) human pathogenic microorganisms.
   * Use `as.rsi` to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like "<=0.002; S" (combined MIC/RSI) will result in "S".
   * Use `as.mic` to cleanse your MIC values. It produces a so-called factor (called *ordinal* in SPSS) with valid MIC values as levels. A value like "<=0.002; S" (combined MIC/RSI) will result in "<=0.002".
-   * Use `as.atc` to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values "Furabid", "Furadantine", "nitro" all return the ATC code of Nitrofurantoine.
+   * Use `as.atc` to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values "Furabid", "Furadantin", "nitro" all return the ATC code of Nitrofurantoine.
   
 2. It **enhances existing data** and **adds new data** from data sets included in this package.

@@ -34,8 +34,8 @@ This `AMR` package basically does four important things:
   * Use `first_isolate` to identify the first isolates of every patient [using guidelines from the CLSI](https://clsi.org/standards/products/microbiology/documents/m39/) (Clinical and Laboratory Standards Institute).
     * You can also identify first *weighted* isolates of every patient, an adjusted version of the CLSI guideline. This takes into account key antibiotics of every strain and compares them.
   * Use `MDRO` (abbreviation of Multi Drug Resistant Organisms) to check your isolates for exceptional resistance with country-specific guidelines or EUCAST rules. Currently, national guidelines for Germany and the Netherlands are supported.
-   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 2,650 microorganisms (2,207 bacteria, 285 fungi/yeasts, 153 parasites, 1 other). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.bactid` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
-   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.bactid` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.
+   * The data set `microorganisms` contains the family, genus, species, subspecies, colloquial name and Gram stain of almost 2,650 microorganisms (2,207 bacteria, 285 fungi/yeasts, 153 parasites, 1 other). This enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like `mo_genus`, `mo_family` or `mo_gramstain`. Since it uses `as.mo` internally, AI is supported. For example, `mo_genus("MRSA")` and `mo_genus("S. aureus")` will both return `"Staphylococcus"`. These functions can be used to add new variables to your data.
+   * The data set `antibiotics` contains the ATC code, LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains a total of 298 trade names. Use functions like `ab_official` and `ab_tradenames` to look up values. As the `mo_*` functions use `as.mo` internally, the `ab_*` functions use `as.atc` internally so it uses AI to guess your expected result. For example, `ab_official("Fluclox")`, `ab_official("Floxapen")` and `ab_official("J01CF05")` will all return `"Flucloxacillin"`. These functions can again be used to add new variables to your data.

 3. It **analyses the data** with convenient functions that use well-known methods.