# AMR/R/mo.R

# ==================================================================== #
# TITLE                                                                #
# Antimicrobial Resistance (AMR) Data Analysis for R                   #
#                                                                      #
# SOURCE                                                               #
# https://github.com/msberends/AMR                                     #
#                                                                      #
# LICENCE                                                              #
# (c) 2018-2021 Berends MS, Luz CF et al.                              #
# Developed at the University of Groningen, the Netherlands, in        #
# collaboration with non-profit organisations Certe Medical            #
# Diagnostics & Advice, and University Medical Center Groningen.       #
#                                                                      #
# This R package is free software; you can freely use and distribute   #
# it for both personal and commercial purposes under the terms of the  #
# GNU General Public License version 2.0 (GNU GPL-2), as published by  #
# the Free Software Foundation.                                        #
# We created this package for both routine data analysis and academic  #
# research and it was publicly released in the hope that it will be    #
# useful, but it comes WITHOUT ANY WARRANTY OR LIABILITY.              #
#                                                                      #
# Visit our website for the full manual and a complete tutorial about  #
# how to conduct AMR data analysis: https://msberends.github.io/AMR/   #
# ==================================================================== #

#' Transform Input to a Microorganism ID
#'
#' Use this function to determine a valid microorganism ID ([`mo`]). Determination is done using intelligent rules and the complete taxonomic kingdoms Bacteria, Chromista, Protozoa, Archaea and most microbial species from the kingdom Fungi (see *Source*). The input can be almost anything: a full name (like `"Staphylococcus aureus"`), an abbreviated name (such as `"S. aureus"`), an abbreviation known in the field (such as `"MRSA"`), or just a genus. See *Examples*.
#' @inheritSection lifecycle Stable Lifecycle
#' @param x a character vector or a [data.frame] with one or two columns
#' @param Becker a logical to indicate whether staphylococci should be categorised into coagulase-negative staphylococci ("CoNS") and coagulase-positive staphylococci ("CoPS") instead of their own species, according to Karsten Becker *et al.* (1,2,3).
#'
#' This excludes *Staphylococcus aureus* by default; use `Becker = "all"` to also categorise *S. aureus* as "CoPS".
#' @param Lancefield a logical to indicate whether beta-haemolytic *Streptococci* should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield (4). These *Streptococci* will be categorised in their first group, e.g. *Streptococcus dysgalactiae* will be group C, although officially it was also categorised into groups G and L.
#'
#' This excludes *Enterococci* (which are in group D) by default; use `Lancefield = "all"` to also categorise all *Enterococci* as group D.
#' @param allow_uncertain a number between `0` (or `"none"`) and `3` (or `"all"`), or `TRUE` (= `2`) or `FALSE` (= `0`) to indicate whether the input should be checked for less probable results, see *Details*
#' @param reference_df a [data.frame] to be used for extra reference when translating `x` to a valid [`mo`]. See [set_mo_source()] and [get_mo_source()] to automate the usage of your own codes (e.g. used in your analysis or organisation).
#' @param ignore_pattern a regular expression (case-insensitive) of which all matches in `x` must return `NA`. This can be convenient to exclude known non-relevant input and can also be set with the option `AMR_ignore_pattern`, e.g. `options(AMR_ignore_pattern = "(not reported|contaminated flora)")`.
#' @param language language to translate text like "no growth", which defaults to the system language (see [get_locale()])
#' @param ... other arguments passed on to functions
#' @rdname as.mo
#' @aliases mo
#' @keywords mo Becker becker Lancefield lancefield guess
#' @details
#' ## General Info
#'
#' A microorganism ID from this package (class: [`mo`]) is human readable and typically looks like these examples:
#' ```
#'   Code               Full name
#'   ---------------    --------------------------------------
#'   B_KLBSL            Klebsiella
#'   B_KLBSL_PNMN       Klebsiella pneumoniae
#'   B_KLBSL_PNMN_RHNS  Klebsiella pneumoniae rhinoscleromatis
#'   |   |    |    |
#'   |   |    |    |
#'   |   |    |    \---> subspecies, a 4-5 letter acronym
#'   |   |    \----> species, a 4-5 letter acronym
#'   |   \----> genus, a 5-7 letter acronym
#'   \----> taxonomic kingdom: A (Archaea), AN (Animalia), B (Bacteria),
#'                             C (Chromista), F (Fungi), P (Protozoa)
#' ```
#'
#' Values that cannot be coerced will be considered 'unknown' and will get the MO code `UNKNOWN`.
#'
#' Use the [`mo_*`][mo_property()] functions to get properties based on the returned code, see *Examples*.
#'
#' The algorithm uses data from the Catalogue of Life (see below) and from one other source (see [microorganisms]).
#'
#' The [as.mo()] function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
#'
#' 1. Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;
#' 2. Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;
#' 3. Breakdown of input values to identify possible matches.
#'
#' As a result, e.g. `"E. coli"` (a microorganism highly prevalent in humans) will return the microbial ID of *Escherichia coli* and not *Entamoeba coli* (a microorganism less prevalent in humans), although the latter would alphabetically come first.
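#'
#' A short sketch of this ordering, assuming the default settings. The results are not shown here, since they depend on the reference data of the installed package version:
#' ```r
#' # both calls are expected to resolve to Escherichia coli, not Entamoeba coli
#' as.mo("E. coli")
#' mo_genus("E. coli")
#' ```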
#'
#' ## Coping with Uncertain Results
#'
#' In addition, the [as.mo()] function can differentiate four levels of uncertainty to guess valid results:
#' - Uncertainty level 0: no additional rules are applied;
#' - Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
#' - Uncertainty level 2: allow all of level 1, strip values between brackets, reverse the order of the words in the input, strip off text elements from the end keeping at least two elements;
#' - Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.
#'
#' The level of uncertainty can be set using the argument `allow_uncertain`. The default is `allow_uncertain = TRUE`, which is equal to uncertainty level 2. Using `allow_uncertain = FALSE` is equal to uncertainty level 0 and will skip all rules. You can also use e.g. `as.mo(..., allow_uncertain = 1)` to only allow up to level 1 uncertainty.
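#'
#' As a hedged sketch, the same input can be coerced at each level to compare the outcomes. The results are deliberately not shown, as they depend on the installed package version:
#' ```r
#' x <- "S. aureus - please mind: MRSA"
#' as.mo(x, allow_uncertain = FALSE) # level 0: no additional rules
#' as.mo(x, allow_uncertain = 1)     # level 1 only
#' as.mo(x)                          # default: level 2
#' as.mo(x, allow_uncertain = 3)     # all levels
#' ```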
#'
#' With the default setting (`allow_uncertain = TRUE`, level 2), the examples below will lead to valid results:
#' - `"Streptococcus group B (known as S. agalactiae)"`. The text between brackets will be removed and a warning will be thrown that the result *Streptococcus group B* (``r as.mo("Streptococcus group B")``) needs review.
#' - `"S. aureus - please mind: MRSA"`. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result *Staphylococcus aureus* (``r as.mo("Staphylococcus aureus")``) needs review.
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (``r as.mo("Neisseria gonorrhoeae")``) needs review.
#'
#' There are three helper functions that can be run after using the [as.mo()] function:
#' - Use [mo_uncertainties()] to get a [data.frame] that prints in a pretty format with all taxonomic names that were guessed. The output contains the matching score for all matches (see *Matching Score for Microorganisms* below).
#' - Use [mo_failures()] to get a [character] [vector] with all values that could not be coerced to a valid value.
#' - Use [mo_renamed()] to get a [data.frame] with all values that could be coerced based on old, previously accepted taxonomic names.
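#'
#' A minimal sketch of how these helpers could be used after a call to [as.mo()]. The value `"qwerty"` is a made-up input that is assumed not to match any microorganism, and no output is shown because it depends on the installed package version:
#' ```r
#' x <- as.mo(c("S. aureus - please mind: MRSA", "E. coli", "qwerty"))
#' mo_uncertainties() # guessed taxonomic names with their matching scores
#' mo_failures()      # input values that could not be coerced
#' mo_renamed()       # values coerced from previously accepted names
#' ```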
#'
#' ## Microbial Prevalence of Pathogens in Humans
#'
#' The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the `prevalence` column in the [microorganisms] and [microorganisms.old] data sets. The grouping into human pathogenic prevalence is explained in the section *Matching Score for Microorganisms* below.
#' @inheritSection mo_matching_score Matching Score for Microorganisms
#' @inheritSection catalogue_of_life Catalogue of Life
#  (source as a section here, so it can be inherited by other man pages:)
#' @section Source:
#' 1. Becker K *et al.* **Coagulase-Negative Staphylococci**. 2014. Clin Microbiol Rev. 27(4): 870–926; \doi{10.1128/CMR.00109-13}
#' 2. Becker K *et al.* **Implications of identifying the recently defined members of the *S. aureus* complex, *S. argenteus* and *S. schweitzeri*: A position paper of members of the ESCMID Study Group for staphylococci and Staphylococcal Diseases (ESGS).** 2019. Clin Microbiol Infect; \doi{10.1016/j.cmi.2019.02.028}
#' 3. Becker K *et al.* **Emergence of coagulase-negative staphylococci**. 2020. Expert Rev Anti Infect Ther. 18(4):349-366; \doi{10.1080/14787210.2020.1730813}
#' 4. Lancefield RC **A serological differentiation of human and other groups of hemolytic streptococci**. 1933. J Exp Med. 57(4): 571–95; \doi{10.1084/jem.57.4.571}
#' 5. Catalogue of Life: Annual Checklist (public online taxonomic database), <http://www.catalogueoflife.org> (check included annual version with [catalogue_of_life_version()]).
#' @export
#' @return A [character] [vector] with additional class [`mo`]
#' @seealso [microorganisms] for the [data.frame] that is being used to determine IDs.
#'
#' The [`mo_*`][mo_property()] functions (such as [mo_genus()], [mo_gramstain()]) to get properties based on the returned code.
#' @inheritSection AMR Reference Data Publicly Available
#' @inheritSection AMR Read more on Our Website!
#' @examples
#' \donttest{
#' # These examples all return "B_STPHY_AURS", the ID of S. aureus:
#' as.mo("sau") # WHONET code
#' as.mo("stau")
#' as.mo("STAU")
#' as.mo("staaur")
#' as.mo("S. aureus")
#' as.mo("S aureus")
#' as.mo("Staphylococcus aureus")
#' as.mo("Staphylococcus aureus (MRSA)")
#' as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling
#' as.mo("MRSA")    # Methicillin Resistant S. aureus
#' as.mo("VISA")    # Vancomycin Intermediate S. aureus
#' as.mo("VRSA")    # Vancomycin Resistant S. aureus
#' as.mo(115329001) # SNOMED CT code
#'
#' # Dyslexia is no problem - these all work:
#' as.mo("Ureaplasma urealyticum")
#' as.mo("Ureaplasma urealyticus")
#' as.mo("Ureaplasmium urealytica")
#' as.mo("Ureaplazma urealitycium")
#'
#' as.mo("Streptococcus group A")
#' as.mo("GAS") # Group A Streptococci
#' as.mo("GBS") # Group B Streptococci
#'
#' as.mo("S. epidermidis")                 # will remain species: B_STPHY_EPDR
#' as.mo("S. epidermidis", Becker = TRUE)  # will not remain species: B_STPHY_CONS
#'
#' as.mo("S. pyogenes")                    # will remain species: B_STRPT_PYGN
#' as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRPA
#'
#' # All mo_* functions use as.mo() internally too (see ?mo_property):
#' mo_genus("E. coli")                           # returns "Escherichia"
#' mo_gramstain("E. coli")                       # returns "Gram negative"
#' mo_is_intrinsic_resistant("E. coli", "vanco") # returns TRUE
#' }
as.mo <- function(x,
                  Becker = FALSE,
                  Lancefield = FALSE,
                  allow_uncertain = TRUE,
                  reference_df = get_mo_source(),
                  ignore_pattern = getOption("AMR_ignore_pattern"),
                  language = get_locale(),
                  ...) {
  meet_criteria(x, allow_class = c("mo", "data.frame", "list", "character", "numeric", "integer", "factor"), allow_NA = TRUE)
  meet_criteria(Becker, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(Lancefield, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(allow_uncertain, allow_class = c("logical", "numeric", "integer"), has_length = 1)
  meet_criteria(reference_df, allow_class = "data.frame", allow_NULL = TRUE)
  meet_criteria(ignore_pattern, allow_class = "character", has_length = 1, allow_NULL = TRUE)
  meet_criteria(language, has_length = 1, is_in = c(LANGUAGES_SUPPORTED, ""), allow_NULL = TRUE, allow_NA = TRUE)

  check_dataset_integrity()

  if (tryCatch(all(x[!is.na(x)] %in% MO_lookup$mo)
               && isFALSE(Becker)
               && isFALSE(Lancefield), error = function(e) FALSE)) {
    # don't look into valid MO codes, just return them
    # is.mo() won't work - MO codes might change between package versions
    return(set_clean_class(x, new_class = c("mo", "character")))
  }

  # start by replacing language-specific non-ASCII characters with ASCII characters
  x <- parse_and_convert(x)
  # replace mo codes used in older package versions
  x <- replace_old_mo_codes(x, property = "mo")
  # ignore cases that match the ignore pattern
  x <- replace_ignore_pattern(x, ignore_pattern)

  # WHONET: xxx = no growth
  x[tolower(as.character(paste0(x, ""))) %in% c("", "xxx", "na", "nan")] <- NA_character_
  # Laboratory systems: remove (translated) entries like "no growth", etc.
  x[trimws2(x) %like% translate_AMR("no .*growth", language = language)] <- NA_character_
  x[trimws2(x) %like% paste0("^(", translate_AMR("no|not", language = language), ") [a-z]+")] <- "UNKNOWN"
  uncertainty_level <- translate_allow_uncertain(allow_uncertain)
  
  if (tryCatch(all(x == "" | gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE) %in% MO_lookup$fullname_lower, na.rm = TRUE)
               && isFALSE(Becker)
               && isFALSE(Lancefield), error = function(e) FALSE)) {
    # to improve speed, special case for taxonomically correct full names (case-insensitive)
    return(MO_lookup[match(gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE), MO_lookup$fullname_lower), "mo", drop = TRUE])
  }

  if (!is.null(reference_df)
      && check_validity_mo_source(reference_df)
      && isFALSE(Becker)
      && isFALSE(Lancefield)
      && all(x %in% unlist(reference_df), na.rm = TRUE)) {

    reference_df <- repair_reference_df(reference_df)
    suppressWarnings(
      y <- data.frame(x = x, stringsAsFactors = FALSE) %pm>%
        pm_left_join(reference_df, by = "x") %pm>%
        pm_pull(mo) 
    )

  } else if (all(x[!is.na(x)] %in% MO_lookup$mo)
             && isFALSE(Becker)
             && isFALSE(Lancefield)) {
    y <- x

  } else {
    # will be checked for mo class in validation and uses exec_as.mo internally if necessary
    y <- mo_validate(x = x, property = "mo",
                     Becker = Becker, Lancefield = Lancefield,
                     allow_uncertain = uncertainty_level,
                     reference_df = reference_df,
                     ignore_pattern = ignore_pattern,
                     language = language,
                     ...)
  }

  set_clean_class(y,
                  new_class = c("mo", "character"))
}

#' @rdname as.mo
#' @export
is.mo <- function(x) {
  inherits(x, "mo")
}

# param property a column name of microorganisms
# param initial_search logical - is FALSE when coming from uncertain tries, which uses exec_as.mo internally too
# param dyslexia_mode logical - also check for characters that resemble others
# param debug logical - show different lookup texts while searching
# param reference_data_to_use data.frame - the data set to check for
# param actual_uncertainty - (only for initial_search = FALSE) the actual uncertainty level used in the function for score calculation (sometimes passed as 2 or 3 by uncertain_fn())
# param actual_input - (only for initial_search = FALSE) the actual, original input
# param language - used for translating "no growth", etc.
exec_as.mo <- function(x,
                       Becker = FALSE,
                       Lancefield = FALSE,
                       allow_uncertain = TRUE,
                       reference_df = get_mo_source(),
                       property = "mo",
                       initial_search = TRUE,
                       dyslexia_mode = FALSE,
                       debug = FALSE,
                       ignore_pattern = getOption("AMR_ignore_pattern"),
                       reference_data_to_use = MO_lookup,
                       actual_uncertainty = 1,
                       actual_input = NULL,
                       language = get_locale()) {
  meet_criteria(x, allow_class = c("mo", "data.frame", "list", "character", "numeric", "integer", "factor"), allow_NA = TRUE)
  meet_criteria(Becker, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(Lancefield, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(allow_uncertain, allow_class = c("logical", "numeric", "integer"), has_length = 1)
  meet_criteria(reference_df, allow_class = "data.frame", allow_NULL = TRUE)
  meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(microorganisms))
  meet_criteria(initial_search, allow_class = "logical", has_length = 1)
  meet_criteria(dyslexia_mode, allow_class = "logical", has_length = 1)
  meet_criteria(debug, allow_class = "logical", has_length = 1)
  meet_criteria(ignore_pattern, allow_class = "character", has_length = 1, allow_NULL = TRUE)
  meet_criteria(reference_data_to_use, allow_class = "data.frame")
  meet_criteria(actual_uncertainty, allow_class = "numeric", has_length = 1)
  meet_criteria(actual_input, allow_class = "character", allow_NULL = TRUE)
  meet_criteria(language, has_length = 1, is_in = c(LANGUAGES_SUPPORTED, ""), allow_NULL = TRUE, allow_NA = TRUE)

  check_dataset_integrity()
  
  if (isTRUE(debug) && initial_search == TRUE) {
    time_start_tracking()
  }
  
  lookup <- function(needle,
                     column = property,
                     haystack = reference_data_to_use,
                     n = 1,
                     debug_mode = debug,
                     initial = initial_search,
                     uncertainty = actual_uncertainty,
                     input_actual = actual_input) {

    if (!is.null(input_actual)) {
      input <- input_actual
    } else {
      input <- tryCatch(x_backup[i], error = function(e) "")
    }

    # `column` can be NULL for all columns, or a selection
    # returns a character (vector) - if `column` > length 1 then with columns as names
    if (isTRUE(debug_mode)) {
      cat(font_silver("Looking up: ", substitute(needle), collapse = ""), 
          "\n           ", time_track())
    }
    if (length(column) == 1) {
      res_df <- haystack[which(eval(substitute(needle), envir = haystack, enclos = parent.frame())), , drop = FALSE]
      if (NROW(res_df) > 1 & uncertainty != -1) {
        # sort the findings on matching score
        scores <- mo_matching_score(x = input,
                                    n = res_df[, "fullname", drop = TRUE])
        res_df <- res_df[order(scores, decreasing = TRUE), , drop = FALSE]
      }
      res <- as.character(res_df[, column, drop = TRUE])
      if (length(res) == 0) {
        if (isTRUE(debug_mode)) {
          cat(font_red(" (no match)\n"))
        }
        NA_character_
      } else {
        if (isTRUE(debug_mode)) {
          cat(font_green(paste0(" MATCH (", NROW(res_df), " results)\n")))
        }
        if ((length(res) > n | uncertainty > 1) & uncertainty != -1) {
          # save the other possible results as well, but not for forced certain results (then uncertainty == -1)
          uncertainties <<- rbind(uncertainties,
                                  format_uncertainty_as_df(uncertainty_level = uncertainty,
                                                           input = input,
                                                           result_mo = res_df[1, "mo", drop = TRUE],
                                                           candidates = as.character(res_df[, "fullname", drop = TRUE])),
                                  stringsAsFactors = FALSE)
        }
        res[seq_len(min(n, length(res)))]
      }
    } else {
      if (is.null(column)) {
        column <- names(haystack)
      }
      res <- haystack[which(eval(substitute(needle), envir = haystack, enclos = parent.frame())), , drop = FALSE]
      res <- res[seq_len(min(n, nrow(res))), column, drop = TRUE]
      if (NROW(res) == 0) {
        if (isTRUE(debug_mode)) {
          cat(font_red(" (no rows)\n"))
        }
        res <- rep(NA_character_, length(column))
      } else {
        if (isTRUE(debug_mode)) {
          cat(font_green(paste0(" MATCH (", NROW(res), " rows)\n")))
        }
      }
      res <- as.character(res)
      names(res) <- column
      res
    }
  }

  # start by replacing language-specific non-ASCII characters with ASCII characters
  x <- parse_and_convert(x)
  # replace mo codes used in older package versions
  x <- replace_old_mo_codes(x, property)
  # ignore cases that match the ignore pattern
  x <- replace_ignore_pattern(x, ignore_pattern)

  # WHONET: xxx = no growth
  x[tolower(as.character(paste0(x, ""))) %in% c("", "xxx", "na", "nan")] <- NA_character_
  # Laboratory systems: remove (translated) entries like "no growth", etc.
  x[trimws2(x) %like% translate_AMR("no .*growth", language = language)] <- NA_character_
  x[trimws2(x) %like% paste0("^(", translate_AMR("no|not", language = language), ") [a-z]+")] <- "UNKNOWN"

  if (initial_search == TRUE) {
    # keep track of time - give some hints to improve speed if it takes a long time
    start_time <- Sys.time()
    
    pkg_env$mo_failures <- NULL
    pkg_env$mo_uncertainties <- NULL
    pkg_env$mo_renamed <- NULL
  }
  pkg_env$mo_renamed_last_run <- NULL

  failures <- character(0)
  uncertainty_level <- translate_allow_uncertain(allow_uncertain)
  uncertainties <- data.frame(uncertainty = integer(0),
                              input = character(0),
                              fullname = character(0),
                              renamed_to = character(0),
                              mo = character(0),
                              candidates = character(0),
                              stringsAsFactors = FALSE)

  x_input <- x
  # already strip leading and trailing spaces
  x <- trimws(x)
  # only check the uniques, which is way faster
  x <- unique(x)
  # remove empty values (to later fill them in again with NAs)
  # ("xxx" is WHONET code for 'no growth')
  x <- x[!is.na(x)
         & !is.null(x)
         & !identical(x, "")
         & !identical(x, "xxx")]

  # user-defined reference data.frame to check against
  if (!is.null(reference_df)) {
    check_validity_mo_source(reference_df)
    reference_df <- repair_reference_df(reference_df)
  }
  
  # all empty
  if (all(identical(trimws(x_input), "") | is.na(x_input) | length(x) == 0)) {
    if (property == "mo") {
      return(set_clean_class(rep(NA_character_, length(x_input)),
                             new_class = c("mo", "character")))
    } else {
      return(rep(NA_character_, length(x_input)))
    }

  } else if (all(x %in% reference_df[, 1][[1]])) {
    # all in reference df
    colnames(reference_df)[1] <- "x"
    suppressWarnings(
      x <- MO_lookup[match(reference_df[match(x, reference_df$x), "mo", drop = TRUE], MO_lookup$mo), property, drop = TRUE]
    )

  } else if (all(x %in% reference_data_to_use$mo)) {
    x <- MO_lookup[match(x, MO_lookup$mo), property, drop = TRUE]

  } else if (all(tolower(x) %in% reference_data_to_use$fullname_lower)) {
    # we need special treatment for very prevalent full names, they are likely!
    # e.g. as.mo("Staphylococcus aureus")
    x <- MO_lookup[match(tolower(x), MO_lookup$fullname_lower), property, drop = TRUE]

  } else if (all(x %in% reference_data_to_use$fullname)) {
    # we need special treatment for very prevalent full names, they are likely!
    # e.g. as.mo("Staphylococcus aureus")
    x <- MO_lookup[match(x, MO_lookup$fullname), property, drop = TRUE]

  } else if (all(toupper(x) %in% microorganisms.codes$code)) {
    # commonly used MO codes
    x <- MO_lookup[match(microorganisms.codes[match(toupper(x),
                                                    microorganisms.codes$code),
                                              "mo",
                                              drop = TRUE],
                         MO_lookup$mo),
                   property,
                   drop = TRUE]

  } else if (!all(x %in% microorganisms[, property])) {

    strip_whitespace <- function(x, dyslexia_mode) {
      # all whitespace (tabs, new lines, etc.) should be collapsed into one space
      # and leading and trailing spaces should be removed
      trimmed <- trimws2(x)
      # also, make sure the trailing and leading characters are a-z or 0-9
      # in case of non-regex
      if (dyslexia_mode == FALSE) {
        trimmed <- gsub("^[^a-zA-Z0-9)(]+", "", trimmed, perl = TRUE)
        trimmed <- gsub("[^a-zA-Z0-9)(]+$", "", trimmed, perl = TRUE)
      }
      trimmed
    }

    x_backup_untouched <- x
    x <- strip_whitespace(x, dyslexia_mode)
    # translate 'unknown' names back to English
    if (any(x %like% "unbekannt|onbekend|desconocid|sconosciut|inconnu|desconhecid", na.rm = TRUE)) {
      trns <- subset(translations_file, pattern %like% "unknown" | affect_mo_name == TRUE)
      langs <- LANGUAGES_SUPPORTED[LANGUAGES_SUPPORTED != "en"]
      for (l in langs) {
        for (i in seq_len(nrow(trns))) {
          if (!is.na(trns[i, l, drop = TRUE])) {
            x <- gsub(pattern = trns[i, l, drop = TRUE],
                      replacement = trns$pattern[i],
                      x = x,
                      ignore.case = TRUE,
                      perl = TRUE)
          }
        }
      }
    }
    
    x_backup <- x
    
    # from here on case-insensitive
    x <- tolower(x)
    
    x_backup[x %like_case% "^(fungus|fungi)$"] <- "(unknown fungus)" # will otherwise become the kingdom
    x_backup[x_backup_untouched == "Fungi"] <- "Fungi" # is literally the kingdom
    
    # Fill in fullnames and MO codes at once
    known_names <- tolower(x_backup) %in% MO_lookup$fullname_lower
    x[known_names] <- MO_lookup[match(tolower(x_backup)[known_names], MO_lookup$fullname_lower), property, drop = TRUE]
    known_codes <- toupper(x_backup) %in% MO_lookup$mo
    x[known_codes] <- MO_lookup[match(toupper(x_backup)[known_codes], MO_lookup$mo), property, drop = TRUE]
    already_known <- known_names | known_codes

    # now only continue where the right taxonomic output is not already known
    if (any(!already_known)) {
      x_known <- x[already_known]

      # remove spp and species
      x <- gsub(" +(spp.?|ssp.?|sp.? |ss ?.?|subsp.?|subspecies|biovar |serovar |species)", " ", x)
      x <- gsub("(spp.?|subsp.?|subspecies|biovar|serovar|species)", "", x)
      x <- gsub("^([a-z]{2,4})(spe.?)$", "\\1", x, perl = TRUE) # when ending in SPE instead of SPP and preceded by 2-4 characters
      x <- strip_whitespace(x, dyslexia_mode)
      
      x_backup_without_spp <- x
      x_species <- paste(x, "species")
      # translate to English for supported languages of mo_property
      x <- gsub("(gruppe|groep|grupo|gruppo|groupe)", "group", x, perl = TRUE)
      # no groups and complexes as ending
      x <- gsub("(complex|group)$", "", x, perl = TRUE)
      x <- gsub("(^|[^a-z])((an)?aero+b)[a-z]*", "", x, perl = TRUE)
      x <- gsub("^atyp[a-z]*", "", x, perl = TRUE)
      x <- gsub("(vergroen)[a-z]*", "viridans", x, perl = TRUE)
      x <- gsub("[a-z]*diff?erent[a-z]*", "", x, perl = TRUE)
      x <- gsub("(hefe|gist|gisten|levadura|lievito|fermento|levure)[a-z]*", "yeast", x, perl = TRUE)
      x <- gsub("(schimmels?|mofo|molde|stampo|moisissure|fungi)[a-z]*", "fungus", x, perl = TRUE)
      x <- gsub("fungus[ph|f]rya", "fungiphrya", x, perl = TRUE)
      # no contamination
      x <- gsub("(contamination|kontamination|mengflora|contaminaci.n|contamina..o)", "", x, perl = TRUE)
      # remove non-text in case of "E. coli" except dots and spaces
      x <- trimws(gsub("[^.a-zA-Z0-9/ \\-]+", " ", x, perl = TRUE))
      # but make sure that dots are followed by a space
      x <- gsub("[.] ?", ". ", x, perl = TRUE)
      # replace minus by a space
      x <- gsub("-+", " ", x, perl = TRUE)
      # replace hemolytic by haemolytic
      x <- gsub("ha?emoly", "haemoly", x, perl = TRUE)
      # place minus back in streptococci
      x <- gsub("(alpha|beta|gamma).?ha?emoly", "\\1-haemoly", x, perl = TRUE)
      # remove genus as first word
      x <- gsub("^genus ", "", x, perl = TRUE)
      # remove 'uncertain'-like texts
      x <- trimws(gsub("(uncertain|susp[ie]c[a-z]+|verdacht)", "", x, perl = TRUE))
      # allow characters that resemble others = dyslexia_mode ----
      if (dyslexia_mode == TRUE) {
        x <- tolower(x)
        x <- gsub("[iy]+", "[iy]+", x)
        x <- gsub("(c|k|q|qu|s|z|x|ks)+", "(c|k|q|qu|s|z|x|ks)+", x)
        x <- gsub("(ph|hp|f|v)+", "(ph|hp|f|v)+", x)
        x <- gsub("(th|ht|t)+", "(th|ht|t)+", x)
        x <- gsub("a+", "a+", x)
        x <- gsub("u+", "u+", x)
        # allow any ending of -um, -us, -ium, -icum, -ius, -icus, -ica, -ia and -a (needs perl for the negative lookahead):
        x <- gsub("(u\\+\\(c\\|k\\|q\\|qu\\+\\|s\\|z\\|x\\|ks\\)\\+)(?![a-z])",
                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
        x <- gsub("(\\[iy\\]\\+\\(c\\|k\\|q\\|qu\\+\\|s\\|z\\|x\\|ks\\)\\+a\\+)(?![a-z])",
                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
        x <- gsub("(\\[iy\\]\\+u\\+m)(?![a-z])",
                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
        x <- gsub("(\\[iy\\]\\+a\\+)(?![a-z])",
                  "([iy]*a+|[iy]+a*)", x, perl = TRUE)
        x <- gsub("e+", "e+", x)
        x <- gsub("o+", "o+", x)
        x <- gsub("(.)\\1+", "\\1+", x)
        # allow repetition of all other consonants
        x <- gsub("([bdgjlnrw]+)", "\\1+", x, perl = TRUE)
        # allow ending in -en or -us
        x <- gsub("e\\+n(?![a-z[])", "(e+n|u+(c|k|q|qu|s|z|x|ks)+)", x, perl = TRUE)
        # if the input is longer than 10 characters, allow any forgotten consonant between all characters, as some might just have forgotten one...
        # this will allow "Pasteurella damatis" to be correctly read as "Pasteurella dagmatis".
        consonants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "")
        x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", consonants, "]?"), x[nchar(x_backup_without_spp) > 10])
        # allow au and ou after all above regex implementations
        x <- gsub("a+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE)
        x <- gsub("o+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE)
      }
      x <- strip_whitespace(x, dyslexia_mode)
      # make sure to remove regex overkill (will lead to errors)
      x <- gsub("++", "+", x, fixed = TRUE)
      x <- gsub("?+", "?", x, fixed = TRUE)
      
      x_trimmed <- x
      x_trimmed_species <- paste(x_trimmed, "species")
      x_trimmed_without_group <- gsub(" gro.u.p$", "", x_trimmed, perl = TRUE)
      # remove last part from "-" or "/"
      x_trimmed_without_group <- gsub("(.*)[-/].*", "\\1", x_trimmed_without_group)
      # replace space and dot by regex sign
      x_withspaces <- gsub("[ .]+", ".* ", x, perl = TRUE)
      x <- gsub("[ .]+", ".*", x, perl = TRUE)
      # add start and stop regex anchors
      x <- paste0("^", x, "$")
      
      x_withspaces_start_only <- paste0("^", x_withspaces)
      x_withspaces_end_only <- paste0(x_withspaces, "$")
      x_withspaces_start_end <- paste0("^", x_withspaces, "$")
      
      if (isTRUE(debug)) {
        cat(paste0(font_blue("x"), '                       "', x, '"\n'))
        cat(paste0(font_blue("x_species"), '               "', x_species, '"\n'))
        cat(paste0(font_blue("x_withspaces_start_only"), ' "', x_withspaces_start_only, '"\n'))
        cat(paste0(font_blue("x_withspaces_end_only"), '   "', x_withspaces_end_only, '"\n'))
        cat(paste0(font_blue("x_withspaces_start_end"), '  "', x_withspaces_start_end, '"\n'))
        cat(paste0(font_blue("x_backup"), '                "', x_backup, '"\n'))
        cat(paste0(font_blue("x_backup_without_spp"), '    "', x_backup_without_spp, '"\n'))
        cat(paste0(font_blue("x_trimmed"), '               "', x_trimmed, '"\n'))
        cat(paste0(font_blue("x_trimmed_species"), '       "', x_trimmed_species, '"\n'))
        cat(paste0(font_blue("x_trimmed_without_group"), ' "', x_trimmed_without_group, '"\n'))
      }
      
      if (initial_search == TRUE) {
        progress <- progress_ticker(n = length(x[!already_known]), n_min = 25) # start if n >= 25
        on.exit(close(progress))
      }
      
      for (i in which(!already_known)) {
        
        if (initial_search == TRUE) {
          progress$tick()
        }
        
        # valid MO code ----
        found <- lookup(mo == toupper(x_backup[i]))
        if (!is.na(found)) {
          x[i] <- found[1L]
          next
        }
        
        # valid fullname ----
        found <- lookup(fullname_lower %in% gsub("[^a-zA-Z0-9_. -]", "", tolower(c(x_backup[i], x_backup_without_spp[i])), perl = TRUE))
        # added the gsub() for "(unknown fungus)", since fullname_lower does not contain brackets
        if (!is.na(found)) {
          x[i] <- found[1L]
          next
        }
        
        # old fullname ----
        found <- lookup(fullname_lower %in% tolower(c(x_backup[i], x_backup_without_spp[i])),
                        column = NULL, # all columns
                        haystack = MO.old_lookup)
        if (!all(is.na(found))) {
          # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
          # mo_ref() of "Chlamydia psittaci" will be "Page, 1968" (with warning)
          # mo_ref() of "Chlamydophila psittaci" will be "Everett et al., 1999"
          if (property == "ref") {
            x[i] <- found["ref"]
          } else {
            x[i] <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
          }
          pkg_env$mo_renamed_last_run <- found["fullname"]
          was_renamed(name_old = found["fullname"],
                      name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
                      ref_old = found["ref"],
                      ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
                      mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
          next
        }
        
        if (x_backup[i] %like_case% "\\(unknown [a-z]+\\)" | tolower(x_backup_without_spp[i]) %in% c("other", "none", "unknown")) {
          # empty and nonsense values, ignore without warning
          x[i] <- lookup(mo == "UNKNOWN")
          next
        }
        
        # exact SNOMED code ----
        if (x_backup[i] %like_case% "^[0-9]+$") {
          snomed_found <- unlist(lapply(reference_data_to_use$snomed,
                                        function(s) x_backup[i] %in% s))
          if (sum(snomed_found, na.rm = TRUE) > 0) {
            found <- reference_data_to_use[snomed_found == TRUE, property][[1]]
            if (!is.na(found)) {
              x[i] <- found[1L]
              next
            }
          }
        }
        
        # very probable: is G. species ----
        found <- lookup(g_species %in% gsub("[^a-z0-9/ \\-]+", "",
                                            tolower(c(x_backup[i], x_backup_without_spp[i])), perl = TRUE))
        if (!is.na(found)) {
          x[i] <- found[1L]
          next
        }
        
        # WHONET and other common LIS codes ----
        found <- microorganisms.codes[which(microorganisms.codes$code %in% toupper(c(x_backup_untouched[i], x_backup[i], x_backup_without_spp[i]))), "mo", drop = TRUE][1L]
        if (!is.na(found)) {
          x[i] <- lookup(mo == found)
          next
        }
        
        # user-defined reference ----
        if (!is.null(reference_df)) {
          if (x_backup[i] %in% reference_df[, 1]) {
            # already checked integrity of reference_df, all MOs are valid
            ref_mo <- reference_df[reference_df[, 1] == x_backup[i], "mo"][[1L]]
            x[i] <- lookup(mo == ref_mo)
            next
          }
        }
        
        # WHONET: xxx = no growth
        if (tolower(as.character(paste0(x_backup_without_spp[i], ""))) %in% c("", "xxx", "na", "nan")) {
          x[i] <- NA_character_
          next
        }
        
        # check for very small input, but ignore the O antigens of E. coli
        if (nchar(gsub("[^a-zA-Z]", "", x_trimmed[i])) < 3
            & !toupper(x_backup_without_spp[i]) %like_case% "O?(26|103|104|111|121|145|157)") {
          # fewer than 3 chars and not looked for species, add as failure
          x[i] <- lookup(mo == "UNKNOWN")
          if (initial_search == TRUE) {
            failures <- c(failures, x_backup[i])
          }
          next
        }
        
        if (x_backup_without_spp[i] %like_case% "(virus|viridae)") {
          # there is no fullname like virus or viridae, so don't try to coerce it
          x[i] <- NA_character_
          next
        }
        
        # translate known trivial abbreviations to genus + species ----
        if (toupper(x_backup_without_spp[i]) %in% c("MRSA", "MSSA", "VISA", "VRSA", "BORSA", "GISA")
            | x_backup_without_spp[i] %like_case% "(^| )(mrsa|mssa|visa|vrsa|borsa|gisa|la-?mrsa|ca-?mrsa)( |$)") {
          x[i] <- lookup(fullname == "Staphylococcus aureus", uncertainty = -1)
          next
        }
        if (toupper(x_backup_without_spp[i]) %in% c("MRSE", "MSSE")
            | x_backup_without_spp[i] %like_case% "(^| )(mrse|msse)( |$)") {
          x[i] <- lookup(fullname == "Staphylococcus epidermidis", uncertainty = -1)
          next
        }
        if (toupper(x_backup_without_spp[i]) == "VRE"
            | x_backup_without_spp[i] %like_case% "(^| )vre "
            | x_backup_without_spp[i] %like_case% "(enterococci|enterokok|enterococo)[a-z]*?$")  {
          x[i] <- lookup(genus == "Enterococcus", uncertainty = -1)
          next
        }
        # support for:
        # - AIEC (Adherent-Invasive E. coli)
        # - ATEC (Atypical Entero-pathogenic E. coli)
        # - DAEC (Diffusely Adhering E. coli)
        # - EAEC (Entero-Aggregative E. coli)
        # - EHEC (Entero-Haemorrhagic E. coli)
        # - EIEC (Entero-Invasive E. coli)
        # - EPEC (Entero-Pathogenic E. coli)
        # - ETEC (Entero-Toxigenic E. coli)
        # - NMEC (Neonatal Meningitis-causing E. coli)
        # - STEC (Shiga-toxin producing E. coli)
        # - UPEC (Uropathogenic E. coli)
        if (toupper(x_backup_without_spp[i]) %in% c("AIEC", "ATEC", "DAEC", "EAEC", "EHEC", "EIEC", "EPEC", "ETEC", "NMEC", "STEC", "UPEC")
            # also support O-antigens of E. coli: O26, O103, O104, O111, O121, O145, O157
            | x_backup_without_spp[i] %like_case% "o?(26|103|104|111|121|145|157)") {
          x[i] <- lookup(fullname == "Escherichia coli", uncertainty = -1)
          next
        }
        if (toupper(x_backup_without_spp[i]) == "MRPA"
            | x_backup_without_spp[i] %like_case% "(^| )mrpa( |$)") {
          # multi resistant P. aeruginosa
          x[i] <- lookup(fullname == "Pseudomonas aeruginosa", uncertainty = -1)
          next
        }
        if (toupper(x_backup_without_spp[i]) == "CRSM") {
          # co-trim resistant S. maltophilia
          x[i] <- lookup(fullname == "Stenotrophomonas maltophilia", uncertainty = -1)
          next
        }
        if (toupper(x_backup_without_spp[i]) %in% c("PISP", "PRSP", "VISP", "VRSP")
            | x_backup_without_spp[i] %like_case% "(^| )(pisp|prsp|visp|vrsp)( |$)") {
          # peni I, peni R, vanco I, vanco R: S. pneumoniae
          x[i] <- lookup(fullname == "Streptococcus pneumoniae", uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "^g[abcdfghk]s$") {
          # Streptococci, like GBS = Group B Streptococci (B_STRPT_GRPB)
          x[i] <- lookup(mo == toupper(gsub("g([abcdfghk])s",
                                            "B_STRPT_GRP\\1",
                                            x_backup_without_spp[i],
                                            perl = TRUE)), uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "(streptococ|streptokok).* [abcdfghk]$") {
          # Streptococci in different languages, like "estreptococos grupo B"
          x[i] <- lookup(mo == toupper(gsub(".*(streptococ|streptokok|estreptococ).* ([abcdfghk])$",
                                            "B_STRPT_GRP\\2",
                                            x_backup_without_spp[i],
                                            perl = TRUE)), uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "group [abcdfghk] (streptococ|streptokok|estreptococ)") {
          # Streptococci in different languages, like "Group A Streptococci"
          x[i] <- lookup(mo == toupper(gsub(".*group ([abcdfghk]) (streptococ|streptokok|estreptococ).*",
                                            "B_STRPT_GRP\\1",
                                            x_backup_without_spp[i],
                                            perl = TRUE)), uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "haemoly.*strep") {
          # Haemolytic streptococci in different languages
          x[i] <- lookup(mo == "B_STRPT_HAEM", uncertainty = -1)
          next
        }
        # CoNS/CoPS in different languages (support for German, Dutch, Spanish, Portuguese)
        if (x_backup_without_spp[i] %like_case% "[ck]oagulas[ea] negatie?[vf]"
            | x_trimmed[i] %like_case% "[ck]oagulas[ea] negatie?[vf]"
            | x_backup_without_spp[i] %like_case% "[ck]o?ns[^a-z]?$") {
          # coerce S. coagulase negative
          x[i] <- lookup(mo == "B_STPHY_CONS", uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "[ck]oagulas[ea] positie?[vf]"
            | x_trimmed[i] %like_case% "[ck]oagulas[ea] positie?[vf]"
            | x_backup_without_spp[i] %like_case% "[ck]o?ps[^a-z]?$") {
          # coerce S. coagulase positive
          x[i] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)
          next
        }
        # streptococcal groups: milleri and viridans
        if (x_trimmed[i] %like_case% "strepto.* mil+er+i"
            | x_backup_without_spp[i] %like_case% "strepto.* mil+er+i"
            | x_backup_without_spp[i] %like_case% "mgs[^a-z]?$") {
          # Milleri Group Streptococcus (MGS)
          x[i] <- lookup(mo == "B_STRPT_MILL", uncertainty = -1)
          next
        }
        if (x_trimmed[i] %like_case% "strepto.* viridans"
            | x_backup_without_spp[i] %like_case% "strepto.* viridans"
            | x_backup_without_spp[i] %like_case% "vgs[^a-z]?$") {
          # Viridans Group Streptococcus (VGS)
          x[i] <- lookup(mo == "B_STRPT_VIRI", uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "gram[ -]?neg.*"
            | x_backup_without_spp[i] %like_case% "negatie?[vf]"
            | x_trimmed[i] %like_case% "gram[ -]?neg.*") {
          # coerce Gram negatives
          x[i] <- lookup(mo == "B_GRAMN", uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "gram[ -]?pos.*"
            | x_backup_without_spp[i] %like_case% "positie?[vf]"
            | x_trimmed[i] %like_case% "gram[ -]?pos.*") {
          # coerce Gram positives
          x[i] <- lookup(mo == "B_GRAMP", uncertainty = -1)
          next
        }
        if (x_backup_without_spp[i] %like_case% "mycoba[ck]teri.[nm]?$") {
          # coerce mycobacteria in multiple languages
          x[i] <- lookup(genus == "Mycobacterium", uncertainty = -1)
          next
        }
        
        if (x_backup_without_spp[i] %like_case% "salmonella [a-z]+ ?.*") {
          if (x_backup_without_spp[i] %like_case% "salmonella group") {
            # Salmonella Group A to Z, just return S. species for now
            x[i] <- lookup(genus == "Salmonella", uncertainty = -1)
            next
          } else if (x_backup[i] %like_case% "[sS]almonella [A-Z][a-z]+ ?.*" &
                     !x_backup[i] %like% "t[iy](ph|f)[iy]") {
            # Salmonella with capital letter species like "Salmonella Goettingen" - they're all S. enterica
            # except for S. typhi, S. paratyphi, S. typhimurium
            x[i] <- lookup(fullname == "Salmonella enterica", uncertainty = -1)
            uncertainties <- rbind(uncertainties,
                                   format_uncertainty_as_df(uncertainty_level = 1,
                                                            input = x_backup[i],
                                                            result_mo = lookup(fullname == "Salmonella enterica", "mo", uncertainty = -1)),
                                   stringsAsFactors = FALSE)
            next
          }
        }
        
        # trivial names known to the field:
        if ("meningococcus" %like_case% x_trimmed[i]) {
          # coerce Neisseria meningitidis
          x[i] <- lookup(fullname == "Neisseria meningitidis", uncertainty = -1)
          next
        }
        if ("gonococcus" %like_case% x_trimmed[i]) {
          # coerce Neisseria gonorrhoeae
          x[i] <- lookup(fullname == "Neisseria gonorrhoeae", uncertainty = -1)
          next
        }
        if ("pneumococcus" %like_case% x_trimmed[i]) {
          # coerce Streptococcus pneumoniae
          x[i] <- lookup(fullname == "Streptococcus pneumoniae", uncertainty = -1)
          next
        }
        
        if (x_backup[i] %in% pkg_env$mo_failed) {
          # previously failed already in this session ----
          # (at this point the latest reference_df has also been checked)
          x[i] <- lookup(mo == "UNKNOWN")
          if (initial_search == TRUE) {
            failures <- c(failures, x_backup[i])
          }
          next
        }
        
        # NOW RUN THROUGH DIFFERENT PREVALENCE LEVELS
        check_per_prevalence <- function(data_to_check,
                                         data.old_to_check,
                                         a.x_backup,
                                         b.x_trimmed,
                                         c.x_trimmed_without_group,
                                         d.x_withspaces_start_end,
                                         e.x_withspaces_start_only,
                                         f.x_withspaces_end_only,
                                         g.x_backup_without_spp,
                                         h.x_species,
                                         i.x_trimmed_species) {
          
          # FIRST TRY FULLNAMES AND CODES ----
          # if only genus is available, return only genus
          
          if (all(!c(x[i], b.x_trimmed) %like_case% " ")) {
            found <- lookup(fullname_lower %in% c(h.x_species, i.x_trimmed_species),
                            haystack = data_to_check)
            if (!is.na(found)) {
              x[i] <- found[1L]
              return(x[i])
            }
            if (nchar(g.x_backup_without_spp) >= 6) {
              found <- lookup(fullname_lower %like_case% paste0("^", unregex(g.x_backup_without_spp), "[a-z]+"),
                              haystack = data_to_check)
              if (!is.na(found)) {
                x[i] <- found[1L]
                return(x[i])
              }
            }
            # rest of genus only is in allow_uncertain part.
          }
          
          # do not allow codes shorter than 4 characters, these were already checked against WHONET codes earlier
          if (nchar(g.x_backup_without_spp) < 4) {
            x[i] <- lookup(mo == "UNKNOWN")
            if (initial_search == TRUE) {
              failures <- c(failures, a.x_backup)
            }
            return(x[i])
          }
          
          # try probable: trimmed version of fullname ----
          found <- lookup(fullname_lower %in% tolower(g.x_backup_without_spp),
                          haystack = data_to_check)
          if (!is.na(found)) {
            return(found[1L])
          }
          
          # try any match keeping spaces ----
          if (nchar(g.x_backup_without_spp) >= 6) {
            found <- lookup(fullname_lower %like_case% d.x_withspaces_start_end,
                            haystack = data_to_check)
            if (!is.na(found)) {
              return(found[1L])
            }
          }
          
          # try any match keeping spaces, not ending with $ ----
          found <- lookup(fullname_lower %like_case% paste0(trimws(e.x_withspaces_start_only), " "),
                          haystack = data_to_check)
          if (!is.na(found)) {
            return(found[1L])
          }
          if (nchar(g.x_backup_without_spp) >= 6) {
            found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
                            haystack = data_to_check)
            if (!is.na(found)) {
              return(found[1L])
            }
          }
          
          # try any match keeping spaces, not starting with ^ ----
          found <- lookup(fullname_lower %like_case% paste0(" ", trimws(f.x_withspaces_end_only)),
                          haystack = data_to_check)
          if (!is.na(found)) {
            return(found[1L])
          }
          
          # try a trimmed version
          if (nchar(g.x_backup_without_spp) >= 6) {
            found <- lookup(fullname_lower %like_case% b.x_trimmed |
                              fullname_lower %like_case% c.x_trimmed_without_group,
                            haystack = data_to_check)
            if (!is.na(found)) {
              return(found[1L])
            }
          }
          
          
          # try splitting of characters in the middle and then find ID ----
          # only when text length is 6 or lower
          # like esco = E. coli, klpn = K. pneumoniae, stau = S. aureus, staaur = S. aureus
          if (nchar(g.x_backup_without_spp) <= 6) {
            x_length <- nchar(g.x_backup_without_spp)
            x_split <- paste0("^",
                              g.x_backup_without_spp %pm>% substr(1, x_length / 2),
                              ".* ",
                              g.x_backup_without_spp %pm>% substr((x_length / 2) + 1, x_length))
            found <- lookup(fullname_lower %like_case% x_split,
                            haystack = data_to_check)
            if (!is.na(found)) {
              return(found[1L])
            }
          }
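          # illustration only (a sketch, not executed here) of the split above for the input "esco":
          #   paste0("^", substr("esco", 1, 2), ".* ", substr("esco", 3, 4))
          #   # gives "^es.* co", which matches "escherichia coli"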
          
          # try fullname without start and without nchar limit of >= 6 ----
          # like "K. pneu rhino" >> "Klebsiella pneumoniae (rhinoscleromatis)" = KLEPNERH
          found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
                          haystack = data_to_check)
          if (!is.na(found)) {
            return(found[1L])
          }
          
          # MISCELLANEOUS ----
          
          # look for old taxonomic names ----
          found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
                          column = NULL, # all columns
                          haystack = data.old_to_check)
          if (!all(is.na(found))) {
            # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
            # mo_ref() of "Chlamydia psittaci" will be "Page, 1968" (with warning)
            # mo_ref() of "Chlamydophila psittaci" will be "Everett et al., 1999"
            if (property == "ref") {
              x[i] <- found["ref"]
            } else {
              x[i] <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
            }
            pkg_env$mo_renamed_last_run <- found["fullname"]
            was_renamed(name_old = found["fullname"],
                        name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
                        ref_old = found["ref"],
                        ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
                        mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
            return(x[i])
          }
          
          # check for uncertain results ----
          uncertain_fn <- function(a.x_backup,
                                   b.x_trimmed,
                                   d.x_withspaces_start_end,
                                   e.x_withspaces_start_only,
                                   f.x_withspaces_end_only,
                                   g.x_backup_without_spp,
                                   uncertain.reference_data_to_use) {
            
            if (uncertainty_level == 0) {
              # do not allow uncertainties
              return(NA_character_)
            }
            
            # UNCERTAINTY LEVEL 1 ----
            if (uncertainty_level >= 1) {
              now_checks_for_uncertainty_level <- 1
              
              # (1) look again for old taxonomic names, now for G. species ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (1) look again for old taxonomic names, now for G. species\n"))
              }
              if (isTRUE(debug)) {
                message("Running '", d.x_withspaces_start_end, "' and '", e.x_withspaces_start_only, "'")
              }
              found <- lookup(fullname_lower %like_case% d.x_withspaces_start_end |
                                fullname_lower %like_case% e.x_withspaces_start_only,
                              column = NULL, # all columns
                              haystack = data.old_to_check)
              if (!all(is.na(found)) & nchar(g.x_backup_without_spp) >= 6) {
                if (property == "ref") {
                  # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
                  # mo_ref("Chlamydia psittaci") = "Page, 1968" (with warning)
                  # mo_ref("Chlamydophila psittaci") = "Everett et al., 1999"
                  x <- found["ref"]
                } else {
                  x <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
                }
                was_renamed(name_old = found["fullname"],
                            name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
                            ref_old = found["ref"],
                            ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
                            mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
                pkg_env$mo_renamed_last_run <- found["fullname"]
                uncertainties <<- rbind(uncertainties,
                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                 input = a.x_backup,
                                                                 result_mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup)),
                                        stringsAsFactors = FALSE)
                return(x)
              }
              
              # (2) Try with misspelled input ----
              # a rerun with dyslexia_mode = TRUE will use the extensive regex part above
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (2) Try with misspelled input\n"))
              }
              if (isTRUE(debug)) {
                message("Running '", a.x_backup, "'")
              }
              # first try without dyslexia mode
              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 1, actual_input = a.x_backup)))
              if (empty_result(found)) {
                # then with dyslexia mode
                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 1, actual_input = a.x_backup)))
              }
              if (!empty_result(found)) {
                found_result <- found
                uncertainties <<- rbind(uncertainties,
                                        attr(found, which = "uncertainties", exact = TRUE),
                                        stringsAsFactors = FALSE)
                found <- lookup(mo == found)
                return(found)
              }
            }
            
            # UNCERTAINTY LEVEL 2 ----
            if (uncertainty_level >= 2) {
              now_checks_for_uncertainty_level <- 2
              
              # (3) look for genus only, part of name ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (3) look for genus only, part of name\n"))
              }
              if (nchar(g.x_backup_without_spp) > 4 & !b.x_trimmed %like_case% " ") {
                if (!b.x_trimmed %like_case% "^[A-Z][a-z]+") {
                  if (isTRUE(debug)) {
                    message("Running '", paste(b.x_trimmed, "species"), "'")
                  }
                  # not when input is like Genustext, because then Neospora would lead to Actinokineospora
                  found <- lookup(fullname_lower %like_case% paste(b.x_trimmed, "species"),
                                  haystack = uncertain.reference_data_to_use)
                  if (!is.na(found)) {
                    found_result <- found
                    found <- lookup(mo == found)
                    uncertainties <<- rbind(uncertainties,
                                            format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                     input = a.x_backup,
                                                                     result_mo = found_result),
                                            stringsAsFactors = FALSE)
                    return(found)
                  }
                }
              }
              
              # (4) strip values between brackets ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (4) strip values between brackets\n"))
              }
              a.x_backup_stripped <- gsub("( *[(].*[)] *)", " ", a.x_backup, perl = TRUE)
              a.x_backup_stripped <- trimws(gsub(" +", " ", a.x_backup_stripped, perl = TRUE))
              if (isTRUE(debug)) {
                message("Running '", a.x_backup_stripped, "'")
              }
              # first try without dyslexia mode
              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_stripped, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
              if (empty_result(found)) {
                # then with dyslexia mode
                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_stripped, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
              }
              if (!empty_result(found) & nchar(g.x_backup_without_spp) >= 6) {
                found_result <- found
                uncertainties <<- rbind(uncertainties,
                                        attr(found, which = "uncertainties", exact = TRUE),
                                        stringsAsFactors = FALSE)
                found <- lookup(mo == found)
                return(found)
              }
              
              # (5) inverse input ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (5) inverse input\n"))
              }
              a.x_backup_inversed <- paste(rev(unlist(strsplit(a.x_backup, split = " "))), collapse = " ")
              if (isTRUE(debug)) {
                message("Running '", a.x_backup_inversed, "'")
              }
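              # illustration only (a sketch, not executed here) of the inversion above:
              #   paste(rev(unlist(strsplit("aureus staphylococcus", split = " "))), collapse = " ")
              #   # gives "staphylococcus aureus"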
              
              # first try without dyslexia mode
              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_inversed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
              if (empty_result(found)) {
                # then with dyslexia mode
                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_inversed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
              }
              if (!empty_result(found) & nchar(g.x_backup_without_spp) >= 6) {
                found_result <- found
                uncertainties <<- rbind(uncertainties,
                                        attr(found, which = "uncertainties", exact = TRUE),
                                        stringsAsFactors = FALSE)
                found <- lookup(mo == found)
                return(found)
              }
              
              # (6) try to strip off half an element from end and check the remains ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (6) try to strip off half an element from end and check the remains\n"))
              }
              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
              if (length(x_strip) > 1) {
                for (i in seq_len(length(x_strip) - 1)) {
                  lastword <- x_strip[length(x_strip) - i + 1]
                  lastword_half <- substr(lastword, 1, as.integer(nchar(lastword) / 2))
                  # remove the last half of the final remaining term
                  x_strip_collapsed <- paste(c(x_strip[seq_len(length(x_strip) - i)], lastword_half), collapse = " ")
                  if (nchar(x_strip_collapsed) >= 4 & nchar(lastword_half) > 2) {
                    if (isTRUE(debug)) {
                      message("Running '", x_strip_collapsed, "'")
                    }
                    # first try without dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                    if (empty_result(found)) {
                      # then with dyslexia mode
                      found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                    }
                    if (!empty_result(found)) {
                      found_result <- found
                      uncertainties <<- rbind(uncertainties,
                                              attr(found, which = "uncertainties", exact = TRUE),
                                              stringsAsFactors = FALSE)
                      found <- lookup(mo == found)
                      return(found)
                    }
                  }
                }
              }
              # (7) try to strip off one element from end and check the remains ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (7) try to strip off one element from end and check the remains\n"))
              }
              if (length(x_strip) > 1) {
                for (i in seq_len(length(x_strip) - 1)) {
                  x_strip_collapsed <- paste(x_strip[seq_len(length(x_strip) - i)], collapse = " ")
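                  # Illustrative trace (a sketch with a made-up input):
                  #   x_strip = c("Escherichia", "coli", "extra") yields, in this order,
                  #   i = 1: "Escherichia coli"
                  #   i = 2: "Escherichia"
                  #   Both have at least 6 characters, so both would be searched; the first hit is returned.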
                  if (nchar(x_strip_collapsed) >= 6) {
                    if (isTRUE(debug)) {
                      message("Running '", x_strip_collapsed, "'")
                    }
                    # first try without dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                    if (empty_result(found)) {
                      # then with dyslexia mode
                      found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                    }
                    
                    if (!empty_result(found)) {
                      found_result <- found
                      uncertainties <<- rbind(uncertainties,
                                              attr(found, which = "uncertainties", exact = TRUE),
                                              stringsAsFactors = FALSE)
                      found <- lookup(mo == found)
                      return(found)
                    }
                  }
                }
              }
              # (8) check for unknown yeasts/fungi ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (8) check for unknown yeasts/fungi\n"))
              }
              if (b.x_trimmed %like_case% "yeast") {
                found <- "F_YEAST"
                found_result <- found
                found <- lookup(mo == found)
                uncertainties <<- rbind(uncertainties,
                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                 input = a.x_backup,
                                                                 result_mo = found_result),
                                        stringsAsFactors = FALSE)
                return(found)
              }
              if (b.x_trimmed %like_case% "(fungus|fungi)" & !b.x_trimmed %like_case% "fungiphrya") {
                found <- "F_FUNGUS"
                found_result <- found
                found <- lookup(mo == found)
                uncertainties <<- rbind(uncertainties,
                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                 input = a.x_backup,
                                                                 result_mo = found_result),
                                        stringsAsFactors = FALSE)
                return(found)
              }
              # (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome)\n"))
              }
              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
              if (length(x_strip) > 1 & nchar(g.x_backup_without_spp) >= 6) {
                for (i in 2:(length(x_strip))) {
                  x_strip_collapsed <- paste(x_strip[i:length(x_strip)], collapse = " ")
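                  # Illustrative trace (a sketch with a made-up input):
                  #   x_strip = c("xyz", "Escherichia", "coli") yields, in this order,
                  #   i = 2: "Escherichia coli" (contains a space, so a hit is accepted at level 2)
                  #   i = 3: "coli"             (no space, so a hit is not returned here but left to level 3)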
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    # uncertainty level 2 only if searched part contains a space (otherwise it will be found with lvl 3)
                    if (x_strip_collapsed %like_case% " ") {
                      uncertainties <<- rbind(uncertainties,
                                              attr(found, which = "uncertainties", exact = TRUE),
                                              stringsAsFactors = FALSE)
                      found <- lookup(mo == found)
                      return(found)
                    }
                  }
                }
              }
            }
            
            # UNCERTAINTY LEVEL 3 ----
            if (uncertainty_level >= 3) {
              now_checks_for_uncertainty_level <- 3
              
              # (10) try to strip off one element from start and check the remains (any text size) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (10) try to strip off one element from start and check the remains (any text size)\n"))
              }
              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
              if (length(x_strip) > 1 & nchar(g.x_backup_without_spp) >= 6) {
                for (i in 2:(length(x_strip))) {
                  x_strip_collapsed <- paste(x_strip[i:length(x_strip)], collapse = " ")
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    uncertainties <<- rbind(uncertainties,
                                            attr(found, which = "uncertainties", exact = TRUE),
                                            stringsAsFactors = FALSE)
                    found <- lookup(mo == found)
                    return(found)
                  }
                }
              }
              # (11) try to strip off one element from end and check the remains (any text size) ----
              # (this is step (7) again, but without the nchar limit of >= 6)
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (11) try to strip off one element from end and check the remains (any text size)\n"))
              }
              if (length(x_strip) > 1) {
                for (i in seq_len(length(x_strip) - 1)) {
                  x_strip_collapsed <- paste(x_strip[seq_len(length(x_strip) - i)], collapse = " ")
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    uncertainties <<- rbind(uncertainties,
                                            attr(found, which = "uncertainties", exact = TRUE),
                                            stringsAsFactors = FALSE)
                    found <- lookup(mo == found)
                    return(found)
                  }
                }
              }
              
              # (12) part of a name (very unlikely match) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (12) part of a name (very unlikely match)\n"))
              }
              if (isTRUE(debug)) {
                message("Running '", f.x_withspaces_end_only, "'")
              }
              if (nchar(g.x_backup_without_spp) >= 6) {
                found <- lookup(fullname_lower %like_case% f.x_withspaces_end_only, column = "mo")
                if (!is.na(found)) {
                  found_result <- lookup(mo == found)
                  uncertainties <<- rbind(uncertainties,
                                          attr(found, which = "uncertainties", exact = TRUE),
                                          stringsAsFactors = FALSE)
                  found <- lookup(mo == found)
                  return(found)
                }
              }
            }
            
            
            # not found among the uncertain results either
            return(NA_character_)
          }
          
          # uncertain results
          x[i] <- uncertain_fn(a.x_backup = a.x_backup,
                               b.x_trimmed = b.x_trimmed,
                               d.x_withspaces_start_end = d.x_withspaces_start_end,
                               e.x_withspaces_start_only = e.x_withspaces_start_only,
                               f.x_withspaces_end_only = f.x_withspaces_end_only,
                               g.x_backup_without_spp = g.x_backup_without_spp,
                               uncertain.reference_data_to_use = MO_lookup)
          if (!empty_result(x[i])) {
            return(x[i])
          }
          
          # nothing found at all
          return(NA_character_)
        }
        
        # CHECK ALL IN ONE GO ----
        x[i] <- check_per_prevalence(data_to_check = MO_lookup,
                                     data.old_to_check = MO.old_lookup,
                                     a.x_backup = x_backup[i],
                                     b.x_trimmed = x_trimmed[i],
                                     c.x_trimmed_without_group = x_trimmed_without_group[i],
                                     d.x_withspaces_start_end = x_withspaces_start_end[i],
                                     e.x_withspaces_start_only = x_withspaces_start_only[i],
                                     f.x_withspaces_end_only = x_withspaces_end_only[i],
                                     g.x_backup_without_spp = x_backup_without_spp[i],
                                     h.x_species = x_species[i],
                                     i.x_trimmed_species = x_trimmed_species[i])
        if (!empty_result(x[i])) {
          next
        }
        
        
        # no results found: make them UNKNOWN ----
        x[i] <- lookup(mo == "UNKNOWN", uncertainty = -1)
        if (initial_search == TRUE) {
          failures <- c(failures, x_backup[i])
        }
      }
      
      if (initial_search == TRUE) {
        close(progress)
      }
      
      if (isTRUE(debug) && initial_search == TRUE) {
        cat("Ended search", time_track(), "\n")
      }
      
      
      # handling failures ----
      failures <- failures[!failures %in% c(NA, NULL, NaN)]
      if (length(failures) > 0 & initial_search == TRUE) {
        pkg_env$mo_failures <- sort(unique(failures))
        pkg_env$mo_failed <- c(pkg_env$mo_failed, pkg_env$mo_failures)
        plural <- c("value", "it", "was")
        if (pm_n_distinct(failures) > 1) {
          plural <- c("values", "them", "were")
        }
        x_input_clean <- trimws2(x_input)
        total_failures <- length(x_input_clean[as.character(x_input_clean) %in% as.character(failures) & !x_input %in% c(NA, NULL, NaN)])
        total_n <- length(x_input[!x_input %in% c(NA, NULL, NaN)])
        msg <- paste0(nr2char(pm_n_distinct(failures)), " unique ", plural[1],
                      " (covering ", percentage(total_failures / total_n),
                      ") could not be coerced and ", plural[3], " considered 'unknown'")
        if (pm_n_distinct(failures) <= 10) {
          msg <- paste0(msg, ": ", vector_and(failures, quotes = TRUE))
        }
        msg <- paste0(msg,
                      ".\nUse `mo_failures()` to review ", plural[2], ". Edit the `allow_uncertain` argument if needed (see ?as.mo).\n",
                      "You can also use your own reference data with set_mo_source() or directly, e.g.:\n",
                      '  as.mo("mycode", reference_df = data.frame(own = "mycode", mo = "', MO_lookup$mo[match("Escherichia coli", MO_lookup$fullname)], '"))\n',
                      '  mo_name("mycode", reference_df = data.frame(own = "mycode", mo = "', MO_lookup$mo[match("Escherichia coli", MO_lookup$fullname)], '"))\n')
        warning_(paste0("\n", msg),
                 add_fn = font_red,
                 call = FALSE,
                 immediate = TRUE) # shown right away rather than deferred, however many warnings accumulate
      }
      # handling uncertainties ----
      if (NROW(uncertainties) > 0 & initial_search == TRUE) {
        uncertainties <- as.list(pm_distinct(uncertainties, input, .keep_all = TRUE))
        pkg_env$mo_uncertainties <- uncertainties
        
        plural <- c("", "it", "was")
        if (length(uncertainties$input) > 1) {
          plural <- c("s", "them", "were")
        }
        msg <- paste0("Translation is uncertain for ", nr2char(length(uncertainties$input)), " microorganism", plural[1],
                      ". Use `mo_uncertainties()` to review ", plural[2], ".")
        message_(msg)
      }
      x[already_known] <- x_known
    }
  }
  
  # Becker ----
  if (Becker == TRUE | Becker == "all") {
    # warn when species found that are not in:
    # - Becker et al. 2014, PMID 25278577
    # - Becker et al. 2019, PMID 30872103
    # - Becker et al. 2020, PMID 32056452
    post_Becker <- character(0) # 2020-10-20 currently all are mentioned in above papers (otherwise uncomment the section below)
    
    # nolint start
    # if (any(x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property])) {
    #   warning_("Becker ", font_italic("et al."), " (2014, 2019) does not contain these species named after their publication: ",
    #            font_italic(paste("S.",
    #                              sort(mo_species(unique(x[x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property]]))),
    #                              collapse = ", ")),
    #            ".",
    #            call = FALSE,
    #            immediate = TRUE)
    # }
    # nolint end

    # 'MO_CONS' and 'MO_COPS' are <mo> vectors created in R/zzz.R
    CoNS <- MO_lookup[which(MO_lookup$mo %in% MO_CONS), property, drop = TRUE]
    x[x %in% CoNS] <- lookup(mo == "B_STPHY_CONS", uncertainty = -1)

    CoPS <- MO_lookup[which(MO_lookup$mo %in% MO_COPS), property, drop = TRUE]
    x[x %in% CoPS] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)

    if (Becker == "all") {
      x[x %in% lookup(fullname %like_case% "^Staphylococcus aureus", n = Inf)] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)
    }
  }
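  # Illustrative usage (a sketch, assuming the exported as.mo() forwards its
  # `Becker` argument to this function unchanged and that `property` is "mo"):
  #   as.mo("Staphylococcus epidermidis", Becker = TRUE)  # collapsed to B_STPHY_CONS if listed in MO_CONS
  #   as.mo("Staphylococcus aureus", Becker = TRUE)       # S. aureus is left as is
  #   as.mo("Staphylococcus aureus", Becker = "all")      # S. aureus is also collapsed to B_STPHY_COPS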

  # Lancefield ----
  if (Lancefield == TRUE | Lancefield == "all") {
    # group A - S. pyogenes
    x[x %in% lookup(genus == "Streptococcus" & species == "pyogenes", n = Inf)] <- lookup(fullname == "Streptococcus group A", uncertainty = -1)
    # group B - S. agalactiae
    x[x %in% lookup(genus == "Streptococcus" & species == "agalactiae", n = Inf)] <- lookup(fullname == "Streptococcus group B", uncertainty = -1)
    # group C
    x[x %in% lookup(genus == "Streptococcus" &
                      species %in% c("equisimilis", "equi", "zooepidemicus", "dysgalactiae"),
                    n = Inf)] <- lookup(fullname == "Streptococcus group C", uncertainty = -1)
    if (Lancefield == "all") {
      # all Enterococci
      x[x %in% lookup(genus == "Enterococcus", n = Inf)] <- lookup(fullname == "Streptococcus group D", uncertainty = -1)
    }
    # group F - S. anginosus
    x[x %in% lookup(genus == "Streptococcus" & species == "anginosus", n = Inf)] <- lookup(fullname == "Streptococcus group F", uncertainty = -1)
    # group H - S. sanguinis
    x[x %in% lookup(genus == "Streptococcus" & species == "sanguinis", n = Inf)] <- lookup(fullname == "Streptococcus group H", uncertainty = -1)
    # group K - S. salivarius
    x[x %in% lookup(genus == "Streptococcus" & species == "salivarius", n = Inf)] <- lookup(fullname == "Streptococcus group K", uncertainty = -1)
  }
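  # Illustrative usage (a sketch, assuming the exported as.mo() forwards its
  # `Lancefield` argument to this function unchanged):
  #   as.mo("Streptococcus pyogenes", Lancefield = TRUE)   # replaced by "Streptococcus group A"
  #   as.mo("Enterococcus faecalis", Lancefield = TRUE)    # enterococci are left as is
  #   as.mo("Enterococcus faecalis", Lancefield = "all")   # replaced by "Streptococcus group D"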

  # Wrap up ----------------------------------------------------------------

  # comply with x, which is also unique and without empty values
  x_input_unique_nonempty <- unique(x_input[!is.na(x_input)
                                            & !is.null(x_input)
                                            & !identical(x_input, "")
                                            & !identical(x_input, "xxx")])

  # left join the found results to the original input values (x_input)
  df_found <- data.frame(input = as.character(x_input_unique_nonempty),
                         found = as.character(x),
                         stringsAsFactors = FALSE)
  df_input <- data.frame(input = as.character(x_input),
                         stringsAsFactors = FALSE)

  # super fast using match() which is a lot faster than merge()
  x <- df_found$found[match(df_input$input, df_found$input)]
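  # Illustrative sketch of this match()-based left join, using made-up base R data:
  #   found <- data.frame(input = c("a", "b"), found = c("X", "Y"), stringsAsFactors = FALSE)
  #   found$found[match(c("b", "a", "b", NA), found$input)]
  #   # gives "Y" "X" "Y" NA: duplicates are expanded again and unmatched input becomes NA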

  if (property == "mo") {
    x <- set_clean_class(x, new_class = c("mo", "character"))
  }
  
  # keep track of time
  end_time <- Sys.time()

  if (length(mo_renamed()) > 0) {
    print(mo_renamed())
  }

  if (initial_search == FALSE) {
    # we got here from uncertain_fn().
    if (NROW(uncertainties) == 0) {
      # the stripped/transformed version of x_backup is apparently a full hit, like with: as.mo("Escherichia (hello there) coli")
      uncertainties <- rbind(uncertainties,
                             format_uncertainty_as_df(uncertainty_level = actual_uncertainty,
                                                      input = actual_input,
                                                      result_mo = x,
                                                      candidates = ""),
                             stringsAsFactors = FALSE)
    }
    # this will save the uncertain items as attribute, so they can be bound to `uncertainties` in the uncertain_fn() function
    x <- structure(x, uncertainties = uncertainties)
  } else {
    # keep track of time - give some hints to improve speed if it takes a long time
    delta_time <- difftime(end_time, start_time, units = "secs")
    if (delta_time >= 30) {
      message_("Using `as.mo()` took ", round(delta_time), " seconds, which is a long time. Some suggestions to improve speed include:")
      message_(word_wrap("- Try to use as many valid taxonomic names as possible for your input.",
                         extra_indent = 2),
               as_note = FALSE)
      message_(word_wrap("- Save the output and use it as input for future calculations, e.g. add a new variable to your data using `as.mo()`. All functions in this package that rely on microorganism codes will automatically use that new column where possible. All `mo_*()` functions also do not require you to set their `x` argument as long as you have a column of class <mo>.",
                         extra_indent = 2),
               as_note = FALSE)
      message_(word_wrap("- Use `set_mo_source()` to continually transform your organisation codes to the microorganism codes used by this package, see `?mo_source`.",
                         extra_indent = 2),
               as_note = FALSE)
    }
  }
  
  if (isTRUE(debug) && initial_search == TRUE) {
    cat("Finished function", time_track(), "\n")
  }

  x
}

empty_result <- function(x) {
  all(x %in% c(NA, "UNKNOWN"))
}
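# Illustrative sketch of how empty_result() behaves. `%in%` never returns NA, so the
# result is always a single TRUE or FALSE ("B_ESCHR_COLI" is used only as an example code):
#   empty_result(NA_character_)                 # TRUE
#   empty_result("UNKNOWN")                     # TRUE
#   empty_result(c("UNKNOWN", "B_ESCHR_COLI"))  # FALSE, not all elements are empty
#   empty_result(character(0))                  # TRUE, since all() of an empty vector is TRUE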

was_renamed <- function(name_old, name_new, ref_old = "", ref_new = "", mo = "") {
  newly_set <- data.frame(old_name = name_old,
                          old_ref = ref_old,
                          new_name = name_new,
                          new_ref = ref_new,
                          mo = mo,
                          stringsAsFactors = FALSE)
  already_set <- pkg_env$mo_renamed
  if (!is.null(already_set)) {
    pkg_env$mo_renamed <- rbind(already_set,
                                newly_set,
                                stringsAsFactors = FALSE)
  } else {
    pkg_env$mo_renamed <- newly_set
  }
}

format_uncertainty_as_df <- function(uncertainty_level,
                                     input,
                                     result_mo,
                                     candidates = NULL) {
  if (!is.null(pkg_env$mo_renamed_last_run)) {
    fullname <- pkg_env$mo_renamed_last_run
    pkg_env$mo_renamed_last_run <- NULL
    renamed_to <- MO_lookup[match(result_mo, MO_lookup$mo), "fullname", drop = TRUE][1]
  } else {
    fullname <- MO_lookup[match(result_mo, MO_lookup$mo), "fullname", drop = TRUE][1]
    renamed_to <- NA_character_
  }
  data.frame(uncertainty = uncertainty_level,
             input = input,
             fullname = fullname,
             renamed_to = renamed_to,
             mo = result_mo,
             # keep at most 25 alternative candidates: the first candidate is the chosen result and is skipped
             candidates = if (length(candidates) > 1) paste(candidates[c(2:min(26, length(candidates)))], collapse = ", ") else "",
             stringsAsFactors = FALSE)
}
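# Illustrative sketch of how the `candidates` column above is built, using made-up values:
#   candidates <- LETTERS[1:5]
#   paste(candidates[c(2:min(26, length(candidates)))], collapse = ", ")
#   # gives "B, C, D, E": the first element (the chosen result) is dropped
# With 30 candidates only elements 2 to 26 are kept; with 0 or 1 candidates the column is "".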

# will be exported using s3_register() in R/zzz.R
pillar_shaft.mo <- function(x, ...) {
  out <- format(x)
  # grey out the kingdom (part until first "_")
  out[!is.na(x)] <- gsub("^([A-Z]+_)(.*)", paste0(font_subtle("\\1"), "\\2"), out[!is.na(x)], perl = TRUE)
  # and grey out every _
  out[!is.na(x)] <- gsub("_", font_subtle("_"), out[!is.na(x)])

  # markup NA and UNKNOWN
  out[is.na(x)] <- font_na("  NA")
  out[x == "UNKNOWN"] <- font_na("  UNKNOWN")

  # make it always fit exactly
  max_char <- max(nchar(x))
  if (is.na(max_char)) {
    max_char <- 7
  }
  create_pillar_column(out,
                       align = "left",
                       width = max_char + ifelse(any(x %in% c(NA, "UNKNOWN")), 2, 0))
}

# will be exported using s3_register() in R/zzz.R
type_sum.mo <- function(x, ...) {
  "mo"
}

# will be exported using s3_register() in R/zzz.R
freq.mo <- function(x, ...) {
  x_noNA <- as.mo(x[!is.na(x)]) # as.mo() to get the newest mo codes
  grams <- mo_gramstain(x_noNA, language = NULL)
  digits <- list(...)$digits
  if (is.null(digits)) {
    digits <- 2
  }
  cleaner::freq.default(
    x = x,
    ...,
    .add_header = list(
      `Gram-negative` = paste0(
        format(sum(grams == "Gram-negative", na.rm = TRUE),
               big.mark = ",",
               decimal.mark = "."),
        " (", percentage(sum(grams == "Gram-negative", na.rm = TRUE) / length(grams),
                         digits = digits),
        ")"),
      `Gram-positive` = paste0(
        format(sum(grams == "Gram-positive", na.rm = TRUE),
               big.mark = ",",
               decimal.mark = "."),
        " (", percentage(sum(grams == "Gram-positive", na.rm = TRUE) / length(grams),
                         digits = digits),
        ")"),
      `Nr. of genera` = pm_n_distinct(mo_genus(x_noNA, language = NULL)),
      `Nr. of species` = pm_n_distinct(paste(mo_genus(x_noNA, language = NULL),
                                             mo_species(x_noNA, language = NULL)))))
}

# will be exported using s3_register() in R/zzz.R
get_skimmers.mo <- function(column) {
  skimr::sfl(
    skim_type = "mo",
    unique_total = ~pm_n_distinct(., na.rm = TRUE),
    gram_negative = ~sum(mo_is_gram_negative(stats::na.omit(.))),
    gram_positive = ~sum(mo_is_gram_positive(stats::na.omit(.))),
    top_genus = ~names(sort(-table(mo_genus(stats::na.omit(.), language = NULL))))[1L],
    top_species = ~names(sort(-table(mo_name(stats::na.omit(.), language = NULL))))[1L]
  )
}

#' @method print mo
#' @export
#' @noRd
print.mo <- function(x, print.shortnames = FALSE, ...) {
  cat("Class <mo>\n")
  x_names <- names(x)
  if (is.null(x_names) & print.shortnames == TRUE) {
    x_names <- tryCatch(mo_shortname(x, ...), error = function(e) NULL)
  }
  x <- as.character(x)
  names(x) <- x_names
  print.default(x, quote = FALSE)
}

#' @method summary mo
#' @export
#' @noRd
summary.mo <- function(object, ...) {
  # unique and top 1-3
  x <- as.mo(object) # force again, could be mo from older pkg version
  top <- as.data.frame(table(x), responseName = "n", stringsAsFactors = FALSE)
  top_3 <- top[order(-top$n), 1][1:3]
  value <- c("Class" = "mo",
             "<NA>" = length(x[is.na(x)]),
             "Unique" = pm_n_distinct(x[!is.na(x)]),
             "#1" = top_3[1],
             "#2" = top_3[2],
             "#3" = top_3[3])
  class(value) <- c("summaryDefault", "table")
  value
}

#' @method as.data.frame mo
#' @export
#' @noRd
as.data.frame.mo <- function(x, ...) {
  nm <- deparse1(substitute(x))
  if (!"nm" %in% names(list(...))) {
    as.data.frame.vector(as.mo(x), ..., nm = nm)
  } else {
    as.data.frame.vector(as.mo(x), ...)
  }
}

#' @method [ mo
#' @export
#' @noRd
"[.mo" <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}
#' @method [[ mo
#' @export
#' @noRd
"[[.mo" <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}
#' @method [<- mo
#' @export
#' @noRd
"[<-.mo" <- function(i, j, ..., value) {
  y <- NextMethod()
  attributes(y) <- attributes(i)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}
#' @method [[<- mo
#' @export
#' @noRd
"[[<-.mo" <- function(i, j, ..., value) {
  y <- NextMethod()
  attributes(y) <- attributes(i)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}
#' @method c mo
#' @export
#' @noRd
c.mo <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}

#' @method unique mo
#' @export
#' @noRd
unique.mo <- function(x, incomparables = FALSE, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}

#' @method rep mo
#' @export
#' @noRd
rep.mo <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}

#' @rdname as.mo
#' @export
mo_failures <- function() {
  pkg_env$mo_failures
}

#' @rdname as.mo
#' @export
mo_uncertainties <- function() {
  if (is.null(pkg_env$mo_uncertainties)) {
    return(NULL)
  }
  set_clean_class(as.data.frame(pkg_env$mo_uncertainties, 
                                stringsAsFactors = FALSE),
                  new_class = c("mo_uncertainties", "data.frame"))
}

#' @method print mo_uncertainties
#' @export
#' @noRd
print.mo_uncertainties <- function(x, ...) {
  if (NROW(x) == 0) {
    return(invisible(NULL))
  }
  message_("Matching scores are based on human pathogenic prevalence and the resemblance between the input and the full taxonomic name. See `?mo_matching_score`.", as_note = FALSE)

  msg <- ""
  for (i in seq_len(nrow(x))) {
    if (x[i, ]$candidates != "") {
      candidates <- unlist(strsplit(x[i, ]$candidates, ", ", fixed = TRUE))
      scores <- mo_matching_score(x = x[i, ]$input, n = candidates)
      # sort on descending scores
      candidates <- candidates[order(1 - scores)]
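      # Illustrative sketch with made-up scores: order(1 - scores) sorts from high to low, e.g.
      #   scores <- c(0.2, 0.9, 0.5)
      #   order(1 - scores)  # gives 2 3 1
      # `scores` itself is not reordered, so the formatted scores are indexed with the same order() below.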
      scores_formatted <- trimws(formatC(round(scores, 3), format = "f", digits = 3))
      n_candidates <- length(candidates)
      candidates <- vector_and(paste0(candidates, " (", scores_formatted[order(1 - scores)], ")"),
                               quotes = FALSE, 
                               sort = FALSE)
      # align with input after arrow
      candidates <- paste0("\n",
                           strwrap(paste0("Also matched",
                                          ifelse(n_candidates >= 25, " (max 25)", ""), ": ",
                                          candidates), # this is already max 25 due to format_uncertainty_as_df()
                                   indent = nchar(x[i, ]$input) + 6,
                                   exdent = nchar(x[i, ]$input) + 6,
                                   width = 0.98 * getOption("width")),
                           collapse = "")
      # after strwrap, make taxonomic names italic
      candidates <- gsub("([A-Za-z]+)", font_italic("\\1"), candidates, perl = TRUE)
      candidates <- gsub(paste(font_italic(c("Also", "matched"), collapse = NULL), collapse = " "),
                         "Also matched",
                         candidates, fixed = TRUE)
      candidates <- gsub(font_italic("max"), "max", candidates, fixed = TRUE)
    } else {
      candidates <- ""
    }
    score <- trimws(formatC(round(mo_matching_score(x = x[i, ]$input,
                                                    n = x[i, ]$fullname),
                                  3),
                            format = "f", digits = 3))
    msg <- paste(msg,
                 paste0(
                   strwrap(
                     paste0('"', x[i, ]$input, '" -> ',
                            paste0(font_bold(font_italic(x[i, ]$fullname)),
                                   ifelse(!is.na(x[i, ]$renamed_to), paste(", renamed to", font_italic(x[i, ]$renamed_to)), ""),
                                   " (", x[i, ]$mo,
                                   ", matching score = ", score,
                                   ") ")),
                     width = 0.98 * getOption("width"),
                     exdent = nchar(x[i, ]$input) + 6),
                   collapse = "\n"),
                 candidates,
                 sep = "\n")
    msg <- paste0(gsub("\n\n", "\n", msg), "\n\n")
  }
  cat(msg)
}

#' @rdname as.mo
#' @export
mo_renamed <- function() {
  items <- pkg_env$mo_renamed
  if (is.null(items)) {
    items <- data.frame(stringsAsFactors = FALSE)
  } else {
    items <- pm_distinct(items, old_name, .keep_all = TRUE)
  }
  set_clean_class(as.data.frame(items,
                                stringsAsFactors = FALSE),
                  new_class = c("mo_renamed", "data.frame"))
}

#' @method print mo_renamed
#' @export
#' @noRd
print.mo_renamed <- function(x, ...) {
  if (NROW(x) == 0) {
    return(invisible())
  }
  for (i in seq_len(nrow(x))) {
    message_(font_italic(x$old_name[i]),
             ifelse(x$old_ref[i] %in% c("", NA),
                    "",
                    paste0(" (",  gsub("et al.", font_italic("et al."), x$old_ref[i]), ")")),
             " was renamed ",
             ifelse(!x$new_ref[i] %in% c("", NA) && as.integer(gsub("[^0-9]", "", x$new_ref[i])) < as.integer(gsub("[^0-9]", "", x$old_ref[i])),
                    font_bold("back to "),
                    ""),
             font_italic(x$new_name[i]),
             ifelse(x$new_ref[i] %in% c("", NA), 
                    "",
                    paste0(" (",  gsub("et al.", font_italic("et al."), x$new_ref[i]), ")")),
             " [", x$mo[i], "]")
  }
}

nr2char <- function(x) {
  if (x %in% c(1:10)) {
    v <- c("one" = 1, "two" = 2, "three" = 3, "four" = 4, "five" = 5,
           "six" = 6, "seven" = 7, "eight" = 8, "nine" = 9, "ten" = 10)
    names(v[x])
  } else {
    x
  }
}

unregex <- function(x) {
  gsub("[^a-zA-Z0-9 -]", "", x)
}

translate_allow_uncertain <- function(allow_uncertain) {
  if (isTRUE(allow_uncertain)) {
    # default to uncertainty level 2
    allow_uncertain <- 2
  } else {
    allow_uncertain[tolower(allow_uncertain) == "none"] <- 0
    allow_uncertain[tolower(allow_uncertain) == "all"] <- 3
    allow_uncertain <- as.integer(allow_uncertain)
    stop_ifnot(allow_uncertain %in% c(0:3),
               '`allow_uncertain` must be a number between 0 (or "none") and 3 (or "all"), or TRUE (= 2) or FALSE (= 0)', call = FALSE)
  }
  allow_uncertain
}
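
# Illustrative sketch only, not used by the package: a standalone base-R
# rendering of the mapping performed by translate_allow_uncertain() above, so
# the accepted inputs can be checked without the internal stop_ifnot() helper.
# The name `sketch_uncertainty_level` is hypothetical; restricting the input to
# length one and silencing the coercion warning are assumptions of this sketch.
sketch_uncertainty_level <- function(allow_uncertain) {
  if (isTRUE(allow_uncertain)) {
    return(2L) # TRUE is shorthand for uncertainty level 2
  }
  if (identical(allow_uncertain, FALSE)) {
    return(0L) # FALSE is shorthand for uncertainty level 0
  }
  value <- tolower(as.character(allow_uncertain))
  value[value == "none"] <- "0"
  value[value == "all"] <- "3"
  level <- suppressWarnings(as.integer(value))
  if (length(level) != 1 || is.na(level) || !level %in% 0:3) {
    stop("`allow_uncertain` must be between 0 and 3", call. = FALSE)
  }
  level
}
# sketch_uncertainty_level("all") gives 3L; sketch_uncertainty_level(1) gives 1L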

get_mo_failures_uncertainties_renamed <- function() {
  remember <- list(failures = pkg_env$mo_failures,
                   uncertainties = pkg_env$mo_uncertainties,
                   renamed = pkg_env$mo_renamed)
  # empty them, otherwise mo_shortname("Chlamydophila psittaci") will give 3 notes
  pkg_env$mo_failures <- NULL
  pkg_env$mo_uncertainties <- NULL
  pkg_env$mo_renamed <- NULL
  remember
}

load_mo_failures_uncertainties_renamed <- function(metadata) {
  pkg_env$mo_failures <- metadata$failures
  pkg_env$mo_uncertainties <- metadata$uncertainties
  pkg_env$mo_renamed <- metadata$renamed
}

trimws2 <- function(x) {
  trimws(gsub("[\\s]+", " ", x, perl = TRUE))
}

parse_and_convert <- function(x) {
  tryCatch({
    if (!is.null(dim(x))) {
      if (NCOL(x) > 2) {
        stop("a maximum of two columns is allowed", call. = FALSE)
      } else if (NCOL(x) == 2) {
        # support Tidyverse selection like: df %>% select(colA, colB)
        # paste these columns together
        x <- as.data.frame(x, stringsAsFactors = FALSE)
        colnames(x) <- c("A", "B")
        x <- paste(x$A, x$B)
      } else {
        # support Tidyverse selection like: df %>% select(colA)
        x <- as.data.frame(x, stringsAsFactors = FALSE)[[1]]
      }
    }
    x[is.null(x)] <- NA
    parsed <- iconv(x, to = "UTF-8")
    parsed[is.na(parsed) & !is.na(x)] <- iconv(x[is.na(parsed) & !is.na(x)], from = "Latin1", to = "ASCII//TRANSLIT")
    parsed <- gsub('"', "", parsed, fixed = TRUE)
    parsed <- gsub(" +", " ", parsed, perl = TRUE)
    parsed <- trimws(parsed)
  }, error = function(e) stop(e$message, call. = FALSE)) # this will also be thrown when running `as.mo(no_existing_object)`
  parsed
}

replace_old_mo_codes <- function(x, property) {
  if (any(toupper(x) %in% microorganisms.translation$mo_old, na.rm = TRUE)) {
    # get the ones that match
    matched <- match(toupper(x), microorganisms.translation$mo_old)
    # and their new codes
    mo_new <- microorganisms.translation$mo_new[matched]
    # assign on places where a match was found
    x[which(!is.na(matched))] <- mo_new[which(!is.na(matched))]
    n_matched <- length(matched[!is.na(matched)])
    if (property != "mo") {
      message_(font_blue("The input contained old microbial codes (from previous package versions). Please update your MO codes with `as.mo()`."))
    } else {
      if (n_matched == 1) {
        message_(font_blue("1 old microbial code (from previous package versions) was updated to a currently used MO code."))
      } else {
        message_(font_blue(n_matched, "old microbial codes (from previous package versions) were updated to currently used MO codes."))
      }
    }
  }
  x
}
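
# Illustrative sketch only, not used by the package: replace_old_mo_codes()
# above relies on match() returning NA for values without a counterpart, so
# only matched positions are overwritten. The translation table, its codes and
# the names `sketch_translation` and `sketch_replace_codes` are made up for
# this example; the real lookup uses `microorganisms.translation`.
sketch_translation <- data.frame(mo_old = c("OLD_A", "OLD_B"),
                                 mo_new = c("NEW_A", "NEW_B"),
                                 stringsAsFactors = FALSE)
sketch_replace_codes <- function(x, translation = sketch_translation) {
  # position of each (upper-cased) input in the old codes, NA if absent
  matched <- match(toupper(x), translation$mo_old)
  found <- !is.na(matched)
  # overwrite only the inputs for which an old code was found
  x[found] <- translation$mo_new[matched[found]]
  x
}
# sketch_replace_codes(c("old_a", "something else", NA, "OLD_B"))
# gives c("NEW_A", "something else", NA, "NEW_B")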

replace_ignore_pattern <- function(x, ignore_pattern) {
  if (!is.null(ignore_pattern) && !identical(trimws2(ignore_pattern), "")) {
    ignore_cases <- x %like% ignore_pattern
    if (sum(ignore_cases) > 0) {
      message_("The following input was ignored by `ignore_pattern = \"", ignore_pattern, "\"`: ",
               vector_and(x[ignore_cases], quotes = TRUE))
      x[ignore_cases] <- NA_character_
    }
  }
  x
}

repair_reference_df <- function(reference_df) {
  # has valid own reference_df
  reference_df <- reference_df %pm>%
    pm_filter(!is.na(mo))
  
  # keep only first two columns, second must be mo
  if (colnames(reference_df)[1] == "mo") {
    reference_df <- reference_df %pm>% pm_select(2, "mo")
  } else {
    reference_df <- reference_df %pm>% pm_select(1, "mo")
  }
  
  # remove factors, just keep characters
  colnames(reference_df)[1] <- "x"
  reference_df[, "x"] <- as.character(reference_df[, "x", drop = TRUE])
  reference_df[, "mo"] <- as.character(reference_df[, "mo", drop = TRUE])
  
  # some microbial codes might be old
  reference_df[, "mo"] <- as.mo(reference_df[, "mo", drop = TRUE])
  reference_df
}

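# Strip words from each element of `text`, splitting on single spaces.
# - side = "right" (default): drops the last `n` words;
# - side = "left": drops the first word only, `n` merely acts as a minimum
#   length guard here.
# Elements that consist of `n` words or fewer yield an empty string.
strip_words <- function(text, n, side = "right") {
  out <- lapply(strsplit(text, " "), function(x) {
    if (side %like% "^r" & length(x) > n) {
      x[seq_len(length(x) - n)]
    } else if (side %like% "^l" & length(x) > n) {
      x[2:length(x)]
    }
  })
  vapply(FUN.VALUE = character(1), out, paste, collapse = " ")
}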
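
# Illustrative sketch only, not used by the package: a base-R variant of the
# word stripping done by strip_words() above, without the internal %like%
# operator. The name `sketch_strip_words` is hypothetical. As an assumption of
# this sketch (and unlike the function above), side = "left" drops the first
# `n` words instead of just the first one.
sketch_strip_words <- function(text, n, side = "right") {
  pieces <- strsplit(text, " ", fixed = TRUE)
  out <- lapply(pieces, function(words) {
    if (length(words) <= n) {
      return(character(0)) # nothing would be left, results in ""
    }
    if (substr(side, 1, 1) == "r") {
      words[seq_len(length(words) - n)] # drop the last n words
    } else {
      words[seq(n + 1, length(words))]  # drop the first n words
    }
  })
  vapply(out, paste, character(1), collapse = " ")
}
# sketch_strip_words("S. aureus please mind", n = 2) gives "S. aureus"
# sketch_strip_words("Fluoroquinolone-resistant Neisseria gonorrhoeae",
#                    n = 1, side = "left") gives "Neisseria gonorrhoeae"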
#' Transform Input to a Microorganism ID
#'
#' Use this function to determine a valid microorganism ID ([`mo`]). Determination is done using intelligent rules and the complete taxonomic kingdoms Bacteria, Chromista, Protozoa, Archaea and most microbial species from the kingdom Fungi (see *Source*). The input can be almost anything: a full name (like `"Staphylococcus aureus"`), an abbreviated name (such as `"S. aureus"`), an abbreviation known in the field (such as `"MRSA"`), or just a genus. See *Examples*.
#' @inheritSection lifecycle Stable Lifecycle
#' @param x a character vector or a [data.frame] with one or two columns
#' @param Becker a logical to indicate whether staphylococci should be categorised into coagulase-negative staphylococci ("CoNS") and coagulase-positive staphylococci ("CoPS") instead of their own species, according to Karsten Becker *et al.* (1,2,3).
#'
#' This excludes *Staphylococcus aureus* by default; use `Becker = "all"` to also categorise *S. aureus* as "CoPS".
#' @param Lancefield a logical to indicate whether beta-haemolytic *Streptococci* should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield (4). These *Streptococci* will be categorised in their first group, e.g. *Streptococcus dysgalactiae* will be group C, although officially it was also categorised into groups G and L.
#'
#' This excludes *Enterococci* by default (which are in group D); use `Lancefield = "all"` to also categorise all *Enterococci* as group D.
#' @param allow_uncertain a number between `0` (or `"none"`) and `3` (or `"all"`), or `TRUE` (= `2`) or `FALSE` (= `0`) to indicate whether the input should be checked for less probable results, see *Details*
#' @param reference_df a [data.frame] to be used for extra reference when translating `x` to a valid [`mo`]. See [set_mo_source()] and [get_mo_source()] to automate the usage of your own codes (e.g. used in your analysis or organisation).
#' @param ignore_pattern a regular expression (case-insensitive) of which all matches in `x` must return `NA`. This can be convenient to exclude known non-relevant input and can also be set with the option `AMR_ignore_pattern`, e.g. `options(AMR_ignore_pattern = "(not reported|contaminated flora)")`.
#' @param language language to translate text like "no growth", which defaults to the system language (see [get_locale()])
#' @param ... other arguments passed on to functions
#' @rdname as.mo
#' @aliases mo
#' @keywords mo Becker becker Lancefield lancefield guess
#' @details
#' ## General Info
#'
#' A microorganism ID from this package (class: [`mo`]) is human readable and typically looks like these examples:
#' ```
#'   Code               Full name
#'   ---------------    --------------------------------------
#'   B_KLBSL            Klebsiella
#'   B_KLBSL_PNMN       Klebsiella pneumoniae
#'   B_KLBSL_PNMN_RHNS  Klebsiella pneumoniae rhinoscleromatis
#'   |   |    |    |
#'   |   |    |    |
#'   |   |    |    \---> subspecies, a 4-5 letter acronym
#'   |   |    \----> species, a 4-5 letter acronym
#'   |   \----> genus, a 5-7 letter acronym
#'   \----> taxonomic kingdom: A (Archaea), AN (Animalia), B (Bacteria),
#'                             C (Chromista), F (Fungi), P (Protozoa)
#' ```
#'
#' Values that cannot be coerced will be considered 'unknown' and will get the MO code `UNKNOWN`.
#'
#' Use the [`mo_*`][mo_property()] functions to get properties based on the returned code, see *Examples*.
#'
#' The algorithm uses data from the Catalogue of Life (see below) and from one other source (see [microorganisms]).
#'
#' The [as.mo()] function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
#'
#' 1. Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;
#' 2. Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;
#' 3. Breakdown of input values to identify possible matches.
#'
#' As a result, e.g. `"E. coli"` (a microorganism highly prevalent in humans) will return the microbial ID of *Escherichia coli* and not *Entamoeba coli* (a microorganism less prevalent in humans), although the latter would alphabetically come first.
#'
#' ## Coping with Uncertain Results
#'
#' In addition, the [as.mo()] function can differentiate four levels of uncertainty to guess valid results:
#' - Uncertainty level 0: no additional rules are applied;
#' - Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
#' - Uncertainty level 2: allow all of level 1, strip values between brackets, invert the order of the words of the input, strip off text elements from the end keeping at least two elements;
#' - Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.
#'
#' The level of uncertainty can be set using the argument `allow_uncertain`. The default is `allow_uncertain = TRUE`, which is equal to uncertainty level 2. Using `allow_uncertain = FALSE` is equal to uncertainty level 0 and will skip all rules. You can also use e.g. `as.mo(..., allow_uncertain = 1)` to only allow up to level 1 uncertainty.
#'
#' With the default setting (`allow_uncertain = TRUE`, level 2), the examples below will lead to valid results:
#' - `"Streptococcus group B (known as S. agalactiae)"`. The text between brackets will be removed and a warning will be thrown that the result *Streptococcus group B* (``r as.mo("Streptococcus group B")``) needs review.
#' - `"S. aureus - please mind: MRSA"`. The last word will be stripped, after which the function will try to find a match. If it does not, the second last word will be stripped, etc. Again, a warning will be thrown that the result *Staphylococcus aureus* (``r as.mo("Staphylococcus aureus")``) needs review.
#' - `"Fluoroquinolone-resistant Neisseria gonorrhoeae"`. The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result *Neisseria gonorrhoeae* (``r as.mo("Neisseria gonorrhoeae")``) needs review.
#'
#' There are three helper functions that can be run after using the [as.mo()] function:
#' - Use [mo_uncertainties()] to get a [data.frame] that prints in a pretty format with all taxonomic names that were guessed. The output contains the matching score for all matches (see *Matching Score for Microorganisms* below).
#' - Use [mo_failures()] to get a [character] [vector] with all values that could not be coerced to a valid value.
#' - Use [mo_renamed()] to get a [data.frame] with all values that could be coerced based on old, previously accepted taxonomic names.
#'
#' ## Microbial Prevalence of Pathogens in Humans
#'
#' The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the `prevalence` column in the [microorganisms] and [microorganisms.old] data sets. The grouping into human pathogenic prevalence is explained in the section *Matching Score for Microorganisms* below.
#' @inheritSection mo_matching_score Matching Score for Microorganisms
#' @inheritSection catalogue_of_life Catalogue of Life
#  (source as a section here, so it can be inherited by other man pages:)
#' @section Source:
#' 1. Becker K *et al.* **Coagulase-Negative Staphylococci**. 2014. Clin Microbiol Rev. 27(4): 870–926; \doi{10.1128/CMR.00109-13}
#' 2. Becker K *et al.* **Implications of identifying the recently defined members of the *S. aureus* complex, *S. argenteus* and *S. schweitzeri*: A position paper of members of the ESCMID Study Group for staphylococci and Staphylococcal Diseases (ESGS).** 2019. Clin Microbiol Infect; \doi{10.1016/j.cmi.2019.02.028}
#' 3. Becker K *et al.* **Emergence of coagulase-negative staphylococci** 2020. Expert Rev Anti Infect Ther. 18(4):349-366; \doi{10.1080/14787210.2020.1730813}
#' 4. Lancefield RC **A serological differentiation of human and other groups of hemolytic streptococci**. 1933. J Exp Med. 57(4): 571–95; \doi{10.1084/jem.57.4.571}
#' 5. Catalogue of Life: Annual Checklist (public online taxonomic database), <http://www.catalogueoflife.org> (check included annual version with [catalogue_of_life_version()]).
#' @export
#' @return A [character] [vector] with additional class [`mo`]
#' @seealso [microorganisms] for the [data.frame] that is being used to determine IDs.
#'
#' Use the [`mo_*`][mo_property()] functions (such as [mo_genus()], [mo_gramstain()]) to get properties based on the returned code.
#' @inheritSection AMR Reference Data Publicly Available
#' @inheritSection AMR Read more on Our Website!
#' @examples
#' \donttest{
#' # These examples all return "B_STPHY_AURS", the ID of S. aureus:
#' as.mo("sau") # WHONET code
#' as.mo("stau")
#' as.mo("STAU")
#' as.mo("staaur")
#' as.mo("S. aureus")
#' as.mo("S aureus")
#' as.mo("Staphylococcus aureus")
#' as.mo("Staphylococcus aureus (MRSA)")
#' as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling
#' as.mo("MRSA")    # Methicillin Resistant S. aureus
#' as.mo("VISA")    # Vancomycin Intermediate S. aureus
#' as.mo("VRSA")    # Vancomycin Resistant S. aureus
#' as.mo(115329001) # SNOMED CT code
#'
#' # Dyslexia is no problem - these all work:
#' as.mo("Ureaplasma urealyticum")
#' as.mo("Ureaplasma urealyticus")
#' as.mo("Ureaplasmium urealytica")
#' as.mo("Ureaplazma urealitycium")
#'
#' as.mo("Streptococcus group A")
#' as.mo("GAS") # Group A Streptococci
#' as.mo("GBS") # Group B Streptococci
#'
#' as.mo("S. epidermidis")                 # will remain species: B_STPHY_EPDR
#' as.mo("S. epidermidis", Becker = TRUE)  # will not remain species: B_STPHY_CONS
#'
#' as.mo("S. pyogenes")                    # will remain species: B_STRPT_PYGN
#' as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRPA
#'
#' # All mo_* functions use as.mo() internally too (see ?mo_property):
#' mo_genus("E. coli")                           # returns "Escherichia"
#' mo_gramstain("E. coli")                       # returns "Gram negative"
#' mo_is_intrinsic_resistant("E. coli", "vanco") # returns TRUE
#' }
-												(v1.4.0.9008) like variations

											
										
										
as.mo <- function(x,
                  Becker = FALSE,
                  Lancefield = FALSE,
                  allow_uncertain = TRUE,
                  reference_df = get_mo_source(),
                  ignore_pattern = getOption("AMR_ignore_pattern"),
                  language = get_locale(),
                  ...) {
  meet_criteria(x, allow_class = c("mo", "data.frame", "list", "character", "numeric", "integer", "factor"), allow_NA = TRUE)
  meet_criteria(Becker, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(Lancefield, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(allow_uncertain, allow_class = c("logical", "numeric", "integer"), has_length = 1)
  meet_criteria(reference_df, allow_class = "data.frame", allow_NULL = TRUE)
  meet_criteria(ignore_pattern, allow_class = "character", has_length = 1, allow_NULL = TRUE)
  meet_criteria(language, has_length = 1, is_in = c(LANGUAGES_SUPPORTED, ""), allow_NULL = TRUE, allow_NA = TRUE)

  check_dataset_integrity()

  if (tryCatch(all(x[!is.na(x)] %in% MO_lookup$mo)
               & isFALSE(Becker)
               & isFALSE(Lancefield), error = function(e) FALSE)) {
    # don't look into valid MO codes, just return them
    # is.mo() won't work - MO codes might change between package versions
    return(set_clean_class(x, new_class = c("mo", "character")))
  }

  # start off by replacing language-specific non-ASCII characters with ASCII characters
  x <- parse_and_convert(x)
  # replace mo codes used in older package versions
  x <- replace_old_mo_codes(x, property = "mo")
  # ignore cases that match the ignore pattern
  x <- replace_ignore_pattern(x, ignore_pattern)

  # WHONET: xxx = no growth
  x[tolower(as.character(paste0(x, ""))) %in% c("", "xxx", "na", "nan")] <- NA_character_
  # Laboratory systems: remove (translated) entries like "no growth", etc.
  x[trimws2(x) %like% translate_AMR("no .*growth", language = language)] <- NA_character_
  x[trimws2(x) %like% paste0("^(", translate_AMR("no|not", language = language), ") [a-z]+")] <- "UNKNOWN"
  uncertainty_level <- translate_allow_uncertain(allow_uncertain)
  if (tryCatch(all(x == "" | gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE) %in% MO_lookup$fullname_lower, na.rm = TRUE)
               & isFALSE(Becker)
               & isFALSE(Lancefield), error = function(e) FALSE)) {
    # to improve speed, special case for taxonomically correct full names (case-insensitive)
    return(MO_lookup[match(gsub(".*(unknown ).*", "unknown name", tolower(x), perl = TRUE), MO_lookup$fullname_lower), "mo", drop = TRUE])
  }

  if (!is.null(reference_df)
      && check_validity_mo_source(reference_df)
      && isFALSE(Becker)
      && isFALSE(Lancefield)
      && all(x %in% unlist(reference_df), na.rm = TRUE)) {
    reference_df <- repair_reference_df(reference_df)
    suppressWarnings(
      y <- data.frame(x = x, stringsAsFactors = FALSE) %pm>%
        pm_left_join(reference_df, by = "x") %pm>%
        pm_pull(mo)
    )

  } else if (all(x[!is.na(x)] %in% MO_lookup$mo)
             & isFALSE(Becker)
             & isFALSE(Lancefield)) {
    y <- x

  } else {
    # will be checked for mo class in validation and uses exec_as.mo internally if necessary
    y <- mo_validate(x = x, property = "mo",
                     Becker = Becker, Lancefield = Lancefield,
                     allow_uncertain = uncertainty_level,
                     reference_df = reference_df,
                     ignore_pattern = ignore_pattern,
                     language = language,
                     ...)
  }

  set_clean_class(y,
                  new_class = c("mo", "character"))
}
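
# Illustrative sketch only, not part of the package API: the speed-up in
# as.mo() above resolves taxonomically correct full names with one vectorised,
# case-insensitive match() against a lowercased name column. The helper name
# `sketch_match_fullname()` and the `lookup` argument are assumptions made for
# this example; the package itself matches against MO_lookup.
sketch_match_fullname <- function(x, lookup) {
  # match() gives the row index of the first exact hit, or NA when there is none
  lookup$mo[match(tolower(x), lookup$fullname_lower)]
}
# Usage, assuming `lookup` is a data.frame with character columns `mo` and
# `fullname_lower`:
#   sketch_match_fullname(c("Streptococcus pyogenes", "not a microbe"), lookup)
# Unmatched input yields NA rather than an error, so callers can fall back to
# the slower, more tolerant search.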

#' @rdname as.mo
#' @export
is.mo <- function(x) {
  inherits(x, "mo")
}
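
# Illustrative sketch only, not part of the package API: the lookup() helper
# inside exec_as.mo() below receives its `needle` as an unevaluated expression
# and evaluates it with the columns of the reference data in scope. A minimal
# base R version of that technique follows; the name `sketch_lookup_rows()` and
# its defaults are assumptions made for this example.
sketch_lookup_rows <- function(needle, haystack, column = "mo", n = 1) {
  # substitute() captures the expression as written by the caller; eval() then
  # resolves names in `haystack` first and in the caller's environment second
  hits <- which(eval(substitute(needle), envir = haystack, enclos = parent.frame()))
  res <- as.character(haystack[hits, column, drop = TRUE])
  if (length(res) == 0) {
    return(NA_character_)
  }
  # keep at most `n` results
  res[seq_len(min(n, length(res)))]
}
# Usage, assuming `df` is a data.frame with columns `mo` and `fullname_lower`:
#   pattern <- "^strep"
#   sketch_lookup_rows(grepl(pattern, fullname_lower), df)
# Passing an expression instead of a precomputed logical vector lets the caller
# write conditions directly in terms of column names.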

# param property a column name of microorganisms
# param initial_search logical - is FALSE when coming from uncertain tries, which uses exec_as.mo internally too
# param dyslexia_mode logical - also check for characters that resemble others
# param debug logical - show different lookup texts while searching
# param reference_data_to_use data.frame - the data set to check for
# param actual_uncertainty - (only for initial_search = FALSE) the actual uncertainty level used in the function for score calculation (sometimes passed as 2 or 3 by uncertain_fn())
# param actual_input - (only for initial_search = FALSE) the actual, original input
# param language - used for translating "no growth", etc.
exec_as.mo <- function(x,
                       Becker = FALSE,
                       Lancefield = FALSE,
                       allow_uncertain = TRUE,
                       reference_df = get_mo_source(),
                       property = "mo",
                       initial_search = TRUE,
                       dyslexia_mode = FALSE,
                       debug = FALSE,
                       ignore_pattern = getOption("AMR_ignore_pattern"),
                       reference_data_to_use = MO_lookup,
                       actual_uncertainty = 1,
                       actual_input = NULL,
                       language = get_locale()) {
  meet_criteria(x, allow_class = c("mo", "data.frame", "list", "character", "numeric", "integer", "factor"), allow_NA = TRUE)
  meet_criteria(Becker, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(Lancefield, allow_class = c("logical", "character"), has_length = 1)
  meet_criteria(allow_uncertain, allow_class = c("logical", "numeric", "integer"), has_length = 1)
  meet_criteria(reference_df, allow_class = "data.frame", allow_NULL = TRUE)
  meet_criteria(property, allow_class = "character", has_length = 1, is_in = colnames(microorganisms))
  meet_criteria(initial_search, allow_class = "logical", has_length = 1)
  meet_criteria(dyslexia_mode, allow_class = "logical", has_length = 1)
  meet_criteria(debug, allow_class = "logical", has_length = 1)
  meet_criteria(ignore_pattern, allow_class = "character", has_length = 1, allow_NULL = TRUE)
  meet_criteria(reference_data_to_use, allow_class = "data.frame")
  meet_criteria(actual_uncertainty, allow_class = "numeric", has_length = 1)
  meet_criteria(actual_input, allow_class = "character", allow_NULL = TRUE)
  meet_criteria(language, has_length = 1, is_in = c(LANGUAGES_SUPPORTED, ""), allow_NULL = TRUE, allow_NA = TRUE)

  check_dataset_integrity()

  if (isTRUE(debug) && initial_search == TRUE) {
    time_start_tracking()
  }

  lookup <- function(needle,
                     column = property,
                     haystack = reference_data_to_use,
                     n = 1,
                     debug_mode = debug,
                     initial = initial_search,
                     uncertainty = actual_uncertainty,
                     input_actual = actual_input) {

    if (!is.null(input_actual)) {
      input <- input_actual
    } else {
      input <- tryCatch(x_backup[i], error = function(e) "")
    }

    # `column` can be NULL for all columns, or a selection
    # returns a character (vector) - if `column` > length 1 then with columns as names
    if (isTRUE(debug_mode)) {
      cat(font_silver("Looking up: ", substitute(needle), collapse = ""),
          "\n           ", time_track())
    }
    if (length(column) == 1) {
      res_df <- haystack[which(eval(substitute(needle), envir = haystack, enclos = parent.frame())), , drop = FALSE]
      if (NROW(res_df) > 1 & uncertainty != -1) {
        # sort the findings on matching score
        scores <- mo_matching_score(x = input,
                                    n = res_df[, "fullname", drop = TRUE])
        res_df <- res_df[order(scores, decreasing = TRUE), , drop = FALSE]
      }
      res <- as.character(res_df[, column, drop = TRUE])
      if (length(res) == 0) {
        if (isTRUE(debug_mode)) {
          cat(font_red(" (no match)\n"))
        }
        NA_character_
      } else {
        if (isTRUE(debug_mode)) {
          cat(font_green(paste0(" MATCH (", NROW(res_df), " results)\n")))
        }
        if ((length(res) > n | uncertainty > 1) & uncertainty != -1) {
          # save the other possible results as well, but not for forced certain results (then uncertainty == -1)
          uncertainties <<- rbind(uncertainties,
                                  format_uncertainty_as_df(uncertainty_level = uncertainty,
                                                           input = input,
                                                           result_mo = res_df[1, "mo", drop = TRUE],
                                                           candidates = as.character(res_df[, "fullname", drop = TRUE])),
                                  stringsAsFactors = FALSE)
        }
        res[seq_len(min(n, length(res)))]
      }
    } else {
      if (is.null(column)) {
        column <- names(haystack)
      }
      res <- haystack[which(eval(substitute(needle), envir = haystack, enclos = parent.frame())), , drop = FALSE]
      res <- res[seq_len(min(n, nrow(res))), column, drop = TRUE]
      if (NROW(res) == 0) {
        if (isTRUE(debug_mode)) {
          cat(font_red(" (no rows)\n"))
        }
        res <- rep(NA_character_, length(column))
      } else {
        if (isTRUE(debug_mode)) {
          cat(font_green(paste0(" MATCH (", NROW(res), " rows)\n")))
        }
      }
      res <- as.character(res)
      names(res) <- column
      res
    }
  }

  # start off by replacing language-specific non-ASCII characters with ASCII characters
  x <- parse_and_convert(x)
  # replace mo codes used in older package versions
  x <- replace_old_mo_codes(x, property)
  # ignore cases that match the ignore pattern
  x <- replace_ignore_pattern(x, ignore_pattern)

  # WHONET: xxx = no growth
  x[tolower(as.character(paste0(x, ""))) %in% c("", "xxx", "na", "nan")] <- NA_character_
  # Laboratory systems: remove (translated) entries like "no growth", etc.
  x[trimws2(x) %like% translate_AMR("no .*growth", language = language)] <- NA_character_
  x[trimws2(x) %like% paste0("^(", translate_AMR("no|not", language = language), ") [a-z]+")] <- "UNKNOWN"

  if (initial_search == TRUE) {
    # keep track of time - give some hints to improve speed if it takes a long time
    start_time <- Sys.time()
    pkg_env$mo_failures <- NULL
    pkg_env$mo_uncertainties <- NULL
    pkg_env$mo_renamed <- NULL
  }
  pkg_env$mo_renamed_last_run <- NULL

  failures <- character(0)
  uncertainty_level <- translate_allow_uncertain(allow_uncertain)
  uncertainties <- data.frame(uncertainty = integer(0),
                              input = character(0),
                              fullname = character(0),
                              renamed_to = character(0),
                              mo = character(0),
                              candidates = character(0),
                              stringsAsFactors = FALSE)

  x_input <- x
  # already strip leading and trailing spaces
  x <- trimws(x)
  # only check the uniques, which is way faster
  x <- unique(x)
  # remove empty values (to later fill them in again with NAs)
  # ("xxx" is WHONET code for 'no growth')
  x <- x[!is.na(x)
         & !is.null(x)
         & !identical(x, "")
         & !identical(x, "xxx")]

  # defined df to check for
  if (!is.null(reference_df)) {
    check_validity_mo_source(reference_df)
    reference_df <- repair_reference_df(reference_df)
  }

											2019-02-23 18:08:28 +01:00
+								  # all empty
-												small as.mo fix

											
										
										
											2019-03-06 14:39:02 +01:00
+								  if (all(identical(trimws(x_input), "") | is.na(x_input) | length(x) == 0)) {
-												EUCAST update, as.mo bugfix for empty vlaues

											
										
										
											2019-01-08 16:23:45 +01:00
+								    if (property == "mo") {
-												(v1.4.0.9021) more robust class setting

											
										
										
											2020-11-16 16:57:55 +01:00
+								      return(set_clean_class(rep(NA_character_, length(x_input)),
 								                             new_class = c("mo", "character")))
-												EUCAST update, as.mo bugfix for empty vlaues

											
										
										
											2019-01-08 16:23:45 +01:00
+								    } else {
 								      return(rep(NA_character_, length(x_input)))
 								    }
-												(v1.4.0.9008) like variations

											
										
										
											2020-10-26 12:23:03 +01:00
-												con WHONET, filter ab class

											
										
										
											2019-03-05 22:47:42 +01:00
+								  } else if (all(x %in% reference_df[, 1][[1]])) {
+								    # all in reference df
+								    colnames(reference_df)[1] <- "x"
 								    suppressWarnings(
+								      x <- MO_lookup[match(reference_df[match(x, reference_df$x), "mo", drop = TRUE], MO_lookup$mo), property, drop = TRUE]
+								    )
+								  } else if (all(x %in% reference_data_to_use$mo)) {
+								    x <- MO_lookup[match(x, MO_lookup$mo), property, drop = TRUE]
+								  } else if (all(tolower(x) %in% reference_data_to_use$fullname_lower)) {
+								    # we need special treatment for very prevalent full names, they are likely!
+								    # e.g. as.mo("Staphylococcus aureus")
+								    x <- MO_lookup[match(tolower(x), MO_lookup$fullname_lower), property, drop = TRUE]
+								  } else if (all(x %in% reference_data_to_use$fullname)) {
 								    # we need special treatment for very prevalent full names, they are likely!
 								    # e.g. as.mo("Staphylococcus aureus")
+								    x <- MO_lookup[match(x, MO_lookup$fullname), property, drop = TRUE]
+								  } else if (all(toupper(x) %in% microorganisms.codes$code)) {
+								    # commonly used MO codes
+								    x <- MO_lookup[match(microorganisms.codes[match(toupper(x),
 								                                                    microorganisms.codes$code),
+								                                              "mo",
+								                                              drop = TRUE],
 								                         MO_lookup$mo),
+								                   property,
 								                   drop = TRUE]
+								  } else if (!all(x %in% microorganisms[, property])) {
+								    strip_whitespace <- function(x, dyslexia_mode) {
+								      # all whitespaces (tab, new lines, etc.) should be one space
 								      # and leading and trailing spaces should be removed
+								      trimmed <- trimws2(x)
+								      # also, make sure the trailing and leading characters are a-z or 0-9
 								      # in case of non-regex
 								      if (dyslexia_mode == FALSE) {
+								        trimmed <- gsub("^[^a-zA-Z0-9)(]+", "", trimmed, perl = TRUE)
 								        trimmed <- gsub("[^a-zA-Z0-9)(]+$", "", trimmed, perl = TRUE)
+								      }
 								      trimmed
+								    }
+								    x_backup_untouched <- x
+								    x <- strip_whitespace(x, dyslexia_mode)
+								    # translate 'unknown' names back to English
 								    if (any(x %like% "unbekannt|onbekend|desconocid|sconosciut|inconnu|desconhecid", na.rm = TRUE)) {
 								      trns <- subset(translations_file, pattern %like% "unknown" | affect_mo_name == TRUE)
+								      langs <- LANGUAGES_SUPPORTED[LANGUAGES_SUPPORTED != "en"]
 								      for (l in langs) {
 								        for (i in seq_len(nrow(trns))) {
 								          if (!is.na(trns[i, l, drop = TRUE])) {
 								            x <- gsub(pattern = trns[i, l, drop = TRUE],
 								                      replacement = trns$pattern[i],
 								                      x = x,
 								                      ignore.case = TRUE,
 								                      perl = TRUE)
 								          }
 								        }
 								      }
+								    }
+								    x_backup <- x
+								    # from here on case-insensitive
 								    x <- tolower(x)
 								    x_backup[x %like_case% "^(fungus|fungi)$"] <- "(unknown fungus)" # will otherwise become the kingdom
 								    x_backup[x_backup_untouched == "Fungi"] <- "Fungi" # is literally the kingdom
 								    # Fill in fullnames and MO codes at once
+								    known_names <- tolower(x_backup) %in% MO_lookup$fullname_lower
+								    x[known_names] <- MO_lookup[match(tolower(x_backup)[known_names], MO_lookup$fullname_lower), property, drop = TRUE]
+								    known_codes <- toupper(x_backup) %in% MO_lookup$mo
 								    x[known_codes] <- MO_lookup[match(toupper(x_backup)[known_codes], MO_lookup$mo), property, drop = TRUE]
+								    already_known <- known_names | known_codes
+								    # now only continue where the right taxonomic output is not already known
 								    if (any(!already_known)) {
 								      x_known <- x[already_known]
+								      # remove spp and species
+								      x <- gsub(" +(spp.?|ssp.?|sp.? |ss ?.?|subsp.?|subspecies|biovar |serovar |species)", " ", x)
 								      x <- gsub("(spp.?|subsp.?|subspecies|biovar|serovar|species)", "", x)
+								      x <- gsub("^([a-z]{2,4})(spe.?)$", "\\1", x, perl = TRUE) # when ending in SPE instead of SPP and preceded by 2-4 characters
 								      x <- strip_whitespace(x, dyslexia_mode)
 								      x_backup_without_spp <- x
 								      x_species <- paste(x, "species")
 								      # translate to English for supported languages of mo_property
 								      x <- gsub("(gruppe|groep|grupo|gruppo|groupe)", "group", x, perl = TRUE)
 								      # no groups and complexes as ending
 								      x <- gsub("(complex|group)$", "", x, perl = TRUE)
 								      x <- gsub("(^|[^a-z])((an)?aero+b)[a-z]*", "", x, perl = TRUE)
 								      x <- gsub("^atyp[a-z]*", "", x, perl = TRUE)
 								      x <- gsub("(vergroen)[a-z]*", "viridans", x, perl = TRUE)
 								      x <- gsub("[a-z]*diff?erent[a-z]*", "", x, perl = TRUE)
 								      x <- gsub("(hefe|gist|gisten|levadura|lievito|fermento|levure)[a-z]*", "yeast", x, perl = TRUE)
 								      x <- gsub("(schimmels?|mofo|molde|stampo|moisissure|fungi)[a-z]*", "fungus", x, perl = TRUE)
 								      x <- gsub("fungus(ph|f)rya", "fungiphrya", x, perl = TRUE)
 								      # no contamination
 								      x <- gsub("(contamination|kontamination|mengflora|contaminaci.n|contamina..o)", "", x, perl = TRUE)
 								      # remove non-text in case of "E. coli" except dots and spaces
 								      x <- trimws(gsub("[^.a-zA-Z0-9/ \\-]+", " ", x, perl = TRUE))
 								      # but make sure that dots are followed by a space
 								      x <- gsub("[.] ?", ". ", x, perl = TRUE)
 								      # replace minus by a space
 								      x <- gsub("-+", " ", x, perl = TRUE)
 								      # replace hemolytic by haemolytic
 								      x <- gsub("ha?emoly", "haemoly", x, perl = TRUE)
 								      # place minus back in streptococci
 								      x <- gsub("(alpha|beta|gamma).?ha?emoly", "\\1-haemoly", x, perl = TRUE)
 								      # remove genus as first word
 								      x <- gsub("^genus ", "", x, perl = TRUE)
 								      # remove 'uncertain'-like texts
 								      x <- trimws(gsub("(uncertain|susp[ie]c[a-z]+|verdacht)", "", x, perl = TRUE))
 								      # allow characters that resemble others = dyslexia_mode ----
 								      if (dyslexia_mode == TRUE) {
 								        x <- tolower(x)
+								        x <- gsub("[iy]+", "[iy]+", x)
 								        x <- gsub("(c|k|q|qu|s|z|x|ks)+", "(c|k|q|qu|s|z|x|ks)+", x)
 								        x <- gsub("(ph|hp|f|v)+", "(ph|hp|f|v)+", x)
 								        x <- gsub("(th|ht|t)+", "(th|ht|t)+", x)
 								        x <- gsub("a+", "a+", x)
 								        x <- gsub("u+", "u+", x)
 								        # allow any ending of -um, -us, -ium, -icum, -ius, -icus, -ica, -ia and -a (needs perl for the negative lookahead):
 								        x <- gsub("(u\\+\\(c\\|k\\|q\\|qu\\+\\|s\\|z\\|x\\|ks\\)\\+)(?![a-z])",
 								                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
 								        x <- gsub("(\\[iy\\]\\+\\(c\\|k\\|q\\|qu\\+\\|s\\|z\\|x\\|ks\\)\\+a\\+)(?![a-z])",
 								                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
 								        x <- gsub("(\\[iy\\]\\+u\\+m)(?![a-z])",
 								                  "(u[s|m]|[iy][ck]?u[ms]|[iy]?[ck]?a)", x, perl = TRUE)
 								        x <- gsub("(\\[iy\\]\\+a\\+)(?![a-z])",
 								                  "([iy]*a+|[iy]+a*)", x, perl = TRUE)
+								        x <- gsub("e+", "e+", x)
 								        x <- gsub("o+", "o+", x)
 								        x <- gsub("(.)\\1+", "\\1+", x)
+								        # allow multiplication of all other consonants
 								        x <- gsub("([bdgjlnrw]+)", "\\1+", x, perl = TRUE)
 								        # allow ending in -en or -us
 								        x <- gsub("e\\+n(?![a-z[])", "(e+n|u+(c|k|q|qu|s|z|x|ks)+)", x, perl = TRUE)
 								        # if the input is longer than 10 characters, allow any forgotten consonant between all characters, as some might just have forgotten one...
 								        # this will allow "Pasteurella damatis" to be correctly read as "Pasteurella dagmatis".
 								        consonants <- paste(letters[!letters %in% c("a", "e", "i", "o", "u")], collapse = "")
 								        x[nchar(x_backup_without_spp) > 10] <- gsub("[+]", paste0("+[", consonants, "]?"), x[nchar(x_backup_without_spp) > 10])
+								        # allow au and ou after all above regex implementations
+								        x <- gsub("a+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE)
 								        x <- gsub("o+[bcdfghjklmnpqrstvwxyz]?u+[bcdfghjklmnpqrstvwxyz]?", "(a+u+|o+u+)[bcdfghjklmnpqrstvwxyz]?", x, fixed = TRUE)
+								      }
+								      x <- strip_whitespace(x, dyslexia_mode)
 								      # make sure to remove regex overkill (will lead to errors)
 								      x <- gsub("++", "+", x, fixed = TRUE)
 								      x <- gsub("?+", "?", x, fixed = TRUE)
 								      x_trimmed <- x
 								      x_trimmed_species <- paste(x_trimmed, "species")
 								      x_trimmed_without_group <- gsub(" gro.u.p$", "", x_trimmed, perl = TRUE)
 								      # remove last part from "-" or "/"
 								      x_trimmed_without_group <- gsub("(.*)[-/].*", "\\1", x_trimmed_without_group)
 								      # replace space and dot by regex sign
 								      x_withspaces <- gsub("[ .]+", ".* ", x, perl = TRUE)
 								      x <- gsub("[ .]+", ".*", x, perl = TRUE)
 								      # add start and end regex anchors
 								      x <- paste0("^", x, "$")
 								      x_withspaces_start_only <- paste0("^", x_withspaces)
 								      x_withspaces_end_only <- paste0(x_withspaces, "$")
 								      x_withspaces_start_end <- paste0("^", x_withspaces, "$")
 								      if (isTRUE(debug)) {
 								        cat(paste0(font_blue("x"), '                       "', x, '"\n'))
 								        cat(paste0(font_blue("x_species"), '               "', x_species, '"\n'))
 								        cat(paste0(font_blue("x_withspaces_start_only"), ' "', x_withspaces_start_only, '"\n'))
 								        cat(paste0(font_blue("x_withspaces_end_only"), '   "', x_withspaces_end_only, '"\n'))
 								        cat(paste0(font_blue("x_withspaces_start_end"), '  "', x_withspaces_start_end, '"\n'))
 								        cat(paste0(font_blue("x_backup"), '                "', x_backup, '"\n'))
 								        cat(paste0(font_blue("x_backup_without_spp"), '    "', x_backup_without_spp, '"\n'))
 								        cat(paste0(font_blue("x_trimmed"), '               "', x_trimmed, '"\n'))
 								        cat(paste0(font_blue("x_trimmed_species"), '       "', x_trimmed_species, '"\n'))
 								        cat(paste0(font_blue("x_trimmed_without_group"), ' "', x_trimmed_without_group, '"\n'))
+								      }
 								      if (initial_search == TRUE) {
 								        progress <- progress_ticker(n = length(x[!already_known]), n_min = 25) # start if n >= 25
 								        on.exit(close(progress))
+								      }
 								      for (i in which(!already_known)) {
 								        if (initial_search == TRUE) {
 								          progress$tick()
+								        }
 								        # valid MO code ----
 								        found <- lookup(mo == toupper(x_backup[i]))
 								        if (!is.na(found)) {
 								          x[i] <- found[1L]
 								          next
 								        }
 								        # valid fullname ----
 								        found <- lookup(fullname_lower %in% gsub("[^a-zA-Z0-9_. -]", "", tolower(c(x_backup[i], x_backup_without_spp[i])), perl = TRUE))
 								        # added the gsub() for "(unknown fungus)", since fullname_lower does not contain brackets
 								        if (!is.na(found)) {
 								          x[i] <- found[1L]
 								          next
 								        }
 								        # old fullname ----
 								        found <- lookup(fullname_lower %in% tolower(c(x_backup[i], x_backup_without_spp[i])),
 								                        column = NULL, # all columns
 								                        haystack = MO.old_lookup)
 								        if (!all(is.na(found))) {
 								          # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
 								          # mo_ref() of "Chlamydia psittaci" will be "Page, 1968" (with warning)
 								          # mo_ref() of "Chlamydophila psittaci" will be "Everett et al., 1999"
 								          if (property == "ref") {
 								            x[i] <- found["ref"]
 								          } else {
 								            x[i] <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
 								          }
 								          pkg_env$mo_renamed_last_run <- found["fullname"]
 								          was_renamed(name_old = found["fullname"],
 								                      name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
 								                      ref_old = found["ref"],
 								                      ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
 								                      mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
 								          next
 								        }
 								        if (x_backup[i] %like_case% "\\(unknown [a-z]+\\)" | tolower(x_backup_without_spp[i]) %in% c("other", "none", "unknown")) {
 								          # empty and nonsense values, ignore without warning
 								          x[i] <- lookup(mo == "UNKNOWN")
 								          next
 								        }
 								        # exact SNOMED code ----
 								        if (x_backup[i] %like_case% "^[0-9]+$") {
 								          snomed_found <- unlist(lapply(reference_data_to_use$snomed,
 								                                        function(s) x_backup[i] %in% s))
 								          if (sum(snomed_found, na.rm = TRUE) > 0) {
 								            found <- reference_data_to_use[snomed_found == TRUE, property][[1]]
 								            if (!is.na(found)) {
 								              x[i] <- found[1L]
 								              next
 								            }
 								          }
 								        }
 								        # very probable: is G. species ----
 								        found <- lookup(g_species %in% gsub("[^a-z0-9/ \\-]+", "",
 								                                            tolower(c(x_backup[i], x_backup_without_spp[i])), perl = TRUE))
 								        if (!is.na(found)) {
 								          x[i] <- found[1L]
 								          next
 								        }
 								        # WHONET and other common LIS codes ----
 								        found <- microorganisms.codes[which(microorganisms.codes$code %in% toupper(c(x_backup_untouched[i], x_backup[i], x_backup_without_spp[i]))), "mo", drop = TRUE][1L]
 								        if (!is.na(found)) {
 								          x[i] <- lookup(mo == found)
 								          next
 								        }
 								        # user-defined reference ----
 								        if (!is.null(reference_df)) {
 								          if (x_backup[i] %in% reference_df[, 1]) {
 								            # already checked integrity of reference_df, all MOs are valid
 								            ref_mo <- reference_df[reference_df[, 1] == x_backup[i], "mo"][[1L]]
 								            x[i] <- lookup(mo == ref_mo)
+								            next
 								          }
+								        }
 								        # WHONET: xxx = no growth
 								        if (tolower(as.character(paste0(x_backup_without_spp[i], ""))) %in% c("", "xxx", "na", "nan")) {
 								          x[i] <- NA_character_
+								          next
+								        }
 								        # check for very small input, but ignore the O antigens of E. coli
 								        if (nchar(gsub("[^a-zA-Z]", "", x_trimmed[i])) < 3
 								            & !toupper(x_backup_without_spp[i]) %like_case% "O?(26|103|104|111|121|145|157)") {
 								          # fewer than 3 chars and not looked for species, add as failure
 								          x[i] <- lookup(mo == "UNKNOWN")
 								          if (initial_search == TRUE) {
 								            failures <- c(failures, x_backup[i])
 								          }
 								          next
+								        }
 								        if (x_backup_without_spp[i] %like_case% "(virus|viridae)") {
 								          # there is no fullname like virus or viridae, so don't try to coerce it
 								          x[i] <- NA_character_
+								          next
+								        }
 								        # translate known trivial abbreviations to genus + species ----
+								        if (toupper(x_backup_without_spp[i]) %in% c("MRSA", "MSSA", "VISA", "VRSA", "BORSA", "GISA")
 								            | x_backup_without_spp[i] %like_case% "(^| )(mrsa|mssa|visa|vrsa|borsa|gisa|la-?mrsa|ca-?mrsa)( |$)") {
+								          x[i] <- lookup(fullname == "Staphylococcus aureus", uncertainty = -1)
+								          next
 								        }
+								        if (toupper(x_backup_without_spp[i]) %in% c("MRSE", "MSSE")
 								            | x_backup_without_spp[i] %like_case% "(^| )(mrse|msse)( |$)") {
 								          x[i] <- lookup(fullname == "Staphylococcus epidermidis", uncertainty = -1)
 								          next
+								        }
+								        if (toupper(x_backup_without_spp[i]) == "VRE"
 								            | x_backup_without_spp[i] %like_case% "(^| )vre "
 								            | x_backup_without_spp[i] %like_case% "(enterococci|enterokok|enterococo)[a-z]*?$")  {
 								          x[i] <- lookup(genus == "Enterococcus", uncertainty = -1)
 								          next
 								        }
 								        # support for:
 								        # - AIEC (Adherent-Invasive E. coli)
 								        # - ATEC (Atypical Entero-pathogenic E. coli)
 								        # - DAEC (Diffusely Adhering E. coli)
 								        # - EAEC (Entero-Aggregative E. coli)
 								        # - EHEC (Entero-Haemorrhagic E. coli)
 								        # - EIEC (Entero-Invasive E. coli)
 								        # - EPEC (Entero-Pathogenic E. coli)
 								        # - ETEC (Entero-Toxigenic E. coli)
 								        # - NMEC (Neonatal Meningitis-causing E. coli)
 								        # - STEC (Shiga-toxin producing E. coli)
 								        # - UPEC (Uropathogenic E. coli)
 								        if (toupper(x_backup_without_spp[i]) %in% c("AIEC", "ATEC", "DAEC", "EAEC", "EHEC", "EIEC", "EPEC", "ETEC", "NMEC", "STEC", "UPEC")
 								            # also support O-antigens of E. coli: O26, O103, O104, O111, O121, O145, O157
 								            | x_backup_without_spp[i] %like_case% "o?(26|103|104|111|121|145|157)") {
 								          x[i] <- lookup(fullname == "Escherichia coli", uncertainty = -1)
 								          next
 								        }
 								        if (toupper(x_backup_without_spp[i]) == "MRPA"
 								            | x_backup_without_spp[i] %like_case% "(^| )mrpa( |$)") {
 								          # multi resistant P. aeruginosa
 								          x[i] <- lookup(fullname == "Pseudomonas aeruginosa", uncertainty = -1)
 								          next
 								        }
 								        if (toupper(x_backup_without_spp[i]) == "CRSM") {
 								          # co-trim resistant S. maltophilia
 								          x[i] <- lookup(fullname == "Stenotrophomonas maltophilia", uncertainty = -1)
 								          next
 								        }
 								        if (toupper(x_backup_without_spp[i]) %in% c("PISP", "PRSP", "VISP", "VRSP")
 								            | x_backup_without_spp[i] %like_case% "(^| )(pisp|prsp|visp|vrsp)( |$)") {
 								          # peni I, peni R, vanco I, vanco R: S. pneumoniae
 								          x[i] <- lookup(fullname == "Streptococcus pneumoniae", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "^g[abcdfghk]s$") {
 								          # Streptococci, like GBS = Group B Streptococci (B_STRPT_GRPB)
 								          x[i] <- lookup(mo == toupper(gsub("g([abcdfghk])s",
 								                                            "B_STRPT_GRP\\1",
+								                                            x_backup_without_spp[i],
 								                                            perl = TRUE)), uncertainty = -1)
+								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "(streptococ|streptokok).* [abcdfghk]$") {
 								          # Streptococci in different languages, like "estreptococos grupo B"
 								          x[i] <- lookup(mo == toupper(gsub(".*(streptococ|streptokok|estreptococ).* ([abcdfghk])$",
 								                                            "B_STRPT_GRP\\2",
+								                                            x_backup_without_spp[i],
 								                                            perl = TRUE)), uncertainty = -1)
+								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "group [abcdfghk] (streptococ|streptokok|estreptococ)") {
 								          # Streptococci in different languages, like "Group A Streptococci"
 								          x[i] <- lookup(mo == toupper(gsub(".*group ([abcdfghk]) (streptococ|streptokok|estreptococ).*",
 								                                            "B_STRPT_GRP\\1",
+								                                            x_backup_without_spp[i],
 								                                            perl = TRUE)), uncertainty = -1)
+								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "haemoly.*strep") {
 								          # Haemolytic streptococci in different languages
 								          x[i] <- lookup(mo == "B_STRPT_HAEM", uncertainty = -1)
 								          next
 								        }
 								        # CoNS/CoPS in different languages (support for German, Dutch, Spanish, Portuguese)
 								        if (x_backup_without_spp[i] %like_case% "[ck]oagulas[ea] negatie?[vf]"
 								            | x_trimmed[i] %like_case% "[ck]oagulas[ea] negatie?[vf]"
 								            | x_backup_without_spp[i] %like_case% "[ck]o?ns[^a-z]?$") {
 								          # coerce S. coagulase negative
 								          x[i] <- lookup(mo == "B_STPHY_CONS", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "[ck]oagulas[ea] positie?[vf]"
 								            | x_trimmed[i] %like_case% "[ck]oagulas[ea] positie?[vf]"
 								            | x_backup_without_spp[i] %like_case% "[ck]o?ps[^a-z]?$") {
 								          # coerce S. coagulase positive
 								          x[i] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)
 								          next
 								        }
 								        # streptococcal groups: milleri and viridans
 								        if (x_trimmed[i] %like_case% "strepto.* mil+er+i"
 								            | x_backup_without_spp[i] %like_case% "strepto.* mil+er+i"
 								            | x_backup_without_spp[i] %like_case% "mgs[^a-z]?$") {
 								          # Milleri Group Streptococcus (MGS)
 								          x[i] <- lookup(mo == "B_STRPT_MILL", uncertainty = -1)
 								          next
 								        }
 								        if (x_trimmed[i] %like_case% "strepto.* viridans"
 								            | x_backup_without_spp[i] %like_case% "strepto.* viridans"
 								            | x_backup_without_spp[i] %like_case% "vgs[^a-z]?$") {
 								          # Viridans Group Streptococcus (VGS)
 								          x[i] <- lookup(mo == "B_STRPT_VIRI", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "gram[ -]?neg.*"
 								            | x_backup_without_spp[i] %like_case% "negatie?[vf]"
 								            | x_trimmed[i] %like_case% "gram[ -]?neg.*") {
 								          # coerce Gram negatives
 								          x[i] <- lookup(mo == "B_GRAMN", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "gram[ -]?pos.*"
 								            | x_backup_without_spp[i] %like_case% "positie?[vf]"
 								            | x_trimmed[i] %like_case% "gram[ -]?pos.*") {
 								          # coerce Gram positives
 								          x[i] <- lookup(mo == "B_GRAMP", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "mycoba[ck]teri.[nm]?$") {
 								          # coerce mycobacteria in multiple languages
 								          x[i] <- lookup(genus == "Mycobacterium", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup_without_spp[i] %like_case% "salmonella [a-z]+ ?.*") {
 								          if (x_backup_without_spp[i] %like_case% "salmonella group") {
 								            # Salmonella Group A to Z, just return S. species for now
 								            x[i] <- lookup(genus == "Salmonella", uncertainty = -1)
 								            next
+								          } else if (x_backup[i] %like_case% "[sS]almonella [A-Z][a-z]+ ?.*" &
+								                     !x_backup[i] %like% "t[iy](ph|f)[iy]") {
 								            # Salmonella with capital letter species like "Salmonella Goettingen" - they're all S. enterica
 								            # except for S. typhi, S. paratyphi, S. typhimurium
 								            x[i] <- lookup(fullname == "Salmonella enterica", uncertainty = -1)
 								            uncertainties <- rbind(uncertainties,
 								                                   format_uncertainty_as_df(uncertainty_level = 1,
 								                                                            input = x_backup[i],
 								                                                            result_mo = lookup(fullname == "Salmonella enterica", "mo", uncertainty = -1)),
 								                                   stringsAsFactors = FALSE)
 								            next
+								          }
+								        }
 								        # trivial names known to the field:
 								        if ("meningococcus" %like_case% x_trimmed[i]) {
 								          # coerce Neisseria meningitidis
 								          x[i] <- lookup(fullname == "Neisseria meningitidis", uncertainty = -1)
 								          next
 								        }
 								        if ("gonococcus" %like_case% x_trimmed[i]) {
 								          # coerce Neisseria gonorrhoeae
 								          x[i] <- lookup(fullname == "Neisseria gonorrhoeae", uncertainty = -1)
 								          next
 								        }
 								        if ("pneumococcus" %like_case% x_trimmed[i]) {
 								          # coerce Streptococcus pneumoniae
 								          x[i] <- lookup(fullname == "Streptococcus pneumoniae", uncertainty = -1)
 								          next
 								        }
 								        if (x_backup[i] %in% pkg_env$mo_failed) {
 								          # previously failed already in this session ----
 								          # (at this point the latest reference_df has also been checked)
 								          x[i] <- lookup(mo == "UNKNOWN")
 								          if (initial_search == TRUE) {
 								            failures <- c(failures, x_backup[i])
 								          }
 								          next
 								        }
 								        # NOW RUN THROUGH DIFFERENT PREVALENCE LEVELS
 								        check_per_prevalence <- function(data_to_check,
 								                                         data.old_to_check,
 								                                         a.x_backup,
 								                                         b.x_trimmed,
 								                                         c.x_trimmed_without_group,
 								                                         d.x_withspaces_start_end,
 								                                         e.x_withspaces_start_only,
 								                                         f.x_withspaces_end_only,
 								                                         g.x_backup_without_spp,
 								                                         h.x_species,
 								                                         i.x_trimmed_species) {
 								          # FIRST TRY FULLNAMES AND CODES ----
 								          # if only genus is available, return only genus
 								          if (all(!c(x[i], b.x_trimmed) %like_case% " ")) {
 								            found <- lookup(fullname_lower %in% c(h.x_species, i.x_trimmed_species),
+								                            haystack = data_to_check)
 								            if (!is.na(found)) {
+								              x[i] <- found[1L]
 								              return(x[i])
 								            }
+								            if (nchar(g.x_backup_without_spp) >= 6) {
 								              found <- lookup(fullname_lower %like_case% paste0("^", unregex(g.x_backup_without_spp), "[a-z]+"),
 								                              haystack = data_to_check)
 								              if (!is.na(found)) {
 								                x[i] <- found[1L]
 								                return(x[i])
 								              }
 								            }
 								            # rest of genus only is in allow_uncertain part.
+								          }
 								          # allow no codes less than 4 characters long, was already checked for WHONET earlier
 								          if (nchar(g.x_backup_without_spp) < 4) {
 								            x[i] <- lookup(mo == "UNKNOWN")
 								            if (initial_search == TRUE) {
 								              failures <- c(failures, a.x_backup)
 								            }
 								            return(x[i])
+								          }
 								          # try probable: trimmed version of fullname ----
 								          found <- lookup(fullname_lower %in% tolower(g.x_backup_without_spp),
+								                          haystack = data_to_check)
 								          if (!is.na(found)) {
 								            return(found[1L])
 								          }
 								          # try any match keeping spaces ----
 								          if (nchar(g.x_backup_without_spp) >= 6) {
 								            found <- lookup(fullname_lower %like_case% d.x_withspaces_start_end,
 								                            haystack = data_to_check)
 								            if (!is.na(found)) {
 								              return(found[1L])
 								            }
 								          }
 								          # try any match keeping spaces, not ending with $ ----
 								          found <- lookup(fullname_lower %like_case% paste0(trimws(e.x_withspaces_start_only), " "),
+								                          haystack = data_to_check)
 								          if (!is.na(found)) {
 								            return(found[1L])
 								          }
+								          if (nchar(g.x_backup_without_spp) >= 6) {
 								            found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
 								                            haystack = data_to_check)
 								            if (!is.na(found)) {
 								              return(found[1L])
 								            }
 								          }
 								          # try any match keeping spaces, not start with ^ ----
 								          found <- lookup(fullname_lower %like_case% paste0(" ", trimws(f.x_withspaces_end_only)),
+								                          haystack = data_to_check)
 								          if (!is.na(found)) {
 								            return(found[1L])
 								          }
 								          # try a trimmed version
 								          if (nchar(g.x_backup_without_spp) >= 6) {
 								            found <- lookup(fullname_lower %like_case% b.x_trimmed |
 								                              fullname_lower %like_case% c.x_trimmed_without_group,
 								                            haystack = data_to_check)
 								            if (!is.na(found)) {
 								              return(found[1L])
 								            }
 								          }
 								          # try splitting of characters in the middle and then find ID ----
 								          # only when text length is 6 or lower
 								          # like esco = E. coli, klpn = K. pneumoniae, stau = S. aureus, staaur = S. aureus
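 								          # illustrative sketch of the pattern built below (assumes lowercased input):
 								          #   "esco"   gives x_split "^es.* co",   which matches "escherichia coli"
 								          #   "staaur" gives x_split "^sta.* aur", which matches "staphylococcus aureus"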
 								          if (nchar(g.x_backup_without_spp) <= 6) {
 								            x_length <- nchar(g.x_backup_without_spp)
 								            x_split <- paste0("^",
 								                              g.x_backup_without_spp %pm>% substr(1, x_length / 2),
 								                              ".* ",
 								                              g.x_backup_without_spp %pm>% substr((x_length / 2) + 1, x_length))
 								            found <- lookup(fullname_lower %like_case% x_split,
 								                            haystack = data_to_check)
 								            if (!is.na(found)) {
 								              return(found[1L])
 								            }
 								          }
 								          # try fullname without start and without nchar limit of >= 6 ----
 								          # like "K. pneu rhino" >> "Klebsiella pneumoniae (rhinoscleromatis)" = KLEPNERH
 								          found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
+								                          haystack = data_to_check)
 								          if (!is.na(found)) {
+								            return(found[1L])
 								          }
 								          # MISCELLANEOUS ----
 								          # look for old taxonomic names ----
 								          found <- lookup(fullname_lower %like_case% e.x_withspaces_start_only,
 								                          column = NULL, # all columns
 								                          haystack = data.old_to_check)
 								          if (!all(is.na(found))) {
 								            # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
 								            # mo_ref() of "Chlamydia psittaci" will be "Page, 1968" (with warning)
 								            # mo_ref() of "Chlamydophila psittaci" will be "Everett et al., 1999"
 								            if (property == "ref") {
 								              x[i] <- found["ref"]
 								            } else {
 								              x[i] <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
+								            }
+								            pkg_env$mo_renamed_last_run <- found["fullname"]
 								            was_renamed(name_old = found["fullname"],
 								                        name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
 								                        ref_old = found["ref"],
 								                        ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
 								                        mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
 								            return(x[i])
 								          }
 								          # check for uncertain results ----
 								          uncertain_fn <- function(a.x_backup,
 								                                   b.x_trimmed,
 								                                   d.x_withspaces_start_end,
 								                                   e.x_withspaces_start_only,
 								                                   f.x_withspaces_end_only,
 								                                   g.x_backup_without_spp,
 								                                   uncertain.reference_data_to_use) {
 								            if (uncertainty_level == 0) {
 								              # do not allow uncertainties
 								              return(NA_character_)
+								            }
 								            # UNCERTAINTY LEVEL 1 ----
 								            if (uncertainty_level >= 1) {
 								              now_checks_for_uncertainty_level <- 1
 								              # (1) look again for old taxonomic names, now for G. species ----
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (1) look again for old taxonomic names, now for G. species\n"))
 								              }
 								              if (isTRUE(debug)) {
 								                message("Running '", d.x_withspaces_start_end, "' and '", e.x_withspaces_start_only, "'")
 								              }
 								              found <- lookup(fullname_lower %like_case% d.x_withspaces_start_end |
 								                                fullname_lower %like_case% e.x_withspaces_start_only,
 								                              column = NULL, # all columns
 								                              haystack = data.old_to_check)
 								              if (!all(is.na(found)) & nchar(g.x_backup_without_spp) >= 6) {
 								                if (property == "ref") {
 								                  # when property is "ref" (which is the case in mo_ref, mo_authors and mo_year), return the old value, so:
 								                  # mo_ref("Chlamydia psittaci") = "Page, 1968" (with warning)
 								                  # mo_ref("Chlamydophila psittaci") = "Everett et al., 1999"
 								                  x <- found["ref"]
 								                } else {
 								                  x <- lookup(fullname == found["fullname_new"], haystack = MO_lookup)
 								                }
 								                was_renamed(name_old = found["fullname"],
 								                            name_new = lookup(fullname == found["fullname_new"], "fullname", haystack = MO_lookup),
 								                            ref_old = found["ref"],
 								                            ref_new = lookup(fullname == found["fullname_new"], "ref", haystack = MO_lookup),
 								                            mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup))
 								                pkg_env$mo_renamed_last_run <- found["fullname"]
 								                uncertainties <<- rbind(uncertainties,
 								                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
 								                                                                 input = a.x_backup,
 								                                                                 result_mo = lookup(fullname == found["fullname_new"], "mo", haystack = MO_lookup)),
 								                                        stringsAsFactors = FALSE)
 								                return(x)
 								              }
 								              # (2) Try with misspelled input ----
 								              # rerunning with dyslexia_mode = TRUE will use the extensive regex part above
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (2) Try with misspelled input\n"))
 								              }
 								              if (isTRUE(debug)) {
 								                message("Running '", a.x_backup, "'")
 								              }
 								              # first try without dyslexia mode
 								              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 1, actual_input = a.x_backup)))
 								              if (empty_result(found)) {
 								                # then with dyslexia mode
 								                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 1, actual_input = a.x_backup)))
 								              }
 								              if (!empty_result(found)) {
 								                found_result <- found
 								                uncertainties <<- rbind(uncertainties,
 								                                        attr(found, which = "uncertainties", exact = TRUE),
 								                                        stringsAsFactors = FALSE)
 								                found <- lookup(mo == found)
 								                return(found)
+								              }
 								            }
 								            # UNCERTAINTY LEVEL 2 ----
 								            if (uncertainty_level >= 2) {
 								              now_checks_for_uncertainty_level <- 2
 								              # (3) look for genus only, part of name ----
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (3) look for genus only, part of name\n"))
 								              }
 								              if (nchar(g.x_backup_without_spp) > 4 & !b.x_trimmed %like_case% " ") {
+								                if (!b.x_trimmed %like_case% "^[A-Z][a-z]+") {
+								                  if (isTRUE(debug)) {
 								                    message("Running '", paste(b.x_trimmed, "species"), "'")
 								                  }
 								                  # not when input is like Genustext, because then Neospora would lead to Actinokineospora
 								                  found <- lookup(fullname_lower %like_case% paste(b.x_trimmed, "species"),
 								                                  haystack = uncertain.reference_data_to_use)
 								                  if (!is.na(found)) {
 								                    found_result <- found
 								                    found <- lookup(mo == found)
 								                    uncertainties <<- rbind(uncertainties,
 								                                            format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
 								                                                                     input = a.x_backup,
 								                                                                     result_mo = found_result),
 								                                            stringsAsFactors = FALSE)
 								                    return(found)
 								                  }
+								                }
+								              }
 								              # (4) strip values between brackets ----
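 								              # illustrative sketch (hypothetical input, not from the original source):
 								              #   gsub("( *[(].*[)] *)", " ", "klebsiella (group) pneumoniae", perl = TRUE)
 								              #   gives "klebsiella pneumoniae", which is then searched again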
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (4) strip values between brackets\n"))
 								              }
 								              a.x_backup_stripped <- gsub("( *[(].*[)] *)", " ", a.x_backup, perl = TRUE)
 								              a.x_backup_stripped <- trimws(gsub(" +", " ", a.x_backup_stripped, perl = TRUE))
 								              if (isTRUE(debug)) {
 								                message("Running '", a.x_backup_stripped, "'")
 								              }
 								              # first try without dyslexia mode
 								              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_stripped, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								              if (empty_result(found)) {
 								                # then with dyslexia mode
 								                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_stripped, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								              }
 								              if (!empty_result(found) & nchar(g.x_backup_without_spp) >= 6) {
 								                found_result <- found
 								                uncertainties <<- rbind(uncertainties,
 								                                        attr(found, which = "uncertainties", exact = TRUE),
 								                                        stringsAsFactors = FALSE)
 								                found <- lookup(mo == found)
 								                return(found)
 								              }
 								              # (5) inverse input ----
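 								              # illustrative sketch (hypothetical input, not from the original source):
 								              #   paste(rev(unlist(strsplit("aureus staphylococcus", split = " "))), collapse = " ")
 								              #   gives "staphylococcus aureus", which is then searched again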
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (5) inverse input\n"))
 								              }
 								              a.x_backup_inversed <- paste(rev(unlist(strsplit(a.x_backup, split = " "))), collapse = " ")
 								              if (isTRUE(debug)) {
 								                message("Running '", a.x_backup_inversed, "'")
 								              }
 								              # first try without dyslexia mode
 								              found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_inversed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								              if (empty_result(found)) {
 								                # then with dyslexia mode
 								                found <- suppressMessages(suppressWarnings(exec_as.mo(a.x_backup_inversed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								              }
 								              if (!empty_result(found) & nchar(g.x_backup_without_spp) >= 6) {
 								                found_result <- found
 								                uncertainties <<- rbind(uncertainties,
 								                                        attr(found, which = "uncertainties", exact = TRUE),
 								                                        stringsAsFactors = FALSE)
 								                found <- lookup(mo == found)
 								                return(found)
 								              }
 								              # (6) try to strip off half an element from end and check the remains ----
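 								              # illustrative sketch (hypothetical misspelled input, not from the original source):
 								              #   "staphylococcus aureux" is split into c("staphylococcus", "aureux");
 								              #   the last word is cut to its first half "aur", so "staphylococcus aur" is searched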
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (6) try to strip off half an element from end and check the remains\n"))
 								              }
 								              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
 								              if (length(x_strip) > 1) {
 								                for (i in seq_len(length(x_strip) - 1)) {
 								                  lastword <- x_strip[length(x_strip) - i + 1]
 								                  lastword_half <- substr(lastword, 1, as.integer(nchar(lastword) / 2))
 								                  # remove last half of the second term
 								                  x_strip_collapsed <- paste(c(x_strip[seq_len(length(x_strip) - i)], lastword_half), collapse = " ")
 								                  if (nchar(x_strip_collapsed) >= 4 & nchar(lastword_half) > 2) {
 								                    if (isTRUE(debug)) {
 								                      message("Running '", x_strip_collapsed, "'")
 								                    }
 								                    # first try without dyslexia mode
 								                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								                    if (empty_result(found)) {
 								                      # then with dyslexia mode
 								                      found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								                    }
 								                    if (!empty_result(found)) {
 								                      found_result <- found
 								                      uncertainties <<- rbind(uncertainties,
 								                                              attr(found, which = "uncertainties", exact = TRUE),
 								                                              stringsAsFactors = FALSE)
 								                      found <- lookup(mo == found)
 								                      return(found)
 								                    }
 								                  }
+								                }
+								              }
+								              # (7) try to strip off one element from end and check the remains ----
 								              if (isTRUE(debug)) {
 								                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (7) try to strip off one element from end and check the remains\n"))
 								              }
 								              if (length(x_strip) > 1) {
 								                for (i in seq_len(length(x_strip) - 1)) {
 								                  x_strip_collapsed <- paste(x_strip[seq_len(length(x_strip) - i)], collapse = " ")
 								                  if (nchar(x_strip_collapsed) >= 6) {
 								                    if (isTRUE(debug)) {
 								                      message("Running '", x_strip_collapsed, "'")
 								                    }
 								                    # first try without dyslexia mode
 								                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								                    if (empty_result(found)) {
 								                      # then with dyslexia mode
 								                      found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
 								                    }
 								                    if (!empty_result(found)) {
 								                      found_result <- found
 								                      uncertainties <<- rbind(uncertainties,
 								                                              attr(found, which = "uncertainties", exact = TRUE),
 								                                              stringsAsFactors = FALSE)
 								                      found <- lookup(mo == found)
 								                      return(found)
 								                    }
 								                  }
 								                }
 								              }
              # (8) check for unknown yeasts/fungi ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (8) check for unknown yeasts/fungi\n"))
              }
              if (b.x_trimmed %like_case% "yeast") {
                found <- "F_YEAST"
                found_result <- found
                found <- lookup(mo == found)
                uncertainties <<- rbind(uncertainties,
                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                 input = a.x_backup,
                                                                 result_mo = found_result),
                                        stringsAsFactors = FALSE)
                return(found)
              }
              if (b.x_trimmed %like_case% "(fungus|fungi)" && !b.x_trimmed %like_case% "fungiphrya") {
                found <- "F_FUNGUS"
                found_result <- found
                found <- lookup(mo == found)
                uncertainties <<- rbind(uncertainties,
                                        format_uncertainty_as_df(uncertainty_level = now_checks_for_uncertainty_level,
                                                                 input = a.x_backup,
                                                                 result_mo = found_result),
                                        stringsAsFactors = FALSE)
                return(found)
              }
              # (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (9) try to strip off one element from start and check the remains (only allow >= 2-part name outcome)\n"))
              }
              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
              if (length(x_strip) > 1 && nchar(g.x_backup_without_spp) >= 6) {
                for (i in 2:(length(x_strip))) {
                  x_strip_collapsed <- paste(x_strip[i:length(x_strip)], collapse = " ")
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 2, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    # uncertainty level 2 only if searched part contains a space (otherwise it will be found with lvl 3)
                    if (x_strip_collapsed %like_case% " ") {
                      uncertainties <<- rbind(uncertainties,
                                              attr(found, which = "uncertainties", exact = TRUE),
                                              stringsAsFactors = FALSE)
                      found <- lookup(mo == found)
                      return(found)
                    }
                  }
                }
              }
            }
            # UNCERTAINTY LEVEL 3 ----
            if (uncertainty_level >= 3) {
              now_checks_for_uncertainty_level <- 3
              # (10) try to strip off one element from start and check the remains (any text size) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (10) try to strip off one element from start and check the remains (any text size)\n"))
              }
              x_strip <- a.x_backup %pm>% strsplit("[ .]") %pm>% unlist()
              if (length(x_strip) > 1 && nchar(g.x_backup_without_spp) >= 6) {
                for (i in 2:(length(x_strip))) {
                  x_strip_collapsed <- paste(x_strip[i:length(x_strip)], collapse = " ")
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    uncertainties <<- rbind(uncertainties,
                                            attr(found, which = "uncertainties", exact = TRUE),
                                            stringsAsFactors = FALSE)
                    found <- lookup(mo == found)
                    return(found)
                  }
                }
              }
              # (11) try to strip off one element from end and check the remains (any text size) ----
              # (this is in fact 7 but without nchar limit of >=6)
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (11) try to strip off one element from end and check the remains (any text size)\n"))
              }
              if (length(x_strip) > 1) {
                for (i in seq_len(length(x_strip) - 1)) {
                  x_strip_collapsed <- paste(x_strip[seq_len(length(x_strip) - i)], collapse = " ")
                  if (isTRUE(debug)) {
                    message("Running '", x_strip_collapsed, "'")
                  }
                  # first try without dyslexia mode
                  found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = FALSE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  if (empty_result(found)) {
                    # then with dyslexia mode
                    found <- suppressMessages(suppressWarnings(exec_as.mo(x_strip_collapsed, initial_search = FALSE, dyslexia_mode = TRUE, allow_uncertain = FALSE, debug = debug, reference_data_to_use = uncertain.reference_data_to_use, actual_uncertainty = 3, actual_input = a.x_backup)))
                  }
                  if (!empty_result(found)) {
                    found_result <- found
                    uncertainties <<- rbind(uncertainties,
                                            attr(found, which = "uncertainties", exact = TRUE),
                                            stringsAsFactors = FALSE)
                    found <- lookup(mo == found)
                    return(found)
                  }
                }
              }
              # (12) part of a name (very unlikely match) ----
              if (isTRUE(debug)) {
                cat(font_bold("\n[ UNCERTAINTY LEVEL", now_checks_for_uncertainty_level, "] (12) part of a name (very unlikely match)\n"))
              }
              if (isTRUE(debug)) {
                message("Running '", f.x_withspaces_end_only, "'")
              }
              if (nchar(g.x_backup_without_spp) >= 6) {
                found <- lookup(fullname_lower %like_case% f.x_withspaces_end_only, column = "mo")
                if (!is.na(found)) {
                  found_result <- lookup(mo == found)
                  uncertainties <<- rbind(uncertainties,
                                          attr(found, which = "uncertainties", exact = TRUE),
                                          stringsAsFactors = FALSE)
                  found <- lookup(mo == found)
                  return(found)
                }
              }
            }
            # nothing found among the uncertain results either
            return(NA_character_)
          }
          # uncertain results
          x[i] <- uncertain_fn(a.x_backup = a.x_backup,
                               b.x_trimmed = b.x_trimmed,
                               d.x_withspaces_start_end = d.x_withspaces_start_end,
                               e.x_withspaces_start_only = e.x_withspaces_start_only,
                               f.x_withspaces_end_only = f.x_withspaces_end_only,
                               g.x_backup_without_spp = g.x_backup_without_spp,
                               uncertain.reference_data_to_use = MO_lookup)
          if (!empty_result(x[i])) {
            return(x[i])
          }
          # nothing found at all
          return(NA_character_)
        }
        # CHECK ALL IN ONE GO ----
        x[i] <- check_per_prevalence(data_to_check = MO_lookup,
                                     data.old_to_check = MO.old_lookup,
                                     a.x_backup = x_backup[i],
                                     b.x_trimmed = x_trimmed[i],
                                     c.x_trimmed_without_group = x_trimmed_without_group[i],
                                     d.x_withspaces_start_end = x_withspaces_start_end[i],
                                     e.x_withspaces_start_only = x_withspaces_start_only[i],
                                     f.x_withspaces_end_only = x_withspaces_end_only[i],
                                     g.x_backup_without_spp = x_backup_without_spp[i],
                                     h.x_species = x_species[i],
                                     i.x_trimmed_species = x_trimmed_species[i])
        if (!empty_result(x[i])) {
          next
        }
        # no results found: make them UNKNOWN ----
        x[i] <- lookup(mo == "UNKNOWN", uncertainty = -1)
        if (initial_search == TRUE) {
          failures <- c(failures, x_backup[i])
        }
      }
      if (initial_search == TRUE) {
        close(progress)
      }
      if (isTRUE(debug) && initial_search == TRUE) {
        cat("Ended search", time_track(), "\n")
      }
      # handling failures ----
      failures <- failures[!failures %in% c(NA, NULL, NaN)]
      if (length(failures) > 0 && initial_search == TRUE) {
        pkg_env$mo_failures <- sort(unique(failures))
        pkg_env$mo_failed <- c(pkg_env$mo_failed, pkg_env$mo_failures)
        plural <- c("value", "it", "was")
        if (pm_n_distinct(failures) > 1) {
          plural <- c("values", "them", "were")
        }
        x_input_clean <- trimws2(x_input)
        total_failures <- length(x_input_clean[as.character(x_input_clean) %in% as.character(failures) & !x_input %in% c(NA, NULL, NaN)])
        total_n <- length(x_input[!x_input %in% c(NA, NULL, NaN)])
        msg <- paste0(nr2char(pm_n_distinct(failures)), " unique ", plural[1],
                      " (covering ", percentage(total_failures / total_n),
                      ") could not be coerced and ", plural[3], " considered 'unknown'")
        if (pm_n_distinct(failures) <= 10) {
          msg <- paste0(msg, ": ", vector_and(failures, quotes = TRUE))
        }
        msg <- paste0(msg,
                      ".\nUse `mo_failures()` to review ", plural[2], ". Edit the `allow_uncertain` argument if needed (see ?as.mo).\n",
                      "You can also use your own reference data with set_mo_source() or directly, e.g.:\n",
                      '  as.mo("mycode", reference_df = data.frame(own = "mycode", mo = "', MO_lookup$mo[match("Escherichia coli", MO_lookup$fullname)], '"))\n',
                      '  mo_name("mycode", reference_df = data.frame(own = "mycode", mo = "', MO_lookup$mo[match("Escherichia coli", MO_lookup$fullname)], '"))\n')
        warning_(paste0("\n", msg),
                 add_fn = font_red,
                 call = FALSE,
                 immediate = TRUE) # thus will always be shown, even if >= warnings
      }
      # handling uncertainties ----
      if (NROW(uncertainties) > 0 && initial_search == TRUE) {
        uncertainties <- as.list(pm_distinct(uncertainties, input, .keep_all = TRUE))
        pkg_env$mo_uncertainties <- uncertainties
        plural <- c("", "it", "was")
        if (length(uncertainties$input) > 1) {
          plural <- c("s", "them", "were")
        }
        msg <- paste0("Translation is uncertain of ", nr2char(length(uncertainties$input)), " microorganism", plural[1],
                      ". Use `mo_uncertainties()` to review ", plural[2], ".")
        message_(msg)
      }
      x[already_known] <- x_known
    }
  }
  # Becker ----
  if (Becker == TRUE || Becker == "all") {
    # warn when species are found that are not in:
    # - Becker et al. 2014, PMID 25278577
    # - Becker et al. 2019, PMID 30872103
    # - Becker et al. 2020, PMID 32056452
    post_Becker <- character(0) # 2020-10-20 currently all are mentioned in above papers (otherwise uncomment the section below)
    # nolint start
    # if (any(x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property])) {
    #   warning_("Becker ", font_italic("et al."), " (2014, 2019) does not contain these species named after their publication: ",
    #            font_italic(paste("S.",
    #                              sort(mo_species(unique(x[x %in% MO_lookup[which(MO_lookup$species %in% post_Becker), property]]))),
    #                              collapse = ", ")),
    #            ".",
    #            call = FALSE,
    #            immediate = TRUE)
    # }
    # nolint end
    # 'MO_CONS' and 'MO_COPS' are <mo> vectors created in R/zzz.R
    CoNS <- MO_lookup[which(MO_lookup$mo %in% MO_CONS), property, drop = TRUE]
    x[x %in% CoNS] <- lookup(mo == "B_STPHY_CONS", uncertainty = -1)
    CoPS <- MO_lookup[which(MO_lookup$mo %in% MO_COPS), property, drop = TRUE]
    x[x %in% CoPS] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)
    if (Becker == "all") {
      x[x %in% lookup(fullname %like_case% "^Staphylococcus aureus", n = Inf)] <- lookup(mo == "B_STPHY_COPS", uncertainty = -1)
    }
  }
  # Lancefield ----
  if (Lancefield == TRUE || Lancefield == "all") {
    # group A - S. pyogenes
    x[x %in% lookup(genus == "Streptococcus" & species == "pyogenes", n = Inf)] <- lookup(fullname == "Streptococcus group A", uncertainty = -1)
    # group B - S. agalactiae
    x[x %in% lookup(genus == "Streptococcus" & species == "agalactiae", n = Inf)] <- lookup(fullname == "Streptococcus group B", uncertainty = -1)
    # group C
    x[x %in% lookup(genus == "Streptococcus" &
                      species %in% c("equisimilis", "equi", "zooepidemicus", "dysgalactiae"),
                    n = Inf)] <- lookup(fullname == "Streptococcus group C", uncertainty = -1)
    if (Lancefield == "all") {
      # all Enterococci
      x[x %in% lookup(genus == "Enterococcus", n = Inf)] <- lookup(fullname == "Streptococcus group D", uncertainty = -1)
    }
    # group F - S. anginosus
    x[x %in% lookup(genus == "Streptococcus" & species == "anginosus", n = Inf)] <- lookup(fullname == "Streptococcus group F", uncertainty = -1)
    # group H - S. sanguinis
    x[x %in% lookup(genus == "Streptococcus" & species == "sanguinis", n = Inf)] <- lookup(fullname == "Streptococcus group H", uncertainty = -1)
    # group K - S. salivarius
    x[x %in% lookup(genus == "Streptococcus" & species == "salivarius", n = Inf)] <- lookup(fullname == "Streptococcus group K", uncertainty = -1)
  }
  # Wrap up ----------------------------------------------------------------

  # comply to x, which is also unique and without empty values
  x_input_unique_nonempty <- unique(x_input[!is.na(x_input)
                                            & !is.null(x_input)
                                            & !identical(x_input, "")
                                            & !identical(x_input, "xxx")])

  # left join the found results to the original input values (x_input)
  df_found <- data.frame(input = as.character(x_input_unique_nonempty),
                         found = as.character(x),
                         stringsAsFactors = FALSE)
  df_input <- data.frame(input = as.character(x_input),
                         stringsAsFactors = FALSE)

  # super fast using match() which is a lot faster than merge()
  x <- df_found$found[match(df_input$input, df_found$input)]

  if (property == "mo") {
    x <- set_clean_class(x, new_class = c("mo", "character"))
  }

  # keep track of time
  end_time <- Sys.time()

  if (length(mo_renamed()) > 0) {
    print(mo_renamed())
  }

  if (initial_search == FALSE) {
    # we got here from uncertain_fn().
    if (NROW(uncertainties) == 0) {
      # the stripped/transformed version of x_backup is apparently a full hit, like with: as.mo("Escherichia (hello there) coli")
      uncertainties <- rbind(uncertainties,
                             format_uncertainty_as_df(uncertainty_level = actual_uncertainty,
                                                      input = actual_input,
                                                      result_mo = x,
                                                      candidates = ""),
                             stringsAsFactors = FALSE)
    }
    # this will save the uncertain items as attribute, so they can be bound to `uncertainties` in the uncertain_fn() function
    x <- structure(x, uncertainties = uncertainties)
  } else {
    # keep track of time - give some hints to improve speed if it takes a long time
    delta_time <- difftime(end_time, start_time, units = "secs")
    if (delta_time >= 30) {
      message_("Using `as.mo()` took ", round(delta_time), " seconds, which is a long time. Some suggestions to improve speed include:")
      message_(word_wrap("- Try to use as many valid taxonomic names as possible for your input.",
                         extra_indent = 2),
               as_note = FALSE)
      message_(word_wrap("- Save the output and use it as input for future calculations, e.g. create a new variable to your data using `as.mo()`. All functions in this package that rely on microorganism codes will automatically use that new column where possible. All `mo_*()` functions also do not require you to set their `x` argument as long as you have a column of class <mo>.",
                         extra_indent = 2),
               as_note = FALSE)
      message_(word_wrap("- Use `set_mo_source()` to continually transform your organisation codes to microorganisms codes used by this package, see `?mo_source`.",
                         extra_indent = 2),
               as_note = FALSE)
    }
  }

  if (isTRUE(debug) && initial_search == TRUE) {
    cat("Finished function", time_track(), "\n")
  }
  x
}
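
# Illustrative sketch, not part of the package: the match()-based left join used in
# the wrap-up step above, isolated so it can be tried on its own. The helper name
# `left_join_by_match()` and its argument names are hypothetical; only base R is used.
# Each element of `input` is looked up in `lookup_input`, and the value at the same
# position in `lookup_found` is returned (NA where there is no match).
left_join_by_match <- function(input, lookup_input, lookup_found) {
  lookup_found[match(input, lookup_input)]
}
# left_join_by_match(c("b", "a", "b", "z"), c("a", "b"), c("A", "B"))
# returns c("B", "A", "B", NA)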

empty_result <- function(x) {
  all(x %in% c(NA, "UNKNOWN"))
}

was_renamed <- function(name_old, name_new, ref_old = "", ref_new = "", mo = "") {
  newly_set <- data.frame(old_name = name_old,
                          old_ref = ref_old,
                          new_name = name_new,
                          new_ref = ref_new,
                          mo = mo,
                          stringsAsFactors = FALSE)
  already_set <- pkg_env$mo_renamed
  if (!is.null(already_set)) {
    pkg_env$mo_renamed <- rbind(already_set,
                                newly_set,
                                stringsAsFactors = FALSE)
  } else {
    pkg_env$mo_renamed <- newly_set
  }
}

format_uncertainty_as_df <- function(uncertainty_level,
                                     input,
                                     result_mo,
                                     candidates = NULL) {
  if (!is.null(pkg_env$mo_renamed_last_run)) {
    fullname <- pkg_env$mo_renamed_last_run
    pkg_env$mo_renamed_last_run <- NULL
    renamed_to <- MO_lookup[match(result_mo, MO_lookup$mo), "fullname", drop = TRUE][1]
  } else {
    fullname <- MO_lookup[match(result_mo, MO_lookup$mo), "fullname", drop = TRUE][1]
    renamed_to <- NA_character_
  }
  data.frame(uncertainty = uncertainty_level,
             input = input,
             fullname = fullname,
             renamed_to = renamed_to,
             mo = result_mo,
             # save max 26 entries: the one to be chosen and 25 more
             candidates = if (length(candidates) > 1) paste(candidates[c(2:min(26, length(candidates)))], collapse = ", ") else "",
             stringsAsFactors = FALSE)
}
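
# Illustrative sketch, not part of the package: the `candidates` expression above
# drops the first candidate (the one that was chosen) and keeps at most 25 more,
# collapsed into one comma-separated string. The helper name
# `collapse_other_candidates()` and its `max_entries` argument are hypothetical and
# only restate that expression so it can be tried in isolation.
collapse_other_candidates <- function(candidates, max_entries = 26) {
  if (length(candidates) > 1) {
    paste(candidates[2:min(max_entries, length(candidates))], collapse = ", ")
  } else {
    ""
  }
}
# collapse_other_candidates(c("E. coli", "E. albertii", "E. fergusonii"))
# returns "E. albertii, E. fergusonii"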

# will be exported using s3_register() in R/zzz.R
pillar_shaft.mo <- function(x, ...) {
  out <- format(x)
  # grey out the kingdom (part until first "_")
  out[!is.na(x)] <- gsub("^([A-Z]+_)(.*)", paste0(font_subtle("\\1"), "\\2"), out[!is.na(x)], perl = TRUE)
  # and grey out every _
  out[!is.na(x)] <- gsub("_", font_subtle("_"), out[!is.na(x)])

  # markup NA and UNKNOWN
  out[is.na(x)] <- font_na("  NA")
  out[x == "UNKNOWN"] <- font_na("  UNKNOWN")

  # make it always fit exactly
  max_char <- max(nchar(x))
  if (is.na(max_char)) {
    max_char <- 7
  }
  create_pillar_column(out,
                       align = "left",
                       width = max_char + ifelse(any(x %in% c(NA, "UNKNOWN")), 2, 0))
}
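
# Illustrative sketch, not part of the package: how the first gsub() call above
# splits a code into its kingdom prefix (capitals up to and including the first "_")
# and the remainder. Plain square brackets stand in for font_subtle(), and the helper
# name `mark_kingdom_prefix()` is hypothetical. Codes without an underscore, such as
# "UNKNOWN", do not match the pattern and are returned unchanged.
mark_kingdom_prefix <- function(code) {
  gsub("^([A-Z]+_)(.*)", "[\\1]\\2", code, perl = TRUE)
}
# mark_kingdom_prefix("B_ESCHR_COLI") returns "[B_]ESCHR_COLI"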

# will be exported using s3_register() in R/zzz.R
type_sum.mo <- function(x, ...) {
  "mo"
}

# will be exported using s3_register() in R/zzz.R
freq.mo <- function(x, ...) {
  x_noNA <- as.mo(x[!is.na(x)]) # as.mo() to get the newest mo codes
  grams <- mo_gramstain(x_noNA, language = NULL)
  digits <- list(...)$digits
  if (is.null(digits)) {
    digits <- 2
  }
  cleaner::freq.default(
    x = x,
    ...,
    .add_header = list(
      `Gram-negative` = paste0(
        format(sum(grams == "Gram-negative", na.rm = TRUE),
               big.mark = ",",
               decimal.mark = "."),
        " (", percentage(sum(grams == "Gram-negative", na.rm = TRUE) / length(grams),
                         digits = digits),
        ")"),
      `Gram-positive` = paste0(
        format(sum(grams == "Gram-positive", na.rm = TRUE),
               big.mark = ",",
               decimal.mark = "."),
        " (", percentage(sum(grams == "Gram-positive", na.rm = TRUE) / length(grams),
                         digits = digits),
        ")"),
      `Nr. of genera` = pm_n_distinct(mo_genus(x_noNA, language = NULL)),
      `Nr. of species` = pm_n_distinct(paste(mo_genus(x_noNA, language = NULL),
                                             mo_species(x_noNA, language = NULL)))))
}

# will be exported using s3_register() in R/zzz.R
get_skimmers.mo <- function(column) {
  skimr::sfl(
    skim_type = "mo",
    unique_total = ~pm_n_distinct(., na.rm = TRUE),
    gram_negative = ~sum(mo_is_gram_negative(stats::na.omit(.))),
    gram_positive = ~sum(mo_is_gram_positive(stats::na.omit(.))),
    top_genus = ~names(sort(-table(mo_genus(stats::na.omit(.), language = NULL))))[1L],
    top_species = ~names(sort(-table(mo_name(stats::na.omit(.), language = NULL))))[1L]
  )
}

#' @method print mo
#' @export
#' @noRd
print.mo <- function(x, print.shortnames = FALSE, ...) {
  cat("Class <mo>\n")
  x_names <- names(x)
  # both conditions are scalar, so use the short-circuiting && instead of &
  if (is.null(x_names) && isTRUE(print.shortnames)) {
    x_names <- tryCatch(mo_shortname(x, ...), error = function(e) NULL)
  }
  x <- as.character(x)
  names(x) <- x_names
  print.default(x, quote = FALSE)
}

#' @method summary mo
#' @export
#' @noRd
summary.mo <- function(object, ...) {
  # unique and top 1-3
  x <- as.mo(object) # force again, could be mo from older pkg version
  top <- as.data.frame(table(x), responseName = "n", stringsAsFactors = FALSE)
  top_3 <- top[order(-top$n), 1][1:3]
  value <- c("Class" = "mo",
             "<NA>" = length(x[is.na(x)]),
             "Unique" = pm_n_distinct(x[!is.na(x)]),
             "#1" = top_3[1],
             "#2" = top_3[2],
             "#3" = top_3[3])
  class(value) <- c("summaryDefault", "table")
  value
}

#' @method as.data.frame mo
#' @export
#' @noRd
as.data.frame.mo <- function(x, ...) {
  nm <- deparse1(substitute(x))
  if (!"nm" %in% names(list(...))) {
    as.data.frame.vector(as.mo(x), ..., nm = nm)
  } else {
    as.data.frame.vector(as.mo(x), ...)
  }
}

#' @method [ mo
#' @export
#' @noRd
"[.mo" <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}
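
# Illustrative sketch, not part of the package: the attribute-preserving subsetting
# pattern used by `[.mo` above, shown on a toy S3 class. The class name "demo_code"
# and the attribute "source" are hypothetical. NextMethod() performs the default
# subsetting; the attributes of the original object (including its class) are then
# copied onto the result. This assumes the object carries no length-dependent
# attributes such as names.
"[.demo_code" <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}
# codes <- structure(c("A1", "B2", "C3"), class = c("demo_code", "character"), source = "lab")
# class(codes[2:3])          returns c("demo_code", "character")
# attr(codes[2:3], "source") returns "lab"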

#' @method [[ mo
#' @export
#' @noRd
"[[.mo" <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}

#' @method [<- mo
#' @export
#' @noRd
"[<-.mo" <- function(i, j, ..., value) {
  y <- NextMethod()
  attributes(y) <- attributes(i)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}

#' @method [[<- mo
#' @export
#' @noRd
"[[<-.mo" <- function(i, j, ..., value) {
  y <- NextMethod()
  attributes(y) <- attributes(i)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}

#' @method c mo
#' @export
#' @noRd
c.mo <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  # must only contain valid MOs
  class_integrity_check(y, "microorganism code", c(as.character(microorganisms$mo),
                                                   as.character(microorganisms.translation$mo_old)))
}

#' @method unique mo
#' @export
#' @noRd
unique.mo <- function(x, incomparables = FALSE, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}

#' @method rep mo
#' @export
#' @noRd
rep.mo <- function(x, ...) {
  y <- NextMethod()
  attributes(y) <- attributes(x)
  y
}

#' @rdname as.mo
#' @export
mo_failures <- function() {
  pkg_env$mo_failures
}

+								#' @rdname as.mo
 								#' @export
 								mo_uncertainties <- function() {
-												(v1.4.0.9046) get_episode

											
										
										
											2020-12-27 00:07:00 +01:00
+								  if (is.null(pkg_env$mo_uncertainties)) {
-												(v0.7.1.9005) new rsi calculations, atc class removal

											
										
										
											2019-07-01 14:03:15 +02:00
+								    return(NULL)
 								  }
-												(v1.4.0.9046) get_episode

											
										
										
											2020-12-27 00:07:00 +01:00
+								  set_clean_class(as.data.frame(pkg_env$mo_uncertainties,
-												(v1.4.0.9021) more robust class setting

											
										
										
											2020-11-16 16:57:55 +01:00
+								                                stringsAsFactors = FALSE),
 								                  new_class = c("mo_uncertainties", "data.frame"))
+								}
+								#' @method print mo_uncertainties
+								#' @export
 								#' @noRd
 								print.mo_uncertainties <- function(x, ...) {
+								  if (NROW(x) == 0) {
 								    return(NULL)
 								  }
+								  message_("Matching scores are based on human pathogenic prevalence and the resemblance between the input and the full taxonomic name. See `?mo_matching_score`.", as_note = FALSE)
+								  msg <- ""
+								  for (i in seq_len(nrow(x))) {
+								    if (x[i, ]$candidates != "") {
 								      candidates <- unlist(strsplit(x[i, ]$candidates, ", ", fixed = TRUE))
+								      scores <- mo_matching_score(x = x[i, ]$input, n = candidates)
+								      # sort on descending scores
 								      candidates <- candidates[order(1 - scores)]
+								      scores_formatted <- trimws(formatC(round(scores, 3), format = "f", digits = 3))
+								      n_candidates <- length(candidates)
+								      candidates <- vector_and(paste0(candidates, " (", scores_formatted[order(1 - scores)], ")"),
 								                               quotes = FALSE,
 								                               sort = FALSE)
+								      # align with input after arrow
+								      candidates <- paste0("\n",
+								                           strwrap(paste0("Also matched",
+								                                          ifelse(n_candidates >= 25, " (max 25)", ""), ": ",
+								                                          candidates), # this is already max 25 due to format_uncertainty_as_df()
 								                                   indent = nchar(x[i, ]$input) + 6,
+								                                   exdent = nchar(x[i, ]$input) + 6,
+								                                   width = 0.98 * getOption("width")),
 								                           collapse = "")
 								      # after strwrap, make taxonomic names italic
+								      candidates <- gsub("([A-Za-z]+)", font_italic("\\1"), candidates, perl = TRUE)
+								      candidates <- gsub(paste(font_italic(c("Also", "matched"), collapse = NULL), collapse = " "),
+								                         "Also matched",
 								                         candidates, fixed = TRUE)
 								      candidates <- gsub(font_italic("max"), "max", candidates, fixed = TRUE)
+								    } else {
 								      candidates <- ""
 								    }
+								    score <- trimws(formatC(round(mo_matching_score(x = x[i, ]$input,
 								                                                    n = x[i, ]$fullname),
 								                                  3),
 								                            format = "f", digits = 3))
+								    msg <- paste(msg,
+								                 paste0(
 								                   strwrap(
 								                     paste0('"', x[i, ]$input, '" -> ',
 								                            paste0(font_bold(font_italic(x[i, ]$fullname)),
 								                                   ifelse(!is.na(x[i, ]$renamed_to), paste(", renamed to", font_italic(x[i, ]$renamed_to)), ""),
 								                                   " (", x[i, ]$mo,
 								                                   ", matching score = ", score,
 								                                   ") ")),
 								                     width = 0.98 * getOption("width"),
+								                     exdent = nchar(x[i, ]$input) + 6),
+								                   collapse = "\n"),
 								                 candidates,
+								                 sep = "\n")
+								    msg <- paste0(gsub("\n\n", "\n", msg), "\n\n")
+								  }
+								  cat(msg)
+								}
 								#' @rdname as.mo
+								#' @export
 								mo_renamed <- function() {
+								  items <- pkg_env$mo_renamed
+								  if (is.null(items)) {
+								    items <- data.frame(stringsAsFactors = FALSE)
+								  } else {
+								    items <- pm_distinct(items, old_name, .keep_all = TRUE)
+								  }
+								  set_clean_class(as.data.frame(items,
 								                                stringsAsFactors = FALSE),
 								                  new_class = c("mo_renamed", "data.frame"))
+								}
+								#' @method print mo_renamed
+								#' @export
 								#' @noRd
 								print.mo_renamed <- function(x, ...) {
+								  if (NROW(x) == 0) {
 								    return(invisible())
 								  }
+								  for (i in seq_len(nrow(x))) {
+								    message_(font_italic(x$old_name[i]),
 								             ifelse(x$old_ref[i] %in% c("", NA),
 								                    "",
 								                    paste0(" (",  gsub("et al.", font_italic("et al."), x$old_ref[i]), ")")),
 								             " was renamed ",
+								             ifelse(!x$new_ref[i] %in% c("", NA) && as.integer(gsub("[^0-9]", "", x$new_ref[i])) < as.integer(gsub("[^0-9]", "", x$old_ref[i])),
+								                    font_bold("back to "),
 								                    ""),
 								             font_italic(x$new_name[i]),
 								             ifelse(x$new_ref[i] %in% c("", NA),
 								                    "",
 								                    paste0(" (",  gsub("et al.", font_italic("et al."), x$new_ref[i]), ")")),
 								             " [", x$mo[i], "]")
+								  }
+								}
 								nr2char <- function(x) {
 								  if (x %in% c(1:10)) {
 								    v <- c("one" = 1, "two" = 2, "three" = 3, "four" = 4, "five" = 5,
 								           "six" = 6, "seven" = 7, "eight" = 8, "nine" = 9, "ten" = 10)
 								    names(v[x])
 								  } else {
 								    x
 								  }
+								}
 								unregex <- function(x) {
 								  gsub("[^a-zA-Z0-9 -]", "", x)
 								}
+								translate_allow_uncertain <- function(allow_uncertain) {
 								  if (isTRUE(allow_uncertain)) {
 								    # default to uncertainty level 2
 								    allow_uncertain <- 2
 								  } else {
+								    allow_uncertain[tolower(allow_uncertain) == "none"] <- 0
 								    allow_uncertain[tolower(allow_uncertain) == "all"] <- 3
+								    allow_uncertain <- as.integer(allow_uncertain)
+								    stop_ifnot(allow_uncertain %in% c(0:3),
 								               '`allow_uncertain` must be a number between 0 (or "none") and 3 (or "all"), or TRUE (= 2) or FALSE (= 0)', call = FALSE)
+								  }
 								  allow_uncertain
 								}
 								get_mo_failures_uncertainties_renamed <- function() {
+								  remember <- list(failures = pkg_env$mo_failures,
 								                   uncertainties = pkg_env$mo_uncertainties,
 								                   renamed = pkg_env$mo_renamed)
+								  # empty them, otherwise mo_shortname("Chlamydophila psittaci") will give 3 notes
+								  pkg_env$mo_failures <- NULL
 								  pkg_env$mo_uncertainties <- NULL
 								  pkg_env$mo_renamed <- NULL
+								  remember
+								}
 								load_mo_failures_uncertainties_renamed <- function(metadata) {
+								  pkg_env$mo_failures <- metadata$failures
 								  pkg_env$mo_uncertainties <- metadata$uncertainties
 								  pkg_env$mo_renamed <- metadata$renamed
+								}
+								trimws2 <- function(x) {
 								  trimws(gsub("[\\s]+", " ", x, perl = TRUE))
 								}
+								parse_and_convert <- function(x) {
+								  tryCatch({
+								    if (!is.null(dim(x))) {
 								      if (NCOL(x) > 2) {
+								        stop("a maximum of two columns is allowed", call. = FALSE)
+								      } else if (NCOL(x) == 2) {
+								        # support Tidyverse selection like: df %>% select(colA, colB)
+								        # paste these columns together
 								        x <- as.data.frame(x, stringsAsFactors = FALSE)
 								        colnames(x) <- c("A", "B")
 								        x <- paste(x$A, x$B)
 								      } else {
+								        # support Tidyverse selection like: df %>% select(colA)
+								        x <- as.data.frame(x, stringsAsFactors = FALSE)[[1]]
 								      }
 								    }
 								    x[is.null(x)] <- NA
+								    parsed <- iconv(x, to = "UTF-8")
 								    parsed[is.na(parsed) & !is.na(x)] <- iconv(x[is.na(parsed) & !is.na(x)], from = "Latin1", to = "ASCII//TRANSLIT")
+								    parsed <- gsub('"', "", parsed, fixed = TRUE)
+								    parsed <- gsub(" +", " ", parsed, perl = TRUE)
 								    parsed <- trimws(parsed)
+								  }, error = function(e) stop(e$message, call. = FALSE)) # this will also be thrown when running `as.mo(no_existing_object)`
 								  parsed
+								}
+								replace_old_mo_codes <- function(x, property) {
+								  if (any(toupper(x) %in% microorganisms.translation$mo_old, na.rm = TRUE)) {
 								    # get the ones that match
 								    matched <- match(toupper(x), microorganisms.translation$mo_old)
 								    # and their new codes
 								    mo_new <- microorganisms.translation$mo_new[matched]
 								    # assign on places where a match was found
 								    x[which(!is.na(matched))] <- mo_new[which(!is.na(matched))]
+								    n_matched <- length(matched[!is.na(matched)])
+								    if (property != "mo") {
+								      message_(font_blue("The input contained old microbial codes (from previous package versions). Please update your MO codes with `as.mo()`."))
+								    } else {
+								      if (n_matched == 1) {
+								        message_(font_blue("1 old microbial code (from previous package versions) was updated to a currently used MO code."))
+								      } else {
+								        message_(font_blue(n_matched, "old microbial codes (from previous package versions) were updated to currently used MO codes."))
+								      }
+								    }
+								  }
 								  x
 								}
+								replace_ignore_pattern <- function(x, ignore_pattern) {
 								  if (!is.null(ignore_pattern) && !identical(trimws2(ignore_pattern), "")) {
 								    ignore_cases <- x %like% ignore_pattern
 								    if (sum(ignore_cases) > 0) {
+								      message_("The following input was ignored by `ignore_pattern = \"", ignore_pattern, "\"`: ",
+								               vector_and(x[ignore_cases], quotes = TRUE))
 								      x[ignore_cases] <- NA_character_
+								    }
 								  }
 								  x
 								}
+								repair_reference_df <- function(reference_df) {
 								  # has valid own reference_df
 								  reference_df <- reference_df %pm>%
 								    pm_filter(!is.na(mo))
 								  # keep only first two columns, second must be mo
 								  if (colnames(reference_df)[1] == "mo") {
 								    reference_df <- reference_df %pm>% pm_select(2, "mo")
 								  } else {
 								    reference_df <- reference_df %pm>% pm_select(1, "mo")
 								  }
+								  # remove factors, just keep characters
 								  colnames(reference_df)[1] <- "x"
+								  reference_df[, "x"] <- as.character(reference_df[, "x", drop = TRUE])
 								  reference_df[, "mo"] <- as.character(reference_df[, "mo", drop = TRUE])
 								  # some microbial codes might be old
 								  reference_df[, "mo"] <- as.mo(reference_df[, "mo", drop = TRUE])
+								  reference_df
 								}
 								strip_words <- function(text, n, side = "right") {
+								  out <- lapply(strsplit(text, " "), function(x) {
+								    if (side %like% "^r" & length(x) > n) {
 								      x[seq_len(length(x) - n)]
 								    } else if (side %like% "^l" & length(x) > n) {
 								      # drop the first n words (length(x) > n is guaranteed by the condition above)
 								      x[(n + 1):length(x)]
 								    }
 								  })
 								  vapply(FUN.VALUE = character(1), out, paste, collapse = " ")
 								}