(v0.7.1.9073) as.mo() self-learning algorithm

2025-08-24 17:02:10 +02:00 · 2019-09-15 22:57:30 +02:00
parent cd178ee569
commit 398c5bdc4f
31 changed files with 1030 additions and 2360 deletions
--- a/man/as.mo.Rd
+++ b/man/as.mo.Rd
@@ -1,5 +1,5 @@
 % Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/mo.R
+% Please edit documentation in R/mo.R, R/mo_history.R
 \name{as.mo}
 \alias{as.mo}
 \alias{mo}
@@ -7,6 +7,7 @@
 \alias{mo_failures}
 \alias{mo_uncertainties}
 \alias{mo_renamed}
+\alias{clear_mo_history}
 \title{Transform to microorganism ID}
 \usage{
 as.mo(x, Becker = FALSE, Lancefield = FALSE, allow_uncertain = TRUE,
@@ -19,6 +20,8 @@ mo_failures()
 mo_uncertainties()

 mo_renamed()
+
+clear_mo_history(...)
 }
 \arguments{
 \item{x}{a character vector or a \code{data.frame} with one or two columns}
@@ -45,7 +48,7 @@ Use this function to determine a valid microorganism ID (\code{mo}). Determinati
 }
 \details{
 \strong{General info} \cr
-A microbial ID from this package (class: \code{mo}) typically looks like these examples:\cr
+A microorganism ID from this package (class: \code{mo}) typically looks like these examples:\cr
 \preformatted{
  Code              Full name
  ---------------   --------------------------------------
@@ -65,7 +68,13 @@ Values that cannot be coered will be considered 'unknown' and will get the MO co

 Use the \code{\link{mo_property}_*} functions to get properties based on the returned code, see Examples.

-The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{?microorganisms}).
+The algorithm uses data from the Catalogue of Life (see below) and from one other source (see \code{\link{microorganisms}}).
+
+\strong{Self-learning algorithm} \cr
+The \code{as.mo()} function gains experience from previously determined microorganism IDs and learns from them. This drastically improves both speed and reliability. Use \code{clear_mo_history()} to reset the algorithm. Only experience from your current \code{AMR} package version is used, because the taxonomic tree included in this package may change for any organism in a future version, after which the algorithm has to rebuild its knowledge.
+
+Usually, any guess after the first try runs 80-95\% faster than the first try.
+

 \strong{Intelligent rules} \cr
 This function uses intelligent rules to help getting fast and logical results. It tries to find matches in this order:
@@ -105,7 +114,7 @@ Use \code{mo_failures()} to get a vector with all values that could not be coerc

 Use \code{mo_uncertainties()} to get a data.frame with all values that were coerced to a valid value, but with uncertainty.

-Use \code{mo_renamed()} to get a vector with all values that could be coerced based on an old, previously accepted taxonomic name.
+Use \code{mo_renamed()} to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name.

 \strong{Microbial prevalence of pathogens in humans} \cr
 The intelligent rules take into account microbial prevalence of pathogens in humans. It uses three groups and all (sub)species are in only one group. These groups are:
@@ -117,7 +126,7 @@ The intelligent rules take into account microbial prevalence of pathogens in hum

 Group 1 contains all common Gram positives and Gram negatives, like all Enterobacteriaceae and e.g. \emph{Pseudomonas} and \emph{Legionella}.

-Group 2 probably contains less microbial pathogens; all other members of phyla that were found in humans in the Northern Netherlands between 2001 and 2018.
+Group 2 contains probably less pathogenic microorganisms; all other members of phyla that were found in humans in the Northern Netherlands between 2001 and 2018.
 }
 \section{Source}{

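Reviewer note on the self-learning paragraph added to man/as.mo.Rd: the described behaviour (reuse earlier determinations, keep them per package version, reset on demand) resembles a version-scoped cache. The base R sketch below illustrates that idea only. All names in it (`history`, `slow_lookup`, `lookup_with_history`, `clear_history_sketch`) and the in-memory environment are assumptions for illustration, not the package's internals or how R/mo_history.R stores anything.

```r
# Hypothetical sketch; names and storage are assumptions, not the AMR internals.
history <- new.env()

# Stand-in for the expensive matching step that a first lookup would run.
slow_lookup <- function(x) {
  toupper(substr(x, 1, 3))
}

lookup_with_history <- function(x, version = "0.7.1.9073") {
  # The version is part of the key, so results from another version are never reused.
  key <- paste(version, tolower(x), sep = "|")
  cached <- history[[key]]
  if (!is.null(cached)) {
    return(cached)  # reuse an earlier determination
  }
  result <- slow_lookup(x)
  assign(key, result, envir = history)
  result
}

# Drop everything that was learned so far.
clear_history_sketch <- function() {
  rm(list = ls(history), envir = history)
}

lookup_with_history("Escherichia coli")  # computed, then stored
lookup_with_history("escherichia coli")  # served from the stored result
clear_history_sketch()
```

Keying on the version string is one simple way to get the "only experience from the current version" behaviour without an explicit invalidation step.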
--- a/man/like.Rd
+++ b/man/like.Rd
@@ -3,14 +3,17 @@
 \name{like}
 \alias{like}
 \alias{\%like\%}
+\alias{\%like_case\%}
 \title{Pattern Matching}
 \source{
 Idea from the \href{https://github.com/Rdatatable/data.table/blob/master/R/like.R}{\code{like} function from the \code{data.table} package}, but made it case insensitive at default and let it support multiple patterns. Also, if the regex fails the first time, it tries again with \code{perl = TRUE}.
 }
 \usage{
-like(x, pattern)
+like(x, pattern, ignore.case = TRUE)

 x \%like\% pattern
+
+x \%like_case\% pattern
 }
 \arguments{
 \item{x}{a character vector where matches are sought, or an
@@ -24,12 +27,15 @@ x \%like\% pattern
    character vector of length 2 or more is supplied, the first element
    is used with a warning.  Missing values are allowed except for
    \code{regexpr} and \code{gregexpr}.}
+
+\item{ignore.case}{if \code{FALSE}, the pattern matching is \emph{case
+      sensitive} and if \code{TRUE}, case is ignored during matching.}
 }
 \value{
 A \code{logical} vector
 }
 \description{
-Convenient wrapper around \code{\link[base]{grep}} to match a pattern: \code{a \%like\% b}. It always returns a \code{logical} vector and is always case-insensitive. Also, \code{pattern} (\code{b}) can be as long as \code{x} (\code{a}) to compare items of each index in both vectors.
+Convenient wrapper around \code{\link[base]{grep}} to match a pattern: \code{a \%like\% b}. It always returns a \code{logical} vector and is case-insensitive by default (use \code{a \%like_case\% b} for case-sensitive matching). Also, \code{pattern} (\code{b}) can be as long as \code{x} (\code{a}) to compare the items at each index of both vectors.
 }
 \details{
 Using RStudio? This function can also be inserted from the Addins menu and can have its own Keyboard Shortcut like Ctrl+Shift+L or Cmd+Shift+L (see Tools > Modify Keyboard Shortcuts...).
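Reviewer note on the man/like.Rd changes: the documented behaviour (logical result, `ignore.case` switch, retry with `perl = TRUE` when the regex fails, element-wise comparison when `pattern` is as long as `x`) can be approximated in base R. The function `like_sketch` below is a hypothetical stand-in written for this note, not the package's `like()` source. The strict length check is an assumption of the sketch.

```r
# Hypothetical re-implementation for illustration only; not the AMR source.
like_sketch <- function(x, pattern, ignore.case = TRUE) {
  match_one <- function(x, p) {
    # Try the default regex engine first; fall back to PCRE if it errors.
    tryCatch(
      grepl(p, x, ignore.case = ignore.case),
      error = function(e) grepl(p, x, ignore.case = ignore.case, perl = TRUE)
    )
  }
  if (length(pattern) == 1L) {
    return(match_one(x, pattern))
  }
  # Assumption: pattern has the same length as x and is compared element-wise.
  stopifnot(length(pattern) == length(x))
  unname(mapply(match_one, x, pattern))
}

like_sketch("Escherichia coli", "^esch")                       # TRUE
like_sketch("Escherichia coli", "^esch", ignore.case = FALSE)  # FALSE
like_sketch(c("E. coli", "S. aureus"), c("coli", "coli"))      # TRUE FALSE
```

The second call corresponds to what `x \%like_case\% pattern` is documented to do, under the assumption that the operator simply passes `ignore.case = FALSE`.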