2019-01-21 15:53:01 +01:00
# ==================================================================== #
# TITLE #
# Antimicrobial Resistance (AMR) Analysis #
# #
# SOURCE #
2020-07-08 14:48:06 +02:00
# https://github.com/msberends/AMR #
2019-01-21 15:53:01 +01:00
# #
# LICENCE #
2020-01-05 17:22:09 +01:00
# (c) 2018-2020 Berends MS, Luz CF et al. #
2019-01-21 15:53:01 +01:00
# #
# This R package is free software; you can freely use and distribute #
# it for both personal and commercial purposes under the terms of the #
# GNU General Public License version 2.0 (GNU GPL-2), as published by #
# the Free Software Foundation. #
# #
2020-01-05 17:22:09 +01:00
# We created this package for both routine data analysis and academic #
# research and it was publicly released in the hope that it will be #
# useful, but it comes WITHOUT ANY WARRANTY OR LIABILITY. #
2020-07-08 14:48:06 +02:00
# Visit our website for more info: https://msberends.github.io/AMR. #
2019-01-21 15:53:01 +01:00
# ==================================================================== #
2020-05-25 01:01:14 +02:00
#' User-defined reference data set for microorganisms
2019-01-21 15:53:01 +01:00
#'
2019-11-28 22:32:17 +01:00
#' @description These functions can be used to predefine your own reference to be used in [as.mo()] and consequently all `mo_*` functions like [mo_genus()] and [mo_gramstain()].
2019-02-28 13:56:28 +01:00
#'
2019-11-28 22:32:17 +01:00
#' This is **the fastest way** to have your organisation (or analysis) specific codes picked up and translated by this package.
2020-01-05 17:22:09 +01:00
#' @inheritSection lifecycle Stable lifecycle
2020-05-25 01:01:14 +02:00
#' @param path location of your reference file, see Details. Can be `""`, `NULL` or `FALSE` to delete the reference file.
2019-01-21 15:53:01 +01:00
#' @rdname mo_source
#' @name mo_source
#' @aliases set_mo_source get_mo_source
2019-11-28 22:32:17 +01:00
#' @details The reference file can be a text file seperated with commas (CSV) or tabs or pipes, an Excel file (either 'xls' or 'xlsx' format) or an R object file (extension '.rds'). To use an Excel file, you need to have the `readxl` package installed.
2019-01-21 15:53:01 +01:00
#'
2019-11-28 22:32:17 +01:00
#' [set_mo_source()] will check the file for validity: it must be a [`data.frame`], must have a column named `"mo"` which contains values from [`microorganisms$mo`][microorganisms] and must have a reference column with your own defined values. If all tests pass, [set_mo_source()] will read the file into R and export it to `"~/.mo_source.rds"`. This compressed data file will then be used at default for MO determination (function [as.mo()] and consequently all `mo_*` functions like [mo_genus()] and [mo_gramstain()]). The location of the original file will be saved as option with `options(mo_source = path)`. Its timestamp will be saved with `options(mo_source_datetime = ...)`.
2019-01-21 15:53:01 +01:00
#'
2019-11-28 22:32:17 +01:00
#' [get_mo_source()] will return the data set by reading `"~/.mo_source.rds"` with [readRDS()]. If the original file has changed (the file defined with `path`), it will call [set_mo_source()] to update the data file automatically.
2019-01-21 15:53:01 +01:00
#'
2020-05-25 01:01:14 +02:00
#' Reading an Excel file (`.xlsx`) with only one row has a size of 8-9 kB. The compressed file created with [set_mo_source()] will then have a size of 0.1 kB and can be read by [get_mo_source()] in only a couple of microseconds (millionths of a second).
2019-11-28 22:32:17 +01:00
#'
2020-05-25 01:01:14 +02:00
#' @section How to setup:
#'
#' Imagine this data on a sheet of an Excel file (mo codes were looked up in the [microorganisms] data set). The first column contains the organisation specific codes, the second column contains an MO code from this package:
2019-11-28 22:32:17 +01:00
#'
#' ```
2019-11-30 12:01:50 +01:00
#' | A | B |
#' --|--------------------|--------------|
#' 1 | Organisation XYZ | mo |
#' 2 | lab_mo_ecoli | B_ESCHR_COLI |
#' 3 | lab_mo_kpneumoniae | B_KLBSL_PNMN |
#' 4 | | |
2019-11-28 22:32:17 +01:00
#' ```
2019-01-21 15:53:01 +01:00
#'
2019-11-28 22:32:17 +01:00
#' We save it as `"home/me/ourcodes.xlsx"`. Now we have to set it as a source:
2020-05-25 01:01:14 +02:00
#'
2019-11-28 22:32:17 +01:00
#' ```
2019-02-27 11:36:12 +01:00
#' set_mo_source("home/me/ourcodes.xlsx")
2020-05-25 01:01:14 +02:00
#' #> NOTE: Created mo_source file '~/.mo_source.rds' from 'home/me/ourcodes.xlsx'
#' #> (columns "Organisation XYZ" and "mo")
2019-11-28 22:32:17 +01:00
#' ```
2019-01-21 15:53:01 +01:00
#'
2020-05-25 01:01:14 +02:00
#' It has now created a file `"~/.mo_source.rds"` with the contents of our Excel file. Only the first column with foreign values and the 'mo' column will be kept when creating the RDS file.
2019-02-28 13:56:28 +01:00
#'
#' And now we can use it in our functions:
2020-05-25 01:01:14 +02:00
#'
2019-11-28 22:32:17 +01:00
#' ```
2019-01-21 15:53:01 +01:00
#' as.mo("lab_mo_ecoli")
2020-05-25 01:01:14 +02:00
#' #> [1] B_ESCHR_COLI
2019-01-21 15:53:01 +01:00
#'
#' mo_genus("lab_mo_kpneumoniae")
2020-05-25 01:01:14 +02:00
#' #> [1] "Klebsiella"
2019-03-01 09:34:04 +01:00
#'
#' # other input values still work too
#' as.mo(c("Escherichia coli", "E. coli", "lab_mo_ecoli"))
2020-05-25 01:01:14 +02:00
#' #> [1] B_ESCHR_COLI B_ESCHR_COLI B_ESCHR_COLI
2019-11-28 22:32:17 +01:00
#' ```
2019-01-21 15:53:01 +01:00
#'
2020-05-25 01:01:14 +02:00
#' If we edit the Excel file by, let's say, adding row 4 like this:
#'
2019-11-28 22:32:17 +01:00
#' ```
#' | A | B |
#' --|--------------------|--------------|
#' 1 | Organisation XYZ | mo |
#' 2 | lab_mo_ecoli | B_ESCHR_COLI |
#' 3 | lab_mo_kpneumoniae | B_KLBSL_PNMN |
#' 4 | lab_Staph_aureus | B_STPHY_AURS |
#' 5 | | |
#' ```
2019-01-21 15:53:01 +01:00
#'
2019-11-30 12:01:50 +01:00
#' ...any new usage of an MO function in this package will update your data file:
2020-05-25 01:01:14 +02:00
#'
2019-11-28 22:32:17 +01:00
#' ```
2019-02-28 13:56:28 +01:00
#' as.mo("lab_mo_ecoli")
2020-05-25 01:01:14 +02:00
#' #> NOTE: Updated mo_source file '~/.mo_source.rds' from 'home/me/ourcodes.xlsx'
#' #> (columns "Organisation XYZ" and "mo")
#' #> [1] B_ESCHR_COLI
2019-02-28 13:56:28 +01:00
#'
#' mo_genus("lab_Staph_aureus")
2020-05-25 01:01:14 +02:00
#' #> [1] "Staphylococcus"
2019-11-28 22:32:17 +01:00
#' ```
2019-02-28 13:56:28 +01:00
#'
2020-05-25 01:01:14 +02:00
#' To delete the reference data file, just use `""`, `NULL` or `FALSE` as input for [set_mo_source()]:
#'
2019-11-28 22:32:17 +01:00
#' ```
2019-02-28 13:56:28 +01:00
#' set_mo_source(NULL)
#' # Removed mo_source file '~/.mo_source.rds'.
2019-11-28 22:32:17 +01:00
#' ```
2020-05-25 01:01:14 +02:00
#'
#' If the original Excel file is moved or deleted, the mo_source file will be removed upon the next use of [as.mo()]. If the mo_source file is manually deleted (i.e. without using [set_mo_source()]), the references to the mo_source file will be removed upon the next use of [as.mo()].
2019-02-28 13:56:28 +01:00
#' @export
#' @inheritSection AMR Read more on our website!
2019-01-21 15:53:01 +01:00
set_mo_source <- function ( path ) {
2020-07-13 09:17:24 +02:00
2019-10-11 17:21:02 +02:00
file_location <- path.expand ( " ~/mo_source.rds" )
2020-07-13 09:17:24 +02:00
2020-06-22 11:18:40 +02:00
stop_ifnot ( length ( path ) == 1 , " `path` must be of length 1" )
2020-07-13 09:17:24 +02:00
2020-05-25 01:01:14 +02:00
if ( is.null ( path ) || path %in% c ( FALSE , " " ) ) {
2019-01-21 15:53:01 +01:00
options ( mo_source = NULL )
options ( mo_source_timestamp = NULL )
2019-03-15 13:57:25 +01:00
if ( file.exists ( file_location ) ) {
unlink ( file_location )
2020-05-25 01:01:14 +02:00
message ( font_red ( paste0 ( " Removed mo_source file '" , font_bold ( file_location ) , " '" ) ) )
2019-01-21 15:53:01 +01:00
}
return ( invisible ( ) )
}
2020-07-13 09:17:24 +02:00
2020-06-22 11:18:40 +02:00
stop_ifnot ( file.exists ( path ) ,
" file not found: " , path )
2020-07-13 09:17:24 +02:00
2019-10-11 17:21:02 +02:00
if ( path %like% " [.]rds$" ) {
2019-01-21 15:53:01 +01:00
df <- readRDS ( path )
2020-07-13 09:17:24 +02:00
2019-10-11 17:21:02 +02:00
} else if ( path %like% " [.]xlsx?$" ) {
2019-01-21 15:53:01 +01:00
# is Excel file (old or new)
2020-06-17 15:14:37 +02:00
read_excel <- import_fn ( " read_excel" , " readxl" )
2020-05-16 21:40:50 +02:00
df <- read_excel ( path )
2020-07-13 09:17:24 +02:00
2019-10-11 17:21:02 +02:00
} else if ( path %like% " [.]tsv$" ) {
2019-02-27 11:36:12 +01:00
df <- utils :: read.table ( header = TRUE , sep = " \t" , stringsAsFactors = FALSE )
2020-07-13 09:17:24 +02:00
2019-01-21 15:53:01 +01:00
} else {
# try comma first
try (
df <- utils :: read.table ( header = TRUE , sep = " ," , stringsAsFactors = FALSE ) ,
silent = TRUE )
2020-05-16 13:05:47 +02:00
if ( ! mo_source_isvalid ( df , stop_on_error = FALSE ) ) {
2019-02-27 11:36:12 +01:00
# try tab
try (
df <- utils :: read.table ( header = TRUE , sep = " \t" , stringsAsFactors = FALSE ) ,
silent = TRUE )
}
2020-05-16 13:05:47 +02:00
if ( ! mo_source_isvalid ( df , stop_on_error = FALSE ) ) {
2019-01-21 15:53:01 +01:00
# try pipe
try (
df <- utils :: read.table ( header = TRUE , sep = " |" , stringsAsFactors = FALSE ) ,
silent = TRUE )
}
}
2020-07-13 09:17:24 +02:00
2020-05-16 13:05:47 +02:00
# check integrity
mo_source_isvalid ( df )
2020-07-13 09:17:24 +02:00
2020-05-25 01:01:14 +02:00
df <- subset ( df , ! is.na ( mo ) )
2020-07-13 09:17:24 +02:00
2019-03-01 09:34:04 +01:00
# keep only first two columns, second must be mo
2019-01-21 15:53:01 +01:00
if ( colnames ( df ) [1 ] == " mo" ) {
2020-05-25 01:01:14 +02:00
df <- df [ , c ( colnames ( df ) [2 ] , " mo" ) ]
2019-03-01 09:34:04 +01:00
} else {
2020-05-25 01:01:14 +02:00
df <- df [ , c ( colnames ( df ) [1 ] , " mo" ) ]
2019-01-21 15:53:01 +01:00
}
2020-07-13 09:17:24 +02:00
2019-01-21 15:53:01 +01:00
df <- as.data.frame ( df , stringAsFactors = FALSE )
2020-07-13 09:17:24 +02:00
2019-01-21 15:53:01 +01:00
# success
2019-03-15 13:57:25 +01:00
if ( file.exists ( file_location ) ) {
2019-01-21 15:53:01 +01:00
action <- " Updated"
} else {
action <- " Created"
}
2019-03-15 13:57:25 +01:00
saveRDS ( df , file_location )
2019-01-21 15:53:01 +01:00
options ( mo_source = path )
options ( mo_source_timestamp = as.character ( file.info ( path ) $ mtime ) )
2020-05-25 01:01:14 +02:00
message ( font_blue ( paste0 ( " NOTE: " ,
action , " mo_source file '" , font_bold ( file_location ) , " '" ,
" from '" , font_bold ( path ) , " '" ,
' \n (columns "' , colnames ( df ) [1 ] , ' " and "' , colnames ( df ) [2 ] , ' ")' ) ) )
2019-01-21 15:53:01 +01:00
}
#' @rdname mo_source
#' @export
get_mo_source <- function ( ) {
if ( is.null ( getOption ( " mo_source" , NULL ) ) ) {
2020-05-25 01:01:14 +02:00
return ( NULL )
}
if ( ! file.exists ( path.expand ( " ~/mo_source.rds" ) ) ) {
options ( mo_source = NULL )
options ( mo_source_timestamp = NULL )
message ( font_blue ( " NOTE: Removed references to deleted mo_source file (see ?mo_source)" ) )
return ( NULL )
2019-01-21 15:53:01 +01:00
}
2020-05-25 01:01:14 +02:00
old_time <- as.POSIXct ( getOption ( " mo_source_timestamp" ) )
new_time <- as.POSIXct ( as.character ( file.info ( getOption ( " mo_source" , " " ) ) $ mtime ) )
if ( is.na ( new_time ) ) {
# source file was deleted, remove reference too
set_mo_source ( " " )
return ( NULL )
}
if ( new_time != old_time ) {
# set updated source
set_mo_source ( getOption ( " mo_source" ) )
}
file_location <- path.expand ( " ~/mo_source.rds" )
readRDS ( file_location )
2019-01-21 15:53:01 +01:00
}
2019-03-05 22:47:42 +01:00
2020-05-16 13:05:47 +02:00
mo_source_isvalid <- function ( x , refer_to_name = " `reference_df`" , stop_on_error = TRUE ) {
2020-02-14 19:54:13 +01:00
check_dataset_integrity ( )
2019-03-05 22:47:42 +01:00
if ( deparse ( substitute ( x ) ) == " get_mo_source()" ) {
return ( TRUE )
}
if ( identical ( x , get_mo_source ( ) ) ) {
return ( TRUE )
}
if ( is.null ( x ) ) {
2020-05-16 13:05:47 +02:00
if ( stop_on_error == TRUE ) {
2020-06-22 11:18:40 +02:00
stop ( refer_to_name , " cannot be NULL" , call. = FALSE )
2020-05-16 13:05:47 +02:00
} else {
return ( FALSE )
}
2019-03-05 22:47:42 +01:00
}
if ( ! is.data.frame ( x ) ) {
2020-05-16 13:05:47 +02:00
if ( stop_on_error == TRUE ) {
2020-06-22 11:18:40 +02:00
stop ( refer_to_name , " must be a data.frame" , call. = FALSE )
2020-05-16 13:05:47 +02:00
} else {
return ( FALSE )
}
2019-03-05 22:47:42 +01:00
}
if ( ! " mo" %in% colnames ( x ) ) {
2020-05-16 13:05:47 +02:00
if ( stop_on_error == TRUE ) {
2020-06-22 11:18:40 +02:00
stop ( refer_to_name , " must contain a column 'mo'" , call. = FALSE )
2020-05-16 13:05:47 +02:00
} else {
return ( FALSE )
}
}
if ( ! all ( x $ mo %in% c ( " " , microorganisms $ mo , microorganisms.translation $ mo_old ) , na.rm = TRUE ) ) {
if ( stop_on_error == TRUE ) {
invalid <- x [which ( ! x $ mo %in% c ( " " , microorganisms $ mo , microorganisms.translation $ mo_old ) ) , , drop = FALSE ]
if ( nrow ( invalid ) > 1 ) {
plural <- " s"
} else {
plural <- " "
}
stop ( " Value" , plural , " " , paste0 ( " '" , invalid [ , 1 , drop = TRUE ] , " '" , collapse = " , " ) ,
" found in " , tolower ( refer_to_name ) ,
2020-06-22 11:18:40 +02:00
" , but with invalid microorganism code" , plural , " " , paste0 ( " '" , invalid $ mo , " '" , collapse = " , " ) ,
2020-05-16 13:05:47 +02:00
call. = FALSE )
} else {
return ( FALSE )
}
2019-03-05 22:47:42 +01:00
}
2020-05-16 13:05:47 +02:00
TRUE
2019-03-05 22:47:42 +01:00
}