diff --git a/R/globals.R b/R/globals.R
index f48f3a205..b868fea1c 100755
--- a/R/globals.R
+++ b/R/globals.R
@@ -37,6 +37,7 @@ globalVariables(c(".",
 "date_lab",
 "diff.percent",
 "fctlvl",
+"First name",
 "first_isolate_row_index",
 "Freq",
 "fullname",
@@ -50,6 +51,7 @@ globalVariables(c(".",
 "key_ab_other",
 "labs",
 "Lancefield",
+"Last name",
 "lbl",
 "median",
 "mic",
@@ -80,6 +82,7 @@ globalVariables(c(".",
 "se_max",
 "se_min",
 "septic_patients",
+"Sex",
 "shortname",
 "species",
 "trade_name",
diff --git a/docs/index.html b/docs/index.html
index 8e236023c..dcfa71191 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -233,11 +233,11 @@
This package is available on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R with:
-
+install.packages("AMR")
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
The latest, unpublished development version can be installed with (caution: it may be unstable):
-
+install.packages("devtools")
+devtools::install_gitlab("msberends/AMR")
The AMR package basically does four important things:
It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). With this package, you teach R everything about microbiology that is needed for analysis. These functions all use artificial intelligence to guess results that you would expect:
+as.mo() to get an ID of a microorganism. The IDs are human-readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNE” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AUR”. The function takes almost any text as input that looks like the name or code of a microorganism, like “E. coli”, “esco” or “esccol”, and tries to find expected results using artificial intelligence (AI) on the included ITIS data set, consisting of almost 20,000 microorganisms. It is very fast; please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).
as.rsi() to transform values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.
as.mic() to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.
as.atc() to get the ATC code of an antibiotic as defined by the WHO. This package contains a database with most LIS codes, official names, DDDs and even trade names of antibiotics. For example, the values “Furabid”, “Furadantin” and “nitro” all return the ATC code of Nitrofurantoin.
It enhances existing data and adds new data from data sets included in this package.
+eucast_rules() to apply EUCAST expert rules to isolates.first_isolate() to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
@@ -298,9 +298,9 @@
microorganisms contains the complete taxonomic tree of almost 20,000 microorganisms (bacteria, fungi/yeasts and protozoa). Furthermore, the colloquial name and Gram stain are available, which enables resistance analysis of e.g. different antibiotics per Gram stain. The package also contains functions to look up values in this data set like mo_genus(), mo_family(), mo_gramstain() or even mo_phylum(). As they use as.mo() internally, they also use artificial intelligence. For example, mo_genus("MRSA") and mo_genus("S. aureus") will both return "Staphylococcus". They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.
antibiotics contains almost 500 antimicrobial drugs with their ATC code, EARS-Net code, common LIS codes, official name, trivial name and DDD of both oral and parenteral administration. It also contains hundreds of trade names. Use functions like atc_name() and atc_tradenames() to look up values. The atc_* functions use as.atc() internally so they support AI to guess your expected result. For example, atc_name("Fluclox"), atc_name("Floxapen") and atc_name("J01CF05") will all return "Flucloxacillin". These functions can again be used to add new variables to your data.
It analyses the data with convenient functions that use well-known methods.
+portion_R(), portion_IR(), portion_I(), portion_SI() and portion_S() functions. Similarly, the number of isolates can be determined with the count_R(), count_IR(), count_I(), count_SI() and count_S() functions. All these functions can be used with the dplyr package (e.g. in conjunction with summarise())geom_rsi(), a function made for the ggplot2 packagekurtosis(), skewness() and create frequency tables with freq()
It teaches the user how to use all the above actions.
+septic_patients. This data set contains:
@@ -321,8 +321,6 @@
as.mo() to identify an MO code.first_isolate() and eucast_rules(), all parameters will be filled in automatically.antibiotics data set now contains a column ears_net.All ab_* functions are deprecated and replaced by atc_* functions:
ab_property -> atc_property()
-ab_name -> atc_name()
-ab_official -> atc_official()
-ab_trivial_nl -> atc_trivial_nl()
-ab_certe -> atc_certe()
-ab_umcg -> atc_umcg()
-ab_tradenames -> atc_tradenames()as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.pkgdown)
-ab_* functions are deprecated and replaced by atc_* functions: r ab_property -> atc_property() ab_name -> atc_name() ab_official -> atc_official() ab_trivial_nl -> atc_trivial_nl() ab_certe -> atc_certe() ab_umcg -> atc_umcg() ab_tradenames -> atc_tradenames() These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coerable to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.pkgdown)set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functionsdplyr version 0.8.0guess_ab_col() to find an antibiotic column in a tableas.atc()
mo_renamed() to get a list of all returned values from as.mo() that have had taxonomic renamingage() to calculate the (patients) age in yearsage_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():
ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict(): r x <- resistance_predict(septic_patients, col_ab = "amox") plot(x) ggplot_rsi_predict(x)
Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
is equal to:
- +filter_first_isolate() and filter_first_weighted_isolate() to shorten and speed up filtering on data sets with antimicrobial results, e.g.: r septic_patients %>% filter_first_isolate(...) # or filter_first_isolate(septic_patients, ...) is equal to: r septic_patients %>% mutate(only_firsts = first_isolate(septic_patients, ...)) %>% filter(only_firsts == TRUE) %>% select(-only_firsts)
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
atc_ddd() and atc_groups() have been renamed atc_online_ddd() and atc_online_groups(). The old functions are deprecated and will be removed in a future version.
guess_mo() is now deprecated in favour of as.mo() and will be removed in future versions
guess_atc() is now deprecated in favour of as.atc() and will be removed in future versions
eucast_rules():
-eucast_rules():as.mo():
-as.mo():as.atc()
first_isolate():
-first_isolate():septic_patients data set this yielded a difference of 0.15% more isolatescol_patientid), when this parameter was left blankcol_keyantibiotics()), when this parameter was left blankoutput_logical, the function will now always return a logical valuefilter_specimen to specimen_group, although using filter_specimen will still workportion functions, that low counts can influence the outcome and that the portion functions may camouflage this, since they only return the portion (albeit being dependent on the minimum parameter)microorganisms.certe and microorganisms.umcg into microorganisms.codes
rsi and mic
freq() function):
-freq() function):Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
-# Determine genus of microorganisms (mo) in `septic_patients` data set:
-# OLD WAY
-septic_patients %>%
- mutate(genus = mo_genus(mo)) %>%
- freq(genus)
-# NEW WAY
-septic_patients %>%
- freq(mo_genus(mo))
-
-# Even supports grouping variables:
-septic_patients %>%
- group_by(gender) %>%
- freq(mo_genus(mo))# Determine genus of microorganisms (mo) in `septic_patients` data set:
+# OLD WAY
+septic_patients %>%
+ mutate(genus = mo_genus(mo)) %>%
+ freq(genus)
+# NEW WAY
+septic_patients %>%
+ freq(mo_genus(mo))
+
+# Even supports grouping variables:
+septic_patients %>%
+ group_by(gender) %>%
+ freq(mo_genus(mo))header functionmo to show unique count of families, genera and speciesas.atc()
droplevels to exclude empty factor levels when input is a factorselect() on frequency tablesscale_y_percent() now contains the limits parametermdro(), key_antibiotics() and eucast_rules()
resistance_predict() function)as.mic() to support more values ending in (several) zeroesFix for as.mic() to support more values ending in (several) zeroes
EUCAST_rules was renamed to eucast_rules, the old function still exists as a deprecated functioneucast_rules function:
-eucast_rules function:rules to specify which rules should be applied (expert rules, breakpoints, others or all)verbose which can be set to TRUE to get very specific messages about which columns and rows were affectedas.atc()
septic_patients now reflects these changes
pipe for piperacillin (J01CA12), also to the mdro function
kingdom to the microorganisms data set, and function mo_kingdom to look up values
as.mo (and subsequently all mo_* functions), as empty values will be ignored a priori
as.mo will return NAFunction as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached
as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached r as.mo("E. species") # B_ESCHR mo_fullname("E. spp.") # "Escherichia species" as.mo("S. spp") # B_STPHY mo_fullname("S. species") # "Staphylococcus species"
combine_IR (TRUE/FALSE) to functions portion_df and count_df, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)portion_*(..., as_percent = TRUE) when minimal number of isolates would not be metas.atc()
portion_* functions now throw a warning when the total number of available isolates is below the value of parameter minimum
as.mo, as.rsi, as.mic, as.atc and freq will not set package name as attribute anymorefreq():
-freq():Support for grouping variables, test with:
- +septic_patients %>%
+ group_by(hospital_id) %>%
+ freq(gender)Support for (un)selecting columns:
-septic_patients %>%
- freq(hospital_id) %>%
- select(-count, -cum_count) # only get item, percent, cum_percentseptic_patients %>%
+ freq(hospital_id) %>%
+ select(-count, -cum_count) # only get item, percent, cum_percenthms::is.hms
as.atc()
na, to choose which character to print for empty values
header to turn the header info off (default when markdown = TRUE)
title to manually set the title of the frequency table
first_isolate now tries to find columns to use as input when parameters are left blank
mdro)
as.atc()
ggplot_rsi and scale_y_percent have breaks parameteras.mo:
-as.mo:"CRS" -> Stenotrophomonas maltophilia
"MSSE" -> Staphylococcus epidermidis
join functionsis.rsi.eligible, now 15-20 times fasterg.test, when sum(x) is below 1000 or any of the expected values is below 5, Fisher’s Exact Test will be suggestedas.atc()
New
microorganisms now contains all microbial taxonomic data from ITIS (kingdoms Bacteria, Fungi and Protozoa), the Integrated Taxonomic Information System, available via https://itis.gov. The data set now contains more than 18,000 microorganisms with all known bacteria, fungi and protozoa according to ITIS, with genus, species, subspecies, family, order, class, phylum and subkingdom. The new data set microorganisms.old contains all previously known taxonomic names from those kingdoms.
mo_property:
-mo_property:mo_phylum, mo_class, mo_order, mo_family, mo_genus, mo_species, mo_subspecies
mo_fullname, mo_shortname
@@ -529,52 +474,22 @@ These functions use as.atc()
mo_ref
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:
-mo_gramstain("E. coli")
-# [1] "Gram negative"
-mo_gramstain("E. coli", language = "de") # German
-# [1] "Gramnegativ"
-mo_gramstain("E. coli", language = "es") # Spanish
-# [1] "Gram negativo"
-mo_fullname("S. group A", language = "pt") # Portuguese
-# [1] "Streptococcus grupo A"Furthermore, former taxonomic names will give a note about the current taxonomic name:
- -count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates
+They also come with support for German, Dutch, French, Italian, Spanish and Portuguese: r mo_gramstain("E. coli") # [1] "Gram negative" mo_gramstain("E. coli", language = "de") # German # [1] "Gramnegativ" mo_gramstain("E. coli", language = "es") # Spanish # [1] "Gram negativo" mo_fullname("S. group A", language = "pt") # Portuguese # [1] "Streptococcus grupo A"
Furthermore, former taxonomic names will give a note about the current taxonomic name: r mo_gramstain("Esc blattae") # Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010) # [1] "Gram negative"
count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolatescount_df (which works like portion_df) to get all counts of S, I and R of a data set with antibiotic columns, with support for grouped variablesis.rsi.eligible to check for columns that have valid antimicrobial results, but do not have the rsi class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
Functions as.mo and is.mo as replacements for as.bactid and is.bactid (since the microorganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using Artificial Intelligence (AI):
as.mo("E. coli")
-# [1] B_ESCHR_COL
-as.mo("MRSA")
-# [1] B_STPHY_AUR
-as.mo("S group A")
-# [1] B_STRPTC_GRAAnd with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:
- +as.mo and is.mo as replacements for as.bactid and is.bactid (since the microorganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using Artificial Intelligence (AI): r as.mo("E. coli") # [1] B_ESCHR_COL as.mo("MRSA") # [1] B_STPHY_AUR as.mo("S group A") # [1] B_STRPTC_GRA And with great speed too - on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items: r thousands_of_E_colis <- rep("E. coli", 25000) microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s") # Unit: seconds # min median max neval # 0.01817717 0.01843957 0.03878077 100
reference_df for as.mo, so users can supply their own microbial IDs, name or codes as a reference tablebactid to mo, like:
-bactid to mo, like:EUCAST_rules, first_isolate and key_antibiotics
microorganisms and septic_patients
labels_rsi_count to print datalabels on a RSI ggplot2 modelFunctions as.atc and is.atc to transform/look up antibiotic ATC codes as defined by the WHO. The existing function guess_atc is now an alias of as.atc.
ab_property and its aliases: ab_name, ab_tradenames, ab_certe, ab_umcg and ab_trivial_nl
@@ -589,14 +504,7 @@ These functions use as.atc()
Changed
antibiotics data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)Added 163 trade names to the antibiotics data set, it now contains 298 different trade names in total, e.g.:
antibiotics data set, it now contains 298 different trade names in total, e.g.: r ab_official("Bactroban") # [1] "Mupirocin" ab_name(c("Bactroban", "Amoxil", "Zithromax", "Floxapen")) # [1] "Mupirocin" "Amoxicillin" "Azithromycin" "Flucloxacillin" ab_atc(c("Bactroban", "Amoxil", "Zithromax", "Floxapen")) # [1] "R01AX06" "J01CA04" "J01FA10" "J01CF05"
first_isolate, rows will be ignored when there’s no species availableratio is now deprecated and will be removed in a future release, as it is not really the scope of this packageas.atc()
prevalence column to the microorganisms data setminimum and as_percent to portion_df
Support for quasiquotation in the functions series count_* and portion_*, and n_rsi. This allows checking more than two vectors or columns.
ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df at default, but can be set to count_df.ggplot_rsi when the ggplot2 package was not loadedlabels_rsi_count to ggplot_rsi
-geom_rsi (and ggplot_rsi) so you can set your own preferencesquote to the freq functiondiff for frequency tablesfreq) header of class character
-Support for types (classes) list and matrix for freq
For lists, subsetting is possible:
- -count_* and portion_*, and n_rsi. This allows checking more than two vectors or columns.
```r
septic_patients %>% select(amox, cipr) %>% count_IR()
# which is the same as:
septic_patients %>% count_IR(amox, cipr)

septic_patients %>% portion_S(amcl)
septic_patients %>% portion_S(amcl, gent)
septic_patients %>% portion_S(amcl, gent, pita)
```
* Edited `ggplot_rsi` and `geom_rsi` so they can cope with `count_df`. The new `fun` parameter has value `portion_df` at default, but can be set to `count_df`.
* Fix for `ggplot_rsi` when the `ggplot2` package was not loaded
* Added datalabels function `labels_rsi_count` to `ggplot_rsi`
* Added possibility to set any parameter to `geom_rsi` (and `ggplot_rsi`) so you can set your own preferences
* Fix for joins, where predefined suffixes would not be honoured
* Added parameter `quote` to the `freq` function
* Added generic function `diff` for frequency tables
* Added longest and shortest character length in the frequency table (`freq`) header of class `character`
* Support for types (classes) list and matrix for `freq`
```r
my_matrix = with(septic_patients, matrix(c(age, gender), ncol = 2))
freq(my_matrix)
```
For lists, subsetting is possible:
```r
my_list = list(age = septic_patients$age, gender = septic_patients$gender)
my_list %>% freq(age)
my_list %>% freq(gender)
```
Newrsi_df was removed in favour of new functions portion_R, portion_IR, portion_I, portion_SI and portion_S to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi function. The old function still works, but is deprecated.
-rsi_df was removed in favour of new functions portion_R, portion_IR, portion_I, portion_SI and portion_S to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi function. The old function still works, but is deprecated.portion_df to get all portions of S, I and R of a data set with antibiotic columns, with support for grouped variablesggplot2
-geom_rsi, facet_rsi, scale_y_percent, scale_rsi_colours and theme_rsi
ggplot_rsi to apply all above functions on a data set:
@@ -678,32 +553,22 @@ These functions use as.atc()
as.bactid and is.bactid to transform/look up microbial IDs.
guess_bactid is now an alias of as.bactid
kurtosis and skewness that are lacking in base R - they are generic functions and have support for vectors, data.frames and matrices
g.test to perform the Χ2 distributed G-test, whose usage is the same as that of chisq.test
ratio to transform a vector of values to a preset ratio
ratio(c(10, 500, 10), ratio = "1:2:1") would return 130, 260, 130
%in% or %like% (and give them keyboard shortcuts), or to view the datasets that come with this package
p.symbol to transform p values to their related symbols: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
clipboard_import and clipboard_export as helper functions to quickly copy and paste from/to software like Excel and SPSS. These functions use the clipr package, but are a little altered to also support headless Linux servers (so you can use it in RStudio Server)freq):
-freq):rsi (antimicrobial resistance) to use as inputtable to use as input: freq(table(x, y))
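As a rough illustration of the arithmetic behind the `ratio` example and the `p.symbol` cut-offs listed above, the following base R sketch reproduces both without loading the package. The helper names `ratio_sketch()` and `p_symbol_sketch()` are hypothetical and are not the package's implementation; in particular, how the real `p.symbol` treats values that fall exactly on a cut-off is an assumption here.

```r
# Hypothetical helper (not part of the AMR package): redistribute the
# total of `x` over a preset ratio such as "1:2:1".
ratio_sketch <- function(x, ratio) {
  parts <- as.numeric(strsplit(ratio, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) == length(x))
  sum(x) * parts / sum(parts)
}

# Hypothetical helper (not part of the AMR package): map p values to the
# significance symbols 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.
# Assumption: intervals are closed on the right, e.g. (0.01, 0.05] -> "*".
p_symbol_sketch <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

# The total is 520, split 1:2:1 over four parts of 130 each:
ratio_sketch(c(10, 500, 10), ratio = "1:2:1")
# 130 260 130

p_symbol_sketch(c(0.0004, 0.03, 0.2))
# "***" "*" " "
```

The ratio is computed from the sum of the input rather than from the individual values, which is why the middle value of 500 does not dominate the result.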
@@ -718,8 +583,6 @@ These functions use as.atc()
options(max.print.freq = n) where n is your preset valueas.atc()
microorganisms dataset (especially for Salmonella) and the column bactid now has the new class "bactid"
rsi and mic functions:
-rsi and mic functions:as.rsi("<=0.002; S") will return S
as.mic("<=0.002; S") will return <=0.002
as.mic("<= 0.002") now worksrsi and mic do not add the attribute package.version anymore"groups" option for atc_property(..., property). It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups is a convenient wrapper around this.atc_property as it requires the host set by url to be responsivefirst_isolate algorithm to exclude isolates where bacteria ID or genus is unavailable924b62) from the dplyr package v0.7.5 and aboveguess_bactid (now called as.bactid)
-guess_bactid (now called as.bactid)yourdata %>% select(genus, species) %>% as.bactid() now also worksas.atc()
as.atc()
guess_bactid to determine the ID of a microorganism based on genus/species or known abbreviations like MRSA
guess_atc to determine the ATC of an antibiotic based on name, trade name, or known abbreviations
freq to create frequency tables, with additional info in a header
MDRO to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.
-MDRO to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.BRMO and MRGN are wrappers for Dutch and German guidelines, respectively"points" or "keyantibiotics", see ?first_isolate
tibbles and data.tables