Used in more than 100 countries
Since its first public release in early 2018, this package has been downloaded from more than 100 countries (source: CRAN logs).
diff --git a/DESCRIPTION b/DESCRIPTION
index f7afae4b..afbd4558 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 1.2.0.9029
-Date: 2020-07-08
+Version: 1.2.0.9030
+Date: 2020-07-09
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
 person(role = c("aut", "cre"),
diff --git a/NEWS.md b/NEWS.md
index 0eb541f0..114669f7 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,5 @@
-# AMR 1.2.0.9029
-## Last updated: 08-Jul-2020
+# AMR 1.2.0.9030
+## Last updated: 09-Jul-2020
 
 ### New
 * Function `ab_from_text()` to retrieve antimicrobial drug names, doses and forms of administration from clinical texts in e.g. health care records, which also corrects for misspelling since it uses `as.ab()` internally
diff --git a/docs/404.html b/docs/404.html
index 03c2b382..ceefe84f 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -81,7 +81,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index 95c14eb1..2dfb5724 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -81,7 +81,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 7406b5fa..3943c64d 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index a71b113d..13c83362 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 0d88e289..b5299cb9 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index caff3eb9..406c1e5c 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index fb3520ab..49b54965 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 42fb770f..774397ab 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/MDR.html b/docs/articles/MDR.html
index f5b31017..5cdee42d 100644
--- a/docs/articles/MDR.html
+++ b/docs/articles/MDR.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/PCA.html b/docs/articles/PCA.html
index 6694bba8..41c90855 100644
--- a/docs/articles/PCA.html
+++ b/docs/articles/PCA.html
@@ -20,7 +20,7 @@
-
+
+// v0.0.1
+// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
+
+document.addEventListener('DOMContentLoaded', function() {
+  const codeList = document.getElementsByClassName("sourceCode");
+  for (var i = 0; i < codeList.length; i++) {
+    var linkList = codeList[i].getElementsByTagName('a');
+    for (var j = 0; j < linkList.length; j++) {
+      if (linkList[j].innerHTML === "") {
+        linkList[j].setAttribute('aria-hidden', 'true');
+      }
+    }
+  }
+});
diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html
index fd8a38d6..148cf2f9 100644
--- a/docs/articles/SPSS.html
+++ b/docs/articles/SPSS.html
@@ -39,7 +39,7 @@
 AMR (for R)
- 1.2.0.9029
+ 1.2.0.9030
@@ -186,7 +186,7 @@
vignettes/SPSS.Rmd
SPSS.Rmd
AMR (for R). Developed at the University of Groningen in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen.
NEWS.md
- R/ab_class_selectors.R
+ Source: R/ab_class_selectors.R
antibiotic_class_selectors.Rd
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

A SUMMARY OF THIS LICENSE BY THE ORIGINAL AUTHORS OF THE AMR R PACKAGE

This R package, with package name 'AMR':
- May be used for commercial purposes
- May be used for private purposes
- May NOT be used for patent purposes
- May be modified, although:
  - Modifications MUST be released under the same license when distributing the package
  - Changes made to the code MUST be documented
- May be distributed, although:
  - Source code MUST be made available when the package is distributed
  - A copy of the license and copyright notice MUST be included with the package.
- Comes with a LIMITATION of liability
- Comes with NO warranty

END OF THE SUMMARY


GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License. (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code. (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 22 June 2020.
Conducting antimicrobial resistance analysis unfortunately requires in-depth knowledge from different scientific fields, which makes it hard to do right. At least, it requires:
Of course, we cannot instantly provide you with knowledge and experience. But with this AMR package, we aimed to provide (1) tools to simplify antimicrobial resistance data cleaning, transformation and analysis, (2) methods to easily incorporate international guidelines and (3) scientifically reliable reference data, including the requirements mentioned above.
The AMR
package enables standardised and reproducible antimicrobial resistance analysis, with the application of evidence-based rules, determination of first isolates, translation of various codes for microorganisms and antimicrobial agents, determination of (multi-drug) resistant microorganisms, and calculation of antimicrobial resistance, prevalence and future trends.
For this tutorial, we will create fake demonstration data to work with.
You can skip to Cleaning the data if you already have your own data ready. When you start your analysis, try to give your data the same general structure as this:
date | patient_id | mo | AMX | CIP
---|---|---|---|---
2020-06-22 | abcd | Escherichia coli | S | S
2020-06-22 | abcd | Escherichia coli | S | R
2020-06-22 | efgh | Escherichia coli | R | S
As is common in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr and ggplot2 by RStudio. The tidyverse tremendously improves the way we conduct data science: it allows for a very natural way of writing syntax and creating beautiful plots in R.
We will also use the cleaner package, which can be used for cleaning data and creating frequency tables.
We will create some fake example data to use for analysis. For antimicrobial resistance analysis, we need at least: a patient ID, the name or code of a microorganism, a date and antimicrobial results (an antibiogram). The data could also include a specimen type (e.g. to filter on blood or urine) and the ward type (e.g. to filter on ICUs).
With additional columns (like a hospital name, the patient's gender or even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will also demonstrate.
To start with patients, we need a unique list of patients.
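The code that creates the patients object is missing from this page. A minimal base R reconstruction, assuming the IDs are built by pasting each letter of LETTERS to the numbers 1 to 10 (this is an assumption, chosen because it matches the length and range described next):

```r
# Paste each of the 26 letters to the numbers 1..10:
# "A1", "A2", ..., "A10", "B1", ..., "Z10" (260 unique IDs)
patients <- unlist(lapply(LETTERS, paste0, 1:10))

head(patients)
length(patients)
```

With this ordering, the first 135 IDs run from A1 to N5, which is consistent with the genders in the preview table further down (for example, N4 is male and Z9 is female).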
The LETTERS object is available in R: it's a vector with the 26 uppercase letters A to Z. The patients object we just created is now a vector of length 260, with values (patient IDs) ranging from A1 to Z10. Now we also set the gender of our patients, by putting the ID and the gender in a table:
patients_table <- data.frame(patient_id = patients,
                             gender = c(rep("M", 135),
                                        rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
Let's pretend that our data consists of blood culture isolates collected between 1 January 2010 and 1 January 2018.
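The code for the dates object is also missing here. A base R sketch, assuming it is a daily sequence of Date values that includes both endpoints:

```r
# Every calendar day from 1 January 2010 up to and including 1 January 2018
dates <- seq(as.Date("2010-01-01"), as.Date("2018-01-01"), by = "day")

range(dates)
length(dates)
```

Eight years with two leap days (2012 and 2016) give 2,922 days between the endpoints, so the inclusive sequence has 2,923 dates.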
This dates object now contains all days in our date range.
For this tutorial, we will use four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
bacteria <- c("Escherichia coli", "Staphylococcus aureus",
              "Streptococcus pneumoniae", "Klebsiella pneumoniae")
For completeness, we can also add the hospital where the patient was admitted, and we need to define valid antimicrobial results for our randomisation:
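The two objects referred to here are not shown on this page. A reconstruction under two assumptions: the hospital names are taken from the preview table (Hospital A to D), and the interpretations are ordered S, I, R. The second assumption is based on the CIP and GEN probabilities in the next code block, which set the middle value to 0.00; that only makes sense if the middle value is I.

```r
# Four hospitals; the order matters because sample() pairs it with `prob`
hospitals <- c("Hospital A", "Hospital B", "Hospital C", "Hospital D")

# Valid antimicrobial interpretations, in the order S, I, R
ab_interpretations <- c("S", "I", "R")
```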
Using the sample() function, we can randomly select items from all objects we defined earlier. To make our fake data reflect reality a bit, we will also define approximate probabilities for the hospitals, bacteria and antibiotic results with the prob parameter.
sample_size <- 20000
data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
                   patient_id = sample(patients, size = sample_size, replace = TRUE),
                   hospital = sample(hospitals, size = sample_size, replace = TRUE,
                                     prob = c(0.30, 0.35, 0.15, 0.20)),
                   bacteria = sample(bacteria, size = sample_size, replace = TRUE,
                                     prob = c(0.50, 0.25, 0.15, 0.10)),
                   AMX = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.60, 0.05, 0.35)),
                   AMC = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.75, 0.10, 0.15)),
                   CIP = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.80, 0.00, 0.20)),
                   GEN = sample(ab_interpretations, size = sample_size, replace = TRUE,
                                prob = c(0.92, 0.00, 0.08)))
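To see what the prob parameter does, the sketch below draws the AMX and CIP columns on their own and compares the observed proportions with the requested probabilities. set.seed() only makes the draw reproducible; the exact proportions will differ slightly from run to run without it.

```r
set.seed(42)  # make the random draw reproducible
ab_interpretations <- c("S", "I", "R")

# AMX: roughly 60% S, 5% I and 35% R
amx <- sample(ab_interpretations, size = 20000, replace = TRUE,
              prob = c(0.60, 0.05, 0.35))
observed <- prop.table(table(factor(amx, levels = ab_interpretations)))
round(observed, 3)

# CIP: a probability of 0.00 means "I" can never be drawn
cip <- sample(ab_interpretations, size = 20000, replace = TRUE,
              prob = c(0.80, 0.00, 0.20))
table(cip)
```

With 20,000 draws, each observed proportion typically lands within about one percentage point of its target, which is why the fake data looks realistic without being exact.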
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
data <- data %>% left_join(patients_table)
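left_join() keeps every row of data and adds the matching columns from patients_table. Because patient_id is unique in patients_table, each isolate receives exactly one gender. The same lookup can be sketched in base R with match(). The toy_patients and toy_data objects below are hypothetical and exist only for illustration:

```r
# Hypothetical lookup table with one row per patient ...
toy_patients <- data.frame(patient_id = c("A1", "A2", "B1"),
                           gender = c("M", "M", "F"),
                           stringsAsFactors = FALSE)

# ... and isolates in which patients recur (C9 is absent from the lookup)
toy_data <- data.frame(patient_id = c("B1", "A1", "B1", "C9"),
                       stringsAsFactors = FALSE)

# For each isolate, find the row of its patient and copy the gender;
# unmatched IDs get NA, and the number and order of rows are kept
toy_data$gender <- toy_patients$gender[match(toy_data$patient_id,
                                             toy_patients$patient_id)]
toy_data
```

One difference to keep in mind: if a patient_id appeared twice in the lookup table, left_join() would duplicate the isolate rows, whereas match() silently uses the first match.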
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 rows of this data set:
head(data)
date | patient_id | hospital | bacteria | AMX | AMC | CIP | GEN | gender
---|---|---|---|---|---|---|---|---
2014-04-15 | I4 | Hospital D | Escherichia coli | S | R | S | S | M
2011-02-09 | D1 | Hospital A | Escherichia coli | S | S | S | S | M
2013-12-16 | K4 | Hospital C | Staphylococcus aureus | S | S | R | S | M
2017-08-23 | Z9 | Hospital B | Escherichia coli | S | S | S | S | F
2010-01-14 | N4 | Hospital A | Staphylococcus aureus | R | S | S | S | M
2016-01-31 | N1 | Hospital D | Staphylococcus aureus | R | S | R | S | M
Now, let’s start the cleaning and the analysis!
We also created a package dedicated to data cleaning and checking, called the cleaner package. Its freq() function can be used to create frequency tables.
For example, for the gender
variable:
data %>% freq(gender)
Frequency table

Class: character
Length: 20,000
Available: 20,000 (100%, NA: 0 = 0%)
Unique: 2

Shortest: 1
Longest: 1
| | Item | Count | Percent | Cum. Count | Cum. Percent |
|---|---|---|---|---|---|
| 1 | M | 10,328 | 51.64% | 10,328 | 51.64% |
| 2 | F | 9,672 | 48.36% | 20,000 | 100.00% |
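The columns of this frequency table are straightforward to reproduce, which makes for a quick sanity check. The sketch below rebuilds them in base R from the counts shown above. It is not how cleaner's freq() is implemented, and freq_tbl is a hypothetical name:

```r
# Recreate the gender column with the counts from the table above
gender <- c(rep("M", 10328), rep("F", 9672))

# Count each value and sort from most to least frequent, like freq() does
counts <- sort(table(gender), decreasing = TRUE)

freq_tbl <- data.frame(item = names(counts),
                       count = as.integer(counts),
                       stringsAsFactors = FALSE)
freq_tbl$percent     <- 100 * freq_tbl$count / sum(freq_tbl$count)
freq_tbl$cum_count   <- cumsum(freq_tbl$count)
freq_tbl$cum_percent <- 100 * freq_tbl$cum_count / sum(freq_tbl$count)
freq_tbl
```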
So, we can draw at least two conclusions immediately. From a data scientist's perspective, the data looks clean: it contains only the values M and F. From a researcher's perspective, there are slightly more men. Nothing we didn't already know, since 135 of the 260 patients were defined as male.
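Why slightly more men is exactly what we should expect: every isolate was assigned a patient uniformly at random, so the expected share of male isolates equals the share of male patient IDs. A short check of that arithmetic against the observed count:

```r
# 135 of the 260 patient IDs were defined as male, and patient_id was
# drawn uniformly (no `prob`) for each of the 20,000 isolates
expected_male_share    <- 135 / 260
expected_male_isolates <- 20000 * expected_male_share
observed_male_share    <- 10328 / 20000

round(c(expected = expected_male_share, observed = observed_male_share), 4)
round(expected_male_isolates)
```

The expected share is about 51.9% (roughly 10,385 isolates), and the observed 51.6% is well within the range of random variation for a sample of this size.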
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotic columns, because in real-life data we don't know whether they are really clean. The as.rsi() function ensures reliability and reproducibility in these kinds of variables. The mutate_at() function will run the as.rsi() function on the defined variables:
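The code for this step is not shown here. To make the idea of "cleaning" interpretations concrete, the sketch below uses a hypothetical helper, clean_rsi(), which is NOT the package's as.rsi() (the package function handles many more input forms). It normalises case and whitespace, turns anything other than S, I or R into NA, and returns an ordered factor:

```r
# Hypothetical helper, NOT AMR::as.rsi(): a minimal cleaner for R/S/I values
clean_rsi <- function(x) {
  # normalise: character, surrounding whitespace removed, upper case
  x <- toupper(trimws(as.character(x)))
  # anything that is not a valid interpretation becomes NA
  x[!x %in% c("S", "I", "R")] <- NA
  # ordered factor so that S < I < R
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

clean_rsi(c(" s", "R", "x", "I ", NA))
```

Storing the result as an ordered factor rather than plain text means invalid values cannot slip in unnoticed, and summaries such as table() always report the three levels in the same order.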
Finally, we will apply EUCAST rules to our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules() function can also apply additional rules, like forcing amoxicillin to R when amoxicillin/clavulanic acid is R.
Because the amoxicillin (column AMX) and amoxicillin/clavulanic acid (column AMC) results in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules() function fixes this:
data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
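The logic of this one correction can be sketched in a few lines. The helper force_amx_r() below is hypothetical and covers only the single rule discussed here; eucast_rules() applies many more rules and is not implemented this way. The first row of the preview table (patient I4, AMX = S, AMC = R) shows exactly this conflict, and that isolate appears with AMX = R in the cleaned data further down.

```r
# Hypothetical sketch of ONE expert rule, not AMR::eucast_rules():
# an isolate resistant to amoxicillin/clavulanic acid (AMC) cannot be
# susceptible to amoxicillin alone (AMX), so AMX is set to R
force_amx_r <- function(amx, amc) {
  amx[!is.na(amc) & amc == "R"] <- "R"
  amx
}

# Rows 1 and 4 have AMC = R, so their AMX becomes R; the others are unchanged
force_amx_r(amx = c("S", "S", "R", "I"), amc = c("R", "S", "S", "R"))
```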
Now that we have the microbial ID, we can add some taxonomic properties:
data <- data %>%
  mutate(gramstain = mo_gramstain(bacteria),
         genus = mo_genus(bacteria),
         species = mo_species(bacteria))
We also need to know which isolates we can actually use for analysis.
To conduct an analysis of antimicrobial resistance, you must only include the first isolate of every patient per episode (Hindler et al., Clin Infect Dis. 2007). If you did not do this, you could easily overestimate or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures in the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all isolates would be overestimated, because you included this MRSA more than once. This would clearly be selection bias.
The Clinical and Laboratory Standards Institute (CLSI) describes this as follows:
> (…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.

M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR package includes this methodology with the first_isolate() function. It uses an episode of one year by default (this can be changed by the user) and starts counting days after every selected isolate. This new variable can easily be added to our data:
data <- data %>%
  mutate(first = first_isolate(.))
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `date` as input for `col_date`.
# NOTE: Using column `patient_id` as input for `col_patient_id`.
So only 28.3% of the isolates are suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:
data_1st <- data %>%
  filter(first == TRUE)
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
data_1st <- data %>%
  filter_first_isolate()
We made a slight modification to the CLSI algorithm to take the antimicrobial susceptibility profile into account. Have a look at all isolates of patient N3, sorted by date:
isolate | date | patient_id | bacteria | AMX | AMC | CIP | GEN | first
---|---|---|---|---|---|---|---|---
1 | 2010-03-10 | N3 | B_ESCHR_COLI | I | S | S | S | TRUE
2 | 2010-05-11 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
3 | 2010-05-17 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
4 | 2010-05-18 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
5 | 2010-07-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
6 | 2010-09-15 | N3 | B_ESCHR_COLI | S | S | S | R | FALSE
7 | 2010-10-06 | N3 | B_ESCHR_COLI | S | S | R | S | FALSE
8 | 2010-11-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE
9 | 2011-01-27 | N3 | B_ESCHR_COLI | R | I | S | S | FALSE
10 | 2011-01-30 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE
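The episode logic behind the first column can be sketched in base R. This is a hypothetical simplification, not the package's implementation: first_isolate_sketch() and its episode_days argument are made up for illustration, and treating a gap of exactly 365 days as the start of a new episode is an assumption. Applied to the ten N3 isolates in the table above, it flags only the first one:

```r
# Hypothetical sketch, NOT AMR::first_isolate(): per patient and
# microorganism, flag an isolate as 'first' if it is the earliest one,
# or if at least `episode_days` have passed since the last flagged isolate
first_isolate_sketch <- function(patient_id, mo, dates, episode_days = 365) {
  is_first <- logical(length(dates))
  key <- paste(patient_id, mo, sep = "|")
  prev_key <- NA_character_
  last_selected <- as.Date(NA)
  # walk through the isolates sorted by patient/microorganism, then date
  for (i in order(key, dates)) {
    new_key <- !identical(key[i], prev_key)
    # `||` short-circuits, so the date difference is only computed
    # once a flagged isolate exists for this patient/microorganism
    if (new_key || as.numeric(dates[i] - last_selected) >= episode_days) {
      is_first[i] <- TRUE
      last_selected <- dates[i]
    }
    prev_key <- key[i]
  }
  is_first
}

# The isolates of patient N3: all fall within one year of 2010-03-10
n3_dates <- as.Date(c("2010-03-10", "2010-05-11", "2010-05-17", "2010-05-18",
                      "2010-07-30", "2010-09-15", "2010-10-06", "2010-11-30",
                      "2011-01-27", "2011-01-30"))
first_isolate_sketch(rep("N3", 10), rep("B_ESCHR_COLI", 10), n3_dates)
```

Counting from the last *selected* isolate (rather than from the previous isolate) is what keeps a patient with frequent cultures from contributing a new 'first' isolate every few weeks.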
Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiograms, some isolates clearly differ and are likely different strains, so they should be included too. This is why we weight isolates based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
If a column exists with a name like ‘key(…)ab’, the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTE messages in the output below:
data <- data %>%
  mutate(keyab = key_antibiotics(.)) %>%
  mutate(first_weighted = first_isolate(.))
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `bacteria` as input for `col_mo`.
# NOTE: Using column `date` as input for `col_date`.
# NOTE: Using column `patient_id` as input for `col_patient_id`.
# NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
isolate | date | patient_id | bacteria | AMX | AMC | CIP | GEN | first | first_weighted
---|---|---|---|---|---|---|---|---|---
1 | 2010-03-10 | N3 | B_ESCHR_COLI | I | S | S | S | TRUE | TRUE
2 | 2010-05-11 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | FALSE
3 | 2010-05-17 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | TRUE
4 | 2010-05-18 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | FALSE
5 | 2010-07-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | TRUE
6 | 2010-09-15 | N3 | B_ESCHR_COLI | S | S | S | R | FALSE | TRUE
7 | 2010-10-06 | N3 | B_ESCHR_COLI | S | S | R | S | FALSE | TRUE
8 | 2010-11-30 | N3 | B_ESCHR_COLI | S | S | S | S | FALSE | TRUE
9 | 2011-01-27 | N3 | B_ESCHR_COLI | R | I | S | S | FALSE | TRUE
10 | 2011-01-30 | N3 | B_ESCHR_COLI | R | S | S | S | FALSE | FALSE
Instead of 1, now 7 isolates are flagged. In total, 78.7% of all isolates are marked ‘first weighted’ - 50.4% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate(), there’s a shortcut for this new algorithm too:

```r
data_1st <- data %>%
  filter_first_weighted_isolate()
```
So we end up with 15,740 isolates for analysis.
We can remove unneeded columns:
Now our data looks like:

```r
head(data_1st)
```
date | patient_id | hospital | bacteria | AMX | AMC | CIP | GEN | gender | gramstain | genus | species | first_weighted |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2014-04-15 | I4 | Hospital D | B_ESCHR_COLI | R | R | S | S | M | Gram-negative | Escherichia | coli | TRUE |
2011-02-09 | D1 | Hospital A | B_ESCHR_COLI | S | S | S | S | M | Gram-negative | Escherichia | coli | TRUE |
2013-12-16 | K4 | Hospital C | B_STPHY_AURS | S | S | R | S | M | Gram-positive | Staphylococcus | aureus | TRUE |
2017-08-23 | Z9 | Hospital B | B_ESCHR_COLI | S | S | S | S | F | Gram-negative | Escherichia | coli | TRUE |
2010-01-14 | N4 | Hospital A | B_STPHY_AURS | R | S | S | S | M | Gram-positive | Staphylococcus | aureus | TRUE |
2016-01-31 | N1 | Hospital D | B_STPHY_AURS | R | S | R | S | M | Gram-positive | Staphylococcus | aureus | TRUE |
Time for the analysis!
You might want to start by getting an idea of how the data is distributed. It’s an important start, because it also determines how you will continue your analysis. Although this package contains a convenient function to make frequency tables, exploratory data analysis (EDA) is not the primary scope of this package. Use a package like DataExplorer for that, or read the free online book Exploratory Data Analysis with R by Roger D. Peng.
To get a quick idea of how the species are distributed, create a frequency table with our freq() function. We created the genus and species columns earlier based on the microbial ID. With paste(), we can concatenate them.
The freq() function can be used the way base R was intended:
Or it can be used the dplyr way, which is more readable:
```r
data_1st %>% freq(genus, species)
```

Frequency table
Class: character
Length: 15,740
Available: 15,740 (100%, NA: 0 = 0%)
Unique: 4
Shortest: 16
Longest: 24

 | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | Escherichia coli | 7,938 | 50.43% | 7,938 | 50.43% |
2 | Staphylococcus aureus | 3,883 | 24.67% | 11,821 | 75.10% |
3 | Streptococcus pneumoniae | 2,317 | 14.72% | 14,138 | 89.82% |
4 | Klebsiella pneumoniae | 1,602 | 10.18% | 15,740 | 100.00% |
If you want a quick overview of the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations() function:

```r
data_1st %>%
  bug_drug_combinations() %>%
  head() # show first 6 rows
# NOTE: Using column `bacteria` as input for `col_mo`.
```
mo | ab | S | I | R | total |
---|---|---|---|---|---|
E. coli | AMX | 3808 | 236 | 3894 | 7938 |
E. coli | AMC | 6223 | 317 | 1398 | 7938 |
E. coli | CIP | 6050 | 0 | 1888 | 7938 |
E. coli | GEN | 7130 | 0 | 808 | 7938 |
K. pneumoniae | AMX | 0 | 0 | 1602 | 1602 |
K. pneumoniae | AMC | 1241 | 61 | 300 | 1602 |
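Each block of rows in that output is essentially a cross-tabulation of organism against S, I and R for one drug. As a rough base-R illustration with made-up data, table() produces the same kind of counts. Unlike bug_drug_combinations(), this sketch handles one drug at a time and expects the organism names to be already shortened.

```r
# Made-up isolates: organism (already shortened) and amoxicillin result
mo  <- c("E. coli", "E. coli", "E. coli", "K. pneumoniae", "K. pneumoniae")
AMX <- c("S", "R", "I", "R", "R")

# counts per organism and interpretation, like the S/I/R columns above;
# fixing the factor levels keeps the columns in S, I, R order
counts <- table(mo, factor(AMX, levels = c("S", "I", "R")))
counts
rowSums(counts)  # the `total` column: 3 for E. coli, 2 for K. pneumoniae
```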
Using tidyverse selections, you can also select columns based on the antibiotic class they belong to:

```r
data_1st %>%
  select(bacteria, fluoroquinolones()) %>%
  bug_drug_combinations()
# Selecting fluoroquinolones: `CIP` (ciprofloxacin)
# NOTE: Using column `bacteria` as input for `col_mo`.
```
mo | ab | S | I | R | total |
---|---|---|---|---|---|
E. coli | CIP | 6050 | 0 | 1888 | 7938 |
K. pneumoniae | CIP | 1218 | 0 | 384 | 1602 |
S. aureus | CIP | 2967 | 0 | 916 | 3883 |
S. pneumoniae | CIP | 1756 | 0 | 561 | 2317 |
This will only give you the crude numbers in the data. To calculate antimicrobial resistance, we use the resistance() and susceptibility() functions.
The functions resistance() and susceptibility() can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S(), proportion_SI(), proportion_I(), proportion_IR() and proportion_R() can be used to determine the proportion of a specific antimicrobial outcome.
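As a rough illustration of what these proportions mean, the base-R sketch below computes the share of R and the share of S or I on a plain character vector. The helpers `prop_R()` and `prop_SI()` are hypothetical and only mirror the arithmetic; they assume that missing values are left out of the denominator, and they are not a replacement for the package functions.

```r
# Hypothetical helpers, NOT the AMR implementations.
# Only valid interpretations (S, I, R) count towards the denominator.
prop_R <- function(x) {
  x <- x[x %in% c("S", "I", "R")]
  sum(x == "R") / length(x)
}
prop_SI <- function(x) {
  x <- x[x %in% c("S", "I", "R")]
  sum(x %in% c("S", "I")) / length(x)
}

results <- c("S", "R", "I", "R", NA, "S", "R", "S")
prop_R(results)   # 3 R out of 7 valid results: 0.4285714
prop_SI(results)  # 4 S or I out of 7 valid results: 0.5714286
```

Because S, I and R are the only possible outcomes here, the two proportions always add up to 1.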
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R(), equal to resistance()) and susceptibility as the proportion of S and I (proportion_SI(), equal to susceptibility()). These functions can be used on their own:

```r
data_1st %>% resistance(AMX)
# [1] 0.535324
```
Or they can be used in conjunction with group_by() and summarise(), both from the dplyr package:

```r
data_1st %>%
  group_by(hospital) %>%
  summarise(amoxicillin = resistance(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
```
hospital | amoxicillin |
---|---|
Hospital A | 0.5322921 |
Hospital B | 0.5393839 |
Hospital C | 0.5327529 |
Hospital D | 0.5348690 |
Of course, it would be very convenient to know the number of isolates behind these percentages. For that purpose, n_rsi() can be used in the same way as n_distinct() from the dplyr package. It counts all isolates available for every group (i.e. with values S, I or R):

```r
data_1st %>%
  group_by(hospital) %>%
  summarise(amoxicillin = resistance(AMX),
            available = n_rsi(AMX))
# `summarise()` ungrouping output (override with `.groups` argument)
```
hospital | amoxicillin | available |
---|---|---|
Hospital A | 0.5322921 | 4738 |
Hospital B | 0.5393839 | 5421 |
Hospital C | 0.5327529 | 2412 |
Hospital D | 0.5348690 | 3169 |
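The grouped calculation above follows a split-apply-combine pattern. Here is a minimal base-R sketch of the same idea with made-up data, using tapply() instead of group_by() and summarise(). The number of available results per group plays the role of n_rsi(), and it is also the denominator of the resistance proportion.

```r
# Made-up example data: one row per isolate, NA = not tested
isolates <- data.frame(hospital = c("A", "A", "A", "B", "B", "B", "B"),
                       AMX      = c("R", "S", NA,  "R", "R", "R", "S"),
                       stringsAsFactors = FALSE)

# isolates with a valid S, I or R result, per hospital (like n_rsi())
available <- tapply(isolates$AMX %in% c("S", "I", "R"), isolates$hospital, sum)
# isolates with R, per hospital
resistant <- tapply(isolates$AMX %in% "R", isolates$hospital, sum)

resistant / available  # A: 1/2 = 0.50, B: 3/4 = 0.75
available              # A: 2, B: 4
```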
These functions can also be used to get the proportion of multiple antibiotics, which makes it easy to calculate the empiric susceptibility of combination therapies:

```r
data_1st %>%
  group_by(genus) %>%
  summarise(amoxiclav = susceptibility(AMC),
            gentamicin = susceptibility(GEN),
            amoxiclav_genta = susceptibility(AMC, GEN))
# `summarise()` ungrouping output (override with `.groups` argument)
```
genus | amoxiclav | gentamicin | amoxiclav_genta |
---|---|---|---|
Escherichia | 0.8238851 | 0.8982111 | 0.9840010 |
Klebsiella | 0.8127341 | 0.8951311 | 0.9818976 |
Staphylococcus | 0.8246201 | 0.9260881 | 0.9863508 |
Streptococcus | 0.5463962 | 0.0000000 | 0.5463962 |
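The combination column can be read as the share of isolates that are susceptible to at least one of the two drugs. This fits the table: the combined values are higher than those for either drug alone, or equal to amoxiclav for Streptococcus, where gentamicin adds nothing. The base-R sketch below illustrates that idea with made-up data. The helper `combo_susceptibility()` is hypothetical. For simplicity it only counts isolates tested for both drugs, which may differ from how susceptibility() handles partially tested isolates.

```r
# Hypothetical helper, NOT the AMR implementation: an isolate counts as covered
# when it is S or I for at least one of the two drugs.
# Simplification: only isolates tested for both drugs are counted.
combo_susceptibility <- function(drug1, drug2) {
  tested  <- !is.na(drug1) & !is.na(drug2)
  covered <- drug1[tested] %in% c("S", "I") | drug2[tested] %in% c("S", "I")
  mean(covered)
}

amc <- c("S", "R", "R", "S", "I")
gen <- c("R", "S", "R", "S", "R")
combo_susceptibility(amc, gen)  # 0.8: only isolate 3 is resistant to both
```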
To make a transition to the next part, let’s see how this difference could be plotted:

```r
data_1st %>%
  group_by(genus) %>%
  summarise("1. Amoxi/clav" = susceptibility(AMC),
            "2. Gentamicin" = susceptibility(GEN),
            "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>%
  # pivot_longer() from the tidyr package "lengthens" data:
  tidyr::pivot_longer(-genus, names_to = "antibiotic") %>%
  ggplot(aes(x = genus,
             y = value,
             fill = antibiotic)) +
  geom_col(position = "dodge2")
# `summarise()` ungrouping output (override with `.groups` argument)
```
To show results in plots, most R users nowadays use the ggplot2 package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like this:

```r
ggplot(data = a_data_set,
       mapping = aes(x = year,
                     y = value)) +
  geom_col() +
  labs(title = "A title",
       subtitle = "A subtitle",
       x = "My X axis",
       y = "My Y axis")

# or as short as:
ggplot(a_data_set) +
  geom_bar(aes(year))
```

The AMR package contains functions to extend this ggplot2 package, for example geom_rsi(). It automatically transforms data with count_df() or proportion_df() and shows results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus column and add some additional functions from our package, we can create this:
```r
# group the data on `genus`
ggplot(data_1st %>% group_by(genus)) +
  # create bars with genus on x axis
  # it looks for variables with class `rsi`,
  # of which we have 4 (earlier created with `as.rsi`)
  geom_rsi(x = "genus") +
  # split plots on antibiotic
  facet_rsi(facet = "antibiotic") +
  # set colours to the R/SI interpretations
  scale_rsi_colours() +
  # show percentages on y axis
  scale_y_percent(breaks = 0:4 * 25) +
  # turn 90 degrees, to make it bars instead of columns
  coord_flip() +
  # add labels
  labs(title = "Resistance per genus and antibiotic",
       subtitle = "(this is fake data)") +
  # and print genus in italic to follow our convention
  # (is now y axis because we turned the plot)
  theme(axis.text.y = element_text(face = "italic"))
```
To simplify this, we also created the ggplot_rsi() function, which combines almost all of the above functions:

```r
data_1st %>%
  group_by(genus) %>%
  ggplot_rsi(x = "genus",
             facet = "antibiotic",
             breaks = 0:4 * 25,
             datalabels = FALSE) +
  coord_flip()
```
The next example uses the example_isolates data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.
We will compare the resistance to fosfomycin (column FOS) in hospitals A and D. The input for fisher.test() can be retrieved with a transformation like this:

```r
# use package 'tidyr' to pivot data:
library(tidyr)

check_FOS <- example_isolates %>%
  filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
  select(hospital_id, FOS) %>%             # select the hospitals and fosfomycin
  group_by(hospital_id) %>%                # group on the hospitals
  count_df(combine_SI = TRUE) %>%          # count all isolates per group (hospital_id)
  pivot_wider(names_from = hospital_id,    # transform output so A and D are columns
              values_from = value) %>%
  select(A, D) %>%                         # and only select these columns
  as.matrix()                              # transform to a good old matrix for fisher.test()

check_FOS
#       A  D
# [1,] 25 77
# [2,] 24 33
```
We can apply the test now with:

```r
# do Fisher's Exact Test
fisher.test(check_FOS)
#
#   Fisher's Exact Test for Count Data
#
# data:  check_FOS
# p-value = 0.03104
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#  0.2111489 0.9485124
# sample estimates:
# odds ratio
#  0.4488318
```
As can be seen, the p value is 0.031, which means that fosfomycin resistance differs significantly between isolates from patients in hospitals A and D (at the conventional significance level of 0.05).
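To rerun the test without the data pipeline, the 2x2 table printed above as check_FOS can be typed in by hand with base R. The sketch below also computes the simple cross-product odds ratio. It differs slightly from the value reported by fisher.test(), because fisher.test() reports the conditional maximum likelihood estimate rather than the plain cross-product ratio.

```r
# Same counts as `check_FOS` above, typed in by hand (columns: hospitals A and D)
check_FOS_manual <- matrix(c(25, 24, 77, 33), nrow = 2,
                           dimnames = list(NULL, c("A", "D")))
result <- fisher.test(check_FOS_manual)
round(result$p.value, 3)  # 0.031, as in the output above

# simple cross-product odds ratio, for comparison with result$estimate
(25 * 33) / (24 * 77)     # about 0.446
```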
What are EUCAST rules? The European Committee on Antimicrobial Susceptibility Testing (EUCAST) states on their website:
> EUCAST expert rules are a tabulated collection of expert knowledge on intrinsic resistances, exceptional resistance phenotypes and interpretive rules that may be applied to antimicrobial susceptibility testing in order to reduce errors and make appropriate recommendations for reporting particular resistances.
In Europe, a lot of medical microbiological laboratories already apply these rules (Brown et al., 2015). Our package features their latest insights on intrinsic resistance and exceptional phenotypes (version 10.0, 2020). Moreover, the eucast_rules() function we use for this purpose can also apply additional rules, like forcing
These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.
Sometimes, laboratory data can still contain such strains with ampicillin reported as susceptible. This could be because an antibiogram is available before an identification is, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this; they can be applied using eucast_rules():

```r
oops <- data.frame(mo = c("Klebsiella",
                          "Escherichia"),
                   ampicillin = "S")
oops
#            mo ampicillin
# 1  Klebsiella          S
# 2 Escherichia          S

eucast_rules(oops, info = FALSE)
#            mo ampicillin
# 1  Klebsiella          R
# 2 Escherichia          S
```
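Conceptually, an intrinsic-resistance rule overwrites a reported result for a given organism and drug. The base-R sketch below mimics this on the same oops data with a single, hypothetical rule table. `apply_intrinsic_R()` is not part of the package. The real eucast_rules() works with the complete EUCAST rule sets and does far more than exact name matching.

```r
# Hypothetical, heavily simplified rule table: genus -> drug that is always R
intrinsic_R <- data.frame(genus = "Klebsiella", drug = "ampicillin",
                          stringsAsFactors = FALSE)

# Hypothetical helper, NOT part of the AMR package
apply_intrinsic_R <- function(df, rules) {
  for (i in seq_len(nrow(rules))) {
    hit <- df$mo == rules$genus[i]  # rows whose organism matches the rule
    df[hit, rules$drug[i]] <- "R"   # overwrite the result with resistant
  }
  df
}

oops <- data.frame(mo = c("Klebsiella", "Escherichia"),
                   ampicillin = "S",
                   stringsAsFactors = FALSE)
apply_intrinsic_R(oops, intrinsic_R)  # Klebsiella now has ampicillin = R
```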
EUCAST rules can be used not only for correction, but also for filling in known resistance and susceptibility based on the results of other antimicrobial drugs. This process is called interpretive reading and is part of the eucast_rules() function as well:

```r
data <- data.frame(mo = c("Staphylococcus aureus",
                          "Enterococcus faecalis",
                          "Escherichia coli",
                          "Klebsiella pneumoniae",
                          "Pseudomonas aeruginosa"),
                   VAN = "-",       # Vancomycin
                   AMX = "-",       # Amoxicillin
                   COL = "-",       # Colistin
                   CAZ = "-",       # Ceftazidime
                   CXM = "-",       # Cefuroxime
                   PEN = "S",       # Penicillin G
                   FOX = "S",       # Cefoxitin
                   stringsAsFactors = FALSE)
data
```
mo | VAN | AMX | COL | CAZ | CXM | PEN | FOX |
---|---|---|---|---|---|---|---|
Staphylococcus aureus | - | - | - | - | - | S | S |
Enterococcus faecalis | - | - | - | - | - | S | S |
Escherichia coli | - | - | - | - | - | S | S |
Klebsiella pneumoniae | - | - | - | - | - | S | S |
Pseudomonas aeruginosa | - | - | - | - | - | S | S |
```r
eucast_rules(data)
# Warning: Not all columns with antimicrobial results are of class <rsi>.
# Transform eligible columns to class <rsi> on beforehand: your_data %>% mutate_if(is.rsi.eligible, as.rsi)
```
mo | VAN | AMX | COL | CAZ | CXM | PEN | FOX |
---|---|---|---|---|---|---|---|
Staphylococcus aureus | - | S | R | R | S | S | S |
Enterococcus faecalis | - | - | R | R | R | S | R |
Escherichia coli | R | - | - | - | - | R | S |
Klebsiella pneumoniae | R | R | - | - | - | R | S |
Pseudomonas aeruginosa | R | R | - | - | R | R | R |
With the function mdro(), you can determine which micro-organisms are multi-drug resistant organisms (MDRO).
The mdro() function takes a data set as input, such as a regular data.frame. It tries to automatically determine the right columns for info about your isolates, like the name of the species and all columns with results of antimicrobial agents. See the help page for more info about how to set the right settings for your data with the command ?mdro.
For WHONET data (and most other data), all settings are automatically set correctly.
The function supports multiple guidelines. You can select a guideline with the guideline parameter. Currently supported guidelines are (case-insensitive):

- guideline = "CMI2012" (default)
- guideline = "EUCAST"
- guideline = "TB"
- guideline = "MRGN"
- guideline = "BRMO"
  The Dutch national guideline: Rijksinstituut voor Volksgezondheid en Milieu (Dutch National Institute for Public Health and the Environment), “WIP-richtlijn BRMO (Bijzonder Resistente Micro-Organismen) [ZKH]” (WIP guideline on highly resistant micro-organisms, for hospitals)
The mdro() function always returns an ordered factor. For example, the output of the default guideline by Magiorakos et al. returns a factor with levels ‘Negative’, ‘MDR’, ‘XDR’ or ‘PDR’ in that order.
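Because the result is an ordered factor, comparisons such as "at least MDR" work directly. Here is a minimal base-R sketch with made-up results, using the short level names mentioned above. The actual mdro() output uses longer labels, such as ‘Multi-drug-resistant (MDR)’.

```r
# Made-up results with the short level names from the text above
mdr_levels <- c("Negative", "MDR", "XDR", "PDR")
result <- factor(c("Negative", "XDR", "MDR", "Negative", "PDR"),
                 levels = mdr_levels, ordered = TRUE)

result >= "MDR"       # FALSE TRUE TRUE FALSE TRUE
sum(result >= "MDR")  # 3 isolates are MDR or worse
max(result)           # the most resistant category present: PDR
```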
The next example uses the example_isolates data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis. If we test the MDR/XDR/PDR guideline on this data set, we get:
```r
example_isolates %>%
  mdro() %>%
  freq() # show frequency table of the result
# NOTE: Using column `mo` as input for `col_mo`.
# NOTE: Auto-guessing columns suitable for analysis...OK.
# NOTE: Reliability would be improved if these antimicrobial results would be available too: ceftaroline (CPT), fusidic acid (FUS), telavancin (TLV), daptomycin (DAP), quinupristin/dalfopristin (QDA), minocycline (MNO), gentamicin-high (GEH), streptomycin-high (STH), doripenem (DOR), levofloxacin (LVX), netilmicin (NET), ticarcillin/clavulanic acid (TCC), ertapenem (ETP), cefotetan (CTT), aztreonam (ATM), ampicillin/sulbactam (SAM), polymyxin B (PLB)
# Warning in mdro(.): NA introduced for isolates where the available percentage of
# antimicrobial classes was below 50% (set with `pct_required_classes`)
```
Frequency table
Class: factor > ordered (numeric)
Length: 2,000
Levels: 4: Negative < Multi-drug-resistant (MDR) < Extensively drug-resistant …
Available: 1,711 (85.55%, NA: 289 = 14.45%)
Unique: 2

 | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | Negative | 1595 | 93.22% | 1595 | 93.22% |
2 | Multi-drug-resistant (MDR) | 116 | 6.78% | 1711 | 100.00% |
For another example, I will create a data set to determine multi-drug resistant TB:

```r
# a helper function to get a random vector with values S, I and R
# with the probabilities 50% - 10% - 40%
sample_rsi <- function() {
  sample(c("S", "I", "R"),
         size = 5000,
         prob = c(0.5, 0.1, 0.4),
         replace = TRUE)
}

my_TB_data <- data.frame(rifampicin = sample_rsi(),
                         isoniazid = sample_rsi(),
                         gatifloxacin = sample_rsi(),
                         ethambutol = sample_rsi(),
                         pyrazinamide = sample_rsi(),
                         moxifloxacin = sample_rsi(),
                         kanamycin = sample_rsi())
```

Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same:

```r
my_TB_data <- data.frame(RIF = sample_rsi(),
                         INH = sample_rsi(),
                         GAT = sample_rsi(),
                         ETH = sample_rsi(),
                         PZA = sample_rsi(),
                         MFX = sample_rsi(),
                         KAN = sample_rsi())
```

The data set now looks like this:

```r
head(my_TB_data)
#   rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin
# 1          S         R            R          S            R            R
# 2          R         S            R          S            R            S
# 3          R         R            S          S            R            S
# 4          S         S            S          S            R            S
# 5          S         R            S          S            R            S
# 6          R         S            R          S            S            S
#   kanamycin
# 1         R
# 2         I
# 3         R
# 4         S
# 5         R
# 6         S
```
We can now add the interpretation of MDR-TB to our data set. You can use:

```r
mdro(my_TB_data, guideline = "TB")
```

or its shortcut mdr_tb():

```r
my_TB_data$mdr <- mdr_tb(my_TB_data)
# NOTE: No column found as input for `col_mo`, assuming all records contain Mycobacterium tuberculosis.
# NOTE: Auto-guessing columns suitable for analysis...OK.
# NOTE: Reliability would be improved if these antimicrobial results would be available too: capreomycin (CAP), rifabutin (RIB), rifapentine (RFP)
```

Create a frequency table of the results:

```r
freq(my_TB_data$mdr)
```
Frequency table
Class: factor > ordered (numeric)
Length: 5,000
Levels: 5: Negative < Mono-resistant < Poly-resistant < Multi-drug-resistant <…
Available: 5,000 (100%, NA: 0 = 0%)
Unique: 5

 | Item | Count | Percent | Cum. Count | Cum. Percent |
---|---|---|---|---|---|
1 | Mono-resistant | 3245 | 64.90% | 3245 | 64.90% |
2 | Negative | 678 | 13.56% | 3923 | 78.46% |
3 | Multi-drug-resistant | 607 | 12.14% | 4530 | 90.60% |
4 | Poly-resistant | 262 | 5.24% | 4792 | 95.84% |
5 | Extensively drug-resistant | 208 | 4.16% | 5000 | 100.00% |
NOTE: This page will be updated soon, as the pca() function is currently being developed.
For PCA, we need to transform our AMR data first. This is what the example_isolates data set in this package looks like:
```r
library(AMR)
library(dplyr)
glimpse(example_isolates)
# Rows: 2,000
# Columns: 49
# $ date            <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002…
# $ hospital_id     <fct> D, D, B, B, B, B, D, D, B, B, D, D, D, D, D, B, B, B,…
# $ ward_icu        <lgl> FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, T…
# $ ward_clinical   <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, F…
# $ ward_outpatient <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
# $ age             <dbl> 65, 65, 45, 45, 45, 45, 78, 78, 45, 79, 67, 67, 71, 7…
# $ gender          <chr> "F", "F", "F", "F", "F", "F", "M", "M", "F", "F", "M"…
# $ patient_id      <chr> "A77334", "A77334", "067927", "067927", "067927", "06…
# $ mo              <mo> "B_ESCHR_COLI", "B_ESCHR_COLI", "B_STPHY_EPDR", "B_STP…
# $ PEN             <ord> R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R,…
# $ OXA             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ FLC             <ord> NA, NA, R, R, R, R, S, S, R, S, S, S, NA, NA, NA, NA,…
# $ AMX             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ AMC             <ord> I, I, NA, NA, NA, NA, S, S, NA, NA, S, S, I, I, R, I,…
# $ AMP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ TZP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ CZO             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ FEP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ CXM             <ord> I, I, R, R, R, R, S, S, R, S, S, S, S, S, NA, S, S, R…
# $ FOX             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ CTX             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,…
# $ CAZ             <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, S, …
# $ CRO             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,…
# $ GEN             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ TOB             <ord> NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, S, S, N…
# $ AMK             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ KAN             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ TMP             <ord> R, R, S, S, R, R, R, R, S, S, NA, NA, S, S, S, S, S, …
# $ SXT             <ord> R, R, S, S, NA, NA, NA, NA, S, S, NA, NA, S, S, S, S,…
# $ NIT             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ FOS             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ LNZ             <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
# $ CIP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA,…
# $ MFX             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ VAN             <ord> R, R, S, S, S, S, S, S, S, S, NA, NA, R, R, R, R, R, …
# $ TEC             <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
# $ TCY             <ord> R, R, S, S, S, S, S, S, S, I, S, S, NA, NA, I, R, R, …
# $ TGC             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ DOX             <ord> NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R,…
# $ ERY             <ord> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R,…
# $ CLI             <ord> NA, NA, NA, NA, NA, R, NA, NA, NA, NA, NA, NA, NA, NA…
# $ AZM             <ord> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R,…
# $ IPM             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S,…
# $ MEM             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ MTR             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ CHL             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ COL             <ord> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, …
# $ MUP             <ord> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# $ RIF             <ord> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R…
```
Now, to transform this into a data set with only resistance percentages per taxonomic order and genus:

```r
resistance_data <- example_isolates %>%
  group_by(order = mo_order(mo),       # group on anything, like order
           genus = mo_genus(mo)) %>%   # and genus as we do here
  summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
  select(order, genus, AMC, CXM, CTX,
         CAZ, GEN, TOB, TMP, SXT)      # and select only relevant columns

head(resistance_data)
# # A tibble: 6 x 10
# # Groups:   order [2]
#   order           genus            AMC   CXM   CTX   CAZ   GEN   TOB   TMP   SXT
#   <chr>           <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (unknown order) (unknown genu…    NA    NA    NA    NA    NA    NA    NA    NA
# 2 Actinomycetales Corynebacteri…    NA    NA    NA    NA    NA    NA    NA    NA
# 3 Actinomycetales Cutibacterium     NA    NA    NA    NA    NA    NA    NA    NA
# 4 Actinomycetales Dermabacter       NA    NA    NA    NA    NA    NA    NA    NA
# 5 Actinomycetales Micrococcus       NA    NA    NA    NA    NA    NA    NA    NA
# 6 Actinomycetales Rothia            NA    NA    NA    NA    NA    NA    NA    NA
```
The new pca() function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:

```r
pca_result <- pca(resistance_data)
# NOTE: Columns selected for PCA: AMC CXM CTX CAZ GEN TOB TMP SXT.
#       Total observations available: 7.
```
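The summary() output further down has the familiar "Importance of components" layout of stats::prcomp(). For that reason, a base-R sketch of comparable steps may help: keep only the numeric columns, drop rows with missing values, then run a standardised PCA. This is an assumption about what pca() does internally, not its documented implementation, and the toy percentages are made up.

```r
# Made-up resistance proportions per genus (illustration only)
toy <- data.frame(genus = c("G1", "G2", "G3", "G4", "G5"),
                  AMC   = c(0.10, 0.25, NA,   0.40, 0.15),
                  GEN   = c(0.05, 0.20, 0.30, 0.35, 0.10),
                  TMP   = c(0.30, 0.45, 0.50, 0.60, 0.20))

# keep numeric columns only, then rows without missing values
numeric_cols <- toy[, vapply(toy, is.numeric, logical(1))]
complete     <- numeric_cols[complete.cases(numeric_cols), ]

# centre and scale each column (a standardised PCA)
pca_toy <- prcomp(complete, center = TRUE, scale. = TRUE)
summary(pca_toy)
```

After standardisation, every variable contributes a variance of 1, so the squared standard deviations of the components add up to the number of variables (3 here).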
The result can be reviewed with the good old summary() function:

```r
summary(pca_result)
# Importance of components:
#                          PC1    PC2     PC3     PC4     PC5     PC6       PC7
# Standard deviation     2.154 1.6809 0.61305 0.33882 0.20755 0.03137 1.602e-16
# Proportion of Variance 0.580 0.3532 0.04698 0.01435 0.00538 0.00012 0.000e+00
# Cumulative Proportion  0.580 0.9332 0.98014 0.99449 0.99988 1.00000 1.000e+00
```
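The Proportion of Variance row follows from the Standard deviation row: each component's variance ($\sigma_k^2$) is divided by the total variance. The squared standard deviations above add up to about 8, the number of selected variables. This suggests that the variables were standardised before the analysis. Worked out for PC1:

$$
\text{Proportion of Variance}_k = \frac{\sigma_k^2}{\sum_{j=1}^{7} \sigma_j^2},
\qquad
\sum_{j=1}^{7} \sigma_j^2 \approx 4.640 + 2.825 + 0.376 + 0.115 + 0.043 + 0.001 + 0 \approx 8.00
$$

$$
\text{Proportion of Variance}_1 \approx \frac{2.154^2}{8.00} \approx \frac{4.640}{8.00} \approx 0.580
$$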
Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance). We can create a so-called biplot with the base R biplot() function, to see which antimicrobial drugs explain the differences between the microorganisms.

```r
biplot(pca_result)
```
But we can’t see which point belongs to which microorganism. This works better with our new ggplot_pca() function, which automatically adds the right labels and even groups:

```r
ggplot_pca(pca_result)
```

You can also print an ellipse per group, and edit the appearance:

```r
ggplot_pca(pca_result, ellipse = TRUE) +
  ggplot2::labs(title = "An AMR/PCA biplot!")
```
SPSS (Statistical Package for the Social Sciences) is probably the most well-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendliness, it is taught at universities and is particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it by the time they graduate. SAS and Stata are comparable statistical packages that are popular in big industries.
As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when compared with R:
R is highly modular.
The official R network (CRAN) features almost 14,000 packages at the time of writing, our AMR package being one of them. All these packages were checked by the CRAN team before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep their packages in their own public repository, like GitHub. So there may be even more than 14,000 packages out there.
The bottom line is that you can really extend it yourself or ask somebody to do this for you. Take for example our AMR package. Among other things, it adds reliable reference data to R to help you with data cleaning and analysis. SPSS, SAS and Stata will never know what a valid MIC value is or what the Gram stain of E. coli is. Or that all species of Klebsiella are resistant to amoxicillin and that Floxapen® is a trade name of flucloxacillin. These facts and properties are often needed to clean existing data, which would be very inconvenient in a software package without reliable reference data. See below for a demonstration.
R is extremely flexible.
Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, arranging, grouping and summarising data, or drawing plots, is endless; with SPSS, SAS or Stata you are bound to their algorithms and format styles. They may be somewhat flexible, but you can probably never create that very specific publication-ready plot without using other (paid) software. If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you could do this in a lot less time in R. You will notice that writing syntax in R is a lot more nifty and clever than in SPSS. Still, as with any statistical package, you need to know what you are doing (statistically) and what you want to accomplish.
R can be easily automated.
Over the last few years, R Markdown has developed considerably. With R Markdown, you can very easily produce reports, whether the format has to be Word, PowerPoint, a website, a PDF document or just the raw data to Excel. It even allows the use of a reference file containing the layout style (e.g. fonts and colours) of your organisation. I use this a lot to generate weekly and monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.
For an even more professional environment, you could create Shiny apps: live manipulation of data using a custom-made website. Almost no web design knowledge (JavaScript, CSS, HTML) is needed.
R has a huge community.
Many R users just ask questions on websites like StackOverflow.com, the largest online community for programmers. At the time of writing, more than 300,000 R-related questions have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
R understands any data type, including SPSS/SAS/Stata.
Unfortunately, the reverse is not true. You can import data from any source into R: for example from SPSS, SAS and Stata, from Minitab, Epi Info and EpiData, from Excel, from flat files like CSV, TXT or TSV, or directly from databases and data warehouses anywhere in the world. You can even scrape websites to download tables that are live on the internet, or get the results of an API call and transform it into data in only one command.
And the best part: you can export from R to most data formats as well. So you can import an SPSS file, do your analysis neatly in R and export the resulting tables to Excel files for sharing.
R is completely free and open-source.
No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Statistics only comes with subscriptions nowadays, varying between USD 1,300 and USD 8,500 per user per year. SAS Analytics Pro costs around USD 10,000 per computer. Stata also has a business model with subscription fees, varying between USD 600 and USD 2,800 per computer per year, but lower prices come with a limit on the number of variables you can work with. And they still do not offer the above benefits of R.
If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS, while gaining even more functions and flexibility. And all R enthusiasts can do as much PR as they want (like I do here), because nobody is officially associated or affiliated with R. It is really free.
R is (nowadays) the preferred analysis software in academic papers.
At present, R is among the world's most powerful statistical languages, and it is generally very popular in science (Bollmann et al., 2017). For all the above reasons, the number of references to R as an analysis method in academic papers is rising continuously and has even surpassed SPSS for academic use (Muenchen, 2014).
I believe that the thing with SPSS is that it has always had a great user interface which is very easy to learn and use. Back when it was developed, it had very little competition, let alone from R. R didn’t even have a professional user interface until the last decade (called RStudio, see below). How people used R between the nineties and 2010 is almost completely incomparable to how R is used now. The language itself has been restyled completely by volunteers who are dedicated professionals in the field of data science. SPSS was great when there was nothing else that could compete. But now in 2020, I don’t see any reason why SPSS would be of any better use than R.
To demonstrate the first point:

```r
# not all values are valid MIC values:
as.mic(0.125)
# Class <mic>
# [1] 0.125
as.mic("testvalue")
# Class <mic>
# [1] <NA>

# the Gram stain is available for all bacteria:
mo_gramstain("E. coli")
# [1] "Gram-negative"

# Klebsiella is intrinsically resistant to amoxicillin, according to EUCAST:
klebsiella_test <- data.frame(mo = "klebsiella",
                              amox = "S",
                              stringsAsFactors = FALSE)
klebsiella_test # (our original data)
#           mo amox
# 1 klebsiella    S
eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)
#           mo amox
# 1 klebsiella    R

# hundreds of trade names can be translated to a name, trade name or an ATC code:
ab_name("floxapen")
# [1] "Flucloxacillin"
ab_tradenames("floxapen")
# [1] "floxacillin"          "floxapen"             "floxapen sodium salt"
# [4] "fluclox"              "flucloxacilina"       "flucloxacillin"
# [7] "flucloxacilline"      "flucloxacillinum"     "fluorochloroxacillin"
ab_atc("floxapen")
# [1] "J01CF05"
```
To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also install RStudio Server on a private or corporate server, which brings nothing less than the complete RStudio software to you as a website (at home or at work).
+To import a data file, just click Import Dataset in the Environment tab:
+ 
+If additional packages are needed, RStudio will ask whether they should be installed first.
+In the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:
+ 
+If you want labelled variables to be imported as factors, so that the data resemble SPSS more, use as_factor().
The difference is this:
+SPSS_data +# # A tibble: 4,203 x 4 +# v001 sex status statusage +# <dbl> <dbl+lbl> <dbl+lbl> <dbl> +# 1 10002 1 1 76.6 +# 2 10004 0 1 59.1 +# 3 10005 1 1 54.5 +# 4 10006 1 1 54.1 +# 5 10007 1 1 57.7 +# 6 10008 1 1 62.8 +# 7 10010 0 1 63.7 +# 8 10011 1 1 73.1 +# 9 10017 1 1 56.7 +# 10 10018 0 1 66.6 +# # … with 4,193 more rows + +as_factor(SPSS_data) +# # A tibble: 4,203 x 4 +# v001 sex status statusage +# <dbl> <fct> <fct> <dbl> +# 1 10002 Male alive 76.6 +# 2 10004 Female alive 59.1 +# 3 10005 Male alive 54.5 +# 4 10006 Male alive 54.1 +# 5 10007 Male alive 57.7 +# 6 10008 Male alive 62.8 +# 7 10010 Female alive 63.7 +# 8 10011 Male alive 73.1 +# 9 10017 Male alive 56.7 +# 10 10018 Female alive 66.6 +# # … with 4,193 more rows
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
# download and install the latest version:
+install.packages("haven")
+# load the package you just installed:
+library(haven)
You can now import files as follows:
+To read files from SPSS into R:
+# read any SPSS file based on file extension (best way):
+read_spss(file = "path/to/file")
+
+# read .sav or .zsav file:
+read_sav(file = "path/to/file")
+
+# read .por file:
+read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
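The SPSS export example did not survive in this text. The following is a minimal sketch that assumes the haven package is installed. Its `write_sav()` function writes a data frame to a `.sav` file. The data frame `yourdata` and the temporary file path are placeholders for your own objects.

```r
library(haven)

# placeholder data frame standing in for your own R object
yourdata <- data.frame(id = 1:3, sex = c("Male", "Female", "Female"))

# save as .sav file (a temporary path is used here for illustration)
sav_file <- tempfile(fileext = ".sav")
write_sav(data = yourdata, path = sav_file)

# read it back to check the round trip
read_sav(sav_file)
```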
+ +To read files from SAS into R:
+# read .sas7bdat + .sas7bcat files:
+read_sas(data_file = "path/to/file", catalog_file = NULL)
+
+# read SAS transport files (version 5 and version 8):
+read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
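The SAS export example is missing here as well. The following is a minimal sketch that again assumes haven is installed. Its `write_xpt()` function writes a SAS transport file, which `read_xpt()` (shown above) can read back. As before, the data and the path are placeholders.

```r
library(haven)

# placeholder data frame standing in for your own R object
yourdata <- data.frame(id = 1:3, age = c(54.5, 59.1, 76.6))

# save as SAS transport file (a temporary path is used here for illustration)
xpt_file <- tempfile(fileext = ".xpt")
write_xpt(data = yourdata, path = xpt_file)

# read it back to check the round trip
read_xpt(xpt_file)
```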
+ +To read files from Stata into R:
+# read .dta file:
+read_stata(file = "/path/to/file")
+
+# works exactly the same:
+read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
+# save as .dta file, Stata version 14:
+# (supports Stata v8 until v15 at the time of writing)
+write_dta(data = yourdata, path = "/path/to/file", version = 14)
vignettes/WHONET.Rmd
+ WHONET.Rmd
This tutorial assumes you have already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
+library(readxl)
+data <- read_excel(path = "path/to/your/file.xlsx")
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you have not done so yet. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
+library(dplyr)   # part of tidyverse
+library(ggplot2) # part of tidyverse
+library(AMR)     # this package
+library(cleaner) # to create frequency tables
We will have to transform some variables to simplify and automate the analysis:
+mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the transformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly what the as.rsi()
function is for.
# transform variables
+data <- WHONET %>%
+  # get microbial ID based on given organism
+  mutate(mo = as.mo(Organism)) %>%
+  # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
+  mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values were transformed successfully.
+We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
+# our newly created `mo` variable, put in the mo_name() function
+data %>% freq(mo_name(mo), nmax = 10)
Frequency table
+Class: character
+Length: 500
+Available: 500 (100%, NA: 0 = 0%)
+Unique: 37
Shortest: 11
+Longest: 40
|   | Item | Count | Percent | Cum. Count | Cum. Percent |
|---|---|---|---|---|---|
| 1 | Escherichia coli | 245 | 49.0% | 245 | 49.0% |
| 2 | Coagulase-negative Staphylococcus (CoNS) | 74 | 14.8% | 319 | 63.8% |
| 3 | Staphylococcus epidermidis | 38 | 7.6% | 357 | 71.4% |
| 4 | Streptococcus pneumoniae | 31 | 6.2% | 388 | 77.6% |
| 5 | Staphylococcus hominis | 21 | 4.2% | 409 | 81.8% |
| 6 | Proteus mirabilis | 9 | 1.8% | 418 | 83.6% |
| 7 | Enterococcus faecium | 8 | 1.6% | 426 | 85.2% |
| 8 | Staphylococcus capitis | 8 | 1.6% | 434 | 86.8% |
| 9 | Enterobacter cloacae | 5 | 1.0% | 439 | 87.8% |
| 10 | Streptococcus anginosus | 5 | 1.0% | 444 | 88.8% |
(omitted 27 entries, n = 56 [11.20%])
+# our transformed antibiotic columns
+# amoxicillin/clavulanic acid (J01CR02) as an example
+data %>% freq(AMC_ND2)
Frequency table
+Class: factor > ordered > rsi (numeric)
+Length: 500
+Levels: 3: S < I < R
+Available: 481 (96.2%, NA: 19 = 3.8%)
+Unique: 3
|   | Item | Count | Percent | Cum. Count | Cum. Percent |
|---|---|---|---|---|---|
| 1 | S | 356 | 74.01% | 356 | 74.01% |
| 2 | R | 103 | 21.41% | 459 | 95.43% |
| 3 | I | 22 | 4.57% | 481 | 100.00% |
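The percentages in the table above are counts divided by the number of available (non-missing) results. This base-R sketch rebuilds the `AMC_ND2` counts from the table as an ordered factor with levels S < I < R. It only mimics the package's `rsi` class and is not that class. The sketch then recomputes the percentages.

```r
# rebuild the AMC_ND2 column from the frequency table: 356 S, 103 R, 22 I, 19 NA
amc <- factor(rep(c("S", "R", "I", NA), times = c(356, 103, 22, 19)),
              levels = c("S", "I", "R"), ordered = TRUE)

available <- sum(!is.na(amc))       # results that are not NA
counts <- table(amc)                # NA values are not counted by default
pct <- round(100 * counts / available, 2)
pct

# the share of resistant isolates, i.e. R / (S + I + R)
counts["R"] / available
```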
An easy ggplot
will already give a lot of information, using the included ggplot_rsi()
function:
data %>%
+  group_by(Country) %>%
+  select(Country, AMP_ND2, AMC_ED20, CAZ_ED10, CIP_ED5) %>%
+  ggplot_rsi(translate_ab = 'ab', facet = "Country", datalabels = FALSE)
vignettes/benchmarks.Rmd
+ benchmarks.Rmd
One of the most important features of this package is the complete microbial taxonomic database, supplied by the Catalogue of Life. We created a function as.mo()
that transforms any user input value to a valid microbial ID by using intelligent rules combined with the taxonomic tree of Catalogue of Life.
Using the microbenchmark
package, we can review the calculation performance of this function. Its function microbenchmark()
runs different input expressions independently of each other and measures their time-to-result.
microbenchmark <- microbenchmark::microbenchmark
+library(AMR)
+library(dplyr)
In the next test, we try to ‘coerce’ different input values into the microbial code of Staphylococcus aureus. Coercion is a computational process of forcing output based on an input. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties.
+The actual result is the same every time: it returns its microorganism code B_STPHY_AURS
(B stands for Bacteria, the taxonomic kingdom).
But the calculation time differs a lot:
+S.aureus <- microbenchmark( + as.mo("sau"), # WHONET code + as.mo("stau"), + as.mo("STAU"), + as.mo("staaur"), + as.mo("STAAUR"), + as.mo("S. aureus"), + as.mo("S aureus"), + as.mo("Staphylococcus aureus"), # official taxonomic name + as.mo("Staphylococcus aureus (MRSA)"), # additional text + as.mo("Sthafilokkockus aaureuz"), # incorrect spelling + as.mo("MRSA"), # Methicillin Resistant S. aureus + as.mo("VISA"), # Vancomycin Intermediate S. aureus + as.mo("VRSA"), # Vancomycin Resistant S. aureus + as.mo(22242419), # Catalogue of Life ID + times = 10) +print(S.aureus, unit = "ms", signif = 2) +# Unit: milliseconds +# expr min lq mean median uq max +# as.mo("sau") 8.5 11.0 17.0 12.0 12.0 43.0 +# as.mo("stau") 120.0 130.0 150.0 140.0 160.0 180.0 +# as.mo("STAU") 130.0 140.0 150.0 150.0 160.0 170.0 +# as.mo("staaur") 7.7 9.1 13.0 11.0 12.0 38.0 +# as.mo("STAAUR") 8.3 9.3 15.0 10.0 11.0 37.0 +# as.mo("S. aureus") 11.0 12.0 18.0 13.0 14.0 41.0 +# as.mo("S aureus") 8.8 11.0 17.0 12.0 13.0 41.0 +# as.mo("Staphylococcus aureus") 6.4 6.6 7.4 7.6 7.8 9.1 +# as.mo("Staphylococcus aureus (MRSA)") 810.0 870.0 890.0 890.0 900.0 1000.0 +# as.mo("Sthafilokkockus aaureuz") 320.0 340.0 370.0 350.0 400.0 490.0 +# as.mo("MRSA") 9.2 10.0 13.0 11.0 12.0 37.0 +# as.mo("VISA") 12.0 12.0 22.0 13.0 43.0 44.0 +# as.mo("VRSA") 11.0 13.0 21.0 14.0 38.0 41.0 +# as.mo(22242419) 130.0 140.0 150.0 140.0 170.0 200.0 +# neval +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10 +# 10
In the table above, all measurements are shown in milliseconds (thousandths of a second). A value of 5 milliseconds means it can determine 200 input values per second. In the case of 100 milliseconds, this is only 10 input values per second.
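The conversion is 1000 divided by the time in milliseconds. The tiny base-R helper below is hypothetical and not part of the package; it makes the arithmetic explicit, including for the median of `as.mo("Staphylococcus aureus")` in the benchmark table.

```r
# number of input values that can be processed per second,
# given the time needed per value in milliseconds
values_per_second <- function(ms) 1000 / ms

values_per_second(5)    # 200
values_per_second(100)  # 10

# median time for as.mo("Staphylococcus aureus") in the table above: 7.6 ms
round(values_per_second(7.6))
```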
+To achieve this speed, the as.mo
function also takes into account the prevalence of human pathogenic microorganisms. The downside of this is, of course, that less prevalent microorganisms will be determined more slowly. See this example for the ID of Methanosarcina semesiae (B_MTHNSR_SEMS
), a bug probably never found before in humans:
M.semesiae <- microbenchmark(as.mo("metsem"), + as.mo("METSEM"), + as.mo("M. semesiae"), + as.mo("M. semesiae"), + as.mo("Methanosarcina semesiae"), + times = 10) +print(M.semesiae, unit = "ms", signif = 4) +# Unit: milliseconds +# expr min lq mean median uq max +# as.mo("metsem") 143.400 146.300 156.10 155.400 164.900 176.40 +# as.mo("METSEM") 141.600 146.900 167.00 170.700 185.000 188.00 +# as.mo("M. semesiae") 9.665 9.879 16.50 10.090 11.960 44.29 +# as.mo("M. semesiae") 10.000 10.080 14.46 11.660 13.140 42.01 +# as.mo("Methanosarcina semesiae") 7.161 7.389 10.40 7.542 9.294 33.00 +# neval +# 10 +# 10 +# 10 +# 10 +# 10
Looking up arbitrary codes of less prevalent microorganisms costs the most time. Full names (like Methanosarcina semesiae) are always very fast and only take a few thousandths of a second to coerce - they are the most probable input from most data sets.
+In the figure below, we compare Escherichia coli (which is very common) with Prevotella brevis (which is moderately common) and with Methanosarcina semesiae (which is uncommon):
+ +Uncommon microorganisms take some more time than common microorganisms. To further improve performance, two important calculations take almost no time at all: repetitive results and already precalculated results.
+Repetitive results are unique values that are present more than once. Unique values will only be calculated once by as.mo()
. We will use mo_name()
for this test - a helper function that returns the full microbial name (genus, species and possibly subspecies) which uses as.mo()
internally.
# take all MO codes from the example_isolates data set +x <- example_isolates$mo %>% + # keep only the unique ones + unique() %>% + # pick 50 of them at random + sample(50) %>% + # paste that 10,000 times + rep(10000) %>% + # scramble it + sample() + +# got indeed 50 times 10,000 = half a million? +length(x) +# [1] 500000 + +# and how many unique values do we have? +n_distinct(x) +# [1] 50 + +# now let's see: +run_it <- microbenchmark(mo_name(x), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# mo_name(x) 1650 1730 1790 1790 1840 1900 10
So transforming 500,000 values (!!) of 50 unique values only takes 1.79 seconds. You only lose time on your unique input values.
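The principle of spending time only on unique values can be sketched in base R. Compute the expensive result once per unique input, then map it back to every position with `match()`. The function `slow_lookup()` below is a hypothetical stand-in for an expensive coercion. This is an illustration of the technique, not the package's internal code.

```r
# counter to show how many values the "expensive" step actually processes
n_computed <- 0

# hypothetical stand-in for an expensive per-value computation
slow_lookup <- function(values) {
  n_computed <<- n_computed + length(values)
  toupper(values)
}

# apply `fun` only to the unique values, then expand back with match()
lookup_unique <- function(x, fun) {
  ux <- unique(x)
  fun(ux)[match(x, ux)]
}

set.seed(1)
x <- sample(rep(c("esco", "staaur", "kleb"), times = 10000))

y <- lookup_unique(x, slow_lookup)
length(y)    # one result per input value
n_computed   # but only the unique values were computed
```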
+What about precalculated results? If the input is an already precalculated result of a helper function like mo_name()
, it almost doesn’t take any time at all (see ‘C’ below):
run_it <- microbenchmark(A = mo_name("B_STPHY_AURS"), + B = mo_name("S. aureus"), + C = mo_name("Staphylococcus aureus"), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# A 5.680 5.820 9.61 6.36 6.850 39.500 10 +# B 9.790 10.000 10.60 10.40 10.900 11.900 10 +# C 0.229 0.259 0.27 0.27 0.286 0.311 10
So going from mo_name("Staphylococcus aureus")
to "Staphylococcus aureus"
takes 0.0003 seconds - it doesn’t even start calculating if the result would be the same as the expected resulting value. That goes for all helper functions:
run_it <- microbenchmark(A = mo_species("aureus"), + B = mo_genus("Staphylococcus"), + C = mo_name("Staphylococcus aureus"), + D = mo_family("Staphylococcaceae"), + E = mo_order("Bacillales"), + F = mo_class("Bacilli"), + G = mo_phylum("Firmicutes"), + H = mo_kingdom("Bacteria"), + times = 10) +print(run_it, unit = "ms", signif = 3) +# Unit: milliseconds +# expr min lq mean median uq max neval +# A 0.209 0.221 0.236 0.225 0.244 0.311 10 +# B 0.197 0.201 0.215 0.212 0.222 0.266 10 +# C 0.205 0.224 0.243 0.229 0.242 0.383 10 +# D 0.199 0.207 0.216 0.211 0.214 0.270 10 +# E 0.196 0.206 0.218 0.215 0.221 0.270 10 +# F 0.188 0.197 0.212 0.210 0.216 0.269 10 +# G 0.195 0.198 0.213 0.203 0.215 0.299 10 +# H 0.184 0.193 0.205 0.201 0.207 0.252 10
Of course, when running mo_phylum("Firmicutes")
the function has zero knowledge about the actual microorganism, namely S. aureus. But since the result would be "Firmicutes"
anyway, there is no point in calculating the result. And because this package ‘knows’ all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
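The shortcut can be sketched as follows. If every input value is already a valid result, return it untouched; otherwise do the full lookup. The vectors `known_phyla` and `toy_lookup` and the function `phylum_of()` are hypothetical toy objects for illustration. They are not the package's data or internals.

```r
# toy reference data: a few known phyla (illustration only)
known_phyla <- c("Firmicutes", "Proteobacteria", "Actinobacteria")

# toy lookup from names to phylum (stand-in for the real, expensive lookup)
toy_lookup <- c("Staphylococcus aureus" = "Firmicutes",
                "Escherichia coli" = "Proteobacteria")

phylum_of <- function(x) {
  # shortcut: the input is already a valid result, so return it immediately
  if (all(x %in% known_phyla)) {
    return(x)
  }
  # otherwise fall back to the (expensive) lookup
  unname(toy_lookup[x])
}

phylum_of("Firmicutes")        # returned as-is, no lookup
phylum_of("Escherichia coli")  # looked up
```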
When the system language is non-English and supported by this AMR
package, some functions will have a translated result. This takes almost no extra time:
mo_name("CoNS", language = "en") # or just mo_name("CoNS") on an English system +# [1] "Coagulase-negative Staphylococcus (CoNS)" + +mo_name("CoNS", language = "es") # or just mo_name("CoNS") on a Spanish system +# [1] "Staphylococcus coagulasa negativo (SCN)" + +mo_name("CoNS", language = "nl") # or just mo_name("CoNS") on a Dutch system +# [1] "Coagulase-negatieve Staphylococcus (CNS)" + +run_it <- microbenchmark(en = mo_name("CoNS", language = "en"), + de = mo_name("CoNS", language = "de"), + nl = mo_name("CoNS", language = "nl"), + es = mo_name("CoNS", language = "es"), + it = mo_name("CoNS", language = "it"), + fr = mo_name("CoNS", language = "fr"), + pt = mo_name("CoNS", language = "pt"), + times = 100) +print(run_it, unit = "ms", signif = 4) +# Unit: milliseconds +# expr min lq mean median uq max neval +# en 9.303 11.59 14.90 12.40 13.63 45.92 100 +# de 10.080 12.39 15.77 13.11 14.45 46.27 100 +# nl 13.200 16.26 20.88 17.80 19.52 49.93 100 +# es 9.957 12.23 15.57 13.12 14.59 51.99 100 +# it 10.210 12.44 19.02 13.34 14.74 52.96 100 +# fr 10.040 12.40 18.90 13.26 15.07 54.40 100 +# pt 10.450 12.67 16.91 13.46 14.68 51.47 100
Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.
+vignettes/resistance_predict.Rmd
+ resistance_predict.Rmd
As with many uses in R, we need some additional packages for AMR analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package works well with these packages and even extends their use and functions.
Our package contains a function resistance_predict()
, which takes the same input as functions for other AMR analyses. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
+# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
+
+# or:
+example_isolates %>%
+  resistance_predict(col_ab = "TZP",
+                     model = "binomial")
+
+# to bind it to object 'predict_TZP' for example:
+predict_TZP <- example_isolates %>%
+  resistance_predict(col_ab = "TZP",
+                     model = "binomial")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# NOTE: Using column `date` as input for `col_date`.
+This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
predict_TZP +# year value se_min se_max observations observed estimated +# 1 2002 0.20000000 NA NA 15 0.20000000 0.05616378 +# 2 2003 0.06250000 NA NA 32 0.06250000 0.06163839 +# 3 2004 0.08536585 NA NA 82 0.08536585 0.06760841 +# 4 2005 0.05000000 NA NA 60 0.05000000 0.07411100 +# 5 2006 0.05084746 NA NA 59 0.05084746 0.08118454 +# 6 2007 0.12121212 NA NA 66 0.12121212 0.08886843 +# 7 2008 0.04166667 NA NA 72 0.04166667 0.09720264 +# 8 2009 0.01639344 NA NA 61 0.01639344 0.10622731 +# 9 2010 0.05660377 NA NA 53 0.05660377 0.11598223 +# 10 2011 0.18279570 NA NA 93 0.18279570 0.12650615 +# 11 2012 0.30769231 NA NA 65 0.30769231 0.13783610 +# 12 2013 0.06896552 NA NA 58 0.06896552 0.15000651 +# 13 2014 0.10000000 NA NA 60 0.10000000 0.16304829 +# 14 2015 0.23636364 NA NA 55 0.23636364 0.17698785 +# 15 2016 0.22619048 NA NA 84 0.22619048 0.19184597 +# 16 2017 0.16279070 NA NA 86 0.16279070 0.20763675 +# 17 2018 0.22436641 0.1938710 0.2548618 NA NA 0.22436641 +# 18 2019 0.24203228 0.2062911 0.2777735 NA NA 0.24203228 +# 19 2020 0.26062172 0.2191758 0.3020676 NA NA 0.26062172 +# 20 2021 0.28011130 0.2325557 0.3276669 NA NA 0.28011130 +# 21 2022 0.30046606 0.2464567 0.3544755 NA NA 0.30046606 +# 22 2023 0.32163907 0.2609011 0.3823771 NA NA 0.32163907 +# 23 2024 0.34357130 0.2759081 0.4112345 NA NA 0.34357130 +# 24 2025 0.36619175 0.2914934 0.4408901 NA NA 0.36619175 +# 25 2026 0.38941799 0.3076686 0.4711674 NA NA 0.38941799 +# 26 2027 0.41315710 0.3244399 0.5018743 NA NA 0.41315710 +# 27 2028 0.43730688 0.3418075 0.5328063 NA NA 0.43730688 +# 28 2029 0.46175755 0.3597639 0.5637512 NA NA 0.46175755 +# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359
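To see the idea in base R, the sketch below fits a logistic regression on yearly counts. The number of resistant isolates per year is derived from the table above (observed × observations, for 2002 to 2017). It assumes, without reproducing the package source, that `resistance_predict()` with `model = "binomial"` fits a comparable `glm(..., family = binomial)`. The output of this sketch is not guaranteed to match `predict_TZP` exactly.

```r
# yearly data derived from the table above: number tested and number resistant
tzp <- data.frame(
  year = 2002:2017,
  n    = c(15, 32, 82, 60, 59, 66, 72, 61, 53, 93, 65, 58, 60, 55, 84, 86),
  R    = c(3, 2, 7, 3, 3, 8, 3, 1, 3, 17, 20, 4, 6, 13, 19, 14)
)

# logistic regression on (resistant, not resistant) counts per year
fit <- glm(cbind(R, n - R) ~ year, data = tzp, family = binomial)

# predicted resistance with one standard error below and above, for future years
future <- data.frame(year = 2018:2030)
pred <- predict(fit, newdata = future, type = "response", se.fit = TRUE)
data.frame(year      = future$year,
           estimated = pred$fit,
           se_min    = pred$fit - pred$se.fit,
           se_max    = pred$fit + pred$se.fit)
```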
The function plot
is available in base R and can be extended by other packages, so that its output depends on the type of input. We extended it to cope with resistance predictions:
plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
+We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
ggplot_rsi_predict(predict_TZP)
+# use error bars instead of a ribbon
+ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
+example_isolates %>%
+  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
+  ggplot_rsi_predict()
+# NOTE: Using column `date` as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might also stay around 0%.
+You can define the model with the model
parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
Valid values are:
| Input values | Function used by R | Type of model |
|---|---|---|
| "binomial" or "binom" or "logit" | glm(..., family = binomial) | Generalised linear model with binomial distribution |
| "loglin" or "poisson" | glm(..., family = poisson) | Generalised linear model with poisson distribution |
| "lin" or "linear" | lm() | Linear model |
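The table can be read as a dispatch on the `model` value. The base-R sketch below uses a hypothetical `fit_trend()`, which is not a package function. It assumes a data frame with yearly columns `year`, `R` (resistant) and `n` (tested). The exact Poisson and linear formulas are illustrative choices and not necessarily the ones the package uses.

```r
fit_trend <- function(df, model = "binomial") {
  switch(model,
    binomial = , binom = , logit =
      glm(cbind(R, n - R) ~ year, data = df, family = binomial),
    loglin = , poisson =
      glm(R ~ year, data = df, family = poisson),
    lin = , linear =
      lm(I(R / n) ~ year, data = df),
    stop("unknown model: ", model)
  )
}

# small example with yearly counts (the first years of the TZP prediction above)
df <- data.frame(year = 2002:2007,
                 n    = c(15, 32, 82, 60, 59, 66),
                 R    = c(3, 2, 7, 3, 3, 8))

family(fit_trend(df, "logit"))$family    # "binomial"
family(fit_trend(df, "poisson"))$family  # "poisson"
class(fit_trend(df, "linear"))           # "lm"
```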
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate since no binomial distribution is to be expected based on the observed years:
+example_isolates %>%
+  filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+  resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
+  ggplot_rsi_predict()
+# NOTE: Using column `date` as input for `col_date`.
This seems more likely, doesn’t it?
+The model itself is also available from the object, as an attribute
:
model <- attributes(predict_TZP)$model
+
+summary(model)$family
+#
+# Family: binomial
+# Link function: logit
+
+summary(model)$coefficients
+# Estimate Std. Error z value Pr(>|z|)
+# (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05
+# year 0.09883005 0.02295317 4.305725 1.664395e-05
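The coefficients above are on the logit scale. Back-transforming with the inverse logit, p = 1 / (1 + exp(-(b0 + b1 × year))), closely reproduces the `estimated` column of `predict_TZP`. The base-R check below uses only the printed estimates.

```r
# coefficients as printed by summary(model)$coefficients above
b0 <- -200.67944891
b1 <- 0.09883005

# inverse logit: plogis(z) = 1 / (1 + exp(-z))
estimate_for <- function(year) plogis(b0 + b1 * year)

estimate_for(2002)  # about 0.0562, cf. 0.05616378 in predict_TZP
estimate_for(2030)  # about 0.4864, cf. 0.48639359 in predict_TZP
```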
inst/CITATION
+ Berends MS, Luz CF et al. (2019). AMR - An R Package for Working with Antimicrobial Resistance Data. bioRxiv, https://doi.org/10.1101/810622
+@Article{, + title = {AMR - An R Package for Working with Antimicrobial Resistance Data}, + author = {M S Berends and C F Luz and A W Friedrich and B N M Sinha and C J Albers and C Glasner}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + year = {2019}, + url = {https://doi.org/10.1101/810622}, +}+ +
Matthijs S. Berends. Author, maintainer. +
+Christian F. Luz. Author, contributor. +
+Alexander W. Friedrich. Author, thesis advisor. +
+Bhanu N. M. Sinha. Author, thesis advisor. +
+Casper J. Albers. Author, thesis advisor. +
+Corinna Glasner. Author, thesis advisor. +
+Judith M. Fonville. Contributor. +
+Erwin E. A. Hassing. Contributor. +
+Eric H. L. C. M. Hazenberg. Contributor. +
+Annick Lenglet. Contributor. +
+Bart C. Meijer. Contributor. +
+Sofia Ny. Contributor. +
+Dennis Souverein. Contributor. +
+AMR
(for R). Developed at the University of Groningen in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen.
++METHODS PAPER PREPRINTED
+
+A methods paper about this package has been preprinted at bioRxiv (DOI: 10.1101/810622). Please click here for the paper on the bioRxiv publisher’s page.
AMR
(for R)?(To find out how to conduct AMR analysis, please continue reading here to get started.)
+AMR
is a free, open-source and independent R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. Our aim is to provide a standard for clean and reproducible antimicrobial resistance data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
After installing this package, R knows ~70,000 distinct microbial species and all ~550 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-NET, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
+This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0.0 (April 2013). It was designed to work in any setting, including those with very limited resources. It was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen. This R package is actively maintained and is free software (see Copyright).
+
+ Used in more than 100 countries
Since its first public release in early 2018, this package has been downloaded from more than 100 countries (source: CRAN logs). Click the map to enlarge, to see the names of the countries.
+
This package can be used for:
+This package is available here on the official R network (CRAN), which has a peer-reviewed submission process. Install this package in R from CRAN by using the command:
+install.packages("AMR")
It will be downloaded and installed automatically. For RStudio, click on the menu Tools > Install Packages… and then type in “AMR” and press Install.
+Note: Not all functions on this website may be available in this latest release. To use all functions and data sets mentioned on this website, install the latest development version.
+The latest and unpublished development version can be installed from GitHub using:
+install.packages("remotes")
+remotes::install_github("msberends/AMR")
To find out how to conduct AMR analysis, please continue reading here to get started or click the links in the ‘How to’ menu.
+This package contains the complete taxonomic tree of almost all ~70,000 microorganisms from the authoritative and comprehensive Catalogue of Life (CoL, www.catalogueoflife.org), supplemented by data from the List of Prokaryotic names with Standing in Nomenclature (LPSN, lpsn.dsmz.de). This supplementation is needed until the CoL+ project is finished, which we await. Use catalogue_of_life_version()
to check which version of the CoL is included in this package.
Read more about which data from the Catalogue of Life are included in our manual.
+This package contains all ~550 antibiotic, antimycotic and antiviral drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD, oral and IV) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission.
+NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.
+Read more about the data from WHOCC in our manual.
+We support WHONET and EARS-Net data. Exported files from WHONET can be imported into R and can be analysed easily using this package. For educational purposes, we created an example data set WHONET
with the exact same structure as a WHONET export file. Furthermore, this package also contains a data set antibiotics with all EARS-Net antibiotic abbreviations, and knows almost all WHONET abbreviations for microorganisms. When using WHONET data as input for analysis, all input parameters will be set automatically.
Read our tutorial about how to work with WHONET data here.
+The AMR
package basically does four important things:
It cleanses existing data by providing new classes for microorganisms, antibiotics and antimicrobial results (both S/I/R and MIC). By installing this package, you teach R everything about microbiology that is needed for analysis. These functions all use intelligent rules to guess results that you would expect:
+as.mo()
to get a microbial ID. The IDs are human readable for the trained eye - the ID of Klebsiella pneumoniae is “B_KLBSL_PNMN” (B stands for Bacteria) and the ID of S. aureus is “B_STPHY_AURS”. The function takes almost any text as input that looks like the name or code of a microorganism like “E. coli”, “esco” or “esccol” and tries to find expected results using intelligent rules combined with the included Catalogue of Life data set. It only takes milliseconds to find results; please see our benchmarks. Moreover, it can group Staphylococci into coagulase negative and positive (CoNS and CoPS, see source) and can categorise Streptococci into Lancefield groups (like beta-haemolytic Streptococcus Group B, source).as.ab()
to get an antibiotic ID. Like microbial IDs, these IDs are also human readable based on those used by EARS-Net. For example, the ID of amoxicillin is AMX
and the ID of gentamicin is GEN
. The as.ab()
function also uses intelligent rules to find results, such as accepting misspellings, trade names and abbreviations used in many laboratory systems. For instance, the values “Furabid”, “Furadantin” and “nitro” all return the ID of nitrofurantoin. To accomplish this, the package contains a database with most LIS codes, official names, trade names, ATC codes, defined daily doses (DDD) and drug categories of antibiotics.as.rsi()
to get antibiotic interpretations based on raw MIC values (in mg/L) or disk diffusion values (in mm), or transform existing values to valid antimicrobial results. It produces just S, I or R based on your input and warns about invalid values. Even values like “<=0.002; S” (combined MIC/RSI) will result in “S”.as.mic()
to cleanse your MIC values. It produces a so-called factor (called ordinal in SPSS) with valid MIC values as levels. A value like “<=0.002; S” (combined MIC/RSI) will result in “<=0.002”.It enhances existing data and adds new data from data sets included in this package.
+eucast_rules()
to apply EUCAST expert rules to isolates (not the translation from MIC to R/SI values, use as.rsi()
for that).first_isolate()
to identify the first isolates of every patient using guidelines from the CLSI (Clinical and Laboratory Standards Institute).
+mdro()
to determine which micro-organisms are multi-drug resistant organisms (MDRO). It supports a variety of international guidelines, such as the MDR-paper by Magiorakos et al. (2012, PMID 21793988), the exceptional phenotype definitions of EUCAST and the WHO guideline on multi-drug resistant TB. It also supports the national guidelines of the Netherlands and Germany.mo_genus()
, mo_family()
, mo_gramstain()
or even mo_phylum()
. Use mo_snomed()
to look up any SNOMED CT code associated with a microorganism. As all these functions use as.mo()
internally, they also use the same intelligent rules for determination. For example, mo_genus("MRSA")
and mo_genus("S. aureus")
will both return "Staphylococcus"
. They also come with support for German, Dutch, Spanish, Italian, French and Portuguese. These functions can be used to add new variables to your data.ab_name()
, ab_group()
, ab_atc()
, ab_loinc()
and ab_tradenames()
to look up values. The ab_*
functions use as.ab()
internally so they support the same intelligent rules to guess the most probable result. For example, ab_name("Fluclox")
, ab_name("Floxapen")
and ab_name("J01CF05")
will all return "Flucloxacillin"
. These functions can again be used to add new variables to your data.It analyses the data with convenient functions that use well-known methods.
+susceptibility()
and resistance()
functions, or be even more specific with the proportion_R()
, proportion_IR()
, proportion_I()
, proportion_SI()
and proportion_S()
functions. Similarly, the number of isolates can be determined with the count_resistant()
, count_susceptible()
and count_all()
functions. All these functions can be used with the dplyr
package (e.g. in conjunction with summarise()
)geom_rsi()
, a function made for the ggplot2
packageresistance_predict()
functionIt teaches the user how to use all the above actions.
+example_isolates
data set. This data set contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR analysis.WHONET
data set. This data set only contains fake data, but with the exact same structure as files exported by WHONET. Read more about WHONET on its tutorial page.This R package is free, open-source software and licensed under the GNU General Public License v2.0 (GPL-2). In a nutshell, this means that this package:
+May be used for commercial purposes
May be used for private purposes
May not be used for patent purposes
May be modified, although:
+May be distributed, although:
+Comes with a LIMITATION of liability
Comes with NO warranty
NEWS.md
Function ab_from_text() to retrieve antimicrobial drug names, doses and forms of administration from clinical texts, e.g. in health care records. It also corrects for misspelling, since it uses as.ab() internally.
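The general idea of finding drug names in free text while tolerating misspellings can be sketched in base R. This is a minimal illustration under stated assumptions and is not the package's implementation, which relies on as.ab() and its full antibiotics data set. The vocabulary, the helper find_drugs_in_text() and the maximum edit distance of 2 are all hypothetical. Doses and forms of administration are ignored.

```r
# Hypothetical vocabulary: a tiny stand-in for a full list of drug names
vocabulary <- c("amoxicillin", "flucloxacillin", "ciprofloxacin", "fosfomycin")

# Hypothetical helper, NOT the package's ab_from_text(): split free text into
# words and return every vocabulary entry that approximately matches a word
find_drugs_in_text <- function(text, vocab = vocabulary, max_distance = 2) {
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  words <- words[nchar(words) >= 4]  # skip short words such as "mg" or "to"
  hits <- vapply(vocab, function(drug) {
    # utils::adist() returns the generalised Levenshtein (edit) distance
    any(utils::adist(drug, words) <= max_distance)
  }, logical(1))
  vocab[hits]
}

find_drugs_in_text("Started amoxicilin 500 mg, later switched to ciprofloxacine")
```

Matching single words against a short vocabulary keeps the sketch small. A realistic implementation would also need to handle multi-word names, abbreviations and trade names.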
Tidyverse selections for antibiotic classes, which help to select the columns of antibiotics of a specific antibiotic class without the need to define the columns or antibiotic abbreviations. They can be used in any function that allows Tidyverse selections, like dplyr::select() and tidyr::pivot_longer():
library(dplyr)

# Columns 'IPM' and 'MEM' are in the example_isolates data set
example_isolates %>%
  select(carbapenems())
#> Selecting carbapenems: `IPM` (imipenem), `MEM` (meropenem)
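Conceptually, a class-based selection maps each column name to an antibiotic class and keeps the matching columns. Below is a minimal base R sketch. It assumes a hypothetical lookup table ab_classes and a hypothetical helper columns_of_class(). The package itself uses its own antibiotics data set and Tidyverse selection helpers instead.

```r
# Hypothetical lookup table mapping antibiotic codes to a class; the package
# itself derives this information from its own antibiotics data set
ab_classes <- data.frame(
  code  = c("AMX", "AMC", "IPM", "MEM", "GEN"),
  class = c("penicillins", "penicillins", "carbapenems", "carbapenems",
            "aminoglycosides"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the columns in `data` that belong to `ab_class`,
# in the order in which they appear in `data`
columns_of_class <- function(data, ab_class, lookup = ab_classes) {
  codes <- lookup$code[lookup$class == ab_class]
  intersect(names(data), codes)
}

lab_data <- data.frame(
  patient = c("A", "B"),
  AMX = c("R", "S"),
  IPM = c("S", "S"),
  MEM = c("S", "I"),
  stringsAsFactors = FALSE
)

# Base R equivalent of selecting only the carbapenem columns
lab_data[, columns_of_class(lab_data, "carbapenems"), drop = FALSE]
```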
Added mo_domain() as an alias to mo_kingdom()
Added function filter_penicillins() to filter isolates on a specific result in any column with a name in the antimicrobial ‘penicillins’ class (more specifically: ATC subgroup Beta-lactam antibacterials, penicillins)
Added official antimicrobial names to all filter_ab_class() functions, such as filter_aminoglycosides()
Added antibiotics code “FOX1” for cefoxitin screening (abbreviation “cfsc”) to the antibiotics data set
Added Monuril as trade name for fosfomycin
susceptibility() and resistance() and all count_*() and proportion_*() functions:
dplyr::all_of()) now works again
as.ab():
as.ab(), making many more input errors translatable, such as digitalised health care records, using too few or too many vowels or consonants and many more
as.ab() would return an error on invalid input values
as.ab() function will now throw a note if more than one antimicrobial drug could be retrieved from a single input value.
eucast_rules() would not work on a tibble when the tibble or dplyr package was loaded
*_join_microorganisms() functions and bug_drug_combinations() now return the original data class (e.g. tibbles and data.tables)
rsi_df(), proportion_df() and count_df(), and fixed a bug where not all different antimicrobial results were added as rows
<mo> and <Date>
bug_drug_combinations() for when only one antibiotic was in the input data
<mo>, to highlight the %SI vs. %R
Removed code dependency on all other R packages, making this package fully independent of the development process of others. This is a major code change, but will probably not be noticeable by most users.
Making this package independent of especially the tidyverse (e.g. packages dplyr and tidyr) tremendously increases sustainability in the long term, since tidyverse functions change quite often. This is good for users, but hard for package maintainers. Most of our functions are replaced with versions that only rely on base R, which keeps this package fully functional for many years to come, without requiring a lot of maintenance to keep up with other packages anymore. Another upside is that this package can now be used with all versions of R since R-3.0.0 (April 2013). Our package is being used in settings where the resources are very limited. Fewer dependencies on newer software are helpful for such settings.
Negative effects of this change are:
freq() that was borrowed from the cleaner package was removed. Use cleaner::freq(), or run library("cleaner") before you use freq().
mo or rsi in a tibble will no longer be in colour and printing rsi in a tibble will show the class <ord>, not <rsi> anymore. This is purely a visual effect.
mo_* family (like mo_name() and mo_gramstain()) are noticeably slower when running on hundreds of thousands of rows.
mo and ab now both also inherit class character, to support any data transformation. This change invalidates code that checks for class length == 1.
first_isolate()), since some bacterial names might be renamed to other genera or other (sub)species. This is expected behaviour.
eucast_rules() function no longer applies by default the “other” rules that are made available by this package (like setting ampicillin = R when ampicillin + enzyme inhibitor = R). The default input value for rules is now c("breakpoints", "expert") instead of "all", but this can be changed by the user. To return to the old behaviour, set options(AMR.eucast_rules = "all").
antibiotics data set these two rules:
eucast_rules()
ab_url() to return the direct URL of an antimicrobial agent from the official WHO website
as.ab(), so that e.g. as.ab("ampi sul") and ab_name("ampi sul") work
ab_atc() and ab_group() now return NA if no antimicrobial agent could be found
set_mo_source() to make sure that column mo will always be the second column
p.symbol() - it was replaced with p_symbol()
read.4d(), which was only useful for reading data from an old test database.
pca() function
ggplot_pca() function
as.mo() (and consequently all mo_* functions, that use as.mo() internally):
SPE for species, like "ESCSPE" for Escherichia coli
antibiotics data set
as.rsi() for years 2010-2019 (thanks to Anthony Underwood)
Fixed important floating point error for some MIC comparisons in EUCAST 2020 guideline
Interpretation from MIC values (and disk zones) to R/SI can now be used with mutate_at() of the dplyr package:
Added antibiotic abbreviations for a laboratory manufacturer (GLIMS) for cefuroxime, cefotaxime, ceftazidime, cefepime, cefoxitin and trimethoprim/sulfamethoxazole
Added uti (as an abbreviation of urinary tract infections) as parameter to as.rsi(), so interpretation of MIC values and disk zones can be made dependent on isolates specifically from UTIs
Functions eucast_rules(), first_isolate(), mdro() and resistance_predict() will now by default only print info when R is in interactive mode (i.e. not in R Markdown)
This software is now out of beta and considered stable. Nonetheless, this package will be developed continually.
as.rsi() and inferred resistance and susceptibility using eucast_rules().
Support for LOINC codes in the antibiotics data set. Use ab_loinc() to retrieve LOINC codes, or use a LOINC code for input in any ab_* function:
Support for SNOMED CT codes in the microorganisms data set. Use mo_snomed() to retrieve SNOMED codes, or use a SNOMED code for input in any mo_* function:
mo_snomed("S. aureus")
#> [1] 115329001 3092008 113961008
mo_name(115329001)
#> [1] "Staphylococcus aureus"
mo_gramstain(115329001)
#> [1] "Gram-positive"
as.mo() function previously wrote to the package folder to improve calculation speed for previously calculated results. This is no longer the case, to comply with CRAN policies. Consequently, the function clear_mo_history() was removed.
as.rsi()
as.mo() (and consequently all mo_* functions, that use as.mo() internally):
as.mo("Methicillin-resistant S.aureus")
as.disk() limited to a maximum of 50 millimeters
tidyverse
as.ab(): support for drugs starting with “co-” like co-amoxiclav, co-trimoxazole, co-trimazine and co-trimazole (thanks to Peter Dutey)
antibiotics data set (thanks to Peter Dutey):
RIF) to rifampicin/isoniazid (RFI). Please note that the combination rifampicin/isoniazid has no DDDs defined, so e.g. ab_ddd("Rimactazid") will now return NA.
SMX) to trimethoprim/sulfamethoxazole (SXT)
microorganisms data set, which means that the new order Enterobacterales now consists of a part of the existing family Enterobacteriaceae, but that this family has been split into other families as well (like Morganellaceae and Yersiniaceae). Although published in 2016, this information is not yet in the Catalogue of Life version of 2019. All MDRO determinations with mdro() will now use the Enterobacterales order for all guidelines before 2016 that were dependent on the Enterobacteriaceae family.
Functions susceptibility() and resistance() as aliases of proportion_SI() and proportion_R(), respectively. These functions were added to make it clearer that “I” should be considered susceptible and not resistant.
library(dplyr)
example_isolates %>%
  group_by(bug = mo_name(mo)) %>%
  summarise(amoxicillin = resistance(AMX),
            amox_clav = resistance(AMC)) %>%
  filter(!is.na(amoxicillin) | !is.na(amox_clav))
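The aliases make the convention explicit: I counts towards susceptibility and never towards resistance. Below is a minimal base R sketch of that convention. It uses the hypothetical helpers prop_resistant() and prop_susceptible() rather than the package's functions. The threshold of 30 tested isolates is an assumption for illustration, and results below it return NA.

```r
# Hypothetical helpers, NOT the package's proportion_R() / proportion_SI()
prop_resistant <- function(x, minimum = 30) {
  x <- x[!is.na(x)]                         # only tested isolates count
  if (length(x) < minimum) return(NA_real_) # too few isolates to report
  mean(x == "R")
}

prop_susceptible <- function(x, minimum = 30) {
  x <- x[!is.na(x)]
  if (length(x) < minimum) return(NA_real_)
  mean(x %in% c("S", "I"))                  # S and I together
}

results <- c(rep("S", 20), rep("I", 5), rep("R", 15), NA, NA)
prop_resistant(results)    # 15 / 40 = 0.375
prop_susceptible(results)  # 25 / 40 = 0.625
```

Because I is grouped with S, the two proportions always add up to 1 for the same set of tested isolates.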
Support for a new MDRO guideline: Magiorakos AP, Srinivasan A et al. “Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance.” Clinical Microbiology and Infection (2012).
mdro() function
mdro(..., verbose = TRUE)) returns an informative data set where the reason for MDRO determination is given for every isolate, and a list of the resistant antimicrobial agents
Data set antivirals, containing all entries from the ATC J05 group with their DDDs for oral and parenteral treatment
as.mo():
Now allows “ou” where “au” should have been used and vice versa
More intelligent way of coping with some consonants like “l” and “r”
Added a score (a certainty percentage) to mo_uncertainties(), which is calculated using the Levenshtein distance:
as.mo(c("Stafylococcus aureus",
        "staphylokok aureuz"))
#> Warning:
#> Results of two values were guessed with uncertainty. Use mo_uncertainties() to review them.
#> Class 'mo'
#> [1] B_STPHY_AURS B_STPHY_AURS

mo_uncertainties()
#> "Stafylococcus aureus" -> Staphylococcus aureus (B_STPHY_AURS, score: 95.2%)
#> "staphylokok aureuz" -> Staphylococcus aureus (B_STPHY_AURS, score: 85.7%)
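To show how an edit-distance score behaves, the following base R sketch uses utils::adist(), which computes the Levenshtein distance. The normalisation (dividing by the length of the longer string) and the helper similarity_score() are assumptions for illustration. They are not the package's formula and do not reproduce the exact percentages printed above.

```r
# Hypothetical scoring rule for illustration only: 1 minus the Levenshtein
# distance divided by the length of the longer string
similarity_score <- function(input, candidate) {
  input <- tolower(input)
  candidate <- tolower(candidate)
  distance <- drop(utils::adist(input, candidate))  # number of edits needed
  1 - distance / max(nchar(input), nchar(candidate))
}

# A closer spelling gets a higher score than a more distant one
similarity_score("Stafylococcus aureus", "Staphylococcus aureus")
similarity_score("staphylokok aureuz", "Staphylococcus aureus")
```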
as.atc() - this function was replaced by ab_atc()
portion_* functions to proportion_*. All portion_* functions are still available as deprecated functions, and will return a warning when used.
as.rsi() over a data set, it will now print the guideline that will be used if it is not specified by the user
eucast_rules():
eucast_rules() are now applied first and no longer last. This is to improve the dependency on certain antibiotics for the official EUCAST rules. Please see ?eucast_rules.
as.rsi() where the input is NA
mdro() and eucast_rules()
antibiotics data set
example_isolates data set to better reflect reality
mo_info()
clean to cleaner, as this package was renamed accordingly upon CRAN request
Determination of first isolates now excludes all ‘unknown’ microorganisms by default, i.e. microbial code "UNKNOWN". They can be included with the new parameter include_unknown:
first_isolate(..., include_unknown = TRUE)
For WHONET users, this means that all records/isolates with organism code "con" (contamination) will be excluded by default, since as.mo("con") = "UNKNOWN". The function always shows a note with the number of ‘unknown’ microorganisms that were included or excluded.
For code consistency, classes ab and mo will now be preserved in any subsetting or assignment. For the sake of data integrity, this means that invalid assignments will now result in NA:
# how it works in base R:
x <- factor("A")
x[1] <- "B"
#> Warning message:
#> invalid factor level, NA generated

# how it now works similarly for classes 'mo' and 'ab':
x <- as.mo("E. coli")
x[1] <- "testvalue"
#> Warning message:
#> invalid microorganism code, NA generated
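The behaviour above can be mimicked for any S3 class with a replacement method for `[<-`. Below is a minimal sketch that assumes a hypothetical class toy_mo with a hard-coded list of valid codes. The package's own mo and ab classes validate against their full reference data instead.

```r
# Hypothetical list of valid codes and a hypothetical class "toy_mo"; this is
# not the package's 'mo' class, only an illustration of the same behaviour
valid_codes <- c("B_ESCHR_COLI", "B_STPHY_AURS")

as_toy_mo <- function(x) {
  structure(as.character(x), class = c("toy_mo", "character"))
}

# S3 method for `[<-`: invalid values become NA with a warning, like factors
`[<-.toy_mo` <- function(x, ..., value) {
  value <- as.character(value)
  invalid <- !is.na(value) & !value %in% valid_codes
  if (any(invalid)) {
    warning("invalid microorganism code, NA generated", call. = FALSE)
    value[invalid] <- NA
  }
  y <- unclass(x)   # plain character vector, so the default assignment is used
  y[...] <- value
  as_toy_mo(y)      # put the class back so it is preserved
}

x <- as_toy_mo(c("B_ESCHR_COLI", "B_STPHY_AURS"))
x[2] <- "testvalue"  # warns; the second element becomes NA
class(x)             # still "toy_mo" "character"
```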
This is important, because a value like "testvalue" could never be understood by e.g. mo_name(), although the class would suggest a valid microbial code.
Function freq() has moved to a new package, clean, since creating frequency tables does not actually fit the scope of this package. The freq() function still works, since it is re-exported from the clean package (which will be installed automatically upon updating this AMR package).
Renamed data set septic_patients to example_isolates
Function bug_drug_combinations() to quickly get a data.frame with the results of all bug-drug combinations in a data set. The column containing microorganism codes is guessed automatically and its input is transformed with mo_shortname() by default:
x <- bug_drug_combinations(example_isolates)
#> NOTE: Using column `mo` as input for `col_mo`.
x[1:4, ]
#> mo ab S I R total
#> 1 A. baumannii AMC 0 0 3 3
#> 2 A. baumannii AMK 0 0 0 0
#> 3 A. baumannii AMP 0 0 3 3
#> 4 A. baumannii AMX 0 0 3 3
#> NOTE: Use 'format()' on this result to get a publicable/printable format.

# change the transformation with the FUN argument to anything you like:
x <- bug_drug_combinations(example_isolates, FUN = mo_gramstain)
#> NOTE: Using column `mo` as input for `col_mo`.
x[1:4, ]
#> mo ab S I R total
#> 1 Gram-negative AMC 469 89 174 732
#> 2 Gram-negative AMK 251 0 2 253
#> 3 Gram-negative AMP 227 0 405 632
#> 4 Gram-negative AMX 227 0 405 632
#> NOTE: Use 'format()' on this result to get a publicable/printable format.
You can convert this to a printable format, ready for reporting or exporting to e.g. Excel, with the base R format() function:
format(x, combine_IR = FALSE)
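Conceptually, this kind of bug-drug table is a cross-tabulation of microorganism by interpretation, repeated for every antibiotic column. Below is a base R sketch under stated assumptions: the data are made up, the helper count_bug_drug() is hypothetical, and untested (NA) results are left out of the totals.

```r
# A small, made-up data set: one row per isolate, one column per antibiotic
isolates <- data.frame(
  mo  = c("E. coli", "E. coli", "E. coli", "S. aureus", "S. aureus"),
  AMX = c("R", "S", "R", "S", NA),
  GEN = c("S", "S", "I", "S", "R"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT the package's bug_drug_combinations(): count S, I
# and R per microorganism and antibiotic; table() drops NA values by default
count_bug_drug <- function(data, col_mo = "mo") {
  ab_cols <- setdiff(names(data), col_mo)
  rows <- lapply(ab_cols, function(ab) {
    tab <- table(factor(data[[col_mo]]),
                 factor(data[[ab]], levels = c("S", "I", "R")))
    out <- data.frame(mo = rownames(tab), ab = ab,
                      S = tab[, "S"], I = tab[, "I"], R = tab[, "R"],
                      stringsAsFactors = FALSE, row.names = NULL)
    out$total <- out$S + out$I + out$R
    out
  })
  do.call(rbind, rows)
}

count_bug_drug(isolates)
```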
Additional way to calculate co-resistance, i.e. when using multiple antimicrobials as input for portion_* functions or count_* functions. This can be used to determine the empiric susceptibility of a combination therapy. A new parameter only_all_tested (which defaults to FALSE) replaces the old also_single_tested and can be used to select one of the two methods to count isolates and calculate portions. The difference can be seen in this example table (which is also on the portion and count help pages), where the %SI is being determined:
# --------------------------------------------------------------------
#                      only_all_tested = FALSE   only_all_tested = TRUE
#                      -----------------------   -----------------------
#  Drug A    Drug B    include as   include as   include as   include as
#                      numerator    denominator  numerator    denominator
#  --------  --------  -----------  -----------  -----------  -----------
#  S or I    S or I    X            X            X            X
#  R         S or I    X            X            X            X
#  <NA>      S or I    X            X            -            -
#  S or I    R         X            X            X            X
#  R         R         -            X            -            X
#  <NA>      R         -            -            -            -
#  S or I    <NA>      X            X            -            -
#  R         <NA>      -            -            -            -
#  <NA>      <NA>      -            -            -            -
# --------------------------------------------------------------------
Since this is a major change, usage of the old also_single_tested will throw an informative error that it has been replaced by only_all_tested.
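The counting rules in the table above can be written out in a few lines of base R. This sketch uses the hypothetical function combination_si(), which treats an isolate as susceptible to the combination when any drug is S or I. It illustrates the two methods and is not the package's code.

```r
# Hypothetical function, not the package API: count numerator and denominator
# for the %SI of a drug combination, following the two methods in the table
combination_si <- function(..., only_all_tested = FALSE) {
  results <- cbind(...)                          # one column per drug
  is_si <- results == "S" | results == "I"       # NA where a drug was untested
  any_si <- rowSums(is_si, na.rm = TRUE) > 0     # at least one drug is S or I
  all_tested <- rowSums(is.na(results)) == 0     # no missing results at all
  if (only_all_tested) {
    numerator <- any_si & all_tested
    denominator <- all_tested
  } else {
    all_r <- all_tested & !any_si                # every drug tested and R
    numerator <- any_si
    denominator <- any_si | all_r
  }
  c(numerator = sum(numerator), denominator = sum(denominator))
}

# The nine combinations from the table, row by row
drug_a <- c("S", "R", NA, "S", "R", NA, "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA, NA, NA)
combination_si(drug_a, drug_b)                          # 5 of 6
combination_si(drug_a, drug_b, only_all_tested = TRUE)  # 3 of 4
```

With the nine rows of the table as input, the first method counts 5 of 6 isolates and the second counts 3 of 4, which matches the X marks in the table.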
tibble printing support for classes rsi, mic, disk, ab and mo. When using tibbles containing antimicrobial columns, values S will print in green, values I will print in yellow and values R will print in red. Microbial IDs (class mo) will emphasise the genus and species, not the kingdom.
as.mo() (of which some led to additions to the microorganisms data set). Many thanks to all contributors that helped improve the algorithms.
B_ENTRC_FAE could have been both E. faecalis and E. faecium. Its new code is B_ENTRC_FCLS and E. faecium has become B_ENTRC_FACM. Also, the Latin character æ (ae) is now preserved at the start of each genus and species abbreviation. For example, the old code for Aerococcus urinae was B_ARCCC_NAE. This is now B_AERCC_URIN. IMPORTANT: Old microorganism IDs are still supported, but support will be dropped in a future version. Use as.mo() on your old codes to transform them to the new format. Using functions from the mo_* family (like mo_name() and mo_gramstain()) on old codes will throw a warning.
as.ab(), including bidirectional language support
mdro() function, to determine multi-drug resistant organisms
eucast_rules():
eucast_rules(..., verbose = TRUE)) returns more informative and readable output
AMR:::get_column_abx())
atc - using as.atc() is now deprecated in favour of ab_atc() and this will return a character, not the atc class anymore
abname(), ab_official(), atc_name(), atc_official(), atc_property(), atc_tradenames(), atc_trivial_nl()
mo_shortname()
mo_* functions where the coercion uncertainties and failures would not be available through mo_uncertainties() and mo_failures() anymore
country parameter of mdro() in favour of the already existing guideline parameter to support multiple guidelines within one country
name of RIF is now Rifampicin instead of Rifampin
antibiotics data set is now sorted by name and all cephalosporins now have their generation between brackets
guess_ab_col() which is now 30 times faster for antibiotic abbreviations
filter_ab_class() to be more reliable and to support 5th generation cephalosporins
availability() now uses portion_R() instead of portion_IR(), to comply with EUCAST insights
age() and age_groups() now have a na.rm parameter to remove empty values
p.symbol() to p_symbol() (the former is now deprecated and will be removed in a future version)
x in age_groups() will now introduce NAs and not return an error anymore
key_antibiotics() on foreign systems
mdr_tb()
as.mic())
Function rsi_df() to transform a data.frame to a data set containing only the microbial interpretation (S, I, R), the antibiotic, the percentage of S/I/R and the number of available isolates. This is a convenient combination of the existing functions count_df() and portion_df() to immediately show resistance percentages and number of available isolates:
Support for all scientifically published pathotypes of E. coli to date (that we could find). Supported are:
All these lead to the microbial ID of E. coli:
as.mo("UPEC")
# B_ESCHR_COL
mo_name("UPEC")
# "Escherichia coli"
mo_gramstain("EHEC")
# "Gram-negative"
Function mo_info() as an analogy to ab_info(). The mo_info() function prints a list with the full taxonomy, authors, and the URL to the online database of a microorganism
Function mo_synonyms() to get all previously accepted taxonomic names of a microorganism
count_df() and portion_df() are now lowercase
as.ab() and as.mo() to understand even more severely misspelled input
as.ab() now allows spaces for coercing antibiotic names
ggplot2 methods for automatically determining the scale type of classes mo and ab
"bacteria" from getting coerced by as.ab() because Bacterial is a brand name of trimethoprim (TMP)
eucast_rules() and mdro()
latest_annual_release from the catalogue_of_life_version() function
PVM1 from the antibiotics data set as this was a duplicate of PME
as.mo()
plot() and barplot() for MIC and RSI classes
as.mo()
as.rsi() on an MIC value (created with as.mic()), a disk diffusion value (created with the new as.disk()) or on a complete data set containing columns with MIC or disk diffusion values.
mo_name() as alias of mo_fullname()
mdr_tb()) and added a new vignette about MDR. Read this tutorial here on our website.
Fixed a critical bug in first_isolate() where missing species would lead to incorrect FALSEs. This bug was not present in AMR v0.5.0, but was in v0.6.0 and v0.6.1.
Fixed a bug in eucast_rules() where antibiotics from WHONET software would not be recognised
Completely reworked the antibiotics data set:
All entries now have 3 different identifiers:
ab contains a human-readable EARS-Net code, used by ECDC and WHO/WHONET - this is the primary identifier used in this package
atc contains the ATC code, used by WHO/WHOCC
cid contains the CID code (Compound ID), used by PubChem
Based on the Compound ID, almost 5,000 official brand names have been added from many different countries
All references to antibiotics in our package now use EARS-Net codes, like AMX for amoxicillin
Functions atc_certe, ab_umcg and atc_trivial_nl have been removed
All atc_* functions are superseded by ab_* functions
All output will be translated by using an included translation file which can be viewed here.
Please create an issue in one of our repositories if you want additions in this file.
Improvements to plotting AMR results with ggplot_rsi():
colours to set the bar colours
title, subtitle, caption, x.title and y.title to set titles and axis descriptions
Improved intelligence of looking up antibiotic columns in a data set using guess_ab_col()
Added ~5,000 more old taxonomic names to the microorganisms.old data set, which leads to better results when using the as.mo() function
This package now honours the new EUCAST insight (2019) that S and I are both classified as susceptible, where I is defined as ‘increased exposure’ and no longer as ‘intermediate’. For functions like portion_df() and count_df() this means that their new parameter combine_SI is TRUE by default. Our plotting function ggplot_rsi() also reflects this change since it uses count_df() internally.
The age() function gained a new parameter exact to determine ages with decimals
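Whole years and decimal ages need slightly different calculations. Below is a minimal base R sketch that assumes a hypothetical helper age_years(). Whole years count completed birthdays. With exact = TRUE, it uses the common approximation of elapsed days divided by 365.25, which may differ from the package's own method.

```r
# Hypothetical helper, not the package's age()
age_years <- function(birth, reference = Sys.Date(), exact = FALSE) {
  birth <- as.Date(birth)
  reference <- as.Date(reference)
  if (exact) {
    # approximation: elapsed days divided by the mean length of a calendar year
    return(as.numeric(reference - birth) / 365.25)
  }
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(reference)
  # completed years: subtract 1 when this year's birthday has not yet occurred
  before_birthday <- r$mon < b$mon | (r$mon == b$mon & r$mday < b$mday)
  r$year - b$year - as.integer(before_birthday)
}

age_years("2000-06-15", reference = "2020-06-14")                 # 19
age_years("2000-01-01", reference = "2010-01-01", exact = TRUE)  # about 10
```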
Removed deprecated functions guess_mo(), guess_atc(), EUCAST_rules(), interpretive_reading(), rsi()
Frequency tables (freq()):
speed improvement for microbial IDs
fixed factor level names for R Markdown
when all values are unique it now shows a message instead of a warning
support for boxplots:
Removed all hardcoded EUCAST rules and replaced them with a new reference file which can be viewed here.
Please create an issue in one of our repositories if you want changes in this file.
Added ceftazidime intrinsic resistance to Streptococci
Changed default settings for age_groups(), to let groups of fives and tens end with 100+ instead of 120+
Fix for freq() for when all values are NA
Fix for first_isolate() for when dates are missing
Improved speed of guess_ab_col()
Function as.mo() now gently interprets any number of whitespace characters (like tabs) as one space
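Collapsing whitespace before matching is a one-liner in base R. The sketch below uses a hypothetical helper name. It shows the idea, not necessarily how as.mo() does it internally.

```r
# Collapse any run of whitespace characters (spaces, tabs, newlines) into a
# single space and trim both ends before looking up a name
normalise_whitespace <- function(x) {
  trimws(gsub("[[:space:]]+", " ", x))
}

normalise_whitespace("Escherichia\t\tcoli ")  # "Escherichia coli"
```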
Function as.mo() now returns UNKNOWN for "con" (WHONET ID of ‘contamination’) and returns NA for "xxx" (WHONET ID of ‘no growth’)
Small algorithm fix for as.mo()
Removed viruses from data set microorganisms.codes and cleaned it up
Fix for mo_shortname() where species would not be determined correctly
eucast_rules() with verbose = TRUE
New website!
We’ve got a new website: https://msberends.gitlab.io/AMR (built with the great pkgdown)
BREAKING: removed deprecated functions, parameters and references to ‘bactid’. Use as.mo() to identify an MO code.
Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The microorganisms data set now contains:
All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa
All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales (covering at least all species of Aspergillus, Candida, Pneumocystis, Saccharomyces and Trichophyton)
All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like Strongyloides and Taenia)
All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
The responsible author(s) and year of scientific publication
This data is updated annually - check the included version with the new function catalogue_of_life_version().
Due to this change, some mo codes changed (e.g. Streptococcus changed from B_STRPTC to B_STRPT). A translation table is used internally to support older microorganism IDs, so users will not notice this difference.
New function mo_rank() for the taxonomic rank (genus, species, infraspecies, etc.)
New function mo_url() to get the direct URL of a species from the Catalogue of Life
Support for data from WHONET and EARS-Net (European Antimicrobial Resistance Surveillance Network):
first_isolate() and eucast_rules(), all parameters will be filled in automatically.
antibiotics data set now contains a column ears_net.
as.mo() now knows all WHONET species abbreviations too, because almost 2,000 microbial abbreviations were added to the microorganisms.codes data set.
New filters for antimicrobial classes. Use these functions to filter isolates on results in one or more antibiotics from a specific class:
filter_aminoglycosides()
filter_carbapenems()
filter_cephalosporins()
filter_1st_cephalosporins()
filter_2nd_cephalosporins()
filter_3rd_cephalosporins()
filter_4th_cephalosporins()
filter_fluoroquinolones()
filter_glycopeptides()
filter_macrolides()
filter_tetracyclines()
The antibiotics data set will be searched, after which the input data will be checked for column names with a value in any abbreviations, codes or official names found in the antibiotics data set. For example:
septic_patients %>% filter_glycopeptides(result = "R")
# Filtering on glycopeptide antibacterials: any of `vanc` or `teic` is R
septic_patients %>% filter_glycopeptides(result = "R", scope = "all")
# Filtering on glycopeptide antibacterials: all of `vanc` and `teic` is R
All ab_* functions are deprecated and replaced by atc_* functions:
ab_property   -> atc_property()
ab_name       -> atc_name()
ab_official   -> atc_official()
ab_trivial_nl -> atc_trivial_nl()
ab_certe      -> atc_certe()
ab_umcg       -> atc_umcg()
ab_tradenames -> atc_tradenames()
These functions use as.atc() internally. The old atc_property has been renamed atc_online_property(). This is done for two reasons: firstly, not all ATC codes are of antibiotics (ab) but can also be of antivirals or antifungals. Secondly, the input must have class atc or must be coercible to this class. Properties of these classes should start with the same class name, analogous to as.mo() and e.g. mo_genus.
New functions set_mo_source() and get_mo_source() to use your own predefined MO codes as input for as.mo() and consequently all mo_* functions
Support for the upcoming dplyr version 0.8.0
New function guess_ab_col() to find an antibiotic column in a table
New function mo_failures() to review values that could not be coerced to a valid MO code, using as.mo(). This latter function will now only show a maximum of 10 uncoerced values and will refer to mo_failures().
New function mo_uncertainties() to review values that could be coerced to a valid MO code using as.mo(), but with uncertainty.
New function mo_renamed() to get a list of all returned values from as.mo() that have had taxonomic renaming
New function age() to calculate the (patient's) age in years
New function age_groups() to split ages into custom or predefined groups (like children or elderly). This allows for easier demographic antimicrobial resistance analysis per age group.
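Splitting ages into groups amounts to cutting a numeric vector at a set of lower bounds. Below is a base R sketch using cut() with a hypothetical helper split_ages(). The split points c(12, 25, 55, 75) are an illustrative choice here, and the last group is open-ended.

```r
# Hypothetical helper, not the package's age_groups(): each split point is the
# lower bound of a new group; intervals are closed on the left
split_ages <- function(x, split_at = c(12, 25, 55, 75)) {
  lower <- c(0, split_at)
  labels <- c(paste0(lower[-length(lower)], "-", split_at - 1),
              paste0(split_at[length(split_at)], "+"))
  cut(x, breaks = c(lower, Inf), labels = labels, right = FALSE)
}

split_ages(c(3, 12, 40, 80))
```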
New function ggplot_rsi_predict() as well as the base R plot() function can now be used for resistance prediction calculated with resistance_predict():
x <- resistance_predict(septic_patients, col_ab = "amox")
plot(x)
ggplot_rsi_predict(x)
Functions filter_first_isolate() and filter_first_weighted_isolate() to shorten and speed up filtering on data sets with antimicrobial results, e.g.:
septic_patients %>% filter_first_isolate(...)
# or
filter_first_isolate(septic_patients, ...)
is equal to:
septic_patients %>%
  mutate(only_firsts = first_isolate(septic_patients, ...)) %>%
  filter(only_firsts == TRUE) %>%
  select(-only_firsts)
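The selection that first_isolate() performs can be approximated in base R. This sketch rests on several assumptions. The helper flag_first_isolates() is hypothetical. An isolate counts as first when no earlier first isolate of the same patient and microorganism lies within episode_days (365 here). The package's additional options, such as key antibiotics or specimen groups, are not modelled.

```r
# Hypothetical helper, not the package's first_isolate()
flag_first_isolates <- function(patient, mo, date, episode_days = 365) {
  date <- as.Date(date)
  is_first <- logical(length(date))
  # one group of row numbers per combination of patient and microorganism
  groups <- split(seq_along(date), paste(patient, mo, sep = "|"))
  for (rows in groups) {
    rows <- rows[order(date[rows])]  # chronological order within the group
    episode_start <- as.Date(NA)
    for (i in rows) {
      if (is.na(date[i])) next       # isolates without a date are never first
      if (is.na(episode_start) ||
          as.numeric(date[i] - episode_start) >= episode_days) {
        is_first[i] <- TRUE          # this isolate starts a new episode
        episode_start <- date[i]
      }
    }
  }
  is_first
}

isolates <- data.frame(
  patient = c("A", "A", "A", "B"),
  mo = "E. coli",
  date = c("2020-01-01", "2020-03-01", "2021-02-01", "2020-03-01"),
  stringsAsFactors = FALSE
)
# Keep only first isolates, similar in spirit to filter_first_isolate()
isolates[flag_first_isolates(isolates$patient, isolates$mo, isolates$date), ]
```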
New function availability() to check the number of available (non-empty) results in a data.frame
New vignettes about how to conduct AMR analysis, predict antimicrobial resistance, use the G-test and more. These are also available (and even easier to read) on our website: https://msberends.gitlab.io/AMR.
eucast_rules():
septic_patients now reflects these changes
eucast_rules(..., verbose = TRUE) to get a data set with all changes per bug and drug combination.
microorganisms.oldDT, microorganisms.prevDT, microorganisms.unprevDT and microorganismsDT since they were no longer needed and only contained info already available in the microorganisms data set
antibiotics data set, from the Pharmaceuticals Community Register of the European Commission
atc_group1_nl and atc_group2_nl from the antibiotics data set
atc_ddd() and atc_groups() have been renamed atc_online_ddd() and atc_online_groups(). The old functions are deprecated and will be removed in a future version.
guess_mo() is now deprecated in favour of as.mo() and will be removed in future versions
guess_atc() is now deprecated in favour of as.atc() and will be removed in future versions
as.mo():
Now handles incorrect spelling, like i instead of y and f instead of ph:
# mo_fullname() uses as.mo() internally

mo_fullname("Sthafilokockus aaureuz")
#> [1] "Staphylococcus aureus"

mo_fullname("S. klossi")
#> [1] "Staphylococcus kloosii"
Uncertainty of the algorithm is now divided into four levels, 0 to 3, where the default allow_uncertain = TRUE is equal to uncertainty level 2. Run ?as.mo for more info about these levels.
# equal:
as.mo(..., allow_uncertain = TRUE)
as.mo(..., allow_uncertain = 2)

# also equal:
as.mo(..., allow_uncertain = FALSE)
as.mo(..., allow_uncertain = 0)
Using as.mo(..., allow_uncertain = 3) could lead to very unreliable results.
Implemented the latest publication of Becker et al. (2019), for categorising coagulase-negative Staphylococci
All microbial IDs that were found are now saved to a local file ~/.Rhistory_mo. Use the new function clean_mo_history() to delete this file, which resets the algorithms.
Incoercible results will now be considered ‘unknown’, MO code UNKNOWN. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:
mo_genus("qwerty", language = "es")
# Warning:
# one unique value (^= 100.0%) could not be coerced and is considered 'unknown': "qwerty". Use mo_failures() to review it.
#> [1] "(género desconocido)"
Fix for vector containing only empty values
Finds better results when input is in other languages
Better handling for subspecies
Better handling for Salmonellae, especially the ‘city like’ serovars like Salmonella London
Understanding of highly virulent E. coli strains like EIEC, EPEC and STEC
Uncertain results are now looked for by default - these results will be returned with an informative warning
Manual (help page) now contains more info about the algorithms
Progress bar will be shown when it takes more than 3 seconds to get results
Support for formatted console text
Console will return the percentage of uncoercible input
first_isolate():
septic_patients data set this yielded a difference of 0.15% more isolates
col_patientid), when this parameter was left blank
col_keyantibiotics()), when this parameter was left blank
output_logical, the function will now always return a logical value
filter_specimen to specimen_group, although using filter_specimen will still work
portion functions, that low counts can influence the outcome and that the portion functions may camouflage this, since they only return the portion (albeit being dependent on the minimum parameter)
microorganisms.certe and microorganisms.umcg into microorganisms.codes
mo_taxonomy() now contains the kingdom too
is.rsi.eligible() using the new threshold parameter
scale_rsi_colours()
mo will now return the top 3 and the unique count, e.g. using summary(mo)
rsi and mic
as.rsi():
"HIGH S" will return S
freq() function):
Support for tidyverse quasiquotation! Now you can create frequency tables of function outcomes:
# Determine genus of microorganisms (mo) in `septic_patients` data set:
# OLD WAY
septic_patients %>%
  mutate(genus = mo_genus(mo)) %>%
  freq(genus)
# NEW WAY
septic_patients %>%
  freq(mo_genus(mo))

# Even supports grouping variables:
septic_patients %>%
  group_by(gender) %>%
  freq(mo_genus(mo))
Header info is now available as a list, with the header function
The parameter header is now set to TRUE by default, even for markdown
Added header info for class mo to show unique count of families, genera and species
Now honours the decimal.mark setting, which, just like format, defaults to getOption("OutDec")
The new big.mark parameter will by default be "," when decimal.mark = "." and "." otherwise
Fix for header text where all observations are NA
New parameter droplevels to exclude empty factor levels when input is a factor
Factor levels will be in header when present in input data (maximum of 5)
Fix for using select() on frequency tables
scale_y_percent() now contains the limits parameter
mdro(), key_antibiotics() and eucast_rules()
resistance_predict() function)
as.mic() to support more values ending in (several) zeroes
%like%, it will now return the call
count_all to get all available isolates (which, like all portion_* and count_* functions, also supports summarise and group_by), the old n_rsi is now an alias of count_all
get_locale to determine language for language-dependent output for some mo_* functions. This is now the default value for their language parameter, by which the system language will be used by default.
microorganismsDT, microorganisms.prevDT, microorganisms.unprevDT and microorganisms.oldDT to improve the speed of as.mo. They are for reference only, since they are primarily for internal use of as.mo.
read.4D to read from the 4D database of the MMB department of the UMCG
mo_authors and mo_year to get specific values about the scientific reference of a taxonomic entry
Functions MDRO, BRMO, MRGN and EUCAST_exceptional_phenotypes were renamed to mdro, brmo, mrgn and eucast_exceptional_phenotypes
EUCAST_rules
was renamed to eucast_rules
, the old function still exists as a deprecated function
Big changes to the eucast_rules function:
rules to specify which rules should be applied (expert rules, breakpoints, others or all)
verbose which can be set to TRUE to get very specific messages about which columns and rows were affected
septic_patients now reflects these changes
pipe for piperacillin (J01CA12), also to the mdro function
Added column kingdom to the microorganisms data set, and function mo_kingdom to look up values
Tremendous speed improvement for as.mo (and subsequently all mo_* functions), as empty values will be ignored a priori
Fewer than 3 characters as input for as.mo will return NA
Function as.mo (and all mo_* wrappers) now supports genus abbreviations with “species” attached

    as.mo("E. species")        # B_ESCHR
    mo_fullname("E. spp.")     # "Escherichia species"
    as.mo("S. spp")            # B_STPHY
    mo_fullname("S. species")  # "Staphylococcus species"
Added parameter combine_IR (TRUE/FALSE) to functions portion_df and count_df, to indicate that all values of I and R must be merged into one, so the output only consists of S vs. IR (susceptible vs. non-susceptible)
Fix for portion_*(..., as_percent = TRUE) when minimal number of isolates would not be met
Added parameter also_single_tested for portion_* and count_* functions to also include cases where not all antibiotics were tested but at least one of the tested antibiotics includes the target antimicrobial interpretation, see ?portion
Using portion_* functions now throws a warning when the total number of available isolates is below parameter minimum
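The portion_* logic and the minimum check described above can be approximated in a few lines of base R. This sketch rests on simplifying assumptions: portion_sketch() is a hypothetical name, the input is a plain character vector of "S", "I" and "R", and the warning text was written for this example. Merging I and R into one category, as combine_IR does, corresponds to asking for "IR", which together with "S" sums to 1.

```r
# Hypothetical sketch of S/I/R proportions; portion_sketch() is illustrative
# only and does not reproduce the AMR source code.
portion_sketch <- function(x, include = "R", minimum = 30) {
  x <- as.character(x)
  x <- x[!is.na(x) & x %in% c("S", "I", "R")]   # keep valid interpretations only
  n <- length(x)
  if (n < minimum) {
    warning("only ", n, " isolates available, fewer than minimum = ", minimum)
    return(NA_real_)
  }
  wanted <- strsplit(include, "")[[1]]          # e.g. "IR" -> c("I", "R")
  sum(x %in% wanted) / n
}

results <- c(rep("S", 6), "I", rep("R", 3), NA)
portion_sketch(results, "R", minimum = 1)    # 0.3
portion_sketch(results, "IR", minimum = 1)   # 0.4 (non-susceptible)
portion_sketch(results, "S", minimum = 1)    # 0.6
```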
Functions as.mo, as.rsi, as.mic, as.atc and freq will not set package name as attribute anymore
Frequency tables - freq():
Support for grouping variables, test with:

    septic_patients %>%
      group_by(hospital_id) %>%
      freq(gender)

Support for (un)selecting columns:

    septic_patients %>%
      freq(hospital_id) %>%
      select(-count, -cum_count) # only get item, percent, cum_percent

Check for hms::is.hms
Now prints in markdown by default in non-interactive sessions
No longer adds the factor level column and sorts factors on count again
Support for class difftime
New parameter na, to choose which character to print for empty values
New parameter header to turn the header info off (default when markdown = TRUE)
New parameter title to manually set the title of the frequency table
first_isolate now tries to find columns to use as input when parameters are left blank
Improvements for MDRO algorithm (function mdro)
Data set septic_patients is now a data.frame, not a tibble anymore
Removed diacritics from all authors (columns microorganisms$ref and microorganisms.old$ref) to comply with CRAN policy to only allow ASCII characters
Fix for mo_property not working properly
Fix for eucast_rules where some Streptococci would become ceftazidime R in EUCAST rule 4.5
Support for named vectors of class mo, useful for top_freq()
ggplot_rsi and scale_y_percent have a breaks parameter
AI improvements for as.mo:
"CRS" -> Stenotrophomonas maltophilia
"CRSM" -> Stenotrophomonas maltophilia
"MSSA" -> Staphylococcus aureus
"MSSE" -> Staphylococcus epidermidis
Fix for join functions
Speed improvement for is.rsi.eligible, now 15-20 times faster
In g.test, when sum(x) is below 1000 or any of the expected values is below 5, Fisher’s Exact Test will be suggested
ab_name will try to fall back on as.atc when no results are found
Removed the addin to view data sets
Percentages will now be rounded more logically (e.g. in the freq function)
The data set microorganisms now contains all microbial taxonomic data from ITIS (kingdoms Bacteria, Fungi and Protozoa), the Integrated Taxonomic Information System, available via https://itis.gov. The data set now contains more than 18,000 microorganisms with all known bacteria, fungi and protozoa according to ITIS with genus, species, subspecies, family, order, class, phylum and subkingdom. The new data set microorganisms.old contains all previously known taxonomic names from those kingdoms.
New functions based on the existing function mo_property:
mo_phylum, mo_class, mo_order, mo_family, mo_genus, mo_species, mo_subspecies
mo_fullname, mo_shortname
mo_type, mo_gramstain
mo_ref
They also come with support for German, Dutch, French, Italian, Spanish and Portuguese:

    mo_gramstain("E. coli")
    # [1] "Gram negative"
    mo_gramstain("E. coli", language = "de") # German
    # [1] "Gramnegativ"
    mo_gramstain("E. coli", language = "es") # Spanish
    # [1] "Gram negativo"
    mo_fullname("S. group A", language = "pt") # Portuguese
    # [1] "Streptococcus grupo A"

Furthermore, former taxonomic names will give a note about the current taxonomic name:

    mo_gramstain("Esc blattae")
    # Note: 'Escherichia blattae' (Burgess et al., 1973) was renamed 'Shimwellia blattae' (Priest and Barker, 2010)
    # [1] "Gram negative"
Functions count_R, count_IR, count_I, count_SI and count_S to selectively count resistant or susceptible isolates
count_df (which works like portion_df) to get all counts of S, I and R of a data set with antibiotic columns, with support for grouped variables
Function is.rsi.eligible to check for columns that have valid antimicrobial results, but do not have the rsi class yet. Transform the columns of your raw data with: data %>% mutate_if(is.rsi.eligible, as.rsi)
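One way to decide whether a column "looks like" antimicrobial interpretations is to measure the share of non-empty values that are not S, I or R. The sketch below assumes that rule and a tolerance of 5%. The helper name is_rsi_eligible_sketch() and the default threshold are assumptions made for illustration, not the package's actual criteria.

```r
# Hypothetical eligibility check: a column qualifies when at most `threshold`
# of its non-empty values fall outside S, I and R (case-insensitive).
is_rsi_eligible_sketch <- function(x, threshold = 0.05) {
  if (!is.character(x) && !is.factor(x)) return(FALSE)
  x <- toupper(trimws(as.character(x)))
  x <- x[!is.na(x) & x != ""]
  if (length(x) == 0) return(FALSE)
  invalid_share <- mean(!x %in% c("S", "I", "R"))
  invalid_share <= threshold
}

raw <- data.frame(id = 1:3, amox = c("S", "R", "r"), stringsAsFactors = FALSE)
vapply(raw, is_rsi_eligible_sketch, logical(1))   # id is FALSE, amox is TRUE
```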
Functions as.mo and is.mo as replacements for as.bactid and is.bactid (since the microorganisms data set not only contains bacteria). These last two functions are deprecated and will be removed in a future release. The as.mo function determines microbial IDs using intelligent rules:

    as.mo("E. coli")
    # [1] B_ESCHR_COL
    as.mo("MRSA")
    # [1] B_STPHY_AUR
    as.mo("S group A")
    # [1] B_STRPTC_GRA

And with great speed too: on a quite regular Linux server from 2007 it takes us less than 0.02 seconds to transform 25,000 items:

    thousands_of_E_colis <- rep("E. coli", 25000)
    microbenchmark::microbenchmark(as.mo(thousands_of_E_colis), unit = "s")
    # Unit: seconds
    #         min     median        max neval
    #  0.01817717 0.01843957 0.03878077   100
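The timing above comes from the package authors. A general technique that makes this kind of throughput possible for repetitive input is to translate each distinct, non-empty value only once and then expand the result with match(). The sketch below shows that pattern. slow_lookup() is a hypothetical stand-in for an expensive translation, and we do not claim this is exactly how as.mo() is optimised.

```r
# slow_lookup() is a hypothetical stand-in for an expensive per-value translation.
slow_lookup <- function(values) {
  vapply(values, function(v) paste0("ID_", toupper(gsub("[^A-Za-z]", "", v))),
         character(1), USE.NAMES = FALSE)
}

# Translate every distinct non-empty value once, then expand to the input length.
lookup_unique <- function(x) {
  ux <- unique(x)
  ux_nonempty <- ux[!is.na(ux) & ux != ""]   # empty values are skipped up front
  translated <- slow_lookup(ux_nonempty)
  translated[match(x, ux_nonempty)]          # NA for empty or missing input
}

lookup_unique(c("E. coli", "", NA, "E. coli", "S. aureus"))
```

With 25,000 copies of "E. coli", slow_lookup() runs once instead of 25,000 times.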
Added parameter reference_df for as.mo, so users can supply their own microbial IDs, names or codes as a reference table
Renamed all previous references to bactid to mo, like:
EUCAST_rules, first_isolate and key_antibiotics
microorganisms and septic_patients
Function labels_rsi_count to print data labels on an RSI ggplot2 model
Functions as.atc and is.atc to transform/look up antibiotic ATC codes as defined by the WHO. The existing function guess_atc is now an alias of as.atc.
Function ab_property and its aliases: ab_name, ab_tradenames, ab_certe, ab_umcg and ab_trivial_nl
Introduction to AMR as a vignette
Removed clipboard functions as they violated the CRAN policy
Renamed septic_patients$sex to septic_patients$gender
Added three antimicrobial agents to the antibiotics data set: Terbinafine (D01BA02), Rifaximin (A07AA11) and Isoconazole (D01AC05)
Added 163 trade names to the antibiotics data set, it now contains 298 different trade names in total, e.g.:
For first_isolate, rows will be ignored when there’s no species available
Function ratio is now deprecated and will be removed in a future release, as it is not really the scope of this package
Fix for as.mic for values ending in zeroes after a real number
Small fix where B. fragilis would not be found in the microorganisms.umcg data set
Added prevalence column to the microorganisms data set
Added parameters minimum and as_percent to portion_df
Support for quasiquotation in the functions series count_* and portion_*, and n_rsi. This allows checking more than 2 vectors or columns.
Edited ggplot_rsi and geom_rsi so they can cope with count_df. The new fun parameter has value portion_df by default, but can be set to count_df.
Fix for ggplot_rsi when the ggplot2 package was not loaded
Added data labels function labels_rsi_count to ggplot_rsi
Added possibility to set any parameter to geom_rsi (and ggplot_rsi) so you can set your own preferences
Fix for joins, where predefined suffixes would not be honoured
Added parameter quote to the freq function
Added generic function diff for frequency tables
Added longest and shortest character length in the frequency table (freq) header of class character
Support for types (classes) list and matrix for freq
For lists, subsetting is possible:

    my_list = list(age = septic_patients$age, gender = septic_patients$gender)
    my_list %>% freq(age)
    my_list %>% freq(gender)
rsi_df was removed in favour of new functions portion_R, portion_IR, portion_I, portion_SI and portion_S to selectively calculate resistance or susceptibility. These functions are 20 to 30 times faster than the old rsi function. The old function still works, but is deprecated.
portion_df to get all portions of S, I and R of a data set with antibiotic columns, with support for grouped variables
ggplot2
geom_rsi, facet_rsi, scale_y_percent, scale_rsi_colours and theme_rsi
ggplot_rsi to apply all above functions on a data set:
septic_patients %>% select(tobr, gent) %>% ggplot_rsi will show portions of S, I and R immediately in a pretty plot
?ggplot_rsi
as.bactid and is.bactid to transform/look up microbial IDs.
guess_bactid is now an alias of as.bactid
kurtosis and skewness that are lacking in base R - they are generic functions and have support for vectors, data.frames and matrices
g.test to perform the χ²-distributed G-test, whose use is the same as chisq.test
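As a reference point for the skewness and kurtosis functions mentioned above, the moment-based definitions can be written in base R as follows. These are hypothetical helpers for numeric vectors only, and this document does not state whether the package uses these population formulas or a bias-corrected variant. Subtracting 3 from the kurtosis gives the excess kurtosis.

```r
# Moment-based skewness and kurtosis for a numeric vector (hypothetical helpers).
skewness_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

kurtosis_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

skewness_sketch(1:5)   # 0, the values are symmetric around the mean
kurtosis_sketch(1:5)   # 1.7
```

For a data.frame, the helpers can be applied per column, e.g. sapply(df, skewness_sketch).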
ratio to transform a vector of values to a preset ratio
For example, ratio(c(10, 500, 10), ratio = "1:2:1") would return 130, 260, 130
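The ratio example works out because the total (10 + 500 + 10 = 520) is divided over 1 + 2 + 1 = 4 parts, giving 130 per part. The arithmetic fits in a few lines of base R. ratio_sketch() is a hypothetical name, not the package function.

```r
# Redistribute the total of x according to a ratio string such as "1:2:1".
ratio_sketch <- function(x, ratio) {
  parts <- as.numeric(strsplit(ratio, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) == length(x))
  sum(x) * parts / sum(parts)
}

ratio_sketch(c(10, 500, 10), ratio = "1:2:1")   # 130 260 130
```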
%in% or %like% (and give them keyboard shortcuts), or to view the datasets that come with this package
p.symbol to transform p values to their related symbols: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
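The symbol mapping above matches the significance codes that R prints next to model coefficients. Here is a short sketch with cut(). p_symbol_sketch() is a hypothetical name, and base R's stats::symnum() can produce the same symbols.

```r
# Map p values to significance symbols using right-closed intervals:
# [0, 0.001] '***', (0.001, 0.01] '**', (0.01, 0.05] '*', (0.05, 0.1] '.', (0.1, 1] ' '
p_symbol_sketch <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE, right = TRUE))
}

p_symbol_sketch(c(0.0005, 0.005, 0.03, 0.07, 0.5))
```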
clipboard_import and clipboard_export as helper functions to quickly copy and paste from/to software like Excel and SPSS. These functions use the clipr package, but are a little altered to also support headless Linux servers (so you can use them in RStudio Server)
freq:
rsi (antimicrobial resistance) to use as input
table to use as input: freq(table(x, y))
hist and plot to use a frequency table as input: hist(freq(df$age))
as.vector, as.data.frame, as_tibble and format
freq(mydata, mycolumn) is the same as mydata %>% freq(mycolumn)
top_freq function to return the top/below n items as vector
options(max.print.freq = n) where n is your preset value
resistance_predict and added more examples
septic_patients data set to better reflect the reality
mic and rsi classes now return all values - use freq to check distributions
key_antibiotics function are now generic: 6 for broadspectrum ABs, 6 for Gram-positive specific and 6 for Gram-negative specific ABs
abname function
%like% now supports multiple patterns
data.frames with altered console printing to make it look like a frequency table. Because of this, the parameter toConsole is no longer needed.
freq where the class of an item would be lost
septic_patients dataset and the column bactid now has the new class "bactid"
microorganisms dataset (especially for Salmonella) and the column bactid now has the new class "bactid"
rsi and mic functions:
as.rsi("<=0.002; S") will return S
as.mic("<=0.002; S") will return <=0.002
as.mic("<= 0.002") now works
rsi and mic do not add the attribute package.version anymore
"groups" option for atc_property(..., property). It will return a vector of the ATC hierarchy as defined by the WHO. The new function atc_groups is a convenient wrapper around this.
atc_property as it requires the host set by url to be responsive
first_isolate algorithm to exclude isolates where bacteria ID or genus is unavailable
924b62) from the dplyr package v0.7.5 and above
guess_bactid (now called as.bactid)
yourdata %>% select(genus, species) %>% as.bactid() now also works
n_rsi to count cases where antibiotic test results were available, to be used in conjunction with dplyr::summarise, see ?rsi
guess_bactid to determine the ID of a microorganism based on genus/species or known abbreviations like MRSA
guess_atc to determine the ATC of an antibiotic based on name, trade name, or known abbreviations
freq to create frequency tables, with additional info in a header
MDRO to determine Multi Drug Resistant Organisms (MDRO) with support for country-specific guidelines.
BRMO and MRGN are wrappers for Dutch and German guidelines, respectively
"points" or "keyantibiotics", see ?first_isolate
tibbles and data.tables
rsi class for vectors that contain only invalid antimicrobial interpretations
ablist to antibiotics
bactlist to microorganisms
antibiotics dataset
microorganisms dataset
septic_patients
join functions
%like% to make it case insensitive
first_isolate and EUCAST_rules column names are now case-insensitive
as.rsi and as.mic now add the package name and version as attributes
README.md with more examples
testthat package
EUCAST_rules applies for amoxicillin even if ampicillin is missing
rsi and mic classes

These functions are so-called 'Deprecated'. They will be removed in a future release. Using the functions will give a warning with the name of the function it has been replaced by (if there is one).
    portion_R(...)
    portion_IR(...)
    portion_I(...)
    portion_SI(...)
    portion_S(...)
    portion_df(...)
The lifecycle of this function is retired. A retired function is no longer under active development, and (if appropriate) a better alternative is available. No new arguments will be added, and only the most critical bugs will be fixed. In a future version, this function will be removed.
On our website https://msberends.github.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
Welcome to the AMR package.
AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any table format, including WHONET/EARS-Net data.
We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation.
This package can be used for:
Reference for the taxonomy of microorganisms, since the package contains all microbial (sub)species from the Catalogue of Life
Interpreting raw MIC and disk diffusion values, based on the latest CLSI or EUCAST guidelines
Retrieving antimicrobial drug names, doses and forms of administration from clinical health care records
Determining first isolates to be used for AMR analysis
Calculating antimicrobial resistance
Determining multi-drug resistance (MDR) / multi-drug resistant organisms (MDRO)
Calculating (empirical) susceptibility of both mono therapy and combination therapies
Predicting future antimicrobial resistance using regression models
Getting properties for any microorganism (like Gram stain, species, genus or family)
Getting properties for any antibiotic (like name, EARS-Net code, ATC code, PubChem code, defined daily dose or trade name)
Plotting antimicrobial resistance
Getting SNOMED codes of a microorganism, or get its name associated with a SNOMED code
Getting LOINC codes of an antibiotic, or get its name associated with a LOINC code
Machine reading the EUCAST and CLSI guidelines from 2011-2020 to translate MIC values and disk diffusion diameters to R/SI
Principal component analysis for AMR
For suggestions, comments or questions, please contact us at:
Matthijs S. Berends
m.s.berends [at] umcg [dot] nl
University of Groningen
Department of Medical Microbiology
University Medical Center Groningen
Post Office Box 30001
9700 RB Groningen
The Netherlands
If you have found a bug, please file a new issue at:
https://github.com/msberends/AMR/issues
All antimicrobial drugs and their official names, ATC codes, ATC groups and defined daily dose (DDD) are included in this package, using the WHO Collaborating Centre for Drug Statistics Methodology.
This package contains all ~550 antibiotic, antimycotic and antiviral drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission (http://ec.europa.eu/health/documents/community-register/html/atc.htm).
These have become the gold standard for international drug utilisation monitoring and research.
The WHOCC is located in Oslo at the Norwegian Institute of Public Health and funded by the Norwegian government. The European Commission is the executive of the European Union and promotes its general interest.
NOTE: The WHOCC copyright does not allow use for commercial purposes, unlike any other info from this package. See https://www.whocc.no/copyright_disclaimer/.
    as.ab("meropenem")
    ab_name("J01DH02")

    ab_tradenames("flucloxacillin")
This example data set has the exact same structure as an export file from WHONET. Such files can be used with this package, as this example data set shows. The data itself was based on our example_isolates data set.
WHONET
A data.frame with 500 observations and 53 variables:
Identification number
ID of the sample
Specimen number
ID of the specimen
Organism
Name of the microorganism. Before analysis, you should transform this to a valid microbial class, using as.mo().
Country
Country of origin
Laboratory
Name of laboratory
Last name
Last name of patient
First name
Initial of patient
Sex
Gender of patient
Age
Age of patient
Age category
Age group, can also be looked up using age_groups()
Date of admission
Date of hospital admission
Specimen date
Date when specimen was received at laboratory
Specimen type
Specimen type or group
Specimen type (Numeric)
Translation of "Specimen type"
Reason
Reason of request with Differential Diagnosis
Isolate number
ID of isolate
Organism type
Type of microorganism, can also be looked up using mo_type()
Serotype
Serotype of microorganism
Beta-lactamase
Microorganism produces beta-lactamase?
ESBL
Microorganism produces extended spectrum beta-lactamase?
Carbapenemase
Microorganism produces carbapenemase?
MRSA screening test
Microorganism is possible MRSA?
Inducible clindamycin resistance
Clindamycin can be induced?
Comment
Other comments
Date of data entry
Date this data was entered in WHONET
AMP_ND10:CIP_EE
28 different antibiotics. You can look up the abbreviations in the antibiotics data set, or use e.g. ab_name("AMP") to get the official name immediately. Before analysis, you should transform this to a valid antibiotic class, using as.rsi().
R/ab_from_text.R
ab_from_text.Rd
Use this function on e.g. clinical texts from health care records. It returns a list with all antimicrobial drugs, doses and forms of administration found in the texts.
    ab_from_text(
      text,
      type = c("drug", "dose", "administration"),
      collapse = NULL,
      translate_ab = FALSE,
      thorough_search = NULL,
      ...
    )
| Argument | Description |
|---|---|
| text | text to analyse |
| type | type of property to search for, either "drug", "dose" or "administration" |
| collapse | character to pass on to |
| translate_ab | if |
| thorough_search | logical to indicate whether the input must be extensively searched for misspelling and other faulty input values. Setting this to |
| ... | parameters passed on to |
A list, or a character if collapse is not NULL
This function is also internally used by as.ab(), although it then only searches for the first drug name and will throw a note if more drug names could have been returned.
type
By default, the function will search for antimicrobial drug names. All text elements will be searched for official names, ATC codes and brand names. As it uses as.ab() internally, it will correct for misspelling.
With type = "dose" (or similar, like "dosing", "doses"), all text elements will be searched for numeric values that are higher than 100 and do not resemble years. The output will be numeric. It supports any unit (g, mg, IE, etc.) and multiple values in one clinical text, see Examples.
With type = "administration" (or abbreviations, like "admin", "adm"), all text elements will be searched for a form of drug administration. It supports the following forms (including common abbreviations): buccal, implant, inhalation, instillation, intravenous, nasal, oral, parenteral, rectal, sublingual, transdermal and vaginal. Abbreviations for oral (such as 'po', 'per os') will become "oral", and all values for intravenous (such as 'iv', 'intraven') will become "iv". It supports multiple values in one clinical text, see Examples.
collapse
Without using collapse, this function will return a list. This can be convenient to use e.g. inside a mutate():

    df %>% mutate(abx = ab_from_text(clinical_text))

The returned AB codes can be transformed to official names, groups, etc. with all ab_property() functions like ab_name() and ab_group(), or by using the translate_ab parameter.
When using collapse, this function will return a character:

    df %>% mutate(abx = ab_from_text(clinical_text, collapse = "|"))
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
    # mind the bad spelling of amoxicillin in this line,
    # straight from a true health care record:
    ab_from_text("28/03/2020 regular amoxicilliin 500mg po tds")

    ab_from_text("500 mg amoxi po and 400mg cipro iv")
    ab_from_text("500 mg amoxi po and 400mg cipro iv", type = "dose")
    ab_from_text("500 mg amoxi po and 400mg cipro iv", type = "admin")

    ab_from_text("500 mg amoxi po and 400mg cipro iv", collapse = ", ")

    # if you want to know which antibiotic groups were administered, do e.g.:
    abx <- ab_from_text("500 mg amoxi po and 400mg cipro iv")
    ab_group(abx[[1]])

    if (require(dplyr)) {
      tibble(clinical_text = c("given 400mg cipro and 500 mg amox",
                               "started on doxy iv today")) %>%
        mutate(abx_codes = ab_from_text(clinical_text),
               abx_doses = ab_from_text(clinical_text, type = "doses"),
               abx_admin = ab_from_text(clinical_text, type = "admin"),
               abx_coll = ab_from_text(clinical_text, collapse = "|"),
               abx_coll_names = ab_from_text(clinical_text,
                                             collapse = "|",
                                             translate_ab = "name"),
               abx_coll_doses = ab_from_text(clinical_text,
                                             type = "doses",
                                             collapse = "|"),
               abx_coll_admin = ab_from_text(clinical_text,
                                             type = "admin",
                                             collapse = "|"))
    }
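The dose and administration rules from the Details section can be illustrated with plain regular expressions. This is a simplified sketch, not the package's parser. The year range 1900-2100 and the small administration lookup table are assumptions, drug names are not handled at all, and extract_doses_sketch() and extract_admin_sketch() are hypothetical names. Like ab_from_text(), both return one list element per input text.

```r
# Doses: numbers above 100 that do not look like a year (assumed 1900-2100).
extract_doses_sketch <- function(text) {
  matches <- regmatches(text, gregexpr("[0-9]+(\\.[0-9]+)?", text))
  lapply(matches, function(m) {
    nums <- as.numeric(m)
    nums[nums > 100 & !(nums >= 1900 & nums <= 2100)]
  })
}

# Administration: map known single-word forms to a standard value.
extract_admin_sketch <- function(text) {
  lookup <- c(po = "oral", oral = "oral", orally = "oral",
              iv = "iv", intravenous = "iv", intravenously = "iv")
  lapply(strsplit(tolower(text), "[^a-z]+"), function(words) {
    unname(lookup[words[words %in% names(lookup)]])
  })
}

extract_doses_sketch("28/03/2020 regular amoxicilliin 500mg po tds")   # 500
extract_admin_sketch("500 mg amoxi po and 400mg cipro iv")             # "oral" "iv"
```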
Use these functions to return a specific property of an antibiotic from the antibiotics data set. All input values will be evaluated internally with as.ab().
    ab_name(x, language = get_locale(), tolower = FALSE, ...)
    ab_atc(x, ...)
    ab_cid(x, ...)
    ab_synonyms(x, ...)
    ab_tradenames(x, ...)
    ab_group(x, language = get_locale(), ...)
    ab_atc_group1(x, language = get_locale(), ...)
    ab_atc_group2(x, language = get_locale(), ...)
    ab_loinc(x, ...)
    ab_ddd(x, administration = "oral", units = FALSE, ...)
    ab_info(x, language = get_locale(), ...)
    ab_url(x, open = FALSE, ...)
    ab_property(x, property = "name", language = get_locale(), ...)
| Argument | Description |
|---|---|
| x | any (vector of) text that can be coerced to a valid antibiotic code with |
| language | language of the returned text, defaults to system language (see |
| tolower | logical to indicate whether the first character of every output should be transformed to a lower case character. This will lead to e.g. "polymyxin B" and not "polymyxin b". |
| ... | other parameters passed on to |
| administration | way of administration, either |
| units | a logical to indicate whether the units instead of the DDDs themselves must be returned, see Examples |
| open | browse the URL using |
| property | one of the column names of the antibiotics data set |
An integer in case of ab_cid()
A named list in case of ab_info() and multiple ab_synonyms()/ab_tradenames()
A double in case of ab_ddd()
A character in all other cases
All output will be translated where possible.
The function ab_url() will return the direct URL to the official WHO website. A warning will be returned if the required ATC code is not available.
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology: https://www.whocc.no/atc_ddd_index/
WHONET 2019 software: http://www.whonet.org/software.html
European Commission Public Health PHARMACEUTICALS - COMMUNITY REGISTER: http://ec.europa.eu/health/documents/community-register/html/atc.htm
    # all properties:
    ab_name("AMX")       # "Amoxicillin"
    ab_atc("AMX")        # J01CA04 (ATC code from the WHO)
    ab_cid("AMX")        # 33613 (Compound ID from PubChem)
    ab_synonyms("AMX")   # a list with brand names of amoxicillin
    ab_tradenames("AMX") # same
    ab_group("AMX")      # "Beta-lactams/penicillins"
    ab_atc_group1("AMX") # "Beta-lactam antibacterials, penicillins"
    ab_atc_group2("AMX") # "Penicillins with extended spectrum"
    ab_url("AMX")        # link to the official WHO page

    # smart lowercase transformation
    ab_name(x = c("AMC", "PLB")) # "Amoxicillin/clavulanic acid" "Polymyxin B"
    ab_name(x = c("AMC", "PLB"),
            tolower = TRUE)      # "amoxicillin/clavulanic acid" "polymyxin B"

    # defined daily doses (DDD)
    ab_ddd("AMX", "oral")               # 1
    ab_ddd("AMX", "oral", units = TRUE) # "g"
    ab_ddd("AMX", "iv")                 # 1
    ab_ddd("AMX", "iv", units = TRUE)   # "g"

    ab_info("AMX")       # all properties as a list

    # all ab_* functions use as.ab() internally, so you can go from 'any' to 'any':
    ab_atc("AMP")          # ATC code of AMP (ampicillin)
    ab_group("J01CA01")    # Drug group of ampicillin's ATC code
    ab_loinc("ampicillin") # LOINC codes of ampicillin
    ab_name("21066-6")     # "Ampicillin" (using LOINC)
    ab_name(6249)          # "Ampicillin" (using CID)
    ab_name("J01CA01")     # "Ampicillin" (using ATC)

    # spelling from different languages and dyslexia are no problem
    ab_atc("ceftriaxon")
    ab_atc("cephtriaxone")
    ab_atc("cephthriaxone")
    ab_atc("seephthriaaksone")
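The "any to any" behaviour comes down to resolving every input against several identifier columns before returning the requested property. The sketch below shows the idea on a hypothetical two-row table whose values are copied from the examples above. ab_property_sketch() and ab_table are illustrative names, not package objects, and unlike as.ab() the sketch does no fuzzy matching.

```r
# Hypothetical two-row excerpt; values taken from the examples above.
ab_table <- data.frame(
  ab   = c("AMX", "AMP"),
  atc  = c("J01CA04", "J01CA01"),
  cid  = c(33613L, 6249L),
  name = c("Amoxicillin", "Ampicillin"),
  stringsAsFactors = FALSE
)

ab_property_sketch <- function(x, property = "name") {
  x <- toupper(trimws(as.character(x)))
  # row whose code, ATC code, CID or upper-cased name equals the input (exact match)
  row <- vapply(x, function(v) {
    hit <- which(v == ab_table$ab | v == ab_table$atc |
                 v == as.character(ab_table$cid) | v == toupper(ab_table$name))
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1), USE.NAMES = FALSE)
  ab_table[[property]][row]
}

ab_property_sketch("J01CA01")               # "Ampicillin" (using ATC)
ab_property_sketch(6249)                    # "Ampicillin" (using CID)
ab_property_sketch("amoxicillin", "atc")    # "J01CA04"
```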
Calculates age in years based on a reference date, which is the system date by default.
    age(x, reference = Sys.Date(), exact = FALSE, na.rm = FALSE)
| Argument | Description |
|---|---|
| x | date(s), will be coerced with |
| reference | reference date(s) (defaults to today), will be coerced with |
| exact | a logical to indicate whether age calculation should be exact, i.e. with decimals. It divides the number of days of year-to-date (YTD) of |
| na.rm | a logical to indicate whether missing values should be removed |
An integer (no decimals) if exact = FALSE, a double (with decimals) otherwise
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
To split ages into groups, use the age_groups() function.
    # 10 random birth dates
    df <- data.frame(birth_date = Sys.Date() - runif(10) * 25000)
    # add ages
    df$age <- age(df$birth_date)
    # add exact ages
    df$age_exact <- age(df$birth_date, exact = TRUE)

    df
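For integer ages, the calculation amounts to subtracting birth years and correcting for birthdays that have not yet occurred in the reference year. Below is a hypothetical base R sketch, age_sketch(). It leaves out the exact and na.rm options of age() and is not the package source.

```r
# Integer age in completed years; age_sketch() is a hypothetical helper.
age_sketch <- function(x, reference = Sys.Date()) {
  born <- as.POSIXlt(as.Date(x))
  ref  <- as.POSIXlt(as.Date(reference))
  years <- ref$year - born$year
  # subtract a year when the birthday has not been reached yet in the reference year
  before_birthday <- ref$mon < born$mon |
    (ref$mon == born$mon & ref$mday < born$mday)
  as.integer(years - before_birthday)
}

age_sketch("2000-07-09", reference = "2020-07-09")   # 20
age_sketch("2000-07-10", reference = "2020-07-09")   # 19
```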
Split ages into age groups defined by the split_at parameter. This allows for easier demographic (antimicrobial resistance) analysis.
    age_groups(x, split_at = c(12, 25, 55, 75), na.rm = FALSE)
| Argument | Description |
|---|---|
| x | age, e.g. calculated with |
| split_at | values to split |
| na.rm | a logical to indicate whether missing values should be removed |
Ordered factor
To split ages, the input for the split_at parameter can be:
A numeric vector. A vector of e.g. c(10, 20) will split on 0-9, 10-19 and 20+. A value of only 50 will split on 0-49 and 50+.
The default is to split on young children (0-11), youth (12-24), young adults (25-54), middle-aged adults (55-74) and elderly (75+).
A character:
"children" or "kids", equivalent of: c(0, 1, 2, 4, 6, 13, 18). This will split on 0, 1, 2-3, 4-5, 6-12, 13-17 and 18+.
"elderly" or "seniors", equivalent of: c(65, 75, 85). This will split on 0-64, 65-74, 75-84, 85+.
"fives", equivalent of: 1:20 * 5. This will split on 0-4, 5-9, 10-14, ..., 90-94, 95-99, 100+.
"tens", equivalent of: 1:10 * 10. This will split on 0-9, 10-19, 20-29, ..., 80-89, 90-99, 100+.
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
To determine ages, based on one or more reference dates, use the age() function.
ages <- c(3, 8, 16, 54, 31, 76, 101, 43, 21)

# split into 0-49 and 50+
age_groups(ages, 50)

# split into 0-19, 20-49 and 50+
age_groups(ages, c(20, 50))

# split into groups of ten years
age_groups(ages, 1:10 * 10)
age_groups(ages, split_at = "tens")

# split into groups of five years
age_groups(ages, 1:20 * 5)
age_groups(ages, split_at = "fives")

# split specifically for children
age_groups(ages, "children")
# same:
age_groups(ages, c(1, 2, 4, 6, 13, 18))

if (FALSE) {
# resistance of ciprofloxacin per age group
library(dplyr)
example_isolates %>%
  filter_first_isolate() %>%
  filter(mo == as.mo("E. coli")) %>%
  group_by(age_group = age_groups(age)) %>%
  select(age_group, CIP) %>%
  ggplot_rsi(x = "age_group")
}
Use these selection helpers inside any function that allows Tidyverse selections, like dplyr::select() or tidyr::pivot_longer(). They help to select the columns of antibiotics that are of a specific antibiotic class, without the need to define the columns or antibiotic abbreviations.
ab_class(ab_class)

aminoglycosides()

carbapenems()

cephalosporins()

cephalosporins_1st()

cephalosporins_2nd()

cephalosporins_3rd()

cephalosporins_4th()

cephalosporins_5th()

fluoroquinolones()

glycopeptides()

macrolides()

penicillins()

tetracyclines()
| Argument | Description |
|---|---|
| ab_class | an antimicrobial class, like |
All columns will be searched for known antibiotic names, abbreviations, brand names and codes (ATC, EARS-Net, WHO, etc.). This means that a selector such as aminoglycosides() will pick up column names like 'gen', 'genta', 'J01GB03', 'tobra', 'Tobracin', etc.
These functions only work if the tidyselect package is installed, which comes with the dplyr package. An error will be thrown if the tidyselect package is not installed, or if the functions are used outside a function that allows Tidyverse selections like select() or pivot_longer().
filter_ab_class() for the filter() equivalent.
if (require("dplyr")) {

  # this will select columns 'IPM' (imipenem) and 'MEM' (meropenem):
  example_isolates %>%
    select(carbapenems())

  # this will select columns 'mo', 'AMK', 'GEN', 'KAN' and 'TOB':
  example_isolates %>%
    select(mo, aminoglycosides())

  # this will select columns 'mo' and all antimycobacterial drugs ('RIF'):
  example_isolates %>%
    select(mo, ab_class("mycobact"))

  # get bug/drug combinations for only macrolides in Gram-positives:
  example_isolates %>%
    filter(mo_gramstain(mo) %like% "pos") %>%
    select(mo, macrolides()) %>%
    bug_drug_combinations() %>%
    format()

  data.frame(irrelevant = "value",
             J01CA01 = "S") %>%   # ATC code of ampicillin
    select(penicillins())       # so the 'J01CA01' column is selected

}
Two data sets containing all antibiotics/antimycotics and antivirals. Use as.ab() or one of the ab_property() functions to retrieve values from the antibiotics data set. Three identifiers are included in this data set: an antibiotic ID (ab, primarily used in this package) as defined by WHONET/EARS-Net, an ATC code (atc) as defined by the WHO, and a Compound ID (cid) as found in PubChem. Other properties in this data set are derived from one or more of these codes.
antibiotics

antivirals
data.frame with 456 observations and 14 variables:
ab: Antibiotic ID as used in this package (like AMC), using the official EARS-Net (European Antimicrobial Resistance Surveillance Network) codes where available
atc: ATC code (Anatomical Therapeutic Chemical) as defined by the WHOCC, like J01CR02
cid: Compound ID as found in PubChem
name: Official name as used by WHONET/EARS-Net or the WHO
group: A short and concise group name, based on WHONET and WHOCC definitions
atc_group1: Official pharmacological subgroup (3rd level ATC code) as defined by the WHOCC, like "Macrolides, lincosamides and streptogramins"
atc_group2: Official chemical subgroup (4th level ATC code) as defined by the WHOCC, like "Macrolides"
abbr: List of abbreviations as used in many countries, also for antibiotic susceptibility testing (AST)
synonyms: Synonyms (often trade names) of a drug, as found in PubChem based on their compound ID
oral_ddd: Defined Daily Dose (DDD), oral treatment
oral_units: Units of oral_ddd
iv_ddd: Defined Daily Dose (DDD), parenteral treatment
iv_units: Units of iv_ddd
loinc: All LOINC codes (Logical Observation Identifiers Names and Codes) associated with the name of the antimicrobial agent. Use ab_loinc() to retrieve them quickly, see ab_property().
data.frame with 102 observations and 9 variables:
atc: ATC code (Anatomical Therapeutic Chemical) as defined by the WHOCC
cid: Compound ID as found in PubChem
name: Official name as used by WHONET/EARS-Net or the WHO
atc_group: Official pharmacological subgroup (3rd level ATC code) as defined by the WHOCC
synonyms: Synonyms (often trade names) of a drug, as found in PubChem based on their compound ID
oral_ddd: Defined Daily Dose (DDD), oral treatment
oral_units: Units of oral_ddd
iv_ddd: Defined Daily Dose (DDD), parenteral treatment
iv_units: Units of iv_ddd
An object of class data.frame with 102 rows and 9 columns.
World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology (WHOCC): https://www.whocc.no/atc_ddd_index/
WHONET 2019 software: http://www.whonet.org/software.html
European Commission Public Health PHARMACEUTICALS - COMMUNITY REGISTER: http://ec.europa.eu/health/documents/community-register/html/atc.htm
Properties that are based on an ATC code are only available when an ATC code is available. These properties are: atc_group1, atc_group2, oral_ddd, oral_units, iv_ddd and iv_units.
Synonyms (i.e. trade names) are derived from the Compound ID (cid) and are consequently only available where a CID is available.
These data sets are available as 'flat files' for use even without R - you can find the files here:
https://github.com/msberends/AMR/raw/master/data-raw/antibiotics.txt
https://github.com/msberends/AMR/raw/master/data-raw/antivirals.txt
Files in R format (with preserved data structure) can be found here:
https://github.com/msberends/AMR/raw/master/data/antibiotics.rda
https://github.com/msberends/AMR/raw/master/data/antivirals.rda
This package contains all ~550 antibiotic, antimycotic and antiviral drugs and their Anatomical Therapeutic Chemical (ATC) codes, ATC groups and Defined Daily Dose (DDD) from the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC, https://www.whocc.no) and the Pharmaceuticals Community Register of the European Commission (http://ec.europa.eu/health/documents/community-register/html/atc.htm).
These have become the gold standard for international drug utilisation monitoring and research.
The WHOCC is located in Oslo at the Norwegian Institute of Public Health and funded by the Norwegian government. The European Commission is the executive of the European Union and promotes its general interest.
NOTE: Unlike all other information in this package, the WHOCC data may not be used for commercial purposes. See https://www.whocc.no/copyright_disclaimer/.
Use this function to determine the antibiotic code of one or more antibiotics. The data set antibiotics will be searched for abbreviations, official names and synonyms (brand names).
as.ab(x, flag_multiple_results = TRUE, ...)

is.ab(x)
| Argument | Description |
|---|---|
| x | character vector to determine the antibiotic ID |
| flag_multiple_results | logical to indicate whether a note should be printed to the console when more than one antibiotic code or name can probably be retrieved from a single input value |
| ... | arguments passed on to internal functions |
Character (vector) with class ab. Unknown values will return NA.
All entries in the antibiotics data set have three different identifiers: a human readable EARS-Net code (column ab, used by ECDC and WHONET), an ATC code (column atc, used by WHO), and a CID code (column cid, Compound ID, used by PubChem). The data set contains more than 5,000 official brand names from many different countries, as found in PubChem.
All these properties will be searched for the user input. The as.ab() function can correct for different forms of misspelling:
Wrong spelling of drug names (like "tobramicin" or "gentamycin"), which corrects for most audible similarities such as f/ph, x/ks, c/z/s, t/th, etc.
Too few or too many vowels or consonants
Switching two characters (like "mreopenem", often the case in clinical data, when doctors typed too fast)
Digitalised paper records, leaving artefacts like 0/o/O (zero and O's), B/8, n/r, etc.
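The sketch below shows the general idea of tolerating misspellings. It picks the closest candidate name by plain Levenshtein distance, using base R's utils::adist(). This is a deliberately simplified stand-in and not the algorithm as.ab() actually uses. The helper closest_name() and the short candidate vector are hypothetical.

```r
# Hypothetical helper: pick the candidate with the smallest edit distance.
# A simplified illustration only; as.ab() applies its own, richer rules.
closest_name <- function(input, candidates) {
  # case-insensitive Levenshtein distances between the input and each candidate
  d <- utils::adist(tolower(input), tolower(candidates))
  candidates[which.min(d)]
}

drugs <- c("Gentamicin", "Tobramycin", "Amikacin", "Meropenem", "Imipenem")
closest_name("gentamycin", drugs)  # one wrong vowel
closest_name("mreopenem", drugs)   # two characters switched
```

Plain edit distance counts a swap of two neighbouring characters as two substitutions. Phonetic similarities like f/ph also cost several edits. This is one reason why a dedicated set of rules, as described above, gives better results.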
Use the ab_property() functions to get properties based on the returned antibiotic ID, see Examples.
World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology: https://www.whocc.no/atc_ddd_index/
WHONET 2019 software: http://www.whonet.org/software.html
European Commission Public Health PHARMACEUTICALS - COMMUNITY REGISTER: http://ec.europa.eu/health/documents/community-register/html/atc.htm
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
antibiotics for the data frame that is being used to determine ATC codes
ab_from_text() for a function to retrieve antimicrobial drugs from clinical text (from health care records)
# these examples all return "ERY", the ID of erythromycin:
as.ab("J01FA01")
as.ab("J 01 FA 01")
as.ab("Erythromycin")
as.ab("eryt")
as.ab(" eryt 123")
as.ab("ERYT")
as.ab("ERY")
as.ab("eritromicine") # spelled wrong, yet works
as.ab("Erythrocin")   # trade name
as.ab("Romycin")      # trade name

# spelling from different languages and dyslexia are no problem
ab_atc("ceftriaxon")
ab_atc("cephtriaxone")     # small spelling error
ab_atc("cephthriaxone")    # or a bit more severe
ab_atc("seephthriaaksone") # and even this works

# use ab_* functions to get a specific property (see ?ab_property);
# they use as.ab() internally:
ab_name("J01FA01") # "Erythromycin"
ab_name("eryt")    # "Erythromycin"
This transforms a vector to a new class disk, which is a growth zone size (around an antibiotic disk) in millimetres between 6 and 50.
as.disk(x, na.rm = FALSE)

is.disk(x)
| Argument | Description |
|---|---|
| x | vector |
| na.rm | a logical indicating whether missing values should be removed |
An integer with additional new class disk
Interpret disk values as RSI values with as.rsi(). It supports guidelines from EUCAST and CLSI.
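The disk class described above is essentially an integer vector limited to plausible zone diameters. Here is a minimal base R sketch of that idea. It assumes that non-numeric values and values outside 6-50 mm simply become NA. The real as.disk() may handle rounding, warnings and invalid input differently. disk_sketch() and its class name are hypothetical.

```r
# Hypothetical helper: keep zone diameters between 6 and 50 mm as integers.
# A sketch only; it does not reproduce as.disk()'s exact handling of input.
disk_sketch <- function(x) {
  x <- suppressWarnings(as.numeric(x))   # non-numeric input becomes NA
  x[!is.na(x) & (x < 6 | x > 50)] <- NA  # outside 6-50 mm is not a valid zone
  x <- as.integer(round(x))
  structure(x, class = "disk_sketch")    # mark the vector with its own class
}

disk_sketch(c(20, "14", 5, 51, "abc"))
```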
The lifecycle of this function is stable.
if (FALSE) {
# transform existing disk zones to the `disk` class
library(dplyr)
df <- data.frame(microorganism = "E. coli",
                 AMP = 20,
                 CIP = 14,
                 GEN = 18,
                 TOB = 16)
df <- df %>% mutate_at(vars(AMP:TOB), as.disk)
df

# interpret disk values, see ?as.rsi
as.rsi(x = as.disk(18),
       mo = "Strep pneu",  # `mo` will be coerced with as.mo()
       ab = "ampicillin",  # and `ab` with as.ab()
       guideline = "EUCAST")

as.rsi(df)
}
This transforms a vector to a new class mic, which is an ordered factor with valid MIC values as levels. Invalid MIC values will be translated as NA with a warning.
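The key property of this class is that MIC values sort by their numeric value, not alphabetically. For example, "<=0.128" comes before "8", which comes before "16". The sketch below shows that idea in base R. It assumes that a leading <, <=, > or >= can be ignored for ordering. mic_order_sketch() is a hypothetical helper. Unlike as.mic(), it does not use a fixed list of valid MIC levels and does not normalise "1.0" to "1".

```r
# Hypothetical helper: an ordered factor of MIC strings, ordered numerically.
mic_order_sketch <- function(x) {
  x <- trimws(as.character(x))
  # numeric part of each value, ignoring a leading comparison operator
  value <- suppressWarnings(as.numeric(sub("^[<>]=?", "", x)))
  x[is.na(value)] <- NA
  # levels: the distinct values, sorted by their numeric part
  lev <- unique(x[!is.na(x)][order(value[!is.na(x)])])
  factor(x, levels = lev, ordered = TRUE)
}

mic_order_sketch(c(">=32", "8", "<=0.128", "16", "8"))
```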
as.mic(x, na.rm = FALSE)

is.mic(x)
| Argument | Description |
|---|---|
| x | vector |
| na.rm | a logical indicating whether missing values should be removed |
Ordered factor with new class mic
To interpret MIC values as RSI values, use as.rsi() on MIC values. It supports guidelines from EUCAST and CLSI.
The lifecycle of this function is stable.
mic_data <- as.mic(c(">=32", "1.0", "1", "1.00", 8, "<=0.128", "8", "16", "16"))
is.mic(mic_data)

# this can also coerce combined MIC/RSI values:
as.mic("<=0.002; S") # will return <=0.002

# interpret MIC values
as.rsi(x = as.mic(2),
       mo = as.mo("S. pneumoniae"),
       ab = "AMX",
       guideline = "EUCAST")
as.rsi(x = as.mic(4),
       mo = as.mo("S. pneumoniae"),
       ab = "AMX",
       guideline = "EUCAST")

plot(mic_data)
barplot(mic_data)
Use this function to determine a valid microorganism ID (mo). Determination is done using intelligent rules and the complete taxonomic kingdoms Bacteria, Chromista, Protozoa, Archaea and most microbial species from the kingdom Fungi (see Source). The input can be almost anything: a full name (like "Staphylococcus aureus"), an abbreviated name (like "S. aureus"), an abbreviation known in the field (like "MRSA"), or just a genus. Please see Examples.
as.mo(
  x,
  Becker = FALSE,
  Lancefield = FALSE,
  allow_uncertain = TRUE,
  reference_df = get_mo_source(),
  ...
)

is.mo(x)

mo_failures()

mo_uncertainties()

mo_renamed()
| Argument | Description |
|---|---|
| x | a character vector or a |
| Becker | a logical to indicate whether Staphylococci should be categorised into coagulase-negative Staphylococci ("CoNS") and coagulase-positive Staphylococci ("CoPS") instead of their own species, according to Karsten Becker et al. (1,2). Note that this does not include species that were newly named after these publications, like S. caeli. This excludes Staphylococcus aureus by default, use |
| Lancefield | a logical to indicate whether beta-haemolytic Streptococci should be categorised into Lancefield groups instead of their own species, according to Rebecca C. Lancefield (3). These Streptococci will be categorised in their first group, e.g. Streptococcus dysgalactiae will be group C, although officially it was also categorised into groups G and L. This excludes Enterococci by default (which are in group D), use |
| allow_uncertain | a number between |
| reference_df | a |
| ... | other parameters passed on to functions |
A character vector with class mo
A microorganism ID from this package (class: mo) typically looks like these examples:
  Code               Full name
  ---------------    --------------------------------------
  B_KLBSL            Klebsiella
  B_KLBSL_PNMN       Klebsiella pneumoniae
  B_KLBSL_PNMN_RHNS  Klebsiella pneumoniae rhinoscleromatis
  | |     |    |
  | |     |    ---> subspecies, a 4-5 letter acronym
  | |     ----> species, a 4-5 letter acronym
  | ----> genus, a 5-7 letter acronym
  ----> taxonomic kingdom: A (Archaea), AN (Animalia), B (Bacteria),
                           C (Chromista), F (Fungi), P (Protozoa)
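The code is built from underscore-separated parts, so it can be taken apart with base R. This is a minimal sketch. It assumes that the kingdom prefix always comes first and that genus, species and subspecies follow in that order, as in the examples above. parse_mo_code() is a hypothetical helper. In practice, the mo_* functions are the way to retrieve properties.

```r
# Hypothetical helper: split an mo code into its taxonomic parts.
parse_mo_code <- function(code) {
  parts <- strsplit(code, "_", fixed = TRUE)[[1]]
  length(parts) <- 4  # pads missing levels with NA (or truncates extra parts)
  kingdoms <- c(A = "Archaea", AN = "Animalia", B = "Bacteria",
                C = "Chromista", F = "Fungi", P = "Protozoa")
  list(kingdom    = unname(kingdoms[parts[1]]),
       genus      = parts[2],
       species    = parts[3],
       subspecies = parts[4])
}

str(parse_mo_code("B_KLBSL_PNMN_RHNS"))
str(parse_mo_code("B_KLBSL"))
```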
Values that cannot be coerced will be considered 'unknown' and will get the MO code UNKNOWN.
Use the mo_* functions to get properties based on the returned code, see Examples.
The algorithm uses data from the Catalogue of Life (see below) and from one other source (see microorganisms).
The as.mo() function uses several coercion rules for fast and logical results. It assesses the input matching criteria in the following order:
Human pathogenic prevalence: the function starts with more prevalent microorganisms, followed by less prevalent ones;
Taxonomic kingdom: the function starts with determining Bacteria, then Fungi, then Protozoa, then others;
Breakdown of input values to identify possible matches.
As a result, "E. coli" (a microorganism highly prevalent in humans) will return the microbial ID of Escherichia coli and not Entamoeba coli (a microorganism less prevalent in humans), although the latter would come first alphabetically.
In addition, the as.mo() function can differentiate four levels of uncertainty to guess valid results:
Uncertainty level 0: no additional rules are applied;
Uncertainty level 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
Uncertainty level 2: allow all of level 1, strip values between brackets, reverse the order of the words of the input, strip off text elements from the end keeping at least two elements;
Uncertainty level 3: allow all of level 1 and 2, strip off text elements from the end, allow any part of a taxonomic name.
This leads to e.g.:
"Streptococcus group B (known as S. agalactiae)". The text between brackets will be removed and a warning will be thrown that the result Streptococcus group B (B_STRPT_GRPB) needs review.
"S. aureus - please mind: MRSA". The last word will be stripped, after which the function will try to find a match. If it does not find one, the second last word will be stripped, etc. Again, a warning will be thrown that the result Staphylococcus aureus (B_STPHY_AURS) needs review.
"Fluoroquinolone-resistant Neisseria gonorrhoeae". The first word will be stripped, after which the function will try to find a match. A warning will be thrown that the result Neisseria gonorrhoeae (B_NESSR_GNRR) needs review.
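Two of the rules illustrated above are removing text between brackets and dropping words from the end. Both can be sketched in base R to show which candidate strings would be tried, in order. This is a simplified illustration that assumes words are separated by spaces. candidate_inputs() is a hypothetical helper. as.mo() combines steps like these with many other rules and looks each candidate up in its data sets.

```r
# Hypothetical helper: candidate strings produced by two simplified rules.
candidate_inputs <- function(x, min_words = 2) {
  # rule 1: strip everything between round brackets, including the brackets
  x <- trimws(gsub("\\s*\\([^)]*\\)", "", x))
  # rule 2: repeatedly drop the last word, keeping at least `min_words` words
  words <- strsplit(x, "\\s+")[[1]]
  n_max <- length(words)
  if (n_max < min_words) return(x)
  vapply(seq(n_max, min_words),
         function(n) paste(words[seq_len(n)], collapse = " "),
         character(1))
}

candidate_inputs("Streptococcus group B (known as S. agalactiae)")
candidate_inputs("S. aureus - please mind: MRSA")
```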
The level of uncertainty can be set using the argument allow_uncertain. The default is allow_uncertain = TRUE, which is equal to uncertainty level 2. Using allow_uncertain = FALSE is equal to uncertainty level 0 and will skip all rules. You can also use e.g. as.mo(..., allow_uncertain = 1) to only allow up to level 1 uncertainty.
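The mapping between allow_uncertain values and uncertainty levels described above can be written out as a small helper. uncertainty_level() is hypothetical and not part of the package. It only restates the documented mapping and assumes that numeric input must be one of the four levels 0 to 3.

```r
# Hypothetical helper: translate allow_uncertain into an uncertainty level (0-3).
uncertainty_level <- function(allow_uncertain) {
  if (isTRUE(allow_uncertain)) return(2L)   # default: up to level 2
  if (isFALSE(allow_uncertain)) return(0L)  # skip all uncertainty rules
  level <- suppressWarnings(as.integer(allow_uncertain))
  if (length(level) != 1 || is.na(level) || level < 0L || level > 3L) {
    stop("allow_uncertain must be TRUE, FALSE or a number between 0 and 3")
  }
  level
}

uncertainty_level(TRUE)
uncertainty_level(1)
```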
There are three helper functions that can be run after using the as.mo() function:
Use mo_uncertainties() to get a data.frame with all values that were coerced to a valid value, but with uncertainty. The output contains a score, which is calculated as \((n - 0.5 \cdot L) / n\), where n is the number of characters of the returned full name of the microorganism, and L is the Levenshtein distance between that full name and the user input.
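The score formula can be reproduced with base R's utils::adist(), which computes the Levenshtein distance. This is a minimal sketch for a single name/input pair. It assumes that n counts all characters, including spaces, and that the comparison is case-sensitive as written. The package may normalise case or whitespace first. uncertainty_score() is a hypothetical helper.

```r
# Hypothetical helper: the documented score (n - 0.5 * L) / n for one pair, where
# n is the number of characters of the returned full name and L is the
# Levenshtein distance between that full name and the user input.
uncertainty_score <- function(full_name, input) {
  n <- nchar(full_name)
  L <- as.vector(utils::adist(full_name, input))
  (n - 0.5 * L) / n
}

uncertainty_score("Staphylococcus aureus", "Staphylococcus aureuz")
```

Here n = 21 and L = 1, so the score is 20.5 / 21. An exact match scores 1, and each additional edit lowers the score by 0.5 / n.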
Use mo_failures() to get a vector with all values that could not be coerced to a valid value.
Use mo_renamed() to get a data.frame with all values that could be coerced based on an old, previously accepted taxonomic name.
The intelligent rules consider the prevalence of microorganisms in humans grouped into three groups, which is available as the prevalence columns in the microorganisms and microorganisms.old data sets. The grouping into prevalence groups is based on experience from several microbiological laboratories in the Netherlands in conjunction with international reports on pathogen prevalence.
Group 1 (most prevalent microorganisms) consists of all microorganisms where the taxonomic class is Gammaproteobacteria or where the taxonomic genus is Enterococcus, Staphylococcus or Streptococcus. This group consequently contains all common Gram-negative bacteria, such as Pseudomonas and Legionella and all species within the order Enterobacteriales.
Group 2 consists of all microorganisms where the taxonomic phylum is Proteobacteria, Firmicutes, Actinobacteria or Sarcomastigophora, or where the taxonomic genus is Aspergillus, Bacteroides, Candida, Capnocytophaga, Chryseobacterium, Cryptococcus, Elizabethkingia, Flavobacterium, Fusobacterium, Giardia, Leptotrichia, Mycoplasma, Prevotella, Rhodotorula, Treponema, Trichophyton or Ureaplasma.
Group 3 (least prevalent microorganisms) consists of all other microorganisms.
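The three prevalence groups follow simple rules on taxonomic class, phylum and genus, so they can be written as a base R function. This sketch restates the rules listed above. It assumes the taxonomy is already available as character vectors, for example from the microorganisms data set. prevalence_group() is a hypothetical helper, and the data set's own prevalence column remains the reference.

```r
# Hypothetical helper: assign prevalence groups 1-3 from taxonomy, per the rules above.
prevalence_group <- function(class, phylum, genus) {
  group1_genera <- c("Enterococcus", "Staphylococcus", "Streptococcus")
  group2_phyla  <- c("Proteobacteria", "Firmicutes", "Actinobacteria", "Sarcomastigophora")
  group2_genera <- c("Aspergillus", "Bacteroides", "Candida", "Capnocytophaga",
                     "Chryseobacterium", "Cryptococcus", "Elizabethkingia",
                     "Flavobacterium", "Fusobacterium", "Giardia", "Leptotrichia",
                     "Mycoplasma", "Prevotella", "Rhodotorula", "Treponema",
                     "Trichophyton", "Ureaplasma")
  # group 1 is checked first, then group 2; everything else is group 3
  ifelse(class %in% "Gammaproteobacteria" | genus %in% group1_genera, 1L,
         ifelse(phylum %in% group2_phyla | genus %in% group2_genera, 2L, 3L))
}

prevalence_group(class  = c("Gammaproteobacteria", "Bacilli", "Saccharomycetes", "Saccharomycetes"),
                 phylum = c("Proteobacteria", "Firmicutes", "Ascomycota", "Ascomycota"),
                 genus  = c("Escherichia", "Bacillus", "Candida", "Saccharomyces"))
```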
1. Becker K et al. Coagulase-Negative Staphylococci. 2014. Clin Microbiol Rev. 27(4): 870–926. https://dx.doi.org/10.1128/CMR.00109-13
2. Becker K et al. Implications of identifying the recently defined members of the S. aureus complex, S. argenteus and S. schweitzeri: A position paper of members of the ESCMID Study Group for staphylococci and Staphylococcal Diseases (ESGS). 2019. Clin Microbiol Infect. https://doi.org/10.1016/j.cmi.2019.02.028
3. Lancefield RC. A serological differentiation of human and other groups of hemolytic streptococci. 1933. J Exp Med. 57(4): 571–95. https://dx.doi.org/10.1084/jem.57.4.571
Catalogue of Life: Annual Checklist (public online taxonomic database), http://www.catalogueoflife.org (check included annual version with catalogue_of_life_version()).
The lifecycle of this function is stable.
This package contains the complete taxonomic tree of almost all microorganisms (~70,000 species) from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). The Catalogue of Life is the most comprehensive and authoritative global index of species currently available.
More information about the included taxa is available on the package website. Check which version of the Catalogue of Life was included in this package with catalogue_of_life_version().
microorganisms for the data.frame that is being used to determine IDs.
The mo_property() functions (like mo_genus(), mo_gramstain()) to get properties based on the returned code.
# \donttest{
# These examples all return "B_STPHY_AURS", the ID of S. aureus:
as.mo("sau") # WHONET code
as.mo("stau")
as.mo("STAU")
as.mo("staaur")
as.mo("S. aureus")
as.mo("S aureus")
as.mo("Staphylococcus aureus")
as.mo("Staphylococcus aureus (MRSA)")
as.mo("Zthafilokkoockus oureuz") # handles incorrect spelling
as.mo("MRSA") # Methicillin Resistant S. aureus
as.mo("VISA") # Vancomycin Intermediate S. aureus
as.mo("VRSA") # Vancomycin Resistant S. aureus
as.mo(115329001) # SNOMED CT code

# Dyslexia is no problem - these all work:
as.mo("Ureaplasma urealyticum")
as.mo("Ureaplasma urealyticus")
as.mo("Ureaplasmium urealytica")
as.mo("Ureaplazma urealitycium")

as.mo("Streptococcus group A")
as.mo("GAS") # Group A Streptococci
as.mo("GBS") # Group B Streptococci

as.mo("S. epidermidis")                 # will remain species: B_STPHY_EPDR
as.mo("S. epidermidis", Becker = TRUE)  # will not remain species: B_STPHY_CONS

as.mo("S. pyogenes")                    # will remain species: B_STRPT_PYGN
as.mo("S. pyogenes", Lancefield = TRUE) # will not remain species: B_STRPT_GRPA

# All mo_* functions use as.mo() internally too (see ?mo_property):
mo_genus("E. coli")     # returns "Escherichia"
mo_gramstain("E. coli") # returns "Gram negative"

# }
if (FALSE) {
df$mo <- as.mo(df$microorganism_name)

# the select function of the Tidyverse is also supported:
library(dplyr)
df$mo <- df %>%
  select(microorganism_name) %>%
  as.mo()

# and can even contain 2 columns, which is convenient for genus/species combinations:
df$mo <- df %>%
  select(genus, species) %>%
  as.mo()
# although this works easier and does the same:
df <- df %>%
  mutate(mo = as.mo(paste(genus, species)))
}
Interpret MIC values and disk diffusion diameters according to EUCAST or CLSI, or clean up existing R/SI values. This transforms the input to a new class rsi, which is an ordered factor with levels S < I < R. Invalid antimicrobial interpretations will be translated as NA with a warning.
as.rsi(x, ...)

# S3 method for mic
as.rsi(
  x,
  mo,
  ab = deparse(substitute(x)),
  guideline = "EUCAST",
  uti = FALSE,
  ...
)

# S3 method for disk
as.rsi(
  x,
  mo,
  ab = deparse(substitute(x)),
  guideline = "EUCAST",
  uti = FALSE,
  ...
)

# S3 method for data.frame
as.rsi(x, col_mo = NULL, guideline = "EUCAST", uti = NULL, ...)

is.rsi(x)

is.rsi.eligible(x, threshold = 0.05)
| Argument | Description |
|---|---|
| x | vector of values (for class |
| ... | parameters passed on to methods |
| mo | any (vector of) text that can be coerced to a valid microorganism code with |
| ab | any (vector of) text that can be coerced to a valid antimicrobial code with |
| guideline | defaults to the latest included EUCAST guideline, see Details for all options |
| uti | (Urinary Tract Infection) A vector with logicals ( |
| col_mo | column name of the IDs of the microorganisms (see |
| threshold | maximum fraction of invalid antimicrobial interpretations of |
Ordered factor with new class rsi
When using as.rsi() on untransformed data, the data will be cleaned to only contain values S, I and R. When using the function on data with class mic (using as.mic()) or class disk (using as.disk()), the data will be interpreted based on the guideline set with the guideline parameter.
Supported guidelines to be used as input for the guideline parameter are: "CLSI 2010", "CLSI 2011", "CLSI 2012", "CLSI 2013", "CLSI 2014", "CLSI 2015", "CLSI 2016", "CLSI 2017", "CLSI 2018", "CLSI 2019", "EUCAST 2011", "EUCAST 2012", "EUCAST 2013", "EUCAST 2014", "EUCAST 2015", "EUCAST 2016", "EUCAST 2017", "EUCAST 2018", "EUCAST 2019", "EUCAST 2020". Simply using "CLSI" or "EUCAST" as input will automatically select the latest version of that guideline.
The repository of this package contains a machine-readable version of all guidelines. This is a CSV file consisting of 18,964 rows and 10 columns. It is machine-readable because it contains one row for every unique combination of the test method (MIC or disk diffusion), the antimicrobial agent and the microorganism. This allows for easy implementation of these rules in laboratory information systems (LIS).
After using as.rsi(), you can use eucast_rules() to (1) apply inferred susceptibility and resistance based on results of other antimicrobials and (2) apply intrinsic resistance based on taxonomic properties of a microorganism.
The function is.rsi.eligible() returns TRUE when a column contains at most 5% invalid antimicrobial interpretations (not S and/or I and/or R), and FALSE otherwise. The threshold of 5% can be set with the threshold parameter.
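The eligibility check amounts to computing the fraction of values that are not a valid interpretation and comparing it with threshold. This is a minimal base R sketch. It assumes that missing and empty values are ignored, that case is ignored, and that only S, I and R count as valid. The real is.rsi.eligible() may also check the column type first or accept other spellings. rsi_eligible_sketch() is a hypothetical helper.

```r
# Hypothetical helper: TRUE when at most `threshold` of the non-missing values are invalid.
rsi_eligible_sketch <- function(x, threshold = 0.05) {
  x <- trimws(toupper(as.character(x)))
  x <- x[!is.na(x) & x != ""]         # ignore missing and empty values
  if (length(x) == 0) return(FALSE)   # nothing to judge
  invalid_fraction <- mean(!x %in% c("S", "I", "R"))
  invalid_fraction <= threshold
}

rsi_eligible_sketch(c("S", "R", "I", "S", NA))
rsi_eligible_sketch(c("Alice", "Bob", "S"))
rsi_eligible_sketch(c("Alice", "Bob", "S"), threshold = 0.99)
```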
In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) decided to change the definitions of susceptibility testing categories R and S/I as shown below (http://www.eucast.org/newsiandr/).
R = Resistant
A microorganism is categorised as Resistant when there is a high likelihood of therapeutic failure even when there is increased exposure. Exposure is a function of how the mode of administration, dose, dosing interval, infusion time, as well as distribution and excretion of the antimicrobial agent will influence the infecting organism at the site of infection.
S = Susceptible
A microorganism is categorised as Susceptible, standard dosing regimen, when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I = Increased exposure, but still susceptible
A microorganism is categorised as Susceptible, Increased exposure when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
This AMR package honours this new insight. Use susceptibility() (equal to proportion_SI()) to determine antimicrobial susceptibility and count_susceptible() (equal to count_SI()) to count susceptible isolates.
The lifecycle of this function is stable.
summary(example_isolates) # see all R/SI results at a glance

# For INTERPRETING disk diffusion and MIC values -----------------------

# a whole data set, even with combined MIC values and disk zones
df <- data.frame(microorganism = "E. coli",
                 AMP = as.mic(8),
                 CIP = as.mic(0.256),
                 GEN = as.disk(18),
                 TOB = as.disk(16),
                 NIT = as.mic(32))
as.rsi(df)

if (FALSE) {

# the dplyr way
library(dplyr)
df %>%
  mutate_at(vars(AMP:TOB), as.rsi, mo = "E. coli")

df %>%
  mutate_at(vars(AMP:TOB), as.rsi, mo = .$microorganism)

# to include information about urinary tract infections (UTI)
data.frame(mo = "E. coli",
           NIT = c("<= 2", 32),
           from_the_bladder = c(TRUE, FALSE)) %>%
  as.rsi(uti = "from_the_bladder")

data.frame(mo = "E. coli",
           NIT = c("<= 2", 32),
           specimen = c("urine", "blood")) %>%
  as.rsi() # automatically determines urine isolates

df %>%
  mutate_at(vars(AMP:NIT), as.rsi, mo = "E. coli", uti = TRUE)
}

# for single values
as.rsi(x = as.mic(2),
       mo = as.mo("S. pneumoniae"),
       ab = "AMP",
       guideline = "EUCAST")

as.rsi(x = as.disk(18),
       mo = "Strep pneu",  # `mo` will be coerced with as.mo()
       ab = "ampicillin",  # and `ab` with as.ab()
       guideline = "EUCAST")


# For CLEANING existing R/SI values ------------------------------------

as.rsi(c("S", "I", "R", "A", "B", "C"))
as.rsi("<= 0.002; S") # will return "S"

rsi_data <- as.rsi(c(rep("S", 474), rep("I", 36), rep("R", 370)))
is.rsi(rsi_data)
plot(rsi_data)    # for percentages
barplot(rsi_data) # for frequencies

if (FALSE) {
library(dplyr)
example_isolates %>%
  mutate_at(vars(PEN:RIF), as.rsi)

# fastest way to transform all columns with already valid AMR results to class `rsi`:
example_isolates %>%
  mutate_if(is.rsi.eligible, as.rsi)

# note: from dplyr 1.0.0 on, this will be:
# example_isolates %>%
#   mutate(across(where(is.rsi.eligible), as.rsi))

# default threshold of `is.rsi.eligible` is 5%.
is.rsi.eligible(WHONET$`First name`) # fails, >80% is invalid
is.rsi.eligible(WHONET$`First name`, threshold = 0.99) # succeeds
}
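The cleaning step above can be approximated in base R. The sketch below is only an illustration and is not the package's as.rsi() or is.rsi.eligible(). The helper names clean_rsi() and rsi_eligible_sketch() are hypothetical. The rsi class is modelled here as an ordered factor S < I < R. The eligibility rule, that the share of values which cannot be cleaned must stay below threshold, is an assumption inferred from the comments in the example.

```r
# Hypothetical helpers (not the AMR package's code): clean free-text
# R/SI results into an ordered factor with levels S < I < R.
clean_rsi <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # keep a trailing S, I or R that follows a non-letter, e.g. "<= 0.002; S" -> "S"
  x <- sub("^.*[^A-Z]([SIR])$", "\\1", x)
  x[!x %in% c("S", "I", "R")] <- NA  # anything else becomes missing
  factor(x, levels = c("S", "I", "R"), ordered = TRUE)
}

# Assumed rule: eligible when the share of non-missing values that
# cannot be cleaned stays below `threshold`.
rsi_eligible_sketch <- function(x, threshold = 0.05) {
  present <- !is.na(x)
  invalid_share <- mean(is.na(clean_rsi(x[present])))
  invalid_share < threshold
}

clean_rsi(c("S", "I", "R", "A", "B", "C"))
clean_rsi("<= 0.002; S")
rsi_eligible_sketch(c("S", "R", "x", "y", "z"))
```

Using an ordered factor keeps comparisons such as x >= "I" meaningful. The package itself adds its own rsi class on top.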
Gets data from the WHO to determine properties of an ATC code (e.g. of an antibiotic), such as its name, defined daily dose (DDD) or standard unit.
+This function requires an internet connection.
atc_online_property(
  atc_code,
  property,
  administration = "O",
  url = "https://www.whocc.no/atc_ddd_index/?code=%s&showdescription=no"
)

atc_online_groups(atc_code, ...)

atc_online_ddd(atc_code, ...)
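The default url argument is a template that contains a %s placeholder. The minimal sketch below assumes the ATC code is substituted for %s with sprintf(). That is an assumption about the internals, not documented behaviour. The sketch shows which WHO pages would be queried for a few codes.

```r
# Assumption: the ATC code replaces "%s" in the URL template via sprintf()
who_url <- "https://www.whocc.no/atc_ddd_index/?code=%s&showdescription=no"
atc_codes <- c("J01CA04", "J01MA02")  # amoxicillin, ciprofloxacin
query_urls <- sprintf(who_url, atc_codes)  # sprintf() is vectorised
query_urls
```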
atc_code | +a character or character vector with ATC code(s) of antibiotic(s) |
+
---|---|
property | +property of an ATC code. Valid values are |
+
administration | +type of administration when using |
+
url | +url of website of the WHO. The sign |
+
... | +parameters to pass on to |
+
https://www.whocc.no/atc_ddd_alterations__cumulative/ddd_alterations/abbrevations/
Options for parameter administration:
"Implant" = Implant
"Inhal" = Inhalation
"Instill" = Instillation
"N" = nasal
"O" = oral
"P" = parenteral
"R" = rectal
"SL" = sublingual/buccal
"TD" = transdermal
"V" = vaginal
Abbreviations of return values when using property = "U" (unit):
"g" = gram
"mg" = milligram
"mcg" = microgram
"U" = unit
"TU" = thousand units
"MU" = million units
"mmol" = millimole
"ml" = milliliter (e.g. eyedrops)
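The abbreviations above can be stored as named character vectors and used to label results in base R. The object names administration_routes and ddd_units are hypothetical. The values are copied from the lists of administration and unit abbreviations.

```r
# Hypothetical lookup tables built from the abbreviations listed above
administration_routes <- c(
  Implant = "Implant", Inhal = "Inhalation", Instill = "Instillation",
  N = "nasal", O = "oral", P = "parenteral", R = "rectal",
  SL = "sublingual/buccal", TD = "transdermal", V = "vaginal"
)
ddd_units <- c(
  g = "gram", mg = "milligram", mcg = "microgram", U = "unit",
  TU = "thousand units", MU = "million units", mmol = "millimole",
  ml = "milliliter"
)

# Translate codes by name; unknown codes such as "X" become NA
unname(administration_routes[c("O", "P", "X")])
unname(ddd_units["mg"])
```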
The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR package at all.
if (FALSE) {
# oral DDD (Defined Daily Dose) of amoxicillin
atc_online_property("J01CA04", "DDD", "O")
# parenteral DDD (Defined Daily Dose) of amoxicillin
atc_online_property("J01CA04", "DDD", "P")

atc_online_property("J01CA04", property = "groups") # search hierarchical groups of amoxicillin
# [1] "ANTIINFECTIVES FOR SYSTEMIC USE"
# [2] "ANTIBACTERIALS FOR SYSTEMIC USE"
# [3] "BETA-LACTAM ANTIBACTERIALS, PENICILLINS"
# [4] "Penicillins with extended spectrum"
}
Easy check for data availability of all columns in a data set. This makes it easy to get an idea of which antimicrobial combinations can be used for calculation with e.g. susceptibility() and resistance().
availability(tbl, width = NULL)
tbl | +a |
+
---|---|
width | +number of characters to present the visual availability, defaults to filling the width of the console |
+
A data.frame with the column names of tbl as row names.
The function returns a data.frame with columns "resistant" and "visual_resistance". The values in these columns are calculated with resistance().
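The base R sketch below shows the idea behind availability(). It assumes that "available" means a non-missing result and that resistance is the share of R among the available results. The helper availability_sketch() and its available column are hypothetical. The real function also draws visual bars, and resistance() may apply further rules.

```r
# Hypothetical helper: share of available results and share of R per column
availability_sketch <- function(tbl) {
  data.frame(
    available = vapply(tbl, function(col) mean(!is.na(col)), numeric(1)),
    resistant = vapply(tbl, function(col) {
      tested <- col[!is.na(col)]
      if (length(tested) == 0) NA_real_ else mean(tested == "R")
    }, numeric(1)),
    row.names = names(tbl)
  )
}

isolates <- data.frame(
  AMX = c("R", "S", NA, "R"),
  CIP = c("S", NA, NA, NA),
  stringsAsFactors = FALSE
)
availability_sketch(isolates)
```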
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
availability(example_isolates)

if (FALSE) {
library(dplyr)
example_isolates %>% availability()

example_isolates %>%
  select_if(is.rsi) %>%
  availability()

example_isolates %>%
  filter(mo == as.mo("E. coli")) %>%
  select_if(is.rsi) %>%
  availability()
}
Determine antimicrobial resistance (AMR) of all bug-drug combinations in your data set where at least 30 (default) isolates are available per species. Use format() on the result to prettify it into a publishable/printable format; see Examples.
bug_drug_combinations(x, col_mo = NULL, FUN = mo_shortname, ...)

# S3 method for bug_drug_combinations
format(
  x,
  translate_ab = "name (ab, atc)",
  language = get_locale(),
  minimum = 30,
  combine_SI = TRUE,
  combine_IR = FALSE,
  add_ab_group = TRUE,
  remove_intrinsic_resistant = FALSE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark == ",", ".", ","),
  ...
)
x | +data with antibiotic columns, like e.g. |
+
---|---|
col_mo | +column name of the IDs of the microorganisms (see |
+
FUN | +the function to call on the |
+
... | +arguments passed on to |
+
translate_ab | +a character of length 1 containing column names of the antibiotics data set |
+
language | +language of the returned text, defaults to system language (see |
+
minimum | +the minimum allowed number of available (tested) isolates. Any isolate count lower than |
+
combine_SI | +a logical to indicate whether all values of S and I must be merged into one, so the output only consists of S+I vs. R (susceptible vs. resistant). This used to be the parameter |
+
combine_IR | +logical to indicate whether values R and I should be summed |
+
add_ab_group | +logical to indicate whether the group of the antimicrobials must be included as the first column |
+
remove_intrinsic_resistant | +logical to indicate that rows with 100% resistance for all tested antimicrobials must be removed from the table |
+
decimal.mark | +the character to be used to indicate the numeric decimal point. |
+
big.mark | +character; if not empty used as mark between every
+ |
+
M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition, 2014, Clinical and Laboratory Standards Institute (CLSI). https://clsi.org/standards/products/microbiology/documents/m39/.
The function bug_drug_combinations() returns a data.frame with columns "mo", "ab", "S", "I", "R" and "total". The function format() calculates the resistance per bug-drug combination. Use combine_IR = FALSE (default) to test R vs. S+I and combine_IR = TRUE to test R+I vs. S.
The language of the output can be overridden with options(AMR_locale); see translate.
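The two steps can be sketched in base R: first count S, I and R per bug-drug combination, then calculate resistance as format() does. This is not the package's implementation. The input is a hypothetical long-format table with one row per isolate and antibiotic. resistance_per_combination() is a made-up helper that mirrors the combine_IR and minimum arguments.

```r
# Hypothetical long-format input: one row per isolate x antibiotic
results <- data.frame(
  mo  = c("E. coli", "E. coli", "E. coli", "K. pneumoniae", "K. pneumoniae"),
  ab  = "AMX",
  rsi = c("S", "R", "I", "R", "R"),
  stringsAsFactors = FALSE
)

# Step 1: count S, I and R per bug-drug combination
counts <- as.data.frame.matrix(
  table(combination = paste(results$mo, results$ab, sep = " / "),
        rsi = factor(results$rsi, levels = c("S", "I", "R")))
)
counts$total <- counts$S + counts$I + counts$R

# Step 2: R vs. S+I by default, R+I vs. S when combine_IR = TRUE;
# combinations with fewer than `minimum` tested isolates are dropped
resistance_per_combination <- function(counts, combine_IR = FALSE, minimum = 30) {
  counts <- counts[counts$total >= minimum, , drop = FALSE]
  numerator <- if (combine_IR) counts$R + counts$I else counts$R
  setNames(numerator / counts$total, rownames(counts))
}

resistance_per_combination(counts, minimum = 1)
resistance_per_combination(counts, combine_IR = TRUE, minimum = 1)
```

With the default minimum = 30, both toy combinations would be dropped. This is why the usage above lowers the minimum.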
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
# \donttest{
x <- bug_drug_combinations(example_isolates)
x
format(x, translate_ab = "name (atc)")

# Use FUN to change the transformation of microorganism codes
x <- bug_drug_combinations(example_isolates,
                           FUN = mo_gramstain)

x <- bug_drug_combinations(example_isolates,
                           FUN = function(x) ifelse(x == "B_ESCHR_COLI",
                                                    "E. coli",
                                                    "Others"))
# }
This package contains the complete taxonomic tree of almost all microorganisms from the authoritative and comprehensive Catalogue of Life.
+
+This package contains the complete taxonomic tree of almost all microorganisms (~70,000 species) from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). The Catalogue of Life is the most comprehensive and authoritative global index of species currently available.
Check which version of the Catalogue of Life was included in this package with catalogue_of_life_version().
Included are:
All ~61,000 (sub)species from the kingdoms of Archaea, Bacteria, Chromista and Protozoa
All ~8,500 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Microascales, Mucorales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales and Tremellales. The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, most of which are not microbial (but rather macroscopic, like mushrooms). Because of this, not all fungi fit the scope of this package, and including all of them would also slow down our algorithms considerably. By only including the aforementioned taxonomic orders, the most relevant fungi are covered (like all species of Aspergillus, Candida, Cryptococcus, Histoplasma, Pneumocystis, Saccharomyces and Trichophyton).
All ~150 (sub)species from ~100 other relevant genera from the kingdom of Animalia (like Strongyloides and Taenia)
All ~23,000 previously accepted names of all included (sub)species (these were taxonomically renamed)
The complete taxonomic tree of all included (sub)species: from kingdom to subspecies
The responsible author(s) and year of scientific publication
The Catalogue of Life (http://www.catalogueoflife.org) is the most comprehensive and authoritative global index of species currently available. It holds essential information on the names, relationships and distributions of over 1.9 million species. The Catalogue of Life is used to support the major biodiversity and conservation information services such as the Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EoL) and the International Union for Conservation of Nature Red List. It is recognised by the Convention on Biological Diversity as a significant component of the Global Taxonomy Initiative and a contribution to Target 1 of the Global Strategy for Plant Conservation.
The syntax used to transform the original data to a cleansed R format can be found here: https://github.com/msberends/AMR/blob/master/data-raw/reproduction_of_microorganisms.R.
+Data set microorganisms for the actual data.
Function as.mo() to use the data for intelligent determination of microorganisms.
# Get version info of included data set
catalogue_of_life_version()


# Get a note when a species was renamed
mo_shortname("Chlamydophila psittaci")
# Note: 'Chlamydophila psittaci' (Everett et al., 1999) was renamed back to
#       'Chlamydia psittaci' (Page, 1968)
# [1] "C. psittaci"

# Get any property from the entire taxonomic tree for all included species
mo_class("E. coli")
# [1] "Gammaproteobacteria"

mo_family("E. coli")
# [1] "Enterobacteriaceae"

mo_gramstain("E. coli") # based on kingdom and phylum, see ?mo_gramstain
# [1] "Gram negative"

mo_ref("E. coli")
# [1] "Castellani et al., 1919"

# Do not get mistaken - this package is about microorganisms
mo_kingdom("C. elegans")
# [1] "Fungi"                # Fungi?!
mo_name("C. elegans")
# [1] "Cladosporium elegans" # Because a microorganism was found
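Renamed taxa, as in the 'Chlamydophila psittaci' example above, can be modelled as a lookup from previously accepted names to current names. The sketch below is hypothetical: renamed and current_name() are not part of the package. It contains only the single pair shown in the example.

```r
# Hypothetical lookup from previously accepted names to current names
renamed <- c("Chlamydophila psittaci" = "Chlamydia psittaci")

current_name <- function(x) {
  hit <- x %in% names(renamed)
  if (any(hit)) {
    # report every renamed input, similar in spirit to the note above
    message("Note: renamed ",
            paste0("'", x[hit], "' -> '", renamed[x[hit]], "'", collapse = ", "))
  }
  ifelse(hit, renamed[x], x)  # keep names that were not renamed
}

current_name(c("Chlamydophila psittaci", "Escherichia coli"))
```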
R/catalogue_of_life.R
+ catalogue_of_life_version.Rd
This function returns information about the included data from the Catalogue of Life.
+catalogue_of_life_version()
+
+
A list, which prints in a pretty format.
For DSMZ, see microorganisms.
+
These functions can be used to count resistant/susceptible microbial isolates. All functions support quasiquotation with pipes, can be used in summarise() from the dplyr package and also support grouped variables; see Examples.
count_resistant() should be used to count resistant isolates, count_susceptible() should be used to count susceptible isolates.
count_resistant(..., only_all_tested = FALSE)

count_susceptible(..., only_all_tested = FALSE)

count_R(..., only_all_tested = FALSE)

count_IR(..., only_all_tested = FALSE)

count_I(..., only_all_tested = FALSE)

count_SI(..., only_all_tested = FALSE)

count_S(..., only_all_tested = FALSE)

count_all(..., only_all_tested = FALSE)

n_rsi(..., only_all_tested = FALSE)

count_df(
  data,
  translate_ab = "name",
  language = get_locale(),
  combine_SI = TRUE,
  combine_IR = FALSE
)
... | +one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with |
+
---|---|
only_all_tested | +(for combination therapies, i.e. using more than one variable for |
+
data | +a |
+
translate_ab | +a column name of the antibiotics data set to translate the antibiotic abbreviations to, using |
+
language | +language of the returned text, defaults to system language (see |
+
combine_SI | +a logical to indicate whether all values of S and I must be merged into one, so the output only consists of S+I vs. R (susceptible vs. resistant). This used to be the parameter |
+
combine_IR | +a logical to indicate whether all values of I and R must be merged into one, so the output only consists of S vs. I+R (susceptible vs. non-susceptible). This is outdated, see parameter |
+
An integer
These functions are meant to count isolates. Use the resistance()/susceptibility() functions to calculate microbial resistance/susceptibility.
The function count_resistant() is equal to the function count_R(). The function count_susceptible() is equal to the function count_SI().
The function n_rsi() is an alias of count_all(). They can be used to count all available isolates, i.e. where all input antibiotics have an available result (S, I or R). They are used analogously to n_distinct(). Their result is equal to count_susceptible(...) + count_resistant(...).
The function count_df() takes any variable from data that has an rsi class (created with as.rsi()) and counts the number of S's, I's and R's. It also supports grouped variables. The function rsi_df() works exactly like count_df(), but adds the percentage of S, I and R.
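For a single variable, the relations above can be checked with a few lines of base R. count_values() is a hypothetical helper, not a package function. The counts it returns mirror count_R(), count_SI() and count_all() for one vector, and missing results are never counted.

```r
# Hypothetical helper: count how many results fall in `values` (NA never matches)
count_values <- function(x, values) sum(x %in% values)

amx <- c("S", "S", "I", "R", "R", NA)
n_R   <- count_values(amx, "R")              # like count_R() / count_resistant()
n_SI  <- count_values(amx, c("S", "I"))      # like count_SI() / count_susceptible()
n_all <- count_values(amx, c("S", "I", "R")) # like count_all() / n_rsi()

# For one variable, susceptible + resistant equals all available isolates
stopifnot(n_SI + n_R == n_all)
# and a proportion multiplied by the total gives back the count
stopifnot(isTRUE(all.equal((n_SI / n_all) * n_all, n_SI)))
```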
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) decided to change the definitions of susceptibility testing categories R and S/I as shown below (http://www.eucast.org/newsiandr/).
R = Resistant
+A microorganism is categorised as Resistant when there is a high likelihood of therapeutic failure even when there is increased exposure. Exposure is a function of how the mode of administration, dose, dosing interval, infusion time, as well as distribution and excretion of the antimicrobial agent will influence the infecting organism at the site of infection.
S = Susceptible
+A microorganism is categorised as Susceptible, standard dosing regimen, when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I = Increased exposure, but still susceptible
+A microorganism is categorised as Susceptible, Increased exposure when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
This AMR package honours this new insight. Use susceptibility() (equal to proportion_SI()) to determine antimicrobial susceptibility and count_susceptible() (equal to count_SI()) to count susceptible isolates.
When using more than one variable for ... (= combination therapy), use only_all_tested to only count isolates that are tested for all antibiotics/variables that you test them for. See this example for two antibiotics, Drug A and Drug B, about how susceptibility() works to calculate the %SI:
--------------------------------------------------------------------
                    only_all_tested = FALSE  only_all_tested = TRUE
                    -----------------------  -----------------------
 Drug A    Drug B   include as  include as   include as  include as
                    numerator   denominator  numerator   denominator
--------  --------  ----------  -----------  ----------  -----------
 S or I    S or I       X            X           X            X
   R       S or I       X            X           X            X
  <NA>     S or I       X            X           -            -
 S or I      R          X            X           X            X
   R         R          -            X           -            X
  <NA>       R          -            -           -            -
 S or I    <NA>         X            X           -            -
   R       <NA>         -            -           -            -
  <NA>     <NA>         -            -           -            -
--------------------------------------------------------------------
Please note that, in combination therapies, the following applies for only_all_tested = TRUE:

count_S()      + count_I()      + count_R()      = count_all()
proportion_S() + proportion_I() + proportion_R() = 1

and that, in combination therapies, the following applies for only_all_tested = FALSE:

count_S()      + count_I()      + count_R()      >= count_all()
proportion_S() + proportion_I() + proportion_R() >= 1

Using only_all_tested has no impact when only using one antibiotic as input.
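The numerator and denominator rules in the table can be written down directly. The hypothetical helper combo_counts() below encodes the table for two drugs. It is derived from the table and not taken from the package source. The usage reproduces the nine table rows, with "S" standing in for "S or I".

```r
# Hypothetical helper derived from the table: numerator and denominator of %SI
combo_counts <- function(drug_a, drug_b, only_all_tested = FALSE) {
  m <- cbind(drug_a, drug_b)
  any_si     <- apply(m, 1, function(r) any(r %in% c("S", "I")))  # NA never matches
  all_tested <- apply(m, 1, function(r) !anyNA(r))
  if (only_all_tested) {
    c(numerator = sum(any_si & all_tested), denominator = sum(all_tested))
  } else {
    c(numerator = sum(any_si), denominator = sum(any_si | all_tested))
  }
}

# The nine rows of the table, in the same order
drug_a <- c("S", "R", NA, "S", "R", NA, "S", "R", NA)
drug_b <- c("S", "S", "S", "R", "R", "R", NA, NA, NA)
combo_counts(drug_a, drug_b, only_all_tested = FALSE)  # 5 of 6
combo_counts(drug_a, drug_b, only_all_tested = TRUE)   # 3 of 4
```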
proportion_* to calculate microbial resistance and susceptibility.
# example_isolates is a data set available in the AMR package.
?example_isolates

count_resistant(example_isolates$AMX)   # counts "R"
count_susceptible(example_isolates$AMX) # counts "S" and "I"
count_all(example_isolates$AMX)         # counts "S", "I" and "R"

# be more specific
count_S(example_isolates$AMX)
count_SI(example_isolates$AMX)
count_I(example_isolates$AMX)
count_IR(example_isolates$AMX)
count_R(example_isolates$AMX)

# Count all available isolates
count_all(example_isolates$AMX)
n_rsi(example_isolates$AMX)

# n_rsi() is an alias of count_all().
# Since it counts all available isolates, you can
# calculate back to count e.g. susceptible isolates.
# These results are the same:
count_susceptible(example_isolates$AMX)
susceptibility(example_isolates$AMX) * n_rsi(example_isolates$AMX)


if (require("dplyr")) {
  example_isolates %>%
    group_by(hospital_id) %>%
    summarise(R  = count_R(CIP),
              I  = count_I(CIP),
              S  = count_S(CIP),
              n1 = count_all(CIP), # the actual total; sum of all three
              n2 = n_rsi(CIP),     # same - analogous to n_distinct
              total = n())         # NOT the number of tested isolates!

  # Count co-resistance between amoxicillin/clav acid and gentamicin,
  # so we can see that combination therapy does a lot more than mono therapy.
  # Please mind that `susceptibility()` calculates percentages right away instead.
  example_isolates %>% count_susceptible(AMC) # 1433
  example_isolates %>% count_all(AMC)         # 1879

  example_isolates %>% count_susceptible(GEN) # 1399
  example_isolates %>% count_all(GEN)         # 1855

  example_isolates %>% count_susceptible(AMC, GEN) # 1764
  example_isolates %>% count_all(AMC, GEN)         # 1936

  # Get number of S+I vs. R immediately of selected columns
  example_isolates %>%
    select(AMX, CIP) %>%
    count_df(translate = FALSE)

  # It also supports grouping variables
  example_isolates %>%
    select(hospital_id, AMX, CIP) %>%
    group_by(hospital_id) %>%
    count_df(translate = FALSE)
}
Apply susceptibility rules as defined by the European Committee on Antimicrobial Susceptibility Testing (EUCAST, http://eucast.org), see Source. This includes (1) expert rules and intrinsic resistance and (2) inferred resistance as defined in their breakpoint tables.
To improve the interpretation of the antibiogram before EUCAST rules are applied, some non-EUCAST rules can be applied as well; see Details.
eucast_rules(
  x,
  col_mo = NULL,
  info = interactive(),
  rules = getOption("AMR.eucast_rules", default = c("breakpoints", "expert")),
  verbose = FALSE,
  ...
)
x | +data with antibiotic columns, like e.g. |
+
---|---|
col_mo | +column name of the IDs of the microorganisms (see |
+
info | +print progress |
+
rules | +a character vector that specifies which rules should be applied. Must be one or more of |
+
verbose | +a logical to turn Verbose mode on and off (default is off). In Verbose mode, the function does not apply rules to the data, but instead returns a data set in logbook form with extensive info about which rows and columns would be affected and in which way. |
+
... | +column name of an antibiotic, please see section Antibiotics below |
+
EUCAST Expert Rules. Version 2.0, 2012.
+Leclercq et al. EUCAST expert rules in antimicrobial susceptibility testing. Clin Microbiol Infect. 2013;19(2):141-60.
+https://doi.org/10.1111/j.1469-0691.2011.03703.x
EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes Tables. Version 3.1, 2016.
+http://www.eucast.org/fileadmin/src/media/PDFs/EUCAST_files/Expert_Rules/Expert_rules_intrinsic_exceptional_V3.1.pdf
EUCAST Breakpoint tables for interpretation of MICs and zone diameters. Version 9.0, 2019.
+http://www.eucast.org/fileadmin/src/media/PDFs/EUCAST_files/Breakpoint_tables/v_9.0_Breakpoint_Tables.xlsx
The input of x, possibly with edited values of antibiotics. Or, if verbose = TRUE, a data.frame with all original and new values of the affected bug-drug combinations.
Note: This function does not translate MIC values to RSI values. Use as.rsi() for that.
+Note: When ampicillin (AMP, J01CA01) is not available but amoxicillin (AMX, J01CA04) is, the latter will be used for all rules where there is a dependency on ampicillin. These drugs are interchangeable when it comes to expression of antimicrobial resistance.
Before further processing, some non-EUCAST rules can be applied to improve the efficacy of the EUCAST rules. These non-EUCAST rules, which are then applied to all isolates, are:
Inherit amoxicillin (AMX) from ampicillin (AMP), where amoxicillin (AMX) is unavailable;
Inherit ampicillin (AMP) from amoxicillin (AMX), where ampicillin (AMP) is unavailable;
Set amoxicillin (AMX) = R where amoxicillin/clavulanic acid (AMC) = R;
Set piperacillin (PIP) = R where piperacillin/tazobactam (TZP) = R;
Set trimethoprim (TMP) = R where trimethoprim/sulfamethoxazole (SXT) = R;
Set amoxicillin/clavulanic acid (AMC) = S where amoxicillin (AMX) = S;
Set piperacillin/tazobactam (TZP) = S where piperacillin (PIP) = S;
Set trimethoprim/sulfamethoxazole (SXT) = S where trimethoprim (TMP) = S.
These rules are not applied by default, since they are not approved by EUCAST. To use these rules, please use eucast_rules(..., rules = "all"), or set the default behaviour of the eucast_rules() function with options(AMR.eucast_rules = "all") (or any other valid input value(s) to the rules parameter).
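The base R sketch below applies the non-EUCAST rules listed above in the order in which they are listed. It is a hypothetical illustration and not the code of eucast_rules(). In particular, "unavailable" is assumed to mean a missing value (NA) in that row, and set_where() and apply_other_rules_sketch() are made-up names.

```r
# Hypothetical helper: set `target` to `when` in rows where `source` equals `when`
set_where <- function(df, target, source, when) {
  if (all(c(target, source) %in% names(df))) {
    idx <- !is.na(df[[source]]) & df[[source]] == when
    df[[target]][idx] <- when
  }
  df
}

apply_other_rules_sketch <- function(df) {
  if (all(c("AMX", "AMP") %in% names(df))) {
    df$AMX <- ifelse(is.na(df$AMX), df$AMP, df$AMX)  # inherit AMX from AMP
    df$AMP <- ifelse(is.na(df$AMP), df$AMX, df$AMP)  # inherit AMP from AMX
  }
  df <- set_where(df, "AMX", "AMC", "R")  # AMC = R -> AMX = R
  df <- set_where(df, "PIP", "TZP", "R")  # TZP = R -> PIP = R
  df <- set_where(df, "TMP", "SXT", "R")  # SXT = R -> TMP = R
  df <- set_where(df, "AMC", "AMX", "S")  # AMX = S -> AMC = S
  df <- set_where(df, "TZP", "PIP", "S")  # PIP = S -> TZP = S
  df <- set_where(df, "SXT", "TMP", "S")  # TMP = S -> SXT = S
  df
}

isolates <- data.frame(AMP = c("R", NA, "S"),
                       AMX = c(NA, "S", NA),
                       AMC = c("R", NA, NA),
                       stringsAsFactors = FALSE)
apply_other_rules_sketch(isolates)
```

Rules whose columns are absent (PIP/TZP and TMP/SXT here) are silently skipped.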
The file containing all EUCAST rules is located here: https://github.com/msberends/AMR/blob/master/data-raw/eucast_rules.tsv.
To define antibiotic column names, leave as is to determine them automatically with guess_ab_col(), or input a text (case-insensitive), or use NULL to skip a column (e.g. TIC = NULL to skip ticarcillin). Manually defined but non-existing columns will be skipped with a warning.
The following antibiotics are used for the functions eucast_rules() and mdro(). These are shown below in the format 'antimicrobial ID: name (ATC code)', sorted by name:
AMK: amikacin (J01GB06),
AMX: amoxicillin (J01CA04),
AMC: amoxicillin/clavulanic acid (J01CR02),
AMP: ampicillin (J01CA01),
SAM: ampicillin/sulbactam (J01CR01),
AZM: azithromycin (J01FA10),
AZL: azlocillin (J01CA09),
ATM: aztreonam (J01DF01),
CAP: capreomycin (J04AB30),
RID: cefaloridine (J01DB02),
CZO: cefazolin (J01DB04),
FEP: cefepime (J01DE01),
CTX: cefotaxime (J01DD01),
CTT: cefotetan (J01DC05),
FOX: cefoxitin (J01DC01),
CPT: ceftaroline (J01DI02),
CAZ: ceftazidime (J01DD02),
CRO: ceftriaxone (J01DD04),
CXM: cefuroxime (J01DC02),
CED: cephradine (J01DB09),
CHL: chloramphenicol (J01BA01),
CIP: ciprofloxacin (J01MA02),
CLR: clarithromycin (J01FA09),
CLI: clindamycin (J01FF01),
COL: colistin (J01XB01),
DAP: daptomycin (J01XX09),
DOR: doripenem (J01DH04),
DOX: doxycycline (J01AA02),
ETP: ertapenem (J01DH03),
ERY: erythromycin (J01FA01),
ETH: ethambutol (J04AK02),
FLC: flucloxacillin (J01CF05),
FOS: fosfomycin (J01XX01),
FUS: fusidic acid (J01XC01),
GAT: gatifloxacin (J01MA16),
GEN: gentamicin (J01GB03),
GEH: gentamicin-high (no ATC code),
IPM: imipenem (J01DH51),
INH: isoniazid (J04AC01),
KAN: kanamycin (J01GB04),
LVX: levofloxacin (J01MA12),
LIN: lincomycin (J01FF02),
LNZ: linezolid (J01XX08),
MEM: meropenem (J01DH02),
MTR: metronidazole (J01XD01),
MEZ: mezlocillin (J01CA10),
MNO: minocycline (J01AA08),
MFX: moxifloxacin (J01MA14),
NAL: nalidixic acid (J01MB02),
NEO: neomycin (J01GB05),
NET: netilmicin (J01GB07),
NIT: nitrofurantoin (J01XE01),
NOR: norfloxacin (J01MA06),
NOV: novobiocin (QJ01XX95),
OFX: ofloxacin (J01MA01),
OXA: oxacillin (J01CF04),
PEN: penicillin G (J01CE01),
PIP: piperacillin (J01CA12),
TZP: piperacillin/tazobactam (J01CR05),
PLB: polymyxin B (J01XB02),
PRI: pristinamycin (J01FG01),
PZA: pyrazinamide (J04AK01),
QDA: quinupristin/dalfopristin (J01FG02),
RIB: rifabutin (J04AB04),
RIF: rifampicin (J04AB02),
RFP: rifapentine (J04AB05),
RXT: roxithromycin (J01FA06),
SIS: sisomicin (J01GB08),
STH: streptomycin-high (no ATC code),
TEC: teicoplanin (J01XA02),
TLV: telavancin (J01XA03),
TCY: tetracycline (J01AA07),
TIC: ticarcillin (J01CA13),
TCC: ticarcillin/clavulanic acid (J01CR03),
TGC: tigecycline (J01AA12),
TOB: tobramycin (J01GB01),
TMP: trimethoprim (J01EA01),
SXT: trimethoprim/sulfamethoxazole (J01EE01),
VAN: vancomycin (J01XA01).
+
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
# \donttest{
a <- data.frame(mo = c("Staphylococcus aureus",
                       "Enterococcus faecalis",
                       "Escherichia coli",
                       "Klebsiella pneumoniae",
                       "Pseudomonas aeruginosa"),
                VAN = "-",       # Vancomycin
                AMX = "-",       # Amoxicillin
                COL = "-",       # Colistin
                CAZ = "-",       # Ceftazidime
                CXM = "-",       # Cefuroxime
                PEN = "S",       # Penicillin G
                FOX = "S",       # Cefoxitin
                stringsAsFactors = FALSE)

a
#                       mo VAN AMX COL CAZ CXM PEN FOX
# 1  Staphylococcus aureus   -   -   -   -   -   S   S
# 2  Enterococcus faecalis   -   -   -   -   -   S   S
# 3       Escherichia coli   -   -   -   -   -   S   S
# 4  Klebsiella pneumoniae   -   -   -   -   -   S   S
# 5 Pseudomonas aeruginosa   -   -   -   -   -   S   S


# apply EUCAST rules: 18 results are forced as R or S
b <- eucast_rules(a)

b
#                       mo VAN AMX COL CAZ CXM PEN FOX
# 1  Staphylococcus aureus   -   S   R   R   S   S   S
# 2  Enterococcus faecalis   -   -   R   R   R   S   R
# 3       Escherichia coli   R   -   -   -   -   R   S
# 4  Klebsiella pneumoniae   R   R   -   -   -   R   S
# 5 Pseudomonas aeruginosa   R   R   -   -   R   R   R


# do not apply EUCAST rules, but rather get a data.frame
# with 18 rows, containing all details about the transformations:
c <- eucast_rules(a, verbose = TRUE)
# }
A data set containing 2,000 microbial isolates with their full antibiograms. The data set reflects reality and can be used to practice AMR analysis. For examples, please read the tutorial on our website.
+example_isolates
+
+
A data.frame with 2,000 observations and 49 variables:
date
date of receipt at the laboratory
hospital_id
ID of the hospital, from A to D
ward_icu
logical to determine if ward is an intensive care unit
ward_clinical
logical to determine if ward is a regular clinical ward
ward_outpatient
logical to determine if ward is an outpatient clinic
age
age of the patient
gender
gender of the patient
patient_id
ID of the patient
mo
ID of microorganism created with as.mo(), see also microorganisms
PEN:RIF
40 different antibiotics with class rsi (see as.rsi()); these column names occur in the antibiotics data set and can be translated with ab_name()
A data set containing 3,000 microbial isolates that are not cleaned up and consequently not ready for AMR analysis. This data set can be used for practice.
+example_isolates_unclean
+
+
A data.frame with 3,000 observations and 8 variables:
patient_id
ID of the patient
date
date of receipt at the laboratory
hospital
ID of the hospital, from A to C
bacteria
info about microorganism that can be transformed with as.mo(), see also microorganisms
AMX:GEN
4 different antibiotics that have to be transformed with as.rsi()
+ +R/filter_ab_class.R
+ filter_ab_class.Rd
Filter isolates on results in specific antimicrobial classes. This makes it easy to filter on isolates that were tested for e.g. any aminoglycoside, or to filter on carbapenem-resistant isolates without the need to specify the drugs.
filter_ab_class(x, ab_class, result = NULL, scope = "any", ...)

filter_aminoglycosides(x, result = NULL, scope = "any", ...)

filter_carbapenems(x, result = NULL, scope = "any", ...)

filter_cephalosporins(x, result = NULL, scope = "any", ...)

filter_1st_cephalosporins(x, result = NULL, scope = "any", ...)

filter_2nd_cephalosporins(x, result = NULL, scope = "any", ...)

filter_3rd_cephalosporins(x, result = NULL, scope = "any", ...)

filter_4th_cephalosporins(x, result = NULL, scope = "any", ...)

filter_5th_cephalosporins(x, result = NULL, scope = "any", ...)

filter_fluoroquinolones(x, result = NULL, scope = "any", ...)

filter_glycopeptides(x, result = NULL, scope = "any", ...)

filter_macrolides(x, result = NULL, scope = "any", ...)

filter_penicillins(x, result = NULL, scope = "any", ...)

filter_tetracyclines(x, result = NULL, scope = "any", ...)
x | +a data set |
+
---|---|
ab_class | +an antimicrobial class, like |
+
result | +an antibiotic result: S, I or R (or a combination of more of them) |
+
scope | +the scope of which variables to check, can be |
+
... | +parameters passed on to |
+
All columns of x will be searched for known antibiotic names, abbreviations, brand names and codes (ATC, EARS-Net, WHO, etc.). This means that a filter function like filter_aminoglycosides() will include column names like 'gen', 'genta', 'J01GB03', 'tobra', 'Tobracin', etc.
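The scope argument decides whether any or all of the class columns must match. The minimal base R sketch below assumes that missing results never match and that the class columns are already known. filter_class_sketch() is hypothetical: the real functions find the columns themselves with the name matching described above. The four aminoglycoside codes are the ones used in the filter_at() example on this page.

```r
# Hypothetical helper: keep rows where ANY or ALL class columns hold `result`
filter_class_sketch <- function(x, cols, result = c("S", "I", "R"), scope = "any") {
  cols <- intersect(cols, names(x))  # ignore class members that are not present
  if (length(cols) == 0) return(x[0, , drop = FALSE])
  hits <- lapply(x[cols], function(col) col %in% result)  # NA never matches
  keep <- if (scope == "all") Reduce(`&`, hits) else Reduce(`|`, hits)
  x[keep, , drop = FALSE]
}

aminoglycosides <- c("GEN", "TOB", "AMK", "KAN")
isolates <- data.frame(GEN = c("R", "R", "S", NA),
                       TOB = c("R", "S", "S", "R"),
                       stringsAsFactors = FALSE)
filter_class_sketch(isolates, aminoglycosides, result = "R", scope = "any")
filter_class_sketch(isolates, aminoglycosides, result = "R", scope = "all")
```

Under this assumption, an untested (NA) column prevents a row from passing scope = "all". The package may treat untested columns differently.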
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
See antibiotic_class_selectors() for the select() equivalent.
if (FALSE) {
library(dplyr)

# filter on isolates that have any result for any aminoglycoside
example_isolates %>% filter_ab_class("aminoglycoside")
example_isolates %>% filter_aminoglycosides()

# this is essentially the same as (but without determination of column names):
example_isolates %>%
  filter_at(.vars = vars(c("GEN", "TOB", "AMK", "KAN")),
            .vars_predicate = any_vars(. %in% c("S", "I", "R")))

# filter on isolates that show resistance to ANY aminoglycoside
example_isolates %>% filter_aminoglycosides("R", "any")

# filter on isolates that show resistance to ALL aminoglycosides
example_isolates %>% filter_aminoglycosides("R", "all")

# filter on isolates that show resistance to
# any aminoglycoside and any fluoroquinolone
example_isolates %>%
  filter_aminoglycosides("R") %>%
  filter_fluoroquinolones("R")

# filter on isolates that show resistance to
# all aminoglycosides and all fluoroquinolones
example_isolates %>%
  filter_aminoglycosides("R", "all") %>%
  filter_fluoroquinolones("R", "all")
}
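The difference between scope = "any" and scope = "all" can be illustrated without the package. The base R sketch below is not the package's implementation. It uses a small hypothetical data frame whose columns GEN, TOB and AMK stand in for aminoglycoside results, and it skips column detection by naming the columns explicitly. Treating a missing result as "not R" is an assumption of this sketch.

```r
# Toy data (hypothetical): three aminoglycoside columns with S/I/R results
df <- data.frame(GEN = c("S", "R", "R", NA),
                 TOB = c("S", "R", "S", "R"),
                 AMK = c("S", "R", NA, "R"),
                 stringsAsFactors = FALSE)
ab_cols <- c("GEN", "TOB", "AMK")

# Logical matrix: TRUE where a result is "R"; NA results count as FALSE
is_r <- vapply(df[ab_cols], function(col) col %in% "R", logical(nrow(df)))

# scope = "any": at least one of the columns is R
any_r <- rowSums(is_r) > 0
# scope = "all": every column is R
all_r <- rowSums(is_r) == length(ab_cols)

df[any_r, ]  # rows 2, 3 and 4
df[all_r, ]  # row 2 only
```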
Determine first (weighted) isolates of all microorganisms of every patient per episode and (if needed) per specimen type.
first_isolate(
  x,
  col_date = NULL,
  col_patient_id = NULL,
  col_mo = NULL,
  col_testcode = NULL,
  col_specimen = NULL,
  col_icu = NULL,
  col_keyantibiotics = NULL,
  episode_days = 365,
  testcodes_exclude = NULL,
  icu_exclude = FALSE,
  specimen_group = NULL,
  type = "keyantibiotics",
  ignore_I = TRUE,
  points_threshold = 2,
  info = interactive(),
  include_unknown = FALSE,
  ...
)

filter_first_isolate(x, col_date = NULL, col_patient_id = NULL, col_mo = NULL, ...)

filter_first_weighted_isolate(x, col_date = NULL, col_patient_id = NULL, col_mo = NULL,
  col_keyantibiotics = NULL, ...)
Argument | Description |
---|---|
x | a |
col_date | column name of the result date (or the date it was received in the lab), defaults to the first column with a date class |
col_patient_id | column name of the unique IDs of the patients, defaults to the first column that starts with 'patient' or 'patid' (case insensitive) |
col_mo | column name of the IDs of the microorganisms (see |
col_testcode | column name of the test codes. Use |
col_specimen | column name of the specimen type or group |
col_icu | column name of the logicals ( |
col_keyantibiotics | column name of the key antibiotics to determine first weighted isolates, see |
episode_days | episode in days after which a genus/species combination will be determined as 'first isolate' again. The default of 365 days is based on the guideline by CLSI, see Source. |
testcodes_exclude | character vector with test codes that should be excluded (case-insensitive) |
icu_exclude | logical whether ICU isolates should be excluded (rows with value |
specimen_group | value in column |
type | type to determine weighted isolates; can be |
ignore_I | logical to determine whether antibiotic interpretations with |
points_threshold | points until the comparison of key antibiotics will lead to inclusion of an isolate when |
info | print progress |
include_unknown | logical to determine whether 'unknown' microorganisms should be included too, i.e. microbial code |
... | parameters passed on to the |
The methodology of this function is strictly based on:
M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition, 2014, Clinical and Laboratory Standards Institute (CLSI). https://clsi.org/standards/products/microbiology/documents/m39/.
A logical vector
WHY THIS IS SO IMPORTANT
To conduct an analysis of antimicrobial resistance, you should only include the first isolate of every patient per episode (ref). If you do not, you could easily overestimate or underestimate the resistance to an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following week. The resistance percentage of oxacillin of all S. aureus isolates would be overestimated, because you included this MRSA more than once. This is selection bias.
All isolates with a microbial ID of NA will be excluded as first isolates.
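The episode logic can be sketched in base R. The helper first_isolate_sketch() below is hypothetical and not part of the package. It looks only at patient, microorganism and date. Within each patient/microorganism combination, the earliest isolate is a first isolate. A later isolate counts again once at least episode_days have passed since the start of the current episode; this way of counting the window is an assumption of the sketch. Specimen, ICU, test codes and key antibiotics are all ignored.

```r
# Hypothetical, simplified first-isolate rule (not the AMR implementation)
first_isolate_sketch <- function(patient_id, mo, date, episode_days = 365) {
  keep <- logical(length(date))        # all FALSE to start with
  ord <- order(patient_id, mo, date)   # walk each patient/microorganism by date
  prev_key <- NULL
  episode_start <- NULL
  for (i in ord) {
    if (is.na(mo[i])) next             # unknown microorganism: never a first isolate
    key <- paste(patient_id[i], mo[i])
    new_combination <- !identical(key, prev_key)
    if (new_combination ||
        as.numeric(date[i]) - as.numeric(episode_start) >= episode_days) {
      keep[i] <- TRUE                  # this isolate starts a new episode
      episode_start <- date[i]
    }
    prev_key <- key
  }
  keep
}

isolates <- data.frame(
  patient = c("p1", "p1", "p1", "p2", "p1", "p1"),
  mo      = c("E. coli", "E. coli", "S. aureus", "E. coli", "E. coli", NA),
  date    = as.Date(c("2020-01-01", "2020-01-10", "2020-01-15",
                      "2020-01-20", "2021-02-01", "2020-03-01")),
  stringsAsFactors = FALSE
)
isolates$first <- first_isolate_sketch(isolates$patient, isolates$mo, isolates$date)
isolates$first  # TRUE FALSE TRUE TRUE TRUE FALSE
```

The second E. coli isolate of p1 is skipped because only 9 days have passed. The one from 2021 counts again because 397 days have passed since the start of the episode.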
The functions filter_first_isolate() and filter_first_weighted_isolate() are helper functions to quickly filter on first isolates. The function filter_first_isolate() is essentially equal to:
x %>% filter(first_isolate(., ...))
The function filter_first_weighted_isolate() is essentially equal to:
x %>%
  mutate(keyab = key_antibiotics(.)) %>%
  # use `.` (not `x`) so that the new `keyab` column is available
  mutate(only_weighted_firsts = first_isolate(.,
                                              col_keyantibiotics = "keyab", ...)) %>%
  filter(only_weighted_firsts == TRUE) %>%
  select(-only_weighted_firsts, -keyab)
There are two ways to determine whether isolates can be included as first weighted isolates, which will generally give the same results:
1. Using type = "keyantibiotics" and parameter ignore_I
Any difference from S to R (or vice versa) will (re)select an isolate as a first weighted isolate. With ignore_I = FALSE, differences from I to S|R (or vice versa) will also lead to this. This is a reliable method and 30-35 times faster than method 2. Read more about this in the key_antibiotics() function.
2. Using type = "points" and parameter points_threshold
A difference from I to S|R (or vice versa) means 0.5 points, and a difference from S to R (or vice versa) means 1 point. When the sum of points exceeds points_threshold, which defaults to 2, an isolate will be (re)selected as a first weighted isolate.
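Both comparison rules can be sketched in base R for two isolates of the same patient and species, given their results for the same key antibiotics. The helper names below are hypothetical and are not package functions. Antibiotics without a result in either isolate are skipped, which is an assumption of this sketch. If S, I and R are scored as 1, 2 and 3 and the absolute difference is halved, a change involving I gives exactly 0.5 points and a change between S and R gives 1 point. Following the wording above, the points rule reselects an isolate when the total strictly exceeds points_threshold.

```r
# Hypothetical helpers illustrating the two rules (not the AMR implementation)
rsi_score <- c(S = 1, I = 2, R = 3)

# type = "points": 0.5 per change involving I, 1 per change between S and R
weighted_points <- function(a, b) {
  tested <- !is.na(a) & !is.na(b)   # compare only antibiotics tested in both
  sum(abs(rsi_score[a[tested]] - rsi_score[b[tested]])) / 2
}
reselect_by_points <- function(a, b, points_threshold = 2) {
  weighted_points(a, b) > points_threshold
}

# type = "keyantibiotics": any S <-> R change reselects; with ignore_I = FALSE,
# changes involving I count as well
reselect_by_keyab <- function(a, b, ignore_I = TRUE) {
  tested <- !is.na(a) & !is.na(b)
  a <- a[tested]
  b <- b[tested]
  if (ignore_I) {
    keep <- a != "I" & b != "I"     # drop pairs in which either result is I
    a <- a[keep]
    b <- b[keep]
  }
  any(a != b)
}

previous <- c("S", "S", "I", "R")
current  <- c("R", "S", "S", "R")
weighted_points(previous, current)     # 1.5
reselect_by_points(previous, current)  # FALSE (1.5 does not exceed 2)
reselect_by_keyab(previous, current)   # TRUE (S -> R for the first antibiotic)
```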
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and will first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed into an error.
# `example_isolates` is a dataset available in the AMR package.
# See ?example_isolates.

if (FALSE) {
library(dplyr)
# Filter on first isolates:
example_isolates %>%
  mutate(first_isolate = first_isolate(.)) %>%
  filter(first_isolate == TRUE)

# Now let's see if first isolates matter:
A <- example_isolates %>%
  group_by(hospital_id) %>%
  summarise(count = n_rsi(GEN),            # gentamicin availability
            resistance = resistance(GEN))  # gentamicin resistance

B <- example_isolates %>%
  filter_first_weighted_isolate() %>%      # the 1st isolate filter
  group_by(hospital_id) %>%
  summarise(count = n_rsi(GEN),            # gentamicin availability
            resistance = resistance(GEN))  # gentamicin resistance

# Have a look at A and B.
# B is more reliable because every isolate is counted only once.
# Gentamicin resistance in hospital D appears to be 3.7% higher than
# when you (erroneously) would have used all isolates for analysis.

## OTHER EXAMPLES:

# Shorthand versions:
example_isolates %>%
  filter_first_isolate()

example_isolates %>%
  filter_first_weighted_isolate()

# use the example data set as `x`
x <- example_isolates

# set key antibiotics to a new variable
x$keyab <- key_antibiotics(x)

x$first_isolate <- first_isolate(x)

x$first_isolate_weighted <- first_isolate(x, col_keyantibiotics = "keyab")

x$first_blood_isolate <- first_isolate(x, specimen_group = "Blood")
}
g.test() performs chi-squared contingency table tests and goodness-of-fit tests, just like chisq.test(), but is more reliable (1). A G-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a G-test of goodness-of-fit), or to see whether the proportions of one variable are different for different values of the other variable (called a G-test of independence).
g.test(x, y = NULL, p = rep(1/length(x), length(x)), rescale.p = FALSE)
Argument | Description |
---|---|
x | a numeric vector or matrix. |
y | a numeric vector; ignored if |
p | a vector of probabilities of the same length as |
rescale.p | a logical scalar; if TRUE then |
The code for this function is identical to that of chisq.test(), except that:
- The calculation of the statistic was changed to \(2 \sum_i x_i \ln(x_i / E_i)\)
- Yates' continuity correction was removed, as it does not apply to a G-test
- The possibility to simulate p values with simulate.p.value was removed
A list with class "htest" containing the following components:
- the value of the G test statistic.
- the degrees of freedom of the approximate chi-squared distribution of the test statistic.
- the p-value for the test.
- a character string indicating the type of test performed.
- a character string giving the name(s) of the data.
- the observed counts.
- the expected counts under the null hypothesis.
- the Pearson residuals, (observed - expected) / sqrt(expected).
- standardized residuals, (observed - expected) / sqrt(V), where V is the residual cell variance (Agresti, 2007, section 2.4.5 for the case where x is a matrix, n * p * (1 - p) otherwise).
If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then a G-test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.
The p-value is computed from the asymptotic chi-squared distribution of the test statistic.
The following notes on simulation are inherited from the chisq.test() documentation. Since g.test() does not offer simulate.p.value, they apply only to chisq.test().
In the contingency table case, simulation is done by random sampling from the set of all contingency tables with given marginals, and works only if the marginals are strictly positive. Note that this is not the usual sampling situation assumed for a chi-squared test (like the G-test) but rather that for Fisher's exact test.
In the goodness-of-fit case, simulation is done by random sampling from the discrete distribution specified by p, each sample being of size n = sum(x). This simulation is done in R and may be slow.
Use the G-test of goodness-of-fit when you have one nominal variable with two or more values (such as male and female, or red, pink and white flowers). You compare the observed number of observations in each category with the expected counts, which you calculate using some kind of theoretical expectation (such as a 1:1 sex ratio or a 1:2:1 ratio in a genetic cross).
If the expected number of observations in any category is too small, the G-test may give inaccurate results, and you should use an exact test instead (fisher.test()).
The G-test of goodness-of-fit is an alternative to the chi-square test of goodness-of-fit (chisq.test()); each of these tests has some advantages and some disadvantages, and the results of the two tests are usually very similar.
Use the G-test of independence when you have two nominal variables, each with two or more possible values. You want to know whether the proportions for one variable are different among values of the other variable.
It is also possible to do a G-test of independence with more than two nominal variables. For example, Jackson et al. (2013) also had data for children under 3, so you could do an analysis of old vs. young, thigh vs. arm, and reaction vs. no reaction, all analyzed together.
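For a two-way G-test of independence, the expected count of each cell is its row total times its column total, divided by the grand total. The statistic is then compared with a chi-square distribution with (rows - 1) * (columns - 1) degrees of freedom. The base R sketch below applies this to a 2x2 table of made-up counts, which are not from Jackson et al. The helper g_independence() is illustrative and not a package function, and it assumes that all observed counts are positive.

```r
# Illustrative G-test of independence for a two-way table of counts
g_independence <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  G <- 2 * sum(tab * log(tab / expected))   # assumes no zero counts
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(statistic = G, df = df,
       p.value = stats::pchisq(G, df, lower.tail = FALSE))
}

# Hypothetical counts: rows = groups, columns = outcome (yes / no)
tab <- matrix(c(10, 20, 30, 40), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
res <- g_independence(tab)
res$statistic  # about 0.804
res$df         # 1
```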
Fisher's exact test (fisher.test()) is an exact test, whereas the G-test is only an approximation. For any 2x2 table, Fisher's exact test may be slower, but it will still run in seconds, even if the sum of your observations is several million.
The G-test of independence is an alternative to the chi-square test of independence (chisq.test()), and they will give approximately the same results.
Unlike the exact test of goodness-of-fit (fisher.test()), the G-test does not directly calculate the probability of obtaining the observed results or something more extreme. Instead, like almost all statistical tests, the G-test has an intermediate step; it uses the data to calculate a test statistic that measures how far the observed data are from the null expectation. You then use a mathematical relationship, in this case the chi-square distribution, to estimate the probability of obtaining that value of the test statistic.
The G-test uses the log of the ratio of two likelihoods as the test statistic, which is why it is also called a likelihood ratio test or log-likelihood ratio test. The formula to calculate a G-statistic is:
\(G = 2 \sum_i x_i \ln(x_i / E_i)\)
where \(x_i\) are the observed counts and \(E_i\) are the expected counts. Since G approximately follows a chi-square distribution, the p value can be calculated in R with:
p <- stats::pchisq(G, df, lower.tail = FALSE)
where df is the number of degrees of freedom.
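The formula and the pchisq() call fit into a few lines of base R. The sketch below uses an illustrative name, g_gof(), which is not a package function. It recomputes the first example in the Examples below, 772, 1611 and 737 plants against a 1:2:1 ratio. It uses k - 1 degrees of freedom for k categories, which is the usual choice when the expected proportions are fixed in advance.

```r
# Illustrative G-test of goodness-of-fit using G = 2 * sum(x * log(x / E))
g_gof <- function(x, p = rep(1 / length(x), length(x))) {
  E <- sum(x) * p                   # expected counts
  G <- 2 * sum(x * log(x / E))      # assumes no zero counts
  df <- length(x) - 1
  c(G = G, df = df, p.value = stats::pchisq(G, df, lower.tail = FALSE))
}

g_gof(c(772, 1611, 737), p = c(1, 2, 1) / 4)
# G is about 4.15 with 2 degrees of freedom; p is about 0.126
```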
If there are more than two categories and you want to find out which ones are significantly different from their null expectation, you can use the same method of testing each category vs. the sum of all categories, with the Bonferroni correction. You use G-tests for each category, of course.
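The post hoc procedure described above can be sketched in base R. The sketch assumes that each category is compared with all other categories pooled, which gives a two-category G-test with 1 degree of freedom. It also assumes that the resulting p values are adjusted with stats::p.adjust(method = "bonferroni"). The helper name g_each_vs_rest() is hypothetical, and p must sum to 1.

```r
# Hypothetical post hoc helper: each category vs. the rest, Bonferroni-adjusted
g_each_vs_rest <- function(x, p) {
  n <- sum(x)
  raw <- vapply(seq_along(x), function(i) {
    obs <- c(x[i], n - x[i])        # this category vs. all others
    E <- n * c(p[i], 1 - p[i])      # expected counts under the null
    G <- 2 * sum(obs * log(obs / E))
    stats::pchisq(G, df = 1, lower.tail = FALSE)
  }, numeric(1))
  data.frame(category = seq_along(x), p.raw = raw,
             p.bonferroni = stats::p.adjust(raw, method = "bonferroni"))
}

g_each_vs_rest(c(772, 1611, 737), p = c(1, 2, 1) / 4)
```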
The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR package at all.
(1) McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House Publishing, Baltimore, Maryland. http://www.biostathandbook.com/gtestgof.html.
# = EXAMPLE 1 =
# Shivrain et al. (2006) crossed clearfield rice (which is resistant
# to the herbicide imazethapyr) with red rice (which is susceptible to
# imazethapyr). They then crossed the hybrid offspring and examined the
# F2 generation, where they found 772 resistant plants, 1611 moderately
# resistant plants, and 737 susceptible plants. If resistance is controlled
# by a single gene with two co-dominant alleles, you would expect a 1:2:1
# ratio.

x <- c(772, 1611, 737)
G <- g.test(x, p = c(1, 2, 1) / 4)
# G$p.value = 0.12574.

# There is no significant difference from a 1:2:1 ratio.
# Meaning: resistance controlled by a single gene with two co-dominant
# alleles is plausible.

# = EXAMPLE 2 =
# Red crossbills (Loxia curvirostra) have the tip of the upper bill either
# right or left of the lower bill, which helps them extract seeds from pine
# cones. Some have hypothesized that frequency-dependent selection would
# keep the number of right and left-billed birds at a 1:1 ratio. Groth (1992)
# observed 1752 right-billed and 1895 left-billed crossbills.

x <- c(1752, 1895)
g.test(x)
# p = 0.01787343

# There is a significant difference from a 1:1 ratio.
# Meaning: there are significantly more left-billed birds.
PCA biplot with ggplot2 (ggplot_pca)
Produces a ggplot2 variant of a so-called biplot for PCA (principal component analysis), but is more flexible and more appealing than the base R biplot() function.
ggplot_pca(
  x,
  choices = 1:2,
  scale = TRUE,
  pc.biplot = TRUE,
  labels = NULL,
  labels_textsize = 3,
  labels_text_placement = 1.5,
  groups = NULL,
  ellipse = TRUE,
  ellipse_prob = 0.68,
  ellipse_size = 0.5,
  ellipse_alpha = 0.5,
  points_size = 2,
  points_alpha = 0.25,
  arrows = TRUE,
  arrows_colour = "darkblue",
  arrows_size = 0.5,
  arrows_textsize = 3,
  arrows_alpha = 0.75,
  base_textsize = 10,
  ...
)
Argument | Description |
---|---|
x | an object returned by |
choices | length 2 vector specifying the components to plot. Only the default is a biplot in the strict sense. |
scale | The variables are scaled by |
pc.biplot | If true, use what Gabriel (1971) refers to as a "principal component biplot", with |
labels | an optional vector of labels for the observations. If set, the labels will be placed below their respective points. When using the |
labels_textsize | the size of the text used for the labels |
labels_text_placement | adjustment factor for the placement of the variable names ( |
groups | an optional vector of groups for the labels, with the same length as |
ellipse | a logical to indicate whether a normal data ellipse should be drawn for each group (set with |
ellipse_prob | statistical size of the ellipse in normal probability |
ellipse_size | the size of the ellipse line |
ellipse_alpha | the alpha (transparency) of the ellipse line |
points_size | the size of the points |
points_alpha | the alpha (transparency) of the points |
arrows | a logical to indicate whether arrows should be drawn |
arrows_colour | the colour of the arrows and their text |
arrows_size | the size (thickness) of the arrow lines |
arrows_textsize | the size of the text at the end of the arrows |
arrows_alpha | the alpha (transparency) of the arrows and their text |
base_textsize | the text size for all plot elements except the labels and arrows |
... | Parameters passed on to functions |
The ggplot_pca() function is based on the ggbiplot() function from the ggbiplot package by Vince Vu, as found on GitHub: https://github.com/vqv/ggbiplot (retrieved: 2 March 2020, their latest commit: 7325e88; 12 February 2015).
As per their GPL-2 licence, which demands documentation of code changes, the changes made based on the source code were:
- Rewritten code to remove the dependency on packages plyr, scales and grid
- Parametrised more options, like arrow and ellipse settings
- Added total amount of explained variance as a caption in the plot
- Cleaned all syntax based on the lintr package and added integrity checks
- Updated documentation
The colours for labels and points can be changed by adding another scale layer for colour, like scale_colour_viridis_d() or scale_colour_brewer().
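The explained-variance caption mentioned in the list of changes rests on a simple quantity: the variance of each component (its squared standard deviation) divided by the total variance. The base R sketch below computes it with stats::prcomp() on the built-in USArrests data, purely as an illustration. The exact caption text that ggplot_pca() prints is not reproduced here, so the sprintf() format is an assumption.

```r
# Proportion of variance explained per principal component
pca <- stats::prcomp(USArrests, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

choices <- 1:2   # the components shown in a biplot, as with choices = 1:2 above
caption <- sprintf("Total explained variance: %.1f%%",
                   100 * sum(var_explained[choices]))
caption
```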
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
# `example_isolates` is a dataset available in the AMR package.
# See ?example_isolates.

if (FALSE) {
# See ?pca for more info about Principal Component Analysis (PCA).
library(dplyr)
pca_model <- example_isolates %>%
  filter(mo_genus(mo) == "Staphylococcus") %>%
  group_by(species = mo_shortname(mo)) %>%
  summarise_if(is.rsi, resistance) %>%
  pca(FLC, AMC, CXM, GEN, TOB, TMP, SXT, CIP, TEC, TCY, ERY)

# old
biplot(pca_model)

# new
ggplot_pca(pca_model)
}
AMR plots with ggplot2 (ggplot_rsi)
Use these functions to create bar plots for antimicrobial resistance analysis. All functions rely on ggplot2 functions.
ggplot_rsi(
  data,
  position = NULL,
  x = "antibiotic",
  fill = "interpretation",
  facet = NULL,
  breaks = seq(0, 1, 0.1),
  limits = NULL,
  translate_ab = "name",
  combine_SI = TRUE,
  combine_IR = FALSE,
  language = get_locale(),
  nrow = NULL,
  colours = c(S = "#61a8ff", SI = "#61a8ff", I = "#61f7ff", IR = "#ff6961",
    R = "#ff6961"),
  datalabels = TRUE,
  datalabels.size = 2.5,
  datalabels.colour = "gray15",
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  x.title = "Antimicrobial",
  y.title = "Proportion",
  ...
)

geom_rsi(
  position = NULL,
  x = c("antibiotic", "interpretation"),
  fill = "interpretation",
  translate_ab = "name",
  language = get_locale(),
  combine_SI = TRUE,
  combine_IR = FALSE,
  ...
)

facet_rsi(facet = c("interpretation", "antibiotic"), nrow = NULL)

scale_y_percent(breaks = seq(0, 1, 0.1), limits = NULL)

scale_rsi_colours(
  colours = c(S = "#61a8ff", SI = "#61a8ff", I = "#61f7ff", IR = "#ff6961",
    R = "#ff6961")
)

theme_rsi()

labels_rsi_count(
  position = NULL,
  x = "antibiotic",
  translate_ab = "name",
  combine_SI = TRUE,
  combine_IR = FALSE,
  datalabels.size = 3,
  datalabels.colour = "gray15"
)
Argument | Description |
---|---|
data | a |
position | position adjustment of bars, either |
x | variable to show on x axis, either |
fill | variable to categorise using the plot's legend, either |
facet | variable to split plots by, either |
breaks | numeric vector of positions |
limits | numeric vector of length two providing limits of the scale, use |
translate_ab | a column name of the antibiotics data set to translate the antibiotic abbreviations to, using |
combine_SI | a logical to indicate whether all values of S and I must be merged into one, so the output only consists of S+I vs. R (susceptible vs. resistant). This used to be the parameter |
combine_IR | a logical to indicate whether all values of I and R must be merged into one, so the output only consists of S vs. I+R (susceptible vs. non-susceptible). This is outdated, see parameter |
language | language of the returned text, defaults to system language (see |
nrow | (when using |
colours | a named vector with colours for the bars. The names must be one or more of: S, SI, I, IR, R or be |
datalabels | show datalabels using |
datalabels.size | size of the datalabels |
datalabels.colour | colour of the datalabels |
title | text to show as title of the plot |
subtitle | text to show as subtitle of the plot |
caption | text to show as caption of the plot |
x.title | text to show as x axis description |
y.title | text to show as y axis description |
... | other parameters passed on to |
By default, the names of antibiotics will be shown on the plots using ab_name(). This can be set with the translate_ab parameter. See count_df().
geom_rsi() will take any variable from the data that has an rsi class (created with as.rsi()) using rsi_df() and will plot bars with the percentage R, I and S. The default behaviour is to have the bars stacked and to have the different antibiotics on the x axis.
facet_rsi() creates 2D plots (by default based on S/I/R) using ggplot2::facet_wrap().
scale_y_percent() transforms the y axis to a 0 to 100% range using ggplot2::scale_y_continuous().
scale_rsi_colours() sets colours to the bars: pastel blue for S, pastel turquoise for I and pastel red for R, using ggplot2::scale_fill_manual().
theme_rsi() is a ggplot2 theme (see ggplot2::theme()) with minimal distraction.
labels_rsi_count() prints data labels on the bars with the percentage and number of isolates using ggplot2::geom_text().
ggplot_rsi() is a wrapper around all the above functions that uses data as first input. This makes it possible to use this function after a pipe (%>%). See Examples.
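What the bars represent can be reproduced in base R: the proportions of S, I and R among the tested isolates for one antibiotic. With combine_SI = TRUE, S and I are merged into one susceptible group. The vector below is hypothetical. Missing results are dropped before the proportions are computed; this is an assumption of the sketch, not a statement about how rsi_df() works.

```r
# Hypothetical results for one antibiotic; NA = not tested
results <- c("S", "S", "I", "R", "R", "S", NA)
tested <- factor(results[!is.na(results)], levels = c("S", "I", "R"))

# Separate S, I and R proportions
prop_sir <- prop.table(table(tested))

# combine_SI = TRUE: S and I merged, leaving S+I vs. R
merged <- factor(ifelse(tested == "R", "R", "SI"), levels = c("SI", "R"))
prop_combined <- prop.table(table(merged))

round(prop_sir, 3)       # S 0.5, I 0.167, R 0.333
round(prop_combined, 3)  # SI 0.667, R 0.333
```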
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
if (require("ggplot2") && require("dplyr")) {

  # get antimicrobial results for drugs against a UTI:
  ggplot(example_isolates %>% select(AMX, NIT, FOS, TMP, CIP)) +
    geom_rsi()

  # prettify the plot using some additional functions:
  df <- example_isolates %>% select(AMX, NIT, FOS, TMP, CIP)
  ggplot(df) +
    geom_rsi() +
    scale_y_percent() +
    scale_rsi_colours() +
    labels_rsi_count() +
    theme_rsi()

  # or better yet, simplify this using the wrapper function - a single command:
  example_isolates %>%
    select(AMX, NIT, FOS, TMP, CIP) %>%
    ggplot_rsi()

  # get only proportions and no counts:
  example_isolates %>%
    select(AMX, NIT, FOS, TMP, CIP) %>%
    ggplot_rsi(datalabels = FALSE)

  # add other ggplot2 parameters as you like:
  example_isolates %>%
    select(AMX, NIT, FOS, TMP, CIP) %>%
    ggplot_rsi(width = 0.5,
               colour = "black",
               size = 1,
               linetype = 2,
               alpha = 0.25)

  example_isolates %>%
    select(AMX) %>%
    ggplot_rsi(colours = c(SI = "yellow"))

}

if (FALSE) {

# resistance of ciprofloxacin per age group
example_isolates %>%
  mutate(first_isolate = first_isolate(.)) %>%
  filter(first_isolate == TRUE,
         mo == as.mo("E. coli")) %>%
  # `age_groups()` is also a function of this package:
  group_by(age_group = age_groups(age)) %>%
  select(age_group,
         CIP) %>%
  ggplot_rsi(x = "age_group")

# for colourblind mode, use divergent colours from the viridis package:
example_isolates %>%
  select(AMX, NIT, FOS, TMP, CIP) %>%
  ggplot_rsi() + scale_fill_viridis_d()
# a shorter version which also adjusts data label colours:
example_isolates %>%
  select(AMX, NIT, FOS, TMP, CIP) %>%
  ggplot_rsi(colours = FALSE)

# it also supports groups (don't forget to use the group var on `x` or `facet`):
example_isolates %>%
  select(hospital_id, AMX, NIT, FOS, TMP, CIP) %>%
  group_by(hospital_id) %>%
  ggplot_rsi(x = "hospital_id",
             facet = "antibiotic",
             nrow = 1,
             title = "AMR of Anti-UTI Drugs Per Hospital",
             x.title = "Hospital",
             datalabels = FALSE)
}
This tries to find a column name in a data set based on information from the antibiotics data set. Also supports WHONET abbreviations.
guess_ab_col(x = NULL, search_string = NULL, verbose = FALSE)
Argument | Description |
---|---|
x |  |
search_string | a text to search for |
verbose | a logical to indicate whether additional info should be printed |
A column name of x, or NULL when no result is found.
You can look for an antibiotic (trade) name or abbreviation and it will search x and the antibiotics data set for any column containing a name or code of that antibiotic. Longer column names take precedence over shorter column names.
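The precedence rule can be illustrated with a tiny helper. The name pick_longest_match() is hypothetical and not part of the package, and the real function also matches names, synonyms and codes from the antibiotics data set. In this sketch, among the column names that start with a given code, the longest one wins.

```r
# Hypothetical illustration of "longer column names take precedence"
pick_longest_match <- function(col_names, code) {
  hits <- col_names[startsWith(toupper(col_names), toupper(code))]
  if (length(hits) == 0) return(NULL)   # nothing found
  hits[which.max(nchar(hits))]          # longest matching name wins
}

pick_longest_match(c("AMP_ED2", "AMP_ED20"), "AMP")  # "AMP_ED20"
pick_longest_match(c("AMX", "TCY"), "GEN")           # NULL
```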
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
df <- data.frame(amox = "S",
                 tetr = "R")

guess_ab_col(df, "amoxicillin")
# [1] "amox"
guess_ab_col(df, "J01AA07") # ATC code of tetracycline
# [1] "tetr"

guess_ab_col(df, "J01AA07", verbose = TRUE)
# Note: Using column `tetr` as input for "J01AA07".
# [1] "tetr"

# WHONET codes
df <- data.frame(AMP_ND10 = "R",
                 AMC_ED20 = "S")
guess_ab_col(df, "ampicillin")
# [1] "AMP_ND10"
guess_ab_col(df, "J01CR02")
# [1] "AMC_ED20"
guess_ab_col(df, as.ab("augmentin"))
# [1] "AMC_ED20"

# Longer names take precedence:
df <- data.frame(AMP_ED2 = "S",
                 AMP_ED20 = "S")
guess_ab_col(df, "ampicillin")
# [1] "AMP_ED20"
Cleaning your data
Functions for cleaning and optimising your data, to be able to add variables later on (like taxonomic properties) or to fix and extend antibiotic interpretations by applying EUCAST rules.
- Transform to antibiotic ID
- Class 'disk'
- Class 'mic'
- Transform to microorganism ID
- Class 'rsi'
- Apply EUCAST rules
- Retrieve antimicrobial drug names and doses from clinical text
- Guess antibiotic column
- User-defined reference data set for microorganisms
Enhancing your data
Functions to add new data to your existing data, such as the determination of first isolates, multi-drug resistant microorganisms (MDRO), getting properties of microorganisms or antibiotics, and determining the age of patients or dividing ages into age groups.
- Property of an antibiotic
- Split ages into age groups
- Age in years of individuals
- Get ATC properties from WHOCC website
- Determine first (weighted) isolates
- Join microorganisms to a data set
- Key antibiotics for first weighted isolates
- Determine multidrug-resistant organisms (MDRO)
- Property of a microorganism
- Symbol of a p-value
Analysing your data
Functions for conducting AMR analysis, like counting isolates, calculating resistance or susceptibility, or making plots.
- Calculate microbial resistance
- Count available isolates
- Check availability of columns
- Determine bug-drug combinations
- Predict antimicrobial resistance
- Principal Component Analysis (for AMR)
- Antibiotic class selectors
- Filter isolates on result in antimicrobial class
- G-test for Count Data
- AMR plots with ggplot2
- PCA biplot with ggplot2
- Kurtosis of the sample
- Skewness of the sample
Included data sets
Scientifically reliable references for microorganisms and antibiotics, and example data sets to use for practice.
- Data set with 67,150 microorganisms
- Data sets with 558 antimicrobials
- Data set with 2,000 example isolates
- Data set with unclean data
- Data set for R/SI interpretation
- Translation table with 5,582 common microorganism codes
- Data set with previously accepted taxonomic names
- Data set with 500 isolates - WHONET example
Background information
Some pages about our package and its external sources. Be sure to read our How To’s for more information about how to work with functions in this package.
- The AMR package
- The Catalogue of Life
- Version info of included Catalogue of Life
- WHOCC: WHO Collaborating Centre for Drug Statistics Methodology
- Lifecycles of functions in the AMR package
Other functions
These functions are mostly for internal use, but some of them may also be suitable for your analysis. Especially the ‘like’ function can be useful:
- Translate strings from AMR package
- Pattern Matching
Deprecated functions
These functions are deprecated, meaning that they will still work but show a warning with every use and will be removed in a future version.
- Deprecated functions
Join the data set microorganisms easily to an existing table or character vector.
inner_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
left_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
right_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
full_join_microorganisms(x, by = NULL, suffix = c("2", ""), ...)
semi_join_microorganisms(x, by = NULL, ...)
anti_join_microorganisms(x, by = NULL, ...)
x | +existing table to join, or character vector |
+
---|---|
by | +a variable to join by - if left empty will search for a column with class |
+
suffix | +if there are non-joined duplicate variables in |
+
... | +ignored |
+
Note: As opposed to the join()
functions of dplyr
, character
vectors are supported and, by default, existing columns will get the suffix "2"
and the newly joined columns will not get a suffix.
These functions rely on merge()
, a base R function to do joins.
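The note above describes these joins in terms of merge(). Here is a rough base R sketch of that idea. The helper left_join_lookup() is hypothetical and not part of the AMR package. It performs a left join with merge(all.x = TRUE, suffixes = c("2", "")), so a duplicated column from the existing table gets the suffix "2" and the newly joined column keeps its name. The lookup table uses made-up rows, not the real microorganisms data set.

```r
# Hypothetical helper for illustration only; not part of the AMR package.
left_join_lookup <- function(x, lookup, by = "mo") {
  # Like the package's join functions, also accept a plain character vector
  if (is.character(x)) {
    x <- data.frame(x, stringsAsFactors = FALSE)
    names(x) <- by
  }
  # suffixes = c("2", ""): duplicated columns from x get "2",
  # columns coming from `lookup` keep their original name
  merge(x, lookup, by = by, all.x = TRUE, sort = FALSE,
        suffixes = c("2", ""))
}

# Illustrative lookup table (made-up rows, not the real data set)
lookup <- data.frame(mo = c("B_ESCHR_COLI", "B_STPHY_AURS"),
                     genus = c("Escherichia", "Staphylococcus"),
                     stringsAsFactors = FALSE)
isolates_df <- data.frame(mo = c("B_STPHY_AURS", "B_ESCHR_COLI"),
                          genus = c("old value", "old value"),
                          stringsAsFactors = FALSE)

joined <- left_join_lookup(isolates_df, lookup)
names(joined)  # "mo" "genus2" "genus"
left_join_lookup("B_ESCHR_COLI", lookup)
```

The real functions join against the microorganisms data set and can search for the join column themselves. This sketch needs both the lookup table and the join column spelled out.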
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+On our website https://msberends.github.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which reads a lot easier than here in R) and an example analysis using WHONET data.
+
+left_join_microorganisms(as.mo("K. pneumoniae"))
+left_join_microorganisms("B_KLBSL_PNE")
+
+if (FALSE) {
+library(dplyr)
+example_isolates %>% left_join_microorganisms()
+
+df <- data.frame(date = seq(from = as.Date("2018-01-01"),
+                            to = as.Date("2018-01-07"),
+                            by = 1),
+                 bacteria = as.mo(c("S. aureus", "MRSA", "MSSA", "STAAUR",
+                                    "E. coli", "E. coli", "E. coli")),
+                 stringsAsFactors = FALSE)
+colnames(df)
+df_joined <- left_join_microorganisms(df, "bacteria")
+colnames(df_joined)
+}
+
These functions can be used to determine first isolates (see first_isolate()
). Using key antibiotics to determine first isolates is more reliable than determining them without key antibiotics. The isolates selected this way are called first weighted isolates.
key_antibiotics(
+  x,
+  col_mo = NULL,
+  universal_1 = guess_ab_col(x, "amoxicillin"),
+  universal_2 = guess_ab_col(x, "amoxicillin/clavulanic acid"),
+  universal_3 = guess_ab_col(x, "cefuroxime"),
+  universal_4 = guess_ab_col(x, "piperacillin/tazobactam"),
+  universal_5 = guess_ab_col(x, "ciprofloxacin"),
+  universal_6 = guess_ab_col(x, "trimethoprim/sulfamethoxazole"),
+  GramPos_1 = guess_ab_col(x, "vancomycin"),
+  GramPos_2 = guess_ab_col(x, "teicoplanin"),
+  GramPos_3 = guess_ab_col(x, "tetracycline"),
+  GramPos_4 = guess_ab_col(x, "erythromycin"),
+  GramPos_5 = guess_ab_col(x, "oxacillin"),
+  GramPos_6 = guess_ab_col(x, "rifampin"),
+  GramNeg_1 = guess_ab_col(x, "gentamicin"),
+  GramNeg_2 = guess_ab_col(x, "tobramycin"),
+  GramNeg_3 = guess_ab_col(x, "colistin"),
+  GramNeg_4 = guess_ab_col(x, "cefotaxime"),
+  GramNeg_5 = guess_ab_col(x, "ceftazidime"),
+  GramNeg_6 = guess_ab_col(x, "meropenem"),
+  warnings = TRUE,
+  ...
+)
+
+key_antibiotics_equal(
+  y,
+  z,
+  type = c("keyantibiotics", "points"),
+  ignore_I = TRUE,
+  points_threshold = 2,
+  info = FALSE
+)
+
x | +table with antibiotic columns, like |
+
---|---|
col_mo | +column name of the IDs of the microorganisms (see |
+
universal_1, universal_2, universal_3, universal_4, universal_5, universal_6 | +column names of broad-spectrum antibiotics, case-insensitive. At default, the columns containing these antibiotics will be guessed with |
+
GramPos_1, GramPos_2, GramPos_3, GramPos_4, GramPos_5, GramPos_6 | +column names of antibiotics for Gram-positives, case-insensitive. At default, the columns containing these antibiotics will be guessed with |
+
GramNeg_1, GramNeg_2, GramNeg_3, GramNeg_4, GramNeg_5, GramNeg_6 | +column names of antibiotics for Gram-negatives, case-insensitive. At default, the columns containing these antibiotics will be guessed with |
+
warnings | +give a warning about missing antibiotic columns (missing columns will be ignored either way) |
+
... | +other parameters passed on to function |
+
y, z | +characters to compare |
+
type | +type to determine weighted isolates; can be |
+
ignore_I | +logical to determine whether antibiotic interpretations with |
+
points_threshold | +points until the comparison of key antibiotics will lead to inclusion of an isolate when |
+
info | +print progress |
+
The function key_antibiotics()
returns a character vector with 12 antibiotic results for every isolate. These results can then be compared using key_antibiotics_equal()
, to check if two isolates have generally the same antibiogram. Missing and invalid values are replaced with a dot ("."
) by key_antibiotics()
and ignored by key_antibiotics_equal()
.
The first_isolate()
function only uses this function on the same microbial species from the same patient. Using this, e.g. an MRSA will be included after a susceptible S. aureus (MSSA) is found within the same patient episode. Without key antibiotic comparison it would not. See first_isolate()
for more info.
By default, the antibiotics used for Gram-positive bacteria are:
Amoxicillin
Amoxicillin/clavulanic acid
Cefuroxime
Piperacillin/tazobactam
Ciprofloxacin
Trimethoprim/sulfamethoxazole
Vancomycin
Teicoplanin
Tetracycline
Erythromycin
Oxacillin
Rifampin
By default, the antibiotics used for Gram-negative bacteria are:
Amoxicillin
Amoxicillin/clavulanic acid
Cefuroxime
Piperacillin/tazobactam
Ciprofloxacin
Trimethoprim/sulfamethoxazole
Gentamicin
Tobramycin
Colistin
Cefotaxime
Ceftazidime
Meropenem
The function key_antibiotics_equal()
checks the characters returned by key_antibiotics()
for equality, and returns a logical
vector.
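The comparison rule becomes clearer in code. The following minimal base R sketch of the "keyantibiotics" comparison is written as the hypothetical function keyab_equal_sketch(). It skips any position where either isolate has a dot, and also skips I when ignore_I = TRUE. It illustrates the rule described on this page and is not the package's implementation. Treating strings of unequal length as unequal is an assumption of this sketch.

```r
# Hypothetical re-implementation for illustration; not the AMR source code.
keyab_equal_sketch <- function(y, z, ignore_I = TRUE) {
  stopifnot(length(y) == length(z))
  mapply(function(a, b) {
    a <- strsplit(a, "", fixed = TRUE)[[1]]
    b <- strsplit(b, "", fixed = TRUE)[[1]]
    if (length(a) != length(b)) return(FALSE)  # assumption of this sketch
    # positions with a missing/invalid result (".") are ignored
    skip <- a == "." | b == "."
    # optionally ignore every position where either isolate has an I
    if (ignore_I) skip <- skip | a == "I" | b == "I"
    all(a[!skip] == b[!skip])
  }, y, z, USE.NAMES = FALSE)
}

strainA <- "SSSRR.S.R..S"
strainB <- "SSSIRSSSRSSS"
keyab_equal_sketch(strainA, strainB)                    # TRUE
keyab_equal_sketch(strainA, strainB, ignore_I = FALSE)  # FALSE (4th value: R vs I)
```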
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+There are two ways to determine whether isolates can be included as first weighted isolates, which generally give the same results:
Using type = "keyantibiotics"
and parameter ignore_I
Any difference from S to R (or vice versa) will (re)select an isolate as a first weighted isolate. With ignore_I = FALSE
, differences from I to S|R (or vice versa) will also lead to this. This is a reliable method and 30-35 times faster than method 2 (type = "points"). Read more about this in the key_antibiotics()
function.
Using type = "points"
and parameter points_threshold
A difference from I to S|R (or vice versa) counts as 0.5 points; a difference from S to R (or vice versa) counts as 1 point. When the sum of points exceeds points_threshold
, which defaults to 2
, an isolate will be (re)selected as a first weighted isolate.
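The points method can be sketched the same way. The hypothetical keyab_points_sketch() below ignores positions without a valid S/I/R result in both isolates. It adds 1 point for every switch between S and R, and 0.5 points for every switch that involves I. Under the rule above, an isolate would then be reselected when this sum exceeds points_threshold. This illustrates the rule under those assumptions and is not the package source.

```r
# Hypothetical sketch of the "points" comparison; not the AMR source code.
keyab_points_sketch <- function(a, b) {
  a <- strsplit(a, "", fixed = TRUE)[[1]]
  b <- strsplit(b, "", fixed = TRUE)[[1]]
  stopifnot(length(a) == length(b))
  # only positions where both isolates have a valid S/I/R result count
  valid <- a %in% c("S", "I", "R") & b %in% c("S", "I", "R")
  a <- a[valid]
  b <- b[valid]
  differs <- a != b
  # a switch between S and R (no I involved) is worth 1 point,
  # any other difference (I vs S or R) is worth 0.5 points
  s_r_switch <- differs & a != "I" & b != "I"
  sum(ifelse(s_r_switch, 1, ifelse(differs, 0.5, 0)))
}

pts <- keyab_points_sketch("SSSRR.S.R..S", "SSSIRSSSRSSS")
pts       # 0.5
pts > 2   # FALSE: does not exceed the default points_threshold
keyab_points_sketch("SSSS", "RRSI")  # 2.5, which would exceed 2
```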
+# `example_isolates` is a dataset available in the AMR package.
+# See ?example_isolates.
+
+if (FALSE) {
+library(dplyr)
+# set key antibiotics to a new variable
+my_patients <- example_isolates %>%
+  mutate(keyab = key_antibiotics(.)) %>%
+  mutate(
+    # now calculate first isolates
+    first_regular = first_isolate(., col_keyantibiotics = FALSE),
+    # and first WEIGHTED isolates
+    first_weighted = first_isolate(., col_keyantibiotics = "keyab")
+  )
+
+# Check the difference; in this data set it results in 7% more isolates:
+sum(my_patients$first_regular, na.rm = TRUE)
+sum(my_patients$first_weighted, na.rm = TRUE)
+}
+
+# output of the `key_antibiotics` function could be like this:
+strainA <- "SSSRR.S.R..S"
+strainB <- "SSSIRSSSRSSS"
+
+key_antibiotics_equal(strainA, strainB)
+# TRUE, because I is ignored (as well as missing values)
+
+key_antibiotics_equal(strainA, strainB, ignore_I = FALSE)
+# FALSE, because I is not ignored and so the 4th value differs
+
Kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable.
+kurtosis(x, na.rm = FALSE)
+
+# S3 method for default
+kurtosis(x, na.rm = FALSE)
+
+# S3 method for matrix
+kurtosis(x, na.rm = FALSE)
+
+# S3 method for data.frame
+kurtosis(x, na.rm = FALSE)
+
x | +a vector of values, a |
+
---|---|
na.rm | +a logical value indicating whether |
+
+The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR
package at all.
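Kurtosis has several estimators. As an assumption for illustration, the sketch below uses the moment-based (non-excess) definition n * sum((x - mean(x))^4) / sum((x - mean(x))^2)^2. Under this definition, large samples from a normal distribution give values near 3. Whether kurtosis() in this package uses exactly this estimator should be checked against its source. kurtosis_sketch() is a hypothetical name.

```r
# Moment-based sample kurtosis (not excess kurtosis); hypothetical helper.
kurtosis_sketch <- function(x, na.rm = FALSE) {
  x <- as.vector(x)
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x)) return(NA_real_)   # mirror na.rm = FALSE semantics
  n <- length(x)
  dev <- x - mean(x)
  n * sum(dev^4) / sum(dev^2)^2
}

kurtosis_sketch(1:5)                       # 1.7
kurtosis_sketch(c(1:5, NA))                # NA
kurtosis_sketch(c(1:5, NA), na.rm = TRUE)  # 1.7
```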
+AMR
package — lifecycle • AMR (for R)
Functions in this AMR
package are categorised using the lifecycle circle of the Tidyverse as found on www.tidyverse.org/lifecycle.
+This page contains a section for every lifecycle (with text borrowed from the aforementioned Tidyverse website), so they can be used in the manual pages of the functions.
+The lifecycle of this function is experimental. An experimental function is in early stages of development. The underlying code might change frequently. Experimental functions might be removed without deprecation, so you are generally best off waiting until a function is more mature before you use it in production code. Experimental functions are only available in development versions of this AMR
package and will thus not be included in releases that are submitted to CRAN, since such functions have not yet matured enough.
+The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+
+The lifecycle of this function is retired. A retired function is no longer under active development, and (if appropriate) a better alternative is available. No new arguments will be added, and only the most critical bugs will be fixed. In a future version, this function will be removed.
+The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR
package at all.
Convenient wrapper around grep()
to match a pattern: x %like% pattern
. It always returns a logical
vector and is case-insensitive (use x %like_case% pattern
for case-sensitive matching). Also, pattern
can be a single pattern that is matched against every element of x, or a vector as long as x
, in which case each element of x is matched against the pattern at the same index.
like(x, pattern, ignore.case = TRUE)
+
+x %like% pattern
+
+x %like_case% pattern
+
x | +a character vector where matches are sought, or an object which can be coerced by |
+
---|---|
pattern | +a character string containing a regular expression (or |
+
ignore.case | +if |
+
Idea from the like
function from the data.table
package
A logical
vector
The %like%
function:
Is case insensitive (use %like_case%
for case-sensitive matching)
Supports multiple patterns
Checks if pattern
is a regular expression and sets fixed = TRUE
if not, to greatly improve speed
Tries again with perl = TRUE
if regex fails
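The behaviour listed above can be approximated in base R. The hypothetical like_sketch() and %like_sketch% below are not the package's code. They recycle a single pattern and match literally with fixed = TRUE when the pattern contains no regex metacharacters. Otherwise, they call grepl() with perl = TRUE directly, instead of retrying after a failed regex.

```r
# Hypothetical sketch, not the AMR package's implementation of like().
like_sketch <- function(x, pattern, ignore.case = TRUE) {
  x <- as.character(x)
  if (length(pattern) == 1) pattern <- rep(pattern, length(x))
  stopifnot(length(pattern) == length(x))
  vapply(seq_along(x), function(i) {
    # does the pattern contain any regular-expression metacharacter?
    is_regex <- grepl("[][\\\\^$.|?*+(){}]", pattern[i], perl = TRUE)
    if (is_regex) {
      grepl(pattern[i], x[i], ignore.case = ignore.case, perl = TRUE)
    } else if (ignore.case) {
      # fixed = TRUE cannot ignore case, so lower-case both sides first
      grepl(tolower(pattern[i]), tolower(x[i]), fixed = TRUE)
    } else {
      grepl(pattern[i], x[i], fixed = TRUE)
    }
  }, logical(1))
}

`%like_sketch%` <- function(x, pattern) like_sketch(x, pattern)

"This is a test" %like_sketch% "TEST"   # TRUE
"TEST" %like_sketch% "This is a test"   # FALSE
c("Enterococcus faecalis", "Escherichia coli") %like_sketch% "^ent"  # TRUE FALSE
```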
Using RStudio? This function can also be inserted from the Addins menu and can have its own Keyboard Shortcut like Ctrl+Shift+L
or Cmd+Shift+L
(see Tools
> Modify Keyboard Shortcuts...
).
+The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
+# simple test
+a <- "This is a test"
+b <- "TEST"
+a %like% b
+#> TRUE
+b %like% a
+#> FALSE
+
+# also supports multiple patterns, length must be equal to x
+a <- c("Test case", "Something different", "Yet another thing")
+b <- c("case", "diff", "yet")
+a %like% b
+#> TRUE TRUE TRUE
+
+# get isolates whose name starts with 'Ent' or 'ent'
+if (FALSE) {
+library(dplyr)
+example_isolates %>%
+  filter(mo_name(mo) %like% "^ent")
+}
+
Determine which isolates are multidrug-resistant organisms (MDRO) according to international and national guidelines.
+mdro(
+  x,
+  guideline = "CMI2012",
+  col_mo = NULL,
+  info = interactive(),
+  pct_required_classes = 0.5,
+  combine_SI = TRUE,
+  verbose = FALSE,
+  ...
+)
+
+brmo(x, guideline = "BRMO", ...)
+
+mrgn(x, guideline = "MRGN", ...)
+
+mdr_tb(x, guideline = "TB", ...)
+
+mdr_cmi2012(x, guideline = "CMI2012", ...)
+
+eucast_exceptional_phenotypes(x, guideline = "EUCAST", ...)
+
x | +data with antibiotic columns, like e.g. |
+
---|---|
guideline | +a specific guideline to follow. When left empty, the publication by Magiorakos et al. (2012, Clinical Microbiology and Infection) will be followed; please see Details. |
+
col_mo | +column name of the IDs of the microorganisms (see |
+
info | +a logical to indicate whether progress should be printed to the console |
+
pct_required_classes | +minimal required percentage of antimicrobial classes that must be available per isolate, rounded down. For example, with the default guideline, 17 antimicrobial classes must be available for S. aureus. Setting this |
+
combine_SI | +a logical to indicate whether all values of S and I must be merged into one, so resistance is only considered when isolates are R, not I. As this is the default behaviour of the |
+
verbose | +a logical to turn Verbose mode on and off (default is off). In Verbose mode, the function does not return the MDRO results, but instead returns a data set in logbook form with extensive info about which isolates would be MDRO-positive, or why they are not. |
+
... | +column name of an antibiotic, please see section Antibiotics below |
+
Please see Details for the list of publications used for this function.
+CMI 2012 paper - function mdr_cmi2012()
or mdro()
:
+Ordered factor
with levels Negative
< Multi-drug-resistant (MDR)
< Extensively drug-resistant (XDR)
< Pandrug-resistant (PDR)
TB guideline - function mdr_tb()
or mdro(..., guideline = "TB")
:
+Ordered factor
with levels Negative
< Mono-resistant
< Poly-resistant
< Multi-drug-resistant
< Extensively drug-resistant
German guideline - function mrgn()
or mdro(..., guideline = "MRGN")
:
+Ordered factor
with levels Negative
< 3MRGN
< 4MRGN
Everything else:
+Ordered factor
with levels Negative
< Positive, unconfirmed
< Positive
. The value "Positive, unconfirmed"
means that, according to the guideline, it is not certain whether the isolate is multi-drug resistant; this should be confirmed with additional (e.g. molecular) tests.
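Because the results are ordered factors, they can be compared and summarised by level. The base R snippet below builds such a factor by hand, using the CMI 2012 level labels from the list above and made-up values, to show how a threshold comparison works. It does not call mdro() itself.

```r
# Level labels copied from the CMI 2012 description above; values are made up.
mdro_levels <- c("Negative", "Multi-drug-resistant (MDR)",
                 "Extensively drug-resistant (XDR)", "Pandrug-resistant (PDR)")
result <- factor(c("Negative", "Multi-drug-resistant (MDR)",
                   "Pandrug-resistant (PDR)", "Negative"),
                 levels = mdro_levels, ordered = TRUE)

# ordered factors can be compared directly against a level label
at_least_mdr <- result >= "Multi-drug-resistant (MDR)"
at_least_mdr               # FALSE TRUE TRUE FALSE
as.character(max(result))  # "Pandrug-resistant (PDR)"
table(result)
```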
For the pct_required_classes
argument, values above 1 will be divided by 100. This is to support both fractions (0.75
or 3/4
) and percentages (75
).
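The following sketch shows the fraction handling described above, using hypothetical helper names. The second helper reflects one reading of "rounded down" in the argument description, namely that the number of required classes is floored. That reading is an assumption and is not taken from the package source.

```r
# Hypothetical helpers; names are made up for this sketch.
# Values above 1 are treated as percentages and divided by 100.
normalise_fraction <- function(p) ifelse(p > 1, p / 100, p)

# Assumed interpretation: required number of classes, rounded down
min_classes_required <- function(n_classes, p) {
  floor(n_classes * normalise_fraction(p))
}

normalise_fraction(c(0.75, 3/4, 75))  # 0.75 0.75 0.75
min_classes_required(10, 0.75)        # 7
min_classes_required(10, 75)          # 7
```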
Currently supported guidelines are (case-insensitive):
guideline = "CMI2012"
+Magiorakos AP, Srinivasan A et al. "Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance." Clinical Microbiology and Infection (2012) (link)
guideline = "EUCAST"
+The European guideline - EUCAST Expert Rules Version 3.1 "Intrinsic Resistance and Exceptional Phenotypes Tables" (link)
guideline = "TB"
+The international guideline for multi-drug resistant tuberculosis - World Health Organization "Companion handbook to the WHO guidelines for the programmatic management of drug-resistant tuberculosis" (link)
guideline = "MRGN"
+The German national guideline - Mueller et al. (2015) Antimicrobial Resistance and Infection Control 4:7. DOI: 10.1186/s13756-015-0047-6
guideline = "BRMO"
+The Dutch national guideline - Rijksinstituut voor Volksgezondheid en Milieu "WIP-richtlijn BRMO (Bijzonder Resistente Micro-Organismen) (ZKH)" (link)
Please suggest your own (country-specific) guidelines by letting us know: https://github.com/msberends/AMR/issues/new.
+Note: Every test that involves the Enterobacteriaceae family will internally be performed using its newly named order Enterobacterales, since the Enterobacteriaceae family has been taxonomically reclassified by Adeolu et al. in 2016. Before that, Enterobacteriaceae was the only family under the Enterobacteriales (with an i) order. All species under the old Enterobacteriaceae family are still under the new Enterobacterales (without an i) order, but divided into multiple families. The way tests are performed now by this mdro()
function makes sure that results from before 2016 and after 2016 are identical.
+The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or write us an email (see section 'Contact Us').
To define antibiotic column names, leave the argument as is to determine it automatically with guess_ab_col()
, enter a column name (case-insensitive) or use NULL
to skip a column (e.g. TIC = NULL
to skip ticarcillin). Manually defined but non-existing columns will be skipped with a warning.
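The rules for manually defined columns can be illustrated with a hypothetical helper, resolve_ab_col(). It is not guess_ab_col() and does not guess columns from antibiotic names. It only demonstrates the NULL-to-skip rule, the case-insensitive lookup and the warn-and-skip behaviour on a made-up table.

```r
# Hypothetical helper mimicking the described rules; not guess_ab_col().
resolve_ab_col <- function(data, value) {
  if (is.null(value)) return(NULL)  # NULL: skip this antibiotic
  hit <- names(data)[tolower(names(data)) == tolower(value)]
  if (length(hit) == 0) {
    warning("column '", value, "' not found and will be skipped")
    return(NULL)
  }
  hit[1]
}

# Made-up example table
isolates <- data.frame(AMX = c("S", "R"), cip = c("S", "S"),
                       stringsAsFactors = FALSE)
resolve_ab_col(isolates, "amx")  # "AMX"
resolve_ab_col(isolates, "CIP")  # "cip"
resolve_ab_col(isolates, NULL)   # NULL
```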
The following antibiotics are used for the functions eucast_rules()
and mdro()
. These are shown below in the format 'antimicrobial ID: name (ATC code)', sorted by name:
AMK: amikacin (J01GB06), +AMX: amoxicillin (J01CA04), +AMC: amoxicillin/clavulanic acid (J01CR02), +AMP: ampicillin (J01CA01), +SAM: ampicillin/sulbactam (J01CR01), +AZM: azithromycin (J01FA10), +AZL: azlocillin (J01CA09), +ATM: aztreonam (J01DF01), +CAP: capreomycin (J04AB30), +RID: cefaloridine (J01DB02), +CZO: cefazolin (J01DB04), +FEP: cefepime (J01DE01), +CTX: cefotaxime (J01DD01), +CTT: cefotetan (J01DC05), +FOX: cefoxitin (J01DC01), +CPT: ceftaroline (J01DI02), +CAZ: ceftazidime (J01DD02), +CRO: ceftriaxone (J01DD04), +CXM: cefuroxime (J01DC02), +CED: cephradine (J01DB09), +CHL: chloramphenicol (J01BA01), +CIP: ciprofloxacin (J01MA02), +CLR: clarithromycin (J01FA09), +CLI: clindamycin (J01FF01), +COL: colistin (J01XB01), +DAP: daptomycin (J01XX09), +DOR: doripenem (J01DH04), +DOX: doxycycline (J01AA02), +ETP: ertapenem (J01DH03), +ERY: erythromycin (J01FA01), +ETH: ethambutol (J04AK02), +FLC: flucloxacillin (J01CF05), +FOS: fosfomycin (J01XX01), +FUS: fusidic acid (J01XC01), +GAT: gatifloxacin (J01MA16), +GEN: gentamicin (J01GB03), +GEH: gentamicin-high (no ATC code), +IPM: imipenem (J01DH51), +INH: isoniazid (J04AC01), +KAN: kanamycin (J01GB04), +LVX: levofloxacin (J01MA12), +LIN: lincomycin (J01FF02), +LNZ: linezolid (J01XX08), +MEM: meropenem (J01DH02), +MTR: metronidazole (J01XD01), +MEZ: mezlocillin (J01CA10), +MNO: minocycline (J01AA08), +MFX: moxifloxacin (J01MA14), +NAL: nalidixic acid (J01MB02), +NEO: neomycin (J01GB05), +NET: netilmicin (J01GB07), +NIT: nitrofurantoin (J01XE01), +NOR: norfloxacin (J01MA06), +NOV: novobiocin (QJ01XX95), +OFX: ofloxacin (J01MA01), +OXA: oxacillin (J01CF04), +PEN: penicillin G (J01CE01), +PIP: piperacillin (J01CA12), +TZP: piperacillin/tazobactam (J01CR05), +PLB: polymyxin B (J01XB02), +PRI: pristinamycin (J01FG01), +PZA: pyrazinamide (J04AK01), +QDA: quinupristin/dalfopristin (J01FG02), +RIB: rifabutin (J04AB04), +RIF: rifampicin (J04AB02), +RFP: rifapentine (J04AB05), +RXT: roxithromycin (J01FA06), +SIS: sisomicin 
(J01GB08), +STH: streptomycin-high (no ATC code), +TEC: teicoplanin (J01XA02), +TLV: telavancin (J01XA03), +TCY: tetracycline (J01AA07), +TIC: ticarcillin (J01CA13), +TCC: ticarcillin/clavulanic acid (J01CR03), +TGC: tigecycline (J01AA12), +TOB: tobramycin (J01GB01), +TMP: trimethoprim (J01EA01), +SXT: trimethoprim/sulfamethoxazole (J01EE01), +VAN: vancomycin (J01XA01).
+In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) decided to change the definitions of susceptibility testing categories R and S/I as shown below (http://www.eucast.org/newsiandr/).
R = Resistant
+A microorganism is categorised as Resistant when there is a high likelihood of therapeutic failure even when there is increased exposure. Exposure is a function of how the mode of administration, dose, dosing interval, infusion time, as well as distribution and excretion of the antimicrobial agent will influence the infecting organism at the site of infection.
S = Susceptible
+A microorganism is categorised as Susceptible, standard dosing regimen, when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I = Increased exposure, but still susceptible
+A microorganism is categorised as Susceptible, Increased exposure when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
This AMR package honours this new insight. Use susceptibility()
(equal to proportion_SI()
) to determine antimicrobial susceptibility and count_susceptible()
(equal to count_SI()
) to count susceptible isolates.
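As a minimal illustration of what "susceptible" means here, the hypothetical helpers below compute the share and count of S or I results in base R, leaving missing values out of the proportion. The package's own proportion_SI() and count_SI() may handle further edge cases differently, such as small numbers of isolates.

```r
# Hypothetical helpers; S and I both count as susceptible.
proportion_SI_sketch <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  mean(x %in% c("S", "I"))
}
count_SI_sketch <- function(x) sum(x %in% c("S", "I"))

results <- c("S", "I", "R", "R", NA)
proportion_SI_sketch(results)  # 0.5
count_SI_sketch(results)       # 2
```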
+
+if (FALSE) {
+library(dplyr)
+library(cleaner)
+
+example_isolates %>%
+  mdro() %>%
+  freq()
+
+example_isolates %>%
+  mutate(EUCAST = eucast_exceptional_phenotypes(.),
+         BRMO = brmo(.),
+         MRGN = mrgn(.))
+}
+
A data set containing commonly used codes for microorganisms, from laboratory systems and WHONET. Define your own with set_mo_source()
. They will all be searched when using as.mo()
and consequently all the mo_*
functions.
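Conceptually, such a code table works as a case-insensitive lookup from code to mo. The sketch below uses a made-up two-row table and a hypothetical code_to_mo() helper in base R. It does not reflect how as.mo() is implemented; as.mo() also searches names, old taxonomic names and user-defined sources.

```r
# Illustrative code table (made-up rows, not the real microorganisms.codes data)
my_codes <- data.frame(code = c("ECO", "KPN"),
                       mo = c("B_ESCHR_COLI", "B_KLBSL_PNE"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: case-insensitive, whitespace-tolerant code lookup
code_to_mo <- function(x, codes) {
  codes$mo[match(toupper(trimws(x)), toupper(codes$code))]
}

code_to_mo(c("eco", " KPN", "XYZ"), my_codes)  # "B_ESCHR_COLI" "B_KLBSL_PNE" NA
```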
microorganisms.codes
+
+
+ A data.frame
with 5,582 observations and 2 variables:
code
Commonly used code of a microorganism
mo
ID of the microorganism in the microorganisms data set
+This package contains the complete taxonomic tree of almost all microorganisms (~70,000 species) from the authoritative and comprehensive Catalogue of Life (http://www.catalogueoflife.org). The Catalogue of Life is the most comprehensive and authoritative global index of species currently available.
Click here for more information about the included taxa. Check which version of the Catalogue of Life was included in this package with catalogue_of_life_version()
.
+A data set containing the microbial taxonomy of six kingdoms from the Catalogue of Life. MO codes can be looked up using as.mo()
.
microorganisms
+
+
+ A data.frame
with 67,150 observations and 16 variables:
mo
ID of microorganism as used by this package
fullname
Full name, like "Escherichia coli"
kingdom
, phylum
, class
, order
, family
, genus
, species
, subspecies
Taxonomic rank of the microorganism
rank
Text of the taxonomic rank of the microorganism, like "species"
or "genus"
ref
Author(s) and year of the relevant scientific publication
species_id
ID of the species as used by the Catalogue of Life
source
Either "CoL", "DSMZ" (see Source) or "manually added"
prevalence
Prevalence of the microorganism, see as.mo()
snomed
SNOMED code of the microorganism. Use mo_snomed()
to retrieve it quickly, see mo_property()
.
Catalogue of Life: Annual Checklist (public online taxonomic database), http://www.catalogueoflife.org (check included annual version with catalogue_of_life_version()
).
Parte, A.C. (2018). LPSN — List of Prokaryotic names with Standing in Nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology, 68, 1825-1829; doi: 10.1099/ijsem.0.002786
+Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Germany, Prokaryotic Nomenclature Up-to-Date, https://www.dsmz.de/services/online-tools/prokaryotic-nomenclature-up-to-date (check included version with catalogue_of_life_version()
).
Manually added were:
11 entries of Streptococcus (beta-haemolytic: groups A, B, C, D, F, G, H, K and unspecified; other: viridans, milleri)
2 entries of Staphylococcus (coagulase-negative (CoNS) and coagulase-positive (CoPS))
3 entries of Trichomonas (Trichomonas vaginalis, and its family and genus)
1 entry of Blastocystis (Blastocystis hominis), although it officially does not exist (Noel et al. 2005, PMID 15634993)
5 other 'undefined' entries (unknown, unknown Gram negatives, unknown Gram positives, unknown yeast and unknown fungus)
6 families under the Enterobacterales order, according to Adeolu et al. (2016, PMID 27620848), that are not (yet) in the Catalogue of Life
7,411 species from the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen), since the DSMZ contains the latest taxonomic information based on recent publications
This data set is available as a 'flat file' for use even without R - you can find the file here:
+ +The file in R format (with preserved data structure) can be found here:
+ + +Names of prokaryotes are defined as being validly published by the International Code of Nomenclature of Bacteria. Validly published are all names which are included in the Approved Lists of Bacterial Names and the names subsequently published in the International Journal of Systematic Bacteriology (IJSB) and, from January 2000, in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) as original articles or in the validation lists.
+From: https://www.dsmz.de/services/online-tools/prokaryotic-nomenclature-up-to-date/complete-list-readme
+
+A data set containing old (previously valid or accepted) taxonomic names according to the Catalogue of Life. This data set is used internally by as.mo()
.
microorganisms.old
+
+
+ A data.frame
with 12,708 observations and 4 variables:
fullname
Old full taxonomic name of the microorganism
fullname_new
New full taxonomic name of the microorganism
ref
Author(s) and year of the relevant scientific publication
prevalence
Prevalence of the microorganism, see as.mo()
Catalogue of Life: Annual Checklist (public online taxonomic database), http://www.catalogueoflife.org (check included annual version with catalogue_of_life_version()
).
Parte, A.C. (2018). LPSN — List of Prokaryotic names with Standing in Nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology, 68, 1825-1829; doi: 10.1099/ijsem.0.002786
+
+Use these functions to return a specific property of a microorganism. All input values will be evaluated internally with as.mo()
, which makes it possible to use microbial abbreviations, codes and names as input. Please see Examples.
mo_name(x, language = get_locale(), ...)
+
+mo_fullname(x, language = get_locale(), ...)
+
+mo_shortname(x, language = get_locale(), ...)
+
+mo_subspecies(x, language = get_locale(), ...)
+
+mo_species(x, language = get_locale(), ...)
+
+mo_genus(x, language = get_locale(), ...)
+
+mo_family(x, language = get_locale(), ...)
+
+mo_order(x, language = get_locale(), ...)
+
+mo_class(x, language = get_locale(), ...)
+
+mo_phylum(x, language = get_locale(), ...)
+
+mo_kingdom(x, language = get_locale(), ...)
+
+mo_domain(x, language = get_locale(), ...)
+
+mo_type(x, language = get_locale(), ...)
+
+mo_gramstain(x, language = get_locale(), ...)
+
+mo_snomed(x, ...)
+
+mo_ref(x, ...)
+
+mo_authors(x, ...)
+
+mo_year(x, ...)
+
+mo_rank(x, ...)
+
+mo_taxonomy(x, language = get_locale(), ...)
+
+mo_synonyms(x, ...)
+
+mo_info(x, language = get_locale(), ...)
+
+mo_url(x, open = FALSE, ...)
+
+mo_property(x, property = "fullname", language = get_locale(), ...)
+
x | +any (vector of) text that can be coerced to a valid microorganism code with |
+
---|---|
language | +language of the returned text, defaults to system language (see |
+
... | +other parameters passed on to |
+
open | +browse the URL using |
+
property | +one of the column names of the microorganisms data set or |
+
An integer
in case of mo_year()
A list
in case of mo_taxonomy()
and mo_info()
A named character
in case of mo_url()
A double
in case of mo_snomed()
A character
in all other cases
All functions will return the most recently known taxonomic property according to the Catalogue of Life, except for mo_ref()
, mo_authors()
and mo_year()
. Please refer to this example, knowing that Escherichia blattae was renamed to Shimwellia blattae in 2010:
mo_name("Escherichia blattae")
will return "Shimwellia blattae"
(with a message about the renaming)
mo_ref("Escherichia blattae")
will return "Burgess et al., 1973"
(with a message about the renaming)
mo_ref("Shimwellia blattae")
will return "Priest et al., 2010"
(without a message)
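The renaming in this example can be pictured as a lookup in a table with fullname and fullname_new columns, like the microorganisms.old data set. The one-row table and the current_name() helper below are hypothetical and only illustrate the idea in base R.

```r
# Illustrative one-row table built from the example above (not the full data set)
renamed <- data.frame(fullname = "Escherichia blattae",
                      fullname_new = "Shimwellia blattae",
                      stringsAsFactors = FALSE)

# Hypothetical helper: return the current name, or the input if not renamed
current_name <- function(x, old = renamed) {
  i <- match(x, old$fullname)
  ifelse(is.na(i), x, old$fullname_new[i])
}

current_name(c("Escherichia blattae", "Escherichia coli"))
# "Shimwellia blattae" "Escherichia coli"
```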
The short name, mo_shortname(), almost always returns the first letter of the genus followed by the full species, like "E. coli". Exceptions are abbreviations of staphylococci (like "CoNS", Coagulase-Negative Staphylococci) and beta-haemolytic streptococci (like "GBS", Group B Streptococci). Please bear in mind that e.g. E. coli could mean Escherichia coli (kingdom of Bacteria) as well as Entamoeba coli (kingdom of Protozoa). Converting a short name back to a full name is done with as.mo() internally, giving priority to bacteria and human pathogens, i.e. "E. coli" will be considered Escherichia coli. In other words, mo_fullname(mo_shortname("Entamoeba coli")) returns "Escherichia coli".
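The sketch below shows how such a short name can be built and why converting it back is ambiguous. The helper `short_name()` is hypothetical and ignores the CoNS/GBS exceptions described above.

```r
# Hypothetical helper: first letter of the genus, a period, then the species
short_name <- function(genus, species) {
  paste0(substr(genus, 1, 1), ". ", species)
}

short_name("Escherichia", "coli")  # "E. coli"
short_name("Entamoeba", "coli")    # also "E. coli": the short form is ambiguous
```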
Since the top level of the taxonomy is sometimes referred to as 'kingdom' and sometimes as 'domain', the functions mo_kingdom() and mo_domain() return the exact same results.
The Gram stain, mo_gramstain(), is determined based on the taxonomic kingdom and phylum. According to Cavalier-Smith (2002, PMID 11837318), who defined the subkingdoms Negibacteria and Posibacteria, only these phyla are Posibacteria: Actinobacteria, Chloroflexi, Firmicutes and Tenericutes. These bacteria are considered Gram-positive; all other bacteria are considered Gram-negative. Species outside the kingdom of Bacteria will return NA.
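The Gram stain rule above can be sketched in base R as follows. `gram_from_taxonomy()` is a hypothetical helper that only mirrors the described logic; it is not the package's implementation.

```r
# Phyla treated as Gram-positive (Posibacteria), as listed above
posibacteria <- c("Actinobacteria", "Chloroflexi", "Firmicutes", "Tenericutes")

# Hypothetical helper: NA outside Bacteria, otherwise decided by phylum
gram_from_taxonomy <- function(kingdom, phylum) {
  ifelse(kingdom != "Bacteria", NA_character_,
         ifelse(phylum %in% posibacteria, "Gram-positive", "Gram-negative"))
}

gram_from_taxonomy(kingdom = c("Bacteria", "Bacteria", "Fungi"),
                   phylum  = c("Proteobacteria", "Firmicutes", "Ascomycota"))
# "Gram-negative" "Gram-positive" NA
```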
All output will be translated where possible.
The function mo_url() will return the direct URL to the online database entry, which also shows the scientific reference of the concerned species.
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will first be deprecated but continue to work, while emitting a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed into an error.
This package contains the complete taxonomic tree of almost all microorganisms (~70,000 species) from the Catalogue of Life (http://www.catalogueoflife.org), the most comprehensive and authoritative global index of species currently available.
Check which version of the Catalogue of Life is included in this package with catalogue_of_life_version().
Becker K et al. Coagulase-Negative Staphylococci. 2014. Clin Microbiol Rev. 27(4): 870–926. https://dx.doi.org/10.1128/CMR.00109-13
Becker K et al. Implications of identifying the recently defined members of the S. aureus complex, S. argenteus and S. schweitzeri: A position paper of members of the ESCMID Study Group for staphylococci and Staphylococcal Diseases (ESGS). 2019. Clin Microbiol Infect. https://doi.org/10.1016/j.cmi.2019.02.028
Lancefield RC. A serological differentiation of human and other groups of hemolytic streptococci. 1933. J Exp Med. 57(4): 571–95. https://dx.doi.org/10.1084/jem.57.4.571
Catalogue of Life: Annual Checklist (public online taxonomic database), http://www.catalogueoflife.org (check the included annual version with catalogue_of_life_version()).
On our website https://msberends.github.io/AMR you can find a comprehensive tutorial about how to conduct AMR analysis, the complete documentation of all functions (which is much easier to read there than in R) and an example analysis using WHONET data.
# taxonomic tree -----------------------------------------------------------
mo_kingdom("E. coli")         # "Bacteria"
mo_phylum("E. coli")          # "Proteobacteria"
mo_class("E. coli")           # "Gammaproteobacteria"
mo_order("E. coli")           # "Enterobacterales"
mo_family("E. coli")          # "Enterobacteriaceae"
mo_genus("E. coli")           # "Escherichia"
mo_species("E. coli")         # "coli"
mo_subspecies("E. coli")      # ""

# colloquial properties ----------------------------------------------------
mo_name("E. coli")            # "Escherichia coli"
mo_fullname("E. coli")        # "Escherichia coli" - same as mo_name()
mo_shortname("E. coli")       # "E. coli"

# other properties ---------------------------------------------------------
mo_gramstain("E. coli")       # "Gram-negative"
mo_snomed("E. coli")          # 112283007, 116395006, ... (SNOMED codes)
mo_type("E. coli")            # "Bacteria" (equal to kingdom, but may be translated)
mo_rank("E. coli")            # "species"
mo_url("E. coli")             # get the direct url to the online database entry
mo_synonyms("E. coli")        # get previously accepted taxonomic names

# scientific reference -----------------------------------------------------
mo_ref("E. coli")             # "Castellani et al., 1919"
mo_authors("E. coli")         # "Castellani et al."
mo_year("E. coli")            # 1919

# abbreviations known in the field -----------------------------------------
mo_genus("MRSA")              # "Staphylococcus"
mo_species("MRSA")            # "aureus"
mo_shortname("VISA")          # "S. aureus"
mo_gramstain("VISA")          # "Gram-positive"

mo_genus("EHEC")              # "Escherichia"
mo_species("EHEC")            # "coli"

# known subspecies ---------------------------------------------------------
mo_name("doylei")             # "Campylobacter jejuni doylei"
mo_genus("doylei")            # "Campylobacter"
mo_species("doylei")          # "jejuni"
mo_subspecies("doylei")       # "doylei"

mo_fullname("K. pneu rh")     # "Klebsiella pneumoniae rhinoscleromatis"
mo_shortname("K. pneu rh")    # "K. pneumoniae"

# \donttest{
# Becker classification, see ?as.mo ----------------------------------------
mo_fullname("S. epi")                     # "Staphylococcus epidermidis"
mo_fullname("S. epi", Becker = TRUE)      # "Coagulase-negative Staphylococcus (CoNS)"
mo_shortname("S. epi")                    # "S. epidermidis"
mo_shortname("S. epi", Becker = TRUE)     # "CoNS"

# Lancefield classification, see ?as.mo ------------------------------------
mo_fullname("S. pyo")                     # "Streptococcus pyogenes"
mo_fullname("S. pyo", Lancefield = TRUE)  # "Streptococcus group A"
mo_shortname("S. pyo")                    # "S. pyogenes"
mo_shortname("S. pyo", Lancefield = TRUE) # "GAS" (='Group A Streptococci')

# language support for German, Dutch, Spanish, Portuguese, Italian and French
mo_gramstain("E. coli", language = "de") # "Gramnegativ"
mo_gramstain("E. coli", language = "nl") # "Gram-negatief"
mo_gramstain("E. coli", language = "es") # "Gram negativo"

# mo_type is equal to mo_kingdom, but mo_kingdom will remain official
mo_kingdom("E. coli")         # "Bacteria" on a German system
mo_type("E. coli")            # "Bakterien" on a German system
mo_type("E. coli")            # "Bacteria" on an English system

mo_fullname("S. pyogenes",
            Lancefield = TRUE,
            language = "de")  # "Streptococcus Gruppe A"
mo_fullname("S. pyogenes",
            Lancefield = TRUE,
            language = "nl")  # "Streptococcus groep A"

# get a list with the complete taxonomy (from kingdom to subspecies)
mo_taxonomy("E. coli")
# get a list with the taxonomy, the authors, Gram-stain and URL to the online database
mo_info("E. coli")
# }
These functions can be used to predefine your own reference to be used in as.mo() and consequently all mo_* functions like mo_genus() and mo_gramstain().
This is the fastest way to have your organisation (or analysis) specific codes picked up and translated by this package.
set_mo_source(path)
get_mo_source()
Argument | Description |
---|---|
path | location of your reference file, see Details. Can be |
The reference file can be a text file separated with commas (CSV), tabs or pipes, an Excel file (either 'xls' or 'xlsx' format) or an R object file (extension '.rds'). To use an Excel file, you need to have the readxl package installed.
set_mo_source() will check the file for validity. It must be a data.frame, must have a column named "mo" which contains values from microorganisms$mo, and must have a reference column with your own defined values. If all tests pass, set_mo_source() will read the file into R and export it to "~/.mo_source.rds". This compressed data file will then be used by default for MO determination (function as.mo() and consequently all mo_* functions like mo_genus() and mo_gramstain()). The location of the original file will be saved as an option with options(mo_source = path). Its timestamp will be saved with options(mo_source_datetime = ...).
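The validity checks described above can be sketched as a plain R function. The function `validate_mo_source()` is hypothetical. The argument `valid_mo` stands in for microorganisms$mo so that the example runs without the package. The last step keeps only the first own-code column and the 'mo' column, as described further below.

```r
# Hypothetical re-creation of the checks described above; `valid_mo` stands in
# for microorganisms$mo so that this runs without the AMR package
validate_mo_source <- function(df, valid_mo) {
  if (!is.data.frame(df)) stop("the reference must be a data.frame")
  if (!"mo" %in% colnames(df)) stop("the reference must have a column named 'mo'")
  if (ncol(df) < 2) stop("the reference must have a column with your own codes")
  if (!all(df$mo %in% valid_mo)) stop("column 'mo' contains unknown codes")
  # keep only the first column with your own codes, plus the 'mo' column
  own_col <- setdiff(colnames(df), "mo")[1]
  df[, c(own_col, "mo")]
}

ref <- data.frame(
  `Organisation XYZ` = c("lab_mo_ecoli", "lab_mo_kpneumoniae"),
  mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
checked <- validate_mo_source(ref, valid_mo = c("B_ESCHR_COLI", "B_KLBSL_PNMN"))
```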
get_mo_source() will return the data set by reading "~/.mo_source.rds" with readRDS(). If the original file has changed (the file defined with path), it will call set_mo_source() to update the data file automatically.
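The automatic update can be thought of as a timestamp-based cache. The sketch below uses a CSV file and base R only, and `read_with_cache()` is hypothetical, not a package function. It stores the source file's modification time next to the cached data and re-reads the source only when that time differs.

```r
# Hypothetical sketch of a timestamp-based cache for a reference file
read_with_cache <- function(source_path, cache_path) {
  src_time <- file.info(source_path)$mtime
  if (file.exists(cache_path)) {
    cached <- readRDS(cache_path)
    if (identical(cached$source_time, src_time)) {
      return(cached$data)  # original unchanged: use the cached copy
    }
  }
  data <- utils::read.csv(source_path, stringsAsFactors = FALSE)
  saveRDS(list(data = data, source_time = src_time), cache_path)
  data
}

src <- tempfile(fileext = ".csv")
cache <- tempfile(fileext = ".rds")
write.csv(data.frame(code = "lab_mo_ecoli", mo = "B_ESCHR_COLI"),
          src, row.names = FALSE)
first <- read_with_cache(src, cache)   # reads the CSV and writes the cache

# edit the original file and give it a later modification time
write.csv(data.frame(code = c("lab_mo_ecoli", "lab_Staph_aureus"),
                     mo = c("B_ESCHR_COLI", "B_STPHY_AURS")),
          src, row.names = FALSE)
Sys.setFileTime(src, Sys.time() + 60)
second <- read_with_cache(src, cache)  # timestamp changed, so the CSV is re-read
nrow(second)                           # 2
```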
An Excel file (.xlsx) with only one row has a size of 8-9 kB. The compressed file created with set_mo_source() will then have a size of 0.1 kB and can be read by get_mo_source() in only a couple of microseconds (millionths of a second).
Imagine this data on a sheet of an Excel file (mo codes were looked up in the microorganisms data set). The first column contains the organisation specific codes, the second column contains an MO code from this package:
  | A                  | B            |
--|--------------------|--------------|
1 | Organisation XYZ   | mo           |
2 | lab_mo_ecoli       | B_ESCHR_COLI |
3 | lab_mo_kpneumoniae | B_KLBSL_PNMN |
4 |                    |              |
We save it as "home/me/ourcodes.xlsx". Now we have to set it as a source:
set_mo_source("home/me/ourcodes.xlsx")
#> NOTE: Created mo_source file '~/.mo_source.rds' from 'home/me/ourcodes.xlsx'
#>       (columns "Organisation XYZ" and "mo")
It has now created a file "~/.mo_source.rds" with the contents of our Excel file. Only the first column with foreign values and the 'mo' column will be kept when creating the RDS file.
And now we can use it in our functions:
as.mo("lab_mo_ecoli")
#> [1] B_ESCHR_COLI

mo_genus("lab_mo_kpneumoniae")
#> [1] "Klebsiella"

# other input values still work too
as.mo(c("Escherichia coli", "E. coli", "lab_mo_ecoli"))
#> [1] B_ESCHR_COLI B_ESCHR_COLI B_ESCHR_COLI
If we edit the Excel file by, let's say, adding row 4 like this:
  | A                  | B            |
--|--------------------|--------------|
1 | Organisation XYZ   | mo           |
2 | lab_mo_ecoli       | B_ESCHR_COLI |
3 | lab_mo_kpneumoniae | B_KLBSL_PNMN |
4 | lab_Staph_aureus   | B_STPHY_AURS |
5 |                    |              |
...any new usage of an MO function in this package will update your data file:
as.mo("lab_mo_ecoli")
#> NOTE: Updated mo_source file '~/.mo_source.rds' from 'home/me/ourcodes.xlsx'
#>       (columns "Organisation XYZ" and "mo")
#> [1] B_ESCHR_COLI

mo_genus("lab_Staph_aureus")
#> [1] "Staphylococcus"
To delete the reference data file, just use "", NULL or FALSE as input for set_mo_source():
set_mo_source(NULL)
# Removed mo_source file '~/.mo_source.rds'.
If the original Excel file is moved or deleted, the mo_source file will be removed upon the next use of as.mo(). If the mo_source file is manually deleted (i.e. without using set_mo_source()), the references to the mo_source file will be removed upon the next use of as.mo().
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will first be deprecated but continue to work, while emitting a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed into an error.
Return the symbol related to the p-value: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1. Values above p = 1 will return NA.
p_symbol(p, emptychar = " ")
Argument | Description |
---|---|
p | p value |
emptychar | text to show when |
Text
The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR package at all.
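The legend above is the same significance-code legend that R prints below model summaries. The sketch below is a hypothetical base R re-implementation, `p_symbol_sketch()`, not the package's code. It assumes each threshold is inclusive, so that p = 0.05 gives '*'.

```r
# Hypothetical re-implementation of the legend above (not the package's code);
# each threshold is treated as inclusive
p_symbol_sketch <- function(p, emptychar = " ") {
  out <- rep(emptychar, length(p))
  out[which(p <= 0.1)]   <- "."
  out[which(p <= 0.05)]  <- "*"
  out[which(p <= 0.01)]  <- "**"
  out[which(p <= 0.001)] <- "***"
  out[is.na(p) | p > 1]  <- NA_character_  # values above 1 (and NA) give NA
  out
}

p_symbol_sketch(c(0.0001, 0.03, 0.07, 0.5, 1.2))
# "***" "*" "." " " NA
```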
Performs a principal component analysis (PCA) on a data set. It automatically determines the groups and labels for plotting afterwards, and it automatically filters on suitable (i.e. non-empty and numeric) variables only.
pca(
  x,
  ...,
  retx = TRUE,
  center = TRUE,
  scale. = TRUE,
  tol = NULL,
  rank. = NULL
)
Argument | Description |
---|---|
x | a data.frame containing numeric columns |
... | columns of |
retx | a logical value indicating whether the rotated variables should be returned. |
center | a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal the number of columns of |
scale. | a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is |
tol | a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to |
rank. | optionally, a number specifying the maximal rank, i.e., maximal number of principal components to be used. Can be set as alternative or in addition to |
An object of classes pca and prcomp
The pca() function takes a data.frame as input and performs the actual PCA with the R function prcomp().
The result of the pca() function is a prcomp object with an additional attribute non_numeric_cols. This attribute is a vector with the column names of all columns that do not contain numeric values. These are probably the groups and labels, and will be used by ggplot_pca().
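Below is a minimal sketch of the column filtering described above. It assumes that 'suitable' means numeric and not entirely NA. The function `pca_sketch()` is hypothetical. It calls stats::prcomp() on the suitable columns and attaches the non-numeric column names as the non_numeric_cols attribute. It does not add the package's pca class.

```r
# Hypothetical sketch: keep numeric, non-empty columns for prcomp() and store
# the names of the other columns (probably groups and labels) as an attribute
pca_sketch <- function(x, ...) {
  is_num <- vapply(x, is.numeric, logical(1))
  not_empty <- vapply(x, function(col) !all(is.na(col)), logical(1))
  num_cols <- names(x)[is_num & not_empty]
  result <- stats::prcomp(stats::na.omit(x[, num_cols, drop = FALSE]), ...)
  attr(result, "non_numeric_cols") <- names(x)[!is_num]
  result
}

df <- data.frame(genus = c("A", "B", "C", "D", "E"),
                 AMX = c(0.10, 0.40, 0.35, 0.80, 0.55),
                 CIP = c(0.05, 0.20, 0.30, 0.25, 0.60),
                 GEN = c(0.02, 0.10, 0.12, 0.30, 0.20))
res <- pca_sketch(df, center = TRUE, scale. = TRUE)
attr(res, "non_numeric_cols")  # "genus"
```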
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or to send us an email (see section 'Contact Us').
# `example_isolates` is a dataset available in the AMR package.
# See ?example_isolates.

if (FALSE) {
# calculate the resistance per group first
library(dplyr)
resistance_data <- example_isolates %>%
  group_by(order = mo_order(mo),       # group on anything, like order
           genus = mo_genus(mo)) %>%   #   and genus as we do here
  summarise_if(is.rsi, resistance)     # then get resistance of all drugs

# now conduct PCA for certain antimicrobial agents
pca_result <- resistance_data %>%
  pca(AMC, CXM, CTX, CAZ, GEN, TOB, TMP, SXT)

pca_result
summary(pca_result)
biplot(pca_result)
ggplot_pca(pca_result) # a new and convenient plot function
}
These functions can be used to calculate the (co-)resistance or susceptibility of microbial isolates (i.e. the proportion of S, SI, I, IR or R). All functions support quasiquotation with pipes, can be used in summarise() from the dplyr package and also support grouped variables; please see Examples.
resistance() should be used to calculate resistance, susceptibility() should be used to calculate susceptibility.
resistance(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

susceptibility(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_R(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_IR(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_I(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_SI(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_S(..., minimum = 30, as_percent = FALSE, only_all_tested = FALSE)

proportion_df(
  data,
  translate_ab = "name",
  language = get_locale(),
  minimum = 30,
  as_percent = FALSE,
  combine_SI = TRUE,
  combine_IR = FALSE
)

rsi_df(
  data,
  translate_ab = "name",
  language = get_locale(),
  minimum = 30,
  as_percent = FALSE,
  combine_SI = TRUE,
  combine_IR = FALSE
)
Argument | Description |
---|---|
... | one or more vectors (or columns) with antibiotic interpretations. They will be transformed internally with |
minimum | the minimum allowed number of available (tested) isolates. Any isolate count lower than |
as_percent | a logical to indicate whether the output must be returned as a hundred fold with % sign (a character). A value of |
only_all_tested | (for combination therapies, i.e. using more than one variable for |
data | a |
translate_ab | a column name of the antibiotics data set to translate the antibiotic abbreviations to, using |
language | language of the returned text, defaults to system language (see |
combine_SI | a logical to indicate whether all values of S and I must be merged into one, so the output only consists of S+I vs. R (susceptible vs. resistant). This used to be the parameter |
combine_IR | a logical to indicate whether all values of I and R must be merged into one, so the output only consists of S vs. I+R (susceptible vs. non-susceptible). This is outdated, see parameter |
M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition, 2014, Clinical and Laboratory Standards Institute (CLSI). https://clsi.org/standards/products/microbiology/documents/m39/.
A double or, when as_percent = TRUE, a character.
The function resistance() is equal to the function proportion_R(). The function susceptibility() is equal to the function proportion_SI().
Remember that you should filter your data to contain only first isolates! This is needed to exclude duplicates and to reduce selection bias. Use first_isolate() to determine them in your data set.
These functions are not meant to count isolates, but to calculate the proportion of resistance/susceptibility. Use the count() functions to count isolates. The function susceptibility() is essentially equal to count_susceptible() / count_all(). Low counts can influence the outcome, and the proportion functions may camouflage this, since they only return the proportion (albeit dependent on the minimum parameter).
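The sketch below illustrates the relation between counts and proportions for a single drug. The function `proportion_si_sketch()` is hypothetical and is not a package function. It divides the number of S and I results by all tested isolates, and it returns NA when fewer than `minimum` isolates were tested. It works on plain character values rather than the package's rsi class.

```r
# Hypothetical sketch of a single-drug %SI: (S + I) / (S + I + R),
# returning NA when fewer than `minimum` isolates were tested
proportion_si_sketch <- function(x, minimum = 30) {
  tested <- x[!is.na(x)]
  if (length(tested) < minimum) return(NA_real_)
  sum(tested %in% c("S", "I")) / length(tested)
}

x <- c(rep("S", 20), rep("I", 5), rep("R", 15), NA, NA)
proportion_si_sketch(x)                # 25 / 40 = 0.625
proportion_si_sketch(x, minimum = 50)  # NA: only 40 tested isolates
```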
The function proportion_df() takes any variable from data that has an rsi class (created with as.rsi()) and calculates the proportions R, I and S. It also supports grouped variables. The function rsi_df() works exactly like proportion_df(), but adds the number of isolates.
When using more than one variable for ... (= combination therapy), use only_all_tested to only count isolates that are tested for all antibiotics/variables that you test them for. See this example for two antibiotics, Drug A and Drug B, about how susceptibility() works to calculate the %SI:
--------------------------------------------------------------------
                    only_all_tested = FALSE  only_all_tested = TRUE
                    -----------------------  -----------------------
Drug A    Drug B    include as  include as   include as  include as
                    numerator   denominator  numerator   denominator
--------  --------  ----------  -----------  ----------  -----------
S or I    S or I        X            X           X            X
R         S or I        X            X           X            X
<NA>      S or I        X            X           -            -
S or I    R             X            X           X            X
R         R             -            X           -            X
<NA>      R             -            -           -            -
S or I    <NA>          X            X           -            -
R         <NA>          -            -           -            -
<NA>      <NA>          -            -           -            -
--------------------------------------------------------------------
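The table above can be reproduced with a small base R sketch. The function `combination_counts()` is hypothetical and uses plain "S", "I", "R" and NA values rather than the package's rsi class. An isolate counts in the numerator when any drug is S or I. With only_all_tested = TRUE, only isolates tested for all drugs are considered.

```r
# Hypothetical sketch of the table above; rows are isolates, columns are drugs
combination_counts <- function(df, only_all_tested = FALSE) {
  m <- as.matrix(df)
  any_si     <- apply(m, 1, function(r) any(r %in% c("S", "I")))
  all_tested <- apply(m, 1, function(r) !anyNA(r))
  if (only_all_tested) {
    numerator   <- sum(any_si & all_tested)
    denominator <- sum(all_tested)
  } else {
    numerator   <- sum(any_si)
    denominator <- sum(any_si | all_tested)  # S/I anywhere, or fully tested (R, R)
  }
  c(numerator = numerator, denominator = denominator)
}

# the nine combinations from the table, in the same order ("S" stands for S or I)
drugs <- data.frame(
  drug_a = c("S", "R", NA, "S", "R", NA, "S", "R", NA),
  drug_b = c("S", "S", "S", "R", "R", "R", NA, NA, NA),
  stringsAsFactors = FALSE
)
combination_counts(drugs)                          # numerator 5, denominator 6
combination_counts(drugs, only_all_tested = TRUE)  # numerator 3, denominator 4
```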
Please note that, in combination therapies, the following applies for only_all_tested = TRUE:
count_S()    +   count_I()    +   count_R()    = count_all()
proportion_S() + proportion_I() + proportion_R() = 1
and that, in combination therapies, the following applies for only_all_tested = FALSE:
count_S()    +   count_I()    +   count_R()    >= count_all()
proportion_S() + proportion_I() + proportion_R() >= 1
Using only_all_tested has no impact when only using one antibiotic as input.
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will first be deprecated but continue to work, while emitting a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed into an error.
In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) decided to change the definitions of susceptibility testing categories R and S/I as shown below (http://www.eucast.org/newsiandr/).
R = Resistant
A microorganism is categorised as Resistant when there is a high likelihood of therapeutic failure even when there is increased exposure. Exposure is a function of how the mode of administration, dose, dosing interval, infusion time, as well as distribution and excretion of the antimicrobial agent will influence the infecting organism at the site of infection.
S = Susceptible
A microorganism is categorised as Susceptible, standard dosing regimen, when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I = Increased exposure, but still susceptible
A microorganism is categorised as Susceptible, Increased exposure when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
This AMR package honours this new insight. Use susceptibility() (equal to proportion_SI()) to determine antimicrobial susceptibility and count_susceptible() (equal to count_SI()) to count susceptible isolates.
count() to count resistant and susceptible isolates.
# example_isolates is a data set available in the AMR package.
?example_isolates

resistance(example_isolates$AMX)     # determines %R
susceptibility(example_isolates$AMX) # determines %S+I

# be more specific
proportion_S(example_isolates$AMX)
proportion_SI(example_isolates$AMX)
proportion_I(example_isolates$AMX)
proportion_IR(example_isolates$AMX)
proportion_R(example_isolates$AMX)

if (require("dplyr")) {
  example_isolates %>%
    group_by(hospital_id) %>%
    summarise(r = resistance(CIP),
              n = n_rsi(CIP)) # n_rsi works like n_distinct in dplyr, see ?n_rsi

  example_isolates %>%
    group_by(hospital_id) %>%
    summarise(R  = resistance(CIP, as_percent = TRUE),
              SI = susceptibility(CIP, as_percent = TRUE),
              n1 = count_all(CIP),  # the actual total; sum of all three
              n2 = n_rsi(CIP),      # same - analogous to n_distinct
              total = n())          # NOT the number of tested isolates!

  # Calculate co-resistance between amoxicillin/clav acid and gentamicin,
  # so we can see that combination therapy does a lot more than mono therapy:
  example_isolates %>% susceptibility(AMC)      # %SI = 76.3%
  example_isolates %>% count_all(AMC)           #   n = 1879

  example_isolates %>% susceptibility(GEN)      # %SI = 75.4%
  example_isolates %>% count_all(GEN)           #   n = 1855

  example_isolates %>% susceptibility(AMC, GEN) # %SI = 94.1%
  example_isolates %>% count_all(AMC, GEN)      #   n = 1939

  # See Details on how `only_all_tested` works. Example:
  example_isolates %>%
    summarise(numerator = count_susceptible(AMC, GEN),
              denominator = count_all(AMC, GEN),
              proportion = susceptibility(AMC, GEN))

  example_isolates %>%
    summarise(numerator = count_susceptible(AMC, GEN, only_all_tested = TRUE),
              denominator = count_all(AMC, GEN, only_all_tested = TRUE),
              proportion = susceptibility(AMC, GEN, only_all_tested = TRUE))

  example_isolates %>%
    group_by(hospital_id) %>%
    summarise(cipro_p = susceptibility(CIP, as_percent = TRUE),
              cipro_n = count_all(CIP),
              genta_p = susceptibility(GEN, as_percent = TRUE),
              genta_n = count_all(GEN),
              combination_p = susceptibility(CIP, GEN, as_percent = TRUE),
              combination_n = count_all(CIP, GEN))

  # Get proportions S/I/R immediately of all rsi columns
  example_isolates %>%
    select(AMX, CIP) %>%
    proportion_df(translate_ab = FALSE)

  # It also supports grouping variables
  example_isolates %>%
    select(hospital_id, AMX, CIP) %>%
    group_by(hospital_id) %>%
    proportion_df(translate_ab = FALSE)
}

if (FALSE) {
  # calculate current empiric combination therapy of Helicobacter gastritis:
  my_table %>%
    filter(first_isolate == TRUE,
           genus == "Helicobacter") %>%
    summarise(p = susceptibility(AMX, MTR),  # amoxicillin with metronidazole
              n = count_all(AMX, MTR))
}
Create a prediction model to predict antimicrobial resistance for the next years on statistically solid ground. Standard errors (SE) will be returned as columns se_min and se_max. See Examples for a real-life example.
resistance_predict(
  x,
  col_ab,
  col_date = NULL,
  year_min = NULL,
  year_max = NULL,
  year_every = 1,
  minimum = 30,
  model = NULL,
  I_as_S = TRUE,
  preserve_measurements = TRUE,
  info = interactive(),
  ...
)

rsi_predict(
  x,
  col_ab,
  col_date = NULL,
  year_min = NULL,
  year_max = NULL,
  year_every = 1,
  minimum = 30,
  model = NULL,
  I_as_S = TRUE,
  preserve_measurements = TRUE,
  info = interactive(),
  ...
)

# S3 method for resistance_predict
plot(x, main = paste("Resistance Prediction of", x_name), ...)

ggplot_rsi_predict(
  x,
  main = paste("Resistance Prediction of", x_name),
  ribbon = TRUE,
  ...
)
Argument | Description |
---|---|
x | a |
col_ab | column name of |
col_date | column name of the date, will be used to calculate years if this column doesn't consist of years already, defaults to the first column with a date class |
year_min | lowest year to use in the prediction model, defaults to the lowest year in |
year_max | highest year to use in the prediction model, defaults to 10 years after today |
year_every | unit of sequence between lowest year found in the data and |
minimum | minimal amount of available isolates per year to include. Years containing fewer observations will be estimated by the model. |
model | the statistical model of choice. This could be a generalised linear regression model with binomial distribution (i.e. using glm(..., family = binomial)), assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance. See Details for all valid options. |
I_as_S | a logical to indicate whether values |
preserve_measurements | a logical to indicate whether predictions of years that are actually available in the data should be overwritten by the original data. The standard errors of those years will be |
info | a logical to indicate whether textual analysis should be printed with the name and |
... | parameters passed on to functions |
main | title of the plot |
ribbon | a logical to indicate whether a ribbon should be shown (default) or error bars |
A data.frame with extra class resistance_predict with columns:
year
value, the same as estimated when preserve_measurements = FALSE, and a combination of observed and estimated otherwise
se_min, the lower bound of the standard error with a minimum of 0 (so the standard error will never go below 0%)
se_max, the upper bound of the standard error with a maximum of 1 (so the standard error will never go above 100%)
observations, the total number of available observations in that year, i.e. \(S + I + R\)
observed, the original observed resistant percentages
estimated, the estimated resistant percentages, calculated by the model
Furthermore, the model itself is available as an attribute: attributes(x)$model, please see Examples.
Valid options for the statistical model (parameter model) are:
"binomial" or "binom" or "logit": a generalised linear regression model with binomial distribution
"loglin" or "poisson": a generalised log-linear regression model with poisson distribution
"lin" or "linear": a linear regression model
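Below is a minimal sketch of the "binomial" option in base R, using made-up yearly counts. It assumes the model is a logistic regression of resistant versus non-resistant counts on year. It also assumes the bounds are the prediction plus or minus one standard error, clamped to [0, 1] as described for se_min and se_max. The package's exact computation may differ.

```r
# Made-up yearly counts of resistant isolates out of all tested isolates
yearly <- data.frame(
  year      = 2010:2017,
  resistant = c(10, 12, 15, 16, 20, 22, 25, 27),
  total     = c(100, 105, 110, 98, 102, 100, 104, 99)
)

# Logistic regression: resistant vs. non-resistant counts as a function of year
fit <- glm(cbind(resistant, total - resistant) ~ year,
           family = binomial, data = yearly)

# Predict on the probability scale, including future years
new_years <- data.frame(year = 2010:2022)
pred <- predict(fit, newdata = new_years, type = "response", se.fit = TRUE)

forecast <- data.frame(
  year   = new_years$year,
  value  = pred$fit,
  se_min = pmax(pred$fit - pred$se.fit, 0),  # never below 0%
  se_max = pmin(pred$fit + pred$se.fit, 1)   # never above 100%
)
```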
The lifecycle of this function is maturing. The underlying code of a maturing function has been roughed out, but finer details might still change. Since this function needs wider usage and more extensive testing, you are very welcome to suggest changes at our repository or to send us an email (see section 'Contact Us').
In 2019, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) decided to change the definitions of susceptibility testing categories R and S/I as shown below (http://www.eucast.org/newsiandr/).
R = Resistant
A microorganism is categorised as Resistant when there is a high likelihood of therapeutic failure even when there is increased exposure. Exposure is a function of how the mode of administration, dose, dosing interval, infusion time, as well as distribution and excretion of the antimicrobial agent will influence the infecting organism at the site of infection.
S = Susceptible
A microorganism is categorised as Susceptible, standard dosing regimen, when there is a high likelihood of therapeutic success using a standard dosing regimen of the agent.
I = Increased exposure, but still susceptible
A microorganism is categorised as Susceptible, Increased exposure when there is a high likelihood of therapeutic success because exposure to the agent is increased by adjusting the dosing regimen or by its concentration at the site of infection.
This AMR package honours this new insight. Use susceptibility() (equal to proportion_SI()) to determine antimicrobial susceptibility and count_susceptible() (equal to count_SI()) to count susceptible isolates.
The proportion() functions to calculate resistance
x <- resistance_predict(example_isolates,
                        col_ab = "AMX",
                        year_min = 2010,
                        model = "binomial")
plot(x)
if (require("ggplot2")) {
  ggplot_rsi_predict(x)
}

# using dplyr:
if (require("dplyr")) {
  x <- example_isolates %>%
    filter_first_isolate() %>%
    filter(mo_genus(mo) == "Staphylococcus") %>%
    resistance_predict("PEN", model = "binomial")
  plot(x)

  # get the model from the object
  mymodel <- attributes(x)$model
  summary(mymodel)
}

# create nice plots with ggplot2 yourself
if (require("ggplot2") && require("dplyr")) {

  data <- example_isolates %>%
    filter(mo == as.mo("E. coli")) %>%
    resistance_predict(col_ab = "AMX",
                       col_date = "date",
                       model = "binomial",
                       info = FALSE,
                       minimum = 15)

  ggplot(data,
         aes(x = year)) +
    geom_col(aes(y = value),
             fill = "grey75") +
    geom_errorbar(aes(ymin = se_min,
                      ymax = se_max),
                  colour = "grey50") +
    scale_y_continuous(limits = c(0, 1),
                       breaks = seq(0, 1, 0.1),
                       labels = paste0(seq(0, 100, 10), "%")) +
    labs(title = expression(paste("Forecast of Amoxicillin Resistance in ",
                                  italic("E. coli"))),
         y = "%R",
         x = "Year") +
    theme_minimal(base_size = 13)
}
Data set to interpret MIC and disk diffusion values as R/SI values. Included guidelines are CLSI (2011-2019) and EUCAST (2011-2020). Use as.rsi() to transform MIC or disk measurements into R/SI values.
rsi_translation
A data.frame with 18,964 observations and 10 variables:
guideline
Name of the guideline
method
Either "MIC" or "DISK"
site
Body site, e.g. "Oral" or "Respiratory"
mo
Microbial ID, see as.mo()
ab
Antibiotic ID, see as.ab()
ref_tbl
Info about where the guideline rule can be found
disk_dose
Dose of the used disk diffusion method
breakpoint_S
Highest MIC value or lowest number of millimetres that leads to "S"
breakpoint_R
Lowest MIC value or highest number of millimetres that leads to "R"
uti
A logical value (TRUE/FALSE) to indicate whether the rule applies to a urinary tract infection (UTI)
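The sketch below illustrates how breakpoint_S and breakpoint_R can be applied. It assumes the common convention for MIC values: at or below breakpoint_S is S, at or above breakpoint_R is R, and anything in between is I. It assumes the reverse for disk zone diameters in millimetres. The helpers `interpret_mic()` and `interpret_disk()` are hypothetical. The breakpoint values are made up and do not come from any guideline.

```r
# Hypothetical helpers; breakpoints below are made up for illustration
interpret_mic <- function(mic, breakpoint_S, breakpoint_R) {
  ifelse(mic <= breakpoint_S, "S",
         ifelse(mic >= breakpoint_R, "R", "I"))
}

# For disk diffusion, larger zones mean more susceptible
interpret_disk <- function(zone_mm, breakpoint_S, breakpoint_R) {
  ifelse(zone_mm >= breakpoint_S, "S",
         ifelse(zone_mm <= breakpoint_R, "R", "I"))
}

interpret_mic(c(0.5, 4, 16), breakpoint_S = 2, breakpoint_R = 16)    # "S" "I" "R"
interpret_disk(c(25, 18, 12), breakpoint_S = 22, breakpoint_R = 14)  # "S" "I" "R"
```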
The repository of this AMR package contains a file comprising this exact data set: https://github.com/msberends/AMR/blob/master/data-raw/rsi_translation.txt. This file allows for machine-reading the EUCAST and CLSI guidelines, which is almost impossible with the Excel and PDF files distributed by EUCAST and CLSI. This file is updated automatically.
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
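This page does not state which estimator skewness() uses. Assuming the common moment-based definition (the Fisher-Pearson coefficient g_1), the skewness of values x_1, ..., x_n with mean \bar{x} would be:

```latex
g_1 = \frac{m_3}{m_2^{3/2}}
    = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}
           {\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}
```

Because the denominator is always positive, the sign of g_1 is the sign of the third central moment m_3.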
When negative: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. When positive: the right tail is longer; the mass of the distribution is concentrated on the left of the figure.
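A minimal base-R sketch of the moment-based formula makes this sign convention concrete. skewness_sketch() is a hypothetical helper written for this illustration, not the AMR implementation, and it assumes the population-moment (Fisher-Pearson) definition; it handles na.rm in the usual R way.

```r
# Hypothetical helper: moment-based skewness (Fisher-Pearson coefficient g1).
# Assumption: population moments (divide by n); not copied from the AMR source.
skewness_sketch <- function(x, na.rm = FALSE) {
  x <- as.vector(x)
  if (na.rm) {
    x <- x[!is.na(x)]   # drop missing values only when asked to
  }
  n <- length(x)
  m <- mean(x)
  m3 <- sum((x - m)^3) / n   # third central moment
  m2 <- sum((x - m)^2) / n   # second central moment (population variance)
  m3 / m2^(3 / 2)
}

right_tailed <- c(1, 1, 1, 2, 2, 3, 10)   # one large value stretches the right tail
left_tailed  <- -right_tailed             # mirror image: long left tail

skewness_sketch(right_tailed)   # positive
skewness_sketch(left_tailed)    # negative, same magnitude
skewness_sketch(c(1, 2, 3))     # 0 for a symmetric sample
```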
skewness(x, na.rm = FALSE)

# S3 method for default
skewness(x, na.rm = FALSE)

# S3 method for matrix
skewness(x, na.rm = FALSE)

# S3 method for data.frame
skewness(x, na.rm = FALSE)
Argument | Description
---|---|
x | a vector of values, a matrix or a data.frame
na.rm | a logical value indicating whether NA values should be stripped before the computation proceeds
The lifecycle of this function is questioning. This function might no longer be the optimal approach, or it is questionable whether this function should be in this AMR package at all.
For language-dependent output of AMR functions, like mo_name(), mo_gramstain(), mo_type() and ab_name().
get_locale()
Strings will be translated to foreign languages if they are defined in a local translation file. Additions to this file can be suggested at our repository. The file can be found here: https://github.com/msberends/AMR/blob/master/data-raw/translations.tsv.
Currently supported languages (besides English) are: Dutch, French, German, Italian, Portuguese and Spanish. Please note that not all of these languages currently have translations available for all antimicrobial agents and colloquial microorganism names.
Please suggest your own translations by creating a new issue on our repository.
This file will be read by all functions where translated output may be desired, like all mo_property() functions (mo_name(), mo_gramstain(), mo_type(), etc.).
The system language will be used by default, if that language is supported. The system language can be overridden with Sys.setenv(AMR_locale = yourlanguage).
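The override described above can be sketched as follows. This assumes the AMR package is installed and that get_locale() reads the AMR_locale environment variable, as stated on this page; "de" is just an example of a supported language code, and Sys.unsetenv() restores the default behaviour afterwards.

```r
library(AMR)

# Force German output, overriding the detected system language
Sys.setenv(AMR_locale = "de")
current <- get_locale()
current

# The 'language' argument of mo_name() defaults to get_locale(),
# so a German name is expected here
mo_name("CoNS")

# Remove the override again so the system language is used
Sys.unsetenv("AMR_locale")
```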
The lifecycle of this function is stable. In a stable function, major changes are unlikely. This means that the underlying code will generally evolve by adding new arguments; removing arguments or changing the meaning of existing arguments will be avoided.
If the underlying code needs breaking changes, they will occur gradually. For example, a parameter will be deprecated and first continue to work, but will emit a message informing you of the change. Next, typically after at least one newly released version on CRAN, the message will be transformed to an error.
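The two-step deprecation path described above (a parameter first keeps working with a message, and later raises an error) can be sketched in base R. Both functions and their arguments (example_fun(), example_fun_later(), old_arg, new_arg) are hypothetical and exist only for this illustration; they are not part of AMR.

```r
# Hypothetical: 'old_arg' is being deprecated in favour of 'new_arg'.

# Step 1 (current release): the old parameter still works, with a message
example_fun <- function(x, new_arg = 1, old_arg = NULL) {
  if (!is.null(old_arg)) {
    message("'old_arg' is deprecated, use 'new_arg' instead.")
    new_arg <- old_arg
  }
  x * new_arg
}

# Step 2 (a later release): the same check now stops with an error
example_fun_later <- function(x, new_arg = 1, old_arg = NULL) {
  if (!is.null(old_arg)) {
    stop("'old_arg' was removed, use 'new_arg' instead.", call. = FALSE)
  }
  x * new_arg
}

example_fun(2, old_arg = 3)   # emits a message and returns 6
example_fun(2, new_arg = 3)   # no message, returns 6
```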
# The 'language' parameter of the functions below
# will be set automatically to your system language
# with get_locale()

# English
mo_name("CoNS", language = "en")
#> "Coagulase-negative Staphylococcus (CoNS)"

# German
mo_name("CoNS", language = "de")
#> "Koagulase-negative Staphylococcus (KNS)"

# Dutch
mo_name("CoNS", language = "nl")
#> "Coagulase-negatieve Staphylococcus (CNS)"

# Spanish
mo_name("CoNS", language = "es")
#> "Staphylococcus coagulasa negativo (SCN)"

# Italian
mo_name("CoNS", language = "it")
#> "Staphylococcus negativo coagulasi (CoNS)"

# Portuguese
mo_name("CoNS", language = "pt")
#> "Staphylococcus coagulase negativo (CoNS)"
AMR (for R). Developed at the University of Groningen in collaboration with non-profit organisations Certe Medical Diagnostics and Advice and University Medical Center Groningen.