diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/.nojekyll @@ -0,0 +1 @@ + diff --git a/404.html b/404.html new file mode 100644 index 00000000..c8d2df12 --- /dev/null +++ b/404.html @@ -0,0 +1,189 @@ + + +
+ + + + +GNU GENERAL PUBLIC LICENSE +Version 2, June 1991 + +Copyright (C) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +Everyone is permitted to copy and distribute verbatim copies +of this license document, but changing it is not allowed. + +A SUMMARY OF THIS LICENSE BY THE ORIGINAL AUTHORS OF THE AMR R PACKAGE + +This R package, with package name 'AMR': +- May be used for commercial purposes +- May be used for private purposes +- May NOT be used for patent purposes +- May be modified, although: + - Modifications MUST be released under the same license when distributing the package + - Changes made to the code MUST be documented +- May be distributed, although: + - Source code MUST be made available when the package is distributed + - A copy of the license and copyright notice MUST be included with the package. +- Comes with a LIMITATION of liability +- Comes with NO warranty + +END OF THE SUMMARY + + +GNU GENERAL PUBLIC LICENSE +TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + +0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + +1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + +2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + +3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + +4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + +5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + +6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + +7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + +8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + +9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + +10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + +NO WARRANTY + +11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + +12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + +END OF TERMS AND CONDITIONS ++ +
vignettes/AMR.Rmd
+ AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 21 August 2022.
+Conducting AMR data analysis unfortunately requires in-depth knowledge from different scientific fields, which makes it hard to do right. At least, it requires:
+Of course, we cannot instantly provide you with knowledge and experience. But with this AMR
package, we aimed at providing (1) tools to simplify antimicrobial resistance data cleaning, transformation and analysis, (2) methods to easily incorporate international guidelines and (3) scientifically reliable reference data, including the requirements mentioned above.
The AMR
package enables standardised and reproducible AMR data analysis, with the application of evidence-based rules, determination of first isolates, translation of various codes for microorganisms and antimicrobial agents, determination of (multi-drug) resistant microorganisms, and calculation of antimicrobial resistance, prevalence and future trends.
For this tutorial, we will create fake demonstration data to work with.
+You can skip to Cleaning the data if you already have your own data ready. If you start your analysis, try to make the structure of your data generally look like this:
+date | +patient_id | +mo | +AMX | +CIP | +
---|---|---|---|---|
2022-08-21 | +abcd | +Escherichia coli | +S | +S | +
2022-08-21 | +abcd | +Escherichia coli | +S | +R | +
2022-08-21 | +efgh | +Escherichia coli | +R | +S | +
As with many uses in R, we need some additional packages for AMR data analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by RStudio. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
We will also use the cleaner
package, that can be used for cleaning data and creating frequency tables.
We will create some fake example data to use for analysis. For AMR data analysis, we need at least: a patient ID, name or code of a microorganism, a date and antimicrobial results (an antibiogram). It could also include a specimen type (e.g. to filter on blood or urine), the ward type (e.g. to filter on ICUs).
+With additional columns (like a hospital name, the patients gender of even [well-defined] clinical properties) you can do a comparative analysis, as this tutorial will demonstrate too.
+To start with patients, we need a unique list of patients.
+ +The LETTERS
object is available in R - it’s a vector with 26 characters: A
to Z
. The patients
object we just created is now a vector of length 260, with values (patient IDs) varying from A1
to Z10
. Now we we also set the gender of our patients, by putting the ID and the gender in a table:
+patients_table <- data.frame(patient_id = patients,
+ gender = c(rep("M", 135),
+ rep("F", 125)))
The first 135 patient IDs are now male, the other 125 are female.
+Let’s pretend that our data consists of blood cultures isolates from between 1 January 2010 and 1 January 2018.
+ +This dates
object now contains all days in our date range.
For this tutorial, we will uses four different microorganisms: Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, and Klebsiella pneumoniae:
+
+bacteria <- c("Escherichia coli", "Staphylococcus aureus",
+ "Streptococcus pneumoniae", "Klebsiella pneumoniae")
Using the sample()
function, we can randomly select items from all objects we defined earlier. To let our fake data reflect reality a bit, we will also approximately define the probabilities of bacteria and the antibiotic results, using the random_rsi()
function.
+sample_size <- 20000
+data <- data.frame(date = sample(dates, size = sample_size, replace = TRUE),
+ patient_id = sample(patients, size = sample_size, replace = TRUE),
+ hospital = sample(c("Hospital A",
+ "Hospital B",
+ "Hospital C",
+ "Hospital D"),
+ size = sample_size, replace = TRUE,
+ prob = c(0.30, 0.35, 0.15, 0.20)),
+ bacteria = sample(bacteria, size = sample_size, replace = TRUE,
+ prob = c(0.50, 0.25, 0.15, 0.10)),
+ AMX = random_rsi(sample_size, prob_RSI = c(0.35, 0.60, 0.05)),
+ AMC = random_rsi(sample_size, prob_RSI = c(0.15, 0.75, 0.10)),
+ CIP = random_rsi(sample_size, prob_RSI = c(0.20, 0.80, 0.00)),
+ GEN = random_rsi(sample_size, prob_RSI = c(0.08, 0.92, 0.00)))
Using the left_join()
function from the dplyr
package, we can ‘map’ the gender to the patient ID using the patients_table
object we created earlier:
The resulting data set contains 20,000 blood culture isolates. With the head()
function we can preview the first 6 rows of this data set:
+head(data)
date | +patient_id | +hospital | +bacteria | +AMX | +AMC | +CIP | +GEN | +gender | +
---|---|---|---|---|---|---|---|---|
2014-01-13 | +A7 | +Hospital C | +Staphylococcus aureus | +R | +S | +R | +S | +M | +
2010-07-17 | +K10 | +Hospital B | +Escherichia coli | +R | +R | +S | +S | +M | +
2013-12-06 | +Z6 | +Hospital A | +Streptococcus pneumoniae | +R | +S | +R | +S | +F | +
2013-09-04 | +U4 | +Hospital C | +Klebsiella pneumoniae | +R | +S | +R | +S | +F | +
2010-05-09 | +B3 | +Hospital A | +Staphylococcus aureus | +S | +S | +R | +S | +M | +
2013-12-21 | +U3 | +Hospital B | +Escherichia coli | +S | +R | +S | +R | +F | +
Now, let’s start the cleaning and the analysis!
+We also created a package dedicated to data cleaning and checking, called the cleaner
package. It freq()
function can be used to create frequency tables.
For example, for the gender
variable:
Frequency table
+Class: character
+Length: 20,000
+Available: 20,000 (100.0%, NA: 0 = 0.0%)
+Unique: 2
Shortest: 1
+Longest: 1
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +M | +10,403 | +52.02% | +10,403 | +52.02% | +
2 | +F | +9,597 | +47.99% | +20,000 | +100.00% | +
So, we can draw at least two conclusions immediately. From a data scientists perspective, the data looks clean: only values M
and F
. From a researchers perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
We also want to transform the antibiotics, because in real life data we don’t know if they are really clean. The as.rsi()
function ensures reliability and reproducibility in these kind of variables. The is.rsi.eligible()
can check which columns are probably columns with R/SI test results. Using mutate()
and across()
, we can apply the transformation to the formal <rsi>
class:
+is.rsi.eligible(data)
+# [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
+colnames(data)[is.rsi.eligible(data)]
+# [1] "AMX" "AMC" "CIP" "GEN"
+
+data <- data %>%
+ mutate(across(where(is.rsi.eligible), as.rsi))
Finally, we will apply EUCAST rules on our antimicrobial results. In Europe, most medical microbiological laboratories already apply these rules. Our package features their latest insights on intrinsic resistance and exceptional phenotypes. Moreover, the eucast_rules()
function can also apply additional rules, like forcing
Because the amoxicillin (column AMX
) and amoxicillin/clavulanic acid (column AMC
) in our data were generated randomly, some rows will undoubtedly contain AMX = S and AMC = R, which is technically impossible. The eucast_rules()
fixes this:
+data <- eucast_rules(data, col_mo = "bacteria", rules = "all")
Now that we have the microbial ID, we can add some taxonomic properties:
+
+data <- data %>%
+ mutate(gramstain = mo_gramstain(bacteria),
+ genus = mo_genus(bacteria),
+ species = mo_species(bacteria))
We also need to know which isolates we can actually use for analysis.
+To conduct an analysis of antimicrobial resistance, you must only include the first isolate of every patient per episode (Hindler et al., Clin Infect Dis. 2007). If you would not do this, you could easily get an overestimate or underestimate of the resistance of an antibiotic. Imagine that a patient was admitted with an MRSA and that it was found in 5 different blood cultures the following weeks (yes, some countries like the Netherlands have these blood drawing policies). The resistance percentage of oxacillin of all isolates would be overestimated, because you included this MRSA more than once. It would clearly be selection bias.
+The Clinical and Laboratory Standards Institute (CLSI) appoints this as follows:
+++(…) When preparing a cumulative antibiogram to guide clinical decisions about empirical antimicrobial therapy of initial infections, only the first isolate of a given species per patient, per analysis period (eg, one year) should be included, irrespective of body site, antimicrobial susceptibility profile, or other phenotypical characteristics (eg, biotype). The first isolate is easily identified, and cumulative antimicrobial susceptibility test data prepared using the first isolate are generally comparable to cumulative antimicrobial susceptibility test data calculated by other methods, providing duplicate isolates are excluded.
+
M39-A4 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 4th Edition. CLSI, 2014. Chapter 6.4
This AMR
package includes this methodology with the first_isolate()
function and is able to apply the four different methods as defined by Hindler et al. in 2007: phenotype-based, episode-based, patient-based, isolate-based. The right method depends on your goals and analysis, but the default phenotype-based method is in any case the method to properly correct for most duplicate isolates. This method also takes into account the antimicrobial susceptibility test results using all_microbials()
. Read more about the methods on the first_isolate()
page.
The outcome of the function can easily be added to our data:
+
+data <- data %>%
+ mutate(first = first_isolate(info = TRUE))
+# Determining first isolates using an episode length of 365 days
+# ℹ Using column 'bacteria' as input for `col_mo`.
+# ℹ Using column 'date' as input for `col_date`.
+# ℹ Using column 'patient_id' as input for `col_patient_id`.
+# Basing inclusion on all antimicrobial results, using a points threshold of
+# 2
+# => Found 10,723 'phenotype-based' first isolates (53.6% of total where a
+# microbial ID was available)
So only 53.6% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened:
+
+data_1st <- data %>%
+ filter_first_isolate()
So we end up with 10,723 isolates for analysis. Now our data looks like:
+
+head(data_1st)
+ | date | +patient_id | +hospital | +bacteria | +AMX | +AMC | +CIP | +GEN | +gender | +gramstain | +genus | +species | +first | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | +2014-01-13 | +A7 | +Hospital C | +B_STPHY_AURS | +R | +S | +R | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +
3 | +2013-12-06 | +Z6 | +Hospital A | +B_STRPT_PNMN | +R | +R | +R | +R | +F | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
5 | +2010-05-09 | +B3 | +Hospital A | +B_STPHY_AURS | +S | +S | +R | +S | +M | +Gram-positive | +Staphylococcus | +aureus | +TRUE | +
6 | +2013-12-21 | +U3 | +Hospital B | +B_ESCHR_COLI | +R | +R | +S | +R | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +
7 | +2011-06-10 | +D8 | +Hospital D | +B_STRPT_PNMN | +S | +S | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
9 | +2012-03-02 | +K7 | +Hospital B | +B_STRPT_PNMN | +R | +R | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
Time for the analysis!
+You might want to start by getting an idea of how the data is distributed. It’s an important start, because it also decides how you will continue your analysis. Although this package contains a convenient function to make frequency tables, exploratory data analysis (EDA) is not the primary scope of this package. Use a package like DataExplorer
for that, or read the free online book Exploratory Data Analysis with R by Roger D. Peng.
To just get an idea how the species are distributed, create a frequency table with our freq()
function. We created the genus
and species
column earlier based on the microbial ID. With paste()
, we can concatenate them together.
The freq()
function can be used like the base R language was intended:
Or can be used like the dplyr
way, which is easier readable:
Frequency table
+Class: character
+Length: 10,723
+Available: 10,723 (100.0%, NA: 0 = 0.0%)
+Unique: 4
Shortest: 16
+Longest: 24
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Escherichia coli | +4,683 | +43.67% | +4,683 | +43.67% | +
2 | +Staphylococcus aureus | +2,727 | +25.43% | +7,410 | +69.10% | +
3 | +Streptococcus pneumoniae | +2,104 | +19.62% | +9,514 | +88.73% | +
4 | +Klebsiella pneumoniae | +1,209 | +11.27% | +10,723 | +100.00% | +
Using tidyverse selections, you can also select or filter columns based on the antibiotic class they are in:
+
+data_1st %>%
+ filter(any(aminoglycosides() == "R"))
# ℹ For `aminoglycosides()` using column 'GEN' (gentamicin)
+date | +patient_id | +hospital | +bacteria | +AMX | +AMC | +CIP | +GEN | +gender | +gramstain | +genus | +species | +first | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-12-06 | +Z6 | +Hospital A | +B_STRPT_PNMN | +R | +R | +R | +R | +F | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
2013-12-21 | +U3 | +Hospital B | +B_ESCHR_COLI | +R | +R | +S | +R | +F | +Gram-negative | +Escherichia | +coli | +TRUE | +
2011-06-10 | +D8 | +Hospital D | +B_STRPT_PNMN | +S | +S | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
2012-03-02 | +K7 | +Hospital B | +B_STRPT_PNMN | +R | +R | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
2012-12-25 | +W10 | +Hospital B | +B_STRPT_PNMN | +S | +S | +S | +R | +F | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
2011-02-13 | +E5 | +Hospital A | +B_STRPT_PNMN | +R | +R | +S | +R | +M | +Gram-positive | +Streptococcus | +pneumoniae | +TRUE | +
If you want to get a quick glance of the number of isolates in different bug/drug combinations, you can use the bug_drug_combinations()
function:
+data_1st %>%
+ bug_drug_combinations() %>%
+ head() # show first 6 rows
# ℹ Using column 'bacteria' as input for `col_mo`.
+mo | +ab | +S | +I | +R | +total | +
---|---|---|---|---|---|
E. coli | +AMX | +2228 | +121 | +2334 | +4683 | +
E. coli | +AMC | +3423 | +156 | +1104 | +4683 | +
E. coli | +CIP | +3439 | +0 | +1244 | +4683 | +
E. coli | +GEN | +4064 | +0 | +619 | +4683 | +
K. pneumoniae | +AMX | +0 | +0 | +1209 | +1209 | +
K. pneumoniae | +AMC | +960 | +46 | +203 | +1209 | +
+data_1st %>%
+ select(bacteria, aminoglycosides()) %>%
+ bug_drug_combinations()
# ℹ For `aminoglycosides()` using column 'GEN' (gentamicin)
+# ℹ Using column 'bacteria' as input for `col_mo`.
+mo | +ab | +S | +I | +R | +total | +
---|---|---|---|---|---|
E. coli | +GEN | +4064 | +0 | +619 | +4683 | +
K. pneumoniae | +GEN | +1075 | +0 | +134 | +1209 | +
S. aureus | +GEN | +2421 | +0 | +306 | +2727 | +
S. pneumoniae | +GEN | +0 | +0 | +2104 | +2104 | +
This will only give you the crude numbers in the data. To calculate antimicrobial resistance in a more sensible way, also by correcting for too few results, we use the resistance()
and susceptibility()
functions.
The functions resistance()
and susceptibility()
can be used to calculate antimicrobial resistance or susceptibility. For more specific analyses, the functions proportion_S()
, proportion_SI()
, proportion_I()
, proportion_IR()
and proportion_R()
can be used to determine the proportion of a specific antimicrobial outcome.
All these functions contain a minimum
argument, denoting the minimum required number of test results for returning a value. These functions will otherwise return NA
. The default is minimum = 30
, following the CLSI M39-A4 guideline for applying microbial epidemiology.
As per the EUCAST guideline of 2019, we calculate resistance as the proportion of R (proportion_R()
, equal to resistance()
) and susceptibility as the proportion of S and I (proportion_SI()
, equal to susceptibility()
). These functions can be used on their own:
+data_1st %>% resistance(AMX)
+# [1] 0.5465821
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
+data_1st %>%
+ group_by(hospital) %>%
+ summarise(amoxicillin = resistance(AMX))
hospital | +amoxicillin | +
---|---|
Hospital A | +0.5332098 | +
Hospital B | +0.5497618 | +
Hospital C | +0.5499058 | +
Hospital D | +0.5588652 | +
Of course it would be very convenient to know the number of isolates responsible for the percentages. For that purpose the n_rsi()
can be used, which works exactly like n_distinct()
from the dplyr
package. It counts all isolates available for every group (i.e. values S, I or R):
+data_1st %>%
+ group_by(hospital) %>%
+ summarise(amoxicillin = resistance(AMX),
+ available = n_rsi(AMX))
hospital | +amoxicillin | +available | +
---|---|---|
Hospital A | +0.5332098 | +3237 | +
Hospital B | +0.5497618 | +3778 | +
Hospital C | +0.5499058 | +1593 | +
Hospital D | +0.5588652 | +2115 | +
These functions can also be used to get the proportion of multiple antibiotics, to calculate empiric susceptibility of combination therapies very easily:
+
+data_1st %>%
+ group_by(genus) %>%
+ summarise(amoxiclav = susceptibility(AMC),
+ gentamicin = susceptibility(GEN),
+ amoxiclav_genta = susceptibility(AMC, GEN))
genus | +amoxiclav | +gentamicin | +amoxiclav_genta | +
---|---|---|---|
Escherichia | +0.7642537 | +0.8678198 | +0.9750160 | +
Klebsiella | +0.8320926 | +0.8891646 | +0.9743590 | +
Staphylococcus | +0.7920792 | +0.8877888 | +0.9845985 | +
Streptococcus | +0.5361217 | +0.0000000 | +0.5361217 | +
Or if you are curious for the resistance within certain antibiotic classes, use a antibiotic class selector such as penicillins()
, which automatically will include the columns AMX
and AMC
of our data:
+data_1st %>%
+ # group by hospital
+ group_by(hospital) %>%
+ # / -> select all penicillins in the data for calculation
+ # | / -> use resistance() for all peni's per hospital
+ # | | / -> print as percentages
+ summarise(across(penicillins(), resistance, as_percent = TRUE)) %>%
+ # format the antibiotic column names, using so-called snake case,
+ # so 'Amoxicillin/clavulanic acid' becomes 'amoxicillin_clavulanic_acid'
+ rename_with(set_ab_names, penicillins())
hospital | +amoxicillin | +amoxicillin_clavulanic_acid | +
---|---|---|
Hospital A | +53.3% | +26.2% | +
Hospital B | +55.0% | +27.0% | +
Hospital C | +55.0% | +25.5% | +
Hospital D | +55.9% | +27.3% | +
To make a transition to the next part, let’s see how differences in the previously calculated combination therapies could be plotted:
+
+data_1st %>%
+ group_by(genus) %>%
+ summarise("1. Amoxi/clav" = susceptibility(AMC),
+ "2. Gentamicin" = susceptibility(GEN),
+ "3. Amoxi/clav + genta" = susceptibility(AMC, GEN)) %>%
+ # pivot_longer() from the tidyr package "lengthens" data:
+ tidyr::pivot_longer(-genus, names_to = "antibiotic") %>%
+ ggplot(aes(x = genus,
+ y = value,
+ fill = antibiotic)) +
+ geom_col(position = "dodge2")
To show results in plots, most R users would nowadays use the ggplot2
package. This package lets you create plots in layers. You can read more about it on their website. A quick example would look like these syntaxes:
+ggplot(data = a_data_set,
+ mapping = aes(x = year,
+ y = value)) +
+ geom_col() +
+ labs(title = "A title",
+ subtitle = "A subtitle",
+ x = "My X axis",
+ y = "My Y axis")
+
+# or as short as:
+ggplot(a_data_set) +
+ geom_bar(aes(year))
The AMR
package contains functions to extend this ggplot2
package, for example geom_rsi()
. It automatically transforms data with count_df()
or proportion_df()
and show results in stacked bars. Its simplest and shortest example:
Omit the translate_ab = FALSE
to have the antibiotic codes (AMX, AMC, CIP, GEN) translated to official WHO names (amoxicillin, amoxicillin/clavulanic acid, ciprofloxacin, gentamicin).
If we group on e.g. the genus
column and add some additional functions from our package, we can create this:
+# group the data on `genus`
+ggplot(data_1st %>% group_by(genus)) +
+ # create bars with genus on x axis
+ # it looks for variables with class `rsi`,
+ # of which we have 4 (earlier created with `as.rsi`)
+ geom_rsi(x = "genus") +
+ # split plots on antibiotic
+ facet_rsi(facet = "antibiotic") +
+ # set colours to the R/SI interpretations (colour-blind friendly)
+ scale_rsi_colours() +
+ # show percentages on y axis
+ scale_y_percent(breaks = 0:4 * 25) +
+ # turn 90 degrees, to make it bars instead of columns
+ coord_flip() +
+ # add labels
+ labs(title = "Resistance per genus and antibiotic",
+ subtitle = "(this is fake data)") +
+ # and print genus in italic to follow our convention
+ # (is now y axis because we turned the plot)
+ theme(axis.text.y = element_text(face = "italic"))
To simplify this, we also created the ggplot_rsi()
function, which combines almost all above functions:
+data_1st %>%
+ group_by(genus) %>%
+ ggplot_rsi(x = "genus",
+ facet = "antibiotic",
+ breaks = 0:4 * 25,
+ datalabels = FALSE) +
+ coord_flip()
The AMR package also extends the plot()
and ggplot2::autoplot()
functions for plotting minimum inhibitory concentrations (MIC, created with as.mic()
) and disk diffusion diameters (created with as.disk()
).
With the random_mic()
and random_disk()
functions, we can generate sampled values for the new data types (S3 classes) <mic>
and <disk>
:
+mic_values <- random_mic(size = 100)
+mic_values
+# Class <mic>
+# [1] 32 64 128 128 64 8 32 0.002 0.025
+# [10] 0.5 32 <=0.001 2 128 0.01 1 2 0.01
+# [19] 128 0.5 4 0.002 0.125 256 0.0625 4 0.002
+# [28] 0.005 4 0.002 32 0.01 0.0625 0.25 2 64
+# [37] 0.0625 0.5 256 4 1 0.025 128 0.0625 16
+# [46] 0.5 0.005 0.01 1 2 0.01 2 8 0.125
+# [55] 0.25 256 0.0625 0.025 0.002 8 <=0.001 1 0.002
+# [64] 256 8 4 0.005 0.025 4 0.002 <=0.001 2
+# [73] 256 0.005 256 2 4 16 128 256 <=0.001
+# [82] 1 2 1 1 <=0.001 0.5 4 0.25 32
+# [91] 0.002 0.0625 0.01 32 0.005 0.01 2 64 64
+# [100] 2
+# base R:
+plot(mic_values)
+# ggplot2:
+autoplot(mic_values)
But we could also be more specific, by generating MICs that are likely to be found in E. coli for ciprofloxacin:
+
+mic_values <- random_mic(size = 100, mo = "E. coli", ab = "cipro")
For the plot()
and autoplot()
function, we can define the microorganism and an antimicrobial agent the same way. This will add the interpretation of those values according to a chosen guidelines (defaults to the latest EUCAST guideline).
Default colours are colour-blind friendly, while maintaining the convention that e.g. ‘susceptible’ should be green and ‘resistant’ should be red:
+
+# base R:
+plot(mic_values, mo = "E. coli", ab = "cipro")
+# ggplot2:
+autoplot(mic_values, mo = "E. coli", ab = "cipro")
For disk diffusion values, there is not much of a difference in plotting:
+
+disk_values <- random_disk(size = 100, mo = "E. coli", ab = "cipro")
+# ℹ Function `as.mo()` is uncertain about "E. coli" (assuming Escherichia
+# coli). Run `mo_uncertainties()` to review this.
+disk_values
+# Class <disk>
+# [1] 28 22 30 19 23 20 17 21 22 18 18 27 18 20 24 28 30 20 29 31 22 31 23 24 17
+# [26] 23 18 17 25 30 28 23 30 23 30 31 18 26 26 26 27 17 30 18 27 31 17 24 20 26
+# [51] 30 21 26 27 20 23 28 29 17 29 27 24 22 27 28 27 22 30 18 28 25 18 25 18 31
+# [76] 29 25 17 31 20 20 21 24 24 30 28 30 31 20 30 27 19 24 17 22 30 31 27 28 24
+# base R:
+plot(disk_values, mo = "E. coli", ab = "cipro")
And when using the ggplot2
package, but now choosing the latest implemented CLSI guideline (notice that the EUCAST-specific term “Susceptible, incr. exp.” has changed to “Intermediate”):
+autoplot(disk_values,
+ mo = "E. coli",
+ ab = "cipro",
+ guideline = "CLSI")
The next example uses the example_isolates
data set. This is a data set included with this package and contains 2,000 microbial isolates with their full antibiograms. It reflects reality and can be used to practice AMR data analysis.
We will compare the resistance to fosfomycin (column FOS
) in hospital A and D. The input for the fisher.test()
can be retrieved with a transformation like this:
+# use package 'tidyr' to pivot data:
+library(tidyr)
+
+check_FOS <- example_isolates %>%
+ filter(hospital_id %in% c("A", "D")) %>% # filter on only hospitals A and D
+ select(hospital_id, FOS) %>% # select the hospitals and fosfomycin
+ group_by(hospital_id) %>% # group on the hospitals
+ count_df(combine_SI = TRUE) %>% # count all isolates per group (hospital_id)
+ pivot_wider(names_from = hospital_id, # transform output so A and D are columns
+ values_from = value) %>%
+ select(A, D) %>% # and only select these columns
+ as.matrix() # transform to a good old matrix for fisher.test()
+
+check_FOS
+# A D
+# [1,] 25 77
+# [2,] 24 33
We can apply the test now with:
+
+# do Fisher's Exact Test
+fisher.test(check_FOS)
+#
+# Fisher's Exact Test for Count Data
+#
+# data: check_FOS
+# p-value = 0.03104
+# alternative hypothesis: true odds ratio is not equal to 1
+# 95 percent confidence interval:
+# 0.2111489 0.9485124
+# sample estimates:
+# odds ratio
+# 0.4488318
As can be seen, the p value is 0.031, which means that the fosfomycin resistance found in isolates from patients in hospital A and D are really different.
+What are EUCAST rules? The European Committee on Antimicrobial Susceptibility Testing (EUCAST) states on their website:
+++EUCAST expert rules are a tabulated collection of expert knowledge on intrinsic resistances, exceptional resistance phenotypes and interpretive rules that may be applied to antimicrobial susceptibility testing in order to reduce errors and make appropriate recommendations for reporting particular resistances.
+
In Europe, a lot of medical microbiological laboratories already apply these rules (Brown et al., 2015). Our package features their latest insights on intrinsic resistance and unusual phenotypes (v3.3, 2021).
+Moreover, the eucast_rules()
function we use for this purpose can also apply additional rules, like forcing
These rules can be used to discard impossible bug-drug combinations in your data. For example, Klebsiella produces beta-lactamase that prevents ampicillin (or amoxicillin) from working against it. In other words, practically every strain of Klebsiella is resistant to ampicillin.
+Sometimes, laboratory data can still contain such strains with ampicillin being susceptible to ampicillin. This could be because an antibiogram is available before an identification is available, and the antibiogram is then not re-interpreted based on the identification (namely, Klebsiella). EUCAST expert rules solve this, that can be applied using eucast_rules()
:
+oops <- data.frame(mo = c("Klebsiella",
+ "Escherichia"),
+ ampicillin = "S")
+oops
+# mo ampicillin
+# 1 Klebsiella S
+# 2 Escherichia S
+
+eucast_rules(oops, info = FALSE)
+# mo ampicillin
+# 1 Klebsiella R
+# 2 Escherichia S
A more convenient function is mo_is_intrinsic_resistant()
that uses the same guideline, but allows to check for one or more specific microorganisms or antibiotics:
+mo_is_intrinsic_resistant(c("Klebsiella", "Escherichia"),
+ "ampicillin")
+# [1] TRUE FALSE
+
+mo_is_intrinsic_resistant("Klebsiella",
+ c("ampicillin", "kanamycin"))
+# [1] TRUE FALSE
EUCAST rules can not only be used for correction, they can also be used for filling in known resistance and susceptibility based on results of other antimicrobials drugs. This process is called interpretive reading, is basically a form of imputation, and is part of the eucast_rules()
function as well:
+data <- data.frame(mo = c("Staphylococcus aureus",
+ "Enterococcus faecalis",
+ "Escherichia coli",
+ "Klebsiella pneumoniae",
+ "Pseudomonas aeruginosa"),
+ VAN = "-", # Vancomycin
+ AMX = "-", # Amoxicillin
+ COL = "-", # Colistin
+ CAZ = "-", # Ceftazidime
+ CXM = "-", # Cefuroxime
+ PEN = "S", # Benzylenicillin
+ FOX = "S", # Cefoxitin
+ stringsAsFactors = FALSE)
+data
mo | +VAN | +AMX | +COL | +CAZ | +CXM | +PEN | +FOX | +
---|---|---|---|---|---|---|---|
Staphylococcus aureus | +- | +- | +- | +- | +- | +S | +S | +
Enterococcus faecalis | +- | +- | +- | +- | +- | +S | +S | +
Escherichia coli | +- | +- | +- | +- | +- | +S | +S | +
Klebsiella pneumoniae | +- | +- | +- | +- | +- | +S | +S | +
Pseudomonas aeruginosa | +- | +- | +- | +- | +- | +S | +S | +
+eucast_rules(data)
mo | +VAN | +AMX | +COL | +CAZ | +CXM | +PEN | +FOX | +
---|---|---|---|---|---|---|---|
Staphylococcus aureus | +- | +S | +R | +R | +S | +S | +S | +
Enterococcus faecalis | +- | +- | +R | +R | +R | +S | +R | +
Escherichia coli | +R | +- | +- | +- | +- | +R | +S | +
Klebsiella pneumoniae | +R | +R | +- | +- | +- | +R | +S | +
Pseudomonas aeruginosa | +R | +R | +- | +- | +R | +R | +R | +
With the function mdro()
, you can determine which micro-organisms are multi-drug resistant organisms (MDRO).
The mdro()
function takes a data set as input, such as a regular data.frame
. It tries to automatically determine the right columns for info about your isolates, such as the name of the species and all columns with results of antimicrobial agents. See the help page for more info about how to set the right settings for your data with the command ?mdro
.
For WHONET data (and most other data), all settings are automatically set correctly.
+The mdro()
function support multiple guidelines. You can select a guideline with the guideline
parameter. Currently supported guidelines are (case-insensitive):
guideline = "CMI2012"
(default)
Magiorakos AP, Srinivasan A et al. “Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance.” Clinical Microbiology and Infection (2012) (link)
+guideline = "EUCAST3.2"
(or simply guideline = "EUCAST"
)
The European international guideline - EUCAST Expert Rules Version 3.2 “Intrinsic Resistance and Unusual Phenotypes” (link)
+guideline = "EUCAST3.1"
The European international guideline - EUCAST Expert Rules Version 3.1 “Intrinsic Resistance and Exceptional Phenotypes Tables” (link)
+guideline = "TB"
The international guideline for multi-drug resistant tuberculosis - World Health Organization “Companion handbook to the WHO guidelines for the programmatic management of drug-resistant tuberculosis” (link)
+guideline = "MRGN"
The German national guideline - Mueller et al. (2015) Antimicrobial Resistance and Infection Control 4:7. DOI: 10.1186/s13756-015-0047-6
+guideline = "BRMO"
The Dutch national guideline - Rijksinstituut voor Volksgezondheid en Milieu “WIP-richtlijn BRMO (Bijzonder Resistente Micro-Organismen) (ZKH)” (link)
+Please suggest your own (country-specific) guidelines by letting us know: https://github.com/msberends/AMR/issues/new.
+You can also use your own custom guideline. Custom guidelines can be set with the custom_mdro_guideline()
function. This is of great importance if you have custom rules to determine MDROs in your hospital, e.g., rules that are dependent on ward, state of contact isolation or other variables in your data.
If you are familiar with case_when()
of the dplyr
package, you will recognise the input method to set your own rules. Rules must be set using what R considers to be the ‘formula notation’:
+custom <- custom_mdro_guideline(CIP == "R" & age > 60 ~ "Elderly Type A",
+ ERY == "R" & age > 60 ~ "Elderly Type B")
If a row/an isolate matches the first rule, the value after the first ~
(in this case ‘Elderly Type A’) will be set as MDRO value. Otherwise, the second rule will be tried and so on. The maximum number of rules is unlimited.
You can print the rules set in the console for an overview. Colours will help reading it if your console supports colours.
+
+custom
+# A set of custom MDRO rules:
+# 1. If CIP is "R" and age is higher than 60 then: Elderly Type A
+# 2. If ERY is "R" and age is higher than 60 then: Elderly Type B
+# 3. Otherwise: Negative
+#
+# Unmatched rows will return NA.
+# Results will be of class <factor>, with ordered levels: Negative < Elderly Type A < Elderly Type B
The outcome of the function can be used for the guideline
argument in the mdro()
function:
+x <- mdro(example_isolates, guideline = custom)
+table(x)
+# x
+# Negative Elderly Type A Elderly Type B
+# 1070 198 732
The rules set (the custom
object in this case) could be exported to a shared file location using saveRDS()
if you collaborate with multiple users. The custom rules set could then be imported using readRDS()
.
The mdro()
function always returns an ordered factor
for predefined guidelines. For example, the output of the default guideline by Magiorakos et al. returns a factor
with levels ‘Negative’, ‘MDR’, ‘XDR’ or ‘PDR’ in that order.
The next example uses the example_isolates
data set. This is a data set included with this package and contains full antibiograms of 2,000 microbial isolates. It reflects reality and can be used to practise AMR data analysis. If we test the MDR/XDR/PDR guideline on this data set, we get:
+example_isolates %>%
+ mdro() %>%
+ freq() # show frequency table of the result
+# Warning: in `mdro()`: NA introduced for isolates where the available percentage of
+# antimicrobial classes was below 50% (set with `pct_required_classes`)
(16 isolates had no test results)
+Frequency table
+Class: factor > ordered (numeric)
+Length: 2,000
+Levels: 4: Negative < Multi-drug-resistant (MDR) < Extensively drug-resistant …
+Available: 1,729 (86.45%, NA: 271 = 13.55%)
+Unique: 2
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Negative | +1601 | +92.60% | +1601 | +92.60% | +
2 | +Multi-drug-resistant (MDR) | +128 | +7.40% | +1729 | +100.00% | +
For another example, I will create a data set to determine multi-drug resistant TB:
+
+# random_rsi() is a helper function to generate
+# a random vector with values S, I and R
+my_TB_data <- data.frame(rifampicin = random_rsi(5000),
+ isoniazid = random_rsi(5000),
+ gatifloxacin = random_rsi(5000),
+ ethambutol = random_rsi(5000),
+ pyrazinamide = random_rsi(5000),
+ moxifloxacin = random_rsi(5000),
+ kanamycin = random_rsi(5000))
Because all column names are automatically verified for valid drug names or codes, this would have worked exactly the same way:
+
+my_TB_data <- data.frame(RIF = random_rsi(5000),
+ INH = random_rsi(5000),
+ GAT = random_rsi(5000),
+ ETH = random_rsi(5000),
+ PZA = random_rsi(5000),
+ MFX = random_rsi(5000),
+ KAN = random_rsi(5000))
The data set now looks like this:
+
+head(my_TB_data)
+# rifampicin isoniazid gatifloxacin ethambutol pyrazinamide moxifloxacin
+# 1 I I S S I R
+# 2 I R R S I I
+# 3 R I S R R S
+# 4 I S I S I S
+# 5 S R S I S R
+# 6 I S I I S S
+# kanamycin
+# 1 I
+# 2 R
+# 3 S
+# 4 R
+# 5 S
+# 6 S
We can now add the interpretation of MDR-TB to our data set. You can use:
+
+mdro(my_TB_data, guideline = "TB")
or its shortcut mdr_tb()
:
+my_TB_data$mdr <- mdr_tb(my_TB_data)
+# ℹ No column found as input for `col_mo`, assuming all rows contain
+# Mycobacterium tuberculosis.
Create a frequency table of the results:
+
+freq(my_TB_data$mdr)
Frequency table
+Class: factor > ordered (numeric)
+Length: 5,000
+Levels: 5: Negative < Mono-resistant < Poly-resistant < Multi-drug-resistant <…
+Available: 5,000 (100.0%, NA: 0 = 0.0%)
+Unique: 5
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Mono-resistant | +3193 | +63.86% | +3193 | +63.86% | +
2 | +Negative | +996 | +19.92% | +4189 | +83.78% | +
3 | +Multi-drug-resistant | +450 | +9.00% | +4639 | +92.78% | +
4 | +Poly-resistant | +241 | +4.82% | +4880 | +97.60% | +
5 | +Extensively drug-resistant | +120 | +2.40% | +5000 | +100.00% | +
vignettes/PCA.Rmd
+ PCA.Rmd
NOTE: This page will be updated soon, as the pca() function is currently being developed.
+ +For PCA, we need to transform our AMR data first. This is what the example_isolates
data set in this package looks like:
+library(AMR)
+library(dplyr)
+glimpse(example_isolates)
+# Rows: 2,000
+# Columns: 49
+# $ date <date> 2002-01-02, 2002-01-03, 2002-01-07, 2002-01-07, 2002-…
+# $ hospital_id <fct> D, D, B, B, B, B, D, D, B, B, D, D, D, D, D, B, B, B, …
+# $ ward_icu <lgl> FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TR…
+# $ ward_clinical <lgl> TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FA…
+# $ ward_outpatient <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
+# $ age <dbl> 65, 65, 45, 45, 45, 45, 78, 78, 45, 79, 67, 67, 71, 71…
+# $ gender <chr> "F", "F", "F", "F", "F", "F", "M", "M", "F", "F", "M",…
+# $ patient_id <chr> "A77334", "A77334", "067927", "067927", "067927", "067…
+# $ mo <mo> "B_ESCHR_COLI", "B_ESCHR_COLI", "B_STPHY_EPDR", "B_STPH…
+# $ PEN <rsi> R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, R, …
+# $ OXA <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ FLC <rsi> NA, NA, R, R, R, R, S, S, R, S, S, S, NA, NA, NA, NA, …
+# $ AMX <rsi> NA, NA, NA, NA, NA, NA, R, R, NA, NA, NA, NA, NA, NA, …
+# $ AMC <rsi> I, I, NA, NA, NA, NA, S, S, NA, NA, S, S, I, I, R, I, …
+# $ AMP <rsi> NA, NA, NA, NA, NA, NA, R, R, NA, NA, NA, NA, NA, NA, …
+# $ TZP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ CZO <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ FEP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ CXM <rsi> I, I, R, R, R, R, S, S, R, S, S, S, S, S, NA, S, S, R,…
+# $ FOX <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ CTX <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, …
+# $ CAZ <rsi> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, S, S…
+# $ CRO <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, …
+# $ GEN <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ TOB <rsi> NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, S, S, NA…
+# $ AMK <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ KAN <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ TMP <rsi> R, R, S, S, R, R, R, R, S, S, NA, NA, S, S, S, S, S, R…
+# $ SXT <rsi> R, R, S, S, NA, NA, NA, NA, S, S, NA, NA, S, S, S, S, …
+# $ NIT <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ FOS <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ LNZ <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R,…
+# $ CIP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, S, S, NA, NA, NA, NA, …
+# $ MFX <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ VAN <rsi> R, R, S, S, S, S, S, S, S, S, NA, NA, R, R, R, R, R, S…
+# $ TEC <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R,…
+# $ TCY <rsi> R, R, S, S, S, S, S, S, S, I, S, S, NA, NA, I, R, R, S…
+# $ TGC <rsi> NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R, …
+# $ DOX <rsi> NA, NA, S, S, S, S, S, S, S, NA, S, S, NA, NA, NA, R, …
+# $ ERY <rsi> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R, …
+# $ CLI <rsi> R, R, NA, NA, NA, R, NA, NA, NA, NA, NA, NA, R, R, R, …
+# $ AZM <rsi> R, R, R, R, R, R, S, S, R, S, S, S, R, R, R, R, R, R, …
+# $ IPM <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, S, …
+# $ MEM <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ MTR <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ CHL <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ COL <rsi> NA, NA, R, R, R, R, R, R, R, R, R, R, NA, NA, NA, R, R…
+# $ MUP <rsi> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
+# $ RIF <rsi> R, R, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, R, R, R,…
Now to transform this to a data set with only resistance percentages per taxonomic order and genus:
+
+resistance_data <- example_isolates %>%
+ group_by(order = mo_order(mo), # group on anything, like order
+ genus = mo_genus(mo)) %>% # and genus as we do here
+ summarise_if(is.rsi, resistance) %>% # then get resistance of all drugs
+ select(order, genus, AMC, CXM, CTX,
+ CAZ, GEN, TOB, TMP, SXT) # and select only relevant columns
+
+head(resistance_data)
+# # A tibble: 6 × 10
+# # Groups: order [5]
+# order genus AMC CXM CTX CAZ GEN TOB TMP SXT
+# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+# 1 (unknown order) (unknown ge… NA NA NA NA NA NA NA NA
+# 2 Actinomycetales Schaalia NA NA NA NA NA NA NA NA
+# 3 Bacteroidales Bacteroides NA NA NA NA NA NA NA NA
+# 4 Campylobacterales Campylobact… NA NA NA NA NA NA NA NA
+# 5 Caryophanales Gemella NA NA NA NA NA NA NA NA
+# 6 Caryophanales Listeria NA NA NA NA NA NA NA NA
The new pca()
function will automatically filter on rows that contain numeric values in all selected variables, so we now only need to do:
+pca_result <- pca(resistance_data)
+# ℹ Columns selected for PCA: "AMC", "CAZ", "CTX", "CXM", "GEN", "SXT", "TMP"
+# and "TOB". Total observations available: 7.
The result can be reviewed with the good old summary()
function:
+summary(pca_result)
+# Groups (n=4, named as 'order'):
+# [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
+# Importance of components:
+# PC1 PC2 PC3 PC4 PC5 PC6 PC7
+# Standard deviation 2.1539 1.6807 0.6138 0.33879 0.20808 0.03140 5.121e-17
+# Proportion of Variance 0.5799 0.3531 0.0471 0.01435 0.00541 0.00012 0.000e+00
+# Cumulative Proportion 0.5799 0.9330 0.9801 0.99446 0.99988 1.00000 1.000e+00
# Groups (n=4, named as 'order'):
+# [1] "Caryophanales" "Enterobacterales" "Lactobacillales" "Pseudomonadales"
+Good news. The first two components explain a total of 93.3% of the variance (see the PC1 and PC2 values of the Proportion of Variance. We can create a so-called biplot with the base R biplot()
function, to see which antimicrobial resistance per drug explain the difference per microorganism.
+biplot(pca_result)
But we can’t see the explanation of the points. Perhaps this works better with our new ggplot_pca()
function, that automatically adds the right labels and even groups:
+ggplot_pca(pca_result)
You can also print an ellipse per group, and edit the appearance:
+
+ggplot_pca(pca_result, ellipse = TRUE) +
+ ggplot2::labs(title = "An AMR/PCA biplot!")
vignettes/SPSS.Rmd
+ SPSS.Rmd
SPSS (Statistical Package for the Social Sciences) is probably the most well-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendliness, it is taught at universities and particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it at the time they graduate. SAS and Stata are comparable statistical packages popular in big industries.
+As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when comparing it with R:
+R is highly modular.
+The official R network (CRAN) features more than 16,000 packages at the time of writing, our AMR
package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitHub. So there may even be a lot more than 14,000 packages out there.
Bottom line is, you can really extend it yourself or ask somebody to do this for you. Take for example our AMR
package. Among other things, it adds reliable reference data to R to help you with the data cleaning and analysis. SPSS, SAS and Stata will never know what a valid MIC value is or what the Gram stain of E. coli is. Or that all species of Klebiella are resistant to amoxicillin and that Floxapen® is a trade name of flucloxacillin. These facts and properties are often needed to clean existing data, which would be very inconvenient in a software package without reliable reference data. See below for a demonstration.
R is extremely flexible.
+Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, arranging, grouping and summarising data, or drawing plots, is endless - with SPSS, SAS or Stata you are bound to their algorithms and format styles. They may be a bit flexible, but you can probably never create that very specific publication-ready plot without using other (paid) software. If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you could do this a lot less time in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS. Still, as working with any statistical package, you will have to have knowledge about what you are doing (statistically) and what you are willing to accomplish.
+R can be easily automated.
+Over the last years, R Markdown has really made an interesting development. With R Markdown, you can very easily produce reports, whether the format has to be Word, PowerPoint, a website, a PDF document or just the raw data to Excel. It even allows the use of a reference file containing the layout style (e.g. fonts and colours) of your organisation. I use this a lot to generate weekly and monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.
+For an even more professional environment, you could create Shiny apps: live manipulation of data using a custom made website. The webdesign knowledge needed (JavaScript, CSS, HTML) is almost zero.
+R has a huge community.
+Many R users just ask questions on websites like StackOverflow.com, the largest online community for programmers. At the time of writing, 460,146 R-related questions have already been asked on this platform (that covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
+R understands any data type, including SPSS/SAS/Stata.
+And that’s not vice versa I’m afraid. You can import data from any source into R. For example from SPSS, SAS and Stata (link), from Minitab, Epi Info and EpiData (link), from Excel (link), from flat files like CSV, TXT or TSV (link), or directly from databases and datawarehouses from anywhere on the world (link). You can even scrape websites to download tables that are live on the internet (link) or get the results of an API call and transform it into data in only one command (link).
+And the best part - you can export from R to most data formats as well. So you can import an SPSS file, do your analysis neatly in R and export the resulting tables to Excel files for sharing.
+R is completely free and open-source.
+No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Staticstics only comes with subscriptions nowadays, varying between USD 1,300 and USD 8,500 per user per year. SAS Analytics Pro costs around USD 10,000 per computer. Stata also has a business model with subscription fees, varying between USD 600 and USD 2,800 per computer per year, but lower prices come with a limitation of the number of variables you can work with. And still they do not offer the above benefits of R.
+If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS - gaining even more functions and flexibility. And all R enthousiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated by R. It is really free.
+R is (nowadays) the preferred analysis software in academic papers.
+At present, R is among the world most powerful statistical languages, and it is generally very popular in science (Bollmann et al., 2017). For all the above reasons, the number of references to R as an analysis method in academic papers is rising continuously and has even surpassed SPSS for academic use (Muenchen, 2014).
+I believe that the thing with SPSS is, that it has always had a great user interface which is very easy to learn and use. Back when they developed it, they had very little competition, let alone from R. R didn’t even had a professional user interface until the last decade (called RStudio, see below). How people used R between the nineties and 2010 is almost completely incomparable to how R is being used now. The language itself has been restyled completely by volunteers who are dedicated professionals in the field of data science. SPSS was great when there was nothing else that could compete. But now in 2022, I don’t see any reason why SPSS would be of any better use than R.
+To demonstrate the first point:
+
+# not all values are valid MIC values:
+as.mic(0.125)
+# Class <mic>
+# [1] 0.125
+as.mic("testvalue")
+# Class <mic>
+# [1] <NA>
+
+# the Gram stain is available for all bacteria:
+mo_gramstain("E. coli")
+# [1] "Gram-negative"
+
+# Klebsiella is intrinsic resistant to amoxicillin, according to EUCAST:
+klebsiella_test <- data.frame(mo = "klebsiella",
+ amox = "S",
+ stringsAsFactors = FALSE)
+klebsiella_test # (our original data)
+# mo amox
+# 1 klebsiella S
+eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)
+# mo amox
+# 1 klebsiella R
+
+# hundreds of trade names can be translated to a name, trade name or an ATC code:
+ab_name("floxapen")
+# [1] "Flucloxacillin"
+ab_tradenames("floxapen")
+# [1] "floxacillin" "floxapen" "floxapen sodium salt"
+# [4] "fluclox" "flucloxacilina" "flucloxacillin"
+# [7] "flucloxacilline" "flucloxacillinum" "fluorochloroxacillin"
+ab_atc("floxapen")
+# [1] "J01CF05"
To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also install RStudio Server on a private or corporate server, which brings nothing less than the complete RStudio software to you as a website (at home or at work).
+To import a data file, just click Import Dataset in the Environment tab:
+ +If additional packages are needed, RStudio will ask you if they should be installed on beforehand.
+In the the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:
+ +If you want named variables to be imported as factors so it resembles SPSS more, use as_factor()
.
The difference is this:
+
+SPSS_data
+# # A tibble: 4,203 x 4
+# v001 sex status statusage
+# <dbl> <dbl+lbl> <dbl+lbl> <dbl>
+# 1 10002 1 1 76.6
+# 2 10004 0 1 59.1
+# 3 10005 1 1 54.5
+# 4 10006 1 1 54.1
+# 5 10007 1 1 57.7
+# 6 10008 1 1 62.8
+# 7 10010 0 1 63.7
+# 8 10011 1 1 73.1
+# 9 10017 1 1 56.7
+# 10 10018 0 1 66.6
+# # ... with 4,193 more rows
+
+as_factor(SPSS_data)
+# # A tibble: 4,203 x 4
+# v001 sex status statusage
+# <dbl> <fct> <fct> <dbl>
+# 1 10002 Male alive 76.6
+# 2 10004 Female alive 59.1
+# 3 10005 Male alive 54.5
+# 4 10006 Male alive 54.1
+# 5 10007 Male alive 57.7
+# 6 10008 Male alive 62.8
+# 7 10010 Female alive 63.7
+# 8 10011 Male alive 73.1
+# 9 10017 Male alive 56.7
+# 10 10018 Female alive 66.6
+# # ... with 4,193 more rows
To import data from SPSS, SAS or Stata, you can use the great haven
package yourself:
+# download and install the latest version:
+install.packages("haven")
+# load the package you just installed:
+library(haven)
You can now import files as follows:
+To read files from SPSS into R:
+
+# read any SPSS file based on file extension (best way):
+read_spss(file = "path/to/file")
+
+# read .sav or .zsav file:
+read_sav(file = "path/to/file")
+
+# read .por file:
+read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
+
+# save as .sav file:
+write_sav(data = yourdata, path = "path/to/file")
+
+# save as compressed .zsav file:
+write_sav(data = yourdata, path = "path/to/file", compress = TRUE)
To read files from SAS into R:
+
+# read .sas7bdat + .sas7bcat files:
+read_sas(data_file = "path/to/file", catalog_file = NULL)
+
+# read SAS transport files (version 5 and version 8):
+read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
+
+# save as regular SAS file:
+write_sas(data = yourdata, path = "path/to/file")
+
+# the SAS transport format is an open format
+# (required for submission of the data to the FDA)
+write_xpt(data = yourdata, path = "path/to/file", version = 8)
To read files from Stata into R:
+
+# read .dta file:
+read_stata(file = "/path/to/file")
+
+# works exactly the same:
+read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
+
+# save as .dta file, Stata version 14:
+# (supports Stata v8 until v15 at the time of writing)
+write_dta(data = yourdata, path = "/path/to/file", version = 14)
This tutorial assumes you already imported the WHONET data with e.g. the readxl
package. In RStudio, this can be done using the menu button ‘Import Dataset’ in the tab ‘Environment’. Choose the option ‘From Excel’ and select your exported file. Make sure date fields are imported correctly.
An example syntax could look like this:
+
+library(readxl)
+data <- read_excel(path = "path/to/your/file.xlsx")
This package comes with an example data set WHONET
. We will use it for this analysis.
First, load the relevant packages if you did not yet did this. I use the tidyverse for all of my analyses. All of them. If you don’t know it yet, I suggest you read about it on their website: https://www.tidyverse.org/.
+
+library(dplyr) # part of tidyverse
+library(ggplot2) # part of tidyverse
+library(AMR) # this package
+library(cleaner) # to create frequency tables
We will have to transform some variables to simplify and automate the analysis:
+mo
) using our Catalogue of Life reference data set, which contains all ~70,000 microorganisms from the taxonomic kingdoms Bacteria, Fungi and Protozoa. We do the tranformation with as.mo()
. This function also recognises almost all WHONET abbreviations of microorganisms."S"
, "I"
or "R"
. That is exactly where the as.rsi()
function is for.
+# transform variables
+data <- WHONET %>%
+ # get microbial ID based on given organism
+ mutate(mo = as.mo(Organism)) %>%
+ # transform everything from "AMP_ND10" to "CIP_EE" to the new `rsi` class
+ mutate_at(vars(AMP_ND10:CIP_EE), as.rsi)
No errors or warnings, so all values are transformed succesfully.
+We also created a package dedicated to data cleaning and checking, called the cleaner
package. Its freq()
function can be used to create frequency tables.
So let’s check our data, with a couple of frequency tables:
+
+# our newly created `mo` variable, put in the mo_name() function
+data %>% freq(mo_name(mo), nmax = 10)
Frequency table
+Class: character
+Length: 500
+Available: 500 (100.0%, NA: 0 = 0.0%)
+Unique: 37
Shortest: 11
+Longest: 40
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +Escherichia coli | +245 | +49.0% | +245 | +49.0% | +
2 | +Coagulase-negative Staphylococcus (CoNS) | +74 | +14.8% | +319 | +63.8% | +
3 | +Staphylococcus epidermidis | +38 | +7.6% | +357 | +71.4% | +
4 | +Streptococcus pneumoniae | +31 | +6.2% | +388 | +77.6% | +
5 | +Staphylococcus hominis | +21 | +4.2% | +409 | +81.8% | +
6 | +Proteus mirabilis | +9 | +1.8% | +418 | +83.6% | +
7 | +Enterococcus faecium | +8 | +1.6% | +426 | +85.2% | +
8 | +Staphylococcus capitis | +8 | +1.6% | +434 | +86.8% | +
9 | +Enterobacter cloacae | +5 | +1.0% | +439 | +87.8% | +
10 | +Streptococcus anginosus | +5 | +1.0% | +444 | +88.8% | +
(omitted 27 entries, n = 56 [11.20%])
+
+# our transformed antibiotic columns
+# amoxicillin/clavulanic acid (J01CR02) as an example
+data %>% freq(AMC_ND2)
Frequency table
+Class: factor > ordered > rsi (numeric)
+Length: 500
+Levels: 3: S < I < R
+Available: 481 (96.2%, NA: 19 = 3.8%)
+Unique: 3
Drug: Amoxicillin/clavulanic acid (AMC, J01CR02)
+Drug group: Beta-lactams/penicillins
+%SI: 78.59%
+ | Item | +Count | +Percent | +Cum. Count | +Cum. Percent | +
---|---|---|---|---|---|
1 | +S | +356 | +74.01% | +356 | +74.01% | +
2 | +R | +103 | +21.41% | +459 | +95.43% | +
3 | +I | +22 | +4.57% | +481 | +100.00% | +
An easy ggplot
will already give a lot of information, using the included ggplot_rsi()
function:
vignettes/datasets.Rmd
+ datasets.Rmd
All reference data (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this AMR
package are reliable, up-to-date and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply tab separated files that are machine-readable and suitable for input in any software program, such as laboratory information systems.
On this page, we explain how to download them and how the structure of the data sets look like.
++If you are reading this page from within R, please visit our website, which is automatically updated with every code change. +
+A data set with 70,764 rows and 16 columns, containing the following column names:
mo, fullname, kingdom, phylum, class, order, family, genus, species, subspecies, rank, ref, species_id, source, prevalence and snomed.
This data set is in R available as microorganisms
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+NOTE: The exported files for SAS, SPSS and Stata do not contain SNOMED codes, as their file size would exceed 100 MB; the file size limit of GitHub. Advice? Use R instead.
+Our full taxonomy of microorganisms is based on the authoritative and comprehensive:
+Included (sub)species per taxonomic kingdom:
+Kingdom | +Number of (sub)species | +
---|---|
(unknown kingdom) | +5 | +
Animalia | +2,153 | +
Archaea | +694 | +
Bacteria | +22,852 | +
Chromista | +32,167 | +
Fungi | +9,582 | +
Example rows when filtering on genus Escherichia:
+mo | +fullname | +kingdom | +phylum | +class | +order | +family | +genus | +species | +subspecies | +rank | +ref | +species_id | +source | +prevalence | +snomed | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
B_ESCHR | +Escherichia | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | ++ | + | genus | ++ | 515602 | +LPSN | +1 | +64735005 | +
B_ESCHR_ALBR | +Escherichia albertii | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | +albertii | ++ | species | +Huys et al., 2003 | +776053 | +LPSN | +1 | +419388003 | +
B_ESCHR_COLI | +Escherichia coli | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | +coli | ++ | species | +Castellani et al., 1919 | +776057 | +LPSN | +1 | +1095001000112106, 715307006, 737528008, … | +
B_ESCHR_FRGS | +Escherichia fergusonii | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | +fergusonii | ++ | species | +Farmer et al., 1985 | +776059 | +LPSN | +1 | +72461005 | +
B_ESCHR_HRMN | +Escherichia hermannii | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | +hermannii | ++ | species | +Brenner et al., 1983 | +776060 | +LPSN | +1 | +85786000 | +
B_ESCHR_MRMT | +Escherichia marmotae | +Bacteria | +Proteobacteria | +Gammaproteobacteria | +Enterobacterales | +Enterobacteriaceae | +Escherichia | +marmotae | ++ | species | +Liu et al., 2015 | +792928 | +LPSN | +1 | +14961000146107 | +
A data set with 14,338 rows and 4 columns, containing the following column names:
fullname, fullname_new, ref and prevalence.
Note: remember that the ‘ref’ columns contains the scientific reference to the old taxonomic entries, i.e. of column ‘fullname’. For the scientific reference of the new names, i.e. of column ‘fullname_new’, see the microorganisms
data set.
This data set is in R available as microorganisms.old
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+This data set contains old, previously accepted taxonomic names. The data sources are the same as the microorganisms
data set:
Example rows when filtering on Escherichia:
+fullname | +fullname_new | +ref | +prevalence | +
---|---|---|---|
Escherichia adecarboxylata | +Leclercia adecarboxylata | +Leclerc, 1962 | +1 | +
Escherichia blattae | +Shimwellia blattae | +Burgess et al., 1973 | +1 | +
Escherichia vulneris | +Pseudescherichia vulneris | +Brenner et al., 1983 | +1 | +
A data set with 464 rows and 14 columns, containing the following column names:
ab, cid, name, group, atc, atc_group1, atc_group2, abbreviations, synonyms, oral_ddd, oral_units, iv_ddd, iv_units and loinc.
This data set is in R available as antibiotics
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+This data set contains all EARS-Net and ATC codes gathered from WHO and WHONET, and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.
+ab | +cid | +name | +group | +atc | +atc_group1 | +atc_group2 | +abbreviations | +synonyms | +oral_ddd | +oral_units | +iv_ddd | +iv_units | +loinc | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AMK | +37768 | +Amikacin | +Aminoglycosides | +D06AX12, J01GB06, S01AA21 | +Aminoglycoside antibacterials | +Other aminoglycosides | +ak, ami, amik, … | +amicacin, amikacillin, amikacin, … | ++ | + | 1.0 | +g | +13546-7, 15098-7, 17798-0, … | +
AMX | +33613 | +Amoxicillin | +Beta-lactams/penicillins | +J01CA04 | +Beta-lactam antibacterials, penicillins | +Penicillins with extended spectrum | +ac, amox, amx | +actimoxi, amoclen, amolin, … | +1.5 | +g | +3.0 | +g | +16365-9, 25274-2, 3344-9, … | +
AMC | +23665637 | +Amoxicillin/clavulanic acid | +Beta-lactams/penicillins | +J01CR02 | +Beta-lactam antibacterials, penicillins | +Combinations of penicillins, incl. beta-lactamase inhibitors | +a/c, amcl, aml, … | +amocla, amoclan, amoclav, … | +1.5 | +g | +3.0 | +g | ++ |
AMP | +6249 | +Ampicillin | +Beta-lactams/penicillins | +J01CA01, S01AA19 | +Beta-lactam antibacterials, penicillins | +Penicillins with extended spectrum | +am, amp, ampi | +acillin, adobacillin, amblosin, … | +2.0 | +g | +6.0 | +g | +21066-6, 3355-5, 33562-0, … | +
AZM | +447043 | +Azithromycin | +Macrolides/lincosamides | +J01FA10, S01AA26 | +Macrolides, lincosamides and streptogramins | +Macrolides | +az, azi, azit, … | +aritromicina, azasite, azenil, … | +0.3 | +g | +0.5 | +g | +16420-2, 25233-8 | +
PEN | +5904 | +Benzylpenicillin | +Beta-lactams/penicillins | +J01CE01, S01AA14 | +Combinations of antibacterials | +Combinations of antibacterials | +bepe, pen, peni, … | +abbocillin, ayercillin, bencilpenicilina, … | ++ | + | 3.6 | +g | +3913-1 | +
A data set with 102 rows and 9 columns, containing the following column names:
atc, cid, name, atc_group, synonyms, oral_ddd, oral_units, iv_ddd and iv_units.
This data set is in R available as antivirals
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+This data set contains all ATC codes gathered from WHO and all compound IDs from PubChem. It also contains all brand names (synonyms) as found on PubChem and Defined Daily Doses (DDDs) for oral and parenteral administration.
+atc | +cid | +name | +atc_group | +synonyms | +oral_ddd | +oral_units | +iv_ddd | +iv_units | +
---|---|---|---|---|---|---|---|---|
J05AF06 | +441300 | +Abacavir | +Nucleoside and nucleotide reverse transcriptase inhibitors | +Abacavir, Abacavir sulfate, Ziagen | +0.6 | +g | ++ | + |
J05AB01 | +135398513 | +Aciclovir | +Nucleosides and nucleotides excl. reverse transcriptase inhibitors | +Acicloftal, Aciclovier, Aciclovir, … | +4.0 | +g | +4 | +g | +
J05AF08 | +60871 | +Adefovir dipivoxil | +Nucleoside and nucleotide reverse transcriptase inhibitors | +Adefovir di ester, Adefovir dipivoxil, Adefovir Dipivoxil, … | +10.0 | +mg | ++ | + |
J05AE05 | +65016 | +Amprenavir | +Protease inhibitors | +Agenerase, Amprenavir, Amprenavirum, … | +1.2 | +g | ++ | + |
J05AP06 | +16076883 | +Asunaprevir | +Antivirals for treatment of HCV infections | +Asunaprevir, Sunvepra | ++ | + | + | + |
J05AE08 | +148192 | +Atazanavir | +Protease inhibitors | +Atazanavir, Atazanavir Base, Latazanavir, … | +0.3 | +g | ++ | + |
A data set with 20,369 rows and 11 columns, containing the following column names:
guideline, method, site, mo, rank_index, ab, ref_tbl, disk_dose, breakpoint_S, breakpoint_R and uti.
This data set is in R available as rsi_translation
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+This data set contains interpretation rules for MIC values and disk diffusion diameters. Included guidelines are CLSI (2011-2022) and EUCAST (2011-2022).
+guideline | +method | +site | +mo | +mo_name | +rank_index | +ab | +ab_name | +ref_tbl | +disk_dose | +breakpoint_S | +breakpoint_R | +uti | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
EUCAST 2022 | +MIC | ++ | F_ASPRG_MGTS | +Aspergillus fumigatus | +2 | +AMB | +Amphotericin B | +Aspergillus | ++ | 1 | +1 | +FALSE | +
EUCAST 2022 | +MIC | ++ | F_ASPRG_NIGR | +Aspergillus niger | +2 | +AMB | +Amphotericin B | +Aspergillus | ++ | 1 | +1 | +FALSE | +
EUCAST 2022 | +MIC | ++ | F_CANDD | +Candida | +3 | +AMB | +Amphotericin B | +Candida | ++ | 1 | +1 | +FALSE | +
EUCAST 2022 | +MIC | ++ | F_CANDD_ALBC | +Candida albicans | +2 | +AMB | +Amphotericin B | +Candida | ++ | 1 | +1 | +FALSE | +
EUCAST 2022 | +MIC | ++ | F_CANDD_DBLN | +Candida dubliniensis | +2 | +AMB | +Amphotericin B | +Candida | ++ | 1 | +1 | +FALSE | +
EUCAST 2022 | +MIC | ++ | F_CANDD_KRUS | +Candida krusei | +2 | +AMB | +Amphotericin B | +Candida | ++ | 1 | +1 | +FALSE | +
A data set with 134,956 rows and 2 columns, containing the following column names:
mo and ab.
This data set is in R available as intrinsic_resistant
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+This data set contains all defined intrinsic resistance by EUCAST of all bug-drug combinations, and is based on ‘EUCAST Expert Rules’ and ‘EUCAST Intrinsic Resistance and Unusual Phenotypes’ v3.3 (2021).
+Example rows when filtering on Enterobacter cloacae:
+microorganism | +antibiotic | +
---|---|
Enterobacter cloacae | +Acetylmidecamycin | +
Enterobacter cloacae | +Acetylspiramycin | +
Enterobacter cloacae | +Amoxicillin | +
Enterobacter cloacae | +Amoxicillin/clavulanic acid | +
Enterobacter cloacae | +Ampicillin | +
Enterobacter cloacae | +Ampicillin/sulbactam | +
Enterobacter cloacae | +Avoparcin | +
Enterobacter cloacae | +Azithromycin | +
Enterobacter cloacae | +Benzylpenicillin | +
Enterobacter cloacae | +Cadazolid | +
Enterobacter cloacae | +Cefadroxil | +
Enterobacter cloacae | +Cefazolin | +
Enterobacter cloacae | +Cefoxitin | +
Enterobacter cloacae | +Cephalexin | +
Enterobacter cloacae | +Cephalothin | +
Enterobacter cloacae | +Clarithromycin | +
Enterobacter cloacae | +Clindamycin | +
Enterobacter cloacae | +Cycloserine | +
Enterobacter cloacae | +Dalbavancin | +
Enterobacter cloacae | +Dirithromycin | +
Enterobacter cloacae | +Erythromycin | +
Enterobacter cloacae | +Flurithromycin | +
Enterobacter cloacae | +Fusidic acid | +
Enterobacter cloacae | +Gamithromycin | +
Enterobacter cloacae | +Josamycin | +
Enterobacter cloacae | +Kitasamycin | +
Enterobacter cloacae | +Lincomycin | +
Enterobacter cloacae | +Linezolid | +
Enterobacter cloacae | +Meleumycin | +
Enterobacter cloacae | +Midecamycin | +
Enterobacter cloacae | +Miocamycin | +
Enterobacter cloacae | +Nafithromycin | +
Enterobacter cloacae | +Norvancomycin | +
Enterobacter cloacae | +Oleandomycin | +
Enterobacter cloacae | +Oritavancin | +
Enterobacter cloacae | +Pirlimycin | +
Enterobacter cloacae | +Primycin | +
Enterobacter cloacae | +Pristinamycin | +
Enterobacter cloacae | +Quinupristin/dalfopristin | +
Enterobacter cloacae | +Ramoplanin | +
Enterobacter cloacae | +Rifampicin | +
Enterobacter cloacae | +Rokitamycin | +
Enterobacter cloacae | +Roxithromycin | +
Enterobacter cloacae | +Solithromycin | +
Enterobacter cloacae | +Spiramycin | +
Enterobacter cloacae | +Tedizolid | +
Enterobacter cloacae | +Teicoplanin | +
Enterobacter cloacae | +Telavancin | +
Enterobacter cloacae | +Telithromycin | +
Enterobacter cloacae | +Thiacetazone | +
Enterobacter cloacae | +Tildipirosin | +
Enterobacter cloacae | +Tilmicosin | +
Enterobacter cloacae | +Troleandomycin | +
Enterobacter cloacae | +Tulathromycin | +
Enterobacter cloacae | +Tylosin | +
Enterobacter cloacae | +Tylvalosin | +
Enterobacter cloacae | +Vancomycin | +
A data set with 169 rows and 9 columns, containing the following column names:
ab, name, type, dose, dose_times, administration, notes, original_txt and eucast_version.
This data set is in R available as dosage
, after you load the AMR
package.
It was last updated on 21 August 2022 14:52:57 UTC. Find more info about the structure of this data set here.
+Direct download links:
+EUCAST breakpoints used in this package are based on the dosages in this data set.
+Currently included dosages in the data set are meant for: ‘EUCAST Clinical Breakpoint Tables’ v11.0 (2021).
+ab | +name | +type | +dose | +dose_times | +administration | +notes | +original_txt | +eucast_version | +
---|---|---|---|---|---|---|---|---|
AMK | +Amikacin | +standard_dosage | +25-30 mg/kg | +1 | +iv | ++ | 25-30 mg/kg x 1 iv | +11 | +
AMX | +Amoxicillin | +high_dosage | +2 g | +6 | +iv | ++ | 2 g x 6 iv | +11 | +
AMX | +Amoxicillin | +standard_dosage | +1 g | +3 | +iv | ++ | 1 g x 3-4 iv | +11 | +
AMX | +Amoxicillin | +high_dosage | +0.75-1 g | +3 | +oral | ++ | 0.75-1 g x 3 oral | +11 | +
AMX | +Amoxicillin | +standard_dosage | +0.5 g | +3 | +oral | ++ | 0.5 g x 3 oral | +11 | +
AMX | +Amoxicillin | +uncomplicated_uti | +0.5 g | +3 | +oral | ++ | 0.5 g x 3 oral | +11 | +
vignettes/resistance_predict.Rmd
+ resistance_predict.Rmd
As with many uses in R, we need some additional packages for AMR data analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
by Dr Hadley Wickham. The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our AMR
package depends on these packages and even extends their use and functions.
Our package contains a function resistance_predict()
, which takes the same input as functions for other AMR data analysis. Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
+# resistance prediction of piperacillin/tazobactam (TZP):
+resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
+
+# or:
+%>%
+ example_isolates resistance_predict(col_ab = "TZP",
+ "binomial")
+ model
+# to bind it to object 'predict_TZP' for example:
+<- example_isolates %>%
+ predict_TZP resistance_predict(col_ab = "TZP",
+ model = "binomial")
The function will look for a date column itself if col_date
is not set.
When running any of these commands, a summary of the regression model will be printed unless using resistance_predict(..., info = FALSE)
.
# ℹ Using column 'date' as input for `col_date`.
+This text is only a printed summary - the actual result (output) of the function is a data.frame
containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
+predict_TZP
+# year value se_min se_max observations observed estimated
+# 1 2002 0.20000000 NA NA 15 0.20000000 0.05616378
+# 2 2003 0.06250000 NA NA 32 0.06250000 0.06163839
+# 3 2004 0.08536585 NA NA 82 0.08536585 0.06760841
+# 4 2005 0.05000000 NA NA 60 0.05000000 0.07411100
+# 5 2006 0.05084746 NA NA 59 0.05084746 0.08118454
+# 6 2007 0.12121212 NA NA 66 0.12121212 0.08886843
+# 7 2008 0.04166667 NA NA 72 0.04166667 0.09720264
+# 8 2009 0.01639344 NA NA 61 0.01639344 0.10622731
+# 9 2010 0.05660377 NA NA 53 0.05660377 0.11598223
+# 10 2011 0.18279570 NA NA 93 0.18279570 0.12650615
+# 11 2012 0.30769231 NA NA 65 0.30769231 0.13783610
+# 12 2013 0.06896552 NA NA 58 0.06896552 0.15000651
+# 13 2014 0.10000000 NA NA 60 0.10000000 0.16304829
+# 14 2015 0.23636364 NA NA 55 0.23636364 0.17698785
+# 15 2016 0.22619048 NA NA 84 0.22619048 0.19184597
+# 16 2017 0.16279070 NA NA 86 0.16279070 0.20763675
+# 17 2018 0.22436641 0.1938710 0.2548618 NA NA 0.22436641
+# 18 2019 0.24203228 0.2062911 0.2777735 NA NA 0.24203228
+# 19 2020 0.26062172 0.2191758 0.3020676 NA NA 0.26062172
+# 20 2021 0.28011130 0.2325557 0.3276669 NA NA 0.28011130
+# 21 2022 0.30046606 0.2464567 0.3544755 NA NA 0.30046606
+# 22 2023 0.32163907 0.2609011 0.3823771 NA NA 0.32163907
+# 23 2024 0.34357130 0.2759081 0.4112345 NA NA 0.34357130
+# 24 2025 0.36619175 0.2914934 0.4408901 NA NA 0.36619175
+# 25 2026 0.38941799 0.3076686 0.4711674 NA NA 0.38941799
+# 26 2027 0.41315710 0.3244399 0.5018743 NA NA 0.41315710
+# 27 2028 0.43730688 0.3418075 0.5328063 NA NA 0.43730688
+# 28 2029 0.46175755 0.3597639 0.5637512 NA NA 0.46175755
+# 29 2030 0.48639359 0.3782932 0.5944939 NA NA 0.48639359
+# 30 2031 0.51109592 0.3973697 0.6248221 NA NA 0.51109592
+# 31 2032 0.53574417 0.4169574 0.6545309 NA NA 0.53574417
The function plot
is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
+plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
+We also support the ggplot2
package with our custom function ggplot_rsi_predict()
to create more appealing plots:
+ggplot_rsi_predict(predict_TZP)
+
+# choose for error bars instead of a ribbon
+ggplot_rsi_predict(predict_TZP, ribbon = FALSE)
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
+
+example_isolates %>%
+ filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+ resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
+ ggplot_rsi_predict()
+# ℹ Using column 'date' as input for `col_date`.
Vancomycin resistance could be 100% in ten years, but might remain very low.
+You can define the model with the model
parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
Valid values are:
+Input values | +Function used by R | +Type of model | +
---|---|---|
+"binomial" or "binom" or "logit"
+ |
+glm(..., family = binomial) |
+Generalised linear model with binomial distribution | +
+"loglin" or "poisson"
+ |
+glm(..., family = poisson) |
+Generalised linear model with poisson distribution | +
+"lin" or "linear"
+ |
+lm() |
+Linear model | +
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate:
+
+example_isolates %>%
+ filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
+ resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
+ ggplot_rsi_predict()
+# ℹ Using column 'date' as input for `col_date`.
The model itself is also available from the object, as an attribute
:
+model <- attributes(predict_TZP)$model
+
+summary(model)$family
+#
+# Family: binomial
+# Link function: logit
+
+summary(model)$coefficients
+# Estimate Std. Error z value Pr(>|z|)
+# (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05
+# year 0.09883005 0.02295317 4.305725 1.664395e-05
Note: to keep the package size as small as possible, we only included this vignette on CRAN. You can read more vignettes on our website about how to conduct AMR data analysis, determine MDRO’s, find explanation of EUCAST rules, and much more: https://msberends.github.io/AMR/articles/.
+AMR
is a free, open-source and independent R package (see Copyright) to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial data and properties, by using evidence-based methods. Our aim is to provide a standard for clean and reproducible antimicrobial resistance data analysis, that can therefore empower epidemiological analyses to continuously enable surveillance and treatment evaluation in any setting.
After installing this package, R knows ~71,000 distinct microbial species and all ~570 antibiotic, antimycotic and antiviral drugs by name and code (including ATC, EARS-Net, PubChem, LOINC and SNOMED CT), and knows all about valid R/SI and MIC values. It supports any data format, including WHONET/EARS-Net data.
+The AMR
package is available in Danish, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish and Swedish. Antimicrobial drug (group) names and colloquial microorganism names are provided in these languages.
This package is fully independent of any other R package and works on Windows, macOS and Linux with all versions of R since R-3.0 (April 2013). It was designed to work in any setting, including those with very limited resources. Since its first public release in early 2018, this package has been downloaded from more than 175 countries.
+This package can be used for:
+All reference data sets (about microorganisms, antibiotics, R/SI interpretation, EUCAST rules, etc.) in this AMR
package are publicly and freely available. We continually export our data sets to formats for use in R, SPSS, SAS, Stata and Excel. We also supply flat files that are machine-readable and suitable for input in any software program, such as laboratory information systems. Please find all download links on our website, which is automatically updated with every code change.
This R package was created for both routine data analysis and academic research at the Faculty of Medical Sciences of the University of Groningen, in collaboration with non-profit organisations Certe Medical Diagnostics and Advice Foundation and University Medical Center Groningen. This R package formed the basis of two PhD theses (DOI 10.33612/diss.177417131 and DOI 10.33612/diss.192486375) but is actively and durably maintained (see changelog)) by two public healthcare organisations in the Netherlands.
+