diff --git a/DESCRIPTION b/DESCRIPTION
index dc718674..598c0391 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: AMR
-Version: 0.5.0.9025
-Date: 2019-03-26
+Version: 0.6.0
+Date: 2019-03-27
 Title: Antimicrobial Resistance Analysis
 Authors@R: c(
     person(
diff --git a/NEWS.md b/NEWS.md
index 2d7bed2b..ff88a349 100755
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,4 @@
-# AMR 0.5.0.90xx
-**Note: this is the development version, which will eventually be released as AMR 0.6.0.**
+# AMR 0.6.0

 **New website!**

@@ -11,7 +10,7 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git
 #### New
 * **BREAKING**: removed deprecated functions, parameters and references to 'bactid'. Use `as.mo()` to identify an MO code.
 * Catalogue of Life as a new taxonomic source for data about microorganisms, which also contains all ITIS data we used previously. The `microorganisms` data set now contains:
- * All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria, Protozoa and Viruses
+ * All ~55,000 (sub)species from the kingdoms of Archaea, Bacteria and Protozoa
  * All ~3,000 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales and Schizosaccharomycetales (covering at least like all species of *Aspergillus*, *Candida*, *Pneumocystis*, *Saccharomyces* and *Trichophyton*)
  * All ~2,000 (sub)species from ~100 other relevant genera, from the kingdoms of Animalia and Plantae (like *Strongyloides* and *Taenia*)
  * All ~15,000 previously accepted names of included (sub)species that have been taxonomically renamed
@@ -122,7 +121,8 @@ We've got a new website: [https://msberends.gitlab.io/AMR](https://msberends.git
 as.mo(..., allow_uncertain = 0)
 ```
 Using `as.mo(..., allow_uncertain = 3)` could lead to very unreliable results.
- * All microbial IDs that are found with zero uncertainty are now saved to a local file `~/.Rhistory_mo`. Use the new function `clean_mo_history()` to delete this file, which resets the algorithms.
+ * Implemented the latest publication of Becker *et al.* (2019), for categorising coagulase-negative *Staphylococci*
+ * All microbial IDs that found are now saved to a local file `~/.Rhistory_mo`. Use the new function `clean_mo_history()` to delete this file, which resets the algorithms.
  * Incoercible results will now be considered 'unknown', MO code `UNKNOWN`. On foreign systems, properties of these will be translated to all languages already previously supported: German, Dutch, French, Italian, Spanish and Portuguese:
 ```r
 mo_genus("qwerty", language = "es")
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index a8fb96d7..3cf55fe5 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -78,7 +78,7 @@
diff --git a/docs/articles/AMR.html b/docs/articles/AMR.html
index 6c761450..d2cef62d 100644
--- a/docs/articles/AMR.html
+++ b/docs/articles/AMR.html
@@ -40,7 +40,7 @@
@@ -192,7 +192,7 @@
AMR.Rmd
Note: values on this page will change with every website update since they are based on randomly created values and the page was written in RMarkdown. However, the methodology remains unchanged. This page was generated on 26 March 2019.
+Note: values on this page will change with every website update since they are based on randomly created values and the page was written in R Markdown. However, the methodology remains unchanged. This page was generated on 27 March 2019.
Now, let’s start the cleaning and the analysis!
@@ -411,8 +411,8 @@
 #>
 #>      Item    Count   Percent   Cum. Count   Cum. Percent
 #> ---  -----  -------  --------  -----------  -------------
-#> 1    M      10,435     52.2%       10,435          52.2%
-#> 2    F       9,565     47.8%       20,000         100.0%
+#> 1    M      10,344     51.7%       10,344          51.7%
+#> 2    F       9,656     48.3%       20,000         100.0%
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M
and F
. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
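The Percent and Cum. columns in this frequency table follow directly from the two counts. A minimal base R sketch, assuming only the counts from the updated output (the names `counts` and `freq_sketch` are illustrative and not part of the AMR package):

```r
# Counts taken from the updated frequency table (M and F, 20,000 rows in total)
counts <- c(M = 10344, F = 9656)

# Share of each item, plus the running totals
percent     <- counts / sum(counts)
cum_count   <- cumsum(counts)
cum_percent <- cum_count / sum(counts)

# Assemble a small table comparable to the printed frequency table
freq_sketch <- data.frame(
  item        = names(counts),
  count       = as.vector(counts),
  percent     = sprintf("%.1f%%", 100 * percent),
  cum_count   = as.vector(cum_count),
  cum_percent = sprintf("%.1f%%", 100 * cum_percent),
  stringsAsFactors = FALSE
)
print(freq_sketch)
```

This only reproduces the arithmetic; the package's own frequency table adds formatting and header information on top of it.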
The data is already quite clean, but we still need to transform some variables. The bacteria
column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate()
function of the dplyr
package makes this really easy:
data <- data %>%
@@ -443,10 +443,10 @@
#> Kingella kingae (no changes)
#>
#> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1: Intrinsic resistance in Enterobacteriaceae (1262 changes)
+#> Table 1: Intrinsic resistance in Enterobacteriaceae (1342 changes)
#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4: Intrinsic resistance in Gram-positive bacteria (2756 changes)
+#> Table 4: Intrinsic resistance in Gram-positive bacteria (2726 changes)
#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -462,9 +462,9 @@
#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#>
-#> => EUCAST rules affected 7,403 out of 20,000 rows
+#> => EUCAST rules affected 7,400 out of 20,000 rows
#> -> added 0 test results
-#> -> changed 4,018 test results (0 to S; 0 to I; 4,018 to R)
So only 28.2% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter()
function, also from the dplyr
package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate()
function:
isolate | @@ -654,11 +654,11 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-01-29 | -P7 | +2010-09-11 | +L7 | B_ESCHR_COL | -S | -S | +R | +I | S | S | TRUE | @@ -666,35 +666,35 @@|
2 | -2010-05-18 | -P7 | +2010-11-07 | +L7 | B_ESCHR_COL | S | +S | +S | R | -S | -S | FALSE | TRUE |
3 | -2010-06-01 | -P7 | +2011-01-16 | +L7 | B_ESCHR_COL | -R | S | R | S | +S | FALSE | TRUE | |
4 | -2010-07-21 | -P7 | +2011-02-25 | +L7 | B_ESCHR_COL | +R | S | -I | S | S | FALSE | @@ -702,83 +702,83 @@||
5 | -2010-08-20 | -P7 | +2011-08-07 | +L7 | B_ESCHR_COL | S | -R | +S | S | S | FALSE | -FALSE | +TRUE |
6 | -2010-12-14 | -P7 | +2011-08-16 | +L7 | B_ESCHR_COL | S | -I | -S | -S | -FALSE | -FALSE | -||
7 | -2011-03-02 | -P7 | -B_ESCHR_COL | -S | -S | S | R | +S | +FALSE | +TRUE | +|||
7 | +2011-10-08 | +L7 | +B_ESCHR_COL | +S | +R | +S | +S | TRUE | TRUE | ||||
8 | -2011-03-14 | -P7 | +2011-10-26 | +L7 | B_ESCHR_COL | -S | -S | R | S | +S | +S | FALSE | TRUE |
9 | -2011-05-28 | -P7 | +2012-01-15 | +L7 | B_ESCHR_COL | S | I | -S | +R | S | FALSE | TRUE | |
10 | -2011-08-09 | -P7 | +2012-02-08 | +L7 | B_ESCHR_COL | -I | S | -R | +S | +S | S | FALSE | TRUE |
Instead of 2, now 8 isolates are flagged. In total, 79.5% of all isolates are marked ‘first weighted’ - 51.2% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 2, now 10 isolates are flagged. In total, 78.6% of all isolates are marked ‘first weighted’ - 50.3% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate()
, there’s a shortcut for this new algorithm too:
So we end up with 15,891 isolates for analysis.
+So we end up with 15,729 isolates for analysis.
We can remove unneeded columns:
@@ -786,7 +786,6 @@date | patient_id | hospital | @@ -803,15 +802,14 @@||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2014-08-13 | -C5 | -Hospital C | +2015-10-28 | +A4 | +Hospital B | B_ESCHR_COL | -R | S | R | S | +S | M | Gram negative | Escherichia | @@ -819,13 +817,42 @@TRUE |
4 | -2010-09-17 | -R7 | +2011-07-22 | +Y8 | +Hospital C | +B_STRPT_PNE | +S | +S | +S | +R | +F | +Gram positive | +Streptococcus | +pneumoniae | +TRUE | +|
2014-09-30 | +N3 | +Hospital B | +B_STRPT_PNE | +S | +S | +S | +R | +M | +Gram positive | +Streptococcus | +pneumoniae | +TRUE | +||||
2013-02-14 | +Z6 | Hospital A | B_ESCHR_COL | -R | -I | +S | +S | R | S | F | @@ -835,14 +862,28 @@TRUE | |||||
5 | -2017-04-07 | -Z10 | +2015-02-01 | +D5 | +Hospital C | +B_ESCHR_COL | +S | +I | +S | +S | +M | +Gram negative | +Escherichia | +coli | +TRUE | +|
2013-12-28 | +V6 | Hospital B | B_STPHY_AUR | +R | S | -S | -S | +R | S | F | Gram positive | @@ -850,54 +891,6 @@aureus | TRUE | |||
7 | -2012-04-03 | -J2 | -Hospital A | -B_ESCHR_COL | -R | -R | -R | -S | -M | -Gram negative | -Escherichia | -coli | -TRUE | -|||
9 | -2017-09-09 | -U3 | -Hospital A | -B_ESCHR_COL | -R | -S | -S | -S | -F | -Gram negative | -Escherichia | -coli | -TRUE | -|||
10 | -2015-12-21 | -E1 | -Hospital B | -B_ESCHR_COL | -S | -S | -S | -S | -M | -Gram negative | -Escherichia | -coli | -TRUE | -
Time for the analysis!
@@ -915,9 +908,9 @@Or can be used like the dplyr
 way, which is more readable:
Frequency table of genus
and species
from a data.frame
(15,891 x 13)
Frequency table of genus
and species
from a data.frame
(15,729 x 13)
Columns: 2
-Length: 15,891 (of which NA: 0 = 0.00%)
+Length: 15,729 (of which NA: 0 = 0.00%)
Unique: 4
Shortest: 16
Longest: 24
The functions portion_S()
, portion_SI()
, portion_I()
, portion_IR()
and portion_R()
can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
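As a rough illustration of what such a portion represents, the base R sketch below computes the values by hand. It assumes a hypothetical character vector `results` of interpreted S/I/R values and deliberately does not call the AMR functions themselves, whose behaviour may differ in details:

```r
# Hypothetical interpreted test results for one antibiotic
results <- c("S", "R", "I", "S", "R", "S", "S", "I", "R", "S")

# Portion of resistant isolates (comparable in spirit to portion_R())
p_R <- mean(results == "R")

# Portion of intermediate or resistant isolates (comparable to portion_IR())
p_IR <- mean(results %in% c("I", "R"))

# Portion of susceptible isolates (comparable to portion_S())
p_S <- mean(results == "S")

print(c(S = p_S, IR = p_IR, R = p_R))
```

Because every result is either S, I or R, the susceptible portion and the intermediate-or-resistant portion add up to 1.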
Or can be used in conjunction with group_by()
and summarise()
, both from the dplyr
package:
data_1st %>%
group_by(hospital) %>%
@@ -984,19 +977,19 @@ Longest: 24
Hospital A
-0.4674370
+0.4841779
Hospital B
-0.4698925
+0.4800215
Hospital C
-0.4813574
+0.4663419
Hospital D
-0.4712389
+0.4815057
@@ -1014,23 +1007,23 @@ Longest: 24
Hospital A
-0.4674370
-4760
+0.4841779
+4835
Hospital B
-0.4698925
-5580
+0.4800215
+5581
Hospital C
-0.4813574
-2387
+0.4663419
+2258
Hospital D
-0.4712389
-3164
+0.4815057
+3055
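The grouped summary above pairs each portion with the number of isolates it is based on. A base R sketch of the same idea, using a small hypothetical data frame (`toy`, with made-up values) instead of the tutorial's data and without dplyr:

```r
# Hypothetical isolates: hospital and interpreted result for one antibiotic
toy <- data.frame(
  hospital = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  amox     = c("R", "S", "R", "S", "S", "R", "I", "S", "R"),
  stringsAsFactors = FALSE
)

# Per hospital: portion of I or R results and the number of isolates
by_hospital <- do.call(rbind, lapply(split(toy, toy$hospital), function(d) {
  data.frame(
    hospital = d$hospital[1],
    p_IR     = mean(d$amox %in% c("I", "R")),
    n        = nrow(d),
    stringsAsFactors = FALSE
  )
}))
print(by_hospital)

# Sanity check on the updated table: the four hospital counts
# should add up to the number of isolates kept for analysis
total_isolates <- sum(c(4835, 5581, 2258, 3055))
print(total_isolates)
```

Reporting the count next to each portion makes it easier to judge how much weight a percentage deserves, which is why a grouped summary usually carries both.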
@@ -1050,27 +1043,27 @@ Longest: 24
Escherichia
-0.7272384
-0.9034205
-0.9763581
+0.7376713
+0.8986807
+0.9754067
Klebsiella
-0.7457847
-0.9014267
-0.9760052
+0.7305010
+0.8953710
+0.9727330
Staphylococcus
-0.7245186
-0.9181001
-0.9756098
+0.7305435
+0.9275325
+0.9793315
Streptococcus
-0.7234213
+0.7320692
0.0000000
-0.7234213
+0.7320692
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index bb00739e..b76ef17b 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 1eb0f5f2..a6b86eeb 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index 8232c403..9bc0d924 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index 5a888aaf..95f86455 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 47e067ad..5e05b9fd 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -40,7 +40,7 @@
EUCAST.Rmd
G_test.Rmd
SPSS.Rmd
R is highly modular.
-The official R network (CRAN) features almost 14,000 packages at the time of writing, our AMR
package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitLab or GitHub. So there may even be a lot more than 14,000 packages out there.
The official R network (CRAN) features almost 14,000 packages at the time of writing, our AMR
package being one of them. All these packages were peer-reviewed before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep it on their own public repository, like GitLab or GitHub. So there may even be a lot more than 14,000 packages out there.
Bottom line is, you can really extend it yourself or ask somebody to do this for you. Take for example our AMR
 package. Among other things, it adds reliable reference data to R to help you with the data cleaning and analysis. SPSS, SAS and Stata will never know what a valid MIC value is or what the Gram stain of E. coli is. Or that all species of Klebsiella are resistant to amoxicillin and that Floxapen® is a trade name of flucloxacillin. These facts and properties are often needed to clean existing data, which would be very inconvenient in a software package without reliable reference data. See below for a demonstration.
R has a huge community.
-Many R users just ask questions on websites like StackOverflow.com, the largest online community for programmers. At the time of writing, more than 275,000 R-related questions have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
+Many R users just ask questions on websites like StackOverflow.com, the largest online community for programmers. At the time of writing, almost 300,000 R-related questions have already been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
R understands any data type, including SPSS/SAS/Stata.
-And that’s not vice versa I’m afraid. You can import data from any source into R. As said, from SPSS/SAS/Stata (link), but also from Excel (link), from flat files like CSV, TXT or TSV (link), or directly from databases or datawarehouses from anywhere on the world (link). You can even scrape websites to download tables that are live on the internet (link).
-And the best part - you can export from R to all data formats as well. So you can import an SPSS file, do your analysis neatly in R and export back to SPSS. Although you might omit that very last step.
+And that’s not vice versa I’m afraid. You can import data from any source into R. From SPSS, SAS and Stata (link), from Minitab, Epi Info and EpiData (link), from Excel (link), from flat files like CSV, TXT or TSV (link), or directly from databases and datawarehouses from anywhere on the world (link). You can even scrape websites to download tables that are live on the internet (link).
+And the best part - you can export from R to most data formats as well. So you can import an SPSS file, do your analysis neatly in R and export the resulting tables to Excel files.
R is completely free and open-source.
No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Statistics only comes with subscriptions nowadays, varying between USD 1,300 and USD 8,500 per computer per year. SAS Analytics Pro costs around USD 10,000 per computer. Stata also has a business model with subscription fees, varying between USD 600 and USD 1,200 per computer per year, but lower prices come with a limitation of the number of variables you can work with.
-If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of SPSS - gaining even more functions and flexibility. And all R enthousiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated by R. It is really free.
+If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of e.g. SPSS - gaining even more functions and flexibility. And all R enthusiasts can do as much PR as they want (like I do here), because nobody is officially associated with or affiliated with R. It is really free.
If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.
@@ -280,7 +280,7 @@To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also run RStudio Server, which is nothing less than the complete RStudio software available as a website (e.g. in your corporate network or at home).
+To work with R, probably the best option is to use RStudio. It is an open-source and free desktop environment which not only allows you to run R code, but also supports project management, version management, package management and convenient import menus to work with other data sources. You can also install RStudio Server on a private or corporate server, which brings nothing less than the complete RStudio software to you as a website (at home or at work).
To import a data file, just click Import Dataset in the Environment tab:
If additional packages are needed, RStudio will ask you if they should be installed beforehand.
diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html
index e3c39870..ff1d5e4b 100644
--- a/docs/articles/WHONET.html
+++ b/docs/articles/WHONET.html
@@ -40,7 +40,7 @@
WHONET.Rmd
Frequency table of mo
from a data.frame
(500 x 54)
Class: mo
(character
)
Length: 500 (of which NA: 0 = 0.00%)
-Unique: 37
Families: 10
Genera: 17
-Species: 35
@@ -258,7 +258,7 @@ Species: 35 | |||||||
---|---|---|---|---|---|---|---|
2 | -B_STPHY | +B_STPHY_CNS | 74 | 14.8% | 319 | @@ -314,33 +314,34 @@ Species: 35||
9 | -B_STRPT | -8 | -1.6% | -442 | -88.4% | -||
10 | B_ENTRB_CLO | 5 | 1.0% | -447 | -89.4% | +439 | +87.8% | +
10 | +B_ENTRC_COL | +4 | +0.8% | +443 | +88.6% |
(omitted 27 entries, n = 53 [10.6%])
+(omitted 29 entries, n = 57 [11.4%])
# our transformed antibiotic columns
# amoxicillin/clavulanic acid (J01CR02) as an example
data %>% freq(AMC_ND2)
Frequency table of AMC_ND2
from a data.frame
(500 x 54)
# Warning: These values could not be coerced to a valid atc: "AMCND".
Class: factor
> ordered
> rsi
(numeric
)
Length: 500 (of which NA: 19 = 3.80%)
Levels: 3: S
< I
< R
Unique: 3
%IR: 25.00% (ratio 2.8:1)
+%IR: 25.99%
diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html index bcc1f409..4f0f304b 100644 --- a/docs/articles/atc_property.html +++ b/docs/articles/atc_property.html @@ -40,7 +40,7 @@ @@ -192,7 +192,7 @@ | |||||||
---|---|---|---|---|---|---|---|
2 | -Staphylococcus coagulase negative | +Staphylococcus coagulase-negative | 313 | 15.7% | 780 | @@ -604,7 +605,8 @@ Unique: 4 Length: 2,000 (of which NA: 771 = 38.55%)
@@ -735,7 +737,8 @@ Median: 31 July 2009 (47.39%) Length: 2,000 (of which NA: 771 = 38.55%) |
---|
diff --git a/docs/articles/index.html b/docs/articles/index.html index 9f777dc6..ffcbca60 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -78,7 +78,7 @@ diff --git a/docs/articles/mo_property.html b/docs/articles/mo_property.html index 1ba5e03b..a730030e 100644 --- a/docs/articles/mo_property.html +++ b/docs/articles/mo_property.html @@ -40,7 +40,7 @@ @@ -192,7 +192,7 @@ |
---|