diff --git a/DESCRIPTION b/DESCRIPTION
index 251378b0..9d62c704 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -13,7 +13,7 @@ Authors@R: c(
     given = c("Christian", "F."),
     family = "Luz",
     email = "c.f.luz@umcg.nl",
-    role = c("aut", "rev"),
+    role = "aut",
     comment = c(ORCID = "0000-0001-5809-5995")),
   person(
     given = c("Erwin", "E.", "A."),
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 1f73ca9e..8f3683e5 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -41,6 +41,9 @@ navbar:
     - text: 'Work with WHONET data'
       icon: 'fa-globe-americas'
       href: 'articles/WHONET.html'
+    - text: 'Import data from SPSS/SAS/Stata'
+      icon: 'fa-file-upload'
+      href: 'articles/SPSS.html'
     - text: 'Apply EUCAST rules'
       icon: 'fa-exchange-alt'
       href: 'articles/EUCAST.html'
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index b66d2788..78f0eccf 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -121,6 +121,13 @@
Work with WHONET data
+
So, we can draw at least two conclusions immediately. From a data scientist perspective, the data looks clean: only values M and F. From a researcher perspective: there are slightly more men. Nothing we didn’t already know.
The data is already quite clean, but we still need to transform some variables. The bacteria column now consists of text, and we want to add more variables based on microbial IDs later on. So, we will transform this column to valid IDs. The mutate() function of the dplyr package makes this really easy:
data <- data %>%
@@ -436,10 +443,10 @@
#> Kingella kingae (no changes)
#>
#> EUCAST Expert Rules, Intrinsic Resistance and Exceptional Phenotypes (v3.1, 2016)
-#> Table 1: Intrinsic resistance in Enterobacteriaceae (1293 changes)
+#> Table 1: Intrinsic resistance in Enterobacteriaceae (1241 changes)
#> Table 2: Intrinsic resistance in non-fermentative Gram-negative bacteria (no changes)
#> Table 3: Intrinsic resistance in other Gram-negative bacteria (no changes)
-#> Table 4: Intrinsic resistance in Gram-positive bacteria (2812 changes)
+#> Table 4: Intrinsic resistance in Gram-positive bacteria (2713 changes)
#> Table 8: Interpretive rules for B-lactam agents and Gram-positive cocci (no changes)
#> Table 9: Interpretive rules for B-lactam agents and Gram-negative rods (no changes)
#> Table 10: Interpretive rules for B-lactam agents and other Gram-negative bacteria (no changes)
@@ -455,9 +462,9 @@
#> Non-EUCAST: piperacillin/tazobactam = S where piperacillin = S (no changes)
#> Non-EUCAST: trimethoprim/sulfa = S where trimethoprim = S (no changes)
#>
-#> => EUCAST rules affected 7,447 out of 20,000 rows
+#> => EUCAST rules affected 7,301 out of 20,000 rows
#> -> added 0 test results
-#> -> changed 4,105 test results (0 to S; 0 to I; 4,105 to R)
-So only 28.5% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:
+So only 28.3% is suitable for resistance analysis! We can now filter on it with the filter() function, also from the dplyr package:
For future use, the above two syntaxes can be shortened with the filter_first_isolate() function:
-Only 2 isolates are marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are clearly different strains and should be included too. This is why we weigh isolates based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
+Only 1 isolate is marked as ‘first’ according to the CLSI guideline. But when reviewing the antibiogram, it is obvious that some isolates are clearly different strains and should be included too. This is why we weigh isolates based on their antibiogram. The key_antibiotics() function adds a vector with 18 key antibiotics: 6 broad-spectrum ones, 6 narrow-spectrum ones for Gram-negatives and 6 narrow-spectrum ones for Gram-positives. These can be defined by the user.
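The idea of weighing isolates by their antibiogram can be illustrated with a small sketch. This is not the package's algorithm: `differs_on_key_antibiotics()` is a hypothetical helper, and the assumption here is only that each isolate carries one S/I/R letter per key antibiotic and that a flip between S and R counts as a difference while I is ignored.

```r
# Hypothetical helper, NOT part of the AMR package: compare two strings of
# S/I/R results position by position and report whether any result flips
# between S and R. Results of "I" are ignored on either side.
differs_on_key_antibiotics <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  any((x == "S" & y == "R") | (x == "R" & y == "S"))
}

same_profile  <- differs_on_key_antibiotics("SSRS", "SSRS")  # identical results
only_i_change <- differs_on_key_antibiotics("SSRS", "SIRS")  # S -> I is ignored
s_to_r_change <- differs_on_key_antibiotics("SSRS", "RSRS")  # S -> R counts
```

Under this simplified rule, only the third comparison would mark the later isolate as a different strain worth including.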
If a column exists with a name like ‘key(…)ab’, the first_isolate() function will automatically use it and determine the first weighted isolates. Mind the NOTEs in the output below:
data <- data %>%
mutate(keyab = key_antibiotics(.)) %>%
@@ -630,7 +637,7 @@
#> NOTE: Using column `patient_id` as input for `col_patient_id`.
#> NOTE: Using column `keyab` as input for `col_keyantibiotics`. Use col_keyantibiotics = FALSE to prevent this.
#> [Criterion] Inclusion based on key antibiotics, ignoring I.
-#> => Found 15,866 first weighted isolates (79.3% of total)
isolate | @@ -647,10 +654,10 @@|||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -2010-06-20 | -Y4 | +2010-01-04 | +M1 | B_ESCHR_COL | -S | +R | S | R | S | @@ -659,10 +666,10 @@|||
2 | -2010-07-31 | -Y4 | +2010-01-27 | +M1 | B_ESCHR_COL | -S | +R | S | S | S | @@ -671,20 +678,20 @@|||
3 | -2010-08-26 | -Y4 | +2010-05-07 | +M1 | B_ESCHR_COL | S | S | S | S | FALSE | -FALSE | +TRUE | |
4 | -2010-12-11 | -Y4 | +2010-06-09 | +M1 | B_ESCHR_COL | R | S | @@ -695,8 +702,8 @@||||||
5 | -2010-12-30 | -Y4 | +2010-07-23 | +M1 | B_ESCHR_COL | R | S | @@ -707,32 +714,32 @@||||||
6 | -2011-04-02 | -Y4 | +2010-09-29 | +M1 | B_ESCHR_COL | -R | -I | S | R | +S | +S | FALSE | TRUE |
7 | -2011-04-06 | -Y4 | +2010-10-14 | +M1 | B_ESCHR_COL | -S | -S | +R | S | R | +S | FALSE | TRUE |
8 | -2011-04-07 | -Y4 | +2010-10-15 | +M1 | B_ESCHR_COL | S | S | @@ -743,11 +750,11 @@||||||
9 | -2011-05-28 | -Y4 | +2010-10-16 | +M1 | B_ESCHR_COL | -R | S | +R | S | S | FALSE | @@ -755,23 +762,23 @@||
10 | -2011-09-09 | -Y4 | +2010-11-16 | +M1 | B_ESCHR_COL | S | S | -R | S | -TRUE | +S | +FALSE | TRUE |
-Instead of 2, now 8 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 50.9% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
+Instead of 1, now 9 isolates are flagged. In total, 79.3% of all isolates are marked ‘first weighted’ - 51% more than when using the CLSI guideline. In real life, this novel algorithm will yield 5-10% more isolates than the classic CLSI guideline.
As with filter_first_isolate(), there’s a shortcut for this new algorithm too:
-So we end up with 15,866 isolates for analysis.
+So we end up with 15,859 isolates for analysis.
We can remove unneeded columns:
@@ -796,42 +803,74 @@
Or it can be used the dplyr way, which is easier to read:
-Frequency table of genus and species from a data.frame (15,866 x 13)
+Frequency table of genus and species from a data.frame (15,859 x 13)
Columns: 2
-Length: 15,866 (of which NA: 0 = 0.00%)
+Length: 15,859 (of which NA: 0 = 0.00%)
Unique: 4
Shortest: 16
Longest: 24
The functions portion_R, portion_RI, portion_I, portion_IS and portion_S can be used to determine the portion of a specific antimicrobial outcome. They can be used on their own:
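To make the notion of a "portion" concrete, here is a minimal base-R sketch. The helper `portion_of()` is a hypothetical name used for illustration only; it is not the implementation behind the package's portion functions and it does not model any minimum-count rule they may apply.

```r
# Hypothetical helper (not the AMR implementation): the share of non-missing
# test results that fall in the given interpretation(s).
portion_of <- function(x, outcome = "R") {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  mean(x %in% outcome)
}

# Invented example data: 7 non-missing results, 3 of them resistant
amox <- c("R", "S", "I", "R", NA, "S", "S", "R")

share_r  <- portion_of(amox, "R")          # resistant only
share_ri <- portion_of(amox, c("R", "I"))  # resistant or intermediate
```

Missing results are dropped before dividing, so the denominator here is 7 rather than 8.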
Or they can be used in conjunction with group_by() and summarise(), both from the dplyr package:
data_1st %>%
group_by(hospital) %>%
@@ -977,19 +984,19 @@ Longest: 24
Hospital A
-0.4749633
+0.4658398
Hospital B
-0.4879713
+0.4821041
Hospital C
-0.4761905
+0.4874459
Hospital D
-0.4600062
+0.4803681
@@ -1007,23 +1014,23 @@ Longest: 24
Hospital A
-0.4749633
-4773
+0.4658398
+4757
Hospital B
-0.4879713
-5570
+0.4821041
+5532
Hospital C
-0.4761905
+0.4874459
2310
Hospital D
-0.4600062
-3213
+0.4803681
+3260
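The per-hospital figures above pair a portion with the number of results it is based on. A rough sketch of how such a grouped summary could be computed with plain dplyr follows; the data, the `isolates` object and the column names `amox_R` and `available` are invented for illustration, and the tutorial's own code (truncated in this diff) may differ.

```r
library(dplyr)

# Made-up data; the values are invented and only mirror the tutorial's layout.
isolates <- tibble(
  hospital = c("A", "A", "A", "B", "B", "C"),
  amox     = c("R", "S", "R", "S", NA, "R")
)

summary_tbl <- isolates %>%
  group_by(hospital) %>%
  summarise(
    amox_R    = mean(amox == "R", na.rm = TRUE),  # share of resistant results
    available = sum(!is.na(amox))                 # results that could be counted
  )

summary_tbl
```

Reporting the count next to the portion makes it easy to spot groups where the percentage rests on very few isolates.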
@@ -1043,27 +1050,27 @@ Longest: 24
Escherichia
-0.7249364
-0.8996183
-0.9725191
+0.7305798
+0.8974359
+0.9760010
Klebsiella
-0.7291399
-0.9037227
-0.9762516
+0.7370441
+0.9040307
+0.9776072
Staphylococcus
-0.7410072
-0.9229188
-0.9786742
+0.7303256
+0.9200205
+0.9771853
Streptococcus
-0.7292645
+0.7332526
0.0000000
-0.7292645
+0.7332526
diff --git a/docs/articles/AMR_files/figure-html/plot 1-1.png b/docs/articles/AMR_files/figure-html/plot 1-1.png
index b8ebcb6c..6c74e47b 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 1-1.png and b/docs/articles/AMR_files/figure-html/plot 1-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 3-1.png b/docs/articles/AMR_files/figure-html/plot 3-1.png
index 33c83dcc..7e16efff 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 3-1.png and b/docs/articles/AMR_files/figure-html/plot 3-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 4-1.png b/docs/articles/AMR_files/figure-html/plot 4-1.png
index 9bfe6803..913447ee 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 4-1.png and b/docs/articles/AMR_files/figure-html/plot 4-1.png differ
diff --git a/docs/articles/AMR_files/figure-html/plot 5-1.png b/docs/articles/AMR_files/figure-html/plot 5-1.png
index 70b13bd3..57c3c5c1 100644
Binary files a/docs/articles/AMR_files/figure-html/plot 5-1.png and b/docs/articles/AMR_files/figure-html/plot 5-1.png differ
diff --git a/docs/articles/EUCAST.html b/docs/articles/EUCAST.html
index 84e39b40..43817373 100644
--- a/docs/articles/EUCAST.html
+++ b/docs/articles/EUCAST.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/G_test.html b/docs/articles/G_test.html
index 439fdc60..51b354ec 100644
--- a/docs/articles/G_test.html
+++ b/docs/articles/G_test.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/SPSS.html b/docs/articles/SPSS.html
new file mode 100644
index 00000000..fdf4eab7
--- /dev/null
+++ b/docs/articles/SPSS.html
@@ -0,0 +1,397 @@
+
+
+
+
+
+
+
+How to import data from SPSS / SAS / Stata • AMR (for R)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ How to import data from SPSS / SAS / Stata
+ Matthijs S. Berends
+
+ 14 February 2019
+
+
+ SPSS.Rmd
+
+
+
+
+
+
+
+SPSS / SAS / Stata
+SPSS (Statistical Package for the Social Sciences) is probably the best-known software package for statistical analysis. SPSS is easier to learn than R, because in SPSS you only have to click a menu to run parts of your analysis. Because of its user-friendliness, it is taught at universities and is particularly useful for students who are new to statistics. From my experience, I would guess that pretty much all (bio)medical students know it by the time they graduate. SAS and Stata are statistical packages popular in big industries.
+
+
+
+Compared to R
+As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come with major downsides when compared with R:
+
+- R is highly modular.
+The official R network (CRAN) features almost 14,000 packages at the time of writing, our AMR package being one of them. All these packages had to pass CRAN’s submission checks before publication. Aside from this official channel, there are also developers who choose not to submit to CRAN, but rather keep their packages in their own public repositories, like GitLab or GitHub. So there may even be a lot more than 14,000 packages out there.
+The bottom line is that you can really extend R yourself or ask somebody to do this for you. Take for example our AMR package. SPSS, SAS and Stata will never know what a valid MIC value is (so data might not be clean) or what the Gram stain of E. coli is. Or the fact that all species of Klebsiella are resistant to amoxicillin.
+
+- R is extremely flexible.
+Because you write the syntax yourself, you can do anything you want. The flexibility in transforming, gathering, grouping, summarising and drawing plots is endless - with SPSS, SAS or Stata you are bound to their algorithms and styles. They may be somewhat flexible, but you can never create that very specific publication-ready plot without using other (paid) software.
+
+- R can be easily automated.
+Over the last few years, R Markdown has developed in interesting ways. With R Markdown, you can very easily reproduce your reports, whether the output is Word, PowerPoint, a website, a PDF document or just the raw data in Excel. I use this a lot to generate monthly reports automatically. Just write the code once and enjoy the automatically updated reports at any interval you like.
+For an even more professional environment, you could create Shiny apps: live manipulation of data using a custom-made website. The web design knowledge needed (JavaScript, CSS, HTML) is almost zero.
+
+- R has a huge community.
+Many R users just ask questions on websites like stackoverflow.com, the largest online community for programmers. At the time of writing, around 275,000 R questions have been asked on this platform (which covers questions and answers for any programming language). In my own experience, most questions are answered within a couple of minutes.
+
+- R understands any data type, including SPSS/SAS/Stata.
+And that’s not vice versa I’m afraid. You can import data from any source into R. As said, from SPSS/SAS/Stata (link), but also from Excel (link), from flat files like CSV, TXT or TSV (link), or directly from databases or data warehouses anywhere in the world (link). You can even scrape websites to download tables that are live on the internet (link).
+And the best part - you can export from R to all data formats as well. So you can import an SPSS file, do your analysis neatly in R and export back to SPSS. Although you might omit that very last step.
+
+- R is completely free and open-source.
+No strings attached. It was created and is being maintained by volunteers who believe that (data) science should be open and publicly available to everybody. SPSS, SAS and Stata are quite expensive. IBM SPSS Statistics only comes with subscriptions nowadays, varying between USD 1,300 and USD 8,500 per computer per year. SAS Analytics Pro costs around USD 10,000 per computer. Stata also has a business model with subscription fees, varying between USD 600 and USD 1,200 per computer per year, but lower prices come with a limitation of the number of variables you can work with.
+If you are working at a midsized or small company, you can save it tens of thousands of dollars by using R instead of SPSS - gaining even more functions and flexibility. And all R enthusiasts can do as much PR as they want (like I do here), because nobody is officially associated or affiliated with R. It is really free.
+
+
+If you sometimes write syntaxes in SPSS to run a complete analysis or to ‘automate’ some of your work, you should perhaps do this in R. You will notice that writing syntaxes in R is a lot more nifty and clever than in SPSS.
+
+
+
+Import data from SPSS/SAS/Stata
+
+
+RStudio
+To work with R, probably the best option is to use RStudio. It is a free and open-source desktop environment which not only allows you to run R code, but also supports project management, version management, package management and a convenient import menu to work with other data sources. You can also run RStudio Server, which is nothing less than the complete RStudio software available as a website (e.g. in your corporate network or at home).
+To import a data file, just click Import Dataset in the Environment tab:
+
+If additional packages are needed, RStudio will ask you beforehand whether they should be installed.
+In the window that opens, you can define all options (parameters) that should be used for import and you’re ready to go:
+
+If you want named variables to be imported as factors so it resembles SPSS more, use as_factor().
+The difference is this:
+SPSS_data
+# # A tibble: 4,203 x 4
+# v001 sex status statusage
+# <dbl> <dbl+lbl> <dbl+lbl> <dbl>
+# 1 10002 1 1 76.6
+# 2 10004 0 1 59.1
+# 3 10005 1 1 54.5
+# 4 10006 1 1 54.1
+# 5 10007 1 1 57.7
+# 6 10008 1 1 62.8
+# 7 10010 0 1 63.7
+# 8 10011 1 1 73.1
+# 9 10017 1 1 56.7
+# 10 10018 0 1 66.6
+# # … with 4,193 more rows
+
+as_factor(SPSS_data)
+# # A tibble: 4,203 x 4
+# v001 sex status statusage
+# <dbl> <fct> <fct> <dbl>
+# 1 10002 Male alive 76.6
+# 2 10004 Female alive 59.1
+# 3 10005 Male alive 54.5
+# 4 10006 Male alive 54.1
+# 5 10007 Male alive 57.7
+# 6 10008 Male alive 62.8
+# 7 10010 Female alive 63.7
+# 8 10011 Male alive 73.1
+# 9 10017 Male alive 56.7
+# 10 10018 Female alive 66.6
+# # … with 4,193 more rows
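The same difference can be reproduced without an SPSS file at hand. The sketch below builds a small labelled column with haven's `labelled()` and converts it with `as_factor()`; the object names `spss_like` and `converted` and the data values are invented, and it assumes the haven and tibble packages are installed.

```r
library(haven)
library(tibble)

# Invented data standing in for an imported SPSS file: `sex` is stored as
# numbers with value labels attached, as SPSS does.
spss_like <- tibble(
  id  = c(10002, 10004, 10005),
  sex = labelled(c(1, 0, 1), labels = c(Female = 0, Male = 1))
)

# Convert labelled columns to factors; other columns are left untouched
converted <- as_factor(spss_like)

class(spss_like$sex)   # labelled class, underlying values still numeric
class(converted$sex)   # factor
levels(converted$sex)
```

After conversion, the column shows the value labels instead of the underlying codes, which is usually what you want for tables and plots.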
+
+
+
+Base R
+To import data from SPSS, SAS or Stata, you can use the great haven package yourself:
+# download and install the latest version:
+install.packages("haven")
+# load the package you just installed:
+library(haven)
+You can now import files as follows:
+
+
+SPSS
+To read files from SPSS into R:
+# read any SPSS file based on file extension (best way):
+read_spss(file = "path/to/file")
+
+# read .sav or .zsav file:
+read_sav(file = "path/to/file")
+
+# read .por file:
+read_por(file = "path/to/file")
+Do not forget about as_factor(), as mentioned above.
+To export your R objects to the SPSS file format:
+
+
+
+
+SAS
+To read files from SAS into R:
+# read .sas7bdat + .sas7bcat files:
+read_sas(data_file = "path/to/file", catalog_file = NULL)
+
+# read SAS transport files (version 5 and version 8):
+read_xpt(file = "path/to/file")
+To export your R objects to the SAS file format:
+
+
+
+
+Stata
+To read files from Stata into R:
+# read .dta file:
+read_stata(file = "/path/to/file")
+
+# works exactly the same:
+read_dta(file = "/path/to/file")
+To export your R objects to the Stata file format:
+
+
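The export code blocks of the article did not survive in this diff, so the following is only a hedged sketch of a round trip, not the article's own example. It uses haven's `write_sav()` and `write_dta()` with temporary files; the data frame `df` and the path variables are invented for illustration.

```r
library(haven)

# Invented example data
df <- data.frame(id = c(1, 2, 3), age = c(76.6, 59.1, 54.5))

# SPSS: write a .sav file to a temporary location and read it back
sav_path <- tempfile(fileext = ".sav")
write_sav(df, sav_path)
from_spss <- read_sav(sav_path)

# Stata: same round trip with a .dta file
dta_path <- tempfile(fileext = ".dta")
write_dta(df, dta_path)
from_stata <- read_dta(dta_path)
```

Reading the file back right after writing is a cheap way to confirm that column names and values came through the format conversion as expected.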
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/articles/WHONET.html b/docs/articles/WHONET.html
index 7e0c1b40..34d9fc96 100644
--- a/docs/articles/WHONET.html
+++ b/docs/articles/WHONET.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/atc_property.html b/docs/articles/atc_property.html
index 6fb5fb0a..a958d76c 100644
--- a/docs/articles/atc_property.html
+++ b/docs/articles/atc_property.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/benchmarks.html b/docs/articles/benchmarks.html
index c366169e..45722e41 100644
--- a/docs/articles/benchmarks.html
+++ b/docs/articles/benchmarks.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/freq.html b/docs/articles/freq.html
index ca9f0361..db8b423f 100644
--- a/docs/articles/freq.html
+++ b/docs/articles/freq.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/index.html b/docs/articles/index.html
index 88a1f82d..87d81222 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -121,6 +121,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
@@ -232,6 +239,7 @@
How to conduct AMR analysis
How to apply EUCAST rules
How to use the *G*-test
+ How to import data from SPSS / SAS / Stata
How to work with WHONET data
How to get properties of an antibiotic
Benchmarks
diff --git a/docs/articles/mo_property.html b/docs/articles/mo_property.html
index 3a1da9ed..7cd41212 100644
--- a/docs/articles/mo_property.html
+++ b/docs/articles/mo_property.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/articles/resistance_predict.html b/docs/articles/resistance_predict.html
index 531e53c3..20f66452 100644
--- a/docs/articles/resistance_predict.html
+++ b/docs/articles/resistance_predict.html
@@ -83,6 +83,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
diff --git a/docs/authors.html b/docs/authors.html
index a2906631..dded1b7f 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -121,6 +121,13 @@
Work with WHONET data
+
+
+
+
+ Import data from SPSS/SAS/Stata
+
+
@@ -230,7 +237,7 @@
-Christian F. Luz. Author, reviewer.
+Christian F. Luz. Author.
diff --git a/docs/extra.css b/docs/extra.css
index 9d2cdc72..2a1d3b1d 100644
--- a/docs/extra.css
+++ b/docs/extra.css
@@ -79,6 +79,10 @@ a pre[href], a pre[href]:hover, a pre[href]:focus {
/* adjusted colour for all real links; having href attribute */
color: #128f76;
}
+.ot, .dv {
+ /* numbers and TRUE/FALSE */
+ color: slategray;
+}
/* syntax font */
pre, code {
diff --git a/docs/extra.js b/docs/extra.js
index f8e2d62c..a8fbe436 100644
--- a/docs/extra.js
+++ b/docs/extra.js
@@ -56,6 +56,20 @@ $( document ).ready(function() {
'' +
'
(
-AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any table format, including WHONET/EARS-Net data.
+AMR is a free and open-source R package to simplify the analysis and prediction of Antimicrobial Resistance (AMR) and to work with microbial and antimicrobial properties by using evidence-based methods. It supports any data format, including WHONET/EARS-Net data.
After installing this package, R knows almost all ~20,000 microorganisms and ~500 antibiotics by name and code, and knows all about valid RSI and MIC values.
-We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation. Read the full license here.
+Used to SPSS? Read our tutorial on how to import data from SPSS, SAS or Stata and learn in which ways R outclasses any of these statistical packages.
+We created this package for both academic research and routine analysis at the Faculty of Medical Sciences of the University of Groningen, the Netherlands, and the Medical Microbiology & Infection Prevention (MMBI) department of the University Medical Center Groningen (UMCG). This R package is actively maintained and is free software; you can freely use and distribute it for both personal and commercial (but not patent) purposes under the terms of the GNU General Public License version 2.0 (GPL-2), as published by the Free Software Foundation. Read the full license here.
This package can be used for: