How to import data from SPSS / SAS / Stata
-Dr. Matthijs -Berends
- -17 May 2023
- - Source:vignettes/SPSS.Rmd
- SPSS.Rmd
SPSS / SAS / Stata -
-SPSS (Statistical Package for the Social Sciences) is probably the -most well-known software package for statistical analysis. SPSS is -easier to learn than R, because in SPSS you only have to click a menu to -run parts of your analysis. Because of its user-friendliness, it is -taught at universities and particularly useful for students who are new -to statistics. From my experience, I would guess that pretty much all -(bio)medical students know it at the time they graduate. SAS and Stata -are comparable statistical packages popular in big industries.
-Compared to R -
-As said, SPSS is easier to learn than R. But SPSS, SAS and Stata come -with major downsides when comparing it with R:
--
-
-
-
R is highly modular.
-The official R network -(CRAN) features more than 16,000 packages at the time of writing, -our
-AMR
package being one of them. All these packages were -peer-reviewed before publication. Aside from this official channel, -there are also developers who choose not to submit to CRAN, but rather -keep it on their own public repository, like GitHub. So there may even -be a lot more than 14,000 packages out there.Bottom line is, you can really extend it yourself or ask somebody to -do this for you. Take for example our
-AMR
package. Among -other things, it adds reliable reference data to R to help you with the -data cleaning and analysis. SPSS, SAS and Stata will never know what a -valid MIC value is or what the Gram stain of E. coli is. Or -that all species of Klebiella are resistant to amoxicillin and -that Floxapen® is a trade name of flucloxacillin. These facts -and properties are often needed to clean existing data, which would be -very inconvenient in a software package without reliable reference data. -See below for a demonstration.
- -
-
R is extremely flexible.
-Because you write the syntax yourself, you can do anything you want. -The flexibility in transforming, arranging, grouping and summarising -data, or drawing plots, is endless - with SPSS, SAS or Stata you are -bound to their algorithms and format styles. They may be a bit flexible, -but you can probably never create that very specific publication-ready -plot without using other (paid) software. If you sometimes write -syntaxes in SPSS to run a complete analysis or to ‘automate’ some of -your work, you could do this a lot less time in R. You will notice that -writing syntaxes in R is a lot more nifty and clever than in SPSS. -Still, as working with any statistical package, you will have to have -knowledge about what you are doing (statistically) and what you are -willing to accomplish.
-
- -
-
R can be easily automated.
-Over the last years, R -Markdown has really made an interesting development. With R -Markdown, you can very easily produce reports, whether the format has to -be Word, PowerPoint, a website, a PDF document or just the raw data to -Excel. It even allows the use of a reference file containing the layout -style (e.g. fonts and colours) of your organisation. I use this a lot to -generate weekly and monthly reports automatically. Just write the code -once and enjoy the automatically updated reports at any interval you -like.
-For an even more professional environment, you could create Shiny apps: live manipulation of -data using a custom made website. The webdesign knowledge needed -(JavaScript, CSS, HTML) is almost zero.
-
- -
-
R has a huge community.
-Many R users just ask questions on websites like StackOverflow.com, the largest -online community for programmers. At the time of writing, 489 583 -R-related questions have already been asked on this platform (that -covers questions and answers for any programming language). In my own -experience, most questions are answered within a couple of -minutes.
-
- -
-
R understands any data type, including -SPSS/SAS/Stata.
-And that’s not vice versa I’m afraid. You can import data from any -source into R. For example from SPSS, SAS and Stata (link), from Minitab, Epi Info -and EpiData (link), from Excel -(link), from flat files like -CSV, TXT or TSV (link), or -directly from databases and datawarehouses from anywhere on the world -(link). You can even scrape -websites to download tables that are live on the internet (link) or get the results of -an API call and transform it into data in only one command (link).
-And the best part - you can export from R to most data formats as -well. So you can import an SPSS file, do your analysis neatly in R and -export the resulting tables to Excel files for sharing.
-
- -
-
R is completely free and open-source.
-No strings attached. It was created and is being maintained by -volunteers who believe that (data) science should be open and publicly -available to everybody. SPSS, SAS and Stata are quite expensive. IBM -SPSS Staticstics only comes with subscriptions nowadays, varying between USD -1,300 and USD 8,500 per user per year. SAS Analytics Pro -costs around -USD 10,000 per computer. Stata also has a business model with -subscription fees, varying between -USD 600 and USD 2,800 per computer per year, but lower prices come -with a limitation of the number of variables you can work with. And -still they do not offer the above benefits of R.
-If you are working at a midsized or small company, you can save it -tens of thousands of dollars by using R instead of e.g. SPSS - gaining -even more functions and flexibility. And all R enthousiasts can do as -much PR as they want (like I do here), because nobody is officially -associated with or affiliated by R. It is really free.
-
- -
-
R is (nowadays) the preferred analysis software in -academic papers.
-At present, R is among the world most powerful statistical languages, -and it is generally very popular in science (Bollmann et al., -2017). For all the above reasons, the number of references to R as an -analysis method in academic papers is -rising continuously and has even surpassed SPSS for academic use -(Muenchen, 2014).
-I believe that the thing with SPSS is, that it has always had a great -user interface which is very easy to learn and use. Back when they -developed it, they had very little competition, let alone from R. R -didn’t even had a professional user interface until the last decade -(called RStudio, see below). How people used R between the nineties and -2010 is almost completely incomparable to how R is being used now. The -language itself has been -restyled completely by volunteers who are dedicated professionals in -the field of data science. SPSS was great when there was nothing else -that could compete. But now in 2023, I don’t see any reason why SPSS -would be of any better use than R.
-
-
To demonstrate the first point:
-
-# not all values are valid MIC values:
-as.mic(0.125)
-#> Class 'mic'
-#> [1] 0.125
-as.mic("testvalue")
-#> Class 'mic'
-#> [1] <NA>
-
-# the Gram stain is available for all bacteria:
-mo_gramstain("E. coli")
-#> [1] "Gram-negative"
-
-# Klebsiella is intrinsic resistant to amoxicillin, according to EUCAST:
-klebsiella_test <- data.frame(
- mo = "klebsiella",
- amox = "S",
- stringsAsFactors = FALSE
-)
-klebsiella_test # (our original data)
-#> mo amox
-#> 1 klebsiella S
-eucast_rules(klebsiella_test, info = FALSE) # (the edited data by EUCAST rules)
-#> mo amox
-#> 1 klebsiella R
-
-# hundreds of trade names can be translated to a name, trade name or an ATC code:
-ab_name("floxapen")
-#> [1] "Flucloxacillin"
-ab_tradenames("floxapen")
-#> [1] "culpen" "floxacillin" "floxacillin sodium"
-#> [4] "floxapen" "floxapen sodium salt" "fluclox"
-#> [7] "flucloxacilina" "flucloxacillin" "flucloxacilline"
-#> [10] "flucloxacillinum" "fluorochloroxacillin" "staphylex"
-ab_atc("floxapen")
-#> [1] "J01CF05"
Import data from SPSS/SAS/Stata -
-RStudio -
-To work with R, probably the best option is to use RStudio. It is an -open-source and free desktop environment which not only allows you to -run R code, but also supports project management, version management, -package management and convenient import menus to work with other data -sources. You can also install RStudio Server on a -private or corporate server, which brings nothing less than the complete -RStudio software to you as a website (at home or at work).
-To import a data file, just click Import Dataset in the -Environment tab:
-If additional packages are needed, RStudio will ask you if they -should be installed on beforehand.
-In the the window that opens, you can define all options (parameters) -that should be used for import and you’re ready to go:
-If you want named variables to be imported as factors so it resembles
-SPSS more, use as_factor()
.
The difference is this:
-
-SPSS_data
-# # A tibble: 4,203 x 4
-# v001 sex status statusage
-# <dbl> <dbl+lbl> <dbl+lbl> <dbl>
-# 1 10002 1 1 76.6
-# 2 10004 0 1 59.1
-# 3 10005 1 1 54.5
-# 4 10006 1 1 54.1
-# 5 10007 1 1 57.7
-# 6 10008 1 1 62.8
-# 7 10010 0 1 63.7
-# 8 10011 1 1 73.1
-# 9 10017 1 1 56.7
-# 10 10018 0 1 66.6
-# # ... with 4,193 more rows
-
-as_factor(SPSS_data)
-# # A tibble: 4,203 x 4
-# v001 sex status statusage
-# <dbl> <fct> <fct> <dbl>
-# 1 10002 Male alive 76.6
-# 2 10004 Female alive 59.1
-# 3 10005 Male alive 54.5
-# 4 10006 Male alive 54.1
-# 5 10007 Male alive 57.7
-# 6 10008 Male alive 62.8
-# 7 10010 Female alive 63.7
-# 8 10011 Male alive 73.1
-# 9 10017 Male alive 56.7
-# 10 10018 Female alive 66.6
-# # ... with 4,193 more rows
Base R -
-To import data from SPSS, SAS or Stata, you can use the great haven
package
-yourself:
-# download and install the latest version:
-install.packages("haven")
-# load the package you just installed:
-library(haven)
You can now import files as follows:
-SPSS -
-To read files from SPSS into R:
-
-# read any SPSS file based on file extension (best way):
-read_spss(file = "path/to/file")
-
-# read .sav or .zsav file:
-read_sav(file = "path/to/file")
-
-# read .por file:
-read_por(file = "path/to/file")
Do not forget about as_factor()
, as mentioned above.
To export your R objects to the SPSS file format:
-
-# save as .sav file:
-write_sav(data = yourdata, path = "path/to/file")
-
-# save as compressed .zsav file:
-write_sav(data = yourdata, path = "path/to/file", compress = TRUE)
SAS -
-To read files from SAS into R:
-
-# read .sas7bdat + .sas7bcat files:
-read_sas(data_file = "path/to/file", catalog_file = NULL)
-
-# read SAS transport files (version 5 and version 8):
-read_xpt(file = "path/to/file")
To export your R objects to the SAS file format:
-
-# save as regular SAS file:
-write_sas(data = yourdata, path = "path/to/file")
-
-# the SAS transport format is an open format
-# (required for submission of the data to the FDA)
-write_xpt(data = yourdata, path = "path/to/file", version = 8)
Stata -
-To read files from Stata into R:
-
-# read .dta file:
-read_stata(file = "/path/to/file")
-
-# works exactly the same:
-read_dta(file = "/path/to/file")
To export your R objects to the Stata file format:
-
-# save as .dta file, Stata version 14:
-# (supports Stata v8 until v15 at the time of writing)
-write_dta(data = yourdata, path = "/path/to/file", version = 14)