Scripts to create a dataset from Redcap outputs to use for a PLS-DA classification.
Dijkhof 12a8d9bf57 README.md
Upload
2021-07-05 16:35:51 +02:00
DatesParser.py the rest of the scripts 2021-07-01 15:22:36 +02:00
DemoParser.py the rest of the scripts 2021-07-01 15:22:36 +02:00
FinalDF_Parser.py the rest of the scripts 2021-07-01 15:22:36 +02:00
PAParser.py the rest of the scripts 2021-07-01 15:22:36 +02:00
PLS.py test 2021-07-01 15:20:19 +02:00
README.md README.md 2021-07-05 16:35:51 +02:00
ScatterBoxplotter.py the rest of the scripts 2021-07-01 15:22:36 +02:00


This folder contains the scripts I used for my Master's graduation project.

In this project I attempted a supervised classification of older cancer patients who develop a postoperative complication, using a broad set of variables. Partial Least Squares Discriminant Analysis (PLS-DA) was used to classify this diverse and imbalanced dataset. The data ranged from activity data extracted from Fitbits to medical data gathered from the patients' hospital files.

The format of the dataset is based on the data exports from the project's REDCap page. For more information, please contact Maarten Lahr (Department of Epidemiology, UMCG) or Barbara van Leeuwen (Department of Surgery, UMCG).

Before running the scripts, store the following data files together in one folder:

  • Demographics
  • Surgery and admission + Complications (both Redcap instruments combined in one .csv)
  • Data SACM
  • Baseline Assessment (T1=b)
  • Completion Data

The following files are stored on the Surgery hard drive in the digital UMCG environment:

  • 8x combined PA files
  • Patient uuid with email and patientnumber.csv (Eurecat, raw .csv)
  • PhysicalActivities_umcg.csv (Eurecat, raw .csv)
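Before starting the parsers it can help to confirm that the data folder is complete. The following is a minimal sketch, not part of the project scripts. The two Eurecat file names are taken from the list above; the REDCap export names are placeholders, because the actual export file names are not given here.

```python
from pathlib import Path

# The two Eurecat names come from the list above; the REDCap names are
# hypothetical placeholders and must be replaced with the real export names.
EXPECTED_FILES = [
    "Demographics.csv",                               # placeholder name
    "Surgery_admission_complications.csv",            # placeholder name
    "Patient uuid with email and patientnumber.csv",
    "PhysicalActivities_umcg.csv",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling `missing_files("path/to/data")` returns an empty list when everything is in place, so the result can be used directly in an `if` check before running the first script.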

After downloading all data, the scripts should be run in the following order:

  1. DemoParser.py - Transforms the REDCap output into a usable dataframe with the patients' demographics

  2. DateParser.py - Uses several REDCap exports to create a dataframe with all important dates and the number of days between events

  3. FinalDF_Parser.py - Creates a dataframe with the scores from all test moments

  4. EurecatParser.py - Script to create physical activity data

  5. FinalCombiner.py - Script that combines all datasets into one final dataframe

  6. PLS.py - Performs the PLS-DA and produces the R2-Q2, Wold's R, PRESS and ROC-AUC plots

  7. ScatterBoxplotter.py - Plots scatter-boxplots from the final dataset
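The run order above could be automated with a small driver script. This is only a sketch: it assumes that every script can be started without command-line arguments and that it finds its input files on its own. The script names are copied from the list; adjust them if the files in your copy of the repository are named differently (the file listing shows, for example, DatesParser.py and PAParser.py).

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the numbered list above.
SCRIPT_ORDER = [
    "DemoParser.py",
    "DateParser.py",
    "FinalDF_Parser.py",
    "EurecatParser.py",
    "FinalCombiner.py",
    "PLS.py",
    "ScatterBoxplotter.py",
]


def run_pipeline(script_dir=".", run=subprocess.run):
    """Run each script in order with the current Python interpreter.

    `check=True` makes subprocess.run raise CalledProcessError as soon as
    one script fails, so later steps never run on incomplete output.
    The `run` parameter exists only so the function can be tried out
    without actually starting the scripts.
    """
    for name in SCRIPT_ORDER:
        script = Path(script_dir) / name
        print(f"Running {name} ...")
        run([sys.executable, str(script)], check=True)
```

Calling `run_pipeline("path/to/scripts")` executes the seven steps one after another and stops at the first failure.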