Principal Investigator: Xianjun Dong
Parkinson’s disease (PD) affects more than 7 million people worldwide, and biomarkers to bolster the therapeutic pipeline are urgently needed. The Accelerating Medicines Partnership in Parkinson’s disease (AMP PD) consortium has provided an unprecedented opportunity for investigators to use its data to build a platform for early PD diagnosis, which could lead to improved treatment response and higher efficacy.
Meanwhile, advances in machine learning and artificial intelligence can help maximize the value of the massive amount of readily available data. In our work, we leveraged RNA-seq data to identify the most significantly differentially expressed genes between PD patients and healthy controls, which are used to develop a prediction model for PD diagnosis. To improve performance, the most predictive genes were selected using the LASSO method. All of our models discriminated PD patients from healthy controls on the testing data, with areas under the receiver operating characteristic curve of roughly 73% to 76%.
Based on our preliminary results and the available data, we will construct an advanced model that combines PD-associated RNA and DNA signals using state-of-the-art deep learning techniques. This work will develop and test highly innovative multi-omics classifiers and provide a generally useful computational framework for large-scale, unbiased PD biomarker discovery.
Developing biomarkers for clinical use is a difficult process that requires evaluation of multiple large cohorts, each adding confidence to the marker. The Accelerating Medicines Partnership in Parkinson’s disease (AMP PD) consortium provides an unparalleled opportunity to rapidly achieve this previously elusive goal.
In this pilot work, we used RNA-seq data from the PPMI cohort in AMP PD. A total of 691 PD cases and 502 healthy controls were randomly split into a training set (80%) and a testing set (20%). We trained our models on 697 differentially expressed genes identified with a relaxed cutoff (FDR < 0.1). Of these, 46 genes were selected through LASSO feature selection as our predictors for PD diagnosis. Three models were evaluated on the testing set, with areas under the receiver operating characteristic curve (AUROCs) of 74.44%, 72.86%, and 75.65% for logistic regression, random forest, and support vector machine, respectively.
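The workflow described above (80/20 split, LASSO-based feature selection, then three classifiers compared by AUROC) can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes scikit-learn, replaces the AMP PD expression data with a synthetic matrix, approximates LASSO selection with an L1-penalized logistic regression, and uses illustrative hyperparameters (such as `C=0.1`) that are not taken from our analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a samples-by-genes expression matrix (NOT AMP PD data).
X, y = make_classification(n_samples=600, n_features=200,
                           n_informative=15, random_state=0)

# Random 80/20 split, stratified so both sets keep the case/control ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit the scaler on the training set only, to avoid leaking test information.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# LASSO-style selection: keep features with non-zero L1-penalized coefficients.
# C=0.1 is an assumed, illustrative regularization strength.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_train_s, y_train)
selected = np.flatnonzero(lasso.coef_[0])

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "support vector machine": SVC(kernel="rbf"),
}


def positive_scores(model, features):
    """Return a continuous score for the positive class, suitable for AUROC."""
    if hasattr(model, "decision_function"):
        return model.decision_function(features)
    return model.predict_proba(features)[:, 1]


aucs = {}
for name, model in models.items():
    model.fit(X_train_s[:, selected], y_train)
    aucs[name] = roc_auc_score(
        y_test, positive_scores(model, X_test_s[:, selected]))
    print(f"{name}: AUROC = {aucs[name]:.3f} ({len(selected)} features)")
```

In this sketch the scaler and the feature selection are both fit on the training set alone, so the testing set plays no part in choosing the predictors.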
These are the first results of an ongoing investigation. As a next step, we will train our model in one cohort (e.g., PPMI) and test it in independent cohorts (e.g., PDBP and BioFIND). We will also build multi-omics classifiers by combining PD-associated RNA and DNA signals with state-of-the-art deep learning techniques (e.g., a variational autoencoder).
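The planned cross-cohort validation could take roughly the following shape. This is a hedged sketch under stated assumptions: the data are synthetic, the cohort labels `cohort_A`, `cohort_B`, and `cohort_C` are hypothetical placeholders rather than real AMP PD cohorts, the additive per-cohort shift is only a crude stand-in for batch effects, and a plain logistic regression stands in for whichever classifier is eventually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic samples-by-features matrix with hypothetical cohort labels.
X, y = make_classification(n_samples=900, n_features=50,
                           n_informative=10, random_state=0)
cohort_names = ["cohort_A", "cohort_B", "cohort_C"]
cohort = rng.choice(cohort_names, size=len(y))

# Crude simulated batch effect: a constant offset per cohort.
for i, name in enumerate(cohort_names):
    X[cohort == name] += 0.3 * i

# Train on one cohort only; scaling parameters come from that cohort alone.
train_mask = cohort == "cohort_A"
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X[train_mask], y[train_mask])

# Evaluate on each independent cohort without refitting anything.
external_auc = {}
for name in cohort_names[1:]:
    mask = cohort == name
    scores = model.decision_function(X[mask])
    external_auc[name] = roc_auc_score(y[mask], scores)
    print(f"{name}: AUROC = {external_auc[name]:.3f} (n = {mask.sum()})")
```

Keeping every fitted step inside one pipeline trained on a single cohort means the reported external AUROCs reflect generalization to unseen cohorts rather than information shared across them.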