Feature selection using Logistic Regression in Case–Control DNA methylation data of Parkinson's disease: A Comparative study

Publication date: Available online 16 August 2018
Source: Journal of Theoretical Biology
Author(s): Aishwarya Kakade, Baby Kumari, Pankaj Singh Dholaniya

Abstract
Parkinson's Disease (PD) is a progressive neurological disorder caused by the degeneration of dopaminergic neurons in the substantia nigra pars compacta. The pathogenesis of the disease is not fully understood, but it has been linked to complex genetic, epigenetic and environmental interactions. A substantial number of studies have shown a role for epigenetic modifications in the progression of PD. In the present study, we analyzed data containing the methylation patterns of 1726 transcripts captured from 66 samples of 450k data, comprising 43 controls and 23 diseased samples. We used Logistic Regression (LR) for feature reduction and built a classifier with higher accuracy than one trained on all features together. The performance of the classifier was compared with that of two other feature reduction approaches, namely Random Forest (RF) and Principal Component Analysis (PCA). Feature reduction with LR and RF performed better than PCA. Some of the features uniquely identified by LR correspond to genes such as COMT, DCTN1 and PRNP, which are reported to play a significant role in PD.
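The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are random numbers that only mimic the reported shape (66 samples, 1726 features, 43 controls and 23 cases), and the choice of scikit-learn, L2-regularized logistic regression ranked by absolute coefficient, 20 retained features, 200 trees and 5-fold cross-validation are all assumptions made for illustration. The helper names `make_lr_selector`, `make_rf_selector`, `compare_reducers` and `top_features` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the methylation matrix (NOT the study data):
# 43 controls (0) and 23 cases (1), 1726 features, 20 of them shifted in cases.
rng = np.random.default_rng(0)
y = np.array([0] * 43 + [1] * 23)
X = rng.normal(size=(66, 1726))
X[y == 1, :20] += 1.0

N_KEEP = 20  # assumed number of features/components to retain


def make_lr_selector(n_keep=N_KEEP):
    # Keep the n_keep features with the largest |coefficient| in a fitted LR model.
    return SelectFromModel(LogisticRegression(max_iter=1000),
                           max_features=n_keep, threshold=-np.inf)


def make_rf_selector(n_keep=N_KEEP):
    # Keep the n_keep features with the largest random-forest importance.
    return SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                           max_features=n_keep, threshold=-np.inf)


def compare_reducers(X, y, n_keep=N_KEEP, seed=0):
    """Mean cross-validated accuracy of an LR classifier after each reduction."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    reducers = {
        "all_features": None,
        "LR": make_lr_selector(n_keep),
        "RF": make_rf_selector(n_keep),
        "PCA": PCA(n_components=n_keep, random_state=seed),
    }
    scores = {}
    for name, reducer in reducers.items():
        steps = [StandardScaler()]
        if reducer is not None:
            steps.append(reducer)
        steps.append(LogisticRegression(max_iter=1000))
        # The reducer sits inside the pipeline, so it is refit on each training fold.
        scores[name] = cross_val_score(make_pipeline(*steps), X, y, cv=cv).mean()
    return scores


def top_features(selector, X, y):
    """Indices of the features a selector keeps when fitted on all samples."""
    X_scaled = StandardScaler().fit_transform(X)
    return set(selector.fit(X_scaled, y).get_support(indices=True))


scores = compare_reducers(X, y)
lr_features = top_features(make_lr_selector(), X, y)
rf_features = top_features(make_rf_selector(), X, y)
lr_only = lr_features - rf_features  # features picked by LR but not by RF

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("features selected only by LR:", sorted(lr_only))
```

Placing each reducer inside the pipeline keeps feature selection within the training folds, which avoids leaking test-fold labels into the selection step. The set difference at the end mirrors the idea of features "uniquely identified by LR", here relative to RF only.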
Category: Biology - Source Type: research