IJERPH, Vol. 17, Pages 1828: Stroke Prediction with Machine Learning Methods among Older Chinese

International Journal of Environmental Research and Public Health
doi: 10.3390/ijerph17061828
Authors: Yafei Wu, Ya Fang

Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3....
Source: International Journal of Environmental Research and Public Health - Category: Environmental Health - Tags: Article - Source Type: research
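
The abstract names three data balancing techniques (ROS, RUS, SMOTE) and reports sensitivity and specificity. The following is a minimal sketch of what those steps do, written in plain Python under stated assumptions: it is not the authors' code, the feature rows are synthetic random numbers rather than the study data, and the function names (`random_over_sample`, `random_under_sample`, `smote`, `sensitivity_specificity`) are hypothetical. Only the class sizes (56 stroke, 1075 non-stroke) come from the abstract, and the SMOTE function is a simplified version of the technique.

```python
import random


def random_over_sample(minority, majority, rng):
    """ROS: resample minority rows with replacement until the classes match."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_under_sample(minority, majority, rng):
    """RUS: keep a random majority subset the size of the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: dist2(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


rng = random.Random(0)
# Synthetic two-feature rows, NOT the study data (assumption for illustration).
stroke = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(56)]
no_stroke = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1075)]

ros_min, ros_maj = random_over_sample(stroke, no_stroke, rng)
rus_min, rus_maj = random_under_sample(stroke, no_stroke, rng)
smote_min = stroke + smote(stroke, len(no_stroke) - len(stroke), k=5, rng=rng)

print(len(ros_min), len(ros_maj))  # 1075 1075
print(len(rus_min), len(rus_maj))  # 56 56
print(len(smote_min))              # 1075
```

In practice, balancing would normally be applied to the training split only, so that the evaluation data keep the original class ratio; whether the study did this is not stated in the text above.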