CML-Cardio: a cascade machine learning model to predict cardiovascular disease risk as a primary prevention strategy

Abstract: Cardiovascular diseases (CVDs) are among the leading causes of mortality worldwide and are projected to cause more than 23 million deaths per year by 2030, according to the World Heart Federation. Although most of these diseases are preventable, population awareness strategies are still ineffective. In this context, we propose the CML-Cardio tool, a machine learning application that automates the classification of a person's risk of developing CVDs. For this, researchers in our group collected data on diabetes, blood pressure, and other risk factors at a private company. Our final model consists of a cascade system designed to handle highly imbalanced data. In the first stage, a binary model predicts whether a patient has a low risk of developing CVDs or a risk that needs attention. In this step, we evaluate six algorithms: logistic regression, SVM, random forest, XGBoost, CatBoost, and multilayer perceptron. The best results presented an average accuracy of 0.86 ± 0.03 and an F-score of 0.85 ± 0.04. We interpret each feature's impact on the models' output and validate the subsystem for the next step. In the second stage, we use an anomaly detection model to learn the intermediate-risk patterns present in the instances that need attention. The cascade model presented an average accuracy of 0.80 ± 0.07 and an F-score of 0.70 ± 0.07. Finally, we develop the CML-Cardio prototype of an actual application as a primary prevention strategy.

Graphical abstract: In this work, we propose t...
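The two-stage cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which first-stage algorithm or which anomaly detector was selected, so the rule `needs_attention`, the class `ZScoreDetector`, the two features (systolic blood pressure and fasting glucose), every threshold, and the toy records are hypothetical stand-ins. The sketch also assumes that a flagged record which deviates from the learned intermediate-risk pattern is labeled high risk, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

Features = Sequence[float]


@dataclass
class ZScoreDetector:
    """Toy anomaly detector (hypothetical stand-in for the stage-2 model).

    It learns the per-feature mean and spread of intermediate-risk records
    and flags a record when any feature lies too far from that pattern.
    """
    threshold: float = 3.0
    means: Tuple[float, ...] = ()
    stds: Tuple[float, ...] = ()

    def fit(self, rows: Sequence[Features]) -> "ZScoreDetector":
        columns = list(zip(*rows))
        self.means = tuple(mean(c) for c in columns)
        # Fall back to 1.0 when a feature has zero spread, so the
        # z-score below never divides by zero.
        self.stds = tuple(pstdev(c) or 1.0 for c in columns)
        return self

    def is_anomaly(self, x: Features) -> bool:
        return any(
            abs(v - m) / s > self.threshold
            for v, m, s in zip(x, self.means, self.stds)
        )


def cascade_predict(
    x: Features,
    needs_attention: Callable[[Features], bool],
    detector: ZScoreDetector,
) -> str:
    """Stage 1 separates low risk from 'needs attention'; stage 2 splits
    the flagged records into intermediate (fits the pattern) and high."""
    if not needs_attention(x):
        return "low"
    return "high" if detector.is_anomaly(x) else "intermediate"


# Hypothetical stage-1 rule standing in for a trained binary classifier.
# Features are (systolic blood pressure, fasting glucose); cutoffs are arbitrary.
def needs_attention(x: Features) -> bool:
    return x[0] >= 130 or x[1] >= 100


# Made-up intermediate-risk records used only to fit the toy detector.
intermediate_rows = [(135, 105), (140, 110), (138, 102), (132, 108), (145, 112)]
detector = ZScoreDetector(threshold=3.0).fit(intermediate_rows)

for record in [(118, 90), (139, 106), (180, 160)]:
    print(record, cascade_predict(record, needs_attention, detector))
```

Fitting the second stage only on intermediate-risk records is what lets a cascade of this kind cope with class imbalance: the rare high-risk cases never have to be modeled directly, only recognized as departures from the majority pattern among flagged patients.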
Source: Medical and Biological Engineering and Computing - Category: Biomedical Engineering - Source Type: research