Breast cancer classification with reduced feature set using association rules and support vector machine

Abstract: In the last few years, machine learning has become one of the driving forces of science and industry, but the growth of data requires paradigm shifts in how traditional machine learning techniques are applied, especially in healthcare. Furthermore, with the availability of different clinical technologies, many tumor features are now collected for breast cancer classification. As a result, feature selection and accuracy improvement have become challenging and time-consuming tasks. The approach proposed in this paper has two stages. In the first, Association Rules (AR) are used to eliminate insignificant features. In the second, several classifiers are applied to classify incoming tumors. Using AR, the feature space dimension is reduced from nine attributes to eight and to four. In the test stage, threefold cross-validation was applied to the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the University of California Irvine machine learning repository to evaluate the performance of the proposed system. The Support Vector Machine (SVM) model with AR achieves the highest classification accuracy: 98.00% with eight attributes and 96.14% with four attributes. The results show that the proposed approach can be used to reduce the feature space and save time during the training phase, leading to more accurate and faster automatic classification systems.
Source: Network Modeling Analysis in Health Informatics and Bioinformatics - Category: Bioinformatics Source Type: research
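
The abstract does not spell out how the association rules are scored, so the following is only a minimal sketch of the first stage under stated assumptions: features are binarized (1 = "high"), each feature is scored by the support and confidence of the rule "feature is high -> tumor is malignant", and features whose rule falls below hypothetical thresholds are dropped. The toy data, the feature names, the thresholds, and the helper names (`rule_stats`, `select_features`, `three_fold_indices`) are all illustrative and do not come from the paper. The classifier stage is omitted; in practice an SVM from a library (for example scikit-learn's `SVC`) would be trained on the retained columns within each fold.

```python
from typing import List, Sequence, Tuple


def rule_stats(column: Sequence[int], labels: Sequence[int], target: int = 1) -> Tuple[float, float]:
    """Support and confidence of the rule (feature == 1) -> (label == target)."""
    n = len(labels)
    antecedent = sum(1 for v in column if v == 1)
    both = sum(1 for v, y in zip(column, labels) if v == 1 and y == target)
    support = both / n
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


def select_features(
    rows: Sequence[Sequence[int]],
    labels: Sequence[int],
    names: Sequence[str],
    min_support: float,
    min_confidence: float,
) -> List[str]:
    """Keep features whose rule meets both (assumed) thresholds."""
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        support, confidence = rule_stats(column, labels)
        if support >= min_support and confidence >= min_confidence:
            kept.append(name)
    return kept


def three_fold_indices(n: int, k: int = 3) -> List[List[int]]:
    """Assign sample i to fold i % k; each fold serves once as the test set."""
    return [list(range(i, n, k)) for i in range(k)]


# Toy, hand-made data (NOT the WBCD dataset): 6 samples, 3 binarized features.
NAMES = ["feature_a", "feature_b", "feature_c"]  # hypothetical names
ROWS = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = malignant, 0 = benign

kept = select_features(ROWS, LABELS, NAMES, min_support=0.2, min_confidence=0.6)
folds = three_fold_indices(len(ROWS))
print(kept)   # ['feature_a', 'feature_b']
print(folds)  # [[0, 3], [1, 4], [2, 5]]
```

On this toy data, `feature_c` is dropped because its rule has confidence 1/3, which mirrors in miniature the reduction of the attribute set described in the abstract. The threshold values control how aggressively features are removed and would need tuning on real data.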