CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection

J Bioinform Comput Biol. 2023 Oct;21(5):2350024. doi: 10.1142/S0219720023500245. Epub 2023 Oct 28.

ABSTRACT

O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequence characterization is performed by combining amino acid composition (AAC), undirected composition of k-spaced amino acid pairs (undirected-CKSAAP) and pseudo-position-specific scoring matrix (PsePSSM). The Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on a local chi-square test is designed, and cost-sensitive learning is then incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). Based on a 1:49 (positives:negatives) training set, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreov...
Source: Journal of Bioinformatics and Computational Biology - Category: Bioinformatics - Source Type: research
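
To make two of the sequence encodings named in the abstract concrete, here is a minimal Python sketch of AAC and an undirected k-spaced pair composition. It is not the authors' implementation. The abstract does not define "undirected", so the sketch assumes it means that a pair (a, b) and its reverse (b, a) are counted as the same feature, giving 210 unordered pairs per value of k. The function names, the normalization by the number of valid pairs, and the handling of non-standard residues are likewise illustrative assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All unordered residue pairs (assumed meaning of "undirected"): 20 * 21 / 2 = 210.
UNORDERED_PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def undirected_cksaap(window, k):
    """Composition of k-spaced residue pairs, ignoring pair order.

    Two residues form a k-spaced pair when exactly k residues lie between them.
    Pairs containing a non-standard character (e.g. padding) are skipped.
    """
    counts = Counter()
    n_pairs = 0
    for i in range(len(window) - k - 1):
        a, b = window[i], window[i + k + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts["".join(sorted((a, b)))] += 1  # "CA" and "AC" share one key
            n_pairs += 1
    total = n_pairs or 1
    return [counts[p] / total for p in UNORDERED_PAIRS]


def encode(window, max_k=2):
    """Concatenate AAC with undirected pair compositions for k = 0..max_k."""
    features = aac(window)
    for k in range(max_k + 1):
        features.extend(undirected_cksaap(window, k))
    return features
```

With `max_k=2`, `encode` returns 20 + 3 × 210 = 650 values per sequence window. The range of k used in the paper is not stated in the abstract, so `max_k` is a placeholder.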