A novel riboswitch classification based on imbalanced sequences achieved by machine learning

by Solomon Shiferaw Beyene, Tianyi Ling, Blagoj Ristevski, Ming Chen

Riboswitch, a part of regulatory mRNA (50–250 nt in length), has two main classes: aptamer and expression platform. One of the main challenges in riboswitch classification is imbalanced data, a circumstance in which the sequences recorded for one group are far fewer than those for the others. Such an imbalance leads a classifier to ignore the minority group and emphasize the majority ones, which results in a skewed classification. To be in accord with recent riboswitch classification work, we considered sixteen riboswitch families that contain imbalanced sequences. The sequences were split into training and test sets using a newly developed pipeline. From the 5460 k-mers produced (k from 1 to 6), 156 features were selected using the CfsSubsetEval and BestFirst functions found in WEKA 3.8. Statistical testing showed a significant difference between balanced and imbalanced sequences (p
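The figure of 5460 k-mers matches the number of distinct words of length 1 to 6 over a four-letter nucleotide alphabet (4 + 16 + 64 + 256 + 1024 + 4096). The following is a minimal sketch of how such a feature vector could be built. It is not the authors' pipeline: the function names, the use of raw counts rather than normalized frequencies, and the conversion of T to U are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; DNA input is converted below


def all_kmers(k_min=1, k_max=6):
    """Enumerate every possible k-mer for k_min <= k <= k_max, in a fixed order."""
    return [
        "".join(letters)
        for k in range(k_min, k_max + 1)
        for letters in product(ALPHABET, repeat=k)
    ]


def kmer_counts(sequence, k_min=1, k_max=6):
    """Return raw k-mer counts for one sequence, aligned with all_kmers().

    Windows containing characters outside ALPHABET (for example N) are
    counted internally but never read back, so they are effectively ignored.
    """
    seq = sequence.upper().replace("T", "U")
    counts = Counter()
    for k in range(k_min, k_max + 1):
        # Slide a window of width k across the sequence.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return [counts[kmer] for kmer in all_kmers(k_min, k_max)]


features = all_kmers()
vector = kmer_counts("ACGUACGU")
```

Each sequence then yields one vector of length 5460, which is the kind of table a feature-selection step would reduce to a smaller subset.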
Source: PLoS Computational Biology - Category: Biology - Source Type: research