The Helitron family classification using SVM based on Fourier transform features applied on an unbalanced dataset

Abstract

Helitrons are mobile sequences that belong to class 2 of eukaryotic transposons. They are distinguished by their transposition mechanism, the rolling-circle mechanism. They play an important role in remodeling proteomes because they can modify existing genes and introduce new ones. A major difficulty in identifying and classifying Helitron families comes from their complex structure, their unspecified length, and the unbalanced number of occurrences of each Helitron type. Helitron recognition remains unsolved in the literature. The purpose of this paper is to characterize and classify Helitron types using spectral features and a support vector machine (SVM) classifier. To this end, the helitronic DNA is transformed into a numerical form using the FCGS2 coding technique. Then, a set of spectral features is extracted from the smoothed Fourier transform applied to the FCGS2 signals. Based on the spectral signature and the confusion matrix of the classification, we demonstrate that some specific classes that do not show similarities, such as HelitronY2 and NDNAX3, are easily discriminated, with accuracy rates exceeding 90%. However, some Helitron types show strong similarities, namely Helitron1, HelitronY1, HelitronY1A, and HelitronY4. Our system is also able to predict them, with promising values reaching 70%.

Graphical abstract: The Helitron recognizer based on features extracted from the smoothed Fourier transform
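The coding and spectral steps described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: FCGS2 is taken to assign each position the relative frequency of the dinucleotide starting there, the smoothing is a simple centered moving average, and the three summary features are chosen only for illustration. The function names (`fcgs2_signal`, `dft_magnitude`, `smooth`, `spectral_features`) are hypothetical and are not the authors' implementation.

```python
import cmath
from collections import Counter


def fcgs2_signal(seq):
    """Assumed FCGS2 coding: each position gets the relative frequency,
    within the whole sequence, of the dinucleotide starting there."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[p] / total for p in pairs]


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform for the non-negative
    frequency bins, computed after removing the mean. Direct O(n^2) sum."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(centered))
        mags.append(abs(acc))
    return mags


def smooth(values, width=3):
    """Centered moving average; the window shrinks at the edges.
    (Assumed smoothing; the paper's exact method is not given here.)"""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def spectral_features(seq, width=3):
    """Illustrative features from the smoothed spectrum of one sequence."""
    spectrum = smooth(dft_magnitude(fcgs2_signal(seq)), width)
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    return {
        "peak_bin": peak,
        "peak_value": spectrum[peak],
        "energy": sum(v * v for v in spectrum),
    }
```

Feature vectors of this kind could then be passed to an SVM. With an unbalanced dataset, one common option is per-class weighting, for example `class_weight="balanced"` in scikit-learn's `SVC`; whether the authors used such weighting is not stated in the abstract. For sequences of realistic length, a fast Fourier transform such as `numpy.fft.rfft` would replace the direct sum used here.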
Source: Medical and Biological Engineering and Computing - Category: Biomedical Engineering Source Type: research