StrucBreak: A Computational Framework for Structural Break Detection in DNA Sequences

Abstract Damage or breaks in DNA may change the characteristics of a genome and cause various diseases. In this work we construct a system that uses a maximum-likelihood-based probabilistic formulation to estimate the number of damage events that have occurred in a given DNA sequence. The approach has been benchmarked step by step on simulated data sets so that its outcomes can be compared with a ground truth or reference value. First, the ordering of the sequence data set is checked with the statistical cumulative sum (STACUMSUM). Prior and posterior probabilities are then computed for the verified sequences to estimate the percentages of breaks and mutations. Maximum-likelihood estimation then determines the exact numbers and positions of the detected breaks. In database manipulation, one factor that determines the orientation and order of a sequence is the geometric distance between consecutive sequences. This distance is measured to give a smooth representation of the genome or DNA sequences. Finally, we compared the performance of our system with DAMBE5 (A Comprehensive Software Package for Data Analysis in Molecular Biology and Evolution); in terms of time and space complexity, StrucBreak is much faster and uses much less memory, owing to its algorithmic approach.
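The abstract names two building blocks, a cumulative-sum check and maximum-likelihood estimation of break positions, without giving formulas. The sketch below illustrates the general idea on a toy sequence and is not the StrucBreak implementation. It rests on several assumptions that do not come from the paper: the sequence is reduced to a binary GC indicator, each segment is modelled as Bernoulli, only a single break is searched for, and all function names (`gc_indicator`, `cusum`, `ml_single_break`) are hypothetical.

```python
import math


def gc_indicator(seq):
    """Encode a DNA string as 1 for G/C and 0 for any other base (assumed encoding)."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def cusum(values):
    """Cumulative sum of deviations from the overall mean.

    A pronounced peak or trough in this path suggests a change in composition.
    """
    mean = sum(values) / len(values)
    total, path = 0.0, []
    for v in values:
        total += v - mean
        path.append(total)
    return path


def bernoulli_loglik(ones, n):
    """Maximised Bernoulli log-likelihood of a segment with `ones` successes in n trials."""
    if n == 0 or ones == 0 or ones == n:
        return 0.0  # p_hat is 0 or 1, so the maximised likelihood is 1
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1 - p)


def ml_single_break(values):
    """Return (k, gain) for the split values[:k] | values[k:] with the highest likelihood.

    `gain` is the log-likelihood improvement over the no-break model;
    k is None when no split improves on it.
    """
    n = len(values)
    total = sum(values)
    base = bernoulli_loglik(total, n)
    best_k, best_gain = None, 0.0
    left = 0
    for k in range(1, n):
        left += values[k - 1]
        gain = (bernoulli_loglik(left, k)
                + bernoulli_loglik(total - left, n - k)
                - base)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


if __name__ == "__main__":
    # Toy sequence: 20 A/T bases followed by 20 G/C bases.
    values = gc_indicator("AT" * 10 + "GC" * 10)
    k, gain = ml_single_break(values)
    print("estimated break after position", k)
```

On this toy input the likelihood search places the break after position 20, and the CUSUM path reaches its extreme at the same point. Detecting several breaks, as the abstract describes, would need an extension such as recursive splitting with a significance threshold on `gain`, which this sketch leaves out.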
Source: Interdisciplinary Sciences, Computational Life Sciences - Category: Bioinformatics - Source Type: research