Protein coding regions prediction by fusing DNA shape features

N Biotechnol. 2024 Jan 3:S1871-6784(23)00074-2. doi: 10.1016/j.nbt.2023.12.006. Online ahead of print.

ABSTRACT

Coding exons are often embedded among introns, and the two differ greatly in length. As a result, deep learning-based protein coding region prediction methods often perform poorly on genomes with more complex structure. DNA shape information also helps reveal the underlying logic of gene expression, yet current methods ignore DNA shape features when distinguishing coding from non-coding regions. We propose a protein-coding region prediction method based on the CNNS-BRNN model, which incorporates DNA shape features to improve the model's ability to distinguish intronic from exonic features. The method uses a fusion coding technique that combines DNA shape features with traditional sequence features. Experiments show that it outperforms the baseline method by 2.3% in AUC and 5.3% in F1, and that the fusion coding that introduces DNA shape features significantly improves model performance.

PMID:38182076 | DOI:10.1016/j.nbt.2023.12.006
Source: New Biotechnology. Category: Biotechnology. Source type: research.
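The idea of fusion coding, where each sequence position carries both a sequence encoding and DNA shape values, can be illustrated with a minimal sketch. It assumes that shape features (for example minor groove width, MGW, or Roll, as predicted by tools such as DNAshapeR) have already been computed and aligned to one value per base. The function names `one_hot` and `fuse_features`, the zero fill for missing end positions, and the alphabetical column order are hypothetical choices made for illustration. The abstract does not describe how the authors encode or concatenate features, so this is not their CNNS-BRNN input pipeline. It only shows the kind of per-position feature matrix that a convolutional or bidirectional recurrent model could consume.

```python
"""Hypothetical sketch: fuse one-hot sequence encoding with per-base DNA shape values."""
import math
from typing import Dict, List, Optional, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode each base as a 4-dim one-hot vector; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def _clean(value: Optional[float], missing: float) -> float:
    """Replace None or NaN (e.g. undefined shape values at sequence ends) with `missing`."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return missing
    return float(value)


def fuse_features(
    seq: str,
    shapes: Dict[str, Sequence[Optional[float]]],
    missing: float = 0.0,
) -> List[List[float]]:
    """Concatenate one-hot columns with one column per shape feature.

    `shapes` maps a feature name (e.g. "MGW") to a list with one value per
    position of `seq`. These values are assumed to be precomputed by an
    external tool. Feature names are sorted so the column order is deterministic.
    """
    n = len(seq)
    for name, values in shapes.items():
        if len(values) != n:
            raise ValueError(f"shape feature {name!r} has {len(values)} values, expected {n}")
    names = sorted(shapes)
    encoded = one_hot(seq)
    return [
        row + [_clean(shapes[name][i], missing) for name in names]
        for i, row in enumerate(encoded)
    ]


if __name__ == "__main__":
    seq = "ACGTN"
    # Placeholder numbers for illustration only, not real DNA shape predictions.
    shapes = {"MGW": [None, 5.0, 4.5, 5.2, None], "Roll": [0.0, -1.0, 2.0, 1.5, 0.0]}
    for row in fuse_features(seq, shapes):
        print(row)
```

Each output row has four sequence columns followed by one column per shape feature. In practice, shape values on different scales would usually be normalized before training, but that step is omitted here.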