An efficient Burrows-Wheeler transform-based aligner for short read mapping

Comput Biol Chem. 2024 Mar 5;110:108050. doi: 10.1016/j.compbiolchem.2024.108050. Online ahead of print.

ABSTRACT

Read mapping, a foundation of computational biology, has become a bottleneck as sequencing throughput explodes. In this work, we present an efficient Burrows-Wheeler transform-based aligner for next-generation sequencing (NGS) short reads. First, we propose a difference-aware classification strategy that assigns specific reads to computationally cheaper search modes, and we present acceleration techniques such as a seed pruning method based on the maximum coverage interval property, which reduces redundant locating of candidate regions, and a redesigned LF calculation that supports fast queries. Then, we propose a heuristic verification step to determine the best mapping from a large number of flanking sequences. With low-distortion string embedding incorporated, most dissimilar sequences are filtered out cheaply, and the highly similar sequences that remain are well suited to the wavefront alignment algorithm. We provide a full-spectrum benchmark across different read lengths. The results show that our method is 1.3-1.4 times faster than state-of-the-art Burrows-Wheeler transform-based methods (including bowtie2, bwa-MEM, and hisat2) on 101bp reads and 1.5-13 times faster on 750bp to 1000bp reads, with comparable memory usage and accuracy. However, hash-based methods (including Strobealign, Minimap2, and ...
Source: Computational Biology and Chemistry - Category: Bioinformatics - Source Type: research
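
For readers unfamiliar with the LF calculation mentioned in the abstract, the following is a minimal sketch of textbook FM-index backward search, in which each pattern character narrows a suffix-array interval through one LF-style update. It is not the redesigned LF calculation from the paper, whose details the abstract does not give. The function names `build_fm_index` and `backward_search`, the uncompressed occurrence table, and the naive suffix sorting are all assumptions made for illustration and would not scale to a real genome.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, occurrence table).

    Illustrative only: suffixes are sorted naively and the occurrence
    table is stored uncompressed, which is far too slow and large for
    genome-scale references.
    """
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character for each suffix is the character preceding it
    # (text[-1] is the sentinel, which handles the suffix at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))

    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        # LF-style update: map the interval to suffixes preceded by ch.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

With the toy reference `"ACGTACGTAC"`, calling `backward_search("ACG", *build_fm_index("ACGTACGTAC"))` returns `[0, 4]`. Each loop iteration costs two table lookups regardless of reference length, which is why the speed of the LF step matters so much for aligners of this family.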