EAPhy: A Flexible Tool for High-throughput Quality Filtering of Exon-alignments and Data Processing for Phylogenetic Methods

Introduction
High-Throughput Sequencing (HTS) has revolutionised the field of phylogenetics by enabling researchers to investigate the evolutionary relationships between taxa with large-scale multi-locus datasets [1,2]. The development of these methods has been driven by the realisation that including many genetic markers helps to account for the stochastic coalescent histories of individual genes [3,4,5,6]. Species tree inference methods use the multispecies coalescent model to estimate potential gene tree – species tree discordance, and large numbers of unlinked loci represent a greater sample of the gene tree distribution underlying the true species tree [6]. However, while phylogenetic estimation might improve when many loci are sequenced [4,5,6,7], the requirement for high-quality sequence alignments remains unchanged and is fundamental to the correct inference of phylogenetic hypotheses. Existing alignment methods can be extended to large-scale multi-locus datasets, but visual inspection of each alignment, the traditional approach for assessing alignment quality, is challenging with thousands of sequenced loci [8]. Because visual inspection is impractical at this scale, the impact of missing data in large phylogenomic datasets is often explored only nominally, and the potential consequences of distinct alignment filtering criteria remain unknown. Nonetheless, contradictory opinions coexist [9,10,11] regarding the effect of missing data on phylogenetic inference and it is th...
Source: PLOS Currents Tree of Life - Category: Genetics & Stem Cells - Source Type: research