Methods for reducing the number of sequences in molecular evolutionary analyses

Publication date: Available online 31 October 2019
Source: Meta Gene
Author(s): Yoshiyuki Suzuki, Maho Nishimura, Tatsuya Inoue, Yuki Kobayashi

Abstract

Owing to progress in sequencing technology, the number of pathogen nucleotide sequences deposited in public databases has been increasing rapidly. Consequently, in molecular evolutionary analyses of pathogens, it may occasionally be difficult to include all the available sequences, and it may be necessary to reduce the number of sequences so that computation finishes within a realistic time frame. Here, several methods for reducing the number of sequences were evaluated by the amount of evolutionary information contained in the retained sequences, measured as the total branch length of the phylogenetic tree (L). In the REA (random elimination in alignment) method, each sequence was eliminated with equal probability. In the phylogenetic tree-based methods, the sequences associated with short exterior branches were eliminated; the sequences to be eliminated were required to be neighbors of another sequence in the CNT (closest neighbor in tree) method, whereas no such restriction was imposed in the SET (shortest exterior branch in tree) method. In the distance matrix-based methods, the sequences with small average distances to other sequences were eliminated; the sequences to be eliminated were required to be closely related to another sequence in the CPM (closest pair in matrix) method, whereas no suc...
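To make the contrast between random and distance matrix-based elimination concrete, the following is a minimal Python sketch and not the authors' implementation. The function names (`p_distance`, `reduce_random`, `reduce_closest_pair`), the use of the p-distance, and the rule of dropping the member of the closest pair that has the smaller average distance to the other sequences are all assumptions made for illustration; the abstract does not specify these details.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def reduce_random(names, n_keep, seed=None):
    """REA-style reduction: every sequence is equally likely to be eliminated."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(names), n_keep))


def reduce_closest_pair(seqs, n_keep):
    """Hypothetical closest-pair reduction on a distance matrix.

    Repeatedly finds the closest pair of retained sequences and eliminates
    the member with the smaller average distance to the other sequences.
    This is an assumed reading of a matrix-based method, for illustration only.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    kept = dict(seqs)
    while len(kept) > n_keep:
        # Pairwise distances among the sequences still retained.
        dist = {
            frozenset(pair): p_distance(kept[pair[0]], kept[pair[1]])
            for pair in combinations(kept, 2)
        }

        def mean_dist(name):
            others = [n for n in kept if n != name]
            return sum(dist[frozenset((name, o))] for o in others) / len(others)

        # Closest pair, then drop its less divergent member.
        a, b = min(combinations(kept, 2), key=lambda p: dist[frozenset(p)])
        del kept[min((a, b), key=mean_dist)]
    return sorted(kept)
```

With a toy alignment such as `{"s1": "AAAAAAAA", "s2": "AAAAAAAT", "s3": "CCCCAAAA", "s4": "GGGGGGGG"}`, calling `reduce_closest_pair(seqs, 3)` removes one member of the nearly identical pair `s1`/`s2`, whereas `reduce_random(seqs, 3)` may remove any of the four sequences, including the most divergent one.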