Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences

Abstract: Useful insight into the evolution of genes and gene families can be gained by analysing all available genome datasets rather than just a few, which are usually those of model species. Handling such datasets and transforming them into the format required for downstream analyses is, however, often a difficult and time-consuming task for researchers without a background in informatics. Therefore, we present two simple and fast protocols for data preparation, using an easy-to-install, open-source, cross-platform software application with a rich, user-friendly graphical user interface (SEDA; http://www.sing-group.org/seda/index.html). The first protocol is a substantial improvement over one recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88–96 (2019) [1]), which was used to study the evolution of GULO, a gene that encodes the enzyme responsible for the last step of vitamin C synthesis. In this paper, we show how the sequence data file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation “Remove isoforms”. This protocol can be used to easily show that putative functional GULO genes are present in several Protostomian groups such as Molluscs, Priapulida and Arachnida. Such findings could have been easily missed if only a few Protostomian model sp...
Source: Interdisciplinary Sciences, Computational Life Sciences - Category: Bioinformatics Source Type: research
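
The abstract refers to removing coding sequence isoforms before phylogenetic analysis. The idea can be illustrated with a minimal Python sketch. This is not SEDA's "Remove isoforms" implementation, whose selection criteria are not described in the abstract. It assumes, purely for illustration, that each FASTA header carries a `[gene=...]` tag and that the longest sequence per gene is the one to keep. The helper names `parse_fasta` and `remove_isoforms` are hypothetical.

```python
import re

# Assumption: headers contain a tag such as "[gene=GULO]".
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_isoforms(records):
    """Keep one record per gene: the longest sequence (illustrative criterion).

    Records without a gene tag are grouped by their full header,
    so they are kept as they are.
    """
    best = {}
    for header, seq in records:
        match = GENE_TAG.search(header)
        key = match.group(1) if match else header
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (header, seq)
    return list(best.values())


# Made-up example data, not taken from the paper.
example = """>cds1 [gene=GULO] isoform X1
ATGAAA
>cds2 [gene=GULO] isoform X2
ATGAAACCCGGG
>cds3 [gene=OTHER]
ATGTTT
"""

kept = remove_isoforms(parse_fasta(example))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

In this toy input, the two GULO entries collapse to the longer one and the untagged-duplicate-free OTHER entry is retained. A real workflow would need a selection rule suited to the dataset, for example choosing the isoform closest to an expected reference length rather than simply the longest.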