Best Practices for Data Sharing in Phylogenetic Research

Introduction

Phylogenetic data have grown rapidly in amount, quality, and availability over the past few decades. Additionally, phylogenetic matrices and trees are often based on, and need to be linked to, data on traits, geographic distribution, and genetic or genomic sequences. Despite the rapid growth in data generation, comparative data from published studies are too often unavailable, incomplete, or incompatible, which greatly limits the reproducibility and expansion of existing studies [1,2]. A greater focus on data integration and interoperability, even at the data collection phase of a project, allows for scalable, integrative analyses that combine data from multiple sources [3,4]. Following other initiatives in science that have suggested practices to standardize and share data [5], we discuss here recommendations for data sharing that will allow the phylogenetics community to advance large-scale research much more efficiently. “We”, in this case, are members of the three NSF-funded AVAToL projects (Open Tree of Life, Arbor, and Next-generation Phenomics), although this manuscript was heavily influenced by a period of public commenting [23] (see Acknowledgements). We define phylogenetic data as the inputs, outputs, and methodological details of a phylogenetic analysis. Current practices for publication of these data too often limit reusability. For example, molecular alignments are rarely preserved and made available; phenomic matrices do not always include f...
Source: PLOS Currents Tree of Life - Category: Genetics & Stem Cells - Source Type: research