GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files

Abstract: The manipulation and analysis of genomic data stored in publicly accessible repositories have become daily tasks in genomics and bioinformatics laboratories. Rapid advances in genome sequencing and the emergence of many sequencing projects have pushed bioinformaticians to create a variety of programs and pipelines that automatically analyze such large datasets, in particular gene annotation pipelines. Simple, easy-to-use programs for handling annotation files are important, particularly for non-developers, because they speed up genomic data analysis. One of the first tasks when working with genome annotation files is extracting their different features. To this end, we developed GAD (https://github.com/bio-projects/GAD), a fast, easy-to-use, and controllable Python script for handling annotation files such as GFF3 and GTF. GAD is a cross-platform tool with a graphical interface that extracts genome features such as intergenic regions and the upstream and downstream regions of genes. In addition, GAD finds all ambiguous sequence ontology names and either extracts them or treats them as genes or transcripts. Results are produced in a variety of file formats supported by other bioinformatics programs, such as BED, GTF, GFF3, and FASTA. GAD can handle large genomes and any number of files with minimal user effort. Therefore, our script could be integrated into various ...
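To make the kind of feature extraction described in the abstract concrete, here is a minimal sketch in Python. It is not GAD's code and does not use GAD's interface. The function names (`parse_genes`, `intergenic_bed`, `upstream_bed`), the sample records, and the default upstream length of 1000 bp are all assumptions made for illustration. The sketch reads `gene` rows from GFF3 text, derives intergenic gaps, and derives strand-aware upstream windows, converting GFF3's 1-based inclusive coordinates to BED's 0-based half-open ones.

```python
# Illustrative sketch only; names and sample data are hypothetical, not GAD's.

SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "demo", "gene", "100", "200", ".", "+", ".", "ID=g1"]),
    "\t".join(["chr1", "demo", "mRNA", "100", "200", ".", "+", ".", "ID=t1;Parent=g1"]),
    "\t".join(["chr1", "demo", "gene", "150", "300", ".", "-", ".", "ID=g2"]),
    "\t".join(["chr1", "demo", "gene", "500", "600", ".", "-", ".", "ID=g3"]),
])


def parse_genes(gff3_text):
    """Return (seqid, start, end, strand) for every 'gene' row of GFF3 text."""
    genes = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed rows whose type column is 'gene'
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes


def intergenic_bed(genes):
    """Return BED-style (seqid, start, end) gaps between genes on each sequence."""
    by_seq = {}
    for seqid, start, end, _strand in genes:
        by_seq.setdefault(seqid, []).append((start, end))
    gaps = []
    for seqid in sorted(by_seq):
        intervals = sorted(by_seq[seqid])
        covered_end = intervals[0][1]
        for start, end in intervals[1:]:
            if start > covered_end + 1:
                # 1-based gap covered_end+1 .. start-1 becomes BED [covered_end, start-1)
                gaps.append((seqid, covered_end, start - 1))
            covered_end = max(covered_end, end)  # tolerate overlapping genes
    return gaps


def upstream_bed(genes, length=1000):
    """Return BED-style upstream windows; not clipped to the sequence length."""
    windows = []
    for seqid, start, end, strand in genes:
        if strand == "-":
            # upstream of a minus-strand gene lies past its end coordinate
            windows.append((seqid, end, end + length))
        else:
            windows.append((seqid, max(0, start - 1 - length), start - 1))
    return windows


genes = parse_genes(SAMPLE)
intergenic = intergenic_bed(genes)
upstream = upstream_bed(genes, length=50)
```

In this sketch, overlapping genes are merged before gaps are computed, so an intergenic region never overlaps any gene. A real tool would also need the sequence lengths to clip windows at chromosome ends and to report the regions before the first gene and after the last gene.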
Source: Interdisciplinary Sciences: Computational Life Sciences - Category: Bioinformatics - Source Type: research