A safe and complete algorithm for metagenomic assembly.

A safe and complete algorithm for metagenomic assembly. Algorithms Mol Biol. 2018;13:3 Authors: Obscura Acosta N, Mäkinen V, Tomescu AI Abstract Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in computational Molecular biology-20th annual conference, RECOMB 9649:152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. Results: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time [Formula: see text], and in the edge-covering case it runs in time [Formula: see text]; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic rea...
Source: Algorithms for Molecular Biology : AMB - Category: Molecular Biology Authors: Tags: Algorithms Mol Biol Source Type: research