Unifying duplication episode clustering and gene-species mapping inference
We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge in the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial-time dynamic programming (DP) formulation that verifies the existence of a set of duplication episodes from a predefined set of episode candidates. In addition, we design a method to infer distributions of gene-species mappings...
Source: Algorithms for Molecular Biology : AMB - February 14, 2024 Category: Molecular Biology Authors: Paweł Górecki Natalia Rutecka Agnieszka Mykowiecka Jarosław Paszek Source Type: research

Predicting horizontal gene transfers with perfect transfer networks
Algorithms Mol Biol. 2024 Feb 6;19(1):6. doi: 10.1186/s13015-023-00242-2. ABSTRACT BACKGROUND: Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that hom...
Source: Algorithms for Molecular Biology : AMB - February 6, 2024 Category: Molecular Biology Authors: Alitzel López Sánchez Manuel Lafond Source Type: research

Global exact optimisations for chloroplast structural haplotype scaffolding
CONCLUSIONS: We succeed in modelling biological knowledge on genomic structures to scaffold chloroplast genomes. Our results suggest that modelling genomic regions is sufficient for scaffolding repeats and is suitable for finding several solutions corresponding to several genome forms. PMID: 38321522 | DOI: 10.1186/s13015-023-00243-1
Source: Algorithms for Molecular Biology : AMB - February 6, 2024 Category: Molecular Biology Authors: Victor Epain Rumen Andonov Source Type: research

Co-linear chaining on pangenome graphs
Algorithms Mol Biol. 2024 Jan 27;19(1):4. doi: 10.1186/s13015-024-00250-w. ABSTRACT Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be efficiently solved for acy...
Source: Algorithms for Molecular Biology : AMB - January 26, 2024 Category: Molecular Biology Authors: Jyotshna Rajput Ghanshyam Chandra Chirag Jain Source Type: research
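
To make the co-linear chaining idea mentioned in the abstract above concrete, here is a minimal sketch for the simpler case of two linear sequences. It is not the paper's graph algorithm: the anchor layout `(query_start, ref_start, length)`, the function name `chain_anchors`, and the scoring (total anchor length, no gap penalty) are all assumptions chosen for illustration, and the quadratic-time dynamic program is the textbook formulation rather than an optimized one.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (query_start, ref_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return a maximum-weight co-linear chain of non-overlapping anchors.

    The weight of a chain is the sum of its anchor lengths (an assumed,
    simplified score). Anchors are assumed to have positive length.
    """
    if not anchors:
        return []
    # Sorting by query start guarantees every valid predecessor comes earlier.
    ordered = sorted(anchors)
    n = len(ordered)
    best = [a[2] for a in ordered]  # best chain weight ending at each anchor
    prev = [-1] * n                 # back-pointers for chain reconstruction
    for j in range(n):
        qj, rj, lj = ordered[j]
        for i in range(j):
            qi, ri, li = ordered[i]
            # Anchor i may precede j only if it ends before j starts
            # on both the query and the reference.
            if qi + li <= qj and ri + li <= rj and best[i] + lj > best[j]:
                best[j] = best[i] + lj
                prev[j] = i
    # Walk back from the best-scoring chain end.
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


example = [(0, 0, 5), (6, 7, 4), (3, 20, 10), (12, 13, 3)]
print(chain_anchors(example))
```

In this example the anchor `(3, 20, 10)` is the longest single match but cannot be combined with the others, so the chain of three shorter anchors (total length 12) is preferred. Chaining on graphs with cycles, as the abstract describes, needs a different notion of precedence than the coordinate comparison used here.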

Fulgor: a fast and compact k-mer index for large-scale matching and color queries
Algorithms Mol Biol. 2024 Jan 22;19(1):3. doi: 10.1186/s13015-024-00251-9. ABSTRACT The problem of sequence identification or matching (determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence) is relevant for many important tasks in Computational Biology, such as metagenomics and pangenome analysis. Due to the complex nature of such analyses and the large scale of the reference collections, a resource-efficient solution to this problem is of utmost importance. This poses the threefold challenge of representing the reference collection with a data stru...
Source: Algorithms for Molecular Biology : AMB - January 23, 2024 Category: Molecular Biology Authors: Jason Fan Jamshed Khan Noor Pratap Singh Giulio Ermanno Pibiri Rob Patro Source Type: research

Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem
Algorithms Mol Biol. 2024 Jan 8;19(1):2. doi: 10.1186/s13015-023-00249-9. ABSTRACT The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This...
Source: Algorithms for Molecular Biology : AMB - January 8, 2024 Category: Molecular Biology Authors: Junyan Dai Tobias Rubel Yunheng Han Erin K Molloy Source Type: research

Investigating the complexity of the double distance problems
Algorithms Mol Biol. 2024 Jan 4;19(1):1. doi: 10.1186/s13015-023-00246-y. ABSTRACT BACKGROUND: Two genomes [Formula: see text] and [Formula: see text] over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by [Formula: see text] the number of common families of [Formula: see text] and [Formula: see text]. Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths. Let [Formula: see text] and [Formula: see text] b...
Source: Algorithms for Molecular Biology : AMB - January 4, 2024 Category: Molecular Biology Authors: Marília D V Braga Leonie R Brockmann Katharina Klerx Jens Stoye Source Type: research