Modeling and predicting cancer clonal evolution with reinforcement learning [METHOD]
Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Accurately modeling this process is key to understanding and predicting cancer evolution. Here, we introduce clone to mutation (CloMu), a flexible and low-parameter tree generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection, causalit...
Source: Genome Research - August 10, 2023 Category: Genetics & Stem Cells Authors: Ivanovic, S., El-Kebir, M. Tags: METHOD Source Type: research
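
As a rough illustration of the idea in the abstract above (a two-layer network that maps the mutations already on a clone to probabilities for the next mutation), here is a minimal pure-Python sketch of a forward pass. It is not the CloMu implementation: the function names, the ReLU hidden layer, the random weights, and the masking of mutations the clone already carries are all assumptions made for this example, and the reinforcement learning training step is not shown.

```python
import math
import random


def next_mutation_probs(clone, W1, b1, W2, b2):
    """Forward pass of a hypothetical two-layer network.

    clone is a 0/1 list marking which mutations the clone already carries.
    Returns one probability per mutation for being acquired next.
    """
    # Hidden layer with ReLU activation (an assumption of this sketch).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, clone)) + b)
              for row, b in zip(W1, b1)]
    # Output layer: one logit per candidate mutation.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    # Assumption: a mutation already present cannot be acquired again.
    allowed = [i for i, x in enumerate(clone) if x == 0]
    if not allowed:
        raise ValueError("clone already carries every mutation")
    # Numerically stable softmax restricted to the allowed mutations.
    top = max(logits[i] for i in allowed)
    weights = [math.exp(logits[i] - top) if clone[i] == 0 else 0.0
               for i in range(len(clone))]
    total = sum(weights)
    return [w / total for w in weights]


def random_params(n_mut, n_hidden, seed=0):
    """Random, untrained weights, for illustration only."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0, 1) for _ in range(n_mut)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_mut)]
    b2 = [0.0] * n_mut
    return W1, b1, W2, b2
```

Calling `next_mutation_probs([1, 0, 0, 1], *random_params(4, 3))` returns four probabilities that sum to 1, with zeros at the two positions the clone already carries.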

Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT [METHOD]
We present GGCAT, a tool for constructing both compacted and colored de Bruijn graphs, based on a new approach merging the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3x to 21x compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5x to 39x compared with the state-of-the-art tool BiFrost. Additionally, GGCAT is up to 480x faster than BiFrost for batch sequence queries on colored graphs.
Source: Genome Research - August 9, 2023 Category: Genetics & Stem Cells Authors: Cracco, A., Tomescu, A. I. Tags: METHOD Source Type: research
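
To make the term "compacted de Bruijn graph" concrete, the sketch below collects the distinct k-mers of a set of reads and merges every non-branching path into a single unitig. It is a naive textbook construction, not GGCAT's algorithm: it ignores reverse complements, k-mer counts, and colors, and the function name `unitigs` is made up for this example.

```python
def unitigs(reads, k):
    """Return the sorted unitigs of the forward-strand de Bruijn graph."""
    # Nodes are the distinct k-mers; edges join k-mers overlapping by k-1.
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

    def succ(x):
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def pred(x):
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    def starts_unitig(x):
        # x continues a unitig only if it has exactly one predecessor
        # and that predecessor has x as its only successor.
        p = pred(x)
        return not (len(p) == 1 and len(succ(p[0])) == 1)

    result, visited = [], set()
    for x in sorted(kmers):
        if not starts_unitig(x):
            continue
        path, cur = x, x
        visited.add(x)
        while True:
            s = succ(cur)
            if len(s) != 1:
                break  # dead end or branch: the unitig stops here
            nxt = s[0]
            if len(pred(nxt)) != 1 or nxt in visited:
                break  # nxt is a merge point, so it starts its own unitig
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        result.append(path)

    # Any k-mer still unvisited lies on an isolated cycle with no branches.
    for x in sorted(kmers):
        if x in visited:
            continue
        path, cur = x, x
        visited.add(x)
        while True:
            nxt = succ(cur)[0]
            if nxt in visited:
                break
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        result.append(path)
    return sorted(result)
```

For the reads `ACGTT` and `ACGTA` with k = 3, the k-mers ACG and CGT merge into the unitig `ACGT`, while the branch at CGT leaves `GTA` and `GTT` as separate unitigs.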

Efficient minimizer orders for large values of k using minimum decycling sets [METHOD]
Minimizers are ubiquitously used in data structures and algorithms for efficient searching, mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select a minimum k-mer in every L-long subsequence of the target sequence, where minimality is with respect to a predefined k-mer order. Commonly used minimizer orders select more k-mers than necessary and therefore provide limited improvement in runtime and memory usage of downstream analysis tasks. The recently introduced universal k-mer hitting sets produce minimizer orders with fewer selected k-mers. Generating compact universal k-mer hitting sets is...
Source: Genome Research - August 9, 2023 Category: Genetics & Stem Cells Authors: Pellow, D., Pu, L., Ekim, B., Kotlar, L., Berger, B., Shamir, R., Orenstein, Y. Tags: METHOD Source Type: research
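
The minimizer scheme described in the abstract above (select a minimum k-mer in every L-long subsequence, under a predefined k-mer order) can be sketched in a few lines. This is a minimal illustration with a plain lexicographic order as the default; it does not implement the decycling-set orders the paper proposes, and the function name `minimizers` and its `order` parameter are assumptions of this example.

```python
def minimizers(seq, k, L, order=None):
    """Return sorted (position, k-mer) pairs selected across all windows.

    order maps a k-mer to a sortable key; lexicographic by default.
    Ties inside a window are broken by taking the leftmost k-mer.
    """
    if order is None:
        order = lambda kmer: kmer  # lexicographic order (assumption)
    selected = set()
    for start in range(len(seq) - L + 1):
        window = seq[start:start + L]
        # Offset of the minimum k-mer within this L-long window.
        best = min(range(L - k + 1),
                   key=lambda i: (order(window[i:i + k]), i))
        selected.add((start + best, window[best:best + k]))
    return sorted(selected)
```

For example, `minimizers("ACGTACGT", 3, 5)` selects three of the six k-mer positions: `[(0, "ACG"), (1, "CGT"), (4, "ACG")]`. Passing a different `order` changes which k-mers are selected and therefore how many, which is the quantity that better minimizer orders aim to reduce.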

Improving quartet graph construction for scalable and accurate species tree estimation from gene trees [METHOD]
Summary methods are widely used to estimate species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant because of estimation error and biological processes, such as incomplete lineage sorting. Here, we introduce TREE-QMC, a new summary method that offers accuracy and scalability under these challenging scenarios. TREE-QMC builds upon weighted Quartet Max Cut, which takes weighted quartets as input and then constructs a species tree in a divide-and-conquer fashion, at each step forming a graph and seeking its max cut. The wQMC method has bee...
Source: Genome Research - August 8, 2023 Category: Genetics & Stem Cells Authors: Han, Y., Molloy, E. K. Tags: METHOD Source Type: research

Efficient taxa identification using a pangenome index [METHOD]
We present new algorithms and methods for solving this problem. Specifically, given a collection of d documents over a finite alphabet, we extend the r-index with additional words to support document listing queries for a pattern, with time and space bounds that depend on the machine word size w. Applied in a bacterial mock community experiment, our method is up to three times faster than a comparable method that uses the standard r-index locate queries. We show that our method classifies both simulated and real nanopore reads at the strain level with higher accuracy compared with other approaches. Finally...
Source: Genome Research - August 8, 2023 Category: Genetics & Stem Cells Authors: Ahmed, O., Rossi, M., Boucher, C., Langmead, B. Tags: METHOD Source Type: research