Efficient taxa identification using a pangenome index [METHOD]

We present new algorithms and methods for solving this problem. Specifically, given a collection of d documents over a finite alphabet, we extend the r-index with additional words of space to support document listing queries, which report the documents in which a query pattern occurs, with time and space bounds stated for a machine word size of w. Applied in a bacterial mock community experiment, our method is up to three times faster than a comparable method that uses the standard r-index locate queries. We show that our method classifies both simulated and real nanopore reads at the strain level with higher accuracy than other approaches. Finally, we present strategies for compacting this structure in applications in which read lengths or match lengths can be bounded.
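
To make the notion of a document listing query concrete, here is a minimal Python sketch of a naive baseline. It is not the r-index extension described above: it stores every suffix explicitly, so it uses far more space than a run-length compressed index would. The function names `build_index` and `list_documents` and the toy sequences are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Sentinel that sorts after every character expected in the input
# (assumption: the documents never contain this code point).
_MAX_CHAR = "\U0010FFFF"


def build_index(docs):
    """Build a naive suffix index over a list of strings.

    Returns two parallel lists: all suffixes of all documents in
    lexicographic order, and the id of the document each suffix came from
    (a simple "document array").
    """
    entries = []
    for doc_id, text in enumerate(docs):
        for start in range(len(text)):
            entries.append((text[start:], doc_id))
    entries.sort()
    suffixes = [suffix for suffix, _ in entries]
    doc_array = [doc_id for _, doc_id in entries]
    return suffixes, doc_array


def list_documents(index, pattern):
    """Return the sorted ids of the documents that contain `pattern`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    suffixes, doc_array = index
    # Every suffix that starts with `pattern` lies in one contiguous range
    # of the sorted suffix list; two binary searches find its boundaries.
    lo = bisect_left(suffixes, pattern)
    hi = bisect_right(suffixes, pattern + _MAX_CHAR)
    # Collapse the occurrences in that range to distinct document ids.
    return sorted(set(doc_array[lo:hi]))


if __name__ == "__main__":
    toy_docs = ["GATTACA", "ATTAC", "CCGG"]  # hypothetical toy "genomes"
    toy_index = build_index(toy_docs)
    print(list_documents(toy_index, "TTA"))
    print(list_documents(toy_index, "CG"))
```

This baseline visits every occurrence in the matching range before deduplicating, which is the kind of per-occurrence work that a locate-based approach also incurs. A dedicated document listing structure aims to report each matching document without enumerating all occurrences.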
Source: Genome Research. Category: Genetics & Stem Cells. Tags: METHOD. Source type: research.