Virus detection in high-throughput sequencing data without a reference genome of the host.

We present a new modular, computational pipeline for discovery of novel viruses in host samples. While existing pipelines rely on the availability of the hosts reference genome for filtering sequence reads, our new pipeline can also cope with cases for which no reference genome is available. As a further novelty of our method a decoy module is used to assess false classification rates in the discovery process. Additionally, viruses with a low read coverage can be identified and visually reviewed. We validate our pipeline on simulated data as well as two experimental samples with known virus content. For the experimental samples, we were able to reproduce the laboratory findings. Our newly developed pipeline is applicable for virus detection in a wide range of host species. The three modules we present can either be incorporated individually in other pipelines or be used as a stand-alone pipeline. We are the first to present a decoy approach within a virus detection pipeline that can be used to assess error rates so that the quality of the final result can be judged. We provide an implementation of our modules via Github. However, the principle of the modules can easily be re-implemented by other researchers. PMID: 30292006 [PubMed - as supplied by publisher]
Source: Infection, Genetics and Evolution - Category: Genetics & Stem Cells Authors: Tags: Infect Genet Evol Source Type: research