Bayesian modelling of high-throughput sequencing assays with malacoda

by Andrew R. Ghazi, Xianguo Kong, Ed S. Chen, Leonard C. Edelstein, Chad A. Shaw

NGS studies have uncovered an ever-growing catalog of human variation while leaving an enormous gap between observed variation and experimental characterization of variant function. High-throughput screens powered by NGS have greatly increased the rate of variant functionalization, but the development of comprehensive statistical methods to analyze screen data has lagged. In the massively parallel reporter assay (MPRA), short barcodes are counted by sequencing DNA libraries transfected into cells and the cell’s output RNA in order to simultaneously measure the shifts in transcription induced by thousands of genetic variants. These counts present many statistical challenges, including overdispersion, depth dependence, and uncertain DNA concentrations. So far, the statistical methods used have been rudimentary, employing transformations on count-level data and disregarding experimental and technical structure while failing to quantify uncertainty in the statistical model. We have developed an extensive framework for the analysis of NGS functionalization screens, available as an R package called malacoda (github.com/andrewGhazi/malacoda). Our software implements a probabilistic, fully Bayesian model of screen data. The model uses the negative binomial distribution with gamma priors to model sequencing counts while accounting for effects from input library preparation and sequenci...
Source: PLoS Computational Biology - Category: Biology - Source Type: research