Deep learning of regulatory regions discovers enhancer variants implicated in PAH

Pulmonary arterial hypertension (PAH) is a rare and fatal lung disease. To date, rare genetic variation in the protein-coding space explains the cause in only a third of patients. The sequencing of 15,000 whole genomes by the NIHR BioResource – Rare Diseases (NBR), including ~1,200 PAH samples, provides an unprecedented opportunity to estimate the contribution of regulatory genome variation to the development of PAH.

This work aims to determine whether sequence-based predictions of epigenetic features can be used to narrow down the regions of interest and to aggregate variants into functional groups for association testing.

A convolutional neural network was trained on publicly available datasets to predict epigenetic features from DNA sequence. The model was tested against known enhancer regions, which verified its accuracy. Two approaches were developed for evaluating the predicted epigenetic features. The first is an epigenetic importance score, which summarises the availability of epigenetic profiles within a region and is used to explore the non-coding space. The second is a regulation score, which combines the predicted features into activating and repressing subsets to gauge the regulatory impact of variants in more detail. Based on the regulatory impact and other common variant annotations, variants were filtered and aggregated for overrepresentation analysis comparing cases with controls.

Preliminary stat...
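The regulation score and the case-control overrepresentation step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature names, the "mean activating minus mean repressing" formula, and the use of a one-sided Fisher's exact test are all assumptions, since the abstract specifies neither the features nor the statistical test. The per-feature predictions are taken as given probabilities; the convolutional network itself is not reproduced here.

```python
from math import comb

# Hypothetical feature labels: the abstract does not list the predicted features.
ACTIVATING = ("H3K27ac", "H3K4me1", "DNase")
REPRESSING = ("H3K27me3", "H3K9me3")


def regulation_score(pred):
    """Mean predicted probability of activating features minus that of
    repressing features (an assumed formula, for illustration only)."""
    act = sum(pred[f] for f in ACTIVATING) / len(ACTIVATING)
    rep = sum(pred[f] for f in REPRESSING) / len(REPRESSING)
    return act - rep


def variant_impact(ref_pred, alt_pred):
    """Change in regulation score between the alternate and reference allele.
    Negative values suggest a loss of predicted regulatory activity."""
    return regulation_score(alt_pred) - regulation_score(ref_pred)


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test: probability of observing at least
    `case_carriers` carriers among cases under the hypergeometric null."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)


# Toy predictions for one variant (made-up numbers, not study data).
ref = {"H3K27ac": 0.8, "H3K4me1": 0.6, "DNase": 0.7, "H3K27me3": 0.1, "H3K9me3": 0.1}
alt = {"H3K27ac": 0.2, "H3K4me1": 0.3, "DNase": 0.4, "H3K27me3": 0.3, "H3K9me3": 0.5}
impact = variant_impact(ref, alt)

# Toy carrier counts for one aggregated region (made-up numbers).
p_value = fisher_one_sided(case_carriers=3, n_cases=3, control_carriers=0, n_controls=3)
```

In a setting like the one described, variants whose impact passes a chosen threshold would be grouped per region before the carrier counts in cases and controls are compared. The threshold and the grouping rule are left open here because the abstract does not state them.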
Source: European Respiratory Journal. Category: Respiratory Medicine. Tags: 13.01 - Pulmonary hypertension. Source type: research.