Machine Learning Polymer Models of Three-Dimensional Chromatin Organization in Human Lymphoblastoid Cells

We present machine learning models of human genome three-dimensional structure that combine one-dimensional (linear) sequence specificity, epigenomic information, and transcription factor binding profiles with polymer-based biophysical simulations in order to explain the extensive long-range chromatin looping observed in ChIA-PET experiments for lymphoblastoid cells. Random Forest, Gradient Boosting Machine (GBM), and Deep Learning models were constructed and evaluated on the prediction of high-resolution interactions within Topologically Associating Domains (TADs). The predicted interactions are consistent with the experimental long-read ChIA-PET interactions mediated by CTCF and RNAPOL2 in the GM12878 cell line. The contribution of sequence information and of chromatin state defined by epigenomic features to the prediction task is analyzed and reported, both when the two feature types are used separately and when they are combined. Furthermore, we design three-dimensional models of chromatin contact domains (CCDs) using experimental (ChIA-PET) and predicted looping interactions. Initial results show a similarity between the two types of 3D computational models (those constructed from experimental interactions and those constructed from predicted interactions). This observation confirms the association between genome sequence, epigenomic and transcription factor profiles, and three-dimensional interactions.
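
As an illustration of the "separately and combined" comparison described above, the following minimal sketch shows one way candidate anchor pairs inside a domain could be encoded as feature vectors for a classifier such as a Random Forest. It is not the authors' pipeline: the `Anchor` fields, the convergent-motif indicator, the toy signal values, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Anchor:
    """Hypothetical loop anchor; all fields are illustrative assumptions."""
    position: int        # genomic midpoint in bp (toy value)
    motif_strand: str    # '+' or '-' orientation of a binding motif (sequence feature)
    signal: float        # toy epigenomic / TF-binding signal at the anchor


def sequence_features(a, b):
    # Assumed sequence-derived feature: 1.0 if the upstream motif is on '+'
    # and the downstream motif is on '-' (convergent orientation), else 0.0.
    convergent = a.motif_strand == "+" and b.motif_strand == "-"
    return [1.0 if convergent else 0.0]


def epigenomic_features(a, b):
    # Assumed epigenomic features: the signal at each of the two anchors.
    return [a.signal, b.signal]


def pair_features(a, b, mode="combined"):
    """Build a feature vector for the pair (a, b), with a upstream of b.

    mode selects the feature set: 'sequence', 'epigenomic', or 'combined'.
    Genomic distance is included in every mode.
    """
    features = [float(b.position - a.position)]
    if mode in ("sequence", "combined"):
        features += sequence_features(a, b)
    if mode in ("epigenomic", "combined"):
        features += epigenomic_features(a, b)
    if mode not in ("sequence", "epigenomic", "combined"):
        raise ValueError(f"unknown mode: {mode}")
    return features


def candidate_pairs(anchors, domain_start, domain_end):
    # Keep anchors inside the domain, sort by position, and enumerate all
    # upstream/downstream pairs as candidate interactions.
    inside = sorted(
        (x for x in anchors if domain_start <= x.position < domain_end),
        key=lambda x: x.position,
    )
    return list(combinations(inside, 2))


# Toy data: three anchors inside a domain [0, 1000) and one outside it.
anchors = [
    Anchor(100, "+", 2.0),
    Anchor(500, "-", 3.0),
    Anchor(900, "+", 1.0),
    Anchor(5000, "-", 4.0),
]
pairs = candidate_pairs(anchors, 0, 1000)
first = pairs[0]
print(pair_features(*first, mode="sequence"))
print(pair_features(*first, mode="epigenomic"))
print(pair_features(*first, mode="combined"))
```

Training the same model on the three feature sets and comparing held-out performance would then give a rough picture of how much each information source contributes to the prediction.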
Source: Methods - Category: Molecular Biology Source Type: research