Machine learning based methods to generate conformational ensembles of disordered proteins

Biophys J. 2023 Dec 4:S0006-3495(23)04121-8. doi: 10.1016/j.bpj.2023.12.001. Online ahead of print.

ABSTRACT

Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine learning models to predict ensemble-derived two-dimensional properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a dataset of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine learning techniques to predicting higher dimensional properties of disordered proteins.

PMID: 38053335 | DOI: 10.1016/j.bpj.2023.12.001
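To make the phrase "ensemble-derived two-dimensional properties" concrete, the sketch below computes one plausible example of such a property: the mean inter-residue distance map of a coarse-grained ensemble. The abstract does not say which 2D properties the authors used, so the choice of a distance map is an assumption. The function names (`mean_distance_map`, `random_walk_ensemble`), the one-bead-per-residue representation, the 3.8 bond length, and the random-walk toy data are likewise hypothetical stand-ins for illustration, not the authors' code or simulation data.

```python
import math
import random


def mean_distance_map(ensemble):
    """Average each pairwise bead-bead distance over all conformations.

    ensemble: list of conformations; each conformation is a list of
    (x, y, z) bead coordinates, one bead per residue (assumed
    coarse-grained representation).
    Returns an n_beads x n_beads nested list of mean distances.
    """
    n_conf = len(ensemble)
    n_beads = len(ensemble[0])
    dmap = [[0.0] * n_beads for _ in range(n_beads)]
    for conf in ensemble:
        for i in range(n_beads):
            for j in range(i + 1, n_beads):
                d = math.dist(conf[i], conf[j])
                # Accumulate the running mean symmetrically.
                dmap[i][j] += d / n_conf
                dmap[j][i] += d / n_conf
    return dmap


def random_walk_ensemble(n_conf, n_beads, bond=3.8, seed=0):
    """Toy data only: freely jointed chains with a fixed bond length.

    This is NOT a molecular dynamics simulation; it merely provides
    coordinates with a chain-like topology for the example.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_conf):
        pos = [(0.0, 0.0, 0.0)]
        for _ in range(n_beads - 1):
            # Random direction: normalize a 3D Gaussian sample.
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in v))
            x, y, z = pos[-1]
            pos.append((x + bond * v[0] / norm,
                        y + bond * v[1] / norm,
                        z + bond * v[2] / norm))
        ensemble.append(pos)
    return ensemble


ensemble = random_walk_ensemble(n_conf=200, n_beads=10)
dmap = mean_distance_map(ensemble)
```

In a pipeline of the kind the abstract describes, a map like `dmap` would play the role of the 2D intermediate: something a first-stage model could predict for a new sequence, and something a second-stage generative model could be conditioned on. How the authors actually encode and condition on their 2D predictions is not stated in the abstract.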
Source: Biophysical Journal - Category: Physics - Source Type: research