Significance of data selection in deep learning for reliable binding mode prediction of ligands in the active site of CYP3A4.

Chem Pharm Bull (Tokyo). 2019 Aug 17.
Authors: Sato A, Tanimura N, Honma T, Konagaya A

Abstract
For rational drug design, it is essential to predict the binding mode of protein-ligand complexes. Various machine learning models have been reported that use convolutional neural networks (deep learning) to predict binding modes from three-dimensional structures, but there are few detailed reports on how best to construct and use the training datasets. Here, we examined how the choice of training dataset affected the prediction of ligand binding modes in CYP3A4 by a three-dimensional neural network when only a limited number of crystal structures of the target protein was available. We used four training datasets: one large, general dataset containing various protein complexes, and three smaller, more specific datasets containing complexes with CYP3A4-like pockets, complexes with CYP3A4-binding ligands, and complexes of CYP protein family members. We then trained models on different combinations of these datasets, with or without subsequent fine-tuning, and evaluated the binding mode prediction performance of each model. The best model in terms of the area under the receiver operating characteristic curve (ROC AUC) was obtained by training on a combination of the general protein dataset and the CYP family dataset. However, the ROC AUC - recall balan...
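As a rough illustration of the two metrics the abstract weighs against each other, here is a minimal sketch that computes ROC AUC (as the probability that a near-native pose outscores a decoy pose, the Mann-Whitney formulation) and recall at a fixed score threshold. The pose labels, the 2 Å RMSD cutoff, the scores and the 0.5 threshold are hypothetical values chosen for illustration, not data or settings from the study; scikit-learn's `roc_auc_score` and `recall_score` compute the same quantities.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U formulation).

    labels: 1 for a near-native pose, 0 for a decoy pose.
    scores: model scores; higher means "more likely near-native".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pose")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def recall_at(labels, scores, threshold=0.5):
    """Fraction of near-native poses whose score reaches the threshold."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not pos_scores:
        raise ValueError("need at least one positive pose")
    hits = sum(1 for s in pos_scores if s >= threshold)
    return hits / len(pos_scores)


# Hypothetical toy data: a pose is labeled near-native (1) if its RMSD
# to the crystal pose is below an assumed 2.0 Å cutoff, else decoy (0).
rmsd = [0.8, 1.5, 3.2, 4.0, 1.9, 6.1]
labels = [1 if r < 2.0 else 0 for r in rmsd]
scores = [0.9, 0.4, 0.45, 0.2, 0.7, 0.1]  # hypothetical model outputs

print(f"ROC AUC = {roc_auc(labels, scores):.3f}")
print(f"recall@0.5 = {recall_at(labels, scores):.3f}")
```

On this toy data the ranking is nearly perfect (ROC AUC = 0.889) while one near-native pose falls under the threshold (recall = 0.667), which shows how a model can look strong on one metric and weaker on the other.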
Source: Chemical and Pharmaceutical Bulletin. Category: Drugs & Pharmacology. Source type: research.