Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions.

In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficient similar targets were contained in the training set. Moreover, the effect of the combinations of features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not applicable to derive a generic target-specific SF or SF combination. PMID: 31982914 [PubMed - as supplied by publisher]

https://www.ncbi.nlm.nih.gov/pubmed/31982914?dopt=Abstract

Source: Briefings in Bioinformatics - January 24, 2020 Category: Bioinformatics Authors: Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T Tags: Brief Bioinform Source Type: research