Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

Jun Pei; Zheng Zheng; Hyunji Kim; Lin Song; Sarah Walworth; Margaux Merz; Kenneth M. Merz Jr.

doi:10.26434/chemrxiv.8047820.v1

Theoretical and Computational Chemistry

30 April 2019, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set using the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. A comparison of the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
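The two control potentials described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not give the GARF functional form, so the dictionary layout, the atom-pair labels, the numeric values, and the helper names `scramble` and `make_uniform` are all assumptions made for illustration. Each atom pair is mapped to a list of (peak position, peak height) tuples standing in for its probability function.

```python
import random

# Hypothetical toy potential: atom pair -> list of (peak position, peak height).
# All numbers are invented for illustration; they are not GARF parameters.
toy_potential = {
    ("C", "C"): [(3.8, 1.2), (5.1, 0.4)],
    ("N", "O"): [(2.9, 2.5)],
    ("O", "O"): [(2.7, 1.9), (4.6, 0.3)],
    ("C", "N"): [(3.5, 0.8)],
}


def scramble(potential, seed=0):
    """Mix up which probability function belongs to which atom pair.

    The set of atom pairs and the set of functions are unchanged; only the
    assignment between them is shuffled.
    """
    pairs = list(potential)
    functions = [potential[pair] for pair in pairs]
    random.Random(seed).shuffle(functions)
    return dict(zip(pairs, functions))


def make_uniform(potential, height=1.0):
    """Keep every peak position but replace every peak height by one fixed value."""
    return {
        pair: [(position, height) for position, _ in peaks]
        for pair, peaks in potential.items()
    }


scrambled = scramble(toy_potential, seed=1)
uniform = make_uniform(toy_potential, height=1.0)
```

In this picture, the scrambled set destroys the link between an atom pair and its peak positions, while the uniform set preserves peak positions and discards only the height information, which is the contrast the abstract draws on.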

Supplementary materials

Supporting info

Now Published

Random Forest Refinement of Pairwise Potentials for Protein–Ligand Decoy Detection

Jun Pei, Zheng Zheng, Hyunji Kim, Lin Frank Song, Sarah Walworth, Margaux R. Merz, Kenneth M. Merz

Journal of Chemical Information and Modeling, Volume 59, Issue 7

Online publication date: Jun 19, 2019

Metrics

Views: 3,590

Downloads: 592

License

The content is available under CC BY NC ND 4.0

Funding

National Science Foundation OAC-1560168

Author’s competing interest statement

The authors declare no competing financial interest.
