These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Random Forest Refinement of the KECSA2 Knowledge-based Scoring Function for Protein Decoy Detection

submitted on 19.10.2018, 21:23 and posted on 22.10.2018, 14:12 by Jun Pei, Zheng Zheng, Kenneth M. Merz Jr.
In this work, using the ‘comparison’ concept, Random Forest (RF) models were successfully generated from unbalanced data sets. The models assign different importance factors to atom pair potentials to enhance their ability to distinguish native proteins from decoy proteins. Individual and combined data sets consisting of twelve decoy sets were used to test the performance of the RF models. We find that the RF models increase the recognition of native structures without affecting their ability to identify the best decoy structures. We also created models using scrambled atom types, which produce physically unrealistic probability functions, in order to test whether the RF algorithm can create useful models from scrambled input probability functions. From this test we find that we are unable to create models of similar quality to those based on the unscrambled probability functions. Next, we created uniform probability functions in which the peak positions are the same as in the originals, but each interaction has the same peak height. Using these uniform potentials we were able to recover models as good as the ones using the full potentials, suggesting that all that is important in these models is the experimental peak positions.
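The uniform probability functions described above keep each peak where it is and give every interaction the same peak height. The abstract does not say how the authors constructed them, so the following is only a minimal sketch under one assumption: that each curve is rescaled by its own maximum. The function names, the dictionary layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
def uniform_peak_height(potentials, target_height=1.0):
    """Rescale each curve so that its maximum equals target_height.

    potentials: dict mapping an atom-pair label to a list of non-negative
    probability values sampled on a shared distance grid (assumed layout).
    """
    rescaled = {}
    for pair, values in potentials.items():
        peak = max(values)
        if peak <= 0:
            raise ValueError(f"no positive peak for {pair!r}")
        # Dividing by the peak leaves the peak position unchanged
        # and sets the peak height to target_height.
        rescaled[pair] = [v / peak * target_height for v in values]
    return rescaled


def peak_index(values):
    """Return the grid index of the highest point of a curve."""
    return max(range(len(values)), key=values.__getitem__)


# Toy curves invented for illustration only (not data from the paper).
toy = {
    "C-C": [0.0, 0.2, 0.8, 0.3, 0.1],
    "N-O": [0.0, 0.5, 0.1, 0.05, 0.0],
}
uniform = uniform_peak_height(toy)
```

After rescaling, `peak_index` returns the same grid index for each curve as before, while the maxima of all curves coincide. Other constructions (for example, replacing each curve with a fixed-shape peak at the original position) would also fit the abstract's description.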


Michigan State University, Department of Chemistry


United States

Declaration of Conflict of Interest

The authors declare no competing financial interest.


Read the published paper in Journal of Chemical Information and Modeling