These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.
Submitted on 19.10.2018 and posted on 22.10.2018 by Jun Pei, Zheng Zheng, Kenneth M. Merz Jr.
In this work, we use the ‘comparison’ concept to train Random Forest (RF)
models on unbalanced data sets. The models assign different importance factors
to atom pair potentials to enhance their ability to distinguish native proteins
from decoy proteins. Individual and combined data sets consisting of twelve
decoy sets were used to test the performance of the RF models. We find that RF
models increase the recognition of native structures without affecting their
ability to identify the best decoy structures. We also created models using
scrambled atom types, which create physically unrealistic probability
functions, in order to test the ability of the RF algorithm to create useful
models from scrambled probability functions. In this test, we were unable to
create models of quality comparable to those built from the unscrambled
probability functions. Next we created uniform probability
functions where the peak positions are the same as the original, but each
interaction has the same peak height. Using these uniform potentials, we were
able to recover models as good as the ones using the full potentials, suggesting
that the experimental peak positions are all that matters in these models.
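
To make the idea of a uniform probability function concrete, here is a minimal Python sketch of one possible reading: each atom pair's binned distance profile is rescaled so that its highest bin has the same height, while the bin where the peak occurs is left unchanged. The abstract does not specify how the uniform functions were built, so the helper names (`uniform_peak_profile`, `peak_index`), the binning, and the numbers are hypothetical and purely illustrative.

```python
def peak_index(profile):
    """Return the index of the highest bin in a binned distance profile."""
    return max(range(len(profile)), key=profile.__getitem__)


def uniform_peak_profile(profile, peak_height=1.0):
    """Rescale a profile so its highest bin equals peak_height.

    Assumption: "uniform" means every atom pair shares one peak height.
    The peak position (index of the highest bin) is not changed.
    """
    top = max(profile)
    if top <= 0:
        raise ValueError("profile needs at least one positive bin")
    return [peak_height * value / top for value in profile]


# Hypothetical binned probabilities for two atom pair types
# (illustrative numbers only, not data from the preprint).
profiles = {
    ("C", "O"): [0.0, 0.2, 0.8, 0.4, 0.1],
    ("N", "O"): [0.0, 0.1, 0.1, 0.3, 0.2],
}

uniform = {pair: uniform_peak_profile(p) for pair, p in profiles.items()}

for pair, profile in uniform.items():
    print(pair, "peak bin:", peak_index(profile), "peak height:", max(profile))
```

In this sketch the two atom pairs keep their distinct peak bins but end up with identical peak heights, which is the property the abstract attributes to the uniform potentials.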