Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

Angela Lopez-del Rio; Alfons Nonell-Canals; David Vidal; Alexandre Perera-Lluna

doi:10.26434/chemrxiv.7133885.v1

Biological and Medicinal Chemistry


27 September 2018, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Binding prediction between targets and drug-like compounds through Deep Neural Networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models remains an open issue. In this work, we explored how different cross-validation strategies, applied to data from different molecular databases, affect the performance of binding prediction proteochemometrics models. These strategies are: (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on source database and (4) splitting based on both the clustering and the source database. These schemas are applied to a Deep Learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our Deep Learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when molecules are represented by their fingerprints.
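The difference between random splitting and the group-based strategies listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_folds` and `grouped_folds`, the toy cluster and database labels, and the greedy fold balancing are all assumptions made for illustration, and the cluster labels are taken as already computed (for example by K-means). Treating a (cluster, database) pair as the group is only one possible reading of strategy (4).

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """Strategy 1: shuffle the samples and deal them out to folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [[] for _ in range(n_folds)]
    for position, index in enumerate(indices):
        folds[position % n_folds].append(index)
    return folds


def grouped_folds(groups, n_folds):
    """Strategies 2-4: keep every sample of a group in the same fold.

    `groups` holds one hashable label per sample, e.g. a cluster id,
    a source database name, or a (cluster, database) pair.
    """
    members = defaultdict(list)
    for index, label in enumerate(groups):
        members[label].append(index)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption of this sketch): place the largest
    # groups first, each into the currently smallest fold.
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])
    return folds


# Hypothetical toy data: cluster ids and source databases for 8 compounds.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
databases = ["A", "A", "A", "B", "B", "B", "C", "C"]

by_random = random_folds(len(clusters), n_folds=2)
by_cluster = grouped_folds(clusters, n_folds=2)
by_database = grouped_folds(databases, n_folds=2)
by_both = grouped_folds(list(zip(clusters, databases)), n_folds=2)
```

With grouped folds, no cluster (or database) contributes compounds to both the training and the held-out fold, which is what makes the evaluation more restrictive than random splitting.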

Keywords

Deep Learning

protein-ligand binding

machine learning

chemotype bias

cross validation strategies

Supplementary materials

Title: supporting-info-cv-alopez


Now Published

Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

Angela Lopez-del Rio, Alfons Nonell-Canals, David Vidal, Alexandre Perera-Lluna

Journal article

Journal of Chemical Information and Modeling, Volume 59, Issue 4

Online publication date: Feb 07, 2019

Version History

Sep 27, 2018 Version 1

Version Notes

First version

Metrics

Views: 3,652

Downloads: 729

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv.7133885.v1

Funding

This research was partially supported by an Industrial Doctorate grant from the Generalitat of Catalonia to A.L.-d.R. (DI 2016-080). This work was also supported in part within the framework of the Ministerio de Economía, Industria y Competitividad (MINECO) with grants TEC2014–60337–R and TEC2017 DPI2017-89827-R, and the Centro de Investigación Biomédica en Red (CIBER) of Bioengineering, Biomaterials and Nanomedicine, an initiative of the Instituto de Salud Carlos III (ISCIII).

Author’s competing interest statement

ALR, ANC and DV are affiliated with Mind the Byte SL, a company that develops and provides solutions for computational drug discovery using Big Data and Artificial Intelligence approaches.
