Debiasing Algorithms for Protein Ligand Binding Data do not Improve Generalisation

Vikram Sundar; Lucy Colwell

doi:10.26434/chemrxiv.8139194.v1

Theoretical and Computational Chemistry

17 May 2019, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The structured nature of chemical data means machine learning models trained to predict protein-ligand binding risk overfitting the data, impairing their ability to generalise and make accurate predictions for novel candidate ligands. To address this limitation, data debiasing algorithms systematically partition the data to reduce bias. When models are trained using debiased data splits, the reward for simply memorising the training data is reduced, suggesting that the ability of the model to make accurate predictions for novel candidate ligands will improve. To test this hypothesis, we use distance-based data splits to measure how well a model can generalise. We first confirm that models perform better for randomly split held-out sets than for distant held-out sets. We then debias the data and find, surprisingly, that debiasing typically reduces the ability of models to make accurate predictions for distant held-out test sets. These results suggest that debiasing reduces the information available to a model, impairing its ability to generalise.
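The abstract contrasts randomly split held-out sets with distance-based ("distant") held-out sets. The minimal Python sketch below illustrates that contrast under several stated assumptions that are not taken from the paper. It assumes ligands are represented as sets of fingerprint bit indices, uses Tanimoto distance, and builds the distant split by holding out whole clusters from a simple greedy leader clustering. The function names, the clustering scheme and the toy data are all hypothetical. This is not the authors' splitting procedure, and it is not the debiasing algorithm evaluated in the paper.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a ligand is represented by the set of its "on" fingerprint bits.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |A & B| / |A | B|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def leader_clusters(fps: Sequence[Fingerprint], threshold: float) -> List[List[int]]:
    """Greedy leader clustering (illustrative choice): each ligand joins the first
    cluster whose leader lies within `threshold`, otherwise it starts a new cluster."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto_distance(fp, fps[lead]) <= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


def random_split(n: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices and hold out a random fraction; returns (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * n))
    return idx[n_test:], idx[:n_test]


def cluster_split(fps: Sequence[Fingerprint], test_fraction: float, threshold: float,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hold out whole clusters until roughly `test_fraction` of ligands are in the
    test set, so held-out ligands tend to lack close analogues in training."""
    clusters = leader_clusters(fps, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(fps)
    test: List[int] = []
    for cluster in clusters:
        if len(test) >= target:
            break
        test.extend(cluster)
    held_out = set(test)
    train = [i for i in range(len(fps)) if i not in held_out]
    return train, test


def mean_nn_distance(fps: Sequence[Fingerprint], train: Sequence[int],
                     test: Sequence[int]) -> float:
    """Average, over test ligands, of the distance to the nearest training ligand."""
    if not train or not test:
        raise ValueError("train and test must both be non-empty")
    total = sum(min(tanimoto_distance(fps[t], fps[r]) for r in train) for t in test)
    return total / len(test)


def make_toy_fingerprints(n_families: int = 4, per_family: int = 5) -> List[Fingerprint]:
    """Hypothetical toy data: members of a family share 8 scaffold bits and each
    adds one unique bit, so within-family distance is 0.2 and between-family 1.0."""
    fps: List[Fingerprint] = []
    for k in range(n_families):
        scaffold = set(range(10 * k, 10 * k + 8))
        for j in range(per_family):
            fps.append(frozenset(scaffold | {1000 + 10 * k + j}))
    return fps
```

On the toy data, `cluster_split(make_toy_fingerprints(), 0.25, 0.4)` holds out one entire family. As a result, every held-out ligand's nearest training neighbour lies at distance 1.0. A random split of the same size usually leaves family members in the training set, which gives a smaller mean nearest-neighbour distance. This gap is the kind of difference between random and distant held-out sets that the abstract measures.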

Keywords

Protein Ligand Binding

Now Published

The Effect of Debiasing Protein–Ligand Binding Data on Generalization

Vikram Sundar, Lucy Colwell (journal article)

Journal of Chemical Information and Modeling, Volume 60, Issue 1

Online publication date: Dec 11, 2019

Version History

May 17, 2019 Version 1

Metrics

Views: 3,162

Downloads: 769

License

The content is available under CC BY-NC-ND 4.0.

Author’s competing interest statement

No conflict of interest.