Comparison of Structure- and Ligand-Based Scoring Functions for Deep Generative Models: A GPCR Case Study

Morgan Thomas; Rob Smith; Noel M. O’Boyle; Chris de Graaf; Andreas Bender

doi:10.26434/chemrxiv.14138147.v1

Theoretical and Computational Chemistry

02 March 2021, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets and neglects targets for which too little data is available to train a predictor. Moreover, ligand-based models often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess the use of molecular docking with Glide, a structure-based approach, as a scoring function to guide the deep generative model REINVENT, and we compare model performance and behaviour to those obtained with a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. We find that, when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, the generated molecules occupy chemical and physicochemical space complementary to that of the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, information that is only available when the protein structure is taken into account.
Overall, this work demonstrates the advantage of using molecular docking rather than ligand-based predictors to guide de novo molecule generation, with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. Practically, this approach has applications in early hit generation campaigns, to enrich a virtual library towards a particular target, and in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.
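For readers unfamiliar with the "commonly used internal diversity metric" mentioned above, the following is a minimal sketch of one common formulation: one minus the mean pairwise Tanimoto similarity of a set of fingerprints. It is not the new metric proposed in this work, which the abstract does not define. The choice to average over distinct pairs only, the representation of fingerprints as sets of on-bit indices, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints) -> float:
    """One minus the mean Tanimoto similarity over all distinct pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Toy fingerprints: hypothetical on-bit indices, not derived from real molecules.
toy = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}), frozenset({7, 8})]
print(internal_diversity(toy))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the sketch only shows the arithmetic of the metric.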

Keywords

Artificial intelligence

Structure-based drug design

Ligand-based drug design

Generative models

Deep learning

Recurrent neural network

Docking

De novo

Supplementary materials

Supporting information

Additional file 2

data

Now Published

Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study

Morgan Thomas, Robert T. Smith, Noel M. O’Boyle, Chris de Graaf, Andreas Bender

Journal of Cheminformatics, Volume 13, Issue 1

Online publication date: May 13, 2021

Version History

Mar 02, 2021 Version 1

License

The content is available under CC BY NC ND 4.0

Author’s competing interest statement

No conflict of interest
