How to make machine learning scoring functions competitive with FEP

24 June 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning offers a promising approach for fast and accurate binding affinity predictions. However, current models often fail to generalise beyond their training data and are not robustly evaluated on a diverse range of benchmarks, limiting their application in drug discovery projects. In this work, we address these issues by introducing a novel graph neural network model called AEV-PLIG (Atomic Environment Vector - Protein Ligand Interaction Graph), which encodes protein-ligand interactions via atomic environment vectors to improve generalisation. We evaluate our model on improved benchmarks, including our new out-of-distribution test set we call OOD Test, and two alternative benchmark systems used for free energy perturbation (FEP) calculations, and highlight competitive performance of AEV-PLIG across the board. Moreover, we demonstrate how augmented data can be leveraged to enhance prediction accuracy, and how enriching the training data with three complexes from a congeneric series of ligands binding to a target of interest improves performance further. Altogether, we show that these strategies improve the applicability of machine learning scoring functions and enable state-of-the-art performance nearing the accuracy of physics-based simulation methods, but at a fraction of their computational cost. This practical approach extends the predictive capabilities of machine learning for molecular discovery, paving the way for its broader use in computer-aided drug design.

Keywords

prediction
affinity
Absolute binding free energies
computer-aided drug design
protein-ligand
binding

Supplementary materials

Supporting Information: Additional supporting figures and tables.
