Enhancing Semiempirical Quantum Mechanical Scoring with Machine Learning: a new scoring function that accounts for both the enthalpic and entropic contributions to the ligand binding free energy

Thomas Evangelidis; Ilektra-Chara Giassa; Mario Lovrić

doi:10.26434/chemrxiv-2022-68n6h-v2

Theoretical and Computational Chemistry



27 December 2022, Version 2

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying hit compounds is a principal step in early-stage drug discovery. While many machine learning (ML) approaches have been proposed, in the absence of binding data, molecular docking is the most widely used option to predict binding modes and to score hundreds of thousands of compounds for binding affinity to the target protein. Docking's effectiveness depends critically on the protein-ligand (P-L) scoring function (SF), so re-scoring with more rigorous SFs is common practice. In this pilot study, we scrutinize the PM6-D3H4X/COSMO semi-empirical quantum mechanical (SQM) method as a docking pose re-scoring tool on 17 diverse receptors and ligand decoy sets, totaling 1.5 million P-L complexes. We investigate the effect of explicitly computed ligand conformational entropy and ligand deformation energy on SQM P-L scoring in a virtual screening (VS) setting, as well as the effect of molecular mechanics (MM) versus hybrid SQM/MM structure optimization prior to re-scoring. Our results indicate that there is no obvious benefit from computing ligand conformational entropies or deformation energies, and that optimizing only the ligand's geometry at the SQM level is sufficient to achieve the best possible scores. Instead, we leverage ML to implicitly incorporate the missing entropy terms into the SQM score using ligand topology, physicochemical, and P-L interaction descriptors. Our new hybrid scoring function, named SQM-ML, is transparent and explainable, and achieves on average 9% higher AUC-ROC than PM6-D3H4X/COSMO and 3% higher than Glide SP. Moreover, its performance is consistent and predictable across all test sets, unlike that of the former two SFs, which is considerably target-dependent and sometimes resembles that of a random classifier. The code to prepare and train SQM-ML models is available at https://github.com/tevang/sqm-ml.git, and we believe that it will pave the way for a new generation of hybrid SQM/ML protein-ligand scoring functions.
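The scoring functions above are compared by AUC-ROC, the area under the receiver operating characteristic curve for separating actives from decoys; a value of 0.5 corresponds to a random classifier and 1.0 to perfect separation. As a minimal sketch (not taken from the SQM-ML repository), AUC-ROC can be computed from re-scoring results with the rank-sum (Mann-Whitney U) formulation. It assumes binary labels (1 = active, 0 = decoy) and, by default, that more negative scores indicate stronger predicted binding; the function name `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int],
            lower_is_better: bool = True) -> float:
    """AUC-ROC via the rank-sum (Mann-Whitney U) statistic.

    Equals the probability that a randomly chosen active is ranked ahead of a
    randomly chosen decoy, with ties counted as one half.
    Assumes labels are binary: 1 = active, 0 = decoy.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Flip the sign if needed so that a larger value always means "more likely to bind".
    ranked = [-s if lower_is_better else s for s in scores]
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active (1) and one decoy (0)")

    # Assign 1-based ranks in ascending order, averaging ranks within tied blocks.
    order = sorted(range(len(ranked)), key=lambda i: ranked[i])
    ranks = [0.0] * len(ranked)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and ranked[order[j + 1]] == ranked[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # U statistic for the actives, normalized by the number of active/decoy pairs.
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Made-up scores (more negative = better); not data from the paper.
    toy_scores = [-30.0, -20.0, -25.0, -10.0]
    toy_labels = [1, 1, 0, 0]
    print(roc_auc(toy_scores, toy_labels))  # 0.75
```

Because the statistic depends only on the ordering of the scores, the same function applies to any SF once its sign convention is set through `lower_is_better`, and sorting keeps the cost at O(n log n) for large decoy sets.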

Keywords

computer-aided drug design

structure-based drug design

drug discovery

cheminformatics

chemoinformatics

artificial intelligence


quantum mechanics

free energy

protein-ligand binding

ligand

virtual screening

Supplementary weblinks

SQM-ML repository: Open source code to process SQM output, compute and prepare features for the training of SQM-ML, evaluate the results, plot them and explain the ML model, along with the final SQM-ML model for production runs.


Version History

Dec 27, 2022 Version 2

Dec 23, 2022 Version 1

Version Notes

Removed LaTeX symbols from the abstract.


License

The content is available under CC BY 4.0


Funding

Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences

Research Project RVO: 61388963

Operational Programme Research, Development and Education (OP RDE)

Project: ‘Chemical Biology for Drugging Undruggable Targets (ChemBioDrug)’ (No. CZ.02.1.01/0.0/0.0/16_019/0000729)

Czech Academy of Sciences

Programme of support for promising human resources - postdoctoral fellows (PPLZ)

Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations

IT4Innovations National Supercomputing Center - LM2015070

Author’s competing interest statement

The code to create the SQM-ML scoring function is open source under the MIT license. However, the raw data, the in-house version of Cuby4, and all the know-how for SQM scoring are the property of IOCB CAS. The authors renounce any economic rights to the source code and data created for the purpose of this publication.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
