Combined physics- and machine-learning-based method to identify druggable binding sites using SILCS-Hotspots

Erik Nordquist; Mingtian Zhao; Anmol Kumar; Alex MacKerell

doi:10.26434/chemrxiv-2024-hrqq9-v2

Theoretical and Computational Chemistry

20 August 2024, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying druggable binding sites on proteins is an important and challenging problem, particularly for cryptic, allosteric binding sites that may not be obvious from X-ray, cryo-EM, or predicted structures. The Site-Identification by Ligand Competitive Saturation (SILCS) method accounts for the flexibility of the target protein using all-atom molecular simulations that include various small-molecule solutes in aqueous solution. During the simulations, the combination of protein flexibility and comprehensive sampling of the water and solute spatial distributions can identify buried binding pockets that are absent in experimentally determined structures. Previously, we reported a method for leveraging the information in the SILCS sampling to identify binding sites (termed Hotspots) of small mono- or bicyclic compounds, a subset of which coincide with known binding sites of drug-like molecules. Here we build on that physics-based approach and present a machine learning (ML) model for ranking the Hotspots according to the likelihood that they can accommodate drug-like molecules (e.g. molecular weight > 200 daltons). In the independent validation set, which includes various enzymes and receptors, our model recalls 67% and 89% of experimentally validated ligand binding sites in the top 10 and top 20 ranked Hotspots, respectively. Furthermore, we show that the model’s output Decision Function is a useful metric for predicting binding sites and their potential druggability in new targets. Given the utility of the SILCS method for ligand discovery and optimization, the tools presented represent an important advancement in the identification of orthosteric and allosteric binding sites and the discovery of drug-like molecules targeting those sites.
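The ranking and recall figures quoted in the abstract can be illustrated with a minimal sketch. It assumes each Hotspot already has a classifier score (standing in for the Decision Function) and that each known ligand site has been mapped to the Hotspots that overlap it. The function names, Hotspot IDs, site labels, and scores below are all hypothetical, and the criterion the authors use to match a Hotspot to a ligand site is not stated in the abstract, so this shows only one plausible definition of top-k recall.

```python
from typing import Dict, List, Set


def rank_hotspots(scores: Dict[str, float]) -> List[str]:
    """Return Hotspot IDs sorted from highest to lowest classifier score."""
    return sorted(scores, key=lambda hotspot: scores[hotspot], reverse=True)


def recall_at_k(ranked: List[str],
                site_to_hotspots: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of known sites with at least one matching Hotspot in the top k.

    Assumption: a site counts as recalled if any Hotspot mapped to it
    appears among the k highest-ranked Hotspots.
    """
    if not site_to_hotspots:
        return 0.0
    top_k = set(ranked[:k])
    hits = sum(1 for hotspots in site_to_hotspots.values() if hotspots & top_k)
    return hits / len(site_to_hotspots)


# Hypothetical scores for five Hotspots (higher = more likely druggable).
scores = {"H1": 1.8, "H2": -0.4, "H3": 0.9, "H4": -1.2, "H5": 0.1}

# Hypothetical mapping from known ligand sites to overlapping Hotspots.
known_sites = {"siteA": {"H3"}, "siteB": {"H4"}, "siteC": {"H2", "H5"}}

ranked = rank_hotspots(scores)
for k in (2, 3, 5):
    print(f"recall@{k} = {recall_at_k(ranked, known_sites, k):.2f}")
```

Counting recall per site rather than per Hotspot means that a site covered by several Hotspots is credited only once, which seems the natural reading of "recalls 67% and 89% of experimentally validated ligand binding sites", though the paper may define it differently.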

Keywords

Site identification by ligand competitive saturation

protein-ligand interaction

orthosteric

allosteric

computer-aided drug design

CADD

binding site prediction

Supplementary materials

Supporting information.

Figure S1: Surface-exposed Hotspot 25 in ERK5.
Figure S2: Distribution of Hotspot SASA by protein system.
Figure S3: Analysis of the recursive feature elimination and the top two principal components (PCs) of the training set.
Figure S4: Ranking based on mean LGFE of each Hotspot.
Figure S5: Burial of allosteric binding site between GABABR Active TM domains.
Figure S6: CryptoSite predictions for NKG2D (A) and TEM-1 (B).
Table S1: List of proteins and ligands used for methods validation.
Table S2: Training and validation set Hotspots and ligand distances.
Table S3: Stratified 5-fold cross-validation training of higher-order SVM classifier with polynomial or radial basis function kernels and a Random Forest model.
Table S4: FDA compound screening for selected Hotspots of TEM-1 and GABABR Active.

This manuscript has been published in the Journal of Chemical Information and Modeling, doi: 10.1021/acs.jcim.4c01189

Version History

Aug 20, 2024 Version 2

Apr 25, 2024 Version 1

Version Notes

Minor revisions prior to resubmission.

Metrics

Views: 862

Downloads: 561

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2024-hrqq9-v2

Funding

NIH Office of the Director: GM131710

NIH Office of the Director: T32CA154274

Author’s competing interest statement

A.D.M. Jr. is co-founder and Chief Scientific Officer of SilcsBio, LLC.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
