Active site sequence representation of human kinases outperforms full sequence for affinity prediction and inhibitor generation: 3D effects in a 1D model

Jannis Born; Tien Huynh; Astrid Stroobants; Wendy Cornell; Matteo Manica

doi:10.26434/chemrxiv-2021-np7xj-v4

Biological and Medicinal Chemistry

29 October 2021, Version 4

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome, with its abundant sequence and inhibitor data, presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but also consistently yields significantly higher performance than the full primary structure. This trend persists across different models, datasets, and performance metrics, and holds true when predicting affinity for both unseen ligands and unseen kinases. Our interpretability analysis further demonstrates that, even without supervision, the full sequence model can learn to focus on the active site residues to a greater extent. We then investigate a de novo molecular design task and find that the active site representation improves computational efficiency, but otherwise both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases. We hope that these results will inspire additional investigation into hybrid mechanistic-DL modeling approaches to support the identification and optimization of kinase inhibitor candidates.

Keywords

kinase representation

Supplementary weblinks

GitHub repository: Code to reproduce the experiments for "Active site sequence representation of human kinases outperforms full sequence for affinity prediction and inhibitor generation: 3D effects in a 1D model"

Now Published

Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model

Jannis Born, Tien Huynh, Astrid Stroobants, Wendy D. Cornell, Matteo Manica

Journal article: Journal of Chemical Information and Modeling, Volume 62, Issue 2

Online publication date: Dec 14, 2021

Version History

Oct 29, 2021 Version 4

Jul 26, 2021 Version 3

Jul 20, 2021 Version 2

Jul 14, 2021 Version 1

Version Notes

Extended validation of the predictive models. Added comparisons with a state-of-the-art (SOTA) model. General improvements to the presentation of the methods and results.

Metrics

Views: 2,594

Downloads: 919

License

The content is available under CC BY NC 4.0

DOI

10.26434/chemrxiv-2021-np7xj-v4

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
