Abstract
Modern drug discovery pipelines rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structure-based models are limited by the scarcity of available protein-ligand complex structures. Using kinase drug discovery as an example, we address this issue by generating kinase-ligand complex data via template docking for the kinase compound subset of the available ChEMBL assay data. To evaluate the benefit of the generated complex data, we use it to train a structure-based E(3)-invariant graph neural network (GNN). Our evaluation shows that models that take the synthetic binding poses into account predict binding affinities with significantly higher precision than ligand-only or drug-target interaction (DTI) models.
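As a brief illustration of what E(3) invariance means for a structure-based model, the sketch below checks that pairwise interatomic distances are unchanged when a set of 3D coordinates is rotated, reflected, and translated. It is not the network or the featurization used in this work; the coordinates and the helper names (`pairwise_distances`, `rotate_z`, `reflect_x`, `translate`) are hypothetical and chosen only for this example.

```python
import math

def pairwise_distances(coords):
    """Return the matrix of Euclidean distances between all points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def reflect_x(point):
    """Mirror a 3D point through the yz-plane."""
    x, y, z = point
    return (-x, y, z)

def translate(point, shift):
    """Shift a 3D point by a constant vector."""
    return tuple(p + s for p, s in zip(point, shift))

# Hypothetical atom coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (-0.7, 2.1, -1.0)]

# Apply an arbitrary E(3) transformation: rotation, reflection, translation.
moved = [translate(reflect_x(rotate_z(p, 0.8)), (3.0, -2.0, 5.0)) for p in atoms]

before = pairwise_distances(atoms)
after = pairwise_distances(moved)

# Largest change in any pairwise distance; only floating-point noise remains.
max_diff = max(abs(b - a) for rb, ra in zip(before, after) for b, a in zip(rb, ra))
is_invariant = max_diff < 1e-9
```

A model whose inputs depend on the coordinates only through such quantities returns the same prediction regardless of how a docked complex is oriented in space.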
Supplementary weblinks
Title
Raw kinodata-3D dataset
Description
A Zenodo record holding the raw kinase-ligand complex data we generated, including ligand structures, poses, KLIFS pocket structures, and ChEMBL bioactivity measurements.
Title
Preprocessed kinodata-3D for PyTorch Geometric & kinase affinity prediction models
Description
A Zenodo record holding dataset and model artifacts that can be used with our published code.
Title
Binding affinity prediction case study
Description
The code used to carry out our binding affinity prediction case study.
Title
Kinodata-3D data generation pipeline
Description
The code used to generate the kinodata-3D dataset.