A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds

A. Lina Heinzke; Barbara Zdrazil; Paul D. Leeson; Robert J. Young; Axel Pahl; Herbert Waldmann; Andrew R. Leach

doi:10.26434/chemrxiv-2024-vj70m-v2

Theoretical and Computational Chemistry

11 March 2024, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets of drug discovery programs are not openly available. This work introduces a dataset of compound-target pairs extracted from the open-source bioactivity database ChEMBL (release 32). Compound-target pairs in the dataset either have at least one measured activity or are part of the manually curated set of known interactions in ChEMBL. Known interactions between drugs or clinical candidates and targets are specifically annotated to facilitate analyses on differences between drugs, clinical candidates, and other active compounds. In total, the dataset comprises 614,594 compound-target pairs, 5,109 (3,932) of which are known interactions between drugs (clinical candidates) and targets. The extraction is performed in an automated manner and fully reproducible. We are providing not only the datasets but also the code to rerun the analyses with other ChEMBL releases.
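The inclusion rule described in the abstract (a pair is kept if it has at least one measured activity or is a curated known interaction, and known drug or clinical-candidate interactions are annotated) can be illustrated with a minimal sketch. This is not the authors' extraction code: the function name `build_pairs`, the label strings, and the toy identifiers are assumptions made for illustration only, and no ChEMBL tables are queried.

```python
from collections import Counter

def build_pairs(measured_activities, known_interactions):
    """Combine measured and curated compound-target pairs (illustrative only).

    measured_activities: iterable of (compound_id, target_id) tuples, one per
        measured activity; a pair may appear several times.
    known_interactions: dict mapping (compound_id, target_id) to a
        hypothetical label, "drug" or "clinical_candidate".
    Returns a dict mapping each unique pair to its annotation.
    """
    pairs = {}
    # Every pair with at least one measured activity is included once.
    for compound_id, target_id in measured_activities:
        pairs[(compound_id, target_id)] = "other"
    # Curated known interactions are included even without a measurement,
    # and their annotation overrides the default label.
    for pair, label in known_interactions.items():
        pairs[pair] = label
    return pairs

# Toy data with made-up identifiers (not real ChEMBL IDs).
measured = [("C1", "T1"), ("C1", "T1"), ("C2", "T1"), ("C3", "T2")]
known = {("C2", "T1"): "drug", ("C4", "T3"): "clinical_candidate"}

pairs = build_pairs(measured, known)
label_counts = Counter(pairs.values())
print(len(pairs), dict(label_counts))
```

In this toy case the pair ("C4", "T3") is retained although it has no measured activity, which mirrors the "either ... or" condition in the abstract.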

Keywords

drug-target-interactions

Supplementary weblinks

Dataset: The dataset and subsets for all ChEMBL versions from version 26 to 33.

Code (Zenodo): The code used to generate the dataset.

Code (GitHub): The code on GitHub.

Version History

Mar 11, 2024 Version 2

Mar 06, 2024 Version 1

Version Notes

The title was updated. Multi-part figures were combined into single images and the image quality was improved.

Metrics

Views: 1,709

Downloads: 828

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv-2024-vj70m-v2

Funding

Wellcome Trust

104104/A/14/Z, 218244/Z/19/Z

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
