Scalable Drug Property Prediction via Automated Machine Learning

Xinqi Li; Sergio Pascual-Diaz; Calum Hand; Rory Garland; Waseem Abbas; Faiz Khan; Nikhil Das; Vedant Desai; Mohamed AbouZleikha; Matthew Clark

doi:10.26434/chemrxiv-2025-2p12l

Theoretical and Computational Chemistry

17 January 2025, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The integration of artificial intelligence technologies into pharmaceutical research is crucial for gaining an early understanding of molecular properties, thereby facilitating successful drug design. Constructing a machine learning (ML) model, however, requires knowledge spanning data preprocessing, feature engineering, and model fine-tuning, which makes it difficult for chemists to use ML tools effectively for drug property prediction. This paper introduces a model training engine (MTE), a scalable automated ML pipeline that supports end-to-end \textit{in silico} drug property prediction. To accelerate training, a parallelized model fine-tuning scheme is developed for model optimization and selection, reducing the time complexity from $\mathcal{O}(n \times k)$ to $\mathcal{O}(n + k^2)$, where $k > 1$ and $k^2$ is much smaller than $n$. The MTE is benchmarked against five state-of-the-art models on twenty-two Therapeutic Data Commons ADMET datasets. The experimental results demonstrate the effectiveness and robustness of the MTE across diverse molecular property prediction tasks.
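To give a feel for the size of the reduction stated in the abstract, the two cost expressions can be compared numerically. The sketch below is only an illustration: the abstract does not define $n$ and $k$, so they are treated here as abstract positive counts, constant factors are ignored, and the function names and sample values are hypothetical rather than taken from the paper.

```python
def sequential_cost(n: int, k: int) -> int:
    """Cost model n * k, the expression quoted before parallelization."""
    return n * k


def parallel_cost(n: int, k: int) -> int:
    """Cost model n + k^2, the expression quoted for the parallelized scheme."""
    return n + k ** 2


def speedup(n: int, k: int) -> float:
    """Ratio of the two cost models (constant factors ignored)."""
    return sequential_cost(n, k) / parallel_cost(n, k)


if __name__ == "__main__":
    # Hypothetical values chosen so that k > 1 and k^2 is much smaller than n.
    for n, k in [(1_000, 5), (10_000, 10), (100_000, 20)]:
        print(n, k, sequential_cost(n, k), parallel_cost(n, k),
              round(speedup(n, k), 2))
```

Under this simple model the ratio approaches $k$ as $k^2 / n$ shrinks, which is the regime the abstract describes ($k^2$ much smaller than $n$).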

Keywords

Machine Learning

ADMET

Hyperparameter Optimisation

Distributed Computing

Supplementary materials

Title: Supplementary Information for Scalable Drug Property Prediction via Automated Machine Learning

Description: Additional information for main text including additional experiments

Supplementary weblinks

Title: GitHub

Description: Open source code

Version History

Jan 20, 2025 Version 2

Jan 17, 2025 Version 1

Metrics

Views: 662

Downloads: 315

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
