Conformal Prediction-based Machine Learning in Cheminformatics: Current Applications and New Challenges

Mario Astigarraga; Andrés Sánchez-Ruiz; Gonzalo Colmenarejo

doi:10.26434/chemrxiv-2025-p36vt

Theoretical and Computational Chemistry

29 January 2025, Version 1

Review

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Conformal Prediction (CP) is a distribution-free Machine Learning (ML) framework, developed over the last ~25 years, that provides well-calibrated prediction subsets/intervals containing the true label with a user-defined probability, requiring only data exchangeability. It is based on the nonconformity (or dissimilarity) of a new instance relative to previous data and their predictions, so that the prediction subset/interval is larger for new “unusual” instances and smaller for “typical” ones. Given its simplicity and ease of application, since 2012 CP has been widely adopted in Cheminformatics, especially in Quantitative Structure-Activity Relationship (QSAR) modeling and Molecular Screening. This rapid popularization of CP in Cheminformatics can be explained by three features: a) it handles the applicability domain (AD) issue of ML models, which is of great importance in Cheminformatics given the immense size of chemical space; b) it deals with the classification of the heavily imbalanced datasets typical of Molecular Screening; and c) it quantifies compound-specific prediction uncertainties, which is especially useful because it enables gain-cost strategies that accelerate drug discovery by reducing the number of compounds to test. This comprehensive review introduces the method, provides a full appraisal of the work done in Cheminformatics (with special emphasis on the QSAR and Molecular Screening arenas), and discusses its pros and cons and new challenges, especially for Deep Learning applications and nonexchangeable datasets, a very frequent situation in Cheminformatics.
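
To make the nonconformity idea concrete, the following is a minimal sketch of an inductive (split) conformal classifier in its Mondrian, or class-conditional, form, which is one common way of handling imbalanced screening data. It assumes that nonconformity scores have already been produced by some underlying model (for example, 1 minus the predicted probability of a label) on a calibration set held out from training. The function names `mondrian_p_values` and `prediction_set` and all scores are hypothetical and illustrative; this is not the review's own implementation.

```python
from typing import Dict, List, Sequence, Tuple


def mondrian_p_values(
    cal_scores: Sequence[Tuple[float, str]],
    test_scores: Dict[str, float],
) -> Dict[str, float]:
    """Class-conditional (Mondrian) conformal p-value for each candidate label.

    cal_scores: (nonconformity score, true label) pairs from a calibration set
        held out from model training.
    test_scores: nonconformity score of the new compound under each candidate
        label, e.g. 1 - P(label | compound) from any underlying classifier
        (an assumption; any nonconformity measure can be plugged in).
    """
    p_values = {}
    for label, alpha in test_scores.items():
        # Compare only with calibration compounds of the same class, so that
        # validity holds per class even when actives are rare.
        same_class = [s for s, y in cal_scores if y == label]
        n_at_least = sum(1 for s in same_class if s >= alpha)
        # The +1 terms count the new compound itself (conservative p-value).
        p_values[label] = (n_at_least + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], epsilon: float) -> List[str]:
    """Labels whose p-value exceeds the significance level epsilon.

    If calibration and test compounds are exchangeable within each class,
    the true label is left out with probability at most epsilon; the set
    may contain one label, both labels, or none.
    """
    return sorted(label for label, p in p_values.items() if p > epsilon)


# Hypothetical calibration scores: 9 inactives and 5 actives (imbalanced).
cal = [(s, "inactive") for s in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45)]
cal += [(s, "active") for s in (0.10, 0.20, 0.30, 0.40, 0.50)]

# A "typical" compound that the model confidently scores as inactive.
typical = mondrian_p_values(cal, {"active": 0.90, "inactive": 0.10})
print(prediction_set(typical, epsilon=0.2))  # ['inactive']

# A less typical compound that fits both classes moderately well.
unusual = mondrian_p_values(cal, {"active": 0.45, "inactive": 0.38})
print(prediction_set(unusual, epsilon=0.2))  # ['active', 'inactive']
```

The typical compound receives a single-label set, while the less typical one receives the larger two-label set, mirroring the behavior described in the abstract. Note that with only five active calibration compounds the smallest attainable active p-value is 1/6, so significance levels below about 0.17 could never exclude the active label; the size of the minority calibration class limits how small ε can usefully be.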

Keywords

conformal prediction

cheminformatics

quantitative structure-activity relationship

Final published version: Astigarraga, M.; Sánchez-Ruiz, A.; Colmenarejo, G. “Conformal Prediction-based AI in Cheminformatics: Current Applications and New Challenges”. Artificial Intelligence in the Life Sciences (2025) 7, 100127. DOI: 10.1016/j.ailsci.2025.100127

Metrics

695 views; 284 downloads

License

The content is available under CC BY NC ND 4.0

Funding

MCIN/AEI/10.13039/501100011033 and “ERDF A way of making Europe” (PID2021-127318OB-I00)

Consejería de Ciencia, Universidades e Innovación de la Comunidad de Madrid, Spain (PEJ-2020-AI/BIO-17904)

Consejería de Ciencia, Universidades e Innovación de la Comunidad de Madrid, Spain (PIPF-2022/SAL-GL-26278)

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
