Real-World Molecular Out-Of-Distribution: Specification and Investigation

Prudencio Tossou; Cas Wognum; Michael Craig; Hadrien Mary; Emmanuel Noutahi

doi:10.26434/chemrxiv-2023-q11q4-v2

Theoretical and Computational Chemistry

26 July 2023, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This study presents a rigorous framework for investigating Molecular Out-Of-Distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration to drop by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection and dataset characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improve performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful, algorithmic progress in molecular scoring. All related code can be found on GitHub at https://github.com/valence-labs/mood-experiments.
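
The idea of characterizing a covariate shift by the distribution of sample distances to the training set can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which molecular representation or distance metric is used, so the Tanimoto (Jaccard) distance on fingerprints stored as sets of on-bit indices, the function names, and the toy data below are all assumptions made for illustration.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto (Jaccard) distance between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def distances_to_train(train_fps, query_fps):
    """For each query fingerprint, return its distance to the nearest training fingerprint.

    Assumes train_fps is non-empty; min() raises ValueError otherwise.
    """
    return [min(tanimoto_distance(q, t) for t in train_fps) for q in query_fps]


# Hypothetical toy fingerprints (sets of on-bit indices), not real molecules.
train_fps = [{0, 1}, {1, 2}]
test_fps = [{0, 1}, {1, 2, 3}]    # held-out set that stays close to the training set
deploy_fps = [{3}, {2, 3, 4}]     # "deployment" set that lies farther away

test_dist = distances_to_train(train_fps, test_fps)
deploy_dist = distances_to_train(train_fps, deploy_fps)

# Comparing the two distance distributions (here only via their means)
# gives a rough picture of how far deployment drifts from testing.
print("mean test distance:  ", sum(test_dist) / len(test_dist))
print("mean deploy distance:", sum(deploy_dist) / len(deploy_dist))
```

In this toy setting the deployment samples sit farther from the training set than the test samples, which is the kind of gap between testing and deployment that a splitting protocol would aim to close. In practice one would compare full distributions (for example histograms) rather than means.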

Supplementary materials

Title: MOOD: Supplementary Material

Description: Provides a variety of additional figures to support the results from the main text.

Supplementary weblinks

Title: MOOD: Code base

Description: A GitHub repository with all the code that was used for the results in the MOOD paper.

Comments

Excellent work, although I found it a bit confusing because less known concepts are discussed before they are defined. To name a few: uncertainty calibration, covariate shifts, IID/OOD algorithms, RCT. Moving the Methods section up, after the Introduction, would greatly improve it, IMO.

Version History

Jul 26, 2023 Version 2

Jun 05, 2023 Version 1

Version Notes

Update affiliation.

Metrics

Views: 3,975

Downloads: 1,853

License

The content is available under CC BY-NC 4.0

Funding

Mitacs

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
