Transfer Learning for a Foundational Chemistry Model

Emma King-Smith

doi:10.26434/chemrxiv-2023-gnzpf

Data-driven chemistry has garnered much interest alongside improvements in hardware and the development of new machine learning models. However, a notable bottleneck for data-driven chemistry is the difficulty of obtaining sufficiently large, accurate datasets for a desired chemical outcome. Herein, I develop a machine learning framework that makes predictions in low-data regimes. First, a chemical “foundational model” is trained on a dataset of ~1 million experimental crystal structures of organic molecules, a source of big data in chemistry. A task-specific model is then stacked on top of this general model. This approach achieves state-of-the-art performance on a diverse set of tasks: toxicity prediction, yield prediction, and odor prediction. More generally, my work shows that a foundational model approach, which led to a step change in domains such as natural language, can unlock advances in chemistry.
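The two-stage idea described in the abstract (a general model trained once, then a small task-specific model stacked on top) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the encoder is a fixed random projection standing in for a pretrained foundational model, the data are synthetic, and the names `make_frozen_encoder`, `fit_ridge_head`, and `predict` are hypothetical. It does not reproduce the architecture, features, or training procedure used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_frozen_encoder(in_dim, hidden_dim, rng):
    """Placeholder for a pretrained general model (assumption: random weights).

    The weights are created once and never updated, mimicking a frozen encoder.
    """
    W = rng.normal(size=(in_dim, hidden_dim)) / np.sqrt(in_dim)

    def encode(X):
        return np.tanh(X @ W)

    return encode

def add_bias(Z):
    """Append a constant column so the head can learn an intercept."""
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def fit_ridge_head(Z, y, l2=1e-2):
    """Task-specific head: closed-form ridge regression on frozen embeddings."""
    Z1 = add_bias(Z)
    A = Z1.T @ Z1 + l2 * np.eye(Z1.shape[1])
    return np.linalg.solve(A, Z1.T @ y)

def predict(encode, w, X):
    """Run inputs through the frozen encoder, then the trained head."""
    return add_bias(encode(X)) @ w

# Synthetic low-data task: 40 examples with 16 made-up input features each.
X_small = rng.normal(size=(40, 16))
y_small = rng.normal(size=40)

encode = make_frozen_encoder(in_dim=16, hidden_dim=32, rng=rng)
w = fit_ridge_head(encode(X_small), y_small)
preds = predict(encode, w, X_small)
```

Only the head's weights `w` are fit to the small dataset; the encoder is reused unchanged, which is what lets a task with few labeled examples benefit from a model trained on a much larger corpus.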
