Leveraging Large Language Models for Predictive Chemistry

17 October 2023, Version 3
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning has revolutionized many fields and has recently found applications in chemistry and materials science. The small datasets commonly found in chemistry sparked the development of sophisticated machine-learning approaches that incorporate chemical knowledge for each application and, therefore, require much expertise to develop. Here, we show that large language models trained on vast amounts of text extracted from the internet can easily be adapted to solve various tasks in chemistry and materials science by fine-tuning them to answer chemical questions posed in natural language with the correct answer. We compared this approach with dedicated machine-learning models for many applications, ranging from properties of molecules and materials to the yield of chemical reactions. Surprisingly, this approach performs comparably to or even outperforms the conventional techniques, particularly in the low-data limit. In addition, we can perform inverse design successfully by simply inverting the questions. The high performance, especially for small datasets, combined with the ease of use, can fundamentally impact how we leverage machine learning in the chemical and materials sciences. Next to a literature search, querying a foundation model might become a routine way to bootstrap a project by leveraging the collective knowledge encoded in these foundation models or to provide a baseline for predictive tasks.
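As a minimal sketch of what fine-tuning data for "chemical questions in natural language" and for "inverting the questions" might look like, the Python snippet below builds forward (property prediction) and inverse (design) examples from the same records. The question templates, the helper names (`forward_example`, `inverse_example`, `to_jsonl`), the prompt/completion JSONL layout, and the coarse toy labels are all assumptions made for illustration; they are not taken from the preprint and do not reproduce the authors' data or format.

```python
import json

# Hypothetical toy records: (SMILES string, property name, coarse label).
# These are illustrative placeholders, not data from the preprint.
RECORDS = [
    ("CCO", "water solubility", "high"),       # ethanol
    ("c1ccccc1", "water solubility", "low"),   # benzene
]


def forward_example(smiles, prop, label):
    """Property prediction: the question names the molecule, the answer is the label."""
    return {"prompt": f"What is the {prop} of {smiles}?", "completion": str(label)}


def inverse_example(smiles, prop, label):
    """Inverse design: the question names the target label, the answer is a molecule."""
    return {"prompt": f"What is a molecule with {prop} {label}?", "completion": smiles}


def to_jsonl(examples):
    """Serialize examples as one JSON object per line (an assumed fine-tuning layout)."""
    return "\n".join(json.dumps(example) for example in examples)


forward = [forward_example(*record) for record in RECORDS]
inverse = [inverse_example(*record) for record in RECORDS]

if __name__ == "__main__":
    print(to_jsonl(forward))
    print(to_jsonl(inverse))
```

The same records feed both directions; only the roles of question and answer are swapped, which is the sense in which the inverse task is obtained by inverting the question.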

Keywords

GPT-3
LLM
Data Science
Machine Learning

Supplementary materials

Supplementary Information: Additional experiments and information.
