Abstract
Machine learning has revolutionized many fields and has recently found applications in chemistry and materials science. The small datasets commonly found in chemistry sparked the development of sophisticated machine-learning approaches that incorporate chemical knowledge for each application and, therefore, require considerable expertise to develop. Here, we show that large language models trained on vast amounts of text extracted from the internet can easily be adapted to solve various tasks in chemistry and materials science by fine-tuning them to answer chemical questions, posed in natural language, with the correct answer. We compared this approach with dedicated machine-learning models for many applications, ranging from the properties of molecules and materials to the yields of chemical reactions. Surprisingly, this approach performs comparably to or even outperforms the conventional techniques, particularly in the low-data limit. In addition, we can perform inverse design successfully by simply inverting the questions. The high performance, especially for small datasets, combined with the ease of use, can fundamentally impact how we leverage machine learning in the chemical and materials sciences. Alongside a literature search, querying a foundation model might become a routine way to bootstrap a project by leveraging the collective knowledge encoded in these foundation models, or to provide a baseline for predictive tasks.
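To make the question-and-answer framing concrete, the following minimal sketch shows how one labeled data point could be turned into a fine-tuning example, and how the same data point yields an inverse-design example when the question is inverted. The question templates, the `Record` type, the function names, and the example values are illustrative assumptions, not the exact format used in this work.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One labeled data point (hypothetical structure for illustration)."""
    representation: str  # text representation of the molecule or material, e.g. SMILES
    label: str           # property value or class, written as text


def forward_example(rec: Record, property_name: str) -> dict:
    """Property prediction: ask for the property, answer with the label."""
    return {
        "prompt": f"What is the {property_name} of {rec.representation}?",
        "completion": rec.label,
    }


def inverse_example(rec: Record, property_name: str) -> dict:
    """Inverse design: invert the question, answer with the representation."""
    return {
        "prompt": f"What is a molecule with a {property_name} of {rec.label}?",
        "completion": rec.representation,
    }


# Placeholder values chosen only to show the shape of the output.
record = Record(representation="CCO", label="high")
forward = forward_example(record, "solubility class")
inverse = inverse_example(record, "solubility class")
```

Under these assumptions, the forward and inverse examples are built from the same record; only the roles of prompt and completion are swapped, which is why no separate model architecture is needed for inverse design.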
Supplementary materials
Supplementary Information: Additional experiments and information.