Abstract
The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pre-trained Large Language Models (LLMs) have emerged as potential tools for molecular design, as they appear to be capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with 97% of its outputs being valid and unique molecules. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model’s behavior under different prompting conditions. Notably, the model is able to perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple, natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.
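As an illustration of what a "valid and unique" percentage can mean, the following is a minimal sketch, not the procedure used in the paper. It assumes one common definition: the number of distinct valid molecules divided by the total number of generated strings. The names `valid_unique_fraction` and `toy_canonicalize` are hypothetical, and `toy_canonicalize` is a deliberately crude placeholder that only checks for non-empty strings with balanced parentheses. In practice a cheminformatics toolkit would take its place, for instance RDKit, where `Chem.MolFromSmiles` returns `None` for an unparsable SMILES string and `Chem.MolToSmiles` writes a canonical SMILES.

```python
from typing import Callable, Iterable, Optional


def valid_unique_fraction(
    smiles: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> float:
    """Return (# distinct valid molecules) / (# generated strings).

    `canonicalize` must return a canonical string for a valid molecule
    and None for an invalid one. This definition is an assumption made
    for illustration; the paper may define the metric differently.
    """
    seen = set()
    total = 0
    for s in smiles:
        total += 1
        canon = canonicalize(s)
        if canon is not None:
            seen.add(canon)  # duplicates collapse into one set entry
    return len(seen) / total if total else 0.0


def toy_canonicalize(s: str) -> Optional[str]:
    """Hypothetical stand-in for a real SMILES parser.

    Accepts any non-empty string with balanced parentheses and returns
    it unchanged. It performs no chemical validation and no real
    canonicalization.
    """
    s = s.strip()
    if not s:
        return None
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return s if depth == 0 else None


# Made-up example strings, not data from the paper.
generated = ["CCO", "CCO", "c1ccccc1", "CC(C", ""]
score = valid_unique_fraction(generated, toy_canonicalize)
print(f"valid and unique: {score:.0%}")
```

Passing the validity check in as a function keeps the metric independent of any particular toolkit, so the placeholder can be swapped for a real parser without changing the counting logic.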
Supplementary materials
Title
Supplementary Information (SI) for “Large Language Models as Molecular Design Engines”
Description
Supplementary Information (SI) for the paper, containing additional recorded metrics and supporting figures that accompany the primary manuscript.
Supplementary weblinks
Title
Dataset for "Large Language Models as Molecular Design Engines"
Description
Comprises the data (claude-gpt-paper.zip), the code (claude-gpt-paper-codes.zip), and a molecular viewer app (llm-visulizer-dashapp.zip) for viewing the molecules generated by the Large Language Model.
Title
GitHub Repository for the codes used in the paper "Large Language Models as Molecular Design Engines"
Description
Contains a simple Google Colab notebook (GPT_modification_just_plots.ipynb) that can be run on Google Colab to generate the figures of the paper. Instructions for running the notebook are given in its markdown cells and in the README file in the repository.