Assessment of chemistry knowledge in large language models that generate code

Andrew D. White; Glen M. Hocky; Heta A. Gandhi; Mehrad Ansari; Sam Cox; Geemi P. Wellawatte; Subarna Sasmal; Ziyue Yang; Kangxin Liu; Yuvraj Singh; Willmor J. Peña Ccoa

doi:10.26434/chemrxiv-2022-3md3n-v2

Theoretical and Computational Chemistry

12 December 2022, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate that the answer is mostly yes. To evaluate this, we produce a benchmark set of problems and evaluate these models based on the correctness of their code, as judged by automated testing and by expert evaluation. We find that recent LLMs are able to write correct code across a variety of topics in chemistry, and that their accuracy can be increased by 30 percentage points via prompt engineering strategies, such as putting copyright notices at the top of files. The dataset and evaluation tools are open source; future researchers can contribute to or build upon them, and they will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.
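The evaluation idea described in the abstract (prepend a context such as a copyright notice to a prompt, then score generated code by automated tests) can be illustrated with a minimal sketch. This is not the authors' benchmark code: the names `COPYRIGHT_CONTEXT`, `build_prompt`, `passes`, and `accuracy` are hypothetical, the context string is invented for illustration, and the completions are hard-coded rather than produced by a model.

```python
from typing import Callable, Sequence

# Hypothetical context header, invented for this sketch. The contexts
# actually used in the paper are in its supplementary "Contexts" file.
COPYRIGHT_CONTEXT = "# Copyright (c) Example Authors. MIT License.\n"


def build_prompt(task_prompt: str, context: str = "") -> str:
    """Prepend an optional context header to a task prompt."""
    return context + task_prompt


def passes(completion: str, test: Callable[[dict], bool]) -> bool:
    """Execute generated code in a fresh namespace and apply a test to it.

    A real harness should sandbox this step, because the code is untrusted.
    """
    namespace: dict = {}
    try:
        exec(completion, namespace)
        return bool(test(namespace))
    except Exception:
        # Any error while running or testing the code counts as a failure.
        return False


def accuracy(completions: Sequence[str], test: Callable[[dict], bool]) -> float:
    """Return the percentage of completions that pass the test."""
    if not completions:
        return 0.0
    return 100.0 * sum(passes(c, test) for c in completions) / len(completions)


# Stand-in completions for one task (molar mass of water in g/mol).
completions = [
    "def molar_mass_water():\n    return 2 * 1.008 + 15.999\n",
    "def molar_mass_water():\n    return 20.0\n",
]


def check(namespace: dict) -> bool:
    return abs(namespace["molar_mass_water"]() - 18.015) < 0.01


score = accuracy(completions, check)
```

Comparing `accuracy` for completions generated with and without a context gives a difference in percentage points, which is the unit the abstract uses for the prompt engineering gain.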

Keywords

deep learning

language models

large language models

prompt engineering

Supplementary materials

Contexts: Contexts used for prompt engineering.

Raw data for automated evaluation: Accuracy data for automated evaluation, used to generate figures.

Raw data for expert evaluation: Accuracy data for expert evaluation, used to generate figures.

Supporting Information: Additional figures, tables, and analysis.

Supplementary weblinks

Associated Data for Assessment of chemistry knowledge in large language models that generate code: Website of completions for the human-evaluable prompts for the Codex model, showing how completions were presented to evaluators.

Database of Prompts: All prompts and analysis code used for the paper.

Now Published

Assessment of chemistry knowledge in large language models that generate code

Andrew D. White, Glen M. Hocky, Heta A. Gandhi, Mehrad Ansari, Sam Cox, Geemi P. Wellawatte, Subarna Sasmal, Ziyue Yang, Kangxin Liu, Yuvraj Singh, Willmor J. Peña Ccoa

Journal article

Digital Discovery, Volume 2, Issue 2

Online publication date: 2023

Version History

Dec 12, 2022 Version 2

Jul 07, 2022 Version 1

Version Notes

Added additional models for comparison, expanded discussion of differences between models, and various minor clarifications.

Metrics

Views: 6,377

Downloads: 3,916

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2022-3md3n-v2

Funding

National Institutes of Health: R35GM137966

National Institutes of Health: R35GM138312

National Science Foundation: 1764415

Department of Energy: DESC0020464

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
