Machine learned calibrations to high-throughput molecular excited state calculations

Shomik Verma; Miguel Rivera; David O. Scanlon; Aron Walsh

doi:10.26434/chemrxiv-2022-08jm9-v2

Theoretical and Computational Chemistry

01 March 2022, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Understanding the excited state properties of molecules provides insights into how they interact with light. These interactions can be exploited to design compounds for photochemical applications, including enhanced spectral conversion of light to increase the efficiency of photovoltaic cells. While chemical discovery is time- and resource-intensive experimentally, computational chemistry can be used to screen large-scale databases for molecules of interest in a procedure known as high-throughput virtual screening. The first step usually involves a high-speed but low-accuracy method to screen large numbers of molecules (potentially millions) so that only the best candidates are evaluated with expensive methods. However, a coarse first-pass screening method can result in high false positive or false negative rates. Therefore, this study uses machine learning to calibrate a high-throughput technique (xTB-sTDA) against a higher-accuracy one (TD-DFT). Testing the calibration model shows a ~6-fold decrease in error in-domain and a ~3-fold decrease out-of-domain. The resulting mean absolute error of ~0.14 eV is in line with previous work in machine learning calibrations and outperforms previous work in linear calibration of xTB-sTDA. We then apply the calibration model to screen a 250k-molecule database and map inaccuracies of xTB-sTDA in chemical space. We also show generalizability of the workflow by calibrating against a higher-level technique (CC2), yielding a similarly low error. Overall, this work demonstrates that machine learning can be used to develop a method that is both cheap and accurate for large-scale excited state screening, enabling accelerated molecular discovery across a variety of disciplines.
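The calibration idea in the abstract can be illustrated with a small delta-learning sketch: a model learns the correction between a cheap and an expensive energy, and that correction is added back to the cheap value. This is not the authors' model or data. The descriptors, the synthetic energies, the error function, and the helper names (`make_molecule`, `knn_predict`, `mae`) are all assumptions made for illustration, and a plain k-nearest-neighbour regressor stands in for whatever ML architecture the paper actually uses.

```python
import math
import random


def make_molecule(rng):
    """Return (descriptors, e_low, e_high) for one synthetic 'molecule'.

    All values are made up: e_high plays the role of a reference energy
    (e.g. TD-DFT) and e_low the role of a cheap estimate (e.g. xTB-sTDA)
    with a descriptor-dependent systematic error plus noise.
    """
    x = (rng.random(), rng.random())          # two hypothetical descriptors
    e_high = 2.0 + 2.0 * rng.random()         # synthetic reference energy in eV
    systematic = 0.5 + 0.4 * math.sin(3.0 * x[0]) - 0.3 * x[1]
    e_low = e_high + systematic + rng.gauss(0.0, 0.05)
    return x, e_low, e_high


def knn_predict(train_x, train_delta, x, k=5):
    """Predict a correction as the mean of the k nearest training corrections."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_delta[i] for i in nearest) / k


def mae(pred, ref):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


rng = random.Random(0)                        # fixed seed for reproducibility
train = [make_molecule(rng) for _ in range(400)]
test = [make_molecule(rng) for _ in range(100)]

# Delta-learning target: the difference between reference and cheap energies.
train_x = [x for x, _, _ in train]
train_delta = [e_high - e_low for _, e_low, e_high in train]

# Calibrated prediction = cheap energy + learned correction.
raw = [e_low for _, e_low, _ in test]
calibrated = [e_low + knn_predict(train_x, train_delta, x)
              for x, e_low, _ in test]
ref = [e_high for _, _, e_high in test]

mae_raw = mae(raw, ref)
mae_cal = mae(calibrated, ref)
print(f"MAE raw: {mae_raw:.3f} eV, calibrated: {mae_cal:.3f} eV")
```

Learning the correction rather than the reference energy directly means the model only has to capture the systematic error of the cheap method, which is typically a smaller and smoother target. The size of the improvement printed here reflects the synthetic error function and says nothing about the figures reported in the paper.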

Keywords

Excited states

Time-dependent Density Functional Theory

Supplementary materials

Title: Supplementary Information for "Machine learned calibrations to high-throughput molecular excited state calculations"

Description: Supplementary information includes chemical information about training sets, TD-DFT settings, ML model architecture, additional plots of dataset calibration, additional details about active learning, further analysis of high-throughput screening results, and additional substructure analysis of xTB-sTDA error categories.

Supplementary weblinks

Title: xTB-ML data

Description: Data repository for paper. Includes raw data and trained ML models.

Title: xTB-ML workflow

Description: Code repository for paper. Includes scripts to run TD-DFT, xTB, and train/test ML models.

Now Published

Machine learned calibrations to high-throughput molecular excited state calculations

Shomik Verma, Miguel Rivera, David O. Scanlon, Aron Walsh

Journal article: The Journal of Chemical Physics, Volume 156, Issue 13

Print publication date: Apr 07, 2022

Version History

Mar 01, 2022 Version 2

Jan 11, 2022 Version 1

Version Notes

We have added various details to make the paper clearer. We have included references to the delta-ML approach the paper was based on. We re-made Figure 2 to include all training and test datasets. We included additional cross-validation results for the training datasets. We included a comparison of direct vs. delta ML models for the 300k training set. We included additional analysis for the HTVS results section, and for the xTB-sTDA error subsection. Finally, we added additional ML analysis to the CC2 results section, including a transfer learning model and a B3LYP to CC2 calibration model, to help improve the accuracy of xTB calibrations.

Metrics

Views: 2,009

Downloads: 628

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
