General reactive machine learning potentials for CHON elements

Bowen Li; Sixuan Mi; Jin Xiao; Duo Zhang; Shuwen Zhang; John Zhang; Han Wang; Tong Zhu

doi:10.26434/chemrxiv-2025-1d293-v2

Theoretical and Computational Chemistry

18 June 2025, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate and efficient modeling of chemical reactions is essential for advances in catalysis, synthesis, and materials design. Machine learning potentials (MLPs) offer a computationally efficient alternative to ab initio methods; however, developing broadly applicable reactive MLPs remains challenging because of the inherent complexity of chemical reactivity. Here, we present a scalable workflow for developing reactive MLPs tailored to systems containing C, H, O, and N. Our approach first constructs a large-scale pretraining dataset of more than 17 million non-equilibrium structures along chemical reaction pathways, generated by combining the nudged elastic band (NEB) method with structure alignment algorithms and labelled with energies and forces at the semi-empirical level. A high-precision fine-tuning dataset of more than 200,000 structures was then built efficiently at the density functional theory (DFT) level using active learning. A range of model architectures and training paradigms, including pretraining-finetuning and transfer learning frameworks, was systematically benchmarked. Through this process, we developed an optimized MLP that demonstrates state-of-the-art predictive accuracy and generalization capability for reactive chemical environments involving C, H, O, and N. Notably, when combined with the machine-learned DFT model developed in our prior work, the resulting model achieves accuracy close to that of coupled-cluster calculations and outperforms many conventional DFT methods across diverse reactive systems. This work establishes a robust framework for constructing accurate and transferable reactive MLPs, paving the way for large-scale, high-fidelity simulations of complex chemical processes across many scientific and engineering disciplines.
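The abstract states that reaction-pathway structures were generated by combining NEB with structure alignment, but it does not name the alignment algorithm or how the initial path is built. As a hedged illustration only, the sketch below assumes Kabsch (SVD-based) superposition of the product onto the reactant, followed by linear interpolation to produce initial NEB images. The function names `kabsch_align` and `interpolate_images` are hypothetical, atoms are assumed to be in one-to-one correspondence between endpoints, and the NEB relaxation and semi-empirical labelling steps are not shown.

```python
import numpy as np


def kabsch_align(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rigidly superimpose `mobile` (N, 3) onto `reference` (N, 3).

    Hypothetical helper: uses the Kabsch algorithm to find the proper
    rotation (no reflection) and translation that minimize the RMSD,
    assuming atom i in `mobile` corresponds to atom i in `reference`.
    """
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    mob_c = mobile - mob_center
    ref_c = reference - ref_center

    # 3x3 cross-covariance matrix between the centered coordinate sets.
    h = mob_c.T @ ref_c
    u, _, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])
    rotation = vt.T @ correction @ u.T

    # Rotate the centered mobile coordinates and move them onto the reference centroid.
    return mob_c @ rotation.T + ref_center


def interpolate_images(reactant: np.ndarray, product: np.ndarray, n_images: int) -> list:
    """Return n_images intermediate frames plus both endpoints (n_images + 2 total).

    Hypothetical helper: the product is aligned to the reactant first, then
    Cartesian coordinates are linearly interpolated between the endpoints.
    """
    product_aligned = kabsch_align(product, reactant)
    fractions = np.linspace(0.0, 1.0, n_images + 2)
    return [(1.0 - t) * reactant + t * product_aligned for t in fractions]
```

Aligning first removes rigid-body rotation and translation between the endpoints, so the interpolated images differ mainly in internal coordinates rather than sweeping the whole molecule through space. In practice, the resulting frames would serve only as the starting band for an NEB optimization.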

Keywords

Reactive machine learning potentials

Pretraining

∆-learning

Machine learning DFT

Version History

Jun 18, 2025 Version 2

Jun 16, 2025 Version 1

Version Notes

The previous version omitted one author; this version corrects that error.

License

The content is available under CC BY-NC-ND 4.0.

Funding

National Natural Science Foundation of China

22222303, 22173032, 22250710136, 22303113, 2233300

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
