Active Learning of Atomic Size Gas/Solid Potential Energy Surfaces via Physics Aware Models

29 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We propose an active learning (AL) framework to develop force fields (FFs) that accurately model the potential energy surfaces (PES) of gas/solid atomic-scale complexes. A central challenge is integrating AL with flexible, physics-aware potentials to achieve quantum-level accuracy for complex interfacial systems. Our approach trains physics-aware potentials, with incorporated flexibility and smoothness, on actively sampled Density Functional Theory (DFT) data to describe interactions between undercoordinated atomic silver (Ag) clusters and gaseous pollutants (CO$_2$, CO, SO$_2$), relevant for environmental applications such as sensing. The AL process follows three stages: (1) FFs are trained using adaptable physics-aware potentials of semi-empirical descriptors, optimized via a Pareto analysis scheme; (2) the refined FFs generate candidate structures through Metropolis-Hastings Monte Carlo (MHMC) or stochastic molecular dynamics (sMD); (3) a subset of candidates is selected for DFT labeling based on an outlier score (OS), computed from the descriptor distributions of the existing data, which ensures diverse PES exploration. This framework produces FFs capable of capturing cohesive, physisorption, and chemisorption interactions with accuracy comparable to \textit{ab initio} methods and advanced machine learning models, while retaining the efficiency of semi-empirical potentials. Our methodology is highly versatile, easily accommodating various choices of descriptors, model basis sets, and sampling techniques.
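
Stages (2) and (3) of the loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the outlier score, so the histogram-based score below is an assumption, as are the one-dimensional Morse potential standing in for a fitted FF, the single interatomic distance used as the only descriptor, and all function names and parameter values.

```python
import numpy as np

def morse(r, d_e=1.0, a=1.5, r_e=2.5):
    """Toy stand-in for a fitted force field: Morse energy at distance r."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2 - d_e

def metropolis_sample(energy, r0, n_steps, step, kT, rng):
    """Metropolis-Hastings walk with symmetric uniform proposals."""
    r, e = r0, energy(r0)
    samples = []
    for _ in range(n_steps):
        r_new = r + rng.uniform(-step, step)
        if r_new > 0.5:  # hard wall: always reject unphysically short distances
            e_new = energy(r_new)
            # Accept downhill moves; accept uphill moves with Boltzmann probability.
            if e_new <= e or rng.random() < np.exp(-(e_new - e) / kT):
                r, e = r_new, e_new
        samples.append(r)  # a rejected move repeats the current state
    return np.array(samples)

def outlier_score(candidates, existing, bins=20):
    """Assumed score: 1 minus the relative density of existing data in the
    histogram bin that contains each candidate descriptor (0 = well covered)."""
    lo = min(candidates.min(), existing.min())
    hi = max(candidates.max(), existing.max())
    counts, edges = np.histogram(existing, bins=bins, range=(lo, hi))
    density = counts / counts.max()
    idx = np.clip(np.digitize(candidates, edges) - 1, 0, bins - 1)
    return 1.0 - density[idx]

def select_for_labeling(candidates, existing, n_select):
    """Pick the n_select distinct candidates with the highest outlier score."""
    unique = np.unique(candidates)  # drop repeats produced by rejected moves
    scores = outlier_score(unique, existing)
    order = np.argsort(scores)[::-1][:n_select]
    return unique[order], scores[order]

rng = np.random.default_rng(0)
existing = rng.normal(2.5, 0.1, size=200)  # descriptors of already labeled data
candidates = metropolis_sample(morse, r0=2.5, n_steps=2000, step=0.3, kT=0.2, rng=rng)
picked, picked_scores = select_for_labeling(candidates, existing, n_select=10)
```

In a full loop, the structures in `picked` would be sent to DFT for labeling, added to the training set, and the FF refitted before the next sampling round. In this toy setup the score favors candidates that fall where the existing data are sparse, which is one simple way to encourage diverse exploration.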

Keywords

force-field
Active Learning
PES sampling
gas/solid interface

Supplementary materials

Title: Manuscript supporting figures, tables and discussion
Description: Supporting information includes extended discussion of modeling details, result figures with discussion, and force-field parameter tables.

Title: Generated data through the Active Learning scheme
Description: Gaussian output .log files for each single-point calculation and extracted data in custom-made .xyz files. Read "README.txt" for further details.
