CaTS: Toward Scalable and Efficient Transition State Screening for Catalyst Discovery

12 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Large-scale screening of materials via machine learning is emerging as an effective strategy for accelerating scientific discovery and industrial applications. Machine learning methods for transition state (TS)-based screening of catalysts remain underexplored because TS datasets are scarce and TS searches are inherently difficult. Here, we present a framework for large-scale transition state screening of catalysts (CaTS), which bridges microscopic reaction kinetics and macroscopic computational efficiency by using the TS energy as a descriptor, one that is mechanistically rigorous but normally prohibitively expensive to compute. CaTS integrates automated structure generation with a machine learning force field-based Nudged Elastic Band (NEB) method, enabling high-throughput TS exploration at roughly 10^4 times the speed of density functional theory (DFT). CaTS was first optimized and validated on a small-molecule TS database comprising 10,000 reactions (achieving sub-0.2 eV errors in TS energy prediction) and then applied to a metal-organic complex catalyst (0.16 eV MAE with only 327 training samples); in both cases it achieves DFT-level accuracy at about 0.01% of the computational cost. Scaled to over 1,000 unseen metal-organic complex structures, it identifies top candidates that are subsequently validated by DFT. AI-assisted analysis using ChatGPT O3 and SHAP confirms that the predictions are consistent with mechanistic heuristics, providing theoretical support for the large-scale predictions. This shift from static descriptors to kinetics-resolved screening enables industrial-scale catalyst discovery with atomistic precision.
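To make the NEB step concrete, the following is a minimal sketch of a plain NEB relaxation in pure Python. It is not the CaTS implementation: the analytic two-dimensional surface `potential` is an assumption that stands in for the machine learning force field, and the function names, spring constant, step size, and image count are illustrative choices only. The surface has minima at (-1, 0) and (1, 0) and a saddle point at (0, 0.5) with energy 1, so the barrier recovered from the highest image can be checked by hand.

```python
import math


def potential(x, y, c=0.5):
    """Toy surface (assumption): minima at (-1, 0) and (1, 0), saddle at (0, c)."""
    u = y - c * (1.0 - x * x)
    return (x * x - 1.0) ** 2 + 2.0 * u * u


def gradient(x, y, c=0.5):
    """Analytic gradient of the toy surface; an ML force field would supply this."""
    u = y - c * (1.0 - x * x)
    dvdx = 4.0 * x * (x * x - 1.0) + 4.0 * u * (2.0 * c * x)
    dvdy = 4.0 * u
    return dvdx, dvdy


def neb(start, end, n_images=7, k_spring=1.0, step=0.02, n_steps=2000):
    """Relax a linearly interpolated band with fixed endpoints (plain NEB)."""
    path = []
    for i in range(n_images):
        t = i / (n_images - 1)
        path.append([start[0] + t * (end[0] - start[0]),
                     start[1] + t * (end[1] - start[1])])

    for _ in range(n_steps):
        new_path = [p[:] for p in path]
        for i in range(1, n_images - 1):
            prev, cur, nxt = path[i - 1], path[i], path[i + 1]
            # Simple tangent estimate from the two neighbouring images.
            tx, ty = nxt[0] - prev[0], nxt[1] - prev[1]
            norm = math.hypot(tx, ty)
            tx, ty = tx / norm, ty / norm
            # True force: keep only the component perpendicular to the band.
            gx, gy = gradient(cur[0], cur[1])
            g_par = gx * tx + gy * ty
            fx = -(gx - g_par * tx)
            fy = -(gy - g_par * ty)
            # Spring force: acts along the band and keeps images evenly spaced.
            d_next = math.hypot(nxt[0] - cur[0], nxt[1] - cur[1])
            d_prev = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
            f_spring = k_spring * (d_next - d_prev)
            fx += f_spring * tx
            fy += f_spring * ty
            # Steepest-descent update of the interior image.
            new_path[i] = [cur[0] + step * fx, cur[1] + step * fy]
        path = new_path
    return path


path = neb((-1.0, 0.0), (1.0, 0.0))
energies = [potential(x, y) for x, y in path]
ts_index = max(range(len(path)), key=lambda i: energies[i])
barrier = energies[ts_index] - energies[0]
print(f"highest image: {ts_index}, estimated barrier: {barrier:.4f}")
```

In this symmetric toy case the middle image relaxes onto the saddle point, so the highest-image energy matches the true barrier. On a general surface the highest image of a plain NEB only approximates the TS, which is why the number of interpolation images and the choice of optimizer matter in practice.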

Keywords

Machine learning force field
Homogeneous catalysis
Large-scale screening of materials
AI-assisted mechanism analysis

Supplementary materials

Title
Supplementary material for "CaTS: Toward Scalable and Efficient Transition State Screening for Catalyst Discovery"
Description
The supplementary information document for "CaTS: Toward Scalable and Efficient Transition State Screening for Catalyst Discovery" (TSFF_SI.pdf) describes the datasets used in the study, such as the Transition1x benchmark dataset containing 9.6 million data points and the rhodium-phosphine complex catalytic dataset. It also covers model architectures like the EquiformerV2 graph neural network (GNN), training details including hyperparameters and pretraining-finetuning workflows, optimization of NEB parameters (e.g., effects of interpolation image numbers and optimizers), model performance (energy and force prediction error curves), applications of the automated structure generation tool ComplexGen, and large-scale screening workflows. Data distributions, model performance, and catalyst screening results are presented through figures and tables, providing technical support for the reproducibility and in-depth understanding of the CaTS framework.
