Abstract
Large-scale screening of materials via machine learning is emerging as an effective strategy for accelerating scientific discovery and industrial applications. Machine learning methods for transition state (TS)-based catalyst screening remain underexplored, owing to the scarcity of TS datasets and the inherent difficulty of TS search. Here, we present a framework for large-scale transition state screening for catalysts (CaTS), which uniquely bridges microscopic reaction kinetics and macroscopic computational efficiency by leveraging the TS energy, a mechanistically rigorous descriptor that is otherwise prohibitively expensive to compute. CaTS integrates automated structure generation with a machine learning force field-based Nudged Elastic Band (NEB) method, enabling high-throughput TS exploration at 10^4 times the speed of density functional theory (DFT). First optimized and validated on a small-molecule TS database of 10,000 reactions, achieving sub-0.2 eV errors in TS energy prediction, and then applied to a metal-organic complex catalyst system, reaching a 0.16 eV MAE with only 327 training samples, CaTS delivers DFT-level accuracy at 0.01% of the computational cost. Scaled to over 1,000 unseen metal-organic complex structures, it identifies top candidates that are subsequently validated by rigorous DFT calculations. AI-assisted analysis using ChatGPT O3 and SHAP confirms that the predictions are consistent with mechanistic heuristics, providing theoretical validation for the large-scale predictions. This paradigm shift from static descriptors to kinetics-resolved screening enables industrial-scale catalyst discovery with atomistic precision.
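As a hedged illustration of the ML force field-driven NEB step described above, the sketch below sets up a climbing-image NEB transition state search in ASE and reads off the TS energy barrier used as the screening descriptor. The calculator class MLForceFieldCalculator, the geometry file names, and all numerical settings are placeholders for illustration only and are not part of the CaTS code.

```python
# Minimal sketch of an ML force field-driven NEB transition state search.
# Assumes ASE >= 3.23 (use `from ase.neb import NEB` on older versions).
from ase.io import read
from ase.mep import NEB
from ase.optimize import FIRE

# Hypothetical ML potential exposing the standard ASE Calculator interface.
from my_mlff import MLForceFieldCalculator

initial = read("reactant.xyz")   # relaxed reactant geometry (placeholder file)
final = read("product.xyz")      # relaxed product geometry (placeholder file)

n_images = 8                     # interior images along the reaction path
images = [initial] + [initial.copy() for _ in range(n_images)] + [final]
for image in images:
    image.calc = MLForceFieldCalculator()   # one calculator instance per image

neb = NEB(images, climb=True)    # climbing-image NEB for a sharper TS estimate
neb.interpolate(method="idpp")   # IDPP interpolation of the initial path

FIRE(neb).run(fmax=0.05)         # relax the band with the FIRE optimizer

# TS energy descriptor: highest image relative to the reactant endpoint.
energies = [image.get_potential_energy() for image in images]
barrier = max(energies) - energies[0]
print(f"Estimated TS barrier: {barrier:.3f} eV")
```

Because every force call in such a workflow hits the ML potential rather than DFT, the band relaxes orders of magnitude faster, which is what makes screening thousands of candidate structures tractable.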
Supplementary materials
Title
Supplementary material for "CaTS: Toward Scalable and Efficient Transition State Screening for Catalyst Discovery"
Description
The supplementary information document for "CaTS: Toward Scalable and Efficient Transition State Screening for Catalyst Discovery" (TSFF_SI.pdf) describes the datasets used in the study, including the Transition1x benchmark dataset of 9.6 million data points and the rhodium-phosphine complex catalysis dataset. It also covers the model architecture (the EquiformerV2 graph neural network, GNN), training details such as hyperparameters and the pretraining-finetuning workflow, the optimization of NEB parameters (e.g., the effects of the number of interpolation images and the choice of optimizer), model performance (energy and force prediction error curves), applications of the automated structure generation tool ComplexGen, and the large-scale screening workflow. Data distributions, model performance, and catalyst screening results are presented in figures and tables, supporting the reproducibility and in-depth understanding of the CaTS framework.
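As a rough sketch of the pretraining-finetuning workflow mentioned above, the snippet below shows the generic pattern in plain PyTorch: load weights pretrained on the large Transition1x dataset, then fine-tune on the small rhodium-phosphine dataset. The model class, dataset loaders, checkpoint path, loss weighting, and hyperparameters are hypothetical placeholders and do not reproduce the actual EquiformerV2 training code described in the SI.

```python
# Generic pretrain-then-finetune sketch in plain PyTorch (illustrative only).
import torch
from torch.utils.data import DataLoader

from my_models import EquivariantGNN        # stand-in for an EquiformerV2-style GNN
from my_data import RhPhosphineDataset      # hypothetical fine-tuning dataset

device = "cuda" if torch.cuda.is_available() else "cpu"
model = EquivariantGNN().to(device)

# 1) Load weights pretrained on the large Transition1x dataset (placeholder path).
model.load_state_dict(torch.load("pretrained_transition1x.pt", map_location=device))

# 2) Fine-tune on the small catalyst dataset with a conservative learning rate.
loader = DataLoader(RhPhosphineDataset(split="train"), batch_size=16, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
mae = torch.nn.L1Loss()              # MAE, matching the energy metric reported in the paper

for epoch in range(50):              # epoch count chosen arbitrarily for the sketch
    for batch in loader:
        batch = batch.to(device)     # assumes a graph-style batch with .energy and .forces
        pred_energy, pred_forces = model(batch)
        # Assumed energy/force loss weighting; the actual weights are given in the SI.
        loss = mae(pred_energy, batch.energy) + 10.0 * mae(pred_forces, batch.forces)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```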