More and Faster: Simultaneously Improving Reaction Coverage and Computational Cost in Automated Reaction Prediction Tasks
Automated reaction prediction has the potential to elucidate complex reaction networks for applications ranging from combustion to materials degradation. Although substantial progress has been made in predicting specific reaction pathways and resolving mechanisms, the computational cost and inconsistent reaction coverage of automated prediction remain obstacles to exploring deep reaction networks without heuristics. Here we show that cost can be reduced and reaction coverage increased simultaneously through relatively straightforward modifications to the reaction enumeration, geometry initialization, and transition state convergence algorithms common to many emerging prediction methodologies. These changes are implemented in Yet Another Reaction Program (YARP), our reaction prediction package, and we report a head-to-head comparison with prevailing methods on two benchmark reaction prediction tasks. In all cases, YARP recapitulates established reaction pathways and products nearly perfectly, without using heuristics or other domain knowledge to guide reaction selection. YARP also discovers many new kinetically relevant pathways and products, reported here for the first time. This is achieved while reducing the cost of reaction characterization nearly 100-fold and increasing transition state success rates and intended rates more than 2-fold and 10-fold, respectively, compared with recent benchmarks. This combination of ultra-low cost and high reaction coverage creates opportunities to explore the reactivity of larger systems and more complex reaction networks for applications such as chemical degradation, where approaches based on domain heuristics fail.