SNAr Regioselectivity Predictions: Machine Learning Triggering DFT Reaction Modeling through Statistical Threshold



Fast and accurate prospective predictions of regioselectivity can significantly reduce the time and resources spent on unproductive transformations in the pharmaceutical industry. Density functional theory (DFT) reaction modeling through transition state theory (TST) and machine learning (ML) methods have both been widely used to predict reaction outcomes such as selectivity. However, TST reaction modeling is time-consuming, and ML methods are data dependent. Herein, we introduce a prototype that seamlessly bridges ML and TST modeling by triggering the resource-intensive but much less domain-sensitive DFT calculation only for less confident ML predictions. The proposed workflow was trained and tested on both a Pfizer internal dataset and the USPTO public dataset to predict regioselectivity for SNAr reactions. Our method is accurate and fast, achieving 96.3% and 94.7% accuracy in predicting the correct major product on the Pfizer and USPTO datasets, respectively, in a fraction of the computing time of conventional TST modeling.
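The triggering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-site probability input, the barrier-based fallback, and the threshold value of 0.9 are all assumptions made for illustration, since the abstract does not specify how the statistical threshold is defined or chosen.

```python
from typing import Callable, Sequence, Tuple


def lowest_barrier_site(barriers_kcal_mol: Sequence[float]) -> int:
    """Pick the site with the lowest computed activation barrier.

    Hypothetical stand-in for the DFT/TST step: in practice the barriers
    would come from transition state calculations, one per candidate site.
    """
    return min(range(len(barriers_kcal_mol)), key=lambda i: barriers_kcal_mol[i])


def predict_major_site(
    ml_probs: Sequence[float],
    dft_fallback: Callable[[], int],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[int, str]:
    """Return (index of the predicted reactive site, method used).

    The ML prediction is accepted when its top probability meets the
    threshold; otherwise the expensive fallback is invoked.
    """
    best = max(range(len(ml_probs)), key=lambda i: ml_probs[i])
    if ml_probs[best] >= threshold:
        return best, "ml"
    # Low-confidence case: defer to the slower, less data-dependent model.
    return dft_fallback(), "dft"


# Confident ML prediction: the fallback is never called.
confident = predict_major_site([0.97, 0.03], lambda: lowest_barrier_site([18.4, 22.1]))

# Ambiguous ML prediction: the fallback decides.
ambiguous = predict_major_site([0.55, 0.45], lambda: lowest_barrier_site([22.1, 18.4]))
```

Passing the fallback as a callable means the costly calculation runs only when the ML confidence falls below the threshold, which is where the time saving in such a scheme would come from.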