RegioSQM20: Improved Prediction of the Regioselectivity of Electrophilic Aromatic Substitutions

Abstract

We present RegioSQM20, a new version of RegioSQM (Chem. Sci. 2018, 9, 660), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. Three improvements have been made. First, the open source semiempirical tight binding program xtb is used instead of the closed source MOPAC program. Second, any low energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7% to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React. Chem. Eng. 2020, 5, 896) specifically for regioselectivity predictions of EAS reactions (WLN), and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent. Sci. 2019, 5, 1572). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3% to 85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a web service (regiosqm.org) in the near future.
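
The selection logic described above can be illustrated with a minimal Python sketch. It is not the RegioSQM20 code: the `Tautomer` class, the function names, the 1 kcal/mol site window, and the low/medium/high cutoffs are all hypothetical placeholders, since the abstract does not state the actual values. The sketch assumes that proton affinities (in kcal/mol) have already been computed for each candidate aromatic carbon, for example with xtb.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical values for illustration only; the abstract gives no cutoffs.
SITE_WINDOW = 1.0    # kcal/mol window below the highest proton affinity
LOW_CUTOFF = 170.0   # below this, reactivity is labelled "low"
HIGH_CUTOFF = 190.0  # at or above this, reactivity is labelled "high"


@dataclass
class Tautomer:
    """A tautomer with precomputed proton affinities per atom index."""
    name: str
    proton_affinities: Dict[int, float]  # atom index -> kcal/mol


def predict_sites(tautomer: Tautomer, window: float = SITE_WINDOW) -> List[int]:
    """Return atom indices whose proton affinity lies within `window`
    of the highest proton affinity in this tautomer."""
    best = max(tautomer.proton_affinities.values())
    return sorted(
        atom
        for atom, pa in tautomer.proton_affinities.items()
        if best - pa <= window
    )


def classify_reactivity(tautomer: Tautomer) -> str:
    """Label a tautomer low, medium, or high using the reaction center
    with the highest proton affinity (cutoffs are assumed, see above)."""
    best = max(tautomer.proton_affinities.values())
    if best < LOW_CUTOFF:
        return "low"
    if best < HIGH_CUTOFF:
        return "medium"
    return "high"


def predict_all(tautomers: List[Tautomer]) -> Dict[str, dict]:
    """Make one prediction per tautomer, mirroring the per-form reporting."""
    return {
        t.name: {"sites": predict_sites(t), "reactivity": classify_reactivity(t)}
        for t in tautomers
    }
```

Each low energy tautomer is treated independently here, so a molecule with two tautomers yields two sets of predicted sites and two reactivity labels.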
