An Unsupervised Machine Learning Workflow for Assigning and Predicting Generality in Asymmetric Catalysis



The development of chiral catalysts that can provide high enantioselectivities across a wide assortment of substrates or a broad range of reactions is a priority for many catalyst design efforts. While several approaches are available to aid in the identification of general catalyst systems, there is currently no simple procedure for directly measuring how general a given catalyst could be. Herein, we present a catalyst-agnostic workflow centered on unsupervised machine learning that enables the rapid assessment and quantification of catalyst generality. The workflow uses curated literature data sets and reaction descriptors to visualize and cluster chemical space coverage. This reaction network can then be applied to derive a catalyst generality metric through designer equations and interfaced with other regression techniques for general catalyst prediction. As validating case studies, we have successfully applied this method to identify, through quantification, the most general catalyst chemotype for an organocatalytic asymmetric Mannich reaction and to predict the most general chiral phosphoric acid catalyst for the addition of nucleophiles to imines. The mechanistic basis for catalyst generality can then be gleaned from the calculated values by deconstructing the contributions of chemical space and enantiomeric excess to the overall result. We conclude that broadly applicable catalysts may be more adaptive to changes in reactant structure because enantioinduction does not rely on a single set of noncovalent interactions. In contrast, some systems work by engaging in robust noncovalent contacts that do not change significantly in nature when the structure of a reaction component is altered. Ultimately, our findings represent a framework for interrogating and predicting catalyst generality, and this strategy should be relevant to other catalytic systems widely applied in asymmetric synthesis.
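
To illustrate how a generality value might be split into a chemical space term and an enantiomeric excess term, the following is a minimal sketch and not the designer equations used in this work. It assumes the clustering step has already assigned each literature reaction to a cluster, and it adopts a purely hypothetical score: the fraction of clusters a catalyst has been tested in, multiplied by the average of its per-cluster mean ee. The function name `generality_scores`, the record layout, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def generality_scores(records, n_clusters):
    """Compute an illustrative (hypothetical) generality score per catalyst.

    records: iterable of (catalyst, cluster_id, ee_percent) tuples, where
             cluster_id comes from a prior unsupervised clustering step.
    n_clusters: total number of clusters in the reaction network.
    """
    # Group ee values by catalyst, then by cluster.
    per_catalyst = defaultdict(lambda: defaultdict(list))
    for catalyst, cluster_id, ee in records:
        per_catalyst[catalyst][cluster_id].append(ee)

    results = {}
    for catalyst, clusters in per_catalyst.items():
        # Mean ee within each cluster the catalyst was tested in.
        cluster_means = [sum(v) / len(v) for v in clusters.values()]
        # Chemical space term: fraction of clusters covered.
        coverage = len(cluster_means) / n_clusters
        # Selectivity term: average of cluster means, scaled to 0..1.
        mean_ee = sum(cluster_means) / len(cluster_means) / 100.0
        results[catalyst] = {
            "coverage": coverage,
            "mean_ee": mean_ee,
            "score": coverage * mean_ee,  # assumed form, for illustration only
        }
    return results


# Made-up example data: cat_A is moderately selective everywhere,
# cat_B is highly selective but only tested in one cluster.
example = [
    ("cat_A", 0, 90.0), ("cat_A", 1, 80.0),
    ("cat_A", 2, 70.0), ("cat_A", 3, 80.0),
    ("cat_B", 0, 99.0), ("cat_B", 0, 97.0),
]
scores = generality_scores(example, n_clusters=4)
```

Under this assumed form, a catalyst with moderate ee across every cluster outranks one with excellent ee in a single cluster, and the two factors can be inspected separately to see whether coverage or selectivity drives the result.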