The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure-property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein, we introduce kraken, a discovery platform for monodentate organophosphorus(III) ligands that provides comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1,558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300,000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and show how existing datasets in catalysis can be used to accelerate ligand selection during reaction optimization.