Machine Learning to Reduce Reaction Optimization Lead Time – Proof of Concept with Suzuki, Negishi and Buchwald-Hartwig Cross-Coupling Reactions

To date, optimizing reactions, let alone predicting the outcome (yield) of known reactions, requires expert knowledge, and predictions can at best be obtained by computationally complex and expensive modelling. The present investigation tests whether machine learning is a viable approach to predicting the outcome of model reactions, one that could be put into daily production. A prerequisite was replacing advanced scripting techniques with a more approachable data science platform such as Knime®. The palladium-catalyzed Suzuki-Miyaura, Negishi and Buchwald-Hartwig reactions were selected for a classification model of high- versus low-yielding outcome, combined with a selection of reaction conditions taken from a commercial database. Here we present preliminary results: a random forest-based classification model using readily calculated standard medicinal chemistry descriptors of substrates and products yielded a high ROC AUC of up to 96%. The descriptors used in the model convey nothing about reactivity, only 1D and 2D structural information, yet they performed as well as or better than fingerprints in terms of both prediction and computational requirements. One of the major challenges was the quality of the data and its subsequent curation.
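
To make the reported metric concrete: ROC AUC can be read as the probability that a classifier scores a randomly chosen high-yielding reaction above a randomly chosen low-yielding one, so a value of 96% means this ordering is correct for roughly 96 of 100 such pairs. The minimal Python sketch below computes the metric directly from that definition. The labels and scores are invented for illustration and are not data from this study, and the helper name `roc_auc` is hypothetical rather than part of the Knime workflow.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison.

    labels: 1 for a high-yielding reaction, 0 for a low-yielding one.
    scores: predicted probability of the high-yield class.
    Returns the fraction of (high, low) pairs in which the high-yielding
    reaction received the larger score; ties count as half a pair.
    """
    high = [s for y, s in zip(labels, scores) if y == 1]
    low = [s for y, s in zip(labels, scores) if y == 0]
    if not high or not low:
        raise ValueError("need at least one example of each class")

    correctly_ordered = 0.0
    for h in high:
        for l in low:
            if h > l:
                correctly_ordered += 1.0
            elif h == l:
                correctly_ordered += 0.5
    return correctly_ordered / (len(high) * len(low))


# Illustrative values only (assumed, not taken from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.77, 0.20]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.4f}")  # 15 of 16 pairs ordered correctly: 0.9375
```

The pairwise form is quadratic in the number of reactions and is meant only to show what the number measures; for real data sets, a library routine based on ranking is the more practical choice.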