Abstract
To date, optimizing reactions, let alone predicting the
outcome (yield) of known reactions, requires expert knowledge, and predictions can at best be
obtained by computationally complex and expensive modelling. The present
investigation tests whether machine learning is a viable approach for
predicting the outcome of model reactions that could be put into daily production. A
prerequisite was replacing advanced scripting techniques with a more
approachable data science platform such as Knime®. The palladium-catalyzed
Suzuki-Miyaura, Negishi and Buchwald-Hartwig reactions were selected for a classification
model of high- versus low-yielding outcomes, combined with a selection of reaction
conditions taken from a commercial database. Here we present preliminary
results: a random forest-based classification model using readily calculated
standard medicinal chemistry descriptors of substrates and products yielded
a high ROC AUC of up to 96%. The descriptors used in the model convey nothing
about reactivity, only 1D and 2D structural information, and performed
as well as or better than fingerprints, both in terms of prediction and
computational requirements. One of the major challenges was the quality of the
data and its subsequent curation.
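
For readers less familiar with the reported metric, the following is a minimal sketch of how a ROC AUC value can be read for a high/low yield classifier: it is the fraction of (high-yielding, low-yielding) reaction pairs in which the model scores the high-yielding reaction higher. The function name `roc_auc`, the labels and the scores are hypothetical and for illustration only; they are not data or code from this study, which was carried out in Knime workflows.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for high-yielding, 0 for low-yielding (hypothetical encoding).
    scores: model scores, higher meaning "more likely high-yielding".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: six reactions, three high-yielding and three low-yielding.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

Under this reading, a ROC AUC of 96% would mean that in roughly 96 of 100 such pairs the high-yielding reaction receives the higher score.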
Supplementary materials
S1 Supplementary Information
S2 Knime workflows with readme
S3 Reaxys ReactionIDs