Submitted on 06.07.2020 and posted on 07.07.2020 by Fernando Huerta, Samuel Hallinder, Alexander Minidis
To date, optimizing reactions, let alone predicting the
outcome (yield) of known reactions, requires expert knowledge, and prediction can at best be
achieved through computationally complex and expensive modelling. The present
investigation tests whether machine learning is a viable approach for
predicting the outcome of a model reaction, one that could be put into daily production. A
prerequisite was replacing advanced scripting techniques with a more
approachable data science platform such as Knime®. The palladium-catalyzed
Suzuki-Miyaura, Negishi and Buchwald-Hartwig reactions were selected for a classification
model of high- versus low-yielding outcomes, combined with a selection of reaction
conditions stemming from a commercial database. Here we present preliminary
results: a random forest-based classification model using readily calculated
standard medicinal chemistry descriptors of substrates and products yielded
a high ROC AUC of up to 96%. The descriptors used in the model convey nothing
about reactivity, only 1D and 2D structural information, and performed
as well as or better than fingerprints, both in terms of prediction and
computational requirements. One of the major challenges was the quality of the
data and its subsequent curation.
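
To make the reported metric concrete, the following is a minimal sketch of how reaction yields might be binarized into high/low classes and how a ROC AUC can be computed from a classifier's scores. It is not the authors' Knime® workflow: the function names, the 50% yield threshold, and the toy values are all assumptions introduced for illustration, and the abstract does not state which threshold was used.

```python
from typing import List, Sequence


def label_high_yield(yields_percent: Sequence[float], threshold: float = 50.0) -> List[int]:
    """Binarize yields: 1 = high-yielding, 0 = low-yielding.

    The 50% default threshold is a hypothetical choice; the abstract gives none.
    """
    return [1 if y >= threshold else 0 for y in yields_percent]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values invented for illustration only; they are not data from the study.
toy_yields = [92.0, 15.0, 78.0, 40.0, 66.0, 5.0]
# Hypothetical classifier output: predicted probability of a high yield.
toy_scores = [0.90, 0.20, 0.45, 0.60, 0.80, 0.10]

toy_labels = label_high_yield(toy_yields)
auc = roc_auc(toy_labels, toy_scores)
print(f"ROC AUC = {auc:.3f}")
```

A ROC AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. In practice the scores would come from a trained classifier such as a random forest, and the metric would be evaluated on held-out reactions rather than on the training set.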