Organic Chemistry

Predicting Glycosylation Stereoselectivity Using Machine Learning



Predicting the stereochemical outcome of chemical reactions is challenging for mechanistically ambiguous transformations. The stereoselectivity of glycosylation reactions is influenced by at least eleven factors across four chemical participants and temperature. A random forest algorithm was trained on a concise, highly reproducible dataset to predict the stereoselective outcome of glycosylations. The steric and electronic contributions of all chemical reagents and solvents were quantified by quantum mechanical calculations. The trained model accurately predicts stereoselectivities for unseen nucleophiles, electrophiles, acid catalysts, and solvents across a wide temperature range (overall root mean square error 6.8%). All predictions were validated experimentally on a standardized microreactor platform. The model helped identify novel ways to control glycosylation stereoselectivity and accurately predicted previously unknown means of stereocontrol. By quantifying the degree of influence of each variable, we discovered that environmental factors influence the stereoselectivity of glycosylations more than the coupling partners in this area of chemical space.
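
For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a root mean square error between predicted and measured selectivities could be computed. It assumes selectivity is expressed as a percentage of one anomer, which the abstract does not state explicitly. The function name `rmse` and the numbers are hypothetical placeholders for illustration only; they are not taken from the authors' dataset or code.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder values (assumed to be percent of one anomer).
# These are NOT data from the paper; they only illustrate the arithmetic.
predicted_selectivity = [90.0, 50.0, 20.0, 70.0]
measured_selectivity = [86.0, 53.0, 20.0, 70.0]

error = rmse(predicted_selectivity, measured_selectivity)
print(f"RMSE: {error:.1f} percentage points")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, so a single overall figure should be read alongside per-reaction results where available.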

Version notes

Version 1.0


gly_stereo_predict_final.pdf

Supplementary material

gly_stereo_predict_SI_final.pdf
Total 342 data.xlsx
training 268.xlsx
validation 74.xlsx
Potential descriptors.xlsx