Predictive Modeling of PROTAC Cell Permeability with Machine Learning

11 October 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Approaches for the prediction of PROTAC cell permeability are of major interest because they reduce the resource-demanding synthesis and testing of poorly permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using simple 2D descriptors for large and structurally diverse sets of CRBN and VHL PROTACs. After construction and internal validation, the models were used to predict blinded sets of PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models predicted permeability with >80% accuracy (κ > 0.57). Models retrained on the combined original training set and blinded set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the highly imbalanced nature of the CRBN datasets. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
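The abstract reports both accuracy and an agreement statistic, and attributes the weaker CRBN results to class imbalance. The following is a minimal sketch of why the two metrics can diverge, assuming the reported statistic is Cohen's kappa. The function names and all confusion-matrix counts are hypothetical illustrations, not data or code from the study.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of compounds whose permeability class was predicted correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a binary confusion matrix.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (the accuracy) and p_e is the agreement expected by chance from the
    row and column totals.
    """
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    # Chance agreement: both "say" positive, plus both "say" negative.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_e == 1.0:
        # Degenerate case: only one class is present and predicted.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical balanced set: 50 permeable and 50 non-permeable compounds.
    balanced = dict(tp=40, fp=8, fn=10, tn=42)
    # Hypothetical imbalanced set: 10 permeable, 90 non-permeable, and a
    # model that labels every compound as non-permeable.
    imbalanced = dict(tp=0, fp=0, fn=10, tn=90)

    for name, cm in (("balanced", balanced), ("imbalanced", imbalanced)):
        print(f"{name}: accuracy={accuracy(**cm):.2f}, kappa={cohen_kappa(**cm):.2f}")
```

With these illustrative counts, the balanced case gives an accuracy of 0.82 and a kappa of 0.64, while the imbalanced case gives an accuracy of 0.90 but a kappa of 0.00, because a model that always predicts the majority class agrees with the labels no better than chance.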


Cell permeability
Machine Learning
Molecular property space

