Abstract
Artificial feed-forward neural networks have long been recognized as powerful machine learning models and are widely used in QSAR and QSPR modeling of molecular properties. Inspired by Random Forest models and the robust techniques of sample and feature bagging, the RandomNets model was developed as an efficient, vectorized solution for ensemble creation, training, and inference. The model adds an extra dimension to the tensors passing through the neural network, combined with input feature masking and optional subsampling of the dataset during training. This vectorized approach improves efficiency and simplifies training and inference of the implicit ensemble. Training a 25-member implicit ensemble requires only twice the time of a comparable baseline network but significantly improves prediction performance, as measured by R² and MSE on test sets from 133 bioactivity datasets, with an average performance increase of around 25%. Compared to the conceptually similar input masking technique using dropout, the implicit ensemble demonstrates reduced sensitivity to hyperparameter choices, similar or improved performance, and a fourfold reduction in training time. Additionally, the implicit ensemble provides the standard deviation of individual predictions, which can help identify uncertain predictions.
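The core vectorization idea — stacking an extra ensemble dimension onto the tensors and applying per-member input feature masks so all members are trained and evaluated in one batched pass — can be sketched in NumPy. Everything here is an illustrative assumption (a single tanh layer, a 0.8 feature-keep rate, the variable names), not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features, n_ensemble = 8, 16, 25
X = rng.normal(size=(n_samples, n_features))

# Feature bagging: each ensemble member sees a random ~80% subset
# of the input features via a fixed binary mask (assumed rate).
masks = (rng.random((n_ensemble, n_features)) < 0.8).astype(X.dtype)

# Broadcast to shape (n_ensemble, n_samples, n_features): one masked
# copy of the batch per member, all processed in a single pass.
X_ens = masks[:, None, :] * X[None, :, :]

# One weight tensor per member, applied with a single batched matmul
# (a stand-in for a full network with a leading ensemble dimension).
W = rng.normal(size=(n_ensemble, n_features, 1))
preds = np.tanh(X_ens) @ W            # shape (n_ensemble, n_samples, 1)

mean_pred = preds.mean(axis=0)        # ensemble prediction
std_pred = preds.std(axis=0)          # per-sample uncertainty estimate
```

Because the ensemble dimension rides along through ordinary broadcasting and batched matrix multiplication, no Python-level loop over members is needed, and the spread of `preds` across that dimension is available for free as an uncertainty signal.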
Supplementary materials
Title: Supplementary plots and figures
Description: Supplementary Information for: RandomNets Improve Neural Network Regression Performance via Implicit Ensembling