Abstract
Formulations, or mixtures of chemical ingredients, are ubiquitous in materials science, but optimizing their properties remains challenging because of the vast design space. Computational approaches offer a promising way to traverse this space while minimizing trial-and-error experimentation. Using high-throughput classical molecular dynamics simulations, we generated a comprehensive dataset of over 30,000 solvent mixtures and used it to evaluate three machine learning approaches that connect molecular structure and composition to properties: formulation descriptor aggregation (FDA), formulation graph (FG), and a Set2Set-based method (FDS2S). Our results demonstrate that our new FDS2S approach outperforms the other approaches in predicting simulation-derived properties. The resulting formulation-property relationships can reveal important substructures and identify promising formulations at least two to three times faster than random guessing. The models show robust transferability to experimental datasets, accurately predicting properties across energy, pharmaceutical, and petroleum applications. Our research demonstrates the utility of high-throughput simulations and machine learning tools for designing formulations with promising properties.
Supplementary materials
Title
Supplementary information document
Description
The supporting information contains a comparison of formulation labels between molecular dynamics simulations and experiments, an analysis of miscibility for binary mixtures using molecular dynamics simulations, the best hyperparameters of the formulation-property models when trained with 90% of the data, and descriptions of the formulation dataset generated in this work and of the curated literature datasets.