Abstract
The availability of large, high-quality data sets is crucial for artificial intelligence design and discovery in chemistry. Despite the essential role of solvents in chemistry, the rapid computational generation of data sets of solution-phase molecular properties at the quantum mechanical level of theory has been hampered by the complicated simulation procedure. Open-source software toolkits that automate the setup of high-throughput explicit-solvent quantum chemistry (QC) calculations for arbitrary solutes and solvents are still lacking. We developed AutoSolvate, an open-source toolkit that streamlines the workflow for QC calculations of explicitly solvated molecules. It automates solvated-structure generation, force field fitting, configuration sampling, and the final extraction of microsolvated cluster structures that QC packages can readily use to predict molecular properties of interest. AutoSolvate is available through both a command line interface and a graphical user interface, making it accessible to the broader scientific community. To improve the quality of the initial structures generated by AutoSolvate, we investigated the dependence of solute-solvent closeness on solute and solvent identities and trained a machine learning model to predict the closeness and guide initial structure generation. Finally, we tested the capability of AutoSolvate for rapid data set curation by calculating the outer-sphere reorganization energy for a large data set of 166 redox couples, demonstrating the promise of the AutoSolvate package for chemical discovery efforts.
Supplementary materials
Title
Supplementary Materials for AutoSolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules
Description
See the supplementary material for the calculation of the solute average radius, cluster extraction in the GUI and CLI, solvent-solute center distance and closeness vs. system size, NPT MM density equilibration, MDDF approximation validation, unconverged benchmark systems and systems with unphysical λ_o/λ excluded from the λ_o/λ histogram, ML hyperparameters, and descriptions of the data and machine learning model files.