AutoSolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules

11 January 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The availability of large, high-quality data sets is crucial for artificial-intelligence-driven design and discovery in chemistry. Despite the essential role of solvents in chemistry, the rapid computational generation of data sets of solution-phase molecular properties at the quantum mechanical level of theory has been hampered by the complicated simulation procedure. Open-source software toolkits that automate the setup of high-throughput explicit-solvent quantum chemistry (QC) calculations for arbitrary solutes and solvents are still lacking. We developed AutoSolvate, an open-source toolkit that streamlines the workflow for QC calculations of explicitly solvated molecules. It automates solvated-structure generation, force field fitting, configuration sampling, and the final extraction of microsolvated cluster structures that QC packages can readily use to predict molecular properties of interest. AutoSolvate is available through both a command line interface and a graphical user interface, making it accessible to the broader scientific community. To improve the quality of the initial structures generated by AutoSolvate, we investigated how solute-solvent closeness depends on the identities of the solute and solvent, and we trained a machine learning model to predict the closeness and guide initial structure generation. Finally, we tested the capability of AutoSolvate for rapid data set curation by calculating the outer-sphere reorganization energies of a large data set of 166 redox couples, which demonstrated the promise of the AutoSolvate package for chemical discovery efforts.
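The last workflow step named above, extracting a microsolvated cluster, can be illustrated with a minimal sketch. This is not AutoSolvate's API: the function name `extract_cluster`, the coordinate layout, and the distance criterion (keep a whole solvent molecule if any of its atoms lies within a cutoff of any solute atom) are assumptions made purely for illustration.

```python
import math

def extract_cluster(solute, solvents, cutoff):
    """Hypothetical helper: return the solvent molecules that have at least
    one atom within `cutoff` of any solute atom.

    solute   -- list of (x, y, z) atom coordinates
    solvents -- list of molecules, each a list of (x, y, z) coordinates
    cutoff   -- distance threshold in the same unit as the coordinates
    """
    kept = []
    for molecule in solvents:
        # Keep the whole molecule, never a fragment of it.
        close = any(
            math.dist(atom, solute_atom) <= cutoff
            for atom in molecule
            for solute_atom in solute
        )
        if close:
            kept.append(molecule)
    return kept

# Toy system: a one-atom solute at the origin and three one-atom "solvents".
solute = [(0.0, 0.0, 0.0)]
solvents = [
    [(1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0)],
    [(5.0, 5.0, 5.0)],
]
cluster = extract_cluster(solute, solvents, cutoff=3.5)
print(len(cluster))  # number of solvent molecules retained in the cluster
```

In a real workflow the coordinates would come from frames of the sampled trajectory, and the retained solute-plus-solvent cluster would be written out as input for a QC package.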

Keywords

machine learning
automation
quantum chemistry
software

Supplementary materials

Supplementary Materials for AutoSolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules
See the supplementary material for calculation of solute average radius, cluster extraction in GUI and CLI, solvent-solute center distance and closeness vs. system size, NPT MM density equilibration, MDDF approximation validation, unconverged benchmark systems and systems with unphysical λ_o/λ excluded from the λ_o/λ histogram, ML hyperparameters, and data and machine learning model file description.
