Behavior of Linear and Nonlinear Dimensionality Reduction for Collective Variable Identification of Small Molecule Solution-Phase Reactions

21 September 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying collective variables for chemical reactions is essential to reduce the 3$N$-dimensional energy landscape to lower-dimensional basins and barriers of interest. However, in condensed-phase processes, the non-meaningful motions of bulk solvent often overpower the ability of dimensionality reduction methods to identify the correlated motions that underpin collective variables. Yet solvent can play important direct or indirect roles in reactivity, and much can be lost through treatments that remove or dampen solvent motion. This has been amply demonstrated for principal component analysis (PCA), although less is known about the behavior of nonlinear dimensionality reduction methods, e.g., uniform manifold approximation and projection (UMAP), that have recently become more popular. The latter present an interesting alternative to linear methods, though often at the expense of interpretability. This work presents distance-attenuated projection methods of atomic coordinates that facilitate the application of both PCA and UMAP to identify collective variables in solution and, further, the specific solvent molecules that participate in chemical reactions. The performance of both methods is examined in detail for two reactions in which the explicit solvent plays very different roles within the collective variables. The first reaction consists of the dynamic exchange of a cation about a polyhydroxy anion, facilitated by waters of solvation, while the second consists of a nucleophilic attack of water upon ethylene that initiates cis/trans isomerization. When applied to raw data, both PCA and UMAP representations are dominated by bulk solvent motions. In contrast, when applied to data preprocessed by our attenuated projection methods, both PCA and UMAP identify the appropriate collective variables in solution.
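The abstract does not give the functional form of the distance attenuation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that every atom is expressed relative to a reference solute atom and that solvent atoms are then scaled by a Fermi-type switching weight of their distance to that atom, so bulk solvent contributes almost no variance before PCA is performed by SVD. The names `fermi_weight`, `attenuated_features`, `r_cut`, and `width`, the nm units, and the toy trajectory are illustrative assumptions.

```python
import numpy as np


def fermi_weight(r, r_cut=0.5, width=0.05):
    """Smooth switching weight: ~1 for r << r_cut, ~0 for r >> r_cut (assumed nm)."""
    z = np.clip((r - r_cut) / width, -500.0, 500.0)  # clip avoids exp overflow
    return 1.0 / (1.0 + np.exp(z))


def attenuated_features(coords, ref_index, solvent_indices, r_cut=0.5, width=0.05):
    """Flatten a trajectory into features with distant solvent attenuated.

    coords: array of shape (n_frames, n_atoms, 3).
    All atoms are expressed relative to the reference solute atom; solvent atoms
    are then scaled by a distance-dependent weight, so far-away (bulk) solvent
    contributes almost no variance. Solute atoms keep weight 1.
    """
    rel = coords - coords[:, ref_index:ref_index + 1, :]
    dist = np.linalg.norm(rel, axis=-1)                  # (n_frames, n_atoms)
    weights = np.ones_like(dist)
    weights[:, solvent_indices] = fermi_weight(dist[:, solvent_indices], r_cut, width)
    return (rel * weights[..., None]).reshape(coords.shape[0], -1)


def pca(features, n_components=2):
    """PCA via SVD of the mean-centered feature matrix."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                      # explained-variance ratios
    scores = centered @ vt[:n_components].T
    return scores, vt[:n_components], explained[:n_components]


# Toy trajectory (synthetic, for illustration only): atom 0 is a fixed solute
# reference, atom 1 oscillates along x (a stand-in "reaction" motion), and
# 50 solvent atoms jitter with much larger amplitude 1.5-2.0 nm away.
rng = np.random.default_rng(0)
n_frames, n_solvent = 500, 50
coords = np.zeros((n_frames, 2 + n_solvent, 3))
t = np.linspace(0.0, 20.0 * np.pi, n_frames)
coords[:, 1, 0] = 0.15 + 0.05 * np.sin(t)
directions = rng.normal(size=(n_solvent, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
base = directions * rng.uniform(1.5, 2.0, size=(n_solvent, 1))
coords[:, 2:, :] = base + rng.normal(scale=0.1, size=(n_frames, n_solvent, 3))
solvent = np.arange(2, 2 + n_solvent)

_, raw_pcs, raw_ev = pca(coords.reshape(n_frames, -1))
_, att_pcs, att_ev = pca(attenuated_features(coords, 0, solvent))

# Feature index 3 is the x coordinate of atom 1 (the solute motion).
raw_loading = abs(raw_pcs[0, 3])
att_loading = abs(att_pcs[0, 3])
print(f"PC1 loading on solute motion: raw={raw_loading:.3f}, attenuated={att_loading:.3f}")
```

On raw coordinates the first component is dominated by solvent jitter; after attenuation it aligns with the solute motion. The same attenuated matrix could, in principle, be passed to a nonlinear embedding such as `umap.UMAP(n_components=2).fit_transform(X)` from the umap-learn package. For a real system, `r_cut` and `width` would need tuning.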

Supplementary materials

Supplementary Material
Plots of the sensitivity of different components identified in reduced hyperspace as a function of different preprocessing methods.
