Data Efficient Learning of Molecular Slow Modes from Nonequilibrium Metadynamics

10 January 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Enhanced sampling simulations help overcome free energy barriers and explore molecular conformational space by applying an external bias potential along suitable collective variables (CVs). However, identifying optimal CVs that align with the slow modes of complex molecular systems with many coupled degrees of freedom can be a significant challenge. Deep time-lagged independent component analysis (Deep-TICA) addresses this issue by employing an artificial neural network that generates non-linear combinations of molecular descriptors to learn the slowest degrees of freedom. Training Deep-TICA CVs, however, typically requires long equilibrium simulations that sample multiple recrossing events between the metastable conformations of the molecule. This requirement is often prohibitively expensive, which limits the widespread application of the method. In this study, we present an algorithm that enables the training of Deep-TICA CVs using a limited amount of trajectory data obtained from short non-equilibrium metadynamics simulations that sample only one forward transition from the initial to the final state. We achieve this by utilizing the variational Koopman algorithm, which reweights short off-equilibrium trajectories to reflect the equilibrium probability densities. We demonstrate that enhanced sampling simulations conducted along the Koopman-reweighted Deep-TICA CV can accurately and efficiently converge the free energy surface for three systems: the Müller-Brown potential, alanine dipeptide, and the chignolin mini-protein. Our approach therefore addresses the key challenge of inferring slow modes from limited trajectory data, making it more feasible to use deep learning CVs to study molecular processes of practical relevance.
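As a rough illustration of the two ingredients named in the abstract, the following Python sketch estimates Koopman weights from time-lagged pairs and then solves a weighted TICA problem. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses a linear basis (the descriptors plus a constant) in place of the Deep-TICA neural network, it does not handle the metadynamics bias, and the function names `koopman_weights` and `weighted_tica` are hypothetical.

```python
import numpy as np


def koopman_weights(features, lag):
    """Per-frame weights that make basis-function averages stationary.

    Assumption: linear basis = input features plus a constant function.
    The weights have the form w_t = chi(x_t)^T u and are chosen so that the
    weighted average of every basis function is the same at time t and at
    time t + lag.
    """
    X = features[:-lag]
    Y = features[lag:]
    n = X.shape[0]
    # Append the constant basis function to both time-lagged copies.
    Xc = np.hstack([X, np.ones((n, 1))])
    Yc = np.hstack([Y, np.ones((n, 1))])
    C00 = Xc.T @ Xc / n
    C01 = Xc.T @ Yc / n
    # Stationarity condition: C01^T u = C00 u, i.e. an eigenvalue-1 problem.
    M = np.linalg.solve(C00, C01.T)
    evals, evecs = np.linalg.eig(M)
    u = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    w = Xc @ u
    return w / w.sum()  # normalised to sum to one; entries may be negative


def weighted_tica(features, lag, weights):
    """Linear TICA with per-frame weights (symmetrised estimator)."""
    X = features[:-lag]
    Y = features[lag:]
    w = weights / weights.sum()
    mean = 0.5 * (w @ X + w @ Y)
    X0, Y0 = X - mean, Y - mean
    wX0 = X0 * w[:, None]
    wY0 = Y0 * w[:, None]
    C0 = 0.5 * (wX0.T @ X0 + wY0.T @ Y0)  # weighted covariance
    Ct = 0.5 * (wX0.T @ Y0 + wY0.T @ X0)  # weighted time-lagged covariance
    # Whiten with C0, discarding directions with (near-)zero variance.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10
    W = U[:, keep] / np.sqrt(s[keep])
    evals, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1]  # slowest mode first
    return evals[order], (W @ V)[:, order]
```

Given a descriptor array `features` of shape `(n_frames, n_descriptors)`, one would call `w = koopman_weights(features, lag)` and then `weighted_tica(features, lag, w)`; projecting the descriptors onto the first returned eigenvector gives a linear stand-in for the slow CV. In the nonlinear setting described in the abstract, the neural network output would take the place of the linear projection.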

Keywords

Enhanced Sampling
TICA
Koopman Reweighting
Protein Folding
Free Energy

Supplementary materials

Title: Supporting Information
Description: The supporting information text includes the computational details for the MD simulation and neural network training and supplementary results (convergence plots, CV vs time plots, and comparison of the chignolin FES with the DE Shaw data).
