Abstract
We introduce AlphaFold2-RAVE (af2rave), an open-source Python package that integrates machine learning-based structure prediction with physics-driven sampling to generate alternative protein conformations efficiently. Protein structures are not static; they exist as ensembles of conformations, many of which are functionally relevant yet difficult to resolve experimentally. Although deep learning models such as AlphaFold2 can predict structural ensembles, these predictions lack explicit physical validation. af2rave addresses this limitation by combining AlphaFold2 predictions from reduced multiple sequence alignments (MSAs) with molecular dynamics (MD) simulations to explore local conformational space. A feature selection module identifies key structural degrees of freedom, and the State Predictive Information Bottleneck (SPIB) method uncovers the underlying conformational topology and classifies functionally relevant states. Under the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) protocol, either unbiased or biased sampling can then be performed to explore the conformational ensemble further. We validate af2rave on multiple systems, including E. coli adenosine kinase (ADK) and human DDR1 kinase, and identify distinct functional states with minimal prior biological knowledge. Furthermore, on the SARS-CoV-2 spike protein receptor-binding domain, we demonstrate that af2rave achieves conformational sampling comparable to that of long unbiased MD simulations at significantly lower computational cost. The af2rave package provides a streamlined workflow for generating and analyzing alternative protein conformations, offering an accessible tool for drug discovery and structural biology.
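
To make the feature selection idea concrete, the following minimal sketch ranks pairwise atomic distances by how strongly they vary across an ensemble of predicted structures. It is an illustration under stated assumptions, not the af2rave API: the function names `pairwise_distances` and `select_features`, the choice of the coefficient of variation as the ranking score, and the toy coordinates are all hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(coords):
    """Return all pairwise distances for an ensemble of structures.

    coords has shape (n_structures, n_atoms, 3). The result is a tuple of
    (distances with shape (n_structures, n_pairs), pairs with shape (n_pairs, 2)).
    """
    n_atoms = coords.shape[1]
    pairs = np.array(list(combinations(range(n_atoms), 2)))
    diffs = coords[:, pairs[:, 0], :] - coords[:, pairs[:, 1], :]
    return np.linalg.norm(diffs, axis=-1), pairs


def select_features(coords, n_features=3):
    """Pick the distances with the largest coefficient of variation.

    Hypothetical scoring rule (an assumption for illustration): distances that
    change most across the ensemble, relative to their mean, are treated as
    candidate structural degrees of freedom.
    """
    dists, pairs = pairwise_distances(coords)
    cv = dists.std(axis=0) / dists.mean(axis=0)
    top = np.argsort(cv)[::-1][:n_features]
    return pairs[top], cv[top]


# Toy ensemble: atoms 0-2 are rigid, atom 3 moves along the x axis.
rigid = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
ensemble = np.array(
    [np.vstack([rigid, [2.0 + k, 0.0, 0.0]]) for k in range(5)]
)

selected_pairs, scores = select_features(ensemble, n_features=3)
print(selected_pairs)
```

In this toy ensemble only the distances involving the mobile atom vary, so they are the ones selected. A real workflow would apply the same ranking to distances computed from the reduced-MSA AlphaFold2 structures.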