Abstract
Network methods and molecular dynamics (MD) simulations have become essential tools for studying protein dynamics. However, applying network methods to MD simulations of flexible proteins is a major challenge, since the high conformational heterogeneity of such multi-state systems can lead to vastly different network topologies across an ensemble. Addressing this requires tools that can disentangle conformational ensembles at the network level. Here, we propose a graph-based clustering framework that provides state-specific insight into the residue interactions of flexible proteins. The framework uses the set of graph-theoretic closeness centralities of all amino acid residues as a structural fingerprint, which serves as input for unsupervised machine learning algorithms that perform dimensionality reduction and clustering. The resulting clusters - states with shared network topology - are subsequently fed back into the upstream workflow and characterized at every representation level. Using FAT10 - a protein with intrinsically disordered regions and two folded domains connected by a flexible linker - as an example, we demonstrate how this approach can be used to understand the protein’s residue interactions on different, interconnected levels and to characterize its most populated states. Because the framework is modular, it can be easily adapted, which makes it a suitable method to support network-based analyses of MD simulations for a wide variety of proteins.
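The core idea of the framework, per-residue closeness centralities used as a fingerprint that is then clustered, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the implementation described here: the function names `closeness_fingerprint` and `kmeans` are hypothetical, the contact lists are toy data instead of contacts derived from MD frames, the rescaling of closeness for disconnected graphs is one common convention, the dimensionality reduction step is omitted, and a simple k-means stands in for whichever clustering algorithm is actually used.

```python
from collections import deque
from math import dist


def closeness_fingerprint(n_residues, contacts):
    """Closeness centrality of every residue in one frame's contact graph.

    Assumption: an unweighted, undirected residue interaction network.
    """
    adj = {i: set() for i in range(n_residues)}
    for a, b in contacts:
        adj[a].add(b)
        adj[b].add(a)

    fingerprint = []
    for src in range(n_residues):
        # Breadth-first search gives shortest path lengths from src.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        total = sum(seen.values())
        reachable = len(seen) - 1
        if total > 0:
            # Scale by the reachable fraction so that residues in small,
            # disconnected components do not receive inflated values.
            value = (reachable / total) * (reachable / (n_residues - 1))
        else:
            value = 0.0
        fingerprint.append(value)
    return fingerprint


def kmeans(points, k, n_iter=20):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(
            list(max(points, key=lambda p: min(dist(p, c) for c in centers)))
        )
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "trajectory" of a 5-residue chain: two extended and two ring-like frames.
backbone = [(0, 1), (1, 2), (2, 3), (3, 4)]
frames = [
    backbone,
    backbone + [(0, 2)],
    backbone + [(0, 4)],
    backbone + [(0, 4), (1, 3)],
]

fingerprints = [closeness_fingerprint(5, contacts) for contacts in frames]
labels = kmeans(fingerprints, k=2)
print(labels)
```

Each frame is reduced to a vector with one entry per residue, so frames with similar network topology end up close together in fingerprint space regardless of their Cartesian coordinates. For a real trajectory, the fingerprint matrix would typically be passed through a dimensionality reduction step before clustering, as the framework describes.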