Abstract
Digital polymer chemistry leverages computational methods to design and optimize polymer materials. While there have been advances in using machine learning to accelerate polymer design, the field is hampered by a lack of standards, which precludes comparability between studies and makes it difficult to build on prior work. To address this gap, we introduce PolyMetriX, an open-source Python library designed to facilitate the entire polymer informatics workflow, from obtaining data to training models. PolyMetriX provides standardized dataset objects, curated polymer property datasets, and advanced featurization techniques that extract hierarchical structural information at the full-polymer, backbone, and side-chain levels. Additionally, it incorporates polymer-specific data-splitting strategies for a more robust assessment of model generalization. PolyMetriX enhances the predictive performance of models while improving reproducibility in digital polymer chemistry.