Abstract
Designing high-performance polymers remains a critical challenge due to the vast design space. While machine learning and generative models have advanced polymer informatics, most approaches lack directional optimization capabilities and fail to close the loop between design and physical validation. Here we introduce PolyRL, a closed-loop reinforcement learning (RL) framework for the inverse design of gas separation polymers. By integrating reward model training, generative model pre-training, RL fine-tuning, and theoretical validation, PolyRL achieves multi-objective optimization under data-scarce conditions. We demonstrate that PolyRL efficiently generates polymer candidates with enhanced gas separation performance, as substantiated by detailed molecular simulation analyses. We further establish a standardized benchmark for RL-based polymer generation, providing a foundation for future research. This work showcases the power of reinforcement learning in polymer design and advances AI-driven materials discovery toward closed-loop, goal-directed paradigms.
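To make the RL fine-tuning step concrete, the sketch below implements a REINFORCE-style policy-gradient update for a character-level sequence generator. The GRU policy, toy token vocabulary, and placeholder reward function are illustrative assumptions only, not PolyRL's actual models or reward; in the full framework the reward would come from a trained property-prediction model.

```python
import torch
import torch.nn as nn

# Toy token vocabulary for polymer SMILES-like strings (illustrative only).
VOCAB = ["<pad>", "<bos>", "<eos>", "C", "c", "O", "N", "(", ")", "=", "*"]
stoi = {t: i for i, t in enumerate(VOCAB)}

class PolicyGRU(nn.Module):
    """Minimal autoregressive policy over tokens (stand-in for a pretrained generator)."""
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, h=None):
        y, h = self.gru(self.emb(x), h)
        return self.out(y), h

def reward_fn(tokens):
    # Placeholder reward; PolyRL would instead score gas separation
    # properties with a trained reward model.
    return float(tokens.count(stoi["C"])) / max(len(tokens), 1)

policy = PolicyGRU(len(VOCAB))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

for step in range(10):
    # Sample one sequence autoregressively, tracking per-token log-probabilities.
    tok = torch.tensor([[stoi["<bos>"]]])
    h, log_probs, sampled = None, [], []
    for _ in range(30):
        logits, h = policy(tok, h)
        dist = torch.distributions.Categorical(logits=logits[:, -1])
        tok = dist.sample().unsqueeze(0)
        log_probs.append(dist.log_prob(tok.squeeze(0)))
        sampled.append(tok.item())
        if tok.item() == stoi["<eos>"]:
            break
    # REINFORCE: scale the summed log-likelihood by the episode reward.
    loss = -reward_fn(sampled) * torch.stack(log_probs).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice, algorithms such as REINVENT add a prior-likelihood regularizer to this objective so the fine-tuned policy stays close to the pretrained generator.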
Supplementary materials
Title
Supporting Information for "PolyRL: Reinforcement Learning-Guided Polymer Generation for Multi-Objective Polymer Discovery"
Description
This supporting information (SI) for "PolyRL: Reinforcement Learning-Guided Polymer Generation for Multi-Objective Polymer Discovery" details the PolyRL framework, including datasets, property prediction models (e.g., Random Forest), generative models (GPT-2, LLaMA2, GRU, LSTM), and reinforcement learning algorithms (e.g., REINVENT, REINFORCE). It also presents results from molecular dynamics simulations and SHAP analysis that validate the framework's effectiveness in optimizing polymer structures for gas separation tasks.
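As a minimal sketch of the property-prediction and SHAP steps referenced above, the snippet below fits a Random Forest regressor and computes per-feature attributions. The synthetic fingerprint-style features, target, and hyperparameters are assumptions for illustration; the SI's analysis uses real gas-permeability data.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 64))                     # hypothetical fixed-length polymer fingerprints
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 200)   # synthetic stand-in for a permeability target

# Random Forest property predictor, as in the SI's prediction models.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to individual input features,
# mirroring the SHAP analysis of structure-property relationships.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(shap_values.shape)  # (200, 64): one attribution per sample and feature
```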