Abstract
The integration of artificial intelligence with materials science has opened new frontiers in accelerated materials discovery. However, general-purpose large language models (LLMs) often struggle with domain-specific tasks, which motivates the development of specialized models. Here, we introduce PolySea, a domain-specific LLM for polymer informatics, designed to address key limitations in polymer property prediction, inverse design, and knowledge extraction. PolySea is trained on a curated dataset that combines high-fidelity polymer property data from PolyInfo with structured polymer knowledge distilled from expert-curated sources. LoRA-based fine-tuning mitigates catastrophic forgetting and reduces computational cost, so that the model retains its general linguistic capabilities while acquiring polymer-specific knowledge.
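As a rough illustration of why LoRA-based fine-tuning is cheap and leaves the base model intact, the sketch below applies the standard low-rank update W + (alpha / r) · B · A to a frozen weight matrix. The dimensions, rank, and scaling factor are illustrative assumptions, not the settings used for PolySea, and the helper names (`matmul`, `lora_effective_weight`, `trainable_params`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][n] * b[n][j] for n in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)            # A has shape (r, k)
    scale = alpha / r
    delta = matmul(b, a)  # (d, r) @ (r, k) -> (d, k)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d, k, r):
    """Number of trainable parameters in the adapters A (r x k) and B (d x r)."""
    return r * (d + k)


# Toy sizes (assumed for illustration only).
d, k, r, alpha = 4, 6, 2, 4.0
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # small random init
B = [[0.0] * r for _ in range(d)]                               # zero init

# With B initialised to zero, the adapted layer starts out identical to the base layer.
W_eff = lora_effective_weight(W, A, B, alpha)
```

For a hypothetical 4096 × 4096 projection with rank 8, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values against 16,777,216 in the full matrix. The base weights stay frozen throughout, which is what allows the general-purpose behaviour to be retained.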
PolySea demonstrates state-of-the-art performance across diverse polymer-related tasks. On regression benchmarks, it achieves an R² score of 0.97, and it reaches 79% classification accuracy in thermal stability prediction. Comparative assessments against leading general-purpose LLMs, including ChatGPT-o1 and DeepSeek-R1, highlight PolySea’s superior precision, particularly in on-demand polymer design, where it generates novel polymer structures that are absent from the training data yet satisfy the target property constraints. The generated polymers are validated using a graph neural network surrogate model, Polymer Genome, and density functional theory calculations, which support their feasibility for real-world synthesis.
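For readers unfamiliar with the two reported metrics, the following minimal sketch shows how a coefficient of determination (R²) and a classification accuracy are computed. The data are toy values invented for illustration, not PolySea outputs, and the function names are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(labels, preds):
    """Fraction of predictions that match the reference labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


# Toy regression targets and predictions (made-up numbers).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 205.0, 245.0]
r2 = r2_score(y_true, y_pred)

# Toy stability classes (made-up labels).
labels = ["high", "low", "high", "low", "high"]
preds = ["high", "low", "low", "low", "high"]
acc = accuracy(labels, preds)
```

An R² of 1 means the predictions explain all of the variance in the reference values, while an R² of 0 is no better than always predicting the mean.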
Our findings underscore the potential of domain-adapted LLMs to accelerate polymer informatics. By bridging AI and materials science, PolySea offers a practical framework for LLM-driven polymer design and a template for developing specialized AI models in other scientific disciplines.