Is BigSMILES the Friend of Polymer Machine Learning?

06 January 2025, Version 5
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning (ML) has become a powerful tool in polymer science, and its success relies strongly on effective structural representations of polymers. The Simplified Molecular Input Line Entry System (SMILES) is widely used for its simplicity, but it was originally designed for small molecules and struggles to capture the stochastic nature of polymers. BigSMILES was recently introduced as a more compact and versatile representation of polymer structures. However, the relative performance of SMILES and BigSMILES in polymer ML tasks remains unexplored. In this study, we systematically evaluate SMILES and BigSMILES across 12 polymer-related tasks, including property prediction and inverse design, using convolutional neural networks (CNNs) and large language models (LLMs). Our results show that BigSMILES enables faster training because of its reduced token complexity and achieves performance comparable or superior to that of SMILES in certain predictive tasks. Moreover, BigSMILES encodes chemical information and monomer connectivity for copolymers more accurately within LLM frameworks. This work serves as a starting point for a comprehensive evaluation of SMILES and BigSMILES in polymer ML applications and highlights the potential of BigSMILES to streamline and accelerate polymer informatics workflows, particularly for complex systems such as copolymers and polymer composites. Looking ahead, polymer representations that integrate chain structure, phase morphology, and processing parameters will be crucial for capturing the multifaceted relationships between polymer structure and properties, and for enabling more accurate and efficient modeling.
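To make the idea of reduced token complexity concrete, the following is a minimal sketch that counts tokens for polystyrene written two ways: as an explicitly enumerated chain of repeat units in SMILES, and as a single BigSMILES stochastic object. The regex tokenizer, the function names `tokenize` and `explicit_chain`, and the choice of polystyrene are assumptions made for illustration only; they are not the tokenizer, datasets, or encoding procedure used in this study.

```python
import re

# Hypothetical, simplified tokenizer (an assumption, not the study's tokenizer):
# bracketed groups such as "[$]" or "[]" are one token; "Br" and "Cl" are
# kept together; every other atom letter, digit, or symbol is its own token.
TOKEN_RE = re.compile(r"\[[^\]]*\]|Br|Cl|[A-Za-z]|\d|[{}(),=#$*<>/\\.+\-%@:~]")


def tokenize(line_notation: str) -> list[str]:
    """Split a SMILES or BigSMILES string into tokens."""
    tokens = TOKEN_RE.findall(line_notation)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != line_notation:
        raise ValueError(f"untokenizable characters in {line_notation!r}")
    return tokens


# Styrene repeat unit written as a SMILES fragment.
REPEAT_UNIT = "CC(c1ccccc1)"

# BigSMILES stochastic object for polystyrene: one repeat unit with
# "[$]" bonding descriptors, independent of chain length.
BIGSMILES = "{[][$]CC(c1ccccc1)[$][]}"


def explicit_chain(n_units: int) -> str:
    """Illustrative explicit SMILES: the repeat unit written out n_units times."""
    return REPEAT_UNIT * n_units


if __name__ == "__main__":
    print("BigSMILES tokens:", len(tokenize(BIGSMILES)))
    for n in (1, 2, 5, 10):
        print(f"explicit SMILES, {n} units:", len(tokenize(explicit_chain(n))))
```

Under these assumptions, the token count of the explicit SMILES string grows linearly with the number of repeat units written out, while the BigSMILES string stays the same length. For a single repeat unit the BigSMILES form is slightly longer because of its bonding descriptors, so any saving depends on how the SMILES input is constructed.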

Keywords

Polymer Machine Learning
BigSMILES
SMILES
Polymer Representative Learning
