Abstract
Machine learning (ML) has become a powerful tool in polymer science, and its success relies heavily on effective structural representations of polymers. While the Simplified Molecular Input Line Entry System (SMILES) is widely used for its simplicity, it was originally designed for small molecules and struggles to capture the stochastic nature of polymers. BigSMILES has recently been introduced as a more compact and versatile representation of polymer structures; however, the relative performance of SMILES and BigSMILES in polymer ML tasks remains unexplored.
In this study, we systematically evaluate SMILES and BigSMILES across 12 polymer-related tasks, including property prediction and inverse design, using convolutional neural networks (CNNs) and large language models (LLMs). Our results show that BigSMILES enables faster training owing to its reduced token complexity and achieves comparable or superior performance to SMILES on certain predictive tasks. Moreover, within LLM frameworks, BigSMILES more accurately encodes chemical information and monomer connectivity for copolymers. This work serves as a starting point for a comprehensive evaluation of SMILES and BigSMILES in polymer ML applications, highlighting the potential of BigSMILES to streamline and accelerate polymer informatics workflows, particularly for complex systems such as copolymers and polymer composites. Looking ahead, advancing polymer representations to integrate chain structure, phase morphology, and processing parameters will be crucial for capturing the multifaceted relationships between polymer structure and properties, enabling more accurate and efficient modeling.