Abstract
Artificial Intelligence (AI) techniques are transforming the computational discovery and design of polymers. The key enablers for polymer informatics are machine-readable molecular string representations of the building blocks of a polymer, i.e., the monomers. In monomer strings, such as SMILES, symbols at the head and tail atoms indicate the locations of bond formation during polymerization. Since the linking of monomers determines a polymer’s properties, the performance of AI prediction models will, ultimately, be limited by the accuracy of the head and tail assignments in the monomer SMILES. Considering the large number of polymer precursors available in chemical data bases, reliable methods for the automated assignment of head and tail atoms are needed. Here, we report a method for assigning head and tail atoms in monomer SMILES by analyzing the reactivity of their functional groups. In a reference data set containing 206 polymer precursors, the HeadTailAssign (HTA) algorithm has correctly predicted the polymer class of 204 monomer SMILES, representing an accuracy of 99%. The head and tail atoms were correctly assigned to 187 monomer SMILES, representing an accuracy of 91%. The HTA code is available for validation and reuse at https://github.com/IBM/HeadTailAssign