Abstract
Processing chemical information with computational intelligence methods is challenged by the structural complexity of molecular graphs, which are difficult to represent in a form suitable for such methods. The most popular representation is the SMILES notation standard. However, it has limitations, such as the abundance of invalid strings and the fact that similar strings often represent very different molecules.
In this work, a completely different approach to chemical nomenclature is presented. A reduced instruction set is defined, and the language of all strings that are sequences of such instructions is considered. All strings of this language are valid, i.e., each string represents a molecule. Moreover, slight changes in a string usually correspond to small modifications in the represented molecule. Therefore, this approach is well suited for use in state-of-the-art computational intelligence systems for chemical information processing, including deep learning models.