Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks

27 September 2019, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Graphs and networks are a key research tool for a variety of science fields, most notably chemistry, biology, engineering and social sciences. Modeling and generation of graphs with efficient sampling is a key challenge for graphs. In particular, the non-uniqueness, high dimensionality of the vertices and local dependencies of the edges may render the task challenging. We apply our recently introduced method, Generative Examination Networks (GENs) to create the first text-based generative graph models using one-line text formats as graph representation. In our GEN, a RNN-generative model for a one-line text format learns autonomously to predict the next available character. The training is stopped by an examination mechanism checking validating the percentage of valid graphs generated. We achieved moderate to high validity using dense g6 strings (random 67.8 +/- 0.6, canonical 99.1 +/- 0.2). Based on these results we have adapted the widely used SMILES representation for molecules to a new input format, which we call linear graph input (LGI). Apart from the benefits of a short compressible text-format, a major advantage include the possibility to randomize and augment the format. The generative models are evaluated for overall performance and for reconstruction of the property space. The results show that LGI strings are very well suited for machine-learning and that augmentation is essential for the performance of the model in terms of validity, uniqueness and novelty. Lastly, the format can address smaller and larger dataset of graphs and the format can be easily adapted to define another meaning of the characters used in the LGI-string and can address sparse graph problems in used in other fields of science.

Keywords

LSTM RNNs
autonomously
generative
GEN
Generative Examination Network
quality control measures

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.