The development of new nanomaterials for energy technologies depends on understanding the intricate relation between material properties and atomic structure. It is therefore crucial to be able to routinely characterise the atomic structure of nanomaterials, and a promising method for this task is Pair Distribution Function (PDF) analysis. The PDF can be obtained through Fourier transformation of X-ray total scattering data and represents a histogram of all interatomic distances in the sample. Going from the distance information in the PDF to a chemical structure is an unassigned distance geometry problem (uDGP), and solving it is often the bottleneck in nanostructure analysis. In this work, we propose using a Conditional Variational Autoencoder (CVAE) to automatically solve the uDGP and obtain valid chemical structures from PDFs. As a proof of concept, we use a simple model system of hypothetical mono-metallic nanoparticles containing up to 100 atoms in the face-centered cubic (FCC) structure. The model is trained to predict the assigned distance matrix (aDM) of a structure, with a simulated PDF of that structure as the conditional input. We introduce a novel representation of structures by projecting them inside a unit sphere and adding anchor points, or satellites, to help in the reconstruction of the chemical structure. The performance of the CVAE model is compared to that of a Deterministic Autoencoder (DAE), showing that both models are able to solve the uDGP reasonably well. We further show that the CVAE learns a structured and meaningful latent embedding space which can be used to predict new chemical structures.
The paper has been peer reviewed and accepted at the 16th International Workshop on Mining and Learning with Graphs, held at the KDD 2020 conference.
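To make the distinction between assigned and unassigned distances concrete, the following is a minimal sketch in Python with NumPy, not the simulation pipeline used in this work. It builds a small FCC cluster, computes its assigned distance matrix, and then discards the atom labels to obtain the sorted list of distances and a plain distance histogram. The histogram is only a crude stand-in for a PDF: a simulated PDF also involves scattering factors, peak broadening and normalisation, all of which are omitted here. The function names, the lattice constant of 4.0 Å, the bin width and the rule of keeping the atoms closest to the origin are all assumptions made for illustration.

```python
import itertools

import numpy as np


def fcc_cluster(a=4.0, n_cells=2, max_atoms=100):
    """Return up to max_atoms FCC lattice positions, closest to the origin first.

    The lattice constant `a` and the selection rule are illustrative assumptions.
    """
    # Fractional coordinates of the four atoms in the conventional FCC cell.
    basis = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.5, 0.0, 0.5],
                      [0.0, 0.5, 0.5]])
    cells = np.array(list(itertools.product(range(-n_cells, n_cells), repeat=3)))
    pos = (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3) * a
    order = np.argsort(np.linalg.norm(pos, axis=1), kind="stable")
    return pos[order[:max_atoms]]


def assigned_distance_matrix(pos):
    """Entry (i, j) is the distance between atom i and atom j."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def unassigned_distances(adm):
    """Drop the atom labels: keep only the sorted list of pair distances."""
    i, j = np.triu_indices(adm.shape[0], k=1)
    return np.sort(adm[i, j])


def distance_histogram(dists, r_max=30.0, dr=0.1):
    """Bin the pair distances; a simplified stand-in for a PDF."""
    bins = np.arange(0.0, r_max + dr, dr)
    counts, edges = np.histogram(dists, bins=bins)
    return edges[:-1], counts


positions = fcc_cluster()
adm = assigned_distance_matrix(positions)   # input to an assigned problem
dists = unassigned_distances(adm)           # what a PDF-like signal retains
r, counts = distance_histogram(dists)
```

The forward direction, from structure to distance list, is a single deterministic step. The uDGP is the reverse: recovering which pair of atoms each distance belongs to, and hence the matrix `adm`, from `dists` or its histogram alone.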