Abstract
Applying machine learning to study the structure-property relationships of
particular nanomaterials typically requires domain knowledge for feature extraction. However,
this process may introduce bias if there is a focus on known aspects of structure, impeding
the discovery of new science. Here, we develop an approach that uses only atomic Cartesian
coordinates to predict the electron affinities, band gap energies, Fermi energies and ionisation
potentials of simulated graphene nanoflakes from a publicly available data set [1]. The workflow
developed represents nanoflakes as graphs that are more representative than the ball-and-stick
atom-bond representation that is intuitive to humans, and generates fixed-size embeddings of
these graphs using the neural embedding framework graph2vec [2]. Pairing the graph embeddings
with a convolutional neural network produced highly accurate predictive models, with held-out
test set R² from 0.9 to 0.96, for nanoflakes with a very challenging variation in size from tens to
thousands of atoms. These predictions were benchmarked against results for optimised predictive
models with geometric domain-driven features [3], exceeding the reported accuracy of those models
for predictions of Fermi energy, electron affinity and ionisation potential and matching it for band
gap energy.