Abstract
While the vision of accelerating materials discovery using data-driven
methods is well-founded, practical realization has been
throttled by challenges in data generation, ingestion, and
materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing
the challenge of data generation, and capitalizing on these emerging data resources requires ingestion of data into an architecture
that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence
and interrelationships among events occurring in a simulation or
experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from
millions of materials synthesis and characterization experiments
performed using various modalities such as serial, parallel, and multimodal experimentation. Our data architecture tracks the evolution of a material’s state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, using state-equivalency rules and parameters associated with state-changing processes in addition to the typically used composition
data, we demonstrate a marked reduction of uncertainty in the prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of the ESAMP architecture in
enabling several aspects of accelerated materials discovery such
as dynamic workflow design, generation of knowledge graphs,
and efficient integration of theory and experiment.
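
To make the idea of event sourcing and state equivalency concrete, the following is a minimal sketch and not the ESAMP schema itself. The class `ProcessEvent`, the function `state_key`, the process names, and the parameter names are all hypothetical, chosen only for illustration. The sketch assumes that a sample's state can be approximated by the ordered list of state-changing processes it has undergone, so that two measurements are comparable when those histories match.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ProcessEvent:
    """One immutable entry in an append-only event log (hypothetical schema)."""
    sample_id: str
    timestamp: int                                  # ordering key within a sample
    process_type: str                               # e.g. "anneal", "xrf"
    parameters: Tuple[Tuple[str, float], ...] = ()  # process parameters
    state_changing: bool = True                     # does it alter the material?


def state_key(events: Iterable[ProcessEvent], sample_id: str, before: int):
    """Return the ordered state-changing history of a sample prior to `before`.

    Two samples with equal keys are treated as state-equivalent under this
    (assumed, deliberately simple) rule.
    """
    history = sorted(
        (e for e in events if e.sample_id == sample_id and e.timestamp < before),
        key=lambda e: e.timestamp,
    )
    return tuple((e.process_type, e.parameters) for e in history if e.state_changing)


# Illustrative event log; values are made up for the example.
EVENTS = [
    ProcessEvent("s1", 1, "print", (("Ni_fraction", 0.5),)),
    ProcessEvent("s1", 2, "anneal", (("temperature_C", 400.0),)),
    ProcessEvent("s1", 3, "xrf", state_changing=False),
    ProcessEvent("s1", 4, "electrochemistry", (("pH", 13.0),)),
    ProcessEvent("s2", 1, "print", (("Ni_fraction", 0.5),)),
    ProcessEvent("s2", 2, "anneal", (("temperature_C", 400.0),)),
    ProcessEvent("s2", 3, "electrochemistry", (("pH", 13.0),)),
]

# The non-state-changing XRF step does not break equivalence between s1 and s2.
same_state = state_key(EVENTS, "s1", before=4) == state_key(EVENTS, "s2", before=3)
# After electrochemistry, s1 is no longer in its pre-measurement state.
changed = state_key(EVENTS, "s1", before=5) != state_key(EVENTS, "s1", before=4)
```

Because the log is append-only, a state key of this kind can be recomputed for any point in a sample's history, and the parameters of each state-changing process remain available as model features alongside composition.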
Supplementary materials
SI ACE Database paper: SI containing a discussion of the entities in the database.