ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.
Nanomaterials Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge

preprint
submitted on 09.11.2019, 05:14 and posted on 20.11.2019, 02:50 by Anna Hiszpanski, Brian Gallagher, Karthik Chellapan, Peggy Li, Shusen Liu, Hyojin Kim, Jinkyu Han, Bhavya Kailkhura, David Buttler, Yong Han

Nanomaterials of varying compositions and morphologies are of interest for many applications, from catalysis to optics, but their synthesis and scale-up are most often time-consuming, Edisonian processes. Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but searching the literature and digesting the information are themselves time-consuming manual tasks for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ca. 35k nanomaterials-related articles, we developed models to classify articles according to nanomaterial composition and morphology, extract synthesis protocols from the articles’ text, and extract, normalize, and categorize chemical terms within the synthesis protocols. We demonstrate the performance of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to 0.87 F1-score on chemical entity recognition. In addition to processing the articles’ text, the pipeline automatically identifies microscopy images of nanomaterials within articles and analyzes them to determine the nanomaterials’ morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest.
We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.
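
As a rough illustration of the kind of trend mining described above, the following minimal Python sketch counts how often each reagent co-occurs with each morphology across a set of structured records. It is not the authors' implementation: the record layout, the function names `reagent_morphology_counts` and `morphology_share`, and the toy records themselves are all assumptions made up for this example, not data or results from the preprint.

```python
from collections import Counter, defaultdict

# Hypothetical structured records, standing in for the pipeline's output.
# The values are made-up toy data, not results from the preprint.
records = [
    {"morphology": "nanorod", "reagents": ["CTAB", "HAuCl4", "AgNO3"]},
    {"morphology": "nanorod", "reagents": ["CTAB", "HAuCl4"]},
    {"morphology": "nanosphere", "reagents": ["sodium citrate", "HAuCl4"]},
    {"morphology": "nanocube", "reagents": ["PVP", "AgNO3"]},
]


def reagent_morphology_counts(records):
    """Count, for each reagent, the morphologies of the articles it appears in."""
    counts = defaultdict(Counter)
    for rec in records:
        # Use a set so a reagent listed twice in one protocol counts once.
        for reagent in set(rec["reagents"]):
            counts[reagent][rec["morphology"]] += 1
    return counts


def morphology_share(counts, reagent):
    """Return the fraction of a reagent's occurrences that fall in each morphology."""
    per_morphology = counts.get(reagent, Counter())
    total = sum(per_morphology.values())
    if total == 0:
        return {}
    return {morph: n / total for morph, n in per_morphology.items()}


counts = reagent_morphology_counts(records)
for reagent in sorted(counts):
    print(reagent, morphology_share(counts, reagent))
```

A table of such shares only suggests correlations between reagents and morphologies; it could be used to narrow the parameter space worth testing, not to establish that a reagent causes a given morphology.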

Email Address of Submitting Author

han5@llnl.gov

Institution

Lawrence Livermore National Laboratory

Country

United States

ORCID For Submitting Author

0000-0002-3000-2782

Declaration of Conflict of Interest

No Conflict of Interest
