Abstract
We present "Learning Advance", a framework for generating and validating hypotheses to discover chemical knowledge, applied here to optimizing solubility in amphiphile/water systems. The workflow begins with an initial hypothesis: that incorporating common hydrotropic additives, such as sugars or urea, raises solubility limits. To test this hypothesis, we design experimental combinations of additive weight percentages using grid search and Latin hypercube sampling. High-throughput robotic systems automate the experiments, and a YOLO-based image analysis workflow determines the degree of solubilization. The experimental data are transformed into a chemical feature space to train a Gaussian Process Regression (GPR) model, which drives a Bayesian optimization (BO) algorithm that identifies optimal additive combinations. When BO plateaus, the "Learning Advance" approach passes all accumulated data to AI analysis: we extract correlations between the target property and the chemical features, enabling LLM tools to generate a new hypothesis based on the observed data. This hypothesis is then validated experimentally, creating a continuous cycle of discovery. The framework demonstrates how integrating BO with AI-driven hypothesis generation can carry the search beyond the limits of conventional optimization, establishing a promising approach for advancing scientific knowledge discovery in materials science and chemistry.
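The Latin hypercube step of the experimental design can be illustrated with a minimal sketch. This is not the authors' implementation: the function `latin_hypercube`, the additive names, and the 0 to 10 wt% ranges are all hypothetical and chosen only for illustration. The idea is that each additive's range is split into as many equal strata as there are samples, and every stratum is used exactly once per additive.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; each dimension is split into n_samples equal
    strata, and every stratum is used exactly once per dimension."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniformly random point inside each stratum of this dimension.
        col = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(col)  # decouple the stratum order across dimensions
        columns.append(col)
    # Transpose the per-dimension columns into a list of sample tuples.
    return list(zip(*columns))

# Hypothetical additive ranges in wt% (assumed, not taken from the paper).
ADDITIVE_BOUNDS = {"urea": (0.0, 10.0), "sucrose": (0.0, 10.0)}

design = latin_hypercube(8, list(ADDITIVE_BOUNDS.values()), seed=42)
for point in design:
    print(dict(zip(ADDITIVE_BOUNDS, (round(w, 2) for w in point))))
```

Compared with a regular grid, a design of this kind covers each additive's full range with far fewer experiments, which is presumably why the two strategies are combined before the GPR model is trained.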
Supplementary materials
Title: Supplementary Materials
Description: SI