Semi-automatic scheme for early-stage material search: developing solvent-solubility prediction of tetraphenylporphyrin derivatives securing chemical-space coverage

27 May 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


This study developed and implemented a semi-automatic material exploration scheme to model the solvent solubility of tetraphenylporphyrin derivatives. The scheme involved the following steps: definition of a practical chemical search space; prioritization of the molecules in that space using an extended algorithm for submodular function maximization, which requires neither biased variable selection nor pre-existing data; synthesis and automatic measurement; and machine-learning model estimation. With a small number of evaluations (10 molecules, or 0.13% of all targeted molecules), the optimal evaluation order selected by the algorithm covered similar molecules amounting to 32% of all targeted molecules, whereas random sampling and uncertainty sampling covered ~7% and ~4%, respectively. The derived binary classification models predicted ‘good solvents’ with an accuracy above 0.8. Overall, we confirmed the effectiveness of the proposed semi-automatic scheme in early-stage material search projects, with a view to accelerating a wider range of materials research.
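The prioritization step can be illustrated with a minimal sketch of greedy maximization of a coverage function, a standard example of a monotone submodular objective. This is not the authors' extended algorithm, which the abstract does not specify. The toy 2-D descriptors, the `radius` similarity threshold, the `budget`, and the function names are all hypothetical, chosen only to show how a few evaluations can cover many similar molecules.

```python
import math


def neighbors(points, radius):
    """For each point, the set of indices within `radius` (Euclidean), itself included."""
    return [
        {j for j, q in enumerate(points) if math.dist(p, q) <= radius}
        for p in points
    ]


def greedy_coverage(points, radius, budget):
    """Pick up to `budget` indices, each time adding the point that
    covers the most points not yet covered (largest marginal gain)."""
    cover = neighbors(points, radius)
    selected, covered = [], set()
    for _ in range(min(budget, len(points))):
        best = max(
            (i for i in range(len(points)) if i not in selected),
            key=lambda i: len(cover[i] - covered),
        )
        if not cover[best] - covered:
            break  # no remaining candidate adds coverage
        selected.append(best)
        covered |= cover[best]
    return selected, len(covered) / len(points)


# Hypothetical descriptors: two tight clusters and one isolated molecule.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
order, fraction = greedy_coverage(points, radius=0.5, budget=2)
print(order, fraction)
```

In this toy case the greedy order picks one representative from each cluster before the isolated point, which mirrors the idea of covering many similar molecules with few evaluations. For monotone submodular objectives under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.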


Automatic experiment
machine learning
submodular function maximization
UV–Vis absorption spectroscopy
molecular aggregation
first-principles calculation
materials science
materials informatics

Supplementary materials

Supplementary Information 1
1. Substitution target molecules
2. Mapping molecules over principal components
3. Spectrum analysis
4. Time-dependent density functional theory (TDDFT) calculation
5. Solvent-solubility predictions
6. Threshold used in SFMMOL
7. Comparison of algorithms for prediction performances of calculated properties
8. Preparation of the top-ranked TPP derivatives 11–15
Supplementary Information 2
1. Molecular groups
2. Spectrum data

