Fine-Tuning a Genetic Algorithm for CAMD: A Screening-Guided Warm Start

04 November 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

More sustainable chemical processes require suitable molecules, whose selection can be supported by computer-aided molecular design (CAMD). CAMD frameworks often generate and evaluate molecular structures using genetic algorithms. However, genetic algorithms can converge slowly and may yield suboptimal solutions. To address these challenges, this work presents a method to fine-tune a genetic algorithm for CAMD. The method builds on the COSMO-CAMD framework, which uses a genetic algorithm to solve optimization-based molecular design problems and COSMO-RS to predict the physical properties of molecules. The key idea is to integrate the results of a fast, large-scale molecular screening into the molecular design framework. This enables a targeted initialization of the genetic algorithm, referred to as a warm start. The method is applied in two case studies that design solvents for extracting gamma-valerolactone and phenol, respectively, from aqueous solutions. Compared to the benchmark method, the warm-started COSMO-CAMD framework reduces computing time by up to 70%, discovers four times as many top-performing candidate molecules, and identifies seven tailored molecular fragments. For the phenol case, this leads to the discovery of two novel solvents. The optimal solvent is found in all computational runs. Overall, the warm-started COSMO-CAMD framework significantly improves the efficiency, effectiveness, and robustness of molecular design.
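The warm-start idea can be illustrated with a minimal Python sketch. The sketch assumes that the screening yields one objective value per candidate, with higher values being better, and that molecules are handled as SMILES strings. The function `warm_start_population`, its parameters `n_seed` and `population_size`, and the generator `toy_random_molecule` are hypothetical illustrations, not the COSMO-CAMD interface. The abstract does not specify how screened candidates and randomly generated individuals are combined in the actual method.

```python
import random
from typing import Callable, Dict, List


def warm_start_population(
    screening_scores: Dict[str, float],
    population_size: int,
    n_seed: int,
    random_molecule: Callable[[random.Random], str],
    rng: random.Random,
) -> List[str]:
    """Return an initial GA population seeded with the best screening hits.

    Hypothetical helper: screening_scores maps a molecule identifier (here a
    SMILES string) to its screening objective value; higher is assumed better.
    """
    if not 0 <= n_seed <= population_size:
        raise ValueError("n_seed must lie between 0 and population_size")
    # Warm start: rank screened candidates best-first and keep the top n_seed.
    ranked = sorted(screening_scores, key=screening_scores.get, reverse=True)
    population = ranked[:n_seed]
    # Fill the remaining slots with randomly generated molecules so the GA
    # can still explore structures outside the screening database.
    while len(population) < population_size:
        population.append(random_molecule(rng))
    return population


# Toy example: made-up scores and a toy fragment-string generator. These are
# not COSMO-RS predictions and not chemically validated structures.
FRAGMENTS = ["C", "CC", "O", "C(=O)"]


def toy_random_molecule(rng: random.Random) -> str:
    return "".join(rng.choice(FRAGMENTS) for _ in range(rng.randint(2, 5)))


scores = {"CCO": 0.2, "CCCCO": 0.9, "CC(=O)C": 0.5, "c1ccccc1": 0.7}
population = warm_start_population(
    scores, population_size=6, n_seed=2,
    random_molecule=toy_random_molecule, rng=random.Random(0),
)
print(population[:2])  # ['CCCCO', 'c1ccccc1']
```

Keeping random individuals alongside the seeds is one plausible way to preserve diversity. The supplementary top-40 lists suggest a seed set of that size, but this page does not describe the exact mixing scheme used in the paper.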

Supplementary materials

Supporting Information (general)
Includes additional information about the proposed method, the setup and results of the two case studies, and the software used in the main article.

SMILES screening
Includes all molecules in the COSMO database used in this work.

SMILES top40 case study 1
Includes the top 40 screened candidate molecules used in case study 1.

SMILES top40 case study 2
Includes the top 40 screened candidate molecules used in case study 2.
