Generating molecules with specific constituents and structures that exhibit desired properties is a crucial yet challenging task in the computer-aided design of functional molecules. This challenge arises from the discrete nature of the vast design space of molecules, which is subject to additional physical constraints such as symmetries. Exploration and optimization within this constrained discrete space pose difficulties for most machine learning methods. In this paper, we introduce a multimodal representation for molecules that accounts for both their discrete atomic constituents and their continuous atomic positions in 3D Euclidean space. Based on this representation, we develop MolEdit, a molecular generation method that simultaneously solves discrete and continuous optimization problems: MolEdit learns the distribution of molecular constituents using efficient normalizing flow models and employs a group-optimized score matching algorithm to model the symmetry-preserved distribution of atomic positions. By combining these two components, MolEdit can effectively assemble any discrete molecular graph and generate corresponding molecular conformers. Furthermore, by decomposing the generation process multimodally, MolEdit can work with flexible prompts specifying conditional information about molecular constituents and substructures, leading to a general-purpose approach to versatile molecular editing.
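To make the idea of a multimodal representation concrete, the following is a minimal sketch, not MolEdit's implementation. It assumes a molecule can be stored as a tuple of atomic numbers (the discrete modality) paired with a tuple of 3D coordinates (the continuous modality). The names `Molecule`, `rigid_transform`, and `distance_matrix` are hypothetical, and the water geometry is only approximate. The sketch illustrates the kind of symmetry constraint the abstract refers to: a rigid rotation and translation changes the coordinates but leaves the constituents and all interatomic distances unchanged.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Molecule:
    """Hypothetical container: discrete constituents plus continuous positions."""
    atomic_numbers: Tuple[int, ...]  # discrete modality (element per atom)
    positions: Tuple[Vec3, ...]      # continuous modality (assumed angstroms)

    def __post_init__(self) -> None:
        if len(self.atomic_numbers) != len(self.positions):
            raise ValueError("one position per atom is required")


def rotate_z(p: Vec3, angle: float) -> Vec3:
    """Rotate a point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def rigid_transform(mol: Molecule, angle: float, shift: Vec3) -> Molecule:
    """Apply a rotation about z followed by a translation to every atom."""
    moved = tuple(
        tuple(a + b for a, b in zip(rotate_z(p, angle), shift))
        for p in mol.positions
    )
    return Molecule(mol.atomic_numbers, moved)


def distance_matrix(mol: Molecule) -> List[List[float]]:
    """Pairwise interatomic distances, which are invariant to rigid motion."""
    return [[math.dist(p, q) for q in mol.positions] for p in mol.positions]


# Approximate water geometry, used only as an illustration.
water = Molecule(
    atomic_numbers=(8, 1, 1),
    positions=((0.0, 0.0, 0.1173),
               (0.0, 0.7572, -0.4692),
               (0.0, -0.7572, -0.4692)),
)
moved = rigid_transform(water, angle=1.0, shift=(1.0, -2.0, 0.5))

# The coordinates differ, but the constituents and distances do not.
same_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(distance_matrix(water), distance_matrix(moved))
    for a, b in zip(row_a, row_b)
)
print(moved.atomic_numbers == water.atomic_numbers, same_distances)
```

A generative model over such a representation has to handle both parts: a distribution over the discrete tuple, and a distribution over coordinates that assigns equal probability to configurations related by rigid motion.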
Versatile Molecular Editing via Multimodal and Group-optimized Generative Learning
27 September 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.