Data-Driven Design of Protein-Like Single-Chain Polymer Nanoparticles



The functional structure of proteins is heavily influenced by their folding behavior. AlphaFold, a powerful artificial intelligence (AI) program trained on data from the Protein Data Bank (PDB), was developed to predict the 3D structure of a protein from its amino acid sequence. Inspired by this, we aim to elucidate the structural features of synthetic single-chain polymer nanoparticles (SCNPs) from compositional information (monomers, chain length, molecular weight, charge, and valency) using machine learning (ML). Specifically, we demonstrate that ML improves the efficiency of SCNP design, and we uncover key polymer design attributes for mimicking protein-like structural features. First, we randomly screened over 1000 synthesized SCNPs using a combination of high-throughput dynamic light scattering (DLS) and small-angle X-ray scattering (SAXS), and we compared these results to simulated protein data from the PDB. Then, using evidential neural networks (ENets), we predicted, synthesized, and characterized 30 novel compact SCNPs. Notably, 58% of the ML-predicted SCNPs had a Porod exponent ≥ 3.5, compared with only 5% of SCNPs from the random screen. Using Shapley additive explanation (SHAP) values, we further uncovered how monomer content contributes to the Porod exponent and the radius of gyration. This work shows that an ML-guided approach is effective for nanoparticle design, a challenging and unintuitive problem.
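As a rough illustration of how a compactness criterion such as "Porod exponent ≥ 3.5" can be applied to a SAXS profile, the sketch below assumes the high-q intensity follows a power law I(q) ∝ q^(-p) and estimates p with a least-squares line in log-log space over a user-chosen q window. For reference, smooth compact particles approach Porod's law (p = 4), while ideal Gaussian chains give p ≈ 2. The function names (`porod_exponent`, `is_compact`, `hit_rate`), the simple log-log fit, and the window bounds are illustrative assumptions; the study may have used a different fitting model (for example, a Guinier-Porod model) and q range.

```python
import numpy as np


def porod_exponent(q, intensity, q_min, q_max):
    """Estimate p in I(q) ~ q**(-p) from a log-log linear fit over [q_min, q_max].

    Hypothetical helper: a plain least-squares fit, not necessarily the
    fitting procedure used in the study.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Keep only points inside the fitting window with positive intensity (log is defined)
    mask = (q >= q_min) & (q <= q_max) & (intensity > 0)
    if mask.sum() < 2:
        raise ValueError("need at least two positive points in the q window")
    # np.polyfit returns [slope, intercept] for a degree-1 fit
    slope, _ = np.polyfit(np.log(q[mask]), np.log(intensity[mask]), 1)
    return float(-slope)


def is_compact(p, threshold=3.5):
    """Classify a particle as compact if its Porod exponent meets the cutoff."""
    return bool(p >= threshold)


def hit_rate(exponents, threshold=3.5):
    """Fraction of a batch of particles whose Porod exponent meets the cutoff."""
    exponents = np.asarray(exponents, dtype=float)
    return float(np.mean(exponents >= threshold))


if __name__ == "__main__":
    # Synthetic, noise-free curves for illustration only (not data from the study)
    q = np.linspace(0.05, 0.5, 200)
    sphere_like = 2.0 * q ** -4.0   # Porod-law scattering from a compact particle
    coil_like = 5.0 * q ** -2.0     # Gaussian-chain-like scattering
    for name, curve in [("sphere-like", sphere_like), ("coil-like", coil_like)]:
        p = porod_exponent(q, curve, 0.1, 0.4)
        print(name, round(p, 3), is_compact(p))
```

Comparing `hit_rate` over a randomly screened library with the same metric over an ML-proposed batch is one simple way to express the kind of enrichment the abstract reports.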


Supplementary material

Supporting Information
Contains additional data and tables