Application of Life Cycle Assessment and Machine Learning for High-Throughput Screening of Green Chemical Substitutes
The production of many active pharmaceutical ingredients, such as sitagliptin, can cause severe environmental problems owing to the use of toxic chemicals, production infrastructure, energy consumption, and waste treatment. The environmental impacts of the sitagliptin production process were estimated using life cycle assessment (LCA), which indicated that chemical materials contribute most of the environmental impact. The Eco-indicator 99 and ReCiPe endpoint methods attributed 83% and 70% of the life-cycle impact, respectively, to chemical feedstock. Among all the chemical materials used in the sitagliptin production process, trifluoroacetic anhydride was identified as the largest contributor in most impact categories according to the ReCiPe midpoint method. High-throughput screening was therefore performed to identify green chemical substitutes for the target chemical (i.e., trifluoroacetic anhydride) in three steps. First, the thirty most similar chemicals were selected from two million candidate alternatives in the PubChem database on the basis of their molecular descriptors. Second, deep neural network models were developed to predict life-cycle impact, trained on chemicals in the Ecoinvent v3.5 database with known LCA values and corresponding molecular descriptors. Finally, the LCA data of the similar chemicals were predicted using the trained models, and 1,2-ethanediyl ester was identified as one of the potentially greener substitutes. The case study demonstrates the applicability of the novel framework for screening green chemical substitutes and optimizing pharmaceutical manufacturing processes.
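The three screening steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the similarity metric, so Euclidean distance over descriptor vectors is an assumption, and `predict_impact` is a hypothetical stand-in for the trained neural network. The function names, descriptor values, and toy impact function are all invented for illustration.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(
    target: Descriptor, candidates: Dict[str, Descriptor], k: int
) -> List[Tuple[str, float]]:
    """Step 1: return the k candidates closest to the target, nearest first."""
    scored = ((name, euclidean(target, vec)) for name, vec in candidates.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


def greener_substitutes(
    target: Descriptor,
    candidates: Dict[str, Descriptor],
    predict_impact: Callable[[Descriptor], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Steps 2-3: predict the impact of the k most similar candidates and keep
    those scoring below the target chemical, lowest predicted impact first."""
    baseline = predict_impact(target)
    shortlist = most_similar(target, candidates, k)
    scored = [(name, predict_impact(candidates[name])) for name, _ in shortlist]
    greener = [item for item in scored if item[1] < baseline]
    return sorted(greener, key=lambda item: item[1])


# Toy data: hypothetical descriptor vectors, not real chemicals.
target = [1.0, 2.0, 3.0]
candidates = {
    "A": [1.0, 2.0, 3.5],
    "B": [4.0, 0.0, 1.0],
    "C": [1.2, 2.1, 2.0],
    "D": [0.9, 2.0, 3.0],
}


def toy_impact(descriptor: Descriptor) -> float:
    """Placeholder for a trained model: simply sums the descriptor values."""
    return sum(descriptor)


print([name for name, _ in most_similar(target, candidates, 3)])
# ['D', 'A', 'C']
print([name for name, _ in greener_substitutes(target, candidates, toy_impact, 3)])
# ['C', 'D']
```

In practice the descriptors would usually be standardized before distances are computed, so that no single descriptor dominates because of its scale, and the placeholder would be replaced by a model fitted to chemicals with known LCA values.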