When medicinal chemistry emerged about a century ago, drug design methodology was expected to rest on knowledge of the relationships among chemistry, biology, and medicine. Chemists originally viewed a drug molecule as a scaffold bearing several substituents: when a substituent was replaced by an alternative functional group (i.e., a substructure), the activity of the molecule changed accordingly. This is termed the structure–activity relationship (SAR), and it can guide chemists in modifying a molecule to improve its druggability. With advances in computing technology, SAR evolved into quantitative SAR (QSAR). QSAR methods prevailed in an era when determinism dominated the scientific community. Hence, the paradigm of QSAR studies was based on the idea of “discovering the analytical rules (analytical formula of functional) between independent variables and functions hidden in experimental data by curve fitting or regression”. Early QSAR rested on the so-called similarity and additivity postulates. With the advent of high-throughput experiments and big data, these two postulates face serious challenges. Compounded by puzzling problems such as substructure partitioning, “activity cliffs”, unbalanced data sampling, and the paradox between prediction accuracy and generalization, conventional QSAR declined after reaching maturity. At the beginning of this century, artificial intelligence (AI), specifically deep learning (DL), achieved significant success in image pattern recognition and natural language processing (NLP). AI was soon adopted for QSAR studies as a disruptive approach. It is now believed that drug design can be data-driven instead of rule-based (curve fitting or regression), and that AI can reveal QSARs directly, without knowledge of the mechanisms of action. 
Thus, the two postulates of conventional QSAR are no longer required, and the associated puzzling problems or paradoxes could be resolved. By examining the historical pathway along which QSAR evolved into AI-assisted drug design (AIDD), this review summarizes how the drug design paradigm has been transformed from determinism to causalitism + probabilitism (a combination of causality and probability). The principles and challenges of drug design methodology are explored, and the pros and cons of QSAR and AIDD are discussed together with perspectives. It is worth noting that although AIDD is powerful, it is not omnipotent and should be treated rationally. The essence of machine learning is to reveal the major trends in a data set, whereas the minor trends (i.e., outliers, which are often ignored or discarded) cannot be captured by AI algorithms. Yet outliers are likely to be the entrances to disruptive discoveries. Philosophically, therefore, it is unrealistic to develop innovative drugs by relying on AIDD alone. AIDD's achievements rest on inheriting the legacy of QSAR's theories, methods, technologies, and data.
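To make the regression paradigm and the additivity postulate concrete, the following is a minimal sketch, not the authors' method. It assumes a Free–Wilson-style additive model, in which activity equals a scaffold baseline plus one fixed contribution per substituent, and fits it by ordinary least squares with NumPy. The compounds, substituent indicators and activity values are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_additive_qsar(X, y):
    """Least-squares fit of y = mu + sum_j a_j * x_j (additivity postulate).

    X: 0/1 indicator matrix, one column per (hypothetical) substituent choice.
    y: measured activities (e.g. pIC50), one per compound.
    Returns [mu, a_1, ..., a_k]: baseline plus per-substituent contributions.
    """
    A = np.column_stack([np.ones(len(X)), X])  # intercept column = scaffold baseline
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict activities of new substituent combinations by summing contributions."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


# Hypothetical series: columns are "R1 = Cl" and "R2 = CH3" (reference group: H).
X_train = np.array([[0, 0],   # R1 = H,  R2 = H
                    [1, 0],   # R1 = Cl, R2 = H
                    [0, 1]],  # R1 = H,  R2 = CH3
                   dtype=float)
y_train = np.array([5.0, 5.8, 5.3])  # hypothetical activities

coef = fit_additive_qsar(X_train, y_train)

# Additivity lets the model "predict" an unmade combination (R1 = Cl, R2 = CH3).
X_new = np.array([[1, 1]], dtype=float)
pred = predict(coef, X_new)
```

Under these assumptions the fitted contributions are about 0.8 (Cl) and 0.3 (CH3) on a baseline of 5.0, so the unmade Cl/CH3 analogue is predicted near 6.1. If such a compound were measured far from that value, the large residual is the kind of non-additive "activity cliff" outlier that the text argues such rule-based fits, and trend-seeking learners alike, tend to miss.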
Evolving Drug Design Methodology: from QSAR to AIDD
09 August 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.