Computational Study of Paxlovid in Ligand GA

Ligand GA is used to generate potentially realistic inhibitors of the 3CL protease of the SARS-Cov-2 virus by starting with the Nirmatrelvir molecule and changing it. After post-processing of the molecules using an extensive set of docking parameters, several of the molecules are selected and presented in detail that have good inhibition and metabolic activity, and binding increased to Mpro, while small change occurred in the binding to the liver enzyme CYP 3A4. This is a computational study of realistic potential inhibitors of the SARS-Cov-2 main protease found by using this software, with multiple results of which some are highlighted in particular. Of note is that these molecules may not require a CYP liver enzyme inhibitor such as Ritonavir in practical application.

, hence viral replication at an early stage of infection. The x-ray structure of Nirmatrelvir bound to the SARS-Cov-2 3CL protease, aka Mpro, is available at the Protein Data Bank, PDB ID 7SI9 [20]. The molecule is a descendant of an earlier created molecule, PF-00835231 [21], created during the SARS-Cov-1 outbreak in 2003 1 . Nirmatrelvir's precursor is an injectable and was modified, turned into an orally ingested pill, which gained FDA EUA approval as Paxlovid in Nov 2021. However, the approval is for patients with high risk of hospitalization or death after infection. The inhibition of viral replication to avoid serious illness has a window of ~5 days post-infection, after that the virus has multiplied enough in the body that inhibition will not have an effect.
A serious drawback of Nirmatrelvir is that it metabolizes too quickly in the liver [13]. The cytochrome P450 class (aka CYP ###: gene family, subfamily, isoform/individual type) of enzymes is common, occurring in different species and contain a heme iron network that makes them function as oxygenases [22,23,24]. There are 57 encodings for CYPs in the human body, of which 6 are liver enzymes: CYP 1A2, 2C9, 2C19, 2D6, 2E1, 3A4, some of which are also found in the abdominal gut, and estimated to account for 80% of drug metabolism. These enzymes are the main source of metabolism and excretion of drugs in the human body. These, primarily CYP 3A4, eliminate Nirmatrelvir from the body and reduce the plasma concentration in the bloodstream.
Ritonavir is a well-known inhibitor of most of these enzymes and is commonly used to increase the concentration of other drugs in the body by non-specifically blocking the enzyme's normal metabolic function. It was initially developed as a protease inhibitor, in particular to HIV, but is now commonly used as an activator of other drugs by inhibiting the CYP enzymes, amongst other proteins. The molecule Ritonavir is large and flexible and sticks to almost any protein; it is not a precision designed small molecule, but the body can metabolize it. The problem with Ritonavir, and as a result with the Paxlovid therapeutic, is that by inhibiting these enzymes it also blocks the metabolism of a number of other molecules and drugs [25], including commonly used anti-depressants and cholesterol medicines. The use of Ritonavir could send a normal medication or dose of prescribed medication into a toxic level. There are a variety of side effects of Ritonavir [26]. As a result, Paxlovid is primarily intended for high risk groups in its FDA approval [27], which is about 2% of the population in the US.

Section 2: Docking calculations
Before the details of the docking analysis are presented for the molecules there are some comments. Binding of a ligand to a protein is often quantified by a single number using docking software from a top scoring pose. The physical process of a ligand-protein interaction can not be defined by a single number, although being a good quantifier of interaction. The potential interaction energy well that a small molecule occupies geometrically when bound has structure: depth, width, and high dimensionality coming from the many body atomic interactions of atoms from 2 molecules (electrostatic, van der Waals, torsional rotations, longer range hydrogen bonding, electronic quantum structure, and so on). There are generally different and complicated high-dimensional well shapes of energy minima and, as a result, different bound states reflected in docking software by different ligand conformations and perturbations (wiggles) of these. The stability of these bound states is dictated by characteristics of the local potential well, translating to evaluation of the disassociation constant k_D. This information is required for accurate estimation of the on-/off-population of a ligand to a protein and its inhibitory effect.
In a multi-state system after sufficient time, the population of the ligands can be found by a usual thermodynamically Boltzmann weighted distribution. These primary states can be small in number, but with a spread in energy of almost degeneracy; thermal motion of a stable ligand conformation can result in near degenerate energies depending on the ligand and protein. The energy spread about these primary conformational modes can be visualized by physical wiggle room, e.g. spatial perturbations of a bound small molecule coordinates in an almost degenerate but continuous set of states trapped in a well while maintaining boundedness of the ligand. This spread depends on the ligand and protein.
GOLD is a non-deterministic genetic algorithm and running it multiple times samples the possible binding states, local and global minima of the binding interaction model fitness function. Large sets of docking jobs of Nirmatrelvir and all of the molecules presented in this work was done to sample the distribution of states quantified by docking score, i.e. binding interaction energy. The histogram of the docking scores of Nirmatrelvir/Mpro is shown in Figure 1. There are seemingly 2 dominant modes of ligand-protein interaction, a broad one at 46 and a narrow one at 68 GOLD PLP score. A finer bin resolution points to additional sub-structure about these scores, Figure 1 The distributions found by large sampling with random initial conditions from docking software can be useful in reducing the dimensionality of the dihedral angles of a ligand to protein in optimistic binding goals. A primary smeared peak in the energy density of states points to a reduction in the degrees of freedom in describing the ligand and its conformations near that point, such as at 68 and 46. In the case of the peak at 68, 5 of the 9 rotatable dihedral angles do not change much, and also perhaps 2 others. The 'lower half' of the molecule flip flops substantially in several different orientations within this set, and is due to only one degree of freedom angle. This is found by visualizing many of these molecules at ~68 docked score values.
A potential inhibitor of a protein, including that of the Mpro SARS-Cov-2 protease, has to have physical capability of entering into its docked pose. The small molecule should be able to rotate in dihedral angles and ring conformational states without obstruction to enter and bind. Docking software does not model this process before somewhat bound, but only the binding based on molecular models and modeling and physical data. Two widely separated peaks, without any scores in a histogram between them, are disjoint and can be interpreted -large variation geometrically of the molecule, although allowed and keeping its stereoisomer, are in different conformational states which are not smoothly connected. Two widely separated peaks with populated states in energy between, i.e. smooth (probabilistic) density of states, can be connected by small changes in the molecule and conformations. This is important in determining if a computationally determined molecule is reasonable. As an extreme example, an isolated cavity deep within a protein without any physical pathway outlets is unreachable, but could score very high or elsewhere in a distribution of scores via docking calculations.
These points are relevant to using a computational docking tool that measures conformational poses based on simplified energy calculations. As an example, Figure 2 2 shows two conformations of Nirmatrelvir/Mpro, both in the highest distribution peak of a large set docking calculations with random but chemically constrained torsion angles by modeled atomic interaction in GOLD. (The highest scoring pose at 71 is presented in next Section 3). These 2 superimposed example docking states have PLP scores of 68.5 and 68.2 and well representing the peak at 68 in Figure 1(a). The spatial difference of the two, i.e. conformations, is due to a single dihedral angle in the molecule, and the peak distribution at 68 is from the conformational change, quantified by the GOLD docking software and its initial conditions, guided by molecular interactions towards local minima. Most wiggles and small energy differences in the distribution come from small thermodynamic perturbations of geometric shape and orientation, not two different conformations near each other in binding energy.  Energy distributions and density of states calculations via random sampling of binding states is necessary to find accessible binding conformations out of all possibilities in addition to filtering against geometrically realistic paths to the binding. These histograms are reported for the molecules examined in detail in later sections of this work.
The docking calculations are also performed on a primary drug metabolizing human liver enzyme P450, CYP 3A4. This enzyme is one of 6 CYP enzymes responsible for extracting waste-like molecules from the human body. CYP 3A4, while being considered the most prevalent, is one of these. The crystal structure of Ritonavir bound to CYP 3A4 [15] was used in the docking calculations, with the ligand stripped. The binding site of CYP 3A4 is a large vacuous chamber on one side of the heme iron atomic frame. CYP 3A4's binding site will stick highly to very general molecules. Because of its size geometrically, larger small molecules can easily 'fit' into this region without obstruction which is presumably why this enzyme is good at transporting waste molecules from the liver. A total ligand-protein binding energy will scale with the number of heavy atoms, so that a bound small molecule twice the size of another could result in a larger, even ~2x, larger binding energy. However, the target protein binding site of the drug is amino acid specific in function and increasing the size can miss the target and also create unwanted interactions with other proteins.

Section 3: Paxlovid
Paxlovid is a combination of 2 molecules: Nirmatrelvir and Ritonavir. Paxlovid in the literature is often referred to Nirmatrelvir in reference to the combination with the Ritonavir molecule.
The x-ray crystal structure PDB ID 7SI9 [20] of Nirmatrelvir bound the SARS-Cov-2 was used for docking and also in validating the docked pose. The computational docked pose is close to that of the complete x-ray structure, including the eclipsed trifluoromethyl group. The docked pose overlayed with the 7SI9 ligand is illustrated in Figure 3. The trifluoromethyl is in an eclipsed position which is unusual because typically this is higher in energy than non-eclipsed. Studies of these groups have been done with this eclipsed position and found to be more stable in classes of compounds [32]. This is the highest scoring docked Nirmatrelvir/Mpro pose generated by CCDC GOLD out of multiple runs. The RMSD of the 2 molecules in Figure 3 is 2.89 Angstroms as calculated in PyMol, comparable to the 2 A x-ray resolution. A Chimera view of Nirmatrelvir in its highest scoring docked pose, its Fischer projection in ChemDraw, and it bound to Mpro are in Figure 4. The distributions of docking runs to both proteins, Mpro and CYP 3A4, is in Figure 1. Several structural characteristics of Nirmatrelvir are: it has a fused amino-like double ring, a trifluoromethyl group which is eclipsed when bound to Mpro, and a nitrile extension.
The SARS-Cov-2 3CL protease (Mpro) binding site is an X-shaped cavity which can be described as: 3 ends are caves and the fourth is an exiting valley. This is the region of the Mpro relevant to viral replication which is targeted and partly blocked by Nirmatrelvir. This molecule, when bound, primarily fills in 2 caves.
Note: In this image of Nirmatrelvir bound to Mpro, the solvent accessible protein surface (SAS) is displayed. This has the advantage of clearly showing the binding cavity. However, in this image and in general there could be a hydrogen appearing to penetrate the surface. This is the case with Nirmatrelvir and Mpro in the nitrogen atoms 19, 36 farthest, nearest nitrogens from the trifluoromethyl group. These both are hydrogen bond donors (info on next page) with Mpro atoms 1315, 1112 in hydrophilic GLU 166, hydrophobic PHE 140. There is no structural clash or overlap, and this is common in ligandprotein Chimera images. Figure 5 shows the ligand binding without the surface. This particular state of Nirmatrelvir is an outlier in the histogram of docking runs. First, it is the highest scoring conformer. Second, there are 11 out of these that scored rounded to either 70 or 71. That is a PLP score of 2 difference, 2 steps of discreet jumps at ~1 to 69, which is unusual and not typical of wiggle in conformations. This gap in states, although not blatant of a total 1/3 kCal, could be due to the molecular interaction modeling. There doesn't seem to be anything conformationally that would make this conformer to stick out from the bump at 67-69 in Figure 1.
The geometric pose agrees with the x-ray crystal structures made with Mpro-Nirmatrelvir [17]. X-ray samples of protein-ligand complexes are unique in energy minimization however, and the very much larger density of states at 67-69 with a spread of ±3, Figure 1(a,b) is biologically realistic, with a typical score of 68. The difference in score from Nirmatrelvir-Mpro to Nirmatrelvir-CYP 3A4 is 84.4-70.7 = 13.8. This is very large, approximately of 2.1 kCal in binding energy. GOLD PLP scores are dimensionless numbers representing the interaction of docked ligand to the protein; the rough correspondence between PLP score differences and energy differences is found by known calculations in terms of both and is roughly 6.5 PLP for 1 kCal of various molecules. The binding energy is important and biochemical terms of ligand in the on-/off-states of protein-ligand is important. In terms of k_D or on-off ligand binding fraction, 1 kCal is commonly referred to be a factor of 10, although this is not a linear or rigorous relation. Studies of the metabolism of Nirmatrelvir revealed that it doesn't last for long in the body. The liver enzymes are actively working fast with it. As a result, plasma concentrations of it decrease too rapidly. 2 possible solutions to maintaining sufficient presence to have an inhibitory effect on viral replication via the target Mpro are: 1) use higher and more frequent doses, 2) add an inhibitor to the garbage collection process by blocking the action of the CYP enzymes, in particular 3A4. Both have drawbacks: 1) unwanted interactions with other proteins, 2) blocking the liver enzymes from normal function could result in increased levels of other medications to the point of toxicity. Dose changes for either purpose obviously means more interactions with a variety of proteins, not only Mpro.

Ritonavir
Ritonavir (Norvir) is a well-known inhibitor of CYP 3A4, and most of the CYP liver enzymes. Initially developed for treating HIV, it became well known as a CYP inhibitor and is currently used in conjunction with other drugs to maintain or raise their concentrations in the body. It is a very large small molecule that breaks most of Lipinski's Rule of 5 ADME conditions. It has mass of 720 Da and 18 rotatable bonds. It is so flexible and large that it potentially sticks to almost anything; it is not a precision small molecule inhibitor of a protease. Ritonavir is also a protease class specific inhibitor and was considered in conjunction with Liponavir as an inhibitor of Mpro due to latter's structure. Many molecules of this size and flexibility have been both computationally and in the lab examined as possible viable inhibitors of SARS-Cov-2 Mpro [33,34].
The canonical representation of Ritonavir is, The molecule, diagram of it, its highest docking pose to Mpro, and its highest scoring pose to CYP 3A4 are shown in Figure 6. The distributions of docking runs is given in Figure 7. The distribution is wide and very normal; this is due to the largeness and also the flexibility of 18 rotatable bonds giving it the ability to stick in almost any conformation. The peaks,widths pertaining to Mpro and CYP 3A4 are centered at 57,10 and 102,11.
The docking of Ritonavir to CYP 3A4 results in an extremely high PLP docking score, 142. This confirms its use as an effective inhibitor. It essentially fills the entire cavity adjacent to the heme iron frame, see Figure 6. Note that it has 50 heavy atoms and its size is primarily responsible for the high PLP docking (interaction) score. Per atom the score is 2.56, which is high, but the filling of the cavity and the number of atoms is the primary reason for its large interaction with CYP 3A4. By contrast, Nirmatrelvir has 35 atoms, a score of 84.11, and a similar score per atom of 2.41 to CYP 3A4.
The SARS-Cov-2 Mpro is structurally similar to Ritonavir, and Liponavir-Ritonavir, has been examined in inhibiting Mpro, the latter of which is in the class of proteases that it binds to. The docking score is 100.37, which appears reasonable when compared to Nirmatrelvir's score of 68 or 70.7. However, as mentioned, Ritonavir is large and flexible and can generally interact and fit somewhat to general proteases and proteins. Upon inspection of the docked Ritonavir to Mpro, it extends into one X-cave and the surface tunnel. It partly enters another X-cave and covers over, not in, the 3 rd cave. It possibly enters the target region necessary to inhibit biochemical viral replication. This is a good example of why localization of small molecule to the protein is more important than a high docking score in realistic molecular design, however both being important. Ritonavir was found in conjunction with Liponavir, and also in other combinations, not to be an effective drug in blocking the SARS-Cov-2 [35].
Ritonavir's use in blocking the metabolism of other drugs is well known. However, its large effect of 'stuffing the waste basket until full' of CYP 3A4 (and other CYP liver enzymes) so that that a molecule can not be cleared by the liver has the disadvantage of blocking a variety of normal liver functions and of other drugs/small molecules. This a reason that the drug Paxlovid is FDA approved only for high risk groups, i.e., those individuals with predisposition to severe Covid-19, who can accept the risk of Ritonavir. This disadvantage could be eliminated or alleviated by finding other potential small molecule candidates that don't require Ritonavir or other drugs to inhibit ordinary bodily function.

GOLD Parameters
There are 2 sets of parameters, from Ligand GA and from CCDC GOLD. The GOLD parameters during the iterations were chosen with mostly auto default: 15 identical docking jobs were run in each GOLD molecular docking, an autoscale=1.1, early termination was turned on, and a cavity.atoms file was used with a radius of 20 Angstroms centered about. After the ranked set of non-isomeric files was obtained after 4 or 5 iterations using Ligand_GA_Output_Analysis, Ligand GA was restarted with the 30 top scoring non-isomeric molecules. This continued until 33 iterations were completed. Then the net set of the result from the total collection was ranked and ordered using Ligand_GA_Output_Analysis. A truncated list of 76 molecules was selected based on any GOLD PLP score greater than 79; this cutoff was chosen because the highest score of any of Nirmatrelvir's stereoisomers is 78. Then this list of 76 non-isomeric molecules was fully post-processed with an extensive set of GOLD parameters: 30 repeated docking calculations per ligand, pop_size=75, n_islands=10, max_ops=300000, early_termination=0 (off). The set 76 non-isomeric molecules is given in the Supplementary Information. The resulting number of molecules is 15922, coming from 76 non-isomeric molecules (average 260 stereoisomers). The total number of post-processing GOLD runs is ~600,000.
The GOLD docking parameters during the Ligand GA run in using mostly defaults are for faster iterations at cost of less accurate evolution. Due to GOLD being a genetic algorithm also, the results can slightly vary because a genetic algorithm is not necessarily deterministic. Scores can change if not enough search is made in GOLD but in repetition the docking random sampling of initial dihedral angles will stabilize and show the conformational modes of the ligand to the protein.

Ligand GA parameters
The point of stopping after 4 or 5 iterations and then restarting with a ranked list is to improve the rate of population improvement towards a local minima. This is a typical maneuver in using genetic algorithms if the fitness function is very complicated and the time of computation required very large. A population of 30 chromosomes is sufficient to obtain results in 33 iterations. A typical iteration requires about 4 hours with 64 cores. Genetic algorithms are known to converge rapidly in less than 15 iterations, very much depending on the complexity of the fitness function and size of the search space, and much more iterations are necessary to improve accuracy at the minima, i.e. flattening off behavior of genetic algorithm best scores. It should be noted that due to the non-zero elite_fraction in the GA, a portion of the non-isomeric molecules is resampled and docked by GOLD, and this improves the accuracy of the calculation while others are kept after the 4 or 5 iterations.
The molecules were evolved in the genetic algorithm Ligand GA software by using the SARS-Cov-2 Mpro binding site. In addition, the molecules were post-processed on a primary liver human enzyme CYP 3A4. The same extensive GOLD parameters were used on both proteins for all molecules, and a 20 Angstrom sphere cavity file was used with the sphere centered on the Mpro GLU amide and the CYP 3A4 heme iron.
The set of Ligand GA parameters used was the default set described in [1]  There is much information in a gold docking output mol2 file. The GOLD output mol2 files not only carry the coordinates and conformation of the poses but also detailed ligand and amino acid information, bond information, amino acids in the cavity, and other information. Docking score values are from modeled interactions of true inter-atomic interactions and are not true to 2 significant digits. The numbers are kept to this precision to reflect GOLD output and all distributions used rounded integer numbers of score values S(PLP)). The files in Supplementary Information have these values to 2 significant digits. The overall docking score summary of each molecule in this work is quoted in terms of:

S(PLP) S(hbond) S(cho) S(metal) DE(clash) DE(tors) Intcor
Individual GOLD output ligand PLP atomic scores can also be requested by changing GOLD parameters, and is useful in editing molecules. In mention, the Ligand GA fitness function was changed to average docking score per heavy atom, and several multiple iteration runs were done with this but not presented.
The Ligand GA fitness function uses an S(PLP) -ADME_Penalty for each of the molecules and stereoisomers. ADME_penalty enforces the Lipinski Rule of 5 conditions in the soft sense [1]. There are additional constraints that have been written for a completely different set of molecules and goal, including enforcing particular hydrogen bonds, but these are not used in this analysis.
The calculations in this work follows the procedure in the flow diagram in Figure 8  Step 1 uses Ligand GA. Then by using a cutoff of 79 for each of the non-isomeric molecules, 76 molecules are used in step 2. In step 2, these 76 molecules are processed with a more extensive set of parameters, -30 repeated docking calculations per ligand -pop_size=75 -n_islands=10 -max_ops=300000 -early_termination=0 (off) Then from these results, an even more exhaustive set of calculations is performed for molecules 48:32 and 74:19, and Nirmatrelvir, Ritonavir, consisting of 100 identical molecules and 200 GOLD docking runs. All of the processing after Step 1 uses both Mpro and CYP 3A4.
Any of the molecules at Steps 1-3 can be edited for improvement in Ligand GA and also for realistic pharmacokinetics. This wasn't done the molecules used in this work. The different output directories and files can be used to reproduce any of the calculations.

Section 4: Analysis
There are several criteria in searching the set of 15922 for good choices: 4. Satisfactory visual geometric docking fit to the site and individual atom interactions between protein and ligand. 5. Synthesis, maybe at scale, is important and is analyzed by inspecting individual molecules. 3 The iterations of the molecules in Ligand GA follows the highest scoring stereoisomer of the nonisomeric molecule. Molecules are represented in textual non-isomeric SMILES. Upon expansion of one of these many stereoisomers will be generated, and the highest scoring one is what guides the selection from one iteration to the next of the non-isomeric population.
First the results from the 15922 molecules are given, of which Nirmatrelvir is not included because its score is less than 79. There are more than 15922 in the complete output, but a cutoff slightly higher than the largest score of the stereoisomeric completion of Nirmatrelvir was used to limit the analysis and post-processing of the total set. Figure 9 gives the distributions of these molecules, in score differences between molecule/Mpro to molecule/CYP 3A4, the scores of molecule/Mpro, and the scores of molecule/CYP 3A4. The Mpro/CYP 3A4 score Pearson coefficient of the 15922 molecule scores docked is .33, a low correlation between binding strength of the molecules to 2 proteins. The 2 binding sites are from 2 proteins and are very different, and a high binding to Mpro with low binding to CYP 3A4 is a goal of the analysis. CYP 3A4 has a large and bland cavity in which molecules bind into; it isn't surprising that there is no correlation.
The average difference of the Mpro and CYP 3A4 scores is 25.5, approximately 3.9 kCal if 1 kCal per 6.5 is used. This high average can be problematic in the design of realistic drugs as very quick metabolism will result if it is much higher than the binding to the target protein. The difference of Nirmatrelvir is 13.8, which could be improved upon.

Selection of molecules:
electing particular molecules depends on the criteria. All of the molecules satisfy Lipinski's ADME restrictions as the molecules were evolved with an ADME penalty in the Ligand GA fitness function. The conditions 1,2,3,4 are used to present 2 sets of molecules. The docking of molecule to CYP 3A4 is not used in Ligand GA but analyzed after the molecules were generated. (Both proteins are used simultaneously in the to be released version Ligand MultiObj GA).
Out of the 15922 isomeric molecules generated from the 76 non-isomeric molecules, 5 had docking score difference between Mpro and CYP 3A4 that is positive (criteria 3). All of these had higher ligand/Mpro scores than Nirmatrelvir/Mpro, and 4 had lower scores to Ritonavir/CYP 3A4 (criteria 1,2 together with 3 is marked by an x in Table 1). The molecules are listed in Table 2 and their molecular representations are in Table 2. The mol2 files and GOLD docking output are given in the Supplementary Information. Two of these molecules, 48:32 and 74:19 are discussed in detail, and the molecular diagrams of the other 3 molecules, stereoisomer specific, are shown in Figure 10.   There are many other possibly viable molecules. Docking score, or measuring binding interaction, depends on the molecular modeling and is only -one rough-quantifier of potential binders. Although CCDC GOLD does produce x-ray validated docking of ligands, the fit of the molecule in the cavity and at the relevant region at amino acids is important. Ritonavir has a high docking score to Mpro, was considered as a potential inhibitor once, then experimentally found to have no effect; it does stick to Mpro but it is large and very flexible -it sticks away from the region of Mpro that is responsible for cleaving polyproteins in the viral replication process. The size of the molecule, i.e. mass or number of heavy atoms, is also important in considering possible inhibitors; increasing the docked score per atom is independent of the number of atoms; a total binding interaction does increase generically with the this number.
More molecules are listed in Table 3 which follow the conditions 1,2, and 4. Out of the complete set of 15922 there were 76 which satisfied conditions 1,2,4. Also, docking calculations is not an exact calculation, but being based on a genetic algorithm with dihedral angles is non-deterministic. A cutoff factor of 1.02 was used to multiply the score of the Ligand GA molecules in comparing with Nirmatrelvir. This conservatively raises the docking score of the molecule to both Mpro and CYP 3A4, and decreases the number of screened molecules than without (122 with 1.0 cutoff factor). The set of 76 molecules all have a stereoisomer (15922 total from the 76) that has a higher binding score than Nirmatrelvir while satisfying ADME restrictions. There are many molecules out of 15922 with higher scores, as seen in the distribution Figure 9(c).
The 76 non-isomeric with scores above 79 are a direct output of Ligand GA. It is reasonable and important to manually edit molecules if necessary. For example, Molecule 74:19 and 68:51 have an external 3-member C ring that appears peculiar and raises the question 'what is that for'. It does have an effect in the geometry of the stereoisomer and interaction, but it could be changed and then reanalyzed. Manual modifications of functional groups or atoms is expected in realistic inhibitor design. Simeprevir, a very effective treatment of Hepatitus C, has 2 3-member C rings and one of them is external.
Improvement of Nirmatrelvir so that interaction is increased with Mpro and decreased with CYP 3A4. In this work only Mpro was used in molecular evolution with a GA. The situation improves after using a multi-objective genetic algorithm. In Ligand MultiObj GA the small molecules are iteratively improved simultaneously using 2 fitness functions in the evolution, maximizing interaction with Mpro and minimizing with CYP 3A4.

Analysis of Molecule 48:32
A detailed analysis of one of these molecules, 48:32, is given.
The interaction of a ligand to a protein and biochemical behavior is not found by only a score, but rather a complete examination of the ligand, including visualization of the protein-ligand complex. This molecule fits in the set of 15922 as an outlier in the binding interaction score differences of Mpro and CYP 3A4, and in the highest quartile of scores to both proteins. Originally this molecule was selected on from duplicated set of GOLD runs with exhaustive GOLD search parameters on Mpro and CYP 3A4, which led to highest scores of 82.26 and 78.69, a difference of -3.57. Given the large number of multiple runs of docking job with different random initial conditions, from the 2 nd post-processing step of 4 molecules (Figure 8), the highest Mpro scores changed slightly to 82.61 and 86.55 from that in Table 1. It's molecular properties relevant to ADME are Molecule 48:32 is 70% the mass of Nirmatrelvir. It has 80% the number of heavy atoms, and it has 60% the number of rotatable bonds than Nirmatrelvir. This molecule is much smaller and less flexible. It has 4 rings not 3, which makes it more compact in size. These parameters are seemingly better as far as tolerance and permittivity of it in the biological process of ingestion to the target protein.
The distribution of binding modes is in Figure 11 from 1000's of docking runs with exhaustive search parameters. There appear to be 4 primary Molecule 48:32/Mpro docking modes at 52, 60, 73, and 82. The score distributions (representing binding energy) and their widths in the histogram should be taken into account in comparing computationally the ligand binding of Nirmatrelvir versus molecule 48:32. All of the modes of molecule 48:32 are higher in binding score (energy) comparatively, with the 83 one from 48 32 Mpro much larger. The number of rotatable bonds is 6 and comparing to Nirmatrelvir's 9 is a 33% reduction. There are fewer rotatable degrees of freedom; the fewer rotations could be a source of the differences such as 4 major center points of its PLP score distribution in Figure 11, as compared to 2 of Nirmatrelvir in Figure 1.
The distribution of scores of Molecule 48:32/CYP 3A4 is useful in gauging how it binds both in value and conformationally to CYP 3A4. This distribution is in Figure 12.    The overall increase of PLP 11.93 (17%) is roughly 1.8 kCal. This is a significant increase in protein-ligand interaction energy and could represent an increase of on-/off-binding of up to approximately x50 to x100. Figure 13 shows the highest scoring pose of 48:32 to Mpro, the molecular diagram, and the docking to Mpro and CYP 3A4. In CCDC GOLD, the sidechains of some of the amino acids in the cavity are allowed to rotate to some extent. In Figure 14 an overlay of docked Nirmatrelvir to Mpro is shown with the protein from the docked 48:32 to Mpro. There is a small difference in the protein surface if the image is shown with the background protein of docked 48:32 to Mpro, but it is negligible. Figure 14 shows the superimposition of both in the CYP 3A4 enzyme cavity. Molecule 48:32 in the X-shaped Mpro binding site spatially goes into 2 of the 3 caves and extends partly into the exiting surface tunnel. This is similar to Nirmatrelvir, which does the same, but 48:32 slightly farther. Its atomic content is different, and this will influence the interaction energy. It's 20% size reduction in this fitting make it more precise atomically as a ligand to target inhibitor. The per atom docking score of Molecule 48:32 is 2.95 as compared with to 2.02 of Nirmatrelvir; this is a 46% increase. It is strongly encouraged to open the mol2 files in a molecular viewer to see the different geometric aspects of binding of these molecules in overlay.  Molecule 48:32 has an almost Z2 or symmetry which is inherited from the same in the X-shaped cavity of the Mpro binding site. Were it not for an additional =O on one side it would be atomically reflective. This is in contrast to Nirmatrelvir which fills the legs of the cavity differently. The Ligand GA molecular evolution in finding optimal binders is guided by higher scoring molecules than the input population. The docking score per atom increased by 46% as a result, but it is interesting that the result points out symmetry of the Mpro cavity also.
In conclusion about Molecule 48:32 there are 2 more points. In regards to synthesis, it is a made of known fragments, 2 of which are in Nirmatrelvir. In biochemical activity it is small, 28 heavy atoms, that can have atoms substituted to make the very complicated ADME process more appropriate. This work is computational, and there is no substitute for lab work and testing.

Analysis of Molecule 74:19
The discussion begins with the highest docking scores of Nirmatrelvir  This molecule fits in the set of 15922 as one of the highest scores to Mpro and average to CYP 3A4, 90.3+5.4=95.8 is 1 sigma, Figure 9. Its molecular properties relevant to ADME are: Molecule 74:19 is 98% the mass of Nirmatrelvir despite the increase of 2 atoms. It has 2 more heavy atoms and one less rotatable bonds than Nirmatrelvir. It has 4 rings not 3, 2 of which are 3-member Crings. It is proportionally more carbon as it has fewer nitrogen's and oxygens, and is less massive. These parameters are seemingly better as far as tolerance and permittivity of it in the biological process of ingestion to the target protein and in ADME. Note that it has a 3-member C ring extending from it; as commented earlier on page 19, this can be edited and the docking re-examined.
The distribution of binding modes to Mpro is shown in Figure 16. There are several primary ones including a major center at 49 and one at 89. The one at 89 is much higher than the Nirmatrelvir at 68, a 32% increase. The histogram peak Figure 16(a) and resolved in (b) at scores 81 to 89 of Molecule 74:19 has many conformational states, suggesting conformational freedom to access the one at 91.11.
The score (interaction energy) distribution of docked states to CYP 3A4 is very broad with a center at 75. This distribution is in Figure 17. The binding cavity of CYP 3A4 is an open region that generic molecules can fit into, and although the maximum 74:19/CYP 3A4 is at 94.27, that is far from 75 and an outlier in the distribution.  The overall docking score of 74:19/Mpro is 91.11 and that of Nirmatrelvir/Mpro is 70.68, for an increase of PLP 20.43 or 29%. This is roughly 3.1 kCal (at approximately 1 kCal per 6.5 PLP score), meaning a fairly large increase in the ligand in-/off-states by up to a factor x1000, but this correspondence between kCal and population ratio x10 per kCal is not linear and a rough estimate. Figure 18 shows the highest scoring pose of 74:19 to Mpro, the molecular diagram, and the docking to Mpro and CYP 3A4. In Figure 5 an overlay of docked 74:19/Mpro and Nirmatrelvir is shown with the docked Nirmatrelvir/Mpro. Figure 6 shows the superimposition of both on the CYP 3A4 enzyme. Molecule 74:19 in the X-shaped Mpro binding cavity extends into all 3 caves and partially extends into the surface tunnel. Spatially, it fills more of the cavity than Nirmatrelvir, which extends into 2 of the 3 caves and partly into the tunnel. The per atom docking score of 74:19 is 2.46 as compared with to 2.02 of Nirmatrelvir, 2.95 of Molecule 48:32. This is a 22% increase from Nirmatrelvir.
The overlays of molecule 74:19 and Nirmatrelvir attached to Mpro are shown in Figure 19. As before to illustrate the small differences in the rotatable protein sidechain that a ligand can make when attached these are shown with the Mpro docked with Nirmatrelvir and molecule 74:19. They are essentially the same. The overlays in the docking to CYP 3A4 are shown in Figure 20.