Homology Models of the Papain-Like Protease PLpro from Coronavirus 2019-nCoV

The December 2019 outbreak of pneumonia in Wuhan, Hubei Province of China was rapidly linked to a novel coronavirus, 2019-nCoV. The rapid spread and severity of the virus have led the World Health Organization to declare it a Public Health Emergency of International Concern. We recently described the first homology models of the main 3CL protease from 2019-nCoV, and now present models of the other viral protease, the papain-like protease or PLpro. Whilst the overall viral genome is most closely associated with bat coronaviruses, no bat PLpro crystal structures are known. Wuhan 2019-nCoV PLpro is most closely related to a bat coronavirus PLpro (97% identity), then SARS (80%) and MERS (29%), and the most promising models presented here are prepared from SARS crystal structure templates.


Introduction.
The detection in December 2019 of a cluster of pneumonia cases with no known cause in Wuhan City, China, prompted a rapid response from health authorities.[1,2] Within a month a novel coronavirus 2019-nCoV, also named Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, had been identified as the causative agent and a preliminary genome sequence had been released to the research community,[3] followed by three more accurate iterations lodged in Genbank.[4,5] At the time of writing there have been >9000 confirmed cases and the virus has spread to 20 countries (Australia, Cambodia, Canada, China, Finland, France, Germany, India, Italy, Japan, Republic of Korea, Malaysia, Nepal, Philippines, Singapore, Sri Lanka, Thailand, United Arab Emirates, United States of America, Vietnam), culminating in the World Health Organization declaring the outbreak of 2019-nCoV to be a Public Health Emergency of International Concern (PHEIC).[6] Whilst efforts to understand the biology of the virus have begun with a view to producing an early vaccine, global efforts to contain the outbreak are currently concentrated on rapid development of diagnostics, detection of infected persons, and their isolation and treatment. In the next phase, work will begin on finding new specific drugs to treat 2019-nCoV infections, and structural biology of the viral proteins will play a key role. In a previous communication we reported initial homology models for the 3CLpro encoded by the Wuhan novel coronavirus (2019-nCoV 3CLpro).[7]
Beta coronaviruses such as the Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) coronaviruses generally contain a second, papain-like protease (PLpro), which in concert with 3CLpro processes the two viral polyproteins encoded by the two open reading frames orf1a and orf1ab. In addition to their normal processing of viral polyproteins, the PLpro enzymes from SARS and MERS have also been shown to have de-ubiquitination and de-ISGylation functions.[8][9][10] SARS and MERS PLpro enzymes have a preference for cleaving substrates after two glycine residues.[11,12] In contrast to 3CLpro, which utilises a Cys/His catalytic dyad, papain-like proteases from MERS and SARS possess a classical Asp/His/Cys catalytic triad. They also possess a C-terminal zinc-finger-like domain consisting of two β-hairpins bearing four cysteine residues which coordinate a Zn2+ ion with tetrahedral geometry (Figure 1). The zinc-binding region has been shown through mutagenesis of the cysteine residues to be crucial for both structural integrity and catalytic function.[8] The N-terminal region is a ubiquitin-like (Ubl) domain which is not directly involved in the proteolytic function of PLpro. The central "thumb" domain is primarily composed of α-helices and contributes the catalytic Cys112 to the active site. The C-terminal region is predominantly β-sheet and contains the fingers domain and also the "palm" region, which bears the other two residues of the catalytic triad, His273 and Asp287. In this work we report the preparation of homology models of 2019-nCoV PLpro based on SARS and MERS X-ray crystallographic templates.

Methods.
The Wuhan 2019-nCoV genome was obtained from Genbank/NCBI (release MN908947.1).[4] Sequences from crystal structures were obtained from the Protein Data Bank (rcsb.org). Sequence analysis was carried out using the web interface at the European Bioinformatics Institute (EMBL-EBI).[13] Pairwise sequence alignments were performed using the EMBOSS-Needle method with the default EBLOSUM62 matrix. Multiple sequence alignments were carried out with Clustal Omega using default HMM profile-profile parameters. BLAST searches were carried out against the full UniProt Knowledgebase using blastp 2.9.0+ with default parameters.
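The pairwise identity figures quoted in this work come from global (Needleman-Wunsch) alignments. The idea can be sketched in a few lines of Python. This is an illustrative simplification only: it assumes a flat match/mismatch score and a linear gap penalty rather than the EBLOSUM62 matrix and affine gap penalties used by EMBOSS-Needle, the function names are hypothetical, and the sequences are toy strings rather than PLpro fragments. Identity is computed over the full alignment length including gap columns, which is one common convention.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b (simplified scoring, linear gaps)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aln_a, aln_b):
    """Identical columns as a percentage of alignment length (gaps included)."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


if __name__ == "__main__":
    top, bottom = needleman_wunsch("ACGT", "ACT")
    print(top)
    print(bottom)
    print(percent_identity(top, bottom))
```

With a substitution matrix in place of the flat match/mismatch score, the same column count extended to positively scoring substitutions would give the "similarity" percentage reported alongside identity.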
Homology models were prepared using the publicly accessible online Swissmodel [14][15][16][17][18] programs, searching the SWISS-MODEL template library (SMTL version 2020-01-02, PDB release 2019-12-27). Model 6 included an allosteric SARS PLpro inhibitor carried through from the PDB:3e9s template. Three templates did not contain the structural zinc atom, so models 3, 7 and 10 also lack zinc. These were nevertheless included in the study to examine the effect of the absence of zinc on the stability of the fingers region in molecular dynamics simulations. Model 9 was produced as a homodimer.
The initial heavy-atom-only models were further refined using the Protein Preparation module in Schrödinger Suite 2019-2[19] to add hydrogen atoms and optimise the internal hydrogen bonding network. Finally, the models were energy minimised using the OPLS3e force field with charges from the force field and implicit water solvent, using the Polak-Ribière Conjugate Gradient (PRCG) method to a gradient of <0.05 Å. The final minimised models were then analysed using MolProbity[20] and visualised in Pymol v2.1.[21] Molecular dynamics simulations were performed with the Desmond Molecular Dynamics System (D. E. Shaw Research, New York, NY) using the tools incorporated in the Schrödinger Suite 2019-2.[19] For full details of molecular dynamics simulation conditions see Supporting Information.
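For readers unfamiliar with the PRCG method, the following Python sketch shows the Polak-Ribière update and a gradient-based stopping criterion on a toy two-variable quadratic "energy". It is not the Schrödinger implementation: the function names, the toy gradient, the secant line search and the automatic restart to steepest descent are all assumptions made for illustration, and the 0.05 threshold is unitless here.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def toy_gradient(x):
    """Gradient of the toy energy E = (x0 - 1)^2 + 10 * (x1 + 2)^2."""
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


def prcg_minimise(grad, x0, gtol=0.05, max_iter=200, sigma=1e-2):
    """Polak-Ribiere conjugate gradient with a one-step secant line search."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # start along steepest descent
    for _ in range(max_iter):
        if norm(g) < gtol:                     # converged on gradient norm
            break
        slope0 = dot(g, d)
        if slope0 >= 0.0:                      # not a descent direction: restart
            d = [-gi for gi in g]
            slope0 = dot(g, d)
        # Secant estimate of the step that zeroes the directional derivative.
        trial = [xi + sigma * di for xi, di in zip(x, d)]
        curvature = dot(grad(trial), d) - slope0
        alpha = -sigma * slope0 / curvature if curvature > 0.0 else sigma
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        # Polak-Ribiere coefficient, clipped at zero (the "PR+" variant).
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, norm(g)


if __name__ == "__main__":
    minimum, gradient_norm = prcg_minimise(toy_gradient, [0.0, 0.0])
    print(minimum, gradient_norm)
```

On a quadratic the secant step is an exact line search, so the toy problem converges in very few iterations; a real force-field energy surface is neither quadratic nor low-dimensional and would need a more careful line search.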

Results and Discussion
There are a total of around 26 SARS and MERS PLpro crystal structures in the PDB[22] (see Supporting Information Table S3 for the full list), but none from bat coronaviruses. An overlay of the available structures reveals a general conservation of fold in the zinc-binding and catalytic domains, but a flexible linker region between the catalytic and fingers domains results in some differences in the overall shape. In general the SARS structures are slightly more elongated than the MERS structures (Figure 2), and they also display more structural diversity in the ubiquitin-like domain. An initial search of the Protein Data Bank was performed for PLpro templates and these were aligned with Clustal Omega[13] against the full-length polyprotein provided for the Wuhan 2019-nCoV (Genbank ID: MN908947),[4] which produced a 326 amino acid region of interest. Although most of the MERS and SARS PDB hits were slightly shorter (316-322 residues), the longer sequence was used as it partially matched two longer Avian Infectious Bronchitis Virus structures. The 326 aa sequence was trimmed from the polyprotein and annotated as MN908947_PLpro (Ser1561-Gly1896 polyprotein numbering). At the time of writing there were 17 2019-nCoV polyprotein sequences in the NCBI database, all 100% identical in the PLpro region.[23]
Running a BLAST search on the trimmed WH_PLpro.fasta against the full UniProt Knowledgebase indicated that the nCoV PLpro was most closely related (97% identity) to a bat SARS-like coronavirus (UniProtKB:A0A2R3SUX5), followed by another bat coronavirus and a cluster of SARS sequences all at 83% identity (see Supporting Information Table S4). Pairwise EMBOSS-Needle alignments of MN908947_PLpro against individual bat coronavirus, MERS and SARS sequences (see Supporting Information Figures S1-3) showed that the nCoV PLpro had only 29% sequence identity (48% similarity) to MERS PLpro and 80% identity (88% similarity) to SARS PLpro, but was 97% identical (99% similarity) to a bat coronavirus (UniProtKB:A0A2R3SUX5). A Clustal Omega multiple sequence alignment of MN908947_PLpro against bat A0A2R3SUX5 and representative MERS and SARS PLpro sequences is shown in Figure 3. This suggested a catalytic triad for the novel coronavirus consisting of Cys114, His275, and Asp289.
The trimmed MN908947_PLpro sequence was then used for Swissmodel homology model generation. The default Swissmodel template search algorithm initially suggested only 8 high quality SARS templates, 3 of which were semi-redundant as they arise from multiple non-identical protease monomers in the unit cell. Four of the five crystal structures had ubiquitin bound. In spite of their lower sequence identity, 2 MERS templates were manually added to the template search and 10 models were built (Table 1). Model 9 was obtained as a homodimer, although this may be the result of a crystallisation artefact in the template, or a curiosity of the Swissmodel template library. Nevertheless all 10 models were taken through the refinement protocol in Schrödinger Suite 2019, and analysed by manual inspection in Pymol and semi-quantitatively using MolProbity (see Supporting Information).
The refined models were each also analysed by a 100 ns molecular dynamics simulation to identify mobile regions and assess overall model stability. In all cases the ubiquitin-like domain and the zinc-binding β-hairpins were the most mobile regions. The presence or absence of a bound allosteric inhibitor did not dramatically influence the mobile regions (Figure 5). By contrast, models without an included zinc atom displayed a very high degree of mobility of the four cysteine-containing β-sheets, with a downstream effect on the position of the loop bearing catalytic His275, which may be responsible for the observed requirement of zinc for catalytic function of the coronaviral papain-like proteases (see Supporting Information Figures S24, S37, S46).[8]
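Mobile regions in a trajectory are commonly identified from the per-residue root-mean-square fluctuation (RMSF), i.e. the root-mean-square deviation of each residue's position from its own time-averaged position. The Python sketch below illustrates that calculation. It is not the Desmond analysis tooling: the function name and data layout are hypothetical, the coordinates are toy values, and the frames are assumed to have already been superimposed on a common reference.

```python
import math


def rmsf(frames):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per tracked atom
    (for example one C-alpha atom per residue), in the same order.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    fluctuations = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


if __name__ == "__main__":
    # Two toy frames: atom 0 is rigid, atom 1 oscillates along x.
    toy_frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ]
    print(rmsf(toy_frames))
```

Plotted against residue number, elevated values in such a profile would correspond to the mobile Ubl and zinc-binding regions described above.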

Conclusion.
The papain-like protease encoded by the 2019-nCoV Wuhan coronavirus is very highly homologous to bat and SARS coronaviral PLpro. We used SARS CoV PLpro crystallographic templates to prepare homology models of the 2019-nCoV PLpro in both ligand-bound and apo forms. Molecular dynamics simulations indicate most molecular motion is contained in the N-terminal Ubl and C-terminal zinc finger domains, and also support the thesis that zinc is required for structural integrity of the protease.

Figure 1. Panel A: SARS PLpro covalently bound to ubiquitin aldehyde (C-terminal RLRGG-aldehyde region only, shown as grey sticks). N-terminal ubiquitin-like domain shown in magenta, fingers domain in cyan ribbons. Panel B: Active site with catalytic triad Cys112, His273, Asp287 shown as yellow sticks. Panel C: Zinc-binding domain showing the 4 coordinating cysteine residues Cys190, Cys193, Cys225, Cys227.

Figure 2. Overlay of 26 coronavirus PLpro proteins in the PDB. Panel A: 12 known SARS PLpro crystal structures. Panel B: 14 MERS structures. Zinc atoms shown as grey spheres.
The best model, model 1, based on the SARS template PDB:5tl6, shows very high similarity to other coronaviral papain-like proteases, with a catalytic triad composed of Cys114-His275-Asp289 and a conventional zinc-binding domain of 4 cysteine residues, Cys192, 194, 227, 229. The other models align very closely to model 1 in their catalytic domains, with template-derived diversity in the zinc finger and Ubl domains (see Supporting Information Figures S15a-c).

Table 1. Swissmodel templates and results statistics.