Automatic Cavity Identification and Decomposition into Subpockets with CAVIAR
Motivation. The detection of small-molecule binding sites in proteins is central to structure-based drug design and chemical biology. Many tools have been developed over the last 40 years, but few of them are still available in 2020, open source, and suitable for analyzing large databases or for integration into automated workflows. No existing software can characterize subpockets, a pivotal concept in fragment-based drug design, using only the information in the protein structure.
Results. CAVIAR is a new open-source tool for protein cavity identification and rationalization, supporting PDB and mmCIF files as well as DCD trajectories from molecular dynamics simulations. The protein structure serves as input for automatic cavity detection and computation of properties, including ligandability. A subcavity segmentation algorithm decomposes binding sites into subpockets without requiring the presence of a ligand. The resulting subpockets mimic the empirical definitions of subpockets used in medicinal chemistry projects. A tool like CAVIAR may be valuable to support chemical biology, medicinal chemistry and ligand identification efforts. Our analysis of the PDB shows that liganded cavities tend to be bigger, more hydrophobic and more complex than apo cavities. Moreover, in line with the paradigm of fragment-based drug design, the binding affinity scales relatively well with the number of subcavities filled by the ligand. Compounds binding to more than three subcavities mostly have nanomolar or better affinity for their target.
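To make the idea of ligand-free cavity detection concrete, the following is a minimal, generic sketch of a grid-based buriedness scan followed by a flood fill. It is an illustration under stated assumptions, not CAVIAR's implementation: this abstract does not describe the actual algorithm, and the function names (`detect_cavities`, `hollow_box`), the six scan directions, the thresholds and the toy "protein" are all hypothetical.

```python
from collections import deque
from itertools import product

# Six axis-aligned scan directions (an assumption; real tools may scan more).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def detect_cavities(atoms, size, min_buried=5, max_steps=4):
    """Return connected groups of empty, buried grid cells.

    atoms: set of occupied integer grid cells (x, y, z), a stand-in for protein atoms.
    size: edge length of the cubic grid.
    min_buried: minimum number of scan directions that must hit an atom.
    max_steps: how many cells each scan looks ahead before giving up.
    """
    def buriedness(cell):
        hits = 0
        for dx, dy, dz in DIRECTIONS:
            for step in range(1, max_steps + 1):
                probe = (cell[0] + dx * step, cell[1] + dy * step, cell[2] + dz * step)
                if probe in atoms:
                    hits += 1
                    break
        return hits

    # Keep empty cells that are enclosed by atoms in enough directions.
    buried = {
        cell
        for cell in product(range(size), repeat=3)
        if cell not in atoms and buriedness(cell) >= min_buried
    }

    # Group buried cells into cavities with a breadth-first flood fill.
    cavities, seen = [], set()
    for start in sorted(buried):
        if start in seen:
            continue
        queue, group = deque([start]), []
        seen.add(start)
        while queue:
            cell = queue.popleft()
            group.append(cell)
            for dx, dy, dz in DIRECTIONS:
                nxt = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
                if nxt in buried and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        cavities.append(group)
    return cavities


def hollow_box(lo, hi):
    """Cells on the surface of the cube [lo, hi]^3 (a toy closed shell of atoms)."""
    return {
        c for c in product(range(lo, hi + 1), repeat=3)
        if any(v in (lo, hi) for v in c)
    }


# Toy "protein": a hollow box with one wall cell removed to create an opening.
toy_atoms = hollow_box(1, 5)
toy_atoms.discard((3, 3, 5))
cavities = detect_cavities(toy_atoms, size=7)
```

On this toy input the scan finds a single cavity made of the box interior plus the opening cell, while grid cells outside the box are rejected because too few directions hit an atom. Splitting such a cavity into subpockets would require a further segmentation step that is not sketched here.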
Availability and implementation. Installation notes, user manual and support for CAVIAR are available at https://jr-marchand.github.io/caviar/. The CAVIAR GUI and CAVIAR command line tool are available on GitHub at https://github.com/jr-marchand/caviar and a conda package is hosted on Anaconda Cloud at https://anaconda.org/jr-marchand/caviar. The software suite is free and all of the source code is available under the permissive MIT license. The lists of PDB files used for validation, as well as the results of subpocket decomposition with CAVIAR and DoGSite, are hosted on GitHub at https://github.com/jr-marchand/caviar/tree/master/validation_sets.