Fingerprints (FPs) are the most common small-molecule representation in cheminformatics. Many fingerprints exist, and the Extended Connectivity Fingerprint (ECFP) is one of the best suited for general-purpose applications. Despite this abundance, only a few FPs represent the 3D structure of the molecule, and hardly any encode protein-ligand interactions. Here, we present the Protein-Ligand Extended Connectivity (PLEC) fingerprint, which implicitly encodes protein-ligand interactions by pairing ECFP environments from the ligand and the protein. PLEC fingerprints were used to construct several machine learning (ML) models tailored to predicting protein-ligand affinities (pKi/pKd). Even the simplest linear model built on the PLEC fingerprint achieved a Pearson correlation of Rp = 0.83 on the PDBbind v2016 "core set", demonstrating its descriptive power. The PLEC fingerprint has been implemented in the Open Drug Discovery Toolkit (https://github.com/oddt/oddt).
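To make the pairing idea concrete, here is a minimal, self-contained sketch in Python. It is not the ODDT implementation. The `Atom` class, the `ecfp_environments` and `plec_like` helpers, the depth of 2, the 4.5 Å contact cutoff, the 1024-element vector size, and the choice to pair ligand and protein environments at the same radius are all hypothetical, illustrative assumptions. Atom invariants are reduced to the element and the number of bonded neighbors, and bond orders are ignored, so the identifiers are only ECFP-like.

```python
import math
import zlib
from dataclasses import dataclass, field

# Illustrative sketch only (assumed design, not ODDT's code): ECFP-like atom
# environments are computed for ligand and protein separately, and every
# ligand-protein atom pair within a distance cutoff contributes a hashed
# (ligand environment, protein environment) pair to a folded count vector.


@dataclass
class Atom:
    element: int                              # atomic number
    coords: tuple[float, float, float]        # Cartesian coordinates in Å
    neighbors: list[int] = field(default_factory=list)  # bonded atom indices


def _hash(values) -> int:
    """Deterministic 32-bit hash of a tuple of ints (avoids Python's salted str hashing)."""
    return zlib.crc32(repr(tuple(values)).encode())


def ecfp_environments(atoms: list[Atom], depth: int) -> list[list[int]]:
    """Return envs[i][d]: an ECFP-like identifier of atom i at radius d (0..depth)."""
    # Radius 0: simplified invariant (element, number of bonded neighbors).
    current = [_hash((a.element, len(a.neighbors))) for a in atoms]
    envs = [[c] for c in current]
    for _ in range(depth):
        # Grow each environment by one bond: combine the atom's previous
        # identifier with the sorted identifiers of its neighbors.
        current = [
            _hash((current[i], *sorted(current[j] for j in a.neighbors)))
            for i, a in enumerate(atoms)
        ]
        for i, c in enumerate(current):
            envs[i].append(c)
    return envs


def plec_like(ligand: list[Atom], protein: list[Atom],
              depth: int = 2, cutoff: float = 4.5, size: int = 1024) -> list[int]:
    """Count hashed ligand/protein environment pairs for atoms in contact."""
    lig_env = ecfp_environments(ligand, depth)
    prot_env = ecfp_environments(protein, depth)
    fp = [0] * size
    for i, la in enumerate(ligand):
        for j, pa in enumerate(protein):
            if math.dist(la.coords, pa.coords) <= cutoff:
                # Assumption: pair environments of equal radius on both sides.
                for d in range(depth + 1):
                    bit = _hash((lig_env[i][d], prot_env[j][d])) % size
                    fp[bit] += 1
    return fp


# Toy example: a two-atom "ligand" next to a three-atom "protein".
ligand = [Atom(6, (0.0, 0.0, 0.0), [1]), Atom(8, (1.2, 0.0, 0.0), [0])]
protein = [Atom(7, (3.5, 0.0, 0.0), [1]), Atom(6, (4.9, 0.0, 0.0), [0]),
           Atom(6, (20.0, 0.0, 0.0), [])]
fp = plec_like(ligand, protein)
```

In the toy example, three ligand-protein atom pairs lie within the cutoff, and each contributes one count per radius (0, 1 and 2), so the vector holds nine counts in total. Clipping the counts to 1 would give a binary fingerprint; either vector can then serve as input to a regression model such as the linear models mentioned above.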
Supplementary Table 1
Supplementary Table 2