Combining structural and bioactivity-based fingerprints improves prediction performance and scaffold-hopping capability

This study aims to improve on existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs). The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The combined experimental and structural fingerprint (CESFP) was benchmarked against the individual ECFP4 and HTSFP fingerprints. The CESFP showed improved predictive performance as well as improved scaffold-hopping capability. It also identified compounds that neither the ECFP4 nor the HTSFP identified, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the CESFP. Because the structural fingerprint provides supporting information, the combined approach also allows activity prediction for compounds with only sparse HTSFPs.
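As an illustration of the idea of a combined fingerprint, the following minimal sketch joins a structural bit vector with a sparse HTS-derived vector. It assumes that the combination is a plain concatenation and that unmeasured assays are filled with a neutral value; both are assumptions made for illustration, not details stated in the abstract. The function names (`combine_fingerprints`, `hts_coverage`), the fill value, and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence


def combine_fingerprints(
    ecfp_bits: Sequence[int],
    hts_values: Sequence[Optional[float]],
    missing: float = 0.0,
) -> List[float]:
    """Concatenate structural bits with HTS activity values.

    Assumption: assays without a measurement (None) are replaced by
    `missing`, so every compound yields a vector of the same length.
    """
    hts_filled = [missing if v is None else float(v) for v in hts_values]
    return [float(b) for b in ecfp_bits] + hts_filled


def hts_coverage(hts_values: Sequence[Optional[float]]) -> float:
    """Fraction of assays for which the compound has a measurement."""
    if not hts_values:
        return 0.0
    measured = sum(v is not None for v in hts_values)
    return measured / len(hts_values)


# Toy example: an 8-bit stand-in for ECFP4 and 4 hypothetical assays,
# two of which were never run for this compound.
ecfp = [1, 0, 0, 1, 0, 1, 0, 0]
hts = [1.0, None, None, 0.0]

cesfp = combine_fingerprints(ecfp, hts)
coverage = hts_coverage(hts)
```

In this sketch a compound with a sparse HTS vector still carries its full structural part, which is one way to picture how the structural fingerprint can support predictions when screening data are scarce.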