Abstract
Fragment hits serve as starting points for lead-like compound development. A common approach to advancing from fragments is to build structure-activity relationships (SARs) from close analogues. One strategy involves performing automated chemistry around fragment hits and evaluating resulting crude reaction mixtures (CRMs) of analogues in assays, bypassing costly purification. However, these purification-agnostic workflows are often perceived as noisy, and therefore typically involve additional hit resynthesis for confirmation, potentially discarding false negatives and reducing SAR dataset size. High-throughput (HT) X-ray crystallography has the potential to address these issues by unambiguously resolving hits directly from hundreds to thousands of CRMs. However, no systematic analytics exist for extracting SAR models from HT crystallographic evaluation of CRMs.
We demonstrate that crystallographic SAR (xSAR) can be extracted from CRMs evaluated via HT X-ray crystallography. We present here a simple rule-based ligand scoring scheme that identifies conserved chemical features linked to binding and non-binding observations in crystallography. Applied to a large-scale crystallographic dataset of 957 fragment elaborations in CRMs targeting PHIP(2), a therapeutically relevant bromodomain, our xSAR model demonstrated effectiveness in two proof-of-concept experiments. First, it recovered 26 missed binders in the initial dataset, doubling the hit rate and denoising the dataset. Second, it enabled a prospective virtual screen, also leveraging previously resolved cocrystal structures, that identified novel hits with informative chemistries, achieving up to a 10-fold binding affinity improvement over the repurified hit from the initial CRM evaluation.
This work establishes a proof-of-concept that xSAR models can be directly extracted from large-scale crystallographic readouts of CRMs, offering a valuable methodology to build SAR models and accelerate design-make-test iterations without requiring CRM hit resynthesis and confirmation. This invites future work to utilise advanced analytics and modelling techniques to further strengthen purification-agnostic workflows.
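To make the idea of a rule-based scheme built on conserved features concrete, the following is a minimal sketch and not the authors' implementation. It assumes each compound is reduced to a set of substructure labels, weights a feature by the difference between its frequency among crystallographic binders and among non-binders, and sums those weights into PBS-style and NBS-style scores. The function names, the `min_gap` threshold, the weighting formula, and the toy feature labels are all hypothetical; the actual definitions of PBS and NBS are given in the Supplementary Information and the xSAR repository.

```python
from collections import Counter


def feature_frequencies(compounds):
    """Return the fraction of compounds that contain each feature."""
    counts = Counter(f for feats in compounds for f in set(feats))
    n = len(compounds)
    return {f: c / n for f, c in counts.items()}


def classify_features(binders, non_binders, min_gap=0.5):
    """Keep features whose frequency differs strongly between the two groups.

    Positive weight: conserved among binders.
    Negative weight: conserved among non-binders.
    The frequency-gap rule and the min_gap threshold are assumptions
    made for this illustration only.
    """
    freq_bind = feature_frequencies(binders)
    freq_non = feature_frequencies(non_binders)
    weights = {}
    for f in set(freq_bind) | set(freq_non):
        gap = freq_bind.get(f, 0.0) - freq_non.get(f, 0.0)
        if abs(gap) >= min_gap:
            weights[f] = gap
    return weights


def score_ligand(features, weights):
    """Return (PBS-style, NBS-style) scores for one ligand's feature set."""
    pos = sum((w for f, w in weights.items() if w > 0 and f in features), 0.0)
    neg = sum((-w for f, w in weights.items() if w < 0 and f in features), 0.0)
    return pos, neg


# Toy data with hypothetical substructure labels (not from the PHIP(2) dataset).
binders = [{"amide", "furan"}, {"amide", "pyridine"}]
non_binders = [{"nitro", "furan"}, {"nitro", "pyridine"}, {"nitro"}]

weights = classify_features(binders, non_binders)
candidate_scores = score_ligand({"amide", "furan"}, weights)
conflicted_scores = score_ligand({"amide", "nitro"}, weights)
```

In this toy setting, a compound annotated as a non-binder but carrying a high PBS-style score and a low NBS-style score would be a candidate missed binder worth rescreening, which is the kind of retrospective use the abstract describes.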
Supplementary materials
Title
Supplementary Information: Structure-Activity Relationships can be directly extracted from high-throughput crystallographic evaluation of fragment elaborations in crude reaction mixtures
Description
This Supplementary Information provides extended data supporting the development and validation of the xSAR model. It includes definitions of binding feature classifications and score distributions (PBS/NBS), chemical space projections, and Venn diagrams for feature overlap. Compound structures from the Retrospective-97 rescreening and Prospective-93 virtual screening sets are provided, alongside crystallographic and kinetic binding results. Benchmarking analyses compare xSAR scores to traditional similarity and machine learning methods. Detailed mathematical formalisms are included for conservation score weighting, bit classification, and ensemble scoring. Supplementary methods cover model training, statistical evaluation, and references.
Supplementary weblinks
Title
xSAR GitHub repository: xSAR dataset and source code
Description
This repository supports xSAR modelling, which derives structure-activity relationships directly from high-throughput crystallographic screening of fragment elaborations in crude reaction mixtures (CRMs). It includes raw and corrected binding annotations for 957 compounds, retrospective validation on 97 resynthesised compounds, benchmarking datasets, virtual screening predictions, and kinetic assay results for 93 selected hits. Python scripts and Jupyter notebooks are provided for computing PBS and NBS scores, benchmarking ligand-based classifiers, and guiding follow-up prioritisation.