Abstract
In drug discovery, compound screening based on manual assessment is prone to bias, and existing methods lack robust risk control. To address these challenges, we introduce conformal selection, an approach to compound screening that balances risks and benefits. Leveraging conformal inference, our approach constructs a p-value for each candidate molecule to quantify the statistical evidence for selecting it. The final set of molecules is determined by comparing these p-values against thresholds derived from multiple testing principles. Our approach offers rigorous control of the false discovery rate, with validity that holds regardless of dataset size and under minimal assumptions. Because it avoids the estimation of prediction errors required by previous approaches, our method achieves higher accuracy (power), improving the ability to identify promising candidates. Our method is also more computationally efficient. We validate these advantages through numerical simulations on real-world datasets.
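The two steps named above (conformal p-values, then a multiple-testing threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work. The abstract does not specify the nonconformity score or the multiple-testing procedure, so the sketch assumes the score V(x, y) = y - mu(x) for a pre-trained predictor mu, a goal of selecting molecules whose unknown activity exceeds a user-chosen `cutoff`, and the Benjamini-Hochberg procedure at level `q`. All function and parameter names are hypothetical.

```python
import numpy as np


def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-value per test point: (1 + #{calibration scores < test score}) / (n + 1)."""
    calib_sorted = np.sort(np.asarray(calib_scores, dtype=float))
    test_scores = np.asarray(test_scores, dtype=float)
    n = calib_sorted.size
    # side="left" counts calibration scores strictly below each test score.
    n_below = np.searchsorted(calib_sorted, test_scores, side="left")
    return (1.0 + n_below) / (n + 1.0)


def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg procedure at nominal level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals, kind="stable")
    thresholds = q * np.arange(1, m + 1) / m
    passed = np.nonzero(pvals[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest rank whose sorted p-value is under its threshold
    return np.sort(order[:k])


def conformal_select(calib_pred, calib_y, test_pred, cutoff, q=0.1):
    """Select test molecules whose unknown outcome is likely to exceed `cutoff`.

    Assumed score: V(x, y) = y - mu(x). Calibration scores use the observed
    outcomes; test scores plug in the cutoff in place of the unknown outcome.
    """
    calib_scores = np.asarray(calib_y, dtype=float) - np.asarray(calib_pred, dtype=float)
    test_scores = cutoff - np.asarray(test_pred, dtype=float)
    pvals = conformal_pvalues(calib_scores, test_scores)
    return benjamini_hochberg(pvals, q), pvals
```

A call such as `conformal_select(calib_pred, calib_y, test_pred, cutoff=5.0, q=0.1)` returns the indices of the selected test molecules together with their p-values. The calibration set is assumed to be held out from the data used to train the predictor, so that calibration and test points are exchangeable.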