Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as those containing open-shell 3d transition metals that can be expected to have strong multi-reference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate over 4,800 open-shell transition-metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and assess affordable, finite-temperature DFT evaluation of fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO–LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. Analysis of independently trained machine learning (ML) models that predict HOMO–LUMO gaps and FON-based diagnostics reveals differences in the metal and ligand sensitivity of the two quantities. We use our trained ML models to rapidly evaluate MR character over a space of ca. 187,000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding complexes with small HOMO–LUMO gaps but low MR character.