Force fields form the basis for classical molecular simulations, and their accuracy is crucial for the quality of, for instance, protein-ligand binding simulations in drug discovery. The vast diversity of small molecule chemistry makes it challenging to build and parameterize a suitable force field. The Open Force Field Initiative is a combined industry and academic consortium developing a state-of-the-art small molecule force field. In this report, industry members of the consortium worked together to objectively evaluate the performance of the force fields produced by the initiative (referred to here as OpenFF) on a combined public and proprietary dataset of 19,653 relevant molecules selected from their internal research and compound collections. This evaluation was important because it was blind: at most partners, none of the molecules or data had been used in force field development or testing prior to this work. We compare the Open Force Field "Sage" version 2.0.0 and "Parsley" version 1.3.0 with GAFF-2.11-AM1BCC, OPLS4 and SMIRNOFF99Frosst by analyzing force field-optimized geometries and conformer energies against reference quantum mechanical (QM) data. We show that OPLS4 performs best and that the latest Open Force Field release shows a clear improvement over its predecessors. The performance of established force fields such as GAFF-2.11 was generally worse. Although OpenFF researchers were involved in building the benchmarking infrastructure used in this work, benchmarking was done entirely in-house within the industrial organizations, and the resulting assessment is reported here. This work assesses force field performance using separate benchmarking steps and external datasets, and it involves external research groups. This effort may also be unique in the number of industrial partners involved, with 10 companies participating in the benchmarking effort.
The Supplementary Information contains (1) the equations used to compute ddE values; (2) tables with the number of molecules selected by each industry partner and optimized with QM and MM for the public and proprietary datasets; (3) a table of outliers in the public and proprietary datasets; (4) plots comparing OPLS4 results obtained with ffld_server and with macromodel, generated with compare-forcefields and the conformer matching process match-minima; (5) a table of mean ddE and RMSD values for charged and neutral molecules, with the corresponding scatter plots; (6) molecular fragments of the Roche dataset containing concerning torsions that are not shown in the main text; and (7) code to extract optimized records from QCArchive for the public datasets hosted there.
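For readers unfamiliar with the ddE metric, the following is a minimal sketch of how it is commonly computed in OpenFF-style benchmarks. It assumes that each molecule has a set of matched conformers (the same index refers to the same conformer in the QM and MM lists), that energies share units such as kcal/mol, and that relative energies are taken with respect to the conformer with the lowest QM energy. The function name `compute_ddE` is hypothetical, and the exact equations used in this work are those given in the Supplementary Information.

```python
from typing import Sequence


def compute_ddE(qm_energies: Sequence[float], mm_energies: Sequence[float]) -> list[float]:
    """Return ddE = dE_MM - dE_QM for each non-reference conformer of one molecule.

    Assumption: qm_energies[i] and mm_energies[i] belong to the same matched
    conformer, in the same units (e.g. kcal/mol). The reference conformer is the
    QM global minimum; it is skipped because its ddE is zero by construction.
    """
    if len(qm_energies) != len(mm_energies):
        raise ValueError("QM and MM energy lists must have the same length")
    if len(qm_energies) < 2:
        return []  # a single conformer has no relative energies to compare

    # Index of the conformer with the lowest QM energy (the reference).
    ref = min(range(len(qm_energies)), key=lambda i: qm_energies[i])

    dde = []
    for i, (e_qm, e_mm) in enumerate(zip(qm_energies, mm_energies)):
        if i == ref:
            continue
        d_qm = e_qm - qm_energies[ref]  # QM energy relative to the reference
        d_mm = e_mm - mm_energies[ref]  # MM energy relative to the same conformer
        dde.append(d_mm - d_qm)
    return dde


if __name__ == "__main__":
    # Three matched conformers; conformer 0 is the QM minimum.
    print(compute_ddE([0.0, 1.5, 3.0], [10.0, 12.0, 12.5]))  # [0.5, -0.5]
```

Referencing both energy sets to the QM minimum removes the arbitrary absolute offset of each method, so a ddE near zero means the force field reproduces the QM conformer ordering and spacing.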