Abstract
Benchmarks are an essential driver of progress in scientific disciplines. Ideal benchmarks mimic real-world tasks as closely as possible; insufficient difficulty or applicability can stunt growth in a field. Benchmarks should also have sufficiently low computational overhead to promote accessibility and repeatability. The goal is then to win a “Turing test” of sorts by creating a surrogate model that is indistinguishable from the ground-truth observations (at least within the dataset bounds that were explored), which necessitates a large amount of data. In materials science and chemistry, industry-relevant optimization tasks are often hierarchical, noisy, multi-fidelity, multi-objective, high-dimensional, and non-linearly correlated, while exhibiting mixed numerical and categorical variables subject to linear and non-linear constraints. To complicate matters, unexpected regions of failed simulations or experiments may be present in the search space. In this study, 438,371 random hard-sphere packing simulations, representing 279 CPU days of computational overhead, were performed across nine input parameters with linear constraints and two discrete fidelities, each with continuous fidelity parameters, and results were logged to a free-tier shared MongoDB Atlas database. Two core tabular datasets resulted from this study: (1) a failure probability dataset containing unique input parameter sets and the estimated probabilities that the simulation fails at each of the two steps, and (2) a regression dataset mapping input parameter sets (including repeats) to particle packing fractions and computational runtimes for each of the two steps. These two datasets can be used to create a surrogate model that is as close as possible to running the actual simulations by incorporating simulation failure and heteroskedastic noise. For the regression dataset, percentile ranks were computed within each group of identical parameter sets to enable capturing heteroskedastic noise. This contrasts with a more traditional approach that imposes a priori assumptions such as Gaussian noise, e.g., by providing a mean and standard deviation. A similar approach can be applied to other benchmark datasets to bridge the gap between optimization benchmarks with low computational overhead and realistically complex, real-world optimization scenarios.
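As a rough illustration of the percentile-rank treatment described above, the sketch below (Python with pandas; the column names and toy values are hypothetical placeholders rather than the actual dataset schema) computes a percentile rank for each repeated observation within its group of identical input parameters, so that a surrogate can resample the empirical noise distribution instead of assuming a parametric form such as Gaussian noise.

```python
# Minimal sketch of per-group percentile ranking, assuming a pandas DataFrame
# with hypothetical column names; the real regression dataset's schema may differ.
import pandas as pd

# Repeated simulations for identical input parameter sets (toy values).
df = pd.DataFrame(
    {
        "param_a": [0.5, 0.5, 0.5, 1.2, 1.2],
        "param_b": [3.0, 3.0, 3.0, 7.5, 7.5],
        "packing_fraction": [0.61, 0.63, 0.60, 0.55, 0.58],
    }
)

param_cols = ["param_a", "param_b"]

# Percentile rank of each observation within its group of identical inputs.
# These ranks let a surrogate capture the empirical (possibly heteroskedastic)
# scatter of outcomes rather than a fitted mean and standard deviation.
df["packing_fraction_pct_rank"] = (
    df.groupby(param_cols)["packing_fraction"].rank(pct=True)
)

print(df)
```

One way to use such ranks, under the assumptions above, is to train a surrogate on (parameters, percentile rank) → outcome pairs and evaluate it at a uniformly sampled percentile at query time, emulating the run-to-run scatter of the underlying simulations.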
Supplementary weblinks

Title: Materials Science Optimization Benchmarks
Description: A collection of benchmarking problems and datasets for testing the performance of advanced optimization algorithms in the field of materials science and chemistry.

Title: Materials Science Optimization Benchmarks (v0.1.0 Zenodo Snapshot)
Description: A collection of benchmarking problems and datasets for testing the performance of advanced optimization algorithms in the field of materials science and chemistry.

Title: Materials Science Optimization Benchmark Dataset for Multi-fidelity Hard-sphere Packing Simulations
Description: Two core tabular datasets resulted from this study: (1) a failure probability dataset containing unique input parameter sets and the estimated probabilities that the simulation fails at each of the two steps, and (2) a regression dataset mapping input parameter sets (including repeats) to particle packing fractions and computational runtimes for each of the two steps.