Predicting emerging chemical content in consumer products using machine learning

Luka Lila Thornton; David Carlson; Mark Wiesner

doi:10.26434/chemrxiv-2021-96wvd

Theoretical and Computational Chemistry

21 December 2021, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Chemical ingredients in consumer products are continually changing. To understand our exposure to chemicals and the consequent risk, we need to know their concentrations in products, or chemical weight fractions. Unfortunately, manufacturers rarely report comprehensive weight fraction data on product labels. The goal of this study was to evaluate the utility of machine learning strategies for predicting weight fractions when chemical constituent data are limited. A “data-poor” framework was developed and tested using a small dataset on consumer products containing engineered nanomaterials to represent emerging substances. A second, more traditional framework was applied to a “data-rich” product dataset composed of bulk-scale organic chemicals for comparison purposes. Feature variables included chemical properties, functional use categories (e.g., antimicrobial), product categories (e.g., makeup), product matrix categories, and whether weight fractions were manufacturer-reported or experimentally obtained. Classification into three weight fraction bins was performed using a random forest or nonlinear support vector classifier. An ablation study revealed that functional use data improved predictive performance when included alongside chemical property data, suggesting the utility of functional use categories in evaluating the safety and sustainability of emerging chemicals. Models could roughly stratify material-product observations into order-of-magnitude weight fraction bins with moderate success; the best of these achieved an average balanced accuracy of 73% on the nanomaterials product data. Framework comparisons also revealed a positive trend between sample size and average balanced accuracy, suggesting great promise for machine learning approaches with continued investment in chemical data collection.
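The evaluation idea in the abstract (sorting observations into three weight fraction bins and scoring predictions by balanced accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the bin edges `LOW_MAX` and `MID_MAX`, the function names, and the sample values are all assumptions made for illustration, since the abstract does not state the bin boundaries. Balanced accuracy is computed here in its standard form, the mean of per-class recall.

```python
from collections import defaultdict

# Hypothetical bin edges (mass fractions). The abstract does not give the
# actual boundaries, so these values are assumptions for illustration only.
LOW_MAX = 0.01  # assumed upper edge of the "low" bin (1% by weight)
MID_MAX = 0.10  # assumed upper edge of the "mid" bin (10% by weight)


def weight_fraction_bin(wf):
    """Map a weight fraction in [0, 1] to one of three ordinal bins."""
    if not 0.0 <= wf <= 1.0:
        raise ValueError(f"weight fraction out of range: {wf}")
    if wf <= LOW_MAX:
        return "low"
    if wf <= MID_MAX:
        return "mid"
    return "high"


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare bins count as much as common ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up weight fractions and predictions, not data from the study.
observed = [0.0005, 0.002, 0.008, 0.004, 0.05, 0.3]
y_true = [weight_fraction_bin(wf) for wf in observed]
y_pred = ["low", "low", "low", "mid", "mid", "mid"]

score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain_accuracy:.3f}")
```

In this toy case the single "high" observation is missed entirely, which pulls balanced accuracy below plain accuracy. That sensitivity to small classes is the usual reason to prefer balanced accuracy when bins are unevenly populated.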

Keywords

Exposure modeling

Chemical function

Nanomaterials

Artificial intelligence

Machine learning

Environmental exposure

Consumer product safety

Supplementary materials

Title: Predicting emerging chemical content in consumer products using machine learning: Supporting information

Description: Supporting information for the article, "Predicting Emerging Chemical Content in Consumer Products Using Machine Learning," including hyperlinks for accessing the data and machine learning code repository and descriptions of more in-depth data curation, optional data augmentation steps and additional statistical test results.

Supplementary weblinks

Title: Predicting emerging chemical content in consumer products using machine learning: Data and code repository

Description: A GitHub repository containing data, machine learning code, and instructions for reproducing the development environment for "Predicting Emerging Chemical Content in Consumer Products Using Machine Learning" (see README).

Title: Predicting emerging chemical content in consumer products using machine learning: Online code interface

Description: Reproduce the machine learning and data analysis code from the article in your browser with a single click. The Binder URL launches the code in interactive IPython notebooks, self-contained in a virtual, executable environment.

Now Published

Predicting emerging chemical content in consumer products using machine learning

Luka Lila Thornton, David E. Carlson, Mark R. Wiesner (journal article)

Science of The Total Environment, Volume 834

Print publication date: Aug, 2022

Version History

Dec 21, 2021 Version 1

Metrics

Views: 911

Downloads: 240

License

The content is available under CC BY-NC 4.0.

Funding

National Science Foundation: DGE-2022040

National Science Foundation, Environmental Protection Agency: EF-0830093

National Science Foundation, Environmental Protection Agency: DBI-1266252

Author’s competing interest statement

The author(s) have declared that they have no conflict of interest with regard to this content.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
