Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels

Ping Yang; E. Adrian Henle; Xiaoli Fern; Cory M. Simon

doi:10.26434/chemrxiv-2022-q5zgx-v4

23 May 2022, Version 4

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are valuable as pollinators. Thus, candidate pesticides in development pipelines must be assessed for toxicity to bees. Leveraging a data set of 382 molecules with toxicity labels from honey bee exposure experiments, we train a support vector machine (SVM) to predict the toxicity of pesticides to honey bees. We compare two representations of the pesticide molecules: (i) a random walk feature vector listing the count of length-L walks on the molecular graph for each vertex- and edge-label sequence and (ii) the MACCS structural key fingerprint (FP), a bit vector indicating the presence/absence of each of a list of pre-defined subgraph patterns in the molecular graph. We explicitly construct the MACCS FPs, but for the random walk representation we rely on the fixed-length-L random walk graph kernel (RWGK) in place of the dot product. The L-RWGK-SVM achieves an accuracy, precision, recall, and F1 score (mean over 2000 runs) of 0.81, 0.68, 0.71, and 0.69 on the test data set, with L = 4 the modal optimal walk length. The MACCS-FP-SVM performs on par with, or marginally better than, the L-RWGK-SVM and offers more interpretability, but its performance varies more across runs. We interpret the MACCS-FP-SVM by illuminating which subgraph patterns in the molecules tend to strongly push them towards the toxic/non-toxic side of the separating hyperplane.
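To make the kernel trick concrete, here is a minimal Python sketch (not the authors' Julia code) of one standard way to evaluate a fixed-length-L random walk graph kernel: count the length-L walks in the direct product graph of two molecules, whose vertices are label-matched atom pairs and whose edges pair up bonds with equal labels. Each such walk corresponds to exactly one pair of walks (one per molecule) sharing a vertex- and edge-label sequence, so the count equals the dot product of the explicit random walk feature vectors, which the sketch also builds for comparison. The graph encoding (hydrogen-suppressed atoms, bond labels as strings), the helper names, and the toy molecules are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Assumed representation: (vertex_labels, edges), where vertex_labels[i] is an
# atom symbol and edges maps an undirected pair (i, j) to a bond label.


def adjacency(graph):
    """Map each vertex to a list of (neighbor, bond label) pairs."""
    vertex_labels, edges = graph
    adj = {v: [] for v in range(len(vertex_labels))}
    for (i, j), bond in edges.items():
        adj[i].append((j, bond))
        adj[j].append((i, bond))
    return adj


def walk_feature_vector(graph, L):
    """Explicit features: number of length-L walks per vertex/edge label sequence."""
    vertex_labels, _ = graph
    adj = adjacency(graph)
    counts = Counter()

    def extend(v, seq, steps_left):
        if steps_left == 0:
            counts[seq] += 1
            return
        for w, bond in adj[v]:  # walks may revisit vertices
            extend(w, seq + (bond, vertex_labels[w]), steps_left - 1)

    for v in adj:
        extend(v, (vertex_labels[v],), L)
    return counts


def dot(f1, f2):
    """Dot product of two sparse feature vectors stored as Counters."""
    return sum(n * f2[seq] for seq, n in f1.items())


def rwgk(g1, g2, L):
    """Fixed-length-L random walk kernel: count length-L walks in the direct product graph."""
    labels1, _ = g1
    labels2, _ = g2
    adj1, adj2 = adjacency(g1), adjacency(g2)
    # Product-graph vertices: pairs of atoms with identical labels.
    pairs = [(u, v) for u, v in product(range(len(labels1)), range(len(labels2)))
             if labels1[u] == labels2[v]]
    # n_walks[x] = number of product-graph walks of the current length ending at x.
    n_walks = {x: 1 for x in pairs}
    for _ in range(L):
        nxt = {x: 0 for x in pairs}
        for (u, v), n in n_walks.items():
            for u2, b1 in adj1[u]:
                for v2, b2 in adj2[v]:
                    # Step only along equal bond labels into label-matched atom pairs.
                    if b1 == b2 and labels1[u2] == labels2[v2]:
                        nxt[(u2, v2)] += n
        n_walks = nxt
    return sum(n_walks.values())


# Toy heavy-atom graphs (illustrative only).
ethanol = (["C", "C", "O"], {(0, 1): "single", (1, 2): "single"})
acetaldehyde = (["C", "C", "O"], {(0, 1): "single", (1, 2): "double"})

for L in range(5):
    explicit = dot(walk_feature_vector(ethanol, L), walk_feature_vector(acetaldehyde, L))
    assert rwgk(ethanol, acetaldehyde, L) == explicit
print(rwgk(ethanol, acetaldehyde, 1))  # 4
```

Because the kernel only needs the product graph, the label sequences never have to be enumerated. A Gram matrix of such kernel values could then be handed to an SVM solver that accepts a precomputed kernel (for example, scikit-learn's `SVC(kernel="precomputed")`).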

Keywords

random walk graph kernels

graph kernels

toxicity prediction

pesticide toxicity to honey bees

Supplementary weblinks

code to reproduce: Julia code to reproduce

Now Published

Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels

Ping Yang, E. Adrian Henle, Xiaoli Z. Fern, Cory M. Simon (journal article)

The Journal of Chemical Physics, Volume 157, Issue 3

Print publication date: Jul 21, 2022

Version History

May 23, 2022 Version 4

Apr 05, 2022 Version 3

Mar 10, 2022 Version 2

Mar 09, 2022 Version 1

Version Notes

We now explain and use the classical MACCS structural key fingerprint as a baseline representation for the pesticide molecules. We compare and contrast the random walk feature vector with the MACCS fingerprint, both intuitively and empirically. We interpret an SVM based on the MACCS fingerprints by illuminating which molecular subgraph patterns tend to push pesticides towards the toxic/non-toxic side of the separating hyperplane of the SVM. To adopt a more practical machine learning procedure, we now treat the random walk length L as a hyperparameter to be tuned during each train-test run, as opposed to our previous setup, where we specified L a priori. We added two new illustrations to clarify the construction and meaning of the random walk feature vector.
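The interpretation step can be sketched for a linear-kernel SVM on binary fingerprints. This is a hypothetical Python illustration, not the paper's procedure: it assumes a sign convention where a positive score f(x) = w·x + b means "toxic", and all helper names and weights are made up. Because the fingerprints are explicit, the weight vector w can be recovered from the dual coefficients, and each "on" bit shifts a molecule's score by exactly its weight, which is plausibly what makes an explicit fingerprint easier to read than an implicit kernel feature space.

```python
# Hypothetical sketch: interpreting a linear SVM trained on binary fingerprints.
# Sign convention (assumed): a positive score f(x) = w.x + b means "toxic".


def primal_weights(dual_coefs, support_vectors):
    """w = sum_j (alpha_j * y_j) * x_j; recoverable because fingerprints are explicit."""
    n_bits = len(support_vectors[0])
    w = [0.0] * n_bits
    for coef, x in zip(dual_coefs, support_vectors):
        for i, xi in enumerate(x):
            w[i] += coef * xi
    return w


def decision_score(w, b, x):
    """Signed score of fingerprint x relative to the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bit_contributions(w, x):
    """Each 'on' bit i of a binary fingerprint shifts the score by exactly w[i]."""
    return {i: w[i] for i, xi in enumerate(x) if xi == 1}


def strongest_bits(w, k):
    """Up to k bits pushing hardest toward the toxic (w > 0) and non-toxic (w < 0) sides."""
    order = sorted(range(len(w)), key=lambda i: w[i])
    toward_toxic = [i for i in reversed(order) if w[i] > 0][:k]
    toward_nontoxic = [i for i in order if w[i] < 0][:k]
    return toward_toxic, toward_nontoxic


# Toy 4-bit example with made-up weights (the public MACCS key set has 166 keys).
w = [0.75, -1.25, 0.125, 0.5]
b = -0.25
x = [1, 1, 0, 1]
print(decision_score(w, b, x))  # -0.25
print(bit_contributions(w, x))  # {0: 0.75, 1: -1.25, 3: 0.5}
print(strongest_bits(w, 2))     # ([0, 3], [1])
```

For a binary fingerprint the score decomposes as b plus the sum of the weights of the bits that are on, so ranking those weights directly shows which subgraph patterns push a given molecule across the hyperplane.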

Metrics

Views: 2,306

Downloads: 921

License

The content is available under CC BY 4.0

Funding

National Science Foundation (1920945)

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
