Theoretical and Computational Chemistry

Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels



Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are agriculturally and ecologically vital as pollinators. The development of new pesticides, driven by pest resistance to incumbent pesticides and by demands to reduce their negative environmental impacts, necessitates assessments of pesticide toxicity to bees. We leverage a data set of 382 molecules labeled from honey bee toxicity experiments to train a classifier that predicts the toxicity of a new pesticide molecule to honey bees. Traditionally, the first step of a molecular machine learning task is to explicitly convert molecules into feature vector representations for input to the classifier. Instead, we (i) adopt the fixed-length random walk graph kernel to express the similarity between any two molecular graphs and (ii) use the kernel trick to train a support vector machine (SVM) to classify the bee toxicity of pesticides represented as molecular graphs. We assess the performance of the graph-kernel-SVM classifier under different walk lengths used to describe the molecular graphs. The optimal classifier, with walk length 4, achieves a mean (over 100 runs) accuracy, precision, recall, and F1 score of 0.82, 0.69, 0.74, and 0.71, respectively, on the test data set.
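The fixed-length random walk graph kernel named in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: vertices carry atom labels, edges carry bond labels, "walk length" counts edges, and the kernel value is the number of pairs of label-matching walks of that length in the two graphs, counted on their direct product graph. The function names and the two toy molecules (heavy atoms only) are hypothetical and are not taken from the paper.

```python
from itertools import product


def adjacency(edges):
    """Map each vertex to a list of (neighbor, bond_label) pairs (undirected)."""
    adj = {}
    for u, v, bond in edges:
        adj.setdefault(u, []).append((v, bond))
        adj.setdefault(v, []).append((u, bond))
    return adj


def random_walk_kernel(graph_a, graph_b, length):
    """Count pairs of label-matching walks with `length` edges in two graphs.

    Each graph is (atoms, edges): atoms is a list of vertex labels and
    edges is a list of (u, v, bond_label) tuples. This is an assumed,
    simplified form of the fixed-length random walk kernel.
    """
    atoms_a, edges_a = graph_a
    atoms_b, edges_b = graph_b
    adj_a, adj_b = adjacency(edges_a), adjacency(edges_b)

    # Vertices of the direct product graph: vertex pairs with equal atom labels.
    pairs = [
        (u, v)
        for u, v in product(range(len(atoms_a)), range(len(atoms_b)))
        if atoms_a[u] == atoms_b[v]
    ]

    # walks[(u, v)] = number of product-graph walks of the current length
    # that start at (u, v); every vertex starts one walk of length 0.
    walks = {pair: 1 for pair in pairs}
    for _ in range(length):
        extended = {}
        for u, v in pairs:
            total = 0
            for u2, bond_a in adj_a.get(u, []):
                for v2, bond_b in adj_b.get(v, []):
                    if bond_a == bond_b:
                        # Pairs with mismatched atom labels are absent: count 0.
                        total += walks.get((u2, v2), 0)
            extended[(u, v)] = total
        walks = extended
    return sum(walks.values())


def gram_matrix(graphs, length):
    """Kernel matrix between all pairs of graphs, for a kernel-based classifier."""
    return [[random_walk_kernel(g, h, length) for h in graphs] for g in graphs]


# Hypothetical toy molecules: C-O and C-C-O, single bonds labeled 1.
methanol = (["C", "O"], [(0, 1, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
K = gram_matrix([methanol, ethanol], length=1)
```

A Gram matrix built this way is the only input a kernel SVM needs; for example, scikit-learn's `SVC(kernel="precomputed")` accepts such a matrix in place of feature vectors. Whether the authors used that library, this normalization, or this walk-length convention is not stated in the abstract.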

Version notes

Use the F1 score to select the optimal model in the cross-validation procedure, and report the F1 score. The F1 score is more standard than precision * recall, and it varies between 0 and 1.
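For reference, the F1 score is the harmonic mean of precision and recall. The short sketch below computes it from the mean precision and recall quoted in the abstract; the helper name is hypothetical. Note that a mean of per-run F1 scores need not equal the F1 score of the mean precision and recall, although here the two agree to two decimal places.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean test-set precision and recall reported in the abstract.
precision, recall = 0.69, 0.74
f1 = f1_from_precision_recall(precision, recall)
product_metric = precision * recall  # the previously used selection metric
```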


