Filter feature selection for unsupervised clustering of designer drugs using DFT simulated IR spectra data

07 September 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The rapid emergence of novel psychoactive substances (NPS) poses new challenges for forensic testing and analysis techniques. This paper explores unsupervised clustering of the infrared (IR) spectra of NPS compounds. Two statistical measures, the Pearson and Spearman correlation coefficients, were used to quantify spectral similarity and to generate the affinity matrices for hierarchical clustering. The correspondence between the resulting spectral-similarity clustering trees and the commonly used structural/pharmacological categorization was evaluated and compared with the clustering obtained from 2D/3D molecular fingerprints. Hybrid-model feature selection was then applied using several filter-based feature ranking algorithms developed for unsupervised clustering tasks. Because the Spearman measure tends to overestimate spectral similarity based on the overall pattern of the full spectrum, the Spearman-based clustering showed the greatest improvement once the non-discriminative features were removed. Loading plots of the first two principal components (PCs) of the optimal feature subsets confirmed that the vibrational bands contributing most to the clustering of NPS compounds were those selected by NDFS feature selection.
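
The similarity-then-clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic two-band spectra, the helper names `similarity_matrix` and `cluster_spectra`, the `1 - correlation` distance, and the average-linkage choice are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def similarity_matrix(spectra, method="pearson"):
    """Pairwise correlation between rows of `spectra` (one spectrum per row)."""
    if method == "spearman":
        # Spearman correlation is Pearson correlation computed on ranks.
        spectra = np.apply_along_axis(rankdata, 1, spectra)
    return np.corrcoef(spectra)


def cluster_spectra(spectra, method="pearson", n_clusters=2):
    """Hierarchical clustering on 1 - correlation (assumed distance)."""
    dist = 1.0 - similarity_matrix(spectra, method)
    np.fill_diagonal(dist, 0.0)
    # Enforce exact symmetry and non-negativity against rounding error.
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    # Average linkage is an assumption; other linkage rules would also work.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for simulated IR spectra: two groups with different bands.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 4000.0, 600)


def band(center, width=30.0):
    """Gaussian absorption band centred at `center` (cm^-1)."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


group_a = band(1700.0) + 0.5 * band(2900.0)
group_b = band(1100.0) + 0.5 * band(3300.0)
spectra = np.vstack(
    [group_a + 0.01 * rng.standard_normal(wavenumbers.size) for _ in range(3)]
    + [group_b + 0.01 * rng.standard_normal(wavenumbers.size) for _ in range(3)]
)

labels_pearson = cluster_spectra(spectra, method="pearson")
labels_spearman = cluster_spectra(spectra, method="spearman")
print("Pearson labels: ", labels_pearson)
print("Spearman labels:", labels_spearman)
```

Passing `method="spearman"` swaps only the similarity measure, so the two affinity matrices can be compared on the same spectra before any feature selection is applied.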

Keywords

IR spectra classification
Designer drugs
Machine learning
Feature selection
Feature reduction
Pattern recognition
Hierarchical clustering

Supplementary materials

Title: IR Manuscript - Supporting
Description: GitHub repository link, other supporting information.
