On the challenge of unambiguous identification of fentanyl analogs: Exploring measurement diversity using standard reference mass spectral libraries

Fentanyl analogs are a class of designer drugs that are particularly challenging to unambiguously identify due to the mass spectral and retention time similarities of unique compounds. In this paper, we use agglomerative hierarchical clustering to explore the measurement diversity of fentanyl analogs and better understand the challenge of unambiguous identifications using analytical techniques traditionally available to drug chemists. We consider four measurements in particular: gas chromatography retention indices, electron ionization mass spectra, electrospray ionization tandem mass spectra, and direct analysis in real time mass spectra. Our analysis demonstrates how simultaneously considering data from multiple measurement techniques increases the observable measurement diversity of fentanyl analogs, which can reduce identification ambiguity. This paper further supports the use of multiple analytical techniques to identify fentanyl analogs (among other substances), as is recommended by the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG).


1 | INTRODUCTION
Fentanyl analogs have been some of the more problematic drugs of abuse this past decade [1]. Historically, authorities scheduled fentanyl analogs individually, which allowed for slightly modified and unscheduled variants to be distributed faster than they could be controlled [2]. While "blanket" scheduling of fentanyl analogs by governing bodies like the United States Drug Enforcement Administration has helped limit the rise of new modified versions [3], the high potency of these compounds has allowed them to permeate the drug supply chain as components in increasingly complex samples [4], complicating both law enforcement and public health responses [1,5,6].
Underlying these societal challenges is a fundamental measurement issue: fentanyl analogs are difficult to unambiguously discriminate with analytical techniques typically found in forensic laboratories. For example, consider the near-identical centroided mass spectra of cyclopropyl fentanyl and crotonyl fentanyl, measured with electron ionization mass spectrometry (EI-MS), shown in Figure 1. Roughly speaking, in EI-MS, an analyte interacts with high-energy electrons and forms ions. The relative abundance of these ions is then reported as a function of ion mass-to-charge ratio (m/z) in a data structure referred to as a mass spectrum. Because the m/z values depend only on the molecular constitution (viz., mass) of the observed ions, the mass spectra of isomeric compounds are often very similar. Spectral differences due to molecular connectivity will be reflected through changes in signal intensity that may be subtle or even indistinguishable (see Figure 1).
Due to difficulties with EI-MS, it is commonplace to leverage additional measurements, like chromatographic retention times, or other technologies altogether to help discriminate structurally similar compounds. Examples of these measurements include a bespoke gas chromatography (GC) mass spectrometry method for discriminating synthetic opioids [8], and an ultra-high performance liquid chromatography tandem mass spectrometry method for discriminating between cyclopropyl fentanyl and crotonyl fentanyl in toxicology applications [9]. The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) currently recommends reporting results from multiple techniques when identifying drugs or substances [10].
While the solution of using multiple measurement techniques for identifying compounds is conceptually straightforward, it is difficult to know how many techniques, and which techniques in particular, are necessary to accurately identify specific compounds without first characterizing how these measurements differ across similar compounds. Therefore, understanding the measurement diversity of fentanyl analogs will help us select measurement techniques that minimize identification ambiguity.
In this paper, we define the concept of measurement diversity using agglomerative hierarchical clustering, and characterize the measurement diversity of fentanyl analogs across four common analytical measurement strategies: (a) GC retention indices (RI), (b) full scan mass spectra collected using electron ionization mass spectrometry (EI-MS), (c) precursor m/z and product ion mass spectra (MS²) collected at multiple collision energies using electrospray ionization tandem mass spectrometry (ESI-MS/MS), and (d) in-source collision induced dissociation (is-CID) full scan mass spectra collected at three orifice one energies using direct analysis in real time mass spectrometry (DART-MS). Examples of each measurement type using fentanyl are presented in Figure 2. We also discuss how these four measurements can be combined to increase measurement diversity and subsequently improve our ability to unambiguously identify compounds.

2 | THEORY AND METHODS
In the following sections, we first describe the mathematical foundations of clustering used to define our notion of measurement diversity (Section 2.1). Next, we describe how dissimilarity is computed for each measurement type (Section 2.2), followed by a description of the specific data collection and analysis procedure (Section 2.3) used to generate the results presented in Section 3.

FIGURE 1
Head-to-tail display of cyclopropyl fentanyl (top/black) and crotonyl fentanyl (bottom/red) mass spectra with structures overlaid. These mass spectra were obtained from the Scientific Working Group for the Analysis of Seized Drugs mass spectral library (version 3.11) [7].

2.1 | Defining measurement diversity
Agglomerative hierarchical clustering (AHC) is an approach for constructing a tree-structure (dendrogram) that describes a hierarchical relationship between a set of objects [11,12]. This relationship is usually based on a mathematically defined measure of dissimilarity or distance between objects: an AHC algorithm receives as input a square and symmetric dissimilarity matrix that summarizes the pairwise dissimilarity between all objects in the set.
In an AHC analysis, we begin by assuming every object belongs to its own individual cluster (i.e., if there are n objects in a set, we begin with n clusters all of size 1). In each step, the least dissimilar (or most similar) clusters, which will initially be the least dissimilar individual objects, are merged to create a new cluster. This procedure continues until all objects in the set belong to a single cluster; the sequence by which clusters are merged and the dissimilarities between merged clusters are tracked throughout the process. An excellent overview of AHC algorithms and specific implementation considerations can be found in [13].
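The merging procedure described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code (the study reports using R): the function name `complete_linkage_ahc`, the toy matrix, and the choice to score clusters by their maximum pairwise dissimilarity (one of several possible linkage rules) are assumptions made for the example.

```python
from itertools import combinations


def complete_linkage_ahc(d):
    """Return the merge history [(members_a, members_b, height), ...] for a
    square, symmetric dissimilarity matrix d given as a list of lists."""

    def link(a, b):
        # Assumed linkage rule: maximum pairwise dissimilarity between clusters.
        return max(d[i][j] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(d))]  # n clusters of size 1
    merges = []
    while len(clusters) > 1:
        # Merge the least dissimilar pair of clusters and record the height.
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Toy dissimilarity matrix for four hypothetical objects (not real data).
toy = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
history = complete_linkage_ahc(toy)
for members_a, members_b, height in history:
    print(members_a, members_b, height)
```

The recorded heights are what a dendrogram draws on its vertical axis. This brute-force search is only practical for small sets; library implementations use more efficient update schemes.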

FIGURE 2
Example measurements of fentanyl: (A) retention index from the National Institute of Standards and Technology (NIST) 23 gas chromatography methods library, (B) electron ionization (EI) mass spectrum from NIST 23 EI-MS library, (C) set of product ion mass spectra (MS²) measured at multiple normalized collision energies from the NIST 23 MS/MS library, and (D) set of in-source collision induced dissociation mass spectra measured at multiple orifice one energies from the NIST direct analysis in real time mass spectrometry (DART-MS) Forensic Database (version 7-Grasshopper). For heat maps (panels C and D), point opacity represents relative peak intensity in the underlying mass spectra.

With a dendrogram, we can identify dissimilarity levels at which we can "cut" the dendrogram and evaluate the resulting clusters; the results can be interpreted through a variety of metrics. In this paper, we define a simple metric referred to as the measurement diversity index
$$D(\alpha) = \frac{N(\alpha)}{n},$$
where $N(\alpha)$ is the number of clusters identified from a dendrogram at a specified dissimilarity level $\alpha$ and n is the number of objects in the set. If the clustering results have a low measurement diversity (N ≪ n), it will be difficult to accurately identify individual objects. As measurement diversity increases (N → n), individual objects are easier to discriminate.
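As a rough illustration of the index, the ratio of the cluster count N to the object count n can be computed from the merge heights of a dendrogram. The sketch below assumes a monotone dendrogram (as complete linkage produces) and assumes that merges at a height equal to the cutoff are applied; the function name `diversity_index` and the example heights are hypothetical.

```python
def diversity_index(merge_heights, n, cutoff):
    """Measurement diversity N / n after cutting a dendrogram at `cutoff`.

    Each merge at a height <= cutoff (an assumed convention) reduces the
    number of clusters by one, starting from n singleton clusters.
    """
    applied = sum(1 for height in merge_heights if height <= cutoff)
    return (n - applied) / n


# Hypothetical merge heights for a dendrogram of four objects.
heights = [0.1, 0.2, 0.9]
for cutoff in (0.0, 0.15, 1.0):
    print(cutoff, diversity_index(heights, n=4, cutoff=cutoff))
```

A value of 1 means every object sits in its own cluster at that cutoff, while small values mean many objects are grouped together and cannot be told apart.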

2.2 | Calculating dissimilarity between measurements and clusters
As noted previously, the input to an AHC algorithm is a dissimilarity matrix that summarizes the pairwise dissimilarity between objects in a set. Thus, the first step to performing an AHC analysis is selecting a pairwise dissimilarity measure to describe the relationship between objects. In this paper, we are interested in discriminating fentanyl analogs by four analytical measurements.
For RI, the dissimilarity can be calculated using an absolute difference,
$$\delta_1(r_1, r_2) = |r_1 - r_2|,$$
where $r_1$ and $r_2$ are the two retention indices.
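To make the link to the clustering input concrete, the following sketch builds the square, symmetric dissimilarity matrix from a list of retention indices using the absolute difference. The function name and the RI values are made up for illustration.

```python
def ri_dissimilarity_matrix(retention_indices):
    """Pairwise absolute differences between retention indices."""
    return [
        [abs(r1 - r2) for r2 in retention_indices]
        for r1 in retention_indices
    ]


# Hypothetical retention indices for three compounds (not library values).
ri_values = [1900, 1905, 1950]
ri_matrix = ri_dissimilarity_matrix(ri_values)
for row in ri_matrix:
    print(row)
```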
The similarity between any two EI full scan mass spectra can be measured using a variety of techniques [14][15][16][17][18][19][20][21][22]; one long-standing approach for approximating similarity is referred to as the dot-product, or cosine similarity. We denote the cosine similarity between mass spectra $x$ and $y$, measured with a low-resolution mass spectrometer and m/z tolerance $\epsilon_{lr} = 0$, as $C(x, y, \epsilon_{lr})$, and approximate the dissimilarity between any two mass spectra as
$$\delta_2(x, y, \epsilon_{lr}) = 1 - C(x, y, \epsilon_{lr}).$$
Specifications for computing cosine similarity with m/z tolerance as a parameter are provided in the Supplemental Information.
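A minimal sketch of cosine dissimilarity for low-resolution spectra follows. It assumes each spectrum is a dictionary from integer m/z to intensity, so a tolerance of 0 means peaks match only at identical m/z. It uses a plain, unweighted cosine; the exact specification used in the study is in its Supplemental Information and may differ. All names and peak values here are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Unweighted cosine similarity between two spectra given as
    {integer m/z: intensity} dictionaries (m/z tolerance of 0)."""
    dot = sum(x[mz] * y[mz] for mz in x.keys() & y.keys())
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(
        sum(v * v for v in y.values())
    )
    return dot / norm if norm else 0.0


def cosine_dissimilarity(x, y):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(x, y)


# Hypothetical centroided spectra (illustrative peaks only).
spectrum_a = {105: 3.0, 245: 4.0}
spectrum_b = {245: 4.0}
spectrum_c = {146: 1.0}
print(cosine_dissimilarity(spectrum_a, spectrum_a))
print(cosine_dissimilarity(spectrum_a, spectrum_b))
print(cosine_dissimilarity(spectrum_a, spectrum_c))
```

Identical spectra give a dissimilarity of 0, and spectra with no shared peaks give 1, which matches the bounds used for the cutoff levels discussed in the text.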
We use a two-stage approach to characterize dissimilarity between fentanyl analogs using high resolution ESI-MS/MS measurements. We first consider the absolute difference between their precursor m/z values,
$$\Delta_{ms1} = |M_1 - M_2|,$$
where $M_i$ is the precursor m/z of analog $i$ ($i = 1, 2$). If $\Delta_{ms1} > \epsilon_{hr} = 0.005$, we set the dissimilarity between the analogs as 1 (i.e., the maximum possible dissimilarity). If $\Delta_{ms1} \le \epsilon_{hr} = 0.005$, and the two analogs have at least one pair of MS² mass spectra that were collected at the same collision energy, we compute dissimilarity as
$$\delta_3(x, y) = 1 - \frac{1}{E} \sum_{i=1}^{E} C(x_i, y_i, \epsilon_{hr}),$$
where $x$ and $y$ are the sets of MS² mass spectra collected at $E > 0$ different collision energies, $x_i$ and $y_i$ are the specific mass spectra collected at the same $i$th collision energy, $\epsilon_{hr} = 0.005$ is the m/z tolerance with a high-resolution mass spectrometer, and $C$ is the same pairwise cosine similarity function employed with EI-MS mass spectra.
If $\Delta_{ms1} \le \epsilon_{hr} = 0.005$, and the compared compounds do not share any MS² spectra at the same collision energy, then $\delta_3(x, y) = 1$.
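The two-stage logic can be sketched as follows. Two points are assumptions rather than details confirmed by the text: the set-level dissimilarity is taken as one minus the mean cosine similarity over the shared collision energies, and peaks are paired greedily in m/z order when they fall within the tolerance. Function names and peak lists are hypothetical.

```python
import math


def cosine_with_tolerance(x, y, tol):
    """Cosine similarity between peak lists [(mz, intensity), ...].
    Assumed matching rule: greedy pairing in m/z order within `tol`."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    dot = 0.0
    while i < len(xs) and j < len(ys):
        diff = xs[i][0] - ys[j][0]
        if abs(diff) <= tol:
            dot += xs[i][1] * ys[j][1]
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    norm = math.sqrt(sum(v * v for _, v in xs) * sum(v * v for _, v in ys))
    return dot / norm if norm else 0.0


def esi_msms_dissimilarity(prec_a, spectra_a, prec_b, spectra_b, tol=0.005):
    """spectra_* map collision energy -> peak list."""
    # Stage 1: precursors further apart than the tolerance are maximally dissimilar.
    if abs(prec_a - prec_b) > tol:
        return 1.0
    # Stage 2: compare spectra only at collision energies both compounds share.
    shared = sorted(spectra_a.keys() & spectra_b.keys())
    if not shared:
        return 1.0
    mean_similarity = sum(
        cosine_with_tolerance(spectra_a[e], spectra_b[e], tol) for e in shared
    ) / len(shared)  # assumed: average over the shared energies
    return 1.0 - mean_similarity


# Illustrative inputs only (not library records).
peaks = [(105.0699, 50.0), (188.1434, 100.0)]
same = esi_msms_dissimilarity(337.2274, {20: peaks}, 337.2274, {20: peaks})
far = esi_msms_dissimilarity(337.2274, {20: peaks}, 351.2431, {20: peaks})
unshared = esi_msms_dissimilarity(337.2274, {20: peaks}, 337.2274, {40: peaks})
print(same, far, unshared)
```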
DART-MS is an ambient ionization mass spectrometry technique that is generally not preceded by a chromatography step. Accordingly, real-world mass spectra collected with DART-MS often contain signature ions originating from more than one compound, and spectral similarity or dissimilarity between an unknown mass spectrum and a reference spectrum of a pure compound is approximated using partial pattern matching approaches [23][24][25]. In this study, we compare pure standard library spectra to each other, which allows us to use cosine similarity (or full pattern matching) in a manner similar to Equation (3). With the DART-MS mass spectra, E = 3 for all compounds because every compound has mass spectra measured at the exact same three is-CID energies.
The second step in an AHC analysis is specifying how dissimilarity is computed between clusters, and this is independent of the method selected for measuring pairwise dissimilarities between objects (described previously). As described in [13], there are several "linkage" methods for describing the dissimilarity between clusters.
For example, we can approximate the dissimilarity between two clusters based on the average dissimilarity between all objects in the clusters, commonly referred to as a group average or unweighted pair group method with arithmetic mean (UPGMA) [26]. In this study, we use a "complete link" method, in which the dissimilarity between clusters is approximated by the maximum dissimilarity between objects in the clusters. This guarantees that the dissimilarity between any two objects in a cluster created at a specified cutoff level is bounded by the cutoff level itself.
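The difference between the two linkage rules mentioned above can be seen on a small example. The sketch below is illustrative only; the function name, the toy matrix, and the cluster memberships are hypothetical.

```python
def cluster_dissimilarity(d, a, b, method="complete"):
    """Dissimilarity between clusters a and b (lists of object indices)
    for a square dissimilarity matrix d."""
    pairs = [d[i][j] for i in a for j in b]
    if method == "complete":
        return max(pairs)  # worst-case pair
    if method == "average":
        return sum(pairs) / len(pairs)  # group average (UPGMA)
    raise ValueError(f"unknown linkage method: {method}")


# Toy dissimilarity matrix for four hypothetical objects.
toy_matrix = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
complete = cluster_dissimilarity(toy_matrix, [0, 1], [2, 3], "complete")
average = cluster_dissimilarity(toy_matrix, [0, 1], [2, 3], "average")
print(complete, average)
```

Because the complete-link value is the largest cross-cluster dissimilarity, merging only when it is at or below a cutoff keeps every pair in the merged cluster within that cutoff. The group average can fall below the cutoff even when one pair exceeds it.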

2.3 | Data and analysis details
The data used in this study was extracted from various NIST da- Articles describing how these measurements are recorded and how libraries/databases are constructed/evaluated can be found in the literature [30][31][32][33]. We only considered fentanyl analogs for which we had all four measurements in this study; details about the complete set of 197 fentanyl analogs are in Table S1.
Data analysis was conducted using a custom script prepared in the R programming language [34]; the underlying source code is available for review by contacting the corresponding author. A schematic overview of the analysis steps is provided as Figure 3.
To compute the measurement diversity using a single library, we followed the steps outlined in Figure 3A. For clustering, we used the hclust function available in base R (stats package) with a calculated dissimilarity matrix and the "complete" linkage method. Diversity calculations were done with a variety of cutoff values depending on the measurement type being considered. To compute combined diversity indices using multiple measurements, we followed the steps in Figure 3B. In this process, we need to convert traditional dissimilarity matrices into binary encoded dissimilarity matrices based on values in the original matrices. More details on these calculations are provided during the discussion of multiple measurement comparisons in Section 3.

3 | RESULTS AND DISCUSSION
For each of the 197 fentanyl analogs investigated, we performed four AHC analyses using the four measurement types discussed previously. Figure 4 shows parts of the dendrogram created using RI as the discriminating measurement, with an overlaid magnification of 24 analogs and a table discussing three analogs (norsufentanil, N-methyl cyclopropyl norfentanyl, and cyclopropyl norfentanyl).
From this, we can observe several groups consisting of two to four compounds with RI within 10 arbitrary units (a.u.) (e.g., cyclopropyl norfentanyl and N-methyl cyclopropyl norfentanyl) and a few compounds that would be uniquely identifiable with a 10 a.u. cutoff level for RI (e.g., norsufentanil). Analogous dendrograms can be generated for EI-MS dissimilarity, ESI-MS/MS MS² dissimilarity, and DART-MS is-CID mass spectral dissimilarity (figures not shown).
We studied how the measurement diversity changed for each measurement as a function of dissimilarity cutoff level (Figure 5).
The diversity of RI was never a perfect one, even with a dissimilarity level of 0 ( Figure 5A Figures 5A-D), we see that the measurement diversity is highest with DART-MS and ESI-MS/MS measurements, followed by EI-MS and RI measurements. This is expected since both DART-MS and ESI-MS/MS measurements contain information across multiple collision energies; lower energy spectra are dominated by peaks that provide insights about the intact molecule (e.g., molecular weight) while higher energy spectra generally have more peaks allowing us to infer potential fragmentation information. If only using single DART-MS measurements collected at a single low energy value, as is common in many forensic applications [35], measurement diversity drops to 0.5 ( Figure S1).
With EI-MS, mass spectra are collected at 70 eV and can contain little information about the intact molecular ion for analytes with strongly labile bonds like fentanyl analogs. Accordingly, several fentanyl analogs have indistinguishable EI mass spectra at the 0.1 dissimilarity threshold (see Figure 6).
As noted earlier in the paper, the limitations of EI-MS are well established. Given that EI-MS is usually preceded by GC, it is prudent to consider retention times (or RI) when trying to discriminate samples. We can simulate this workflow by computing a combined RI and EI-MS dissimilarity prior to an AHC analysis (see Figure 3B). Because the measurements differ in range and interpretability, it is easiest to work with binary values based on whether the individual dissimilarities are above or below specified cutoff values.
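One plausible way to binary encode and combine dissimilarities is sketched below. The combination rule is an assumption, not the formula from the study: a pair is treated as distinguishable (value 1) if at least one measurement exceeds its cutoff, and as indistinguishable (value 0) otherwise. The function name, dictionary keys, and example values are hypothetical.

```python
def combined_binary_dissimilarity(dissimilarities, cutoffs):
    """Return 1 if any measurement separates the pair, else 0.

    Assumed rule: a measurement separates a pair when its dissimilarity is
    strictly greater than that measurement's cutoff.
    """
    return int(any(dissimilarities[name] > cutoffs[name] for name in cutoffs))


# Cutoffs on each measurement's own scale (RI units; cosine dissimilarity).
cutoffs = {"ri": 10, "ei": 0.1}

# Hypothetical analog pairs.
close_pair = {"ri": 4, "ei": 0.02}       # similar on both measurements
separated_pair = {"ri": 25, "ei": 0.02}  # similar spectra, different RI
print(combined_binary_dissimilarity(close_pair, cutoffs))
print(combined_binary_dissimilarity(separated_pair, cutoffs))
```

Encoding each measurement as 0 or 1 sidesteps the mismatch in scale between RI differences and cosine dissimilarities, at the cost of discarding how far above or below the cutoff each value falls.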
In particular, we can compute the combined dissimilarity of two fentanyl analogs using RI and EI-MS as  Data Center. Measurement diversity could vary when using targeted methods specific for drug analysis [36][37][38] or using different instrumentation with varied laboratory conditions.
In addition to the quality and comprehensiveness of the measurements, the underlying mathematical approach for approximating object and cluster dissimilarity greatly affects the computed measurement diversity. In this study, we considered traditional measures of dissimilarity for comparing RI and EI mass spectra, and extensions of traditional methods for comparing MS² collected with ESI-MS/MS and is-CID mass spectra collected with DART-MS. It is possible that a specialized approach for approximating the dissimilarity of fentanyl measurements will improve the computed measurement diversity and thus the ease with which compounds can be uniquely identified.
A natural extension to this study is to characterize measurement diversity while leveraging replicate measurements per fentanyl analog. Having replicate measurements allows us to use a broader variety of mathematical tools [39][40][41][42], including machine learning [43,44]. However, it is worth noting that replicate measurements with ambient ionization mass spectrometry techniques, like DART-MS, may introduce additional measurement uncertainty that hinders their discriminatory power [42]. Furthermore, it would be fruitful to evaluate measurement diversity for other classes of drugs [45] and using measurements beyond RI and mass spectra.

FIGURE 6
Sparkline-style mass spectra of fentanyl analogs with indistinguishable electron ionization mass spectrometry measurements at a dissimilarity cutoff level of 0.1 (or similarity level of 0.9). For all plots, the x-axis range is between m/z 40 and m/z 320, the y-axis is relative intensity with a range between 0 and 1, and the base peak occurs at m/z 245. Label numbers correspond with IDs in Table S1.

4 | CONCLUSIONS
In this paper, we tried to better understand the challenge of unambiguous identification of fentanyl analogs by exploring the measurement diversity of these substances in the NIST mass spectral libraries. We defined measurement diversity through the results of AHC and were able to identify that these analytes were most diverse when measured with DART-MS. Measurement diversity improved by combining multiple measurements, but still five pairs of compounds (mostly positional isomers) were indistinguishable with the standard measurements in NIST libraries. Designing custom/targeted methods or new mathematical approaches to characterize pairwise dissimilarity (including leveraging replicate measurements) are potentially fruitful approaches that will improve our ability to identify fentanyl analogs.

ACKNOWLEDGMENTS
The authors would like to thank Dr. Gary Mallard (NIST) for sharing his insights about RI and the general challenges with compound identification using mass spectrometry.

FIGURE 7
Pairs of fentanyl analogs that were indistinguishable using measurements from the National Institute of Standards and Technology libraries and discrimination requirements of (i) absolute retention index difference of 10 units, (ii) electron ionization mass spectrometry dissimilarity of 0.1, (iii) direct analysis in real time mass spectrometry dissimilarity of 0.1 a.u., and (iv) electrospray ionization tandem mass spectrometry (ESI-MS/MS) dissimilarity of 0.1. Pairs 154/174 and 181/182 (denoted by an *) are distinguishable using ESI-MS/MS but not the other three techniques. ID numbers correspond with information provided in Table S1.

CONFLICT OF INTEREST STATEMENT
The authors have no conflicts of interest to declare.

DISCLAIMER
Official contribution of the National Institute of Standards and Technology (NIST); not subject to copyright in the United States.
Certain commercial products are identified in order to adequately specify the procedure; this does not imply endorsement or recommendation by NIST, nor does it imply that such products are necessarily the best available for the purpose.