Analysis of nanopore data: classification strategies for an unbiased curation of single-molecule events from DNA nanostructures

12 April 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Nanopores are versatile single-molecule sensors that are being used to sense increasingly complex mixtures of structured molecules, with applications in molecular data storage and disease biomarker detection. However, increased molecular complexity presents additional challenges to the analysis of nanopore data, including more translocation events being rejected for not matching an expected signal structure and a greater risk of selection bias entering the event curation process. To highlight these challenges, we present the analysis of a model molecular system consisting of a nanostructured DNA molecule attached to a linear DNA carrier. We make use of recent advances in the event segmentation capabilities of Nanolyzer, a graphical analysis tool provided for nanopore event fitting, and describe approaches to event substructure analysis. In the process, we identify and discuss important sources of selection bias that emerge in the analysis of this molecular system and consider the complicating effects of molecular conformation and variable experimental conditions (e.g. pore diameter). We then present additional refinements to existing analysis techniques, allowing for improved separation of multiplexed samples, fewer translocation events rejected as false negatives, and a wider range of experimental conditions for which accurate molecular information can be extracted. Increasing the coverage of analyzed events within nanopore data is not only important for characterizing complex molecular samples with high fidelity, but is also becoming essential to the generation of accurate, unbiased training data as machine learning approaches to data analysis and event identification continue to increase in prevalence.

Supplementary materials

Supporting Information

Section S1: Configuration Details of Nanolyzer Analysis
Section S2: List of Metadata Used in Nanolyzer Data Manager (“Data Dictionary”)
Section S3: Selection Filters Used – List of SQL Queries
Section S4: Removing Type 0 Events before Sublevel Clustering
Section S5: First Ten Events from Each Category of Table 1 / Figure 4
Section S6: Comparing ECD Distributions of Type 0 vs. Type 1/2 Events
Section S7: 1D Histograms of ECD/TrECD for Events of Figure 6
Section S8: Control – Two Carrier + Star Pairs Run Separately and Together on a Single Pore
Section S9: Sample Events of 2 kbp + 6/12-Arm Stars through Smaller (8-nm) Pore
Section S10: Filters A & B Applied to 2 kbp + 6/12-Arm Stars through Larger (12-nm) Pore
Section S11: Rigid Sorting Approach on Events from 2 kbp + 6/12-Arm Stars in Figure 7
