A Robust Parallel Computing Data Extraction Framework for Nanopore Experiments

27 November 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The success of a nanopore experiment relies not only on the quality of the experimental design but also on the performance of the analysis program used to decipher the ionic current perturbations from which the underlying molecular behavior is inferred. We have developed a data extraction and analysis framework that leverages parallel computing, efficient memory management (minimizing data aggregation), and vectorization. The open-seek-read-close data loading architecture, running on multiple cores, underpins the swift analysis of large files: an approximately 18× improvement was found for a 100-minute-long file (~4.5 GB in size) compared to the more traditional single (cell) array data loading method. The proposed application was benchmarked against five other analysis platforms and was faster than each of them (>6× to 1120×). The integrated provisions for batch analysis enable concurrent analysis of multiple files, a crucial capability notably absent in most existing analysis platforms. The batch-analysis feature is particularly vital for high-bandwidth experiments, wherein data is distributed across several files rather than consolidated into a single large file. Furthermore, the application is equipped with multi-level data fitting based on abrupt changes in the waveform. The ability to condense the extracted events into a single file improves data portability (e.g., a 16 GB file acquired at 200 kHz with 28,182 events reduces to 47.9 MB, a 343× reduction in size) and enables a multitude of post-analysis extractions to be done efficiently. In summary, the use of parallel computing, efficient memory management, and vectorized operations has led to a fast analysis platform that is well suited to the analysis of multiple and sizeable nanopore data files.
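
The open-seek-read-close loading pattern named in the abstract can be illustrated with a minimal Python sketch. This is not the NanoPlex MATLAB code, which is provided only in the supplementary material. The file layout (a headerless stream of 64-bit floats) and the function names `read_chunk`, `summarize_chunk`, `chunk_bounds`, and `parallel_summary` are all assumptions made for illustration. Each worker opens the file, seeks to its own offset, reads one chunk, closes the file, and returns only a small summary, so the full trace is never aggregated in one place.

```python
import os
from array import array
from concurrent.futures import ThreadPoolExecutor

# Assumed file format: raw float64 samples with no header.
ITEM_SIZE = array("d").itemsize


def read_chunk(path, start, count):
    """Open the file, seek to sample `start`, read `count` samples, close."""
    samples = array("d")
    with open(path, "rb") as fh:        # open
        fh.seek(start * ITEM_SIZE)      # seek
        samples.fromfile(fh, count)     # read
    return samples                      # closed on leaving the `with` block


def summarize_chunk(job):
    """Reduce one chunk to (start, sum, min) instead of returning raw data."""
    path, start, count = job
    samples = read_chunk(path, start, count)
    return start, sum(samples), min(samples)


def chunk_bounds(n_samples, n_chunks):
    """Split [0, n_samples) into contiguous (start, count) ranges."""
    base, extra = divmod(n_samples, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        count = base + (1 if i < extra else 0)
        if count:
            bounds.append((start, count))
        start += count
    return bounds


def parallel_summary(path, n_chunks=4, executor_cls=ThreadPoolExecutor):
    """Return (mean, minimum) of the trace, computed chunk by chunk."""
    n_samples = os.path.getsize(path) // ITEM_SIZE
    if n_samples == 0:
        raise ValueError("file contains no samples")
    jobs = [(path, s, c) for s, c in chunk_bounds(n_samples, n_chunks)]
    with executor_cls(max_workers=n_chunks) as pool:
        results = list(pool.map(summarize_chunk, jobs))
    total = sum(r[1] for r in results)
    return total / n_samples, min(r[2] for r in results)
```

The sketch uses a thread pool so that it runs anywhere without extra setup. For CPU-bound per-chunk work, passing `executor_cls=concurrent.futures.ProcessPoolExecutor` from a script guarded by `if __name__ == "__main__":` would spread the chunks across cores, which is closer to the multi-core setup the abstract describes.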

Keywords

Nanopore
Fast-Analysis
Parallel-Analysis

Supplementary materials

Title: NanoPlex Supplementary Material
Description: This folder contains the MATLAB codes/functions and the video tutorial of the NanoPlex application