Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design

05 October 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, we here leverage machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data were collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. Our optimized ML model achieves 93% prediction accuracy and a Pearson’s r of 0.97. The predicted synthesis scores correlated with the experimental HPLC crude purities (correlation coefficient R² = 0.95). Furthermore, we demonstrated the general applicability of ML by designing synthetically accessible antisense PNA sequences from 102,315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS-CoV-2, HIV, and selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for rational PNA sequence design.
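
The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It enumerates every fixed-length window of a target sequence, writes the antisense (reverse-complement) sequence for each window, and ranks the candidates with a scoring function. Everything named here is an assumption made for illustration: the function names, the DNA-alphabet representation of PNA sequences, the toy target, and above all `placeholder_score`, which is a simple stand-in heuristic and not the trained model reported in this work.

```python
from typing import Callable, List, Tuple

# Watson-Crick complement table (DNA alphabet; assumed representation).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_candidates(target: str, length: int) -> List[Tuple[int, str]]:
    """Return (start, antisense) for every window of `length` bases in `target`."""
    target = target.upper()
    candidates = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        # Reverse complement of the window is the antisense candidate.
        candidates.append((start, window.translate(_COMPLEMENT)[::-1]))
    return candidates


def placeholder_score(seq: str) -> float:
    """Hypothetical stand-in score: 1 minus the purine fraction.

    This is NOT the ML model from the paper; it only makes the
    ranking step runnable. Swap in a real predictor here.
    """
    purines = sum(base in "AG" for base in seq)
    return 1.0 - purines / len(seq)


def rank_candidates(
    target: str,
    length: int,
    score: Callable[[str], float] = placeholder_score,
) -> List[Tuple[int, str]]:
    """Sort candidates from highest to lowest score (stable for ties)."""
    candidates = antisense_candidates(target, length)
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


if __name__ == "__main__":
    toy_target = "ATGGCCTTAGC"  # made-up sequence, not from the paper
    for start, seq in rank_candidates(toy_target, 6)[:3]:
        print(start, seq, round(placeholder_score(seq), 2))
```

Passing the scorer as a parameter keeps the enumeration independent of the predictor, so a trained model could replace the placeholder without changing the rest of the sketch.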

Keywords

Machine learning
Automated flow synthesis
Peptide nucleic acid
Synthesis optimization
Sequence design

Supplementary materials

Supporting Information: Materials and Methods, Supplementary Text, Figures S1 to S5, Tables S1 to S2, Synthetic UV-vis Traces, HPLC, LC-MS traces
