Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design

Chengxi Li; Genwei Zhang; Somesh Mohapatra; Alex Callahan; Andrei Loas; Rafael Gomez-Bombarelli; Bradley Pentelute

doi:10.26434/chemrxiv-2021-dr3mf

Organic Chemistry


05 October 2021, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, we leverage machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data were collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. Our optimized ML model achieves 93% prediction accuracy and a Pearson’s r of 0.97. The predicted synthesis scores were validated to correlate with the experimental HPLC crude purities (correlation coefficient R² = 0.95). Furthermore, we demonstrate the general applicability of ML by designing synthetically accessible antisense PNA sequences from 102,315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS-CoV-2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for rational PNA sequence design.
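The abstract reports agreement between predicted synthesis scores and measured values as a Pearson's r and an R² value. A minimal sketch of how such statistics can be computed is given below. It is not the authors' evaluation code: the function names are hypothetical, the input numbers are placeholders rather than data from the paper, and treating R² as the square of Pearson's r (valid for a simple least-squares line) is an assumption about how the reported value was defined.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def r_squared_linear(xs, ys):
    """R^2 of a simple least-squares line y = a*x + b, which equals r**2."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT taken from the paper.
    predicted_scores = [0.20, 0.40, 0.60, 0.80]
    crude_purities = [35.0, 50.0, 62.0, 80.0]
    print("Pearson r:", pearson_r(predicted_scores, crude_purities))
    print("R^2:", r_squared_linear(predicted_scores, crude_purities))
```

With real data, `predicted_scores` would hold the model's synthesis scores and `crude_purities` the corresponding experimental HPLC crude purities for the same sequences.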

Keywords

Machine learning

Automated flow synthesis

Peptide nucleic acid

Synthesis optimization

Sequence design

Supplementary materials

Supporting Information: Materials and Methods, Supplementary Text, Figures S1 to S5, Tables S1 to S2, Synthetic UV-vis Traces, HPLC, LC-MS traces


Now Published

Chengxi Li, Genwei Zhang, Somesh Mohapatra, Alex J. Callahan, Andrei Loas, Rafael Gómez‐Bombarelli, Bradley L. Pentelute. Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design. Advanced Science, Volume 9, Issue 34.

Online publication date: Oct 21, 2022

Version History

Oct 05, 2021 Version 1

Metrics

Views: 1,869

Downloads: 842

License

The content is available under CC BY-NC-ND 4.0.


Funding

Novo Nordisk

MIT Koch Institute School of Science Fellowship in Cancer Research

MIT-Takeda Fellowship program

Author’s competing interest statement

Bradley Pentelute is a co-founder of Amide Technologies and Resolute Bio. Both companies focus on developing protein and peptide therapeutics.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
