Abstract
Conversational AI now makes the processing of high-dimensional, complex monoclonal antibody (mAb) bioprocess data in industry more efficient. A human-in-the-loop approach to Large Language Model (LLM) inference with document retrieval and chained outputs is a probable benefit to existing biotechnology workflows. Potential risks of using natural language processing are minimized because problems involving vast amounts of mixed structured and unstructured data can be solved and verified by the human-AI team. This novel work demonstrates fast, state-of-the-art solutions from the o1-preview, ChatGPT-4o, L3.1-405B, and 3.5 Sonnet models. Specifically, o1-preview responded to 16 papers 110x faster than the manuscript author once the word counts were set equal. In addition, ChatGPT-4o was 371x faster than an optimal human researcher at examining a recent paper by Kao, M., et al. and providing an estimate regarding dimension reduction or combinatorial optimization. A third speed advantage, 336x for ChatGPT-4o versus the manuscript author, was achieved for performance forecasts based on Monte Carlo simulations and Markov chain models applied to a current paper by Konoike, F., et al. Part A featured the individual analysis of 5 recent mAb production papers, which emphasized the proficiency of o1-preview (9.9/10.0), ChatGPT-4o (9.2), and L3.1-405B (9.2) in providing a forecast report. Example generations for o1-preview and L3.1-405B typically established connections between using dimension reduction or combinatorial optimization and improving bioprocesses. In Part B, the models generated tables on how LLMs can improve numerical data from 5 different papers using Monte Carlo simulations or Markov chain models. An example from ChatGPT-4o (9.0) was substantially more complete, accurate, and convincing than the table provided by 3.5 Sonnet (8.0).
Part C combined the report format from Part A with the numerical approach from Part B across 6 additional papers, led by o1-preview (9.0) and ChatGPT-4o (8.5). The o1-preview example followed the prompt format well, citing cases of how LLMs will utilize reinforcement learning and Bayesian optimization to improve mAb production. The work represents a standard for utilizing a considerable amount of bioprocess data to forecast new results; the transition to LLMs provides near-real-time analysis of production data, aided by document retrieval, in synergy with existing machine learning techniques.
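The speed advantages quoted above (110x, 371x, 336x) are described as ratios taken after the number of words was set equal. The abstract does not give the exact calculation, so the following is only a minimal sketch of one plausible reading: compare throughput in words per minute, which is equivalent to comparing times once both outputs are scaled to the same word count. The function names and all numeric inputs are hypothetical and are not taken from the study.

```python
def words_per_minute(word_count: int, minutes: float) -> float:
    """Throughput of a writer (human or LLM) in words per minute."""
    if word_count <= 0 or minutes <= 0:
        raise ValueError("word_count and minutes must be positive")
    return word_count / minutes


def normalized_speedup(human_words: int, human_minutes: float,
                       llm_words: int, llm_minutes: float) -> float:
    """Word-normalized speedup of the LLM over the human.

    Dividing throughputs is the same as dividing the human time by the
    LLM time after both are rescaled to an equal number of words.
    This is an assumed definition, not the one confirmed by the paper.
    """
    human_rate = words_per_minute(human_words, human_minutes)
    llm_rate = words_per_minute(llm_words, llm_minutes)
    return llm_rate / human_rate


if __name__ == "__main__":
    # Hypothetical example: a human writes 2000 words in 400 minutes,
    # an LLM generates 500 words in 1 minute.
    ratio = normalized_speedup(2000, 400.0, 500, 1.0)
    print(f"Word-normalized speedup: {ratio:.0f}x")
```

Normalizing by word count keeps a short LLM answer from looking artificially fast next to a longer human-written analysis.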
Supplementary materials
Title
Monoclonal Antibody SA1 Supplementary
Description
The supplementary file contains Part A1: 28 single-prompt generations for 4 models across 5 papers.
Title
Monoclonal Antibody SA2 Supplementary
Description
The supplementary file contains Part A2: 4 single-prompt generations for 4 models regarding a single 5-paper summary.
Title
Monoclonal Antibody SOH Supplementary
Description
The supplementary file contains Part OH: 3 single-prompt generations for 1 model regarding a single paper.
Title
Monoclonal Antibody SB1 Supplementary
Description
The supplementary file contains Part B1: 28 single-prompt generations for 4 models across 5 papers.
Title
Monoclonal Antibody SB2 Supplementary
Description
The supplementary file contains Part B2: 4 single-prompt generations for 4 models regarding a single 5-paper summary.
Title
Monoclonal Antibody SC1 Supplementary
Description
The supplementary file contains Part C1: 32 single-prompt generations for 4 models across 6 papers.
Title
Monoclonal Antibody SC2 Supplementary
Description
The supplementary file contains Part C2: 4 single-prompt generations for 4 models regarding a single 6-paper summary.