Text and Network-Mining for COVID-19 Intervention Studies

06 May 2020, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Background: The COVID-19 pandemic has led to a massive, collective effort by the research community to find effective diagnostics, drugs and vaccines. The large and growing body of literature in MEDLINE and other online resources, including various self-archiving sites, is invaluable for these efforts. MEDLINE has more than 30 million abstracts, an additional corpus related to COVID-19, SARS and MERS has more than 40,000 articles, and both numbers are growing. Automated extraction of useful information from the literature, together with automated generation of novel insights, is crucial for accelerating the discovery of drug and vaccine targets and of drug repurposing candidates.

Methods: We applied text-mining to MEDLINE abstracts and the CORD-19 corpus to extract a rich set of pair-wise correlations between various biomedical entities. We built a comprehensive pair-wise entity association network covering 15 different entity types, using both text-mined associations and novel associations obtained through link prediction. The resulting network, which we call CoNetz, also contains a specialized COVID-19 subnetwork that provides a network view of the COVID-19-related literature. Additionally, we developed user-friendly network exploration and visualization utilities using NetworkX and PyVis.
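A minimal sketch of the general idea behind a text-mined pair-wise association network, using NetworkX: entities found in the same abstract are linked, edge weights count co-occurring documents, and a COVID-19-centred subnetwork is taken as the query node plus its direct neighbours. This is not the authors' pipeline. The toy abstracts, the `build_cooccurrence_network` helper, the `min_count` threshold and the one-hop definition of the subnetwork are all hypothetical, and entity recognition is assumed to have already happened upstream.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

# Hypothetical toy input: each abstract is reduced to the set of
# (entity, entity_type) pairs an upstream entity tagger found in it.
abstracts = [
    {("COVID-19", "disease"), ("ACE2", "gene"), ("remdesivir", "drug")},
    {("COVID-19", "disease"), ("remdesivir", "drug"), ("RdRp", "protein")},
    {("SARS", "disease"), ("ACE2", "gene"), ("chloroquine", "drug")},
]


def build_cooccurrence_network(docs, min_count=1):
    """Link two entities if they co-occur in at least min_count documents (hypothetical helper)."""
    pair_counts = Counter()
    entity_type = {}
    for doc in docs:
        for name, etype in doc:
            entity_type[name] = etype
        # Sort names so each unordered pair always maps to the same key.
        names = sorted(name for name, _ in doc)
        for a, b in combinations(names, 2):
            pair_counts[(a, b)] += 1

    graph = nx.Graph()
    for name, etype in entity_type.items():
        graph.add_node(name, type=etype)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            # Edge weight = number of abstracts mentioning both entities.
            graph.add_edge(a, b, weight=count)
    return graph


G = build_cooccurrence_network(abstracts)

# Assumed definition of a COVID-19 subnetwork: the query node plus its 1-hop neighbours.
covid_sub = nx.ego_graph(G, "COVID-19", radius=1)
print(sorted(covid_sub.nodes()))
```

Raw co-occurrence counts are the simplest association signal. A real system would more likely normalise them, for example with a statistical association score, so that very frequent entities do not dominate.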

Results: CoNetz consisted of pair-wise associations involving 174,000 entities covering 15 different entity types. The specialized COVID-19 subnetwork consisted of 7.8 million pair-wise associations involving 43,000 entities. The network captured several well-known COVID-19 drug repurposing candidates and also predicted novel candidates, including ingavirin, laninamivir, nevirapine, paritaprevir, pranlukast and peficitinib.
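Predicting novel repurposing candidates amounts to scoring drug nodes that are not yet linked to the disease node. The keywords point to graph-convolution-based link prediction, which is not reproduced here. Instead, this sketch swaps in the classic Adamic-Adar neighbourhood heuristic (`nx.adamic_adar_index`) purely to illustrate the ranking step. The toy graph, the `drugA`/`drugB`/`drugC` names and the `rank_candidates` helper are hypothetical.

```python
import networkx as nx

# Hypothetical toy association network; the "type" attribute marks entity types.
edges = [
    ("COVID-19", "ACE2"), ("COVID-19", "TMPRSS2"), ("COVID-19", "IL6"),
    ("drugA", "ACE2"), ("drugA", "TMPRSS2"),
    ("drugB", "IL6"),
    ("drugC", "unrelated_gene"),
]
types = {
    "COVID-19": "disease", "ACE2": "gene", "TMPRSS2": "gene", "IL6": "gene",
    "unrelated_gene": "gene", "drugA": "drug", "drugB": "drug", "drugC": "drug",
}

G = nx.Graph(edges)
nx.set_node_attributes(G, types, "type")


def rank_candidates(graph, query, target_type, top_k=5):
    """Rank unlinked nodes of target_type by Adamic-Adar score to query (stand-in heuristic)."""
    candidates = [
        n for n, t in graph.nodes(data="type")
        if t == target_type and n != query and not graph.has_edge(query, n)
    ]
    # Adamic-Adar: sum of 1 / log(degree) over shared neighbours.
    scores = nx.adamic_adar_index(graph, [(query, c) for c in candidates])
    ranked = sorted(((c, s) for _, c, s in scores), key=lambda cs: cs[1], reverse=True)
    return ranked[:top_k]


ranking = rank_candidates(G, "COVID-19", "drug")
print(ranking)
```

Here `drugA` shares two gene neighbours with `COVID-19` and ranks first, while `drugC` shares none and scores zero. A learned model such as a graph convolutional network would replace the hand-crafted score but could plug into the same candidate-ranking step.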

Conclusions: Our automated text and network-mining approach builds an up-to-date and comprehensive knowledge network from the literature for COVID-19 studies. The wide range of entity types captured in CoNetz provides a rich neighborhood context around the relations of interest. The approach avoids several drawbacks of manual curation, including the cost and effort involved, lack of up-to-date information and limited coverage. Among the novel repurposing candidates predicted, laninamivir and paritaprevir are possible COVID-19 antiviral drugs, while pranlukast has been postulated as a candidate for managing severe respiratory symptoms in COVID-19 patients. CoNetz is available for download from https://web.rniapps.net/tcn/tcn.tar.gz


Biomedical text-mining
Graph convolution
Link-prediction techniques

Supplementary materials

COVID19 SupplementaryMaterial

