Biological and Medicinal Chemistry

Text and Network-Mining for COVID-19 Intervention Studies

Thomas Joseph, Tata Consultancy Services Limited (TCSL)


Background: The COVID-19 pandemic has led to a massive, collective effort by the research community to find effective diagnostics, drugs and vaccines. The large and growing body of literature in MEDLINE and other online resources, including various self-archive sites, is invaluable for these efforts. MEDLINE has more than 30 million abstracts, and an additional corpus related to COVID-19, SARS and MERS has more than 40,000 articles; both numbers are growing. Automated extraction of useful information from the literature and automated generation of novel insights are crucial for accelerated discovery of drug/vaccine targets and drug re-purposing candidates.

Methods: We applied text mining to MEDLINE abstracts and the CORD-19 corpus to extract a rich set of pair-wise correlations between various biomedical entities. We built a comprehensive pair-wise entity association network involving 15 different entity types, using both text-mined associations and novel associations obtained by link prediction. The resulting network, which we call CoNetz, also contains a specialized COVID-19 subnetwork that provides a network view of COVID-19 related literature. Additionally, we developed a set of network exploration utilities and user-friendly network visualization utilities using NetworkX and PyVis.
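The two steps described above, building a pair-wise association network from co-occurring entities and then proposing unseen associations by link prediction, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy input, the entity labels and the function names `build_network` and `jaccard_candidates` are hypothetical, and the Jaccard neighbor-overlap score is only a stand-in, since the abstract does not state which link-prediction method was used. The sketch uses only the Python standard library to stay self-contained; the same adjacency data could be loaded into a NetworkX graph for exploration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each item is the set of entities tagged in one abstract.
# The labels are placeholders, not real text-mining output.
abstracts = [
    {"disease_X", "gene_A"},
    {"disease_X", "gene_B"},
    {"drug_1", "gene_A"},
    {"drug_1", "gene_B"},
    {"drug_2", "gene_B"},
    {"disease_X", "gene_A"},
]


def build_network(docs):
    """Return an adjacency map: entity -> {neighbor: co-occurrence count}."""
    adj = defaultdict(lambda: defaultdict(int))
    for entities in docs:
        # Every pair of entities seen in the same abstract gets one more count.
        for a, b in combinations(sorted(entities), 2):
            adj[a][b] += 1
            adj[b][a] += 1
    return adj


def jaccard_candidates(adj):
    """Rank currently unlinked entity pairs by Jaccard overlap of their neighbors."""
    scores = {}
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already a text-mined association, nothing to predict
        neighbors_a, neighbors_b = set(adj[a]), set(adj[b])
        shared = neighbors_a & neighbors_b
        if shared:
            scores[(a, b)] = len(shared) / len(neighbors_a | neighbors_b)
    # Highest-scoring candidate associations first.
    return sorted(scores.items(), key=lambda item: -item[1])


network = build_network(abstracts)
ranked = jaccard_candidates(network)

for (a, b), score in ranked:
    print(f"{a} -- {b}: {score:.2f}")
```

In this toy data, `disease_X` and `drug_1` never co-occur in an abstract but share all of their neighbors, so the pair ranks first as a candidate association. A real pipeline would also need entity recognition, normalization and some statistical filtering of co-occurrences, none of which is shown here.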

Results: CoNetz consisted of pair-wise associations involving 174,000 entities covering 15 different entity types. The specialized COVID-19 subnetwork consisted of 7.8 million pair-wise associations involving 43,000 entities. The network captured several of the well-known COVID-19 drug re-purposing candidates and also predicted novel candidates including ingavirin, laninamivir, nevirapine, paritaprevir, pranlukast and peficitinib.

Conclusions: Our automated text and network-mining approach builds an up-to-date and comprehensive knowledge network from literature for COVID-19 studies. The wide range of entity types captured in CoNetz provides a rich neighborhood context around the relations of interest. The approach avoids several drawbacks of manual curation, including the cost and effort involved, out-of-date information and limited coverage. Among the novel re-purposing candidates predicted, laninamivir and paritaprevir are possible COVID-19 anti-viral drugs, while pranlukast is postulated to be a candidate for managing severe respiratory symptoms in COVID-19 patients. CoNetz is available for download and use from



Supplementary material

COVID19_SupplementaryMaterial.pdf (0.04 MB)