A Phenotype-driven COVID-19 Knowledge Graph from Biomedical Literature Drives Hypothesis Generation
Since December 2019, the scientific community has experienced a literature explosion regarding the novel coronavirus originating in Wuhan, China. As a result, it has become increasingly difficult for researchers in the field to stay informed about new developments in the published corpus. To address this problem and help researchers collect, analyze, and organize the vast amount of information, we created a knowledge graph (KG) cataloguing the relationships between entities as evidenced by papers in the COVID-19 Open Research Dataset (CORD-19). We trained an embedding model to apply the KG to downstream tasks such as predicting new treatments, symptoms, and risk factors for COVID-19. The embedding model achieved a classification accuracy above 70%, with hits@10 of 0.61 and 0.18 depending on the expansiveness of the KG. Furthermore, we created an interactive web application that allows researchers to explore the KG and form novel questions. In conclusion, our KG compiles and extracts COVID-19 information useful for developing diagnostics and treatments. The web application is available at http://covid19nlp.wglab.org:3001/.
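For readers unfamiliar with the hits@10 metric reported above, the following is a minimal sketch of how hits@k is conventionally computed in link prediction: the fraction of test triples for which the true entity is ranked among the top k candidates. The function name `hits_at_k` and the example ranks are hypothetical and chosen for illustration only; they are not taken from our evaluation code or results.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Return the fraction of 1-based ranks that are less than or equal to k.

    Each rank is the position of the true entity among all scored
    candidates for one test triple (1 means ranked first).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for ten test triples (illustrative, not real results).
example_ranks = [1, 3, 12, 7, 50, 10, 2, 25, 4, 11]

# Six of the ten ranks are <= 10, so hits@10 is 6 / 10.
score = hits_at_k(example_ranks, k=10)
print(f"hits@10 = {score:.2f}")
```

Under this definition, a hits@10 of 0.61 would mean that the correct entity appeared in the top ten predictions for 61% of the held-out triples.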