The ability to quickly learn the fundamentals of a new infectious disease, such as how it is transmitted, its incubation period, and its symptoms, is crucial in any novel pandemic. For instance, rapid identification of symptoms can enable interventions that dampen the spread of the disease. Traditionally, symptoms are learned from research publications associated with clinical studies. However, clinical studies are often slow and time-intensive, and the resulting delays can have dire consequences in a rapidly spreading pandemic, as we have seen with COVID-19. In this paper, we introduce SymptomID, a modular AI-based framework for rapid identification of symptoms associated with novel pandemics using publicly available news reports. SymptomID builds on BERT, a state-of-the-art natural language processing model, to extract symptoms from these news reports and to cluster related symptoms together to remove redundancy. Because it builds on a pre-trained language model, the framework requires minimal training data. We present a case study of SymptomID using news articles about the current COVID-19 pandemic. Our COVID-19 symptom extraction module, trained on 225 articles, achieves an F1 score of over 0.8. SymptomID correctly identifies well-established symptoms (e.g., “fever” and “cough”) as well as less-prevalent symptoms (e.g., “rashes”, “hair loss”, and “brain fog”) associated with the novel coronavirus. We believe this framework can be extended and easily adapted in future pandemics to quickly learn insights that are fundamental to understanding and combating a new infectious disease.