EventEpi—A natural language processing framework for event-based surveillance

by Auss Abbood, Alexander Ullrich, Rüdiger Busche, Stéphane Ghozzi

According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of public health agents sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. First, we scraped relevant sources for EBS as done at the RKI (WHO Disease Outbreak News and ProMED) and automatically extracted the articles’ key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps: EpiTator, an open-source epidemiological annotation tool, suggested many different possibilities for each. We extracted the key country and disease using a heuristic with good results. We trained a naive Bayes classifier to find the key date and confirmed-case count, using the RKI’s EBS database as labels, which performed modestly. Then, for relevance scoring, we defined two classes to which any article might belong: Th...
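The abstract does not say which heuristic selects the key country and disease from the many candidates an annotator suggests. As a rough illustration only, the sketch below assumes a "most frequent mention wins" rule; the function name `pick_key_entity` and the plain list-of-strings input are hypothetical and do not reflect EpiTator's actual output format or the authors' implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def pick_key_entity(candidates: Sequence[str]) -> Optional[str]:
    """Return the most frequently mentioned candidate, or None if there are none.

    Assumption: frequency of mention stands in for the unspecified heuristic.
    Candidates are compared case-insensitively; ties go to the earliest mention.
    """
    if not candidates:
        return None
    counts = Counter(c.strip().lower() for c in candidates)
    # most_common() lists equal counts in order of first appearance.
    return counts.most_common(1)[0][0]


# Hypothetical candidate mentions extracted from one article.
key_country = pick_key_entity(["Nigeria", "Ghana", "Nigeria"])
```

A frequency rule like this needs no training data, which is one plausible reason to prefer a heuristic over a classifier for entities that an article tends to repeat.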
Source: PLoS Computational Biology - Category: Biology - Source Type: research