Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text

AbstractElectronic Medical Records (EMRs) are written in an unstructured way, often using natural language. Information Extraction (IE) may be used for acquiring knowledge from such texts, including the automatic recognition of meaningful entities, through models  for Named Entity Recognition (NER). However, while most work on the previous was made for English, this experience aimed at testing different methods in Portuguese text, more precisely, on the domain of Neurology, and take some conclusions. This paper comprised the comparison between Conditional R andom Fields (CRF), bidirectional Long Short-term Memory - Conditional Random Fields (BiLSTM-CRF) and a BiLSTM-CRF with residual learning connections, using not only Portuguese texts from medical journals but also texts from the Coimbra Hospital and Universitary Centre (CHUC) Neurology Service. Furt hermore, the performances of BiLSTM-CRF models using word embeddings (WEs) trained with clinical text and WEs trained with general language texts were compared. Deep learning models achieved F1-Scores of nearly 83% and 75%, respectively for relaxed and strict evaluation, on texts extracted from the medical journal. For texts collected from the Hospital, the same achieved F1-Scores of nearly 71% and 62%. This work concludes that deep learning models outperform the shallow learning models and that in-domain WEs get better results than general language WEs, even when the latter are trained with m uch more text than the former. Fu...
Source: Journal of Medical Systems - Category: Information Technology Source Type: research