A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Conclusion

Based on the evaluation results, we can draw the following conclusions. First, the word embeddings trained on EHR and MedLit capture the semantics of medical terms better, and retrieve semantically relevant medical terms closer to human experts' judgments, than the publicly released GloVe and Google News embeddings. Second, there is no consistent global ranking of word embeddings across all downstream biomedical NLP applications; however, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on biomedical-domain corpora do not necessarily outperform those trained on general-domain corpora for every downstream biomedical NLP task.
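The intrinsic comparison described above (checking whether an embedding's nearest neighbors for a medical term match expert judgments) boils down to a cosine-similarity neighbor query. A minimal sketch of that query, using invented toy vectors and vocabulary rather than anything from the paper's corpora:

```python
import numpy as np

# Toy word vectors standing in for a pretrained embedding table.
# The vocabulary and values are purely illustrative assumptions.
embeddings = {
    "diabetes": np.array([0.90, 0.10, 0.00]),
    "insulin":  np.array([0.80, 0.20, 0.10]),
    "glucose":  np.array([0.85, 0.15, 0.05]),
    "guitar":   np.array([0.00, 0.10, 0.95]),
}

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(term, k=2):
    # Rank all other vocabulary terms by similarity to `term`.
    q = embeddings[term]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != term]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

neighbors = nearest("diabetes")
# With these toy vectors, the medically related terms rank above "guitar".
```

An embedding that places "glucose" and "insulin" nearest to "diabetes", as this toy table does, would score well against expert judgments; the paper's finding is that EHR- and MedLit-trained embeddings produce such rankings more reliably than general-domain ones.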
Source: Journal of Biomedical Informatics - Category: Information Technology Source Type: research