Leveraging Unlabeled Clinical Data to Boost Performance of Risk Stratification Models for Suspected Acute Coronary Syndrome

AMIA Annu Symp Proc. 2024 Jan 11;2023:744-753. eCollection 2023.

ABSTRACT

The performance of deep learning models in the health domain is severely limited by the scarcity of labeled data, especially for specific clinical-domain tasks. Conversely, large amounts of unlabeled clinical data are available and could be exploited to improve deep learning models whose labeled training data are limited. This paper investigates the use of task-specific unlabeled data to boost the performance of classification models for the risk stratification of suspected acute coronary syndrome. By leveraging large numbers of unlabeled clinical notes in task-adaptive language model pretraining, valuable task-specific prior knowledge can be acquired. Task-specific fine-tuning of these pretrained models with limited labeled data then yields better performance. Extensive experiments demonstrate that language models pretrained on task-specific unlabeled data can significantly improve the performance of downstream models for specific classification tasks.

PMID:38222439 | PMC:PMC10785873
Source: AMIA Annual Symposium Proceedings - Category: Bioinformatics - Source Type: research
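
The abstract describes a two-stage recipe: pretrain a language model on unlabeled clinical notes, then fine-tune it on the small labeled set. It does not state which pretraining objective was used. The sketch below assumes masked language modeling, a common choice for BERT-style models, and shows only how self-supervised training pairs could be built from an unlabeled note. The `MASK` symbol, the `mask_tokens` helper, the 0.3 masking rate, and the sample note are all hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Build one masked-language-model training pair from unlabeled tokens.

    Each token is replaced by MASK with probability mask_prob.
    labels holds the original token at masked positions and None elsewhere,
    so a model would only be scored on the positions it has to reconstruct.
    """
    if rng is None:
        rng = random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# An invented note fragment standing in for an unlabeled clinical note.
note = "patient presents with chest pain radiating to left arm".split()
masked, labels = mask_tokens(note, mask_prob=0.3, rng=random.Random(42))
print(masked)
print(labels)
```

No risk labels are needed to build these pairs, which is why the large pool of unlabeled notes can be used in the first stage. The limited labeled data is needed only in the second stage, when a classification head is fine-tuned on the pretrained model.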