Scalable Electronic Phenotyping For Studying Patient Comorbidities.

AMIA Annu Symp Proc. 2018;2018:740-749

Authors: Ling AY, Alsentzer E, Chen J, Banda JM, Tamang S, Minty E

Abstract: Over 75 million Americans have multiple concurrent chronic conditions, and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved performance comparable to that of models created using keywords as the labeling heuristic. We found that expert input and larger training sample sizes could compensate for the lack of clinical text derived features. However, our best performing model included clinical text as features with a large training sample size.

PMID: 30815116 [PubMed - indexed for MEDLINE]
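The two steps named in the abstract (labeling patients with an ICD9-based heuristic, then training a LASSO logistic regression on those imperfect labels) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the ICD-9 prefix `410` as the MI heuristic, the two binary features `troponin_test_ordered` and `annual_checkup`, the toy cohort counts, the hyperparameters, and the use of plain proximal gradient descent as the solver.

```python
import math

def noisy_label(icd9_codes, prefix="410"):
    """Hypothetical heuristic: any code under ICD-9 410 (acute MI) => positive."""
    return 1 if any(code.startswith(prefix) for code in icd9_codes) else 0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, l1=0.02, lr=0.5, epochs=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step followed by soft-thresholding; the bias is not penalized.
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy cohort (assumed data): (ICD-9 codes, [troponin_test_ordered, annual_checkup], count).
# The ICD-9 codes are used only to derive labels, never as model features.
groups = [
    (["410.71"], [1, 1], 20), (["410.71"], [1, 0], 20),
    (["410.71"], [0, 1], 5),  (["410.71"], [0, 0], 5),
    (["401.9"],  [1, 1], 5),  (["401.9"],  [1, 0], 5),
    (["401.9"],  [0, 1], 20), (["401.9"],  [0, 0], 20),
]
X, y = [], []
for codes, features, count in groups:
    for _ in range(count):
        X.append(features)
        y.append(noisy_label(codes))

weights, bias = fit_lasso_logistic(X, y)
print("weights:", weights, "bias:", bias)
```

In this toy cohort the first feature tracks the heuristic label and the second is independent of it, so the L1 penalty should leave a clearly positive weight on the first and shrink the second toward zero. A real pipeline would more likely rely on an established solver and on far larger, sparser feature sets, including features derived from clinical text.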
Source: AMIA Annual Symposium Proceedings