Imputing race and ethnicity in healthcare claims databases

Abstract: Our objective was to enhance existing methods for indirectly estimating race/ethnicity in health care data by exploring ways to improve imputation accuracy with a total of 9,812,306 hospital visits from the Connecticut statewide hospitalization claims database from 2012 to 2017. Using this data, we developed multinomial logistic regression models to predict patients’ race and ethnicity when assuming that 50% of race/ethnicity is missing completely at random. Our models included predictors derived from Connecticut birth records, US Census data, and demographic patient-level data, and were compared using performance measures. Our model correctly classified the race and ethnicity of approximately 85% of patients in the Connecticut hospitalization claims data. We found the following [sensitivities and specificities] for our five race/ethnicity categories: non-Hispanic White [94, 83], non-Hispanic Black [76, 97], non-Hispanic Asian or Pacific Islander [41, 99.6], Hispanic [87, 95], and non-Hispanic other race [5, 99.7]. First name, surname, census tract and insurance type were key predictors. Further, Connecticut-specific name dictionaries were better at identifying non-White race and ethnicity compared to the national 2010 US Census surname dictionary. Therefore, state-specific health records, census information, and patients’ demographic characteristics can be utilized to improve the prediction of missing racial and ethnic information in Connecticut hospitali...
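The per-category sensitivities and specificities quoted in the abstract are one-vs-rest measures: each category in turn is treated as the "positive" class and all others as "negative". The following is a minimal sketch of that calculation in plain Python, not the authors' code. The function name `sensitivity_specificity` and the toy labels are assumptions made up for illustration and are not data from the study.

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred, labels):
    """Return {label: (sensitivity, specificity)} using one-vs-rest counts."""
    # Count how often each (true, predicted) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    total = len(y_true)
    results = {}
    for label in labels:
        tp = pairs[(label, label)]  # true label and predicted label both match
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        fp = sum(n for (t, p), n in pairs.items() if t != label and p == label)
        tn = total - tp - fn - fp
        # Guard against categories that never occur (or always occur).
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sens, spec)
    return results


# Toy labels, invented for illustration only (not data from the study).
y_true = ["White", "White", "White", "Black", "Black",
          "Hispanic", "Hispanic", "Asian/PI"]
y_pred = ["White", "White", "Black", "Black", "White",
          "Hispanic", "Hispanic", "White"]

labels = ["White", "Black", "Hispanic", "Asian/PI"]
for label, (sens, spec) in sensitivity_specificity(y_true, y_pred, labels).items():
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A rare category can show very high specificity alongside low sensitivity, as in the abstract's non-Hispanic other race figures [5, 99.7], because almost every patient is a true negative for that category even when few of its members are identified.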
Source: Health Services and Outcomes Research Methodology - Category: Statistics Source Type: research