Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms

Publication date: Available online 31 January 2018
Source: Journal of Biomedical Informatics
Author(s): D.J. Albers, N. Elhadad, J. Claassen, R. Perotte, A. Goldstein, G. Hripcsak

We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process [1,2], as defined by measurement context [2,3] and measurement patterns [4,5], can influence how EHR data are distributed statistically [6,7]. We construct an algorithm, PopKLD, which is based on information criterion model selection [8,9] and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm, which transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data, such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that PopKLD preserves known physiologic features in the data that are lost when summarizing t...
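The abstract describes PopKLD only at a high level, so the following is a minimal sketch of population-level information-criterion model selection under stated assumptions, not the authors' implementation. It assumes AIC (an information criterion derived as an estimate of relative Kullback-Leibler information loss) as a stand-in for whatever criterion PopKLD actually uses, a hypothetical candidate set of three families (normal, lognormal, exponential) fit by maximum likelihood, and a continuous summary equal to each patient's fitted mean. The `categorize` step is a hypothetical quantile-binning stand-in for PopKLD-CAT; the abstract does not say how PopKLD-CAT forms its categories. All function names are illustrative.

```python
import bisect
import math
from statistics import fmean, quantiles


def fit_normal(xs):
    """MLE normal fit; returns (log-likelihood, number of parameters, fitted mean)."""
    n = len(xs)
    mu = fmean(xs)
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-12)  # MLE variance, floored
    ll = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return ll, 2, mu


def fit_lognormal(xs):
    """MLE lognormal fit; requires every x > 0."""
    ys = [math.log(x) for x in xs]
    n = len(ys)
    mu = fmean(ys)
    var = max(sum((y - mu) ** 2 for y in ys) / n, 1e-12)
    # Normal log-likelihood of log(x) plus the change-of-variables term -sum(log x).
    ll = -0.5 * n * (math.log(2 * math.pi * var) + 1) - sum(ys)
    return ll, 2, math.exp(mu + var / 2)


def fit_exponential(xs):
    """MLE exponential fit; requires every x > 0."""
    n = len(xs)
    rate = 1.0 / fmean(xs)
    ll = n * math.log(rate) - n  # rate * sum(xs) == n at the MLE
    return ll, 1, 1.0 / rate


# Hypothetical candidate set; a real system would likely consider more families.
FAMILIES = {
    "normal": fit_normal,
    "lognormal": fit_lognormal,
    "exponential": fit_exponential,
}


def aic(ll, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * ll


def select_family(patients):
    """Choose one family for the whole population by minimizing total AIC."""
    all_positive = all(x > 0 for xs in patients.values() for x in xs)
    totals = {}
    for name, fit in FAMILIES.items():
        if name != "normal" and not all_positive:
            continue  # lognormal/exponential support excludes values <= 0
        totals[name] = sum(aic(*fit(xs)[:2]) for xs in patients.values())
    return min(totals, key=totals.get)


def summarize(patients):
    """Continuous summary: each patient's fitted mean under the shared family."""
    family = select_family(patients)
    fit = FAMILIES[family]
    return family, {pid: fit(xs)[2] for pid, xs in patients.items()}


def categorize(summaries, labels=("low", "mid", "high")):
    """Hypothetical categorical step: bin at population quantile cut points."""
    cuts = quantiles(summaries.values(), n=len(labels))
    return {pid: labels[bisect.bisect_right(cuts, v)] for pid, v in summaries.items()}


if __name__ == "__main__":
    # Made-up lab values (one list of repeated measurements per patient).
    labs = {
        "p1": [1.1, 1.4, 0.9, 2.8],
        "p2": [0.7, 0.8, 1.0, 0.9],
        "p3": [3.2, 5.1, 2.4, 4.0],
    }
    family, continuous = summarize(labs)
    print(family, continuous, categorize(continuous))
```

Choosing a single family for the whole population, rather than one per patient, keeps every patient's summary on a common scale; this is a design assumption of the sketch. The quantile binning needs at least two patients.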