Using synthetic data to replace linkage derived elements: a case study

AbstractWhile record linkage can expand analyses performable from survey microdata, it also incurs greater risk of privacy-encroaching disclosure. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which collects patient records under a pledge of protecting patient privacy from a sample of U.S. hospitals for statistical analysis purposes. The NHCS data were linked to the National Death Index (NDI) to enhance the survey with mortality information. The added information from NDI linkage enables survival analyses related to hospitalization, but as the death information includes dates of death and detailed causes of death, having it joined with the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach was tested to develop synthetic data that uses models from survival analysis to replace vital status and actual dates-of-death with synthetic values and uses classification tree analysis to replace actual causes of death with synthesized causes of death. The degree to which analyses performed on the synthetic data replicate results from analysis on the actual data is measured by comparing survival analysis parameter estimates from both data files. Because synthetic data only have value to the degree that they can be used to produce statistical estimates that ar...
Source: Health Services and Outcomes Research Methodology - Category: Statistics Source Type: research
More News: Hospitals | Statistics | Study