Inconsistency in UK Biobank Event Definitions From Different Data Sources and Its Impact on Bias and Generalizability: A Case Study of Venous Thromboembolism

Am J Epidemiol. 2023 Nov 17:kwad232. doi: 10.1093/aje/kwad232. Online ahead of print.

ABSTRACT

The UK Biobank study contains several sources of diagnostic data, including hospital inpatient data and self-reported conditions for ~500,000 participants, and primary care data for ~177,000 participants (35%). Epidemiological investigations require a primary disease definition, but it is not clear whether to combine sources to maximize power or to focus on one source to ensure a consistent outcome. The consistency of definitions was investigated for venous thromboembolism (VTE) by examining the overlap between cases defined from hospital inpatient data, primary care reports, and self-reported questionnaires. VTE cases showed little overlap between data sources: among participants with primary care data, only 6% of reported events were identified by all three of hospital, primary care, and self-report, while 71% appeared in only one source. Deep vein thrombosis-only events represented 68% of self-reported and 36% of hospital-reported VTE cases, while pulmonary embolism-only events represented 20% of self-reported and 50% of hospital-reported VTE cases. Additionally, different distributions of sociodemographic characteristics were observed; for example, 46% of hospital-reported VTE cases were female, compared with 58% of self-reported VTE cases. These results illustrate how seemingly neutral decisions taken to improve data quality can affect the representativeness of a dataset.

PMID: 37981722 | DOI: 10.1093/aje/kwa...
Source: Am J Epidemiol - Category: Epidemiology - Source Type: research
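
The overlap comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code: the function name `source_overlap`, the source labels, and the participant IDs are all hypothetical, and the toy example counts participants rather than events. It only shows one way to compute the share of cases found in exactly one, two, or all three sources.

```python
from collections import Counter


def source_overlap(sources):
    """Return the share of cases that appear in exactly n sources.

    `sources` maps a source name to a set of case identifiers.
    (Hypothetical helper for illustration; not from the paper.)
    """
    # Count how many sources each identifier appears in.
    membership = Counter()
    for ids in sources.values():
        for case_id in ids:
            membership[case_id] += 1

    total = len(membership)  # size of the union of all sources
    by_n = Counter(membership.values())
    return {n: by_n[n] / total for n in range(1, len(sources) + 1)}


# Toy data with made-up identifiers, NOT UK Biobank data.
sources = {
    "hospital": {1, 2, 3, 4},
    "primary_care": {3, 4, 5, 6},
    "self_report": {4, 7, 8, 9, 10},
}

shares = source_overlap(sources)
for n, share in shares.items():
    print(f"in exactly {n} source(s): {share:.0%}")
```

In this toy example, 8 of the 10 identifiers appear in only one source, which mirrors the kind of low agreement the abstract reports, although the figures here are invented.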