Development and Validation of a Large Synthetic Cohort for the Study of Cardiovascular Health across the Lifespan

This study included 40,875 participants from 7 large, population-based longitudinal epidemiology studies (1948-2016). We multiply imputed each participant's cardiovascular risk factors (CRFs) and events across the lifespan from the available records, using a joint multilevel imputation model. To validate the imputed values, we removed a portion of the observed data and then compared the imputed values with the observed ones. The complete lifespan synthetic data set reflected the trends in the original observed data well. In the validation sample, the distributions of imputed CRFs and events were close to the observed distributions but showed less variability. Bland-Altman plots indicated a slightly negative trend overall and a relatively small agreement bias for the continuous CRFs. A hypothetical linear regression model suggested that the relationships between the CRFs and events were preserved in the imputed data set. This approach generated valid estimates of CRFs and events across the lifespan for African American and White participants. The synthetic cohort may be accurate enough to be useful in assessing the origins and timing of accumulating cardiovascular risk, which could inform efforts to prevent cardiovascular risk development.
PMID: 33987646 | DOI: 10.1093/aje/kwab137
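The Bland-Altman comparison of imputed and held-out observed values can be illustrated with a minimal sketch. The function name `bland_altman` and the sample values are hypothetical and made up for illustration; this is not the study's code or data, only one common way to compute the agreement bias, the 95% limits of agreement, and the trend of the differences against the pair means.

```python
from statistics import mean, stdev


def bland_altman(observed, imputed):
    """Summarize agreement between held-out observed values and imputed values.

    Returns the mean difference (bias), the 95% limits of agreement, and the
    least-squares slope of the differences regressed on the pair means.
    """
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Difference and average for each (observed, imputed) pair.
    diffs = [i - o for o, i in zip(observed, imputed)]
    avgs = [(i + o) / 2 for o, i in zip(observed, imputed)]

    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences

    # Slope of differences vs. averages: a negative slope means the imputed
    # values are pulled toward the center (less variable than the observed).
    avg_bar = mean(avgs)
    sxx = sum((a - avg_bar) ** 2 for a in avgs)
    sxy = sum((a - avg_bar) * (d - bias) for a, d in zip(avgs, diffs))
    slope = sxy / sxx if sxx else 0.0

    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "slope": slope,
    }


# Made-up values (e.g. a continuous risk factor), NOT data from the study.
observed = [110.0, 120.0, 130.0, 140.0, 150.0]
imputed = [114.0, 122.0, 130.0, 138.0, 146.0]

result = bland_altman(observed, imputed)
print(result)
```

In this toy example the imputed values are less spread out than the observed ones, so the bias is zero while the slope is negative, which is the kind of pattern the abstract describes qualitatively.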
Source: Am J Epidemiol - Category: Epidemiology - Source Type: research