Scalable probabilistic PCA for large-scale genetic variation data

We present ProPCA, a highly scalable method based on a probabilistic generative model that efficiently computes the top principal components (PCs) of genetic variation data. We applied ProPCA to compute the top five PCs of genotype data from the UK Biobank (488,363 individuals and 146,671 SNPs) in about thirty minutes. To illustrate the utility of computing PCs in large samples, we leveraged the population structure inferred by ProPCA within White British individuals in the UK Biobank to identify several novel genome-wide signals of recent putative selection, including missense mutations in RPGRIP1L and TLR4.
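The abstract does not spell out how the probabilistic model is fitted. One standard way to obtain the top PCs from a probabilistic PCA model is the EM algorithm in its zero-noise limit. The sketch below illustrates that generic technique on simulated genotypes; it is not ProPCA's implementation. The function names `em_pca`, `simulate_genotypes` and `standardize`, the toy drift model, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def em_pca(Y, k, n_iter=100, seed=0):
    """EM algorithm for PCA (zero-noise limit of probabilistic PCA).

    Y: standardized genotype matrix, N individuals x M SNPs.
    Returns (scores, loadings): top-k PC scores (N x k) and SNP loadings (M x k).
    Each iteration multiplies by Y only, never forming the M x M covariance.
    """
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    C = rng.standard_normal((M, k))  # random initial loadings (M x k)
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Y^T, shape (k x N)
        X = np.linalg.solve(C.T @ C, (Y @ C).T)
        # M-step: C = Y^T X^T (X X^T)^{-1}, shape (M x k)
        C = np.linalg.solve(X @ X.T, X @ Y).T
    # C spans the top-k principal subspace but is not orthonormal:
    # orthonormalize it, then rotate within the subspace via a small SVD.
    Q, _ = np.linalg.qr(C)
    U, s, Vt = np.linalg.svd(Y @ Q, full_matrices=False)
    return U * s, Q @ Vt.T


def simulate_genotypes(n_per_pop=100, n_pops=3, n_snps=2000, seed=1):
    """Hypothetical toy model: populations drift from shared ancestral frequencies."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    blocks = []
    for _ in range(n_pops):
        p_pop = np.clip(p_anc + rng.normal(0.0, 0.1, n_snps), 0.05, 0.95)
        # Diploid genotypes coded as 0/1/2 copies of the allele
        blocks.append(rng.binomial(2, p_pop, size=(n_per_pop, n_snps)))
    return np.vstack(blocks).astype(float)


def standardize(G):
    """Drop monomorphic SNPs, then center and scale each SNP column."""
    G = G[:, G.std(axis=0) > 0]
    return (G - G.mean(axis=0)) / G.std(axis=0)


G = simulate_genotypes()
Y = standardize(G)
scores, loadings = em_pca(Y, k=2)
print(scores.shape)  # (300, 2)
```

Because each EM iteration needs only products of the genotype matrix with thin N x k or M x k matrices, its cost grows roughly as O(NMk) per iteration. This is the property that makes EM-style fitting attractive for biobank-scale data, where an exact SVD of the full matrix is expensive.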
Source: PLoS Genetics