Using the Ensembl Variant Effect Predictor with your 23andme data

I subscribe to the Ensembl blog, and this morning my feed reader showed a post linking to the Variant Effect Predictor (VEP). Strangely, the original blog post has since disappeared. Not to worry. In short, the VEP takes genotyping data in one of several formats, compares it with the Ensembl variation and core databases, and returns a summary of how the variants affect transcripts and regulatory regions. My first thought: can I apply this to my own 23andme data?

1. Convert 23andme data to VCF

If you download your raw data from 23andme, it looks something like this (ignoring comment lines):

    rs4477212   1   82154    AA
    rs3094315   1   752566   AA
    rs3131972   1   752721   GG
    rs12562034  1   768448   GG
    rs12124819  1   776546   AA

The rest of this post assumes that your raw data are saved in a file named mySNPs.txt. Note that the original name of your download will be something like genome_YOUR_NAME_Full_TIMESTAMP.txt.zip.

VEP accepts several file formats, including VCF (Variant Call Format). On GitHub, I found a tool named 23andme2vcf that performs the conversion. My first attempt failed:

    perl 23andme2vcf.pl mySNPs.txt mySNPs.vcf
    # raw data file and reference file are out of sync at ./23andme2vcf.pl line 154, line 4.

Digging around the GitHub issues page, I found that this error has occurred before; the issue was closed when the reference file supplied with 23andme2vcf was updated. A quick line count reveals the problem: my 23andme data (V3 platform) contain more SNPs than the reference file does. gu...
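Why does a converter need a reference file at all? VCF requires a REF allele for every record, and a 23andme genotype such as GG does not say which base is the reference, so any conversion needs an external source of reference bases. The Python sketch below illustrates that step; it is not how 23andme2vcf works internally. It assumes a hypothetical dictionary, ref_lookup, mapping (chromosome, position) to the reference base, skips no-calls (--) and indel (D/I) calls, and writes unphased GT values.

```python
# Sketch: convert 23andme raw-data lines into minimal VCF text.
# ASSUMPTION: ref_lookup is a hypothetical dict {(chrom, pos): ref_base};
# it stands in for whatever reference-allele source a real converter uses.

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
]


def parse_23andme(lines):
    """Yield (rsid, chrom, pos, genotype), skipping blank and '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()
        yield rsid, chrom, int(pos), genotype


def to_vcf_line(rsid, chrom, pos, genotype, ref):
    """Return one tab-separated VCF data line, or None for calls this sketch skips."""
    # '--' no-calls and D/I (indel) calls are not single-base SNP genotypes.
    if any(base not in "ACGT" for base in genotype):
        return None
    # ALT holds each non-reference base, in the order it first appears.
    alts = []
    for base in genotype:
        if base != ref and base not in alts:
            alts.append(base)
    alleles = [ref] + alts
    # GT uses allele indices (0 = REF); '/' marks the calls as unphased.
    gt = "/".join(str(alleles.index(base)) for base in genotype)
    alt = ",".join(alts) if alts else "."
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", "PASS", ".", "GT", gt])


def convert(lines, ref_lookup):
    """Return header plus data lines; raise KeyError if a SNP has no reference base."""
    out = list(VCF_HEADER)
    for rsid, chrom, pos, genotype in parse_23andme(lines):
        key = (chrom, pos)
        if key not in ref_lookup:
            raise KeyError(f"{rsid} at {chrom}:{pos} has no reference base")
        record = to_vcf_line(rsid, chrom, pos, genotype, ref_lookup[key])
        if record is not None:
            out.append(record)
    return out
```

For example, if the reference base at a position were A, a raw genotype of AG would become ALT G with GT 0/1, and GG would become ALT G with GT 1/1. Raising an error for a SNP with no reference entry is loosely analogous to the "out of sync" failure above: the converter simply has nothing to put in the REF column.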
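The "quick line count" can also be done in a way that ignores comment lines and names the SNPs that would trip up the converter. This is a minimal Python sketch, assuming (hypothetically) that the reference is available as a text file whose first whitespace-separated column is the SNP ID; the real 23andme2vcf reference file may be laid out differently, so adapt the parsing to match.

```python
import sys

# Sketch: a comment-aware version of a "quick line count" between 23andme raw
# data and a converter's reference list.
# ASSUMPTION: the reference is a text file whose first whitespace-separated
# column is the SNP ID; the real 23andme2vcf reference file may differ.


def snp_ids(lines):
    """Return first-column IDs of all non-blank, non-'#' lines, in file order."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line.split()[0])
    return ids


def sync_report(raw_lines, ref_lines):
    """Return (raw SNP count, reference SNP count, raw IDs absent from reference)."""
    raw_ids = snp_ids(raw_lines)
    ref_ids = snp_ids(ref_lines)
    ref_set = set(ref_ids)
    missing = [snp for snp in raw_ids if snp not in ref_set]
    return len(raw_ids), len(ref_ids), missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_sync.py mySNPs.txt reference_ids.txt (second name hypothetical)
    with open(sys.argv[1]) as raw_file, open(sys.argv[2]) as ref_file:
        n_raw, n_ref, missing = sync_report(raw_file, ref_file)
    print(f"raw data: {n_raw} SNPs; reference: {n_ref} SNPs")
    print(f"{len(missing)} raw-data SNPs have no reference entry")
```

Comparing IDs rather than raw line counts avoids being misled by comment lines in either file, and it reports exactly which SNPs are missing rather than just that the totals differ.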
Source: What You're Doing Is Rather Desperate
Category: Bioinformaticians
Tags: bioinformatics genomics personal statistics 23andme ensembl prediction variant
Source Type: blogs