Using the Ensembl Variant Effect Predictor with your 23andme data
I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared.
Not to worry. The VEP takes genotyping data in one of several formats, compares it with the Ensembl variation and core databases, and returns a summary of how the variants affect transcripts and regulatory regions. My first thought: can I apply this to my own 23andme data?
1. Convert 23andme data to VCF
If you download your raw data from 23andme, it looks something like this (ignoring comment lines):
rs4477212 1 82154 AA
rs3094315 1 752566 AA
rs3131972 1 752721 GG
rs12562034 1 768448 GG
rs12124819 1 776546 AA
The rest of this post assumes that your raw data are saved in a file named mySNPs.txt; note that the original file name of your download will be something like genome_YOUR_NAME_Full_TIMESTAMP.txt.zip.
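Before converting anything, it can help to check that the file parses as expected. Here is a minimal Python sketch, assuming four whitespace-separated fields per line (rsID, chromosome, position, genotype) and comment lines that begin with "#". The names Genotype and parse_genotype_lines are made up for illustration; they are not part of any 23andme or Ensembl tool.

```python
from typing import Iterable, Iterator, NamedTuple


class Genotype(NamedTuple):
    """One row of raw data (hypothetical helper type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_genotype_lines(lines: Iterable[str]) -> Iterator[Genotype]:
    """Yield one Genotype per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        # Assumption: comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        rsid, chromosome, position, genotype = fields
        yield Genotype(rsid, chromosome, int(position), genotype)


# In-memory sample taken from the lines shown above.
sample = [
    "# comment lines are skipped",
    "rs4477212\t1\t82154\tAA",
    "rs3094315\t1\t752566\tAA",
]
for record in parse_genotype_lines(sample):
    print(record.rsid, record.chromosome, record.position, record.genotype)
```

To run it on the real file, pass an open file handle instead of the list, for example parse_genotype_lines(open("mySNPs.txt")).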
VEP will accept several file formats, including VCF (variant call format). On GitHub, I discovered a tool named 23andme2vcf to perform the conversion. My first attempt failed:
perl 23andme2vcf.pl mySNPs.txt mySNPs.vcf
# raw data file and reference file are out of sync at ./23andme2vcf.pl line 154, line 4.
Digging around the GitHub issues page, I noted that this error has occurred before; the issue was closed when the reference file supplied with 23andme2vcf was updated.
A quick line count reveals the problem: there are more SNPs in my 23andme data (V3 platform) than in the reference file.
gu...
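The same kind of line count can be sketched in Python. This is only an illustration: it assumes that comment lines start with "#" and that the reference file may be gzip-compressed. The function name count_data_lines and the reference file name in the usage comment are hypothetical, not taken from 23andme2vcf.

```python
import gzip
from pathlib import Path


def count_data_lines(path) -> int:
    """Count non-blank, non-comment lines in a plain-text or .gz file."""
    path = Path(path)
    # Assumption: a '.gz' suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


# Example usage (the reference file name here is hypothetical):
# n_raw = count_data_lines("mySNPs.txt")
# n_ref = count_data_lines("reference_file.txt.gz")
# print(n_raw, n_ref, n_raw - n_ref)
```

If the two counts differ, the raw data and the reference file cannot line up row for row, which would fit the "out of sync" error above.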
Source: What You're Doing Is Rather Desperate - Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics genomics personal statistics 23andme ensembl prediction variant Source Type: blogs