Inside the Variation toolkit: annotating a VCF with the data of NCBI biosystems mapped to BED.
Let's annotate a VCF file with the data from the NCBI biosystem. First, the 'NCBI biosystem' data are mapped to a BED file using the following script. It joins "ncbi:biosystem2gene", "ncbi:biosystem-label" and "biomart-ensembl:gene". It produces a tabix-indexed BED mapping the data of 'NCBI biosystem': $ gunzip -c ncbibiosystem.bed.gz | head 1 69091 70008 79501 106356 30 Signaling_by_GPCR 1 69091 (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 18, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Interestingly: the sentence adverbs of PubMed Central
Scientific writing – by which I mean journal articles – is a strange business, full of arcane rules and conventions with origins that no-one remembers but to which everyone adheres. I’ve always been amused by one particular convention: the sentence adverb. Used with a comma to make a point at the start of a sentence, as in these examples: Surprisingly, we find that the execution of karyokinesis and cytokinesis is timely… Grossly, the tumor is well circumscribed with fibrous capsule… Correspondingly, the short-term Smad7 gene expression is graded… The example that always makes me smile ...
Source: What You're Doing Is Rather Desperate - July 15, 2013 Category: Bioinformaticians Authors: nsaunders Tags: R research diary ruby statistics adverbs pubmed central text-mining Source Type: blogs
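The convention described in this excerpt (an adverb plus a comma opening a sentence) can be approximated with a regular expression. What follows is only a rough sketch in Python, not the post author's method (the post is tagged R and ruby). The pattern and the function name `count_sentence_adverbs` are assumptions made for illustration, and any capitalised word ending in "ly" (for example "Family,") would also be counted.

```python
import re
from collections import Counter

# Assumption: a sentence adverb is approximated as a capitalised word ending
# in "ly", located at the start of the text or right after ".", "!" or "?"
# plus one whitespace character, and immediately followed by a comma.
SENTENCE_ADVERB_RE = re.compile(r"(?:^|(?<=[.!?]\s))([A-Z][a-z]+ly),")


def count_sentence_adverbs(text):
    """Count sentence-initial '-ly' words followed by a comma (hypothetical helper)."""
    return Counter(SENTENCE_ADVERB_RE.findall(text))
```

Calling `count_sentence_adverbs(article_text).most_common(10)` on the plain text of an article would list its ten most frequent candidates.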

Playing with the "UCSC Genome Browser Track Hubs". my notebook
The UCSC has recently created the Genome Browser Track Hubs: "Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser." I've created a Hub for the Rotavirus Genome hosted on github at: https://github.com/lindenb/genomehub. My data were primarily described as an XML file. It contains a description of the genome, of the tracks, the path to the fasta (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 15, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Inside the Variation Toolkit: Gene Ontology for VCF, GUI for VCF
A quick note about three java-based tools for VCF files I wrote today. VcfViewGui: a simple java-Swing-based VCF viewer. VCFGeneOntology: vcfgo reads a VCF annotated with VEP or SNPEFF, loads the data from GeneOntology and GOA and adds a new field in the INFO column for the GO terms for each position. Example: $ java -jar dist/vcfgo.jar I="https://raw.github.com/arq5x/gemini/master/test/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 12, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

“Open”: motivation versus definition
Tweet length: 140 characters. Quote + URL that I wanted to tweet: 160 characters. Solution: brief blog post. "the probability that people who can help each other can be connected has risen to the point that for many types of problem that they actually are" Please read the rest of Cameron’s thoughts on motivations for openness in research: Open is a state of mind. (Source: What You're Doing Is Rather Desperate)
Source: What You're Doing Is Rather Desperate - July 10, 2013 Category: Bioinformaticians Authors: nsaunders Tags: open access open science cameron neylon science blogs Source Type: blogs

Mapping the annotations of a query sequence on a BLAST hit, my notebook.
This post is the answer to my own question on biostar "BLASTN / TBLASTN : mapping the features of the query to the hit.". I wrote a java program to map the annotations of a sequence to the hit of a BLAST result. The tool is available on github at https://github.com/lindenb/jvarkit. For example, say you want to map the features of the Uniprot record for Rotavirus NSP3 (http://www.uniprot.org/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 9, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Lenovo Thinkpad T431s
Got a Thinkpad T431s to replace my 3-year-old T410s. Here are some random observations, after using the new laptop for a few weeks. The T431s feels much lighter than the T410s. The screen is a bit smaller (still 14″, but more elongated), and the T431s lacks a DVD reader/writer, but I won’t miss that too much. On the other hand I find the fingerprint reader (not faster than typing, but less awkward when typing in front of other people) and the memory card reader (can transfer pictures from a camera fast and without having to search for the right cable) quite useful. I like the backlit keyboard better than the l...
Source: eric.jain.name - July 6, 2013 Category: Bioinformaticians Authors: Eric Jain Tags: Review Source Type: blogs

Interdisciplinary EMBL postdoc fellowship in genome evolution and chemical-biology
The EMBL Interdisciplinary Postdocs (EIPOD) program is now accepting applications (deadline: 12 September). This program funds interdisciplinary research projects between different units of the EMBL. Applicants are encouraged to discuss self-defined project ideas with EMBL scientists or select up to two project ideas available at the EIPOD website. One of the project ideas listed this year is for a joint project between our group (EMBL-EBI) and the group of Nassos Typas at the EMBL Genome Biology Unit in Heidelberg. Here is a short description of the project idea, entitled "Modeling genotype-to-phenotype relati...
Source: Public Rambling - July 2, 2013 Category: Bioinformaticians Source Type: blogs

My biggest contribution to the field of biochemistry
LinkedIn has a feature by which one can endorse other people for different fields. Periodically the system prompts me to vote yea-or-nay on a bunch of endorsements, and conversely I get regular updates as to what others have endorsed me for. It's always nice to get a vote of confidence, but sometimes I find myself wondering what it really means. (Source: Omics! Omics!)
Source: Omics! Omics! - June 30, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

-omics in 2013
Just how many (bad) -omics are there anyway? Let’s find out. Update: code and data now at Github 1. Get the raw data It would be nice if we could search PubMed for titles containing all -omics: *omics[TITL] However, we cannot since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013: 2013[PDAT] and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size. 2. Extract the -omics Titles are in column 1 and we only want the -omics...
Source: What You're Doing Is Rather Desperate - June 25, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications statistics omics pubmed Source Type: blogs
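The extraction step described in this excerpt (pull the -omics words out of the titles and count them) could be sketched as follows. This is a minimal Python illustration, not the code from the post. The function names are hypothetical, the regular expression is an assumption, and `read_titles` assumes the title sits in the first CSV column after a header row, as the excerpt suggests for pubmed_result.csv. Words such as "economics" will match as well and would need filtering by hand.

```python
import csv
import re
from collections import Counter

# Assumption: an "-omics" term is any word of letters/digits ending in "omics".
OMICS_RE = re.compile(r"\b\w+omics\b", re.IGNORECASE)


def count_omics(titles):
    """Count lower-cased words ending in 'omics' across an iterable of titles."""
    counts = Counter()
    for title in titles:
        for word in OMICS_RE.findall(title):
            counts[word.lower()] += 1
    return counts


def read_titles(path):
    """Yield the first column of each CSV row, skipping the header row.

    Assumption: titles are in column 1 of the exported file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if row:
                yield row[0]
```

With the export in place, `count_omics(read_titles("pubmed_result.csv")).most_common(20)` would list the twenty most frequent terms.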

No-one cares about your bioinformatics software
Here’s a tip. When you write an article about your software, the title of which indicates that open-source is important: "A universal open-source Electronic Laboratory Notebook", but you then: provide almost no details in the abstract; do not provide a link to a website or repository from which your “free” software can be obtained; choose not to make the article open access; and put the installation instructions in a supplementary data file which is also not open access. Don’t be surprised when no-one uses your software. Or is the publication more important to you than the product? Filed under: bioinfo...
Source: What You're Doing Is Rather Desperate - June 23, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications lims open source software Source Type: blogs