Interestingly: the sentence adverbs of PubMed Central
Scientific writing – by which I mean journal articles – is a strange business, full of arcane rules and conventions with origins that no-one remembers but to which everyone adheres. I’ve always been amused by one particular convention: the sentence adverb, used with a comma to make a point at the start of a sentence, as in these examples: Surprisingly, we find that the execution of karyokinesis and cytokinesis is timely… Grossly, the tumor is well circumscribed with fibrous capsule… Correspondingly, the short-term Smad7 gene expression is graded… The example that always makes me smile ...
Source: What You're Doing Is Rather Desperate - July 15, 2013 Category: Bioinformaticians Authors: nsaunders Tags: R research diary ruby statistics adverbs pubmed central text-mining Source Type: blogs
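The post itself worked in R and Ruby (going by its tags). As a rough illustration of the idea only, and not the author's method, here is a minimal Python sketch that counts sentence adverbs in a piece of text. It assumes a sentence adverb can be approximated as a capitalised word ending in "ly" that opens a sentence (start of text, or after `.`, `!`, `?` or `…` plus whitespace) and is immediately followed by a comma. That heuristic misses adverbs such as "However", and the name `sentence_adverbs` is hypothetical.

```python
import re
from collections import Counter

# Heuristic (an assumption): a capitalised word ending in "ly", at the start of the
# text or after sentence-ending punctuation plus whitespace, followed by a comma.
SENTENCE_ADVERB = re.compile(r"(?:^|(?<=[.!?…])\s+)([A-Z][a-z]+ly),")


def sentence_adverbs(text: str) -> Counter:
    """Count sentence-initial '-ly' adverbs followed by a comma in `text`."""
    return Counter(m.group(1) for m in SENTENCE_ADVERB.finditer(text))


if __name__ == "__main__":
    example = (
        "Surprisingly, we find that the execution of karyokinesis and cytokinesis is timely… "
        "Grossly, the tumor is well circumscribed with fibrous capsule… "
        "Correspondingly, the short-term Smad7 gene expression is graded…"
    )
    print(sentence_adverbs(example).most_common())
```

Run against the three example sentences above, each of Surprisingly, Grossly and Correspondingly is counted once; "timely" is not, because it does not open a sentence.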

“Open”: motivation versus definition
Tweet length: 140 characters. Quote + URL that I wanted to tweet: 160 characters. Solution: brief blog post.
“the probability that people who can help each other can be connected has risen to the point that for many types of problem that they actually are”
Please read the rest of Cameron’s thoughts on motivations for openness in research: “Open is a state of mind”. Filed under: open access, open science Tagged: cameron neylon, open science, science blogs
Source: What You're Doing Is Rather Desperate - July 10, 2013 Category: Bioinformaticians Authors: nsaunders Tags: open access open science cameron neylon science blogs Source Type: blogs

-omics in 2013
Just how many (bad) -omics are there anyway? Let’s find out. Update: code and data now at GitHub.
1. Get the raw data
It would be nice if we could search PubMed for titles containing all -omics: *omics[TITL]
However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013: 2013[PDAT] and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size.
2. Extract the -omics
Titles are in column 1 and we only want the -omics...
Source: What You're Doing Is Rather Desperate - June 25, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications statistics omics pubmed Source Type: blogs
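For step 2, a minimal Python sketch of one way to pull the -omics out of the titles; the author's own code is on GitHub and may well differ. It assumes the titles sit in the first CSV column (index 0), that the file has a single header row and that it is UTF-8 encoded. The names `count_omics` and `count_omics_in_csv` are hypothetical. An -omics word is taken to be any word ending in "omics" with at least one letter before the suffix, so the bare word "omics" is not counted.

```python
import csv
import re
from collections import Counter
from typing import Iterable, List

# Assumed definition: a word ending in "omics" with at least one letter before it,
# optionally hyphenated (e.g. "genomics", "multi-omics").
OMICS = re.compile(r"\b([a-z][a-z-]*omics)\b", re.IGNORECASE)


def count_omics(rows: Iterable[List[str]], title_col: int = 0) -> Counter:
    """Count lower-cased -omics words found in the title column of each row."""
    counts: Counter = Counter()
    for row in rows:
        if len(row) > title_col:  # skip short or empty rows
            counts.update(w.lower() for w in OMICS.findall(row[title_col]))
    return counts


def count_omics_in_csv(path: str, title_col: int = 0, has_header: bool = True) -> Counter:
    """Read a CSV export (e.g. pubmed_result.csv) and count -omics in its titles."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)  # assumed single header row
        return count_omics(reader, title_col)
```

Something like `count_omics_in_csv("pubmed_result.csv").most_common(20)` would then list the 20 most frequent -omics in the 2013 titles.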

No-one cares about your bioinformatics software
Here’s a tip. Suppose you write an article about your software whose title indicates that open source is important (“A universal open-source Electronic Laboratory Notebook”), but you then provide almost no details in the abstract; do not provide a link to a website or repository from which your “free” software can be obtained; choose not to make the article open access; and put the installation instructions in a supplementary data file which is also not open access. Don’t be surprised when no-one uses your software. Or is the publication more important to you than the product? Filed under: bioinfo...
Source: What You're Doing Is Rather Desperate - June 23, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications lims open source software Source Type: blogs

Snippets: guts, cancers, statistics
This article is getting a lot of attention on Twitter this week. Brief summary: cancer cells are really messed up in all sorts of ways, most of which are not causal with respect to the cancer. Anyone who has ever looked at microarray data knows that it’s not uncommon for 50% or more of genes to show differential expression in a cancer/normal comparison, so this is hardly a new concept. I think we need to move away from ever-more detailed characterizations of the ways in which cancer cells are “messed up.” We know that they are and that doesn’t provide much insight, in my opinion. The vast majority o...
Source: What You're Doing Is Rather Desperate - June 17, 2013 Category: Bioinformaticians Authors: nsaunders Tags: publications statistics archaea cancer microbiology Source Type: blogs

Using the Ensembl Variant Effect Predictor with your 23andme data
I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared. Not to worry. The VEP takes genotyping data in one of several formats, compares it with the Ensembl variation and core databases and returns a summary of how the variants affect transcripts and regulatory regions. My first thought: can I apply this to my own 23andme data?
1. Convert 23andme data to VCF
If you download your raw data from 23andme, it looks something like this (ignoring comment lines):
rs4477212 1 82154 AA
rs3094315 ...
Source: What You're Doing Is Rather Desperate - June 4, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics genomics personal statistics 23andme ensembl prediction variant Source Type: blogs
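A minimal Python sketch of the first half of step 1: reading the raw 23andme file into records. It assumes whitespace-separated columns (rsid, chromosome, position, genotype) as in the example above, comment lines starting with "#", and "--" marking a no-call; the names `Genotype` and `parse_23andme` are hypothetical. Writing real VCF would also need the reference allele at each position, which the 23andme file does not contain, so that part is not shown.

```python
from typing import Iterable, Iterator, NamedTuple


class Genotype(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(lines: Iterable[str], skip_no_calls: bool = True) -> Iterator[Genotype]:
    """Yield one Genotype per data line of a 23andme raw data file (assumed layout)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split()  # tabs or spaces
        if len(fields) < 4:
            continue  # malformed line; ignore rather than fail
        rsid, chrom, pos, gt = fields[:4]
        if skip_no_calls and gt == "--":  # "--" assumed to mean no call
            continue
        yield Genotype(rsid, chrom, int(pos), gt)
```

For example, `with open("genome.txt") as fh: records = list(parse_23andme(fh))`, where "genome.txt" stands in for whatever your downloaded file is called.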

How to: bulk retrieval of archaeal genome sequences from the NCBI FTP site
While we’re on the topic of mistaking Archaea for Bacteria, here’s an issue with the NCBI FTP site that has long annoyed me, and one workaround. Warning: I threw this together minutes ago and it’s not fully tested. Let’s cut to the chase. On the NCBI FTP site, archaeal genome data is stored along with bacterial genomes in a single directory named Bacteria. Aside from the fact that this is taxonomically incorrect, it makes bulk retrieval of archaeal data rather difficult. For example, I know that Methanococcoides burtonii is an archaeon and if I want to download its protein-coding genes (files ending...
Source: What You're Doing Is Rather Desperate - May 28, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics genomics database ftp ncbi sequence Source Type: blogs
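One possible workaround, sketched in Python and not necessarily the one in the post: given a listing of the directory names under Bacteria and a separate list of species you already know to be archaea, keep only the directories whose names begin with the species name (spaces replaced by underscores). The function name `archaeal_dirs`, the assumption that directory names start with Genus_species followed by an underscore, and the source of the species list are all assumptions. The listing itself could be fetched with Python's `ftplib.FTP.nlst()`, though the directory layout may well have changed since 2013.

```python
from typing import Iterable, List


def archaeal_dirs(dir_names: Iterable[str], archaeal_species: Iterable[str]) -> List[str]:
    """Return directory names that match a known archaeal species, in listing order.

    Assumes directories are named like "Genus_species" or "Genus_species_<strain...>".
    """
    prefixes = {s.strip().replace(" ", "_") for s in archaeal_species}
    selected = []
    for d in dir_names:
        # exact match, or species name followed by an underscore and strain details
        if any(d == p or d.startswith(p + "_") for p in prefixes):
            selected.append(d)
    return selected
```

Matching on the species prefix plus an underscore avoids false hits where one name merely begins with another (for instance a hypothetical "Genus_speciesX").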

Oops: taxonomy #fail
My journey from bench scientist to bioinformatician began with archaeal genomes. So I was somewhat startled to read “The catalytic mechanism for aerobic formation of methane by bacteria”, in which we learn about the “ocean-dwelling bacterium Nitrosopumilus maritimus”. So, of course, was Jonathan Eisen, and you should go and read why. Every top hit in a Web search for that organism tells us that Nitrosopumilus maritimus is an archaeon. Looking forward to a rapid correction and apology from Nature. Title edited from “phylogeny” to “taxonomy” at the insistence of @BioinfoTools ;) Filed under: pu...
Source: What You're Doing Is Rather Desperate - May 27, 2013 Category: Bioinformaticians Authors: nsaunders Tags: publications archaea biochemistry enzymology errors nature phylogeny taxonomy Source Type: blogs