How to: remember that you once knew how to parse KEGG
Recently, someone asked me if I could generate a list of genes associated with a particular pathway. “Sure,” I said, and hacked together some rather nasty code in R which, given a KEGG pathway identifier, used a combination of the KEGG REST API, DBGET and biomaRt to return HGNC symbols. Coincidentally, someone asked the same question at Biostar. Pierre recommended the TogoWS REST service, which provides an API to multiple biological data sources. An article describing TogoWS was published in 2010. An excellent suggestion, and one which, I later discovered, I had bookmarked. Twice. As long ago as 2008. This “redis...
Source: What You're Doing Is Rather Desperate - April 22, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics programming ruby biostar how to kegg pathways rest Source Type: blogs
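A minimal sketch, in R, of the first step the KEGG post describes: parsing the output of the KEGG REST `link` operation into a pathway-to-gene table. It assumes the response is one tab-separated `pathway<TAB>gene` pair per line. The helper `parse_kegg_link()` and the example IDs are hypothetical placeholders, not real KEGG output, and the later mapping to HGNC symbols is not shown.

```r
# Hypothetical helper: turn KEGG REST "link" output (assumed to be one
# "pathway<TAB>gene" pair per line) into a two-column data frame.
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]                 # drop any blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    pathway = vapply(fields, `[`, character(1), 1),
    gene    = vapply(fields, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Placeholder input in the assumed format (not real KEGG output).
example <- c("path:hsa00000\thsa:1111",
             "path:hsa00000\thsa:2222")
links <- parse_kegg_link(example)

# Drop the organism prefix; for human (hsa), KEGG gene IDs are NCBI Gene IDs,
# which could then be mapped to HGNC symbols.
gene_ids <- sub("^hsa:", "", links$gene)

# With network access, something along these lines (URL format assumed;
# check the KEGG API documentation):
# links <- parse_kegg_link(readLines("https://rest.kegg.jp/link/hsa/hsa00010"))
```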

A brief note: R 3.0.0 and bioinformatics
Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post). Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors (vectors with 2^31 or more elements, which earlier versions could not create). What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor to normalize Affymetrix exon microarrays. I began as usual by reading the CEL files:
library(affy)  # ReadAffy() is provided by the Bioconductor affy package
f <- list.files(path = "data/affyexon", pattern = "\\.CEL\\.gz$", full.names = TRUE, recursive = TRUE)
cel <- ReadAffy(filenames = f)
Then this happened: Error in read.affybatch(fi...
Source: What You're Doing Is Rather Desperate - April 3, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics programming statistics 3.0.0 affymetrix bioconductor microarray Source Type: blogs
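To put the R 3.0.0 note in numbers: before long vectors, no R vector (and so no matrix, which is a vector with dimensions) could hold more than `.Machine$integer.max`, that is 2^31 - 1, elements. A back-of-the-envelope sketch, assuming intensities are held in one cells-by-arrays matrix and an illustrative chip layout of 2560 x 2560 cells; `max_arrays()` is a hypothetical helper, and this is not a diagnosis of the truncated error in the post.

```r
# Largest vector length allowed before R 3.0.0 introduced long vectors.
max_len <- .Machine$integer.max   # 2^31 - 1 = 2147483647

# Hypothetical helper: how many arrays fit in one cells-by-arrays
# intensity matrix without exceeding that limit?
max_arrays <- function(cells_per_array, limit = max_len) {
  floor(limit / cells_per_array)
}

# Assumed chip layout of 2560 x 2560 cells (illustrative figure).
cells <- 2560 * 2560              # 6553600 cells per array
n_ok <- max_arrays(cells)
n_ok                              # 327
```

Under these assumptions a few hundred arrays would already exceed the old limit; R 3.0.0 lifts it for vectors on 64-bit builds.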

Git for bioinformaticians at the Bioinformatics FOAM meeting
Last week, I attended the annual Computational and Simulation Sciences and eResearch Conference, hosted by CSIRO in Melbourne. The meeting includes a workshop that we call Bioinformatics FOAM (Focus On Analytical Methods). This year it was run over 2.5 days (up from the previous 1.5 by popular request); one day for internal CSIRO stuff and the rest open to external participants. I had the pleasure of giving a brief presentation on the use of Git in bioinformatics. Nothing startling; aimed squarely at bioinformaticians who may have heard of version control in general and Git in particular but who are yet to employ either. I...
Source: What You're Doing Is Rather Desperate - March 26, 2013 Category: Bioinformaticians Authors: nsaunders Tags: australia bioinformatics computing meetings csiro eresearch foam git ict slideshare version control Source Type: blogs

The end of Google Reader: a scientist’s perspective
Since 2005, I have started almost every working day by using one Web application – an application that occupies a permanent browser tab on my work and home desktop machines. That application is Google Reader. If you’re reading this, you’re probably aware that Google Reader will cease to exist on July 1, 2013. Others have ranted, railed against the corporate machine and expressed their sadness. I thought I’d try to explain why, for this working scientist at least, RSS and feed readers are incredibly useful tools which I think should be valued highly.
(Figure: Some feeds, yesterday)
RSS: a primer
When I first...
Source: What You're Doing Is Rather Desperate - March 18, 2013 Category: Bioinformaticians Authors: nsaunders Tags: google web resources google reader rss Source Type: blogs

R/ggplot2 tip: aes_string
I’m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string. Imagine that you have data consisting of observations for several variables – let’s say A, B, C – where each observation is from one of two groups – call them X and Y:
df1 <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50), group = rep(LETTERS[24:25], 25))
head(df1)
#           A          B           C group
# 1 0.2748922 -0.4805635 -1.80242191     X
# 2 0.0060852 -1.2972077  0.64262069     Y
# 3 0.1994655 -0.4628783  0.07670911...
Source: What You're Doing Is Rather Desperate - February 25, 2013 Category: Bioinformaticians Authors: nsaunders Tags: programming research diary statistics aes aes_string ggplot2 Source Type: blogs
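The aes_string post is cut off before its own example, so here is a minimal sketch of one common use rather than the author's code: passing a variable name chosen at run time, as a string, to an aesthetic, here to draw one boxplot per variable of `df1`. The `set.seed()` call and the `measures` and `plots` names are additions for the sketch.

```r
library(ggplot2)

set.seed(1)  # reproducible random data for the sketch
df1 <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50),
                  group = rep(LETTERS[24:25], 25))  # groups "X" and "Y"

# aes_string() takes mappings as character strings, so the y variable
# can come from a loop over column names.
measures <- c("A", "B", "C")
plots <- lapply(measures, function(v) {
  ggplot(df1, aes_string(x = "group", y = v)) +
    geom_boxplot() +
    ggtitle(v)
})
names(plots) <- measures

# print(plots[["B"]])  # draw one of the plots
```

Later ggplot2 releases deprecate `aes_string()` in favour of `aes()` with the `.data` pronoun, e.g. `aes(x = group, y = .data[[v]])`, which does the same job.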

Basic R: rows that contain the maximum value of a variable
File under “I keep forgetting how to do this basic, frequently required task, so I’m writing it down here.” Let’s create a data frame with a variable, vars, that takes five values, A to E, each of which appears twice, along with some measurements:
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)
df.orig
#    vars obs1 obs2
# 1     A    1   11
# 2     B    2   12
# 3     C    3   13
# 4     D    4   14
# 5     E    5   15
# 6     A    6   16
# 7     B    7   17
# 8     C    8   18
# 9     D    9   19
# 10    E   10   20
Now, let’s say we want only the ro...
Source: What You're Doing Is Rather Desperate - February 13, 2013 Category: Bioinformaticians Authors: nsaunders Tags: programming research diary statistics Source Type: blogs
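The rows-with-maximum post is cut off before its solution, so the following is one common base-R idiom, not necessarily the author's. It assumes the aim is, for each value of `vars`, the row(s) with the largest `obs1`: `ave()` computes each row's group maximum, and rows equal to it are kept.

```r
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2),
                      obs1 = 1:10, obs2 = 11:20)

# ave() returns, for every row, the maximum obs1 within that row's
# vars group; keep the rows whose obs1 equals that group maximum.
is_max <- df.orig$obs1 == ave(df.orig$obs1, df.orig$vars, FUN = max)
df.max <- df.orig[is_max, ]
df.max
#    vars obs1 obs2
# 6     A    6   16
# 7     B    7   17
# 8     C    8   18
# 9     D    9   19
# 10    E   10   20
```

Ties are all retained, since every row equal to its group's maximum passes the test.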

Genes x Samples: please explain
One of my bioinformatics pet peeves involves statements like this one, from the CNAmet user guide: “Inputs to CNAmet are three m x n matrices, where m is the number of genes and n the number [of] samples.” What we’re looking at here is the hot but poorly defined topic of data integration, in which biological measurements from two or more different platforms are somehow combined in a way that provides more information than each platform separately. Read any paper on this topic, download the software and you’ll find example datasets containing two or more matched matrices, with rows where measurements have been summar...
Source: What You're Doing Is Rather Desperate - February 12, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics research diary Source Type: blogs
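Inputs like CNAmet's matched m x n matrices need the same genes in the same row order and the same samples in the same column order across platforms. A minimal sketch of that bookkeeping step, assuming each matrix already has one row per gene, with gene IDs as row names and sample IDs as column names; `match_matrices()` and the toy data are hypothetical. It deliberately sidesteps the question the post raises, namely how each platform's measurements were summarised to a single value per gene in the first place.

```r
# Hypothetical helper: restrict two genes x samples matrices to the genes
# and samples they share, in the same order, so rows and columns match.
match_matrices <- function(m1, m2) {
  genes   <- intersect(rownames(m1), rownames(m2))
  samples <- intersect(colnames(m1), colnames(m2))
  list(m1 = m1[genes, samples, drop = FALSE],
       m2 = m2[genes, samples, drop = FALSE])
}

# Toy data: "expression" for 3 genes x 3 samples, "copy number" for
# 3 genes x 2 samples with a partly different gene set and order.
expr <- matrix(1:9, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))
cn <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
             dimnames = list(c("g3", "g1", "g4"), c("s2", "s1")))

matched <- match_matrices(expr, cn)
dim(matched$m1)   # 2 2: genes g1, g3 and samples s1, s2
```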

Lots of “open goodness” in the AU/NZ region
January/February are exciting months for open [data|research|science|access] proponents in our region – by which I mean Australia and New Zealand. First, we’ve enjoyed a speaking tour by Sir Tim Berners-Lee, during which he discussed the benefits of open data several times. I was able to attend two events in Sydney in person and a third, linux.conf.au, by video stream. The events were the work of many people but in particular, Pia Waugh. Go follow her on Twitter, now. Next – I wish I had been able to get to this one – the Open Research Conference on February 6-7, University of Auckland. I’m en...
Source: What You're Doing Is Rather Desperate - February 6, 2013 Category: Bioinformaticians Authors: nsaunders Tags: australia australian news open access open science new zealand open data tim berners-lee Source Type: blogs

It’s #overlyhonestmethods come to life!
Retraction Watch reports a study of microarray data sharing. The article, published in Clinical Chemistry, is itself behind a paywall despite trumpeting the virtues of open data. So straight to the Open Access Irony Award group at CiteULike it goes. I was not surprised to learn that the rate of public deposition of data is low, nor that most deposited data ignores standards and much of it is low quality. What did catch my eye, though, was a retraction notice for one of the articles from the study, in which the authors explain the reason for retraction. Two phrases in particular stand out: we discovered an error in the da...
Source: What You're Doing Is Rather Desperate - January 30, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications statistics microarray reproducibility retraction Source Type: blogs

The future of science publishing from 1996
Floating by in the Twitter stream, this from @leonidkruglyak. It leads to a light-hearted opinion(ated) piece by Sydney Brenner in Current Biology, 1996. In 1996, you may recall, the Web was just a few years old. Amusingly (sadly?), it seems that Brenner predicted many of the topics in science publishing that we’re still discussing in 2013. It’s just that he thought they would be implemented in no time at all. For example, open refereeing: “It is incidents such as this that have led me to question whether the anonymity of referees needs to be guarded so closely.” Self-publishing/archiving and post-publication...
Source: What You're Doing Is Rather Desperate - January 10, 2013 Category: Bioinformaticians Authors: nsaunders Tags: publications altmetrics history publishing sydney brenner www Source Type: blogs

Open Access: sometimes all it takes is the right person
We can debate the economics, complexities, details, implementation… of open access publishing for as long as we like. However, the basic principle, that publicly funded research should be publicly accessible, seems to me at least very obviously correct and “the right thing to do”. So this, from April 2012, was very depressing: “Open access not as simple as it sounds: outgoing ARC boss”. For those outside Australia, the ARC is the Australian Research Council. Much debate ensued, in which one contributor to the comment thread wrote: …it is particularly galling that Sheil is projecting her own simplistic...
Source: What You're Doing Is Rather Desperate - January 8, 2013 Category: Bioinformaticians Authors: nsaunders Tags: australia open access arc publishing Source Type: blogs