R/ggplot2 tip: aes_string
I’m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string. Imagine that you have data consisting of observations for several variables – let’s say A, B, C – where each observation is from one of two groups – call them X and Y:
    df1 <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50), group = rep(LETTERS[24:25], 25))
    head(df1)
    #           A          B           C group
    # 1 0.2748922 -0.4805635 -1.80242191     X
    # 2 0.0060852 -1.2972077  0.64262069     Y
    # 3 0.1994655 -0.4628783  0.07670911...
Source: What You're Doing Is Rather Desperate - February 25, 2013 Category: Bioinformaticians Authors: nsaunders Tags: programming research diary statistics aes aes_string ggplot2 Source Type: blogs
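
The excerpt above stops before any plotting code, so the following is only a minimal sketch of the kind of situation where aes_string helps: the column to plot is held as a character string, for example when looping over A, B and C. The choice of a boxplot per group, the `vars` vector and the `plots` list are assumptions made for illustration, not the author's code. Note that aes_string() was deprecated in later ggplot2 releases, so it may emit a warning.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch reproducible
df1 <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50),
                  group = rep(LETTERS[24:25], 25))

# Column names held as strings (hypothetical helper vector)
vars <- c("A", "B", "C")

# Build one plot per variable; aes_string() maps the column named by each string
plots <- lapply(vars, function(v) {
  ggplot(df1, aes_string(x = "group", y = v)) + geom_boxplot()
})
names(plots) <- vars
```

Printing `plots$A` would then draw the boxplot of A by group. With plain aes(), the y mapping would have to be written out once per column.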

AGBT Preview: Nabsys
A complaint which seems to be circulating on Twitter and elsewhere is that this year’s AGBT conference on Marco Island next week doesn’t look like it will have any excitement around new platforms. AGBT has been a traditional coming out party for platforms. Last year it was Oxford Nanopore which created a huge buzz, and in previous years that crown has been held by Ion Torrent, Pacific Biosciences, Complete Genomics and others (including a few which seem to have gone kaput). It is hard to argue that this year’s program is much more heavily tilted towards applications of genomics than novel genomic technologies. M...
Source: Omics! Omics! - February 18, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Basic R: rows that contain the maximum value of a variable
File under “I keep forgetting how to do this basic, frequently-required task, so I’m writing it down here.” Let’s create a data frame which contains five variables, vars, named A – E, each of which appears twice, along with some measurements:
    df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))
    df.orig
    #    vars obs1 obs2
    # 1     A    1   11
    # 2     B    2   12
    # 3     C    3   13
    # 4     D    4   14
    # 5     E    5   15
    # 6     A    6   16
    # 7     B    7   17
    # 8     C    8   18
    # 9     D    9   19
    # 10    E   10   20
Now, let’s say we want only the ro...
Source: What You're Doing Is Rather Desperate - February 13, 2013 Category: Bioinformaticians Authors: nsaunders Tags: programming research diary statistics Source Type: blogs
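
The excerpt is cut off before the author's solution, so what follows is only one possible base-R sketch, not necessarily the approach taken in the post. It assumes the goal is to keep, for each level of vars, the row(s) where obs1 is largest. The names `is.max` and `df.max` are made up for this illustration.

```r
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))

# ave() returns, for every row, the maximum of obs1 within that row's vars group
group.max <- ave(df.orig$obs1, df.orig$vars, FUN = max)

# Keep the rows whose obs1 equals their group's maximum (ties are all kept)
is.max <- df.orig$obs1 == group.max
df.max <- df.orig[is.max, ]
df.max
```

With this data each letter's larger obs1 value sits in the second half of the data frame, so rows 6 to 10 are retained.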

Genes x Samples: please explain
One of my bioinformatics pet peeves involves statements like this one, from the CNAmet user guide: “Inputs to CNAmet are three m x n matrices, where m is the number of genes and n the number samples.” What we’re looking at here is the hot, but poorly-defined topic of data integration, in which biological measurements from two or more different platforms are somehow combined in a way that provides more information than each platform separately. Read any paper on this topic, download the software and you’ll find example datasets containing two or more matched matrices, with rows where measurements have been summar...
Source: What You're Doing Is Rather Desperate - February 12, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics research diary Source Type: blogs

Counting the reads in a BAM file using SGE and the Open-MPI library: my notebook.
In the current post, I'll describe my first Open MPI program. Open MPI is "a Message Passing Interface (MPI) library, a standardized and portable message-passing system to function on a wide variety of parallel computers". My C program takes a list of BAMs, distributes some jobs on the SGE (SUN/Oracle Grid Engine) to count the number of reads and returns the results to a master process.
Source: YOKOFAKUN - February 11, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

A tool to compare the BAMs
Following that thread on Biostar, I've created a tool that compares two or more BAMs. This Java program uses the Picard and berkeleydb-JE libraries and is available at: http://code.google.com/p/jvarkit/wiki/CompareBams. Download & install: see http://code.google.com/p/jvarkit/wiki/CompareBams. Example: the following Makefile aligns the same pair of FASTQs with 5 different parameters for bwa aln -O (...
Source: YOKOFAKUN - February 7, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Lots of “open goodness” in the AU/NZ region
January/February are exciting months for open [data|research|science|access] proponents in our region – by which I mean Australia and New Zealand. First, we’ve enjoyed a speaking tour by Sir Tim Berners-Lee, during which he discussed the benefits of open data several times. I was able to attend two events in Sydney in person and a third, linux.conf.au, by video stream. The events were the work of many people but in particular, Pia Waugh. Go follow her on Twitter, now. Next – I wish I had been able to get to this one – the Open Research Conference on February 6-7, University of Auckland. I’m en...
Source: What You're Doing Is Rather Desperate - February 6, 2013 Category: Bioinformaticians Authors: nsaunders Tags: australia australian news open access open science new zealand open data tim berners-lee Source Type: blogs

It’s #overlyhonestmethods come to life!
Retraction Watch reports a study of microarray data sharing. The article, published in Clinical Chemistry, is itself behind a paywall despite trumpeting the virtues of open data. So straight to the Open Access Irony Award group at CiteULike it goes. I was not surprised to learn that the rate of public deposition of data is low, nor that most deposited data ignores standards and much of it is low quality. What did catch my eye though, was a retraction notice for one of the articles from the study, in which the authors explain the reason for retraction. Two phrases in particular stand out: we discovered an error in the da...
Source: What You're Doing Is Rather Desperate - January 30, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics publications statistics microarray reproducibility retraction Source Type: blogs

Samtools tview as a library to display the BAM
I've forked samtools and modified the code of tview to use it as a library to display the alignments. The original code is an interactive interface using the ncurses library. I've modified the original code and changed the structure of the C 'struct tview' with a few callbacks to make it more object-oriented: (...) typedef struct AbstractTview { int mrow, mcol; (...) khash_t(kh_rg) * ...
Source: YOKOFAKUN - January 25, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Vector Representation in Structural Biology
One of the questions that vexes computational structural biologists when writing analysis software is whether to represent spatial vectors as an object with x, y, z components, or as a numeric array of floats with 3 members. After turning over this problem in my head for the last decade, and writing vector classes in several languages, I can tell you now: go with the array of floats. It might seem that you can write cleaner code with named components: x, y, z. After all, you’re probably going to be translating algorithms from math or physics books, where x, y, z are used. However, the x, y, z designations are...
Source: Trapped in the USA - January 21, 2013 Category: Bioinformaticians Authors: bosco Source Type: blogs
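
To make the "array of floats" recommendation concrete, here is a small sketch in which a spatial vector is a plain numeric vector of length 3. The post does not name a language or show code at this point, so R is used only for consistency with the other snippets on this page. The helper names (`dot`, `vec_length`, `cross`) and the x, y, z ordering are assumptions for illustration.

```r
# A spatial vector as a plain numeric array of 3 members (assumed order: x, y, z)
a <- c(1, 0, 0)
b <- c(0, 1, 0)

# Whole-array operations need no per-component code
dot <- function(u, v) sum(u * v)
vec_length <- function(u) sqrt(dot(u, u))

# Cross product written with index permutations instead of named components
cross <- function(u, v) {
  i <- c(2, 3, 1)
  j <- c(3, 1, 2)
  u[i] * v[j] - u[j] * v[i]
}

cross(a, b)
```

Because every helper works on indices, the same code extends to matrices of coordinates or to loops over components, which is harder to do when each component is a separately named field.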

Batch script for running MaxQuant command line tool
I had to process multiple Orbitrap raw files using the same parameters with MaxQuant and was not able to find a simple tool for this, so I wrote a batch script which can be downloaded from https://docs.google.com/file/d/0BxbjZeVL8S4EQW1zbVd4TzRYSDg/edit . It needs a preconfigured parameter file to begin with. For that, I generally open MaxQuant with a dummy file called TestFile.raw and set the desired parameters and FASTA file for the search through its GUI, then save this as testpar.xml in the data directory using the File->Save Parameters option of the MaxQuant GUI. The script needs to be modified for following va...
Source: Bioinformatics Latest News - January 16, 2013 Category: Bioinformaticians Authors: Animesh Sharma Source Type: blogs

Urinal-based Science Supervision
In a recent email conversation with Baker lab postdoc Jeremy Mills, I stumbled onto an important issue regarding supervision in science and ceramic bowls. It started with me recounting an incident in the early noughties. I was then in a lab that ran out of money and unfortunately, I was the first to be asked to leave. I spent the next few months chasing around like a headless chicken for interviews on the west and east coasts of the United States. On the off-chance, I even shot off an email to a professor down the corridor. I managed to get some interviews, but nothing much came out of it. Then, one day, in my building on...
Source: Trapped in the USA - January 15, 2013 Category: Bioinformaticians Authors: bosco Source Type: blogs

A Short(ened) Note on Ion Torrent & High G+C
As one might guess from reading this space, I always have an itch to try new sequencing technologies or updates to existing ones. That's generally a good thing in my position, though more than a few times I experience buyer's remorse. At least this time, I found something a bit interesting...
Source: Omics! Omics! - January 14, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

The future of science publishing from 1996
Floating by in the Twitter stream, this from @leonidkruglyak. It leads to a light-hearted opinion(ated) piece by Sydney Brenner in Current Biology, 1996. In 1996, you may recall, the Web was just a few years old. Amusingly (sadly?), it seems that Brenner predicted many of the topics in science publishing that we’re still discussing in 2013. It’s just that he thought they would be implemented in no time at all. For example, open refereeing: “It is incidents such as this that have led me to question whether the anonymity of referees needs to be guarded so closely.” Self-publishing/archiving and post-publication...
Source: What You're Doing Is Rather Desperate - January 10, 2013 Category: Bioinformaticians Authors: nsaunders Tags: publications altmetrics history publishing sydney brenner www Source Type: blogs