Tool tip: dropbox-restore
I’m currently rather sleep-deprived and prone to doing stupid things. Like this, for example: rsync -av ~/Dropbox /path/to/backup/directory/ where the directory /path/to/backup/directory already contains a much-older Dropbox directory. So when I set up a new machine, install Dropbox and copy the Dropbox directory back to its default location – hey! What happened to all my space? What are all these old files? Oh wait…I forgot to delete: rsync -av --delete ~/Dropbox /path/to/backup/directory/ Now, files can be restored of course, but not when there are thousands of them and I don’t even know what...
Source: What You're Doing Is Rather Desperate - July 23, 2014 Category: Bioinformaticians Authors: nsaunders Tags: computing research diary software dropbox github Source Type: blogs
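The dropbox-restore mishap comes down to files that exist in the backup copy but no longer exist in the source, which is exactly what rsync --delete would remove. A minimal Python sketch, assuming both trees are local directories, that lists those stale files so they can be reviewed before anything is deleted (the function name stale_files is hypothetical and not part of the dropbox-restore tool):

```python
import os
from pathlib import Path


def stale_files(source: str, backup: str) -> list[str]:
    """Return file paths (relative to backup) that exist in backup but not in source.

    These are the files that `rsync --delete` would remove from the backup copy.
    Directories are not reported separately; only regular files are listed.
    """
    src_root, bak_root = Path(source), Path(backup)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(bak_root):
        for name in filenames:
            rel = (Path(dirpath) / name).relative_to(bak_root)
            if not (src_root / rel).exists():
                stale.append(rel.as_posix())
    return sorted(stale)
```

Because the source in the rsync command above has no trailing slash, rsync copies it into a Dropbox subdirectory of the destination, so the call to compare would be stale_files(os.path.expanduser("~/Dropbox"), "/path/to/backup/directory/Dropbox").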

My own “404 not found”: making amends using Github
Conclusion I’m now in a position to run this analysis at regular intervals – probably monthly – and push the results to Github. Watch this space for any interesting developments. Making this kind of pipeline available and reproducible by others is less easy. If you have access to a machine with all the prerequisites, clone the repository, replace the symlinks to database directories with real directories, change to the directory code/ruby and run the rake tasks – well, you might be able to do the same thing. But you’d probably rather sequence the human microbiome.
Source: What You're Doing Is Rather Desperate - July 21, 2014 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics research diary ruby archaea database est github Source Type: blogs

Converting a spreadsheet of SMILES: my first OSM contribution
I’ve long admired the work of the Open Source Malaria Project. Unfortunately time and “day job” constraints prevent me from being as involved as I’d like. So: I was happy to make a small contribution recently in response to this request for help: “Can anyone help @O_S_M to convert this spreadsheet (malaria.ourexperiment.org/biological_dat…) into chemical structures with data? #openscience #realtimechem” (Alice Williamson, @all_isee, June 24, 2014). Note: this all works fine under Linux; there seem to be some issues with Open Babel library files under OSX. First step: make that data usab...
Source: What You're Doing Is Rather Desperate - July 1, 2014 Category: Bioinformaticians Authors: nsaunders Tags: open science programming statistics cheminformatics conversion malaria osm smiles Source Type: blogs
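The OSM request boils down to turning spreadsheet rows into something a cheminformatics toolkit can read. One hedged way to start, assuming the sheet has been exported to CSV with a SMILES column and an identifier column (the column names smiles and id below are hypothetical, not taken from the OSM spreadsheet), is to write a SMILES file with one "SMILES name" pair per line, a format Open Babel reads directly:

```python
import csv
import io


def csv_to_smi(csv_text: str, smiles_col: str = "smiles", id_col: str = "id") -> str:
    """Convert CSV rows into SMILES-file lines of the form '<SMILES>\t<name>'.

    Rows with an empty SMILES cell are skipped rather than written as blank lines.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        smiles = (row.get(smiles_col) or "").strip()
        if not smiles:
            continue
        name = (row.get(id_col) or "").strip()
        lines.append(f"{smiles}\t{name}" if name else smiles)
    return "\n".join(lines) + "\n" if lines else ""
```

After writing the result to compounds.smi, a command such as obabel compounds.smi -O compounds.sdf should produce an SD file; given the OSX library issues mentioned above, Linux is the safer place to run that step.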

The good, bad & missing from Bio* libraries?
As I mentioned recently, I've been exploring how I might use the emerging Julia language to solve problems.  While that requires a large amount of mental work, I see some potential gains, both in having more readable code than Perl as well as to potentially leverage a lot of high-level concepts for parallel execution that are built into the language.  But beyond the challenge of elderly canine pedagogy that I present, there is the issue that the BioJulia library is quite embryonic, with serious consideration of treating much of the existing code base as a first draft (or, that is the impression I get from skimming th...
Source: Omics! Omics! - June 30, 2014 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

utils4bioinformatics: all those “little snippets” in one place
Over the years, I’ve written a lot of small “utility scripts”. You know the kind of thing. Little code snippets that facilitate research, rather than generate research results. For example: just what are the fields that you can use to qualify Entrez database searches? Typically, they end up languishing in long-forgotten Dropbox directories. Sometimes, the output gets shared as a public link. No longer! As of today, “little code snippets that do (hopefully) useful things” have a new home at Github. Also as of today: there’s not much there right now, just the aforementioned Entrez database...
Source: What You're Doing Is Rather Desperate - June 23, 2014 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics research diary software code github repository Source Type: blogs
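The question "what are the fields that you can use to qualify Entrez database searches?" is answered by the E-utilities EInfo service: a request to einfo.fcgi?db=<database> returns XML whose FieldList describes each search field. A minimal Python sketch that parses such a response; it takes the XML as a string so it can be tried offline, and fetching the URL (for example with urllib.request.urlopen) is left out:

```python
import xml.etree.ElementTree as ET

# EInfo endpoint; fill in the database name, e.g. EINFO_URL.format(db="pubmed").
EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db={db}"


def search_fields(einfo_xml: str) -> dict[str, str]:
    """Map each search-field tag (e.g. 'TIAB') to its full name.

    Only <Field> elements are read; the <Link> entries under LinkList are ignored.
    """
    root = ET.fromstring(einfo_xml)
    fields = {}
    for field in root.iter("Field"):
        name = field.findtext("Name")
        if name:
            fields[name] = field.findtext("FullName") or ""
    return fields
```

The tags returned are the ones used in square brackets when qualifying a query term, as in malaria[TIAB].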

Dabbling with Julia
As I've remarked before, I've done significant coding in a large number of languages over the last 35-or-so years.  I don't consider myself a computer language savant; I've known folks who can pick up new languages quickly and switch between them facilely, but for me it is more difficult.  I haven't tried learning a new language in perhaps 5 years, but this week I backed into one... (Source: Omics! Omics!)
Source: Omics! Omics! - June 3, 2014 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Tableau & Quantified Self
Tableau has a contest going on for visualizing Quantified Self data. I happen to have a fair amount of such data from all kinds of sources on zenobase.com, so I decided to give it a shot. I was curious about Tableau’s geo features, and how well Tableau would handle slightly larger data sets. So I chose some outdoor/fitness activity data (which includes coordinates), and some hour-resolution energy expenditure data (~10K records). Getting the data into Tableau was straightforward, as Zenobase exports data in CSV format. Tableau didn’t detect any data types (timestamps, coordinates, or even numbers), but that...
Source: eric.jain.name - May 24, 2014 Category: Bioinformaticians, Information Technology Authors: Eric Jain Tags: Quantified Self Source Type: blogs
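Tableau not detecting any types in the Zenobase export is a reminder that CSV carries no type information: every value is text until something guesses otherwise. A rough Python sketch of that guessing step, assuming ISO 8601 timestamps (the excerpt does not say which timestamp format Zenobase uses) and hypothetical column names, classifies each column as number, timestamp or text:

```python
import csv
import io
from datetime import datetime


def _kind(value: str) -> str:
    """Classify a single cell as 'number', 'timestamp' or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)  # assumes ISO 8601 timestamps
        return "timestamp"
    except ValueError:
        return "text"


def infer_column_types(csv_text: str) -> dict[str, str]:
    """Return the kind shared by every non-empty cell of each column, else 'text'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for column in reader.fieldnames or []:
        kinds = {_kind(row[column]) for row in rows if row.get(column)}
        types[column] = kinds.pop() if len(kinds) == 1 else "text"
    return types
```

A column is only typed when every non-empty cell agrees, so one stray value pushes it back to text; that strictness is a design choice of this sketch, not a description of Tableau's behaviour.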

A nodejs-based REST server for the UCSC @GenomeBrowser
Node.js provides a simple mechanism to write a REST server. As an exercise, I wrote a REST server for the MySQL server of the UCSC genome browser. The code is available on GitHub at: https://github.com/lindenb/bionode Starting the server: $ cd bionode, then $ node ucsc/ucsc.js, which prints "Server running at http://localhost:8080/". METHOD: /schema/databases lists the available databases, e.g.: http://localhost:8080/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 20, 2014 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs
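The bionode server maps URL paths such as /schema/databases onto queries against UCSC's MySQL server. The same shape can be sketched with Python's standard library. This is not a port of the Node.js code: the database list is a hard-coded, hypothetical stand-in for what a real query against the UCSC server would return, and only the one route named in the excerpt is implemented:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the result of querying the UCSC MySQL server.
DATABASES = ["hg19", "hg38", "mm10"]


def route(path: str) -> tuple[int, dict]:
    """Resolve a request path to (HTTP status, JSON-serialisable body)."""
    if path.rstrip("/") == "/schema/databases":
        return 200, {"databases": DATABASES}
    return 404, {"error": f"no such resource: {path}"}


class RestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Serve on localhost until interrupted, matching the port used above."""
    HTTPServer(("localhost", port), RestHandler).serve_forever()
```

Keeping the routing in a plain function separate from the HTTP handler means the path-to-resource mapping can be checked without starting a server.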

How I start a bioinformatics project
Phil Ashton tweeted a link to a paper about how to set up a bioinformatics project file hierarchy: "A Quick Guide to Organizing Computational Biology Projects". Nick Loman posted his version yesterday, "How I start a bioinformatics project", at http://nickloman.github.io/2014/05/14/how-i-start-a-bioinformatics-project/. Here is mine (simplified): I start by creating a directory managed by git (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 15, 2014 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs
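The first step described above, a git-managed project directory, can be scripted. A minimal Python sketch, assuming a layout roughly following the top-level directories suggested in the Quick Guide paper (bin, data, doc, results, src); the excerpt does not show the rest of the author's own layout, so SUBDIRS is an assumption:

```python
import shutil
import subprocess
from pathlib import Path

# Assumed layout, roughly following the Quick Guide paper; adjust to taste.
SUBDIRS = ["bin", "data", "doc", "results", "src"]


def start_project(root: str) -> Path:
    """Create the project skeleton and, if git is installed, initialise a repository."""
    project = Path(root)
    for sub in SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    if shutil.which("git"):
        # `git init` is safe to re-run; it reinitialises an existing repository.
        subprocess.run(["git", "init", "--quiet", str(project)], check=True)
    return project
```

Note that git does not track empty directories, so the subdirectories only appear in the repository once they contain a file.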

This is why code written by scientists gets ugly
There’s a lot of discussion around why code written by self-taught “scientist programmers” rarely follows what a trained computer scientist would consider “best practice”. Here’s a recent post on the topic. One answer: we begin with exploratory data analysis and never get around to cleaning it up. An example. For some reason, a researcher (let’s call him “Bob”) becomes interested in a particular dataset in the GEO database. So Bob opens the R console and uses the GEOquery package to grab the data: Update: those of you commenting “should have used Python insteadR...
Source: What You're Doing Is Rather Desperate - May 14, 2014 Category: Bioinformaticians Authors: nsaunders Tags: programming research diary statistics Source Type: blogs

Generating wikipedia semantic links from a pubmed-id
In "Building a biomedical semantic network in Wikipedia with Semantic Wiki Links" (Database. 2012 Mar 20;2012), Benjamin Good et al. introduced the Semantic Wiki Link (SWL): An SWL is a hyperlink on Wikipedia that allows the editor to explicitly specify the type of relationship between the concept described on the page being edited and the concept that is being linked to (http://en.wikipedia.org/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 12, 2014 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

When is db=all not db=all? When you use Entrez ELink.
Just a brief technical note. I figured that for a given compound in PubChem, it would be interesting to know whether that compound had been used in a high-throughput experiment, which you might find in GEO. Very easy using the E-utilities, as implemented in the R package rentrez:
library(rentrez)
links <- entrez_link(dbfrom = "pccompound", db = "gds", id = "62857")
length(links$pccompound_gds)
# [1] 741
Browsing the rentrez documentation, I note that db can take the value “all”. Sounds useful!
links <- entrez_link(dbfrom = "pccompound", db = "all", id =...
Source: What You're Doing Is Rather Desperate - April 29, 2014 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics programming research diary api elink entrez ncbi rentrez Source Type: blogs
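Under the hood, rentrez's entrez_link calls the E-utilities ELink service. A minimal Python sketch that builds the equivalent request URLs and counts the links in a response, so that the db=gds and db=all answers can be compared directly; fetching is left out, and the helper names are illustrative rather than part of any library:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """Build an ELink request URL; several IDs are sent as one comma-separated value."""
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": ",".join(ids)})
    return f"{ELINK}?{query}"


def link_counts(elink_xml: str) -> dict[str, int]:
    """Count linked IDs per LinkName (e.g. 'pccompound_gds') in an ELink XML response."""
    root = ET.fromstring(elink_xml)
    return {
        linkset_db.findtext("LinkName"): len(linkset_db.findall("Link"))
        for linkset_db in root.iter("LinkSetDb")
    }
```

Fetching elink_url("pccompound", "gds", ["62857"]) and elink_url("pccompound", "all", ["62857"]) and passing each response to link_counts shows whether the pccompound_gds count is the same in both, which is the question the post's title raises.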

On the road: CSS and eResearch Conference 2014
Next week I’ll be in Melbourne for one of my favourite meetings, the annual Computational and Simulation Sciences and eResearch Conference. The main reason for my visit is the Bioinformatics FOAM workshop. Day 1 (March 27) is not advertised since it is an internal CSIRO day, but I’ll be presenting a talk titled “SQL, noSQL or no database at all? Are databases still a core skill?”. Day 2 (March 28) is open to all and I’ll be talking about “Learning from complete strangers: social networking for bioinformaticians”. I imagine these and other talks will appear on Slideshare soon, at bo...
Source: What You're Doing Is Rather Desperate - March 20, 2014 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics computing meetings travel conference foam melbourne Source Type: blogs

“Advance” access and DOIs: what’s the problem?
[Image: A DOI, this morning] When I arrive at work, the first task for the day is “check feeds”. If I’m lucky, in the “journal TOCs” category, there will be an abstract that looks interesting, like this one on the left. Sometimes, the title is a direct link to the article at the journal website. Often though, the link is a Digital Object Identifier or DOI. Frequently, when the article is labelled as “advance access” or “early”, clicking on the DOI link leads to a page like the one below on the right. [Image: DOI #fail] In the grand scheme of things I suppose th...
Source: What You're Doing Is Rather Desperate - March 9, 2014 Category: Bioinformaticians Authors: nsaunders Tags: doi journals publishing Source Type: blogs
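Whether a DOI resolves yet can be checked programmatically. As far as I know, the doi.org resolver answers a registered DOI with a redirect to the publisher and an unregistered one with HTTP 404, so a hedged check can stop at the resolver's own response instead of following the redirect (a publisher page that is itself broken would otherwise be mistaken for an unregistered DOI). The prefix list in normalise_doi is an assumption about common DOI link forms, and doi_resolves needs network access:

```python
import urllib.error
import urllib.request
from urllib.parse import quote

RESOLVER = "https://doi.org/"
# Assumed common ways a DOI appears in feeds; compared case-insensitively.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
            "http://dx.doi.org/", "doi:")


def normalise_doi(doi: str) -> str:
    """Strip a leading 'doi:' or resolver URL, leaving the bare '10.xxxx/...' form."""
    doi = doi.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):].strip()
    return doi


def doi_url(doi: str) -> str:
    """Resolver URL for a DOI, percent-encoding characters that are unsafe in a path."""
    return RESOLVER + quote(normalise_doi(doi), safe="/")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the resolver's own 3xx as an HTTPError instead of following it


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """True if doi.org redirects (or answers 2xx) rather than returning an error."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        return 300 <= err.code < 400  # a redirect means the DOI is registered
```

Running doi_resolves over the DOIs in a morning's feed would flag the “advance access” articles whose links still lead nowhere.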