PubMed Commons & Bioinformatics: a call for action
NCBI PubMed Commons (@PubMedCommons) is a new system that enables researchers to share their opinions about scientific publications. Researchers can comment on any publication indexed by PubMed and read the comments of others. Now that we can add comments to the papers in PubMed, I suggest flagging articles to mark deprecated software, databases and hyperlinks using a simple controlled ...
Source: YOKOFAKUN - October 24, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs

Inside the variation toolkit: Generating a structured document describing an Illumina directory.
I wrote a tool named "Illuminadir": it creates a structured (JSON or XML) representation of a directory containing Illumina FASTQs (I only tested it with HiSeq, paired-end data and indexes). Motivation: Illuminadir scans folders, searches for FASTQs and generates a structured summary of the files (XML or JSON). Currently only tested with HiSeq data having an index. Compilation. See also ...
Source: YOKOFAKUN - October 23, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs
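
The directory scan described in this entry can be sketched roughly as follows. This is not the Illuminadir implementation. It is a minimal illustration that assumes CASAVA-style file names of the form `<sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz`. The function names `summarize_fastq_dir` and `to_json` and the layout of the JSON output are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed naming scheme (not taken from the post):
#   <sample>_<index>_L<lane>_R<read>_<chunk>.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN-]+|NoIndex)"
    r"_L(?P<lane>\d{3})_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq(?:\.gz)?$"
)


def summarize_fastq_dir(root):
    """Group the FASTQ files found under `root` by sample name."""
    samples = {}
    for path in sorted(Path(root).rglob("*.fastq*")):
        match = FASTQ_NAME.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        files = samples.setdefault(match["sample"], [])
        files.append({
            "path": str(path),
            "index": match["index"],
            "lane": int(match["lane"]),
            "read": int(match["read"]),
            "chunk": int(match["chunk"]),
        })
    return {
        "directory": str(root),
        "samples": [
            {"sample": name, "files": samples[name]} for name in sorted(samples)
        ],
    }


def to_json(summary):
    """Serialize the summary as indented JSON text."""
    return json.dumps(summary, indent=2)
```

Calling `to_json(summarize_fastq_dir("/path/to/run"))` would give one JSON object per run directory, with paired-end files listed under their sample.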

PubMed Commons - the new science water-cooler
PubMed has decided to dip its toes into social activities by adding a commenting feature to its website (named PubMed Commons). It will start off in a closed pilot phase where you have to receive an invite in order to be able to comment but it should eventually be widely available. The implementation is simple and everything works as you would expect. Here is a screenshot with an example comment: As you would expect you get an option to add a comment, to edit or delete previous comments you have made and up-vote other comments. In future versions you will be able to reply to comments in a threaded discussion. The comm...
Source: Public Rambling - October 22, 2013 Category: Bioinformaticians Source Type: blogs

Ion Previews More Accurate Polymerase, Faster Template Prep
I haven't talked about Ion Torrent for a while, because it was largely off my radar screen. In early 2012 the PGM had been an important contributor to my early de novo genome assemblies, as it was the only fast turnaround, low cost system I could access. But the data quality was always frustrating, with many indels, and the 200 basepair mode on the read lengths not great for assembly. Once I could access a MiSeq, that became our dominant instrument for individual genome assembly. We tried Ion once more with the 300 basepair chemistry, but were not particularly impressed. ...
Source: Omics! Omics! - October 22, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Project management (online) tools
Discussions can be used as notebooks but they get mixed in with comments on any item such as a to-do list item. All projects can be downloaded for back-up but automation requires a 3rd party service or coding via the API. iOS app available and Android via 3rd party app. No free account (60 day trial), plans start at $20/month (10 projects, 3GB limit) up to £3000/year (unlimited projects, 500GB limit). Basecamp can be extended from a list of additional services (mostly 3rd party) and they tend to cost additional fees. Freedcamp: Project views with to-do lists, discussions, milestones, file attachments. Dashboard view with group ac...
Source: Public Rambling - October 21, 2013 Category: Bioinformaticians Source Type: blogs

Scientific Data - ultimate salami slicing publishing
Last week a new NPG journal called Scientific Data started accepting submissions. Although I discussed this new journal with colleagues a few times I realized that I never argued here why I think this is a very strange idea for a journal. So what is Scientific Data? In short it is a journal that publishes metadata for a dataset with data quality metrics. From the homepage: Scientific Data is a new open-access, online-only publication for descriptions of scientifically valuable datasets. It introduces a new type of content called the Data Descriptor designed to make your data more discoverable, interpretable and reusable...
Source: Public Rambling - October 19, 2013 Category: Bioinformaticians Tags: publishing Source Type: blogs

Ripples from 454's Shutdown Announcement
Roche's announcement this week that they planned to shut down the 454 sequencing business in mid-2016 was not completely unexpected, as a number of rumors of shutdown had shown up on Twitter. Most tweets on the subject fell into two categories: either just-the-facts-ma'am or jokes about the dominant error profile (which I guess you could call just the facts maaa'aaam). But I certainly wouldn't have thought Roche on the verge of this decision when I went to AGBT 2013 in February, as 454 had a huge suite in a prime location (just by the main conference hall entrance) and many expensive events. Now, Roche...
Source: Omics! Omics! - October 19, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Software Environment Management with Modules: my notebook
The following question was recently asked on Biostar: "Bioinformatics: how to version control small scripts located all over the server". I suggested putting the scripts in a central repository (under git or whatever) and using symbolic links in the workspaces to manage the files. On the other hand, Alex Reynolds suggested using a Module to deploy versions of a given package. http:// ...
Source: YOKOFAKUN - October 7, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs
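
The symbolic-link suggestion in this entry can be illustrated with a short sketch. It assumes the scripts live in one checked-out repository directory and that each workspace only needs links to them. The function name `link_scripts`, its parameters and the default `*.sh` pattern are hypothetical, not taken from the post.

```python
from pathlib import Path


def link_scripts(repo_dir, workspace_dir, pattern="*.sh"):
    """Create symbolic links in `workspace_dir` to matching scripts in `repo_dir`.

    Existing files or links in the workspace are left untouched.
    Returns the list of links that were created.
    """
    repo = Path(repo_dir).resolve()
    workspace = Path(workspace_dir)
    workspace.mkdir(parents=True, exist_ok=True)
    created = []
    for script in sorted(repo.glob(pattern)):
        link = workspace / script.name
        if link.is_symlink() or link.exists():
            continue  # never overwrite something already in the workspace
        link.symlink_to(script)
        created.append(link)
    return created
```

Because every workspace points at the same files, updating the central repository (for example with `git pull`) updates the scripts everywhere at once. That is also the limitation that a Module-based deployment addresses, since modules let several versions coexist.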

Roche Taps PacBio for Human Diagnostics
One of the two big buzzes in the genomics business world was the announcement that Roche Diagnostics has signed a major deal with Pacific Biosciences in the field of human diagnostics, which comes with a $35M upfront payment and a possible $45M in milestones, plus future sales of reagents. PacBio stock rocketed over 70% on this news. This came on the same day that cancer diagnostics company Foundation Medicine went public with a similarly potent climb from their offering price; a good day for those lucky enough to have the shares (which, by the way, does not include me in any way, though Foundation shares a common venture backe...
Source: Omics! Omics! - September 26, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Single-cell genomics: taking noise into account
Figure: Technical variation versus average read counts. Reprinted by permission from Macmillan Publishers Ltd: Nat Methods, advance online (doi:10.1038/nmeth.2645). Sequencing throughput and amplification strategies have improved to a point where single cell sequencing has become feasible. There was a recent review in Nat Rev Gen covering the progress in single cell genomics and some of its potential applications that is worth a read. However, the required amplification steps are likely to introduce significant variation for small amounts of starting material. A group of investigators from the EMBL-Heidelberg, EMBL-E...
Source: Public Rambling - September 23, 2013 Category: Bioinformaticians Source Type: blogs

Potential Sources of Drag on PacBio's Long Read Performance Trajectory
Over at Homolog.us there are two detailed blog entries on Pacific Biosciences entitled "End of Short Read Era?" (Part I and Part II). I've tweeted a number of comments on the technical aspects, but there are some more substantial thoughts reading these pieces helped me condense. ...
Source: Omics! Omics! - September 23, 2013 Category: Bioinformaticians Authors: Keith Robison Source Type: blogs

Web scraping using Mechanize: PMID to PMCID/NIHMSID
Web services are great. Pass them a URL. Structured data comes back. Parse it, analyse it, visualise it. Done. Web scraping – interacting programmatically with a web page – is not so great. It requires more code and when the web page changes, the code breaks. However, in the absence of a web service, scraping is better than nothing. It can even be rather satisfying. Early in my bioinformatics career the realisation that code, rather than humans, can automate the process of submitting forms and reading the results was quite a revelation. In this post: how to interact with a web page at the NCBI using the Mechani...
Source: What You're Doing Is Rather Desperate - September 17, 2013 Category: Bioinformaticians Authors: nsaunders Tags: programming ruby web resources how to mechanize ncbi web scraping Source Type: blogs

Microarrays, scan dates and Bioconductor: it shouldn’t be this difficult
When dealing with data from high-throughput experimental platforms such as microarrays, it’s important to account for potential batch effects. A simple example: if you process all your normal tissue samples this week and your cancerous tissue samples next week, you’re in big trouble. Differences between cancer and normal are now confounded with processing time and you may as well start over with new microarrays. Processing date is often a good surrogate for batch and it was once easy to extract dates from Affymetrix CEL files using Bioconductor. It seems that this is no longer the case. Once upon a time (about...
Source: What You're Doing Is Rather Desperate - August 22, 2013 Category: Bioinformaticians Authors: nsaunders Tags: bioinformatics research diary statistics affymetrix batch effect microarray Source Type: blogs
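
The confounding described in this entry (all normal samples in one batch, all cancer samples in another) can be detected with a simple check before any analysis. The sketch below assumes each sample is given as a `(group, batch)` pair, where the batch could be a scan date. The function name `batch_confounded` is hypothetical and the check only covers complete confounding, not partial imbalance.

```python
from collections import defaultdict


def batch_confounded(samples):
    """Return True when group and batch are completely confounded.

    `samples` is an iterable of (group, batch) pairs. The design is
    considered completely confounded when there is more than one group
    and every batch contains samples from exactly one group.
    """
    groups_by_batch = defaultdict(set)
    all_groups = set()
    for group, batch in samples:
        groups_by_batch[batch].add(group)
        all_groups.add(group)
    return len(all_groups) > 1 and all(
        len(groups) == 1 for groups in groups_by_batch.values()
    )
```

For example, `batch_confounded([("normal", "week1"), ("cancer", "week2")])` is true, while a design that processes both tissue types in each week is not flagged.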

Running a picard tool in the #KNIME workflow engine
http://www.knime.org/ is "a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting". In this post, I'll show how to invoke an external Java program, and more precisely a tool from the Picard library, from within KNIME. The workflow: load a list of BAM filenames, invoke ...
Source: YOKOFAKUN - July 18, 2013 Category: Bioinformaticians Authors: Pierre Lindenbaum Source Type: blogs