Playing with the #GA4GH schemas and #Avro : my notebook
After watching David Haussler's talk "Beacon Project and Data Sharing APIs", I wanted to play with Avro and the models and APIs defined by the Global Alliance for Genomics and Health (GA4GH) coalition. Here is my notebook. (Wikipedia) Avro: "Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and (Source: YOKOFAKUN)
Source: YOKOFAKUN - June 17, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Monitoring a java application with mbeans. An example with samtools/htsjdk.
"A MBean is a Java object that follows the JMX specification. A MBean can represent a device, an application, or any resource that needs to be managed. The JConsole graphical user interface is a monitoring tool that complies to the JMX specification." In this post I'll show how I've modified the sources of the htsjdk library to monitor the Java program reading a VCF file from the ExAC server. (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 6, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Playing with hadoop/mapreduce and htsjdk/VCF : my notebook.
The aim of this test is to get a count of each type of variant/genotype in a VCF file using Apache Hadoop and htsjdk, the Java library for NGS. My source code is available at: https://github.com/lindenb/hadoop-sandbox/blob/master/src/main/java/com/github/lindenb/hadoop/Test.java. First, and this is my main problem, I needed to create a class 'VcfRow' that would contain the whole data about a (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 4, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
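
The counting idea in the excerpt above can be illustrated without Hadoop. This is only a minimal sketch of the map/reduce pattern in plain Python, not the author's Test.java: the function names are hypothetical, and the classification rule (single-base REF and ALT is a "SNP", anything else is "OTHER") is an assumed simplification.

```python
from collections import Counter
from itertools import chain


def map_variant(line):
    """Map step: emit one (variant_type, 1) pair per ALT allele of a VCF line."""
    if line.startswith("#"):
        return []  # header lines carry no variant
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    pairs = []
    for alt in alts:
        # Assumed rule: one base replaced by one base is a SNP, everything else is OTHER.
        is_snp = len(ref) == 1 and len(alt) == 1 and alt != "."
        pairs.append(("SNP" if is_snp else "OTHER", 1))
    return pairs


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_variant_types(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_variant(l) for l in lines))
```

In a real Hadoop job the map and reduce steps would run on different machines; here they are chained in one process to show the data flow only.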

Integrating a java program in #usegalaxy.
This is my notebook for the integration of java programs in https://usegalaxy.org/ . Create a directory for your tools under ${galaxy-root}/tools: mkdir ${galaxy-root}/tools/jvarkit . Put all the required jar files and the XML files describing your tools (see below) in this new directory: $ ls ${galaxy-root}/tools/jvarkit/ commons-jexl-2.1.1.jar groupbygene.jar htsjdk-1.128.jar vcffilterjs.jar (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 27, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Drawing a Manhattan plot in SVG using a GWAS+XML model.
On Friday, I saw my colleague @b_l_k starting to write SVG+XML code to draw a Manhattan plot. I told him that a better idea would be to describe the data using XML and to transform the XML to SVG using XSLT. So, let's do this. I put the XSLT stylesheet on github at https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/manhattan.xsl . And the model of data would look like this (I (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 21, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Automatic code generation for @knime with XSLT: An example with two nodes: fasta reader and writer.
KNIME is a java+eclipse-based graphical workflow-manager. Biologists in my lab often use this tool to filter VCFs or other tabular data. A software development kit (SDK) is provided to build new nodes. My main problem with this SDK is that you need to write a large number of similar files and you also have to interact with a graphical interface. I wanted to automate the generation of java (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 17, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Listing the 'Subject' Sequences in a BLAST database using the NCBI C++ toolbox. My notebook.
In my previous post (http://plindenbaum.blogspot.com/2015/01/filtering-fasta-sequences-using-ncbi-c.html) I've built an application filtering FASTA sequences using the NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). Here, I'm gonna write a tool listing the 'subject' sequences in a BLAST database. This new application ListBlastDatabaseContent takes only one argument '-db', the (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 2, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Listing the 'Subject' Sequences in a BLAST database using the NCBI C++ toolbox. My notebook.
In my previous post (http://plindenbaum.blogspot.com/2015/01/filtering-fasta-sequences-using-ncbi-c.html) I've built an application filtering FASTA sequences using the NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). Here, I'm gonna write a tool listing the 'subject' sequences in a BLAST database. This new application ListBlastDatabaseContent takes only one argument '-db', the (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 1, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Filtering Fasta Sequences using the #NCBI C++ API. My notebook.
In my previous post (http://plindenbaum.blogspot.com/2015/01/compiling-c-hello-world-program-using.html) I've built a simple "Hello World" application using the NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). Here, I'm gonna extend the code in order to create a program filtering FASTA sequences on their sizes. This new application FastaFilterSize needs three new arguments:' (Source: YOKOFAKUN)
Source: YOKOFAKUN - January 29, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Compiling a C++ 'Hello world' program using the #NCBI C++ toolbox: my notebook.
This post is my notebook for compiling a simple C++ application using the NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). This application prints 'Hello world' and takes two arguments: '-o' to specify the output filename (default is standard output) and '-n' to set the name to be printed (default: "Word !"). The code I used is the one contained in the distribution of blast (Source: YOKOFAKUN)
Source: YOKOFAKUN - January 29, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Compiling a C++ 'Hello world' program using the #NCBI C++ toolbox: my notebook.
This post is my notebook for compiling a simple C++ application using the NCBI C++ toolbox (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/). This application prints 'Hello world' and takes two arguments: '-o' to specify the output filename (default is standard output) and '-n' to set the name to be printed (default: "Word !"). The code I used is the one contained in the distribution of blast (Source: YOKOFAKUN)
Source: YOKOFAKUN - January 28, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Divide-and-conquer in a #Makefile : recursivity and #parallelism.
This post is my notebook about implementing a divide-and-conquer strategy in GNU make. Say you have a list of 'N' VCF files. You want to create a list of: common SNPs in vcf1 and vcf2; common SNPs in vcf3 and the previous list; common SNPs in vcf4 and the previous list; (...); common SNPs in vcfN and the previous list. Yes, I know I can do this using: grep -v '^#' f.vcf|cut -f 1,2,4,5 | sort | uniq (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 4, 2014 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
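
The divide-and-conquer strategy named in the excerpt above can be sketched outside of make. The following is a hedged illustration in Python, not the author's Makefile: the function names are hypothetical, each VCF is assumed to be given as a list of text lines, and a variant is identified by columns 1, 2, 4 and 5 (CHROM, POS, REF, ALT), mirroring the `cut -f 1,2,4,5` in the excerpt.

```python
def variant_keys(vcf_lines):
    """Return the set of (CHROM, POS, REF, ALT) keys found in one VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # same effect as grep -v '^#'
        f = line.rstrip("\n").split("\t")
        keys.add((f[0], f[1], f[3], f[4]))
    return keys


def common_variants(vcfs):
    """Divide and conquer: solve each half of the list, then intersect the results."""
    if not vcfs:
        raise ValueError("need at least one VCF")
    if len(vcfs) == 1:
        return variant_keys(vcfs[0])
    mid = len(vcfs) // 2
    # The two halves do not depend on each other, so they could run in parallel.
    return common_variants(vcfs[:mid]) & common_variants(vcfs[mid:])
```

Compared with intersecting vcf1, vcf2, ... vcfN one after the other, the recursive split produces independent sub-problems, which is the property a parallel build can exploit.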

XML+XSLT = #Makefile -based #workflows for #bioinformatics
I've recently read some conversations on Twitter about Makefile-based bioinformatics workflows. I've suggested on biostars.org (Standard simple format to describe a bioinformatics analysis pipeline) that an XML file could be used to describe a model of data and XSLT could transform this model to a Makefile-based workflow. I've already explored this idea in a previous post (Generating a pipeline of (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 3, 2014 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Visualizing @GenomeBrowser liftOver/chain files using animated #SVG
I wrote a tool to visualize some UCSC "chain/liftOver" files as an animated SVG file. This tool is available on github at: https://github.com/lindenb/jvarkit/wiki/LiftOverToSVG . "A liftOver file is a chain file, where for each region in the genome the alignments of the best/longest syntenic regions are used to translate features from one version of a genome to another." SVG Elements and CSS styles (Source: YOKOFAKUN)
Source: YOKOFAKUN - October 29, 2014 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
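
To make the quoted description of a chain file concrete, here is a minimal sketch of how a position is translated through one. It is unrelated to the LiftOverToSVG code: `lift_position` is a hypothetical name, positions are assumed to be 0-based, and only forward-strand chains are handled. It relies on the UCSC chain layout, where a `chain` header line is followed by `size dt dq` alignment blocks and a final `size` line.

```python
def lift_position(chain_text, chrom, pos):
    """Translate a 0-based position through the first chain covering it.

    Returns (new_chrom, new_pos), or None when the position falls in a gap
    or in no chain. Simplification: chains on the '-' strand are ignored.
    """
    lines = chain_text.strip().splitlines()
    i = 0
    while i < len(lines):
        header = lines[i].split()
        i += 1
        if not header or header[0] != "chain":
            continue  # skip blank separators between chains
        # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
        t_name, t_strand, t = header[2], header[4], int(header[5])
        q_name, q_strand, q = header[7], header[9], int(header[10])
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("chain"):
            block = [int(x) for x in lines[i].split()]
            i += 1
            size = block[0]
            forward = t_strand == "+" and q_strand == "+"
            if t_name == chrom and forward and t <= pos < t + size:
                return q_name, q + (pos - t)
            # Move past the aligned block, then past the gaps (dt, dq) if any.
            t += size
            q += size
            if len(block) == 3:
                t += block[1]
                q += block[2]
    return None
```

A position inside an aligned block keeps its offset within that block; a position inside a gap has no counterpart in the other assembly and is reported as unmapped.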

IGVFox: Integrative Genomics Viewer control through mozilla Firefox
I've just pushed IGVFox 0.1, an add-on for Firefox controlling IGV, the Integrative Genomics Viewer. This add-on allows users to set the genomic position of IGV by just clicking a hyperlink in an HTML page. The source code is available on github at https://github.com/lindenb/igvfox and a first release is available as a *.xpi file at https://github.com/lindenb/igvfox/releases. That's it, (Source: YOKOFAKUN)
Source: YOKOFAKUN - October 15, 2014 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs