Creating a custom GATK Walker (GATK 3.6) : my notebook
This is my notebook for creating a custom engine in GATK.
Description: I want to read a VCF file and to get a table of category/count. Something like this:
HAVE_ID  TYPE   COUNT
YES      SNP    123
NO       SNP    3
NO       INDEL  13
Class Category: I create a class Category describing each row in the table. It's just a List of Strings: static class Category implements Comparable { (Source: YOKOFAKUN)
Source: YOKOFAKUN - January 14, 2017 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
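
The excerpt above describes a Category as a comparable list of strings that keys a category/count table. A minimal sketch of that idea in plain Java, without any GATK dependency, might look like the following. The outer class name CategoryCount, the count helper, and the sample rows are hypothetical and only illustrate the data structure; this is not the code from the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryCount {
    /** One row key of the table: an ordered list of labels, e.g. [YES, SNP]. */
    static class Category implements Comparable<Category> {
        final List<String> labels;

        Category(String... labels) {
            this.labels = new ArrayList<>(Arrays.asList(labels));
        }

        /** Compares label by label, then by number of labels. */
        @Override
        public int compareTo(Category other) {
            int n = Math.min(labels.size(), other.labels.size());
            for (int i = 0; i < n; i++) {
                int d = labels.get(i).compareTo(other.labels.get(i));
                if (d != 0) return d;
            }
            return Integer.compare(labels.size(), other.labels.size());
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Category && labels.equals(((Category) o).labels);
        }

        @Override
        public int hashCode() {
            return labels.hashCode();
        }

        @Override
        public String toString() {
            return String.join("\t", labels);
        }
    }

    /** Counts how many times each category was observed, sorted by category. */
    static Map<Category, Long> count(List<Category> observed) {
        Map<Category, Long> counts = new TreeMap<>();
        for (Category c : observed) {
            counts.merge(c, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical observations; a real walker would derive these from VCF records.
        List<Category> observed = Arrays.asList(
                new Category("YES", "SNP"),
                new Category("NO", "INDEL"),
                new Category("YES", "SNP"));
        System.out.println("HAVE_ID\tTYPE\tCOUNT");
        count(observed).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Using a TreeMap keeps the rows sorted through compareTo, which is one plausible reason for making the key Comparable.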

Hello WDL ( Workflow Description Language )
This is a quick note about my first WDL workflow (Workflow Description Language) https://software.broadinstitute.org/wdl/. As a Makefile, my workflow would be the following one:
NAME?=world
$(NAME)_sed.txt : $(NAME).txt
	sed 's/Hello/Goodbye/' $< > $@
$(NAME).txt:
	echo "Hello $(NAME)"> $@
Executed as:
$ make NAME=WORLD
echo "Hello WORLD"> WORLD.txt
sed 's/Hello/Goodbye/' WORLD.txt> (Source: YOKOFAKUN)
Source: YOKOFAKUN - October 26, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Writing a Custom ReadFilter for the GATK, my notebook.
The GATK contains a set of predefined read filters that "filter or transfer incoming SAM/BAM data files": BadCigar, BadMate, CountingRead, DuplicateRead, FailsVendorQualityCheck, LibraryRead, MalformedRead, MappingQuality, MappingQualityUnavailable (...). With the help of the modular architecture of the GATK, it's possible to write a custom ReadFilter. In this post I'll write a ReadFilter that removes the (Source: YOKOFAKUN)
Source: YOKOFAKUN - September 21, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Playing with #magicblast, the #NCBI Short read mapper. My notebook
NCBI MAGIC Blast was recently mentioned by BioMickWatson on twitter: "Looks pretty cool. Perhaps once again the answer to all bfx questions will be BLAST RE https://t.co/4D5e9QQnrb pic.twitter.com/bwW3y0yl2n" - Mick Watson (@BioMickWatson) September 9, 2016. Here, I'll be playing with magicblast and I'll compare its output with bwa (Makefile below). First, here is an extract of the manual for (Source: YOKOFAKUN)
Source: YOKOFAKUN - September 8, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

pubmed: extracting the 1st authors' gender and location who published in the Bioinformatics journal.
In this post I'll get some statistics about the 1st authors in the "Bioinformatics" journal from pubmed. I'll extract their genders and locations. I'll use some tools I've already described some years ago but I've re-written them. Downloading the data: To download the papers published in Bioinformatics, the pubmed/entrez query is '"Bioinformatics"[jour]'. I use pubmeddump to download all those (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 27, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

pubmed: extracting the 1st authors' gender and location who published in the Bioinformatics journal.
In this post I'll get some statistics about the 1st authors in the "Bioinformatics" journal from pubmed. I'll extract their genders and locations. I'll use some tools I've already described some years ago but I've re-written them. Downloading the data: To download the papers published in Bioinformatics, the pubmed/entrez query is '"Bioinformatics"[jour]'. I use pubmeddump to download all those (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 26, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Playing with the @ORCID_Org / @ncbi_pubmed graph. My notebook.
"ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized." I've recently discovered that pubmed now integrates ORCID identifiers. and so it begins ! :-D @ (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 20, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

finding new intron-exon junctions using the public Encode RNASeq data
I've been asked to look for some new / suspected / previously uncharacterized intron-exon junctions in public RNASeq data. I've used the BAMs under http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/. The following command is used to build the list of BAMs: curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/" |\ tr ' "' "\n" | (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 16, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Reading a VCF file faster with java 8, htsjdk and java.util.stream.Stream
java 8 streams "support functional-style operations on streams of elements, such as map-reduce transformations on collections". In this post, I will show how I've implemented a java.util.stream.Stream of VCF variants that counts the number of items in dbsnp. This example uses the java htsjdk API for reading variants. When using parallel streams, the main idea is to implement a java.util.Spliterator (Source: YOKOFAKUN)
Source: YOKOFAKUN - March 3, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
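
The idea of exposing VCF records as a java.util.stream.Stream through a Spliterator can be sketched as follows. This is a simplified illustration under stated assumptions, not the post's implementation: it reads raw text lines instead of using htsjdk, the class and method names (VcfLineStream, VcfLineSpliterator, variants, countVariants) are hypothetical, and it relies on the default batching of Spliterators.AbstractSpliterator rather than a custom splitting strategy over an indexed file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class VcfLineStream {
    /** Spliterator yielding the non-header, non-empty lines of a VCF text. */
    static class VcfLineSpliterator extends Spliterators.AbstractSpliterator<String> {
        private final BufferedReader in;

        VcfLineSpliterator(BufferedReader in) {
            // Size is unknown, so report Long.MAX_VALUE and do not claim SIZED.
            super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
            this.in = in;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    // Skip meta-information and header lines, which start with '#'.
                    if (line.isEmpty() || line.startsWith("#")) continue;
                    action.accept(line);
                    return true;
                }
                return false; // end of input
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Wraps the reader in a stream of variant lines. */
    static Stream<String> variants(BufferedReader in, boolean parallel) {
        return StreamSupport.stream(new VcfLineSpliterator(in), parallel);
    }

    /** Counts the variant lines of a VCF held in memory as text. */
    static long countVariants(String vcfText) {
        return variants(new BufferedReader(new StringReader(vcfText)), false).count();
    }

    public static void main(String[] args) {
        // Tiny hypothetical VCF with two variant records.
        String vcf = "##fileformat=VCFv4.2\n"
                + "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
                + "1\t100\trs1\tA\tT\t.\t.\t.\n"
                + "1\t200\t.\tG\tC\t.\t.\t.\n";
        System.out.println(countVariants(vcf));
    }
}
```

With a real VCF reader, tryAdvance would hand out parsed variant objects instead of strings; the stream pipeline on top would stay the same.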

Now in picard: two javascript-based tools filtering BAM and VCF files.
SamJS and VCFFilterJS are two tools I wrote for jvarkit. Both tools use the embedded java javascript engine to filter BAM or VCF files. To get a broader audience, I've copied those functionalities to Picard in 'FilterSamReads' and 'FilterVcf'. FilterSamReads: FilterSamReads filters a SAM or BAM file with a javascript expression using the java javascript-engine. The script puts the following (Source: YOKOFAKUN)
Source: YOKOFAKUN - March 3, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Registering a tool in the @ELIXIREurope registry using XML, XSLT, JSON and curl. My notebook.
The Elixir Registry / pmid:26538599 "A portal to bioinformatics resources world-wide. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools." In this post, I will describe how I've used the bio.tools API to register (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 23, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Happy birthday my blog. You are now ten years old.
Happy birthday my blog. You are now 10 years old. (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 4, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

GATK-UI : a java-swing interface for the Genome Analysis Toolkit.
I've just pushed GATK-UI, a java swing interface for the Genome Analysis Toolkit GATK at https://github.com/lindenb/gatk-ui. This tool is also available as a WebStart/JNLP application. Screenshot. Why did you create this tool? Some non-bioinformatician collaborators often want some coverage data for a defined set of BAM, for a specific region... Did you test every tool? NO. How did you create an (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 2, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

Playing with #Docker , my notebook
This post is my notebook about docker after we had a very nice introduction to docker by François Moreews (INRIA/IRISA, Rennes). I've used docker today for the first time; my aim was just to create an image containing https://github.com/lindenb/verticalize, a small tool I wrote to verticalize text files. Install docker: you hate running this kind of command-line, don't you? $ wget -qO- (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 12, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs

A BLAST to SAM converter.
Some time ago, I received a set of Ion-Torrent /mate-reads of poor quality. I wasn't able to align much using bwa. I've always wondered if I could get better alignments using NCBI-BLASTN (short answer: no). That's why I asked guyduche, my internship student, to write a C program to convert the output of blastn to SAM. His code is available on github at: https://github.com/guyduche/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - June 28, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs