Creating a custom GATK Walker (GATK 3.6) : my notebook
This is my notebook for creating a custom walker in GATK.
Description
I want to read a VCF file and to get a table of category/count. Something like this:
HAVE_ID  TYPE   COUNT
YES      SNP    123
NO       SNP    3
NO       INDEL  13
Class Category
I create a class Category describing each row in the table. It's just a List of Strings
static class Category
implements Comparable
{ (Source: YOKOFAKUN)
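The excerpt stops before the body of the class, so the following is only a minimal sketch of the idea, not the post's code. It assumes plain tab-separated VCF lines instead of the GATK walker API, and the names CategoryTally, categorize and tally are hypothetical. The SNP/INDEL test is a deliberate simplification (all alleles one base long means SNP); in a real walker the variant object from htsjdk would normally supply the ID and type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tallies VCF data lines by (HAVE_ID, TYPE). */
class CategoryTally {

    /** One row key of the table: a list of labels compared label by label. */
    static final class Category implements Comparable<Category> {
        final List<String> labels;

        Category(String... labels) {
            this.labels = new ArrayList<>(Arrays.asList(labels));
        }

        @Override
        public int compareTo(Category other) {
            int n = Math.min(labels.size(), other.labels.size());
            for (int i = 0; i < n; i++) {
                int c = labels.get(i).compareTo(other.labels.get(i));
                if (c != 0) return c;
            }
            // Equal prefix: the shorter list sorts first.
            return Integer.compare(labels.size(), other.labels.size());
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Category && labels.equals(((Category) o).labels);
        }

        @Override
        public int hashCode() {
            return labels.hashCode();
        }

        @Override
        public String toString() {
            return String.join("\t", labels);
        }
    }

    /** Simplified rule (an assumption): SNP if REF and every ALT are one base, else INDEL. */
    static Category categorize(String vcfLine) {
        String[] fields = vcfLine.split("\t");
        String haveId = fields[2].equals(".") ? "NO" : "YES";
        boolean snp = fields[3].length() == 1;
        for (String alt : fields[4].split(",")) {
            if (alt.length() != 1) snp = false;
        }
        return new Category(haveId, snp ? "SNP" : "INDEL");
    }

    /** Counts data lines per category; header lines starting with '#' are skipped. */
    static Map<Category, Long> tally(List<String> lines) {
        Map<Category, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue;
            counts.merge(categorize(line), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
            "1\t100\trs1\tA\tG\t.\t.\t.",
            "1\t200\t.\tAT\tA\t.\t.\t.");
        System.out.println("HAVE_ID\tTYPE\tCOUNT");
        tally(lines).forEach((category, count) -> System.out.println(category + "\t" + count));
    }
}
```

Using a sorted map keyed by a Comparable Category keeps the rows in a stable order without a separate sorting step.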
Source: YOKOFAKUN - January 14, 2017 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Hello WDL ( Workflow Description Language )
This is a quick note about my first WDL workflow (Workflow Description Language) https://software.broadinstitute.org/wdl/.
As a Makefile, my workflow would be the following one:
NAME?=world
$(NAME)_sed.txt : $(NAME).txt
	sed 's/Hello/Goodbye/' $<> $@
$(NAME).txt:
	echo "Hello $(NAME)"> $@
Executed as:
$ make NAME=WORLD
echo "Hello WORLD"> WORLD.txt
sed 's/Hello/Goodbye/' WORLD.txt> (Source: YOKOFAKUN)
Source: YOKOFAKUN - October 26, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Writing a Custom ReadFilter for the GATK, my notebook.
The GATK contains a set of predefined read filters that "filter or transfer incoming SAM/BAM data files":
BadCigar
BadMate
CountingRead
DuplicateRead
FailsVendorQualityCheck
LibraryRead
MalformedRead
MappingQuality
MappingQualityUnavailable
(...)
With the help of the modular architecture of the GATK, it's possible to write a custom ReadFilter. In this post I'll write a ReadFilter that removes the (Source: YOKOFAKUN)
Source: YOKOFAKUN - September 21, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Playing with #magicblast, the #NCBI Short read mapper. My notebook
NCBI MAGIC Blast was recently mentioned by BioMickWatson on Twitter:
"Looks pretty cool. Perhaps once again the answer to all bfx questions will be BLAST RE https://t.co/4D5e9QQnrb pic.twitter.com/bwW3y0yl2n" - Mick Watson (@BioMickWatson) September 9, 2016
Here, I'll be playing with magicblast and I'll compare its output with bwa (Makefile below).
First, here is an extract of the manual for (Source: YOKOFAKUN)
Source: YOKOFAKUN - September 8, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
pubmed: extracting the 1st authors' gender and location who published in the Bioinformatics journal.
In this post I'll get some statistics about the 1st authors in the "Bioinformatics" journal from pubmed. I'll extract their genders and locations.
I'll use some tools I've already described some years ago but I've re-written them.
Downloading the data
To download the papers published in Bioinformatics, the pubmed/entrez query is '"Bioinformatics"[jour]'.
I use pubmeddump to download all those (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 27, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
pubmed: extracting the 1st authors' gender and location who published in the Bioinformatics journal.
In this post I'll get some statistics about the 1st authors in the "Bioinformatics" journal from pubmed. I'll extract their genders and locations.
I'll use some tools I've already described some years ago but I've re-written them.
Downloading the data
To download the papers published in Bioinformatics, the pubmed/entrez query is '"Bioinformatics"[jour]'.
I use pubmeddump to download all those (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 26, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Playing with the @ORCID_Org / @ncbi_pubmed graph. My notebook.
"ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized."
I've recently discovered that pubmed now integrates ORCID identifiers.
and so it begins ! :-D @ (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 20, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
finding new intron-exon junctions using the public Encode RNASeq data
I've been asked to look for some new / suspected / previously uncharacterized intron-exon junctions in public RNASeq data.
I've used the BAMs under http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/.
The following command is used to build the list of BAMs:
curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/" |\
tr ' "' "\n" | (Source: YOKOFAKUN)
Source: YOKOFAKUN - May 16, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Reading a VCF file faster with java 8, htsjdk and java.util.stream.Stream
Java 8 streams "support functional-style operations on streams of elements, such as map-reduce transformations on collections". In this post, I will show how I've implemented a java.util.stream.Stream of VCF variants that counts the number of items in dbsnp. This example uses the java htsjdk API for reading variants. When using parallel streams, the main idea is to implement a java.util.Spliterator (Source: YOKOFAKUN)
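The excerpt ends before showing the Spliterator, so here is a minimal, hedged sketch of the general technique only. It assumes the VCF lines are already in memory as a List and splits them by index, which is not necessarily how the post splits a real VCF file; the class name VcfLineSpliterator and the helper countVariants are hypothetical, and htsjdk is deliberately left out to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

/** Hypothetical sketch: a Spliterator over an in-memory list of VCF lines. */
class VcfLineSpliterator implements Spliterator<String> {
    private final List<String> lines;
    private int pos;        // next index to hand out
    private final int end;  // exclusive upper bound of this chunk

    VcfLineSpliterator(List<String> lines, int start, int end) {
        this.lines = lines;
        this.pos = start;
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (pos >= end) return false;
        action.accept(lines.get(pos++));
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        int remaining = end - pos;
        if (remaining < 2) return null;  // too small to split further
        int mid = pos + remaining / 2;
        // Hand the first half to a new spliterator and keep the second half here.
        Spliterator<String> prefix = new VcfLineSpliterator(lines, pos, mid);
        pos = mid;
        return prefix;
    }

    @Override
    public long estimateSize() {
        return end - pos;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED;
    }

    /** Counts non-header lines, sequentially or in parallel. */
    static long countVariants(List<String> lines, boolean parallel) {
        return StreamSupport
            .stream(new VcfLineSpliterator(lines, 0, lines.size()), parallel)
            .filter(line -> !line.startsWith("#"))
            .count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
            "1\t100\trs1\tA\tG\t.\t.\t.",
            "1\t200\trs2\tC\tT\t.\t.\t.");
        System.out.println(countVariants(lines, true));
    }
}
```

The parallel speed-up comes from trySplit: each call gives half of the remaining range to another worker, so the stream framework can process independent chunks concurrently.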
Source: YOKOFAKUN - March 3, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Now in picard: two javascript-based tools filtering BAM and VCF files.
SamJS and VCFFilterJS are two tools I wrote for jvarkit. Both tools use the embedded java javascript engine to filter BAM or VCF files.
To reach a broader audience, I've copied this functionality to Picard as 'FilterSamReads' and 'FilterVcf'.
FilterSamReads
FilterSamReads filters a SAM or BAM file with a javascript expression using the java javascript-engine.
The script puts the following (Source: YOKOFAKUN)
Source: YOKOFAKUN - March 3, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Registering a tool in the @ELIXIREurope registry using XML, XSLT, JSON and curl. My notebook.
The Elixir Registry / pmid:26538599 "A portal to bioinformatics resources world-wide. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools."
In this post, I will describe how I've used the bio.tools API to register (Source: YOKOFAKUN)
Source: YOKOFAKUN - February 23, 2016 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Happy birthday, my blog. You are now ten years old.
Happy birthday, my blog. You are now 10 years old. (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 4, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
GATK-UI : a java-swing interface for the Genome Analysis Toolkit.
I've just pushed GATK-UI, a java swing interface for the Genome Analysis Toolkit GATK at https://github.com/lindenb/gatk-ui. This tool is also available as a WebStart/JNLP application.
Screenshot
Why did you create this tool?
Some non-bioinformatician collaborators often want some coverage data for a defined set of BAM, for a specific region...
Did you test every tool?
NO
How did you create an (Source: YOKOFAKUN)
Source: YOKOFAKUN - December 2, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
Playing with #Docker , my notebook
This post is my notebook about docker, written after a very nice introduction to docker by François Moreews (INRIA/IRISA, Rennes). I've used docker today for the first time; my aim was just to create an image containing https://github.com/lindenb/verticalize, a small tool I wrote to verticalize text files.
Install docker
You hate running this kind of command line, don't you?
$ wget -qO- (Source: YOKOFAKUN)
Source: YOKOFAKUN - July 12, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs
A BLAST to SAM converter.
Some time ago, I received a set of Ion-Torrent /mate-reads with a poor quality. I wasn't able to align many of them using bwa. I've always wondered if I could get better alignments using NCBI-BLASTN (short answer: no). That's why I asked guyduche, my internship student, to write a C program to convert the output of blastn to SAM. His code is available on github at: https://github.com/guyduche/ (Source: YOKOFAKUN)
Source: YOKOFAKUN - June 28, 2015 Category: Bioinformatics Authors: Pierre Lindenbaum Source Type: blogs