How to analyze next-gen DNA uptake data

We want to understand why competent Haemophilus influenzae cells take up some parts of H. influenzae chromosomes more efficiently than others. To this end, before Christmas the grad student reisolated preparations of DNA fragments of chromosomal DNA from strain 26-028NP (hence 'NP') that had been taken up by competent cells of the standard lab strain Rd.  He sent these DNA samples to the former post-doc for sequencing (with the original 'input DNAs as controls).  The post-doc has now sent us the sequencing data, and the grad student is going to analyze this, with two main goals:Determine how a DNA fragment's probability of uptake is affected by the presence of sequences matching the uptake signal sequence ('USS') motif.Identify other sequence factors that influence uptake.The grad student has written up an overview of his plan for accomplishing these goals, and that has stimulated me to also think about how it could be done.He (or the former post-doc?) has already done the first step, scoring the degree of preferential uptake for every position in the genome.  I think this was done by comparing each genome position's coverage in the recovered-DNA dataset to its coverage in the control 'input' dataset.  This gives a score they call the 'uptake ratio'.Here's a graph made by the grad student, showing the uptake ratios for two different preps of chromosomal DNA, over a 13 kb segment of the 1830 kb H. influenzae Rd chromosome. The dark blue points are for a DNA...
Source: RRResearch - Category: Molecular Biology Authors: Source Type: blogs