A new kind of problem

The PhD student and I are analyzing the data from his mapping of H. influenzae's uptake of genomic DNA. The data was generated by Illumina sequencing of genomic DNA samples before and after they had been taken up by competent cells. Using a rec2 mutant as the competent cells lets us recover the DNA intact after uptake.

He has done quite a bit of analysis of the resulting uptake ratios (ratio of 'recovered' coverage to 'input' coverage) but now we're going back and sorting out various anomalies that we initially ignored or overlooked.

One big issue is the surprisingly uneven sequencing coverage of the genome that we see even when just sequencing sheared genomic DNA (the 'input' samples). The graph below shows the sequencing coverage of the first 10 kb of the genome of strain 'NP'. The orange dots are from a short-fragment DNA prep (sheared, most fragments between 50 and 500 bp) and the blue dots are from a large-fragment DNA prep (sheared, most fragments between 1-5 and 9 kb).

Over most of this segment (and over most of the ~1900 kb genome), coverage of the 'short' sample is about 200-400 reads, about twice as high as the ~150-250 read coverage of the 'long' sample. But in three places coverage of both samples falls to near-zero or zero. Similar losses of coverage occur at many places throughout the genome, though this segment has more than usual.

What could cause this? In principle it could be something about the DNA preparations, about the librar...
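
For readers who want to see the two quantities discussed above in concrete form, here is a minimal Python sketch. It is not the analysis pipeline used for this project. It assumes per-position coverage is already available as plain lists of read depths, and the function names, the `min_input` cutoff, and the dropout `threshold` are all hypothetical choices made for illustration. The first function computes the uptake ratio as described in the post (recovered coverage divided by input coverage), and the second lists the stretches where coverage falls to near-zero or zero.

```python
from typing import List, Optional, Tuple


def uptake_ratios(
    recovered: List[float],
    input_cov: List[float],
    min_input: float = 1.0,  # hypothetical cutoff, not taken from the post
) -> List[Optional[float]]:
    """Per-position ratio of 'recovered' coverage to 'input' coverage.

    Positions whose input coverage is below min_input get None, because a
    ratio with a near-zero denominator is not meaningful.
    """
    if len(recovered) != len(input_cov):
        raise ValueError("coverage lists must have the same length")
    return [
        r / i if i >= min_input else None
        for r, i in zip(recovered, input_cov)
    ]


def low_coverage_regions(
    coverage: List[float],
    threshold: float,  # hypothetical: depth at or below this counts as a dropout
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where depth <= threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(coverage):
        if depth <= threshold:
            if start is None:
                start = pos  # a dropout region begins here
        elif start is not None:
            regions.append((start, pos))  # the region ended just before pos
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # region runs to the end
    return regions
```

Masking low-input positions before taking ratios matters here because the coverage dropouts in the input samples would otherwise produce undefined or wildly inflated uptake ratios at exactly the positions under investigation.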
Source: RRResearch - Category: Molecular Biology Authors: Source Type: blogs