A new plan for contamination correction

The grad student and I had an intense Skype discussion with the former postdoc this morning. We realized that there's a much easier way to estimate the total Rd contamination in the uptake samples, so we're going to do this.

Preamble: The former postdoc isn't convinced that this will be better than what we were doing. He likes the simplicity of using the dual-genome alignments directly to calculate uptake ratios, and doesn't think that all the data missing from multiply-mapped positions is a concern. I don't like discarding what is otherwise good data, and I hope that using all the data we can will reduce the noise that low-coverage positions introduce into our uptake ratios. (The grad student prudently kept his opinions to himself.)

Easy contamination estimate: We actually already have this data for the eight NP samples, and can easily get it for the eight GG samples. Way back, the postdoc determined how many of the reads in each sample mapped to the Rd part of the concatenated Rd-NP genome and how many mapped to the NP part (table below). Reads from repeats and no-SNP segments were excluded. The fraction mapping to Rd (Rd/total in the table below) is the fraction of Rd DNA contaminating the sample. (This data is from a file named mean reads per genome.xlsx. UP1, 2 and 3 are the replicate long-fragment uptake samples, UP4, 5 and 6 are the replicate short-fragment uptake samples, and UP13 and UP15 are the corresponding long and short i...
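The contamination estimate described above is a simple proportion per sample: Rd reads divided by (Rd reads + NP reads). Below is a minimal Python sketch of that arithmetic, assuming the per-sample read counts have already been tallied with repeat and no-SNP reads excluded. The function name `rd_fraction`, the sample labels and the counts are hypothetical illustrations, not values from the spreadsheet.

```python
def rd_fraction(rd_reads: int, np_reads: int) -> float:
    """Estimated Rd contamination: the fraction of uniquely assigned reads
    that mapped to the Rd part of the concatenated Rd-NP genome,
    i.e. Rd / (Rd + NP)."""
    total = rd_reads + np_reads
    if total == 0:
        raise ValueError("no reads assigned to either genome")
    return rd_reads / total


# Hypothetical counts for illustration only: sample -> (Rd reads, NP reads).
# Assumes reads from repeats and no-SNP segments were already excluded.
example_counts = {
    "sample_A": (1_000, 99_000),
    "sample_B": (2_500, 47_500),
}

for sample, (rd, np_reads) in example_counts.items():
    print(f"{sample}: estimated Rd contamination = {rd_fraction(rd, np_reads):.3f}")
```

With these made-up numbers, sample_A comes out at 0.010 and sample_B at 0.050. Raising an error on zero total reads, rather than returning 0, is a deliberate choice so that an empty sample can't masquerade as an uncontaminated one.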
Source: RRResearch. Category: Molecular Biology. Source type: blog.