How many contamination-control replicates can we do?
This is a continuation of the previous post. For each of our 12 genuinely contaminated uptake samples we want to create multiple replicates of a fake-contaminated input sample, each fake-contaminated with an independent set of Rd reads at that sample's level of contamination. For example, our UP01 sample has 5.3% Rd contamination. Its corresponding input sample is UP13. UP13 has about 2.7x10^6 reads, so to make a fake-contaminated sample for UP13 we need to add (2.7x10^6 * 0.053)/(1-0.053) = 1.5x10^5 Rd reads to the UP13 reads. Since our Rd sample has 4,088,620 reads, for UP01 we could make 27 such fake-contaminat...
Source: RRResearch - August 4, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs
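
The arithmetic in the excerpt above can be checked with a short Python sketch. The function names are hypothetical, chosen only for illustration; the numbers (2.7x10^6 UP13 reads, 5.3% contamination, 4,088,620 Rd reads) come from the excerpt, and the assumption is that each replicate must draw a non-overlapping set of Rd reads from the pool.

```python
def rd_reads_to_add(input_reads: float, contamination: float) -> float:
    """Rd reads to add so that they make up `contamination` of the final mix.

    Solves r / (input_reads + r) = contamination for r.
    """
    return input_reads * contamination / (1.0 - contamination)


def max_independent_replicates(rd_pool: int, reads_per_replicate: float) -> int:
    """Number of non-overlapping read sets that fit in the Rd pool (assumed rule)."""
    return int(rd_pool // reads_per_replicate)


# Values quoted in the post excerpt
up13_reads = 2.7e6          # approximate read count of input sample UP13
up01_contamination = 0.053  # Rd contamination of uptake sample UP01
rd_pool = 4_088_620         # reads in the pure Rd sample

needed = rd_reads_to_add(up13_reads, up01_contamination)
replicates = max_independent_replicates(rd_pool, needed)
print(f"{needed:.3g} Rd reads per replicate, {replicates} independent replicates")
```

With these inputs the sketch gives about 1.5x10^5 Rd reads per replicate and 27 replicates, matching the figures in the excerpt. Samples with higher contamination need more Rd reads each, so they would allow fewer independent replicates.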

Almost there: making the uptake ratio graphs
Yesterday the PhD student showed me the results of his contamination-correction tests. They confirmed that our new error-correction strategy works, and suggested an improvement. The problem and the strategy: We want to know how efficiently different segments of NP or GG DNA are taken up by competent Rd cells. All of our 12 'uptake' samples are contaminated, consisting mostly of reads of NP or GG DNA taken up by Rd cells plus varying amounts of contaminating Rd chromosomal DNA. We want to calculate the 'uptake ratio' for each genome position as the ratio of sequence coverage in the uptake sample to co...
Source: RRResearch - August 4, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

UBC's Faculty Pension Plan, for dummies
Yesterday I went to an information session about UBC's Faculty Pension Plan, designed for faculty approaching the age when we have to decide what to do with the money the Plan has accumulated on our behalf. Even if we choose not to retire (or not yet), by age 71 contributions to the Plan will end and we must make a decision. Until retirement or age 71, the Plan has been taking money from our salaries and from UBC's pockets, and investing it in something they call their Balanced Fund. (Yes, if you know what you're doing you can change the investment mix.) The info session told us that this fund does pretty wel...
Source: RRResearch - July 27, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

Ways to test contamination control
In the previous post I proposed an alternative method to control for contaminating recipient DNA in our donor DNA uptake samples. Because we've never done this before (and probably nobody else has either), we need ways to check that it accomplishes what we want. Here's one way: We have already made samples of pure donor DNA reads (from strain NP or GG) that have been deliberately contaminated with reads from the recipient Rd (10% Rd, 90% NP or GG). These REMIX samples have already been mapped to the donor genomes. Make a second set of these samples, using the same pure donor samples but this time contaminating them to 10...
Source: RRResearch - July 27, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

New contamination analysis suggests alternate strategy
Yesterday's post considered the factors affecting our current strategy for removing the contaminating Rd-derived reads from our 'uptake' samples. Since then I've looked at the read-mapping results using our new perfected reference sequences. The problem of reads that have no SNPs and thus map ambiguously to both Rd and NP/GG is worse than we thought. Below I'll describe this data and the new strategy I want to consider. The former post-doc sent us a spreadsheet with all the summary data from mapping each of our samples ('input', 'uptake' and controls) to various combinations of genomes. I'll start by con...
Source: RRResearch - July 25, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

Coverage and contamination in our DNA-uptake dataset
We finally have the corrected reference genome sequences for our DNA uptake deep-sequencing data. The genome sequences from NCBI had errors and polymorphisms relative to the specific strains we used in the experiments, so the former post-doc used the sequence data from this experiment to create reference sequences that perfectly match the DNAs we used. The bright-coloured table below shows 8 of our 16 samples. These are the samples that examined uptake of DNA from strain 'NP' (86-028NP) into strain 'Rd'. Not included are the equivalent 8 samples that used DNA from strain 'GG' (PittGG). These reads have b...
Source: RRResearch - July 21, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

Does expression of the toxA operon depend on ToxT as well as ToxA?
Short answer: Yes, but not in the way we expected. First, here's a diagram showing the toxTA operon and the mutants we're examining: The grey bars show the extents of the deletions. The ∆toxT and ∆toxTA mutants have a SpcR/strR cassette inserted at the deletion point, but the ∆toxA mutant has only a short 'scar' sequence at the deletion point. A few months ago I wrote a post about evidence that ToxA prevents transcription of the toxTA operon from an unexpected internal promoter. Here's a better version of the graph I showed there (note that transcription is going from right to left): It looks like there are...
Source: RRResearch - May 20, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

Learning to use the NCBI Gene Expression Omnibus
As part of our workup for the toxin/antitoxin manuscript, I want to find expression data for the homologs of the Haemophilus influenzae toxin and antitoxin genes. The former post-doc recommends that I use NCBI's Gene Expression Omnibus ('GEO') for this. I'll need to learn how to search the GEO for specific accession data and data from specific taxa. I'll also need to find out the specific identifiers for the genes I'm interested in, in the species I'm interested in. I think I can use BLAST searches (queried with the H. influenzae sequences) to find the species and links to the DNA sequences of the homologs, and then ...
Source: RRResearch - May 15, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

How do non-competence genes respond to competence-inducing treatment?
For the RNAseq part of the toxin-antitoxin paper, we should describe what we learn about how transfer to the competence-inducing starvation medium MIV affects genes not known to be involved in competence. The former undergrad left us with a set of Edge and DEseq2 analyses of changes in gene expression. I discussed them here last summer (http://rrresearch.fieldofscience.com/2016/08/making-sense-of-rna-seq-comparisons.html). Unfortunately I don't know how to properly interpret them. The former post-doc suggested some analyses, but I'm reluctant to dive into these until I have a better idea of what I'd be gettin...
Source: RRResearch - April 6, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

A new plan for contamination correction
The grad student and I had an intense Skype discussion with the former postdoc this morning. We realized that there's a much easier way to estimate the total Rd contamination in the uptake samples, so we're going to do this. Preamble: The former postdoc isn't convinced that this will be better than what we were doing. He likes the simplicity of using the dual-genome alignments directly to calculate uptake ratios, and doesn't think that all the missing data from multiply-mapped positions is a concern. I don't like discarding what is otherwise good data, and hope that using all the data we can will reduce the n...
Source: RRResearch - April 5, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

What we learned from the RNAseq data: Part 1
So, we did a massive RNAseq study, with 124 samples of H. influenzae cultures at different growth stages, in the rich medium sBHI and the competence-inducing medium MIV, and with wildtype or mutant genes affecting competence. (You can use the Search box to find all the previous posts about this work...) Here I specifically want to think about what we learned about competence from the competence-induced cultures (cells transferred from sBHI to MIV). We sampled cultures at the T=0 point indicated by the star in the above diagram, and the 10, 30 and 10 minute times in MIV. I'll consider the results for stra...
Source: RRResearch - April 5, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

How will contaminating Rd reads map onto NP?
Based on the analysis in the previous post, we can do the lower-bound and upper-bound calculations and corrections for each of our 12 'uptake' samples (3 replicates each of 4 treatments). This will give us good estimates of the % Rd contamination for each sample. But what do we do with this information? We can map each sample just to its own NP genome (or GG). Now the Rd-derived reads from positions that have strong similarity to NP locations should map there, including all the repeats and no-SNP reads. I think that the Rd-derived sequences that don't have NP homologs and thus can't be mapped onto NP wi...
Source: RRResearch - April 3, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

Try #3: Analyzing the uptake data
OK, here's another try at understanding the interactions of repeated sequences, reads with no SNPs, and Rd contamination in our DNA uptake sequencing data. A reminder of the problem I'm trying to solve: We have three 'uptake' samples for each treatment in our big analysis of DNA uptake specificity, but they can't be directly compared because the strain NP donor DNA has varying levels of contamination by the recipient strain Rd. We have no direct way to measure this contamination, but I think we can infer it from other features of the samples. This time I've simplified the diagrams by considering only a...
Source: RRResearch - April 3, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs

How to analyze the DNA uptake data
I'm working on getting a clear plan for how we will calculate uptake bias for each genome position (genome-wide uptake ratios) from our DNA uptake sequencing data. In principle, we just divide the coverage at each position in the sequencing data for the recovered DNA samples (taken up by cells) by the coverage at the same position in the sequencing data for the input DNA. (We are using DNAs from two donor strains, 'NP' and 'GG'. Below I'll just describe things for NP, but exactly the same issues apply to GG.) Complications (some solved, some not): Reads from repeated sequences such as ribosomal RNA genes (5 copies) can't be m...
Source: RRResearch - March 31, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs
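
The "in principle" calculation described in the excerpt above, dividing recovered-sample coverage by input coverage at each position, might be sketched in Python as follows. The function name and the toy coverage values are hypothetical, and the sketch deliberately ignores the complications the post goes on to list (repeats, contamination) as well as any normalization for sequencing depth, which the excerpt does not describe.

```python
from typing import List, Optional


def uptake_ratios(recovered_cov: List[float],
                  input_cov: List[float]) -> List[Optional[float]]:
    """Per-position ratio of recovered (uptake) coverage to input coverage.

    Positions with zero input coverage have no defined ratio and yield None.
    """
    if len(recovered_cov) != len(input_cov):
        raise ValueError("coverage vectors must cover the same positions")
    return [r / i if i > 0 else None
            for r, i in zip(recovered_cov, input_cov)]


# Toy coverage values for three genome positions (illustrative only)
ratios = uptake_ratios([10, 0, 6], [5, 4, 0])
print(ratios)
```

Returning None for positions with no input coverage is one possible design choice; it keeps undefined positions visible rather than silently treating them as zero uptake.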

A new kind of problem
The PhD student and I are analyzing the data from his mapping of H. influenzae's uptake of genomic DNA. The data was generated by Illumina sequencing of genomic DNA samples before and after they had been taken up by competent cells. Using a rec2 mutant as the competent cells lets us recover the DNA intact after uptake. He has done quite a bit of analysis of the resulting uptake ratios (ratio of 'recovered' coverage to 'input' coverage) but now we're going back and sorting out various anomalies that we initially ignored or overlooked. One big issue is the surprisingly uneven sequencing coverage of the genome that we s...
Source: RRResearch - March 29, 2017 Category: Molecular Biology Authors: Rosie Redfield Source Type: blogs