Filling in the gaps to better understand human biology

Duplicated segments pose serious problems for the assembly and annotation of the human genome. The human reference genome still contains large gaps that require specialized efforts to fill. Many of these gaps lie within highly duplicated segments, in which the degree of sequence variation among duplicated loci approaches levels of allelic variation. Many people assume that much of the sequence still missing from the reference assembly is not very biologically interesting. However, it has become increasingly apparent that segmental duplications themselves provide the molecular basis for many human genetic disorders. Resolving these regions is therefore essential for a complete understanding of the genetic basis of human disease. Three patches released in GRCh37.p8, which together add almost 400 kb of novel sequence, demonstrate that sequence so far missing from the reference genome can be of crucial importance. The biological story surrounding these sequences can be found in a recent publication from the Eichler lab (Dennis et al., 2012), but here we'll tell you a little bit about how we worked with the Eichler lab to create these assembly patches.

Figure 1: Ancestral copy of SRGAP2 in chimpanzee (left) and human (right). The other red ticks on the human chromosome show the human-specific duplications added by this effort.

One of the impediments in resolving the complexity of these regions is the diploid nature of the hum...
Source: GenomeRef - Category: Genetics & Stem Cells Source Type: blogs