Resolving unknown nucleotides in the IPD-IMGT/HLA database by extended and full-length sequencing of HLA class I and II alleles

Abstract: In the past, identification of HLA alleles was limited to sequencing the region of the gene coding for the peptide binding groove, resulting in a lack of sequence information in the HLA database, challenging HLA allele assignment software programs. We investigated full-length sequences of 19 HLA class I and 7 HLA class II alleles, and we extended another 47 HLA class I alleles with sequences of 5′ and 3′ UTR regions that were all not yet available in the IPD-IMGT/HLA database. We resolved 8638 unknown nucleotides in the coding sequence of HLA class I and 2139 of HLA class II. Furthermore, with full-length sequencing of the 26 alleles, more than 90 kb of sequence information was added to the non-coding sequences, whereas extension of the 47 alleles resulted in the addition of 5.5 kb unknown nucleotides to the 5′ UTR and >31.7 kb to the 3′ UTR region. With this information, some interesting features were observed, like possible recombination events and lineage evolutionary origins. The continuing increase in the availability of full-length sequences in the HLA database will enable the identification of the evolutionary origin and will help the community to improve the alignment and assignment accuracy of HLA alleles.
Source: Immunogenetics - Category: Genetics & Stem Cells - Source Type: research