PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe

AbstractSequence alignments are models that capture the structural, functional and evolutionary relationships between proteins. Structure-guided sequence alignments are helpful in the case of distantly related proteins with poor sequence identity, thus rendering routine sequence alignment methods ineffective. Protein Alignment organized as Structural Superfamilies or PASS2 database provides such sequence alignments of protein domains within a superfamily as per the Structural Classification of Proteins extended (SCOPe) database. The current update of PASS2 (i.e. PASS2.7) is following the latest release of SCOPe (2.07) and we provide data for 14  323 protein domains that are<40% identical and are organized into 2024 superfamilies. Several useful features derived from the alignments, such as conserved secondary structural motifs, HMMs and residues conserved across the superfamily, are also reported. Protein domains that are deviant from the rest of the members of a superfamily may compromise the quality of the alignment, and we found this to be the case in ∼7% of the total superfamilies we considered. To improve the alignment by objectively identifying such ‘outliers’, in this update, we have used ak-means-based unsupervised machine learning method for clustering superfamily members, where features provided were length of domains aligned, Cα-RMSD derived from the rigid-body superposition of all members and gaps contributed to the alignment by each domain. In a few c...
Source: Database : The Journal of Biological Databases and Curation - Category: Databases & Libraries Source Type: research