PS4: a next-generation dataset for protein single-sequence secondary structure prediction

We present PS4, a dataset of 18,731 nonredundant protein chains and their respective secondary structure labels. Each chain is identified, and the dataset is nonredundant against other secondary structure datasets commonly seen in the literature. We perform ablation studies by training secondary structure prediction algorithms on the PS4 training set and obtain state-of-the-art accuracy on the CB513 test set in zero shots.
PMID: 37997848 | DOI: 10.2144/btn-2023-0024
Source: BioTechniques | Category: Biotechnology | Source Type: research
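
The abstract reports accuracy on the CB513 test set without naming the metric. Secondary structure predictors are commonly scored by per-residue accuracy, the fraction of residues whose predicted label matches the reference label. The following is a minimal sketch of that calculation, assuming labels are given as equal-length strings with one character per residue. The function names and the toy label strings are hypothetical and are not taken from the PS4 paper or its code.

```python
from typing import Iterable, Tuple


def per_residue_accuracy(predicted: str, reference: str) -> float:
    """Fraction of positions where the predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot score an empty chain")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def dataset_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Accuracy pooled over all residues of all chains (not a per-chain mean)."""
    correct = 0
    total = 0
    for predicted, reference in pairs:
        if len(predicted) != len(reference):
            raise ValueError("predicted and reference must have the same length")
        correct += sum(p == r for p, r in zip(predicted, reference))
        total += len(reference)
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


if __name__ == "__main__":
    # Toy label strings, made up for illustration only.
    toy_pairs = [("HHHEC", "HHHCC"), ("CCCCC", "CCCCC")]
    print(per_residue_accuracy(*toy_pairs[0]))
    print(dataset_accuracy(toy_pairs))
```

Pooling over residues weights long chains more heavily than averaging per-chain scores would. Which convention a given benchmark uses should be checked against its own evaluation protocol.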