Automatization and self-maintenance of the O-GlcNAcome catalog: a smart scientific database

Abstract: Post-translational modifications (PTMs) are ubiquitous and essential for protein function and signaling, motivating the need for sustainable benefit and open models of web databases. Highly conserved O-GlcNAcylation is a case example of one of the most recently discovered PTMs, investigated by a growing community. Historically, details about O-GlcNAcylated proteins and sites were dispersed across literature and in non-O-GlcNAc-focused, rapidly outdated or now defunct web databases. In a first effort to fill the gap, we recently published a human O-GlcNAcome catalog with a basic web interface. Based on the enthusiasm generated by this first resource, we extended our O-GlcNAcome catalog to include data from 42 distinct organisms and released the O-GlcNAc Database v1.2. In this version, more than 14 500 O-GlcNAcylated proteins and 11 000 O-GlcNAcylation sites are referenced from the curation of 2200 publications. In this article, we also present the extensive features of the O-GlcNAc Database, including the user-friendly interface, back-end and client-server interactions. We particularly emphasized our workflow, involving a mostly automatized and self-maintained database, including machine learning approaches for text mining. We hope that this software model will be useful beyond the O-GlcNAc community, to set up new smart, scientific online databases, in a short period of time. Indeed, this database system can be administrated with little to no programming skills and is meant...
Source: Database: The Journal of Biological Databases and Curation - Category: Databases & Libraries - Source Type: research