Low-rank and Sparse Subspace Modeling of Speech for DNN Based Acoustic Modeling

Publication date: Available online 26 March 2019
Source: Speech Communication
Author(s): Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

Abstract
Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to have very low rank. In this paper, we exploit Principal Component Analysis and Compressive Sensing based dictionary learning for low-rank and sparse modeling of senone subspaces, respectively. Our hypothesis is that the principal components of the DNN posterior space (termed eigen-posteriors in this work) and Compressive Sensing dictionaries can act as suitable models to extract the well-structured low-dimensional latent information and discard the undesirable high-dimensional unstructured noise present in the posteriors. The enhanced DNN posteriors thus obtained are used as soft targets for training better acoustic models to improve ASR. In this context, our approach also enables improving distant speech recognition by mapping far-field acoustic features to low-dimensional senone subspaces learned from near-field features. Experiments are performed on AMI Meeting cor...
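To make the two subspace models concrete, the following is a minimal sketch (not the authors' implementation) of how per-senone posterior matrices could be enhanced by low-rank PCA reconstruction ("eigen-posteriors") and by sparse coding over a learned dictionary. The function names, dimensionalities, and the clip-and-renormalize step are illustrative assumptions; scikit-learn's PCA and MiniBatchDictionaryLearning stand in for the paper's Principal Component Analysis and Compressive Sensing based dictionary learning.

```python
# Minimal sketch: low-rank and sparse reconstruction of DNN posteriors,
# assuming posteriors for one senone class are stacked as rows
# of a (frames x senones) NumPy matrix. Not the authors' implementation.
import numpy as np
from sklearn.decomposition import PCA, MiniBatchDictionaryLearning


def lowrank_enhance(posteriors: np.ndarray, n_components: int = 10) -> np.ndarray:
    """Project posteriors onto the top principal components (eigen-posteriors)
    and reconstruct, discarding the unstructured high-dimensional residual."""
    pca = PCA(n_components=n_components)
    coeffs = pca.fit_transform(posteriors)      # low-dimensional coordinates
    enhanced = pca.inverse_transform(coeffs)    # rank-k reconstruction
    # Clip and renormalize so each frame remains a valid probability vector
    # (an assumption of this sketch, not a step stated in the abstract).
    enhanced = np.clip(enhanced, 1e-10, None)
    return enhanced / enhanced.sum(axis=1, keepdims=True)


def sparse_enhance(posteriors: np.ndarray, n_atoms: int = 50, alpha: float = 0.1) -> np.ndarray:
    """Learn a dictionary for the senone subspace and reconstruct each
    posterior from its sparse code."""
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=alpha, random_state=0)
    codes = dico.fit_transform(posteriors)      # sparse representation per frame
    enhanced = codes @ dico.components_         # reconstruction from dictionary atoms
    enhanced = np.clip(enhanced, 1e-10, None)
    return enhanced / enhanced.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy example: 200 frames of 1000-dimensional posteriors for one senone class.
    rng = np.random.default_rng(0)
    raw = rng.dirichlet(np.ones(1000) * 0.1, size=200)
    print(lowrank_enhance(raw).shape, sparse_enhance(raw).shape)
```

In the paper's pipeline, the reconstructed (enhanced) posteriors would then serve as soft targets when retraining the acoustic model; the sketch above only covers the subspace projection step.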