Training of Reduced-Rank Linear Transformations for Multi-layer Polynomial Acoustic Features for Speech Recognition

Publication date: Available online 8 April 2019
Source: Speech Communication
Author(s): Muhammad Ali Tahir, Heyun Huang, Albert Zeyer, Ralf Schlüter, Hermann Ney

Abstract
The use of higher-order polynomial acoustic features can improve the performance of automatic speech recognition (ASR). However, the dimensionality of the polynomial representation can be prohibitively large, making acoustic model training with polynomial features infeasible for large-vocabulary ASR systems. This paper presents a multi-layer polynomial training framework for acoustic modeling, which recursively expands the acoustic features into their second-order polynomial feature space. After each expansion, the dimensionality of the resulting features is reduced by a linear transformation. Experimental results on two large-vocabulary continuous speech recognition tasks show that the proposed method outperforms conventional mixture models. More recently, the acoustic modeling community has shifted its focus to deep neural networks. We therefore also train multi-layer polynomial features in a similar way, allowing backpropagation and using the mean-normalized stochastic gradient descent algorithm, which has led to encouraging results. Specifically, appending a final polynomial layer to a sigmoid-based feed-forward deep neural network has resulted in a significant word error rate improvement.
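To make the expand-then-reduce step concrete, the following is a minimal NumPy sketch of one such polynomial layer, not the authors' implementation: the feature dimension, the number of stacked layers, and the random matrices standing in for the trained reduced-rank linear transformations are all assumptions for illustration.

```python
import numpy as np

def second_order_expansion(x):
    # Original features plus all unique pairwise products x_i * x_j (i <= j):
    # dimension d grows to d + d*(d+1)/2.
    products = np.outer(x, x)[np.triu_indices(x.shape[0])]
    return np.concatenate([x, products])

def polynomial_layer(x, W):
    # One expand-then-reduce step: second-order expansion followed by a
    # linear transformation W mapping back to a low dimension.
    return W @ second_order_expansion(x)

rng = np.random.default_rng(0)
d = 40                            # base feature dimension (assumed)
expanded = d + d * (d + 1) // 2   # size after second-order expansion
x = rng.standard_normal(d)

# Two stacked polynomial layers; the random W here is only a stand-in
# for the trained reduced-rank transformation described in the paper.
for _ in range(2):
    W = 0.01 * rng.standard_normal((d, expanded))
    x = polynomial_layer(x, W)

print(x.shape)  # (40,) -- each layer restores the original dimensionality
```

Because each layer maps back to the base dimension before the next expansion, the feature size stays fixed no matter how many layers are stacked, which is what keeps the recursive expansion tractable.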