Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target

Publication date: Available online 20 February 2019
Source: Speech Communication
Author(s): Paria Dadvar, Masoud Geravanchizadeh

Abstract
In this paper, a robust binaural speech separation system based on a deep neural network (DNN) is introduced. The proposed system has three main processing stages. In the spectral processing stage, the multiresolution cochleagram (MRCG) feature is extracted from the beamformed signal. In the spatial processing stage, a novel, reliable spatial feature, smITD+smILD, is obtained by soft missing-data masking of binaural cues. In the final stage, a deep neural network takes the combined spectral and spatial features and estimates a newly defined ideal ratio mask (IRM) designed for noisy and reverberant conditions. The performance of the proposed system is evaluated and compared with two recent binaural speech separation systems as baselines in various noisy and reverberant conditions. Furthermore, the performance of each processing stage is explored and compared with that of state-of-the-art approaches. A multitalker, spatially diffuse babble is used as the interferer at four signal-to-noise ratios (SNRs). Simulated rooms with four matched and four unmatched reverberation times (RTs) are considered in the experiments. It is shown that the proposed system outperforms the baseline systems in improving the intelligibility and quality of the separated speech signals in reverberant and noisy conditions. The results confirm the efficiency of each system component, es…
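For context, the sketch below illustrates the conventional ideal ratio mask and how an estimated mask is applied per time-frequency unit. It is not the authors' implementation: the paper defines a modified IRM for noisy and reverberant conditions, and the feature names and array shapes here are illustrative assumptions only.

```python
# Minimal sketch of the *conventional* IRM (not the paper's modified target).
# All shapes and variable names are assumptions for illustration.
import numpy as np

def ideal_ratio_mask(speech_spec, noise_spec, beta=0.5):
    """Conventional IRM: (|S|^2 / (|S|^2 + |N|^2))^beta per T-F unit."""
    s_pow = np.abs(speech_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return (s_pow / (s_pow + n_pow + 1e-12)) ** beta

def apply_mask(mixture_spec, mask):
    """Scale each T-F unit of the mixture by the (estimated) mask."""
    return mask * mixture_spec

# Toy usage with random stand-ins for T-F representations
# (frames x frequency channels), e.g. a cochleagram.
rng = np.random.default_rng(0)
S = rng.standard_normal((100, 64))   # stand-in for the target-speech representation
N = rng.standard_normal((100, 64))   # stand-in for the interference representation
mask = ideal_ratio_mask(S, N)        # training target a DNN would learn to predict
enhanced = apply_mask(S + N, mask)   # mask applied to the mixture at test time
```

In a system of the kind described in the abstract, the DNN would be trained to map the concatenated spectral (MRCG) and spatial (smITD+smILD) features to such a mask, which is then applied to the mixture to recover the target speech.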