Speech enhancement using ultrasonic Doppler sonar

This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to enhance speech contaminated by high levels of acoustic noise. A 40 kHz ultrasonic beam was directed at the speaker's face, and the received signals were demodulated and converted to a spectral feature parameter. The spectral feature derived from the ultrasonic Doppler signal (UDS) was concatenated with spectral features from the noisy speech, and the combined features were used to estimate the magnitude spectrum of the clean speech. This estimation was treated as a nonlinear regression problem in which the relationship between the audio-UDS features and the corresponding clean speech was modeled by deep neural networks (DNNs). The feasibility of the proposed enhancement method was tested on a 1-hour audio-UDS corpus with four types of noise. Both objective and subjective evaluations showed that the best performance was obtained when the audio and UDS were used together. A correlation analysis was also carried out to investigate the usefulness of multi-directional ultrasonic sensing; the results showed that performance depended on the number of UDS channels used, particularly at low signal-to-noise ratios (SNRs).
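The pipeline described above (demodulate the UDS, extract spectral features, concatenate them with noisy-speech features, and regress the clean magnitude spectrum with a DNN) can be sketched in Python with NumPy. This is a minimal, hypothetical sketch, not the authors' implementation: the 192 kHz receiver sampling rate, the boxcar low-pass filter, decimation to 16 kHz, 512-sample frames with a 160-sample hop, log-magnitude features, the 64 retained Doppler bins, the layer sizes, the mean-squared-error objective and the random (untrained) weights are all assumptions, and the input signals are synthetic placeholders.

```python
import numpy as np

FS_UDS = 192_000       # assumed sampling rate of the ultrasonic receiver (Hz)
CARRIER_HZ = 40_000    # 40 kHz carrier, as stated in the abstract
DECIM = 12             # assumed: 192 kHz / 12 = 16 kHz, matching the speech rate
FRAME, HOP = 512, 160  # assumed 32 ms frames with a 10 ms hop at 16 kHz
UDS_BINS = 64          # assumed number of Doppler bins kept around 0 Hz


def demodulate_uds(rx):
    """Shift the carrier to 0 Hz, low-pass, and decimate to a complex baseband."""
    t = np.arange(len(rx)) / FS_UDS
    baseband = rx * np.exp(-2j * np.pi * CARRIER_HZ * t)
    lp = np.ones(4 * DECIM) / (4 * DECIM)  # crude boxcar low-pass (assumption)
    baseband = np.convolve(baseband, lp, mode="same")
    return baseband[::DECIM]


def frame_signal(x):
    """Split x into Hann-windowed frames of shape (n_frames, FRAME)."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(FRAME)


def speech_features(speech):
    """Log-magnitude spectrum of real speech: (n_frames, FRAME // 2 + 1)."""
    return np.log(np.abs(np.fft.rfft(frame_signal(speech), axis=1)) + 1e-8)


def uds_features(rx):
    """Log-magnitude Doppler spectrum centred on 0 Hz: (n_frames, UDS_BINS)."""
    spec = np.fft.fft(frame_signal(demodulate_uds(rx)), axis=1)
    spec = np.fft.fftshift(spec, axes=1)  # move 0 Hz to index FRAME // 2
    centre = FRAME // 2
    band = spec[:, centre - UDS_BINS // 2: centre + UDS_BINS // 2]
    return np.log(np.abs(band) + 1e-8)


def init_mlp(sizes, rng):
    """Random (untrained, placeholder) weights for a fully connected network."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """ReLU hidden layers and a linear output: estimated clean log-magnitude."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b


def mse(pred, target):
    """Assumed regression loss that training would minimise."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
noisy = rng.standard_normal(16_000)               # 1 s placeholder "noisy speech"
t = np.arange(FS_UDS) / FS_UDS                    # 1 s of receiver time
rx = np.cos(2 * np.pi * (CARRIER_HZ + 100) * t)   # echo with a +100 Hz Doppler shift

# Concatenate audio and UDS features frame by frame, then regress.
x = np.concatenate([speech_features(noisy), uds_features(rx)], axis=1)
params = init_mlp([x.shape[1], 256, 256, FRAME // 2 + 1], rng)
clean_est = mlp_forward(params, x)
print(x.shape, clean_est.shape)  # prints (97, 321) (97, 257)
```

Keeping the complex baseband rather than a real demodulated signal preserves the sign of the Doppler shift, so movement toward and away from the sensor falls into different bins of the UDS feature.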
Source: Speech Communication - Category: Speech-Language Pathology Source Type: research