Privacy-Preserving PLDA Speaker Verification using Outsourced Secure Computation
Publication date: Available online 1 October 2019
Source: Speech Communication
Author(s): Amos Treiber, Andreas Nautsch, Jascha Kolberg, Thomas Schneider, Christoph Busch
Abstract: The usage of biometric recognition has become prevalent in various verification processes, ranging from unlocking mobile devices to verifying bank transactions. Automatic speaker verification (ASV) allows an individual to verify its identity towards an online service provider by comparing freshly sampled speech data to reference information stored on the service provider’s server. Due to the sensitive nature of biometric data, the storage and usag...
Source: Speech Communication - October 2, 2019 Category: Speech-Language Pathology Source Type: research

Automatic Speech Emotion Recognition using an Optimal Combination of Features based on EMD-TKEO
Publication date: Available online 19 September 2019
Source: Speech Communication
Author(s): Leila Kerkeni, Youssef Serrestou, Kosai Raoof, Mohamed Mbarki, Mohamed Ali Mahjoub, Catherine Cleder
Abstract: In this paper, we propose a global approach for speech emotion recognition (SER) system using empirical mode decomposition (EMD). Its use is motivated by the fact that the EMD combined with the Teager-Kaiser Energy Operator (TKEO) gives an efficient time-frequency analysis of the non-stationary signals. In this method, each signal is decomposed using EMD into oscillating components called intrinsic mode functions (IMFs). TKEO i...
Source: Speech Communication - September 20, 2019 Category: Speech-Language Pathology Source Type: research

Objective Classification of Auditory Brainstem Responses to Consonant-Vowel Syllables Using Local Discriminant Bases
Conclusion: This study shows the efficiency of frequency and time-frequency domains features. The results indicate that time-frequency features obtained by local discriminant bases were more successful in objective classifications of the responses. Besides, the selected features from the phase of the frequency responses were reliable in classifying the responses to consonant-vowels /da/, /ba/ and /ga/.
Significance: The importance of this study lies in the fact that it helps the objective classification of auditory brainstem responses underlying the different encoding of three consonant-vowels (/ba/, /da/ and /ga/) in frequency...
Source: Speech Communication - September 20, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: October 2019
Source: Speech Communication, Volume 113
Source: Speech Communication - September 13, 2019 Category: Speech-Language Pathology Source Type: research

Time-domain Speech Enhancement Using Generative Adversarial Networks
Publication date: Available online 4 September 2019
Source: Speech Communication
Author(s): Santiago Pascual, Joan Serrà, Antonio Bonafonte
Abstract: Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches that subtract noise or predict clean signals. Most of them do not operate directly on waveforms. In this work, we propose a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal. We also ...
Source: Speech Communication - September 5, 2019 Category: Speech-Language Pathology Source Type: research

Data Augmentation using Generative Adversarial Networks for Robust Speech Recognition
Publication date: Available online 19 August 2019
Source: Speech Communication
Author(s): Yanmin Qian, Hu Hu, Tian Tan
Abstract: For noise robust speech recognition, data mismatch between training and testing is a significant challenge. Data augmentation is an effective way to enlarge the size and diversity of training data and solve this problem. Different from the traditional approaches by directly adding noise to the original waveform, in this work we utilize generative adversarial networks (GAN) for data generation to improve speech recognition under noise conditions. In this paper we investigate different configurations of...
Source: Speech Communication - August 20, 2019 Category: Speech-Language Pathology Source Type: research

Computer-vision analysis reveals facial movements made during Mandarin tone production align with pitch trajectories
Publication date: Available online 17 August 2019
Source: Speech Communication
Author(s): Saurabh Garg, Ghassan Hamarneh, Allard Jongman, Joan A. Sereno, Yue Wang
Abstract: Using computer-vision and image processing techniques, we aim to identify specific visual cues as induced by facial movements made during Mandarin tone production and examine how they are associated with each of the four Mandarin tones. Audio-video recordings of 20 native Mandarin speakers producing Mandarin words involving the vowel /3/ with each of the four tones were analyzed. Four facial points of interest were detected automatically: medial point of lef...
Source: Speech Communication - August 18, 2019 Category: Speech-Language Pathology Source Type: research

Unconventional Spoken Iconicity follows a Conventional Structure: Evidence from Demonstrations
Publication date: Available online 16 August 2019
Source: Speech Communication
Author(s): Arthur Lewis Thompson, Youngah Do
Abstract: Some languages have more forms of conventional spoken iconicity than others. Japanese, for example, has more ideophones than English. So how do speakers of a language with limited semantic categories of ideophones depict percepts? One possibility is demonstrations: unconventional, yet depictive, discourse. Demonstrations follow quotatives (e.g., I was like ___) and perform referents as opposed to describing them. In English, a language with arguably restricted sets of ideophones, speakers may ena...
Source: Speech Communication - August 17, 2019 Category: Speech-Language Pathology Source Type: research

Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech
Publication date: Available online 14 August 2019
Source: Speech Communication
Author(s): Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John Bunce, Alejandrina Cristia, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom
Abstract: Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is...
Source: Speech Communication - August 15, 2019 Category: Speech-Language Pathology Source Type: research

Differentiating tongue shapes for alveolar-postalveolar and alveolar-velar contrasts
Publication date: Available online 13 August 2019
Source: Speech Communication
Author(s): Natalia Zharkova
Abstract: This paper is focussed on differentiating midsagittal tongue shapes for alveolar-postalveolar and alveolar-velar contrasts in place of articulation. In addition to two established measures assessing the shape of a tongue curve, three new indices are introduced, which capture more fine grained distinctions in tongue shape, through quantifying the extent of curvature at different locations along the tongue contour. In order to establish whether the indices can be applied across a range of ultrasound recording setti...
Source: Speech Communication - August 13, 2019 Category: Speech-Language Pathology Source Type: research

Adaptive Blind Moving Source Separation Based on Intensity Vector Statistics
Publication date: Available online 8 August 2019
Source: Speech Communication
Author(s): Areeb Riaz, Xiyu Shi, Ahmet Kondoz
Abstract: This paper presents a novel approach to blind moving source separation by detecting, tracking and separating speakers in real-time using intensity vector direction (IVD) statistics. It updates unmixing system parameters swiftly in order to deal with the time-variant mixing parameters. Denoising is carried out to extract reliable speaker estimates using von-Mises modeling of the IVD measurements in space and IIR filtering of the IVD distribution in time. Peaks in the IVD distribution are assigned ...
Source: Speech Communication - August 8, 2019 Category: Speech-Language Pathology Source Type: research

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for speech activity detection (SAD). The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to abs...
Source: Speech Communication - August 1, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: September 2019
Source: Speech Communication, Volume 112
Source: Speech Communication - July 31, 2019 Category: Speech-Language Pathology Source Type: research

Listener Impressions of Foreigner-Directed Speech: A Systematic Review
Publication date: Available online 13 July 2019
Source: Speech Communication
Author(s): Kathrin Rothermich, Havan Leigh Harris, Kerry Sewell, Susan C. Bobb
Abstract: Non-native speakers of a particular language face communicative challenges when interacting with native speakers in everyday life. A strategy frequently employed by native speakers to ensure smooth communication is speech accommodation in the form of foreigner-directed speech. Most of the research on foreigner-directed speech has focused on acoustic parameters, but few studies have examined non-native listener perceptions. This systematic review evaluates the publi...
Source: Speech Communication - July 13, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of VOCALISE under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 11 July 2019
Source: Speech Communication
Author(s): Finnian Kelly, Andrea Fröhlich, Volker Dellwo, Oscar Forth, Samuel Kent, Anil Alexander
Source: Speech Communication - July 12, 2019 Category: Speech-Language Pathology Source Type: research