Editorial Board
Publication date: October 2019. Source: Speech Communication, Volume 113.
Source: Speech Communication - September 13, 2019 Category: Speech-Language Pathology Source Type: research

Time-domain Speech Enhancement Using Generative Adversarial Networks
Publication date: Available online 4 September 2019. Source: Speech Communication. Author(s): Santiago Pascual, Joan Serrà, Antonio Bonafonte. Abstract: Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches that subtract noise or predict clean signals. Most of them do not operate directly on waveforms. In this work, we propose a generative approach to regenerate corrupted signals into a clean version by using generative adversarial networks on the raw signal. We...
Source: Speech Communication - September 5, 2019 Category: Speech-Language Pathology Source Type: research

Data Augmentation using Generative Adversarial Networks for Robust Speech Recognition
Publication date: Available online 19 August 2019. Source: Speech Communication. Author(s): Yanmin Qian, Hu Hu, Tian Tan. Abstract: For noise-robust speech recognition, data mismatch between training and testing is a significant challenge. Data augmentation is an effective way to enlarge the size and diversity of training data and solve this problem. Unlike traditional approaches that directly add noise to the original waveform, in this work we utilize generative adversarial networks (GAN) for data generation to improve speech recognition under noise conditions. In this paper we investigate different configurations of...
Source: Speech Communication - August 20, 2019 Category: Speech-Language Pathology Source Type: research
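The traditional augmentation baseline the abstract contrasts with, adding noise directly to the waveform at a target signal-to-noise ratio, can be sketched in a few lines of plain Python. The function name and the toy signals below are illustrative, not from the paper:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`, returning the noisy mixture."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Target noise power: p_speech / p_noise_target = 10 ** (snr_db / 10)
    p_noise_target = p_speech / (10 ** (snr_db / 10))
    scale = math.sqrt(p_noise_target / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]
```

At 0 dB the scaled noise carries the same average power as the speech; a GAN-based generator would instead learn to synthesize such noisy variants directly.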

Computer-vision analysis reveals facial movements made during Mandarin tone production align with pitch trajectories
Publication date: Available online 17 August 2019. Source: Speech Communication. Author(s): Saurabh Garg, Ghassan Hamarneh, Allard Jongman, Joan A. Sereno, Yue Wang. Abstract: Using computer-vision and image processing techniques, we aim to identify specific visual cues as induced by facial movements made during Mandarin tone production and examine how they are associated with each of the four Mandarin tones. Audio-video recordings of 20 native Mandarin speakers producing Mandarin words involving the vowel /3/ with each of the four tones were analyzed. Four facial points of interest were detected automatically: medial point of lef...
Source: Speech Communication - August 18, 2019 Category: Speech-Language Pathology Source Type: research

Unconventional Spoken Iconicity follows a Conventional Structure: Evidence from Demonstrations
Publication date: Available online 16 August 2019. Source: Speech Communication. Author(s): Arthur Lewis Thompson, Youngah Do. Abstract: Some languages have more forms of conventional spoken iconicity than others. Japanese, for example, has more ideophones than English. So how do speakers of a language with limited semantic categories of ideophones depict percepts? One possibility is demonstrations: unconventional, yet depictive, discourse. Demonstrations follow quotatives (e.g., I was like ___) and perform referents as opposed to describing them. In English, a language with arguably restricted sets of ideophones, speakers may ena...
Source: Speech Communication - August 17, 2019 Category: Speech-Language Pathology Source Type: research

Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech
Publication date: Available online 14 August 2019. Source: Speech Communication. Author(s): Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John Bunce, Alejandrina Cristia, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom. Abstract: Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Althoug...
Source: Speech Communication - August 16, 2019 Category: Speech-Language Pathology Source Type: research
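Language-independent syllabification for word counting typically reduces to finding sonority or energy peaks in a smoothed envelope, then mapping syllables to words with a language-dependent factor. A minimal sketch of that idea, with hypothetical function names, thresholds, and the syllables-per-word factor chosen purely for illustration:

```python
def count_syllable_nuclei(envelope, threshold):
    """Count local energy peaks above `threshold` in a smoothed amplitude
    envelope; each surviving peak is taken as one syllable nucleus."""
    peaks = 0
    for i in range(1, len(envelope) - 1):
        if (envelope[i] > threshold
                and envelope[i] > envelope[i - 1]
                and envelope[i] >= envelope[i + 1]):
            peaks += 1
    return peaks

def estimate_word_count(envelope, threshold, syllables_per_word=1.5):
    # Hypothetical language-dependent mapping from syllable count to words.
    return count_syllable_nuclei(envelope, threshold) / syllables_per_word
```

Real WCE systems smooth the envelope and calibrate the syllable-to-word mapping per language, which is exactly where the cross-linguistic evaluation in the paper comes in.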

Differentiating tongue shapes for alveolar-postalveolar and alveolar-velar contrasts
Publication date: Available online 13 August 2019. Source: Speech Communication. Author(s): Natalia Zharkova. Abstract: This paper is focussed on differentiating midsagittal tongue shapes for alveolar-postalveolar and alveolar-velar contrasts in place of articulation. In addition to two established measures assessing the shape of a tongue curve, three new indices are introduced, which capture more fine-grained distinctions in tongue shape by quantifying the extent of curvature at different locations along the tongue contour. In order to establish whether the indices can be applied across a range of ultrasound recording setti...
Source: Speech Communication - August 13, 2019 Category: Speech-Language Pathology Source Type: research

Adaptive Blind Moving Source Separation Based on Intensity Vector Statistics
Publication date: Available online 8 August 2019. Source: Speech Communication. Author(s): Areeb Riaz, Xiyu Shi, Ahmet Kondoz. Abstract: This paper presents a novel approach to blind moving-source separation by detecting, tracking and separating speakers in real time using intensity vector direction (IVD) statistics. It updates the unmixing system parameters swiftly in order to deal with the time-variant mixing parameters. Denoising is carried out to extract reliable speaker estimates using von Mises modeling of the IVD measurements in space and IIR filtering of the IVD distribution in time. Peaks in the IVD distribution are assigned ...
Source: Speech Communication - August 8, 2019 Category: Speech-Language Pathology Source Type: research

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to abs...
Source: Speech Communication - August 1, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: September 2019. Source: Speech Communication, Volume 112.
Source: Speech Communication - July 31, 2019 Category: Speech-Language Pathology Source Type: research

Listener Impressions of Foreigner-Directed Speech: A Systematic Review
Publication date: Available online 13 July 2019. Source: Speech Communication. Author(s): Kathrin Rothermich, Havan Leigh Harris, Kerry Sewell, Susan C. Bobb. Abstract: Non-native speakers of a particular language face communicative challenges when interacting with native speakers in everyday life. A strategy frequently employed by native speakers to ensure smooth communication is speech accommodation in the form of foreigner-directed speech. Most of the research on foreigner-directed speech has focused on acoustic parameters, but few studies have examined non-native listener perceptions. This systematic review evaluates the publi...
Source: Speech Communication - July 13, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of VOCALISE under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 11 July 2019. Source: Speech Communication. Author(s): Finnian Kelly, Andrea Fröhlich, Volker Dellwo, Oscar Forth, Samuel Kent, Anil Alexander.
Source: Speech Communication - July 12, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: August 2019. Source: Speech Communication, Volume 111.
Source: Speech Communication - July 6, 2019 Category: Speech-Language Pathology Source Type: research

A statistical procedure to adjust for time-interval mismatch in forensic voice comparison
Publication date: Available online 2 July 2019. Source: Speech Communication. Author(s): Geoffrey Stewart Morrison, Finnian Kelly. Abstract: The present paper describes a statistical modeling procedure that was developed to account for the fact that, in a forensic voice comparison analysis conducted for a particular case, there was a long time interval between when the questioned- and known-speaker recordings were made (six years), but in the sample of the relevant population used for training and testing the forensic voice comparison system there was a short interval (hours to days) between when each of multiple recordings of eac...
Source: Speech Communication - July 3, 2019 Category: Speech-Language Pathology Source Type: research

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) – Conclusion
Publication date: Available online 27 June 2019. Source: Speech Communication. Author(s): Geoffrey Stewart Morrison, Ewald Enzinger. Abstract: This conclusion to the virtual special issue (VSI) “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)” provides a brief summary of the papers included in the VSI, observations based on the results, and reflections on the aims and process. It also includes errata and acknowledgments.
Source: Speech Communication - June 27, 2019 Category: Speech-Language Pathology Source Type: research

Detection of Speech Tampering Using Sparse Representations and Spectral Manipulations Based Information Hiding
Publication date: Available online 21 June 2019. Source: Speech Communication. Author(s): Shengbei Wang, Weitao Yuan, Jianming Wang, Masashi Unoki. Abstract: Speech tampering has brought serious problems to speech security. Information hiding methods can be used for tampering detection if they satisfy several competing requirements, e.g., inaudibility, robustness, blindness, and fragility. According to preliminary analysis, the spectral envelope and formants are important indicators of tampering, since tampering with the speech will unavoidably modify the shape of the spectral envelope and the locations/magnitudes of the formants. By ta...
Source: Speech Communication - June 22, 2019 Category: Speech-Language Pathology Source Type: research

Deep Learning for Minimum Mean-Square Error Approaches to Speech Enhancement
Publication date: Available online 13 June 2019. Source: Speech Communication. Author(s): Aaron Nicolson, Kuldip K. Paliwal. Abstract: Recently, the focus of speech enhancement research has shifted from minimum mean-square error (MMSE) approaches, like the MMSE short-time spectral amplitude (MMSE-STSA) estimator, to state-of-the-art masking- and mapping-based deep learning approaches. We aim to bridge the gap between these two differing speech enhancement approaches. Deep learning methods for MMSE approaches are investigated in this work, with the objective of producing intelligible enhanced speech at a high quality. Since the spe...
Source: Speech Communication - June 13, 2019 Category: Speech-Language Pathology Source Type: research
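A standard ingredient of MMSE estimators such as MMSE-STSA is the decision-directed a priori SNR estimate, which blends the previous frame's clean-speech estimate with the current instantaneous SNR. A minimal per-bin sketch (variable names and the simple Wiener-style gain are illustrative, not the paper's method):

```python
def decision_directed_snr(prev_amp_sq, noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR: weighted sum of the previous frame's
    estimated clean-speech power over the noise variance and the current
    instantaneous SNR (gamma - 1), floored at zero."""
    return alpha * (prev_amp_sq / noise_var) + (1 - alpha) * max(gamma - 1.0, 0.0)

def wiener_gain(xi):
    # MMSE-flavoured spectral gain derived from the a priori SNR xi.
    return xi / (1.0 + xi)
```

The deep learning methods discussed in the abstract can be read as replacing hand-tuned estimators like this with learned predictions of the same quantities.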

Anomaly Detection based Pronunciation Verification Approach Using Speech Attribute Features
Publication date: Available online 11 June 2019. Source: Speech Communication. Author(s): Mostafa Shahin, Beena Ahmed. Abstract: Computer-aided pronunciation training tools require accurate automatic pronunciation error detection algorithms to identify errors made by their users. However, the performance of these algorithms is highly dependent on the amount of mispronounced speech data used to train them and the reliability of its manual annotation. To overcome this problem, we turned mispronunciation detection into an anomaly detection problem, which utilizes algorithms trained with only correctly pronounced speech data. In th...
Source: Speech Communication - June 12, 2019 Category: Speech-Language Pathology Source Type: research

Improving Human Scoring of Prosody Using Parametric Speech Synthesis
In this study, HMM-based speech synthesis from an average model of native speakers was utilized. The experimental results show that the proposed method can improve scoring reliability, which is confirmed by an increase in the inter-rater correlation. We also built an automatic pronunciation evaluation system trained from non-native speech databases with scores given by either the conventional or the proposed method, and compared the performance of the systems. The results show that the predicted pronunciation scores matched the human-rated scores; the human-machine correlation produced a score of 0.87, while the conventional ...
Source: Speech Communication - June 6, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of Phonexia automatic speaker recognition software under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 23 May 2019. Source: Speech Communication. Author(s): Michael Jessen, Jakub Bortlík, Petr Schwarz, Yosef A. Solewicz. Abstract: As part of the Speech Communication virtual special issue “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)”, two automatic speaker recognition systems developed by the company Phonexia were tested. The first, named SID (Speaker Identification)-XL3, is an i-vector PLDA system that works with two streams of features, one of them using MFCCs in a classical sense, the...
Source: Speech Communication - May 24, 2019 Category: Speech-Language Pathology Source Type: research

Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement
Publication date: Available online 17 May 2019. Source: Speech Communication. Author(s): Johannes Stahl, Pejman Mowlaee. Abstract: The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model s...
Source: Speech Communication - May 18, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: July 2019. Source: Speech Communication, Volume 110.
Source: Speech Communication - May 17, 2019 Category: Speech-Language Pathology Source Type: research

How modeling entrance loss and flow separation in a two-mass model affects the oscillation and synthesis quality
In this study, a modified two-mass model of the vocal folds was used to simulate phonation for 12 modeling options: three ways to model the entrance loss combined with four ways to model flow separation. For each condition, the following characteristics of the glottal oscillation and flow were determined: the phonation threshold pressure, the frequency range of self-sustained oscillation, the oscillation amplitude for different glottal rest openings, the spectral slope of the flow derivative, the maximum flow declination rate (MFDR), the open quotient (OQ), and the difference between the levels of the first and second harm...
Source: Speech Communication - May 1, 2019 Category: Speech-Language Pathology Source Type: research

Normal-to-Lombard Adaptation of Speech Synthesis Using Long Short-Term Memory Recurrent Neural Networks
Publication date: Available online 18 April 2019. Source: Speech Communication. Author(s): Bajibabu Bollepalli, Lauri Juvela, Manu Airaksinen, Cassia Valentini-Botinhao, Paavo Alku. Abstract: In this article, three adaptation methods are compared based on how well they change the speaking style of a neural network based text-to-speech (TTS) voice. The speaking style conversion adopted here is from normal to Lombard speech. The selected adaptation methods are: auxiliary features (AF), learning hidden unit contribution (LHUC), and fine-tuning (FT). Furthermore, four state-of-the-art TTS vocoders are compared in the same context. The...
Source: Speech Communication - April 19, 2019 Category: Speech-Language Pathology Source Type: research

IITG-HingCoS Corpus: A Hinglish Code-Switching Database for Automatic Speech Recognition
Publication date: Available online 17 April 2019. Source: Speech Communication. Author(s): Sreeram Ganji, Kunal Dhawan, Rohit Sinha. Abstract: Code-switching is a phenomenon in linguistics which refers to the use of two or more languages, especially within the same discourse. This phenomenon has been observed in many multilingual communities across the globe. In the recent past, there has been increasing demand for automatic speech recognition (ASR) systems to deal with code-switching. However, for training such systems, very limited code-switching resources are available as yet. Thus, the development of code-switching resources ...
Source: Speech Communication - April 17, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: May 2019. Source: Speech Communication, Volume 109.
Source: Speech Communication - April 16, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of Nuance Forensics 9.2 and 11.1 under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 10 April 2019. Source: Speech Communication. Author(s): Michael Jessen.
Source: Speech Communication - April 11, 2019 Category: Speech-Language Pathology Source Type: research

Training of Reduced-Rank Linear Transformations for Multi-layer Polynomial Acoustic Features for Speech Recognition
Publication date: Available online 8 April 2019. Source: Speech Communication. Author(s): Muhammad Ali Tahir, Heyun Huang, Albert Zeyer, Ralf Schlüter, Hermann Ney. Abstract: The use of higher-order polynomial acoustic features can improve the performance of automatic speech recognition (ASR). However, the dimensionality of the polynomial representation can be prohibitively large, making acoustic model training using polynomial features infeasible for large-vocabulary ASR systems. This paper presents a multi-layer polynomial training framework for acoustic modeling, which recursively expands the acoustic features into their second-or...
Source: Speech Communication - April 9, 2019 Category: Speech-Language Pathology Source Type: research

Dysarthric speech classification from coded telephone speech using glottal features
Publication date: Available online 8 April 2019. Source: Speech Communication. Author(s): N.P. Narendra, Paavo Alku. Abstract: This paper proposes a new dysarthric speech classification method from coded telephone speech using glottal features. The proposed method utilizes glottal features, which are efficiently estimated from coded telephone speech using a recently proposed deep neural net-based glottal inverse filtering method. Two sets of glottal features were considered: (1) time- and frequency-domain parameters and (2) parameters based on principal component analysis (PCA). In addition, acoustic features are extracted from co...
Source: Speech Communication - April 9, 2019 Category: Speech-Language Pathology Source Type: research

Speech-Driven Animation with Meaningful Behaviors
This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on the discourse functions (e.g., questions), the model learns from the data the characteristic behaviors associated with a given discourse class. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedd...
Source: Speech Communication - April 5, 2019 Category: Speech-Language Pathology Source Type: research

Speech Enhancement Using Ultrasonic Doppler Sonar
This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements for enhancing audio speech contaminated by high levels of acoustic noise. A 40 kHz ultrasonic beam is incident on a speaker’s face. The received signals were first demodulated and converted to a spectral feature parameter. The spectral feature derived from the ultrasonic Doppler signal (UDS) was concatenated with spectral features from noisy speech, which were then used to estimate the magnitude of the spectrum of clean speech. A nonlinear regression approach was employed in this estimation, where the relationship between aud...
Source: Speech Communication - April 4, 2019 Category: Speech-Language Pathology Source Type: research

Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model
This study presents a scheme for multilingual speech emotion recognition. Determining the emotion of speech in general relies upon specific training data, and a different target speaker or language may present significant challenges. In this regard, we first explore 215 acoustic features from emotional speech. Second, we carry out speaker normalization and feature selection to develop a shared standard acoustic parameter set for multiple languages. Third, we use a three-layer model composed of acoustic features, semantic primitives, and emotion dimensions to map acoustics into emotion dimensions. Finally, we classify the c...
Source: Speech Communication - April 4, 2019 Category: Speech-Language Pathology Source Type: research

Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate
Publication date: Available online 1 April 2019. Source: Speech Communication. Author(s): Tiina Murtola, Jarmo Malinen, Ahmed Geneid, Paavo Alku. Abstract: A multichannel dataset comprising high-speed videoendoscopy images, electroglottography, and free-field microphone signals was used to investigate phonation onsets in vowel production. Use of the multichannel data enabled simultaneous analysis of the two main aspects of phonation: glottal area, extracted from the high-speed videoendoscopy images, and glottal flow, estimated from the microphone signal using glottal inverse filtering. Pulse-wise parameterization of the glotta...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research

Speaker recognition using PCA-based feature transformation
Publication date: Available online 2 April 2019. Source: Speech Communication. Author(s): Ahmed Isam Ahmed, John Chiverton, David Ndzi, Victor Becerra. Abstract: This paper introduces a Weighted-Correlation Principal Component Analysis (WCR-PCA) for efficient transformation of speech features in speaker recognition. A Recurrent Neural Network (RNN) technique is also introduced to perform the weighted PCA. The weights are taken as the log-likelihood values from a fitted Single Gaussian-Background Model (SG-BM). For speech features, we show that there are large differences between feature variances, which makes covariance-based PCA l...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research
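The core of any weighted PCA is a weighted mean and covariance, in which per-frame weights (here, plausibly derived from background-model log-likelihoods) emphasize reliable frames before the principal components are extracted. A minimal 2-D sketch under that assumption; the function name and the toy data are illustrative:

```python
def weighted_mean_cov(samples, weights):
    """Weighted mean and 2x2 covariance of 2-D feature vectors.
    Higher-weighted frames contribute more to both statistics."""
    w_sum = sum(weights)
    mean = [sum(w * s[d] for w, s in zip(weights, samples)) / w_sum
            for d in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for w, s in zip(weights, samples):
        dx, dy = s[0] - mean[0], s[1] - mean[1]
        cov[0][0] += w * dx * dx
        cov[0][1] += w * dx * dy
        cov[1][1] += w * dy * dy
    cov[1][0] = cov[0][1]
    return mean, [[c / w_sum for c in row] for row in cov]
```

Eigendecomposition of the resulting matrix (or, as in the paper, an RNN-based procedure) then yields the transformation directions.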

Output-based Speech Quality Assessment Using Autoencoder and Support Vector Regression
Publication date: Available online 2 April 2019. Source: Speech Communication. Author(s): Jing Wang, Yahui Shan, Xiang Xie, Jingming Kuang. Abstract: Output-based speech quality assessment methods have been widely used and have received increasing attention since they do not need undistorted signals as reference. In order to obtain a high correlation between predicted scores and subjective results, this paper presents a new speech quality assessment method that estimates the quality of degraded speech without the reference speech. Bottleneck features are extracted with an autoencoder, and support vector regression is chosen as the mapping m...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research

New Insights on the Optimality of Parameterized Wiener Filters for Speech Enhancement Applications
Publication date: Available online 27 March 2019. Source: Speech Communication. Author(s): Rafael Attili Chiea, Márcio Holsbach Costa, Guillaume Barrault. Abstract: This work presents a unified framework for defining a family of noise reduction techniques for speech enhancement applications. The proposed approach provides a unique theoretical foundation for some widely applied soft and hard time-frequency masks, which encompasses the well-known Wiener filter and the heuristically designed binary mask. These techniques can now be considered optimal solutions of the same minimization problem. The proposed cost function is ...
Source: Speech Communication - March 28, 2019 Category: Speech-Language Pathology Source Type: research
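The family of soft and hard masks the abstract alludes to is often written as a parameterized Wiener-type gain per time-frequency bin. A common parameterization (shown here as an assumption about the general idea, not the paper's exact formulation) is G = (xi / (xi + beta)) ** alpha, where xi is the local a priori SNR:

```python
def parametric_gain(xi, alpha=1.0, beta=1.0):
    """Parameterised Wiener-type gain applied per time-frequency bin.
    alpha = beta = 1 recovers the classical Wiener filter; increasing
    alpha sharpens the gain toward a hard mask."""
    return (xi / (xi + beta)) ** alpha

def binary_mask(xi):
    # Hard mask: keep the bin only if its local SNR exceeds 0 dB (xi > 1).
    return 1.0 if xi > 1.0 else 0.0
```

Viewing both ends of this family as minimizers of one cost function is precisely the unification the paper claims.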

Low-rank and Sparse Subspace Modeling of Speech for DNN Based Acoustic Modeling
Publication date: Available online 26 March 2019. Source: Speech Communication. Author(s): Pranay Dighe, Afsaneh Asaei, Hervé Bourlard. Abstract: Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to ...
Source: Speech Communication - March 27, 2019 Category: Speech-Language Pathology Source Type: research

Temporal envelope cues and simulations of cochlear implant signal processing
Publication date: Available online 21 March 2019. Source: Speech Communication. Author(s): Raymond L. Goldsworthy. Abstract: Conventional signal processing implemented on clinical cochlear implant (CI) sound processors is based on envelope signals extracted from overlapping frequency regions. Conventional strategies do not encode temporal envelope or temporal fine-structure cues with high fidelity. In contrast, several research strategies have been developed recently to enhance the encoding of temporal envelope and fine-structure cues. The present study examines the salience of temporal envelope cues when encoded into vocoder repr...
Source: Speech Communication - March 22, 2019 Category: Speech-Language Pathology Source Type: research
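The per-channel envelope that CI-style vocoder processing keeps can be illustrated with the simplest extractor: full-wave rectification followed by a moving average. This is a toy stand-in for the filterbank-plus-lowpass processing real strategies use:

```python
def temporal_envelope(signal, win):
    """Temporal envelope via full-wave rectification followed by a
    centred moving average of length `win` (edges use shorter windows)."""
    rect = [abs(x) for x in signal]
    half = win // 2
    out = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out
```

Note how an alternating carrier is flattened to a constant envelope: the fine structure (sign changes) is discarded, which is exactly the information loss the study probes.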

Editorial Board
Publication date: April 2019. Source: Speech Communication, Volume 108.
Source: Speech Communication - March 22, 2019 Category: Speech-Language Pathology Source Type: research

Speech Reverberation Suppression for Time-Varying Environments Using Weighted Prediction Error Method With Time-Varying Autoregressive Model
Publication date: Available online 11 March 2019. Source: Speech Communication. Author(s): Mahdi Parchami, Hamidreza Amindavar, Wei-Ping Zhu. Abstract: In this paper, a novel approach to speech reverberation suppression in non-stationary (changing) acoustic environments is proposed. The suggested approach is based on the popular weighted prediction error (WPE) method; however, instead of considering fixed reverberation prediction weights, our method adopts the more generic time-varying autoregressive (TV-AR) model, which allows dynamic estimation and updating of the prediction weights over time. We use an init...
Source: Speech Communication - March 11, 2019 Category: Speech-Language Pathology Source Type: research

Why listening in background noise is harder in a non-native language than in a native language: A review
Publication date: Available online 8 March 2019. Source: Speech Communication. Author(s): Odette Scharenborg, Marjolein van Os. Abstract: There is ample evidence that recognising words in a non-native language is more difficult than in a native language, even for those with a high proficiency in the non-native language involved, and particularly in the presence of background noise. Why is this the case? To answer this question, this paper provides a systematic review of the literature on non-native spoken-word recognition in the presence of background noise, and posits an updated theory on the effect of background noise on native ...
Source: Speech Communication - March 9, 2019 Category: Speech-Language Pathology Source Type: research

Multiple Description Coding Technique to Improve the Robustness of ACELP-Based Coders: AMR-WB
Publication date: Available online 2 March 2019. Source: Speech Communication. Author(s): Hocine Chaouch, Fatiha Merazka, Philippe Marthon. Abstract: In this paper, a concealment method based on multiple-description coding (MDC) is presented to mitigate the speech quality deterioration caused by packet loss for algebraic code-excited linear prediction (ACELP) based coders. We apply to the ITU-T G.722.2 coder a packet loss concealment (PLC) technique that uses packetization schemes based on MDC. The latter is used with two newly designed modes, modes 5 and 6 (18.25 and 19.85 kbps, respectively). We introduce our new second-...
Source: Speech Communication - March 4, 2019 Category: Speech-Language Pathology Source Type: research
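The essence of MDC packetization is sending complementary descriptions, e.g. odd and even frames, in separate packets, so that losing one packet still leaves a usable half-rate signal for concealment. A toy sketch of the odd/even scheme with neighbour-repetition concealment (the concealment rule here is a deliberately simple assumption, not the G.722.2 PLC algorithm):

```python
def split_descriptions(frames):
    """Odd/even multiple-description packetisation: either description
    alone reconstructs the signal at half the frame rate."""
    return frames[0::2], frames[1::2]

def reconstruct(even, odd):
    """Interleave the two descriptions; if one was lost (None), conceal
    each missing frame by repeating its surviving neighbour."""
    if odd is None:
        odd = even
    if even is None:
        even = odd
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out
```

With both packets received the frame sequence is exact; with one lost, every other frame is a repeat, which ACELP-specific concealment then smooths.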

Text Normalization using Memory Augmented Neural Networks
Publication date: Available online 28 February 2019. Source: Speech Communication. Author(s): Subhojeet Pramanik, Aman Hussain. Abstract: We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of a dynamic memory access and storage mechanism, we present a neural architecture that can serve as a language-agnostic text normalization system while avoiding the kind of unacceptable errors made by LSTM-based recurrent neural networks. By successfully reducing the frequency of such mistakes, we show that this novel architecture is ...
Source: Speech Communication - March 1, 2019 Category: Speech-Language Pathology Source Type: research
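Written-to-spoken text normalization is the mapping neural models learn here; a rule-table baseline makes the task concrete. The rule table below is a hypothetical minimal fragment (real systems cover dates, currency, abbreviations, and much more):

```python
# Hypothetical minimal rule table for written -> spoken normalization.
SMALL = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

def normalize_token(token):
    """Map a written-form token to its spoken form; pass through anything
    the rules do not cover (the silent-failure mode on which learned
    models must also avoid 'unacceptable' errors)."""
    if token.isdigit() and int(token) in SMALL:
        return SMALL[int(token)]
    if token == "%":
        return "percent"
    return token

def normalize(text):
    return " ".join(normalize_token(t) for t in text.split())
```

Rules like these are brittle and language-specific, which is the motivation for the language-agnostic, memory-augmented model the abstract describes.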

Editorial Board
Publication date: February 2019. Source: Speech Communication, Volume 107.
Source: Speech Communication - February 20, 2019 Category: Speech-Language Pathology Source Type: research

Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Publication date: Available online 20 February 2019. Source: Speech Communication. Author(s): Paria Dadvar, Masoud Geravanchizadeh. Abstract: In this paper, a robust binaural speech separation system based on a deep neural network (DNN) is introduced. The proposed system has three main processing stages. In the spectral processing stage, the multiresolution cochleagram (MRCG) feature is extracted from the beamformed signal. In the spatial processing stage, a novel reliable spatial feature, smITD+smILD, is obtained by soft missing-data masking of binaural cues. In the final stage, a deep neural network takes the combined spectral an...
Source: Speech Communication - February 20, 2019 Category: Speech-Language Pathology Source Type: research
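The binaural cues behind features like smITD are interaural differences; the interaural time difference (ITD) is classically estimated as the lag maximizing the cross-correlation between the two ear signals. A brute-force sketch of that estimator (toy signals, integer-sample lags only):

```python
def interaural_time_difference(left, right, max_lag):
    """Estimate the ITD in samples as the lag that maximises the
    cross-correlation between the left- and right-ear signals."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                corr += left[i] * right[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

In the paper's pipeline such raw cues are additionally soft-masked by their reliability before the DNN consumes them.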

A network-modeling approach to investigating individual differences in articulatory-to-acoustic relationship strategies
This study represents an exploratory analysis of a novel method of investigating variation among individual speakers with respect to the articulatory strategies used to modify acoustic characteristics of their speech. Articulatory data (nasalization, tongue height, breathiness) and acoustic data (F1 frequency) related to the distinction of three nasal-oral vowel contrasts in French were co-registered. Data were collected first from four Southern French (FR) speakers and, subsequently, from nine naïve Australian English listeners who imitated the FR productions. Articulatory measurements were mapped to F1 measurements ...
Source: Speech Communication - February 6, 2019 Category: Speech-Language Pathology Source Type: research

The relative contribution of computer-assisted prosody training vs. instructor-based prosody teaching in developing speaking skills by interpreter trainees: an experimental study
Publication date: Available online 2 February 2019. Source: Speech Communication. Author(s): Mahmood Yenkimaleki, Vincent J. van Heuven, Hossein Moradimokhles. Abstract: The present study investigates the relative contribution of computer-assisted prosody training (CAPT) vs. instructor-based prosody teaching (IBPT) to developing speaking skills by interpreter trainees. Three groups of student interpreters were formed. All were native speakers of Farsi who studied English translation and interpreting at the BA level at the University of Applied Sciences in Tehran, Iran. Participants were assigned to groups at random. No significant...
Source: Speech Communication - February 2, 2019 Category: Speech-Language Pathology Source Type: research

OPENGLOT – An open environment for the evaluation of glottal inverse filtering
Publication date: Available online 31 January 2019. Source: Speech Communication. Author(s): Paavo Alku, Tiina Murtola, Jarmo Malinen, Juha Kuortti, Brad Story, Manu Airaksinen, Mika Salmi, Erkki Vilkman, Ahmed Geneid. Abstract: Glottal inverse filtering (GIF) refers to technology to estimate the source of voiced speech, the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy needs to be evaluated. However, the evaluation of GIF is problematic because the ground truth, the real glottal volume velocity signal generated by the vocal folds, cannot be recorded non-invasively from natural speech. This ...
Source: Speech Communication - February 1, 2019 Category: Speech-Language Pathology Source Type: research

End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition
Publication date: Available online 30 January 2019. Source: Speech Communication. Author(s): Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert. Abstract: In hidden Markov model (HMM) based automatic speech recognition (ASR) systems, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units, such as phonemes, is a crucial step. This is typically achieved by first extracting acoustic features from the speech signal based on prior knowledge such as speech perception and/or speech production knowledge, and then training a classifier such as artifi...
Source: Speech Communication - January 31, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: January 2019. Source: Speech Communication, Volume 106.
Source: Speech Communication - January 17, 2019 Category: Speech-Language Pathology Source Type: research