Editorial Board
Publication date: August 2019
Source: Speech Communication, Volume 111
Source: Speech Communication - July 5, 2019 Category: Speech-Language Pathology Source Type: research

A statistical procedure to adjust for time-interval mismatch in forensic voice comparison
Publication date: Available online 2 July 2019
Source: Speech Communication
Author(s): Geoffrey Stewart Morrison, Finnian Kelly
Abstract: The present paper describes a statistical modeling procedure that was developed to account for the fact that, in a forensic voice comparison analysis conducted for a particular case, there was a long time interval between when the questioned- and known-speaker recordings were made (six years), but in the sample of the relevant population used for training and testing the forensic voice comparison system there was a short interval (hours to days) between when each of multiple recordings of eac...
Source: Speech Communication - July 3, 2019 Category: Speech-Language Pathology Source Type: research

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) – Conclusion
Publication date: Available online 27 June 2019
Source: Speech Communication
Author(s): Geoffrey Stewart Morrison, Ewald Enzinger
Abstract: This conclusion to the virtual special issue (VSI) “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)” provides a brief summary of the papers included in the VSI, observations based on the results, and reflections on the aims and process. It also includes errata and acknowledgments.
Source: Speech Communication - June 27, 2019 Category: Speech-Language Pathology Source Type: research

Detection of Speech Tampering Using Sparse Representations and Spectral Manipulations Based Information Hiding
Publication date: Available online 21 June 2019
Source: Speech Communication
Author(s): Shengbei Wang, Weitao Yuan, Jianming Wang, Masashi Unoki
Abstract: Speech tampering poses serious problems for speech security. Information hiding can be used for tampering detection if it satisfies several competing requirements, e.g., inaudibility, robustness, blindness, and fragility. According to preliminary analysis, the spectral envelope and formants are important indicators of tampering, since tampering with speech will unavoidably modify the shape of the spectral envelope and the locations/magnitudes of the formants. By ta...
Source: Speech Communication - June 22, 2019 Category: Speech-Language Pathology Source Type: research

Deep Learning for Minimum Mean-Square Error Approaches to Speech Enhancement
Publication date: Available online 13 June 2019
Source: Speech Communication
Author(s): Aaron Nicolson, Kuldip K. Paliwal
Abstract: Recently, the focus of speech enhancement research has shifted from minimum mean-square error (MMSE) approaches, like the MMSE short-time spectral amplitude (MMSE-STSA) estimator, to state-of-the-art masking- and mapping-based deep learning approaches. We aim to bridge the gap between these two differing speech enhancement approaches. Deep learning methods for MMSE approaches are investigated in this work, with the objective of producing intelligible enhanced speech at a high quality. Since the spe...
Source: Speech Communication - June 13, 2019 Category: Speech-Language Pathology Source Type: research

Anomaly Detection based Pronunciation Verification Approach Using Speech Attribute Features
Publication date: Available online 11 June 2019
Source: Speech Communication
Author(s): Mostafa Shahin, Beena Ahmed
Abstract: Computer-aided pronunciation training tools require accurate automatic pronunciation error detection algorithms to identify errors made by their users. However, the performance of these algorithms is highly dependent on the amount of mispronounced speech data used to train them and the reliability of its manual annotation. To overcome this problem, we recast mispronunciation detection as an anomaly detection problem, which uses algorithms trained only on correctly pronounced speech data. In th...
Source: Speech Communication - June 12, 2019 Category: Speech-Language Pathology Source Type: research

Improving Human Scoring of Prosody Using Parametric Speech Synthesis
In this study, HMM-based speech synthesis from an average model of native speakers was utilized. The experimental results show that the proposed method can improve scoring reliability, as confirmed by an increase in the inter-rater correlation. We also build an automatic pronunciation evaluation system trained on non-native speech databases with scores given by either the conventional or the proposed method, and compare the performance of the systems. The results show that the predicted pronunciation scores matched the human-rated scores; the human-machine correlation was 0.87, while the conventional ...
Source: Speech Communication - June 6, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of Phonexia automatic speaker recognition software under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 23 May 2019
Source: Speech Communication
Author(s): Michael Jessen, Jakub Bortlík, Petr Schwarz, Yosef A. Solewicz
Abstract: As part of the Speech Communication virtual special issue “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)”, two automatic speaker recognition systems developed by the company Phonexia were tested. The first, named SID (Speaker Identification)-XL3, is an i-vector PLDA system that works with two streams of features, one of them using MFCCs in a classical sense, the other using D...
Source: Speech Communication - May 23, 2019 Category: Speech-Language Pathology Source Type: research

Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement
Publication date: Available online 17 May 2019
Source: Speech Communication
Author(s): Johannes Stahl, Pejman Mowlaee
Abstract: The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model s...
Source: Speech Communication - May 18, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: July 2019
Source: Speech Communication, Volume 110
Source: Speech Communication - May 17, 2019 Category: Speech-Language Pathology Source Type: research

How modeling entrance loss and flow separation in a two-mass model affects the oscillation and synthesis quality
In this study, a modified two-mass model of the vocal folds was used to simulate phonation for 12 modeling options: three ways to model the entrance loss combined with four ways to model flow separation. For each condition, the following characteristics of the glottal oscillation and flow were determined: the phonation threshold pressure, the frequency range of self-sustained oscillation, the oscillation amplitude for different glottal rest openings, the spectral slope of the flow derivative, the maximum flow declination rate (MFDR), the open quotient (OQ), and the difference between the levels of the first and second harm...
Source: Speech Communication - April 30, 2019 Category: Speech-Language Pathology Source Type: research

Normal-to-Lombard Adaptation of Speech Synthesis Using Long Short-Term Memory Recurrent Neural Networks
Publication date: Available online 18 April 2019
Source: Speech Communication
Author(s): Bajibabu Bollepalli, Lauri Juvela, Manu Airaksinen, Cassia Valentini-Botinhao, Paavo Alku
Abstract: In this article, three adaptation methods are compared based on how well they change the speaking style of a neural-network-based text-to-speech (TTS) voice. The speaking style conversion adopted here is from normal to Lombard speech. The selected adaptation methods are: auxiliary features (AF), learning hidden unit contributions (LHUC), and fine-tuning (FT). Furthermore, four state-of-the-art TTS vocoders are compared in the same context. The...
Source: Speech Communication - April 19, 2019 Category: Speech-Language Pathology Source Type: research

IITG-HingCoS Corpus: A Hinglish Code-Switching Database for Automatic Speech Recognition
Publication date: Available online 17 April 2019
Source: Speech Communication
Author(s): Sreeram Ganji, Kunal Dhawan, Rohit Sinha
Abstract: Code-switching is a phenomenon in linguistics that refers to the use of two or more languages, especially within the same discourse. This phenomenon has been observed in many multilingual communities across the globe. In the recent past, there has been increasing demand for automatic speech recognition (ASR) systems that can deal with code-switching. However, very limited code-switching resources are available as yet for training such systems. Thus, the development of code-switching resources ...
Source: Speech Communication - April 17, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: May 2019
Source: Speech Communication, Volume 109
Source: Speech Communication - April 16, 2019 Category: Speech-Language Pathology Source Type: research

Evaluation of Nuance Forensics 9.2 and 11.1 under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: Available online 10 April 2019
Source: Speech Communication
Author(s): Dr. Michael Jessen
Source: Speech Communication - April 11, 2019 Category: Speech-Language Pathology Source Type: research