Evaluation of Phonexia automatic speaker recognition software under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)

Publication date: Available online 23 May 2019
Source: Speech Communication
Author(s): Michael Jessen, Jakub Bortlík, Petr Schwarz, Yosef A. Solewicz

Abstract

As part of the Speech Communication virtual special issue “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)”, two automatic speaker recognition systems developed by the company Phonexia were tested. The first, named SID (Speaker Identification)-XL3, is an i-vector PLDA system that works with two streams of features, one of them using MFCCs in a classical sense, the other using DNN-Stacked Bottle-Neck features based on correlated spectral-domain features as well as on information from voice/voiceless detection and fundamental frequency. The second system tested is called SID-BETA4. It uses MFCCs as input features (without deltas and double deltas) and employs a DNN-based speaker embedding architecture. Each of the two systems was tested in two variants. In the first, the system was used without including any domain-specific data, i.e. data from the training set of forensic_eval_01. In the second variant, training set data were used with a method called 10% FAR calibration. With this method, scores are shifted so that 10% of the scores in the non-target distribution (based on training data) have LLR > 0 and 90% have LLR < 0. Results showed that the speaker embedding system SID-BETA4 leads to clear improveme...
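The 10% FAR calibration described in the abstract can be illustrated with a minimal Python sketch. It assumes the calibration is a pure additive shift, as the abstract states, with the offset taken from the sorted non-target scores of the training data. The function names `far_shift` and `calibrate`, the rounding rule, and the toy scores are hypothetical and are not taken from the Phonexia software.

```python
def far_shift(non_target_scores, far=0.10):
    """Return the offset to subtract from raw scores so that a fraction
    `far` of the non-target scores ends up strictly above 0.

    Assumption: calibration is a pure shift, with no scaling.
    """
    if not non_target_scores:
        raise ValueError("need at least one non-target score")
    if not 0.0 < far < 1.0:
        raise ValueError("far must lie strictly between 0 and 1")

    ordered = sorted(non_target_scores)
    n = len(ordered)
    # Number of non-target scores allowed above 0 after the shift
    # (rounded to the nearest whole score).
    k = int(round(far * n))
    if k >= n:
        raise ValueError("far is too large for this many scores")
    # The (k+1)-th largest score becomes 0 after shifting; barring ties,
    # exactly the k largest scores stay strictly above it.
    return ordered[n - k - 1]


def calibrate(scores, shift):
    """Apply the shift; the results are then read as LLR-like values."""
    return [s - shift for s in scores]


# Toy non-target scores standing in for training-set comparisons.
non_target = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
shift = far_shift(non_target, far=0.10)
shifted = calibrate(non_target, shift)
false_accepts = sum(1 for s in shifted if s > 0)
print(f"shift={shift}, non-target scores above 0: {false_accepts}/{len(shifted)}")
```

In use, the offset would be estimated once from non-target comparisons in the training set and then subtracted from every test score. Tied scores at the threshold can make the realised proportion deviate slightly from the nominal 10%.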
Source: Speech Communication - Category: Speech-Language Pathology Source Type: research