Objective Speech Intelligibility Measures for Forensic Applications
Giovanni Costantini (1,2), Andrea Paoloni (3), Massimiliano Todisco (1,3)
(1) Department of Electronic Engineering, University of Rome “Tor Vergata”, Rome, Italy
(2) IDASC Institute of Acoustics and Sensors “Orso Mario Corbino”, Rome, Italy
(3) Fondazione “Ugo Bordoni”, Rome, Italy
Intelligibility of speech refers to the proportion of speech items that a normal listener can understand. More specifically, the standard ISO 9921 defines intelligibility as “the measurement of effectiveness in understanding speech.” Intelligibility can be assessed at sentence level, at word level, and at phoneme level. Intelligibility plays a key role in communications; indeed, ensuring full intelligibility is the main purpose of any communication channel or recording system.
In forensic applications it is crucial that the meaning of sentences and the names mentioned reflect what the speakers actually uttered rather than the views of the transcribers. In many cases, however, the prosecution and the defence disagree sharply about the transcription of difficult material. A well-known example arose from a famous wiretap, in which a word could be transcribed either as “sbancato” [zbanka’to], meaning ‘depleted of all money’ in a game or in business, or as “sbiancato” [zbjanka’to], meaning merely ‘whitened’. To assess the reliability of a transcript it would be useful to have a measure of the intelligibility of the signal to be transcribed. Unfortunately, no subjective measurement can be used in forensic applications: the content of the message is not known in advance, so it is impossible to determine the percentage of words that have been accurately transcribed. The only way to assess intelligibility in forensic applications is to set up a system, based on acoustic parameters, that is able to predict the intelligibility of the measured signal. In the forensic field such a system would also be very useful for evaluating the performance of speech enhancement systems and, more generally, it would help in many other fields to avoid the high cost of subjective evaluation of signal intelligibility.
The present work studies the problem of the intelligibility of speech signals. In general, two fundamentally different assessment methods may be applied: subjective assessment, based on listeners, and objective assessment, based on physical parameters of the signal.
Subjective tests are often too costly and laborious to deploy, and they are not well accepted in the courtroom. For these reasons we think it would be useful to have a method for verifying objectively whether a given signal can be transcribed with reasonable assurance of reliability.
Speech Corpus and Subjective Intelligibility Evaluations
Both subjective and objective tests were conducted using speech material from the European project SAM EUROM 1. In particular, 50 rhyming Italian words, with or without meaning, were used, each preceded by the word “PRENDI” (get) and followed by the word “INTANTO” (yet).
The degradations considered consist of additive noise: the corpus was made noisy by adding pink, hammer and babble noise.
Each type of noise was added at five signal-to-noise ratios (S/N = 2 dB, 0 dB, -2 dB, -4 dB, -6 dB), and the words were read by 4 different voices, two men and two women.
The result is therefore 60 different corpora (3 noise types × 5 S/N values × 4 voices), each formed by 50 different words.
Table I shows the complete speech corpus.
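As an illustration of how such a corpus could be assembled, the following minimal sketch scales a noise signal to reach a target S/N before adding it to the speech, and enumerates the resulting conditions. The function names and the voice labels are assumptions made for this example; the article does not describe the actual tooling used.

```python
import itertools
import math

NOISE_TYPES = ["pink", "hammer", "babble"]
SNR_DB = [2, 0, -2, -4, -6]
VOICES = ["male_1", "male_2", "female_1", "female_2"]  # hypothetical labels


def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it so that the S/N equals snr_db.

    Assumes both sequences have the same length and sample rate.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Every combination of noise type, S/N and voice is one corpus of 50 words.
conditions = list(itertools.product(NOISE_TYPES, SNR_DB, VOICES))
print(len(conditions))
```

The S/N here is computed over the whole signal; a level measured only on active speech segments would give slightly different gains.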
A first experiment was conducted in order to obtain the subjective intelligibility score.
The speech corpora were presented to a group of 12 listeners, 4 for every degradation condition, using software developed for this purpose in the Cycling ’74 Max/MSP environment, which delivers each item in random order, as many times as the listener wishes. One test set consists of 50 different test signals. The listener types the word heard into the appropriate field. Fig. 1 shows the application interface.
The result of the subjective tests is shown in Fig. 2.
We note that, for the same S/N, babble noise leads to significantly higher intelligibility values than the other two types of disturbance.
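A subjective score of this kind is commonly expressed as the percentage of test words transcribed correctly. The sketch below shows one way to compute it; the exact-match rule (ignoring case and surrounding whitespace) is an assumption, since the scoring rule applied in the listening test is not stated here.

```python
def intelligibility_score(responses, targets):
    """Percentage of responses that match the target word.

    Matching ignores case and surrounding whitespace (an assumed rule).
    """
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    correct = sum(
        r.strip().lower() == t.strip().lower()
        for r, t in zip(responses, targets)
    )
    return 100.0 * correct / len(targets)


# One of two words heard correctly.
score = intelligibility_score(["sbancato", "sbancato"], ["sbancato", "sbiancato"])
print(score)
```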
Objective measurements do not measure intelligibility but determine physical parameters to predict intelligibility according to a certain model.
Many objective speech intelligibility measurements have been proposed in the past in order to predict the subjective intelligibility of speech. Most of the literature in this field comes from the IT world, where the problem is to study the impact of the transmission channel and encoders on intelligibility of speech.
Three frequently used objective measures were evaluated: the signal-to-noise ratio with the noise filtered by an A-weighting curve (S/NA), the Articulation Index (AI), and the Speech Transmission Index (STI). Unfortunately, all of these measures need the clean signal to be available for comparison with the noisy signal. They can therefore be referred to as double-sided methods and are not suitable for predicting intelligibility in forensic applications. For that setting, we propose a single-sided intelligibility measurement based on STI.
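To make the first of these measures concrete, here is a minimal sketch of an A-weighted signal-to-noise ratio computed from per-band levels. The A-weighting gain uses the standard analytic curve; the function names, the use of band levels as input, and the choice to weight only the noise (a literal reading of the description above) are assumptions of this example. Note that it needs the speech and noise levels separately, which is exactly what makes the method double-sided.

```python
import math


def a_weight_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (standard analytic curve)."""
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00


def snr_a_db(speech_band_db, noise_band_db, centre_hz):
    """S/N in dB with the noise band levels A-weighted before summation.

    All three arguments are per-band lists of equal length (assumed layout).
    """
    speech_power = sum(10 ** (level / 10) for level in speech_band_db)
    noise_power = sum(
        10 ** ((level + a_weight_db(f)) / 10)
        for level, f in zip(noise_band_db, centre_hz)
    )
    return 10 * math.log10(speech_power / noise_power)
```

Because the curve attenuates low frequencies strongly, low-frequency noise contributes less to the weighted noise level than its raw level would suggest.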
In the Speech Transmission Index theory the intelligibility of speech is related to the preservation of the spectral differences between successive speech elements, the phonemes. This can be described by the envelope function. The envelope function is determined by the specific sequence of phones of a specific utterance. The STI-based measure is computed as shown in Fig. 3. You can find more details in the AES paper: Giovanni Costantini, Andrea Paoloni, Massimiliano Todisco, Objective Speech Intelligibility Measures Based on Speech Transmission Index for Forensic Applications, 39th International AES Conference on Audio Forensics: Practices and Challenges, Hillerød, Denmark, June 17–19, 2010, pp. 182-188.
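For orientation, the classical STI calculation maps each modulation index m to an apparent signal-to-noise ratio, clips it to the range ±15 dB, and rescales it to a transmission index between 0 and 1. The sketch below shows that textbook mapping only; it is not the single-sided procedure of Fig. 3. The equal band weights are a simplifying assumption (the standard method uses band-specific weights), and the function names are chosen for this example.

```python
import math


def transmission_index(m):
    """Map a modulation index m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1 - 1e-6)           # keep the logarithm finite
    snr_app = 10 * math.log10(m / (1 - m))    # apparent S/N in dB
    snr_app = max(-15.0, min(15.0, snr_app))  # clip to +/-15 dB
    return (snr_app + 15.0) / 30.0


def sti_from_modulation(m_matrix, band_weights=None):
    """Average transmission indices over modulation frequencies and bands.

    m_matrix[k][i] is the modulation index for band k and modulation
    frequency i. Equal band weights are assumed unless weights are given.
    """
    n_bands = len(m_matrix)
    if band_weights is None:
        band_weights = [1.0 / n_bands] * n_bands
    band_indices = [
        sum(transmission_index(m) for m in row) / len(row) for row in m_matrix
    ]
    return sum(w * t for w, t in zip(band_weights, band_indices))


def m_from_snr(snr_db):
    """Modulation index left by stationary additive noise at a given S/N."""
    return 1.0 / (1.0 + 10 ** (-snr_db / 10))
```

With stationary additive noise alone, the apparent S/N equals the actual S/N, so an S/N of -6 dB in every band maps to an index of (-6 + 15) / 30 = 0.3 under this simplified weighting.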
The experiment applied the STI-based measure to the speech corpus described above in order to assess intelligibility. It showed a high correlation between subjective and objective data under particular conditions that are typical of forensic applications.
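Agreement between subjective and objective scores is typically quantified with a correlation coefficient. The sketch below computes the Pearson coefficient between two equal-length lists of per-condition scores; that this particular coefficient was used in the study is an assumption, and no data from the study are reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, x would hold the subjective score of each noise condition and y the objective measure computed on the same condition.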
SSIM – Software for Speech Intelligibility Measure
The overall results of this study show that the STI function provides a very good estimate of speech intelligibility. In particular, the experiments carried out have proven that our proposed STI measurement procedure is able to predict with sufficient accuracy speech intelligibility in conditions very close to those most frequently found in forensic applications, where both additive and multiplicative noise are involved.
Moreover, we developed a Windows application named SSIM (Software for Speech Intelligibility Measure; see Fig. 4) that performs a short-time STI-based measurement. The application computes the objective intelligibility locally on a noisy signal, using a window length of 500 ms. We also tested the application on real signals and achieved excellent results.
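The idea of a short-time measurement can be sketched as follows: the signal is cut into 500 ms windows and a measure is evaluated on each one, giving an intelligibility estimate over time. This is not the SSIM implementation; the function name, the non-overlapping hop and the pluggable measure argument are assumptions of this example.

```python
def short_time_measure(samples, sample_rate, measure, window_s=0.5, hop_s=0.5):
    """Apply measure to successive windows and return (start_time, value) pairs.

    window_s matches the 500 ms window mentioned above; hop_s (no overlap
    by default) is an assumed parameter. A trailing partial window is dropped.
    """
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be at least one sample")
    results = []
    for start in range(0, len(samples) - win + 1, hop):
        results.append((start / sample_rate, measure(samples[start:start + win])))
    return results


# Two seconds of silence at 8 kHz, with the window length as a dummy measure.
frames = short_time_measure([0.0] * 16000, 8000, len)
print(frames)
```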
Interested readers are invited to download our system and test it on their signals.
If you have any questions, feedback or suggestions, please don’t hesitate to contact us.
- Giovanni Costantini, Andrea Paoloni, Massimiliano Todisco, Objective Speech Intelligibility Measures Based on Speech Transmission Index for Forensic Applications, 39th International AES Conference on Audio Forensics: Practices and Challenges, Hillerød, Denmark, June 17–19, 2010, pp. 182-188.
- Chen D., Fourcin A., et al., EUROM: A spoken language resource for the EU, ESCA EUROSPEECH ’95, Madrid, September 1995.
- Herman J.M. Steeneken, The Measurement of Speech Intelligibility, TNO Human Factors, Soesterberg, the Netherlands.
- Ma J., Hu Y., Loizou P. C., Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions, JASA 125, May 2009.
- Nobuhiko Kitawaki and Takeshi Yamada, Subjective and Objective Quality Assessment for Noise Reduced Speech, ETSI Workshop on Speech and Noise in Wideband Communication, Sophia Antipolis, France, May 2007.
- W. M. Liu, K. A. Jellyman, N. W. D. Evans, and J. S. D. Mason, Assessment of Objective Quality Measures for Speech Intelligibility, INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008.
- Hu Y., Loizou P. C., A Comparative Intelligibility Study of Speech Enhancement Algorithms, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Volume 4, 15-20 April 2007, pp. IV-561 – IV-564.
- Kryter K., Methods for the calculation and use of the Articulation Index, JASA 34, 1689–1697 November 1962.
- Kryter, K., ANSI S3.5-1969, American National Standards Methods for Calculation of the Articulation Index, American National Standards Institute, New York, 1969.
- Payton K. L., A method to determine the speech transmission index from speech waveforms, JASA 106, 3637-3648, 1999.