Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility.

Posted on 15.05.2022 - 18:07 by Philippa Demonte
One of the four most common reasons for complaints to broadcasters such as the BBC about the audibility of their dialogue output is the underlying background music. However, in the UK, broadcasters give audio-visual content providers only two directives: turn the background music down by 3 dB when it is co-present with foreground speech, and avoid background music with heavily percussive beats and lyrics.

Since background music is narratively important in certain genres of programmes, such as documentaries and dramas, completely omitting the music will not necessarily solve problems with intelligibility. More insight is therefore needed into which aspects of background music cause problems related to the audibility and understanding of the foreground dialogue. Further directives for content providers regarding the background music can then be suggested, particularly with future object-based approaches to audio in mind.

The files in this collection relate to quantitative, objective and subjective investigations conducted by P. Demonte at the University of Salford in 2018 to determine whether or not background music arrangement (instrumentation timbre and density) has any significant effect on foreground speech intelligibility. The effect of background music tempo was also tested.

The investigation used spoken sentences from the R-SPIN speech corpus (Bilger et al., 1984), and background music created by the researcher in GarageBand using Apple Loops. As a comparison against the effects of background music, a control condition using speech-shaped noise (a purely energetic masking noise) was also tested.

The objective testing used Tang and Cooke's (2016) HEGP OIM (high energy glimpse proportion objective intelligibility metrics), with the glimpse proportion (GP) values as a proxy for speech intelligibility. The subjective testing used a standard speech-in-noise test (SINT), in which participants listened over headphones to spoken sentences played simultaneously with background noise (music or speech-shaped noise) and were asked to identify target words. Correct word scores, converted to word recognition percentages, then served as a proxy for quantifying effects on speech intelligibility. Playlists for the subjective testing were pre-randomised, and the audio was played back from digital audio workstation sessions in Adobe Audition through an external soundcard.
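
The conversion from correct word scores to word recognition percentages can be illustrated with a short Python sketch. The trial layout (condition, target word, response), the condition labels, and the example words are hypothetical, and the case-insensitive exact match is an assumed scoring rule rather than the one used in the investigation.

```python
from collections import defaultdict


def word_recognition_percent(trials):
    """Map each condition to the percentage of target words identified correctly.

    `trials` is an iterable of (condition, target_word, response) tuples.
    Scoring is a case-insensitive exact match (an assumption for illustration).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, target, response in trials:
        total[condition] += 1
        if response.strip().lower() == target.strip().lower():
            correct[condition] += 1
    return {condition: 100.0 * correct[condition] / total[condition]
            for condition in total}


# Hypothetical responses; these words are not taken from the R-SPIN corpus.
example_trials = [
    ("music", "coat", "coat"),
    ("music", "sheep", "sheet"),
    ("noise", "lamp", "lamp"),
    ("noise", "bread", "Bread "),
]
scores = word_recognition_percent(example_trials)
print(scores)
```

Scoring per condition keeps the music and speech-shaped noise results separate, so they can be compared directly.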

For the subjective testing, speech was played at a calibrated level of 63 dBA. Rather than pre-setting the background noise (music or speech-shaped noise) .wav files to produce the same speech-to-noise ratio (SNR), they were set to produce the same level of energetic masking, so that any effects observed on foreground speech intelligibility would be attributable to other factors. This was done before the experiment by running pairs of the speech and noise files through Tang and Cooke's (2016) HEGP OIM in a MATLAB script, and iteratively adjusting the noise levels in a 'for' loop until the OIM gave a glimpse proportion (GP) value of 10 for the speech relative to the noise.
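
The level-matching step described above can be sketched as follows. This is a minimal Python illustration, not the script used in the investigation: `toy_glimpse_percent` is a hypothetical stand-in that counts time frames in which the speech level exceeds the noise level by 3 dB, whereas Tang and Cooke's HEGP OIM is a different, more detailed metric that is not reproduced here. The bisection search, the frame length, the gain range, the synthetic signals, and the reading of the target GP of 10 as a percentage are all assumptions.

```python
import math
import random


def frame_levels_db(signal, frame_len=160):
    """RMS level in dB of each non-overlapping frame of `signal`."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(power + 1e-12))
    return levels


def toy_glimpse_percent(speech_db, noise_db, gain_db, local_snr_db=3.0):
    """Hypothetical stand-in for GP: percentage of frames in which the speech
    level exceeds the gain-adjusted noise level by more than `local_snr_db`."""
    glimpsed = sum(
        1 for s, n in zip(speech_db, noise_db) if s > n + gain_db + local_snr_db
    )
    return 100.0 * glimpsed / len(speech_db)


def match_noise_gain(speech_db, noise_db, target=10.0, lo=-60.0, hi=60.0, steps=40):
    """Search for the noise gain (dB) at which the toy GP falls to `target`.

    Raising the noise gain can only reduce the glimpse percentage, so a
    bisection inside a 'for' loop is sufficient (assumed search strategy).
    """
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if toy_glimpse_percent(speech_db, noise_db, mid) > target:
            lo = mid  # too many glimpses: the noise needs to be louder
        else:
            hi = mid  # at or below target: try a quieter noise
    return hi


# Synthetic stand-ins for one speech file and one noise file (not real audio).
rng = random.Random(0)
n_samples = 200 * 160
speech = [
    (0.05 + abs(math.sin(2.0 * math.pi * i / 4000.0))) * rng.gauss(0.0, 1.0)
    for i in range(n_samples)
]
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

speech_db = frame_levels_db(speech)
noise_db = frame_levels_db(noise)
gain = match_noise_gain(speech_db, noise_db)
achieved = toy_glimpse_percent(speech_db, noise_db, gain)
print(f"noise gain: {gain:.2f} dB, toy GP: {achieved:.1f}%")
```

With real material, the frame levels would come from the calibrated speech and noise .wav files, and the stand-in metric would be replaced by the actual OIM. Matching on the metric rather than on SNR means each noise file ends up with its own gain.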

The files included in this collection:

* a zip folder containing the background music and speech-shaped noise audio .wav files;

* an Excel workbook with the objective testing data, i.e. SNRs, GP values, and HEGP values;

* an Excel workbook with the subjective testing data collected and statistical analyses.

The 'Read Me' sections of the Excel workbook provide further details about the background music pieces created for this investigation.

This collection does not contain:

* the R-SPIN speech audio .wav files;

* Tang & Cooke's (2016) HEGP OIM and the associated MATLAB script and functions.

These must be acquired separately from the relevant authors because of the copyright on those materials.

Further details of this investigation are given in the PhD thesis by P. Demonte (2022).

email (1): p.demonte@edu.salford.ac.uk
email (2): philippademonte@gmail.com


Bilger, R C (1984). Manual for the clinical use of the Revised SPIN test. Champaign, IL, USA: University of Illinois, pp. 1-57.

Bilger, R C et al. (1984). Standardization of a test of speech perception in noise. In: Journal of Speech, Language, and Hearing Research, 27, pp. 32-48. https://doi.org/10.1044/jshr.2701.32

Tang, Yan and Cooke, Martin (2016). Glimpse-based metrics for predicting speech intelligibility in additive noise conditions. Article 5704563. In: Interspeech 2016, ISCA, 8th-12th September 2016, San Francisco. pp. 2488-2492. https://usir.salford.ac.uk/id/eprint/40054/


Demonte, Philippa (2022): Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility. University of Salford. Collection. https://doi.org/10.17866/rd.salford.c.5996539.v1


S3A: Future Spatial Audio for an Immersive Listener Experience at Home

Engineering and Physical Sciences Research Council

