Precedence Effect - Listening Experiment Audio
A zip folder containing the audio files used in a speech-in-noise test conducted in the Listening Room at the University of Salford in March 2020 towards the PhD thesis by P. Demonte (2022).
This research investigates object-based approaches to audio engineering with the intent of improving the speech intelligibility of broadcast audio for end-users. Speech intelligibility is defined here as the audibility and comprehension of spoken dialogue.
This particular listening experiment was a baseline study involving an array of four loudspeakers in three different playback configurations:
* L1 + R1 - a stereo pair of loudspeakers at 2m distance and +/- 30 degrees azimuth from the listener position; playing speech and background noise simultaneously.
* C2 - an auxiliary (aux) loudspeaker at a true centre position, i.e. 1.73m distance and 0 degrees azimuth from the listener, playing speech only.
* R2 - an auxiliary (aux) loudspeaker at 1.73m distance and +90 degrees azimuth from the listener, playing speech only.
Loudspeaker array configurations for audio playback:
1) L1 + R1
2) L1 + R1 + C2
3) L1 + R1 + R2
This quantitative, subjective listening experiment was designed to test the effect on speech intelligibility when a psychoacoustic phenomenon called the precedence effect is utilised. This is implemented by augmenting a loudspeaker array and applying a delay to the signal sent to the additional (aux) loudspeaker. In addition, equalisation (EQ) has to be applied to the signal from the aux loudspeaker to negate any differences between the two- and three-loudspeaker arrays in terms of comb-filtering effects at the listener position.
Audio playback for this listening experiment was controlled by a Matlab script, which followed randomised playlists from an Excel spreadsheet. These resources can be found in the same Salford Figshare collection as these audio files: P.Demonte (2022). 'Utilising the Precedence Effect with an Object-Based Approach to Audio to Improve Speech Intelligibility' (https://doi.org/10.17866/rd.salford.c.5975881.v1)
The format for each trial of the speech-in-noise test:
1) a narrator announces the sentence number, e.g. "Number one";
2) a spoken sentence taken from the Demonte (2019) re-recording of the HARVARD speech corpus plays simultaneously with background noise (either speech-shaped noise at -9 dB SNR or speech-modulated noise at -11 dB SNR);
3) then silence for 12 seconds, during which time the participant writes down the spoken sentence as they have heard and understood it;
4) thereafter, the next trial automatically begins;
5) after each set of 20 trials, there is a 30-second break, indicated by playback of some gentle piano music, before the next set of 20 trials begins;
6) playback ends after completion of a total of 120 trials.
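The trial sequencing above can be sketched in outline. The original experiment was driven by a Matlab script, so the following Python is only an illustrative mock-up: the file names, the `play` callback, and the playlist structure are all hypothetical stand-ins, not the actual experiment code.

```python
import time

# Hypothetical playlist entry: (sentence_wav, noise_wav, loudspeaker_config)
playlist = [(f"sentence_{i:03d}.wav", "SSN_-9dB.wav", "L1R1") for i in range(120)]

def run_trial(index, sentence, noise, config, play):
    # config would select which loudspeakers receive signal vs the Mute files
    play(f"number_{index + 1}.wav")  # 1) narrator announces the sentence number
    play(sentence, noise)            # 2) speech + masking noise simultaneously
    time.sleep(12)                   # 3) 12 s of silence for written responses

def run_experiment(playlist, play):
    for i, (sentence, noise, config) in enumerate(playlist):
        run_trial(i, sentence, noise, config, play)
        # 5) 30-second piano-music break after each set of 20 trials,
        #    but not after the final trial of the session
        if (i + 1) % 20 == 0 and (i + 1) < len(playlist):
            play("piano_break.wav")
            time.sleep(30)
```

With 120 trials and a break after every completed set of 20 (except the last), a full session contains five breaks.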
Within this zip folder, the audio (.wav) files have been sorted into sub-folders whose names reflect the directory names used in the Matlab script.
- spoken sentences for playback through loudspeakers L1 + R1 (the stereo pair). The .wav file names reflect the HARVARD speech corpus list number and sentence number (01 to 10) within that list; '_5' indicates the audio duration of 5 seconds.
- 5 seconds of background masking noise for simultaneous playback with the speech through loudspeakers L1 + R1 (the stereo pair). The .wav file name indicates whether it is speech-modulated noise (SMN) or speech-shaped noise (SSN), and the speech-to-noise ratio that it has been pre-adjusted to.
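One common way to pre-adjust a masking noise to a target speech-to-noise ratio is to match RMS levels; the sketch below illustrates the arithmetic only. It is not the Matlab code from the thesis, and the speech and noise signals here are random stand-ins.

```python
import numpy as np

def scale_noise_to_snr(speech, noise, target_snr_db):
    """Scale `noise` so that the RMS-based speech-to-noise ratio
    equals target_snr_db (e.g. -9 dB for SSN, -11 dB for SMN)."""
    rms_speech = np.sqrt(np.mean(speech ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    current_snr_db = 20 * np.log10(rms_speech / rms_noise)
    # Gain (in dB) needed to move the noise to the target SNR
    gain_db = current_snr_db - target_snr_db
    return noise * 10 ** (gain_db / 20)

# Stand-in signals (assumed 48 kHz sample rate, 5-second duration)
fs = 48000
rng = np.random.default_rng(0)
speech = rng.standard_normal(5 * fs) * 0.1
noise = rng.standard_normal(5 * fs) * 0.05
noise_adj = scale_noise_to_snr(speech, noise, -9.0)
```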
* C2S > C2S_with_EQ_applied_020320
- spoken sentences for playback through auxiliary loudspeaker C2.
- these are similar to the SpeeOrig audio .wav files, but with additional zero-padding (silence) at the start to replicate a 10 ms delay from this loudspeaker. Correspondingly, 10 ms of silence has been edited from the end, such that the total duration of each .wav file is 5 seconds.
- these files also had equalisation (EQ)** applied for playback from the C2 loudspeaker that was specific to the Listening Room conditions at the listener position on 2nd March 2020.
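The zero-pad-and-trim edit described above (10 ms of silence prepended, 10 ms removed from the end, preserving the 5-second duration) can be sketched as follows. This Python is an illustrative stand-in for the original audio editing; the 48 kHz sample rate is an assumption.

```python
import numpy as np

def delay_and_trim(x, fs, delay_ms=10.0):
    """Prepend delay_ms of silence and trim the same amount from the
    end, so the total duration of the file is unchanged."""
    n = int(round(fs * delay_ms / 1000.0))  # delay in samples
    return np.concatenate([np.zeros(n, dtype=x.dtype), x[:-n]])

fs = 48000
sentence = np.ones(5 * fs)  # stand-in for a 5-second sentence
delayed = delay_and_trim(sentence, fs, 10.0)
```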
* R2S > R2S_with_EQ_applied_020320
- spoken sentences for playback through auxiliary loudspeaker R2.
- same editing as for the C2S .wav files, i.e. to replicate a 10 ms delay from the R2 loudspeaker, with a total duration of 5 seconds.
- EQ** applied for playback from the R2 loudspeaker that was specific to the Listening Room conditions at the listener position on 2nd March 2020.
- Mute12.wav - 12 seconds of silence, which the Matlab script sends to the C2 and R2 loudspeakers during the trials that only involve the L1 + R1 stereo loudspeaker array configuration;
- Mute30.wav - 30 seconds of silence, which the Matlab script sends to the C2 and R2 loudspeakers whilst music plays from the L1 + R1 loudspeakers during the short breaks after each set of 20 trials.
- the piano music played during the breaks is not included in this zip file due to copyright.
See also the following Salford Figshare collections:
- for the complete Demonte re-recording of the HARVARD speech corpus (720 sentences total)
- for the master audio files of the speech-shaped noise and speech-modulated noise
** Also enclosed within this zip file is a sub-folder: 'for loudspeaker calibration and EQ':
> Pink Noise
- a .wav file of pink noise used for sound pressure level measurements to calibrate the four loudspeakers in the experiment array
- once calibrated, recordings made on 2nd March 2020 (at the listener position) of the pink noise played through the three different loudspeaker array configurations
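The arithmetic behind this kind of level calibration is straightforward: a measured SPL and a target SPL define a linear gain for the loudspeaker feed. The sketch below is illustrative only; the actual calibration used sound pressure level measurements in the Listening Room, and the specific SPL values shown are hypothetical.

```python
import numpy as np

def rms_db(x):
    """RMS level of a digital signal, in decibels."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

def gain_for_target_spl(measured_spl_db, target_spl_db):
    """Linear gain to apply to a loudspeaker feed so that the pink-noise
    calibration signal, which measured measured_spl_db at the listener
    position, reaches target_spl_db instead."""
    return 10 ** ((target_spl_db - measured_spl_db) / 20.0)
```

Applying the returned gain raises (or lowers) the playback level by exactly the SPL difference, since digital gain and acoustic level scale together for a linear playback chain.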
> HARVARD speech 100
- Harvard_speech100.wav - an audio file of 100 of the HARVARD speech corpus sentences played back-to-back; used during sound pressure level measurements at the listener position
- recordings made of this audio at the listener position, as per the Listening Room conditions on 2nd March 2020, in order to then determine the EQ that needed to be applied to the speech .wav files for playback from the C2 and R2 loudspeakers
- versions of Harvard_speech100.wav with the EQ applied, as a quality check before applying the EQ to the speech .wav files for that day of the experiment.
A separate Matlab script was used for analysing these audio recordings in order to determine the EQ adjustments required and then apply them to the relevant speech audio files.
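That Matlab analysis script is not reproduced here, but one generic way to derive such an EQ correction is to compare the magnitude spectrum of the in-room recording against the reference signal and invert the difference. The Python sketch below illustrates that general approach only; the FFT size, boost limit, and windowing are assumed parameter choices, not those used in the thesis.

```python
import numpy as np

def eq_filter_from_recordings(reference, recorded, n_fft=4096, max_boost_db=12.0):
    """Estimate a linear-phase FIR EQ that compensates the magnitude
    difference between a reference signal and its in-room recording."""
    ref_mag = np.abs(np.fft.rfft(reference, n_fft))
    rec_mag = np.abs(np.fft.rfft(recorded, n_fft))
    # Desired correction: boost where the room attenuated, cut where it boosted
    correction = ref_mag / np.maximum(rec_mag, 1e-12)
    # Limit boosts to avoid amplifying noise at deep spectral nulls
    limit = 10 ** (max_boost_db / 20.0)
    correction = np.clip(correction, 1.0 / limit, limit)
    # Zero-phase magnitude -> linear-phase FIR via inverse FFT, shift, window
    fir = np.fft.irfft(correction, n_fft)
    fir = np.roll(fir, n_fft // 2) * np.hanning(n_fft)
    return fir
```

For identical reference and recorded signals the correction is flat, and the routine returns (approximately) a pure delay, which is the expected null result.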
For replication of this listening experiment, new EQ adjustments must be made for the room conditions on any given day.
For further information, contact: