ReadMe_Harvard_British_English_recording_2019.txt

Author: Philippa Demonte (ORCID ID: orcid.org/0000-0001-5810-2737; SCOPUS ID: 57202968540)
Acoustics Research Group, University of Salford, United Kingdom
e-mail(1): p.demonte@edu.salford.ac.uk
e-mail(2): philippademonte@gmail.com
tel: +44 (0)7979 578 482

Year: 2019
ReadMe document last updated: 13 August 2019 (v.2)

OVERVIEW:
The HARVARD speech corpus is a database of 720 phonetically-balanced sentences, divided into 72 lists of 10 sentences each. See harvard.txt for an overview of the sentences and the online- and journal references.

This document outlines the details of a high-quality digital audio recording of the HARVARD speech corpus in its entirety by a female native British English speaker.

AVAILABILITY:
The audio .wav files which constitute this recording of the corpus are hosted on the University of Salford's Figshare site (https://salford.figshare.com). The files include:

* HARVARD_raw.zip (approximately 499.36 MB) - 72 .wav files of the lists of spoken sentences
  (https://doi.org/10.17866/rd.salford.7862666.v1)
* atmos semi-anechoic chamber Uni of Salford (approximately 5.66 MB) - one .wav file containing ambient noise of recording room
  (https://doi.org/10.17866/rd.salford.7862156.v1)
* HARVARD_Edited_EP.zip (approximately 106.53 MB) - 720 .wav files with the audio edited into individual sentences and end-pointed
  (https://doi.org/10.17866/rd.salford.7862465.v1)
* HARVARD_Edited_EP_5s.zip (approximately 107.12 MB) - 720 .wav files with the audio additionally front- and end- zero-padded to 5 seconds duration
  (https://doi.org/10.17866/rd.salford.7862186.v1)
* EndPoint.m - the Matlab script created for end-pointing the edited audio files
  (https://doi.org/10.17866/rd.salford.7862285.v1)
* zeropad.m - the Matlab script created to zero pad the edited audio files to a duration of 5 seconds
  (https://doi.org/10.17866/rd.salford.7862303.v1)
* harvard.txt - text file of all 720 HARVARD sentences + references
  (https://doi.org/10.17866/rd/salford.7857743.v1)

The audio is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/legalcode).

RECORDING DETAILS:
* Recording date: 20th December 2018
* Talker / Recording engineer: Philippa Demonte
  ==> a female native (Standard Southern) British English speaker in her 40s.
  ==> The talker made a conscious effort to take in-breaths away from the microphone, to articulate clearly, and to utter all sentences at a steady rate.
* Location: semi-anechoic chamber, Acoustics Research Centre, University of Salford, UK
  ==> acoustically treated to standards BS ISO 3744 / ISO 3745 and BS 4196
* Microphone: Electro-Voice RE20
  ==> Element type: Dynamic
  ==> Large diaphragm
  ==> Frequency response: 45 Hz - 18,000 Hz, i.e. relatively flat response
  ==> Polar Pattern: Cardioid; no coloration at 180-degrees off-axis
  ==> Impedance: 150 ohms balanced
  ==> Sensitivity, Open Circuit Voltage, 1 kHz: 1.5 mV/pascal
  ==> Hum pickup level, typical (60 Hz/1 millioersted field): -130 dBm
  ==> mic on stand at height of ~ 1.40 m placed: 0.89 m from front of room, 1.66 m from both sides of room, 3.18 m from back of room
  ==> the talker faced towards the front of the room at a distance of 0.1 m from the microphone. A 2-screen pop filter was placed in between, at 0.05 m from the microphone.
* Soundcard: Focusrite Scarlett 2i2
  ==> mic plugged in to Channel 1; Line; without pad; without 48V (phantom power)
* DAW: Adobe Audition 2017
* Reproduction: Mono
* Sampling rate: 48,000 Hz
* Bit depth: 32 bit
* Gain: around -33 to -20 dB, i.e. the balance point between having a high enough gain for the dialogue and minimising electrical noise from the mic cable.
* Input: Focusrite; Output: Focusrite; Master Clock: Focusrite; Latency: 200 ms; Monitoring: via headphones.
* Saved as: .wav files (+ accompanying .pkf files).
  The filename format is: Harvard list (number) .wav
* Sentences were recorded in groups of 10, i.e. by list

POST-PRODUCTION PROCESSING:
* The raw .wav files were manually edited into individual .wav files of each HARVARD sentence using Adobe Audition 2017.
* The edited .wav files were then end-pointed using a Matlab script (see EndPoint.m). The script determines the locations of the upcross points in each file, and then sets all amplitude values to zero before the 3rd upcross point and after the 3rd-to-last upcross point.
  The filename format is: Harvard - list number - sentence number - _0.wav
* For use in a speech-in-noise test, the researcher required all .wav files to be 5 seconds in duration. Hence, each edited and end-pointed .wav file was additionally zero-padded by 1 second at the start and by at least 1 second at the end.
  The filename format is: Harvard - list number - sentence number - _5.wav

No further processing was applied to these .wav files, as filtering, EQ, amplitude normalisation, and so on would have compromised the high quality of this audio recording.

================================================================================

FUTURE AVAILABILITY - ADDITIONAL MATERIALS

The author has created the following materials, which will be uploaded to a separate collection within the author's profile on Salford Figshare following objective and subjective pilot testing with this recording of the HARVARD speech corpus:

SPEECH:
* a concatenated .wav file of 100 of the 720 HARVARD spoken sentences (edited; end-pointed)

MASKING NOISE - MASTER .WAV FILES:
* speech-shaped noise (SSN): white noise filtered with the long-term average speech spectrum (LTASS) using 52nd-order linear predictive coding (LPC)
* speech-modulated noise (SMN): SSN with the temporal envelope of the speech applied

SSN and SMN are the same in the spectral domain, but different in the time domain. SSN is a purely energetic masker.
In contrast, SMN allows partial glimpsing of the spoken signal, and can therefore be considered either an energetic or an informational masker, depending on the signal-to-noise ratio (SNR) to which it is adjusted.

MASKING NOISE - EDITED .WAV FILES (5 seconds duration each; 100 files per masker per SNR):
* 5 seconds duration each;
* 100 .wav files per masker (SSN; SMN) per SNR at dB SNRs: 0, -3, -6, -9, -12, -15, -18, -21, -24.

Pilot testing will determine which SNRs are required for the masking noises for 50% speech intelligibility with this HARVARD corpus recording.

GLIMPSE PROPORTION AND HIGH ENERGY GLIMPSE PROPORTION VALUES
* As calculated by the Objective Intelligibility Metric (OIM) created by Cooke & Tang (2016)

AMPLITUDE STATISTICS
* Calculated for speech .wav files and masker .wav files using a Matlab script. The values replicate those that can alternatively be generated using the Amplitude Statistics feature in Adobe Audition.

CORRECT WORD SCORE TEMPLATES FOR SPEECH-IN-NOISE TESTS
* for calculation in Excel
* for use in Matlab or similar

MATLAB SCRIPTS
* for creating the SSN and SMN master .wav files using either LPC or the pwelch method
* for calculating the amplitude statistics

================================================================================
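
ILLUSTRATIVE SKETCH - ZERO-PADDING AND MASKER LEVEL:
The zero-padding to 5 seconds and the adjustment of a masker to a given dB SNR, both described in this document, could be combined roughly as follows. This is a minimal Python sketch and not the author's Matlab code (zeropad.m and the masker scripts are not reproduced here). The function names zero_pad, rms, scale_masker_to_snr and mix are hypothetical. Measuring the SNR from the RMS of the unpadded speech and of the whole masker is an assumption, since this document does not state how levels are measured. The signals in the demo are synthetic stand-ins, not corpus audio.

```python
import math
import random

SAMPLE_RATE = 48_000   # Hz, as listed under RECORDING DETAILS
TARGET_SECONDS = 5     # total duration required for the speech-in-noise test
LEAD_SECONDS = 1       # zero padding placed in front of each sentence


def zero_pad(samples, sample_rate=SAMPLE_RATE,
             lead_seconds=LEAD_SECONDS, target_seconds=TARGET_SECONDS):
    """Add leading zeros, then trailing zeros until the target length is reached."""
    lead = int(lead_seconds * sample_rate)
    total = int(target_seconds * sample_rate)
    tail = total - lead - len(samples)
    if tail < 0:
        raise ValueError("sentence too long for the target duration")
    return [0.0] * lead + list(samples) + [0.0] * tail


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_masker_to_snr(speech, masker, snr_db):
    """Scale the masker so that 20*log10(rms(speech) / rms(masker)) equals snr_db.

    Assumption: speech is the unpadded sentence, so the padding zeros
    do not lower its measured level.
    """
    gain = rms(speech) / (rms(masker) * 10 ** (snr_db / 20))
    return [gain * x for x in masker]


def mix(padded_speech, masker):
    """Sample-wise sum of two equal-length signals."""
    if len(padded_speech) != len(masker):
        raise ValueError("speech and masker must have the same length")
    return [s + m for s, m in zip(padded_speech, masker)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 2 s tone for the sentence, Gaussian noise for the masker.
    speech = [0.1 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
              for n in range(2 * SAMPLE_RATE)]
    masker = [rng.gauss(0.0, 0.05) for _ in range(TARGET_SECONDS * SAMPLE_RATE)]
    scaled = scale_masker_to_snr(speech, masker, snr_db=-6)
    stimulus = mix(zero_pad(speech), scaled)
    print(len(stimulus) / SAMPLE_RATE, "seconds")
    print(round(20 * math.log10(rms(speech) / rms(scaled)), 6), "dB SNR")
```

With real files, the sample lists would come from the edited and end-pointed .wav files and from the SSN or SMN masker files. Mixing full-scale signals may require a headroom check before writing the result back to disk.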