University of Salford

HARVARD speech corpus - audio recording 2019

Posted on 2019-05-13 - 12:16 authored by Philippa Demonte
High-quality digital audio .wav files (sampling rate: 48 kHz; bit depth: 32-bit) of a new recording of the complete HARVARD speech corpus (720 phonetically balanced sentences), spoken by a female native British English speaker. Intended uses include speech-in-noise tests, evaluations of audio quality, and machine learning.
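You may want to confirm the 48 kHz / 32-bit specification on a downloaded file before using it. The sketch below is our own illustration and is not part of the dataset. It uses only Python's standard library and reads the RIFF `fmt ` chunk directly. We avoid the `wave` module because it only reads integer PCM, and this listing does not say whether the 32-bit samples are integer or floating-point. The returned format tag answers that question (1 = integer PCM, 3 = IEEE float, 0xFFFE = extensible).

```python
import struct

def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk of a RIFF/WAVE file held in memory.

    Returns the format tag, channel count, sampling rate and bit depth.
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first sub-chunk follows the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        if chunk_id == b"fmt ":
            # The first 16 bytes of 'fmt ' are common to all WAVE format variants.
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", data[pos + 8:pos + 24])
            return {"format_tag": tag, "channels": channels,
                    "sample_rate": rate, "bits_per_sample": bits}
        # Chunks are word-aligned: an odd-sized chunk is followed by one pad byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

For a local copy of one of the example files, call `read_wav_format(open("Harvard_L01_S01_0.wav", "rb").read())`. If the file matches the description, the result should show `sample_rate` 48000 and `bits_per_sample` 32.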

Recorded: December 2018 at the University of Salford.

The audio files are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/legalcode).

Citation for the HARVARD sentences:
* From the appendix of: IEEE Subcommittee on Subjective Measurements, 'IEEE Recommended Practices for Speech Quality Measurements', IEEE Transactions on Audio and Electroacoustics, vol. 17, pp. 227-246, 1969.
* http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/fgdata/OldFiles/Recorder.app/utterances/Type1/harvsents.txt

For an overview of the lists of HARVARD speech corpus sentences:
* harvard.txt

For an overview of how this version of the speech corpus was recorded and audio engineered:
* harvard_201218_British_English_recording.txt

For examples of the audio files at three different stages of processing:
* Speech corpus - example of raw audio: HARVARD list 01.wav
* Speech corpus - example of edited audio: Harvard_L01_S01_0.wav
* Speech corpus - example of edited end-pointed zero-padded audio: Harvard_L01_S01_5.wav

For the .wav audio files of the HARVARD speech corpus in its entirety at different stages of processing, see the zip folders:
* Speech corpus - Harvard - raw audio
* Speech corpus - Harvard - edited and end-pointed audio
* Speech corpus - Harvard - edited, end-pointed, zero-padded audio

Each raw audio file is a recording of a single Harvard speech corpus list of 10 sentences.
The two zip folders of the edited versions contain 10 individual sentence .wav files per sub-folder.
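The sketch below maps each edited file to its list and sentence. It assumes that the edited file names follow the pattern of the examples (`Harvard_L01_S01_0.wav`, `Harvard_L01_S01_5.wav`), with `L` giving the list number and `S` the sentence number within that list. This is an inference from the example names, not a documented scheme. Under that assumption, a sentence's position among all 720 is (list − 1) × 10 + sentence, because the corpus has 72 lists of 10 sentences. The trailing digit differs between processing stages in the examples, but its meaning is not stated, so the sketch only returns it.

```python
import re

# Assumed naming pattern, inferred from the example files listed above.
NAME_RE = re.compile(r"^Harvard_L(\d+)_S(\d+)_(\d+)\.wav$")
SENTENCES_PER_LIST = 10

def parse_name(filename: str) -> tuple[int, int, int, str]:
    """Return (list_no, sentence_no, corpus_index, suffix) for an edited file.

    corpus_index runs from 1 to 720 across the whole corpus.
    """
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    list_no, sentence_no = int(m.group(1)), int(m.group(2))
    if not 1 <= sentence_no <= SENTENCES_PER_LIST:
        raise ValueError(f"sentence number out of range: {sentence_no}")
    corpus_index = (list_no - 1) * SENTENCES_PER_LIST + sentence_no
    return list_no, sentence_no, corpus_index, m.group(3)
```

For example, `parse_name("Harvard_L03_S04_0.wav")` gives list 3, sentence 4 and corpus index 24. If the actual folder contents use a different pattern, adjust `NAME_RE` accordingly.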

For a .wav audio file of the ambient noise in the room where this version of the HARVARD speech corpus was recorded:
* Speech corpus - ambient noise of recording room
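Speech-in-noise tests, one of the intended uses above, require presenting a sentence against a masker at a chosen signal-to-noise ratio. The sketch below is a generic illustration and does not describe how the dataset was produced. It scales a noise signal so that 20·log10(rms(speech) / rms(scaled noise)) equals the target SNR in dB, then adds it to the speech. Both inputs are assumed to be float sample lists at the same sampling rate, such as a sentence file together with this ambient recording or any other masker.

```python
import math
from typing import Sequence

def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> list[float]:
    """Return speech + g * noise, with g chosen so the mixture has SNR snr_db.

    The noise must be at least as long as the speech; it is truncated to match.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; SNR is undefined")
    # Target noise RMS is rms(speech) / 10**(snr_db / 20).
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

This uses a plain whole-signal RMS ratio. Formal speech-in-noise procedures often define speech level differently, for example as active speech level per ITU-T P.56. For float samples with full scale 1.0, `20 * math.log10(rms(x))` also gives a signal's level in dBFS, which can be used to estimate the room's noise floor.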

For the MATLAB R2018b scripts used to apply end-pointing and zero-padding to the audio files:
* EndPoint.m
* zeropad.m
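End-pointing locates where speech starts and ends so that leading and trailing silence can be trimmed. Zero-padding then adds a stretch of digital silence. This listing does not describe the logic inside EndPoint.m and zeropad.m, so the Python sketch below is a generic frame-energy stand-in and not a port of those scripts. The frame length, the threshold and the symmetric padding are all illustrative assumptions.

```python
import math

def end_point(x, rate, frame_ms=10.0, threshold_db=-40.0):
    """Trim leading and trailing low-level frames from a list of float samples.

    A frame counts as active if its RMS is no more than threshold_db below the
    loudest frame. The result runs from the first to the last active frame, so
    its edges are quantised to the frame grid.
    """
    if not x:
        return []
    n = max(1, int(rate * frame_ms / 1000))  # samples per frame
    frames = [x[i:i + n] for i in range(0, len(x), n)]
    levels = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    peak = max(levels)
    if peak == 0.0:
        return []  # all-zero input: nothing to keep
    limit = peak * 10 ** (threshold_db / 20)
    active = [i for i, level in enumerate(levels) if level >= limit]
    return list(x[active[0] * n:(active[-1] + 1) * n])

def zero_pad(x, rate, pad_s):
    """Add pad_s seconds of digital silence (zeros) before and after x."""
    pad = [0.0] * int(round(rate * pad_s))
    return pad + list(x) + pad
```

At 48 kHz the default frame is 480 samples. A sentence would be processed as `zero_pad(end_point(samples, 48000), 48000, pad_s)`, where the pad duration is left to the user. A fixed relative threshold keeps the sketch simple, but it can clip soft word onsets. Dedicated end-point detectors usually combine energy with other cues, such as zero-crossing rate.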

FUNDING

EP/L000539/1