Results and audio materials from an experiment into the cognitive categorization of auditory objects in complex audio scenes.

dataset

posted on 2016-06-01, 11:01 authored by James Stephen Woodcock, Trevor John Cox, William Jonathan Davies, Frank Melchior

This fileset contains public data relating to the JAES paper "Categorization of broadcast audio objects in complex auditory scenes", along with supporting figures for the paper.

The zip file "Public.zip" contains the following:

./Soundscapes contains the programme material used for the soundscapes case study in 48 kHz, 24-bit .wav format. The channel order is L, R, C, LFE, LS, RS. The folder also contains a Pure Data patch, "interface.pd", that was used to run the test.
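As an illustration of the channel order stated above, the following is a minimal sketch of how interleaved sample data could be split into named channels. It assumes the raw PCM bytes have already been extracted from the .wav data chunk and that samples are stored as signed little-endian 24-bit integers, as is usual for PCM .wav files; the function name `split_channels` is hypothetical and is not part of the fileset.

```python
# Channel order as given in the dataset description.
CHANNEL_ORDER = ("L", "R", "C", "LFE", "LS", "RS")
BYTES_PER_SAMPLE = 3  # 24-bit PCM (assumed signed, little-endian)


def split_channels(pcm_bytes):
    """Split interleaved 6-channel, 24-bit PCM bytes into per-channel sample lists."""
    frame_size = BYTES_PER_SAMPLE * len(CHANNEL_ORDER)
    if len(pcm_bytes) % frame_size != 0:
        raise ValueError("byte count is not a whole number of 6-channel frames")
    channels = {name: [] for name in CHANNEL_ORDER}
    for start in range(0, len(pcm_bytes), frame_size):
        for index, name in enumerate(CHANNEL_ORDER):
            offset = start + index * BYTES_PER_SAMPLE
            raw = pcm_bytes[offset:offset + BYTES_PER_SAMPLE]
            channels[name].append(int.from_bytes(raw, "little", signed=True))
    return channels


# Toy input: two hand-built frames, not real programme material.
frame_a = b"".join(v.to_bytes(3, "little", signed=True) for v in (1, 2, 3, 4, 5, 6))
frame_b = b"".join(v.to_bytes(3, "little", signed=True) for v in (-1, -2, -3, -4, -5, -6))
channels = split_channels(frame_a + frame_b)
```

In practice an audio library would normally handle the file parsing; the sketch only shows which position in each frame corresponds to which loudspeaker channel.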

./scripts_and_data contains the R script that was used to conduct the analysis. Lines 454 onwards should be uncommented according to the programme material that is being analysed.

./scripts_and_data/data contains the data from the sorting experiments in .csv format. The files in this folder and its parent folder are named according to the programme material they relate to: RD = Radio drama, LE = Live events, ND = Nature documentary, SS = Soundscape, FF = Feature film, ALL = All material. In these files, rows are objects and columns are participants. Each number identifies the group to which a given participant assigned that object.
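To make the layout of the sorting files concrete, here is a minimal sketch of one common way to turn such data into a pairwise dissimilarity matrix: the fraction of participants who placed two objects in different groups. This is an illustration only and is not taken from the R script in the fileset. The toy CSV below is invented, and it assumes the file holds only integer group numbers with no header row or label column, which may not match the actual files.

```python
import csv
import io
from itertools import combinations


def cooccurrence_dissimilarity(groups):
    """groups[i][p] is the group participant p assigned object i to.

    Returns a symmetric matrix whose (i, j) entry is the fraction of
    participants who put objects i and j in different groups.
    """
    n_objects = len(groups)
    n_participants = len(groups[0])
    dissim = [[0.0] * n_objects for _ in range(n_objects)]
    for i, j in combinations(range(n_objects), 2):
        apart = sum(
            1 for p in range(n_participants) if groups[i][p] != groups[j][p]
        )
        dissim[i][j] = dissim[j][i] = apart / n_participants
    return dissim


# Invented stand-in for one sorting file: 4 objects (rows) x 3 participants (columns).
toy_csv = "1,1,2\n1,1,2\n2,1,1\n2,2,1\n"
groups = [[int(value) for value in row] for row in csv.reader(io.StringIO(toy_csv))]
dissim = cooccurrence_dissimilarity(groups)
```

A matrix of this kind is a typical input to hierarchical agglomerative clustering or multidimensional scaling, the two techniques named in the abstract.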

./scripts_and_data/data/labels_data contains .csv files describing the category labels participants assigned to their groups. In these files, rows are category labels and columns are objects. A cell contains 1 if the object was categorised using that category label, and 0 otherwise.
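The binary label matrix described above can be read back into a per-object list of labels. The following is a minimal sketch under stated assumptions: the label and object names are invented for illustration, the helper `labels_per_object` is hypothetical, and how names are stored in the actual .csv files (header row, first column, or neither) is not specified here.

```python
def labels_per_object(label_names, object_names, matrix):
    """Map each object to the category labels marked with 1 in its column.

    matrix[r][c] is 1 if label r was applied to object c, else 0.
    """
    result = {obj: [] for obj in object_names}
    for label, row in zip(label_names, matrix):
        for obj, flag in zip(object_names, row):
            if flag == 1:
                result[obj].append(label)
    return result


# Invented example: 3 category labels (rows) x 3 objects (columns).
label_names = ["speech", "background", "music"]
object_names = ["narrator", "rain", "score"]
matrix = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
labels = labels_per_object(label_names, object_names, matrix)
```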

./interview_transcripts contains .docx files of the transcripts of the open interviews with each of the participants.

Abstract:

This paper presents a series of experiments to determine a categorization framework for broadcast audio objects. Object-based audio is becoming an ever more important paradigm for the representation of complex sound scenes. However, there is a lack of knowledge regarding object-level perception and cognitive processing of complex broadcast audio scenes. As categorization is a fundamental strategy in reducing cognitive load, knowledge of the categories utilized by listeners in the perception of complex scenes will be beneficial to the development of perceptually based representations and rendering strategies for object-based audio. In this study, expert and non-expert listeners took part in a free card sorting task using audio objects from a variety of different types of programme material. Hierarchical agglomerative clustering suggests that there are 7 general categories, which relate to sounds indicating actions and movement, continuous and transient background sound, clear speech, non-diegetic music and effects, vocalisations, and prominent attention-grabbing transient sounds. A three-dimensional perceptual space calculated via multidimensional scaling suggests that these categories vary along dimensions related to the semantic content of the objects, the temporal extent of the objects, and whether the object indicates the presence of people.