Authors: R. Eg and D. M. Behne
Editors: S. Ouni, F. Berthommier and A. Jesse
Title: Temporal integration for live conversational speech
Affiliation: Media, Communication Systems
Status: Published
Publication Type: Proceedings, refereed
Year of Publication: 2013
Conference Name: Proceedings of the 12th International Conference on Auditory-Visual Speech Processing (AVSP2013)
Pagination: 129-133
Date Published: August
Publisher: AVSP
Keywords: Workshop
Abstract

The difficulty in detecting short asynchronies between corresponding audio and video signals demonstrates the remarkable resilience of the perceptual system when integrating the senses. Thresholds for perceived synchrony vary depending on the complexity, congruency and predictability of the audiovisual event. For instance, asynchrony is typically detected sooner for simple flash and tone combinations than for speech stimuli. In applied scenarios, such as teleconference platforms, the thresholds themselves are of particular interest: since the transmission of audio and video streams can result in temporal misalignments, system providers need to establish how much delay they can allow. This study compares the perception of synchrony in speech for a live two-way teleconference scenario and a controlled experimental set-up. Although methodologies and measures differ, our explorative analysis indicates that the windows of temporal integration are similar for the two scenarios. Nevertheless, the direction of temporal tolerance differs: for the teleconference, audio lead asynchrony was more difficult to detect than for the experimental speech videos. While the windows of temporal integration are fairly independent of the context, the skew in the audio lead threshold may reflect the natural division of attention in a conversation.

Citation Key: Simula.simula.2092