|Authors||R. Eg and D. M. Behne|
|Editors||S. Ouni, F. Berthommier and A. Jesse|
|Title||Temporal integration for live conversational speech|
|Affiliation||Media, Communication Systems|
|Publication Type||Proceedings, refereed|
|Year of Publication||2013|
|Conference Name||Proceedings of the 12th International Conference on Auditory-Visual Speech Processing (AVSP2013)|
The difficulty in detecting short asynchronies between corresponding audio and video signals demonstrates the remarkable resilience of the perceptual system when integrating the senses. Thresholds for perceived synchrony vary depending on the complexity, congruency and predictability of the audiovisual event. For instance, asynchrony is typically detected sooner for simple flash and tone combinations than for speech stimuli. In applied scenarios, such as teleconference platforms, the thresholds themselves are of particular interest; since the transmission of audio and video streams can result in temporal misalignments, system providers need to establish how much delay they can allow. This study compares the perception of synchrony in speech for a live two-way teleconference scenario and a controlled experimental set-up. Although methodologies and measures differ, our explorative analysis indicates that the windows of temporal integration are similar for the two scenarios. Nevertheless, the direction of temporal tolerance differs: audio lead asynchrony was more difficult to detect in the teleconference than in the experimental speech videos. While the windows of temporal integration are fairly independent of the context, the skew in the audio lead threshold may be a reflection of the natural diversion of attending to a conversation.