Authors: R. Eg
Title: Preserving Temporal Integration in Multimedia: Perceived Synchrony Across Audiovisual Content and Quality Distortions
Affiliation: Communication Systems
Publication Type: PhD Thesis
Year of Publication: 2014
Degree awarding institution: Department of Psychology
Number of Pages: 170
Date Published: 10/2014
Publisher: Akademika Publishing, University of Oslo
Place Published: Oslo, Norway
Other Numbers: ISSN 1504-3991

With the introduction of modern media, our senses are facing a new reality. The human perceptual system has adapted to the physical world through millennia of evolution; multimedia technology, on the other hand, has existed for less than a century. The presented project set out to explore how the human senses bind auditory and visual information in various multimedia settings. Specifically, the project aimed to establish whether the loss of auditory or visual quality could have adverse effects on this perceptual binding process. Quality loss is common to multimedia content and can arise at any point during preparation, transmission, or presentation. While audio and video distortions come with a variety of audible and visible effects, often in the form of artifacts, this work focuses on the perceptual consequences of the distortions rather than their origin. In order to assess how quality distortions affect audiovisual integration, asynchrony was introduced as an experimental tool to establish thresholds for temporal integration that can be compared across conditions. Using this methodology throughout, several experiments on perceived audiovisual synchrony were conducted for different content, different distortions, different scenarios, and different experimental approaches.

Instead of focusing on specific auditory or visual artifacts that can arise from the compression, encoding, or transmission of multimedia content, the first set of experiments took a more generic approach. Quality distortions were introduced uniformly and consistently across the auditory and visual signals, ensuring that all sensory information would be equally affected by the masking effect. Yet the findings revealed no significant effect of either noise or blur on the perceived synchrony of audiovisual events. However, we found temporal integration to vary significantly between short and long speech excerpts, and between speech, music, and isolated physical actions. Compared to music and speech, which are both rapid and dynamic in content, greater tolerance to asynchrony was observed for the single spoken syllable and the isolated action event. This finding is possibly related to the relatively few temporal cues shared between the modalities; the temporal alignment of auditory and visual signals is an ongoing process, and it seems likely that it depends on continuous and consistent reference points.

The subsequent experiments looked into asynchrony applied to teleconference scenarios. In the first study, asynchrony detection for spontaneous speech was compared to perceived synchrony in recorded speech, revealing that temporal integration is more robust for the former. This is likely due to two related factors: the measure and the nature of the task. When detecting gradually increasing asynchrony, the slow change may make perception less sensitive to the temporal offset. Similarly, in a live conversation, numerous distractions may take attention away from the task at hand, again making perception less sensitive to the misalignment.

A common challenge for teleconference systems is reverberating acoustics, which can affect the temporal signature of auditory signals. Accordingly, a final study explored reverberation as a distortion of auditory information. Despite the lack of observed impact from other quality distortions, reverberation might still influence perceived audiovisual synchrony due to the shared temporal dimension. The results again demonstrated the robustness of temporal integrity in audiovisual speech. However, reverberation that follows isolated events can have a severe impact on the subjective perception of synchrony. Nevertheless, the main lesson learnt over the course of this work relates to the remarkable nature of the perceptual system. Similar to earlier studies that have demonstrated the perceptual system’s capacity to compensate for substantial separations between the modalities, be they temporal, spatial, or articulatory, this work presents examples of perception’s capacity to compensate for quality discrepancies.