|Title||Processing Cyclic Multimedia Workloads on Modern Architectures|
|Affiliation||Communication Systems|
|Publication Type||PhD Thesis|
|Year of Publication||2014|
|Publisher||University of Oslo|
Working with modern architectures for high-performance applications is increasingly difficult for programmers as the complexity of both system architectures and software continues to grow. The hand tuning and native adaptations required to achieve high performance come at the cost of limiting the portability of the software. For instance, we show that a compute-intensive DCT algorithm performs better on graphics processors than the best algorithm for x86. Limited portability is a particular concern for cyclic multimedia workloads, a set of programs that run continuously with strict requirements for high performance and low latency. A typical multimedia workload is a pipeline of many small image processing algorithms working in tandem to complete a particular task. The input can be video from one or more live cameras, and the output is a set of video frames with elements from several of the source videos, for example stitched panorama frames or 3D warped video. Such a setup runs continuously and may need to adapt to changes of varying degree in the setup without interruptions or downtime. To reach the performance goals required by multimedia pipelines, we consider modern, heterogeneous architectures instead of the traditional symmetric multi-processing architectures. We also investigate variations between recent microarchitectures of symmetric processors to identify differences that a low-level scheduler must take into account. Further, since multimedia workloads often need to adapt to external conditions, e.g., adding another participant to a video conference, we also investigate elastic and portable processing of multimedia workloads. To do this, we propose a framework design and language, which we call P2G.
In the age of Big Data, P2G differs from the typical frameworks used for distributed processing, such as MapReduce and Dryad, in that it is designed for continuous operation instead of batch processing of large workloads. We emphasise heterogeneous support and expose opportunities for parallelism in workloads in a way that is easy to target, since it resembles sequential execution with multidimensional arrays. The framework ideas are implemented as a prototype and released as an open source platform for further experimentation and evaluation.