Efficient EONS: Execution of Large Workloads on Elastic Heterogeneous Resources

Industry and society rely on a growing number of systems that store and process large amounts of data. This is the next frontier for innovation, competition and productivity, with large ongoing initiatives in both the EU and the US. Areas where processing of large amounts of unstructured data is applied include medicine, meteorology, genomics, connectomics, physics, biology, environmental research, Internet search, finance and governmental informatics. Finding structure and information in these data sets obviously demands massive computational resources, but it also imposes huge demands on I/O and communication in order to yield answers to nearly arbitrary questions in what users perceive as real time. If these questions are posed by machines, or if a human’s question covers interconnected data sets, there are also dependencies between processing steps. In a large computing cluster such as a grid or a cloud, an important challenge is thus to execute the many concurrent computations efficiently and dynamically, taking available resources, processing, communication, dependencies and timeliness into account when mapping tasks to processing cores.

The aim of the EONS research project is therefore to perform basic research on system and tool support for both parallel programming and parallel processing in the context of future large-scale, distributed, heterogeneous systems. EONS will develop concepts and mechanisms that enable the development of software for these next-generation big-data applications. It does so by addressing fundamental challenges in the identification, division, dispatching and scheduling of tasks that can run correctly in parallel in a shared distributed system of heterogeneous computing resources arranged in complex topologies.

The following points are investigated by EONS:

  • Formalization of a high-level parallel programming model that is compatible with the programming models and languages developers know today. Several approaches for specifying potential parallelism already exist, but for workloads with processing and/or time dependencies, we need to add notions of deadlines and execution orders.
  • Compiler and multi-core run-time system. Many run-time systems have been built and are in use, but there is considerable potential for more efficient execution, and run-time support for dependencies must be added. Scheduling and mapping of tasks to processing engines will be important here. At the core of this plan is the combined exploitation of knowledge retained from the compilation step and knowledge gained at run time during execution on a multi-core system.
  • Distributed implementation and high-level scheduler optimization. Adding support for multiple machines makes the previous item more complex: heterogeneity and complexity increase, and communication costs vary more. A high-level scheduler must therefore take this into account, in addition to the competition for resources among concurrent workloads.
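
To make the notions of deadlines, execution orders and task-to-engine mapping in the points above concrete, the following is a minimal sketch of a greedy list scheduler. It is an illustration only and not the EONS programming model or scheduler: the `Task` fields, the `schedule` function, the example task names and the "cpu"/"gpu" speeds are all hypothetical, and communication costs are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    work: float       # abstract work units (hypothetical cost model)
    deadline: float   # latest acceptable finish time
    deps: list = field(default_factory=list)  # tasks that must finish first


def schedule(tasks, speeds):
    """Greedy list scheduling over heterogeneous workers.

    Among the tasks whose dependencies are already placed, pick the one
    with the earliest deadline and put it on the worker where it would
    finish first. Returns {task name: (worker, start, finish)}.
    """
    by_name = {t.name: t for t in tasks}
    free_at = {w: 0.0 for w in speeds}   # time at which each worker is idle
    placed = {}
    pending = set(by_name)
    while pending:
        ready = [by_name[n] for n in pending
                 if all(d in placed for d in by_name[n].deps)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        task = min(ready, key=lambda t: (t.deadline, t.name))
        # A task cannot start before all of its dependencies have finished.
        earliest = max((placed[d][2] for d in task.deps), default=0.0)

        def finish_on(w):
            return max(free_at[w], earliest) + task.work / speeds[w]

        worker = min(speeds, key=lambda w: (finish_on(w), w))
        start = max(free_at[worker], earliest)
        finish = start + task.work / speeds[worker]
        placed[task.name] = (worker, start, finish)
        free_at[worker] = finish
        pending.remove(task.name)
    return placed


# Hypothetical workload: a small pipeline with a fork and a join.
tasks = [
    Task("decode", work=4, deadline=5),
    Task("index", work=4, deadline=6, deps=["decode"]),
    Task("filter", work=2, deadline=8, deps=["decode"]),
    Task("merge", work=2, deadline=10, deps=["index", "filter"]),
]
speeds = {"cpu": 1.0, "gpu": 2.0}  # relative processing speeds (assumed)

plan = schedule(tasks, speeds)
missed = [t.name for t in tasks if plan[t.name][2] > t.deadline]
```

In this sketch the execution order follows from the `deps` lists and deadlines only break ties among ready tasks. A scheduler of the kind the project describes would additionally have to account for communication costs and for competing workloads.
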
All project participants have experience in experimental research, and the EONS project will therefore use evolutionary prototyping to accomplish its goal of investigating and implementing mechanisms that enable the development and execution of complex, time-dependent and computationally intensive applications. Using simulations to validate our high-level and low-level schedulers, scalability, etc., would be possible, but past experience has shown that we are frequently unable to model the complexity of the system correctly. Furthermore, the best venues for publishing research results in this area today require test results from real systems. Developing a proof-of-concept prototype that runs real applications is therefore the only feasible approach to solving the issues addressed by this project.

Final goal:

EONS develops a programming model for processing big-data workloads, which is implemented as a combination of compiler, operating system, and high-level scheduler.

Funding source:

The Norwegian Research Council's FRINATEK programme

All partners:

Dag Johansen, University of Oslo

The Center for Resilient Networks and Applications

Duration

January 2015 - December 2017

Contact person(s)