AuthorsF. Zahid
TitleNetwork Optimization for High Performance Cloud Computing
AfilliationCommunication Systems
Project(s)ERAC: Efficient and Robust Architecture for the Big Data Cloud
Publication TypePhD Thesis
Year of Publication2017
Degree awarding institutionUniversity of Oslo
Date Published12/2017
PublisherUniversity of Oslo
Place PublishedUniversity of Oslo

Cloud Computing has seen a tremendous popularity in last several years. A scalable and efficient data center network is essential for a performance capable cloud computing infrastructure. This thesis provides practical solutions to enable an efficient, flexible, multi-tenant network architecture suitable for high-performance cloud computing, using InfiniBand (IB) as a demonstration technology. The work is motivated by the needs of the future data centers to provide efficient cloud solutions for increasing uptake of the cloud technology for both big data and traditional High-Performance Computing (HPC) applications.

Research contributions of this thesis lie within three main categories. First, we propose a set of improvements to the fat-tree routing algorithm to make it suitable for HPC workloads in the cloud. Fat-Tree is a popular network topology in HPC systems. Our proposed improvements to the fat-tree routing make it more efficient, provides performance isolation among tenants in multi-tenant systems, and enable routing of both physical end nodes and virtualized end nodes according to the policies set by the provider. Second, we design new network reconfiguration methods to significantly reduce the time it takes to reroute the IB network. Reduced network reconfiguration time means that the interconnection network in a HPC cloud can optimize itself quickly to adapt to changing tenant configurations, faults, running workloads, and current network conditions. Last, we demonstrate a self-adaptive network prototype for IB-based HPC clouds, fully equipped with autonomous monitoring and adaptation, and configurable through a high-level condition-action language for the service providers.

The research conducted in this thesis has potential impacts on both private cloud infrastructures, such as medium sized clusters used for enterprise HPC, and public clouds offering innovative HPC solutions to the customers at scale. The industrial application of the thesis is reflected by the eight patent applications resulted from this work.