Authors: E. Tasoulas, F. Zahid, E. G. Gran, K. Begnum, B. D. Johnsen and T. Skeie
Title: Efficient Routing and Reconfiguration in Virtualized HPC Environments with vSwitch-enabled Lossless Networks
Affiliation: Communication Systems
Project(s): ERAC: Efficient and Robust Architecture for the Big Data Cloud
Publication Type: Journal Article
Year of Publication: 2018
Journal: Concurrency and Computation: Practice and Experience
Publisher: John Wiley & Sons
Keywords: InfiniBand, Lossless Interconnection Networks, Network Reconfiguration, Network Routing, SR-IOV, Virtualization

To meet the demands of communication-intensive workloads in the cloud, virtual machines (VMs) should utilize low-overhead network communication paradigms. In general, such paradigms enable VMs to communicate directly with the hardware by means of a passthrough technology like Single-Root I/O Virtualization (SR-IOV). However, when passthrough-based virtualization is coupled with lossless interconnection networks, live migrations introduce scalability challenges due to the substantial network reconfiguration overhead. With these challenges in mind, we proposed a virtual switch (vSwitch) SR-IOV architecture for InfiniBand in (33). In this paper, we first suggest solutions to rectify the space-domain scalability issues that are present in vSwitch-enabled subnets as a result of the VMs using dedicated layer-two addresses. Then we discuss routing strategies for virtualized environments using vSwitches, and present a routing algorithm for Fat-Trees. We also present a reconfiguration method that minimizes the reconfiguration overhead on Fat-Trees. We perform an extensive evaluation of our prototype algorithms. As vSwitch-enabled hardware does not yet exist, the evaluation is based on empirical observations from emulating vSwitches with existing hardware, as well as on large-scale simulations. Our results show a significant reduction in reconfiguration times, as route recalculations can be eliminated. For certain scenarios, the number of reconfiguration subnet management packets sent to switches is reduced from several hundred thousand down to a single one, without degrading the routing quality.