Authors: F. Zahid, E. G. Gran, B. Bogdanski, B. D. Johnsen and T. Skeie
Title: Partition-aware routing to improve network isolation in InfiniBand based multi-tenant clusters
Affiliation: Communication Systems
Publication Type: Proceedings, refereed
Year of Publication: 2015
Conference Name: 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Date Published: 07/2015
Place Published: Shenzhen, China
ISBN Number: 978-1-4799-8006-2

InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, network isolation is provided through partitioning. However, routing is oblivious to the partitions in the network, so physical links carry flows from different partitions. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments such as cloud computing. In such systems, each tenant needs predictable network performance, unaffected by the workload of other tenants. In addition, under current routing schemes, links connecting nodes outside partitions are routed the same way as functional links even though they are never used. This may result in degraded load balancing.

In this paper, we present an implementation of a partition-aware fat-tree routing algorithm, pFTree. pFTree uses a multifold mechanism to provide performance isolation among partitions belonging to different tenant groups. Given the available network resources, pFTree first isolates partitions at the physical link level and then uses virtual lanes when needed. Our experiments and simulations show that pFTree significantly reduces the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree provides improved load balancing over the state-of-the-art fat-tree routing algorithm.
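
The two-stage idea in the abstract (separate partitions on physical links first, fall back to virtual lanes when links run out) can be illustrated with a toy sketch. This is not the pFTree algorithm from the paper: the single-switch model, the function names (`oblivious_assign`, `partition_aware_assign`, `partitions_per_link`), and the modulo-based assignment are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Toy model (an assumption, not the paper's algorithm): one switch with
# `num_links` uplinks and a list of (node_name, partition) destinations.

def oblivious_assign(nodes, num_links):
    """Partition-oblivious baseline: round-robin uplinks by node order."""
    return {name: i % num_links for i, (name, _part) in enumerate(nodes)}

def partition_aware_assign(nodes, num_links, num_vls):
    """Give each partition its own uplink while uplinks last; partitions that
    must share an uplink are placed on different virtual lanes (VLs)."""
    parts = sorted({part for _name, part in nodes})
    index = {part: i for i, part in enumerate(parts)}
    plan = {}
    for name, part in nodes:
        i = index[part]
        # Stage 1: link-level separation; stage 2: VL separation on reuse.
        plan[name] = (i % num_links, (i // num_links) % num_vls)
    return plan

def partitions_per_link(nodes, link_of):
    """Map each uplink to the set of partitions whose traffic it carries."""
    shared = defaultdict(set)
    for name, part in nodes:
        shared[link_of[name]].add(part)
    return dict(shared)

nodes = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B")]
baseline = oblivious_assign(nodes, num_links=2)
aware = partition_aware_assign(nodes, num_links=2, num_vls=2)
aware_links = {name: link for name, (link, _vl) in aware.items()}

print(partitions_per_link(nodes, baseline))     # both uplinks carry A and B
print(partitions_per_link(nodes, aware_links))  # one partition per uplink
```

In this toy, the baseline mixes both partitions on every uplink, while the partition-aware plan keeps them apart. With more partitions than uplinks, the sketch reuses an uplink but assigns a different VL, which only separates traffic while the number of partitions per uplink does not exceed `num_vls`.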

Citation Key: 19134