|Authors||F. Zahid, E. G. Gran, B. Bogdanski, B. D. Johnsen and T. Skeie|
|Title||Partition-aware routing to improve network isolation in InfiniBand based multi-tenant clusters|
|Affiliation||Communication Systems|
|Publication Type||Proceedings, refereed|
|Year of Publication||2015|
|Conference Name||15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)|
|Place Published||Shenzhen, China|
InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, network isolation is provided through partitioning. However, routing is oblivious to the partitions in the network, so physical links carry flows from different partitions. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments such as cloud computing. In such systems, each tenant needs predictable network performance, unaffected by the workload of the other tenants. In addition, current routing schemes route the links connecting nodes outside partitions in the same way as the functional links, even though those links are never used. This may result in degraded load-balancing.
In this paper, we present an implementation of a partition-aware fat-tree routing algorithm, pFTree. pFTree uses a multifold mechanism to provide performance isolation among partitions belonging to different tenant groups. Given the available network resources, pFTree first isolates partitions at the physical link level, and then uses virtual lanes when needed. Our experiments and simulations show that pFTree significantly reduces the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree provides improved load-balancing over the state-of-the-art fat-tree routing algorithm.
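The two-stage idea described above (separate partitions onto distinct physical links first, then fall back to virtual lanes) can be illustrated with a toy sketch. This is not the pFTree algorithm from the paper: the functions `assign_partitions` and `is_isolated`, the round-robin placement, and the parameters `num_links` and `num_vls` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assign_partitions(partitions, num_links, num_vls):
    """Toy two-stage isolation (illustrative only, not the paper's algorithm).

    Stage 1: spread partitions round-robin over the physical links.
    Stage 2: partitions that end up on the same link get different
             virtual lanes (VLs), as long as VLs remain available.
    Returns a dict mapping partition -> (link index, VL index).
    """
    assignment = {}
    partitions_on_link = defaultdict(int)  # how many partitions share each link
    for i, partition in enumerate(partitions):
        link = i % num_links
        vl = partitions_on_link[link] % num_vls  # wraps when VLs run out
        partitions_on_link[link] += 1
        assignment[partition] = (link, vl)
    return assignment


def is_isolated(assignment):
    """True if no two partitions share the same (link, VL) pair."""
    return len(set(assignment.values())) == len(assignment)


if __name__ == "__main__":
    tenants = ["A", "B", "C"]
    # Enough links: every partition gets its own physical link.
    print(assign_partitions(tenants, num_links=4, num_vls=2))
    # Too few links: partitions A and C share link 0 but use different VLs.
    print(assign_partitions(tenants, num_links=2, num_vls=2))
    # Too few links and VLs: full isolation is no longer possible.
    print(is_isolated(assign_partitions(tenants, num_links=1, num_vls=2)))
```

In this sketch, link-level separation is preferred because it removes sharing entirely, while virtual lanes are used only once partitions must share a link.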