Authors: B. Bogdanski, S. Reinemo, F. O. Sem-Jacobsen and E. G. Gran
Title: sFtree: a Fully Connected and Deadlock Free Switch-to-Switch Routing Algorithm for Fat-Trees
Affiliation: Cloud, Communication Systems
Status: Published
Publication Type: Journal Article
Year of Publication: 2012
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 8
Number: 4
Date Published: January
Publisher: ACM
Abstract

Existing fat-tree routing algorithms fully exploit the path diversity of a fat-tree topology in the context of compute node traffic, but they lack support for deadlock free and fully connected switch-to-switch communication. Such support is crucial for efficient system management, for example in InfiniBand (IB) systems. With the general increase in system management capabilities found in modern InfiniBand switches, the lack of deadlock free switch-to-switch communication is a problem for fat-tree based IB installations because management traffic might cause routing deadlocks that bring the whole system down. This lack of deadlock free communication affects all system management and diagnostic tools using LID routing. In this paper, we propose the sFtree routing algorithm that guarantees deadlock free and fully connected switch-to-switch communication in fat-trees while maintaining the properties of the current fat-tree algorithm. We prove that the algorithm is deadlock free and we implement it in OpenSM for evaluation. We evaluate the performance of the sFtree algorithm experimentally on a small cluster and we do a large-scale evaluation through simulations. The results confirm that the sFtree routing algorithm is deadlock free and show that the impact of switch-to-switch management traffic on the end-node traffic is negligible.

DOI: 10.1145/2086696.208673
Citation Key: Simula.simula.864