Authors: F. O. Sem-Jacobsen and O. Lysne
Editors: Y. Robert
Title: Fault Tolerance With Shortest Paths in Regular and Irregular Networks
Affiliation: Communication Systems
Publication Type: Proceedings, refereed
Year of Publication: 2008
Conference Name: 22nd IEEE International Parallel & Distributed Processing Symposium
Date Published: April
ISBN Number: 978-1-4244-1693-6

Fault tolerance has become an important part of current supercomputers. Local dynamic fault tolerance is the most expedient way of tolerating faults: the network is preconfigured with multiple paths from every node/switch to every destination. In this paper we present a local shortest-path dynamic fault-tolerance mechanism, inspired by a solution developed for the Internet, that can be applied to any shortest-path routing algorithm, such as Dimension Ordered Routing, Fat Tree Routing, or Layered Shortest Path. We also provide a solution for achieving deadlock freedom in the presence of faults. Simulation results show that 1) for fat trees this yields the highest throughput and the lowest virtual-layer requirements reported to date for dynamic one-fault tolerance, 2) in general only a few layers are required to achieve deadlock freedom, and 3) for irregular topologies it gives up to a tenfold performance increase compared to FRoots.

Citation Key: Simula.ND.61