|Authors||F. O. Sem-Jacobsen and O. Lysne|
|Title||Fault Tolerance With Shortest Paths in Regular and Irregular Networks|
|Affiliation||, Communication Systems|
|Publication Type||Proceedings, refereed|
|Year of Publication||2008|
|Conference Name||22nd IEEE International Parallel & Distributed Processing Symposium|
Fault tolerance has become an important part of current supercomputers. Local dynamic fault tolerance is the most expedient way of tolerating faults: the network is preconfigured with multiple paths from every node/switch to every destination. In this paper we present a local shortest-path dynamic fault-tolerance mechanism, inspired by a solution developed for the Internet, that can be applied to any shortest-path routing algorithm, such as Dimension Ordered Routing, Fat Tree Routing, or Layered Shortest Path, and we provide a solution for achieving deadlock freedom in the presence of faults. Simulation results show that 1) for fat trees, this yields the highest throughput and the lowest virtual-layer requirements reported to date for dynamic one-fault tolerance, 2) in general, few layers are required to achieve deadlock freedom, and 3) for irregular topologies, it gives up to a tenfold performance increase compared to FRoots.
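
The general idea of preconfiguring several shortest-path next hops per destination, and switching locally when a link fails, can be illustrated with a minimal sketch. This is not the mechanism from the paper: the function names (`hop_distances`, `preconfigure`, `next_hop`), the adjacency-list representation, and the four-switch ring are all assumptions made for illustration, and the sketch neither handles deadlock freedom nor covers faults for which no alternative shortest path exists.

```python
from collections import deque


def hop_distances(adj, dest):
    """Breadth-first search: hop count from every reachable node to dest."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def preconfigure(adj):
    """Build table[node][dest]: every neighbour that is one hop closer to dest."""
    table = {node: {} for node in adj}
    for dest in adj:
        dist = hop_distances(adj, dest)
        for node in adj:
            if node != dest and node in dist:
                table[node][dest] = [
                    nb for nb in adj[node] if dist.get(nb) == dist[node] - 1
                ]
    return table


def next_hop(table, node, dest, failed_links=frozenset()):
    """Pick the first preconfigured next hop whose link has not failed.

    Returns None when every shortest-path next hop is unavailable;
    this sketch does not fall back to longer paths.
    """
    for nb in table[node].get(dest, []):
        if frozenset((node, nb)) not in failed_links:
            return nb
    return None


# Hypothetical four-switch ring: a - b - c - d - a
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
table = preconfigure(ring)

print(next_hop(table, "a", "c"))                              # b
print(next_hop(table, "a", "c", {frozenset(("a", "b"))}))     # d
```

Because the decision uses only the tables already stored at the switch adjacent to the fault, no global reconfiguration is needed in this toy model; the switch simply skips next hops whose link is marked as failed.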