|Authors||N. A. Nordbotten and T. Skeie|
|Editors||S. Aluru, M. Parashar, R. Badrinath and V. K. Prasanna|
|Title||A Routing Methodology for Dynamic Fault Tolerance in Meshes and Tori|
|Affiliation||, Communication Systems|
|Publication Type||Proceedings, refereed|
|Year of Publication||2007|
|Conference Name||International Conference on High Performance Computing (HiPC)|
This paper proposes a fully distributed fault-tolerant routing methodology for tori and meshes. It supports a dynamic fault model, so the network remains fully operational at all times. Unlike most previous proposals that support a dynamic fault model, the methodology tolerates concave fault regions, which avoids disabling healthy nodes in most practical scenarios. Adaptive routing gives the methodology high network performance, and performance degrades gracefully in the presence of faults.
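To illustrate why tolerating concave fault regions matters, the sketch below counts the healthy nodes that would have to be disabled if a fault region had to be padded out to a rectangle, which is a common way of forcing a region to be convex. It is not the paper's algorithm. It assumes a 2D mesh without wraparound links, nodes identified by hypothetical `(x, y)` coordinates, and a set of faulty nodes that forms a single connected region; the function name is also illustrative.

```python
from typing import Set, Tuple

Node = Tuple[int, int]  # hypothetical (x, y) coordinate of a node in a 2D mesh


def healthy_nodes_disabled_by_rectangular_model(faulty: Set[Node]) -> Set[Node]:
    """Return the healthy nodes inside the bounding rectangle of `faulty`.

    Assumption: `faulty` is one connected fault region in a mesh (no wraparound).
    A scheme that only accepts rectangular fault regions would have to treat
    these healthy nodes as faulty as well.
    """
    if not faulty:
        return set()
    xs = [x for x, _ in faulty]
    ys = [y for _, y in faulty]
    bounding_box = {
        (x, y)
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
    }
    # Everything in the box that is not actually faulty is a sacrificed node.
    return bounding_box - faulty


# An L-shaped (concave) fault region made of five faulty nodes.
l_shaped = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)}
sacrificed = healthy_nodes_disabled_by_rectangular_model(l_shaped)
print(sorted(sacrificed))  # [(2, 2), (2, 3), (3, 2), (3, 3)]
```

In this example the rectangular model disables four healthy nodes to cover five faulty ones, whereas a methodology that routes around the L-shaped region directly would leave all four in service.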