Authors: N. A. Nordbotten
Title: Fault-Tolerant Routing in Interconnection Networks
Status: Published
Publication Type: PhD Thesis
Year of Publication: 2008
Publisher: University of Oslo
Thesis Type: phd
ISBN Number: 1501-7710
Abstract

Interconnection networks connect the various components of a system, such as the nodes of a parallel computer. If the interconnection network fails, the remainder of the system is left disconnected. Thus, the reliability of the interconnection network is vital for the overall reliability of the system. However, as the network size increases, so does the probability that some component will fail. It is therefore essential to keep the interconnection network operational even in the presence of faulty components. This thesis addresses the issue through new methods for fault-tolerant routing. There are two main contributions.

The first is a fault-tolerant routing methodology assuming a static fault model. The main fault-tolerance mechanism of the methodology is routing via intermediate nodes. In addition, several extensions are provided, enabling the methodology to be adapted to various fault-tolerance requirements. The methodology requires no change to the way packets are routed in the fault-free case, can be easily implemented, does not require the use of routing tables, and is well suited for use in high-performance systems.

The second main contribution is a fault-tolerant routing method supporting a dynamic fault model. With this method, network traffic never has to be stopped, so faults in the interconnection network can be made transparent to the applications. The method is therefore applicable to systems that are required to remain operational at all times.

Both methods apply to mesh and torus topologies, which are among the most commonly used interconnection network topologies. Furthermore, they achieve high network performance through the use of adaptive routing, and they degrade gracefully in the presence of faults.