AuthorsA. Mejía, J. Flich, J. Duato, S. Reinemo and T. Skeie
EditorsS. Ali
TitleSegment-Based Routing: an Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori
AfilliationCommunication Systems, Communication Systems
Publication TypeProceedings, refereed
Year of Publication2006
Conference Name20th IEEE International Parallel & Distributed Processing Symposium
Date PublishedApril
PublisherIEEE Computer Society
Place PublishedWashington, USA
ISBN Number1-4244-0054-6

Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Science keeps demanding more processing power for calculations and simulations, growth in E-commerce requires powerful servers to offer seamless online shopping, and massive multiplayer online games requires powerful and stable systems to keep their virtual worlds running 24 hours a day. Depending on the problem domain, this demand for more power can be satisfied by either, massively parallel computers, or clusters of computer. Common for both approaches is the dependence on high performance interconnect networks such as Myrinet, Infiniband, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault-tolerance is now becoming increasingly important. As the number of network components grows so does the probability for failure, thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault-tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as Segment-based Routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies. In this paper a way to compute segment-based routing tables is presented and applied to meshes and tori. Preliminary evaluation results show that the concept of segments leads to an increase in performance by a factor of 1.8 over FX and up*/down* routing.

Citation KeyMejia.2006.1