|Authors||L. Burchard, J. Moe, D. T. Schroeder, K. Pogorelov and J. Langguth|
|Editors||B. L. Chamberlain, A. Varbanescu, H. Ltaief and P. Luszczek|
|Title||iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs|
|Project(s)||Department of High Performance Computing|
|Publication Type||Proceedings, refereed|
|Year of Publication||2021|
|Conference Name||High Performance Computing. ISC High Performance 2021|
|Volume||LNCS, volume 12728|
|Publisher||Springer International Publishing|
|Keywords||BFS, Graph500, IPU, Performance optimization|
The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on traditional caching hierarchies. Developed to meet the needs of increasingly data-centric applications such as machine learning, the IPU pairs each of its numerous cores with a dedicated portion of SRAM, yielding high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising platform for experimenting with graph algorithms, since it is their unpredictable, irregular memory accesses that cause performance losses on traditional processors with pre-caching.
This paper aims to test the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark: it is a subroutine of a variety of more complex graph algorithms, and it allows comparison across a wide range of architectures.
We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances.