Authors: L. Burchard, J. Moe, D. T. Schroeder, K. Pogorelov and J. Langguth
Editors: B. L. Chamberlain, A. Varbanescu, H. Ltaief and P. Luszczek
Title: iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs
Affiliation: Scientific Computing
Project(s): Department of High Performance Computing
Publication Type: Proceedings, refereed
Year of Publication: 2021
Conference Name: High Performance Computing. ISC High Performance 2021
Volume: LNCS, volume 12728
Publisher: Springer International Publishing
Place Published: Cham
ISBN Number: 978-3-030-78712-7
ISSN Number: 0302-9743
Keywords: BFS, Graph500, IPU, Performance optimization

The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on traditional caching hierarchies. Developed to meet the growing need for data-centric applications such as machine learning, the IPU pairs each of its numerous cores with a dedicated portion of SRAM, resulting in high memory bandwidth at the price of capacity. This proximity of processor cores and memory makes the IPU a promising platform for experimenting with graph algorithms, since it is their unpredictable, irregular memory accesses that lead to performance losses on traditional processors with pre-caching.

This paper aims to test the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark: it not only serves as a subroutine for a variety of more complex graph algorithms, but also allows comparison across a wide range of architectures.
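For readers unfamiliar with the kernel, the following is a minimal sequential sketch of a level-synchronous BFS that returns a parent array, the form of output that the Graph500 BFS kernel is validated against. It is not the paper's IPU implementation; the function name `bfs_parents`, the edge-list input, and the use of -1 for unreached vertices are assumptions made for illustration only.

```python
from collections import defaultdict


def bfs_parents(num_vertices, edges, root):
    """Level-synchronous BFS over an undirected edge list.

    Returns a list where entry v is the BFS parent of v, the root is
    its own parent, and unreached vertices are marked -1 (a convention
    assumed for this sketch).
    """
    # Build an adjacency list; self-loops carry no information for BFS.
    adj = defaultdict(list)
    for u, v in edges:
        if u != v:
            adj[u].append(v)
            adj[v].append(u)

    parent = [-1] * num_vertices
    parent[root] = root
    frontier = [root]

    # Expand one BFS level per iteration of the outer loop.
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # The lookup parent[v] is the irregular, data-dependent
                # memory access that is hard for caches to predict.
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


# Hypothetical toy graph: vertices 4 and 5 are disconnected from the root.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]
parents = bfs_parents(6, edges, root=0)
```

The neighbor lookups inside the inner loop follow the graph's structure rather than any regular stride, which illustrates why memory placed close to each core is attractive for this workload.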

We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances.

Citation Key: 28037
