Authors | L. Burchard, J. Moe, D. T. Schroeder, K. Pogorelov and J. Langguth |
Editors | B. L. Chamberlain, A. Varbanescu, H. Ltaief and P. Luszczek |
Title | iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs |
Affiliation | Scientific Computing |
Project(s) | Department of High Performance Computing |
Status | Published |
Publication Type | Proceedings, refereed |
Year of Publication | 2021 |
Conference Name | High Performance Computing. ISC High Performance 2021 |
Volume | LNCS, volume 12728 |
Pagination | 291-309 |
Publisher | Springer International Publishing |
Place Published | Cham |
ISBN Number | 978-3-030-78712-7 |
ISSN Number | 0302-9743 |
Keywords | BFS, Graph500, IPU, Performance optimization |
Abstract | The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on the traditional caching hierarchies. Developed to meet the need for more and more data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching. This paper aims to test the IPU’s suitability for algorithms with hard-to-predict memory accesses by implementing a breadth-first search (BFS) that complies with the Graph500 specifications. Precisely because of its apparent simplicity, BFS is an established benchmark that is not only a subroutine for a variety of more complex graph algorithms, but also allows comparability across a wide range of architectures. We benchmark our IPU code on a wide range of instances and compare its performance to state-of-the-art CPU and GPU codes. The results indicate that the IPU delivers speedups of up to 4× over the fastest competing result on an NVIDIA V100 GPU, with typical speedups of about 1.5× on most test instances. |
URL | https://link.springer.com/10.1007/978-3-030-78713-4 |
DOI | 10.1007/978-3-030-78713-4 |
Citation Key | 28037 |