Authors: M. Sourouri, J. Langguth, F. Spiga, S. Baden and X. Cai
Title: CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters
Affiliation: Scientific Computing
Project(s): Center for Biomedical Computing (SFF)
Publication Type: Proceedings, refereed
Year of Publication: 2015
Conference Name: IEEE 18th International Conference on Computational Science and Engineering
Date Published: 10/2015
Publisher: IEEE Computer Society
Keywords: CPU+GPU computing, CUDA, GPU, MPI, stencil

On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task parallelism is introduced on top of the conventional data parallelism. Combined with a suitable workload division between the CPUs and GPUs, our CPU+GPU programming approach is able to fully utilize the different processing units. The programming details and achievable performance are exemplified by a widely used 3D 7-point stencil computation, which shows high performance and scaling in experiments using up to 64 CPU-GPU nodes.
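
The idea of dividing a 3D 7-point stencil sweep between two kinds of processing units can be illustrated with a minimal, single-process sketch. It is not the paper's MPI+OpenMP+CUDA implementation: both parts run as ordinary Python loops, the split is made along the z axis, and the names `make_grid`, `stencil_sweep`, `split_planes`, the coefficients `c0` and `c1`, and the `gpu_fraction` parameter are all assumptions introduced here for illustration.

```python
def make_grid(nx, ny, nz, fill=0.0):
    """Allocate an nz x ny x nx grid as nested lists, indexed u[k][j][i]."""
    return [[[fill for _ in range(nx)] for _ in range(ny)] for _ in range(nz)]


def stencil_sweep(u, out, z_begin, z_end, c0=0.4, c1=0.1):
    """Apply a 7-point stencil to interior points with z in [z_begin, z_end).

    Each output value combines the centre point with its six axis-aligned
    neighbours. The coefficients c0 and c1 are placeholder values.
    Boundary planes are never written.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    for k in range(max(z_begin, 1), min(z_end, nz - 1)):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                out[k][j][i] = c0 * u[k][j][i] + c1 * (
                    u[k][j][i - 1] + u[k][j][i + 1]
                    + u[k][j - 1][i] + u[k][j + 1][i]
                    + u[k - 1][j][i] + u[k + 1][j][i]
                )


def split_planes(nz, gpu_fraction):
    """Return the z index where the (hypothetical) GPU part ends.

    Interior planes 1 .. cut-1 would go to the GPU and planes
    cut .. nz-2 to the CPU. gpu_fraction is an assumed tuning parameter.
    """
    interior = nz - 2
    gpu_planes = int(interior * gpu_fraction + 0.5)
    gpu_planes = max(0, min(interior, gpu_planes))
    return 1 + gpu_planes


if __name__ == "__main__":
    nx, ny, nz = 6, 5, 8
    u = make_grid(nx, ny, nz, fill=1.0)
    out = make_grid(nx, ny, nz)
    cut = split_planes(nz, gpu_fraction=0.75)
    stencil_sweep(u, out, 0, cut)    # stands in for the GPU share
    stencil_sweep(u, out, cut, nz)   # stands in for the CPU share
    print("split at z =", cut, "centre value =", out[nz // 2][ny // 2][nx // 2])
```

Because every output point reads only from the input grid `u`, the two sub-sweeps are independent and could in principle run concurrently. In a real CPU+GPU setting the planes adjacent to the cut would additionally have to be exchanged between host and device memory after each sweep, which is one of the communication overheads the abstract refers to.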

Citation Key: 23704
