Benchmarking Collective Communication on GPU-based Supercomputers
Communication among GPUs is often the most performance-critical part of scientific computing and AI workloads on today's supercomputers. In this thesis, the student will tune, analyze, and compare collective communication algorithms from state-of-the-art GPU communication libraries.
Goal
- Survey the literature on collective communication algorithms for GPUs, and review implementations provided by libraries such as HPC-X (MPI), NCCL and NVSHMEM.
- Measure, analyze, and tune the performance of collective operations on GPU systems (see the benchmark sketch after this list).
- Use these findings to accelerate real-world applications and workloads in scientific computing and machine learning/AI.
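To make the measurement task concrete, a micro-benchmark for one collective might look like the following minimal sketch: a single-process NCCL all-reduce timed over all visible GPUs, loosely in the spirit of NVIDIA's nccl-tests suite. The message size, iteration count, and bus-bandwidth formula (the nccl-tests convention for all-reduce) are illustrative assumptions, and error checking is omitted for brevity.

```cuda
// Minimal sketch: time an in-place NCCL all-reduce across all visible GPUs
// from a single process. Illustrative only; error checking omitted.
// Build (paths and flags may vary per system):
//   nvcc allreduce_bench.cu -lnccl -o allreduce_bench
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
  int ndev = 0;
  cudaGetDeviceCount(&ndev);

  const size_t count = 1 << 24;                 // 16 Mi floats = 64 MiB per GPU
  const size_t bytes = count * sizeof(float);

  std::vector<ncclComm_t> comms(ndev);
  std::vector<float*> bufs(ndev);
  std::vector<cudaStream_t> streams(ndev);

  // One NCCL rank per visible GPU, all within this process.
  ncclCommInitAll(comms.data(), ndev, nullptr);
  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaMalloc(&bufs[i], bytes);
    cudaStreamCreate(&streams[i]);
  }

  // When one thread drives several communicators, the per-rank calls must be
  // fused into one logical collective with ncclGroupStart/ncclGroupEnd.
  auto allreduce_once = [&]() {
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
      ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                    comms[i], streams[i]);
    ncclGroupEnd();
  };
  auto sync_all = [&]() {
    for (int i = 0; i < ndev; ++i) {
      cudaSetDevice(i);
      cudaStreamSynchronize(streams[i]);
    }
  };

  allreduce_once();                             // warm-up iteration
  sync_all();

  const int iters = 20;
  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it)
    allreduce_once();
  sync_all();
  auto t1 = std::chrono::steady_clock::now();

  double sec = std::chrono::duration<double>(t1 - t0).count() / iters;
  double alg_bw = bytes / sec / 1e9;            // GB/s per rank
  // Bus bandwidth as reported by nccl-tests: algBw * 2*(n-1)/n for all-reduce.
  double bus_bw = alg_bw * 2.0 * (ndev - 1) / ndev;
  printf("%d GPUs, %zu MiB: %.3f ms/iter, algBw %.2f GB/s, busBw %.2f GB/s\n",
         ndev, bytes >> 20, sec * 1e3, alg_bw, bus_bw);

  for (int i = 0; i < ndev; ++i) {
    ncclCommDestroy(comms[i]);
    cudaSetDevice(i);
    cudaFree(bufs[i]);
    cudaStreamDestroy(streams[i]);
  }
  return 0;
}
```

In practice, such measurements would be swept over message sizes, GPU counts, and library back-ends (e.g., MPI and NVSHMEM equivalents of the same collective), and compared against the bandwidth expected from the underlying ring or tree algorithm.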
Learning outcome
- Enhanced GPU programming skills with CUDA.
- Understanding of collective communication algorithms and their use in machine learning and scientific computing applications.
- Experience with high-performance GPU computing on supercomputers.
Qualifications
- Experience with C/C++ programming.
- Familiarity with GPU programming is very helpful.
Supervisors
- Johannes Langguth
- Xing Cai
- James D Trotter