Benchmarking Collective Communication on GPU-based Supercomputers

Communication among GPUs is often the most performance-critical part of scientific computing and AI workloads on today's supercomputers. This thesis will tune, analyze and compare collective communication algorithms from state-of-the-art GPU communication libraries.

Goal

  • Survey the literature on collective communication algorithms for GPUs, and review implementations provided by libraries such as HPC-X (MPI), NCCL and NVSHMEM.
  • Measure, analyze and tune the performance of collective operations on GPU systems (a minimal measurement sketch follows this list).
  • Use these findings to accelerate real-world applications and workloads in scientific computing and machine learning/AI.
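
To illustrate the kind of measurement the second goal involves, the sketch below times an in-place all-reduce across every GPU visible to a single process, using NCCL's single-process API (ncclCommInitAll). The buffer size and iteration count are illustrative choices and error checking is omitted; a real benchmark harness would add warm-up rounds, sweep message sizes, and report bandwidth rather than raw time.

    /* Minimal sketch: time an in-place ncclAllReduce across all GPUs
     * visible to one process. Buffer size and iteration count are
     * illustrative; error checking is omitted for brevity.
     * Compile and link against the CUDA runtime and NCCL. */
    #include <stdio.h>
    #include <time.h>
    #include <cuda_runtime.h>
    #include <nccl.h>

    #define MAX_GPUS 16
    #define ITERS    20

    int main(void) {
      int nGpus = 0;
      cudaGetDeviceCount(&nGpus);
      if (nGpus > MAX_GPUS) nGpus = MAX_GPUS;

      ncclComm_t comms[MAX_GPUS];
      cudaStream_t streams[MAX_GPUS];
      float *buf[MAX_GPUS];
      size_t count = (size_t)1 << 24;   /* 16 Mi floats, ~64 MiB per GPU */

      /* One communicator per device, all owned by this process. */
      ncclCommInitAll(comms, nGpus, NULL);
      for (int i = 0; i < nGpus; i++) {
        cudaSetDevice(i);
        cudaMalloc((void **)&buf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
      }

      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (int it = 0; it < ITERS; it++) {
        /* Group the per-GPU calls so NCCL launches them as one collective. */
        ncclGroupStart();
        for (int i = 0; i < nGpus; i++)
          ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                        comms[i], streams[i]);
        ncclGroupEnd();
      }
      for (int i = 0; i < nGpus; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
      }
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
      printf("ncclAllReduce, %zu floats on %d GPUs: %.3f ms/iteration\n",
             count, nGpus, sec * 1e3 / ITERS);

      for (int i = 0; i < nGpus; i++) {
        ncclCommDestroy(comms[i]);
        cudaFree(buf[i]);
        cudaStreamDestroy(streams[i]);
      }
      return 0;
    }

The same measurement could then be repeated with the equivalent collective from HPC-X (MPI_Allreduce) or NVSHMEM, which is exactly the kind of cross-library comparison the project targets.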

Learning outcome

  • Enhanced GPU programming skills with CUDA.
  • Understanding of collective communication algorithms and their use in machine learning and scientific computing applications.
  • Experience with high-performance GPU computing on supercomputers.

Qualifications

  • Experience with C/C++ programming
  • Familiarity with GPU programming is very helpful

Supervisors

  • Johannes Langguth
  • Xing Cai
  • James D Trotter
