Fast Multi-GPU communication over PCIe

NCCL (pronounced "Nickel") is a stand-alone library of standard collective communication routines for GPUs. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as over networks using TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.
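
The core of NCCL is a small C API for collectives such as all-reduce, broadcast, reduce, all-gather, and reduce-scatter. As a minimal, hedged illustration (buffer initialization and error checking omitted for brevity), the sketch below lets a single process drive every visible GPU and run an in-place all-reduce across them:

    /* Single-process, multi-GPU all-reduce sketch (assumes CUDA and NCCL
       are installed; compile with nvcc and link against -lnccl). */
    #include <stdlib.h>
    #include <cuda_runtime.h>
    #include <nccl.h>

    int main(void) {
      int ndev = 0;
      cudaGetDeviceCount(&ndev);

      ncclComm_t*   comms   = (ncclComm_t*)malloc(ndev * sizeof(ncclComm_t));
      cudaStream_t* streams = (cudaStream_t*)malloc(ndev * sizeof(cudaStream_t));
      float**       buf     = (float**)malloc(ndev * sizeof(float*));
      const size_t count = 1 << 20;                 /* 1M floats per GPU */

      /* One communicator per GPU, all owned by this single process. */
      ncclCommInitAll(comms, ndev, NULL);

      for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaMalloc((void**)&buf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
      }

      /* Group the per-GPU calls so NCCL can launch them without deadlocking. */
      ncclGroupStart();
      for (int i = 0; i < ndev; i++)
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
      ncclGroupEnd();

      for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        ncclCommDestroy(comms[i]);
      }
      return 0;
    }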

NCCL is used to communicate between multiple GPUs, and between multiple machines with GPUs, when running distributed deep learning training. When multiple machines are involved, NCCL communicates over TCP/IP.
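
In the multi-machine case, each process typically owns one GPU and joins a shared NCCL communicator. A hedged sketch of that pattern (assuming MPI is used to launch the processes and to distribute NCCL's unique id, and one GPU per process) looks like this:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <nccl.h>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank = 0, nranks = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nranks);

      /* Rank 0 creates the NCCL unique id; MPI broadcasts it so every
         process can join the same communicator. */
      ncclUniqueId id;
      if (rank == 0) ncclGetUniqueId(&id);
      MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

      cudaSetDevice(0);  /* assumes one GPU per process; map local rank to device otherwise */
      ncclComm_t comm;
      ncclCommInitRank(&comm, nranks, id, rank);

      const size_t count = 1 << 20;
      float* buf = NULL;
      cudaMalloc((void**)&buf, count * sizeof(float));

      cudaStream_t stream;
      cudaStreamCreate(&stream);

      /* Between machines this all-reduce goes over NCCL's socket (TCP/IP)
         transport unless a faster network transport is available. */
      ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);
      cudaStreamSynchronize(stream);

      cudaFree(buf);
      ncclCommDestroy(comm);
      MPI_Finalize();
      return 0;
    }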

The tasks for the master project will be to:

  • Benchmark and analyze the existing NCCL implementation with TCP/IP.
  • Use TCP/IP over PCIe to establish a performance baseline (see the timing sketch after this list).
  • Write an optimized PCIe transport for NCCL.
  • Contribute the code back to the open-source NCCL project.
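
As a starting point for the benchmarking tasks above, latency and bandwidth can be measured directly around the collective calls. The hedged sketch below times ncclAllReduce with CUDA events and reports bandwidth; it assumes a communicator, stream, and device buffer set up as in the earlier sketches, and uses the same all-reduce bus-bandwidth factor as NVIDIA's nccl-tests suite:

    #include <stdio.h>
    #include <cuda_runtime.h>
    #include <nccl.h>

    /* Time `iters` in-place all-reduces on `count` floats and print bandwidth. */
    static void bench_allreduce(float* buf, size_t count, int nranks,
                                ncclComm_t comm, cudaStream_t stream) {
      const int iters = 20;
      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);

      /* Warm-up so connection setup is not part of the measurement. */
      ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);
      cudaStreamSynchronize(stream);

      cudaEventRecord(start, stream);
      for (int i = 0; i < iters; i++)
        ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);
      cudaEventRecord(stop, stream);
      cudaEventSynchronize(stop);

      float ms = 0.0f;
      cudaEventElapsedTime(&ms, start, stop);
      double sec   = (ms / 1e3) / iters;
      double bytes = (double)count * sizeof(float);
      double algbw = bytes / sec / 1e9;                    /* GB/s */
      double busbw = algbw * 2.0 * (nranks - 1) / nranks;  /* all-reduce factor used by nccl-tests */
      printf("%.0f bytes: %.3f GB/s (algbw), %.3f GB/s (busbw)\n",
             bytes, algbw, busbw);

      cudaEventDestroy(start);
      cudaEventDestroy(stop);
    }

Running with NCCL_DEBUG=INFO makes NCCL log which transport it selected, which helps confirm that the TCP/IP socket path is actually the one being measured for the baseline.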

Goal

Implement a PCIe transport in the NVIDIA Collective Communications Library (NCCL) and benchmark the implementation using deep learning training workloads.

Learning outcome

In-depth knowledge of how to distribute workloads over multiple machines connected in a PCIe network. The student will also gain detailed insight into working with, modifying, and contributing code to an existing open-source library.

Qualifications

Good understanding of C and/or C++ programming. INF3151 or equivalent is recommended.

Supervisors

  • Håkon Kvale Stensland
  • Pål Halvorsen
  • Jonas Markussen, Dolphin Interconnect Solutions
  • Hugo Kohmann, Dolphin Interconnect Solutions

Collaboration partners

Dolphin Interconnect Solutions

References

https://github.com/NVIDIA/nccl

Contact person