Fast Multi-GPU Communication over PCIe
The NVIDIA Collective Communications Library (NCCL) is used to communicate between multiple GPUs, both within a single machine and across multiple machines, during distributed deep learning training. When multiple machines are involved, NCCL communicates over TCP/IP.
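NCCL's transport selection can be observed and steered through documented environment variables, which is useful when benchmarking the existing TCP/IP path. The variable names below are from NCCL's environment-variable documentation; the specific values and the interface name `eth0` are illustrative:

```shell
# Print NCCL's initialization and network-transport decisions at startup,
# so you can confirm which transport (e.g. the socket transport) is in use.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET

# Pin the socket transport to a specific network interface
# (replace eth0 with the interface carrying the PCIe/TCP traffic).
export NCCL_SOCKET_IFNAME=eth0

# Disable the InfiniBand transport to force NCCL onto TCP/IP sockets,
# giving a clean baseline for the socket path.
export NCCL_IB_DISABLE=1
```

With these settings, the `NCCL INFO NET/Socket` lines in the training job's log confirm that the socket transport was selected for the baseline runs.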
The tasks for the master project will be to:
- Benchmark and analyze the existing NCCL implementation with TCP/IP
- Run TCP/IP over PCIe to establish a performance baseline
- Write an optimized PCIe transport for NCCL
- Contribute the code back to the open-source NCCL project
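The first two tasks amount to measuring point-to-point throughput over a socket link. As a minimal sketch of that kind of measurement (not NCCL code — the function name and defaults are illustrative, and it uses the loopback interface so it runs anywhere), one can stream a fixed amount of data over a TCP connection and time it:

```python
import socket
import threading
import time

def measure_tcp_throughput(total_bytes: int = 64 * 1024 * 1024,
                           chunk: int = 1 << 20) -> float:
    """Send total_bytes over a loopback TCP connection and
    return the achieved throughput in bytes per second."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = 0

    def sink():
        # Receiver side: drain the connection until EOF.
        nonlocal received
        conn, _ = server.accept()
        while True:
            data = conn.recv(chunk)
            if not data:
                break
            received += len(data)
        conn.close()

    t = threading.Thread(target=sink)
    t.start()

    client = socket.create_connection(("127.0.0.1", port))
    payload = bytes(chunk)               # zero-filled test buffer
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        client.sendall(payload)
        sent += chunk
    client.close()                       # signals EOF to the sink
    t.join()
    server.close()

    elapsed = time.perf_counter() - start
    assert received == sent              # every byte arrived
    return sent / elapsed
```

For the actual project, the same send/receive/timing pattern would be applied between two machines on the PCIe network, and with varying message sizes, since collective operations are sensitive to both latency (small messages) and bandwidth (large messages).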
Goal: Implement a PCIe transport in NCCL and benchmark the implementation with deep learning training workloads.
Learning outcome: The student will gain in-depth knowledge of how to distribute workloads over multiple machines connected in a PCIe network, along with detailed insight into working with, modifying, and contributing code to an existing open-source library.
Prerequisites: Good understanding of C and/or C++ programming. INF3151 or an equivalent course is recommended.
Supervisors:
- Håkon Kvale Stensland
- Pål Halvorsen
- Jonas Markussen, Dolphin Interconnect Solutions
- Hugo Kohmann, Dolphin Interconnect Solutions
Collaboration partner: Dolphin Interconnect Solutions