Authors: J. Markussen, L. B. Kristiansen, P. Halvorsen, H. Kielland-Gyrud, H. K. Stensland and C. Griwodz
Title: SmartIO: Zero-overhead Device Sharing through PCIe Networking
Affiliation: Communication Systems, Machine Learning
Project(s): Unified PCIe IO: Unified PCI Express for Distributed Component Virtualization, Department of Holistic Systems, Department of High Performance Computing
Status: Published
Publication Type: Journal Article
Year of Publication: 2021
Journal: ACM Transactions on Computer Systems
Volume: 38
Issue: 1-2
Number: 2
Pagination: 1–78
Date Published: 07/2021
Publisher: Association for Computing Machinery
Place Published: New York, NY, United States
ISSN: 0734-2071
Abstract

The large variety of compute-heavy and data-driven applications accelerates the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMes, GPUs, network adapters, or any other standard PCIe device may be borrowed and accessed directly, as if they were local to the remote machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. Software is entirely removed from the data path, and simultaneous sharing of a device among application processes running on remote hosts is enabled. Our experimental results show that I/O devices can be shared with remote hosts, achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.

URL: https://dl.acm.org/doi/10.1145/3462545
DOI: 10.1145/3462545
Citation Key: 27908
