The Case for Data Centre Hyperloops

Published in the International Symposium on Computer Architecture (ISCA), 2024

Data movement is a hot-button topic today, with workloads like machine learning (ML) training, graph processing, and data analytics consuming datasets as large as 30 PB. Such a dataset would take almost a week to transfer at 400 Gbps while consuming megajoules of energy just to operate the two endpoints' optical transceivers. All of this time and energy is seen as an unavoidable overhead on top of directly accessing the disks that store the data. In this paper, we re-evaluate the fundamental assumption of networked data copying and instead propose the adoption of embodied data movement. Our insight is that solid-state drives (SSDs) have been rapidly growing in an under-exploited dimension: their data density, both in TB per unit volume and in TB per unit mass. With data centres reaching kilometres in length, we propose a new architecture featuring data centre hyperloops² (DHLs), in which large datasets, stored on commodity SSDs, are moved via magnetic levitation in low-pressure tubes. By eliminating much of the friction inherent in embodied data movement, DHLs offer efficient data movement, with SSDs potentially travelling at hundreds of metres per second. Consequently, a contemporary dataset can be moved through a DHL in seconds and then accessed with local latency and bandwidth well into the terabytes per second. DHLs have the potential to massively reduce the network bandwidth and energy consumption associated with moving large datasets, but they raise a variety of questions regarding the viability of their realisation and deployment. Through flexibility and creative engineering, we argue that many potential issues can be resolved. Further, we present models of DHLs and their application to workloads with growing data-movement demands, such as training machine learning models, large-scale physics experiments, and data centre backups. For a fixed data movement task, we obtain energy reductions of 1.6× to 376.1× and time speedups of 114.8× to 646.4× versus 400 Gbps optical networking. When modelling DHLs in simulation, we obtain time speedups of between 5.7× and 118× (iso-power) and communication power reductions of between 6.4× and 135× (iso-time) for training one iteration of a representative DLRM workload. We also provide a cost analysis showing that DHLs are financially practical. Given the scale of the improvements realisable through DHLs, we consider this paper a call to action for our community to grapple with the remaining architectural challenges.

² Hyperloop™ is a term for high-speed transportation using magnetic levitation trains and low-pressure tubes; it does not imply a loop topology.
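To make the abstract's headline numbers concrete, the following back-of-envelope Python sketch compares serialising a 30 PB dataset over a 400 Gbps link with physically transporting the SSDs through a DHL tube. The tube length, pod speed, and acceleration are illustrative assumptions of ours, not figures from the paper's detailed models; they are chosen only to show why network transfer takes days while pod transport takes seconds.

```python
# Back-of-envelope comparison: moving a 30 PB dataset over a 400 Gbps link
# versus carrying the SSDs through a hypothetical data centre hyperloop (DHL).
# Tube length, pod speed, and acceleration are illustrative assumptions.

DATASET_BYTES = 30e15        # 30 PB dataset
LINK_BPS = 400e9             # 400 Gbps optical link
TUBE_LENGTH_M = 2_000        # assumed 2 km end-to-end tube
POD_SPEED_MPS = 200          # assumed cruise speed (hundreds of m/s)
POD_ACCEL_MPS2 = 50          # assumed acceleration/deceleration (~5 g; cargo, not people)

# Network transfer: every bit must be serialised over the link.
network_seconds = DATASET_BYTES * 8 / LINK_BPS

# DHL transfer: accelerate, cruise, decelerate. Travel time is independent of
# data volume as long as the SSDs fit in one pod.
accel_time = POD_SPEED_MPS / POD_ACCEL_MPS2
accel_dist = 0.5 * POD_ACCEL_MPS2 * accel_time ** 2
cruise_dist = TUBE_LENGTH_M - 2 * accel_dist
dhl_seconds = 2 * accel_time + cruise_dist / POD_SPEED_MPS

print(f"400 Gbps network : {network_seconds / 86400:.1f} days")
print(f"DHL pod transport: {dhl_seconds:.1f} seconds")
```

Under these assumed parameters the network transfer takes roughly 6.9 days while the pod traverses the tube in about 14 seconds, consistent with the abstract's claim that a contemporary dataset can be moved through a DHL in seconds rather than days.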