CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching
Published in ML4Astro Workshop, Co-located with ICML 2025, 2025
Published in Nature Communications, 2024
Domain-specific hardware to solve computationally hard optimization problems has generated tremendous excitement. Here, we evaluate probabilistic-bit (p-bit) based Ising Machines (IMs) on 3-Regular 3-XOR Satisfiability (3R3X), a representative hard optimization problem. We first introduce a multiplexed architecture that emulates all-to-all network functionality while maintaining highly parallelized chromatic Gibbs sampling. We implement this architecture on a single Field-Programmable Gate Array (FPGA) and show that running the adaptive parallel tempering algorithm yields competitive algorithmic scaling and prefactor advantages over alternative IMs from D-Wave, Toshiba, and Fujitsu. We also implement higher-order interactions that lead to better prefactors without changing the algorithmic scaling for the XORSAT problem. Even though FPGA implementations of p-bits are not yet as fast as the best greedy algorithms accelerated on Graphics Processing Units (GPUs), scaled magnetic versions of p-bit IMs could lead to orders-of-magnitude improvements over the state of the art for generic optimization.
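The chromatic Gibbs sampling mentioned in the abstract rests on a simple observation: spins with no coupling between them are conditionally independent, so an entire color class of a graph coloring can be updated in one parallel step. Below is a minimal Python sketch of that idea on a toy Ising instance; the ring graph, its 2-coloring, and all parameter values are illustrative assumptions, not the paper's FPGA design.

```python
import numpy as np

def chromatic_gibbs_sweep(s, J, h, colors, beta, rng):
    """One sweep of chromatic Gibbs sampling on an Ising model.

    Spins in the same color class share no couplings, so the whole
    class is updated in parallel (one clock phase on an FPGA; a
    vectorized update here).
    s      : (N,) array of +/-1 spins
    J      : (N, N) symmetric couplings, zero diagonal
    h      : (N,) local fields
    colors : list of index arrays, one per color class
    beta   : inverse temperature
    """
    for idx in colors:
        # Effective field seen by every spin in this color class.
        field = J[idx] @ s + h[idx]
        # p-bit update rule: P(s_i = +1) = sigmoid(2 * beta * field_i).
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[idx] = np.where(rng.random(idx.size) < p_up, 1, -1)
    return s

# Toy example: a 4-spin ring is 2-colorable (even/odd sites).
rng = np.random.default_rng(0)
N = 4
J = np.zeros((N, N))
for i in range(N):
    J[i, (i + 1) % N] = J[(i + 1) % N, i] = 1.0
h = np.zeros(N)
colors = [np.array([0, 2]), np.array([1, 3])]

s = rng.choice([-1, 1], size=N)
for _ in range(100):
    s = chromatic_gibbs_sweep(s, J, h, colors, beta=1.0, rng=rng)
print(s)
```

In a parallel tempering setup, several such chains run at different beta values and periodically swap configurations; the "adaptive" variant referenced above tunes the temperature ladder on the fly.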
Published in International Symposium on Computer Architecture (ISCA), 2024
Data movement is a hot-button topic today, with workloads like machine learning (ML) training, graph processing, and data analytics consuming datasets as large as 30 PB. Such a dataset would take almost a week to transfer at 400 Gbps while consuming megajoules of energy just to operate the two endpoints' optical transceivers. All of this time and energy is seen as an unavoidable overhead on top of directly accessing the disks that store the data. In this paper, we re-evaluate the fundamental assumption of networked data copying and instead propose the adoption of embodied data movement. Our insight is that solid state disks (SSDs) have been rapidly growing in an under-exploited way: their data density, both in TB per unit volume and per unit mass. With data centres reaching kilometres in length, we propose a new architecture featuring data centre hyperloops² (DHLs), where large datasets, stored on commodity SSDs, are moved via magnetic levitation in low-pressure tubes. By eliminating much of the friction inherent to embodied data movement, DHLs offer more efficient data movement, with SSDs potentially travelling at hundreds of metres per second. Consequently, a contemporary dataset can be moved through a DHL in seconds and then accessed with local latency and bandwidth well into the terabytes per second. DHLs have the potential to massively reduce the network bandwidth and energy consumption associated with moving large datasets, but they raise a variety of questions regarding the viability of their realisation and deployment. Through flexibility and creative engineering, we argue that many potential issues can be resolved. Further, we present models of DHLs and their application to workloads with growing data-movement demands, such as training machine learning algorithms, large-scale physics experiments, and data centre backups. For a fixed data-movement task, we obtain energy reductions of 1.6× to 376.1× and time speedups from 114.8× to 646.4× versus 400 Gbps optical networking. When modelling DHLs in simulation, we obtain time speedups of between 5.7× and 118× (iso-power) and communication power reductions of between 6.4× and 135× (iso-time) to train an iteration of a representative DLRM workload. We provide a cost analysis showing that DHLs are financially practical. Given the scale of the improvements realisable through DHLs, we consider this paper a call to action for our community to grapple with the remaining architectural challenges.

² Hyperloop™ is a term for high-speed transportation using magnetic levitation trains and low-pressure tubes; it does not imply a loop topology.
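To make the headline numbers concrete, here is a back-of-the-envelope check of the abstract's opening claims. The transceiver power, pod speed, and tube length below are illustrative assumptions chosen to match the abstract's qualitative statements, not the paper's modelling inputs.

```python
# Back-of-the-envelope check of the abstract's claims.
# All parameter values are illustrative assumptions.

DATASET_BYTES = 30e15      # 30 PB dataset
LINK_BPS = 400e9           # 400 Gbps optical link
TRANSCEIVER_W = 2 * 15.0   # assumed ~15 W per endpoint transceiver
POD_SPEED_MPS = 300.0      # "hundreds of metres per second"
TUBE_LENGTH_M = 2000.0     # "data centres reaching kilometres in length"

# Networked copy: transfer time and endpoint transceiver energy.
net_time_s = DATASET_BYTES * 8 / LINK_BPS
net_energy_j = net_time_s * TRANSCEIVER_W

# Embodied movement: one pod traversal of the tube.
dhl_time_s = TUBE_LENGTH_M / POD_SPEED_MPS

print(f"network copy : {net_time_s / 86400:.1f} days "
      f"({net_energy_j / 1e6:.1f} MJ at the transceivers)")
print(f"DHL traversal: {dhl_time_s:.1f} s")
```

Under these assumptions the copy takes about 6.9 days and roughly 18 MJ at the transceivers, consistent with "almost a week" and "megajoules", while the pod traversal takes seconds, which is where the orders-of-magnitude gap in the reported results comes from.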