CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching

Published in the ML4Astro Workshop, co-located with ICML 2025, 2025

Generative machine learning models have been shown to learn low-dimensional representations of data that preserve the information required for downstream tasks. In this work, we demonstrate that flow-matching-based generative models can learn compact, semantically rich latent representations of field-level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 8000× smaller than the raw field data without degrading parameter inference accuracy. These representations are also interpretable: different latent channels correspond to features at different cosmological scales. In addition, the model generates high-quality reconstructions and synthesizes new data for cosmological parameters not present in the dataset.
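The abstract does not spell out the training objective, so the following is only a rough illustration of what "flow matching" generally refers to. It is a minimal NumPy sketch of standard conditional flow matching with a linear interpolation path between Gaussian noise and a data sample, plus Euler integration of a velocity field at sampling time. It is not CosmoFlow's implementation. The function names are hypothetical, a learned velocity network is replaced by a hand-written function, and any encoder or latent conditioning that a representation-learning model would need is not modeled.

```python
import numpy as np


def flow_matching_pair(x1, rng):
    """Build one conditional flow-matching training example on a linear path.

    Assumption: source distribution is N(0, I) and the path is
    x_t = (1 - t) * x0 + t * x1, whose target velocity is u = x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)  # noise sample
    t = rng.uniform()                   # time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1        # point on the straight path
    u = x1 - x0                         # regression target for the velocity model
    return xt, t, u


def flow_matching_loss(v_pred, u):
    """Mean squared error between predicted and target velocities."""
    return np.mean((v_pred - u) ** 2)


def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt  # t never reaches 1 inside the loop
        x = x + dt * velocity_fn(x, t)
    return x


# Usage: for a single known target field x1, the exact conditional velocity
# is (x1 - x) / (1 - t); integrating it transports noise onto x1.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 8))  # stand-in for a (tiny) 2D field
xt, t, u = flow_matching_pair(x1, rng)
exact_velocity = lambda x, t: (x1 - x) / (1.0 - t)
sample = euler_sample(exact_velocity, rng.standard_normal(x1.shape), n_steps=50)
```

In a real model, `velocity_fn` would be a neural network trained by minimizing `flow_matching_loss` over many `(xt, t, u)` triples; the straight-line path is chosen here because its target velocity is constant along the path, which keeps the sketch easy to check.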