We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, leveraging additional computation to improve sample quality or better align outputs with user preferences. For diffusion models, particle sampling has enabled more efficient scaling thanks to the stochasticity at intermediate denoising steps. In contrast, although flow models have gained popularity as an alternative to diffusion models, offering faster generation and high-quality outputs in state-of-the-art image and video generative models, the efficient inference-time scaling methods used for diffusion models cannot be directly applied because the generative process of flow models is deterministic. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) SDE-based generation, which enables particle sampling in flow models; 2) interpolant conversion, which employs an alternative generative trajectory to broaden the search space and enhance sample diversity; and 3) Rollover Budget Forcing (RBF), which adaptively allocates computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation, particularly variance-preserving (VP) interpolant-based generation, improves the performance of particle sampling methods for inference-time scaling in flow models. We further demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.
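To make the first idea concrete, below is a minimal sketch of SDE-based generation, assuming a linear interpolant x_t = t * x1 + (1 - t) * x0 with x0 ~ N(0, I): the score is recovered from the pretrained velocity field, and adding a score-weighted drift with matching noise leaves the marginals unchanged. The names `v_model` and the constant diffusion scale `sigma` are illustrative placeholders, not the paper's exact schedule.

```python
import torch

def flow_sde_step(v_model, x, t, dt, sigma=0.5):
    """One Euler-Maruyama step of an SDE sharing marginals with the flow
    ODE dx = v(x, t) dt, assuming the linear interpolant
    x_t = t * x1 + (1 - t) * x0 with x0 ~ N(0, I) and t running 0 -> 1.
    `v_model` and `sigma` are illustrative placeholders."""
    v = v_model(x, t)
    # Score recovered from the velocity under the linear interpolant:
    # E[x0 | x_t] = x_t - t * v  and  score = -E[x0 | x_t] / (1 - t).
    score = -(x - t * v) / (1.0 - t)
    # Adding (sigma^2 / 2) * score to the drift and sigma * dW as noise
    # preserves the marginals p_t, so sampling becomes stochastic
    # without retraining the model.
    drift = v + 0.5 * sigma**2 * score
    return x + drift * dt + sigma * (dt ** 0.5) * torch.randn_like(x)
```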
Previous particle sampling methods [1-3] use a fixed number of particles at each denoising step. Our analysis reveals that the required number of function evaluations (NFEs) to find a better sample varies across timesteps, making uniform allocation inefficient. Rollover Budget Forcing (RBF) addresses this by adopting a rollover strategy: when a high-reward particle is found early within the allocated quota, the unused NFEs are carried over to the next step—enabling adaptive compute allocation and more effective alignment.
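The following is a minimal sketch of the rollover strategy under simplifying assumptions: `sde_step` draws one stochastic particle (one NFE) and `reward_fn` scores it directly, whereas in practice the reward is typically computed on the model's one-step estimate of the clean sample. Both names are hypothetical stand-ins.

```python
def rollover_budget_forcing(x, timesteps, sde_step, reward_fn, quota):
    """Sketch of Rollover Budget Forcing: each denoising step receives a
    base quota of NFEs, and any NFEs left unused after finding an
    improving particle roll over to the next step. `sde_step` (one
    stochastic transition = one NFE) and `reward_fn` are hypothetical
    stand-ins."""
    budget = 0
    prev_reward = reward_fn(x)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        budget += quota                        # this step's base allocation
        best_x, best_r, spent = None, float("-inf"), 0
        while spent < budget:
            cand = sde_step(x, t, t_next - t)  # draw one particle (one NFE)
            spent += 1
            r = reward_fn(cand)
            if r > best_r:
                best_x, best_r = cand, r
            if r > prev_reward:                # improving particle found early:
                break                          # stop; the remainder rolls over
        budget -= spent                        # unused NFEs carry to next step
        x, prev_reward = best_x, best_r
    return x
```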
Without additional training, our method, RBF, can align pretrained flow models with diverse user preferences, including logical relations, spatial relations, and object quantities.
We observe a consistent improvement in alignment when applying inference-time SDE conversion (linear SDE) and interpolant conversion (VP-SDE), as both expand the search space. This enables effective particle sampling in flow models, outperforming search methods based on the linear ODE, such as Best-of-N (BoN) and Search over Paths (SoP) [4].
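As one way to realize interpolant conversion, the sketch below evaluates a VP-interpolant velocity using a model pretrained on the linear interpolant, by matching signal-to-noise ratios between the two trajectories. The trigonometric VP schedule (alpha_t = sin(pi t / 2), sigma_t = cos(pi t / 2)) is our assumption for illustration, not necessarily the schedule used in the paper.

```python
import math

def vp_velocity_from_linear(v_model, x, t):
    """Evaluate a variance-preserving (VP) interpolant velocity with a
    flow model pretrained on the linear interpolant x_s = s*x1 + (1-s)*x0.
    Assumes the VP schedule alpha_t = sin(pi*t/2), sigma_t = cos(pi*t/2)
    (alpha^2 + sigma^2 = 1); `v_model` is a placeholder for the
    pretrained velocity network."""
    alpha = math.sin(math.pi * t / 2)
    sigma = math.cos(math.pi * t / 2)
    d_alpha = (math.pi / 2) * sigma            # d(alpha)/dt
    d_sigma = -(math.pi / 2) * alpha           # d(sigma)/dt
    # Linear-interpolant time with the same signal-to-noise ratio,
    # and the rescaled state on the linear trajectory:
    s = alpha / (alpha + sigma)
    x_lin = x / (alpha + sigma)
    v_lin = v_model(x_lin, s)                  # pretrained velocity E[x1 - x0]
    x1_hat = x_lin + (1 - s) * v_lin           # posterior mean of the data
    x0_hat = x_lin - s * v_lin                 # posterior mean of the noise
    # Velocity of the VP trajectory alpha_t * x1 + sigma_t * x0:
    return d_alpha * x1_hat + d_sigma * x0_hat
```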
We thank Seungwoo Yoo and Juil Koo for their constructive feedback on our manuscript, and Phillip Y. Lee for helpful discussions on vision-language models.
@article{kim2025inferencetimescalingflowmodels,
  title   = {Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing},
  author  = {Jaihoon Kim and Taehoon Yoon and Jisung Hwang and Minhyuk Sung},
  journal = {arXiv preprint arXiv:2503.19385},
  year    = {2025}
}