Radio Frequency (RF) sensing has emerged as a powerful, privacy-preserving alternative to vision-based methods for indoor perception tasks. However, collecting high-quality RF data in dynamic and diverse indoor environments remains a major challenge.
To address this, we introduce WaveVerse, a prompt-based, scalable framework that simulates realistic RF signals from generated indoor scenes with human motions. WaveVerse introduces a language-guided 4D world generator, which includes a state-aware causal transformer for human motion generation conditioned on spatial constraints and texts, and a phase-coherent ray tracing simulator that enables the simulation of accurate and coherent RF signals.
Experiments demonstrate the effectiveness of our approach in conditioned human motion generation and highlight how phase coherence is applied to beamforming and respiration monitoring. We further present two case studies in ML-based high-resolution imaging and human activity recognition, demonstrating that WaveVerse not only enables data generation for RF imaging for the first time, but also consistently achieves performance gains in both data-limited and data-adequate scenarios.
Below, we present qualitative results for text- and path-conditioned human motion generation, 4D world generation, and Doppler estimation.
For conditional human motion generation, we begin with customized conditions to highlight the capabilities of our model, followed by qualitative examples from the test set of the HumanML3D dataset.
We first fix the text condition to “walk” and keep the path direction unchanged while varying the path length over 1, 3, 5, and 7 meters. Paths are colored from blue (start) to red (end).
We then change the text to “slowly walk” and fix the path length while varying the path direction over ±90°, ±45°, and ±30°.
Next, we keep the path direction and length fixed and change the text to: jump, run, walk as if there are stairs in the front, and wave their arms.
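As a rough illustration of the path conditions used in the sweeps above, the sketch below builds straight 2D waypoint paths of a given length and heading, plus a blue-to-red color ramp for plotting. The function names (`straight_path`, `blue_to_red`), the waypoint count, the 3 m length used in the heading sweep, and the waypoint representation itself are all assumptions made for illustration; the page does not specify how WaveVerse encodes paths.

```python
import numpy as np

def straight_path(length_m, heading_deg=0.0, n_points=50):
    """Hypothetical helper: evenly spaced 2D waypoints starting at the origin.

    heading_deg is measured counterclockwise from the +x axis.
    """
    theta = np.deg2rad(heading_deg)
    s = np.linspace(0.0, length_m, n_points)  # arc length along the path
    return np.stack([s * np.cos(theta), s * np.sin(theta)], axis=1)

def blue_to_red(n_points):
    """RGB colors fading from blue (start of path) to red (end of path)."""
    a = np.linspace(0.0, 1.0, n_points)
    return np.stack([a, np.zeros_like(a), 1.0 - a], axis=1)

# Length sweep: same heading, lengths of 1, 3, 5, and 7 meters.
length_sweep = {L: straight_path(L) for L in (1, 3, 5, 7)}

# Heading sweep: fixed length (3 m is an assumed value), varied direction.
heading_sweep = {d: straight_path(3.0, d) for d in (-90, -45, -30, 30, 45, 90)}
```

Each entry is an `(n_points, 2)` array that could be paired with a text prompt such as “walk” or “slowly walk” as the spatial condition.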
We show the generalization of the model to random text/path combinations.
We provide qualitative results on the HumanML3D test set.
We present a series of dynamic 4D scenes generated by WaveVerse.
We simulate a sphere moving back and forth with sinusoidal velocity, observed by a radar. The range-velocity maps reveal the expected sinusoidal pattern and a narrow velocity band across multiple range bins due to the sphere’s extent. Our method, with temporal phase coherence, yields much cleaner maps than conventional ray tracing.
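To make the role of temporal phase coherence concrete, the following is a minimal sketch of Doppler estimation for a single point scatterer whose range varies sinusoidally. It is not the WaveVerse simulator: the 77 GHz carrier, the 100 µs chirp interval, the 128-chirp window, the motion parameters, and all function names are assumed values chosen for illustration. The incoherent case is modeled crudely by adding an independent random phase to every chirp, which stands in for a simulator that does not preserve phase across time steps and is not the conventional ray tracing baseline itself.

```python
import numpy as np

C = 3e8              # speed of light (m/s)
FC = 77e9            # assumed carrier frequency (Hz)
LAM = C / FC         # wavelength (m)
T_CHIRP = 100e-6     # assumed chirp repetition interval (s)
N_WIN = 128          # chirps per Doppler window

def sphere_range(t, r0=3.0, amp=0.5, freq=0.5):
    """Range of the scatterer: sinusoidal back-and-forth motion."""
    return r0 + amp * np.sin(2 * np.pi * freq * t)

def sphere_velocity(t, amp=0.5, freq=0.5):
    """Radial velocity, the time derivative of sphere_range."""
    return amp * 2 * np.pi * freq * np.cos(2 * np.pi * freq * t)

def echo(t, coherent=True, rng=None):
    """Unit-amplitude echo with round-trip phase -4*pi*r/lambda per chirp."""
    phase = -4 * np.pi * sphere_range(t) / LAM
    if not coherent:
        # Assumed stand-in for lost coherence: random phase per chirp.
        phase = phase + rng.uniform(0, 2 * np.pi, size=t.shape)
    return np.exp(1j * phase)

def doppler_velocity(samples):
    """Velocity at the Doppler-spectrum peak of one window of chirps."""
    spec = np.fft.fftshift(np.fft.fft(samples * np.hanning(len(samples))))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(samples), d=T_CHIRP))
    # Increasing range lowers the phase, so v = -f_d * lambda / 2.
    return -freqs[np.argmax(np.abs(spec))] * LAM / 2

def velocity_track(duration=2.0, coherent=True, seed=0):
    """Estimated and true velocity for consecutive Doppler windows."""
    rng = np.random.default_rng(seed)
    t = np.arange(0, duration, T_CHIRP)
    x = echo(t, coherent, rng)
    est, true = [], []
    for i in range(len(t) // N_WIN):
        sl = slice(i * N_WIN, (i + 1) * N_WIN)
        est.append(doppler_velocity(x[sl]))
        true.append(sphere_velocity(t[sl]).mean())
    return np.array(est), np.array(true)

est_coh, true_v = velocity_track(coherent=True)
est_inc, _ = velocity_track(coherent=False)
```

Under these assumptions, `est_coh` follows the sinusoidal `true_v` to within roughly one Doppler bin, while `est_inc` scatters across the whole unambiguous velocity range. This is a one-dimensional analogue of the clean versus noisy range-velocity maps.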
@article{zheng2025scalable,
title={Scalable RF Simulation in Generative 4D Worlds},
author={Zheng, Zhiwei and Hu, Dongyin and Zhao, Mingmin},
journal={arXiv preprint arXiv:2508.12176},
year={2025}
}