Scalable RF Simulation in Generative 4D Worlds

University of Pennsylvania

WaveVerse simulates RF signals within the generated 4D worlds.

Abstract

Radio Frequency (RF) sensing has emerged as a powerful, privacy-preserving alternative to vision-based methods for indoor perception tasks. However, collecting high-quality RF data in dynamic and diverse indoor environments remains a major challenge.

To address this, we introduce WaveVerse, a prompt-based, scalable framework that simulates realistic RF signals from generated indoor scenes with human motions. WaveVerse introduces a language-guided 4D world generator, which includes a state-aware causal transformer for human motion generation conditioned on spatial constraints and texts, and a phase-coherent ray tracing simulator that enables the simulation of accurate and coherent RF signals.

Experiments demonstrate the effectiveness of our approach in conditioned human motion generation and highlight how phase coherence is applied to beamforming and respiration monitoring. We further present two case studies in ML-based high-resolution imaging and human activity recognition, demonstrating that WaveVerse not only enables data generation for RF imaging for the first time, but also consistently achieves performance gains in both data-limited and data-adequate scenarios.

Improvement with Simulated Data

Performance comparison over the baseline with varying amounts of additional real and simulated data on: (a) high-resolution RF imaging and (b) human activity recognition. BL: Baseline.

Supplementary Visualization

Below, we present qualitative results for text- and path-conditioned human motion generation, 4D world generation, and Doppler estimation.

Conditional Human Motion Generation

For conditional human motion generation, we begin with customized conditions to highlight the capabilities of our model, followed by qualitative examples from the test set of the HumanML3D dataset.

Varying Path Lengths

We first fix the text condition to “walk” and keep the path direction unchanged while varying the path length: 1, 3, 5, and 7 meters. Paths are colored from blue (start) to red (end).

Path length: 1 m
Path length: 3 m
Path length: 5 m
Path length: 7 m

Varying Path Directions

We then change the text to “slowly walk” and fix the path length while varying the path direction: ±90°, ±45°, and ±30°.

Path direction: –90°
Path direction: 90°
Path direction: –45°
Path direction: 45°
Path direction: –30°
Path direction: 30°
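
To make the path conditions above concrete, here is a minimal sketch of how a straight path with a given length and direction could be turned into ground-plane waypoints, together with the blue-to-red coloring used in the visualizations. This is an illustration only, not the WaveVerse implementation: the function names `straight_path` and `blue_to_red`, the convention that the direction is measured counterclockwise from the +x axis, and the uniform waypoint spacing are all assumptions.

```python
import math


def straight_path(length_m, direction_deg, n_points):
    """Return n_points (x, y) waypoints along a straight ground-plane path.

    Assumption: the path starts at the origin and direction_deg is measured
    counterclockwise from the +x axis. The page does not state its convention.
    """
    if n_points < 2:
        raise ValueError("need at least two waypoints")
    theta = math.radians(direction_deg)
    points = []
    for i in range(n_points):
        # Distance travelled along the path at waypoint i (uniform spacing).
        s = length_m * i / (n_points - 1)
        points.append((s * math.cos(theta), s * math.sin(theta)))
    return points


def blue_to_red(n_points):
    """Return n_points RGB tuples fading linearly from blue (start) to red (end)."""
    if n_points < 2:
        raise ValueError("need at least two colors")
    colors = []
    for i in range(n_points):
        t = i / (n_points - 1)
        colors.append((t, 0.0, 1.0 - t))
    return colors


# Example: a 5 m path at -45 degrees, sampled at 11 waypoints.
waypoints = straight_path(5.0, -45.0, 11)
colors = blue_to_red(len(waypoints))
```

Under these assumptions, the length experiment changes only `length_m` and the direction experiment changes only `direction_deg`, which mirrors how the two sets of examples isolate one condition at a time.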

Varying Text Descriptions

Next, we keep the path direction and length fixed and change the text to: jump, run, walk as if there are stairs in the front, and wave their arms.

Text: Jump
Text: Run
Text: Walk as if there are stairs in the front
Text: Wave their arms

Random Combinations

Finally, we show that the model generalizes to random combinations of text, path length, and path direction.

Run; 8 m; 30°
Jump; 2.5 m; –90°
Walk as if there are stairs in the front; 3.5 m; 45°
Wave arms; 2.0 m; –30°

Performance on HumanML3D

We provide qualitative results on the HumanML3D test set.

Text: The person takes a step and waves his right hand back and forth.
Text: The person was pushed but did not fall.
Text: A man walks backwards and then stops.
Text: A person walks in a circular motion.
Text: A person begins walking forward first with their left foot, taking wide awkward steps as if they are stepping around or over something; begins walking towards the right and then slowly continues to walk to the left, then continues to walk towards the right coming to a stop off to the right side.
Text: A person bends to the right.
Text: A figure tip toes around while walking in a slalom like motion.
Text: A person who is walking moves forward taking six confident strides.

Generated 4D World

We present a series of dynamic 4D scenes generated by WaveVerse.

Bird’s-eye view 1
A broad gallery; Slowly tour around
Close-up View of Motion
Bird’s-eye view 2
A hallway; Wave the arm
Close-up View of Motion
Bird’s-eye view 3
A zigzag hallway; Navigate
Close-up View of Motion
Bird’s-eye view 4
A keyhole-shaped hallway; Bend to pick something up
Close-up View of Motion
Bird’s-eye view 5
A cozy cabin kitchen; Walk to retrieve items
Close-up View of Motion
Bird’s-eye view 6
A winding corridor; Walk
Close-up View of Motion
Bird’s-eye view 7
An L-shaped hallway; Quickly move
Close-up View of Motion
Bird’s-eye view 1
A chic bathroom; Walk and almost slip
Close-up View of Motion
Bird’s-eye view 2
A U-shaped hallway; Jump
Close-up View of Motion
Bird’s-eye view 1
A classic music room; Dance
Close-up View of Motion

Doppler Estimation

We simulate a sphere moving back and forth with sinusoidal velocity, observed by a radar. The range-velocity maps reveal the expected sinusoidal pattern and a narrow velocity band across multiple range bins due to the sphere’s extent. Our method, with temporal phase coherence, yields much cleaner maps than conventional ray tracing.
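
The role of temporal phase coherence in this experiment can be illustrated with a much simpler toy model. The sketch below is not the WaveVerse ray tracer: it replaces the sphere with a single point scatterer, ignores range processing, and uses a pulse-pair (consecutive-pulse phase difference) velocity estimator instead of a full range-velocity map. The carrier frequency, pulse repetition frequency, and motion parameters are assumed values chosen only for illustration. The key idea it shows is that when the phase of each return is tied to the true two-way path length, $\phi(t) = -4\pi r(t)/\lambda$, the pulse-to-pulse phase change recovers the radial velocity.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s


def simulate_slow_time(fc_hz, prf_hz, n_pulses, r0_m, v_peak, f_motion_hz):
    """Simulate phase-coherent returns from a point scatterer.

    The radial velocity is v(t) = v_peak * cos(2*pi*f*t), so the range is
    r(t) = r0 + v_peak / (2*pi*f) * sin(2*pi*f*t). Returns (velocities, samples).
    """
    wavelength = C / fc_hz
    omega = 2.0 * math.pi * f_motion_hz
    velocities, samples = [], []
    for n in range(n_pulses):
        t = n / prf_hz
        r = r0_m + (v_peak / omega) * math.sin(omega * t)
        velocities.append(v_peak * math.cos(omega * t))
        # Two-way propagation phase, kept consistent across pulses.
        samples.append(cmath.exp(-1j * 4.0 * math.pi * r / wavelength))
    return velocities, samples


def pulse_pair_velocity(samples, fc_hz, prf_hz):
    """Estimate radial velocity from the phase change between adjacent pulses.

    Valid only while |v| < wavelength * prf / 4 (no phase wrapping).
    Positive velocity means the scatterer is moving away from the radar.
    """
    wavelength = C / fc_hz
    estimates = []
    for prev, curr in zip(samples, samples[1:]):
        dphi = cmath.phase(curr * prev.conjugate())
        estimates.append(-dphi * wavelength * prf_hz / (4.0 * math.pi))
    return estimates


# Assumed parameters: 77 GHz carrier, 2 kHz PRF, 1 m/s peak speed, 1 Hz motion.
true_v, returns = simulate_slow_time(77e9, 2000.0, 4000, 2.0, 1.0, 1.0)
est_v = pulse_pair_velocity(returns, 77e9, 2000.0)
```

In this toy setting the estimated velocity traces the same sinusoid as the true velocity. If the per-pulse phase were instead perturbed independently from frame to frame, the phase differences would no longer track the motion, which is one intuition for why a simulator without temporal phase coherence produces noisier velocity estimates.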

BibTeX

@article{zheng2025scalable,
  title={Scalable RF Simulation in Generative 4D Worlds},
  author={Zheng, Zhiwei and Hu, Dongyin and Zhao, Mingmin},
  journal={arXiv preprint arXiv:2508.12176},
  year={2025}
}