Boyang Wang; Haoran Zhang; Shujie Zhang; Jinkun Hao; Mingda Jia; Qi Lv; Yucheng Mao; Zhaoyang Lyu; Jia Zeng; Xudong Xu; Jiangmiao Pang
This work addresses the critical bottleneck of data scarcity in robotics by using generative video models to create synthetic training data. It goes beyond standard text-to-image augmentation by addressing multi-view consistency and temporal coherence, both essential requirements for modern robot policies, and it uses visual identity prompting for precise scene control. Its alignment with the fast-growing trend of synthetic data for embodied AI, together with validation on real-world hardware, makes it a highly relevant and practical contribution to scaling robot learning.