An uncertainty-aware 'Generative Curriculum', in which RoboVIP synthesizes trajectories specifically for states with high policy variance, will achieve higher sample efficiency than uniform data augmentation.

Feasibility: 7 Novelty: 8

Motivation

Uniformly augmenting data with video generation is inefficient and may not address the specific failure modes of the policy. Focusing the expensive video generation process on the 'boundary conditions' where the robot is confused would maximize the utility of the compute budget.

Proposed Method

Train an ensemble of policies and use their disagreement as an estimate of epistemic uncertainty over the state space. In an active learning loop, identify high-uncertainty initial states and prompt RoboVIP to generate successful 'imagined' trajectories starting from those states (specifying the goal outcome via text prompts). Add these targeted synthetic examples to the training buffer and iterate.
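The loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the policies are stand-ins for trained networks, ensemble action variance is one common proxy for epistemic uncertainty, and `generate_trajectory` is a hypothetical placeholder for a RoboVIP rollout call.

```python
import numpy as np

def ensemble_uncertainty(policies, states):
    """Epistemic-uncertainty proxy: action variance across an ensemble.

    policies: callables mapping (n_states, state_dim) -> (n_states, action_dim)
    Returns a per-state scalar score of shape (n_states,).
    """
    actions = np.stack([p(states) for p in policies])   # (n_policies, n_states, action_dim)
    return actions.var(axis=0).mean(axis=-1)            # disagreement, averaged over action dims

def select_high_uncertainty_states(policies, candidate_states, k):
    """Pick the k candidate states where the ensemble disagrees most."""
    scores = ensemble_uncertainty(policies, candidate_states)
    return candidate_states[np.argsort(scores)[-k:]]

def generative_curriculum_step(policies, candidate_states, generate_trajectory, buffer, k=8):
    """One active-learning iteration: find confusing states, synthesize
    successful trajectories from them, and grow the training buffer.

    generate_trajectory is a placeholder for the generative-model call
    (e.g. prompting RoboVIP with the state and a goal text prompt).
    """
    for s in select_high_uncertainty_states(policies, candidate_states, k):
        buffer.append(generate_trajectory(s))
    return buffer
```

In a full system the selected trajectories would be relabeled or filtered for success before being added, and the ensemble retrained on the enlarged buffer before the next iteration.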

Expected Contribution

A closed-loop active learning framework that uses generative video to dynamically patch holes in a robot's policy distribution.

Required Resources

Iterative training pipeline, uncertainty estimation mechanisms (ensembles or dropout), generative model inference API.

Source Paper

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
