The 'reasoning gap' identified by MMGR correlates strongly with the failure of generative models to serve as World Models for Reinforcement Learning agents, implying MMGR scores can predict downstream RL transferability.

Feasibility: 9 Novelty: 8

Motivation

There is a growing push to use video generation models as simulators for training robots and embodied agents. However, if a model fails MMGR reasoning checks (e.g., walls disappear between frames), an RL agent trained in its generated rollouts learns invalid policies. Establishing a correlation between MMGR scores and RL agent success rates would validate MMGR as a standard metric for 'World Model' viability, not just image generation quality.

Proposed Method

Select three video generation models with varying MMGR scores. Use each model to generate synthetic environments ('dreaming') for an offline RL agent trained on navigation tasks (the Embodied Navigation subset of MMGR). Evaluate each agent's zero-shot performance in the ground-truth simulator. Finally, analyze the correlation between a generator's MMGR score and its agent's success rate, as sketched below.
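
To make the pipeline concrete, here is a minimal sketch assuming per-model MMGR scores and a ground-truth navigation simulator are available. The functions generate_dream_rollouts, train_offline_agent, and evaluate_zero_shot are hypothetical stubs standing in for the actual dreaming, offline-RL, and zero-shot evaluation components, and all numbers are illustrative placeholders rather than measured results.

```python
# Minimal experimental skeleton (assumptions: three pre-trained video models
# with known MMGR scores and access to a ground-truth navigation simulator).
import numpy as np
from scipy import stats

def generate_dream_rollouts(video_model, num_episodes=500):
    """Stub: roll out the video model as a learned simulator ('dreaming')."""
    return [f"{video_model}_episode_{i}" for i in range(num_episodes)]

def train_offline_agent(rollouts):
    """Stub: fit an offline RL navigation policy on the dreamed rollouts."""
    return {"policy": "stub", "num_rollouts": len(rollouts)}

def evaluate_zero_shot(agent, seed=0):
    """Stub: measure zero-shot success rate in the ground-truth simulator."""
    rng = np.random.default_rng(seed)
    return float(rng.uniform(0.0, 1.0))  # placeholder success rate

# MMGR scores below are illustrative placeholders, not reported numbers.
video_models = {"model_a": 0.42, "model_b": 0.61, "model_c": 0.78}

mmgr_scores, success_rates = [], []
for seed, (name, mmgr_score) in enumerate(video_models.items()):
    rollouts = generate_dream_rollouts(name)
    agent = train_offline_agent(rollouts)
    success = evaluate_zero_shot(agent, seed=seed)
    mmgr_scores.append(mmgr_score)
    success_rates.append(success)

# Spearman rank correlation is robust with only a handful of models.
rho, p_value = stats.spearmanr(mmgr_scores, success_rates)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

With only three generators, Spearman rank correlation is the more defensible statistic; Pearson correlation would additionally assume a roughly linear relationship between MMGR score and success rate.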

Expected Contribution

Validation of MMGR as a proxy metric for Embodied AI utility, shifting the evaluation focus of video generation from 'watching' to 'acting'.

Required Resources

RL training pipeline, pre-trained video generation models, and standard RL environments (e.g., Habitat or discrete navigation grids).

Source Paper

MMGR: Multi-Modal Generative Reasoning
