The GenEnv framework can be extended to Embodied AI by co-evolving Python simulation code (defining physics and geometry) with Vision-Language Model (VLM) control policies, with the goal of zero-shot sim-to-real transfer that outperforms Domain Randomization.
Motivation
Current embodied agents rely on static Domain Randomization to bridge the sim-to-real gap. GenEnv's text-based environment generation can be adapted to emit executable simulation code (e.g., MuJoCo/Isaac Gym scripts), allowing the environment to intelligently evolve physical complexities (friction, obstacle geometry) that specifically challenge the agent's current motor control policies.
Proposed Method
Replace the text-based environment simulator with a 'Coder LLM' that outputs simulation scripts (e.g., XML/Python for physics engines). The 'Agent' is a VLM controlling a simulated robot. The Coder LLM evolves to generate increasingly complex physical tasks (e.g., stacking irregular objects), keeping task difficulty aligned with the Agent's current success rate. Evaluate the final policy on a real-world robot arm against policies trained with standard Domain Randomization.
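The loop above can be sketched in miniature. This is a hedged toy model, not an implementation: `coder_llm` stands in for the Coder LLM (a real system would emit MuJoCo XML or Isaac Gym code), `agent_success_rate` stands in for rolling out the VLM policy, and the target success band and update rule are illustrative assumptions about how difficulty alignment might work.

```python
TARGET_SUCCESS = 0.5  # illustrative target band: keep tasks at the agent's frontier


def coder_llm(difficulty: float) -> dict:
    """Stand-in for the Coder LLM: returns a 'simulation script' as a dict of
    physical parameters instead of real MuJoCo XML / Isaac Gym code."""
    return {
        "friction": max(0.05, 1.0 - 0.9 * difficulty),  # slipperier as difficulty rises
        "n_obstacles": int(1 + 9 * difficulty),
        "object_irregularity": difficulty,
    }


def agent_success_rate(env: dict, skill: float) -> float:
    """Stand-in for evaluating the VLM policy: success falls as the gap
    between task difficulty and agent skill grows."""
    gap = env["object_irregularity"] - skill
    return max(0.0, min(1.0, 1.0 - gap))


def co_evolve(steps: int = 20, lr: float = 0.3) -> tuple:
    difficulty, skill = 0.1, 0.2
    for _ in range(steps):
        env = coder_llm(difficulty)
        rate = agent_success_rate(env, skill)
        # Coder update: raise difficulty when the agent succeeds too often,
        # lower it when the agent fails too often.
        difficulty = max(0.0, min(1.0, difficulty + lr * (rate - TARGET_SUCCESS)))
        # Agent update: training on solvable tasks improves skill (toy dynamics).
        skill += 0.05 * rate
    return difficulty, skill


final_difficulty, final_skill = co_evolve()
```

The key design point this illustrates is the feedback signal: the environment generator is updated from the agent's success rate, so task complexity tracks the policy's capability rather than following a fixed randomization schedule.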
Expected Contribution
A method for 'Procedural Physics Co-Evolution' that automates curriculum generation for robotics, potentially mitigating the data scarcity problem in embodied AI.
Required Resources
Physics simulation platform (Isaac Gym/MuJoCo), VLM backbone (e.g., GPT-4V or an open-source equivalent), and robotic hardware for validation.
Source Paper
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators