Source Idea
Decomposing visual model policies into internal modular policies can enhance visual reasoning and object recognition tasks similar to language models.
Files (8)
- README.md
- metadata.json
- requirements.txt
- src/data_loader.py
- src/evaluate.py
- src/model.py
- src/train.py
- src/utils.py
README Preview
# Visual Policy Decomposition
## Description
This project explores the hypothesis that decomposing a visual model's policy into internal modular policies can improve visual reasoning and object recognition. Inspired by techniques developed for language models, the approach applies bottom-up policy optimization to CNN architectures.
## Research Hypothesis
Decomposing a visual model's policy into internal modular policies can enhance visual reasoning and object recognition, analogous to the decomposition proposed for language models.
## Implementation Approach
- Use a CNN architecture tailored for visual tasks.
- Decompose the CNN into internal modular policies.
- Fine-tune each module using a reinforcement learning framework.
- Evaluate performance on visual reasoning benchmarks.
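
As a rough illustration of the steps above, the following sketch treats each stage of a network as a module, reads an "internal policy" (a distribution over classes) out of each stage through its own linear head, and applies one REINFORCE update to a single head. It is a minimal pure-Python sketch under stated assumptions, not the code in `src/model.py` or `src/train.py`: `Stage`, `internal_policies`, and `reinforce_step` are hypothetical names, dense layers stand in for convolutional blocks, and only the readout head is updated rather than the stage weights.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Stage:
    """Hypothetical stand-in for one CNN stage (dense layer + ReLU)."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def __call__(self, x):
        return [max(0.0, v) for v in matvec(self.W, x)]


def internal_policies(stages, heads, x):
    """Run x through the stages and read one policy out of each stage.

    Returns a list of (features, probabilities) pairs, one per stage.
    """
    results = []
    h = x
    for stage, head in zip(stages, heads):
        h = stage(h)
        results.append((h, softmax(matvec(head, h))))
    return results


def reinforce_step(head, features, action, reward, lr=0.1):
    """One REINFORCE update on a single readout head, in place.

    For a softmax-linear policy, the gradient of log pi(action) with
    respect to row k is (1[k == action] - pi[k]) * features.
    """
    probs = softmax(matvec(head, features))
    for k, row in enumerate(head):
        coeff = (1.0 if k == action else 0.0) - probs[k]
        for j in range(len(row)):
            row[j] += lr * reward * coeff * features[j]


if __name__ == "__main__":
    rng = random.Random(0)
    stages = [Stage(4, 3, rng), Stage(3, 3, rng)]
    # One 2-class head per stage, initialised to zero (assumption).
    heads = [[[0.0] * 3 for _ in range(2)] for _ in stages]
    for i, (_, probs) in enumerate(
            internal_policies(stages, heads, [1.0, 0.5, -0.2, 0.3])):
        print(f"stage {i} policy:", probs)
```

In this sketch each head can be trained on its own reward signal, which is one way to read "fine-tune each module"; a fuller implementation would also propagate the update into the stage parameters.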
## Setup Instructions
1. Clone the repository.
2. Install dependencies using `pip install -r requirements.txt`.
3. Configure your environment in `config/config.yaml`. This file is not among the files listed for the repository, so you may need to create it.
## Usage Examples
- Train the model: `python src/train.py`
- Evaluate the model: `python src/evaluate.py`
## Expected Results
If the hypothesis holds, modular policy decomposition should improve performance on visual reasoning and object recognition tasks.
## References
- Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ([Paper](http://arxiv.org/abs/2512.19673v1))