Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

7.14 · arXiv:2601.05201 · 2026-01-08

Authors

William Rudman; Michal Golovanevsky; Dana Arad; Yonatan Belinkov; Ritambhara Singh; Carsten Eickhoff; Kyle Mahowald

Scores

Novelty: 7.2
Technical: 7.2
Transferability: 6.3
Momentum: 8.5
Evidence: 7.2
Breakthrough: 6.2

Rationale

This paper bridges mechanistic interpretability and VLM robustness by identifying specific attention heads that prioritize prompt text over visual evidence. Although the scope is currently limited to object counting, the discovery of a training-free intervention (ablating these heads) that reduces hallucinations is technically significant and directly relevant to current safety research. The work fits the growing momentum behind understanding model internals, though its long-term impact depends on whether these 'copying mechanisms' generalize to more complex, open-ended multimodal reasoning tasks.
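To make the intervention concrete, below is a minimal PyTorch sketch of training-free head ablation, assuming the fix amounts to zeroing the per-head outputs of the identified attention heads at inference time (the paper may use a different ablation scheme, e.g. mean ablation). The AblatableMHA module and the head indices are illustrative placeholders, not the paper's actual model or code.

```python
import torch
import torch.nn as nn

class AblatableMHA(nn.Module):
    """Multi-head self-attention whose individual heads can be zeroed out."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ablated_heads: set[int] = set()  # indices of heads to silence

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to [batch, heads, seq, head_dim].
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        per_head = attn @ v  # [batch, heads, seq, head_dim]
        for h in self.ablated_heads:
            per_head[:, h] = 0.0  # erase this head's contribution entirely
        out = per_head.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

# Usage: silence the heads hypothesized to copy the prompt over visual evidence.
mha = AblatableMHA(d_model=64, num_heads=8).eval()
mha.ablated_heads = {2, 5}  # placeholder indices, not taken from the paper
with torch.no_grad():
    y = mha(torch.randn(1, 10, 64))  # forward pass runs without heads 2 and 5
```

Because the ablation is applied only in the forward pass, no retraining or fine-tuning is needed, which is what makes this class of intervention attractive for deployed models.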