Mechanisms of Prompt-Induced Hallucination in Vision-Language Models
Authors
William Rudman; Michal Golovanevsky; Dana Arad; Yonatan Belinkov; Ritambhara Singh; Carsten Eickhoff; Kyle Mahowald
Scores
Rationale
This paper effectively bridges the gap between mechanistic interpretability and VLM robustness by identifying specific attention heads responsible for prioritizing prompt text over visual evidence. While the scope is currently limited to object counting, the discovery of a training-free intervention (ablating these heads) that reduces hallucinations is technically significant and highly relevant to current safety research. The work aligns with the growing momentum behind understanding models' internal mechanisms, though its long-term impact depends on whether these 'copying mechanisms' generalize to more complex, open-ended multimodal reasoning tasks.
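To make the training-free intervention concrete, below is a minimal sketch of attention-head ablation via a forward hook, assuming a HuggingFace-style decoder whose self-attention output has shape (batch, seq_len, num_heads * head_dim). The names `model`, `layer_idx`, and `heads_to_ablate` are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def make_head_ablation_hook(heads_to_ablate, num_heads, head_dim):
    """Return a forward hook that zeros the outputs of selected attention heads."""
    def hook(module, inputs, output):
        # Some attention modules return a tuple (hidden_states, attn_weights, ...)
        hidden = output[0] if isinstance(output, tuple) else output
        bsz, seq_len, _ = hidden.shape
        # View as (batch, seq, num_heads, head_dim) to index individual heads
        hidden = hidden.view(bsz, seq_len, num_heads, head_dim).clone()
        hidden[:, :, heads_to_ablate, :] = 0.0  # zero out the suspected "copying" heads
        hidden = hidden.view(bsz, seq_len, num_heads * head_dim)
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage on a VLM's language backbone:
# handle = model.language_model.layers[layer_idx].self_attn.register_forward_hook(
#     make_head_ablation_hook(heads_to_ablate=[3, 7], num_heads=32, head_dim=128)
# )
# ...run inference and compare counting accuracy with and without the hook...
# handle.remove()
```

Because the hook only zeroes activations at inference time, no weights are updated, which is what makes this class of intervention training-free.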