The 'Curse of Dimensionality' in Transformers can be bypassed by a 'Phase Transition Curriculum' that anneals the model from a geometric interpolation phase (high temperature) to a symmetry-protected topological (SPT) phase (low temperature).
Motivation
Directly optimizing for topological invariants is difficult due to discrete symmetries and vanishing gradients. Physics suggests that ordered phases emerge from disordered ones via cooling. A curriculum that gradually enforces the non-Abelian gauge constraints might allow standard Transformers to converge into this robust reasoning regime without a specialized architecture.
Proposed Method
Modify the training of a standard Transformer on algorithmic tasks. Introduce a 'temperature' parameter that scales the attention softmax and a regularization term that penalizes violations of the imposed gauge symmetry. Start training at high temperature, where the softmax behaves like ordinary soft attention (smooth geometric interpolation), and slowly anneal the temperature toward zero while increasing the symmetry penalty, forcing attention to sharpen and the model to 'freeze' into the SPT phase. A minimal sketch of this loop follows below.
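The sketch below is a minimal illustration of the curriculum, assuming a PyTorch implementation; `TemperedSelfAttention`, `gauge_violation`, and `schedule` are hypothetical names introduced here, and the orthogonality penalty on the value projection is only a stand-in for the (unspecified) non-Abelian gauge constraint.

```python
# Sketch of the Phase Transition Curriculum: temperature-scaled attention plus a
# linearly annealed temperature / symmetry-penalty schedule. The gauge penalty is a
# hypothetical placeholder, not the actual non-Abelian constraint.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemperedSelfAttention(nn.Module):
    """Single-head self-attention whose softmax is divided by an external temperature."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.d_model = d_model

    def forward(self, x: torch.Tensor, temperature: float) -> torch.Tensor:
        # High temperature -> soft, near-uniform attention (geometric interpolation);
        # temperature -> 0 approaches hard, discrete routing.
        logits = self.q(x) @ self.k(x).transpose(-2, -1) / math.sqrt(self.d_model)
        attn = F.softmax(logits / max(temperature, 1e-3), dim=-1)  # floor keeps softmax finite as T -> 0
        return attn @ self.v(x)


def gauge_violation(layer: TemperedSelfAttention) -> torch.Tensor:
    # Placeholder penalty: deviation of the value projection from an orthogonal map,
    # standing in for the non-Abelian gauge-symmetry constraint.
    w = layer.v.weight
    eye = torch.eye(w.shape[0], device=w.device)
    return ((w @ w.T - eye) ** 2).mean()


def schedule(step: int, total_steps: int, t_max: float = 5.0, t_min: float = 1e-3,
             lambda_max: float = 1.0):
    """Linear annealing: temperature decays t_max -> t_min while the penalty weight rises 0 -> lambda_max."""
    frac = min(step / total_steps, 1.0)
    temperature = t_max + frac * (t_min - t_max)
    penalty_weight = frac * lambda_max
    return temperature, penalty_weight


# Usage sketch on dummy data; real training would use a full Transformer and CLRS-30 batches.
if __name__ == "__main__":
    torch.manual_seed(0)
    layer = TemperedSelfAttention(d_model=32)
    opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
    total_steps = 1000
    for step in range(total_steps):
        temperature, penalty_weight = schedule(step, total_steps)
        x = torch.randn(8, 16, 32)              # (batch, sequence, d_model) dummy inputs
        out = layer(x, temperature)
        task_loss = F.mse_loss(out, x)          # stand-in for the algorithmic task loss
        loss = task_loss + penalty_weight * gauge_violation(layer)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Flooring the temperature at a small positive value keeps the softmax differentiable through the final, near-frozen stage of the schedule; the annealing shape itself (linear, exponential, or staged) would be a tuning choice.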
Expected Contribution
A training recipe that confers the robustness benefits of the specialized architecture on standard, widely used Transformers.
Required Resources
Standard compute cluster, algorithmic reasoning datasets (e.g., CLRS-30).