Discretized emergent temporal abstractions from autoregressive models can serve as a static 'skill vocabulary' for offline hierarchical reinforcement learning, overcoming the context-length limitations of Decision Transformers.
Motivation
While the source paper demonstrates emergent abstractions, current autoregressive RL methods such as Decision Transformers still struggle with very long-horizon tasks because attention cost grows quadratically with sequence length. If these emergent abstractions correspond to distinct sub-skills, explicitly discretizing them into a 'skill vocabulary' could compress the effective trajectory length, potentially by an order of magnitude, enabling planning over much longer horizons.
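A back-of-the-envelope estimate makes the compression claim concrete (k, the number of primitive steps each skill token spans, is an illustrative parameter, not a figure from the source paper):

```latex
\[
  \underbrace{O(H^{2})}_{\text{attention over raw actions}}
  \;\longrightarrow\;
  O\!\left(\left(\tfrac{H}{k}\right)^{2}\right)
  = O\!\left(\tfrac{H^{2}}{k^{2}}\right),
  \qquad k = 10 \;\Rightarrow\; 100\times \text{ fewer attention pairs.}
\]
```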
Proposed Method
1. Train a standard autoregressive model on offline trajectory data (e.g., D4RL).
2. Apply vector quantization (VQ) or clustering to the emergent latent temporal states identified in the paper to create a discrete dictionary of 'skill tokens'.
3. Train a high-level Transformer policy that predicts sequences of these skill tokens rather than raw actions.
4. Train a low-level goal-conditioned decoder to execute the skill tokens.
Compare performance against standard Decision Transformers on long-horizon benchmarks such as AntMaze and Kitchen; a sketch of steps 2-4 appears after this list.
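The following is a minimal PyTorch sketch of steps 2-4, assuming per-timestep latent states have already been extracted from the pretrained autoregressive model. All class and variable names (SkillTokenizer, HighLevelPolicy, LowLevelDecoder) are hypothetical placeholders, not components from the source paper:

```python
# Sketch of steps 2-4, assuming `latents` (T x D) are per-timestep latent
# states from a pretrained autoregressive model. Names are illustrative
# placeholders, not APIs from the source paper.
import torch
import torch.nn as nn

class SkillTokenizer(nn.Module):
    """Step 2: vector-quantize latent states into a discrete skill vocabulary."""
    def __init__(self, num_skills: int, latent_dim: int):
        super().__init__()
        self.codebook = nn.Embedding(num_skills, latent_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # Nearest-codebook-entry assignment: (T, D) -> (T,) integer tokens.
        dists = torch.cdist(latents, self.codebook.weight)  # (T, num_skills)
        return dists.argmin(dim=-1)

class HighLevelPolicy(nn.Module):
    """Step 3: causal Transformer that predicts skill tokens, not raw actions."""
    def __init__(self, num_skills: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(num_skills, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, num_skills)

    def forward(self, skill_tokens: torch.Tensor) -> torch.Tensor:
        # skill_tokens: (B, T) -> next-skill logits (B, T, num_skills).
        x = self.embed(skill_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.backbone(x, mask=mask))

class LowLevelDecoder(nn.Module):
    """Step 4: goal-conditioned decoder mapping (state, skill) -> action."""
    def __init__(self, state_dim: int, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state: torch.Tensor, skill_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, skill_emb], dim=-1))

# Toy end-to-end pass with random stand-ins for the paper's latents.
latents = torch.randn(1000, 64)                # T=1000 timesteps, D=64
tokenizer = SkillTokenizer(num_skills=64, latent_dim=64)
tokens = tokenizer(latents)                    # (1000,) skill token per step
compressed = torch.unique_consecutive(tokens)  # collapsed skill sequence
policy = HighLevelPolicy(num_skills=64)
logits = policy(compressed.unsqueeze(0))       # (1, len(compressed), 64)
decoder = LowLevelDecoder(state_dim=17, latent_dim=64, action_dim=6)
action = decoder(torch.randn(17), tokenizer.codebook(tokens[:1])[0])
```

Collapsing consecutive duplicate tokens (torch.unique_consecutive above) is where the compression would come from: if the latents dwell in each skill for roughly 10 steps, a 1000-step trajectory becomes a roughly 100-token input to the high-level policy, matching the order-of-magnitude estimate in the Motivation.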
Expected Contribution
A method that scales autoregressive RL to horizons impractical for standard full-attention models, by using the paper's emergent abstractions as a trajectory-compression scheme.
Required Resources
High-end GPU cluster (e.g., 4-8 A100s) for training large Transformer models; standard offline RL benchmarks (D4RL).
Source Paper
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning