
Discretized emergent temporal abstractions from autoregressive models can serve as a static 'skill vocabulary' for offline meta-reinforcement learning, overcoming context length limitations in Decision Transformers.

Feasibility: 8 Novelty: 7

Motivation

While the source paper demonstrates emergent abstractions, current autoregressive RL methods (such as Decision Transformers) still struggle with extremely long-horizon tasks due to the quadratic cost of attention over raw action sequences. If the emergent abstractions correspond to distinct sub-skills, explicitly discretizing them into a 'skill vocabulary' could compress effective trajectory lengths by an order of magnitude, enabling planning over much longer horizons.

Proposed Method

1. Train a standard autoregressive model on offline trajectory data (e.g., D4RL).
2. Apply vector quantization (VQ) or clustering to the emergent latent temporal states identified in the paper, yielding a discrete dictionary of 'skill tokens'.
3. Train a high-level Transformer policy that predicts sequences of skill tokens rather than raw actions.
4. Train a low-level goal-conditioned decoder that executes each skill token.

Evaluation: compare against a standard Decision Transformer on long-horizon benchmarks such as AntMaze and Kitchen.
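Step 2 can be sketched with a simple k-means vector quantizer over the latent temporal states. This is a minimal illustration under assumed shapes (latents as an array of shape `(T, d)` extracted from the pretrained model); the function names `build_skill_codebook` and `tokenize_trajectory` are hypothetical, not from the paper:

```python
import numpy as np

def build_skill_codebook(latents, n_skills=16, n_iters=50, seed=0):
    """Cluster latent temporal states into a discrete skill vocabulary via k-means VQ.

    latents: array of shape (T, d), one latent state per timestep.
    Returns a codebook of shape (n_skills, d); row k is the centroid of skill token k.
    """
    rng = np.random.default_rng(seed)
    # Initialize the codebook from randomly chosen latent states.
    codebook = latents[rng.choice(len(latents), n_skills, replace=False)]
    for _ in range(n_iters):
        # Assign each latent to its nearest code (its skill token).
        dists = np.linalg.norm(latents[:, None] - codebook[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Move each code to the mean of its assigned latents.
        for k in range(n_skills):
            mask = assign == k
            if mask.any():
                codebook[k] = latents[mask].mean(axis=0)
    return codebook

def tokenize_trajectory(latents, codebook):
    """Map a trajectory of latents to skill tokens, collapsing consecutive repeats.

    The run-length collapse is what shortens the sequence the high-level
    Transformer must attend over.
    """
    dists = np.linalg.norm(latents[:, None] - codebook[None], axis=-1)
    tokens = dists.argmin(axis=1)
    # Keep a token only where the skill changes.
    keep = np.concatenate([[True], tokens[1:] != tokens[:-1]])
    return tokens[keep]
```

The compressed skill-token sequence is typically far shorter than the raw action sequence, which is the mechanism behind the claimed horizon extension; a learned VQ-VAE codebook could replace the k-means step without changing the pipeline.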

Expected Contribution

A method for scaling autoregressive RL to horizons unreachable with full attention over raw action sequences, by using the paper's emergent abstractions as a trajectory-compression scheme.

Required Resources

High-end GPU cluster (e.g., 4-8 A100s) for training large Transformer models; standard offline RL benchmarks (D4RL).

Source Paper

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
