
Entropy-Driven Curriculum Learning for Masked Diffusion Training

Feasibility: 7 Novelty: 8

Motivation

The source paper uses Denoising Entropy at inference time to guide the decoding order. The same signal, however, also indicates which token dependencies are harder for the model to learn. Feeding this uncertainty metric back into the training phase could steer the model toward 'hard' masking patterns early on, rather than relying on uniform random masking throughout.

Proposed Method

Modify the training loop of a masked diffusion model (MDM). Periodically compute the Denoising Entropy over the training set (or a batch subset) using the current model checkpoint. Then, instead of masking positions uniformly at random, sample masks proportional to the entropy map, masking high-entropy regions more frequently so the model is forced to learn robust representations for difficult features.
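The sampling step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`denoising_entropy`, `entropy_weighted_mask`) and the softmax-over-entropy weighting with a `temperature` knob are assumptions about how the entropy map would be turned into a masking distribution.

```python
import numpy as np

def denoising_entropy(logits):
    """Per-position entropy of the model's predicted token distributions.

    logits: (seq_len, vocab) array from the current checkpoint.
    Returns: (seq_len,) array of entropies in nats.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def entropy_weighted_mask(entropy, mask_ratio, temperature=1.0, rng=None):
    """Sample mask positions without replacement, biased toward high entropy.

    Positions are drawn with probability softmax(entropy / temperature),
    so uncertain (hard) positions are masked more often than easy ones.
    A large temperature recovers near-uniform random masking.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_mask = max(1, int(round(mask_ratio * len(entropy))))
    weights = np.exp((entropy - entropy.max()) / temperature)
    probs = weights / weights.sum()
    return rng.choice(len(entropy), size=n_mask, replace=False, p=probs)

# Toy usage: position 0 has a near-deterministic prediction (low entropy),
# the rest are uniform (high entropy), so masking favors positions 1..7.
logits = np.zeros((8, 10))
logits[0, 0] = 100.0
H = denoising_entropy(logits)
masked_idx = entropy_weighted_mask(H, mask_ratio=0.5, rng=np.random.default_rng(0))
```

In a real training loop, `logits` would come from a forward pass of the current MDM checkpoint on (partially masked) training examples, and `masked_idx` would replace the uniform mask sampler for the next round of noising.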

Expected Contribution

Improved sample efficiency during training and better handling of complex dependencies (e.g., hands in images or logical connectors in text) in the final model.

Required Resources

High-performance computing cluster for model training; standard MDM benchmark datasets (e.g., ImageNet for images or OpenWebText for text).

Source Paper

Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty
