Integrating hierarchical dataset selection with active learning yields a 'Curriculum Dataset Selection' mechanism, hypothesized to accelerate convergence and improve final accuracy compared to static pre-selection.
Motivation
The current approach likely selects a static subset of data before training begins. However, deep learning models benefit from curriculum learning (easy-to-hard examples), and a static selection ignores the changing needs of the model as it matures during training.
Proposed Method
1. Make the selection agent dynamic: re-evaluate the dataset hierarchy at fixed training intervals (e.g., every few epochs).
2. Use the gradient (rate of change) of the validation loss on the target task as a reward signal to guide traversal of the dataset hierarchy.
3. Compare a static-selection baseline against this dynamic approach, in which the granularity of selected data evolves (e.g., starting with broad, high-level clusters and narrowing to difficult sub-clusters).
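The steps above can be sketched as a bandit-style selection loop. Everything below is hypothetical scaffolding, not the proposal's actual implementation: the cluster names, the toy skill/loss model, and the epsilon-greedy choice (a simple stand-in for the reinforcement-learning traversal) are all illustrative assumptions.

```python
import random

# Hypothetical difficulty score per cluster of the dataset hierarchy.
CLUSTERS = {"broad": 0.2, "medium": 0.5, "hard": 0.9}

def validation_loss(skill):
    """Toy validation loss: decreases as the model's skill grows."""
    return max(0.0, 1.0 - skill)

def train_on(difficulty, skill):
    """Toy training step: skill improves most when the cluster's
    difficulty is close to the model's current skill."""
    gap = abs(difficulty - skill)
    return skill + 0.1 * max(0.0, 1.0 - 2.0 * gap)

def curriculum_select(intervals=30, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    skill = 0.0
    last_reward = {name: 0.0 for name in CLUSTERS}
    history = []
    for _ in range(intervals):
        # Epsilon-greedy stand-in for the RL traversal: mostly pick the
        # cluster whose last selection reduced validation loss the most.
        if rng.random() < epsilon:
            name = rng.choice(list(CLUSTERS))
        else:
            name = max(CLUSTERS, key=lambda n: last_reward[n])
        before = validation_loss(skill)
        skill = train_on(CLUSTERS[name], skill)
        # Reward signal: drop in validation loss over the interval.
        last_reward[name] = before - validation_loss(skill)
        history.append(name)
    return skill, history

skill, history = curriculum_select()
```

In this toy run the agent favours the broad cluster while skill is low and drifts toward harder clusters as broad selections stop reducing the validation loss, which is the qualitative curriculum behaviour the proposal predicts.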
Expected Contribution
Establishing a framework for 'Active Hierarchical Selection' that optimizes the data stream temporally (which data when), not just spatially (which data at all), leading to compute-efficient training.
Required Resources
High-performance computing cluster for iterative training runs; expertise in implementing reinforcement learning frameworks.