
Integrating hierarchical dataset selection with active learning creates a 'Curriculum Dataset Selection' mechanism that accelerates convergence and improves final accuracy compared to static pre-selection.

Feasibility: 7 Novelty: 9

Motivation

The current approach likely selects a static subset of data before training begins. However, deep learning models benefit from curriculum learning (progressing from easy to hard examples), and a static selection ignores the model's changing needs as it matures during training.

Proposed Method

1. Modify the selection agent to be dynamic, re-evaluating the hierarchy at fixed training intervals (e.g., every few epochs).
2. Use the validation loss gradient on the target task as a reward signal to guide traversal of the dataset hierarchy.
3. Compare a static-selection baseline against this dynamic approach, in which the granularity of selected data evolves (e.g., starting with broad, high-level clusters and narrowing down to difficult sub-clusters).
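The steps above can be sketched as follows. This is a minimal illustration, not the paper's method: the toy hierarchy, the hard-coded per-cluster rewards, and the depth schedule are all assumptions. In a real run, each cluster's reward would be the measured improvement in validation loss after training on samples drawn from it.

```python
class ClusterNode:
    """A node in the dataset hierarchy: a cluster with optional sub-clusters."""
    def __init__(self, name, samples=None, children=None):
        self.name = name
        self.samples = samples or []
        self.children = children or []

    def all_samples(self):
        out = list(self.samples)
        for child in self.children:
            out.extend(child.all_samples())
        return out


def curriculum_select(root, epoch, reward_by_node, granularity_schedule):
    """Re-evaluate the hierarchy at this epoch (step 1): descend to the depth
    given by the schedule (step 3), then keep the clusters whose observed
    reward -- validation-loss improvement in a real system (step 2) -- is highest."""
    depth = granularity_schedule(epoch)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            # Leaves stay in the frontier if the schedule asks for more depth.
            next_frontier.extend(node.children or [node])
        frontier = next_frontier
    # Greedy traversal: keep the top half of clusters by reward.
    frontier.sort(key=lambda n: reward_by_node.get(n.name, 0.0), reverse=True)
    keep = frontier[: max(1, len(frontier) // 2)]
    return [s for node in keep for s in node.all_samples()]


# Toy hierarchy: two broad clusters, each with two sub-clusters of sample IDs.
root = ClusterNode("root", children=[
    ClusterNode("A", children=[ClusterNode("A1", samples=[1, 2]),
                               ClusterNode("A2", samples=[3, 4])]),
    ClusterNode("B", children=[ClusterNode("B1", samples=[5, 6]),
                               ClusterNode("B2", samples=[7, 8])]),
])

# Assumed schedule: broad clusters early, finer sub-clusters later.
schedule = lambda epoch: 1 if epoch < 5 else 2
# Assumed rewards; in practice these come from validation-loss deltas.
rewards = {"A": 0.3, "B": 0.1, "A1": 0.5, "A2": 0.2, "B1": 0.0, "B2": 0.4}

early = curriculum_select(root, epoch=0, reward_by_node=rewards,
                          granularity_schedule=schedule)  # broad: cluster A
late = curriculum_select(root, epoch=6, reward_by_node=rewards,
                         granularity_schedule=schedule)   # fine: A1 and B2
```

The schedule realizes the broad-to-narrow curriculum: at epoch 0 the agent selects whole top-level clusters, while at epoch 6 it drills into the highest-reward sub-clusters.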

Expected Contribution

Establishing a framework for 'Active Hierarchical Selection' that optimizes the data stream temporally, not just spatially, leading to compute-efficient training.

Required Resources

A high-performance computing cluster for iterative training runs, plus expertise in implementing reinforcement learning frameworks.

Source Paper

Hierarchical Dataset Selection for High-Quality Data Sharing
