Source Idea
Hierarchical dataset selection can improve domain adaptation by optimizing the source data selection process for transfer learning tasks.
Files (10)
- README.md
- metadata.json
- requirements.txt
- src/__init__.py
- src/data_loader.py
- src/dataset_selector.py
- src/evaluate.py
- src/hierarchical_selection.py
- src/train.py
- src/utils.py
# Hierarchical Dataset Selection for Domain Adaptation
## Description
This project explores the hypothesis that hierarchical dataset selection can improve domain adaptation by optimizing how source data is selected for transfer learning tasks. The aim is to reduce negative transfer by selecting the subsets of the source data that are most closely aligned with the target domain.
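To make "aligned with the target domain" measurable, one option is a domain-distance score. The sketch below assumes each example is already a feature vector (for instance, from a pretrained encoder) and uses squared maximum mean discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the two feature means. The function name `linear_mmd2` is hypothetical and is not taken from `src/`.

```python
import numpy as np


def linear_mmd2(source: np.ndarray, target: np.ndarray) -> float:
    """Squared MMD with a linear kernel (biased estimate).

    With k(x, y) = x . y, MMD^2 equals the squared Euclidean distance
    between the mean feature vectors of the two samples.
    """
    if source.ndim != 2 or target.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (n_samples, n_features)")
    if source.shape[1] != target.shape[1]:
        raise ValueError("source and target must share the feature dimension")
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(loc=0.0, size=(200, 16))
    near_source = rng.normal(loc=0.1, size=(200, 16))  # slightly shifted source
    far_source = rng.normal(loc=2.0, size=(200, 16))   # strongly shifted source
    # A lower score means the source is closer to the target domain.
    print(linear_mmd2(near_source, target) < linear_mmd2(far_source, target))
```

A linear kernel only compares means; a richer kernel (e.g., Gaussian) would also capture differences in spread, at a higher compute cost.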
## Research Hypothesis
Hierarchical dataset selection can improve domain adaptation by optimizing the source data selection process for transfer learning tasks.
## Implementation Approach
The project will involve:
- Implementing a hierarchical dataset selector.
- Applying this selector to domain adaptation benchmarks such as Amazon Reviews and Office-31.
- Comparing the performance (e.g., accuracy and robustness) of models trained with and without hierarchical selection.
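The hierarchical selector could take many forms. The sketch below shows one plausible two-level design, not the method of the referenced paper or the code in `src/hierarchical_selection.py`. It assumes each source example has a precomputed feature vector and a group label (for instance, a product category in Amazon Reviews or a source domain in Office-31). Level 1 ranks groups by the distance between their mean feature and the target mean and keeps the closest `top_groups`; level 2 keeps, inside each retained group, the `keep_frac` fraction of examples closest to the target mean. `hierarchical_select`, `top_groups`, and `keep_frac` are hypothetical names.

```python
import numpy as np


def hierarchical_select(
    features: np.ndarray,
    groups: np.ndarray,
    target: np.ndarray,
    top_groups: int = 2,
    keep_frac: float = 0.5,
) -> np.ndarray:
    """Return sorted indices of the selected source examples.

    Level 1: rank source groups by squared distance between each group's
    mean feature and the target mean; keep the `top_groups` closest.
    Level 2: inside each kept group, keep the `keep_frac` fraction of
    examples whose features are closest to the target mean.
    """
    if not 0.0 < keep_frac <= 1.0:
        raise ValueError("keep_frac must be in (0, 1]")
    target_mean = target.mean(axis=0)

    # Level 1: coarse, group-level selection.
    group_ids = np.unique(groups)
    group_dist = {
        g: float(np.sum((features[groups == g].mean(axis=0) - target_mean) ** 2))
        for g in group_ids
    }
    kept_groups = sorted(group_ids, key=lambda g: group_dist[g])[:top_groups]

    # Level 2: fine, example-level selection within each kept group.
    selected = []
    for g in kept_groups:
        idx = np.flatnonzero(groups == g)
        dist = np.sum((features[idx] - target_mean) ** 2, axis=1)
        n_keep = max(1, int(round(keep_frac * len(idx))))
        selected.extend(idx[np.argsort(dist)[:n_keep]].tolist())
    return np.array(sorted(selected), dtype=int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(50, 8))
    features = np.vstack([
        rng.normal(0.0, 1.0, size=(40, 8)),  # group "a": same distribution as target
        rng.normal(0.5, 1.0, size=(40, 8)),  # group "b": moderately shifted
        rng.normal(3.0, 1.0, size=(40, 8)),  # group "c": strongly shifted
    ])
    groups = np.array(["a"] * 40 + ["b"] * 40 + ["c"] * 40)
    chosen = hierarchical_select(features, groups, target, top_groups=2, keep_frac=0.5)
    print(len(chosen), sorted({str(g) for g in groups[chosen]}))
```

Filtering whole groups first is cheap and discards clearly unrelated sources early; the per-example pass then trims outliers inside the groups that survive.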
## Setup Instructions
1. Clone the repository:
```bash
git clone
cd hierarchical_dataset_selection
```
2. Install the required Python packages:
```bash
pip install -r requirements.txt
```
3. Download the benchmark datasets (e.g., Amazon Reviews, Office-31) and place them in the `data/` directory.
## Usage Examples
### Training
To train a model with hierarchical dataset selection:
```bash
python src/train.py --use-hierarchy
```
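The `--use-hierarchy` flag implies that `src/train.py` can switch selection on or off, which is what the with/without comparison needs. Below is a minimal sketch of how such a switch might be wired with `argparse`; apart from `--use-hierarchy`, every option and helper (`--top-groups`, `--keep-frac`, `build_training_indices`) is hypothetical, so check the script itself for its real interface.

```python
import argparse
from typing import Callable, List, Optional, Sequence

# Hypothetical selector signature: (top_groups, keep_frac) -> selected source indices.
Selector = Callable[[int, float], Sequence[int]]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Train a model, optionally with hierarchical source selection."
    )
    # The only flag documented in the README; argparse exposes it as args.use_hierarchy.
    parser.add_argument("--use-hierarchy", action="store_true",
                        help="select source data hierarchically before training")
    # Hypothetical selector settings, included for illustration only.
    parser.add_argument("--top-groups", type=int, default=2)
    parser.add_argument("--keep-frac", type=float, default=0.5)
    return parser.parse_args(argv)


def build_training_indices(n_source: int, args: argparse.Namespace,
                           selector: Optional[Selector] = None) -> List[int]:
    """Hypothetical helper: all source indices (baseline) or a selected subset."""
    if not args.use_hierarchy:
        return list(range(n_source))  # baseline: train on the full source set
    if selector is None:
        raise ValueError("--use-hierarchy needs a selector")
    return sorted(selector(args.top_groups, args.keep_frac))


if __name__ == "__main__":
    def toy_selector(top_groups: int, keep_frac: float) -> Sequence[int]:
        # Stand-in for a real selector: keep the first keep_frac of 10 indices.
        return range(int(10 * keep_frac))

    baseline = parse_args([])
    hierarchical = parse_args(["--use-hierarchy", "--keep-frac", "0.3"])
    print(len(build_training_indices(10, baseline)))
    print(len(build_training_indices(10, hierarchical, toy_selector)))
```

Keeping the baseline and the selected run behind a single flag means both configurations share the rest of the training code, so any accuracy difference can be attributed to the data selection.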
### Evaluation
To evaluate the model:
```bash
python src/evaluate.py
```
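The README does not say how robustness is measured. One common choice, assumed here, is to report mean accuracy across target domains alongside worst-domain accuracy; the sketch below computes both from per-domain labels and predictions. `summarize_accuracy` and the worst-case definition are illustrative assumptions, not necessarily what `src/evaluate.py` reports, and the demo labels are toy values.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the prediction matches the label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need equal-length, non-empty label sequences")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_accuracy(
    results: Dict[str, Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Mean and worst-case accuracy over target domains.

    `results` maps a target-domain name to (true labels, predicted labels).
    Worst-domain accuracy serves as a simple robustness proxy.
    """
    per_domain = {name: accuracy(t, p) for name, (t, p) in results.items()}
    return {
        "mean_acc": sum(per_domain.values()) / len(per_domain),
        "worst_acc": min(per_domain.values()),
        **{f"acc/{name}": acc for name, acc in per_domain.items()},
    }


if __name__ == "__main__":
    toy = {
        "target_1": ([0, 1, 1, 0], [0, 1, 0, 0]),
        "target_2": ([1, 1, 0, 0], [1, 1, 0, 0]),
    }
    print(summarize_accuracy(toy))
```

Running the same summary for a model trained with `--use-hierarchy` and one trained without it gives a like-for-like comparison of both accuracy and worst-case behavior.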
## Expected Results
The project aims to show that hierarchical dataset selection improves domain adaptation, yielding higher accuracy and robustness than the same models trained without hierarchical selection.
## References
- [Hierarchical Dataset Selection for High-Quality Data Sharing](http://arxiv.org/abs/2512.10952v1)