Fetch datasets
Fetch a dataset
Use the dataset ID to download and extract a dataset:

    just fetch div2k-valid

The dataset is downloaded, extracted, and stored under data/datasets/.
Available datasets
| ID | Images | Resolution | Format | Size |
|---|---|---|---|---|
| div2k-valid | 100 | 2K | PNG | ~449 MB |
| div2k-train | 800 | 2K | PNG | ~3.5 GB |
| liu4k-v1-valid | 80 | 4K | PNG | ~1.3 GB |
| liu4k-v1-train | 800 | 4K | PNG | ~10 GB |
| liu4k-v2-valid | 400 | 4K–6K | PNG | ~15 GB |
| liu4k-v2-train | 1600 | 4K–6K | PNG | ~60 GB |
| uhd-iqa-full | 6073 | 4K | JPEG | ~10.7 GB |
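
The sizes in the table can help decide which datasets fit on a given machine. The following is a minimal sketch, not part of fetch_dataset.py: the dictionary copies the approximate sizes listed above, and `datasets_within_budget` is a hypothetical helper introduced only for illustration. Because the archive is deleted only after extraction, a fetch may briefly need more space than the listed size, so treat the result as a rough guide.

```python
# Approximate sizes in GB, copied from the table above.
DATASET_SIZES_GB = {
    "div2k-valid": 0.449,
    "div2k-train": 3.5,
    "liu4k-v1-valid": 1.3,
    "liu4k-v1-train": 10.0,
    "liu4k-v2-valid": 15.0,
    "liu4k-v2-train": 60.0,
    "uhd-iqa-full": 10.7,
}


def datasets_within_budget(budget_gb, sizes=DATASET_SIZES_GB):
    """Return dataset IDs whose listed size fits the budget, smallest first."""
    fitting = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    return [name for _, name in sorted(fitting)]


# With a 5 GB budget, only the three smallest datasets qualify.
print(datasets_within_budget(5))
# ['div2k-valid', 'liu4k-v1-valid', 'div2k-train']
```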
You can also list available datasets with the CLI script:

    python3 scripts/fetch_dataset.py --list

Recommendations

- Development and testing: div2k-valid (smallest, fast to download)
- 4K research: liu4k-v1-valid (high resolution, manageable size)
- Large-scale studies: liu4k-v2-train or uhd-iqa-full

For an unbiased format comparison, prefer the lossless PNG datasets (DIV2K, LIU4K) over UHD-IQA, whose JPEG sources already carry compression artifacts.
Check downloaded datasets

    python3 scripts/fetch_dataset.py --show-downloaded

Advanced options

The fetch_dataset.py script offers additional flags:

    # Keep the archive after extraction (deleted by default)
    python3 scripts/fetch_dataset.py div2k-train --keep-archive

    # Use a custom datasets directory
    python3 scripts/fetch_dataset.py div2k-valid --datasets-dir /path/to/datasets

    # Use a custom configuration file
    python3 scripts/fetch_dataset.py div2k-valid --config /path/to/datasets.json

Notes on LIU4K datasets

- All LIU4K datasets use the CC BY-NC-ND 4.0 license
- Downloaded from Google Drive (may encounter quota limits)
- LIU4K v2 uses multi-part zip archives and requires 7z (pre-installed in the dev container)
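
Outside the dev container, it can be worth confirming that 7z is installed before starting a large LIU4K v2 download. The sketch below is an illustration only: `has_tool` is a hypothetical helper, and it assumes the extractor is invoked under the name `7z` on PATH, which the fetch script may or may not do in exactly that way.

```python
import shutil


def has_tool(name="7z"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not has_tool("7z"):
    # Assumption: the multi-part LIU4K v2 archives are unpacked with `7z`.
    print("7z not found on PATH; LIU4K v2 extraction will likely fail")
```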
Troubleshooting
- Download fails: Retry, since Google Drive has occasional quota limits. As a fallback, download manually and place the files in data/datasets/.
- Dataset not found: Run python3 scripts/fetch_dataset.py --list to check the available IDs.
- Insufficient space: Check disk usage; a full pipeline run on DIV2K validation needs ~1 GB free.
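
The first and last items above can be scripted. The following is a rough sketch under stated assumptions, not part of the project: `free_space_gb` and `fetch_with_retry` are hypothetical helpers, the retry count and delays are arbitrary, and the code assumes that scripts/fetch_dataset.py exits with a non-zero status when a download fails.

```python
import shutil
import subprocess
import sys
import time


def free_space_gb(path="."):
    """Free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def fetch_with_retry(dataset_id, attempts=3, base_delay=30,
                     run=subprocess.run, sleep=time.sleep):
    """Re-run the fetch script until it succeeds or attempts run out.

    Assumption: the script signals a failed download with a non-zero
    exit status. `run` and `sleep` are injectable to ease testing.
    """
    # sys.executable is the current interpreter, standing in for `python3`.
    cmd = [sys.executable, "scripts/fetch_dataset.py", dataset_id]
    for attempt in range(attempts):
        if run(cmd).returncode == 0:
            return True
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return False


# Example, run from the repository root:
#   if free_space_gb(".") >= 1:
#       fetch_with_retry("div2k-valid")
```

Waiting between attempts gives a temporary Google Drive quota limit time to clear; if every attempt fails, the manual download fallback still applies.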
See also
- Datasets reference: detailed dataset properties, licensing, and comparisons
- Configuration reference: config/datasets.json schema