
Fetch datasets

Use the dataset ID to download and extract a dataset:

just fetch div2k-valid

The dataset is downloaded, extracted, and stored under data/datasets/.
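Under the hood the flow is download, extract, store. A minimal sketch in Python, assuming a single zip archive per dataset (the function name and URL handling are illustrative, not the script's actual internals):

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

def fetch_dataset(dataset_id: str, url: str, datasets_dir: str = "data/datasets") -> Path:
    """Download a zip archive for `dataset_id` and extract it under `datasets_dir`."""
    target = Path(datasets_dir) / dataset_id
    target.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as resp:            # fetch the archive into memory
        archive = io.BytesIO(resp.read())
    with zipfile.ZipFile(archive) as zf:  # extract every member into the target dir
        zf.extractall(target)
    return target
```

The real script also handles multi-part archives and archive cleanup; this sketch only captures the happy path for a single zip.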

ID               Images   Resolution   Format   Size
div2k-valid      100      2K           PNG      ~449 MB
div2k-train      800      2K           PNG      ~3.5 GB
liu4k-v1-valid   80       4K           PNG      ~1.3 GB
liu4k-v1-train   800      4K           PNG      ~10 GB
liu4k-v2-valid   400      4K–6K        PNG      ~15 GB
liu4k-v2-train   1600     4K–6K        PNG      ~60 GB
uhd-iqa-full     6073     4K           JPEG     ~10.7 GB

You can also list available datasets with the CLI script:

python3 scripts/fetch_dataset.py --list

Recommended starting points:

  • Development and testing: div2k-valid (smallest, fast to download)
  • 4K research: liu4k-v1-valid (high resolution, manageable size)
  • Large-scale studies: liu4k-v2-train or uhd-iqa-full
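A hedged sketch of what `--list` might do, assuming a `datasets.json` that maps dataset IDs to their metadata (the schema here is an assumption, not the script's actual config format):

```python
import json
from pathlib import Path

def list_datasets(config_path: str) -> list[str]:
    """Return the dataset IDs defined in the config file, sorted."""
    # assumed schema: {"div2k-valid": {...}, "div2k-train": {...}, ...}
    config = json.loads(Path(config_path).read_text())
    return sorted(config.keys())
```

Keeping the catalog in a config file means adding a dataset is a data change, not a code change.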

For unbiased format comparison, prefer lossless PNG datasets (DIV2K, LIU4K) over UHD-IQA (JPEG source with pre-existing compression artifacts).

To see which datasets are already downloaded:

python3 scripts/fetch_dataset.py --show-downloaded
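Since each dataset extracts into its own directory, `--show-downloaded` can plausibly be implemented as a directory listing; a sketch (the helper is an assumption, not the script's code):

```python
from pathlib import Path

def downloaded_datasets(datasets_dir: str = "data/datasets") -> list[str]:
    """Dataset IDs whose extracted directories exist locally."""
    root = Path(datasets_dir)
    if not root.is_dir():
        return []
    # each extracted dataset lives in a subdirectory named after its ID
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```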

The fetch_dataset.py script offers additional flags:

# Keep the archive after extraction (deleted by default)
python3 scripts/fetch_dataset.py div2k-train --keep-archive
# Use a custom datasets directory
python3 scripts/fetch_dataset.py div2k-valid --datasets-dir /path/to/datasets
# Use a custom configuration file
python3 scripts/fetch_dataset.py div2k-valid --config /path/to/datasets.json
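The CLI surface above can be sketched with argparse; the defaults and help strings below are assumptions inferred from the examples, not the script's exact definitions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown above; defaults are assumed."""
    parser = argparse.ArgumentParser(description="Download and extract a dataset by ID")
    parser.add_argument("dataset_id", nargs="?", help="dataset to fetch, e.g. div2k-valid")
    parser.add_argument("--list", action="store_true", help="list available dataset IDs")
    parser.add_argument("--show-downloaded", action="store_true", help="list locally extracted datasets")
    parser.add_argument("--keep-archive", action="store_true", help="keep the archive after extraction")
    parser.add_argument("--datasets-dir", default="data/datasets", help="where to store extracted datasets")
    parser.add_argument("--config", help="path to an alternate datasets.json")
    return parser
```

Making `dataset_id` optional (`nargs="?"`) lets the informational flags `--list` and `--show-downloaded` run without naming a dataset.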
Notes on the LIU4K datasets:

  • All LIU4K datasets are licensed under CC BY-NC-ND 4.0
  • They are downloaded from Google Drive, which may impose quota limits
  • LIU4K v2 ships as multi-part zip archives and requires 7z (pre-installed in the dev container)
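Multi-part extraction can be sketched by shelling out to 7z: pointing it at the first part (`.zip.001`) is enough, since 7z locates the sibling parts itself. The helper below is illustrative, not the script's implementation:

```python
import subprocess
from pathlib import Path

def seven_zip_cmd(first_part: str, out_dir: str) -> list[str]:
    """Command line for extracting a multi-part zip, given its .001 part."""
    # x = extract with paths, -o<dir> = output directory, -y = assume yes
    return ["7z", "x", first_part, f"-o{out_dir}", "-y"]

def extract_multipart(first_part: str, out_dir: str) -> None:
    """Extract a multi-part archive; 7z reassembles the remaining parts."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(seven_zip_cmd(first_part, out_dir), check=True)
```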
Troubleshooting:

  • Download fails: retry; Google Drive quota limits are usually transient. As a fallback, download the files manually and place them in data/datasets/
  • Dataset not found: run python3 scripts/fetch_dataset.py --list to check the available IDs
  • Insufficient space: check disk usage; a full pipeline run on DIV2K validation needs ~1 GB free
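The disk-space check from the last item is a single standard-library call:

```python
import shutil

def free_gib(path: str = ".") -> float:
    """Free disk space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30
```

Run it against the datasets directory before fetching a large dataset to confirm there is enough headroom.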