# Getting started
This tutorial walks you through setting up the environment and running a complete image format comparison — from dataset download to interactive report.
## Prerequisites

- VS Code with the Dev Containers extension
- Docker installed and running
## Step 1: Set up the environment

1. Clone the repository:

   ```sh
   git clone https://github.com/kadykov/web-image-formats-research.git
   cd web-image-formats-research
   ```

2. Open in VS Code and start the dev container:

   ```sh
   code .
   ```

   VS Code will detect the `.devcontainer/` configuration and prompt you to "Reopen in Container". Click it. The first build takes several minutes because it compiles image encoding tools from source.

3. Verify the setup:

   ```sh
   just verify-tools
   ```

   You should see checkmarks for all encoding tools (cjpeg, cwebp, avifenc, cjxl) and quality measurement tools (ssimulacra2, butteraugli_main, ffmpeg).

4. Run the quality checks:

   ```sh
   just check
   ```

   This runs formatting checks, linting, type checking, and all tests. Everything should pass in a fresh dev container.
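Under the hood, verifying tools amounts to checking that each required binary is on the `PATH`. A minimal sketch of such a check in Python (the tool list matches the one above; the function name is illustrative, not the project's actual implementation):

```python
import shutil

# Encoding and measurement binaries the tutorial expects (from the list above).
REQUIRED_TOOLS = [
    "cjpeg", "cwebp", "avifenc", "cjxl",
    "ssimulacra2", "butteraugli_main", "ffmpeg",
]

def verify_tools(tools: list[str]) -> dict[str, bool]:
    """Return a mapping of tool name -> whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in verify_tools(REQUIRED_TOOLS).items():
        print(f"{'OK ' if found else 'MISSING '}{tool}")
```

If any tool is reported missing, rebuild the dev container rather than installing binaries ad hoc, so the environment stays reproducible.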
## Step 2: Fetch a dataset

Studies need source images. Fetch the DIV2K validation dataset (100 images, ~450 MB):

```sh
just fetch div2k-valid
```

For higher-resolution research, you can also fetch 4K datasets (see Fetch Datasets for all options).
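A dataset fetch typically downloads an archive and verifies its integrity before unpacking. A minimal sketch of the verification half (the helper name and checksum usage are illustrative; the project's actual fetch recipes may differ):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (hypothetical path and constant): compare against a known-good
# checksum before unpacking.
# archive = Path("data/datasets/div2k-valid.zip")
# assert sha256_of(archive) == EXPECTED_SHA256
```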
## Step 3: Run a study

Run the format comparison study, which encodes each image as JPEG, WebP, AVIF, and JPEG XL and measures quality metrics. Give it a 30-minute time budget:

```sh
just pipeline format-comparison 30m
```

The pipeline will:

- Pick images from the dataset one at a time
- Encode each image in all configured formats and quality levels
- Measure SSIMULACRA2, PSNR, SSIM, and Butteraugli for every encoded variant
- Save results to `data/metrics/format-comparison/quality.json`
- Repeat until the 30-minute budget runs out
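The budgeted loop above can be sketched roughly as follows; the function names and record shape are illustrative assumptions, not the project's actual code:

```python
import time

def run_pipeline(images, encode_and_measure, budget_seconds: float) -> list[dict]:
    """Process images one at a time until the time budget is exhausted."""
    results = []
    deadline = time.monotonic() + budget_seconds
    for image in images:
        if time.monotonic() >= deadline:
            break  # budget spent; do not start another image
        # encode_and_measure encodes one image in all configured formats and
        # quality levels, returning one metrics record per encoded variant.
        results.extend(encode_and_measure(image))
    return results

# Example with a stub encoder and a generous budget:
stub = lambda img: [{"image": img, "format": "jpeg", "ssimulacra2": 80.0}]
records = run_pipeline(["img1.png", "img2.png"], stub, budget_seconds=5.0)
```

Checking the budget only between images means the last image started before the deadline is always finished, so a partial encode never lands in the results file.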
## Step 4: Analyze results

Generate statistical summaries and static plots:

```sh
just analyze format-comparison
```

This creates CSV statistics and SVG plots in `data/analysis/format-comparison/`.
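Conceptually, the statistical summary boils down to grouping the metric records by format and aggregating. A minimal sketch with inline sample records (the field names and values here are invented for illustration and are not necessarily the keys used in `quality.json`):

```python
from collections import defaultdict
from statistics import mean

# Illustrative records; real entries come from data/metrics/.../quality.json.
records = [
    {"format": "avif", "ssimulacra2": 82.1},
    {"format": "avif", "ssimulacra2": 78.5},
    {"format": "jpeg", "ssimulacra2": 65.0},
]

# Group SSIMULACRA2 scores by format, then average each group.
by_format = defaultdict(list)
for rec in records:
    by_format[rec["format"]].append(rec["ssimulacra2"])

summary = {fmt: mean(scores) for fmt, scores in by_format.items()}
```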
## Step 5: Generate visual comparisons

Generate side-by-side comparison images showing the worst-case encoding regions with Butteraugli distortion maps:

```sh
just compare format-comparison
```

## Step 6: Generate an interactive report

Combine everything into an interactive HTML report with Plotly visualizations:

```sh
just report
```

Preview it locally:

```sh
just serve-report
```

Open http://localhost:8000 in your browser to explore rate-distortion curves, quality-vs-parameter plots, and comparison images.
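A rate-distortion curve plots a quality score (e.g. SSIMULACRA2) against file size, and comparing formats means asking which curve is higher at a matched size. A minimal sketch of that comparison via linear interpolation (the sample points are invented for illustration, not measured results):

```python
def quality_at_size(points, target_size):
    """Linearly interpolate quality at target_size from (size, quality) points."""
    pts = sorted(points)
    for (s0, q0), (s1, q1) in zip(pts, pts[1:]):
        if s0 <= target_size <= s1:
            t = (target_size - s0) / (s1 - s0)
            return q0 + t * (q1 - q0)
    raise ValueError("target size outside measured range")

# Invented sample points: (bytes, SSIMULACRA2 score)
avif = [(20_000, 70.0), (40_000, 80.0), (80_000, 88.0)]
webp = [(20_000, 60.0), (40_000, 72.0), (80_000, 82.0)]

# At a matched 30 kB budget, the format with the higher interpolated
# score wins at that rate.
avif_q = quality_at_size(avif, 30_000)  # 75.0
webp_q = quality_at_size(webp, 30_000)  # 66.0
```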
## Next steps

- Run the pipeline — time budgets, advanced options
- Fetch datasets — all supported datasets
- Analyze results — understand the CSV and plots
- Generate comparisons — visual comparison options
- Generate reports — interactive HTML reports
- Architecture — design decisions and rationale
## Customize your research

- Add a custom dataset — register new image sources
- Create a custom study — define your own encoding experiments
- Extend formats and metrics — add new encoders or quality metrics
- Run studies on GitHub Actions — run studies on CI infrastructure