# Getting started
This tutorial walks you through setting up the environment and running a complete image format comparison — from dataset download to interactive report.
## Prerequisites

- VS Code with the Dev Containers extension
- Docker installed and running
## Step 1: Set up the environment

1. Clone the repository:

   ```sh
   git clone https://github.com/kadykov/web-image-formats-research.git
   cd web-image-formats-research
   ```

2. Open in VS Code and start the dev container:

   ```sh
   code .
   ```

   VS Code will detect the `.devcontainer/` configuration and prompt you to "Reopen in Container". Click it. The first build takes several minutes because it compiles image encoding tools from source.

3. Verify the setup:

   ```sh
   just verify-tools
   ```

   You should see checkmarks for all encoding tools (cjpeg, cwebp, avifenc, cjxl) and quality measurement tools (ssimulacra2, butteraugli_main, ffmpeg).

4. Run the quality checks:

   ```sh
   just check
   ```

   This runs formatting checks, linting, type checking, and all tests. Everything should pass in a fresh dev container.
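Under the hood, verifying tools amounts to checking that each required binary is on the `PATH`. A minimal sketch of such a check in Python (the tool list matches the one above; the function name is illustrative, not the project's actual implementation):

```python
import shutil

# Encoding and measurement binaries the tutorial expects (from the list above).
REQUIRED_TOOLS = [
    "cjpeg", "cwebp", "avifenc", "cjxl",
    "ssimulacra2", "butteraugli_main", "ffmpeg",
]

def verify_tools(tools: list[str]) -> dict[str, bool]:
    """Return a mapping of tool name -> whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in verify_tools(REQUIRED_TOOLS).items():
        print(f"{'OK ' if found else 'MISSING '}{tool}")
```

If any tool is reported missing, rebuild the dev container rather than installing binaries ad hoc, so the environment stays reproducible.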
## Step 2: Fetch a dataset

Studies need source images. Fetch the DIV2K validation dataset (100 images, ~450 MB):

```sh
just fetch div2k-valid
```

For higher-resolution research, you can also fetch 4K datasets (see Fetch Datasets for all options).
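A dataset fetch typically downloads an archive and verifies its integrity before unpacking. A minimal sketch of the verification half (the helper name and checksum usage are illustrative; the project's actual fetch recipes may differ):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (hypothetical path and constant): compare against a known-good
# checksum before unpacking.
# archive = Path("data/datasets/div2k-valid.zip")
# assert sha256_of(archive) == EXPECTED_SHA256
```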
## Step 3: Run a study

Run the format comparison study, which encodes each image as JPEG, WebP, AVIF, and JPEG XL and measures quality metrics. Give it a 30-minute time budget:

```sh
just pipeline format-comparison 30m
```

The pipeline will:

- Pick images from the dataset one at a time
- Encode each image in all configured formats and quality levels
- Measure SSIMULACRA2, PSNR, SSIM, and Butteraugli for every encoded variant
- Save results to `data/metrics/format-comparison/quality.json`
- Repeat until the 30-minute budget runs out
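The budgeted loop above can be sketched roughly as follows; the function names and record shape are illustrative assumptions, not the project's actual code:

```python
import time

def run_pipeline(images, encode_and_measure, budget_seconds: float) -> list[dict]:
    """Process images one at a time until the time budget is exhausted."""
    results = []
    deadline = time.monotonic() + budget_seconds
    for image in images:
        if time.monotonic() >= deadline:
            break  # budget spent; do not start another image
        # encode_and_measure encodes one image in all configured formats and
        # quality levels, returning one metrics record per encoded variant.
        results.extend(encode_and_measure(image))
    return results

# Example with a stub encoder and a generous budget:
stub = lambda img: [{"image": img, "format": "jpeg", "ssimulacra2": 80.0}]
records = run_pipeline(["img1.png", "img2.png"], stub, budget_seconds=5.0)
```

Checking the budget only between images means the last image started before the deadline is always finished, so a partial encode never lands in the results file.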
## Step 4: Analyze results

Generate statistical summaries and static plots:

```sh
just analyze format-comparison
```

This creates CSV statistics and SVG plots in `data/analysis/format-comparison/`.
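Conceptually, the statistical summary boils down to grouping the metric records by format and aggregating. A minimal sketch with inline sample records (the field names and values here are invented for illustration and are not necessarily the keys used in `quality.json`):

```python
from collections import defaultdict
from statistics import mean

# Illustrative records; real entries come from data/metrics/.../quality.json.
records = [
    {"format": "avif", "ssimulacra2": 82.1},
    {"format": "avif", "ssimulacra2": 78.5},
    {"format": "jpeg", "ssimulacra2": 65.0},
]

# Group SSIMULACRA2 scores by format, then average each group.
by_format = defaultdict(list)
for rec in records:
    by_format[rec["format"]].append(rec["ssimulacra2"])

summary = {fmt: mean(scores) for fmt, scores in by_format.items()}
```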
## Step 5: Generate visual comparisons

Generate side-by-side comparison images showing the worst-case encoding regions with Butteraugli distortion maps:

```sh
just compare format-comparison
```

## Step 6: Generate an interactive report

Combine everything into an interactive HTML report with Plotly visualizations:

```sh
just report
```

Preview it locally:

```sh
just serve-report
```

Open http://localhost:8000 in your browser to explore rate-distortion curves, quality-vs-parameter plots, and comparison images.
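A rate-distortion curve plots a quality score (e.g. SSIMULACRA2) against file size, and comparing formats means asking which curve is higher at a matched size. A minimal sketch of that comparison via linear interpolation (the sample points are invented for illustration, not measured results):

```python
def quality_at_size(points, target_size):
    """Linearly interpolate quality at target_size from (size, quality) points."""
    pts = sorted(points)
    for (s0, q0), (s1, q1) in zip(pts, pts[1:]):
        if s0 <= target_size <= s1:
            t = (target_size - s0) / (s1 - s0)
            return q0 + t * (q1 - q0)
    raise ValueError("target size outside measured range")

# Invented sample points: (bytes, SSIMULACRA2 score)
avif = [(20_000, 70.0), (40_000, 80.0), (80_000, 88.0)]
webp = [(20_000, 60.0), (40_000, 72.0), (80_000, 82.0)]

# At a matched 30 kB budget, the format with the higher interpolated
# score wins at that rate.
avif_q = quality_at_size(avif, 30_000)  # 75.0
webp_q = quality_at_size(webp, 30_000)  # 66.0
```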
## Next steps

- Run the pipeline — time budgets, advanced options
- Fetch datasets — all supported datasets
- Analyze results — understand the CSV and plots
- Generate comparisons — visual comparison options
- Generate reports — interactive HTML reports
- Architecture — design decisions and rationale
## Customize your research

- Add a custom dataset — register new image sources
- Create a custom study — define your own encoding experiments
- Extend formats and metrics — add new encoders or quality metrics
- Run studies on GitHub Actions — run studies on CI infrastructure