Generate visual comparisons

After running the pipeline, generate visual comparison figures for a study:

just compare format-comparison

The comparison generator independently re-encodes the selected source image (no encoded artifacts from the pipeline are needed) and assembles labeled side-by-side grids together with Butteraugli distortion maps.

For crop-impact studies, it reconstructs the crop windows around the stored analysis fragment so every tile compares the same content at the same pixel resolution even though the full encoded image area changes.
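The window reconstruction can be sketched as a small helper that clamps a fixed-size crop around the stored fragment center. This is a minimal illustration, not the script's actual internals; the function name and argument shapes are assumptions:

```python
def crop_window(fragment_center, crop_size, image_size):
    """Return (x, y, w, h) for a crop of `crop_size` pixels centered on
    the analysis fragment, shifted (not shrunk) to stay inside the image
    so every tile keeps the same pixel resolution.
    Hypothetical helper for illustration only."""
    cx, cy = fragment_center
    img_w, img_h = image_size
    half = crop_size // 2
    # Clamp the top-left corner so the full window fits in the image.
    x = min(max(cx - half, 0), max(img_w - crop_size, 0))
    y = min(max(cy - half, 0), max(img_h - crop_size, 0))
    return x, y, crop_size, crop_size
```

Because the window is shifted rather than shrunk at image borders, crops from differently sized encoded areas still compare the same number of pixels.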

The generator works through the following steps:

  1. Image selection — for each target group defined in the study config, selects the source image with the highest cross-format coefficient of variation (CV = std / mean) of the output metric (e.g. bits_per_pixel when targeting a quality score). This maximises the relative spread of visible differences across encoding variants.
  2. Quality interpolation — for every target value (e.g. SSIMULACRA2 = 70), interpolates the encoder quality setting per format using measurements from quality.json, then re-encodes the image to those settings on the fly.
  3. Fragment selection — computes per-pixel Butteraugli distortion maps, builds an aggregate anisotropic standard-deviation map across all target values in the group, and uses it to pick the single most informative crop region for the whole group.
  4. Figure assembly — assembles labeled comparison grids using ImageMagick montage, plus distortion-map grids and an annotated original for each group. Missing variants are rendered as fixed-position placeholders so tile order stays stable across figures in the same set.
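Step 1's selection rule can be sketched in a few lines. The function and input shape below are illustrative assumptions (the real generator reads its measurements from quality.json):

```python
import statistics

def pick_image(metric_by_image):
    """Pick the source image whose output metric varies most across
    encoding variants, using the coefficient of variation
    (CV = std / mean). `metric_by_image` maps image name -> list of
    per-format measurements, e.g. bits_per_pixel at a fixed quality
    target. Hypothetical helper for illustration."""
    def cv(values):
        mean = statistics.mean(values)
        return statistics.stdev(values) / mean if mean else 0.0
    return max(metric_by_image, key=lambda name: cv(metric_by_image[name]))
```

Using CV rather than raw standard deviation makes the spread relative, so an image with large absolute bitrates does not automatically win over one where the formats genuinely disagree.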

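Step 2's quality interpolation amounts to inverting the measured (encoder quality, metric score) curve. A minimal sketch, assuming the score grows monotonically with the quality setting, as quality-metric measurements typically do:

```python
def quality_for_target(measurements, target):
    """Given measured (encoder_quality, metric_score) pairs, return the
    quality setting expected to hit `target` (e.g. SSIMULACRA2 = 70)
    via piecewise-linear interpolation. Illustrative sketch, not the
    generator's actual code."""
    pts = sorted(measurements)  # sort by encoder quality
    for (q0, s0), (q1, s1) in zip(pts, pts[1:]):
        if s0 <= target <= s1:
            t = (target - s0) / (s1 - s0)
            return q0 + t * (q1 - q0)
    raise ValueError("target outside measured range")
```

The interpolated setting is then passed to the encoder to re-encode the source image on the fly; targets outside the measured range cannot be hit and are rejected.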
For each target group the generator produces two kinds of figure:

| Figure type | Target metric | Purpose |
| --- | --- | --- |
| Matched visual quality | ssimulacra2, psnr, ssim, butteraugli | Show artifact character at equal perceived quality |
| Matched file size | bits_per_pixel | Show quality differences under equal relative bit-budget constraints |

A single study run can yield both figure types simultaneously when the study config lists target groups of both kinds.

Output goes to data/analysis/<study-id>/comparison/.

For each target group a set of figures is created:

  • <metric>/comparison_<value>.webp — crop grid at the target value
  • <metric>/distortion_map_comparison_<value>.webp — distortion-map grid at the target value
  • <metric>/distortion_map_anisotropic.webp — aggregate anisotropic std map used for fragment selection
  • <metric>/original_annotated.webp — source image with the selected fragment highlighted

Resolution- or crop-split studies may add intermediate subdirectories such as r720/ or c800/ above the metric directory when those parameters produce separate figure groups.

Prerequisites:

  • quality.json must exist for the study: run just pipeline <study-id> <budget> first.
  • The source dataset images must be present on disk: run just fetch <dataset-id> first.
  • Comparison configuration (target values, tile parameter, excluded images) is read from the study JSON in config/studies/.

Unlike the main pipeline, the comparison script is fully self-contained: it re-encodes images independently and does not depend on any encoded artifacts saved by the pipeline.

# Use a larger crop region (default: 128 px before zoom)
python3 scripts/generate_comparison.py format-comparison --crop-size 192
# Change zoom factor (default: 3×)
python3 scripts/generate_comparison.py format-comparison --zoom 4
# Override the parameter that creates tiles within each figure
python3 scripts/generate_comparison.py format-comparison --tile-parameter format
# Pin to a specific source image instead of auto-selection
python3 scripts/generate_comparison.py format-comparison --source-image data/preprocessed/0801.png
# Generate crop-impact comparisons
python3 scripts/generate_comparison.py avif-crop-impact
# Custom output directory
python3 scripts/generate_comparison.py format-comparison --output data/analysis/custom-dir
# List studies that have quality measurements available
python3 scripts/generate_comparison.py --list

Configuring comparison targets in the study file

Add a comparison section to the study JSON to control which figures are produced:

{
  "comparison": {
    "targets": [
      { "metric": "ssimulacra2", "values": [60, 75, 90] },
      { "metric": "bits_per_pixel", "values": [0.5, 1.0, 1.5] }
    ],
    "tile_parameter": "format",
    "exclude_images": ["problematic_image.png"]
  }
}

When no targets are configured, the generator defaults to ssimulacra2 = [60, 70, 80] and bits_per_pixel = [0.5, 1.0, 1.5].
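In other words, omitting the comparison section behaves the same as configuring roughly the following (a sketch of the implied defaults, using the values stated above):

```json
{
  "comparison": {
    "targets": [
      { "metric": "ssimulacra2", "values": [60, 70, 80] },
      { "metric": "bits_per_pixel", "values": [0.5, 1.0, 1.5] }
    ]
  }
}
```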

Tile labels use BPP instead of absolute file size so comparisons remain valid across different crop sizes and resolutions.