
Create a custom study

A study is a JSON file in config/studies/ that defines which formats, quality levels, and encoder parameters to test. Creating a new study requires no code changes.

Create a new file in config/studies/. The filename (without .json) becomes the study ID used in all CLI commands:

# config/studies/my-study.json → study ID is "my-study"

Compare AVIF and WebP at a few quality levels:

{
  "$schema": "../study.schema.json",
  "id": "my-study",
  "name": "My Custom Study",
  "description": "Compare AVIF and WebP at representative quality levels.",
  "dataset": {
    "id": "div2k-valid",
    "max_images": 10
  },
  "encoders": [
    {
      "format": "avif",
      "quality": [50, 65, 80],
      "speed": 4
    },
    {
      "format": "webp",
      "quality": [50, 65, 80],
      "method": 4
    }
  ]
}

Sweep quality as a range and test multiple speed settings:

{
  "$schema": "../study.schema.json",
  "id": "avif-deep-sweep",
  "name": "AVIF Deep Quality Sweep",
  "description": "Fine-grained quality sweep with speed variants.",
  "dataset": {
    "id": "div2k-valid",
    "max_images": 20
  },
  "encoders": [
    {
      "format": "avif",
      "quality": { "start": 30, "stop": 90, "step": 5 },
      "speed": [2, 4, 6]
    }
  ]
}

The range {"start": 30, "stop": 90, "step": 5} expands to [30, 35, 40, 45, ..., 85, 90]; both start and stop are included.

Test how resolution affects encoding efficiency:

{
  "$schema": "../study.schema.json",
  "id": "resolution-test",
  "name": "Resolution Impact Test",
  "description": "Compare encoding at different target resolutions.",
  "dataset": {
    "id": "div2k-valid",
    "max_images": 10
  },
  "encoders": [
    {
      "format": "avif",
      "quality": [50, 65, 80],
      "speed": 4,
      "resolution": [1920, 1280, 640]
    }
  ]
}
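
The field reference describes resolution values as longest-edge targets in pixels. As a rough sketch of what that could mean for output dimensions, the hypothetical helper below (`fit_longest_edge` is not part of the project) assumes the aspect ratio is preserved, dimensions are rounded to the nearest pixel, and smaller images are never upscaled. The pipeline's actual resizing rules may differ.

```python
def fit_longest_edge(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale (width, height) so the longest edge equals target, keeping aspect ratio."""
    longest = max(width, height)
    if target >= longest:
        # Assumption: images already at or below the target are left untouched.
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * target / longest))
    new_height = max(1, round(height * target / longest))
    return new_width, new_height


if __name__ == "__main__":
    # Illustrative source size; the dimensions are not taken from the dataset.
    for target in (1920, 1280, 640):
        print(target, fit_longest_edge(2040, 1356, target))
```

Under these assumptions a portrait image is handled the same way as a landscape one, because only the longer side is compared with the target.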

Study how encoded image area affects one format without changing the measured content or fragment resolution:

{
  "$schema": "../study.schema.json",
  "id": "webp-crop-impact-custom",
  "name": "WebP Crop Impact Custom",
  "description": "Measure WebP on different crop sizes around a fixed fragment.",
  "dataset": {
    "id": "div2k-valid",
    "max_images": 20
  },
  "analysis_fragment_size": 200,
  "crop_too_small_strategy": "skip_image",
  "analysis": {
    "x_axis": "crop",
    "group_by": "quality"
  },
  "comparison": {
    "tile_parameter": "crop",
    "targets": [
      { "metric": "ssimulacra2", "values": [70, 80, 90] },
      { "metric": "bits_per_pixel", "values": [0.4, 0.8, 1.6] }
    ]
  },
  "encoders": [
    {
      "format": "webp",
      "quality": [40, 60, 80, 100],
      "method": 6,
      "crop": [2048, 1600, 1200, 800, 400]
    }
  ]
}

Unlike a resolution study, this keeps the analysis fragment at full resolution and only changes how much surrounding context is encoded.

Top-level study fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Must match the filename (without .json). Pattern: ^[a-z0-9][a-z0-9_-]*$ |
| name | string | Yes | Human-readable name. |
| description | string | No | Purpose of the study. |
| time_budget | number | No | Default time budget in seconds. Overridden by CLI --time-budget. |
| dataset.id | string | Yes | Dataset identifier from config/datasets.json. |
| dataset.max_images | integer | No | Limit images used from the dataset. |
| analysis_fragment_size | integer | No | Square fragment size measured in crop-impact studies. Defaults to 200. |
| crop_too_small_strategy | string | No | What to do if a crop level cannot contain the fragment: skip_image, skip_measurement, or adjust_aspect_ratio. |
| analysis | object | No | Plot configuration overrides such as x_axis and group_by. |
| comparison | object | No | Comparison-figure configuration such as tile_parameter, targets, and exclude_images. |
| encoders | array | Yes | List of encoder configurations (at least one). |

Fields for each entry in encoders:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| format | string | Yes | "jpeg", "webp", "avif", or "jxl". |
| quality | int/list/range | Yes | Quality settings (0–100). See formats below. |
| speed | int/int[] | No | AVIF speed (0–10). 0 = slowest/best, 10 = fastest. |
| effort | int/int[] | No | JXL effort (1–10). Higher = slower/better. |
| method | int/int[] | No | WebP method (0–6). Higher = slower/better. |
| chroma_subsampling | string[] | No | Modes: "444", "422", "420", "400". |
| resolution | int/int[] | No | Target resolution(s) in pixels (longest edge). |
| crop | int/int[] | No | Target crop longest-edge values in pixels for crop-impact studies. |
| extra_args | object | No | Additional CLI arguments as key-value pairs. |

Do not combine resolution and crop on the same encoder entry. They model two different study types.
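
The constraints listed above can also be checked outside the editor. The following is a minimal sketch, not the project's own validator: `check_study` is a hypothetical helper that covers only the rules stated in this section (ID pattern, ID matching the filename, required fields, known formats, and no resolution/crop mix). The JSON schema remains the authoritative definition.

```python
import re
from pathlib import Path

ID_PATTERN = re.compile(r"^[a-z0-9][a-z0-9_-]*$")
FORMATS = {"jpeg", "webp", "avif", "jxl"}


def check_study(study: dict, path: Path) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    study_id = study.get("id", "")
    if not ID_PATTERN.fullmatch(study_id):
        problems.append(f"id {study_id!r} does not match {ID_PATTERN.pattern}")
    if study_id != path.stem:
        problems.append(f"id {study_id!r} differs from filename stem {path.stem!r}")
    if not study.get("name"):
        problems.append("name is required")
    if not study.get("dataset", {}).get("id"):
        problems.append("dataset.id is required")
    encoders = study.get("encoders") or []
    if not encoders:
        problems.append("encoders must contain at least one entry")
    for index, encoder in enumerate(encoders):
        if encoder.get("format") not in FORMATS:
            problems.append(f"encoders[{index}]: unknown format {encoder.get('format')!r}")
        if "quality" not in encoder:
            problems.append(f"encoders[{index}]: quality is required")
        if "resolution" in encoder and "crop" in encoder:
            problems.append(f"encoders[{index}]: resolution and crop cannot be combined")
    return problems


if __name__ == "__main__":
    study = {
        "id": "my-study",
        "name": "My Custom Study",
        "dataset": {"id": "div2k-valid"},
        "encoders": [{"format": "avif", "quality": [50], "resolution": 1280, "crop": 800}],
    }
    for problem in check_study(study, Path("config/studies/my-study.json")):
        print(problem)
```

In practice the dictionary would come from `json.loads(path.read_text())` for a file in config/studies/.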

Use this to override the automatically chosen plot axes:

{
  "analysis": {
    "x_axis": "crop",
    "group_by": "quality"
  }
}

Use this to control comparison figure generation directly from the study config:

{
  "comparison": {
    "tile_parameter": "crop",
    "targets": [
      { "metric": "ssimulacra2", "values": [70, 80, 90] },
      { "metric": "bits_per_pixel", "values": [0.4, 0.8, 1.6] }
    ],
    "exclude_images": ["0828.png"]
  }
}
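
The quality field accepts three formats:

| Format | Example | Expands to |
| --- | --- | --- |
| Single integer | 75 | [75] |
| Explicit list | [60, 75, 90] | [60, 75, 90] |
| Range object | {"start": 30, "stop": 90, "step": 10} | [30, 40, 50, 60, 70, 80, 90] |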
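
The three quality formats can be normalized to a plain list with a few lines of Python. This is an illustrative sketch rather than the pipeline's own code: `expand_quality` is a hypothetical name, and the handling of a stop value that does not fall on a step boundary (it is simply not reached) is an assumption, since the examples here only show aligned ranges.

```python
def expand_quality(spec) -> list[int]:
    """Expand a quality spec (int, list, or range object) into a list of ints."""
    if isinstance(spec, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("quality must be an int, a list, or a range object")
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, list):
        return list(spec)
    if isinstance(spec, dict):
        start, stop, step = spec["start"], spec["stop"], spec["step"]
        if step <= 0:
            raise ValueError("step must be positive")
        # stop is inclusive, matching the documented expansions.
        return list(range(start, stop + 1, step))
    raise TypeError("quality must be an int, a list, or a range object")


if __name__ == "__main__":
    print(expand_quality(75))
    print(expand_quality([60, 75, 90]))
    print(expand_quality({"start": 30, "stop": 90, "step": 10}))
```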

The extra_args field holds additional CLI arguments intended for the encoder tool. Keys are argument names (without leading dashes) and values are argument values:

{
  "format": "avif",
  "quality": [60, 75],
  "speed": 4,
  "extra_args": {
    "sharpness": 2,
    "ignore-exif": true
  }
}

Note that extra_args values are recorded in the quality results for traceability but are not currently passed to the encoder CLI automatically. To use them, update the corresponding encoder method in src/encoder.py. See Extend formats and metrics for guidance on exposing additional encoder parameters.
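
If you do wire extra_args into an encoder call, one possible mapping from key-value pairs to command-line arguments is sketched below. Everything here is an assumption for illustration: `extra_args_to_cli` is a hypothetical helper, not a function in src/encoder.py, and the choice of a double-dash prefix and of treating true as a bare switch must be checked against the flags the actual encoder tool accepts.

```python
def extra_args_to_cli(extra_args: dict) -> list[str]:
    """Turn {"name": value} pairs into a flat list of CLI arguments."""
    args: list[str] = []
    for name, value in extra_args.items():
        flag = f"--{name}"  # Assumption: long options with a double dash.
        if value is True:
            args.append(flag)  # Boolean true becomes a bare switch.
        elif value is False or value is None:
            continue  # Disabled or unset options are omitted.
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(extra_args_to_cli({"sharpness": 2, "ignore-exif": True}))
```

Returning a list rather than a joined string keeps the result safe to append to an argument list passed to `subprocess.run` without shell quoting.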

  1. Ensure the dataset is available:

    just fetch div2k-valid
  2. Run the pipeline with a time budget:

    just pipeline my-study 30m
  3. Analyze results:

    just analyze my-study
  4. Generate visual comparisons:

    just compare my-study
  5. Generate a report (includes all studies with results):

    just report

Adding "$schema": "../study.schema.json" to your file enables in-editor validation and autocompletion in VS Code. The schema enforces valid format names, quality ranges, and parameter bounds.

  • Start small: Use max_images: 5 and a short time budget (5m) to verify your config before running a full study.

  • Dry run: Preview what would run without executing:

    python3 scripts/run_pipeline.py my-study --time-budget 5m --dry-run
  • Parameter count: The pipeline runs every combination of parameters. For example, 10 quality levels × 4 speed settings × 2 chroma modes yields 80 variants per image, so plan time budgets accordingly.

  • Clean up: Remove a study’s data with just clean-study my-study.
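
To estimate the parameter count before running anything, the combinations for one encoder entry can be enumerated with `itertools.product`. This is a sketch under assumptions: `encoder_variants` is a hypothetical helper, the list of sweepable keys is taken from the encoder field table, and the pipeline may order or filter combinations differently.

```python
from itertools import product

SWEEPABLE = ("quality", "speed", "effort", "method",
             "chroma_subsampling", "resolution", "crop")


def as_list(value):
    """Normalize an int, list, or inclusive range object to a list."""
    if isinstance(value, dict):
        return list(range(value["start"], value["stop"] + 1, value["step"]))
    return value if isinstance(value, list) else [value]


def encoder_variants(encoder: dict) -> list[dict]:
    """Return one dict per combination of the encoder's parameter values."""
    axes = {key: as_list(encoder[key]) for key in SWEEPABLE if key in encoder}
    return [
        dict(zip(axes, combo), format=encoder["format"])
        for combo in product(*axes.values())
    ]


if __name__ == "__main__":
    sweep = {
        "format": "avif",
        "quality": {"start": 30, "stop": 90, "step": 5},
        "speed": [2, 4, 6],
    }
    variants = encoder_variants(sweep)
    print(len(variants), "variants per image")
```

For the avif-deep-sweep example this gives 13 quality values × 3 speeds = 39 variants per image; multiply by max_images to get the total number of encodes.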