
analysis

Analysis and visualization module.

This module analyzes and visualizes compression efficiency and quality metrics derived from a study's quality measurements.

  • BITS_PER_BYTE
  • METRIC_DIRECTIONS
  • MARKERS

load_quality_results(quality_json_path: Path) → dict[str, Any]

Load quality measurement results from JSON file.

Args:

  • quality_json_path: Path to quality.json file

Returns: Quality results dictionary

Raises:

  • FileNotFoundError: If the file doesn’t exist
  • ValueError: If the JSON has no ‘measurements’ field
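The documented contract (return the parsed dictionary, raise on a missing file or a missing 'measurements' field) can be sketched as below. This is an illustrative stand-in, not the module's source; the function name `load_quality_results_sketch` and the error messages are assumptions.

```python
import json
from pathlib import Path
from typing import Any


def load_quality_results_sketch(quality_json_path: Path) -> dict[str, Any]:
    """Illustrative stand-in for load_quality_results (not the module's code)."""
    if not quality_json_path.exists():
        raise FileNotFoundError(f"Quality results not found: {quality_json_path}")
    with quality_json_path.open(encoding="utf-8") as f:
        results = json.load(f)
    # The documented contract requires a 'measurements' field.
    if "measurements" not in results:
        raise ValueError(f"No 'measurements' field in {quality_json_path}")
    return results
```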

create_analysis_dataframe(quality_results: dict[str, Any]) → DataFrame

Create analysis DataFrame from quality measurements.

Args:

  • quality_results: Quality results dictionary

Returns: DataFrame with all measurements and derived metrics
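The page does not list the derived metrics, but the names used elsewhere (bits_per_pixel, bits_per_ssimulacra2_per_pixel, BITS_PER_BYTE) suggest arithmetic along the following lines. This is a sketch under assumptions: the input names (`file_size_bytes`, `width`, `height`), the value of `BITS_PER_BYTE`, and the reading of "bits per quality per pixel" as bits_per_pixel divided by the quality score are not confirmed by the source.

```python
BITS_PER_BYTE = 8  # assumed value of the module constant of the same name


def derive_metrics_sketch(
    file_size_bytes: int, width: int, height: int, ssimulacra2: float
) -> dict[str, float]:
    """Hypothetical derivation of per-pixel metrics for one measurement."""
    pixels = width * height
    bits_per_pixel = file_size_bytes * BITS_PER_BYTE / pixels
    # Assumed interpretation: bits spent per unit of quality, per pixel.
    # Undefined for non-positive scores, so report NaN in that case.
    efficiency = bits_per_pixel / ssimulacra2 if ssimulacra2 > 0 else float("nan")
    return {
        "bits_per_pixel": bits_per_pixel,
        "bits_per_ssimulacra2_per_pixel": efficiency,
    }
```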


compute_statistics(df: DataFrame, group_by: list[str]) → DataFrame

Compute statistical aggregates for quality and efficiency metrics.

Args:

  • df: DataFrame with measurements
  • group_by: Columns to group by (e.g., [‘format’, ‘quality’, ‘chroma_subsampling’])

Returns: DataFrame with the mean and the 5th, 25th, 50th, 75th and 95th percentiles for each metric
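A grouped mean-plus-percentiles aggregation of this kind can be written with pandas as follows. This is a minimal sketch, not the module's implementation: the explicit `metrics` argument and the column naming scheme (`<metric>_mean`, `<metric>_p05`, ...) are assumptions, chosen to match the p05/p95 suffixes mentioned elsewhere on this page.

```python
import pandas as pd

PERCENTILES = [0.05, 0.25, 0.50, 0.75, 0.95]


def compute_statistics_sketch(
    df: pd.DataFrame, group_by: list[str], metrics: list[str]
) -> pd.DataFrame:
    """Mean and percentiles per group, one row per group (illustrative)."""
    grouped = df.groupby(group_by)
    parts = []
    for metric in metrics:
        # quantile() returns a Series indexed by (group keys..., quantile);
        # unstack() moves the quantile level into columns.
        pct = grouped[metric].quantile(PERCENTILES).unstack()
        pct.columns = [f"{metric}_p{int(round(q * 100)):02d}" for q in pct.columns]
        pct.insert(0, f"{metric}_mean", grouped[metric].mean())
        parts.append(pct)
    return pd.concat(parts, axis=1).reset_index()
```

The result has one row per combination of the `group_by` columns, which is the shape the plotting functions below expect from their `stats` argument.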


determine_varying_parameters(df: DataFrame) → list[str]

Determine which parameters vary in the dataframe.

Args:

  • df: DataFrame with measurements

Returns: List of parameter columns that have more than one unique value
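The "more than one unique value" test maps directly onto `Series.nunique`. The sketch below assumes a hypothetical list of candidate parameter columns; the set of columns the module actually inspects is not stated here.

```python
import pandas as pd

# Hypothetical candidate list; the real module's parameter columns may differ.
PARAMETER_COLUMNS = ["format", "quality", "chroma_subsampling", "speed"]


def varying_parameters_sketch(
    df: pd.DataFrame, candidates: list[str] = PARAMETER_COLUMNS
) -> list[str]:
    """Return candidate columns present in df with more than one unique value."""
    return [c for c in candidates if c in df.columns and df[c].nunique() > 1]
```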


determine_sweep_parameter(df: DataFrame) → str

Determine which parameter has the longest sweep range for primary x-axis.

Args:

  • df: DataFrame with measurements

Returns: Column name of the parameter with most unique values


determine_secondary_sweep_parameter(df: DataFrame, primary: str) → str | None

Determine the second longest sweep parameter for grouping.

This is used for rate-distortion plots where we want to connect points along the second-most varied parameter.

Args:

  • df: DataFrame with measurements
  • primary: The primary sweep parameter (to exclude)

Returns: Column name of the parameter with second-most unique values, or None
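Both the primary and the secondary sweep parameter follow from ranking the varying columns by their number of unique values. The sketch below is one way to do that, under assumptions: the candidate list is passed in explicitly, ties keep candidate order, and `None` is returned when nothing varies (the documented determine_sweep_parameter returns a plain `str`, so its behavior in that case may differ).

```python
from typing import Optional

import pandas as pd


def pick_sweep_parameters_sketch(
    df: pd.DataFrame, candidates: list[str]
) -> tuple[Optional[str], Optional[str]]:
    """Return (primary, secondary) by descending unique-value count (illustrative)."""
    counts = {c: int(df[c].nunique()) for c in candidates if c in df.columns}
    # sorted() is stable, so parameters with equal counts keep candidate order.
    ranked = sorted(
        (c for c, n in counts.items() if n > 1),
        key=lambda c: counts[c],
        reverse=True,
    )
    primary = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary
```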


resolve_axis_parameters(
df: DataFrame,
stats: DataFrame,
x_axis: str | None = None,
group_by: str | None = None,
study_config_path: Path | None = None,
quiet: bool = False
) → tuple[str, str | None]

Determine the primary x-axis and secondary grouping parameters.

Resolution order (for both x_axis and group_by):

  1. Explicit argument (x_axis / group_by).
  2. Study configuration file (analysis.x_axis / analysis.group_by).
  3. Built-in heuristic: parameter with most / second-most unique values.

This helper centralises the logic so that the static SVG analysis and the interactive Plotly report produce identical axis choices.

Args:

  • df: Analysis DataFrame (used by the heuristic).
  • stats: Statistics DataFrame (used for column-existence checks).
  • x_axis: Explicit override for the x-axis parameter.
  • group_by: Explicit override for the grouping parameter.
  • study_config_path: Optional path to the study configuration JSON — analysis.x_axis and analysis.group_by are read from it when present.
  • quiet: Suppress informational prints.

Returns: (x_param, secondary_param) — resolved x-axis and grouping parameter (the latter may be None).
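The three-step resolution order can be sketched as a simple fallback chain. Several details here are assumptions rather than documented behavior: that analysis.x_axis means a top-level "analysis" object with an "x_axis" key in the JSON file, that a candidate absent from the statistics columns is skipped rather than rejected with an error, and the helper names themselves.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_analysis_config_sketch(study_config_path: Optional[Path]) -> dict[str, Any]:
    """Return the 'analysis' section of the study config, or {} if unavailable."""
    if study_config_path is None or not study_config_path.exists():
        return {}
    config = json.loads(study_config_path.read_text(encoding="utf-8"))
    return config.get("analysis", {})


def resolve_parameter_sketch(
    explicit: Optional[str],
    configured: Optional[str],
    heuristic: Optional[str],
    available_columns: set[str],
) -> Optional[str]:
    """Explicit argument first, then config value, then the heuristic."""
    for candidate in (explicit, configured):
        # Assumed policy: ignore overrides that name a column we do not have.
        if candidate is not None and candidate in available_columns:
            return candidate
    return heuristic
```

Calling `resolve_parameter_sketch` once for the x-axis and once for the grouping parameter yields a pair analogous to the documented `(x_param, secondary_param)` return value.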


get_worst_percentile_col(metric: str) → str

Get the appropriate percentile column for worst-case values.

For “higher is better” metrics, worst = p05 (lowest 5%).
For “lower is better” metrics, worst = p95 (highest 5%).

Args:

  • metric: Metric name

Returns: Column suffix for the worst percentile
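The direction-dependent choice amounts to a table lookup. In the sketch below the contents of `METRIC_DIRECTIONS`, its value encoding ("higher" / "lower"), and the default for unknown metrics are all assumptions; only the p05/p95 rule comes from the description above.

```python
# Hypothetical contents; the module's METRIC_DIRECTIONS may use other keys or values.
METRIC_DIRECTIONS = {
    "ssimulacra2": "higher",  # higher score means better quality
    "butteraugli": "lower",   # lower distance means better quality
}


def worst_percentile_suffix_sketch(metric: str) -> str:
    """Return 'p05' for higher-is-better metrics and 'p95' for lower-is-better ones."""
    # Assumed default: treat unknown metrics as higher-is-better.
    direction = METRIC_DIRECTIONS.get(metric, "higher")
    return "p05" if direction == "higher" else "p95"
```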


plot_quality_metrics(
stats: DataFrame,
x_param: str,
metric: str,
output_path: Path,
title: str | None = None
) → None

Plot mean and worst percentile quality metrics.

Args:

  • stats: Statistics DataFrame
  • x_param: Parameter to use as x-axis (can be grouping column or _mean statistic)
  • metric: Metric to plot (e.g., ‘ssimulacra2’)
  • output_path: Path to save plot (WebP format)
  • title: Optional custom title

plot_rate_distortion(
stats: DataFrame,
metric: str,
grouping_param: str | None,
output_path: Path,
title: str | None = None,
primary_param: str | None = None
) → None

Plot quality metric vs bits_per_pixel (rate-distortion curve).

Args:

  • stats: Statistics DataFrame
  • metric: Quality metric to plot (e.g., ‘ssimulacra2’)
  • grouping_param: Parameter to group lines by (e.g., ‘format’, ‘chroma_subsampling’)
  • output_path: Path to save plot (WebP format)
  • title: Optional custom title
  • primary_param: Primary sweep parameter to sort points by (e.g., ‘quality’, ‘speed’). When provided, points within each group are connected in the order of this parameter rather than by bits_per_pixel, giving a meaningful line for non-monotonic sweeps such as speed or effort settings.
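The effect of primary_param on point order can be seen with a small example. The data values below are made up for illustration, and the column names (`bits_per_pixel_mean`, `ssimulacra2_mean`) and the helper `order_points_sketch` are assumptions, not the module's code.

```python
from typing import Optional

# Made-up per-setting means for one group, for illustration only.
points = [
    {"speed": 2, "bits_per_pixel_mean": 0.90, "ssimulacra2_mean": 71.0},
    {"speed": 0, "bits_per_pixel_mean": 0.95, "ssimulacra2_mean": 74.0},
    {"speed": 1, "bits_per_pixel_mean": 0.88, "ssimulacra2_mean": 72.5},
]


def order_points_sketch(rows: list[dict], primary_param: Optional[str] = None) -> list[dict]:
    """Sort by the sweep parameter when given, else by bits per pixel."""
    key = primary_param if primary_param is not None else "bits_per_pixel_mean"
    return sorted(rows, key=lambda row: row[key])


by_rate = [p["speed"] for p in order_points_sketch(points)]
by_speed = [p["speed"] for p in order_points_sketch(points, "speed")]
```

Here bits per pixel does not change monotonically with speed, so sorting by rate connects the points in the order speed 1, 2, 0, while sorting by `speed` connects them as 0, 1, 2 and traces the sweep itself.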

plot_efficiency_metrics(
stats: DataFrame,
x_param: str,
efficiency_metric: str,
output_path: Path,
title: str | None = None
) → None

Plot encoder efficiency metrics (bits per quality per pixel).

Args:

  • stats: Statistics DataFrame
  • x_param: Parameter to use as x-axis
  • efficiency_metric: Efficiency metric (e.g., ‘bits_per_ssimulacra2_per_pixel’)
  • output_path: Path to save plot (WebP format)
  • title: Optional custom title

plot_bits_per_pixel(
stats: DataFrame,
x_param: str,
output_path: Path,
title: str | None = None
) → None

Plot bits per pixel with mean, 5th and 95th percentiles.

Args:

  • stats: Statistics DataFrame
  • x_param: Parameter to use as x-axis
  • output_path: Path to save plot (WebP format)
  • title: Optional custom title

plot_encoding_time_per_pixel(
stats: DataFrame,
x_param: str,
output_path: Path,
title: str | None = None
) → None

Plot encoding time per pixel with mean, 5th and 95th percentiles.

Automatically uses logarithmic scale if the dynamic range is large (>10x).

Args:

  • stats: Statistics DataFrame
  • x_param: Parameter to use as x-axis
  • output_path: Path to save plot (SVG format)
  • title: Optional custom title
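The automatic log-scale rule can be expressed as a ratio test on the plotted values. This is a sketch under assumptions: which series enter the comparison (mean only, or the percentiles as well) and how non-positive values are treated are not specified on this page.

```python
from typing import Iterable


def should_use_log_scale_sketch(values: Iterable[float], threshold: float = 10.0) -> bool:
    """True when max/min of the positive values exceeds the threshold."""
    # Assumed handling: non-positive values cannot appear on a log axis, so skip them.
    positive = [v for v in values if v > 0]
    if not positive:
        return False
    return max(positive) / min(positive) > threshold
```

With matplotlib, the result would typically be applied via `ax.set_yscale("log")`.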

analyze_study(
quality_json_path: Path,
output_dir: Path,
x_axis: str | None = None,
group_by: str | None = None,
study_config_path: Path | None = None
) → None

Run complete analysis for a study.

Generates:

  • CSV with statistics
  • Quality metric plots vs sweep parameter (mean + 5% worst)
  • Quality metric plots vs bits_per_pixel (rate-distortion curves)
  • Bits per pixel plots (mean, 5th and 95th percentiles)
  • Encoding time per pixel plots (mean, 5th and 95th percentiles)
  • Efficiency metric plots vs sweep parameter (mean + 5% worst)

Parameter resolution order for x_axis and group_by:

  1. Explicit CLI argument (x_axis / group_by parameters).
  2. Study configuration file (analysis.x_axis / analysis.group_by).
  3. Built-in heuristic: parameter with most / second-most unique values.

Args:

  • quality_json_path: Path to quality.json file
  • output_dir: Directory to save analysis outputs
  • x_axis: Override for primary x-axis parameter. When None the value from the study config or the heuristic is used.
  • group_by: Override for secondary (line-grouping) parameter. When None the value from the study config or the heuristic is used.
  • study_config_path: Optional path to the study configuration JSON file. When provided, analysis.x_axis and analysis.group_by are read from it.