analysis
module src.analysis
Analysis and visualization module.
This module analyzes and visualizes compression-efficiency and quality metrics derived from a study's quality measurements.
Global Variables
- BITS_PER_BYTE
- METRIC_DIRECTIONS
- MARKERS
function load_quality_results
load_quality_results(quality_json_path: Path) → dict[str, Any]
Load quality measurement results from a JSON file.
Args:
quality_json_path: Path to quality.json file
Returns: Quality results dictionary
Raises:
FileNotFoundError: If the file doesn’t exist.
ValueError: If the JSON has no ‘measurements’ field.
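A minimal sketch of the documented contract, assuming quality.json holds a JSON object with a top-level measurements key. It illustrates only the two error cases and is not the module's own code.

```python
import json
from pathlib import Path
from typing import Any


def load_quality_results(quality_json_path: Path) -> dict[str, Any]:
    """Sketch: load quality.json and enforce the documented error contract."""
    if not quality_json_path.exists():
        raise FileNotFoundError(f"Quality results not found: {quality_json_path}")
    with quality_json_path.open(encoding="utf-8") as fh:
        results = json.load(fh)
    # Assumption: the top level is a JSON object holding a 'measurements' entry.
    if not isinstance(results, dict) or "measurements" not in results:
        raise ValueError(f"No 'measurements' field in {quality_json_path}")
    return results
```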
function create_analysis_dataframe
create_analysis_dataframe(quality_results: dict[str, Any]) → DataFrame
Create an analysis DataFrame from quality measurements.
Args:
quality_results: Quality results dictionary
Returns: DataFrame with all measurements and derived metrics
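Bits per pixel, the x-axis of the rate-distortion plots, is a natural candidate for such a derived metric. A sketch of how it could be computed, assuming each measurement records the encoded file size in bytes plus the image width and height, and that BITS_PER_BYTE equals 8; the helper and its argument names are hypothetical.

```python
BITS_PER_BYTE = 8  # assumed value of the module-level constant


def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Hypothetical helper: encoded size in bits divided by the pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size_bytes * BITS_PER_BYTE / (width * height)


# Example: a 1000-byte file for a 100x80 image spends 1.0 bit per pixel.
print(bits_per_pixel(1000, 100, 80))
```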
function compute_statistics
compute_statistics(df: DataFrame, group_by: list[str]) → DataFrame
Compute statistical aggregates for quality and efficiency metrics.
Args:
df: DataFrame with measurements
group_by: Columns to group by (e.g., [‘format’, ‘quality’, ‘chroma_subsampling’])
Returns: DataFrame with the mean and the 5th, 25th, 50th, 75th and 95th percentiles of each metric
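The aggregation can be sketched in plain Python over a list of row dictionaries instead of a DataFrame. Output column names of the form metric_mean and metric_p05 are an assumption inferred from the _mean and p05/p95 references elsewhere on this page. Percentiles use linear interpolation between closest ranks, which is also the default in NumPy and pandas.

```python
from collections import defaultdict
from statistics import mean

PERCENTILES = (5, 25, 50, 75, 95)


def percentile(values: list[float], q: float) -> float:
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def compute_statistics(rows: list[dict], group_by: list[str],
                       metrics: list[str]) -> list[dict]:
    """Sketch: mean and p05..p95 of each metric for every group_by combination."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in group_by)].append(row)

    stats = []
    for key, members in groups.items():
        record = dict(zip(group_by, key))  # the grouping columns themselves
        for metric in metrics:
            values = [member[metric] for member in members]
            record[f"{metric}_mean"] = mean(values)
            for p in PERCENTILES:
                record[f"{metric}_p{p:02d}"] = percentile(values, p)
        stats.append(record)
    return stats
```

With pandas, the same result would typically come from DataFrame.groupby followed by mean() and quantile(); this page does not say which calls the module actually uses.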
function determine_varying_parameters
determine_varying_parameters(df: DataFrame) → list[str]
Determine which parameters vary in the DataFrame.
Args:
df: DataFrame with measurements
Returns: List of parameter columns that have more than one unique value
function determine_sweep_parameter
determine_sweep_parameter(df: DataFrame) → str
Determine which parameter has the longest sweep range, for use as the primary x-axis.
Args:
df: DataFrame with measurements
Returns: Column name of the parameter with the most unique values
function determine_secondary_sweep_parameter
determine_secondary_sweep_parameter(df: DataFrame, primary: str) → str | None
Determine the second-longest sweep parameter for grouping.
This is used for rate-distortion plots where we want to connect points along the second-most varied parameter.
Args:
df: DataFrame with measurements
primary: The primary sweep parameter (to exclude)
Returns: Column name of the parameter with the second-most unique values, or None
function resolve_axis_parameters
resolve_axis_parameters(df: DataFrame, stats: DataFrame, x_axis: str | None = None, group_by: str | None = None, study_config_path: Path | None = None, quiet: bool = False) → tuple[str, str | None]
Determine the primary x-axis and secondary grouping parameters.
Resolution order (for both x_axis and group_by):
1. Explicit argument (x_axis / group_by).
2. Study configuration file (analysis.x_axis / analysis.group_by).
3. Built-in heuristic: the parameter with the most / second-most unique values.
This helper centralises the logic so that the static SVG analysis and the interactive Plotly report produce identical axis choices.
Args:
df: Analysis DataFrame (used by the heuristic).
stats: Statistics DataFrame (used for column-existence checks).
x_axis: Explicit override for the x-axis parameter.
group_by: Explicit override for the grouping parameter.
study_config_path: Optional path to the study configuration JSON; analysis.x_axis and analysis.group_by are read from it when present.
quiet: Suppress informational prints.
Returns:
(x_param, secondary_param): the resolved x-axis and grouping parameters (the latter may be None).
function get_worst_percentile_col
get_worst_percentile_col(metric: str) → str
Get the appropriate percentile column for worst-case values.
For “higher is better” metrics, worst = p05 (lowest 5%).
For “lower is better” metrics, worst = p95 (highest 5%).
Args:
metric: Metric name
Returns: Column suffix for the worst percentile
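A sketch of the direction lookup, assuming METRIC_DIRECTIONS maps metric names to "higher" or "lower". The entries shown are illustrative (SSIMULACRA2 and PSNR are higher-is-better scores, Butteraugli is a lower-is-better distance), and the fallback for unknown metrics is an assumption.

```python
# Hypothetical contents: the module defines METRIC_DIRECTIONS, but its
# entries are not listed on this page.
METRIC_DIRECTIONS = {
    "ssimulacra2": "higher",  # higher score = better quality
    "psnr": "higher",         # higher dB = better quality
    "butteraugli": "lower",   # lower distance = better quality
}


def get_worst_percentile_col(metric: str) -> str:
    """Return the percentile suffix holding the worst-case values for metric."""
    # Assumption: unknown metrics default to "higher is better".
    direction = METRIC_DIRECTIONS.get(metric, "higher")
    return "p05" if direction == "higher" else "p95"


print(get_worst_percentile_col("ssimulacra2"))  # p05
print(get_worst_percentile_col("butteraugli"))  # p95
```

The suffix would presumably be joined to the metric name (for example ssimulacra2_p05) to select a column of the statistics table.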
function plot_quality_metrics
plot_quality_metrics(stats: DataFrame, x_param: str, metric: str, output_path: Path, title: str | None = None) → None
Plot mean and worst-percentile quality metrics.
Args:
stats: Statistics DataFrame
x_param: Parameter to use as the x-axis (can be a grouping column or a _mean statistic)
metric: Metric to plot (e.g., ‘ssimulacra2’)
output_path: Path to save plot (WebP format)
title: Optional custom title
function plot_rate_distortion
plot_rate_distortion(stats: DataFrame, metric: str, grouping_param: str | None, output_path: Path, title: str | None = None, primary_param: str | None = None) → None
Plot quality metric vs bits_per_pixel (rate-distortion curve).
Args:
stats: Statistics DataFrame
metric: Quality metric to plot (e.g., ‘ssimulacra2’)
grouping_param: Parameter to group lines by (e.g., ‘format’, ‘chroma_subsampling’)
output_path: Path to save plot (WebP format)
title: Optional custom title
primary_param: Primary sweep parameter to sort points by (e.g., ‘quality’, ‘speed’). When provided, points within each group are connected in the order of this parameter rather than by bits_per_pixel, giving a meaningful line for non-monotonic sweeps such as speed or effort settings.
function plot_efficiency_metrics
plot_efficiency_metrics(stats: DataFrame, x_param: str, efficiency_metric: str, output_path: Path, title: str | None = None) → None
Plot encoder efficiency metrics (bits per quality per pixel).
Args:
stats: Statistics DataFrame
x_param: Parameter to use as the x-axis
efficiency_metric: Efficiency metric (e.g., ‘bits_per_ssimulacra2_per_pixel’)
output_path: Path to save plot (WebP format)
title: Optional custom title
function plot_bits_per_pixel
plot_bits_per_pixel(stats: DataFrame, x_param: str, output_path: Path, title: str | None = None) → None
Plot bits per pixel with mean, 5th and 95th percentiles.
Args:
stats: Statistics DataFrame
x_param: Parameter to use as the x-axis
output_path: Path to save plot (WebP format)
title: Optional custom title
function plot_encoding_time_per_pixel
plot_encoding_time_per_pixel(stats: DataFrame, x_param: str, output_path: Path, title: str | None = None) → None
Plot encoding time per pixel with mean, 5th and 95th percentiles.
Automatically uses a logarithmic scale if the dynamic range is large (>10x).
Args:
stats: Statistics DataFrame
x_param: Parameter to use as the x-axis
output_path: Path to save plot (SVG format)
title: Optional custom title
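The >10x rule could look like this, assuming the check compares the largest and smallest positive values being plotted (zero or negative values cannot appear on a log axis). The function name is hypothetical.

```python
def use_log_scale(values: list[float], threshold: float = 10.0) -> bool:
    """True when max/min of the positive values exceeds threshold (assumed rule)."""
    positive = [v for v in values if v > 0]
    if not positive:
        return False  # nothing a log axis could display
    return max(positive) / min(positive) > threshold
```

When it returns True, a matplotlib-based plot would switch the y-axis with ax.set_yscale("log").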
function analyze_study
analyze_study(quality_json_path: Path, output_dir: Path, x_axis: str | None = None, group_by: str | None = None, study_config_path: Path | None = None) → None
Run the complete analysis for a study.
Generates:
- CSV with statistics
- Quality metric plots vs sweep parameter (mean + 5% worst)
- Quality metric plots vs bits_per_pixel (rate-distortion curves)
- Bits per pixel plots (mean + 5% smallest + 95% largest)
- Encoding time per pixel plots (mean + 5% fastest + 95% slowest)
- Efficiency metric plots vs sweep parameter (mean + 5% worst)
Parameter resolution order for x_axis and group_by:
1. Explicit CLI argument (x_axis / group_by parameters).
2. Study configuration file (analysis.x_axis / analysis.group_by).
3. Built-in heuristic: the parameter with the most / second-most unique values.
Args:
quality_json_path: Path to quality.json file
output_dir: Directory to save analysis outputs
x_axis: Override for the primary x-axis parameter. When None, the value from the study config or the heuristic is used.
group_by: Override for the secondary (line-grouping) parameter. When None, the value from the study config or the heuristic is used.
study_config_path: Optional path to the study configuration JSON file. When provided, analysis.x_axis and analysis.group_by are read from it.