pipeline

Merged encode+measure pipeline with time-budget support.

This module provides a unified pipeline that processes images with a worker-per-image architecture, encoding and measuring quality in a single pass. This eliminates the need to store intermediate encoded files on disk and allows time-budget-based processing where the pipeline processes as many images as possible within a given time constraint.

Architecture:

  • Each worker processes one complete image at a time (all encoding tasks sequentially)
  • Workers pull the next image from a queue when they finish
  • This keeps all workers fully utilized throughout the pipeline
  • Memory-intensive operations are naturally staggered across workers, reducing peak memory usage

Key advantages over the separate encode → measure workflow:

  • Time-budget control: Set a wall-clock time limit instead of guessing how many images to process. The pipeline processes as many images as possible within the budget.
  • Full worker utilization: Workers always have work available; none sit idle waiting for other tasks to complete.
  • Reduced peak memory: Tasks are staggered across workers rather than synchronized, preventing memory spikes from parallel execution of memory-intensive tools.
  • Reduced disk IO: Encoded files are written to temporary storage and cleaned up after measurement. Optional save_artifacts flag persists them to disk.
  • Per-image error isolation: All operations for one image are grouped within a single worker. If encoding or measurement fails, the worker logs the error and moves to the next image.

parse_time_budget(value: str) → float

Parse a human-readable time budget string into seconds.

Accepted formats:

  • Plain number: interpreted as seconds ("3600" → 3600.0)
  • Duration suffixes: "1h", "30m", "90s", "1h30m", "2h15m30s"

Args:

  • value: Time budget string.

Returns: Duration in seconds.

Raises:

  • ValueError: If the format cannot be parsed.
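A minimal sketch of a parser matching the accepted formats above; this is an illustration, not necessarily the module's actual implementation:

```python
import re

def parse_time_budget(value: str) -> float:
    """Parse a time budget like "3600", "90s", "30m", or "1h30m" into seconds."""
    value = value.strip().lower()
    # Plain number: interpreted directly as seconds.
    try:
        return float(value)
    except ValueError:
        pass
    # Duration suffixes: optional hour, minute, second components in order.
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", value)
    if not match or not any(match.groups()):
        raise ValueError(f"Cannot parse time budget: {value!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return float(hours * 3600 + minutes * 60 + seconds)
```

For example, `parse_time_budget("1h30m")` yields `5400.0`, and a string with no recognizable component raises `ValueError`.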

Merged encode+measure pipeline with time-budget support.

Uses a worker-per-image architecture where each worker processes one complete image before moving to the next. For each image, the worker:

  1. Preprocesses (resize) for every configured resolution.
  2. Encodes all parameter combinations sequentially.
  3. Measures quality of each encoded variant.
  4. Pulls the next image from the queue if the time budget allows.

This architecture keeps all workers fully utilized and naturally staggers memory-intensive operations across workers, reducing peak memory usage.
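The worker-per-image loop can be sketched as follows. The helper names (`process_image`, `run_pipeline`) and the toy encode/measure bodies are hypothetical stand-ins for the real steps:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for the real per-image routine: one worker runs all
# of an image's preprocessing, encoding, and measurement sequentially.
def process_image(image, resolutions, params):
    results = []
    for res in resolutions:
        resized = f"{image}@{res}"          # 1. preprocess (resize)
        for p in params:
            encoded = f"{resized}|{p}"      # 2. encode each parameter combo
            results.append(len(encoded))    # 3. measure the encoded variant
    return results

def run_pipeline(images, resolutions, params, num_workers=4):
    all_results = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # 4. The executor's internal queue hands the next image to whichever
        # worker frees up first, keeping workers busy and naturally
        # staggering memory-intensive steps across them.
        futures = {pool.submit(process_image, img, resolutions, params): img
                   for img in images}
        for fut in as_completed(futures):
            try:
                all_results.extend(fut.result())
            except Exception as exc:  # per-image error isolation
                print(f"{futures[fut]} failed: {exc}")
    return all_results
```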

Time budget behavior:

  • Initial batch fills all available workers (max throughput at start)
  • Budget is checked before submitting additional images
  • When budget expires, new submissions stop but in-flight work completes
  • Note: In-flight images process sequentially on their assigned workers, which may leave some workers idle during the finish phase. A future optimization could switch to task-level parallelism after budget expiry.
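The submission side of this behavior can be sketched as below, assuming hypothetical names (`run_with_budget`, `work_fn`) rather than the module's real API:

```python
import itertools
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_budget(images, work_fn, num_workers, budget_s):
    """Fill all workers up front, then refill only while budget remains."""
    start = time.monotonic()
    queue = iter(images)
    results = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Initial batch fills every worker for maximum throughput at start.
        pending = {pool.submit(work_fn, img)
                   for img in itertools.islice(queue, num_workers)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
            # Budget is checked before submitting additional images; once it
            # expires, no new work is submitted but in-flight futures finish.
            if time.monotonic() - start < budget_s:
                for img in itertools.islice(queue, len(done)):
                    pending.add(pool.submit(work_fn, img))
    return results
```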

Encoded files live in a temporary directory and are discarded after measurement unless save_artifacts=True.

__init__(project_root: Path) → None

run(
config: StudyConfig,
time_budget: float | None = None,
save_artifacts: bool = False,
num_workers: int | None = None
) → QualityResults

Run the merged encode+measure pipeline.

Args:

  • config: Study configuration describing dataset, encoders, and optional preprocessing.
  • time_budget: Maximum wall-clock seconds to spend. When set, the pipeline processes images until this budget is exhausted (always completing the current image). None means process all available images.
  • save_artifacts: If True, persist encoded files to data/encoded/<study_id>/.
  • num_workers: Parallel workers (default: CPU count).

Returns:

  • QualityResults ready for analysis / reporting.

Raises:

  • ValueError: If dataset is not found in configuration.
  • FileNotFoundError: If dataset is not downloaded or has no images.
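A usage sketch of the documented interface. The class and config names here (`MergedPipeline`, `StudyConfig`) are illustrative stand-ins, stubbed out so the snippet is self-contained; only the `__init__` and `run` signatures mirror the documentation above:

```python
from pathlib import Path

# Stub standing in for the real study configuration.
class StudyConfig:
    def __init__(self, dataset, encoders):
        self.dataset = dataset
        self.encoders = encoders

# Stub with the documented signatures; the real class encodes and measures.
class MergedPipeline:
    def __init__(self, project_root: Path) -> None:
        self.project_root = project_root

    def run(self, config, time_budget=None, save_artifacts=False,
            num_workers=None):
        return {"dataset": config.dataset, "budget": time_budget}

pipeline = MergedPipeline(project_root=Path("."))
results = pipeline.run(
    StudyConfig(dataset="kodak", encoders=["jpeg"]),
    time_budget=5400.0,   # e.g. the result of parsing "1h30m"
    save_artifacts=False,
    num_workers=8,
)
```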