Public research with GitHub Actions

This project uses GitHub Actions not just for CI/CD, but as a public research platform. Encoding studies run on GitHub-hosted runners, and the results — including every parameter, tool version, and measurement — are published openly through GitHub Releases and GitHub Pages.

Because this is an open-source repository, every workflow run is visible: anyone can inspect the exact execution logs, configuration inputs, and timing of each study. This transparency is the foundation of reproducible image format research.

GitHub provides generous free-tier runners for public repositories:

| Resource | Specification |
| --- | --- |
| CPU | 4 cores (x86_64) |
| RAM | 16 GB |
| Storage | 14 GB SSD |
| Job timeout | 6 hours |
| Concurrent jobs | 20 |

This is sufficient to run studies on the DIV2K validation dataset (100 images, ~450 MB) with multiple format and parameter combinations within a single workflow run.

Every workflow run records:

  • Input parameters — time budget, study selection
  • Tool versions — encoder and metric tool versions are logged and saved in quality results JSON
  • Execution logs — full stdout/stderr for every step
  • Timing — per-step durations visible in the Actions UI

Anyone can verify what was run and how by examining the workflow run page.

The project enforces single-threaded encoding (avifenc -j 1, cjxl --num_threads=1) so that results are deterministic regardless of the runner’s CPU count. Combined with pinned tool versions in the Dockerfile (e.g., libjxl v0.11.2, libavif v1.3.0, libaom v3.13.1), this ensures that running the same study twice produces identical encoded files and quality measurements.
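A hedged sketch of how such an encode step might look inside a workflow (the step name, file paths, and quality settings here are illustrative, not taken from the repository):

```yaml
# Illustrative workflow step; paths, step name, and quality values are hypothetical.
- name: Encode deterministically (single-threaded)
  run: |
    # avifenc: -j 1 restricts libavif/libaom to a single worker thread
    avifenc -j 1 -q 60 input.png output.avif
    # cjxl: --num_threads=1 does the same for libjxl
    cjxl input.png output.jxl --num_threads=1 -q 90
```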

The study workflow (.github/workflows/study.yml) is triggered manually via workflow_dispatch and orchestrates seven jobs:

```
build-image ──┐
              ├── fetch-dataset ──┐
prepare ──────┘                   │
                                  ├── run-study (parallel per study)
                                  │     ├── pipeline (encode + measure)
                                  │     └── analyze (statistics + plots)
                                  ├── run-comparison (parallel per study, after run-study)
                                  │     └── generate visual comparison figures
                                  │         (re-encodes from dataset; no pipeline artefacts needed)
                                  └── generate-report (after run-comparison)
                                        ├── interactive HTML report
                                        ├── release notes
                                        └── release assets (CSV)

deploy-report → GitHub Pages
release → GitHub Release
```
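A manual `workflow_dispatch` trigger with a time budget and study selection might be declared roughly as follows (the input names and defaults are illustrative assumptions, not the repository's actual schema):

```yaml
# .github/workflows/study.yml (sketch) — input names and defaults are hypothetical
on:
  workflow_dispatch:
    inputs:
      studies:
        description: "Comma-separated list of studies to run"
        required: true
        default: "all"
      time_budget_minutes:
        description: "Per-study encoding time budget in minutes"
        required: false
        default: "120"
```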
  • Dev container image: The workflow builds the same dev container used for local development and pushes it to GHCR. Every subsequent job runs inside this image, ensuring tool parity between CI and local environments.

  • Parallel studies: Each study runs as an independent matrix job. A format-comparison study and an AVIF speed sweep run simultaneously on separate runners, maximizing throughput within the 6-hour window.

  • Separate comparison job: Visual comparison figure generation (run-comparison) runs as a dedicated job after run-study, downloading the dataset and study results as artifacts. This keeps the pipeline job focused on pure encode-and-measure, and lets comparison parameters be tuned independently. Studies without comparison targets produce no output; the artifact upload uses if-no-files-found: ignore to handle this gracefully.

  • Artifact pipeline: Study results flow through GitHub Actions artifacts. The dataset is fetched once and shared. Each study uploads its metrics and analysis. The report job downloads all results and generates the final report.
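The matrix-and-artifact pattern described above can be sketched as follows; the job names, script path, and artifact names are illustrative assumptions, while `if-no-files-found: ignore` and the `pattern`/`merge-multiple` inputs are real options of `actions/upload-artifact@v4` and `actions/download-artifact@v4`:

```yaml
# Sketch only — study names, script path, and artifact names are hypothetical.
run-study:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      study: [format-comparison, avif-speed-sweep]
  steps:
    - name: Run pipeline and analysis for one study
      run: ./scripts/run_study.sh "${{ matrix.study }}"   # hypothetical script
    - name: Upload study results
      uses: actions/upload-artifact@v4
      with:
        name: results-${{ matrix.study }}
        path: results/
        if-no-files-found: ignore   # studies without comparison output are fine

generate-report:
  needs: run-study
  runs-on: ubuntu-latest
  steps:
    - name: Collect results from all studies
      uses: actions/download-artifact@v4
      with:
        pattern: results-*
        merge-multiple: true
        path: results/
```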

The report is deployed to the report/ subdirectory of GitHub Pages. It includes interactive Plotly visualizations for rate-distortion curves, quality-vs-parameter plots, and visual comparisons with Butteraugli distortion maps.

Each workflow run creates a timestamped release (e.g., study-20260228-143000) containing:

  • Release notes — auto-generated markdown summarizing studies, datasets, tool versions, and key findings
  • CSV statistics files — per-study statistical summaries suitable for independent re-analysis in any tool (Excel, R, Python, etc.)

The CSV files include per-format, per-quality-level aggregated statistics (mean, median, percentiles) for all quality metrics, file sizes, encoding times, and derived efficiency metrics.

All data from the publicly performed research is available for re-analysis:

| Output | Location | Retention |
| --- | --- | --- |
| Interactive report | GitHub Pages `/report/` | Until next deployment |
| CSV statistics | GitHub Releases | Permanent |
| Raw metrics JSON | Workflow artifacts | 90 days |
| Execution logs | Actions tab | 90 days (or per repo settings) |

Reproducibility rests on several mechanisms:

| Aspect | Mechanism |
| --- | --- |
| Tool versions | Pinned in Dockerfile build args (`JPEG_XL_VERSION`, `LIBAOM_VERSION`, etc.) |
| Encoding determinism | Single-threaded mode enforced for all encoders |
| Configuration | Study JSON files committed to the repository |
| Environment | Identical dev container image for local and CI runs |
| Traceability | Quality results JSON records tool versions, timestamps, and all parameters |
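Passing the pinned versions into the image build might look like the following sketch; `JPEG_XL_VERSION` and `LIBAOM_VERSION` are named in the Dockerfile per the text above, while the `LIBAVIF_VERSION` arg, tag, and action version are assumptions:

```yaml
# Sketch of the GHCR image build with pinned tool versions;
# LIBAVIF_VERSION, the image tag, and the action pin are hypothetical.
- name: Build and push dev container
  uses: docker/build-push-action@v5
  with:
    push: true
    tags: ghcr.io/${{ github.repository }}/devcontainer:latest
    build-args: |
      JPEG_XL_VERSION=v0.11.2
      LIBAVIF_VERSION=v1.3.0
      LIBAOM_VERSION=v3.13.1
```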
Some limitations apply:

  • Runner variability: GitHub-hosted runners may have different CPU microarchitectures between runs. This can affect encoding speed measurements but not quality metrics or compression ratios.
  • Dataset scope: The workflow is configured to use div2k-valid (100 images). Larger datasets may exceed the 6-hour job timeout.
  • Storage: The 14 GB runner disk limits the number of concurrent encoded artifacts. The pipeline discards encoded files after measurement by default.