automation layer

Catch regressions before they merge.

Monitor host telemetry during CI. Parse benchmark output from any major tool. Compare against rolling baselines. Post regression summaries to PRs. All as composable GitHub Actions.

What a Regression Gate Looks Like

The compare action flags regressions beyond your threshold and posts a summary to the PR:

Benchmark	Baseline (avg 10 runs)	Current	Change	Status
BenchmarkParse	142 ns/op	148 ns/op	+4.2%	✅ pass
BenchmarkEncode	89 ns/op	87 ns/op	−2.2%	✅ pass
BenchmarkQuery	1,204 ns/op	2,451 ns/op	+103.6%	❌ regression
BenchmarkAlloc	48 B/op	112 B/op	+133.3%	❌ regression
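
The gate logic behind a table like this can be sketched in a few lines. This is a minimal illustration, not the compare action's actual implementation. It assumes lower values are better, that the baseline is the arithmetic mean of the most recent runs, and that a benchmark counts as a regression when its percent increase exceeds the threshold. The function names `percent_change` and `gate` are hypothetical.

```python
from statistics import mean


def percent_change(baseline: float, current: float) -> float:
    """Relative change of current vs. baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def gate(history: list[float], current: float,
         baseline_runs: int = 10, threshold: float = 5.0) -> tuple[float, bool]:
    """Return (percent change, regressed?) for one lower-is-better metric.

    Assumption: the baseline is the mean of the last `baseline_runs` values.
    """
    baseline = mean(history[-baseline_runs:])
    change = percent_change(baseline, current)
    return change, change > threshold


# Values taken from the table above; each history is a flat series for brevity.
for name, baseline, current in [("BenchmarkParse", 142.0, 148.0),
                                ("BenchmarkQuery", 1204.0, 2451.0)]:
    change, regressed = gate([baseline] * 10, current)
    print(f"{name}: {change:+.1f}% regression={regressed}")
```

With a threshold of 5, BenchmarkParse at +4.2% stays under the gate while BenchmarkQuery at +103.6% trips it, matching the statuses in the table.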

The Pipeline

monitor → parse-results → aggregate → compare

monitor: Start an OTel Collector, scrape host metrics, stash on exit.
parse-results: Normalize benchmark output to OTLP, stash to the data branch.
aggregate: Rebuild indexes and time series from all runs.
compare: Flag regressions, post a PR comment, optionally fail the step.

Quickstart

name: benchkit-ci
on: [pull_request]

permissions:
  contents: write
  actions: read
  pull-requests: write  # assumed necessary so compare can post its PR comment with github.token

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - id: monitor
        uses: strawgate/o11ykit/octo11y/actions/monitor@main-dist
        with:
          github-token: ${{ github.token }}

      - run: go test ./... -bench=. | tee bench.txt
        shell: bash  # explicit bash enables pipefail, so a failing go test fails the step

      - uses: strawgate/o11ykit/octo11y/actions/parse-results@main-dist
        with:
          mode: file
          results: bench.txt
          format: go
          github-token: ${{ github.token }}

      - uses: strawgate/o11ykit/octo11y/actions/aggregate@main-dist
        with:
          github-token: ${{ github.token }}

      - uses: strawgate/o11ykit/octo11y/actions/compare@main-dist
        with:
          results: bench.txt
          format: go
          baseline-runs: 10
          threshold: 5
          github-token: ${{ github.token }}

Supported Formats

Auto-detected or explicitly specified — parse-results handles them all:

Go bench, Rust cargo bench, Hyperfine, pytest-benchmark, benchmark-action, OTLP JSON
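
To make "normalizing benchmark output" concrete, here is a rough sketch of parsing the standard `go test -bench` text format into a name-to-metrics mapping. It is not the parse-results implementation and does not produce OTLP. The function name `parse_go_bench` and the output shape are assumptions chosen for illustration.

```python
import re

# One result line: name (with optional -GOMAXPROCS suffix), iteration count,
# then one or more "<value> <unit>" pairs such as "142 ns/op" or "48 B/op".
LINE = re.compile(r"^(Benchmark\S*?)(?:-(\d+))?\s+(\d+)\s+(.*)$")


def parse_go_bench(text: str) -> dict[str, dict[str, float]]:
    """Map benchmark name -> {unit: value} for each result line in `text`."""
    results: dict[str, dict[str, float]] = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip "goos:", "PASS", "ok ..." and other non-result lines
        name, _procs, _iters, rest = match.groups()
        fields = rest.split()
        # Pair up values and units: ["142", "ns/op", "48", "B/op"] -> two metrics.
        results[name] = {unit: float(value)
                         for value, unit in zip(fields[0::2], fields[1::2])}
    return results


sample = """\
goos: linux
BenchmarkParse-8    8421052    142 ns/op    48 B/op    2 allocs/op
BenchmarkEncode-8   13483146    89.0 ns/op
PASS
"""
print(parse_go_bench(sample))
```

Keying metrics by unit keeps time (`ns/op`) and allocation (`B/op`, `allocs/op`) figures separate, so each can be compared against its own baseline.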

What Monitor Captures

Host Metrics

CPU utilization, memory usage, system load — filtered to runner-descendant processes.

Custom Metrics

One-liner emission via benchkit-emit CLI. No SDK, no configuration, just a name and value.

Telemetry Sidecar

Full OTLP collector with gRPC + HTTP receivers. Gzipped NDJSON stashed alongside benchmarks.
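
Gzipped NDJSON is easy to inspect locally with the standard library. The sketch below assumes one JSON object per line. The file name `telemetry.ndjson.gz` and the record contents are made up for the demo, and the actual stash layout is not described here.

```python
import gzip
import json
import tempfile
from collections.abc import Iterator
from pathlib import Path


def read_ndjson_gz(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Round-trip demo with a made-up record in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "telemetry.ndjson.gz"  # hypothetical file name
    with gzip.open(demo, "wt", encoding="utf-8") as handle:
        handle.write(json.dumps({"name": "example.metric", "value": 1.5}) + "\n")
    for record in read_ndjson_gz(demo):
        print(record)
```

Reading line by line keeps memory use flat even when a run stashes a large telemetry file.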
