o11ykit

automation layer

Catch regressions
before they merge.

Monitor host telemetry during CI. Parse benchmark output from any major tool. Compare against rolling baselines. Post regression summaries to PRs. All as composable GitHub Actions.

What a Regression Gate Looks Like

The compare action flags regressions beyond your threshold and posts a summary to the PR:

Benchmark Baseline (avg 10 runs) Current Change Status
BenchmarkParse 142 ns/op 148 ns/op +4.2% ✅ pass
BenchmarkEncode 89 ns/op 87 ns/op −2.2% ✅ pass
BenchmarkQuery 1,204 ns/op 2,451 ns/op +103.6% ❌ regression
BenchmarkAlloc 48 B/op 112 B/op +133.3% ❌ regression

The Pipeline

monitor
Start OTel Collector, scrape host metrics, stash on exit
parse-results
Normalize benchmark output to OTLP, stash to data branch
aggregate
Rebuild indexes and time-series from all runs
compare
Flag regressions, post PR comment, optionally fail the step

Quickstart

name: benchkit-ci
on: [pull_request]

permissions:
  contents: write
  actions: read

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - id: monitor
        uses: strawgate/o11ykit/octo11y/actions/monitor@main-dist
        with:
          github-token: ${{ github.token }}

      - run: go test ./... -bench=. | tee bench.txt

      - uses: strawgate/o11ykit/octo11y/actions/parse-results@main-dist
        with:
          mode: file
          results: bench.txt
          format: go
          github-token: ${{ github.token }}

      - uses: strawgate/o11ykit/octo11y/actions/aggregate@main-dist
        with:
          github-token: ${{ github.token }}

      - uses: strawgate/o11ykit/octo11y/actions/compare@main-dist
        with:
          results: bench.txt
          format: go
          baseline-runs: 10
          threshold: 5
          github-token: ${{ github.token }}

Supported Formats

Auto-detected or explicitly specified — parse-results handles them all:

Go bench Rust cargo bench Hyperfine pytest-benchmark benchmark-action OTLP JSON

What Monitor Captures

Host Metrics

CPU utilization, memory usage, system load — filtered to runner-descendant processes.

Custom Metrics

One-liner emission via benchkit-emit CLI. No SDK, no configuration, just a name and value.

Telemetry Sidecar

Full OTLP collector with gRPC + HTTP receivers. Gzipped NDJSON stashed alongside benchmarks.