Quick Start

Get a working logfwd pipeline in 60 seconds, then build on it.

Prerequisites

You need the logfwd binary. See Installation for all options, or grab it quickly:

# Download the latest release (macOS Apple Silicon shown)
curl -fsSL https://github.com/strawgate/memagent/releases/latest/download/logfwd-darwin-arm64 -o logfwd
chmod +x logfwd

# Or build from source (Rust 1.86+)
cargo build --release -p logfwd && cp target/release/logfwd .

Verify it works:

./logfwd --version

Stage 1: See log output

Generate synthetic JSON log lines and print them to your terminal.

Generate test data:

./logfwd --generate-json 10000 logs.json

This creates 10,000 JSON log lines with fields like level, message, status, duration_ms, and service.

Create a config file:

# config.yaml
input:
  type: file
  path: logs.json
  format: json

output:
  type: stdout
  format: console

Run it:

./logfwd --config config.yaml

You’ll see a startup banner followed by colored output for every log line:

logfwd v0.1.0

  ✓  default
     in   file  logs.json
     out  stdout

ready · 1 pipeline

10:30:00.000Z  INFO   request handled GET /api/v1/users/10000  duration_ms=1 request_id=... service=myapp status=200
10:30:00.000Z  WARN   request handled POST /api/v2/orders/10015  duration_ms=87 request_id=... service=myapp status=429
10:30:00.000Z  ERROR  request handled GET /health/10021  duration_ms=40 request_id=... service=myapp status=503
10:30:00.000Z  DEBUG  request handled GET /api/v1/users/10033  duration_ms=3 request_id=... service=myapp status=200
...

logfwd parsed every JSON line, detected field types automatically (strings, integers, floats), built Arrow RecordBatches, and printed them in a human-readable format. All 10,000 lines stream through in under a second.
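As a rough illustration of what type detection means for a JSON log line, the sketch below decodes one line with Python's standard json module and reports the type of each field. The sample line is made up (it only reuses the field names listed earlier), and this shows JSON decoding in general, not logfwd's internal Arrow schema inference.

```python
import json

# Hypothetical line using the field names listed earlier; the values are made up.
line = (
    '{"level": "INFO", "message": "request handled GET /health", '
    '"status": 200, "duration_ms": 1.5, "service": "myapp"}'
)

def field_types(raw):
    """Map each field to the name of the Python type its JSON value decodes to."""
    return {name: type(value).__name__ for name, value in json.loads(raw).items()}

print(field_types(line))
```

Strings, integers, and floats come out as distinct types, which is the same distinction the pipeline relies on when a later SQL comparison such as duration_ms > 50 is evaluated.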

Note: logfwd exits when it reaches the end of a finite file. In production you’d point it at a log file that’s actively being appended to, and logfwd will tail it continuously — like tail -f, but with parsing and SQL.


Stage 2: Filter with SQL

Now add a SQL transform to keep only what matters. This is the core reason to use logfwd — every batch of parsed records becomes a DataFusion SQL table named logs.

First, regenerate the test data — logfwd tracks file positions between runs so it doesn’t reprocess data it has already seen:

./logfwd --generate-json 10000 logs.json

Update your config:

# config.yaml
input:
  type: file
  path: logs.json
  format: json

transform: |
  SELECT level, message, status, duration_ms
  FROM logs
  WHERE level = 'ERROR' AND duration_ms > 50

output:
  type: stdout
  format: console

Run it:

./logfwd --config config.yaml

Now you see far fewer lines — only errors with slow durations:

ERROR  request handled GET /api/v2/products/10049  status=500 duration_ms=92
ERROR  request handled POST /api/v1/orders/10121  status=503 duration_ms=78
...

Only the four columns from the SELECT appear — level, message, status, duration_ms. Everything else was filtered out before it reached the output. In production, this means you only ship the logs you care about — saving bandwidth, storage, and money.
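To make the semantics of that transform concrete, here is a rough Python equivalent of its WHERE and SELECT steps applied to a few hand-written records. The records are invented for illustration and are not real logfwd output; logfwd itself runs the query through DataFusion, not Python.

```python
# Hypothetical sample records, shaped like the fields described above.
records = [
    {"level": "ERROR", "message": "request handled GET /a", "status": 500, "duration_ms": 92, "service": "myapp"},
    {"level": "ERROR", "message": "request handled GET /b", "status": 503, "duration_ms": 40, "service": "myapp"},
    {"level": "INFO", "message": "request handled GET /c", "status": 200, "duration_ms": 120, "service": "myapp"},
]

# Columns named in the SELECT clause.
SELECTED = ("level", "message", "status", "duration_ms")

def transform(rows):
    """Apply WHERE level = 'ERROR' AND duration_ms > 50, then keep only the SELECT columns."""
    return [
        {col: row[col] for col in SELECTED}
        for row in rows
        if row["level"] == "ERROR" and row["duration_ms"] > 50
    ]

kept = transform(records)
for row in kept:
    print(row)
```

Only the first record survives: the second is an error but too fast, the third is slow but not an error. The service field is dropped because it is not in the SELECT list.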

Try a more advanced transform:

transform: |
  SELECT
    level,
    regexp_extract(message, '(GET|POST|PUT|DELETE) (\S+)', 2) AS path,
    status,
    duration_ms
  FROM logs
  WHERE status >= 400

This extracts the URL path from the message with a regex and keeps only 4xx/5xx responses. Full SQL is supported: JOIN, GROUP BY, HAVING, and subqueries all work.
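The pattern itself can be checked outside logfwd. The sketch below uses Python's re module and assumes, as the example above implies, that the third argument of regexp_extract selects a capture group (here group 2, the path). The sample message is taken from the Stage 1 output; returning None on a non-match is this sketch's choice and may differ from what logfwd does.

```python
import re

# Same pattern as in the transform: group 1 is the HTTP method, group 2 the path.
PATTERN = re.compile(r"(GET|POST|PUT|DELETE) (\S+)")

def extract_path(message):
    """Return capture group 2 (the URL path), or None when the pattern does not match."""
    match = PATTERN.search(message)
    return match.group(2) if match else None

print(extract_path("request handled POST /api/v2/orders/10015"))
```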

Note: ORDER BY is valid SQL, but logfwd processes data in streaming batches, so rows are sorted within each batch only, not globally across all data.
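A small Python sketch of why per-batch ordering differs from a global sort. The two batches of duration_ms values are hypothetical, and the batch boundaries are chosen only for illustration.

```python
def sort_per_batch(batches):
    """Sort each batch independently, the way a streaming engine sees the data."""
    return [sorted(batch) for batch in batches]

# Two hypothetical batches of duration_ms values, in arrival order.
batches = [[87, 1, 40], [3, 92, 12]]

# What the output stream looks like when each batch is sorted on its own.
streamed = [value for batch in sort_per_batch(batches) for value in batch]

# What a global ORDER BY over all the data would produce instead.
global_order = sorted(value for batch in batches for value in batch)

print(streamed)      # [1, 40, 87, 3, 12, 92]
print(global_order)  # [1, 3, 12, 40, 87, 92]
```

If a globally sorted view is needed, it has to be produced downstream, where all the data is available at once.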

Built-in UDFs: int(), float(), regexp_extract(), grok(), json(), json_int(), json_float(). The geo_lookup() UDF is also available when a geo-IP database is configured. See the SQL Transforms guide for the complete reference.


Stage 3: Ship to a collector

In production, logfwd sends OTLP protobuf to an OpenTelemetry Collector, Grafana Alloy, or any OTLP-compatible backend. Let’s try it with logfwd’s built-in blackhole receiver — it accepts OTLP data and discards it, so you can test the full pipeline without external infrastructure.

Start the blackhole receiver:

./logfwd --blackhole &
# logfwd blackhole starting on 127.0.0.1:4318

Regenerate test data and update your config to send OTLP:

./logfwd --generate-json 10000 logs.json

# config.yaml
input:
  type: file
  path: logs.json
  format: json

transform: |
  SELECT level, message, status, duration_ms
  FROM logs
  WHERE level IN ('ERROR', 'WARN')

output:
  type: otlp
  endpoint: http://127.0.0.1:4318/v1/logs
  compression: zstd

Run it:

./logfwd --config config.yaml

logfwd parses the JSON, runs the SQL filter, encodes matching records as OTLP protobuf with zstd compression, and ships them over HTTP. The blackhole receiver accepts everything.
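For intuition about what an accept-and-discard receiver does, here is a minimal Python stand-in built on the standard library. It is a sketch under stated assumptions, not logfwd's blackhole implementation: the class name DiscardHandler and the helper post_once are hypothetical, the payload is placeholder bytes instead of a real OTLP message, and the server binds to an ephemeral port instead of 4318 so the demo cannot collide with a running receiver.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DiscardHandler(BaseHTTPRequestHandler):
    """Accept any POST, read and drop the body, and reply 200 with an empty body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the payload; nothing is stored
        self.send_response(200)
        self.send_header("Content-Type", "application/x-protobuf")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def post_once(payload):
    """Start the receiver on an ephemeral port, POST one payload, return the HTTP status."""
    server = HTTPServer(("127.0.0.1", 0), DiscardHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request(
            "POST",
            "/v1/logs",
            body=payload,
            headers={"Content-Type": "application/x-protobuf"},
        )
        status = conn.getresponse().status
        conn.close()
        return status
    finally:
        server.shutdown()
        server.server_close()

print(post_once(b"placeholder bytes, not a real OTLP message"))
```

The sender only needs a success status to consider the batch delivered, which is why a receiver like this is enough to exercise parsing, filtering, encoding, and transport end to end.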

To ship to a real collector, replace the endpoint:

output:
  type: otlp
  endpoint: http://otel-collector:4318/v1/logs   # your real collector
  compression: zstd

logfwd works out of the box with an OpenTelemetry Collector, Grafana Alloy, or any other OTLP-compatible backend.


Validate before deploying

Before running in production, verify your config. --validate catches YAML errors; --dry-run goes further and compiles the SQL against the Arrow schema, catching column name typos and type mismatches before any data flows.

./logfwd --config config.yaml --validate
#   ready: default
# config ok: 1 pipeline(s)

./logfwd --config config.yaml --dry-run
#   ready: default
# dry run ok: 1 pipeline(s)

What’s next

| Guide | What you’ll learn |
|---|---|
| Your First Pipeline | Production config with monitoring, multi-pipeline setup, CRI format |
| SQL Transforms | Full SQL reference: JOINs, UDFs, enrichment tables, column naming |
| Configuration Reference | Every YAML field, input/output type, and option |
| Kubernetes Deployment | DaemonSet, resource sizing, OTLP collector integration |
| Troubleshooting | Common errors, debug mode, diagnostics API |