Configuration Reference

logfwd is configured with a YAML file passed via --config <path>.

Overview

logfwd supports two layout styles:

  • Simple — single pipeline with top-level input, transform, and output keys.
  • Advanced — multiple named pipelines under a pipelines map.

Environment variables are expanded using ${VAR} syntax anywhere in the file. If a variable is not set, the placeholder is left as-is.


Simple layout

input:
  type: file
  path: /var/log/app/*.log
  format: json

transform: SELECT level_str, message_str, status_int FROM logs WHERE status_int >= 400

output:
  type: otlp
  endpoint: otel-collector:4317
  compression: zstd

server:
  diagnostics: 0.0.0.0:9090
  log_level: info

Advanced layout

pipelines:
  errors:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: SELECT * FROM logs WHERE level_str = 'ERROR'
    outputs:
      - type: otlp
        endpoint: otel-collector:4317

  debug:
    inputs:
      - type: file
        path: /var/log/pods/**/*.log
        format: cri
    outputs:
      - type: stdout
        format: json

server:
  diagnostics: 0.0.0.0:9090

The two layouts cannot be mixed: specifying both input/output at the top level and a pipelines map is a validation error.


Input configuration

Each pipeline requires at least one input. Use a single mapping for one input or a YAML sequence for multiple inputs.
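
For instance, a pipeline that tails both application and pod logs could declare them as a sequence (the names and paths here are illustrative):

```yaml
inputs:
  - name: app_logs
    type: file
    path: /var/log/app/*.log
    format: json
  - name: pod_logs
    type: file
    path: /var/log/pods/**/*.log
    format: cri
```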

Common fields

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Input type. See Input types. |
| name | string | No | Friendly name shown in diagnostics. |
| format | string | No | Log format. See Formats. Defaults to auto. |

file input

Tail one or more log files that match a glob pattern.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Glob pattern, e.g. /var/log/pods/**/*.log. |

input:
  type: file
  path: /var/log/pods/**/*.log
  format: cri

udp input (not yet implemented)

Listen for log lines on a UDP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:514. |

input:
  type: udp
  listen: 0.0.0.0:514
  format: syslog

tcp input (not yet implemented)

Accept log lines on a TCP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:5140. |

input:
  type: tcp
  listen: 0.0.0.0:5140
  format: json

otlp input (not yet implemented)

Receive OTLP log records from another agent or SDK.

No extra fields required; the listen address will be configurable in a future release.


Input types

| Value | Status | Description |
|---|---|---|
| file | Implemented | Tail files matching a glob pattern. |
| udp | Planned | Receive log lines over UDP. |
| tcp | Planned | Accept log lines over TCP. |
| otlp | Planned | Receive OTLP logs. |

Formats

The format field controls how raw bytes from the input are parsed into log records.

| Value | Description |
|---|---|
| auto | Auto-detect (default). Tries CRI first, then JSON, then raw. |
| cri | CRI container log format (<timestamp> <stream> <flags> <message>). Multi-line log reassembly via the P partial flag is supported. |
| json | Newline-delimited JSON. Each line must be a single JSON object. |
| raw | Treat each line as an opaque string stored in _raw_str. |
| logfmt | Key=value pairs (e.g. level=info msg="hello"). Not yet implemented. |
| syslog | RFC 5424 syslog. Not yet implemented. |
| console | Human-readable coloured output for interactive debugging. Output mode only. |
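
To make the cri format concrete, the four space-separated parts of a CRI line can be pulled apart with a sketch like this (illustrative parsing only, not logfwd's scanner, which stores the timestamp and stream as _time_ns_int and _stream_str):

```python
# Illustrative CRI line parsing: "<timestamp> <stream> <flags> <message>",
# where flags is "F" (full line) or "P" (partial, reassembled with the
# following lines).
def parse_cri(line: str) -> dict:
    timestamp, stream, flags, message = line.split(" ", 3)
    return {
        "timestamp": timestamp,      # RFC 3339 timestamp from the CRI header
        "stream": stream,            # "stdout" or "stderr"
        "partial": flags == "P",     # True if the record continues on the next line
        "message": message,
    }

rec = parse_cri("2024-05-01T12:00:00.000000000Z stderr F connection refused")
```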

Output configuration

Each pipeline requires at least one output.

Common fields

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Output type. See Output types. |
| name | string | No | Friendly name shown in diagnostics. |

otlp output

Send log records as OTLP protobuf to an OpenTelemetry collector.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | Yes | — | Collector address, e.g. otel-collector:4317 (gRPC) or http://otel-collector:4318 (HTTP). |
| protocol | string | No | http | http or grpc. |
| compression | string | No | none | zstd to compress the request body. |

output:
  type: otlp
  endpoint: otel-collector:4317
  protocol: grpc
  compression: zstd

http output

POST log records as newline-delimited JSON to an HTTP endpoint.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Full URL, e.g. http://ingest.example.com/logs. |
| compression | string | No | zstd to compress the request body. |

output:
  type: http
  endpoint: http://ingest.example.com/logs
  compression: zstd

stdout output

Print records to standard output for local debugging.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| format | string | No | json | json (newline-delimited JSON) or console (coloured text). |

output:
  type: stdout
  format: console

elasticsearch output (stub)

Ship to Elasticsearch via the bulk API. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Elasticsearch base URL. |

loki output (stub)

Push to Grafana Loki. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Loki push URL. |

file_out output (partial)

Write records to a file.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Destination file path. |
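
None of the examples above cover file_out, so by analogy with the other outputs, a minimal configuration might look like this (the path is illustrative, and the on-disk record format is not specified here):

```yaml
output:
  type: file_out
  path: /var/lib/logfwd/forwarded.log
```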

parquet output (stub)

Write records to Parquet files. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Destination file path. |

Output types

| Value | Status | Description |
|---|---|---|
| otlp | Implemented | OTLP protobuf over HTTP or gRPC. |
| http | Implemented | JSON lines over HTTP POST. |
| stdout | Implemented | Print to stdout (JSON or coloured text). |
| elasticsearch | Stub | Elasticsearch bulk API. |
| loki | Stub | Grafana Loki push API. |
| file_out | Partial | Write to a file. |
| parquet | Stub | Write Parquet files. |

SQL transform

The optional transform field contains a DataFusion SQL query that is applied to every Arrow RecordBatch produced by the scanner. The source table is always named logs.

transform: SELECT level_str, message_str, status_int FROM logs WHERE status_int >= 400

Multi-line SQL is supported with YAML block scalars:

transform: |
  SELECT
    level_str,
    message_str,
    regexp_extract(message_str, 'request_id=([a-f0-9-]+)', 1) AS request_id_str,
    status_int
  FROM logs
  WHERE level_str IN ('ERROR', 'WARN')
    AND status_int >= 400

Column naming convention

The scanner maps each JSON field to one or more typed Arrow columns following the {field}_{type} naming convention:

| JSON value type | Arrow column type | Column name pattern | Example |
|---|---|---|---|
| String | StringArray | {field}_str | level_str |
| Integer | Int64Array | {field}_int | status_int |
| Float | Float64Array | {field}_float | latency_ms_float |
| Boolean | StringArray ("true"/"false") | {field}_str | enabled_str |
| Null | null in all type columns | — | — |
| Object / Array | StringArray (raw JSON) | {field}_str | metadata_str |

When a field contains mixed types across rows, separate columns are emitted: status_int and status_str can coexist in the same batch.

Special columns added by the scanner:

| Column | Type | Description |
|---|---|---|
| _file_str | string | Absolute path of the source file (file inputs only). |
| _raw_str | string | Original JSON line (only when keep_raw: true). |
| _time_ns_int | int64 | Timestamp from the CRI header in nanoseconds (CRI inputs only). |
| _stream_str | string | CRI stream name (stdout/stderr). |

Built-in UDFs

| Function | Signature | Description |
|---|---|---|
| int(expr) | int(any) → int64 | Cast any value to int64. Returns NULL on failure. |
| float(expr) | float(any) → float64 | Cast any value to float64. Returns NULL on failure. |
| grok(pattern, input) | grok(utf8, utf8) → utf8 | Apply a Grok pattern to input and return the first capture as JSON. |
| regexp_extract(input, pattern, group) | regexp_extract(utf8, utf8, int64) → utf8 | Return capture group group from a regex match. |

Examples:

-- Cast a string column to int
SELECT int(status_str) AS status_int FROM logs

-- Extract a field with Grok
SELECT grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:path}', message_str) AS parsed_str FROM logs

-- Extract a named group with regex
SELECT regexp_extract(message_str, 'user=([a-z]+)', 1) AS user_str FROM logs

-- Type-cast from environment-injected string
SELECT float(duration_str) AS duration_ms_float FROM logs

Enrichment tables

Enrichment tables are made available as SQL tables that can be joined in the transform query. They are declared under the top-level enrichment key.

enrichment:
  k8s:
    type: k8s_path

  host:
    type: host_info

  labels:
    type: static
    fields:
      environment: production
      region: us-east-1

k8s_path enrichment

Parses Kubernetes pod log paths (e.g. /var/log/pods/<namespace>_<pod>_<uid>/<container>/) to extract metadata.

SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
FROM logs l
JOIN k8s k ON l._file_str = k.log_path_prefix

Columns exposed by k8s:

| Column | Description |
|---|---|
| log_path_prefix | Directory prefix used as join key. |
| namespace | Kubernetes namespace. |
| pod_name | Pod name. |
| pod_uid | Pod UID. |
| container_name | Container name. |
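
The extraction can be sketched like this, based only on the path layout stated above (illustrative logic and sample values, not logfwd's implementation):

```python
# Illustrative parse of a Kubernetes pod log path into the columns the
# k8s enrichment table exposes. Layout, per the docs:
#   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<file>.log
def parse_pod_path(path: str) -> dict:
    parts = path.split("/")
    namespace, pod_name, pod_uid = parts[4].split("_")
    return {
        "log_path_prefix": "/".join(parts[:6]) + "/",  # directory prefix join key
        "namespace": namespace,
        "pod_name": pod_name,
        "pod_uid": pod_uid,
        "container_name": parts[5],
    }

meta = parse_pod_path("/var/log/pods/prod_api-7f9c_b1e2d3/api/0.log")
```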

host_info enrichment

Exposes the hostname of the machine running logfwd.

| Column | Description |
|---|---|
| hostname | System hostname. |

static enrichment

A table with one row containing user-defined label columns.

enrichment:
  labels:
    type: static
    fields:
      environment: production
      cluster: us-east-1
      tier: backend

SELECT l.*, lbl.environment, lbl.cluster
FROM logs l CROSS JOIN labels lbl

Server configuration

The optional server block controls the diagnostics server and observability settings.

| Field | Type | Default | Description |
|---|---|---|---|
| diagnostics | string | none | host:port to listen on for HTTP diagnostics. See Diagnostics API. |
| log_level | string | info | Log verbosity. One of error, warn, info, debug, trace. |
| metrics_endpoint | string | none | OTLP endpoint for periodic metrics push, e.g. http://otel-collector:4318. |
| metrics_interval_secs | integer | 60 | Push interval for OTLP metrics in seconds. |

server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: http://otel-collector:4318
  metrics_interval_secs: 30

Diagnostics API

When server.diagnostics is configured, logfwd exposes an HTTP API for monitoring and troubleshooting.

| Route | Method | Description |
|---|---|---|
| / | GET | Dashboard HTML (visual explorer for metrics and traces). |
| /health | GET | Liveness probe. Returns 200 OK if the server is running. |
| /ready | GET | Readiness probe. Returns 200 OK once pipelines are initialized. |
| /api/pipelines | GET | Per-pipeline counters (lines, bytes, errors, batches, stage timing). |
| /api/stats | GET | Aggregate process stats (uptime, RSS, CPU, aggregate line counts). |
| /api/config | GET | Currently loaded YAML configuration and its file path. |
| /api/logs | GET | Recent log lines from logfwd's own stderr (ring buffer). |
| /api/history | GET | Time-series data (1-hour window) for dashboard charts. |
| /api/traces | GET | Recent batch processing spans for detailed latency analysis. |

Note: The /metrics (Prometheus) endpoint was removed in favor of /api/pipelines. It returns 410 Gone. The /api/system route mentioned in some older documentation does not exist.


Storage configuration

The optional storage block controls where logfwd persists state (checkpoints, disk queue).

| Field | Type | Default | Description |
|---|---|---|---|
| data_dir | string | none | Directory for state files. Created if it does not exist. |

storage:
  data_dir: /var/lib/logfwd

Environment variable substitution

Any value in the config file can reference an environment variable with ${VAR}:

output:
  type: otlp
  endpoint: ${OTEL_COLLECTOR_ADDR}

server:
  metrics_endpoint: ${METRICS_PUSH_URL}

If the variable is not set, the placeholder is left as-is (no error).
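
The substitution rule can be sketched in a few lines (an illustrative model of the behaviour described above, not logfwd's code):

```python
import os
import re

# Sketch of ${VAR} expansion: set variables are substituted,
# unset ones leave the placeholder intact (no error).
def expand(text: str) -> str:
    def repl(m: re.Match) -> str:
        value = os.environ.get(m.group(1))
        return value if value is not None else m.group(0)
    return re.sub(r"\$\{(\w+)\}", repl, text)

os.environ["OTEL_COLLECTOR_ADDR"] = "otel-collector:4317"
substituted = expand("endpoint: ${OTEL_COLLECTOR_ADDR}")  # variable is set
untouched = expand("endpoint: ${UNSET_VAR}")              # placeholder kept as-is
```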


Complete example

pipelines:
  app:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: |
      SELECT
        l.level_str,
        l.message_str,
        l.status_int,
        k.namespace,
        k.pod_name,
        k.container_name,
        lbl.environment
      FROM logs l
      LEFT JOIN k8s k ON l._file_str = k.log_path_prefix
      CROSS JOIN labels lbl
      WHERE l.level_str IN ('ERROR', 'WARN')
        OR l.status_int >= 500
    outputs:
      - name: collector
        type: otlp
        endpoint: ${OTEL_ENDPOINT}
        protocol: grpc
        compression: zstd
      - name: debug
        type: stdout
        format: console

enrichment:
  k8s:
    type: k8s_path
  labels:
    type: static
    fields:
      environment: ${ENVIRONMENT}
      cluster: ${CLUSTER_NAME}

server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: ${OTEL_ENDPOINT}
  metrics_interval_secs: 60

storage:
  data_dir: /var/lib/logfwd