Configuration Reference

logfwd is configured with a YAML file passed via --config <path>.

Overview

logfwd supports two layout styles:

  • Simple — single pipeline with top-level input, transform, and output keys.
  • Advanced — multiple named pipelines under a pipelines map.

Environment variables are expanded using ${VAR} syntax anywhere in the file. If a variable is not set, the placeholder is left as-is.


Simple layout

input:
  type: file
  path: /var/log/app/*.log
  format: json

transform: SELECT level_str, message_str, status_int FROM logs WHERE status_int >= 400

output:
  type: otlp
  endpoint: otel-collector:4317
  compression: zstd

server:
  diagnostics: 0.0.0.0:9090
  log_level: info

Advanced layout

pipelines:
  errors:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: SELECT * FROM logs WHERE level_str = 'ERROR'
    outputs:
      - type: otlp
        endpoint: otel-collector:4317

  debug:
    inputs:
      - type: file
        path: /var/log/pods/**/*.log
        format: cri
    outputs:
      - type: stdout
        format: json

server:
  diagnostics: 0.0.0.0:9090

The two layouts cannot be mixed: specifying both input/output at the top level and a pipelines map is a validation error.


Input configuration

Each pipeline requires at least one input. Use a single mapping for one input or a YAML sequence for multiple inputs.
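
For instance, a pipeline that tails both application and pod logs could declare them as a sequence (the names and paths here are illustrative):

```yaml
inputs:
  - name: app_logs
    type: file
    path: /var/log/app/*.log
    format: json
  - name: pod_logs
    type: file
    path: /var/log/pods/**/*.log
    format: cri
```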

Common fields

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Input type. See Input types. |
| name | string | No | Friendly name shown in diagnostics. |
| format | string | No | Log format. See Formats. Defaults to auto. |

file input

Tail one or more log files that match a glob pattern.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Glob pattern, e.g. /var/log/pods/**/*.log. |

input:
  type: file
  path: /var/log/pods/**/*.log
  format: cri

udp input (not yet implemented)

Listen for log lines on a UDP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:514. |

input:
  type: udp
  listen: 0.0.0.0:514
  format: syslog

tcp input (not yet implemented)

Accept log lines on a TCP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:5140. |

input:
  type: tcp
  listen: 0.0.0.0:5140
  format: json

otlp input (not yet implemented)

Receive OTLP log records from another agent or SDK.

No extra fields required; the listen address will be configurable in a future release.


Input types

| Value | Status | Description |
|---|---|---|
| file | Implemented | Tail files matching a glob pattern. |
| udp | Planned | Receive log lines over UDP. |
| tcp | Planned | Accept log lines over TCP. |
| otlp | Planned | Receive OTLP logs. |

Formats

The format field controls how raw bytes from the input are parsed into log records.

| Value | Description |
|---|---|
| auto | Auto-detect (default). Tries CRI first, then JSON, then raw. |
| cri | CRI container log format (<timestamp> <stream> <flags> <message>). Multi-line log reassembly via the P partial flag is supported. |
| json | Newline-delimited JSON. Each line must be a single JSON object. |
| raw | Treat each line as an opaque string stored in _raw_str. |
| logfmt | Key=value pairs (e.g. level=info msg="hello"). Not yet implemented. |
| syslog | RFC 5424 syslog. Not yet implemented. |
| console | Human-readable coloured output for interactive debugging. Output mode only. |
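
To make the cri format concrete, the four space-separated parts of a CRI line can be pulled apart with a sketch like this (illustrative parsing only, not logfwd's scanner, which stores the timestamp and stream as _time_ns_int and _stream_str):

```python
# Illustrative CRI line parsing: "<timestamp> <stream> <flags> <message>",
# where flags is "F" (full line) or "P" (partial, reassembled with the
# following lines).
def parse_cri(line: str) -> dict:
    timestamp, stream, flags, message = line.split(" ", 3)
    return {
        "timestamp": timestamp,      # RFC 3339 timestamp from the CRI header
        "stream": stream,            # "stdout" or "stderr"
        "partial": flags == "P",     # True if the record continues on the next line
        "message": message,
    }

rec = parse_cri("2024-05-01T12:00:00.000000000Z stderr F connection refused")
```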

Output configuration

Each pipeline requires at least one output.

Common fields

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Output type. See Output types. |
| name | string | No | Friendly name shown in diagnostics. |

otlp output

Send log records as OTLP protobuf to an OpenTelemetry collector.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | Yes | — | Collector address, e.g. otel-collector:4317 (gRPC) or http://otel-collector:4318 (HTTP). |
| protocol | string | No | http | http or grpc. |
| compression | string | No | none | zstd to compress the request body. |

output:
  type: otlp
  endpoint: otel-collector:4317
  protocol: grpc
  compression: zstd

http output

POST log records as newline-delimited JSON to an HTTP endpoint.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Full URL, e.g. http://ingest.example.com/logs. |
| compression | string | No | zstd to compress the request body. |

output:
  type: http
  endpoint: http://ingest.example.com/logs
  compression: zstd

stdout output

Print records to standard output for local debugging.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| format | string | No | json | json (newline-delimited JSON) or console (coloured text). |

output:
  type: stdout
  format: console

elasticsearch output (stub)

Ship to Elasticsearch via the bulk API. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Elasticsearch base URL. |

loki output (stub)

Push to Grafana Loki. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Loki push URL. |

file_out output (partial)

Write records to a file.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Destination file path. |
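
None of the examples above cover file_out, so by analogy with the other outputs, a minimal configuration might look like this (the path is illustrative, and the on-disk record format is not specified here):

```yaml
output:
  type: file_out
  path: /var/lib/logfwd/forwarded.log
```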

parquet output (stub)

Write records to Parquet files. Not yet functional.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Destination file path. |

Output types

| Value | Status | Description |
|---|---|---|
| otlp | Implemented | OTLP protobuf over HTTP or gRPC. |
| http | Implemented | JSON lines over HTTP POST. |
| stdout | Implemented | Print to stdout (JSON or coloured text). |
| elasticsearch | Stub | Elasticsearch bulk API. |
| loki | Stub | Grafana Loki push API. |
| file_out | Partial | Write to a file. |
| parquet | Stub | Write Parquet files. |

SQL transform

The optional transform field contains a DataFusion SQL query that is applied to every Arrow RecordBatch produced by the scanner. The source table is always named logs.

transform: SELECT level_str, message_str, status_int FROM logs WHERE status_int >= 400

Multi-line SQL is supported with YAML block scalars:

transform: |
  SELECT
    level_str,
    message_str,
    regexp_extract(message_str, 'request_id=([a-f0-9-]+)', 1) AS request_id_str,
    status_int
  FROM logs
  WHERE level_str IN ('ERROR', 'WARN')
    AND status_int >= 400

Column naming convention

The scanner maps each JSON field to one or more typed Arrow columns following the {field}_{type} naming convention:

| JSON value type | Arrow column type | Column name pattern | Example |
|---|---|---|---|
| String | StringArray | {field}_str | level_str |
| Integer | Int64Array | {field}_int | status_int |
| Float | Float64Array | {field}_float | latency_ms_float |
| Boolean | StringArray ("true"/"false") | {field}_str | enabled_str |
| Null | null in all type columns | — | — |
| Object / Array | StringArray (raw JSON) | {field}_str | metadata_str |

When a field contains mixed types across rows, separate columns are emitted: status_int and status_str can coexist in the same batch.

Special columns added by the scanner:

| Column | Type | Description |
|---|---|---|
| _file_str | string | Absolute path of the source file (file inputs only). |
| _raw_str | string | Original JSON line (only when keep_raw: true). |
| _time_ns_int | int64 | Timestamp from the CRI header in nanoseconds (CRI inputs only). |
| _stream_str | string | CRI stream name (stdout/stderr). |

Built-in UDFs

| Function | Signature | Description |
|---|---|---|
| int(expr) | int(any) → int64 | Cast any value to int64. Returns NULL on failure. |
| float(expr) | float(any) → float64 | Cast any value to float64. Returns NULL on failure. |
| grok(pattern, input) | grok(utf8, utf8) → utf8 | Apply a Grok pattern to input and return the first capture as JSON. |
| regexp_extract(input, pattern, group) | regexp_extract(utf8, utf8, int64) → utf8 | Return capture group group from a regex match. |

Examples:

-- Cast a string column to int
SELECT int(status_str) AS status_int FROM logs

-- Extract a field with Grok
SELECT grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:path}', message_str) AS parsed_str FROM logs

-- Extract a named group with regex
SELECT regexp_extract(message_str, 'user=([a-z]+)', 1) AS user_str FROM logs

-- Type-cast from environment-injected string
SELECT float(duration_str) AS duration_ms_float FROM logs

Enrichment tables

Enrichment tables are made available as SQL tables that can be joined in the transform query. They are declared under the top-level enrichment key.

enrichment:
  k8s:
    type: k8s_path

  host:
    type: host_info

  labels:
    type: static
    fields:
      environment: production
      region: us-east-1

k8s_path enrichment

Parses Kubernetes pod log paths (e.g. /var/log/pods/<namespace>_<pod>_<uid>/<container>/) to extract metadata.

SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
FROM logs l
JOIN k8s k ON l._file_str = k.log_path_prefix

Columns exposed by k8s:

| Column | Description |
|---|---|
| log_path_prefix | Directory prefix used as join key. |
| namespace | Kubernetes namespace. |
| pod_name | Pod name. |
| pod_uid | Pod UID. |
| container_name | Container name. |
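
The extraction can be sketched like this, based only on the path layout stated above (illustrative logic and sample values, not logfwd's implementation):

```python
# Illustrative parse of a Kubernetes pod log path into the columns the
# k8s enrichment table exposes. Layout, per the docs:
#   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<file>.log
def parse_pod_path(path: str) -> dict:
    parts = path.split("/")
    namespace, pod_name, pod_uid = parts[4].split("_")
    return {
        "log_path_prefix": "/".join(parts[:6]) + "/",  # directory prefix join key
        "namespace": namespace,
        "pod_name": pod_name,
        "pod_uid": pod_uid,
        "container_name": parts[5],
    }

meta = parse_pod_path("/var/log/pods/prod_api-7f9c_b1e2d3/api/0.log")
```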

host_info enrichment

Exposes the hostname of the machine running logfwd.

| Column | Description |
|---|---|
| hostname | System hostname. |

static enrichment

A table with one row containing user-defined label columns.

enrichment:
  labels:
    type: static
    fields:
      environment: production
      cluster: us-east-1
      tier: backend

SELECT l.*, lbl.environment, lbl.cluster
FROM logs l CROSS JOIN labels lbl

Server configuration

The optional server block controls the diagnostics server and observability settings.

| Field | Type | Default | Description |
|---|---|---|---|
| diagnostics | string | none | host:port to listen on for HTTP diagnostics. See Diagnostics API. |
| log_level | string | info | Log verbosity. One of error, warn, info, debug, trace. |
| metrics_endpoint | string | none | OTLP endpoint for periodic metrics push, e.g. http://otel-collector:4318. |
| metrics_interval_secs | integer | 60 | Push interval for OTLP metrics in seconds. |

server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: http://otel-collector:4318
  metrics_interval_secs: 30

Diagnostics API

When server.diagnostics is configured, logfwd exposes an HTTP API for monitoring and troubleshooting.

| Route | Method | Description |
|---|---|---|
| / | GET | Dashboard HTML (visual explorer for metrics and traces). |
| /health | GET | Liveness probe. Returns 200 OK if the server is running. |
| /ready | GET | Readiness probe. Returns 200 OK once pipelines are initialized. |
| /api/pipelines | GET | Per-pipeline counters (lines, bytes, errors, batches, stage timing). |
| /api/stats | GET | Aggregate process stats (uptime, RSS, CPU, aggregate line counts). |
| /api/config | GET | Currently loaded YAML configuration and its file path. |
| /api/logs | GET | Recent log lines from logfwd's own stderr (ring buffer). |
| /api/history | GET | Time-series data (1-hour window) for dashboard charts. |
| /api/traces | GET | Recent batch processing spans for detailed latency analysis. |

Note: The /metrics (Prometheus) endpoint was removed in favor of /api/pipelines. It returns 410 Gone. The /api/system route mentioned in some older documentation does not exist.


Storage configuration

The optional storage block controls where logfwd persists state (checkpoints, disk queue).

| Field | Type | Default | Description |
|---|---|---|---|
| data_dir | string | none | Directory for state files. Created if it does not exist. |

storage:
  data_dir: /var/lib/logfwd

Environment variable substitution

Any value in the config file can reference an environment variable with ${VAR}:

output:
  type: otlp
  endpoint: ${OTEL_COLLECTOR_ADDR}

server:
  metrics_endpoint: ${METRICS_PUSH_URL}

If the variable is not set, the placeholder is left as-is (no error).
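
The substitution rule can be sketched in a few lines (an illustrative model of the behaviour described above, not logfwd's code):

```python
import os
import re

# Sketch of ${VAR} expansion: set variables are substituted,
# unset ones leave the placeholder intact (no error).
def expand(text: str) -> str:
    def repl(m: re.Match) -> str:
        value = os.environ.get(m.group(1))
        return value if value is not None else m.group(0)
    return re.sub(r"\$\{(\w+)\}", repl, text)

os.environ["OTEL_COLLECTOR_ADDR"] = "otel-collector:4317"
substituted = expand("endpoint: ${OTEL_COLLECTOR_ADDR}")  # variable is set
untouched = expand("endpoint: ${UNSET_VAR}")              # placeholder kept as-is
```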


Complete example

pipelines:
  app:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: |
      SELECT
        l.level_str,
        l.message_str,
        l.status_int,
        k.namespace,
        k.pod_name,
        k.container_name,
        lbl.environment
      FROM logs l
      LEFT JOIN k8s k ON l._file_str = k.log_path_prefix
      CROSS JOIN labels lbl
      WHERE l.level_str IN ('ERROR', 'WARN')
        OR l.status_int >= 500
    outputs:
      - name: collector
        type: otlp
        endpoint: ${OTEL_ENDPOINT}
        protocol: grpc
        compression: zstd
      - name: debug
        type: stdout
        format: console

enrichment:
  k8s:
    type: k8s_path
  labels:
    type: static
    fields:
      environment: ${ENVIRONMENT}
      cluster: ${CLUSTER_NAME}

server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: ${OTEL_ENDPOINT}
  metrics_interval_secs: 60

storage:
  data_dir: /var/lib/logfwd