Configuration Reference

FastForward commands that operate on pipeline config (for example run, validate, dry-run, and effective-config) accept a YAML file via --config <config.yaml>. ff send instead accepts a destination-only config with a top-level output, injects stdin as the input, drains the output, and exits.

FastForward pipeline configs use a top-level pipelines map. Each named pipeline defines inputs, optional transform, optional enrichment, optional resource_attrs, and outputs.

Environment variables are expanded using ${VAR} syntax anywhere in the file. If a variable is not set, config loading fails fast with a validation error.
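As a minimal sketch of ${VAR} expansion, the fragment below reads a collector endpoint and a bearer token from the environment. OTEL_ENDPOINT and OTEL_TOKEN are hypothetical variable names; if either is unset when the file is loaded, loading fails with a validation error.

```yaml
pipelines:
  default:
    inputs:
      - type: stdin
        format: json
    outputs:
      - type: otlp
        # Both values are substituted at load time; OTEL_ENDPOINT and
        # OTEL_TOKEN are hypothetical variable names.
        endpoint: "${OTEL_ENDPOINT}"
        auth:
          bearer_token: "${OTEL_TOKEN}"
```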


pipelines:
  errors:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: SELECT * FROM logs WHERE level = 'ERROR'
    outputs:
      - type: otlp
        endpoint: http://otel-collector:4318/v1/logs
  debug:
    inputs:
      - type: file
        path: /var/log/pods/**/*.log
        format: cri
    outputs:
      - type: stdout
        format: json
server:
  diagnostics: 0.0.0.0:9090

Each pipeline requires at least one input. Use a single mapping for one input or a YAML sequence for multiple inputs.

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Input type. See Input types. |
| name | string | No | Friendly name shown in diagnostics. |
| format | string | No | Log format. See Formats. Defaults to auto. |
| source_metadata | string | No | Source metadata style. Defaults to none. Use fastforward for the internal __source_id, ecs for public ECS columns such as file.path, otel for public OpenTelemetry columns such as log.file.path, or vector for public Vector-style columns such as file. Public source path styles require inputs that expose source path snapshots: file inputs expose filesystem paths, and S3 inputs expose object keys. |

Tail one or more log files that match a glob pattern.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Glob pattern, e.g. /var/log/pods/**/*.log. |
| poll_interval_ms | integer | No | How often to poll the file when tailing, in milliseconds (default: 50). |
| read_buf_size | integer | No | Buffer size for file reads in bytes (default: 262144, max: 4194304). |
| per_file_read_budget_bytes | integer | No | Maximum bytes read per file per poll (default: 262144). |
| adaptive_fast_polls_max | integer | No | Maximum number of immediate repolls after a file hits its per-poll read budget (default: 8; set 0 to disable adaptive fast repolls). |

input:
  type: file
  path: /var/log/pods/**/*.log
  format: cri
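A sketch of a file input with the tailing knobs set explicitly. The values are illustrative, not recommendations: read_buf_size stays under the 4194304-byte maximum, and adaptive_fast_polls_max: 0 turns off adaptive fast repolls. The /var/log/app/*.log path and the app_logs name are hypothetical.

```yaml
input:
  type: file
  name: app_logs                      # hypothetical name shown in diagnostics
  path: /var/log/app/*.log            # hypothetical path
  format: json
  source_metadata: ecs                # expose each row's path as "file.path"
  poll_interval_ms: 250               # poll every 250 ms (default: 50)
  read_buf_size: 1048576              # 1 MiB reads (default: 262144, max: 4194304)
  per_file_read_budget_bytes: 524288  # read at most 512 KiB per file per poll
  adaptive_fast_polls_max: 0          # no immediate repolls after a budget hit
```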

Read objects from AWS S3 or an S3-compatible endpoint. The S3 input either polls ListObjectsV2 by prefix or, when s3.sqs_queue_url is set, processes object notifications from SQS. S3 support requires a build with the s3 feature enabled.

| Field | Type | Required | Description |
|---|---|---|---|
| s3.bucket | string | Yes | Bucket name. |
| s3.region | string | No | AWS region. Defaults to us-east-1. |
| s3.endpoint | string | No | S3-compatible endpoint URL, for example http://localhost:9000. Path-style addressing is used when set. |
| s3.prefix | string | No | Only process object keys with this prefix. |
| s3.sqs_queue_url | string | No | SQS queue URL for event-driven object discovery. Omit to poll the bucket prefix. |
| s3.start_after | string | No | Initial ListObjectsV2 StartAfter key for prefix polling. |
| s3.access_key_id | string | No | AWS access key ID. Falls back to AWS_ACCESS_KEY_ID. |
| s3.secret_access_key | string | No | AWS secret access key. Falls back to AWS_SECRET_ACCESS_KEY. |
| s3.session_token | string | No | AWS session token. Falls back to AWS_SESSION_TOKEN. |
| s3.part_size_bytes | integer | No | Range-GET part size in bytes. Defaults to 1048576. |
| s3.max_concurrent_fetches | integer | No | Maximum concurrent range GETs per object. Defaults to 16. |
| s3.max_concurrent_objects | integer | No | Maximum objects fetched at once. Defaults to 4. |
| s3.visibility_timeout_secs | integer | No | SQS visibility timeout in seconds. Defaults to 300 and must be at least 30. |
| s3.compression | enum | No | Compression override: auto, gzip, zstd, snappy, or none. Defaults to auto. |
| s3.poll_interval_ms | integer | No | Prefix polling interval in milliseconds. Defaults to 5000. |

When source_metadata is ecs, otel, or vector, the S3 input exposes each row's object key as the source path value. fastforward attaches only the internal __source_id, and none attaches no source metadata.

input:
  type: s3
  format: json
  source_metadata: ecs
  s3:
    bucket: app-logs
    region: us-east-1
    prefix: prod/
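For event-driven discovery, set s3.sqs_queue_url instead of relying on prefix polling. A sketch, assuming a hypothetical queue URL and region; the credential fields are omitted, so the input falls back to the standard AWS environment variables listed in the table.

```yaml
input:
  type: s3
  format: json
  s3:
    bucket: app-logs
    region: eu-west-1
    prefix: prod/
    # Hypothetical queue: objects are discovered from SQS notifications
    # instead of ListObjectsV2 prefix polling.
    sqs_queue_url: https://sqs.eu-west-1.amazonaws.com/123456789012/app-logs-events
    visibility_timeout_secs: 600   # must be at least 30
    compression: gzip              # override auto-detection
    max_concurrent_objects: 8
    # No access_key_id / secret_access_key / session_token here, so
    # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN are used.
```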

Read data from standard input. This input has no input-specific fields. It is primarily used by ff send for command-line piping.

input:
  type: stdin
  format: auto

Supported formats: auto, cri, json, and raw.
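When stdin is used through ff send, the config carries only the destination: a top-level output, with stdin injected as the input. A minimal sketch that forwards piped lines to a hypothetical OTLP collector:

```yaml
# Destination-only config for ff send: no pipelines or inputs,
# because stdin is injected as the input.
output:
  type: otlp
  endpoint: http://otel-collector:4318/v1/logs   # hypothetical collector
  protocol: http
```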

Emit synthetic records for benchmarks, demos, and pipeline tests.

| Field | Type | Required | Description |
|---|---|---|---|
| generator | object | No | Generator settings. If omitted, runtime defaults are used, including batch_size: 1000 and events_per_sec: 0. See Input Types for the detailed nested fields. |
| generator.profile | string | No | Generator profile: logs (default), record, envoy, cri_k8s, wide, narrow, cloud_trail. See Generator profiles. |

input:
  type: generator
  generator:
    events_per_sec: 50000
    batch_size: 4096
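For a benchmark baseline, a generator input can be paired with a "null" output so the sink performs no I/O. A sketch using the cri_k8s profile; the pipeline name bench and the rate values are arbitrary.

```yaml
pipelines:
  bench:
    inputs:
      - type: generator
        generator:
          profile: cri_k8s        # see Generator profiles
          events_per_sec: 100000
          batch_size: 4096
    transform: SELECT * FROM logs
    outputs:
      - type: "null"              # must be quoted; bare null is YAML's null value
```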

Listen for log lines on a UDP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:514. |

input:
  type: udp
  listen: 0.0.0.0:514
  format: json

Accept log lines on a TCP socket.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:5140. |
| tls | object | No | Optional server TLS options (cert_file, key_file, client_ca_file, require_client_auth). client_ca_file is valid only when require_client_auth: true. |

input:
  type: tcp
  listen: 0.0.0.0:5140
  format: json
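A sketch of a TLS-terminating TCP listener that also requires client certificates (mutual TLS). The certificate paths and port 6514 are hypothetical.

```yaml
input:
  type: tcp
  listen: 0.0.0.0:6514
  format: json
  tls:
    cert_file: /etc/ffwd/tls/server.crt      # hypothetical paths
    key_file: /etc/ffwd/tls/server.key
    # client_ca_file is only valid together with require_client_auth: true
    require_client_auth: true
    client_ca_file: /etc/ffwd/tls/clients-ca.crt
```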

Receive OTLP log records from another agent or SDK.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:4318. |
| protobuf_decode_mode | string | No | Experimental protobuf decoder: prost (default), projected_fallback, or projected_only. Projected modes require a build with the otlp-research feature. |

input:
  type: otlp
  listen: 0.0.0.0:4318
  protobuf_decode_mode: prost

Receive newline-delimited payloads over HTTP POST.

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:8081. |
| http.path | string | No | Route path. Must start with /. Defaults to /. |
| http.strict_path | boolean | No | When true (default), require an exact path match. |
| http.method | string | No | Accepted method. Defaults to POST. |
| http.max_request_body_size | integer | No | Maximum request body size in bytes. Defaults to 10 MiB. |
| http.max_drained_bytes_per_poll | integer | No | Maximum bytes drained from the internal request queue per poll. Defaults to 1 GiB. |
| http.response_code | integer | No | Success code. One of 200, 201, 202, 204 (default 200). |
| http.response_body | string | No | Optional static success response body. Not allowed when http.response_code: 204. |

input:
  type: http
  listen: 0.0.0.0:8081
  format: json
  http:
    path: /ingest
    strict_path: true
    method: POST
    max_request_body_size: 10485760
    max_drained_bytes_per_poll: 1073741824
    response_code: 200
    response_body: '{"ok":true}'

Linux eBPF sensor input for platform-native ingestion. This input is Arrow-native and does not support format.

| Field | Type | Required | Description |
|---|---|---|---|
| sensor.poll_interval_ms | integer | No | Periodic sample cadence in milliseconds. Must be >= 1. Defaults to 10000. |
| sensor.control_path | string | No | Optional JSON control-plane file path for runtime reload. |
| sensor.control_reload_interval_ms | integer | No | Reload check interval in milliseconds. Must be >= 1. Defaults to 1000. |
| sensor.enabled_families | array[string] | No | Optional enabled signal families for this target (process, file, network, dns, authz on Linux). Omit to use defaults; set [] to disable all families. |
| sensor.emit_signal_rows | boolean | No | Emit periodic per-family sample rows. Defaults to true. |

input:
  type: linux_ebpf_sensor
  sensor:
    poll_interval_ms: 2000
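A sketch that narrows the Linux sensor to a subset of families and enables runtime reload from a control file. The control file path is hypothetical, and its JSON schema is not covered here.

```yaml
input:
  type: linux_ebpf_sensor
  sensor:
    poll_interval_ms: 5000
    # Collect only these families; [] would disable all of them.
    enabled_families: [process, network, dns]
    emit_signal_rows: false                       # no periodic per-family sample rows
    control_path: /etc/ffwd/sensor-control.json   # hypothetical path
    control_reload_interval_ms: 2000
```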

macOS EndpointSecurity sensor input. This input is Arrow-native and does not support format.

| Field | Type | Required | Description |
|---|---|---|---|
| sensor.poll_interval_ms | integer | No | Periodic sample cadence in milliseconds. Must be >= 1. Defaults to 10000. |
| sensor.control_path | string | No | Optional JSON control-plane file path for runtime reload. |
| sensor.control_reload_interval_ms | integer | No | Reload check interval in milliseconds. Must be >= 1. Defaults to 1000. |
| sensor.enabled_families | array[string] | No | Optional enabled signal families (process, file, network, dns, module, authz on macOS). Omit to use defaults; set [] to disable all families. |
| sensor.emit_signal_rows | boolean | No | Emit periodic per-family sample rows. Defaults to true. |

input:
  type: macos_es_sensor

Windows eBPF sensor input. This input is Arrow-native and does not support format.

| Field | Type | Required | Description |
|---|---|---|---|
| sensor.poll_interval_ms | integer | No | Periodic sample cadence in milliseconds. Must be >= 1. Defaults to 10000. |
| sensor.control_path | string | No | Optional JSON control-plane file path for runtime reload. |
| sensor.control_reload_interval_ms | integer | No | Reload check interval in milliseconds. Must be >= 1. Defaults to 1000. |
| sensor.enabled_families | array[string] | No | Optional enabled signal families (process, file, network, dns, module, registry, authz on Windows). Omit to use defaults; set [] to disable all families. |
| sensor.emit_signal_rows | boolean | No | Emit periodic per-family sample rows. Defaults to true. |

input:
  type: windows_ebpf_sensor

Read macOS unified log entries by running the log stream command. This input is available only on macOS and emits structured rows parsed from the command output.

| Field | Type | Required | Description |
|---|---|---|---|
| macos_log.level | string | No | Optional log level filter. Must not be empty when set. |
| macos_log.subsystem | string | No | Optional subsystem filter. Must not be empty when set. |
| macos_log.process | string | No | Optional process filter. Must not be empty when set. |

pipelines:
  default:
    inputs:
      - type: macos_log
        macos_log:
          level: info
          subsystem: com.example.app
    outputs:
      - type: stdout
        format: json

Read structured entries from the systemd journal using either the native sd_journal C API or a journalctl subprocess.

| Field | Type | Required | Description |
|---|---|---|---|
| journald.include_units | list | No | Systemd units to include. The .service suffix is appended automatically if omitted. |
| journald.exclude_units | list | No | Systemd units to exclude. |
| journald.identifiers | list | No | Syslog identifiers (SYSLOG_IDENTIFIER=) to include. |
| journald.priorities | list | No | Priority/log levels to include (e.g. 0, 3, info, err). |
| journald.cursor_path | string | No | Path to persist the cursor for resume after restarts. |
| journald.include_boot_id | bool | No | Include the _BOOT_ID field (default: false). |
| journald.current_boot_only | bool | No | Only include entries from the current boot (default: true). |
| journald.since_now | bool | No | Only include entries appended after start (default: false). |
| journald.journalctl_path | string | No | Path to the journalctl binary. Defaults to journalctl on PATH. |
| journald.journal_directory | string | No | Custom journal directory (--directory=<path>). |
| journald.journal_namespace | string | No | Journal namespace (--namespace=<ns>). |
| journald.backend | enum | No | auto (default), native (require the sd_journal API), or subprocess (always use journalctl). |

pipelines:
  default:
    inputs:
      - type: journald
        journald:
          include_units:
            - nginx
            - redis
          priorities:
            - err
            - warning
          cursor_path: /var/lib/ffwd/journald.cursor
    outputs:
      - type: stdout
        format: json
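When the agent runs in a container with the host journal mounted in, forcing the subprocess backend and pointing it at the mounted directory is one option. A sketch, assuming a hypothetical /host/var/log/journal mount and a journalctl binary at /usr/bin/journalctl inside the container:

```yaml
pipelines:
  default:
    inputs:
      - type: journald
        journald:
          backend: subprocess                       # always shell out to journalctl
          journalctl_path: /usr/bin/journalctl
          journal_directory: /host/var/log/journal  # hypothetical host mount
          identifiers: [sshd, sudo]
          since_now: true                           # skip entries written before start
          current_boot_only: true
    outputs:
      - type: stdout
        format: json
```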

Host metrics input that collects process snapshots, CPU, memory, and network statistics via sysinfo. This input is Arrow-native and does not support format. The OS-specific implementation is selected at compile time based on the build target.

| Field | Type | Required | Description |
|---|---|---|---|
| sensor.poll_interval_ms | integer | No | Periodic sample cadence in milliseconds. Must be >= 1. Defaults to 10000. |
| sensor.control_path | string | No | Optional JSON control-plane file path for runtime reload. |
| sensor.control_reload_interval_ms | integer | No | Reload check interval in milliseconds. Must be >= 1. Defaults to 1000. |
| sensor.enabled_families | array[string] | No | Optional enabled signal families. Omit to use defaults; set [] to disable all families. |
| sensor.emit_signal_rows | boolean | No | Emit periodic per-family sample rows. Defaults to true. |
| sensor.max_rows_per_poll | integer | No | Upper bound on data rows returned per collection cycle. Defaults to 256. Set to 0 or omit for the default. |
| sensor.max_process_rows_per_poll | integer | No | Upper bound on process snapshot rows returned per collection cycle. Defaults to 1024. Set to 0 or omit for the default. |
| sensor.scrapers | array[string] | No | List of scrapers to run. Supported values are: cpu, memory, disk, network, filesystem. |
| sensor.collection_interval_ms | integer | No | Metrics collection cadence in milliseconds. Defaults to 10000. |
| sensor.disk_include_devices | array[string] | No | Optional list of disk devices to include in scraping. |
| sensor.disk_exclude_devices | array[string] | No | Optional list of disk devices to exclude from scraping. |
| sensor.network_include_interfaces | array[string] | No | Optional list of network interfaces to include in scraping. |
| sensor.network_exclude_interfaces | array[string] | No | Optional list of network interfaces to exclude from scraping. |
| sensor.filesystem_include_mount_points | array[string] | No | Optional list of filesystem mount points to include in scraping. |
| sensor.filesystem_exclude_mount_points | array[string] | No | Optional list of filesystem mount points to exclude from scraping. |

input:
  type: host_metrics
  sensor:
    poll_interval_ms: 5000
    collection_interval_ms: 5000
    scrapers: ["cpu", "memory"]
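A sketch that runs all five scrapers but filters out noisy devices, interfaces, and pseudo-filesystems. The device, interface, and mount point names are examples from a typical Linux host; whether entries are matched exactly or as patterns is not specified here, so treat them as exact names.

```yaml
input:
  type: host_metrics
  sensor:
    collection_interval_ms: 15000
    scrapers: [cpu, memory, disk, network, filesystem]
    disk_exclude_devices: [loop0, loop1]              # example device names
    network_exclude_interfaces: [lo, docker0]         # example interface names
    filesystem_exclude_mount_points: [/proc, /sys, /run]
    max_process_rows_per_poll: 512                    # below the 1024 default
```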

Receive Arrow IPC stream payloads over HTTP POST and forward decoded RecordBatch values directly into the pipeline (scanner bypass).

| Field | Type | Required | Description |
|---|---|---|---|
| listen | string | Yes | host:port, e.g. 0.0.0.0:4319. |

Behavior:

  • The route is fixed to POST /v1/arrow in the current MVP.
  • arrow_ipc is Arrow-native and rejects format.
  • Canonical payload types are application/vnd.apache.arrow.stream and application/vnd.apache.arrow.stream+zstd.
  • Content-Encoding: zstd is also supported for compressed Arrow stream payloads.
  • The receiver currently decodes by payload bytes and may still accept requests with missing/other content-type headers; use canonical content types for predictable interoperability.
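A sketch of a pipeline that accepts Arrow IPC streams and forwards them over OTLP. Clients would POST to http://<host>:4319/v1/arrow with Content-Type application/vnd.apache.arrow.stream; the pipeline name and the collector endpoint are hypothetical.

```yaml
pipelines:
  arrow:
    inputs:
      - type: arrow_ipc
        listen: 0.0.0.0:4319     # serves POST /v1/arrow
        # No format field: arrow_ipc is Arrow-native and rejects it.
    outputs:
      - type: otlp
        endpoint: http://otel-collector:4317   # hypothetical collector
        protocol: grpc
```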

| Value | Status | Description |
|---|---|---|
| file | Implemented | Tail files matching a glob pattern. |
| s3 | Implemented | Read objects from AWS S3 or an S3-compatible endpoint. |
| stdin | Implemented | Read piped stdin until EOF, then drain outputs and exit. |
| generator | Implemented | Emit synthetic JSON-like records from an in-process source. |
| udp | Implemented | Receive log lines over UDP. |
| tcp | Implemented | Accept log lines over TCP. |
| otlp | Implemented | Receive OTLP logs on a bound listen address. |
| http | Implemented | Receive newline-delimited payloads via HTTP POST. |
| linux_ebpf_sensor | Implemented | Linux eBPF sensor input (Arrow-native control + signal rows). |
| macos_es_sensor | Implemented | macOS EndpointSecurity sensor input (Arrow-native control + signal rows). |
| macos_log | Implemented | Read macOS unified log entries from the log stream command. |
| windows_ebpf_sensor | Implemented | Windows eBPF sensor input (Arrow-native control + signal rows). |
| journald | Beta | Read structured journal entries from systemd journald. |
| host_metrics | Implemented | Host metrics input: process snapshots and CPU, memory, and network stats via sysinfo (Arrow-native). |
| arrow_ipc | Implemented | Receive Arrow IPC stream batches via HTTP POST /v1/arrow. |

The format field controls how raw bytes from the input are parsed into log records. linux_ebpf_sensor, macos_es_sensor, windows_ebpf_sensor, host_metrics, and arrow_ipc are Arrow-native and reject format.

| Value | Description |
|---|---|
| auto | Auto-detect (default). Tries CRI first, then JSON, then raw. |
| cri | CRI container log format (<timestamp> <stream> <flags> <message>). Multi-line log reassembly via the P partial flag is supported. |
| json | Newline-delimited JSON. Each line must be a single JSON object. |
| raw | Treat each line as an opaque string stored in body. |
| logfmt | Key=value pairs (e.g. level=info msg="hello"). Not yet implemented. |
| console | Human-readable coloured output for interactive debugging. Output mode only. |

Each pipeline requires at least one output.

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Output type. See Output types. |
| name | string | No | Friendly name shown in diagnostics. |

Send log records as OTLP protobuf to an OpenTelemetry collector.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | Yes | | Full collector URL, e.g. http://otel-collector:4317 (gRPC) or http://otel-collector:4318/v1/logs (HTTP). |
| protocol | enum | No | http | http or grpc. Invalid values are rejected while parsing config. |
| compression | enum | No | none | zstd, gzip, or none for the request body. Invalid values are rejected while parsing config. |
| auth | object | No | | Optional bearer token or custom headers for HTTP auth. |
| tls | object | No | | Optional TLS client options (ca_file, cert_file, key_file, insecure_skip_verify) for HTTPS endpoints. |
| headers | map[string,string] | No | | Additional static HTTP headers to send with every export request. |
| retry_attempts | integer | No | | Maximum export retry attempts. |
| retry_initial_backoff_ms | integer | No | | Initial backoff delay in milliseconds. |
| retry_max_backoff_ms | integer | No | | Maximum backoff delay in milliseconds. |
| request_timeout_ms | integer | No | | Export request timeout in milliseconds. |
| batch_size | integer | No | | Maximum rows per OTLP request. |
| batch_timeout_ms | integer | No | | Maximum time to buffer rows before exporting. |

pipelines:
  default:
    inputs:
      - type: stdin
        format: json
    outputs:
      - type: otlp
        endpoint: http://otel-collector:4317
        protocol: grpc
        compression: zstd
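The OTLP table leaves retry and batching defaults unstated, so the sketch below sets them explicitly for an HTTPS gateway. The endpoint, CA path, header name, and OTLP_TOKEN variable are hypothetical, and the bearer_token key assumes the same auth shape used in the Elasticsearch and Loki examples.

```yaml
output:
  type: otlp
  endpoint: https://otel-gateway.example.com:4318/v1/logs   # hypothetical
  protocol: http
  compression: gzip
  tls:
    ca_file: /etc/ffwd/tls/gateway-ca.crt
  headers:
    x-team: platform              # static header on every export request
  auth:
    bearer_token: "${OTLP_TOKEN}"
  retry_attempts: 5
  retry_initial_backoff_ms: 200
  retry_max_backoff_ms: 10000
  request_timeout_ms: 15000
  batch_size: 2000                # rows per OTLP request
  batch_timeout_ms: 1000          # buffer rows for at most 1 s
```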

Send newline-delimited JSON rows to an HTTP endpoint with optional request-body compression and auth headers.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Full URL, e.g. http://ingest.example.com/logs. |
| format | enum | No | Must be json when set. |
| compression | enum | No | zstd, gzip, or none. |
| auth | object | No | Optional bearer token or custom headers for HTTP auth. |

pipelines:
  default:
    inputs:
      - type: stdin
        format: json
    outputs:
      - type: http
        endpoint: https://ingest.example.com/logs
        compression: zstd

Print records to standard output for local debugging.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| format | string | No | console | json (newline-delimited JSON), console (coloured text), or text (raw text). |

output:
  type: stdout
  format: console

Ship to Elasticsearch via the Bulk API.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | Yes | | Elasticsearch base URL. |
| index | string | No | logs | Target index name. Must not be empty, and must not contain Elasticsearch-reserved characters or prefixes. |
| compression | enum | No | none | gzip or none. zstd is rejected for Elasticsearch by validation. |
| request_mode | enum | No | buffered | buffered or streaming. Invalid values are rejected while parsing config; streaming currently requires compression: none. |
| request_timeout_ms | integer | No | 30000 | HTTP request timeout in milliseconds. Must be >= 1 when set. |
| tls | object | No | | Optional TLS client options (ca_file, cert_file, key_file, insecure_skip_verify) for HTTPS endpoints. |
| auth | object | No | | Optional bearer token or custom headers for HTTP auth. |
| retry | object | No | | Optional retry configuration (max_attempts, initial_backoff_secs, max_backoff_secs). |
| batch | object | No | | Optional batching configuration (max_bytes, max_events, timeout_secs). |

Bulk payloads are split before they exceed 5242880 bytes (5 MiB). That limit is internal and is not a YAML field.

output:
  type: elasticsearch
  endpoint: https://es-cluster:9200
  index: logs
  compression: gzip
  request_mode: buffered
  auth:
    bearer_token: "${ES_TOKEN}"
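A sketch of the streaming request mode with explicit retry and batch objects. The index name, CA path, and numeric values are hypothetical; streaming is paired with compression: none as the table requires.

```yaml
output:
  type: elasticsearch
  endpoint: https://es-cluster:9200
  index: app-logs
  request_mode: streaming
  compression: none              # streaming currently requires none
  request_timeout_ms: 60000
  tls:
    ca_file: /etc/ffwd/tls/es-ca.crt   # hypothetical path
  retry:
    max_attempts: 5
    initial_backoff_secs: 1
    max_backoff_secs: 30
  batch:
    max_events: 5000
    max_bytes: 4194304           # bulk payloads are still split before 5 MiB
    timeout_secs: 5
```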

Push to Grafana Loki.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Loki base URL. The /loki/api/v1/push path is appended automatically. |
| request_timeout_ms | integer | No | HTTP request timeout in milliseconds (default: 30000). Must be >= 1 when set. |
| tls | object | No | Optional TLS client options (ca_file, cert_file, key_file, insecure_skip_verify) for HTTPS endpoints. |
| tenant_id | string | No | Optional value sent as X-Scope-OrgID for multi-tenant Loki deployments. |
| static_labels | map[string,string] | No | Static labels applied to every pushed log stream. Keys and values must be non-empty. |
| label_columns | array[string] | No | Additional log columns to promote as Loki labels. |
| auth | object | No | Optional bearer token or custom headers for HTTP auth. |
| retry | object | No | Optional retry configuration (max_attempts, initial_backoff_secs, max_backoff_secs). |
| batch | object | No | Optional batching configuration (max_bytes, max_events, timeout_secs). |

output:
  type: loki
  endpoint: http://loki:3100
  tenant_id: team-a
  static_labels:
    app: ffwd
    env: prod
  label_columns: [service, level]
  auth:
    bearer_token: "${LOKI_TOKEN}"

The compression field is not supported for Loki outputs.

Write records to a file.

| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Destination file path. The parent directory must already exist and be writable. |
| format | string | No | json for NDJSON output, or text to write raw lines. |

output:
  type: file
  path: /var/log/ffwd/capture.ndjson
  format: json

Send newline-delimited JSON records to a TCP endpoint.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Host:port destination (for example tcp.example.com:9000). |

TCP output currently emits newline-delimited JSON. encoding, framing, tls, keepalive, timeout_secs, retry, and batch are rejected until the sink runtime implements them.

Send newline-delimited JSON records as UDP datagrams.

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Host:port destination (for example udp.example.com:514). |

UDP output currently emits newline-delimited JSON using the built-in datagram size. encoding and max_datagram_size_bytes are rejected until the sink runtime implements them.
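Neither output takes more than an endpoint today. A sketch of one pipeline that receives syslog-style lines over UDP and fans out to both network outputs; the pipeline name is arbitrary and the hostnames are the placeholders from the tables.

```yaml
pipelines:
  fanout:
    inputs:
      - type: udp
        listen: 0.0.0.0:514
        format: raw
    outputs:
      # Both sinks emit newline-delimited JSON; fields such as tls, retry,
      # or max_datagram_size_bytes would be rejected.
      - type: tcp
        endpoint: tcp.example.com:9000
      - type: udp
        endpoint: udp.example.com:514
```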

Drop records intentionally for tests and benchmark baselines. The type value must be quoted as a YAML string; unquoted type: null is YAML’s null value and is rejected. Null outputs do not accept sink-specific fields such as endpoint, format, auth, tls, retry, or batch controls.

output:
  type: "null"

| Value | Status | Description |
|---|---|---|
| otlp | Implemented | OTLP protobuf over HTTP or gRPC. |
| http | Implemented | POST newline-delimited JSON rows to an HTTP endpoint. |
| stdout | Implemented | Print to stdout (JSON, console, or text). |
| elasticsearch | Implemented | Elasticsearch Bulk API with index/compression/request-mode controls. |
| loki | Implemented | Grafana Loki push API with label grouping. |
| file | Implemented | Write NDJSON or text to a local file. |
| null | Implemented | Drop records intentionally for tests and benchmark baselines. |
| tcp | Implemented | Send records to a TCP endpoint. |
| udp | Implemented | Send records to a UDP endpoint. |
| arrow_ipc | Implemented | Send Arrow IPC payloads to an HTTP endpoint. |

The optional transform field contains a DataFusion SQL query that is applied to every Arrow RecordBatch produced by the scanner. The source table is always named logs.

transform: SELECT level, message, status FROM logs WHERE status >= 400

Multi-line SQL is supported with YAML block scalars:

transform: |
  SELECT
    level,
    message,
    regexp_extract(message, 'request_id=([a-f0-9-]+)', 1) AS request_id,
    status
  FROM logs
  WHERE level IN ('ERROR', 'WARN')
    AND status >= 400

The scanner maps each JSON field to a typed Arrow column using the field’s base name (no type suffix):

| JSON value type | Arrow column type | Column name | Example |
|---|---|---|---|
| String | StringArray | {field} | level |
| Integer | Int64Array | {field} | status |
| Float | Float64Array | {field} | latency_ms |
| Boolean | StringArray ("true"/"false") | {field} | enabled |
| Null | null in column | {field} | |
| Object / Array | StringArray (raw JSON) | {field} | metadata |

When a field contains mixed types across rows, the scanner emits a single Struct column under the field’s base name containing one child per observed type (e.g., a status Struct with int and str children). Legacy single-underscore suffixed columns (status_int, level_str) are not emitted.

Special columns attached by the runtime after scan, plus format-derived columns:

| Column | Type | Description |
|---|---|---|
| body | string | Original input line (when input line capture is enabled, e.g. line_field: body, or when a non-JSON CRI line is wrapped for scanner safety). |
| __source_id | uint64 | FastForward internal row-level source identity when source_metadata: fastforward is set. SQL can reference it, but user-facing sinks drop this internal column unless SQL aliases it to a public name. |
| file.path | string | ECS-style source file path when source_metadata: ecs is set. Quote it in SQL as "file.path". |
| log.file.path | string | OpenTelemetry-style source file path when source_metadata: otel is set. Quote it in SQL as "log.file.path". |
| file | string | Vector-style source file path when source_metadata: vector is set. |
| _timestamp | string | Timestamp from the CRI header as an RFC 3339 string (CRI inputs only). |
| _stream | string | CRI stream name (stdout / stderr). |

Source metadata is never written into raw input bytes. It is carried beside scanner-ready chunks and, when source_metadata is not none, materialized as table columns before SQL runs. SQL does no hidden pruning or widening: SELECT * returns the columns that exist in the table. User-facing sinks drop known FastForward internal columns such as __source_id by default; alias an internal column to a public name when it should be emitted. User payload fields that happen to start with __ are not treated as internal. Public source path styles currently require inputs that expose source path snapshots. File inputs use filesystem paths; S3 inputs use object keys. Use fastforward for source identity on inputs that do not expose public source descriptors.
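A sketch of both approaches: the first pipeline reads the ECS path column, and the second emits the internal source identity under a public alias. The log path, the level and message fields, and the alias names are hypothetical.

```yaml
pipelines:
  by_path:
    inputs:
      - type: file
        path: /var/log/app/*.log
        format: json
        source_metadata: ecs
    # Dotted public columns must be double-quoted in SQL.
    transform: SELECT level, message, "file.path" AS source_file FROM logs
    outputs:
      - type: stdout
        format: json
  by_id:
    inputs:
      - type: file
        path: /var/log/app/*.log
        format: json
        source_metadata: fastforward
    # The sink drops __source_id itself; the alias source_id is emitted.
    transform: SELECT *, __source_id AS source_id FROM logs
    outputs:
      - type: stdout
        format: json
```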

| Function | Signature | Description |
|---|---|---|
| int(expr) | int(any) → int64 | Cast any value to int64. Returns NULL on failure. |
| float(expr) | float(any) → float64 | Cast any value to float64. Returns NULL on failure. |
| grok(input, pattern) | grok(utf8, utf8) → Struct | Apply a Grok pattern to input and return the captures as a struct. |
| regexp_extract(input, pattern, group) | regexp_extract(utf8, utf8, int64) → utf8 | Return the text captured by group number group in the regex match. |

Examples:

-- Cast a string column to int
SELECT int(status) AS status FROM logs
-- Extract a field with Grok
SELECT grok(message, '%{IP:client} %{WORD:method} %{URIPATHPARAM:path}') AS parsed FROM logs
-- Extract a named group with regex
SELECT regexp_extract(message, 'user=([a-z]+)', 1) AS user FROM logs
-- Cast a string column to float64
SELECT float(duration) AS duration_ms FROM logs

Enrichment tables are one-row (or multi-row) Arrow tables registered in DataFusion alongside the logs table. Use CROSS JOIN for one-row tables or LEFT JOIN for multi-row lookup tables.

enrichment:
  - type: host_info
  - type: process_info
  - type: network_info
  - type: container_info
  - type: k8s_cluster_info
  - type: k8s_path
  - type: static
    table_name: labels
    labels:
      environment: production
      region: us-east-1
  - type: kv_file
    table_name: os_release
    path: /etc/os-release
  - type: env_vars
    table_name: deploy_meta
    prefix: FFWD_META_
  - type: csv
    table_name: assets
    path: /etc/ffwd/assets.csv
  - type: jsonl
    table_name: ip_owners
    path: /etc/ffwd/ip-owners.jsonl
  - type: geo_database
    format: mmdb
    path: /data/GeoLite2-City.mmdb

System host metadata, resolved once at startup. Fixed table name: host_info.

Extended fields are sourced from /etc/os-release, /proc/sys/kernel/osrelease, /etc/machine-id, and /proc/sys/kernel/random/boot_id on Linux; they degrade to empty strings on other platforms.

| Field | Description |
|---|---|
| style | Column naming convention: raw (default), ecs / beats, or otel. |

enrichment:
  - type: host_info
    style: ecs # Use ECS/Beats dotted column names

Column names by style:

| Semantic | raw (default) | ecs / beats | otel |
|---|---|---|---|
| hostname | hostname | host.hostname | host.name |
| OS type | os_type | host.os.type | os.type |
| architecture | os_arch | host.architecture | host.arch |
| OS name | os_name | host.os.name | os.name |
| OS family | os_family | host.os.family | os.family |
| OS version | os_version | host.os.version | os.version |
| kernel | os_kernel | host.os.kernel | os.kernel |
| machine id | host_id | host.id | host.id |
| boot id | boot_id | host.boot.id | host.boot.id |

SELECT l.*, h.hostname, h.os_type, h.os_name, h.os_kernel
FROM logs l CROSS JOIN host_info h
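With style: ecs the same columns take dotted names, which must be double-quoted in SQL just like "file.path". A sketch of a complete pipeline that declares enrichment next to transform and combines host_info with a static label table; the pipeline name and label values are hypothetical.

```yaml
pipelines:
  default:
    inputs:
      - type: file
        path: /var/log/pods/**/*.log
        format: cri
    enrichment:
      - type: host_info
        style: ecs
      - type: static
        table_name: labels
        labels:
          environment: production
    # Both enrichment tables have one row, so CROSS JOIN is used.
    transform: |
      SELECT l.*, h."host.hostname", h."host.os.kernel", lbl.environment
      FROM logs l
      CROSS JOIN host_info h
      CROSS JOIN labels lbl
    outputs:
      - type: stdout
        format: json
```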

Agent self-metadata, resolved once at startup. Fixed table name: process_info.

| Column | Description |
|---|---|
| agent_name | Always ffwd. |
| agent_version | Semantic version of the running binary. |
| pid | Process ID (as string). |
| start_time | ISO 8601 UTC timestamp captured when the process_info enrichment table is constructed during pipeline startup. |

SELECT l.*, p.agent_version, p.start_time
FROM logs l CROSS JOIN process_info p

Network interface metadata from procfs, resolved once at startup. Fixed table name: network_info.

| Column | Description |
|---|---|
| hostname | System hostname. |
| primary_ipv4 | Lexicographically first non-loopback IPv4 address, or empty. On multihomed hosts this may not match the default-route interface; use all_ipv4 for full coverage. |
| primary_ipv6 | Lexicographically first non-loopback, non-link-local IPv6 address, or empty. Same caveat as primary_ipv4. |
| all_ipv4 | Comma-separated list of all non-loopback IPv4 addresses. |
| all_ipv6 | Comma-separated list of all non-loopback, non-link-local IPv6 addresses. |

SELECT l.*, n.primary_ipv4
FROM logs l CROSS JOIN network_info n

Container runtime detection from /proc/self/cgroup and /.dockerenv, resolved once at startup. Fixed table name: container_info.

| Column | Description |
|---|---|
| container_id | 64-character hex container ID, or empty if not in a container. |
| container_runtime | docker, containerd, cri-o, kubernetes, unknown, or empty. |

SELECT l.*, c.container_id, c.container_runtime
FROM logs l CROSS JOIN container_info c

Kubernetes cluster metadata from the downward API environment variables, resolved once at startup. Fixed table name: k8s_cluster_info.

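Populate these environment variables through the downward API (fieldRef) in your DaemonSet pod spec. The cluster name has no downward API field, so set K8S_CLUSTER_NAME directly or from a ConfigMap: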

env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: K8S_CLUSTER_NAME
    value: my-cluster # or from a ConfigMap
  - name: K8S_SERVICE_ACCOUNT
    valueFrom:
      fieldRef:
        fieldPath: spec.serviceAccountName

| Column | Description |
|---|---|
| namespace | Pod namespace (from the mounted service account path). |
| pod_name | Pod name (from HOSTNAME). |
| node_name | Node name (from K8S_NODE_NAME or NODE_NAME). |
| service_account | Service account name (from the K8S_SERVICE_ACCOUNT or SERVICE_ACCOUNT env var). |
| cluster_name | Cluster name (from K8S_CLUSTER_NAME or CLUSTER_NAME). |

SELECT l.*, k.namespace, k.node_name, k.cluster_name
FROM logs l CROSS JOIN k8s_cluster_info k

Parses Kubernetes pod log paths (e.g. /var/log/pods/<namespace>_<pod>_<uid>/<container>/) to extract metadata. Queries that join a source path to this table should set source_metadata: ecs or another public path style on the file input and quote dotted column names in SQL, for example "file.path". Source paths are never written into raw log bytes.

Columns exposed by the enrichment table (named k8s_pods by default; set table_name: k8s in config to query it under the shorter k8s alias):

| Column | Description |
|---|---|
| log_path_prefix | Directory prefix used as join key. |
| namespace | Kubernetes namespace. |
| pod_name | Pod name. |
| pod_uid | Pod UID. |
| container_name | Container name. |
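A sketch of joining pod metadata onto CRI file logs. The join predicate is not specified above; matching "file.path" against log_path_prefix with DataFusion's starts_with() is one plausible choice and is an assumption here, as is the pipeline name.

```yaml
pipelines:
  pods:
    inputs:
      - type: file
        path: /var/log/pods/**/*.log
        format: cri
        source_metadata: ecs      # exposes the path as "file.path"
    enrichment:
      - type: k8s_path
        table_name: k8s           # rename from the default k8s_pods
    # Assumed join: each log path starts with its pod directory prefix.
    transform: |
      SELECT l.*, k.namespace, k.pod_name, k.container_name
      FROM logs l
      LEFT JOIN k8s k ON starts_with(l."file.path", k.log_path_prefix)
    outputs:
      - type: stdout
        format: json
```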

A one-row table with user-defined label columns from the YAML config.

enrichment:
  - type: static
    table_name: labels
    labels:
      environment: production
      cluster: us-east-1
      tier: backend
SELECT l.*, lbl.environment, lbl.cluster
FROM logs l CROSS JOIN labels lbl

A one-row table populated from environment variables matching a name prefix. The prefix is stripped and the remainder lower-cased to form column names.

enrichment:
  - type: env_vars
    table_name: deploy_meta
    prefix: FFWD_META_

With FFWD_META_CLUSTER=prod and FFWD_META_REGION=us-east-1 set, the table exposes cluster and region columns.

SELECT l.*, m.cluster, m.region
FROM logs l CROSS JOIN deploy_meta m

A one-row table parsed from a KEY=value properties file. Unquoted, double-quoted, and single-quoted values are supported, and lines starting with # are comments. Column names are the keys, lower-cased.

enrichment:
  - type: kv_file
    table_name: os_release
    path: /etc/os-release
    refresh_interval: 3600 # optional: reload every N seconds; must be >= 1 when set
SELECT l.*, os.pretty_name, os.version_id
FROM logs l CROSS JOIN os_release os

Useful for /etc/os-release, .env files, or ConfigMap-mounted metadata files.

A multi-row lookup table loaded from a CSV file. All columns are UTF-8 strings and are materialized internally as Arrow Utf8View columns for SQL execution. The first row must be column headers. Empty cells are empty strings; missing trailing cells are NULL.

enrichment:
  - type: csv
    table_name: assets
    path: /etc/ffwd/assets.csv
    refresh_interval: 3600 # optional: reload every N seconds; must be >= 1 when set
SELECT l.*, a.owner, a.team
FROM logs l LEFT JOIN assets a ON l.hostname = a.hostname

A multi-row lookup table loaded from a JSON Lines file (one JSON object per line). Columns are the union of all keys across all rows.

enrichment:
  - type: jsonl
    table_name: ip_owners
    path: /etc/ffwd/ip-owners.jsonl
    refresh_interval: 1800 # optional: reload every N seconds; must be >= 1 when set
SELECT l.*, ipl.owner
FROM logs l LEFT JOIN ip_owners ipl ON l.client_ip = ipl.ip

Registers a GeoIP database for use with the geo_lookup() SQL function. Supports MaxMind MMDB and CSV IP-range formats.

# MaxMind MMDB format
enrichment:
  - type: geo_database
    format: mmdb
    path: /data/GeoLite2-City.mmdb
    refresh_interval: 86400 # optional: reload daily; must be >= 1 when set

# CSV IP-range format (DB-IP Lite compatible)
enrichment:
  - type: geo_database
    format: csv_range
    path: /data/dbip-city-lite.csv
SELECT l.*,
geo_lookup(l.client_ip).country_code AS country,
geo_lookup(l.client_ip).city AS city,
geo_lookup(l.client_ip).latitude AS lat,
geo_lookup(l.client_ip).longitude AS lon
FROM logs l

The geo_lookup() function returns a struct with these fields:

| Field | Type | Description |
|---|---|---|
| country_code | string | ISO 3166-1 two-letter code (e.g. US). |
| country_name | string | Full English country name. |
| city | string | City name. |
| region | string | State or subdivision name. |
| latitude | float | Decimal degrees. |
| longitude | float | Decimal degrees. |
| asn | integer | Autonomous System Number. |
| org | string | Organization name for the ASN. |
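With the csv_range format, each lookup has to find the IP range that contains an address. A common technique for this is binary search over sorted range starts, shown below in Python. This is a generic sketch, not a description of how FastForward implements geo_lookup(). The class name `IpRangeTable` and the record shape are hypothetical, and the sample ranges use documentation-only TEST-NET blocks with made-up records.

```python
import bisect
import ipaddress
from typing import Optional


class IpRangeTable:
    """Sorted, non-overlapping IP ranges with binary-search lookup (illustrative sketch)."""

    def __init__(self, ranges: list[tuple[str, str, dict]]) -> None:
        parsed = []
        for start, end, record in ranges:
            lo, hi = ipaddress.ip_address(start), ipaddress.ip_address(end)
            # Key on (version, int) so IPv4 and IPv6 ranges never interleave.
            parsed.append(((lo.version, int(lo)), (hi.version, int(hi)), record))
        parsed.sort(key=lambda item: item[0])
        self._starts = [item[0] for item in parsed]
        self._ranges = parsed

    def lookup(self, ip: str) -> Optional[dict]:
        addr = ipaddress.ip_address(ip)
        key = (addr.version, int(addr))
        # Index of the last range whose start is <= key, or -1 if none.
        idx = bisect.bisect_right(self._starts, key) - 1
        if idx < 0:
            return None
        _, end, record = self._ranges[idx]
        return record if key <= end else None


table = IpRangeTable([
    ("198.51.100.0", "198.51.100.255", {"city": "Beta"}),  # made-up record
    ("192.0.2.0", "192.0.2.255", {"city": "Alpha"}),        # made-up record
])
print(table.lookup("192.0.2.10"), table.lookup("203.0.113.1"))
```

Each lookup costs O(log n) comparisons after a one-time sort. This sketch assumes the ranges do not overlap.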

The optional server block controls the diagnostics server and observability settings.

| Field | Type | Default | Description |
|---|---|---|---|
| diagnostics | string | none | host:port to listen on for HTTP diagnostics. See Diagnostics API. |
| log_level | string | info | Log verbosity. One of error, warn, info, debug, trace. |
| metrics_endpoint | string | none | OTLP endpoint for periodic metrics push, e.g. http://otel-collector:4318. |
| metrics_interval_secs | integer | 60 | Push interval for OTLP metrics, in seconds. |

server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: http://otel-collector:4318
  metrics_interval_secs: 30

When server.diagnostics is configured, FastForward exposes an HTTP API for monitoring and troubleshooting.

| Route | Method | Description |
|---|---|---|
| / | GET | Dashboard HTML (visual explorer for metrics and traces). |
| /live | GET | Liveness probe. Returns 200 OK if the process and control plane are running. |
| /ready | GET | Readiness probe. Returns 200 OK when required components are initialized and in a ready health state; returns 503 while components are still starting, stopping, stopped, failed, or otherwise not ready. |
| /admin/v1/status | GET | Canonical rich status payload with live/ready state, component health, and per-pipeline counters. |
| /admin/v1/stats | GET | Aggregate process stats (uptime, RSS, CPU, aggregate line counts). |
| /admin/v1/config | GET | Currently loaded YAML configuration and its file path. Disabled by default; enable with FFWD_UNSAFE_EXPOSE_CONFIG=1. May expose sensitive values, so do not enable it in shared or production environments unless strictly required. |
| /admin/v1/logs | GET | Recent log lines from FastForward’s own stderr (ring buffer). |
| /admin/v1/history | GET | Time-series data (1-hour window) for dashboard charts. |
| /admin/v1/traces | GET | Recent batch processing spans for detailed latency analysis. |
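An external health check only needs the status codes from /live and /ready. The Python sketch below uses only the standard library. Because `urllib.request.urlopen` raises `HTTPError` for non-2xx replies, such as the 503 that /ready returns while components start, the helper catches that exception and returns its status code. The function names are hypothetical, and the sketch does not parse any response body, since this reference does not document those payload formats.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> int:
    """GET base_url + path and return the HTTP status code.

    urlopen raises HTTPError for non-2xx replies (for example 503 from /ready
    during startup), so that case is caught and its status code returned.
    Connection failures still raise URLError.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def is_ready(base_url: str) -> bool:
    """True only when /ready answers 200 OK."""
    return probe(base_url, "/ready") == 200
```

For example, with `server.diagnostics: 0.0.0.0:9090`, a check running on the same host might call `is_ready("http://127.0.0.1:9090")`. The address is an assumption about your deployment.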

For input diagnostics, bytes_total reflects source payload bytes accepted at the input boundary. For structured receivers such as OTLP, this is the accepted request-body size as received on the wire, not the in-memory Arrow batch footprint or the post-decompression payload size.


The optional storage block controls where FastForward persists state (checkpoints, disk queue).

| Field | Type | Default | Description |
|---|---|---|---|
| data_dir | string | none | Directory for state files. Created if it does not exist. |

storage:
  data_dir: /var/lib/ffwd

Any value in the config file can reference an environment variable with ${VAR}. Variable names must start with an ASCII letter or _, followed by any number of ASCII letters, digits, or _. A bare $VAR (without braces) stays literal, and default expressions such as ${VAR:fallback} are rejected because : is not valid in variable names.

output:
  type: otlp
  endpoint: ${OTEL_COLLECTOR_ADDR}
server:
  metrics_endpoint: ${METRICS_PUSH_URL}

If the variable is not set, config loading fails fast with a validation error. An unterminated reference such as ${VAR is preserved literally so existing config text is not rewritten accidentally; completed placeholders before that literal tail are still expanded.
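These rules interact in a few edge cases: a bare $VAR, an invalid name inside braces, and an unterminated tail. The Python sketch below implements them as stated here for a single string. It is not FastForward's parser. Behavior for inputs this reference does not cover, such as nested `${`, is whatever the sketch happens to do. `expand_env` is a hypothetical name.

```python
import os
import re
from typing import Mapping

# Name rule from the docs: ASCII letter or "_", then ASCII letters, digits, or "_".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_env(text: str, environ: Mapping[str, str] = os.environ) -> str:
    """Expand ${VAR} placeholders following the documented rules (illustrative sketch).

    - ${VAR} is replaced by the variable's value; an unset VAR is an error.
    - A bare $VAR is not a placeholder and stays literal.
    - ${...} whose content is not a valid name (e.g. ${VAR:fallback}) is an error.
    - An unterminated "${" and everything after it is kept literally.
    """
    out: list[str] = []
    pos = 0
    while True:
        start = text.find("${", pos)
        if start == -1:
            out.append(text[pos:])
            break
        end = text.find("}", start + 2)
        if end == -1:
            out.append(text[pos:])  # unterminated reference: keep the literal tail
            break
        name = text[start + 2:end]
        if not _NAME.fullmatch(name):
            raise ValueError(f"invalid variable reference: ${{{name}}}")
        if name not in environ:
            raise ValueError(f"environment variable not set: {name}")
        out.append(text[pos:start])
        out.append(environ[name])
        pos = end + 1
    return "".join(out)


env = {"OTEL_COLLECTOR_ADDR": "http://otel-collector:4318"}
print(expand_env("${OTEL_COLLECTOR_ADDR}/v1/logs then ${UNFINISHED", env))
```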

Environment variables are expanded as string data. The typed config schema then parses those strings into the field type, so numeric and boolean fields can read values directly from the environment without treating the env value as YAML:

pipelines:
  app:
    workers: ${FFWD_WORKERS}

String fields remain strings even when the expanded value looks like a YAML number, boolean, or null:

input:
  type: file
  path: ${LOG_PATH}

Placeholders embedded inside longer strings always remain strings:

output:
  type: file
  path: "/var/log/${SERVICE_NAME}.jsonl"
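The expand-then-parse order means the field's declared type decides how the expanded text is read, and YAML rules play no part. The Python sketch below illustrates that idea. It is not FastForward's schema code. The function name `coerce` and the accepted boolean spellings (`true`/`false`, case-insensitive) are assumptions.

```python
def coerce(raw: str, field_type: type):
    """Convert an expanded string to a schema field type (illustrative sketch).

    The text is never re-parsed as YAML: an int field accepts only an integer
    literal, and a str field keeps the text verbatim, even "null" or "true".
    The boolean spellings accepted here are an assumption.
    """
    if field_type is str:
        return raw
    if field_type is bool:
        lowered = raw.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"expected a boolean, got {raw!r}")
    if field_type is int:
        return int(raw)  # raises ValueError for text like "eight"
    raise TypeError(f"unsupported field type: {field_type!r}")


print(coerce("8", int), repr(coerce("null", str)), coerce("true", bool))
```

For example, `FFWD_WORKERS=8` feeds an integer field and produces `8`. A path variable set to `null` stays the string `"null"` because the path field is a string.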

Environment variables can also appear in mapping keys. If expansion produces duplicate keys, config loading fails.
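A key that is unique in the file can still collide with another key after expansion. The Python sketch below shows this check for a single mapping. It is illustrative only. To stay short, it handles just the simple `${NAME}` form and raises `KeyError` for an unset variable, and `expand_keys` is a hypothetical name.

```python
import re
from typing import Mapping

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_keys(mapping: Mapping[str, object], environ: Mapping[str, str]) -> dict[str, object]:
    """Expand ${NAME} in mapping keys and fail on duplicates (illustrative sketch).

    Only the simple ${NAME} form is handled; an unset variable raises KeyError.
    """
    result: dict[str, object] = {}
    for key, value in mapping.items():
        new_key = _PLACEHOLDER.sub(lambda m: environ[m.group(1)], key)
        if new_key in result:
            raise ValueError(f"duplicate key after expansion: {new_key!r}")
        result[new_key] = value
    return result


print(expand_keys({"${TEAM}": 1, "search": 2}, {"TEAM": "payments"}))
```

Calling it with `{"${TEAM}": 1, "payments": 2}` and `TEAM=payments` raises `ValueError`, because both keys become `payments`.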


pipelines:
  app:
    inputs:
      - name: pod_logs
        type: file
        path: /var/log/pods/**/*.log
        format: cri
    transform: |
      SELECT
        l.level,
        l.message,
        l.status,
        lbl.environment
      FROM logs l
      CROSS JOIN labels lbl
      WHERE l.level IN ('ERROR', 'WARN')
         OR l.status >= 500
    outputs:
      - name: collector
        type: otlp
        endpoint: ${OTEL_ENDPOINT}
        protocol: grpc
        compression: zstd
      - name: debug
        type: stdout
        format: console
    enrichment:
      - type: host_info
      - type: process_info
      - type: network_info
      - type: container_info
      - type: k8s_cluster_info
      - type: static
        table_name: labels
        labels:
          environment: ${ENVIRONMENT}
          cluster: ${CLUSTER_NAME}
server:
  diagnostics: 0.0.0.0:9090
  log_level: info
  metrics_endpoint: ${OTEL_ENDPOINT}
  metrics_interval_secs: 60
storage:
  data_dir: /var/lib/ffwd