Learn
FastForward processes log data through four stages. Click any stage to explore its internals.
Tail files matching glob patterns with position tracking across restarts. Detects rotation via inode/device identity, drains old files before switching.
Designed not to lose data: checkpoints survive crashes via atomic write + fsync, so tailing resumes from the last recorded position after a restart. File identity tracking (device + inode + fingerprint) means ffwd follows the real file through renames, rotations, and truncations, not just the path.
kqueue/inotify + polling hybrid. Adaptive fast-poll on budget hit. LRU eviction at 1024 open files. Checkpoint via atomic JSON write + fsync.
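The atomic-write checkpoint and identity-based tracking described above can be sketched roughly as follows. This is a minimal illustration, not ffwd's implementation: the function names `write_checkpoint` and `file_identity`, the SHA-256 fingerprint, and the 256-byte fingerprint length are all assumptions made for the example.

```python
import hashlib
import json
import os
import tempfile


def write_checkpoint(path: str, state: dict) -> None:
    """Atomically replace `path` with `state` serialized as JSON."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # contents are durable before the rename
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):  # POSIX: also persist the directory entry
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def file_identity(path: str, fingerprint_len: int = 256) -> tuple:
    """(device, inode, hash of the first bytes); hypothetical identity tuple."""
    st = os.stat(path)
    with open(path, "rb") as f:
        head = f.read(fingerprint_len)
    return (st.st_dev, st.st_ino, hashlib.sha256(head).hexdigest())
```

Because the identity tuple does not include the path, a file that is renamed during rotation still compares equal to its earlier self, which is what lets a tailer keep draining it.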
Accept log lines on a TCP socket with automatic framing detection. Supports RFC 6587 octet counting and newline-delimited fallback.
Max 1024 concurrent clients. Global 256 MiB memory budget. Idle timeout 60s. Oversized lines (>1 MiB) discarded. Optional TLS/mTLS.
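As a rough sketch of the two framing modes, the function below splits a receive buffer into frames. It assumes detection happens per frame by checking for an RFC 6587 length prefix (a nonzero-leading decimal count followed by a space) and otherwise falls back to newline delimiting; how ffwd actually detects framing is not stated here, and `split_frames` is a hypothetical name.

```python
def split_frames(buf: bytes, max_line: int = 1 << 20):
    """Return (frames, remainder) for a receive buffer.

    Frames longer than `max_line` bytes are discarded. `remainder` holds
    the trailing incomplete frame, to be retried once more data arrives.
    """
    frames, pos = [], 0
    while pos < len(buf):
        # Look for "<digits> " at the start of the frame (octet counting).
        sp = buf.find(b" ", pos, pos + 11)
        prefix = buf[pos:sp] if sp != -1 else b""
        if prefix.isdigit() and prefix[:1] != b"0":
            start = sp + 1
            end = start + int(prefix)
            if end > len(buf):
                break  # body not fully received yet
            next_pos = end
        else:
            # Fallback: newline-delimited framing.
            nl = buf.find(b"\n", pos)
            if nl == -1:
                break  # line not terminated yet
            start, end, next_pos = pos, nl, nl + 1
        if end - start <= max_line:
            frames.append(buf[start:end])
        pos = next_pos
    return frames, buf[pos:]
```

One known ambiguity of this heuristic: a newline-delimited line that happens to begin with digits and a space would be read as a length prefix, so a production receiver would likely lock in the mode per connection.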
Receive log lines as individual datagrams. Kernel buffer tuned to 8 MiB. Drop detection via ENOBUFS/ENOMEM.
Max 256 datagrams per poll, 1 MiB emit budget. Fire-and-forget: no ack, no ordering guarantee.
Receive OpenTelemetry log records over HTTP. Decodes protobuf and JSON payloads directly into Arrow RecordBatches — bypasses the scanner.
Use ffwd as an OTLP relay or aggregator. Because the input produces structured Arrow data directly, there is no parsing overhead — just decode the protobuf envelope and convert. Resource attributes, trace context, and severity are all preserved.
Structured input: produces RecordBatch directly, not raw bytes. Resource attributes flatten under canonical resource.attributes.* columns. Bounded channel to pipeline.
Receive newline-delimited payloads via HTTP POST. Configurable path, method, body size limit, and response code.
Background thread with bounded channel. Max 10 MiB body. Supports gzip Content-Encoding.
Receive Arrow IPC stream payloads over HTTP POST. Bypasses the scanner — decoded RecordBatches go directly into the pipeline.
Scanner bypass: zero parsing overhead. Canonical content type: application/vnd.apache.arrow.stream.
eBPF (Linux), EndpointSecurity (macOS), or ETW (Windows) sensor. Arrow-native control and signal rows for process, file, network, DNS events.
Arrow-native: no format/scanner. Control-plane reload via JSON file. Configurable signal families.
Emit synthetic JSON log lines for benchmarking and pipeline testing. Configurable EPS, batch size, and timestamp generation.
Profiles: logs (synthetic requests) and record (flat JSON from attributes). Zero external dependencies.
Detects and parses CRI, JSON, or raw format. The CRI parser strips the timestamp/stream prefix and reassembles lines that the container runtime split into partial chunks, marked with the P flag.
Auto-detection means one config works for mixed log sources. The CRI parser handles Kubernetes' container runtime format natively, with no sidecar or pre-processing needed. Reassembly via the P/F flags means a long line that the runtime split into chunks arrives as a single log record.
Auto mode: try CRI first, then JSON, then raw fallback. CRI reassembler buffers partials (P flag), emits on full (F flag). CRLF normalized.
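The P/F reassembly can be illustrated with a small generator. This is a sketch under assumptions: partial chunks are buffered per stream, the emitted record carries the timestamp of the final (F) chunk, and lines that do not look like CRI are skipped rather than routed to a JSON or raw fallback. `reassemble_cri` is a hypothetical name.

```python
def reassemble_cri(lines):
    """Yield (timestamp, stream, message) for each complete CRI record.

    CRI line layout: "<timestamp> <stream> <P|F> <message>".
    """
    pending = {}  # stream name -> buffered partial chunks
    for raw in lines:
        line = raw.rstrip("\r\n")  # normalizes CRLF and LF endings
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[2] not in ("P", "F"):
            continue  # not a CRI line; a real parser would fall back here
        ts, stream, flag = parts[0], parts[1], parts[2]
        message = parts[3] if len(parts) == 4 else ""
        chunks = pending.setdefault(stream, [])
        chunks.append(message)
        if flag == "F":  # full: emit everything buffered for this stream
            yield ts, stream, "".join(chunks)
            chunks.clear()
```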
One vectorized pass over the entire buffer using platform-native SIMD. Classifies 10 structural characters simultaneously in 64-byte blocks.
This is why ffwd is fast. Instead of parsing JSON byte by byte, SIMD instructions compare 16-32 bytes at a time. One pass over the buffer produces bitmasks that make every subsequent string-boundary lookup O(1): a single trailing_zeros instruction. simdjson uses the same technique. ffwd uses AVX2 or SSE2 on Intel/AMD and NEON on ARM (Apple Silicon, Graviton).
AVX2/SSE2 on x86_64, NEON on aarch64 via wide crate. Output: real_quotes + in_string bitmasks. O(1) string boundary lookup via trailing_zeros.
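To make the two bitmasks concrete, here is a scalar Python model of one 64-byte block, using an integer as the 64-bit mask. It is only an illustration of the idea: the per-byte loop stands in for the SIMD comparisons, and the helper names are hypothetical. Bit `i` of each mask describes byte `i` of the block.

```python
MASK64 = (1 << 64) - 1


def structural_masks(block: bytes):
    """Return (real_quotes, in_string) bitmasks for a block of up to 64 bytes."""
    if len(block) > 64:
        raise ValueError("block must be at most 64 bytes")
    real_quotes = 0
    escaped = False
    for i, byte in enumerate(block):
        if escaped:
            escaped = False  # this byte is consumed by the preceding backslash
        elif byte == 0x5C:  # backslash
            escaped = True
        elif byte == 0x22:  # unescaped double quote
            real_quotes |= 1 << i
    # Prefix XOR: bit i becomes the parity of real quotes at positions 0..i,
    # so bits are set from each opening quote up to (not including) its close.
    in_string = real_quotes
    shift = 1
    while shift < 64:
        in_string ^= in_string << shift
        shift <<= 1
    return real_quotes, in_string & MASK64


def trailing_zeros(mask: int) -> int:
    """Index of the lowest set bit (mask must be nonzero)."""
    return (mask & -mask).bit_length() - 1


def string_end(real_quotes: int, open_pos: int) -> int:
    """Position of the closing quote for the string opened at `open_pos`."""
    later = real_quotes >> (open_pos + 1)
    if later == 0:
        raise ValueError("string is not closed within this block")
    return open_pos + 1 + trailing_zeros(later)
```

Once the masks exist, finding where a string ends needs no scanning: shift away the bits up to the opening quote and count trailing zeros.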
Scalar state machine walks top-level JSON objects using pre-computed bitmasks. Resolves field names to column indices once per batch via HashMap.
Field pushdown is the key optimization here. ffwd analyzes your SQL query before scanning and only extracts the columns you reference. If your query uses 3 of 20 fields, the scanner skips the other 17 — giving you 2-3x throughput on wide data for free.
Field pushdown: QueryAnalyzer extracts referenced columns from SQL. Scanner skips unwanted fields for 2-3x throughput on wide data.
Deferred builder pattern: collects (row, offset, len) views during scan, bulk-builds Arrow columns at finish. Zero-copy StringViewArray shares the input buffer.
Apache Arrow is the secret weapon. By building columnar RecordBatches, ffwd gets DataFusion SQL execution for free: the same query engine used in InfluxDB, Comet, and Ballista. StringViewArray means string data is never copied: 16-byte views point directly into the input buffer. This is why ffwd uses ~6 MB of RAM to process millions of log lines.
scan() produces StringViewArray (zero-copy via Bytes refcount). scan_detached() produces StringArray (single bulk copy). Resource metadata injected as columns.
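The deferred builder idea can be modeled without Arrow. In this sketch a `memoryview` slice plays the role of a string view: it records where the value lives and shares the input buffer instead of copying it. `DeferredColumn` and its methods are hypothetical names, and a Python list stands in for the finished Arrow column.

```python
class DeferredColumn:
    """Collects (row, offset, length) during the scan; builds the column once."""

    def __init__(self, buffer: bytes):
        self._buf = memoryview(buffer)
        self._views = []  # (row, offset, length) triples

    def push(self, row: int, offset: int, length: int) -> None:
        # Cheap during the scan: no bytes are touched, only positions recorded.
        self._views.append((row, offset, length))

    def finish(self, num_rows: int) -> list:
        # Rows that never saw this field stay None (null).
        column = [None] * num_rows
        for row, offset, length in self._views:
            column[row] = self._buf[offset:offset + length]  # shares the buffer
        return column
```

The detached variant mentioned above would correspond to calling `.tobytes()` on each slice, trading one copy for independence from the input buffer's lifetime.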
Each batch cycle registers source RecordBatches as partitions of a "logs" MemTable. Schema changes detected via hash — context rebuilt only when needed.
Why SQL instead of a custom DSL? Because you already know SQL. DataFusion is a production-grade query engine (used in InfluxDB, Comet, and Ballista) that gives ffwd full SQL support — SELECT, WHERE, GROUP BY, JOINs, window functions, subqueries — without inventing a new language. Your team can write log filters on day one.
Lazy SessionContext creation. Full SQL: SELECT, WHERE, GROUP BY, HAVING, JOINs, subqueries, window functions. Enrichment tables registered per batch.
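A minimal sketch of the "rebuild only when the schema changes" logic, assuming the schema is hashed from its ordered (name, type) pairs. `ContextCache`, `schema_hash`, and the `build_context` callback are hypothetical names; the callback stands in for creating a DataFusion SessionContext.

```python
import hashlib


def schema_hash(schema) -> str:
    """Hash an ordered list of (column name, type name) pairs."""
    h = hashlib.sha256()
    for name, dtype in schema:
        h.update(f"{name}\x00{dtype}\x01".encode("utf-8"))
    return h.hexdigest()


class ContextCache:
    """Creates the query context lazily and reuses it while the schema is stable."""

    def __init__(self, build_context):
        self._build = build_context
        self._hash = None
        self._ctx = None
        self.rebuilds = 0

    def get(self, schema):
        current = schema_hash(schema)
        if current != self._hash:  # first batch, or the schema changed
            self._ctx = self._build(schema)
            self._hash = current
            self.rebuilds += 1
        return self._ctx
```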
Scalar functions registered into the DataFusion context alongside built-in functions.
grok() supports 20+ built-in patterns (IP, UUID, TIMESTAMP_ISO8601, etc.). geo_lookup() requires MaxMind .mmdb database. hash() for deterministic partitioning.
SQL-joinable tables injected into the DataFusion context. Refreshed per batch cycle if the underlying snapshot changes.
k8s_path: zero-cost path parsing, no K8s API calls. static: one-row table from config labels. All exposed as regular SQL tables for JOIN.
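Path-based enrichment works because the kubelet encodes pod metadata in the log path itself, typically `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart>.log`. The sketch below parses that layout; the output keys and the function name `parse_k8s_path` are assumptions, not the columns ffwd actually exposes.

```python
import re

_POD_LOG = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_(?P<uid>[^/]+)/"
    r"(?P<container>[^/]+)/(?P<restart>\d+)\.log$"
)


def parse_k8s_path(path: str):
    """Return pod metadata parsed from a kubelet log path, or None."""
    match = _POD_LOG.match(path)
    if match is None:
        return None
    fields = match.groupdict()
    fields["restart"] = int(fields["restart"])
    return fields
```

No API server round trip is involved, which is why this kind of enrichment adds essentially no cost per batch.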
Encodes log records as OTLP protobuf with 3-phase encoding: encode LogRecords, group by resource/scope, write final protobuf with computed sizes.
OTLP is the OpenTelemetry standard for telemetry transport. By encoding directly to protobuf with zstd compression, ffwd produces payloads that are 5-10x smaller than JSON. Resource grouping means pods from the same deployment share metadata in a single ResourceLogs envelope, further reducing wire size.
Groups rows by resource.attributes.* columns into ResourceLogs. Two encoders: handwritten (full) and generated fast path. zstd/gzip compression. HTTP and gRPC protocols.
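The grouping step can be sketched with plain dictionaries standing in for Arrow rows and protobuf messages. It assumes attribute values are hashable (strings here); `group_by_resource` and the output keys are hypothetical.

```python
PREFIX = "resource.attributes."


def group_by_resource(rows):
    """Group rows that share the same resource attributes.

    Returns one entry per distinct resource, in first-seen order, so the
    shared attributes are written once per group instead of once per row.
    """
    groups = {}
    for row in rows:
        resource = tuple(sorted(
            (key[len(PREFIX):], value)
            for key, value in row.items()
            if key.startswith(PREFIX)
        ))
        record = {k: v for k, v in row.items() if not k.startswith(PREFIX)}
        groups.setdefault(resource, []).append(record)
    return [
        {"resource": dict(resource), "log_records": records}
        for resource, records in groups.items()
    ]
```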
Ships via the Bulk API with per-document error handling. Oversized payloads split recursively. @timestamp injected if missing.
Precomputed action bytes (zero alloc hot path). ISO-8601 timestamp on 47-byte stack buffer. Per-document error extraction. Reactive split on 413.
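A simplified sketch of the Bulk API body and the reactive split on HTTP 413. It assumes the target index is given in the request URL, so the action line can be the constant `{"index":{}}`, and it takes a caller-supplied `post(body) -> status` function instead of making real HTTP calls. Per-document error extraction from the response is omitted, and all names are hypothetical.

```python
import json
from datetime import datetime, timezone

ACTION_LINE = b'{"index":{}}\n'  # built once, reused for every document


def build_bulk_body(docs) -> bytes:
    """Serialize docs as NDJSON: one action line, then one source line, per doc."""
    out = bytearray()
    for doc in docs:
        if "@timestamp" not in doc:
            # Inject a timestamp without mutating the caller's dict.
            doc = {**doc, "@timestamp": datetime.now(timezone.utc).isoformat()}
        out += ACTION_LINE
        out += json.dumps(doc, separators=(",", ":")).encode("utf-8")
        out += b"\n"
    return bytes(out)


def send_with_split(docs, post) -> None:
    """Send docs; on 413 (payload too large) halve the batch and retry each half."""
    if not docs:
        return
    status = post(build_bulk_body(docs))
    if status == 413 and len(docs) > 1:
        mid = len(docs) // 2
        send_with_split(docs[:mid], post)
        send_with_split(docs[mid:], post)
    elif status >= 300:
        raise RuntimeError(f"bulk request failed with HTTP {status}")
```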
Pushes to Grafana Loki with automatic label grouping and key sanitization.
Label cardinality control via config. Duplicate label key detection at validation time.
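Label grouping and key sanitization might look like the sketch below, which builds a payload in the shape of Loki's JSON push API (`streams`, each with a `stream` label map and `values` pairs of nanosecond timestamp and line). The record fields `ts_ns` and `line`, the sanitization rule, and the function names are assumptions for illustration.

```python
import re


def sanitize_label_key(key: str) -> str:
    """Map a key to a valid label name: [a-zA-Z_][a-zA-Z0-9_]*."""
    clean = re.sub(r"[^a-zA-Z0-9_]", "_", key)
    if not clean or clean[0].isdigit():
        clean = "_" + clean
    return clean


def validate_label_keys(keys) -> dict:
    """Return {original: sanitized}; reject keys that collide after sanitizing."""
    mapping, seen = {}, {}
    for key in keys:
        clean = sanitize_label_key(key)
        if clean in seen:
            raise ValueError(
                f"label keys {seen[clean]!r} and {key!r} both map to {clean!r}"
            )
        seen[clean] = key
        mapping[key] = clean
    return mapping


def build_push_payload(records, label_keys) -> dict:
    """Group records into streams keyed by their label values."""
    mapping = validate_label_keys(label_keys)
    streams = {}
    for rec in records:
        labels = {mapping[k]: str(rec[k]) for k in label_keys if k in rec}
        stream_key = tuple(sorted(labels.items()))
        streams.setdefault(stream_key, []).append([str(rec["ts_ns"]), rec["line"]])
    return {
        "streams": [
            {"stream": dict(key), "values": values}
            for key, values in streams.items()
        ]
    }
```

Restricting `label_keys` to a short, configured list is what keeps cardinality under control: every distinct combination of label values becomes its own stream.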
Print to stdout for debugging. Console: colored, human-readable. JSON: one object per line. Text: raw body column.
Three formats: console (colored), json (NDJSON), text (raw). Zero network overhead.
Write NDJSON or raw text to a local file for capture, replay testing, and local archival.
Send records to a TCP endpoint with persistent connections and retry backoff.
Fire-and-forget datagram output for low-overhead forwarding.
Send Arrow IPC payloads for zero-copy handoff to downstream consumers.
Explore by stage
| Stage | What’s inside |
|---|---|
| Inputs | 9 input types — stdin, file tailing, TCP, UDP, OTLP, HTTP, Arrow IPC, sensor, generator |
| Scanner Deep Dive | SIMD structural indexing, field pushdown, zero-copy Arrow building |
| Columnar | Why Arrow columnar storage makes SQL fast |
| Backpressure in Action | How bounded channels propagate pressure through the pipeline |
| Checkpoint Ordering | Contiguous ACK advancement and crash-safe position tracking |
| Performance | 2.8M lines/sec benchmarks, stage breakdown, memory profile |