
Performance

| Stage | Time (100K lines) | % of total |
|---|---|---|
| Scan (JSON to Arrow) | 21ms | 57% |
| Transform (SQL) | ~0ms | ~0% |
| OTLP encode | 9ms | 27% |
| zstd compress | 6ms | 16% |
| Total CPU | 36ms | 2.8M lines/sec |

The scanner dominates at 57% of CPU time. This is expected — parsing JSON structure is the hard work. The SQL transform is essentially free because DataFusion operates on Arrow columnar data that’s already in the right format.
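
The throughput figure in the table follows directly from the total CPU time. A minimal sketch of that arithmetic (the function names are illustrative, not part of FastForward):

```rust
/// Lines processed per second, given a line count and total CPU time in ms.
fn lines_per_sec(lines: u64, cpu_ms: f64) -> f64 {
    lines as f64 / (cpu_ms / 1000.0)
}

/// Average CPU cost of one line in nanoseconds.
fn ns_per_line(lines: u64, cpu_ms: f64) -> f64 {
    cpu_ms * 1_000_000.0 / lines as f64
}
```

With the numbers above, `lines_per_sec(100_000, 36.0)` is about 2.78 million, which the table rounds to 2.8M lines/sec, and `ns_per_line(100_000, 36.0)` is 360 ns of CPU per line.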

Each optimization compounds on the others:

  • SIMD structural indexing — One vectorized pass classifies 10 JSON characters across 64 bytes simultaneously. Every subsequent string lookup is O(1) via bitmask + trailing_zeros. This is the same technique that powers simdjson.
  • Zero-copy StringViewArray — String data is never copied during scanning. 16-byte views point directly into the input buffer, shared via reference counting. Five string columns sharing one buffer use 1x memory, not 5x.
  • Field pushdown — FastForward analyzes your SQL query before scanning and only extracts referenced columns. If your query uses 3 of 20 fields, the scanner skips the other 17 — giving 2-3x throughput on wide data.
  • Persistent zstd context — The compression dictionary is reused across batches, avoiding re-initialization overhead.
  • Connection pooling — HTTP clients reuse connections for output requests, amortizing TLS handshake and TCP setup.
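
To make the "bitmask + trailing_zeros" idea concrete, here is a minimal scalar sketch. It is not FastForward's scanner: a real SIMD implementation would build the mask with a vector compare over the 64-byte block, and it would also have to handle escaped quotes, which this sketch ignores. The function names are hypothetical.

```rust
/// Build a 64-bit mask for one block of up to 64 bytes: bit i is set when
/// block[i] equals `needle`. A SIMD scanner would produce this mask with a
/// vector compare; this scalar loop only emulates the result.
fn byte_mask(block: &[u8], needle: u8) -> u64 {
    assert!(block.len() <= 64);
    let mut mask = 0u64;
    for (i, &b) in block.iter().enumerate() {
        if b == needle {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Position of the first set bit at or after `from`, found with one shift,
/// one AND, and trailing_zeros instead of a byte-by-byte search.
fn next_set(mask: u64, from: u32) -> Option<u32> {
    if from >= 64 {
        return None;
    }
    let remaining = mask & (u64::MAX << from);
    if remaining == 0 {
        None
    } else {
        Some(remaining.trailing_zeros())
    }
}

/// All set positions, in ascending order.
fn positions(mut mask: u64) -> Vec<u32> {
    let mut out = Vec::new();
    while mask != 0 {
        out.push(mask.trailing_zeros());
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}
```

For the block `{"level":"info","n":42}`, the quote mask has bits 1, 7, 9, 14, 16 and 18 set, so `next_set(quotes, 2)` returns `Some(7)`: the closing quote of the key `level` is found without touching the bytes in between.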

At the default 4 MB batch size (~23K lines):

| Component | Memory |
|---|---|
| Input buffer | 4 MB (shared with StringView columns) |
| Arrow RecordBatch | ~2 MB |
| Total per batch | ~6 MB |

For stress tests at 1M lines:

  • Real RSS: ~205 MB
  • get_array_memory_size() reports ~926 MB — this overcounts because StringViewArray shares the backing buffer across all string columns
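
The overcount can be illustrated with a small model of columns that share one buffer. This is a sketch under assumptions, not Arrow's implementation: `View` and `Column` are hypothetical types, and Arrow's StringViewArray uses its own 16-byte view layout. The point is only that summing per-column sizes counts the shared buffer once per column.

```rust
use std::rc::Rc;

/// A view into a shared buffer: offset and length only, no string bytes.
/// (Hypothetical type for illustration.)
struct View {
    offset: usize,
    len: usize,
}

/// One string column: a handle to the shared buffer plus its views.
struct Column {
    buffer: Rc<[u8]>,
    views: Vec<View>,
}

impl Column {
    /// Read a value straight out of the shared buffer, without copying.
    /// Panics if the slice is not valid UTF-8 (acceptable for a sketch).
    fn get(&self, row: usize) -> &str {
        let v = &self.views[row];
        std::str::from_utf8(&self.buffer[v.offset..v.offset + v.len]).unwrap()
    }

    /// Naive per-column size: includes the whole shared buffer every time.
    fn naive_size(&self) -> usize {
        self.buffer.len() + self.views.len() * std::mem::size_of::<View>()
    }
}

/// Summing naive sizes counts the shared buffer once per column.
fn naive_total(columns: &[Column]) -> usize {
    columns.iter().map(Column::naive_size).sum()
}

/// Counting each distinct buffer once is closer to real memory use.
fn deduplicated_total(columns: &[Column]) -> usize {
    let mut seen: Vec<*const u8> = Vec::new();
    let mut total = 0;
    for c in columns {
        let ptr = c.buffer.as_ptr();
        if !seen.contains(&ptr) {
            seen.push(ptr);
            total += c.buffer.len();
        }
        total += c.views.len() * std::mem::size_of::<View>();
    }
    total
}
```

With two columns over one buffer, `naive_total` exceeds `deduplicated_total` by exactly one buffer length; with more string columns the gap grows by one buffer length per additional column.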

| Dataset | Fields | Throughput |
|---|---|---|
| Narrow (3 fields) | 3 | 3.4M lines/sec |
| Simple (6 fields) | 6 | 2.0M lines/sec |
| Wide (20 fields) | 20 | 560K lines/sec |
| Wide with pushdown (20 fields, 2 projected) | 20 to 2 | 1.4M lines/sec |

Field pushdown recovers part of the throughput lost on wide data. If your SQL references only 2 columns of 20-field logs, you get 1.4M lines/sec instead of 560K, a 2.5x improvement.
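
The effect of pushdown on the scanner can be sketched as follows. This is an illustrative toy, not FastForward's code: `extract` and `find` are hypothetical helpers, and the sketch assumes flat JSON objects with no nesting and no escaped quotes. The scanner still has to walk past every value, but it only materializes the fields the query references.

```rust
/// Extract only the `wanted` fields from one flat JSON object.
/// Assumes no nesting and no escaped quotes. Values are returned as raw
/// slices of the input (string values without their quotes).
fn extract<'a>(line: &'a str, wanted: &[&str]) -> Vec<(&'a str, &'a str)> {
    let bytes = line.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        // Advance to the opening quote of the next key.
        if bytes[i] != b'"' {
            i += 1;
            continue;
        }
        let key_start = i + 1;
        let key_end = key_start + find(&bytes[key_start..], b'"');
        let key = &line[key_start..key_end];
        // Skip the closing quote, the colon, and any spaces.
        i = (key_end + 1).min(bytes.len());
        while i < bytes.len() && (bytes[i] == b':' || bytes[i] == b' ') {
            i += 1;
        }
        // Locate the extent of the value: quoted string or bare scalar.
        let (val_start, val_end, next) = if i < bytes.len() && bytes[i] == b'"' {
            let s = i + 1;
            let e = s + find(&bytes[s..], b'"');
            (s, e, e + 1)
        } else {
            let s = i;
            let mut e = s;
            while e < bytes.len() && bytes[e] != b',' && bytes[e] != b'}' {
                e += 1;
            }
            (s, e, e)
        };
        // Pushdown: only record values the query references.
        if wanted.contains(&key) {
            out.push((key, line[val_start..val_end].trim()));
        }
        i = next;
    }
    out
}

/// Offset of the first `needle` in `hay`, or hay.len() if absent.
fn find(hay: &[u8], needle: u8) -> usize {
    hay.iter().position(|&b| b == needle).unwrap_or(hay.len())
}
```

For a line with fields `ts`, `level`, `status` and `msg`, calling `extract(line, &["level", "status"])` yields only those two pairs; the other fields are stepped over and never stored in the output.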

| Topic | Where to go |
|---|---|
| See how the scanner works | Scanner Deep Dive (interactive) |
| Configure SQL transforms | SQL Transforms |
| Deploy to production | Kubernetes DaemonSet |
| Contribute optimizations | Contributing |