Zero-Copy Scanner

The scanner converts newline-delimited JSON into Apache Arrow RecordBatches using SIMD-accelerated structural classification.

How it works

Stage 1 (SIMD): Classify the entire buffer in one pass. Find all quotes and backslashes using platform-specific SIMD: AVX2/SSE2 on x86_64, NEON on aarch64. This produces 64-bit bitmasks for O(1) string boundary lookups.
Stage 2 (Scalar): Walk the JSON structure using the pre-computed bitmasks. Extract field names and values, detect types (int/float/string), and build Arrow columns directly.

Zero-copy mode

Scanner uses Arrow’s StringViewArray to create 16-byte views into the input buffer. String data is never copied — the original input buffer is shared via reference counting.

The SQL transform is analyzed before scanning. If the query only references level and message, the scanner skips extracting all other fields. On wide data (20+ fields), this provides 2-3x throughput improvement.

Performance

Dataset	Fields	Throughput
Narrow (3 fields)	3	3.4M lines/sec
Simple (6 fields)	6	2.0M lines/sec
Wide (20 fields)	20	560K lines/sec
Wide (2 fields projected)	20→2	1.4M lines/sec

logfwd

Zero-Copy Scanner

How it works

Zero-copy mode

Field pushdown

Performance

Keyboard shortcuts

logfwd

Zero-Copy Scanner

How it works

Zero-copy mode

Field pushdown

Performance