Monitoring & Diagnostics

    server:
      diagnostics: 127.0.0.1:9090
      log_level: info

Log level
The log_level field controls the verbosity of FastForward’s own stderr output.
Supported values, from most verbose to least:
| Value | When to use |
|---|---|
| trace | Deep debugging of I/O loops and buffer internals (very noisy) |
| debug | Investigating specific pipeline behavior or connection issues |
| info | Default. Startup, shutdown, config reload, and periodic summary lines |
| warn | Only warnings and errors (recommended for high-throughput production) |
| error | Only unrecoverable or action-required errors |
You can change the level at runtime by sending a PUT request:

    curl -X PUT http://localhost:9090/admin/v1/log_level \
      -H 'Content-Type: application/json' \
      -d '"debug"'

Endpoints
| Endpoint | Description |
|---|---|
| GET /live | Liveness probe (process/control-plane only) |
| GET /ready | Readiness probe (200 once initialized) |
| GET /admin/v1/status | Canonical rich status JSON (live, ready, component health, per-pipeline detail) |
| GET /admin/v1/stats | Flattened JSON for polling/benchmarks |
| GET /admin/v1/config | View active YAML configuration (disabled by default; enable with FFWD_UNSAFE_EXPOSE_CONFIG=1) |
| GET /admin/v1/logs | View recent log lines from stderr |
| GET /admin/v1/history | Time-series data for dashboard charts |
| GET /admin/v1/traces | Detailed latency spans for recent batches |
| GET / | HTML dashboard |
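
The runtime log-level change described under Log level can also be scripted. The following is a minimal Python sketch using only the standard library. The function name, the default base URL, and the client-side validation are assumptions made for illustration; the path, headers, and JSON-string body mirror the curl command.

```python
import json
import urllib.request

# Levels from the "Log level" table, most verbose first.
LOG_LEVELS = ("trace", "debug", "info", "warn", "error")


def build_log_level_request(level, base_url="http://localhost:9090"):
    """Build (but do not send) the PUT request that changes the log level.

    base_url is an assumption; use the address set in server.diagnostics.
    """
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    return urllib.request.Request(
        url=f"{base_url}/admin/v1/log_level",
        # The body is a bare JSON string, e.g. "debug" (quotes included).
        data=json.dumps(level).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To send the request, pass the returned object to urllib.request.urlopen while a FastForward instance is running. Validating the level before sending is a design choice of this sketch, not documented server behavior.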
HTML dashboard
Visiting http://<host>:9090/ in a browser opens a built-in, single-page HTML
dashboard. It provides a live view of:
- Pipeline throughput — lines/sec per input and output, with sparkline charts.
- Flush behavior — ratio of size-triggered vs. timeout-triggered flushes.
- Stage latency — time spent in scan, transform, and output stages.
- Error counts — recent transport errors or dropped batches.
The dashboard pulls data from /admin/v1/history and refreshes automatically.
No external dependencies or authentication are required. It is a useful
first-look tool before diving into the JSON API or Grafana dashboards.
Status endpoint
GET /admin/v1/status returns the canonical health payload. Here is an example response:

    {
      "live": { "status": "live" },
      "ready": { "status": "ready" },
      "pipelines": [
        {
          "name": "host-logs",
          "inputs": [
            { "type": "file", "lines_total": 184200, "bytes_total": 52428800, "errors_total": 0 }
          ],
          "transform": { "lines_in": 184200, "lines_out": 91040, "filter_drop_rate": 0.506 },
          "outputs": [
            { "type": "otlp", "lines_total": 91040, "bytes_total": 11206400, "errors_total": 0 }
          ]
        }
      ],
      "system": { "uptime_seconds": 3621, "version": "0.14.0" }
    }

Pipelines are returned as an array. Use jq '.pipelines[0]' to access the first pipeline, or filter by name with jq '.pipelines[] | select(.name == "host-logs")'.
| Field | Description |
|---|---|
| live.status | "live" when the process is running and the control plane is healthy |
| ready.status | "ready" once all pipelines have completed initialization |
| system.uptime_seconds | Seconds since the process started |
| system.version | FastForward binary version |
| pipelines[].inputs[].lines_total | Total lines read by this input since startup |
| pipelines[].inputs[].bytes_total | Total bytes read by this input |
| pipelines[].inputs[].errors_total | Cumulative read errors (file permission, connection reset, etc.) |
| pipelines[].inputs[].transport | Transport-specific metrics (see Transport Observability below) |
| pipelines[].transform.lines_in | Lines entering the SQL transform stage |
| pipelines[].transform.lines_out | Lines emitted after filtering |
| pipelines[].transform.filter_drop_rate | Fraction of input lines dropped by the filter, 1 - lines_out / lines_in (higher = more aggressive filter) |
| pipelines[].outputs[].lines_total | Lines successfully delivered to the destination |
| pipelines[].outputs[].errors_total | Delivery failures (connection errors, timeouts, HTTP 5xx) |
| pipelines[].outputs[].last_flush_age_secs | Seconds since the last successful flush; useful for staleness alerts |
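
The same lookup that the jq filter performs can be done in a script. The sketch below assumes the payload shape of the example response (a pipelines array with inputs, transform, and outputs); the helper names find_pipeline and summarize are hypothetical and not part of FastForward.

```python
import json


def find_pipeline(status, name):
    """Return the pipeline dict with the given name, or None (like the jq select)."""
    for pipeline in status.get("pipelines", []):
        if pipeline.get("name") == name:
            return pipeline
    return None


def summarize(pipeline):
    """Sum the per-input and per-output counters of one pipeline."""
    inputs = pipeline.get("inputs", [])
    outputs = pipeline.get("outputs", [])
    return {
        "lines_in": sum(i.get("lines_total", 0) for i in inputs),
        "lines_out": sum(o.get("lines_total", 0) for o in outputs),
        "errors": sum(x.get("errors_total", 0) for x in inputs + outputs),
        "filter_drop_rate": pipeline.get("transform", {}).get("filter_drop_rate"),
    }


# Trimmed copy of the example payload; field layout is taken from that example.
EXAMPLE = json.loads("""
{"pipelines": [{"name": "host-logs",
  "inputs": [{"type": "file", "lines_total": 184200, "errors_total": 0}],
  "transform": {"lines_in": 184200, "lines_out": 91040, "filter_drop_rate": 0.506},
  "outputs": [{"type": "otlp", "lines_total": 91040, "errors_total": 0}]}]}
""")

print(summarize(find_pipeline(EXAMPLE, "host-logs")))
```

In a real check you would replace EXAMPLE with the parsed body of GET /admin/v1/status. Missing keys default to zero here so that the summary does not fail on a payload with fewer fields.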
Key metrics
The diagnostics WebSocket and JSON endpoints use the metric names below
(dot-separated prefix, underscore-separated suffix). The OTLP push path
(metrics_endpoint) uses fully underscore-separated names (e.g.
ffwd_input_lines). See What gets pushed for details.
| Metric | Description |
|---|---|
| ffwd.input_lines | Total lines read across all inputs |
| ffwd.input_bytes | Total bytes read across all inputs |
| ffwd.output_lines | Total lines delivered to outputs |
| ffwd.output_bytes | Total bytes delivered to outputs |
| ffwd.output_errors | Cumulative output delivery errors |
| ffwd.stage_nanos (scan) | Time spent in the scan/parse stage (ns) |
| ffwd.stage_nanos (transform) | Time spent in the SQL transform stage (ns) |
| ffwd.stage_nanos (output) | Time spent serializing output batches (ns) |
| ffwd.send_nanos | Time spent transmitting batches to the destination (ns) |
| ffwd.queue_wait_nanos | Time a batch waited in the channel before processing (ns) |
| ffwd.batches | Total batches flushed |
| ffwd.batch_rows | Total rows across all flushed batches |
| ffwd.dropped_batches | Batches discarded due to scan, transform, or output errors |
| ffwd.backpressure_stalls | Times input stalled on a full channel |
| ffwd.inflight_batches | Batches currently in-flight |
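
Two small conversions come up often when working with these names. The sketch below shows the dot-to-underscore mapping between the diagnostics names and the OTLP push names, and a per-second rate computed from two samples of a cumulative counter. Both helper names are hypothetical, and treating a decreasing counter as a restart is an assumption of this sketch.

```python
def otlp_name(diagnostics_name):
    """Convert a diagnostics metric name to the fully underscore-separated OTLP form."""
    return diagnostics_name.replace(".", "_")


def per_second(prev_value, curr_value, elapsed_secs):
    """Rate of a cumulative counter between two samples.

    Returns None when no time has elapsed or the counter went backwards
    (assumed here to mean the process restarted between samples).
    """
    if elapsed_secs <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed_secs


print(otlp_name("ffwd.input_lines"))      # ffwd_input_lines
print(per_second(184200, 190200, 60))     # 100.0
```

Because counters such as ffwd.input_lines are totals, a dashboard or alert normally needs the rate rather than the raw value.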
Transport Observability
The /admin/v1/status endpoint includes a transport object inside each input’s JSON representation containing specific metrics for that transport type:
- File: exposes consecutive_error_polls, representing the current file-tail pressure and backoff state.
- TCP: exposes accepted_connections (total accepted) and active_connections (currently connected clients) indicators.
- UDP: exposes drops_detected (datagrams dropped due to kernel buffer overflows) and recv_buffer_size (actual kernel receive buffer size applied) indicators.
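
The transport metrics listed above can be turned into short diagnostic notes. This sketch assumes the metrics sit directly under the transport key of one input object; the exact nesting and the function name transport_notes are assumptions. It keys on which fields are present rather than on the input's type value, since only the file type appears in the example payload.

```python
def transport_notes(inp):
    """Return human-readable notes for one input's transport object."""
    transport = inp.get("transport") or {}
    notes = []

    # File inputs: a non-zero value indicates the tail is under pressure.
    polls = transport.get("consecutive_error_polls", 0)
    if polls > 0:
        notes.append(f"file: {polls} consecutive error polls (tail is backing off)")

    # TCP inputs: report current versus total connections.
    if "active_connections" in transport:
        notes.append(
            f"tcp: {transport['active_connections']} active, "
            f"{transport.get('accepted_connections', 0)} accepted in total"
        )

    # UDP inputs: drops point at kernel receive buffer overflow.
    drops = transport.get("drops_detected", 0)
    if drops > 0:
        notes.append(
            f"udp: {drops} datagrams dropped, "
            f"receive buffer is {transport.get('recv_buffer_size')}"
        )
    return notes
```

Calling transport_notes on each entry of a pipeline's inputs array gives a quick textual summary without depending on a dashboard.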
Alerting guidance
Use the status endpoint and OTLP metrics to build alerts for the conditions that matter most. The table below lists recommended thresholds as starting points; adjust them based on your traffic patterns and SLOs.
| Condition | Metric / check | Suggested threshold | Severity |
|---|---|---|---|
| Process down | /live returns non-200 | 2 consecutive failures (30 s apart) | Critical |
| Not ready | /ready returns non-200 | > 60 s after container start | Warning |
| Output stale | last_flush_age_secs | > 120 s | Critical |
| Delivery errors | output.errors_total rate | > 0 sustained for 5 min | Warning |
| Input errors | input.errors_total rate | > 0 sustained for 5 min | Warning |
| High drop rate | transform.filter_drop_rate | > 0.99 (dropping >99% of lines) | Info |
| Memory pressure | Container memory usage | > 85 % of limit | Warning |
| CPU saturation | ffwd.stage_nanos rate | Approaching --cpus limit | Warning |
| UDP drops | transport.drops_detected rate | > 0 sustained for 2 min | Warning |
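
Several rows in the table use a "sustained" condition on a cumulative counter, which needs more than one sample to evaluate. The sketch below is one possible interpretation: a counter counts as sustained when it rose in every sampling interval across the window. The function names, the sample format, and that interpretation are assumptions; the 120 s staleness default comes from the table.

```python
def sustained_increase(samples, window_secs):
    """True if a cumulative counter rose in every interval across the last window_secs.

    samples: list of (unix_time_secs, counter_value) tuples, oldest first.
    """
    if len(samples) < 2:
        return False
    cutoff = samples[-1][0] - window_secs
    if samples[0][0] > cutoff:
        return False  # not enough history to cover the whole window
    # Start from the last sample taken at or before the cutoff.
    start = max(i for i, (t, _) in enumerate(samples) if t <= cutoff)
    window = samples[start:]
    if len(window) < 2:
        return False
    return all(later[1] > earlier[1] for earlier, later in zip(window, window[1:]))


def stale_outputs(pipeline, max_age_secs=120):
    """Return indexes of outputs whose last successful flush is older than max_age_secs."""
    stale = []
    for index, output in enumerate(pipeline.get("outputs", [])):
        age = output.get("last_flush_age_secs")
        if age is not None and age > max_age_secs:
            stale.append(index)
    return stale


# drops_detected sampled every 30 s and rising each time: sustained for 2 min.
print(sustained_increase([(0, 0), (30, 1), (60, 2), (90, 3), (120, 4)], 120))
```

Most monitoring systems provide an equivalent of this logic natively (for example a rate expression with a "for" duration), so a hand-written check like this is mainly useful for ad hoc scripts.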
OTLP metrics push
In addition to the pull-based diagnostics API, FastForward can push its own internal metrics to an OpenTelemetry Collector over OTLP/HTTP.

    server:
      metrics_endpoint: https://otel-collector:4318
      metrics_interval_secs: 60

| Field | Description |
|---|---|
| metrics_endpoint | URL of the OTLP HTTP receiver (typically port 4318) |
| metrics_interval_secs | How often FastForward pushes a metrics batch, in seconds (default: 60) |
What gets pushed
The counters and histograms listed in the Key metrics table above are exported as
OTLP metrics over HTTP. The push path uses the OpenTelemetry SDK, which registers
instruments with underscore-separated names (e.g. ffwd_input_lines,
ffwd_output_bytes). Each metric carries pipeline, input, or output
metric attributes so you can filter by component. The payload uses OTLP
protobuf encoding.
Verifying the push path

    # 1. Confirm FastForward is sending metrics (look for export lines in debug logs)
    docker logs ffwd 2>&1 | grep -i "metrics export"

    # 2. Query the collector's own metrics to see ingest counts
    curl -s http://otel-collector:8888/metrics | grep otelcol_receiver_accepted_metric_points

    # 3. Check the status endpoint for push errors
    curl -s http://localhost:9090/admin/v1/status | jq '.metrics_push'

What’s next
- Docker Deployment: container setup, volumes, and resource constraints.
- Kubernetes Deployment: DaemonSet manifests and production defaults.
- Output Types: configure OTLP and other destinations.
- Troubleshooting: common issues and diagnostic steps.