
# Troubleshooting

Use this page when FastForward is running but results are wrong, incomplete, or unstable. Start with the symptom table, run the exact checks, and compare expected output before changing config.

| Symptom | First check | Expected output | If not expected |
| --- | --- | --- | --- |
| No logs arrive at destination | `curl -s http://localhost:9090/admin/v1/status \| jq '.pipelines[0].inputs'` | `lines_total` increasing | See Scenario 1 |
| Logs read, but nothing forwarded | `curl -s http://localhost:9090/admin/v1/status \| jq '.pipelines[0].transform'` | `lines_in > 0` and `lines_out > 0` | See Scenario 2 |
| Frequent OTLP send errors | Check runtime logs for `error sending` | No repeated connection/auth errors | Fix endpoint/protocol/connectivity (Scenario 3) |
| Startup/config errors | `ff validate --config config.yaml` | Output contains `config ok:` | Fix required fields / YAML syntax (Scenario 4) |
| Throughput unexpectedly low | `curl -s http://localhost:9090/admin/v1/status \| jq '.pipelines[0].stage_seconds'` | `output` not dominating total | See Scenario 5 |
## Scenario 1: No logs arrive at the destination

```sh
# 1) Are inputs being read?
curl -s http://localhost:9090/admin/v1/status | jq '.pipelines[0].inputs'

# 2) Are output counters increasing?
curl -s http://localhost:9090/admin/v1/status | jq '.pipelines[0].outputs'
```

Expected:

- `inputs[*].lines_total` increases over time.
- `outputs[*].lines_total` increases over time.
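To confirm the counters actually move, sample twice and compare; a small sketch against the status endpoint above (it assumes `inputs` is an array of objects carrying `lines_total`):

```sh
# Sum input line counters, wait, then sample again
A=$(curl -s http://localhost:9090/admin/v1/status | jq '[.pipelines[0].inputs[].lines_total] | add')
sleep 10
B=$(curl -s http://localhost:9090/admin/v1/status | jq '[.pipelines[0].inputs[].lines_total] | add')
echo "lines read in 10s: $((B - A))"
```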
Common causes and fixes:

1. Wrong file glob.
   - Fix `input.path` and validate that the directory exists inside the container/pod.
2. Missing host mount (see the mount sketch after this list).
   - Ensure `/var/log` is mounted read-only in Docker/Kubernetes.
3. Permission denied on log files.
   - Run with access to host log files and verify the container security context.
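For cause 2, a minimal read-only mount; a Docker sketch only, with the image name, config path, and runtime flag as placeholders:

```sh
# Mount host logs read-only so the file input can see them
docker run -d \
  -v /var/log:/var/log:ro \
  -v "$PWD/config.yaml:/etc/ffwd/config.yaml:ro" \
  ffwd:latest --config /etc/ffwd/config.yaml
```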

Run the two checks again and confirm both input and output counters increase.

## Scenario 2: Inputs increase, but outputs remain zero

```sh
curl -s http://localhost:9090/admin/v1/status | jq '.pipelines[0].transform'
```
Expected:

- `lines_in` and `lines_out` are both greater than zero.
- `filter_drop_rate` is not near 1.0.

Common causes and fixes:

1. SQL `WHERE` clause filters everything.
   - Temporarily remove `WHERE` to confirm data flow.
2. Case mismatch in filters (see the sketch after this list).
   - Example: `level = 'error'` vs actual `ERROR`.
3. Wrong field names.
   - Verify that selected columns exist in parsed records.
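A case-insensitive comparison sidesteps cause 2; a minimal sketch, assuming the block-scalar `transform` form shown in Scenario 4 and a SQL dialect with `UPPER()`:

```yaml
transform: |
  SELECT level, message
  FROM logs
  WHERE UPPER(level) = 'ERROR'  -- matches 'error', 'Error', and 'ERROR'
```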

`lines_out` should increase within a few seconds under load.

## Scenario 3: OTLP send failures (connection refused, timeouts, auth)

```sh
# HTTP OTLP health check (local)
curl -X POST http://localhost:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{}'

# HTTP OTLP health check (Kubernetes: replace otel-collector with your namespace/service)
curl -X POST http://otel-collector:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{}'

# Kubernetes DNS resolution check
POD=$(kubectl -n collectors get pods -l app=ffwd -o jsonpath='{.items[0].metadata.name}')
kubectl -n collectors exec "$POD" -- nslookup otel-collector
```
Expected:

- DNS resolves the collector service name.
- The endpoint responds (even a non-200 response proves reachability).
- Runtime logs stop repeating send errors.

Common causes and fixes:

1. Protocol/port mismatch.
   - gRPC uses 4317; HTTP uses 4318.
2. Wrong namespace-qualified service name.
   - Use the full in-cluster DNS name (`<service>.<namespace>.svc.cluster.local`) when needed.
3. Network policy blocking egress (see the policy sketch after this list).
   - Allow traffic from the collectors namespace to the collector service.
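For cause 3, a minimal egress allowance; a sketch only, which assumes the forwarder pods are labeled `app: ffwd` (as in the kubectl check above) and that the collector lives in a namespace named `observability` (that namespace and the port list are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-otlp-egress
  namespace: collectors
spec:
  podSelector:
    matchLabels:
      app: ffwd                 # the forwarder pods
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: observability  # collector namespace (assumed)
      ports:
        - protocol: TCP
          port: 4317            # OTLP gRPC
        - protocol: TCP
          port: 4318            # OTLP HTTP
```

In locked-down clusters you may also need an egress rule for DNS, or the nslookup check above will fail first.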

`outputs[*].errors` stops increasing and `outputs[*].lines_total` resumes growing.
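To watch for that, poll the output counters; a quick sketch using only the fields named above:

```sh
# Refresh error and line counters for the first pipeline's outputs every 5 seconds
watch -n 5 "curl -s http://localhost:9090/admin/v1/status | jq '.pipelines[0].outputs[] | {errors, lines_total}'"
```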

## Scenario 4: Startup fails with config validation errors

```sh
ff validate --config config.yaml
```

Expected output:

```
ready: default
config ok: 1 pipeline(s)
```

Exit code 0 on success, 1 on configuration error.
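Because the exit code is reliable, validation can gate a deploy; a minimal sketch:

```sh
# Abort the deploy step if the config does not validate
if ! ff validate --config config.yaml; then
  echo "config invalid; aborting deploy" >&2
  exit 1
fi
```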

Common validation failures:

- Missing `input.path` for a file input.
- Missing `endpoint` for `otlp`/`http`/`elasticsearch`/`loki` outputs.
- Missing the required `pipelines` map, or wrong nesting for `inputs`/`outputs` (see the minimal example after this list).
- YAML scalar mistakes in SQL.
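A minimal config showing that nesting; a sketch only, using the pipeline name `default` from the validator output above (the `type` values and everything beyond `pipelines`, `input.path`, and `endpoint` are assumptions):

```yaml
pipelines:
  default:
    inputs:
      - type: file              # input type name assumed
        path: /var/log/*.log
    outputs:
      - type: otlp              # output type name assumed
        endpoint: http://otel-collector:4318
```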

Use block scalar syntax (`|`) for SQL, and indent the query under the key so YAML parses it as one string:

```yaml
transform: |
  SELECT level, message
  FROM logs
  WHERE level = 'ERROR'
```

Validation passes, then the dry run succeeds:

```sh
ff dry-run --config config.yaml
```

## Scenario 5: Throughput drops or latency spikes

```sh
curl -s http://localhost:9090/admin/v1/status | jq '.pipelines[0].stage_seconds'
```

Stage times should be stable, with no sudden sustained growth in output time.
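To spot the dominant stage at a glance, a jq sketch (it assumes `stage_seconds` is a map of stage name to seconds):

```sh
# Print the stage with the largest accumulated time
curl -s http://localhost:9090/admin/v1/status \
  | jq '.pipelines[0].stage_seconds | to_entries | max_by(.value)'
```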

Common causes and fixes:

1. Output stage dominates (see the compression sketch after this list).
   - Move the collector closer, enable compression (`compression: zstd`), or scale the collector.
2. Excessive transform complexity.
   - Simplify the query or split it into named pipelines.
3. Node resource pressure.
   - Increase CPU/memory requests for the ffwd DaemonSet.
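Where compression sits in an output block; a sketch reusing the assumed nesting from Scenario 4 (only `compression: zstd` itself is confirmed by this page):

```yaml
outputs:
  - type: otlp                            # type name assumed
    endpoint: http://otel-collector:4318
    compression: zstd                     # fewer bytes on the wire, at some CPU cost
```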

Observe reduced output stage time and stable forwarding counters.

If you need immediate stabilization while investigating:

1. Remove complex transform filters temporarily.
2. Send to stdout in a non-production environment to confirm the parse path (see the sketch below).
3. Re-enable OTLP once counters and errors look healthy.

This narrows failure scope without changing input collection semantics.
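For step 2, a stdout variant of the Scenario 4 sketch (the `stdout` type name is an assumption):

```yaml
pipelines:
  default:
    inputs:
      - type: file
        path: /var/log/*.log
    outputs:
      - type: stdout            # type name assumed; prints records for inspection
```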

Useful endpoints for any of the scenarios above:

```sh
# Health and readiness
curl -s http://localhost:9090/live | jq .
curl -s http://localhost:9090/ready | jq .

# End-to-end pipeline stats
curl -s http://localhost:9090/admin/v1/status | jq .

# Flattened stats snapshot
curl -s http://localhost:9090/admin/v1/stats | jq .
```
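The health endpoints map naturally onto Kubernetes probes; a container-spec fragment for the ffwd DaemonSet (paths and port from the commands above, timings illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /live
    port: 9090
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 5
```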
| Topic | Where to go |
| --- | --- |
| Check pipeline metrics | Monitoring & Diagnostics |
| Review your config | YAML Reference |
| Understand the pipeline | Pipeline Explorer (interactive) |