Troubleshooting

This guide helps you diagnose and fix common problems with logfwd.


Common error messages

config validation error: pipeline '...' has no inputs

The named pipeline has an empty inputs list (or no input key in simple layout).

# Wrong — no inputs
pipelines:
  app:
    outputs:
      - type: stdout

# Fixed
pipelines:
  app:
    inputs:
      - type: file
        path: /var/log/app/*.log
    outputs:
      - type: stdout

config validation error: file input requires 'path'

A file input is missing the required path glob.

# Wrong
input:
  type: file

# Fixed
input:
  type: file
  path: /var/log/app/*.log

config validation error: otlp output requires 'endpoint'

An otlp, http, elasticsearch, or loki output is missing the required endpoint field.

# Wrong
output:
  type: otlp

# Fixed
output:
  type: otlp
  endpoint: otel-collector:4317

config validation error: cannot mix top-level input/output with pipelines

You used both the simple layout (top-level input/output) and the advanced layout (pipelines map) in the same file. Choose one.
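The four validation errors above can be reproduced offline as a quick pre-check before shipping a config. The following is a minimal sketch, not logfwd's actual validator: the function name `check_config`, the use of `default` as the pipeline name for the simple layout, and the exact rule order are assumptions made for illustration. It expects the config already parsed into a Python dict.

```python
def check_config(config: dict) -> list[str]:
    """Return the validation problems found in an already-parsed config dict."""
    has_simple = "input" in config or "output" in config
    has_pipelines = "pipelines" in config
    if has_simple and has_pipelines:
        return ["cannot mix top-level input/output with pipelines"]

    # Normalise both layouts to {pipeline name: (inputs, outputs)}.
    pipelines = {}
    if has_pipelines:
        for name, body in config["pipelines"].items():
            body = body or {}
            pipelines[name] = (body.get("inputs") or [], body.get("outputs") or [])
    else:
        # "default" is an assumed name for the single simple-layout pipeline.
        inputs = [config["input"]] if "input" in config else []
        outputs = [config["output"]] if "output" in config else []
        pipelines["default"] = (inputs, outputs)

    needs_endpoint = {"otlp", "http", "elasticsearch", "loki"}
    problems = []
    for name, (inputs, outputs) in pipelines.items():
        if not inputs:
            problems.append(f"pipeline '{name}' has no inputs")
        for inp in inputs:
            if inp.get("type") == "file" and not inp.get("path"):
                problems.append("file input requires 'path'")
        for out in outputs:
            if out.get("type") in needs_endpoint and not out.get("endpoint"):
                problems.append(f"{out['type']} output requires 'endpoint'")
    return problems
```

An empty list means none of these four checks fired; it does not guarantee that logfwd itself will accept the file.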

config YAML error: ...

The YAML is malformed. Common causes:

  • Indentation errors (YAML requires consistent spaces, not tabs).
  • Missing quotes around values that contain special characters like :.
  • Multi-line SQL, or SQL containing a colon followed by a space, not using a block scalar (|):
# Wrong: the ": " inside the SQL string breaks YAML
transform: SELECT level FROM logs WHERE message LIKE 'error: %'

# Fixed: use a block scalar
transform: |
  SELECT level FROM logs WHERE message LIKE 'error: %'

error sending to OTLP endpoint: connection refused

logfwd cannot reach the configured OTLP collector.

  1. Verify the endpoint address and port are correct.

  2. Check that the collector is running:

    curl -v http://otel-collector:4318/v1/logs
    
  3. If using gRPC (protocol: grpc), ensure port 4317 is open; for HTTP use 4318.

  4. In Kubernetes, verify the service name resolves from within the pod:

    kubectl exec -n collectors <pod> -- nslookup otel-collector
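When curl or nslookup is not available in the container, a plain TCP connect gives the same basic answer to steps 1 to 3. This is a small sketch using only the Python standard library; the helper name `can_connect` is hypothetical, and a successful connect only shows that something is listening on the port, not that it is a healthy OTLP collector.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, connection refused, and timeouts.
        return False
```

For example, `can_connect("otel-collector", 4317)` checks the gRPC port and `can_connect("otel-collector", 4318)` the HTTP port.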
    

error watching path: No such file or directory

The glob pattern in a file input matched no files and the base directory does not exist. Ensure the directory is mounted and the pattern is correct.
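To tell a missing directory apart from a pattern that simply matches nothing, the two conditions can be checked separately. The sketch below uses Python's `glob` module, whose matching rules may differ in detail from logfwd's own glob handling; the function name `diagnose_glob` is hypothetical.

```python
import glob
import os


def diagnose_glob(pattern: str) -> str:
    """Classify why a file-input glob might match nothing."""
    # Everything before the first wildcard character determines the base directory.
    head = pattern
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            head = pattern[:i]
            break
    base_dir = os.path.dirname(head) or "."
    if not os.path.isdir(base_dir):
        return f"base directory missing: {base_dir}"
    matches = glob.glob(pattern)
    if not matches:
        return f"directory exists but no files match: {pattern}"
    return f"{len(matches)} file(s) match"
```

Run it inside the same container or pod as logfwd so that it sees the same mounts.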

error watching path: permission denied

logfwd cannot read the log directory. In Kubernetes, confirm the varlog volume is mounted with readOnly: true at /var/log and that the container user has read access.


Diagnosing dropped or missing data

Step 1 — Check transform filter_drop_rate

Call the /api/pipelines endpoint (see Reading /api/pipelines below). Look at the transform.filter_drop_rate field. A value close to 1.0 means almost all records are being dropped by your WHERE clause.

Example: you intended to keep errors but wrote a filter that matches nothing, because string comparison is case-sensitive and the logs use 'ERROR':

-- Typo — 'error' vs 'ERROR'
WHERE level = 'error'

Fix: adjust the WHERE clause or remove it temporarily to confirm records are flowing.
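The drop rate can also be recomputed from the raw counters, which is handy when checking several pipelines at once. This sketch applies the documented formula 1 - (lines_out / lines_in) to an already-parsed /api/pipelines response; the function names and the 0.95 threshold are illustrative assumptions, not part of logfwd.

```python
def filter_drop_rate(lines_in: int, lines_out: int) -> float:
    """Fraction of input lines dropped by the transform: 1 - (lines_out / lines_in)."""
    if lines_in == 0:
        return 0.0  # nothing entered the transform, so nothing was dropped
    return 1 - (lines_out / lines_in)


def heavy_filters(response: dict, threshold: float = 0.95) -> list[str]:
    """Names of pipelines whose transform drops at least `threshold` of its input."""
    flagged = []
    for pipeline in response["pipelines"]:
        transform = pipeline["transform"]
        rate = filter_drop_rate(transform["lines_in"], transform["lines_out"])
        if rate >= threshold:
            flagged.append(pipeline["name"])
    return flagged
```

A flagged pipeline is not necessarily broken: a filter that keeps only errors is expected to drop most lines on a healthy system.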

Step 2 — Check input and output line counts

In /api/pipelines, compare:

  • inputs[*].lines_total — lines read from source
  • transform.lines_in — lines entering the transform stage
  • transform.lines_out — lines leaving the transform stage
  • outputs[*].lines_total — lines successfully sent

If inputs[*].lines_total is zero, logfwd is not reading any files. Check the glob pattern and confirm new lines are being appended to the files.

If outputs[*].errors is non-zero, there are delivery failures. Check the logs for error sending messages.
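The comparison above can be scripted so that it points at the first stage where lines go missing. This is a minimal sketch over one entry of the pipelines array from /api/pipelines; the function name `locate_loss` and the wording of the findings are hypothetical.

```python
def locate_loss(pipeline: dict) -> list[str]:
    """List the stages of one pipeline where lines are missing or delivery fails."""
    findings = []

    # Stage 1: were any lines read from the sources at all?
    lines_read = sum(inp["lines_total"] for inp in pipeline["inputs"])
    if lines_read == 0:
        findings.append("inputs read 0 lines")

    # Stage 2: did the transform filter lines out?
    transform = pipeline["transform"]
    dropped = transform["lines_in"] - transform["lines_out"]
    if dropped > 0:
        findings.append(f"transform dropped {dropped} of {transform['lines_in']} lines")

    # Stage 3: did any output report delivery errors?
    for out in pipeline["outputs"]:
        if out["errors"] > 0:
            findings.append(f"output '{out['name']}' reported {out['errors']} errors")
    return findings
```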

Step 3 — Verify file tailing

Use --dry-run to confirm the file input can be built without error (the command exits without starting the pipeline):

logfwd --config config.yaml --dry-run

Enable debug logging to see file discovery and tail events:

server:
  log_level: debug

Look for log lines like:

[DEBUG] tailing /var/log/pods/app_pod-xyz/app/0.log
[DEBUG] read 4096 bytes from /var/log/pods/app_pod-xyz/app/0.log

If files appear in the log but no records reach the output, the format processing (FramedInput) may be discarding lines. Switch to format: raw temporarily to confirm raw lines are flowing:

input:
  type: file
  path: /var/log/app/*.log
  format: raw

Step 4 — Check for output errors

If outputs[*].errors is increasing, look at stderr for the specific error. Common causes:

| Symptom | Cause | Fix |
|---|---|---|
| connection refused | Collector is down or unreachable | Check network connectivity and collector health |
| 413 Request Entity Too Large | Batch too large | Reduce batch size (future config option) |
| 401 Unauthorized | Missing auth token | Add Authorization header support (not yet implemented) |
| High stage_seconds.output | Network latency to collector | Use compression (compression: zstd) or move collector closer |

Reading /api/pipelines

Enable the diagnostics server:

server:
  diagnostics: 0.0.0.0:9090

Then query it:

curl -s http://localhost:9090/api/pipelines | jq .
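The same query can be made from a script when the result feeds further checks. This is a small sketch using only the Python standard library; the function name `fetch_pipelines` is hypothetical, and the default address assumes the diagnostics server is reachable at localhost:9090 as configured above.

```python
import json
import urllib.request


def fetch_pipelines(base_url: str = "http://localhost:9090", timeout: float = 5.0) -> dict:
    """Fetch and parse the JSON document served at <base_url>/api/pipelines."""
    with urllib.request.urlopen(f"{base_url}/api/pipelines", timeout=timeout) as resp:
        # json.load reads the response body and decodes it as JSON.
        return json.load(resp)
```

A connection failure raises `urllib.error.URLError`, which usually means the diagnostics server is not enabled or is bound to a different address.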

Response schema

{
  "pipelines": [
    {
      "name": "default",
      "inputs": [
        {
          "name": "pod_logs",
          "type": "file",
          "lines_total": 1024000,
          "bytes_total": 204800000,
          "errors": 0
        }
      ],
      "transform": {
        "sql": "SELECT * FROM logs WHERE level = 'ERROR'",
        "lines_in": 1024000,
        "lines_out": 2048,
        "errors": 0,
        "filter_drop_rate": 0.998
      },
      "outputs": [
        {
          "name": "collector",
          "type": "otlp",
          "lines_total": 2048,
          "bytes_total": 512000,
          "errors": 0
        }
      ],
      "batches": {
        "total": 512,
        "avg_rows": 4.0,
        "flush_by_size": 500,
        "flush_by_timeout": 12,
        "dropped_batches_total": 0,
        "scan_errors_total": 0,
        "last_batch_time_ns": 1712160000000000000
      },
      "stage_seconds": {
        "scan": 1.234567,
        "transform": 0.012345,
        "output": 0.456789
      }
    }
  ],
  "system": {
    "uptime_seconds": 3600,
    "version": "0.1.0"
  }
}

Key fields

| Field | Description |
|---|---|
| inputs[*].lines_total | Total log lines read from this input since start. |
| inputs[*].bytes_total | Total bytes read from this input since start. |
| inputs[*].errors | Total read errors (file open failures, etc.). |
| transform.lines_in | Lines entering the SQL transform. |
| transform.lines_out | Lines produced by the SQL transform after filtering. |
| transform.filter_drop_rate | Fraction of input lines dropped: 1 - (lines_out / lines_in). |
| outputs[*].lines_total | Lines successfully delivered to this output. |
| outputs[*].errors | Delivery errors (network failures, HTTP errors, etc.). |
| batches.total | Total Arrow batches processed. |
| batches.avg_rows | Average rows per batch. |
| batches.flush_by_size | Batches flushed because they reached the row limit. |
| batches.flush_by_timeout | Batches flushed because the timeout expired. |
| batches.dropped_batches_total | Batches dropped due to backpressure or errors. |
| batches.scan_errors_total | Scanner errors (malformed input, etc.). |
| batches.last_batch_time_ns | Unix timestamp (ns) of last processed batch. |
| stage_seconds.scan | Total CPU time spent in the scanner. |
| stage_seconds.transform | Total CPU time spent in DataFusion SQL. |
| stage_seconds.output | Total CPU time spent in the output sink (includes network). |
| system.uptime_seconds | Seconds since logfwd started. |
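Because batches.last_batch_time_ns is a raw nanosecond timestamp, it is easier to read once converted. The sketch below turns it into a UTC datetime and into an age in seconds; both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_batch_utc(last_batch_time_ns: int) -> datetime:
    """Convert a Unix timestamp in nanoseconds to a timezone-aware UTC datetime."""
    # Split with integer arithmetic to avoid float rounding on large values.
    seconds, nanos = divmod(last_batch_time_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=nanos // 1000)


def seconds_since_last_batch(last_batch_time_ns: int, now_ns: int) -> float:
    """Age of the last processed batch in seconds, relative to `now_ns`."""
    return (now_ns - last_batch_time_ns) / 1e9
```

Pass `time.time_ns()` as `now_ns` for a live check. The value 1712160000000000000 from the sample response corresponds to 2024-04-03 16:00:00 UTC.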

Debug mode / increasing log verbosity

Set log_level in the server block:

server:
  log_level: debug

Available levels (least to most verbose): error, warn, info, debug, trace.

logfwd writes all log output to stderr. Redirect it to a file if needed:

logfwd --config config.yaml 2>logfwd.log

In Kubernetes:

kubectl -n collectors logs daemonset/logfwd

What each level shows

| Level | What you see |
|---|---|
| error | Fatal errors only. |
| warn | Recoverable problems (e.g. failed delivery, retrying). |
| info | Pipeline start/stop, file discovery, batch flush summaries. |
| debug | Per-file tail events, batch sizes, SQL plan details. |
| trace | Per-record scanner output, individual JSON field extraction. |

Warning: trace level generates very large output on busy nodes. Use only for short debugging sessions.


Validating configuration

# Check syntax and field types only (fast)
logfwd --config config.yaml --validate

# Build full pipeline objects including SQL parsing (slower, catches SQL errors)
logfwd --config config.yaml --dry-run

Both commands print a success message or a detailed error to stderr and exit without starting any pipelines.


Checking the SQL transform

Use --dry-run to catch SQL errors such as syntax mistakes or misspelled column names:

logfwd --config config.yaml --dry-run
# error: SQL error: Execution error: column "leve" not found

Test your SQL against a sample file using the stdout output:

input:
  type: file
  path: /path/to/sample.log
  format: json

transform: |
  SELECT level, message FROM logs WHERE status >= 500

output:
  type: stdout
  format: console

Run once and inspect the output:

logfwd --config test.yaml

Performance issues

High CPU usage

  1. Check stage_seconds in /api/pipelines. If transform dominates, simplify the SQL query or filter rows as early as possible in the WHERE clause.
  2. If scan dominates, estimate the input rate as inputs[*].bytes_total / system.uptime_seconds. logfwd processes about 1.7 GB/s on a single core, so sustained CPU near 100% on a fast input is expected.
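Both checks can be derived from a single /api/pipelines response. This sketch reports which stage has accumulated the most time and the average input rate since start; the function name `cpu_summary` and the shape of its return value are assumptions for illustration.

```python
def cpu_summary(pipeline: dict, uptime_seconds: float) -> dict:
    """Summarise stage time shares and average input rate for one pipeline."""
    stages = pipeline["stage_seconds"]
    total = sum(stages.values())
    bytes_read = sum(inp["bytes_total"] for inp in pipeline["inputs"])
    return {
        # Stage with the largest accumulated time.
        "dominant_stage": max(stages, key=stages.get),
        # Each stage's fraction of the summed stage time.
        "stage_share": {name: (secs / total if total else 0.0) for name, secs in stages.items()},
        # Average over the whole uptime, so short bursts are smoothed out.
        "input_bytes_per_second": bytes_read / uptime_seconds if uptime_seconds else 0.0,
    }
```

With the sample response shown earlier, scan is the dominant stage and the average input rate is roughly 57 KB/s, far below the single-core figure quoted above.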

High memory usage

  1. Confirm keep_raw: false (the default). Setting keep_raw: true stores the full JSON line and can double or triple memory consumption.
  2. Check the number of unique field names across your log lines. Each unique field produces at least one Arrow column; very wide schemas use more memory per batch.

Records accumulating / not being shipped

Check outputs[*].errors. If it is non-zero, delivery is failing and records are being discarded. Set log_level to warn (or a more verbose level) to see delivery error messages.

Check batches.flush_by_timeout vs batches.flush_by_size. If nearly all flushes are timeout-driven, the input rate is low and latency is bounded by the flush timeout (expected behaviour).
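The flush comparison reduces to one ratio. In this sketch the function name `flush_breakdown` and the 0.9 cut-off for "nearly all" are arbitrary choices for illustration, not logfwd settings.

```python
def flush_breakdown(batches: dict, mostly: float = 0.9) -> dict:
    """Share of flushes triggered by the timeout rather than by batch size."""
    by_size = batches["flush_by_size"]
    by_timeout = batches["flush_by_timeout"]
    flushes = by_size + by_timeout
    timeout_share = by_timeout / flushes if flushes else 0.0
    return {
        "timeout_share": timeout_share,
        # True when the input rate is low enough that the timeout drives latency.
        "mostly_timeout": timeout_share > mostly,
    }
```

For the sample response (500 size-driven and 12 timeout-driven flushes) the timeout share is about 2%, so batches are filling up before the timeout expires.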