Elasticsearch OpenTelemetry Receiver Dashboards
Kibana dashboards for monitoring Elasticsearch clusters using OpenTelemetry's Elasticsearch receiver.
Overview
These dashboards provide detailed visibility into cluster health, node performance, JVM metrics, index statistics, and circuit breaker behavior.
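
Every dashboard reads from the `metrics-*` data view and filters on `data_stream.dataset: elasticsearchreceiver.otel`, so the data is expected to come from the Collector's `elasticsearch` receiver and be written by the Elasticsearch exporter. The following is a minimal Collector configuration sketch under stated assumptions: the endpoints, environment variable names, and collection interval are placeholders, and the `mapping` setting reflects an assumption about how the `.otel` dataset suffix is produced. Check option names against the Collector version in use.

```yaml
receivers:
  elasticsearch:
    endpoint: http://localhost:9200        # assumed address of the monitored cluster
    username: ${env:ES_MONITOR_USER}       # placeholder environment variables
    password: ${env:ES_MONITOR_PASSWORD}
    nodes: ["_all"]                        # collect node metrics from every node
    collection_interval: 30s               # assumed scrape interval

exporters:
  elasticsearch:
    endpoints: ["https://monitoring.example.com:9200"]  # placeholder monitoring cluster
    api_key: ${env:ES_API_KEY}
    mapping:
      mode: otel                           # assumption: OTel-native mapping mode

service:
  pipelines:
    metrics:
      receivers: [elasticsearch]
      exporters: [elasticsearch]
```

If the dashboards stay empty, a first check is whether documents in `metrics-*` actually carry the dataset value used by the dashboard filter.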
Dashboards
| Dashboard | File | Description |
|---|---|---|
| Cluster Overview | 01-cluster-overview.yaml | High-level cluster health, node counts, shard distribution, and pending tasks |
| Node Overview | 02-node-overview.yaml | Node-level summary with CPU, memory, disk, and operations |
| Node Metrics | 03-node-metrics.yaml | Detailed node performance metrics including cache and thread pools |
| Index Metrics | 04-index-metrics.yaml | Index-level statistics, shard sizes, segments, and operations |
| JVM Health | 05-jvm-health.yaml | JVM memory (heap/non-heap), garbage collection, threads, and memory pools |
| Circuit Breakers | 06-circuit-breakers.yaml | Circuit breaker memory usage, limits, and trip events |
| Cluster Metadata | 07-cluster-metadata.yaml | Cluster configuration and metadata exploration |
All dashboards include navigation links for easy switching between views.
Dashboard Definitions
Cluster Overview (01-cluster-overview.yaml)
---
dashboards:
- id: elasticsearch-otel-cluster-overview
name: '[Elasticsearch OTel] Cluster Overview'
description: Overview of Elasticsearch cluster health and key metrics
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: Cluster Health Status
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
breakdown:
field: status
label: Health
size: 3
primary:
aggregation: unique_count
field: elasticsearch.cluster.name
label: Clusters
- title: Active Clusters
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: unique_count
field: elasticsearch.cluster.name
label: Clusters
- title: Total Nodes
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: max
field: elasticsearch.cluster.nodes
label: Nodes
format:
type: number
decimals: 0
- title: Data Nodes
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: max
field: elasticsearch.cluster.data_nodes
label: Data Nodes
format:
type: number
decimals: 0
- title: Cluster Nodes Over Time
size: {w: 48, h: 8}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.cluster.name
type: values
size: 10
metrics:
- aggregation: average
field: elasticsearch.cluster.nodes
label: Total Nodes
- aggregation: average
field: elasticsearch.cluster.data_nodes
label: Data Nodes
- title: Shard Distribution
description: >-
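Active Primary: primary shards serving requests. Active: all started
shards, primaries and replicas. Initializing: shards still being created
or recovered. Relocating: shards moving between nodes. Unassigned: shards
with no node assigned (red status if a primary is unassigned).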
size: {w: 16, h: 8}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: cluster
type: values
field: elasticsearch.cluster.name
label: Cluster
size: 20
metrics:
- id: active_primary
formula: "max(elasticsearch.cluster.shards, kql='state: active_primary')"
label: Active Primary
format:
type: number
decimals: 0
- id: active
formula: "max(elasticsearch.cluster.shards, kql='state: active')"
label: Active Shards
format:
type: number
decimals: 0
- id: initializing
formula: "max(elasticsearch.cluster.shards, kql='state: initializing')"
label: Initializing
format:
type: number
decimals: 0
- id: relocating
formula: "max(elasticsearch.cluster.shards, kql='state: relocating')"
label: Relocating
format:
type: number
decimals: 0
- id: unassigned
formula: "max(elasticsearch.cluster.shards, kql='state: unassigned')"
label: Unassigned
format:
type: number
decimals: 0
- title: Pending Tasks
description: >-
Cluster state updates waiting to be applied. High or increasing values
indicate contention on the elected master node.
size: {w: 16, h: 8}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.cluster.name
type: values
size: 10
metrics:
- aggregation: max
field: elasticsearch.cluster.pending_tasks
label: Pending Tasks
format:
type: number
decimals: 0
- title: In-Flight Fetches
description: >-
Unfinished shard-information fetches issued during shard allocation.
Sustained values >0 indicate allocation is waiting on node responses.
size: {w: 16, h: 8}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.cluster.name
type: values
size: 10
metrics:
- aggregation: max
field: elasticsearch.cluster.in_flight_fetch
label: In-Flight Fetches
format:
type: number
decimals: 0
- title: Index Statistics
size: {w: 24, h: 8}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: cluster
type: values
field: elasticsearch.cluster.name
label: Cluster
size: 20
metrics:
- id: indices
aggregation: unique_count
field: elasticsearch.index.name
label: Indices
format:
type: number
decimals: 0
- id: state_queue
aggregation: average
field: elasticsearch.cluster.state_queue
label: State Queue
format:
type: number
- title: Shards Over Time
size: {w: 24, h: 8}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
metrics:
- formula: "max(elasticsearch.cluster.shards, kql='state: active_primary')"
label: Active Primary
- formula: "max(elasticsearch.cluster.shards, kql='state: active')"
label: Active
- formula: "max(elasticsearch.cluster.shards, kql='state: initializing')"
label: Initializing
- formula: "max(elasticsearch.cluster.shards, kql='state: relocating')"
label: Relocating
- formula: "max(elasticsearch.cluster.shards, kql='state: unassigned')"
label: Unassigned
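
The `state`-filtered formulas in the Cluster Overview definition above can be reused to track a single shard state per cluster. The panel below is a hypothetical addition, not part of `01-cluster-overview.yaml`; it only combines keys that already appear in that definition, and the indentation is inferred from how those keys nest.

```yaml
# Hypothetical extra panel; it would sit under `panels:` in the dashboard.
- title: Unassigned Shards by Cluster
  size: {w: 24, h: 8}
  lens:
    type: line
    data_view: metrics-*
    dimension:
      field: '@timestamp'
      type: date_histogram
    breakdown:
      field: elasticsearch.cluster.name
      type: values
      size: 10
    metrics:
    # Same formula as the "Unassigned" column of the Shard Distribution table.
    - formula: "max(elasticsearch.cluster.shards, kql='state: unassigned')"
      label: Unassigned
      format:
        type: number
        decimals: 0
```

Breaking the series down by cluster name shows which cluster is affected, which the stacked Shards Over Time panel does not.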
Node Overview (02-node-overview.yaml)
---
dashboards:
- id: elasticsearch-otel-node-overview
name: '[Elasticsearch OTel] Node Overview'
description: Overview of Elasticsearch node health and key metrics
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Node Name
data_view: metrics-*
field: elasticsearch.node.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: Node Health Section
size: {w: 48, h: 3}
markdown:
content: '### Node Health'
- title: Document Count
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: average
field: elasticsearch.node.documents
label: Documents
format:
type: number
decimals: 0
- title: Open Files
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: average
field: elasticsearch.node.open_files
label: Open Files
format:
type: number
decimals: 0
- title: HTTP Connections
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: average
field: elasticsearch.node.http.connections
label: Connections
format:
type: number
decimals: 0
- title: Active Nodes
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: unique_count
field: elasticsearch.node.name
label: Nodes
format:
type: number
decimals: 0
- title: CPU & Memory Section
size: {w: 48, h: 3}
markdown:
content: '### CPU & Memory'
- title: CPU Usage by Node
size: {w: 24, h: 8}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.process.cpu.usage
label: CPU %
format:
type: percent
- title: Memory Usage by Node
size: {w: 24, h: 8}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: "average(elasticsearch.os.memory, kql='memory_state: used') / (average(elasticsearch.os.memory, kql='memory_state: used')\
\ + average(elasticsearch.os.memory, kql='memory_state: free') + 0.000001)"
label: Memory %
format:
type: percent
- title: Disk Space Section
size: {w: 48, h: 3}
markdown:
content: '### Disk Space'
- title: Disk Space Available
size: {w: 24, h: 8}
lens:
type: bar
data_view: metrics-*
dimension:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.fs.disk.available
label: Available Bytes
format:
type: bytes
- title: Disk Usage %
size: {w: 24, h: 8}
lens:
type: bar
data_view: metrics-*
dimension:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: 1 - (average(elasticsearch.node.fs.disk.available) / (average(elasticsearch.node.fs.disk.total) + 0.000001))
label: Disk Usage %
format:
type: percent
- title: Cache Performance Section
size: {w: 48, h: 3}
markdown:
content: '### Cache Performance'
- title: Cache Evictions
size: {w: 24, h: 8}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
metrics:
- id: field_data_evictions
formula: "counter_rate(max(elasticsearch.node.cache.evictions), kql='cache_name: fielddata')"
label: Field Data
- id: query_cache_evictions
formula: "counter_rate(max(elasticsearch.node.cache.evictions), kql='cache_name: query')"
label: Query Cache
- title: Cache Memory Usage
size: {w: 24, h: 8}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
metrics:
- id: field_data_size
formula: "average(elasticsearch.node.cache.memory.usage, kql='cache_name: fielddata')"
label: Field Data
format:
type: bytes
- id: query_cache_size
formula: "average(elasticsearch.node.cache.memory.usage, kql='cache_name: query')"
label: Query Cache
format:
type: bytes
- title: Thread Pool Overview Section
size: {w: 48, h: 3}
markdown:
content: '### Thread Pool Overview'
- title: Thread Pool Queue Size
size: {w: 48, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
- id: pool
type: values
field: thread_pool_name
label: Pool
size: 20
metrics:
- id: queue
aggregation: average
field: elasticsearch.node.thread_pool.tasks.queued
label: Queue Size
format:
type: number
decimals: 0
- id: active
aggregation: average
field: elasticsearch.node.thread_pool.threads
label: Active Threads
format:
type: number
decimals: 0
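
The ratio formulas in the Node Overview definition above add `0.000001` to the denominator so the division stays defined when the denominator is 0. As a sketch of the same pattern, the hypothetical panel below reports free disk space as a fraction, the complement of the Disk Usage % panel. The title and label are illustrative; the fields and keys are taken from the definition above.

```yaml
# Hypothetical extra panel; it would sit under `panels:` in the dashboard.
- title: Disk Free %
  size: {w: 24, h: 8}
  lens:
    type: bar
    data_view: metrics-*
    dimension:
      field: elasticsearch.node.name
      type: values
      size: 20
    metrics:
    # The small constant keeps the division defined when the total is 0.
    - formula: average(elasticsearch.node.fs.disk.available) / (average(elasticsearch.node.fs.disk.total) + 0.000001)
      label: Disk Free %
      format:
        type: percent
```

The constant is small enough to be negligible next to a byte count, so it does not visibly change the reported percentage.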
Node Metrics (03-node-metrics.yaml)
---
dashboards:
- id: elasticsearch-otel-node-metrics
name: '[Elasticsearch OTel] Node Metrics'
description: Detailed time series metrics for Elasticsearch nodes
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Node Name
data_view: metrics-*
field: elasticsearch.node.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: CPU & Memory Section
size: {w: 48, h: 3}
markdown:
content: '### CPU & Memory'
- title: CPU Usage Over Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.process.cpu.usage
label: CPU %
format:
type: percent
- title: Memory Usage Over Time
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: "average(elasticsearch.os.memory, kql='memory_state: used')"
label: Memory Used
format:
type: bytes
- title: Disk I/O Section
size: {w: 48, h: 3}
markdown:
content: '### Disk I/O'
- title: Disk Read Operations
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: "counter_rate(max(elasticsearch.node.operations.completed), kql='operation: read')"
label: Read Ops/sec
- title: Disk Write Operations
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: "counter_rate(max(elasticsearch.node.operations.completed), kql='operation: write')"
label: Write Ops/sec
- title: Operations Section
size: {w: 48, h: 3}
markdown:
content: '### Operations'
- title: Document Operations
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.documents
label: Document Count
format:
type: number
decimals: 0
- title: Indexing Operations
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: "counter_rate(max(elasticsearch.node.operations.completed), kql='operation: index')"
label: Index Ops/sec
- title: Cache Performance Section
size: {w: 48, h: 3}
markdown:
content: '### Cache Performance'
- title: Field Data Cache Evictions
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: cache_name
equals: fielddata
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.node.cache.evictions))
label: Evictions/sec
- title: Query Cache Evictions
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: cache_name
equals: query
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.node.cache.evictions))
label: Evictions/sec
- title: Field Data Cache Size
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
filters:
- field: cache_name
equals: fielddata
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.cache.memory.usage
label: Memory Usage
format:
type: bytes
- title: Query Cache Size
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
filters:
- field: cache_name
equals: query
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.cache.memory.usage
label: Memory Usage
format:
type: bytes
- title: Thread Pool Metrics Section
size: {w: 48, h: 3}
markdown:
content: '### Thread Pool Metrics'
- title: Thread Pool Queue Size - Search
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: thread_pool_name
equals: search
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.thread_pool.tasks.queued
label: Queue Size
format:
type: number
decimals: 0
- title: Thread Pool Queue Size - Write
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: thread_pool_name
equals: write
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.thread_pool.tasks.queued
label: Queue Size
format:
type: number
decimals: 0
- title: Thread Pool Active Threads - Search
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: thread_pool_name
equals: search
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.thread_pool.threads
label: Active Threads
format:
type: number
decimals: 0
- title: Thread Pool Active Threads - Write
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: thread_pool_name
equals: write
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.node.thread_pool.threads
label: Active Threads
format:
type: number
decimals: 0
- title: Thread Pool Rejections
size: {w: 48, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: thread_pool_name
type: values
size: 10
metrics:
- formula: "counter_rate(max(elasticsearch.node.thread_pool.tasks.finished), kql='state: rejected')"
label: Rejections/sec
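
The Node Metrics definition above scopes attribute values in two ways: an inline `kql=` argument inside the formula (the operations panels) and a panel-level `filters` list (the cache and thread pool panels). The sketch below rewrites the Indexing Operations panel in the panel-level style. It is a hypothetical variant, and the expectation that both forms select the same documents is an assumption, not something the definition states.

```yaml
# Hypothetical variant of the Indexing Operations panel.
- title: Indexing Operations (panel-level filter)
  size: {w: 24, h: 10}
  lens:
    type: line
    data_view: metrics-*
    filters:
    # Applies to every metric in the panel, so the formula needs no kql argument.
    - field: operation
      equals: index
    dimension:
      field: '@timestamp'
      type: date_histogram
    breakdown:
      field: elasticsearch.node.name
      type: values
      size: 20
    metrics:
    - formula: counter_rate(max(elasticsearch.node.operations.completed))
      label: Index Ops/sec
```

A panel-level filter suits panels where every metric shares one attribute value; inline `kql=` is needed when a single panel compares several values, as the shard tables do.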
Index Metrics (04-index-metrics.yaml)
---
dashboards:
- id: elasticsearch-otel-index-metrics
name: '[Elasticsearch OTel] Index Metrics'
description: Elasticsearch index-level metrics and operations
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Index Name
data_view: metrics-*
field: elasticsearch.index.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: Index Overview Section
size: {w: 48, h: 3}
markdown:
content: '### Index Overview'
- title: Total Indices
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: unique_count
field: elasticsearch.index.name
label: Indices
- title: Total Index Size
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: sum
field: elasticsearch.index.shards.size
label: Total Size
- title: Total Segment Count
hide_title: true
size: {w: 12, h: 4}
lens:
type: metric
data_view: metrics-*
primary:
aggregation: sum
field: elasticsearch.index.segments.count
label: Segments
format:
type: number
decimals: 0
- title: Index Size Distribution
size: {w: 24, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: index
type: values
field: elasticsearch.index.name
label: Index
size: 100
metrics:
- id: size
aggregation: average
field: elasticsearch.index.shards.size
label: Size
format:
type: bytes
- id: segments
aggregation: average
field: elasticsearch.index.segments.count
label: Segments
format:
type: number
decimals: 0
- id: segment_size
aggregation: average
field: elasticsearch.index.segments.size
label: Segment Size
format:
type: bytes
paging:
enabled: true
page_size: 20
- title: Index Shard Distribution
size: {w: 24, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: index
type: values
field: elasticsearch.index.name
label: Index
size: 100
metrics:
- id: shard_size
aggregation: average
field: elasticsearch.index.shards.size
label: Shard Size
format:
type: bytes
paging:
enabled: true
page_size: 20
- title: Operations Section
size: {w: 48, h: 3}
markdown:
content: '### Operations'
- title: Index Operations Rate
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: index
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.completed))
label: Index Ops/sec
- title: Index Operation Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: index
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.time))
label: Time (ms/sec)
- title: Search Operations Rate
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: search
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.completed))
label: Search Ops/sec
- title: Search Operation Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: search
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.time))
label: Time (ms/sec)
- title: Merge Operations Rate
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: merge
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.completed))
label: Merge Ops/sec
- title: Merge Operation Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: operation
equals: merge
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- formula: counter_rate(max(elasticsearch.index.operations.time))
label: Time (ms/sec)
- title: Segments Section
size: {w: 48, h: 3}
markdown:
content: '### Segments'
- title: Segment Count Over Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.index.segments.count
label: Segment Count
format:
type: number
decimals: 0
- title: Segment Memory Usage
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.index.name
type: values
size: 20
metrics:
- aggregation: average
field: elasticsearch.index.segments.memory
label: Memory
format:
type: bytes
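
The Index Metrics definition above charts operation rate and operation time as separate panels. Dividing the two rates gives an approximate average time per operation, which is often easier to read than either series alone. The panel below is a hypothetical sketch: the title, label, and the millisecond unit (taken from the `Time (ms/sec)` labels above) are assumptions.

```yaml
# Hypothetical extra panel; it would sit under `panels:` in the dashboard.
- title: Average Time per Index Operation
  size: {w: 24, h: 10}
  lens:
    type: line
    data_view: metrics-*
    filters:
    - field: operation
      equals: index
    dimension:
      field: '@timestamp'
      type: date_histogram
    breakdown:
      field: elasticsearch.index.name
      type: values
      size: 20
    metrics:
    # Time spent per second divided by operations per second; the constant
    # keeps the division defined during intervals with no operations.
    - formula: counter_rate(max(elasticsearch.index.operations.time)) / (counter_rate(max(elasticsearch.index.operations.completed)) + 0.000001)
      label: Avg Time per Op (ms)
```

Because of the added constant, intervals with no completed operations show a value near zero instead of an undefined result.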
JVM Health (05-jvm-health.yaml)
---
dashboards:
- id: elasticsearch-otel-jvm-health
name: '[Elasticsearch OTel] JVM Health'
description: JVM health metrics for Elasticsearch nodes
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Node Name
data_view: metrics-*
field: elasticsearch.node.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: JVM Memory Section
size: {w: 48, h: 3}
markdown:
content: '### JVM Memory'
- title: JVM Heap Usage %
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- formula: average(jvm.memory.heap.used) / (average(jvm.memory.heap.max) + 0.000001)
label: Heap Usage %
format:
type: percent
- title: JVM Heap Usage (Bytes)
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: jvm.memory.heap.used
label: Heap Used
format:
type: bytes
- title: JVM Heap Committed
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: jvm.memory.heap.committed
label: Heap Committed
format:
type: bytes
- title: JVM Non-Heap Usage
size: {w: 24, h: 10}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: jvm.memory.nonheap.used
label: Non-Heap Used
format:
type: bytes
- title: Garbage Collection Section
size: {w: 48, h: 3}
markdown:
content: '### Garbage Collection'
- title: GC Collection Count
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 10
metrics:
- formula: counter_rate(max(jvm.gc.collections.count))
label: Collections/sec
- title: GC Collection Time
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 10
metrics:
- formula: counter_rate(max(jvm.gc.collections.elapsed))
label: Time (ms/sec)
- title: GC Collectors by Node
size: {w: 48, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
- id: collector
type: values
field: name
label: Collector
size: 10
metrics:
- id: count
formula: counter_rate(max(jvm.gc.collections.count))
label: Collections/sec
- id: time
formula: counter_rate(max(jvm.gc.collections.elapsed))
label: Time (ms/sec)
- title: Thread Management Section
size: {w: 48, h: 3}
markdown:
content: '### Thread Management'
- title: JVM Thread Count
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: jvm.threads.count
label: Thread Count
format:
type: number
decimals: 0
- title: JVM Classes Loaded
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 20
metrics:
- aggregation: average
field: jvm.classes.loaded
label: Classes Loaded
format:
type: number
decimals: 0
- title: Memory Pool Usage Section
size: {w: 48, h: 3}
markdown:
content: '### Memory Pool Usage'
- title: Memory Pool Usage
size: {w: 24, h: 12}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 10
metrics:
- aggregation: average
field: jvm.memory.pool.used
label: Used
format:
type: bytes
- title: Memory Pool Max
size: {w: 24, h: 12}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 10
metrics:
- aggregation: average
field: jvm.memory.pool.max
label: Max
format:
type: bytes
Circuit Breakers (06-circuit-breakers.yaml)
---
dashboards:
- id: elasticsearch-otel-circuit-breakers
name: '[Elasticsearch OTel] Circuit Breakers'
description: Elasticsearch circuit breaker monitoring
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Node Name
data_view: metrics-*
field: elasticsearch.node.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: Circuit Breaker Overview Section
size: {w: 48, h: 3}
markdown:
content: '### Circuit Breaker Overview'
- title: Circuit Breaker Utilization %
description: >-
When estimated memory reaches the limit (100%), the breaker trips and
rejects the operation. Watch for sustained trends toward the limit.
size: {w: 48, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
- id: breaker
type: values
field: name
label: Circuit Breaker
size: 20
metrics:
- id: current
aggregation: average
field: elasticsearch.breaker.memory.estimated
label: Current (Bytes)
format:
type: bytes
- id: limit
aggregation: average
field: elasticsearch.breaker.memory.limit
label: Limit (Bytes)
format:
type: bytes
- id: utilization
formula: average(elasticsearch.breaker.memory.estimated) / (average(elasticsearch.breaker.memory.limit) + 0.000001)
label: Utilization %
format:
type: percent
paging:
enabled: true
page_size: 20
- title: Circuit Breaker Memory Usage Section
size: {w: 48, h: 3}
markdown:
content: '### Circuit Breaker Memory Usage'
- title: Request Circuit Breaker
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: request
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- aggregation: average
field: elasticsearch.breaker.memory.estimated
label: Estimated
format:
type: bytes
- aggregation: average
field: elasticsearch.breaker.memory.limit
label: Limit
format:
type: bytes
- title: Request Breaker Utilization %
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: request
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- formula: average(elasticsearch.breaker.memory.estimated) / (average(elasticsearch.breaker.memory.limit) + 0.000001)
label: Utilization %
format:
type: percent
- title: Field Data Circuit Breaker
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: fielddata
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- aggregation: average
field: elasticsearch.breaker.memory.estimated
label: Estimated
format:
type: bytes
- aggregation: average
field: elasticsearch.breaker.memory.limit
label: Limit
format:
type: bytes
- title: Field Data Breaker Utilization %
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: fielddata
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- formula: average(elasticsearch.breaker.memory.estimated) / (average(elasticsearch.breaker.memory.limit) + 0.000001)
label: Utilization %
format:
type: percent
- title: Parent Circuit Breaker
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: parent
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- aggregation: average
field: elasticsearch.breaker.memory.estimated
label: Estimated
format:
type: bytes
- aggregation: average
field: elasticsearch.breaker.memory.limit
label: Limit
format:
type: bytes
- title: Parent Breaker Utilization %
size: {w: 24, h: 10}
lens:
type: line
data_view: metrics-*
filters:
- field: name
equals: parent
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: elasticsearch.node.name
type: values
size: 50
metrics:
- formula: average(elasticsearch.breaker.memory.estimated) / (average(elasticsearch.breaker.memory.limit) + 0.000001)
label: Utilization %
format:
type: percent
- title: Circuit Breaker Trips Section
size: {w: 48, h: 3}
markdown:
content: '### Circuit Breaker Trips'
- title: Trip Events by Breaker
description: >-
Each trip rejects operations to prevent memory exhaustion. Sustained
trips indicate a memory-constrained cluster.
size: {w: 48, h: 10}
lens:
type: line
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 50
metrics:
- formula: counter_rate(max(elasticsearch.breaker.tripped))
label: Trips/sec
- title: Trip Events by Node
size: {w: 48, h: 12}
lens:
type: datatable
data_view: metrics-*
breakdowns:
- id: node
type: values
field: elasticsearch.node.name
label: Node
size: 50
- id: breaker
type: values
field: name
label: Circuit Breaker
size: 20
metrics:
- id: trips
formula: counter_rate(max(elasticsearch.breaker.tripped))
label: Trips/sec
paging:
enabled: true
page_size: 20
- title: All Circuit Breakers Section
size: {w: 48, h: 3}
markdown:
content: '### All Circuit Breakers'
- title: All Breakers Memory Usage
size: {w: 48, h: 12}
lens:
type: area
mode: stacked
data_view: metrics-*
dimension:
field: '@timestamp'
type: date_histogram
breakdown:
field: name
type: values
size: 50
metrics:
- aggregation: sum
field: elasticsearch.breaker.memory.estimated
label: Estimated
format:
type: bytes
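The trip panels in this dashboard chart `counter_rate(max(elasticsearch.breaker.tripped))`. The sketch below is a simplified Python model of what a counter-rate calculation does with a cumulative counter; it is not Kibana's implementation. It assumes that a drop in value means the counter was reset (for example after a node restart), and it does not normalize by bucket length. The function name and the sample values are hypothetical.

```python
from typing import List, Optional


def counter_deltas(samples: List[float]) -> List[Optional[float]]:
    """Return the per-bucket increase of a cumulative counter.

    Assumption: when a value is lower than its predecessor, the counter
    was reset, so the new value itself is taken as the increase.
    """
    if not samples:
        return []
    deltas: List[Optional[float]] = [None]  # first bucket has no predecessor
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            deltas.append(current - previous)
        else:
            deltas.append(current)  # counter reset
    return deltas


# Hypothetical per-bucket maxima of elasticsearch.breaker.tripped for one breaker.
tripped = [3, 3, 5, 9, 1, 2]
print(counter_deltas(tripped))
```

A flat line at zero therefore means no new trips in the selected range, even when the underlying counter itself is non-zero.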
Cluster Metadata (07-cluster-metadata.yaml)
---
dashboards:
- id: elasticsearch-otel-cluster-metadata
name: '[Elasticsearch OTel] Cluster Metadata'
description: Elasticsearch cluster and node metadata via ES|QL
filters:
- field: data_stream.dataset
equals: elasticsearchreceiver.otel
controls:
- type: options
label: Cluster Name
data_view: metrics-*
field: elasticsearch.cluster.name
- type: options
label: Node Name
data_view: metrics-*
field: elasticsearch.node.name
panels:
- title: Navigation Links
size: {w: 48, h: 2}
links:
layout: horizontal
items:
- label: Cluster Overview
dashboard: elasticsearch-otel-cluster-overview
- label: Node Overview
dashboard: elasticsearch-otel-node-overview
- label: Node Metrics
dashboard: elasticsearch-otel-node-metrics
- label: Index Metrics
dashboard: elasticsearch-otel-index-metrics
- label: JVM Health
dashboard: elasticsearch-otel-jvm-health
- label: Circuit Breakers
dashboard: elasticsearch-otel-circuit-breakers
- label: Cluster Metadata
dashboard: elasticsearch-otel-cluster-metadata
- title: Cluster Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Cluster Configuration'
- title: Cluster Summary
size: {w: 24, h: 12}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.cluster.name IS NOT NULL
- STATS nodes = MAX(elasticsearch.cluster.nodes), data_nodes = MAX(elasticsearch.cluster.data_nodes) BY cluster = elasticsearch.cluster.name
- EVAL nodes_int = TO_INTEGER(nodes)
- EVAL data_nodes_int = TO_INTEGER(data_nodes)
- KEEP cluster, nodes_int, data_nodes_int
- RENAME nodes_int AS `Total Nodes`, data_nodes_int AS `Data Nodes`
breakdowns:
- field: cluster
label: Cluster
- field: Total Nodes
label: Total Nodes
- field: Data Nodes
label: Data Nodes
- title: Cluster Health Status
size: {w: 24, h: 12}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.cluster.name IS NOT NULL
- STATS pending_tasks = MAX(elasticsearch.cluster.pending_tasks), in_flight = MAX(elasticsearch.cluster.in_flight_fetch), state_queue = MAX(elasticsearch.cluster.state_queue) BY cluster = elasticsearch.cluster.name, health = status
- EVAL pending_tasks_int = TO_INTEGER(pending_tasks)
- EVAL in_flight_int = TO_INTEGER(in_flight)
- EVAL state_queue_int = TO_INTEGER(state_queue)
- KEEP cluster, health, pending_tasks_int, in_flight_int, state_queue_int
- RENAME pending_tasks_int AS `Pending Tasks`, in_flight_int AS `In-Flight Fetches`, state_queue_int AS `State Queue`
breakdowns:
- field: cluster
label: Cluster
- field: health
label: Health
- field: Pending Tasks
label: Pending Tasks
- field: In-Flight Fetches
label: In-Flight Fetches
- field: State Queue
label: State Queue
- title: Node Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Node Configuration'
- title: Node Metadata
size: {w: 48, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS count = COUNT(*) BY node = elasticsearch.node.name, cluster = elasticsearch.cluster.name
- KEEP node, cluster
breakdowns:
- field: node
label: Node
- field: cluster
label: Cluster
- title: Node Resource Usage Section
size: {w: 48, h: 3}
markdown:
content: '### Node Resource Usage'
- title: Node CPU and Memory
size: {w: 24, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS cpu_usage = AVG(elasticsearch.process.cpu.usage), mem_used = AVG(elasticsearch.os.memory) BY node = elasticsearch.node.name
- EVAL cpu_percent = ROUND(cpu_usage * 100, 1)
- EVAL mem_gb = ROUND(mem_used / 1073741824, 2)
- KEEP node, cpu_percent, mem_gb
- RENAME cpu_percent AS `CPU %`, mem_gb AS `Memory Used (GB)`
breakdowns:
- field: node
label: Node
- field: CPU %
label: CPU %
- field: Memory Used (GB)
label: Memory Used (GB)
- title: Node Disk Usage
size: {w: 24, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS disk_avail = AVG(elasticsearch.node.fs.disk.available), disk_total = AVG(elasticsearch.node.fs.disk.total) BY node = elasticsearch.node.name
- EVAL disk_avail_gb = ROUND(disk_avail / 1073741824, 2)
- EVAL disk_total_gb = ROUND(disk_total / 1073741824, 2)
- EVAL disk_used_pct = CASE(disk_total IS NULL OR disk_total == 0, null, ROUND((1 - (disk_avail / disk_total)) * 100, 1))
- KEEP node, disk_avail_gb, disk_total_gb, disk_used_pct
- RENAME disk_avail_gb AS `Disk Available (GB)`, disk_total_gb AS `Disk Total (GB)`, disk_used_pct AS `Disk Used %`
breakdowns:
- field: node
label: Node
- field: Disk Available (GB)
label: Disk Available (GB)
- field: Disk Total (GB)
label: Disk Total (GB)
- field: Disk Used %
label: Disk Used %
- title: Index Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Index Configuration'
- title: Index Statistics
size: {w: 48, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.index.name IS NOT NULL
- STATS size = AVG(elasticsearch.index.shards.size), segments = AVG(elasticsearch.index.segments.count), segment_size = AVG(elasticsearch.index.segments.size) BY index_name = elasticsearch.index.name
- EVAL size_mb = ROUND(size / 1048576, 2)
- EVAL segments_int = TO_INTEGER(segments)
- EVAL segment_size_mb = ROUND(segment_size / 1048576, 2)
- KEEP index_name, size_mb, segments_int, segment_size_mb
- RENAME size_mb AS `Size (MB)`, segments_int AS `Segment Count`, segment_size_mb AS `Segment Size (MB)`
- SORT `Size (MB)` DESC
- LIMIT 100
breakdowns:
- field: index_name
label: Index Name
- field: Size (MB)
label: Size (MB)
- field: Segment Count
label: Segment Count
- field: Segment Size (MB)
label: Segment Size (MB)
- title: Thread Pool Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Thread Pool Configuration'
- title: Thread Pool Status by Node
size: {w: 48, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS threads = AVG(elasticsearch.node.thread_pool.threads), queue = AVG(elasticsearch.node.thread_pool.tasks.queued) BY node = elasticsearch.node.name, pool = thread_pool_name
- EVAL threads_int = TO_INTEGER(threads)
- EVAL queue_int = TO_INTEGER(queue)
- KEEP node, pool, threads_int, queue_int
- RENAME threads_int AS Threads, queue_int AS `Queue Size`
- WHERE Threads > 0 OR `Queue Size` > 0
breakdowns:
- field: node
label: Node
- field: pool
label: Thread Pool
- field: Threads
label: Threads
- field: Queue Size
label: Queue Size
- title: Cache Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Cache Configuration'
- title: Cache Memory Usage by Node
size: {w: 48, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS mem_usage = AVG(elasticsearch.node.cache.memory.usage) BY node = elasticsearch.node.name, cache = cache_name
- EVAL mem_usage_mb = ROUND(mem_usage / 1048576, 2)
- KEEP node, cache, mem_usage_mb
- RENAME mem_usage_mb AS `Memory Usage (MB)`
- WHERE `Memory Usage (MB)` > 0
breakdowns:
- field: node
label: Node
- field: cache
label: Cache
- field: Memory Usage (MB)
label: Memory Usage (MB)
- title: JVM Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### JVM Configuration'
- title: JVM Memory Configuration
size: {w: 24, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS heap_max = AVG(jvm.memory.heap.max), heap_used = AVG(jvm.memory.heap.used), nonheap_used = AVG(jvm.memory.nonheap.used) BY node = elasticsearch.node.name
- EVAL heap_max_gb = ROUND(heap_max / 1073741824, 2)
- EVAL heap_used_gb = ROUND(heap_used / 1073741824, 2)
- EVAL nonheap_used_mb = ROUND(nonheap_used / 1048576, 2)
- KEEP node, heap_max_gb, heap_used_gb, nonheap_used_mb
- RENAME heap_max_gb AS `Heap Max (GB)`, heap_used_gb AS `Heap Used (GB)`, nonheap_used_mb AS `Non-Heap Used (MB)`
breakdowns:
- field: node
label: Node
- field: Heap Max (GB)
label: Heap Max (GB)
- field: Heap Used (GB)
label: Heap Used (GB)
- field: Non-Heap Used (MB)
label: Non-Heap Used (MB)
- title: JVM Thread and Class Information
size: {w: 24, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS threads = AVG(jvm.threads.count), classes = AVG(jvm.classes.loaded) BY node = elasticsearch.node.name
- EVAL threads_int = TO_INTEGER(threads)
- EVAL classes_int = TO_INTEGER(classes)
- KEEP node, threads_int, classes_int
- RENAME threads_int AS `Thread Count`, classes_int AS `Classes Loaded`
breakdowns:
- field: node
label: Node
- field: Thread Count
label: Thread Count
- field: Classes Loaded
label: Classes Loaded
- title: Circuit Breaker Configuration Section
size: {w: 48, h: 3}
markdown:
content: '### Circuit Breaker Configuration'
- title: Circuit Breaker Limits
size: {w: 48, h: 15}
esql:
type: datatable
query:
- FROM metrics-*
- WHERE data_stream.dataset == "elasticsearchreceiver.otel"
- WHERE elasticsearch.node.name IS NOT NULL
- STATS estimated = AVG(elasticsearch.breaker.memory.estimated), limit = AVG(elasticsearch.breaker.memory.limit) BY node = elasticsearch.node.name, breaker = name
- EVAL estimated_mb = ROUND(estimated / 1048576, 2)
- EVAL limit_mb = ROUND(limit / 1048576, 2)
- EVAL utilization_pct = CASE(limit == 0 OR limit IS NULL, null, ROUND((estimated / limit) * 100, 1))
- KEEP node, breaker, estimated_mb, limit_mb, utilization_pct
- RENAME estimated_mb AS `Estimated (MB)`, limit_mb AS `Limit (MB)`, utilization_pct AS `Utilization %`
breakdowns:
- field: node
label: Node
- field: breaker
label: Breaker
- field: Estimated (MB)
label: Estimated (MB)
- field: Limit (MB)
label: Limit (MB)
- field: Utilization %
label: Utilization %
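The ES|QL panels in this dashboard convert raw byte counts with fixed divisors and guard percentages against a missing or zero total. As a minimal sketch, the arithmetic of the Node Disk Usage query can be written in Python as follows. The function name `disk_summary` and the sample values are hypothetical, and Python's `round` may differ from ES|QL `ROUND` on exact ties.

```python
from typing import Dict, Optional

BYTES_PER_GIB = 1073741824  # same divisor as in the queries (1024 ** 3)


def disk_summary(
    avail_bytes: Optional[float], total_bytes: Optional[float]
) -> Dict[str, Optional[float]]:
    """Mirror the EVAL steps of the Node Disk Usage query for one node."""
    avail_gb = None if avail_bytes is None else round(avail_bytes / BYTES_PER_GIB, 2)
    total_gb = None if total_bytes is None else round(total_bytes / BYTES_PER_GIB, 2)
    # Same guard as the CASE expression: no percentage without a usable total.
    if avail_bytes is None or total_bytes is None or total_bytes == 0:
        used_pct = None
    else:
        used_pct = round((1 - avail_bytes / total_bytes) * 100, 1)
    return {
        "Disk Available (GB)": avail_gb,
        "Disk Total (GB)": total_gb,
        "Disk Used %": used_pct,
    }


# Hypothetical node with 250 GiB available out of 1000 GiB.
print(disk_summary(250 * BYTES_PER_GIB, 1000 * BYTES_PER_GIB))
```

Returning `None` rather than `0` for the percentage keeps nodes without disk data visually distinct from nodes with an empty disk.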
Prerequisites
- Elasticsearch: Version 7.x or 8.x with `monitor` or `manage` cluster privileges
- OpenTelemetry Collector: Collector Contrib distribution with Elasticsearch receiver
- Kibana: Version compatible with your Elasticsearch cluster
Data Requirements
- Data stream dataset: `elasticsearchreceiver.otel`
- Data view: `metrics-*`
OpenTelemetry Collector Configuration
receivers:
  elasticsearch:
    endpoint: http://localhost:9200
    username: ${env:ELASTICSEARCH_USERNAME}
    password: ${env:ELASTICSEARCH_PASSWORD}
    collection_interval: 10s
    metrics:
      elasticsearch.cluster.health:
        enabled: true
      elasticsearch.cluster.nodes:
        enabled: true
      elasticsearch.cluster.data_nodes:
        enabled: true
      elasticsearch.cluster.shards:
        enabled: true
      elasticsearch.cluster.pending_tasks:
        enabled: true
      elasticsearch.node.documents:
        enabled: true
      elasticsearch.node.fs.disk.available:
        enabled: true
      elasticsearch.node.cache.memory.usage:
        enabled: true
      elasticsearch.process.cpu.usage:
        enabled: true
      jvm.memory.heap.used:
        enabled: true
      jvm.memory.heap.max:
        enabled: true
      jvm.gc.collections.count:
        enabled: true
exporters:
  elasticsearch:
    endpoints: ["https://your-elasticsearch-instance:9200"]
service:
  pipelines:
    metrics:
      receivers: [elasticsearch]
      exporters: [elasticsearch]
Metrics Reference
Critical Naming Convention: The receiver uses two distinct metric naming patterns:
- JVM Metrics - Use the `jvm.*` prefix (NO `elasticsearch.` prefix)
- Elasticsearch Metrics - Use the `elasticsearch.*` prefix
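The naming convention above can be checked mechanically when writing new panels or queries. The following Python sketch classifies a field name by prefix; the function name `classify_metric` and its return labels are hypothetical, and treating an `elasticsearch.jvm.` prefix as invalid is an assumption drawn from the rule that JVM metrics carry no `elasticsearch.` prefix.

```python
def classify_metric(name: str) -> str:
    """Classify a metric field name by the prefix rules described above."""
    if name.startswith("elasticsearch.jvm."):
        # Assumed mistake: JVM metrics are expected under jvm.* only.
        return "invalid"
    if name.startswith("jvm."):
        return "jvm"
    if name.startswith("elasticsearch."):
        return "elasticsearch"
    return "unknown"


for field in (
    "jvm.memory.heap.used",
    "elasticsearch.node.fs.disk.total",
    "elasticsearch.jvm.memory.heap.used",
):
    print(field, "->", classify_metric(field))
```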
Cluster Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.cluster.health` | Sum | `{status}` | Cluster health (green/yellow/red) | status |
| `elasticsearch.cluster.nodes` | Sum | `{nodes}` | Total node count | - |
| `elasticsearch.cluster.data_nodes` | Sum | `{nodes}` | Data node count | - |
| `elasticsearch.cluster.shards` | Sum | `{shards}` | Shard count by state | state |
| `elasticsearch.cluster.pending_tasks` | Sum | `{tasks}` | Pending cluster tasks | - |
| `elasticsearch.cluster.in_flight_fetch` | Sum | `{fetches}` | Unfinished fetches | - |
| `elasticsearch.cluster.published_states.full` | Sum | `1` | Published cluster states | - |
| `elasticsearch.cluster.published_states.differences` | Sum | `1` | Differences between states | state |
| `elasticsearch.cluster.state_queue` | Sum | `1` | Cluster states in queue | state |
| `elasticsearch.cluster.state_update.count` | Sum | `1` | State update attempts | state |
| `elasticsearch.cluster.state_update.time` | Sum | `ms` | Time updating cluster state | state, type |
Node Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.node.documents` | Sum | `{documents}` | Documents on node | state |
| `elasticsearch.node.fs.disk.available` | Sum | `By` | Available disk space | - |
| `elasticsearch.node.fs.disk.free` | Sum | `By` | Unallocated disk space | - |
| `elasticsearch.node.fs.disk.total` | Sum | `By` | Total disk space | - |
| `elasticsearch.node.http.connections` | Sum | `{connections}` | HTTP connections to node | - |
| `elasticsearch.node.cache.memory.usage` | Sum | `By` | Cache size on node | cache_name |
| `elasticsearch.node.cache.count` | Sum | `{count}` | Query cache hits/misses | type |
| `elasticsearch.node.cache.evictions` | Sum | `{evictions}` | Cache evictions | cache_name |
| `elasticsearch.node.cluster.connections` | Sum | `{connections}` | Cluster TCP connections | - |
| `elasticsearch.node.cluster.io` | Sum | `By` | Cluster network bytes | direction |
| `elasticsearch.node.disk.io.read` | Sum | `KiBy` | Disk bytes read (Linux) | - |
| `elasticsearch.node.disk.io.write` | Sum | `KiBy` | Disk bytes written (Linux) | - |
| `elasticsearch.node.ingest.documents` | Sum | `{documents}` | Documents ingested lifetime | - |
| `elasticsearch.node.ingest.documents.current` | Sum | `{documents}` | Documents currently ingesting | - |
| `elasticsearch.node.ingest.operations.failed` | Sum | `{operation}` | Failed ingest operations | - |
| `elasticsearch.node.open_files` | Sum | `{files}` | Open file descriptors | - |
| `elasticsearch.node.operations.completed` | Sum | `{operations}` | Operations completed | operation |
| `elasticsearch.node.operations.time` | Sum | `ms` | Operation time | operation |
| `elasticsearch.node.pipeline.ingest.documents.current` | Sum | `{documents}` | Documents in pipeline | name |
| `elasticsearch.node.pipeline.ingest.documents.preprocessed` | Sum | `{documents}` | Documents preprocessed | name |
| `elasticsearch.node.pipeline.ingest.operations.failed` | Sum | `{operation}` | Failed pipeline ops | name |
| `elasticsearch.node.script.compilations` | Sum | `{compilations}` | Script compilations | - |
| `elasticsearch.node.script.cache_evictions` | Sum | `1` | Script cache evictions | - |
| `elasticsearch.node.script.compilation_limit_triggered` | Sum | `1` | Circuit breaker triggers | - |
| `elasticsearch.node.shards.size` | Sum | `By` | Shard storage size | - |
| `elasticsearch.node.shards.data_set.size` | Sum | `By` | Dataset size of shards | - |
| `elasticsearch.node.shards.reserved.size` | Sum | `By` | Reserved shard size | - |
| `elasticsearch.node.thread_pool.tasks.queued` | Sum | `{tasks}` | Queued tasks | thread_pool_name |
| `elasticsearch.node.thread_pool.tasks.finished` | Sum | `{tasks}` | Finished tasks | thread_pool_name, state |
| `elasticsearch.node.thread_pool.threads` | Sum | `{threads}` | Thread count | thread_pool_name, state |
| `elasticsearch.node.translog.operations` | Sum | `{operations}` | Transaction log ops | - |
| `elasticsearch.node.translog.size` | Sum | `By` | Transaction log size | - |
| `elasticsearch.node.translog.uncommitted.size` | Sum | `By` | Uncommitted translog size | - |
| `elasticsearch.os.cpu.usage` | Gauge | `%` | System CPU usage | - |
| `elasticsearch.os.cpu.load_avg.1m` | Gauge | `1` | 1-minute load average | - |
| `elasticsearch.os.cpu.load_avg.5m` | Gauge | `1` | 5-minute load average | - |
| `elasticsearch.os.cpu.load_avg.15m` | Gauge | `1` | 15-minute load average | - |
| `elasticsearch.os.memory` | Gauge | `By` | Physical memory | state |
| `elasticsearch.process.cpu.usage` | Gauge | `1` | Process CPU usage (0-1) | - |
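The units in the table above are not uniform: process CPU usage is a 0-1 ratio while system CPU usage is already a percentage, and the disk I/O counters are in KiBy rather than By. A small Python sketch of normalizing these before comparing or charting them together is shown here; the helper names are hypothetical and only the unit strings listed in the table are handled.

```python
def to_percent(value: float, unit: str) -> float:
    """Convert a CPU usage value to a percentage based on its table unit."""
    if unit == "1":  # ratio in the range 0-1, e.g. elasticsearch.process.cpu.usage
        return value * 100
    if unit == "%":  # already a percentage, e.g. elasticsearch.os.cpu.usage
        return value
    raise ValueError(f"unsupported CPU unit: {unit!r}")


def to_bytes(value: float, unit: str) -> float:
    """Convert a size value to bytes based on its table unit."""
    if unit == "By":
        return value
    if unit == "KiBy":  # kibibytes, e.g. elasticsearch.node.disk.io.read
        return value * 1024
    raise ValueError(f"unsupported size unit: {unit!r}")


print(to_percent(0.25, "1"))  # process CPU ratio as a percentage
print(to_bytes(2048, "KiBy"))  # disk I/O in bytes
```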
JVM Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `jvm.memory.heap.used` | Gauge | `By` | Used heap memory | - |
| `jvm.memory.heap.max` | Gauge | `By` | Maximum heap memory | - |
| `jvm.memory.heap.committed` | Gauge | `By` | Committed heap memory | - |
| `jvm.memory.nonheap.used` | Gauge | `By` | Used non-heap memory | - |
| `jvm.memory.nonheap.committed` | Gauge | `By` | Committed non-heap | - |
| `jvm.memory.pool.used` | Gauge | `By` | Memory pool usage | name |
| `jvm.memory.pool.max` | Gauge | `By` | Memory pool maximum | name |
| `jvm.gc.collections.count` | Sum | `1` | GC collection count | name |
| `jvm.gc.collections.elapsed` | Sum | `ms` | GC elapsed time | name |
| `jvm.classes.loaded` | Gauge | `1` | Loaded classes | - |
| `jvm.threads.count` | Gauge | `1` | JVM thread count | - |
Index Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.index.documents` | Sum | `{documents}` | Documents in index | state, aggregation |
| `elasticsearch.index.shards.size` | Sum | `By` | Index shard size | aggregation |
| `elasticsearch.index.segments.count` | Sum | `{segments}` | Segments in index | aggregation |
| `elasticsearch.index.operations.completed` | Sum | `{operations}` | Completed operations | operation, aggregation |
| `elasticsearch.index.operations.time` | Sum | `ms` | Operation time | operation, aggregation |
| `elasticsearch.index.operations.merge.current` | Gauge | `{merges}` | Active segment merges | aggregation |
Circuit Breaker Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.breaker.memory.estimated` | Gauge | `By` | Estimated memory used | name |
| `elasticsearch.breaker.memory.limit` | Sum | `By` | Memory limit | name |
| `elasticsearch.breaker.tripped` | Sum | `1` | Circuit breaker trips | name |
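The dashboards derive breaker utilization from the first two metrics in two ways: the Lens panels divide by the limit plus a small constant, while the ES|QL table returns null when the limit is zero or missing. The Python sketch below contrasts the two guards under the assumption that the Lens `percent` format expects a 0-1 ratio; the function names and sample values are hypothetical.

```python
from typing import Optional

EPSILON = 0.000001  # same constant as in the Lens formulas


def utilization_ratio(estimated: float, limit: float) -> float:
    """Lens-style guard: never divides by zero, returns a 0-1 ratio."""
    return estimated / (limit + EPSILON)


def utilization_percent(
    estimated: Optional[float], limit: Optional[float]
) -> Optional[float]:
    """ES|QL-style guard: no value when the limit is missing or zero."""
    if estimated is None or limit is None or limit == 0:
        return None
    return round(estimated / limit * 100, 1)


# Hypothetical breaker using 512 MiB of a 2 GiB limit.
estimated, limit = 512 * 1024**2, 2 * 1024**3
print(utilization_ratio(estimated, limit))
print(utilization_percent(estimated, limit))
```

With a zero limit and a non-zero estimate, the additive guard yields a very large ratio instead of an empty cell, so a spike in a utilization chart can indicate a breaker that reports no limit rather than real memory pressure.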
Indexing Pressure Metrics (default enabled)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.indexing_pressure.memory.limit` | Gauge | `By` | Indexing memory limit | - |
| `elasticsearch.indexing_pressure.memory.total.primary_rejections` | Sum | `1` | Primary rejections | - |
| `elasticsearch.indexing_pressure.memory.total.replica_rejections` | Sum | `1` | Replica rejections | - |
| `elasticsearch.memory.indexing_pressure` | Sum | `By` | Indexing memory | stage |
Metric Attributes
| Attribute | Values | Description |
|---|---|---|
| `status` | `green, yellow, red` | Cluster health status |
| `state` (shards) | `active, active_primary, initializing, relocating, unassigned, unassigned_delayed` | Shard state |
| `state` (documents) | `active, deleted` | Document state |
| `state` (queue) | `pending, committed` | Queue state |
| `state` (memory) | `free, used` | Memory state |
| `state` (threads) | `active, idle` | Thread state |
| `type` (update) | `computation_time, context_construction_time, notification_time, publication_time` | State update type |
| `cache_name` | `fielddata, query, request` | Cache type |
| `thread_pool_name` | `analyze, fetch_shard_store, get, listener, search, write` | Thread pool |
| `operation` | `index, delete, get, query, fetch, scroll, suggest, merge, refresh, flush, warmer` | Operation type |
| `aggregation` | `total, primaries, replicas` | Shard aggregation |
| `direction` | `sent, received` | Network direction |
| `name` | Various | Circuit breaker, GC, or memory pool name |
| `stage` | `coordinating, primary, replica` | Indexing pressure stage |
Resource Attributes
| Attribute | Description |
|---|---|
| `elasticsearch.cluster.name` | Cluster identifier |
| `elasticsearch.node.name` | Node identifier |
| `elasticsearch.node.version` | Node version |
| `elasticsearch.index.name` | Index name |
Metrics Not Used in Dashboards
The following metrics are available from the Elasticsearch receiver but are not currently visualized in the dashboards:
Cluster Metrics Not Used
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.cluster.published_states.full` | Sum | `1` | Published cluster states | - |
| `elasticsearch.cluster.published_states.differences` | Sum | `1` | Differences between states | state |
| `elasticsearch.cluster.state_update.count` | Sum | `1` | State update attempts | state |
| `elasticsearch.cluster.state_update.time` | Sum | `ms` | Time updating cluster state | state, type |
Node Metrics Not Used
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.node.fs.disk.free` | Sum | `By` | Unallocated disk space | - |
| `elasticsearch.node.cache.count` | Sum | `{count}` | Query cache hits/misses | type |
| `elasticsearch.node.disk.io.read` | Sum | `KiBy` | Disk bytes read (Linux) | - |
| `elasticsearch.node.disk.io.write` | Sum | `KiBy` | Disk bytes written (Linux) | - |
| `elasticsearch.node.ingest.documents` | Sum | `{documents}` | Documents ingested lifetime | - |
| `elasticsearch.node.ingest.documents.current` | Sum | `{documents}` | Documents currently ingesting | - |
| `elasticsearch.node.ingest.operations.failed` | Sum | `{operation}` | Failed ingest operations | - |
| `elasticsearch.node.pipeline.ingest.documents.current` | Sum | `{documents}` | Documents in pipeline | name |
| `elasticsearch.node.pipeline.ingest.documents.preprocessed` | Sum | `{documents}` | Documents preprocessed | name |
| `elasticsearch.node.pipeline.ingest.operations.failed` | Sum | `{operation}` | Failed pipeline ops | name |
| `elasticsearch.node.script.compilations` | Sum | `{compilations}` | Script compilations | - |
| `elasticsearch.node.script.cache_evictions` | Sum | `1` | Script cache evictions | - |
| `elasticsearch.node.script.compilation_limit_triggered` | Sum | `1` | Circuit breaker triggers | - |
| `elasticsearch.node.shards.data_set.size` | Sum | `By` | Dataset size of shards | - |
| `elasticsearch.node.shards.reserved.size` | Sum | `By` | Reserved shard size | - |
| `elasticsearch.node.translog.operations` | Sum | `{operations}` | Transaction log ops | - |
| `elasticsearch.node.translog.size` | Sum | `By` | Transaction log size | - |
| `elasticsearch.node.translog.uncommitted.size` | Sum | `By` | Uncommitted translog size | - |
Indexing Pressure Metrics Not Used
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| `elasticsearch.indexing_pressure.memory.limit` | Gauge | `By` | Indexing memory limit | - |
| `elasticsearch.indexing_pressure.memory.total.primary_rejections` | Sum | `1` | Primary rejections | - |
| `elasticsearch.indexing_pressure.memory.total.replica_rejections` | Sum | `1` | Replica rejections | - |
| `elasticsearch.memory.indexing_pressure` | Sum | `By` | Indexing memory | stage |