Kubernetes Cluster Receiver Dashboards
Kubernetes cluster monitoring dashboards using OpenTelemetry k8sclusterreceiver metrics, designed for SRE and DevOps workflows.
Overview
The k8sclusterreceiver is an OpenTelemetry Collector receiver that collects cluster-level metrics from the Kubernetes API server. It provides visibility into cluster health, workload status, resource utilization, and autoscaling behavior.
Important: The k8sclusterreceiver must be deployed as a single instance per cluster to avoid duplicate metrics.
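One way to satisfy the single-instance requirement is to run the collector that hosts the receiver as a Deployment with exactly one replica. The manifest below is a minimal sketch, not a reference deployment: the resource names, namespace, ConfigMap, and image tag are hypothetical placeholders, and the `Recreate` strategy is an assumption chosen so that two collector pods never overlap during a rollout.

```yaml
# Sketch only: all names below are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-cluster-collector
  namespace: observability
spec:
  replicas: 1                # exactly one receiver instance per cluster
  strategy:
    type: Recreate           # stop the old pod before starting the new one
  selector:
    matchLabels:
      app: otel-cluster-collector
  template:
    metadata:
      labels:
        app: otel-cluster-collector
    spec:
      serviceAccountName: otel-cluster-collector   # needs read access to the Kubernetes API
      containers:
        - name: collector
          # Contrib distribution; pin a specific version instead of latest.
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-cluster-collector-config    # holds config.yaml
```

Running the receiver inside a DaemonSet collector would create one instance per node, which is the duplicate-metrics case the note above warns about.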
Dashboards
| Dashboard | File | Description |
|---|---|---|
| Cluster Overview | 01-cluster-overview.yaml | Entry point for cluster health triage |
| Workload Health | 02-workload-health.yaml | Deployment and container health |
| Resource Allocation | 03-resource-allocation.yaml | Capacity planning and quota analysis |
| Batch Jobs | 04-batch-jobs.yaml | Job and CronJob monitoring |
| Autoscaling | 05-autoscaling.yaml | HPA scaling behavior |
All dashboards include navigation links for easy switching between views.
Dashboard Definitions
Cluster Overview (01-cluster-overview.yaml)
---
# Kubernetes Cluster Overview Dashboard
# SRE Entry Point: "Is my cluster healthy? Where should I look?"
dashboards:
  - id: k8s-cluster-overview
    name: '[Metrics K8s Cluster] Overview'
    description: High-level Kubernetes cluster health for rapid SRE triage
    controls:
      - type: options
        label: Namespace
        data_view: metrics-*
        field: k8s.namespace.name
    filters:
      - field: data_stream.dataset
        equals: kubernetesclusterreceiver.otel
    panels:
      # ═══════════════════════════════════════════════════════════════════════
      # NAVIGATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Navigation
        size: { w: 48, h: 3 }
        links:
          layout: horizontal
          items:
            - label: 📊 Overview
              dashboard: k8s-cluster-overview
            - label: ⚙️ Workloads
              dashboard: k8s-cluster-workloads
            - label: 📦 Resources
              dashboard: k8s-cluster-resources
            - label: 🔄 Batch Jobs
              dashboard: k8s-cluster-batch
            - label: 📈 Autoscaling
              dashboard: k8s-cluster-hpa
      # ═══════════════════════════════════════════════════════════════════════
      # CLUSTER HEALTH SUMMARY (4 metric cards - at-a-glance health)
      # ═══════════════════════════════════════════════════════════════════════
      - title: Cluster Health
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🏥 Cluster Health'
          font_size: 14
      - title: Running Pods
        description: Pods in Running phase (phase=2).
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.pod.name)
            label: Running
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.pod.phase
              equals: '2'
      - title: Pending Pods
        description: >-
          Pods in Pending phase (phase=1), waiting for scheduling or container
          image pull.
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.pod.name)
            label: Pending
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.pod.phase
              equals: '1'
      - title: Failed Pods
        description: Pods in Failed phase (phase=4). Check pod logs for root cause.
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.pod.name)
            label: Failed
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.pod.phase
              equals: '4'
      - title: Container Restarts
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.container.restarts)
            label: Restarts
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.container.restarts
      # ═══════════════════════════════════════════════════════════════════════
      # ANALYSIS: Pod Health Distribution & Trends
      # ═══════════════════════════════════════════════════════════════════════
      - title: Pod Health Distribution
        size: { w: 20, h: 14 }
        lens:
          type: pie
          data_view: metrics-*
          breakdowns:
            - field: k8s.pod.phase
              type: values
              label: Status
              size: 5
          metrics:
            - aggregation: unique_count
              field: k8s.pod.name
              label: Pods
              format:
                type: number
                decimals: 0
          color:
            palette: eui_amsterdam_color_blind
            assignments:
              - value: '1'
                color: '#FEC514'
              - value: '2'
                color: '#54B399'
              - value: '3'
                color: '#6092C0'
              - value: '4'
                color: '#D36086'
              - value: '5'
                color: '#9170B8'
      - title: Pod Health Over Time
        size: { w: 28, h: 14 }
        lens:
          type: area
          mode: stacked
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          breakdown:
            field: k8s.pod.phase
            type: values
            size: 5
          metrics:
            - aggregation: unique_count
              field: k8s.pod.name
              label: Pods
              format:
                type: number
                decimals: 0
          color:
            palette: eui_amsterdam_color_blind
            assignments:
              - value: '1'
                color: '#FEC514'
              - value: '2'
                color: '#54B399'
              - value: '3'
                color: '#6092C0'
              - value: '4'
                color: '#D36086'
              - value: '5'
                color: '#9170B8'
      # ═══════════════════════════════════════════════════════════════════════
      # WORKLOAD HEALTH PREVIEW
      # ═══════════════════════════════════════════════════════════════════════
      - title: Workload Health
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🚀 Workload Health Preview'
          font_size: 14
      - title: Deployments - Desired vs Available
        description: >-
          Gap between lines indicates deployments that can't reach desired
          replica count.
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.deployment.desired)
              label: Desired
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.deployment.available)
              label: Available
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.deployment.name
      - title: Container Restarts by Namespace
        size: { w: 24, h: 12 }
        lens:
          type: bar
          data_view: metrics-*
          dimension:
            field: k8s.namespace.name
            type: values
            size: 10
            sort:
              by: Restarts
              direction: desc
          metrics:
            - formula: sum(k8s.container.restarts)
              label: Restarts
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.container.restarts
      # ═══════════════════════════════════════════════════════════════════════
      # DETAIL: Unhealthy Deployments Table
      # ═══════════════════════════════════════════════════════════════════════
      - title: Unhealthy Deployments
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔍 Unhealthy Deployments (Desired ≠ Available)'
          font_size: 14
      - title: Deployments Missing Replicas
        description: >-
          Missing = Desired - Available. Positive values indicate failed
          provisioning or insufficient resources.
        size: { w: 48, h: 12 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.deployment.name
              type: values
              size: 25
              label: Deployment
              sort:
                by: Missing
                direction: desc
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - field: k8s.deployment.desired
              aggregation: max
              label: Desired
              format:
                type: number
                decimals: 0
            - field: k8s.deployment.available
              aggregation: max
              label: Available
              format:
                type: number
                decimals: 0
            - formula: max(k8s.deployment.desired) - max(k8s.deployment.available)
              label: Missing
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.deployment.name
Workload Health (02-workload-health.yaml)
---
# Kubernetes Workload Health Dashboard
# SRE Question: "Are my deployments healthy? What's crashing?"
dashboards:
  - id: k8s-cluster-workloads
    name: '[Metrics K8s Cluster] Workload Health'
    description: Deployment, StatefulSet, DaemonSet, and container health monitoring
    controls:
      - type: options
        label: Namespace
        data_view: metrics-*
        field: k8s.namespace.name
      - type: options
        label: Deployment
        data_view: metrics-*
        field: k8s.deployment.name
    filters:
      - field: data_stream.dataset
        equals: kubernetesclusterreceiver.otel
    panels:
      # ═══════════════════════════════════════════════════════════════════════
      # NAVIGATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Navigation
        size: { w: 48, h: 3 }
        links:
          layout: horizontal
          items:
            - label: 📊 Overview
              dashboard: k8s-cluster-overview
            - label: ⚙️ Workloads
              dashboard: k8s-cluster-workloads
            - label: 📦 Resources
              dashboard: k8s-cluster-resources
            - label: 🔄 Batch Jobs
              dashboard: k8s-cluster-batch
            - label: 📈 Autoscaling
              dashboard: k8s-cluster-hpa
      # ═══════════════════════════════════════════════════════════════════════
      # CONTAINER HEALTH SUMMARY (4 metric cards)
      # ═══════════════════════════════════════════════════════════════════════
      - title: Container Health
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🐳 Container Health'
          font_size: 14
      - title: Ready Containers
        description: Containers passing their readiness probe. Can receive traffic.
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.container.name)
            label: Ready
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.container.ready
              equals: '1'
      - title: Not Ready Containers
        description: >-
          Containers failing their readiness probe. Check pod logs and events
          for startup or health issues.
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.container.name)
            label: Not Ready
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.container.ready
              equals: '0'
      - title: Total Restarts
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.container.restarts)
            label: Restarts
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.container.restarts
      - title: Containers Restarting
        description: >-
          Containers that have restarted at least once since pod creation.
          Frequent restarts indicate instability.
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.container.name)
            label: With Restarts
            format:
              type: number
              decimals: 0
          filters:
            - field: k8s.container.restarts
              gt: '0'
      # ═══════════════════════════════════════════════════════════════════════
      # DEPLOYMENT HEALTH
      # ═══════════════════════════════════════════════════════════════════════
      - title: Deployment Health
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🚀 Deployment Health (Desired vs Available)'
          font_size: 14
      - title: Deployments
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.deployment.desired)
              label: Desired
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.deployment.available)
              label: Available
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.deployment.name
      - title: StatefulSets
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.statefulset.desired_pods)
              label: Desired
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.statefulset.ready_pods)
              label: Ready
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.statefulset.name
      - title: DaemonSets
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.daemonset.desired_scheduled_nodes)
              label: Desired Nodes
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.daemonset.ready_nodes)
              label: Ready Nodes
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.daemonset.name
      - title: ReplicaSets
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.replicaset.desired)
              label: Desired
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.replicaset.available)
              label: Available
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.replicaset.name
      # ═══════════════════════════════════════════════════════════════════════
      # CONTAINER ANALYSIS
      # ═══════════════════════════════════════════════════════════════════════
      - title: Container Analysis
        size: { w: 48, h: 3 }
        markdown:
          content: '## 📊 Container Analysis'
          font_size: 14
      - title: Container Readiness Over Time
        description: >-
          Green (1) = ready, red (0) = not ready. Correlate dips with
          deployments or incidents.
        size: { w: 24, h: 12 }
        lens:
          type: area
          mode: stacked
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          breakdown:
            field: k8s.container.ready
            type: values
            size: 2
          metrics:
            - aggregation: unique_count
              field: k8s.container.name
              label: Containers
              format:
                type: number
                decimals: 0
          color:
            palette: eui_amsterdam_color_blind
            assignments:
              - value: '1'
                color: '#54B399'
              - value: '0'
                color: '#D36086'
      - title: Top Restarting Containers
        size: { w: 24, h: 12 }
        lens:
          type: bar
          data_view: metrics-*
          dimension:
            field: k8s.container.name
            type: values
            size: 15
            sort:
              by: Restarts
              direction: desc
          metrics:
            - field: k8s.container.restarts
              aggregation: max
              label: Restarts
              format:
                type: number
                decimals: 0
          filters:
            - field: k8s.container.restarts
              gt: '0'
      # ═══════════════════════════════════════════════════════════════════════
      # DETAIL TABLES
      # ═══════════════════════════════════════════════════════════════════════
      - title: Workload Details
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔍 Workload Status Details'
          font_size: 14
      - title: Deployment Status
        size: { w: 24, h: 12 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.deployment.name
              type: values
              size: 20
              label: Deployment
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - field: k8s.deployment.desired
              aggregation: max
              label: Desired
              format:
                type: number
                decimals: 0
            - field: k8s.deployment.available
              aggregation: max
              label: Available
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.deployment.name
      - title: StatefulSet Status
        size: { w: 24, h: 12 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.statefulset.name
              type: values
              size: 20
              label: StatefulSet
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - field: k8s.statefulset.desired_pods
              aggregation: max
              label: Desired
              format:
                type: number
                decimals: 0
            - field: k8s.statefulset.ready_pods
              aggregation: max
              label: Ready
              format:
                type: number
                decimals: 0
            - field: k8s.statefulset.current_pods
              aggregation: max
              label: Current
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.statefulset.name
Resource Allocation (03-resource-allocation.yaml)
---
# Kubernetes Resource Allocation Dashboard
# SRE Question: "Am I running out of resources? Are workloads over/under-provisioned?"
dashboards:
  - id: k8s-cluster-resources
    name: '[Metrics K8s Cluster] Resource Allocation'
    description: CPU, memory, and storage requests vs limits for capacity planning
    controls:
      - type: options
        label: Namespace
        data_view: metrics-*
        field: k8s.namespace.name
      - type: options
        label: Node
        data_view: metrics-*
        field: k8s.node.name
    filters:
      - field: data_stream.dataset
        equals: kubernetesclusterreceiver.otel
    panels:
      # ═══════════════════════════════════════════════════════════════════════
      # NAVIGATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Navigation
        size: { w: 48, h: 3 }
        links:
          layout: horizontal
          items:
            - label: 📊 Overview
              dashboard: k8s-cluster-overview
            - label: ⚙️ Workloads
              dashboard: k8s-cluster-workloads
            - label: 📦 Resources
              dashboard: k8s-cluster-resources
            - label: 🔄 Batch Jobs
              dashboard: k8s-cluster-batch
            - label: 📈 Autoscaling
              dashboard: k8s-cluster-hpa
      # ═══════════════════════════════════════════════════════════════════════
      # CLUSTER CAPACITY OVERVIEW
      # ═══════════════════════════════════════════════════════════════════════
      - title: Cluster Capacity
        size: { w: 48, h: 3 }
        markdown:
          content: '## 📊 Cluster Capacity (Requests vs Limits)'
          font_size: 14
      - title: CPU Requests vs Limits
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.container.cpu_request)
              label: CPU Requests
            - formula: sum(k8s.container.cpu_limit)
              label: CPU Limits
          filters:
            - exists: k8s.container.name
      - title: Memory Requests vs Limits
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.container.memory_request)
              label: Memory Requests
              format:
                type: bytes
            - formula: sum(k8s.container.memory_limit)
              label: Memory Limits
              format:
                type: bytes
          filters:
            - exists: k8s.container.name
      - title: Storage Requests vs Limits
        size: { w: 24, h: 12 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.container.storage_request)
              label: Storage Requests
              format:
                type: bytes
            - formula: sum(k8s.container.storage_limit)
              label: Storage Limits
              format:
                type: bytes
          filters:
            - exists: k8s.container.storage_request
      - title: Resource Quota Usage
        size: { w: 24, h: 12 }
        lens:
          type: bar
          mode: stacked
          data_view: metrics-*
          dimension:
            field: resource
            type: values
            size: 10
            label: Resource Type
          metrics:
            - field: k8s.resource_quota.used
              aggregation: max
              label: Used
            - formula: max(k8s.resource_quota.hard_limit) - max(k8s.resource_quota.used)
              label: Available
          filters:
            - exists: k8s.resource_quota.hard_limit
          color:
            palette: eui_amsterdam_color_blind
            assignments:
              - value: Used
                color: '#6092C0'
              - value: Available
                color: '#54B399'
      # ═══════════════════════════════════════════════════════════════════════
      # NAMESPACE ALLOCATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Namespace Allocation
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🏷️ Resource Allocation by Namespace'
          font_size: 14
      - title: CPU by Namespace
        size: { w: 24, h: 14 }
        lens:
          type: bar
          mode: stacked
          data_view: metrics-*
          dimension:
            field: k8s.namespace.name
            type: values
            size: 15
            sort:
              by: CPU Limits
              direction: desc
          metrics:
            - formula: sum(k8s.container.cpu_request)
              label: CPU Requests
            - formula: sum(k8s.container.cpu_limit)
              label: CPU Limits
          filters:
            - exists: k8s.container.name
      - title: Memory by Namespace
        size: { w: 24, h: 14 }
        lens:
          type: bar
          mode: stacked
          data_view: metrics-*
          dimension:
            field: k8s.namespace.name
            type: values
            size: 15
            sort:
              by: Memory Limits
              direction: desc
          metrics:
            - formula: sum(k8s.container.memory_request)
              label: Memory Requests
              format:
                type: bytes
            - formula: sum(k8s.container.memory_limit)
              label: Memory Limits
              format:
                type: bytes
          filters:
            - exists: k8s.container.name
      # ═══════════════════════════════════════════════════════════════════════
      # POD RESOURCE DETAILS
      # ═══════════════════════════════════════════════════════════════════════
      - title: Pod Details
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔍 Pod Resource Details'
          font_size: 14
      - title: Pod Resource Summary
        size: { w: 48, h: 14 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.pod.name
              type: values
              size: 25
              label: Pod
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - formula: sum(k8s.container.cpu_request)
              label: CPU Req
            - formula: sum(k8s.container.cpu_limit)
              label: CPU Lim
            - formula: sum(k8s.container.memory_request)
              label: Mem Req
              format:
                type: bytes
            - formula: sum(k8s.container.memory_limit)
              label: Mem Lim
              format:
                type: bytes
            - formula: sum(k8s.container.restarts)
              label: Restarts
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.container.name
Batch Jobs (04-batch-jobs.yaml)
---
# Kubernetes Batch Jobs Dashboard
# SRE Question: "Are my jobs completing successfully? What's failing?"
dashboards:
  - id: k8s-cluster-batch
    name: '[Metrics K8s Cluster] Batch Jobs'
    description: Job and CronJob execution status and completion tracking
    controls:
      - type: options
        label: Namespace
        data_view: metrics-*
        field: k8s.namespace.name
      - type: options
        label: Job
        data_view: metrics-*
        field: k8s.job.name
    filters:
      - field: data_stream.dataset
        equals: kubernetesclusterreceiver.otel
    panels:
      # ═══════════════════════════════════════════════════════════════════════
      # NAVIGATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Navigation
        size: { w: 48, h: 3 }
        links:
          layout: horizontal
          items:
            - label: 📊 Overview
              dashboard: k8s-cluster-overview
            - label: ⚙️ Workloads
              dashboard: k8s-cluster-workloads
            - label: 📦 Resources
              dashboard: k8s-cluster-resources
            - label: 🔄 Batch Jobs
              dashboard: k8s-cluster-batch
            - label: 📈 Autoscaling
              dashboard: k8s-cluster-hpa
      # ═══════════════════════════════════════════════════════════════════════
      # JOB STATUS SUMMARY (4 metric cards)
      # ═══════════════════════════════════════════════════════════════════════
      - title: Job Status Summary
        size: { w: 48, h: 3 }
        markdown:
          content: '## 📋 Job Status Summary'
          font_size: 14
      - title: Successful Jobs
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.job.successful_pods)
            label: Successful
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.job.name
      - title: Failed Jobs
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.job.failed_pods)
            label: Failed
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.job.name
      - title: Active Jobs
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.job.active_pods)
            label: Active
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.job.name
      - title: Active CronJobs
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.cronjob.active_jobs)
            label: Active
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.cronjob.name
      # ═══════════════════════════════════════════════════════════════════════
      # JOB EXECUTION TRENDS
      # ═══════════════════════════════════════════════════════════════════════
      - title: Job Execution Trends
        size: { w: 48, h: 3 }
        markdown:
          content: '## 📈 Job Execution Trends'
          font_size: 14
      - title: Job Success vs Failure Trend
        size: { w: 48, h: 14 }
        lens:
          type: area
          mode: stacked
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.job.successful_pods)
              label: Successful
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.job.failed_pods)
              label: Failed
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.job.active_pods)
              label: Active
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.job.name
          color:
            palette: eui_amsterdam_color_blind
            assignments:
              - value: Successful
                color: '#54B399'
              - value: Failed
                color: '#D36086'
              - value: Active
                color: '#6092C0'
      # ═══════════════════════════════════════════════════════════════════════
      # JOB DETAILS
      # ═══════════════════════════════════════════════════════════════════════
      - title: Job Details
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔍 Job Details'
          font_size: 14
      - title: Jobs by Status
        size: { w: 48, h: 14 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.job.name
              type: values
              size: 25
              label: Job
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - field: k8s.job.active_pods
              aggregation: max
              label: Active
              format:
                type: number
                decimals: 0
            - field: k8s.job.successful_pods
              aggregation: max
              label: Successful
              format:
                type: number
                decimals: 0
            - field: k8s.job.failed_pods
              aggregation: max
              label: Failed
              format:
                type: number
                decimals: 0
            - field: k8s.job.desired_successful_pods
              aggregation: max
              label: Desired
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.job.name
Autoscaling (05-autoscaling.yaml)
---
# Kubernetes Autoscaling Dashboard
# SRE Question: "Is autoscaling working? Am I hitting limits?"
dashboards:
  - id: k8s-cluster-hpa
    name: '[Metrics K8s Cluster] Autoscaling'
    description: Horizontal Pod Autoscaler scaling behavior and capacity tracking
    controls:
      - type: options
        label: Namespace
        data_view: metrics-*
        field: k8s.namespace.name
      - type: options
        label: HPA
        data_view: metrics-*
        field: k8s.hpa.name
    filters:
      - field: data_stream.dataset
        equals: kubernetesclusterreceiver.otel
    panels:
      # ═══════════════════════════════════════════════════════════════════════
      # NAVIGATION
      # ═══════════════════════════════════════════════════════════════════════
      - title: Navigation
        size: { w: 48, h: 3 }
        links:
          layout: horizontal
          items:
            - label: 📊 Overview
              dashboard: k8s-cluster-overview
            - label: ⚙️ Workloads
              dashboard: k8s-cluster-workloads
            - label: 📦 Resources
              dashboard: k8s-cluster-resources
            - label: 🔄 Batch Jobs
              dashboard: k8s-cluster-batch
            - label: 📈 Autoscaling
              dashboard: k8s-cluster-hpa
      # ═══════════════════════════════════════════════════════════════════════
      # HPA STATUS SUMMARY (4 metric cards)
      # ═══════════════════════════════════════════════════════════════════════
      - title: HPA Status Summary
        size: { w: 48, h: 3 }
        markdown:
          content: '## 📈 HPA Status Summary'
          font_size: 14
      - title: Total HPAs
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: unique_count(k8s.hpa.name)
            label: HPAs
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.hpa.name
      - title: Current Replicas
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.hpa.current_replicas)
            label: Current
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.hpa.name
      - title: Desired Replicas
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.hpa.desired_replicas)
            label: Desired
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.hpa.name
      - title: Max Replicas Limit
        hide_title: true
        size: { w: 12, h: 4 }
        lens:
          type: metric
          data_view: metrics-*
          primary:
            formula: sum(k8s.hpa.max_replicas)
            label: Max Total
            format:
              type: number
              decimals: 0
          filters:
            - exists: k8s.hpa.name
      # ═══════════════════════════════════════════════════════════════════════
      # SCALING BEHAVIOR
      # ═══════════════════════════════════════════════════════════════════════
      - title: Scaling Behavior
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔄 Scaling Behavior'
          font_size: 14
      - title: Scaling Activity (Current vs Desired)
        size: { w: 24, h: 14 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.hpa.current_replicas)
              label: Current Replicas
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.hpa.desired_replicas)
              label: Desired Replicas
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.hpa.name
      - title: Capacity Headroom (Min / Current / Max)
        size: { w: 24, h: 14 }
        lens:
          type: line
          data_view: metrics-*
          dimension:
            field: '@timestamp'
            type: date_histogram
          metrics:
            - formula: sum(k8s.hpa.min_replicas)
              label: Min Replicas
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.hpa.current_replicas)
              label: Current Replicas
              format:
                type: number
                decimals: 0
            - formula: sum(k8s.hpa.max_replicas)
              label: Max Replicas
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.hpa.name
      # ═══════════════════════════════════════════════════════════════════════
      # HPA DETAILS
      # ═══════════════════════════════════════════════════════════════════════
      - title: HPA Details
        size: { w: 48, h: 3 }
        markdown:
          content: '## 🔍 HPA Configuration & Status'
          font_size: 14
      - title: HPA Status by Name
        size: { w: 48, h: 14 }
        lens:
          type: datatable
          data_view: metrics-*
          breakdowns:
            - field: k8s.hpa.name
              type: values
              size: 25
              label: HPA
            - field: k8s.namespace.name
              type: values
              size: 1
              label: Namespace
          metrics:
            - field: k8s.hpa.current_replicas
              aggregation: max
              label: Current
              format:
                type: number
                decimals: 0
            - field: k8s.hpa.desired_replicas
              aggregation: max
              label: Desired
              format:
                type: number
                decimals: 0
            - field: k8s.hpa.min_replicas
              aggregation: max
              label: Min
              format:
                type: number
                decimals: 0
            - field: k8s.hpa.max_replicas
              aggregation: max
              label: Max
              format:
                type: number
                decimals: 0
          filters:
            - exists: k8s.hpa.name
Prerequisites
Kubernetes cluster: v1.24+
OpenTelemetry Collector: Contrib distribution with k8sclusterreceiver
Kibana: Version 8.x or later
Cluster admin permissions: For RBAC configuration
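The RBAC configuration mentioned above gives the collector's service account read-only access to the objects the receiver watches. The following is a sketch modelled on the rules commonly published for the k8sclusterreceiver; the account name and namespace are hypothetical, and the exact resource list should be checked against the receiver documentation for the collector version in use.

```yaml
# Sketch only: names are hypothetical; verify the rules for your collector version.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-cluster-collector
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-cluster-collector
rules:
  - apiGroups: [""]
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs: [get, list, watch]
  - apiGroups: [apps]
    resources: [daemonsets, deployments, replicasets, statefulsets]
    verbs: [get, list, watch]
  - apiGroups: [batch]
    resources: [jobs, cronjobs]
    verbs: [get, list, watch]
  - apiGroups: [autoscaling]
    resources: [horizontalpodautoscalers]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-cluster-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-cluster-collector
subjects:
  - kind: ServiceAccount
    name: otel-cluster-collector
    namespace: observability
```

Only `get`, `list`, and `watch` are granted because the receiver reads cluster state and never modifies it.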
Data Requirements
Data stream dataset: kubernetesclusterreceiver.otel
Data view: metrics-*
OpenTelemetry Collector Configuration
Receiver Configuration
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    collection_interval: 10s
    node_conditions_to_report: [Ready]
    distribution: kubernetes
    allocatable_types_to_report: [cpu, memory, ephemeral-storage, storage]
    metadata_collection_interval: 5m
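exporters:
  elasticsearch:
    endpoints: ["https://elasticsearch:9200"]
    auth:
      authenticator: basicauth
    mapping:
      # OTel mapping mode appends the .otel suffix to data_stream.dataset,
      # which the dashboard filters (kubernetesclusterreceiver.otel) rely on.
      mode: otel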
service:
  pipelines:
    metrics:
      receivers: [k8s_cluster]
      processors: [batch, resourcedetection, resource]
      exporters: [elasticsearch]
Receiver Configuration Options
| YAML Key | Type | Description | Default |
|---|---|---|---|
| auth_type | string | Kubernetes API authentication method (none, serviceAccount, kubeConfig) | serviceAccount |
| collection_interval | duration | Metric collection frequency | 10s |
| node_conditions_to_report | list | Node conditions to monitor | [Ready] |
| distribution | string | Cluster type (kubernetes, openshift) | kubernetes |
| allocatable_types_to_report | list | Node allocatable resource types to report (cpu, memory, ephemeral-storage, storage) | [] |
| metadata_collection_interval | duration | Entity metadata collection frequency | 5m |
Metrics Reference
The metrics in the following tables are enabled by default, except those listed under Optional Metrics.
Container Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.container.cpu_limit | Gauge | {cpu} | Maximum CPU resource limit for container |
| k8s.container.cpu_request | Gauge | {cpu} | CPU resources requested for container |
| k8s.container.memory_limit | Gauge | By | Maximum memory resource limit |
| k8s.container.memory_request | Gauge | By | Memory resources requested |
| k8s.container.storage_limit | Gauge | By | Maximum storage resource limit |
| k8s.container.storage_request | Gauge | By | Storage resources requested |
| k8s.container.ephemeralstorage_limit | Gauge | By | Maximum ephemeral storage limit |
| k8s.container.ephemeralstorage_request | Gauge | By | Ephemeral storage requested |
| k8s.container.ready | Gauge | - | Whether container passed readiness probe (0/1) |
| k8s.container.restarts | Gauge | {restart} | Container restart count |
Pod Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.pod.phase | Gauge | - | Current pod phase (numeric encoding, see below) |
Deployment Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.deployment.desired | Gauge | {pod} | Desired pod count in deployment |
| k8s.deployment.available | Gauge | {pod} | Available pods (ready for minReadySeconds) |
StatefulSet Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.statefulset.desired_pods | Gauge | {pod} | Desired pods (spec.replicas) |
| k8s.statefulset.ready_pods | Gauge | {pod} | Pods with Ready condition |
| k8s.statefulset.current_pods | Gauge | {pod} | Pods created from StatefulSet version |
| k8s.statefulset.updated_pods | Gauge | {pod} | Pods created from current version |
DaemonSet Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.daemonset.desired_scheduled_nodes | Gauge | {node} | Nodes that should run daemon pods |
| k8s.daemonset.current_scheduled_nodes | Gauge | {node} | Nodes running daemon pods as intended |
| k8s.daemonset.ready_nodes | Gauge | {node} | Nodes with ready daemon pods |
| k8s.daemonset.misscheduled_nodes | Gauge | {node} | Nodes running daemon pods incorrectly |
ReplicaSet Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.replicaset.desired | Gauge | {pod} | Desired pod count in replicaset |
| k8s.replicaset.available | Gauge | {pod} | Available pods targeted by replicaset |
Job Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.job.active_pods | Gauge | {pod} | Actively running job pods |
| k8s.job.desired_successful_pods | Gauge | {pod} | Desired successful pod count |
| k8s.job.successful_pods | Gauge | {pod} | Pods in Succeeded phase |
| k8s.job.failed_pods | Gauge | {pod} | Pods in Failed phase |
| k8s.job.max_parallel_pods | Gauge | {pod} | Maximum concurrent pods |
CronJob Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.cronjob.active_jobs | Gauge | {job} | Count of actively running jobs |
HPA Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.hpa.current_replicas | Gauge | {pod} | Current pod replicas managed by autoscaler |
| k8s.hpa.desired_replicas | Gauge | {pod} | Desired pod replicas for autoscaler |
| k8s.hpa.min_replicas | Gauge | {pod} | Minimum autoscaler replica count |
| k8s.hpa.max_replicas | Gauge | {pod} | Maximum autoscaler replica count |
Resource Quota Metrics
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| k8s.resource_quota.hard_limit | Gauge | {resource} | Upper resource limit in namespace quota | resource |
| k8s.resource_quota.used | Gauge | {resource} | Resource usage against quota | resource |
Namespace Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.namespace.phase | Gauge | - | Current phase (1=active, 0=terminating) |
Optional Metrics (disabled by default)
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| k8s.container.status.reason | Sum | {container} | Container count by status reason | k8s.container.status.reason |
| k8s.container.status.state | Sum | {container} | Container count by state | k8s.container.status.state |
| k8s.node.condition | Gauge | {condition} | Node condition status | condition |
| k8s.pod.status_reason | Gauge | - | Pod status reason (numeric encoding) | - |
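Optional metrics are switched on per metric in the receiver configuration. The fragment below is a sketch that assumes the standard per-metric `metrics.<name>.enabled` toggle used by OpenTelemetry Collector receivers; it enables all four optional metrics listed above and would be merged into an existing `k8s_cluster` receiver block.

```yaml
# Sketch only: merge into the existing k8s_cluster receiver configuration.
receivers:
  k8s_cluster:
    metrics:
      k8s.container.status.reason:
        enabled: true
      k8s.container.status.state:
        enabled: true
      k8s.node.condition:
        enabled: true
      k8s.pod.status_reason:
        enabled: true
```

Each enabled metric adds series per container, node, or pod, so it may be worth enabling only the ones a dashboard actually uses.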
Phase Value Encoding
The k8s.pod.phase metric uses numeric values:
| Value | Phase |
|---|---|
| 1 | Pending |
| 2 | Running |
| 3 | Succeeded |
| 4 | Failed |
| 5 | Unknown |
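To show how the encoding is used, here is a sketch of a metric card that counts pods in the Succeeded phase by filtering on `k8s.pod.phase` equal to `3`. It follows the same panel layout as the Running, Pending, and Failed cards in the Cluster Overview dashboard; the panel itself is hypothetical and is not part of the shipped dashboards.

```yaml
# Sketch only: a hypothetical panel, to be placed under a dashboard's panels list.
- title: Succeeded Pods
  description: Pods in Succeeded phase (phase=3).
  hide_title: true
  size: { w: 12, h: 4 }
  lens:
    type: metric
    data_view: metrics-*
    primary:
      formula: unique_count(k8s.pod.name)
      label: Succeeded
      format:
        type: number
        decimals: 0
    filters:
      - field: k8s.pod.phase
        equals: '3'    # numeric phase value, quoted as in the existing cards
```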
Metric Attributes
| Attribute | Values | Description |
|---|---|---|
| resource | cpu, memory, pods, requests.cpu, requests.memory, limits.cpu, limits.memory | Resource quota type |
| k8s.container.status.reason | ContainerCreating, CrashLoopBackOff, CreateContainerConfigError, ErrImagePull, ImagePullBackOff, OOMKilled, Completed, Error, ContainerCannotRun | Container status reason |
| k8s.container.status.state | terminated, running, waiting | Container state |
| condition | Ready, MemoryPressure, PIDPressure, DiskPressure | Node condition |
Metrics Not Used in Dashboards
The following metrics are available from the k8sclusterreceiver but are not currently visualized in the dashboards:
Default Metrics Not Used
| Metric | Type | Unit | Description |
|---|---|---|---|
| k8s.container.ephemeralstorage_limit | Gauge | By | Maximum ephemeral storage limit |
| k8s.container.ephemeralstorage_request | Gauge | By | Ephemeral storage requested |
| k8s.statefulset.updated_pods | Gauge | {pod} | Pods created from current version |
| k8s.daemonset.current_scheduled_nodes | Gauge | {node} | Nodes running daemon pods as intended |
| k8s.daemonset.misscheduled_nodes | Gauge | {node} | Nodes running daemon pods incorrectly |
| k8s.job.max_parallel_pods | Gauge | {pod} | Maximum concurrent pods |
| k8s.namespace.phase | Gauge | - | Current phase (1=active, 0=terminating) |
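Any of these default metrics can be added to a dashboard with the panel conventions already used in this document. As a sketch, the hypothetical panel below tracks `k8s.daemonset.misscheduled_nodes` over time; the title, size, and label are illustrative choices, not part of the shipped dashboards.

```yaml
# Sketch only: a hypothetical panel, to be placed under a dashboard's panels list.
- title: DaemonSet Misscheduled Nodes
  size: { w: 24, h: 12 }
  lens:
    type: line
    data_view: metrics-*
    dimension:
      field: '@timestamp'
      type: date_histogram
    metrics:
      - formula: sum(k8s.daemonset.misscheduled_nodes)
        label: Misscheduled Nodes
        format:
          type: number
          decimals: 0
    filters:
      - exists: k8s.daemonset.name
```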
Optional Metrics Not Used
| Metric | Type | Unit | Description | Attributes |
|---|---|---|---|---|
| k8s.container.status.reason | Sum | {container} | Container count by status reason | k8s.container.status.reason |
| k8s.container.status.state | Sum | {container} | Container count by state | k8s.container.status.state |
| k8s.node.condition | Gauge | {condition} | Node condition status | condition |
| k8s.pod.status_reason | Gauge | - | Pod status reason (numeric encoding) | - |