OpenCost Metrics
This document is a reference for the metrics that OpenCost consumes from other sources and the metrics it generates. Metrics are grouped by source and purpose, followed by best practices, example queries, and troubleshooting notes.
Metrics Overview
OpenCost uses and generates several types of metrics:
Type | Description | Source |
---|---|---|
Consumed Metrics | Metrics that OpenCost reads from other sources | node-exporter, kube-state-metrics |
Generated Metrics | Metrics that OpenCost produces for cost calculations | OpenCost cost model |
Internal Metrics | Metrics used internally by OpenCost | OpenCost operations |
External Metrics | Metrics exposed for external consumption | OpenCost API |
Required Metrics from External Sources
These metrics are required for OpenCost to function properly. They are typically provided by node-exporter and kube-state-metrics.
Node Metrics (from node-exporter)
Metric | Description | Required | Purpose |
---|---|---|---|
node_memory_MemTotal_bytes | Total memory available on the node | Yes | Memory cost calculation |
node_cpu_seconds_total | CPU time spent in different modes | Yes | CPU cost calculation |
node_filesystem_size_bytes | Size of mounted filesystems | Yes | Storage cost calculation |
node_filesystem_free_bytes | Free space on mounted filesystems | Yes | Storage cost calculation |
Kubernetes State Metrics (from kube-state-metrics)
Metric | Description | Required | Purpose |
---|---|---|---|
kube_node_status_capacity | Node capacity information | Yes | Node resource allocation |
kube_node_status_allocatable | Node allocatable resources | Yes | Node resource allocation |
kube_pod_container_resource_requests | Pod container resource requests | Yes | Container cost allocation |
kube_pod_container_resource_limits | Pod container resource limits | Yes | Container cost allocation |
kube_persistentvolumeclaim_info | PVC information | Yes | Storage cost calculation |
kube_persistentvolumeclaim_resource_requests_storage_bytes | PVC storage requests | Yes | Storage cost calculation |
Generated Cost Metrics
These are the metrics that OpenCost generates for cost calculations and monitoring.
Node Cost Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
node_cpu_hourly_cost | Hourly cost per vCPU | node, instance, provider_id | USD/hour |
node_gpu_hourly_cost | Hourly cost per GPU | node, instance, provider_id | USD/hour |
node_ram_hourly_cost | Hourly cost per GiB of memory | node, instance, provider_id | USD/hour |
node_total_hourly_cost | Total node cost per hour | node, instance, provider_id | USD/hour |
node_gpu_count | Number of GPUs available | node, instance, provider_id | count |
kubecost_node_is_spot | Node preemptibility status | node, instance, provider_id | boolean |
Resource Allocation Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
container_cpu_allocation | CPU allocation over last 1m | container, node, namespace, pod | cores |
container_gpu_allocation | GPU allocation over last 1m | container, node, namespace, pod | count |
container_memory_allocation_bytes | Memory allocation over last 1m | container, node, namespace, pod | bytes |
pod_pvc_allocation | PVC allocation | persistentvolume, namespace, pod | bytes |
Network Cost Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
kubecost_network_zone_egress_cost | Cost per GiB zone egress | namespace, service | USD/GiB |
kubecost_network_region_egress_cost | Cost per GiB region egress | namespace, service | USD/GiB |
kubecost_network_internet_egress_cost | Cost per GiB internet egress | namespace, service | USD/GiB |
Storage Cost Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
pv_hourly_cost | Hourly cost per GiB | persistentvolume | USD/hour |
kubecost_load_balancer_cost | Hourly load balancer cost | namespace, service | USD/hour |
Cluster Management Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
kubecost_cluster_management_cost | Hourly cluster management fee | cluster | USD/hour |
kubecost_cluster_info | Cluster information | cluster, provider | info |
Label Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
service_selector_labels | Service Selector Labels | namespace, service | labels |
deployment_match_labels | Deployment Match Labels | namespace, deployment | labels |
statefulSet_match_labels | StatefulSet Match Labels | namespace, statefulset | labels |
Internal Operation Metrics
Metric | Description | Labels | Unit |
---|---|---|---|
kubecost_http_requests_total | Total HTTP requests | endpoint, method, status | count |
kubecost_http_response_time_seconds | Response time | endpoint, method | seconds |
kubecost_http_response_size_bytes | Response size | endpoint, method | bytes |
kubecost_cluster_memory_working_set_bytes | Created by recording rule | cluster | bytes |
Metric Labels
Label | Description | Example |
---|---|---|
node | Kubernetes node name | worker-1 |
namespace | Kubernetes namespace | default |
pod | Kubernetes pod name | nginx-7d4cf4f754-2j8vw |
container | Container name | nginx |
cluster | Cluster identifier | prod-cluster-1 |
instance | Instance identifier | 10.0.0.1:9100 |
provider_id | Cloud provider ID | aws:///us-west-2a/i-1234567890abcdef0 |
Best Practices
Metric Filtering
When using your own Prometheus instance, you can filter metrics using the following patterns:
metricRelabelConfigs:
  - sourceLabels: [__name__]
    regex: '^(node_.*|kube_.*|container_.*|kubecost_.*)'
    action: keep
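Prometheus anchors relabeling regexes at both ends, so a keep rule drops every series whose name does not fully match. As a quick sanity check, the sketch below tests the pattern above against a few metric names from this document. It uses Python's `re.fullmatch` as a stand-in for Prometheus's anchored RE2 matching, which is an approximation that should hold for a pattern this simple. The helper name `kept_by_filter` and the sample list are illustrative.

```python
import re

# Pattern from the keep rule above. Prometheus anchors relabel regexes,
# which re.fullmatch approximates for this simple pattern.
KEEP_PATTERN = re.compile(r"^(node_.*|kube_.*|container_.*|kubecost_.*)")


def kept_by_filter(metric_name: str) -> bool:
    """Return True if the keep rule would retain this metric name."""
    return KEEP_PATTERN.fullmatch(metric_name) is not None


# A sample of metric names documented above.
documented = [
    "node_total_hourly_cost",
    "kube_pod_container_resource_requests",
    "container_cpu_allocation",
    "kubecost_cluster_info",
    "pv_hourly_cost",
    "pod_pvc_allocation",
    "deployment_match_labels",
]

dropped = [name for name in documented if not kept_by_filter(name)]
print("dropped by the filter:", dropped)
```

Under these assumptions, pv_hourly_cost, pod_pvc_allocation, and the match-label metrics do not match the pattern. If the rule is applied to the job that scrapes OpenCost and you rely on those series, extend the pattern accordingly.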
Scrape Configuration
Setting | Recommended Value | Notes |
---|---|---|
Scrape Interval | 1m | Minimum recommended interval |
Scrape Timeout | 10s | Maximum time to wait for scrape |
Honor Labels | true | Preserve original metric labels |
Label Usage
Label | Status | Recommendation |
---|---|---|
node | Current | Use this for node-based queries |
instance | Deprecated | Avoid using in new queries |
Example Queries
Cost Queries
Query Type | PromQL Query | Description |
---|---|---|
Total Cluster Cost | sum(node_total_hourly_cost) * 730 | Monthly cost of all nodes |
Cost by Namespace | sum by (namespace) (container_cpu_allocation * on (node) group_left node_cpu_hourly_cost + container_memory_allocation_bytes * on (node) group_left node_ram_hourly_cost / (1024 * 1024 * 1024)) | CPU and memory cost by namespace |
Storage Cost | sum by (namespace) (pod_pvc_allocation * on (persistentvolume) group_left pv_hourly_cost / (1024 * 1024 * 1024)) | Storage cost by namespace |
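The 730 multiplier in the first query is the average number of hours in a month (8760 hours per year divided by 12). A minimal sketch of the same arithmetic, using made-up hourly node costs rather than real pricing:

```python
HOURS_PER_YEAR = 365 * 24              # 8760
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.0


def monthly_cluster_cost(node_hourly_costs):
    """Mirror sum(node_total_hourly_cost) * 730 for a dict of node -> USD/hour."""
    return sum(node_hourly_costs.values()) * HOURS_PER_MONTH


# Hypothetical sample values, not real pricing.
sample = {"worker-1": 0.10, "worker-2": 0.10, "worker-3": 0.20}
print(f"{monthly_cluster_cost(sample):.2f} USD/month")
```

The result is a projection from the current hourly rate, so it assumes the node count and prices stay constant for the month.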
Advanced Cost Queries
Total cost of the cluster workload over the last 30 days
sort_desc(
  sum by (type, namespace) (
    sum_over_time(
      (
        label_replace(
          (
            (
              avg by (container, node, namespace, pod) (container_memory_allocation_bytes)
              * on (node) group_left ()
              avg by (node) (node_ram_hourly_cost)
            )
            /
            (1024 * 1024 * 1024)
          ),
          "type",
          "ram",
          "",
          ""
        )
        or
        label_replace(
          (
            avg by (container, node, namespace, pod) (container_cpu_allocation)
            * on (node) group_left ()
            avg by (node) (node_cpu_hourly_cost)
          ),
          "type",
          "cpu",
          "",
          ""
        )
        or
        label_replace(
          (
            avg by (container, node, namespace, pod) (container_gpu_allocation)
            * on (node) group_left ()
            avg by (node) (node_gpu_hourly_cost)
          ),
          "type",
          "gpu",
          "",
          ""
        )
        or
        label_replace(
          (
            (
              avg by (persistentvolume, namespace, pod) (pod_pvc_allocation)
              * on (persistentvolume) group_left ()
              avg by (persistentvolume) (pv_hourly_cost)
            )
            /
            (1024 * 1024 * 1024)
          ),
          "type",
          "storage",
          "",
          ""
        )
      )[30d:5m]
    )
    /
    scalar(count_over_time(vector(1)[30d:5m]))
    * 24 * 30
  )
)
Query Parameters:
- 30d: The duration over which the data is aggregated, which is 30 days in this case.
- 5m: The resolution of the data. Modify this to adjust precision:
  - Decrease (e.g., to 1m): Enhances accuracy. Setting it below the Prometheus scrape interval (1m by default) is typically not recommended.
  - Increase: Enhances the performance of the query.
- sum by (type, namespace): Controls the grouping. Available options are container, namespace, node, pod, and type.
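To make the roles of the range and the resolution concrete, the sketch below reproduces the query's normalization in plain Python: sum one sample per step, divide by the number of steps to get an average hourly rate, then scale by 24 * 30 hours. The function name and sample values are illustrative, and the step counts are approximate because Prometheus aligns subquery steps to the evaluation time.

```python
def projected_30d_cost(hourly_rate_samples):
    """Average the sampled hourly rates and scale to 30 days of hours.

    Mirrors sum_over_time(x[30d:5m]) / count_over_time(vector(1)[30d:5m]) * 24 * 30,
    assuming the series has a sample at every step.
    """
    steps = len(hourly_rate_samples)
    average_hourly_rate = sum(hourly_rate_samples) / steps
    return average_hourly_rate * 24 * 30


# 30 days at a 5m resolution is about 8640 steps; at 1m it is about 43200.
steps_5m = 30 * 24 * 60 // 5
steps_1m = 30 * 24 * 60

# Hypothetical workload costing 0.05 USD/hour for the whole window.
samples = [0.05] * steps_5m
print(steps_5m, steps_1m, round(projected_30d_cost(samples), 2))
```

In the real query the divisor counts every step in the window, not only the steps where the workload had samples, so a workload that existed for only part of the 30 days contributes proportionally less.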
Hourly memory cost for the default namespace
sum(
  avg(container_memory_allocation_bytes{namespace="default"}) by (node) / 1024 / 1024 / 1024
  *
  on(node) group_left() avg(node_ram_hourly_cost) by (node)
)
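Dividing by 1024 three times converts bytes to GiB, so the allocation lines up with node_ram_hourly_cost, which is priced per GiB. A small worked example with hypothetical numbers:

```python
BYTES_PER_GIB = 1024 ** 3


def hourly_memory_cost(allocated_bytes: float, ram_cost_per_gib_hour: float) -> float:
    """Convert an allocation in bytes to GiB and price it per hour."""
    return allocated_bytes / BYTES_PER_GIB * ram_cost_per_gib_hour


# Hypothetical: 4 GiB allocated on a node priced at 0.005 USD per GiB-hour.
cost = hourly_memory_cost(4 * BYTES_PER_GIB, 0.005)
print(f"{cost:.3f} USD/hour")
```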
Monthly cost of provisioned nodes
sum(sum_over_time(node_total_hourly_cost[30d:1h]))
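The [30d:1h] subquery samples each node's hourly cost once per hour, so the sum only counts hours in which a node actually reported a cost. The sketch below illustrates that behavior with two hypothetical nodes, one of which existed for only part of the window. The function name and values are assumptions for illustration.

```python
def provisioned_node_cost(hourly_samples_by_node):
    """Sum every hourly sample of every node, like sum(sum_over_time(...[30d:1h]))."""
    return sum(sum(samples) for samples in hourly_samples_by_node.values())


HOURS_IN_WINDOW = 30 * 24  # 720 one-hour steps in 30 days

# Hypothetical nodes: one up for the whole window, one for only 100 hours.
samples = {
    "worker-1": [0.25] * HOURS_IN_WINDOW,
    "worker-2": [0.5] * 100,
}
print(provisioned_node_cost(samples))
```

Unlike the 730-hour projection, this figure reflects the nodes that were provisioned during the window rather than the current hourly rate.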
Troubleshooting
Issue | Check | Solution |
---|---|---|
Missing metrics | Verify node-exporter and kube-state-metrics are running | Ensure required metrics are being collected |
Incorrect costs | Check OpenCost configuration | Verify pricing settings |
High query latency | Review scrape interval and timeouts | Adjust Prometheus configuration |
Label conflicts | Check for duplicate labels | Use metric relabeling rules |
For more detailed troubleshooting, refer to the Troubleshooting Guide.
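One way to act on the "Missing metrics" row is to ask Prometheus which of the required series it currently holds. The sketch below covers only the comparison step: it builds an instant query that returns one series per metric name present, then checks the JSON body returned by the Prometheus /api/v1/query endpoint against the required names listed in this document. The helper names and the sample response are illustrative, and fetching the response (URL, authentication) is left to your environment.

```python
import json

# Required metric names taken from the tables in this document.
REQUIRED_METRICS = (
    "node_memory_MemTotal_bytes",
    "node_cpu_seconds_total",
    "node_filesystem_size_bytes",
    "node_filesystem_free_bytes",
    "kube_node_status_capacity",
    "kube_node_status_allocatable",
    "kube_pod_container_resource_requests",
    "kube_pod_container_resource_limits",
    "kube_persistentvolumeclaim_info",
    "kube_persistentvolumeclaim_resource_requests_storage_bytes",
)


def build_presence_query(metric_names):
    """Build a PromQL query that returns one series per metric name present."""
    pattern = "|".join(metric_names)
    return f'count by (__name__) ({{__name__=~"{pattern}"}})'


def missing_metrics(response_body, required=REQUIRED_METRICS):
    """Return the required metric names absent from an instant-query response."""
    payload = json.loads(response_body)
    present = {s["metric"].get("__name__") for s in payload["data"]["result"]}
    return [name for name in required if name not in present]


# Hypothetical response in which only two required metrics are present.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_cpu_seconds_total"}, "value": [0, "8"]},
            {"metric": {"__name__": "kube_node_status_capacity"}, "value": [0, "12"]},
        ],
    },
})

print(build_presence_query(REQUIRED_METRICS[:2]))
print(missing_metrics(sample_response))
```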