OpenCost Metrics

This document is a reference for the metrics that OpenCost consumes and generates, grouped by source and purpose.

Metrics Overview

OpenCost uses and generates several types of metrics:

Type	Description	Source
Consumed Metrics	Metrics that OpenCost reads from other sources	node-exporter, kube-state-metrics
Generated Metrics	Metrics that OpenCost produces for cost calculations	OpenCost cost model
Internal Metrics	Metrics used internally by OpenCost	OpenCost operations
External Metrics	Metrics exposed for external consumption	OpenCost API

Required Metrics from External Sources

These metrics are required for OpenCost to function properly. They are typically provided by node-exporter and kube-state-metrics.

Node Metrics (from node-exporter)

Metric	Description	Required	Purpose
`node_memory_MemTotal_bytes`	Total memory available on the node	Yes	Memory cost calculation
`node_cpu_seconds_total`	CPU time spent in different modes	Yes	CPU cost calculation
`node_filesystem_size_bytes`	Size of mounted filesystems	Yes	Storage cost calculation
`node_filesystem_free_bytes`	Free space on mounted filesystems	Yes	Storage cost calculation

Kubernetes State Metrics (from kube-state-metrics)

Metric	Description	Required	Purpose
`kube_node_status_capacity`	Node capacity information	Yes	Node resource allocation
`kube_node_status_allocatable`	Node allocatable resources	Yes	Node resource allocation
`kube_pod_container_resource_requests`	Pod container resource requests	Yes	Container cost allocation
`kube_pod_container_resource_limits`	Pod container resource limits	Yes	Container cost allocation
`kube_persistentvolumeclaim_info`	PVC information	Yes	Storage cost calculation
`kube_persistentvolumeclaim_resource_requests_storage_bytes`	PVC storage requests	Yes	Storage cost calculation

Generated Cost Metrics

These are the metrics that OpenCost generates for cost calculations and monitoring.

Node Cost Metrics

Metric	Description	Labels	Unit
`node_cpu_hourly_cost`	Hourly cost per vCPU	node, instance, provider_id	USD/vCPU-hour
`node_gpu_hourly_cost`	Hourly cost per GPU	node, instance, provider_id	USD/GPU-hour
`node_ram_hourly_cost`	Hourly cost per GiB of memory	node, instance, provider_id	USD/GiB-hour
`node_total_hourly_cost`	Total node cost per hour	node, instance, provider_id	USD/hour
`node_gpu_count`	Number of GPUs available	node, instance, provider_id	count
`kubecost_node_is_spot`	Node preemptibility status	node, instance, provider_id	boolean
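The per-resource prices above can be combined into an hourly cost for a whole node. The sketch below ASSUMES that `node_total_hourly_cost` is the sum of the CPU, RAM, and GPU components, with the vCPU count, memory, and GPU count taken from the node's capacity (for example `kube_node_status_capacity` and `node_gpu_count`). The class and function names are hypothetical and the prices are illustrative, not real cloud pricing:

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class NodePrices:
    cpu_hourly_cost: float  # node_cpu_hourly_cost, USD per vCPU-hour
    ram_hourly_cost: float  # node_ram_hourly_cost, USD per GiB-hour
    gpu_hourly_cost: float  # node_gpu_hourly_cost, USD per GPU-hour


def node_hourly_cost(prices: NodePrices, vcpus: float, ram_bytes: float, gpus: int) -> float:
    """Hourly node cost, ASSUMING it is the sum of the CPU, RAM and GPU components."""
    return (
        prices.cpu_hourly_cost * vcpus
        + prices.ram_hourly_cost * ram_bytes / GIB  # convert bytes to GiB first
        + prices.gpu_hourly_cost * gpus  # gpus corresponds to node_gpu_count
    )


# Illustrative prices only.
prices = NodePrices(cpu_hourly_cost=0.03, ram_hourly_cost=0.004, gpu_hourly_cost=0.0)
print(round(node_hourly_cost(prices, vcpus=4, ram_bytes=16 * GIB, gpus=0), 4))  # 0.184
```

Comparing this sum with the exported `node_total_hourly_cost` for one node is a quick way to check whether the assumption holds for your OpenCost version.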

Resource Allocation Metrics

Metric	Description	Labels	Unit
`container_cpu_allocation`	CPU allocation over last 1m	container, node, namespace, pod	cores
`container_gpu_allocation`	GPU allocation over last 1m	container, node, namespace, pod	count
`container_memory_allocation_bytes`	Memory allocation over last 1m	container, node, namespace, pod	bytes
`pod_pvc_allocation`	PVC allocation	persistentvolume, namespace, pod	bytes

Network Cost Metrics

Metric	Description	Labels	Unit
`kubecost_network_zone_egress_cost`	Cost per GiB zone egress	namespace, service	USD/GiB
`kubecost_network_region_egress_cost`	Cost per GiB region egress	namespace, service	USD/GiB
`kubecost_network_internet_egress_cost`	Cost per GiB internet egress	namespace, service	USD/GiB
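Because these metrics are prices per GiB rather than costs, an egress cost is the volume transferred in each class multiplied by that class's price. A minimal sketch of the arithmetic; the function name, the "zone"/"region"/"internet" keys, and the transfer volumes are hypothetical inputs, since this section does not list a transfer-volume metric:

```python
GIB = 1024 ** 3  # bytes per GiB


def egress_cost(bytes_by_class: dict, price_per_gib: dict) -> float:
    """Total egress cost: GiB sent in each class times that class's per-GiB price.

    price_per_gib holds the values of kubecost_network_zone_egress_cost,
    kubecost_network_region_egress_cost and kubecost_network_internet_egress_cost,
    keyed here by the hypothetical names "zone", "region" and "internet".
    """
    return sum(price_per_gib[cls] * sent / GIB for cls, sent in bytes_by_class.items())


# Illustrative prices (USD/GiB) and hypothetical transfer volumes.
prices = {"zone": 0.01, "region": 0.02, "internet": 0.09}
sent = {"zone": 50 * GIB, "region": 10 * GIB, "internet": 2 * GIB}
print(round(egress_cost(sent, prices), 2))  # 0.88
```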

Storage and Load Balancer Cost Metrics

Metric	Description	Labels	Unit
`pv_hourly_cost`	Hourly cost per GiB	persistentvolume	USD/GiB-hour
`kubecost_load_balancer_cost`	Hourly load balancer cost	namespace, service	USD/hour
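The two metrics scale differently: `pv_hourly_cost` is priced per GiB, so a volume's cost depends on its size, while `kubecost_load_balancer_cost` is already a per-hour total. A short sketch with hypothetical helper names and illustrative prices; 730 is the hours-per-month factor used in the Total Cluster Cost query:

```python
GIB = 1024 ** 3        # bytes per GiB
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def pv_cost(size_bytes: float, pv_hourly_cost: float, hours: float = 1.0) -> float:
    """Volume cost: size in GiB x pv_hourly_cost (USD per GiB-hour) x hours."""
    return size_bytes / GIB * pv_hourly_cost * hours


def load_balancer_cost(lb_hourly_cost: float, hours: float = 1.0) -> float:
    """kubecost_load_balancer_cost is already per hour, so only time scales it."""
    return lb_hourly_cost * hours


# Illustrative prices only.
print(round(pv_cost(100 * GIB, 0.0001), 4))                         # 0.01 per hour
print(round(pv_cost(100 * GIB, 0.0001, hours=HOURS_PER_MONTH), 2))  # 7.3 per month
print(round(load_balancer_cost(0.025, hours=HOURS_PER_MONTH), 2))   # 18.25 per month
```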

Cluster Management Metrics

Metric	Description	Labels	Unit
`kubecost_cluster_management_cost`	Hourly cluster management fee	cluster	USD/hour
`kubecost_cluster_info`	Cluster information	cluster, provider	info

Label Metrics

Metric	Description	Labels	Unit
`service_selector_labels`	Service Selector Labels	namespace, service	labels
`deployment_match_labels`	Deployment Match Labels	namespace, deployment	labels
`statefulSet_match_labels`	StatefulSet Match Labels	namespace, statefulset	labels

Internal Operation Metrics

Metric	Description	Labels	Unit
`kubecost_http_requests_total`	Total HTTP requests	endpoint, method, status	count
`kubecost_http_response_time_seconds`	Response time	endpoint, method	seconds
`kubecost_http_response_size_bytes`	Response size	endpoint, method	bytes
`kubecost_cluster_memory_working_set_bytes`	Cluster memory working set, produced by a recording rule	cluster	bytes

Metric Labels

Label	Description	Example
`node`	Kubernetes node name	`worker-1`
`namespace`	Kubernetes namespace	`default`
`pod`	Kubernetes pod name	`nginx-7d4cf4f754-2j8vw`
`container`	Container name	`nginx`
`cluster`	Cluster identifier	`prod-cluster-1`
`instance`	Instance identifier	`10.0.0.1:9100`
`provider_id`	Cloud provider ID	`aws:///us-west-2a/i-1234567890abcdef0`

Best Practices

Metric Filtering

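When using your own Prometheus instance, you can drop the metrics OpenCost does not need with a `keep` rule on the metric name. Besides the `node_`, `kube_`, `container_`, and `kubecost_` prefixes, the regex must also match the generated metrics that use other names:

```yaml
metricRelabelConfigs:
# Keep only series whose metric name matches the regex; all others are dropped.
- sourceLabels: [__name__]
  regex: '^(node_.*|kube_.*|container_.*|kubecost_.*|pv_hourly_cost|pod_pvc_allocation|service_selector_labels|deployment_match_labels|statefulSet_match_labels)'
  action: keep
```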
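To check which names a keep rule retains before deploying it, candidate metric names can be run through the same regex. Prometheus anchors relabeling regexes at both ends, so this sketch uses Python's `re.fullmatch`; Python's `re` and Prometheus's RE2 agree on simple alternations like these but not on every construct. The helper name `kept_metrics` is hypothetical:

```python
import re


def kept_metrics(regex: str, names: list) -> list:
    """Metric names that a `keep` rule on __name__ with this regex would retain.

    re.fullmatch mirrors Prometheus's implicit ^...$ anchoring of relabel regexes.
    """
    pattern = re.compile(regex)
    return [name for name in names if pattern.fullmatch(name)]


# Candidate names taken from the tables in this document.
candidates = [
    "node_cpu_hourly_cost",
    "pv_hourly_cost",
    "pod_pvc_allocation",
    "deployment_match_labels",
]
print(kept_metrics(r"node_.*|kube_.*|container_.*|kubecost_.*", candidates))
# ['node_cpu_hourly_cost']
```

With a prefix-only pattern, only `node_cpu_hourly_cost` survives, which is why generated metrics such as `pv_hourly_cost` and `pod_pvc_allocation` have to be matched by name.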

Scrape Configuration

Setting	Recommended Value	Notes
Scrape Interval	1m	Minimum recommended interval
Scrape Timeout	10s	Maximum time to wait for scrape
Honor Labels	true	Preserve original metric labels

Label Usage

Label	Status	Recommendation
`node`	Current	Use this for node-based queries
`instance`	Deprecated	Avoid using in new queries

Example Queries

Cost Queries

Query Type	PromQL Query	Description
Total Cluster Cost	`sum(node_total_hourly_cost) * 730`	Projected monthly cost of all nodes at the current hourly rate (730 is the average number of hours in a month)
Cost by Namespace	`sum by (namespace) (container_cpu_allocation * on (node) group_left node_cpu_hourly_cost + container_memory_allocation_bytes * on (node) group_left node_ram_hourly_cost / (1024 * 1024 * 1024))`	Hourly CPU and memory cost per namespace
Storage Cost	`sum by (namespace) (pod_pvc_allocation * on (persistentvolume) group_left pv_hourly_cost / (1024 * 1024 * 1024))`	Hourly storage cost per namespace
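The Cost by Namespace query joins each container's allocation to its node's prices on the `node` label and sums the result per namespace. The same arithmetic as a short Python sketch may help when checking query results by hand; the function and field names are hypothetical and the numbers are illustrative:

```python
from collections import defaultdict

GIB = 1024 ** 3  # bytes per GiB


def namespace_hourly_cost(containers: list, node_prices: dict) -> dict:
    """Python mirror of the 'Cost by Namespace' query.

    containers: dicts with namespace, node, cpu_cores (container_cpu_allocation)
        and memory_bytes (container_memory_allocation_bytes).
    node_prices: node -> (node_cpu_hourly_cost, node_ram_hourly_cost).
    """
    totals = defaultdict(float)
    for c in containers:
        if c["node"] not in node_prices:
            continue  # like `* on (node) group_left`, unmatched series are dropped
        cpu_price, ram_price = node_prices[c["node"]]
        totals[c["namespace"]] += (
            c["cpu_cores"] * cpu_price + c["memory_bytes"] / GIB * ram_price
        )
    return dict(totals)


containers = [
    {"namespace": "default", "node": "worker-1", "cpu_cores": 0.5, "memory_bytes": 2 * GIB},
    {"namespace": "default", "node": "worker-2", "cpu_cores": 1.0, "memory_bytes": 1 * GIB},
    {"namespace": "kube-system", "node": "worker-1", "cpu_cores": 0.25, "memory_bytes": GIB // 2},
]
node_prices = {"worker-1": (0.04, 0.005), "worker-2": (0.02, 0.003)}

hourly = namespace_hourly_cost(containers, node_prices)
monthly = {ns: cost * 730 for ns, cost in hourly.items()}  # 730 hours per month
print({ns: round(v, 4) for ns, v in hourly.items()})  # {'default': 0.053, 'kube-system': 0.0125}
```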

Advanced Cost Queries

Total cost of cluster workloads over the last 30 days

```promql
sort_desc(
  sum by (type, namespace) (
    sum_over_time(
      (
        # RAM: allocated GiB x hourly cost per GiB
        label_replace(
          (
            avg by (container, node, namespace, pod) (container_memory_allocation_bytes)
            * on (node) group_left ()
            avg by (node) (node_ram_hourly_cost)
          ) / (1024 * 1024 * 1024),
          "type", "ram", "", ""
        )
        or
        # CPU: allocated cores x hourly cost per vCPU
        label_replace(
          avg by (container, node, namespace, pod) (container_cpu_allocation)
          * on (node) group_left ()
          avg by (node) (node_cpu_hourly_cost),
          "type", "cpu", "", ""
        )
        or
        # GPU: allocated GPUs x hourly cost per GPU
        label_replace(
          avg by (container, node, namespace, pod) (container_gpu_allocation)
          * on (node) group_left ()
          avg by (node) (node_gpu_hourly_cost),
          "type", "gpu", "", ""
        )
        or
        # Storage: PVC GiB x hourly cost per GiB
        label_replace(
          (
            avg by (persistentvolume, namespace, pod) (pod_pvc_allocation)
            * on (persistentvolume) group_left ()
            avg by (persistentvolume) (pv_hourly_cost)
          ) / (1024 * 1024 * 1024),
          "type", "storage", "", ""
        )
      )[30d:5m]
    )
    # Average hourly cost over the window, scaled to 30 days (720 hours)
    / scalar(count_over_time(vector(1)[30d:5m]))
    * 24 * 30
  )
)
```

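Query parameters:

`30d`: The window over which costs are aggregated (30 days here). It appears in both subqueries and in the `* 24 * 30` factor, which converts the average hourly cost into a total for the window; change all three together (for example, `7d` with `* 24 * 7`).
`5m`: The subquery resolution, which sets the accuracy of the result. Use the same value in both subqueries.
- Decrease it (e.g., to 1m) for higher accuracy. Setting it below the Prometheus scrape interval (1m by default) is generally not recommended.
- Increase it (e.g., to 1h) for a faster query at the cost of accuracy.
`sum by (type, namespace)`: Controls the grouping. The available labels are container, namespace, node, pod, and type; storage series carry no container or node label.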
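The arithmetic behind the 30-day query can be reproduced for a single series. Because the denominator counts the steps in the whole window rather than the samples in each series, a workload that existed for only part of the window is charged only for that part; the expression reduces to the sum of the sampled hourly costs times the step length in hours. A minimal Python sketch with a hypothetical `window_cost` helper and illustrative prices:

```python
def window_cost(samples: list, window_days: int = 30, step_minutes: int = 5) -> float:
    """Mirror of the 30-day query's arithmetic for one series.

    samples: the hourly cost at each subquery step where the series existed.
    total_steps approximates count_over_time(vector(1)[30d:5m]); dividing by it
    treats steps where the series was absent as zero cost.
    """
    total_steps = window_days * 24 * 60 // step_minutes
    avg_hourly = sum(samples) / total_steps
    return avg_hourly * 24 * window_days  # the '* 24 * 30' factor


full_month = [0.10] * (30 * 24 * 12)  # 0.10 USD/hour at every 5m step for 30 days
half_month = [0.10] * (15 * 24 * 12)  # the same workload, running for only 15 days
print(round(window_cost(full_month), 2))  # 72.0
print(round(window_cost(half_month), 2))  # 36.0
```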

Hourly memory cost for the default namespace

```promql
sum(
  sum by (node) (container_memory_allocation_bytes{namespace="default"}) / 1024 / 1024 / 1024
  * on (node) group_left ()
  avg by (node) (node_ram_hourly_cost)
)
```

Cost of provisioned nodes over the last 30 days

```promql
sum(sum_over_time(node_total_hourly_cost[30d:1h]))
```
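At a 1h step, each sample of `node_total_hourly_cost` stands for one hour at that rate, so summing about 720 samples gives the cost actually accrued over 30 days, and nodes that existed for only part of the window contribute only those hours. This differs slightly from the `* 730` projection in the Total Cluster Cost query, which scales the current rate to an average month (8,760 hours / 12). A quick arithmetic sketch with a hypothetical helper and a constant illustrative rate:

```python
def summed_hourly_samples(hourly_costs: list) -> float:
    """sum_over_time(x[30d:1h]): each 1h sample stands for one hour at that rate."""
    return sum(hourly_costs)


rate = 0.5  # illustrative constant node_total_hourly_cost, USD/hour
last_30_days = summed_hourly_samples([rate] * 30 * 24)  # one sample per hour
short_lived = summed_hourly_samples([rate] * 10 * 24)   # node existed for 10 days
projected_month = rate * 730                            # the '* 730' projection
print(last_30_days, short_lived, projected_month)  # 360.0 120.0 365.0
```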

Troubleshooting

Issue	Check	Solution
Missing metrics	Verify node-exporter and kube-state-metrics are running	Ensure required metrics are being collected
Incorrect costs	Check OpenCost configuration	Verify pricing settings
High query latency	Review scrape interval and timeouts	Adjust Prometheus configuration
Label conflicts	Check for duplicate labels	Use metric relabeling rules
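For the missing-metrics case, one way to confirm that the required node-exporter and kube-state-metrics series are being collected is to run `count(<metric>)` for each through the Prometheus HTTP API (`/api/v1/query`); an empty result means no matching series exists. The sketch below is a hypothetical helper, not part of OpenCost:

```python
import json
import urllib.parse
import urllib.request

# Required metrics, copied from the tables at the top of this document.
REQUIRED_METRICS = (
    "node_memory_MemTotal_bytes",
    "node_cpu_seconds_total",
    "node_filesystem_size_bytes",
    "node_filesystem_free_bytes",
    "kube_node_status_capacity",
    "kube_node_status_allocatable",
    "kube_pod_container_resource_requests",
    "kube_pod_container_resource_limits",
    "kube_persistentvolumeclaim_info",
    "kube_persistentvolumeclaim_resource_requests_storage_bytes",
)


def prometheus_query(base_url: str, promql: str) -> list:
    """Run an instant query via the Prometheus HTTP API and return data.result."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]


def missing_metrics(query, names=REQUIRED_METRICS) -> list:
    """Return the names for which `count(<name>)` yields no series.

    `query` is any callable mapping a PromQL string to a result list, which
    keeps the check testable without a live Prometheus.
    """
    return [name for name in names if not query(f"count({name})")]
```

For example, `missing_metrics(lambda q: prometheus_query("http://localhost:9090", q))` returns the names with no series; the address is a placeholder for your own Prometheus. An instant query only sees series with a recent sample, so a metric that stopped being scraped a while ago is also reported as missing.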

For more detailed troubleshooting, refer to the Troubleshooting Guide.
