
OpenCost Metrics

This document is a reference for the metrics that OpenCost consumes and generates, grouped by source and purpose.

Metrics Overview

OpenCost uses and generates several types of metrics:

| Type | Description | Source |
| --- | --- | --- |
| Consumed Metrics | Metrics that OpenCost reads from other sources | node-exporter, kube-state-metrics |
| Generated Metrics | Metrics that OpenCost produces for cost calculations | OpenCost cost model |
| Internal Metrics | Metrics used internally by OpenCost | OpenCost operations |
| External Metrics | Metrics exposed for external consumption | OpenCost API |

Required Metrics from External Sources

These metrics are required for OpenCost to function properly. They are typically provided by node-exporter and kube-state-metrics.

Node Metrics (from node-exporter)

| Metric | Description | Required | Purpose |
| --- | --- | --- | --- |
| node_memory_MemTotal_bytes | Total memory available on the node | Yes | Memory cost calculation |
| node_cpu_seconds_total | CPU time spent in different modes | Yes | CPU cost calculation |
| node_filesystem_size_bytes | Size of mounted filesystems | Yes | Storage cost calculation |
| node_filesystem_free_bytes | Free space on mounted filesystems | Yes | Storage cost calculation |

Kubernetes State Metrics (from kube-state-metrics)

| Metric | Description | Required | Purpose |
| --- | --- | --- | --- |
| kube_node_status_capacity | Node capacity information | Yes | Node resource allocation |
| kube_node_status_allocatable | Node allocatable resources | Yes | Node resource allocation |
| kube_pod_container_resource_requests | Pod container resource requests | Yes | Container cost allocation |
| kube_pod_container_resource_limits | Pod container resource limits | Yes | Container cost allocation |
| kube_persistentvolumeclaim_info | PVC information | Yes | Storage cost calculation |
| kube_persistentvolumeclaim_resource_requests_storage_bytes | PVC storage requests | Yes | Storage cost calculation |
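
A quick way to confirm that the required series are present is to compare the metric names your Prometheus knows about against the two tables above. The sketch below does that comparison in Python. The `missing_required_metrics` helper and the sample input are hypothetical; in practice the names would come from your own Prometheus instance (for example, from its label-values API for `__name__`).

```python
# Required metric names, copied from the node-exporter and
# kube-state-metrics tables above.
REQUIRED_METRICS = {
    "node_memory_MemTotal_bytes",
    "node_cpu_seconds_total",
    "node_filesystem_size_bytes",
    "node_filesystem_free_bytes",
    "kube_node_status_capacity",
    "kube_node_status_allocatable",
    "kube_pod_container_resource_requests",
    "kube_pod_container_resource_limits",
    "kube_persistentvolumeclaim_info",
    "kube_persistentvolumeclaim_resource_requests_storage_bytes",
}


def missing_required_metrics(available_names):
    """Return the required metric names absent from available_names, sorted."""
    return sorted(REQUIRED_METRICS - set(available_names))


# Hypothetical sample: a Prometheus that scrapes node-exporter only.
available = [
    "node_memory_MemTotal_bytes",
    "node_cpu_seconds_total",
    "node_filesystem_size_bytes",
    "node_filesystem_free_bytes",
]
for name in missing_required_metrics(available):
    print("missing:", name)
```

An empty result means every required metric name was found; it does not by itself show that the series carry recent samples.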

Generated Cost Metrics

These are the metrics that OpenCost generates for cost calculations and monitoring.

Node Cost Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| node_cpu_hourly_cost | Hourly cost per vCPU | node, instance, provider_id | USD/hour |
| node_gpu_hourly_cost | Hourly cost per GPU | node, instance, provider_id | USD/hour |
| node_ram_hourly_cost | Hourly cost per GiB of memory | node, instance, provider_id | USD/hour |
| node_total_hourly_cost | Total node cost per hour | node, instance, provider_id | USD/hour |
| node_gpu_count | Number of GPUs available | node, instance, provider_id | count |
| kubecost_node_is_spot | Node preemptibility status | node, instance, provider_id | boolean |

Resource Allocation Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| container_cpu_allocation | CPU allocation over last 1m | container, node, namespace, pod | cores |
| container_gpu_allocation | GPU allocation over last 1m | container, node, namespace, pod | count |
| container_memory_allocation_bytes | Memory allocation over last 1m | container, node, namespace, pod | bytes |
| pod_pvc_allocation | PVC allocation | persistentvolume, namespace, pod | bytes |

Network Cost Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| kubecost_network_zone_egress_cost | Cost per GiB zone egress | namespace, service | USD/GiB |
| kubecost_network_region_egress_cost | Cost per GiB region egress | namespace, service | USD/GiB |
| kubecost_network_internet_egress_cost | Cost per GiB internet egress | namespace, service | USD/GiB |

Storage Cost Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| pv_hourly_cost | Hourly cost per GiB | persistentvolume | USD/hour |
| kubecost_load_balancer_cost | Hourly load balancer cost | namespace, service | USD/hour |

Cluster Management Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| kubecost_cluster_management_cost | Hourly cluster management fee | cluster | USD/hour |
| kubecost_cluster_info | Cluster information | cluster, provider | info |

Label Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| service_selector_labels | Service Selector Labels | namespace, service | labels |
| deployment_match_labels | Deployment Match Labels | namespace, deployment | labels |
| statefulSet_match_labels | StatefulSet Match Labels | namespace, statefulset | labels |

Internal Operation Metrics

| Metric | Description | Labels | Unit |
| --- | --- | --- | --- |
| kubecost_http_requests_total | Total HTTP requests | endpoint, method, status | count |
| kubecost_http_response_time_seconds | Response time | endpoint, method | seconds |
| kubecost_http_response_size_bytes | Response size | endpoint, method | bytes |
| kubecost_cluster_memory_working_set_bytes | Created by recording rule | cluster | bytes |

Metric Labels

| Label | Description | Example |
| --- | --- | --- |
| node | Kubernetes node name | worker-1 |
| namespace | Kubernetes namespace | default |
| pod | Kubernetes pod name | nginx-7d4cf4f754-2j8vw |
| container | Container name | nginx |
| cluster | Cluster identifier | prod-cluster-1 |
| instance | Instance identifier | 10.0.0.1:9100 |
| provider_id | Cloud provider ID | aws:///us-west-2a/i-1234567890abcdef0 |

Best Practices

Metric Filtering

When using your own Prometheus instance, you can filter metrics using the following patterns:

metricRelabelConfigs:
  - sourceLabels: [__name__]
    regex: '^(node_.*|kube_.*|container_.*|kubecost_.*)'
    action: keep
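
To see which metric names a `keep` rule like this retains, the pattern can be tried against names from this page. The following Python sketch assumes that Prometheus anchors relabel regexes at both ends, which `re.fullmatch` approximates. The `is_kept` helper is hypothetical, and Prometheus uses RE2 syntax rather than Python's `re` module, although this particular pattern should behave the same in both.

```python
import re

# Same pattern as in the relabel rule above.
KEEP_PATTERN = re.compile(r"^(node_.*|kube_.*|container_.*|kubecost_.*)")


def is_kept(metric_name):
    """True if the keep rule would retain a series with this metric name."""
    # Prometheus anchors relabel regexes at both ends; fullmatch mimics that.
    return KEEP_PATTERN.fullmatch(metric_name) is not None


for name in [
    "node_total_hourly_cost",
    "container_cpu_allocation",
    "kubecost_cluster_info",
    "pv_hourly_cost",
    "pod_pvc_allocation",
]:
    print(f"{name}: {'keep' if is_kept(name) else 'drop'}")
```

Names such as `pv_hourly_cost` and `pod_pvc_allocation` do not match any of the four prefixes, so the pattern needs additional alternatives if the rule is applied to a scrape job that should retain them.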

Scrape Configuration

| Setting | Recommended Value | Notes |
| --- | --- | --- |
| Scrape Interval | 1m | Minimum recommended interval |
| Scrape Timeout | 10s | Maximum time to wait for scrape |
| Honor Labels | true | Preserve original metric labels |

Label Usage

| Label | Status | Recommendation |
| --- | --- | --- |
| node | Current | Use this for node-based queries |
| instance | Deprecated | Avoid using in new queries |

Example Queries

Cost Queries

| Query Type | PromQL Query | Description |
| --- | --- | --- |
| Total Cluster Cost | `sum(node_total_hourly_cost) * 730` | Monthly cost of all nodes |
| Cost by Namespace | `sum by (namespace) (container_cpu_allocation * on (node) group_left node_cpu_hourly_cost + container_memory_allocation_bytes * on (node) group_left node_ram_hourly_cost / (1024 * 1024 * 1024))` | CPU and memory cost by namespace |
| Storage Cost | `sum by (namespace) (pod_pvc_allocation * on (persistentvolume) group_left pv_hourly_cost / (1024 * 1024 * 1024))` | Storage cost by namespace |
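
The arithmetic behind the Cost by Namespace query can be reproduced outside Prometheus. The Python sketch below multiplies each container's CPU and memory allocation by the hourly prices of the node it runs on and sums the result per namespace. All sample values and the `cost_by_namespace` helper are hypothetical, chosen only to make the numbers easy to follow.

```python
from collections import defaultdict

GIB = 1024 * 1024 * 1024

# Hypothetical node prices: node -> (USD per vCPU-hour, USD per GiB-hour).
node_prices = {
    "worker-1": (0.03, 0.004),
    "worker-2": (0.05, 0.006),
}

# Hypothetical allocations: (namespace, node, cores, memory in bytes).
allocations = [
    ("default", "worker-1", 2.0, 4 * GIB),
    ("default", "worker-2", 1.0, 2 * GIB),
    ("monitoring", "worker-1", 0.5, 1 * GIB),
]


def cost_by_namespace(allocations, node_prices):
    """Sum hourly CPU and memory cost per namespace."""
    totals = defaultdict(float)
    for namespace, node, cores, mem_bytes in allocations:
        # Looking up prices by node plays the role of "on (node) group_left".
        cpu_price, ram_price = node_prices[node]
        totals[namespace] += cores * cpu_price + mem_bytes / GIB * ram_price
    return dict(totals)


for namespace, usd_per_hour in cost_by_namespace(allocations, node_prices).items():
    print(f"{namespace}: {usd_per_hour:.4f} USD/hour")
```

The division by `GIB` corresponds to the `/ (1024 * 1024 * 1024)` term in the query, which converts bytes to GiB before applying the per-GiB memory price.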

Advanced Cost Queries

Total cost of the cluster workload over the last 30 days

sort_desc(
  sum by (type, namespace) (
    sum_over_time(
      (
        label_replace(
          (
            (
              avg by (container, node, namespace, pod) (container_memory_allocation_bytes)
              * on (node) group_left ()
              avg by (node) (node_ram_hourly_cost)
            )
            /
            (1024 * 1024 * 1024)
          ),
          "type",
          "ram",
          "",
          ""
        )
        or
        label_replace(
          (
            avg by (container, node, namespace, pod) (container_cpu_allocation)
            * on (node) group_left ()
            avg by (node) (node_cpu_hourly_cost)
          ),
          "type",
          "cpu",
          "",
          ""
        )
        or
        label_replace(
          (
            avg by (container, node, namespace, pod) (container_gpu_allocation)
            * on (node) group_left ()
            avg by (node) (node_gpu_hourly_cost)
          ),
          "type",
          "gpu",
          "",
          ""
        )
        or
        label_replace(
          (
            (
              avg by (persistentvolume, namespace, pod) (pod_pvc_allocation)
              * on (persistentvolume) group_left ()
              avg by (persistentvolume) (pv_hourly_cost)
            )
            /
            (1024 * 1024 * 1024)
          ),
          "type",
          "storage",
          "",
          ""
        )
      )[30d:5m]
    )
    /
    scalar(count_over_time(vector(1)[30d:5m]))
    * 24 * 30
  )
)

Query Parameters:

  • 30d: The duration over which the data is aggregated, 30 days in this case.
  • 5m: The subquery step, which sets the resolution of the data. Modify this to adjust precision:
    • Decrease (e.g., to 1m): Improves accuracy. Setting it below the Prometheus scrape interval (1m by default) is typically not recommended.
    • Increase: Improves query performance.
  • sum by (type, namespace): Controls the grouping. Available options are container, namespace, node, pod, and type.
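
The normalization at the end of the query, `sum_over_time(...) / scalar(count_over_time(vector(1)[30d:5m])) * 24 * 30`, amounts to averaging the sampled hourly cost over every step in the window and scaling it to a 30-day month. A minimal Python sketch of that arithmetic follows; the `monthly_cost` helper and the sample values are hypothetical.

```python
def monthly_cost(hourly_cost_samples, total_steps):
    """Mirror sum_over_time(x) / count_over_time(vector(1)) * 24 * 30.

    hourly_cost_samples: hourly cost (USD/hour) at each step where the
        series existed.
    total_steps: number of steps in the whole window, whether or not the
        series existed at each of them.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return sum(hourly_cost_samples) / total_steps * 24 * 30


# Hypothetical workload costing 1 USD/hour that existed for only half of
# an 8-step window.
print(monthly_cost([1.0, 1.0, 1.0, 1.0], total_steps=8))
```

Because the divisor counts every step in the window, a series that existed for only part of the 30 days contributes proportionally less than one that was present throughout.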

Hourly memory cost for the default namespace

sum(
  avg by (container, node, namespace, pod) (container_memory_allocation_bytes{namespace="default"}) / 1024 / 1024 / 1024
  * on (node) group_left ()
  avg by (node) (node_ram_hourly_cost)
)

Monthly cost of provisioned nodes

sum(sum_over_time(node_total_hourly_cost[30d:1h]))
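
The queries on this page use two different month lengths, which can explain small differences between their results: the Total Cluster Cost query multiplies by 730, while the 30-day queries cover 30 × 24 hours. The Python sketch below compares the two for a hypothetical constant hourly rate. Reading 730 as the average number of hours per month in a 365-day year is an assumption, since the source does not state where the factor comes from.

```python
# Assumed origin of the 730 factor: average hours per month in a 365-day year.
HOURS_PER_AVERAGE_MONTH = 365 * 24 / 12
# Length of the 30d window used by the subquery-based queries.
HOURS_PER_30_DAYS = 30 * 24


def monthly_estimates(hourly_rate):
    """Return (rate over an average month, rate over 30 days) in USD."""
    return hourly_rate * HOURS_PER_AVERAGE_MONTH, hourly_rate * HOURS_PER_30_DAYS


# Hypothetical cluster costing 2 USD/hour.
by_average_month, by_30_days = monthly_estimates(2.0)
print(f"average month: {by_average_month:.2f} USD, 30 days: {by_30_days:.2f} USD")
```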

Troubleshooting

| Issue | Check | Solution |
| --- | --- | --- |
| Missing metrics | Verify node-exporter and kube-state-metrics are running | Ensure required metrics are being collected |
| Incorrect costs | Check OpenCost configuration | Verify pricing settings |
| High query latency | Review scrape interval and timeouts | Adjust Prometheus configuration |
| Label conflicts | Check for duplicate labels | Use metric relabeling rules |

For more detailed troubleshooting, refer to the Troubleshooting Guide.

Documentation Distributed under CC BY 4.0.  The Linux Foundation® (TLF) has registered trademarks and uses trademarks. For a list of TLF trademarks, see: Trademark Usage.