Collecting Metrics from Specified Virtual Nodes
Updated: 2025-05-27 17:21:54
This article describes how to collect the metrics of specified virtual nodes by modifying the Prometheus monitoring configuration.
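Background
In the architecture of CTyun Serverless cluster virtual nodes, multiple virtual nodes in the same Serverless cluster share a single node IP. Prometheus usually scrapes the metrics of all nodes through the Kubelet Service, so a scrape aimed at one virtual node returns the full data set of every virtual node, and the metrics end up duplicated. To solve this, Serverless clusters can serve the metrics of a specified virtual node only: the original scrape endpoint <nodeIP>:10250/metrics/cadvisor is kept, and in addition the data is filtered by the specified nodeName so that nothing is collected twice.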
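The duplication described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the container and node names are hypothetical, and the two functions merely stand in for an unfiltered and a nodeName-filtered scrape; they are not the cluster's actual implementation.

```python
from collections import Counter

# Hypothetical sample data: container name -> virtual node it runs on.
CONTAINERS = {
    "web-0": "vnode-a",
    "web-1": "vnode-b",
    "job-0": "vnode-c",
}


def scrape_shared_endpoint():
    """Unfiltered scrape: the shared node IP returns every container."""
    return list(CONTAINERS)


def scrape_filtered(node_name):
    """Filtered scrape: only containers on the named virtual node."""
    return [c for c, node in CONTAINERS.items() if node == node_name]


nodes = sorted(set(CONTAINERS.values()))

# One scrape per virtual node, all hitting the same shared endpoint.
unfiltered = Counter(c for _ in nodes for c in scrape_shared_endpoint())

# One scrape per virtual node, each filtered by its own node name.
filtered = Counter(c for n in nodes for c in scrape_filtered(n))

print("unfiltered:", dict(unfiltered))
print("filtered:  ", dict(filtered))
```

With three virtual nodes behind one IP, the unfiltered counter records every container once per node, while the filtered counter records each container exactly once.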
Prerequisites
You have created a Serverless cluster. For details, see Creating a Serverless Cluster.
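Modifying the Prometheus Monitoring Configuration
You can collect the metrics of specified virtual nodes by modifying the monitoring configuration. Serverless clusters support two setups: the ccse-monitor plugin and open-source Prometheus.
Once the ccse-monitor plugin is installed in a Serverless cluster, its default configuration already collects virtual node metrics, and no further action is required. For open-source Prometheus, use the following configuration as a reference: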
scrape_configs:
  ...
  - job_name: cadvisor
    honor_timestamps: true
    scrape_interval: 40s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
      - role: node
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: false
    relabel_configs:
      # Keep only nodes whose scrape label is set to true.
      - source_labels: [ __meta_kubernetes_node_label_kubernetes_poseidon_daliqc.cn_collector_scrape ]
        regex: true
        action: keep
      # Labels that start with __ are dropped after relabeling; the labelmap action renames them so they are kept.
      # Note: "$$" is an escaped "$", as needed when this config is embedded in an OpenTelemetry Collector config;
      # in a standalone Prometheus config file the capture group is written as $1.
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $$1
        action: labelmap
      # Send every scrape to the API server instead of the shared node IP.
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace
      # Scrape each node through its own API server proxy path, so only that node's data is returned.
      - source_labels: [ __meta_kubernetes_node_name ]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
        action: replace
      # The {{ .Values.* }} entries below are Helm template placeholders; substitute your own values.
      - replacement: {{.Values.otelCollector.clusterName}}
        target_label: cluster_name
        action: replace
      - replacement: {{.Values.otelCollector.regionCode}}
        target_label: region_code
        action: replace
      - replacement: {{.Values.otelCollector.tenantCode}}
        target_label: tenant_code
        action: replace
      - replacement: {{.Values.otelCollector.tenantId}}
        target_label: tenant_id
        action: replace
      - replacement: {{.Values.otelCollector.tenantName}}
        target_label: tenant_name
        action: replace
      - replacement: {{.Values.otelCollector.instanceId}}
        target_label: instance_id
        action: replace
      - replacement: CCSE
        target_label: carms_obj_type
        action: replace
      - replacement: {{.Values.otelCollector.instanceId}}
        target_label: carms_obj_id
        action: replace
      - replacement: {{.Values.otelCollector.clusterName}}
        target_label: carms_obj_name
        action: replace
      - replacement: {{.Values.otelCollector.regionName}}
        target_label: region_name
        action: replace
    metric_relabel_configs:
      # Keep only the container and machine metrics listed here; everything else is dropped.
      - source_labels: [ __name__ ]
        regex: (container_memory_failures_total|container_memory_rss|container_spec_memory_limit_bytes|container_memory_failcnt|container_memory_cache|container_memory_swap|container_memory_usage_bytes|container_memory_max_usage_bytes|container_cpu_load_average_10s|container_fs_reads_total|container_fs_writes_total|container_network_transmit_errors_total|container_network_transmit_packets_total|container_network_receive_errors_total|container_network_receive_bytes_total|container_memory_working_set_bytes|container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_spec_cpu_quota|container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|container_cpu_cfs_throttled_seconds_total|container_fs_inodes_free|container_fs_io_time_seconds_total|container_fs_io_time_weighted_seconds_total|container_fs_limit_bytes|container_tasks_state|container_fs_read_seconds_total|container_fs_write_seconds_total|container_fs_usage_bytes|container_fs_inodes_total|container_fs_io_current|machine_cpu_cores|machine_memory_bytes|container_network_transmit_bytes_total).*
        action: keep
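To make the effect of the relabeling rules easier to follow, the sketch below mimics two parts of the configuration in plain Python: the rules that rewrite the scrape address and metrics path, and the final metric-name keep filter. It is an illustration only, not Prometheus code. The node name vnode-a, the function names, and the shortened allow-list are hypothetical; the real allow-list is the full regex in the configuration. Prometheus anchors relabel regexes at both ends, which the sketch approximates with re.fullmatch.

```python
import re

# Values taken from the replace rules in the configuration above.
API_SERVER = "kubernetes.default.svc:443"
PATH_TEMPLATE = r"/api/v1/nodes/\g<1>/proxy/metrics/cadvisor"

# Shortened, hypothetical subset of the allow-list, for illustration only.
KEEP_REGEX = re.compile(
    r"(container_memory_rss|container_cpu_usage_seconds_total|machine_cpu_cores).*"
)


def build_scrape_url(node_name):
    """Return the URL a node would be scraped at after relabeling."""
    # The rule on __meta_kubernetes_node_name uses regex (.+), fully anchored.
    match = re.fullmatch(r"(.+)", node_name)
    if match is None:
        raise ValueError("node name must not be empty")
    metrics_path = match.expand(PATH_TEMPLATE)
    return "https://" + API_SERVER + metrics_path


def is_kept(metric_name):
    """Return True if the metric survives the `keep` rule on __name__."""
    return KEEP_REGEX.fullmatch(metric_name) is not None


print(build_scrape_url("vnode-a"))
for name in ("container_memory_rss", "go_goroutines"):
    print(name, "kept" if is_kept(name) else "dropped")
```

Because each node is scraped through its own API server proxy path rather than the shared node IP, the response contains only that node's data, and the keep rule then limits the stored series to the listed metric names.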