Overview - Cloud Monitor - Volcengine

Export Monitoring Data to Prometheus

Cloud Monitor can export monitoring data to both managed Prometheus and self-hosted Prometheus.

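Export monitoring data to managed Prometheus

Cloud Monitor is Volcengine's one-stop monitoring and alerting solution. It collects and visualizes the resource status and monitoring data of many types of Volcengine cloud products, including cloud servers, clusters, gateways, Direct Connect connections, and cloud storage.
With Exporter, the monitoring-data export tool provided by Cloud Monitor, you can export cloud product monitoring data from Cloud Monitor to managed Prometheus in real time and aggregate it with the business monitoring data from your clusters, so that business data and infrastructure data are monitored in one place.
Because managed Prometheus is natively compatible with Grafana, you can also use Grafana to build business dashboards.
For detailed steps on exporting data to managed Prometheus, see Export Cloud Monitor Data to Managed Prometheus.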

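Export monitoring data to self-hosted Prometheus

If you already have a mature enterprise management system outside the cloud, you can use Exporter, the monitoring-data export tool provided by Cloud Monitor, to export cloud data in real time to your on-premises environment for unified monitoring and write it continuously into Prometheus. Exporter runs in your own cluster, calls the Cloud Monitor OpenAPI to fetch metric data, and exposes the metrics over the Prometheus protocol; you then configure a Prometheus scrape job to collect them.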

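Note

This feature is in public beta and is currently free of charge. To use it, first enable pay-as-you-go billing, then contact your sales or after-sales representative to have your account added to the allowlist.
The AK/SK used by Exporter must have at least read-only permission on Cloud Monitor (CloudMonitorReadOnlyAccess). Otherwise, requests may fail for lack of permission, and instance and monitoring data cannot be retrieved.
If you need to fetch many metrics and objects, requests may fail because of rate limiting. Spread the requests out evenly over time where possible.
Exporter fetches monitoring data by calling the GetMetricData API, which consumes that API's quota; once the quota is exceeded, metric retrieval is throttled.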
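Spreading requests over time comes down to simple arithmetic. The following Python sketch is illustrative only and does not reproduce how Exporter schedules requests internally. It assumes a steady per-second budget, such as the `LimitQPSGetMetricData: 9` value in the full configuration example, and a pull round equal to the default `DataFreshIntervalSeconds` of 60; it computes how many requests fit into one round and evenly spaced start offsets for them.

```python
import math


def max_requests_per_round(qps_limit: float, interval_seconds: float) -> int:
    """Upper bound on requests that fit into one pull round at a steady QPS."""
    return math.floor(qps_limit * interval_seconds)


def spread_offsets(n_requests: int, qps_limit: float) -> list[float]:
    """Evenly spaced start offsets (seconds from the start of the round),
    one per request, so the average request rate stays at qps_limit."""
    if qps_limit <= 0:
        raise ValueError("qps_limit must be positive")
    spacing = 1.0 / qps_limit
    return [i * spacing for i in range(n_requests)]


if __name__ == "__main__":
    # 9 QPS mirrors the Limiter example; 60 s is the default pull interval.
    print(max_requests_per_round(9, 60))  # 540
    print(spread_offsets(5, 2))           # [0.0, 0.5, 1.0, 1.5, 2.0]
```

If the number of requests you need per round exceeds `max_requests_per_round`, reduce the number of metrics or objects, or lengthen the pull interval.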

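Usage notes

Installing Exporter requires the AK and SK as runtime parameters. For details, see the Access Key user guide.
All data exported by Exporter is of the Gauge type.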
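To show what a Gauge looks like on the wire, here is a minimal Python sketch that renders samples in the Prometheus text exposition format. It is not Exporter's code: the label name `instance_id` and the sample values are hypothetical, the metric name follows the MetricName_SubNamespace_Namespace naming rule described in Step 3, and label values are not escaped, so real output should come from an official Prometheus client library.

```python
def render_gauge(name, samples, help_text=""):
    """Render samples as one Prometheus text-format gauge family.

    samples: list of (labels_dict, value, timestamp_ms_or_None).
    Simplification: label values are NOT escaped.
    """
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} gauge")
    for labels, value, ts_ms in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        line = f"{series} {float(value)}"
        if ts_ms is not None:
            # Optional trailing timestamp, in milliseconds since the Unix epoch.
            line += f" {int(ts_ms)}"
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge(
        "Instance_CpuBusy_Instance_VCM_ECS",
        [({"instance_id": "i-example"}, 12.5, 1718951400000)],
    ))
```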

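Only some cloud products are currently supported. Supported Namespaces:

Category	Product	Namespace
Network	NAT Gateway	VCM_NAT
	Load Balancer (CLB)	VCM_CLB
	Application Load Balancer (ALB)	VCM_ALB
	Public IP (EIP)	VCM_EIP
	Shared Bandwidth Package	VCM_BandwidthPackage
	Cloud Enterprise Network (CEN)	VCM_CEN
	Direct Connect - Direct Connect Gateway	VCM_DirectConnectGateway
	Direct Connect - Virtual Interface	VCM_DirectConnectVIF
	Internet Tunnel - Public Bandwidth	VCM_InternetTunnelBandwidth
	Internet Tunnel - Virtual Interface	VCM_InternetTunnelVirtualInterface
	PrivateLink - PrivateLink Gateway (Note: set Region to `no-region` for this product.)	VCM_PrivateLinkGateway
	Transit Router	VCM_TransitRouter
	Transit Router Bandwidth Package	VCM_TransitRouterBandwidthPackage
Storage	Object Storage (TOS) (Note: metrics whose SubNamespace is `account_overview` cannot be exported to Prometheus through Exporter.)	VCM_TOS
	File Storage vePFS	VCM_vePFS
Database	RDS for MySQL	VCM_RDS_MySQL
	veDB for MySQL	VCM_veDB_MySQL
	RDS for PostgreSQL	VCM_RDS_PostgreSQL
	Cache Database for Redis	VCM_Redis
	Document Database for MongoDB - Replica Set	VCM_MongoDB_Replica
	Document Database for MongoDB - Sharded Cluster	VCM_MongoDB_Sharded_Cluster
Elastic Compute	Elastic Compute Service (ECS)	VCM_ECS
Middleware	Message Queue for Kafka	VCM_Kafka
	Message Queue for RocketMQ	VCM_RocketMQ
	Message Queue for RabbitMQ	VCM_RabbitMQ
	Cloud-Native Message Engine (BMQ)	VCM_BMQ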

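A metric is uniquely identified only by the combination of Namespace, SubNamespace, and MetricName, because the same metric name can appear under different cloud products.
- Namespace: the cloud product. For each product's Namespace, see Cloud Product Monitoring Metrics.
- SubNamespace: a finer-grained category of metrics within a cloud product. For details, see the SubNamespace field of each product under Cloud Monitor Metric Query.
- MetricName: the metric name. For details, see the MetricName field of each product under Cloud Monitor Metric Query.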
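The `Metrics` section of the Exporter configuration nests these three fields: Namespace, then SubNamespace, then a list of MetricName values. The following Python sketch, which assumes the YAML has already been parsed into plain dictionaries, flattens that nesting into unique (Namespace, SubNamespace, MetricName) keys. It is an illustration of the identification rule, not part of Exporter.

```python
from typing import NamedTuple


class MetricKey(NamedTuple):
    namespace: str
    subnamespace: str
    metric_name: str


def flatten_metrics(metrics: dict[str, dict[str, list[str]]]) -> list[MetricKey]:
    """Turn the nested Namespace -> SubNamespace -> [MetricName] mapping
    into a sorted, de-duplicated list of unique metric keys."""
    keys = {
        MetricKey(namespace, subnamespace, name)
        for namespace, subs in metrics.items()
        for subnamespace, names in subs.items()
        for name in names
    }
    return sorted(keys)


if __name__ == "__main__":
    # Metrics taken from the examples in this document.
    parsed = {
        "VCM_ECS": {"Instance": ["Instance_CpuBusy"]},
        "VCM_Redis": {"aggregated_server": ["AggregatedMemUtil"]},
    }
    for key in flatten_metrics(parsed):
        print(key)
```

Because the key is the full triple, two entries that share a MetricName but differ in Namespace or SubNamespace remain two distinct metrics.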

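Step 1: Install the Exporter

Exporter can be installed on Kubernetes, with Docker, or on a virtual machine (VM). Choose the method that fits your environment.

Install on Kubernetes

Create a ConfigMap in your Kubernetes cluster.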

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-monitor-exporter-conf
data:
  conf.yaml: |
    Region: "cn-beijing"
    Credentials:
      AccessKey: "********"
      SecretKey: "********"
    Namespaces:
      - VCM_ECS
    SubNamespaces:
      VCM_ECS:
        - Instance
    Metrics:
      VCM_ECS:
        Instance:
          - Instance_CpuBusy
```

Adjust the YAML to your environment, then deploy Exporter in your Kubernetes cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: volc-cloud-monitor-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: volc-cloud-monitor-exporter
  template:
    metadata:
      labels:
        app: volc-cloud-monitor-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9988"
    spec:
      containers:
      - name: volc-cloud-monitor-exporter
        image: cloud-monitor-cn-beijing.cr.volces.com/cm/cloud-monitor-exporter:1.0.17-rc0
        args:
          - "--config=/conf/conf.yaml"
          - "--port=9988"
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 9988
        volumeMounts:
          - name: conf
            mountPath: /conf
      volumes:
        - name: conf
          configMap:
            name: cloud-monitor-exporter-conf
```

Install with Docker

Create a file named config.yaml and adjust it to your environment.

```yaml
Region: "cn-beijing"
Credentials:
  AccessKey: "********"
  SecretKey: "********"
Namespaces:
  - VCM_ECS
SubNamespaces:
  VCM_ECS:
    - Instance
Metrics:
  VCM_ECS:
    Instance:
      - Instance_CpuBusy
```

In the directory that contains the configuration file, run the following command to start the Docker image.

```shell
docker run -itd -p 9988:9988 -v $(pwd)/config.yaml:/conf/conf.yaml cloud-monitor-cn-beijing.cr.volces.com/cm/cloud-monitor-exporter:1.0.17-rc0 --config /conf/conf.yaml --port=9988
```

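Install on a VM

Download the binary package, extract it, and change into the extracted directory.
byteapm.cloud_monitor.cloud_monitor_exporter_1.0.17-rc0.tar.gz

In the extracted directory, create a file named config.yaml and adjust it to your environment.

```yaml
Region: "cn-beijing"
Credentials:
  AccessKey: "********"
  SecretKey: "********"
Namespaces:
  - VCM_ECS
SubNamespaces:
  VCM_ECS:
    - Instance
Metrics:
  VCM_ECS:
    Instance:
      - Instance_CpuBusy
```

In the extracted directory, run the following command to start Exporter.
```shell
./exporter --config config.yaml --port=9988
```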

Step 2: Configure a scrape job

The following configuration examples are for reference only; adjust them to your deployment.

Basic scrape job

```yaml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9988"] # IP:Port where Exporter is running
```

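(Optional) Use Cloud Monitor timestamps

When `honor_timestamps` is false, Prometheus ignores the timestamps exposed by the target and stamps every sample with the scrape time. To keep the timestamps that Exporter reports from Cloud Monitor, set `honor_timestamps: true` in the scrape job (`scrape_configs[].honor_timestamps`).

```yaml
scrape_configs:
  - job_name: "prometheus"
    # honor_timestamps: whether to keep the timestamps exposed by Exporter.
    # Upstream Prometheus defaults to true; setting it explicitly makes the intent clear.
    honor_timestamps: true
```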
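Prometheus can only honor timestamps that the target actually exposes: in the text exposition format, the optional last field of a sample line is a timestamp in milliseconds since the Unix epoch. To check Exporter's output (for example, the body served at `http://localhost:9988/metrics`, assuming the default metrics path used by the scrape job), the following simplified Python sketch parses one sample line. The label name and values in the examples are hypothetical, and the parser ignores escaping, comment lines, and exemplars.

```python
def parse_sample(line: str):
    """Split one Prometheus text-format sample line into (name, value, timestamp_ms).

    timestamp_ms is None when the line carries no explicit timestamp; in that
    case Prometheus uses the scrape time whatever honor_timestamps says.
    """
    line = line.strip()
    if "{" in line:
        # The series (name plus labels) ends at the last closing brace.
        series, _, rest = line.rpartition("}")
        name = series.split("{", 1)[0]
    else:
        name, _, rest = line.partition(" ")
    fields = rest.split()
    value = float(fields[0])
    timestamp_ms = int(fields[1]) if len(fields) > 1 else None
    return name, value, timestamp_ms


if __name__ == "__main__":
    print(parse_sample(
        'AggregatedMemUtil_aggregated_server_VCM_Redis{instance_id="redis-example"} 42.5 1718951400000'
    ))
    print(parse_sample("Instance_CpuBusy_Instance_VCM_ECS 3.25"))
```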

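(Optional) Enable remote write and out-of-order ingestion

Prometheus restricts how data can be written: by default it rejects samples that are older than the most recent sample it has already ingested for the same series. Filling gaps and backfilling data therefore require out-of-order ingestion.

Enable an out-of-order ingestion window in prometheus.yml.

```yaml
storage:
  tsdb:
    out_of_order_time_window: 10m
```

Enable the Prometheus remote write receiver.

```shell
# Add the --web.enable-remote-write-receiver flag to the Prometheus startup command
./prometheus --web.enable-remote-write-receiver
```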
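A simplified way to reason about the window: in the model below, a late sample is kept only if it is no older than the newest ingested timestamp minus `out_of_order_time_window`. This Python sketch approximates Prometheus behavior for illustration only and ignores TSDB details such as block boundaries. The second function encodes the requirement, stated in the OfflinePullConfig reference, that the maximum backfill delay be smaller than the Prometheus out-of-order window; the dates used are arbitrary.

```python
from datetime import datetime, timedelta


def accepted_out_of_order(sample_time: datetime, newest_time: datetime,
                          window: timedelta) -> bool:
    """Simplified model of Prometheus out-of-order ingestion.

    A sample newer than newest_time is in order and accepted; an older sample
    is accepted only if it is no older than newest_time - window.
    """
    return sample_time >= newest_time - window


def backfill_delay_fits(max_offline_delay: timedelta, window: timedelta) -> bool:
    """The maximum backfill delay must be strictly smaller than the window."""
    return max_offline_delay < window


if __name__ == "__main__":
    newest = datetime(2024, 6, 21, 10, 10, 0)
    window = timedelta(minutes=10)  # matches out_of_order_time_window: 10m
    print(accepted_out_of_order(datetime(2024, 6, 21, 10, 1, 0), newest, window))  # True
    print(accepted_out_of_order(datetime(2024, 6, 21, 9, 59, 0), newest, window))  # False
    print(backfill_delay_fits(timedelta(minutes=5), window))                       # True
```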

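Step 3: Verify the data

Field descriptions:

Field	Description
Metric name	Exported metric names take the form MetricName_SubNamespace_Namespace. For example, with Namespace `VCM_Redis`, SubNamespace `aggregated_server`, and MetricName `AggregatedMemUtil`, the exported metric name is `AggregatedMemUtil_aggregated_server_VCM_Redis`.
Metric value	Values are of type Float. If you need to display two decimal places, round the values yourself.
Timestamp	Based on the time returned by the `GetMetricData` action: data-point time = time returned by `GetMetricData` - DelaySeconds. For example, if DelaySeconds is configured as 300 seconds (5 minutes), the data for 10:00:00 becomes available only when `GetMetricData` is called at 10:05:00.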
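Both rules in the table can be expressed as small functions. This Python sketch restates the documented naming rule and the DelaySeconds arithmetic; the function names are hypothetical helpers for illustration, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta


def exported_metric_name(namespace: str, subnamespace: str, metric_name: str) -> str:
    """Build the exported metric name as MetricName_SubNamespace_Namespace."""
    return f"{metric_name}_{subnamespace}_{namespace}"


def data_point_time(call_time: datetime, delay_seconds: int) -> datetime:
    """Time of the data point obtained by a GetMetricData call made at call_time."""
    return call_time - timedelta(seconds=delay_seconds)


def earliest_call_for(point_time: datetime, delay_seconds: int) -> datetime:
    """Earliest call time at which the data point for point_time is available."""
    return point_time + timedelta(seconds=delay_seconds)


if __name__ == "__main__":
    print(exported_metric_name("VCM_Redis", "aggregated_server", "AggregatedMemUtil"))
    # AggregatedMemUtil_aggregated_server_VCM_Redis
    point = datetime(2024, 6, 21, 10, 0, 0)
    print(earliest_call_for(point, 300).time())  # 10:05:00
```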

Step 4: Additional configuration

Startup flags

```text
  -config string
        Path of the configuration file (default "/conf/config.yaml")
  -enable-self-metrics
        Whether to enable self-monitoring metrics: true or false (default true)
  -host string
        Address the Exporter listens on (default "0.0.0.0")
  -log-level string
        Log level, one of:
        - panic
        - fatal
        - error
        - warn/warning
        - info
        - debug
        - trace
        Case-insensitive (default "info")
  -port int
        Port the Exporter listens on (default 9898)
```

Configuration file

Exporter is highly configurable: it offers a simple configuration for common scenarios and can also be tuned for a variety of demanding edge cases.

Full configuration

```yaml
Credentials:
  AccessKey: "********"
  SecretKey: "********"
Region: "cn-beijing"
Host: open.volcengineapi.com
MaxRetries: 3
BatchInstanceCount: 50
DataQueueSize: 100
DataConcurrency: 10
MetaFreshIntervalSeconds: 60
DataFreshIntervalSeconds: 60
Limiter:
  LimitQPSGetMetricData: 9
  LimitQPSMonitorObjectList: 9
EnableDynamicDelaySeconds: false
BehaviorWhenPullTimeout: WaitAndSkip
Namespaces:
  - VCM_ECS
SubNamespaces:
  VCM_ECS:
    - Instance
Metrics:
  VCM_ECS:
    Instance:
      - Instance_CpuBusy
```

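Config field reference

Note

If a cloud product is global and not tied to any region, Region in the Exporter configuration must be set to no-region.
For SubNamespace values, see the SubNamespace column of each product's metric documentation. For details, see Cloud Monitor Metric Query.
For MetricName values, see the MetricName column of each product's metric documentation. For details, see Cloud Monitor Metric Query.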

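```
# User authentication information
Credentials: <Credentials>

# Domain name to which Exporter sends API requests
[ Host: <String> | default = "open.volcengineapi.com" ]

# Region whose data is fetched
[ Region: <String> | default = "cn-beijing"]

# Number of retries when an API call returns a 5xx status code.
[ MaxRetries: <Integer> | default = 3, range = [1, 3] ]

# Maximum number of instances per metric-pull request
[ BatchInstanceCount: <Integer> | default = 50, range = [1, 50] ]

# Length of the internal request queue
[ DataQueueSize: <Integer> | default = 100, range = [1, +] ]

# Number of threads that perform pulls
[ DataConcurrency: <Integer> | default = 10, range = [1, +] ]

# Refresh interval for pulling metadata
[ MetaFreshIntervalSeconds: <Seconds> | default = 60, range = [30, +] ]

# Interval for pulling metric data
[ DataFreshIntervalSeconds: <Seconds> | default = 60, range = [30, +] ]

# Request limit settings
[ Limiter: <Limiter> ]

#
# Round timeout: a pull round takes longer than the configured DataFreshIntervalSeconds.
#
# This setting controls how Exporter handles the current round and the next round after a round timeout.
#
# - SkipMissExecute: after the current pull finishes, skip the rounds that were missed and run the latest one.
# - ContinueMissExecute: after the current pull finishes, continue running the rounds that were missed.
# - StopAndBackFilling: stop the current pull and start the next one immediately; the unfinished requests
#   of the current pull are placed in an offline queue and written to Prometheus via RemoteWrite.
[ BehaviorWhenPullTimeout: <String> | default = "" ]

# Backfill settings; take effect only when BehaviorWhenPullTimeout = StopAndBackFilling.
[ OfflinePullConfig: <OfflinePullConfig> ]

# List of products to export. If no product list is declared, Exporter exports no metrics.
Namespaces: <NamespaceConfig ...>

# Metric dimensions to export. If SubNamespaces is not set, all dimensions of the declared
# products are exported by default; otherwise only the dimensions listed in SubNamespaces are exported.
[ SubNamespaces: <String: String...> ]

# Excluded dimensions. A dimension listed in ExcludeSubNamespaces is never exported, even if
# it is also declared in SubNamespaces.
[ ExcludeSubNamespaces: <String: String...> ]

# Metric settings. If Metrics is empty, all metrics of the declared Namespaces and
# SubNamespaces are exported by default.
[ Metrics: <String: <String: String...>> ]
```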
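The documented defaults and ranges can be checked before deploying. The following Python sketch assumes the YAML has been parsed into a dictionary; it fills in the documented defaults and reports values outside the documented ranges, an empty `Namespaces` list, and an `OfflinePullConfig` that would be ignored. Whether Exporter itself rejects, clamps, or ignores out-of-range values is not stated in this reference, so treat the sketch as a pre-flight check, not a description of Exporter's behavior. `check_config` is a hypothetical helper name.

```python
# Documented defaults from the config field reference.
DEFAULTS = {
    "Host": "open.volcengineapi.com",
    "Region": "cn-beijing",
    "MaxRetries": 3,
    "BatchInstanceCount": 50,
    "DataQueueSize": 100,
    "DataConcurrency": 10,
    "MetaFreshIntervalSeconds": 60,
    "DataFreshIntervalSeconds": 60,
}

# (lower bound, upper bound); None means no upper bound ("+").
RANGES = {
    "MaxRetries": (1, 3),
    "BatchInstanceCount": (1, 50),
    "DataQueueSize": (1, None),
    "DataConcurrency": (1, None),
    "MetaFreshIntervalSeconds": (30, None),
    "DataFreshIntervalSeconds": (30, None),
}


def check_config(user_cfg: dict) -> tuple[dict, list[str]]:
    """Merge the documented defaults into user_cfg and list any problems found."""
    cfg = {**DEFAULTS, **user_cfg}
    problems = []
    for key, (low, high) in RANGES.items():
        value = cfg[key]
        if value < low or (high is not None and value > high):
            upper = "+" if high is None else str(high)
            problems.append(f"{key}={value} is outside [{low}, {upper}]")
    if not cfg.get("Namespaces"):
        problems.append("Namespaces is empty, so no metrics will be exported")
    if "OfflinePullConfig" in cfg and cfg.get("BehaviorWhenPullTimeout") != "StopAndBackFilling":
        problems.append("OfflinePullConfig only takes effect with BehaviorWhenPullTimeout: StopAndBackFilling")
    return cfg, problems
```

For example, `check_config({"Namespaces": ["VCM_ECS"], "BatchInstanceCount": 80})` returns a single problem, for `BatchInstanceCount`.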

Credentials fields

```
# AccessKey for user authentication
AccessKey: <String>

# SecretKey for user authentication
SecretKey: <String>
```

Limiter fields

```
# QPS for calls to the Cloud Monitor metric API
[ LimitQPSGetMetricData: <Integer> | default = ]

# QPS for fetching Cloud Monitor metric metadata
[ LimitQPSMonitorObjectList: <Integer> | default = ]
```

OfflinePullConfig fields

```
# Ratio of the backfill threads' request QPS to the total request QPS.
[ RequestQPSRatio: <Float> | default = 0.3, range = [0.1, 0.9] ]

# Concurrency for backfilling data. If set to 0, Config.DataConcurrency is used.
[ Concurrency: <Integer> | default = 0, range = [0, +] ]

# Size of the backfill request pool. If set to 0, Config.DataQueueSize is used.
[ DataQueueSize: <Integer> | default = 0, range = [0, +] ]

# Maximum valid time; requests older than this are discarded.
#
# Every metric-data pull carries a time range, and the result covers that range. If the
# estimated time of a pull lags behind (current time - MaxAllowedOfflineDelayTime), the
# request is discarded.
# MaxAllowedOfflineDelayTime must be smaller than the out-of-order time window configured in
# Prometheus; otherwise writes will fail. Out-of-order ingestion requires Prometheus 2.39 or later.
#
# Prometheus OTLP:
# https://prometheus.io/docs/prometheus/latest/querying/api/#otlp-receiver
#
# Out-of-order ingestion must be enabled in Prometheus, and Prometheus must also honor the
# timestamps of the metrics exported by Exporter:
# > --------- prometheus.yml ---------------
# > scrape_configs:
# >   - job_name: "prometheus"
# >     honor_timestamps: true
# >
# > storage:
# >   tsdb:
# >     out_of_order_time_window: <Duration>
# > --------- prometheus.yml ---------------
[ PromOutOfOrderTimeWindow: <Duration> | default = ? ]

# Prometheus RemoteWrite address.
#
# Exporter backfills data via RemoteWrite. If the Prometheus RemoteWrite address is empty,
# data cannot be backfilled into Prometheus.
#
# To enable backfilling in Prometheus, enable its write API endpoint:
# > ------- Prometheus start command -------
# > prometheus \
# >   --web.enable-remote-write-receiver \
# >   --config.file={FilePATH} \
# >   {other flags}
# > ----------------------------------------
PromWriteAddress: <String>

[ PromBasicAuth: <PromBasicAuth>]
```
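The fallback rules above can be resolved mechanically. The following Python sketch is a hypothetical helper that applies them: a value of 0 for `Concurrency` or `DataQueueSize` falls back to the top-level setting, `RequestQPSRatio` must lie in [0.1, 0.9], and an empty `PromWriteAddress` means backfilled data cannot reach Prometheus. One point is an assumption not stated in the reference: the sketch treats the "total request QPS" as a number you supply, for example `Limiter.LimitQPSGetMetricData`. The write address in the example is hypothetical; `/api/v1/write` is the path of the Prometheus remote write receiver.

```python
def effective_offline_settings(main_cfg: dict, offline_cfg: dict, total_qps: float) -> dict:
    """Resolve OfflinePullConfig values following the documented fallback rules.

    Assumption (not stated in the reference): total_qps is the overall request
    budget, for example Limiter.LimitQPSGetMetricData.
    """
    ratio = offline_cfg.get("RequestQPSRatio", 0.3)
    if not 0.1 <= ratio <= 0.9:
        raise ValueError(f"RequestQPSRatio={ratio} is outside [0.1, 0.9]")
    if not offline_cfg.get("PromWriteAddress"):
        raise ValueError("PromWriteAddress is empty, so backfilled data cannot reach Prometheus")
    # 0 (the default) means "fall back to the top-level setting".
    concurrency = offline_cfg.get("Concurrency", 0) or main_cfg.get("DataConcurrency", 10)
    queue_size = offline_cfg.get("DataQueueSize", 0) or main_cfg.get("DataQueueSize", 100)
    return {
        "Concurrency": concurrency,
        "DataQueueSize": queue_size,
        "BackfillQPS": total_qps * ratio,
    }


if __name__ == "__main__":
    settings = effective_offline_settings(
        {"DataConcurrency": 10, "DataQueueSize": 100},
        {"PromWriteAddress": "http://prometheus.example:9090/api/v1/write"},
        total_qps=9,
    )
    print(settings["Concurrency"], settings["DataQueueSize"], round(settings["BackfillQPS"], 2))
```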

PromBasicAuth fields

```
# Prometheus basic authentication credentials

# Prometheus web basic auth username.
Username: <String>

# Prometheus web basic auth password.
Password: <String>
```

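NamespaceConfig fields

```
# A Namespace entry takes one of two forms:
< string | [

  # The Namespace, i.e. the product's namespace in Cloud Monitor.
  [ Namespace: <String> ]

  # If the product reports data with some delay, adjust the delay with this parameter.
  [ DelaySeconds: <Seconds> ]
]>
```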
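Because a Namespace entry can be either a bare string or a structured entry with `Namespace` and `DelaySeconds`, code that reads the configuration has to handle both forms. The following Python sketch assumes the YAML parses into either a string or a dictionary (the YAML shape in the comment is an assumed illustration, and the 300-second delay is hypothetical). What Exporter does when `DelaySeconds` is omitted is not documented here, so the sketch uses `None` for that case.

```python
# Assumed YAML shape (illustrative):
# Namespaces:
#   - VCM_ECS
#   - Namespace: VCM_Redis
#     DelaySeconds: 300


def normalize_namespaces(entries, default_delay=None):
    """Normalize Namespaces entries into (namespace, delay_seconds) pairs.

    Each entry is either a plain string such as "VCM_ECS" or a mapping with a
    "Namespace" key and an optional "DelaySeconds" key.
    """
    result = []
    for entry in entries:
        if isinstance(entry, str):
            result.append((entry, default_delay))
        elif isinstance(entry, dict) and "Namespace" in entry:
            result.append((entry["Namespace"], entry.get("DelaySeconds", default_delay)))
        else:
            raise ValueError(f"unsupported Namespaces entry: {entry!r}")
    return result


if __name__ == "__main__":
    print(normalize_namespaces(["VCM_ECS", {"Namespace": "VCM_Redis", "DelaySeconds": 300}]))
```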

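Exporter release notes

This section records Exporter version updates.

2024-05

Exporter version: cloud-monitor-cn-beijing.cr.volces.com/cm/cloud-monitor-exporter:1.0.17-rc0

Updates
- Provides multiple ways to pull data, making Exporter more resilient to network fluctuations in different scenarios.
- Enhanced Exporter self-monitoring metrics.
- AK and SK can be configured through environment variables.
- More log levels are supported.

2023-12

Exporter version: cloud-monitor-cn-beijing.cr.volces.com/cm/cloud-monitor-exporter:1.0.16.1

Updates
- Different delay times can be configured for different cloud products, which fixes missing data in edge cases where a single delay time did not suit every product.
- Supports dynamic delay and time offset: Exporter dynamically adjusts the query time so that every pull returns valid and complete data.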

Last updated: 2024.06.21 14:32:10
