Pushgateway

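Introduction to Pushgateway

Pushgateway is a standalone service that receives Prometheus metrics over an HTTP REST API. It sits between the applications that send metrics and the Prometheus server: it accepts pushed metrics and is then scraped as a target, which is how those metrics reach the Prometheus server.

Pushgateway architecture diagram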

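When to use Pushgateway

The gateway is not a perfect solution and should only be used in limited cases, in particular for monitoring resources that are otherwise unreachable (for example internal systems or Jushita (聚石塔)).
The gateway can become a single point of failure or a performance bottleneck, because Pushgateway does not scale the way the Prometheus server does.
Compared with a full-featured push-based monitoring tool, the gateway is closer to a proxy.
As a result, using it gives up many useful features of the Prometheus server, including instance health monitoring through the up metric and metric expiry. The up metric only tells you whether the gateway itself is reachable, not whether a job has pushed.
By default it is a static proxy: it remembers every metric sent to it and keeps exposing them for as long as it is running (metrics are not persisted) or until they are deleted. Metrics for instances that no longer exist may therefore still be held in the gateway.
Focus the gateway on monitoring short-lived resources (such as jobs) or on short-term monitoring of unreachable resources, and use a Prometheus server to monitor reachable resources over the long term.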

Installation

https://prometheus.io/download/#pushgateway
https://github.com/prometheus/pushgateway/releases

Running with Docker

docker pull prom/pushgateway
docker run -d -p 9091:9091 prom/pushgateway

Running from the binary

cd /usr/local/src
wget -c https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
tar -zxvf pushgateway-1.4.2.linux-amd64.tar.gz
mv pushgateway-1.4.2.linux-amd64 /usr/local/pushgateway
ln -s /usr/local/pushgateway/pushgateway /usr/local/bin/pushgateway
pushgateway --web.listen-address="0.0.0.0:9091"

Running as a systemd service

$ vim /etc/systemd/system/pushgateway.service
[Unit]
Description=pushgateway

[Service]
ExecStart=/usr/local/pushgateway/pushgateway 
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable pushgateway
systemctl start pushgateway

Web access: http://IP:9091

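Configuring and running Pushgateway

Pushgateway needs no configuration and works out of the box.
To run Pushgateway on all interfaces, pass --web.listen-address="0.0.0.0:9091", as in the binary install above.

By default the gateway stores all metrics in memory, so every metric is lost if the gateway stops or restarts. Passing the --persistence.file flag persists metrics to a file on disk.
By default the file is written every five minutes; the --persistence.interval flag overrides this.
Listing: persisting metrics

pushgateway --persistence.file="/tmp/pushgateway_persist" --web.listen-address="0.0.0.0:9091"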

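Sending metrics to Pushgateway

The simplest way is to send metrics with a command-line tool such as curl.

Metrics are pushed to the /metrics path. The URL is built from labels, here /metrics/job/<jobname>, where batchjob1 is our job label.

Pushgateway metric path

/metrics/job/<jobname>{/<label_name>/<label_value>}

echo 'batchjob1_zj_crontab 2' | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1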
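
The path pattern above can be assembled mechanically from a job name followed by label name/value pairs. The following is a minimal sketch of that idea; the function name pushgateway_url and the PUSHGATEWAY_URL environment variable are hypothetical helpers introduced for illustration, not part of Pushgateway, and the sketch does not handle label values that contain a slash.

```shell
# Build a Pushgateway push URL from a job name and label name/value pairs.
# PUSHGATEWAY_URL is an assumed variable; it defaults to the local gateway
# address used throughout this article.
pushgateway_url() (
  [ "$#" -ge 1 ] || exit 1            # a job name is required
  base="${PUSHGATEWAY_URL:-http://localhost:9091}"
  url="$base/metrics/job/$1"
  shift
  # Remaining arguments are consumed two at a time: label name, then value.
  while [ "$#" -ge 2 ]; do
    url="$url/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
)
```

With such a helper, a push could look like: echo 'batchjob1_zj_crontab 2' | curl --data-binary @- "$(pushgateway_url batchjob1 instance sidekiq_server)"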

Now add an instance label to the metric through the URL.

Sending a metric to the gateway

echo 'batchjob1_zj_crontab 2' | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server

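Because the gateway is a cache rather than an aggregator, a metric group remains exposed until the gateway stops or the group is deleted.

Adding a label to a pushed metric

echo 'batchjob1_zj_crontab{job_id="123ABC"} 2' | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server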

A type and a description can be attached to a metric by including TYPE and HELP lines in the push.

Passing a type and description

cat << EOF | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server
# TYPE batchjob1_user_counter counter
# HELP batchjob1_user_counter A metric from BatchJob1.
batchjob1_user_counter{job_id="123ABC"} 2
EOF

Passing types and descriptions (adding more metrics)

cat << EOF | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server
# TYPE batchjob1_avg_latency gauge
# HELP batchjob1_avg_latency Another metric from BatchJob1
batchjob1_avg_latency{job_id="123ABC"} 74.5
# TYPE batchjob1_sales_counter counter
# HELP batchjob1_sales_counter A third metric from BatchJob1
batchjob1_sales_counter{job_id="123ABC"} 1
EOF
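
When many metrics are pushed from a script, the TYPE, HELP, and sample lines can be generated rather than typed out. A minimal sketch follows, assuming an unlabeled metric; render_metric is a hypothetical helper name, not a Pushgateway tool.

```shell
# Print the TYPE line, HELP line, and one sample for a single unlabeled metric.
#   $1 = metric name   $2 = type (counter, gauge, ...)
#   $3 = help text     $4 = value
render_metric() {
  printf '# TYPE %s %s\n# HELP %s %s\n%s %s\n' "$1" "$2" "$1" "$3" "$1" "$4"
}
```

The output can be piped straight into the same curl command used above, for example: render_metric batchjob1_avg_latency gauge 'Another metric from BatchJob1' 74.5 | curl --data-binary @- http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server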

Pushgateway operations

Viewing metrics on Pushgateway

curl http://localhost:9091/metrics

Deleting all metrics in the group identified by job="batchjob1"

curl -X DELETE http://localhost:9091/metrics/job/batchjob1

Deleting only the metrics in the group job="batchjob1", instance="sidekiq_server"

curl -X DELETE http://localhost:9091/metrics/job/batchjob1/instance/sidekiq_server

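Usage

The PushGateway service is now running but is not yet connected to Prometheus. The goal is to upload custom monitoring data through PushGateway and have Prometheus collect it, so PushGateway has to be added as a Prometheus scrape target. Add the following to prometheus.yml:

# add under scrape_configs: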
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['127.0.0.1:9091']
        labels:
          instance: pushgateway

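A note on this: static_configs is used here because there is only one PushGateway. With several, consider one of Prometheus's other service discovery mechanisms so that targets are loaded dynamically. After updating the configuration, restart Prometheus and check the Targets page of the Prometheus UI to confirm that the target was added.

Pushing data to PushGateway through the API

echo "test_metric 123456" | curl --data-binary @- http://localhost:9091/metrics/job/test_job

Besides test_metric, two more metrics appear: push_time_seconds and push_failure_time_seconds, which PushGateway generates automatically. The metric can now be queried on the Graph page of the Prometheus UI.

One thing deserves attention: the query returns test_metric{exported_job="test_job",instance="pushgateway",job="pushgateway"}. The metric was pushed under the job name test_job, yet it shows up as exported_job="test_job" while job is "pushgateway". This is controlled by the honor_labels parameter in the Prometheus scrape configuration (default false); the result above is what you get without honor_labels: true. Pushing another data point shows the difference once honor_labels: true is in place.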

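With honor_labels set to false, Prometheus attaches its own labels when it scrapes the target: the job label takes the name of the scrape job (pushgateway here) and the instance label is filled with the target's host or IP address.
If honor_labels is true, Prometheus keeps the job and instance labels from Pushgateway. If it is false, Prometheus renames those labels by prefixing them with exported_ and attaches its own values under the original label names.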
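
The two cases can be written out as a toy model. This is only an illustration of the renaming rule described above, not Prometheus's own code; it assumes the push carried both a job and an instance label, and resolve_labels is a hypothetical name.

```shell
# Toy model of how honor_labels affects the labels of a series scraped
# from Pushgateway. Assumes both job and instance were pushed.
#   $1 = honor_labels (true or false)
#   $2 = job pushed to the gateway      $3 = instance pushed to the gateway
#   $4 = job_name of the scrape config  $5 = instance label of the scrape target
resolve_labels() {
  if [ "$1" = "true" ]; then
    # Labels from the scraped data are kept as they are.
    printf 'job="%s",instance="%s"\n' "$2" "$3"
  else
    # Prometheus's own labels win; the pushed ones get the exported_ prefix.
    printf 'job="%s",instance="%s",exported_job="%s",exported_instance="%s"\n' \
      "$4" "$5" "$2" "$3"
  fi
}
```

For instance, resolve_labels false test_job test_instance pushgateway pushgateway reproduces the exported_job pattern seen in the query result, while passing true keeps test_job and test_instance as job and instance.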

This time we push something a little more complex: several metrics in one request, with TYPE and HELP lines.

cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/test_job/instance/test_instance
# TYPE test_metrics counter
test_metrics{label="app1",name="demo"} 100.00
# TYPE another_test_metrics gauge
# HELP another_test_metrics Just an example.
another_test_metrics 123.45
EOF

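Uploading a file

So far the metric data has been pushed inline from the command line, which is fine for small amounts of data but awkward for larger payloads. The metrics can instead be written to a file whose contents are then submitted; PushGateway accepts them just the same.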

vim pgdata.txt
# TYPE http_request_total counter
# HELP http_request_total get interface request count with different code.
http_request_total{code="200",interface="/v1/save"} 276
http_request_total{code="404",interface="/v1/delete"} 0
http_request_total{code="500",interface="/v1/save"} 1
# TYPE http_request_time gauge
# HELP http_request_time get core interface http request time.
http_request_time{code="200",interface="/v1/core"} 0.122

curl -XPOST --data-binary @pgdata.txt http://localhost:9091/metrics/job/app/instance/app-172.30.0.0

cat << EOF | curl --data-binary @- http://localhost:9091/metrics/job/console_monistor/instance/kayou_hosting
# TYPE http_request_total counter
# HELP http_request_total get interface request count with different code.
http_request_total{code="200",interface="/v1/save"} 276
http_request_total{code="404",interface="/v1/delete"} 0
http_request_total{code="500",interface="/v1/save"} 1
EOF

Notes on using PushGateway

Metric values must be numeric; a non-numeric value is rejected with an error.

$ echo "test_metric 12.34.56ff" | curl --data-binary @- http://172.30.12.167:9091/metrics/job/test_job_1
text format parsing error in line 1: expected float as value, got "12.34.56ff"
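
Such errors can be caught before the push with a rough local check. The sketch below assumes the value is the last whitespace-separated field of each sample line, and it is deliberately stricter than the server's own parser; validate_metrics is a hypothetical helper name.

```shell
# Rough pre-push check: every sample line must end in a numeric value.
# Reads the files given as arguments, or standard input when there are none.
validate_metrics() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
    {
      v = $NF                           # assumed to be the sample value
      if (v !~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/ &&
          v !~ /^([+-]?Inf|NaN)$/) {
        printf "line %d: non-numeric value \"%s\"\n", NR, v
        bad = 1
      }
    }
    END { exit bad + 0 }                # non-zero exit if any line failed
  ' "$@"
}
```

A file such as pgdata.txt could then be checked with validate_metrics pgdata.txt before it is handed to curl.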

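Metric values are stored as 64-bit floats, so only about 16 to 17 significant digits are kept; the digits beyond that come back as 0.

$ echo "test_metric 1234567898765432123456789" | curl --data-binary @- http://172.30.12.167:9091/metrics/job/test_job_2
# value actually stored: test_metric{job="test_job_2"} 1234567898765432200000000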
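
For integer values, a conservative guard is to push only numbers of at most 15 decimal digits: every such integer is below 2^53 and is therefore represented exactly as a 64-bit float. The helper below is a sketch under that assumption; fits_float64_exactly is a hypothetical name, and longer integers are flagged even though some of them would also survive unchanged.

```shell
# Return 0 if $1 is an unsigned integer that a 64-bit float stores exactly.
# Conservative: anything longer than 15 digits is reported as unsafe.
fits_float64_exactly() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;   # not a plain unsigned integer; out of scope
  esac
  [ "${#1}" -le 15 ]
}
```

A script might call fits_float64_exactly "$value" || echo "value may lose precision" before pushing.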

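Persisting PushGateway data

By default PushGateway does not persist data, so everything is lost when it restarts or crashes. Starting it with the --persistence.file and --persistence.interval flags persists the data. --persistence.file names the local file in which pushed metrics are saved, and --persistence.interval sets how often that file is written; 5m means the file is written at most once every 5 minutes.

$ docker run -d -p 9091:9091 prom/pushgateway --persistence.file=pg_file --persistence.interval=5m

Push and scrape timing for PushGateway and Prometheus

On each scrape Prometheus does not get every value pushed during the scrape interval, only the most recent value pushed to PushGateway. It is therefore recommended to push at an interval less than or equal to the Prometheus scrape interval, so that every scrape picks up freshly pushed data.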
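
Because only the last pushed value is ever scraped, a job that has stopped pushing looks the same as a healthy one. The push_time_seconds metric mentioned earlier holds the Unix time of the last successful push for a group, so its age can serve as a freshness check. The following is a small sketch of that calculation; push_age_seconds is a hypothetical helper, and how the metric value is obtained is left to the caller.

```shell
# Print how many seconds have passed since the last push.
#   $1 = value of push_time_seconds for the group (may be in 1.6e+09 form)
#   $2 = current Unix time; defaults to the system clock
push_age_seconds() {
  awk -v pushed="$1" -v now="${2:-$(date +%s)}" \
    'BEGIN { printf "%d\n", now - pushed }'
}
```

An age well above the expected push interval suggests that the job is no longer pushing.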