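Preface
This post documents how to use Prometheus.
These are my study notes from the HiSKIO course "Prometheus + Grafana 監控和警報系統 從入門到進階" (Prometheus + Grafana monitoring and alerting system, from beginner to advanced).
Introduction
Prometheus is a system monitoring framework that can collect all kinds of metrics.
Because the Prometheus UI is fairly bare-bones, Prometheus can act as the data collector and hand its data to Grafana for visualization.
For Prometheus features, see Prometheus - official website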
Environment
Windows 10 Professional (22H2)
Docker Desktop
Docker Compose
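Metric types
Prometheus has four metric types
- Counter = a counter whose value only increases (it resets to zero when the process restarts)
- Gauge = unlike a Counter, a Gauge can go up or down over time
- Histogram = counts observations into value ranges (buckets) to show how the values are distributed
- Summary = a summary of observations that reports configurable quantiles (for example the median), along with their count and sum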
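To make the four types more concrete, here is a minimal pure-Python sketch of their semantics. The class names, the bucket bounds, and the naive quantile calculation are assumptions made for illustration only; this is not the API of any Prometheus client library, and real clients estimate quantiles with streaming algorithms.

```python
import bisect


class Counter:
    """Monotonic counter: the value can only go up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Gauge: the value can be set to anything, higher or lower."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations into cumulative buckets, plus a sum and a count."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)  # bucket upper bounds
        self.bucket_counts = [0] * len(self.bounds)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        self.count += 1
        self.sum += value
        # Every bucket whose upper bound is >= value includes this observation.
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.bounds)):
            self.bucket_counts[i] += 1


class Summary:
    """Keeps raw observations and answers quantile questions (naive version)."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    def quantile(self, q):
        if not self.observations:
            raise ValueError("no observations yet")
        ordered = sorted(self.observations)
        index = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[index]
```

Feeding the five ping values shown later in this post into `Histogram([10, 100, 500])` would give cumulative bucket counts of 1, 2 and 5, which is the kind of distribution view a Histogram is meant to provide.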
Configuration
The folder tree of the course example source code is as follows
.
├── docker-compose.yml
├── telegraf/
│   ├── telegraf.d
│   │   ├── amazon.conf
│   │   └── github.conf
│   └── telegraf.conf
└── prometheus/
    └── prometheus.yml
docker-compose.yml
version: "3.0"
services:
  telegraf:
    image: telegraf:1.16.0
    restart: always
    container_name: telegraf
    hostname: telegraf
    ports:
      - 9273:9273
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf
      - ./telegraf/telegraf.d/:/etc/telegraf/telegraf.d/
    command: telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d
  prometheus:
    image: prom/prometheus:latest
    restart: always
    container_name: prometheus
    hostname: prometheus
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command: --config.file=/etc/prometheus/prometheus.yml
For the Telegraf part, see w4560000 - Telegraf 學習筆記-1 (Telegraf study notes 1)
prometheus.yml
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: "telegraf"
    static_configs:
      - targets: ["telegraf:9273"]
scrape_interval
Scrape metrics once every five seconds
scrape_configs
Settings for the services whose metrics should be scraped
Steps
docker-compose up -d
# Check that the services are running
docker-compose ps
# Output
NAME         IMAGE                    COMMAND                  SERVICE      CREATED          STATUS          PORTS
prometheus   prom/prometheus:latest   "/bin/prometheus --c…"   prometheus   37 minutes ago   Up 37 minutes   0.0.0.0:9090->9090/tcp
telegraf     telegraf:1.16.0          "/entrypoint.sh tele…"   telegraf     37 minutes ago   Up 37 minutes   8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp
To view the Telegraf metrics, open http://localhost:9273/metrics
To view Prometheus, open http://localhost:9090/
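The /metrics page is plain text in the Prometheus exposition format, one sample per line. As a rough illustration of what Prometheus reads on each scrape, here is a minimal Python sketch that parses simple sample lines. The function name `parse_sample` and the `EXAMPLE` text are assumptions modeled on the query results later in this post, not captured output, and the parser ignores escaped characters inside label values.

```python
import re

# name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical excerpt of a /metrics response (assumption, for illustration).
EXAMPLE = """\
# TYPE ping_average_response_ms untyped
ping_average_response_ms{host="telegraf",service_name="github",url="github.com"} 38.205
ping_average_response_ms{host="telegraf",service_name="amazon",url="amazon.jp"} 5.177
"""


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for other lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or HELP/TYPE comment
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels, value, _timestamp = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


def parse_text(text):
    """Parse every sample line in an exposition-format document."""
    parsed = (parse_sample(line) for line in text.splitlines())
    return [sample for sample in parsed if sample is not None]
```

Calling `parse_text(EXAMPLE)` yields two samples, each carrying the metric name, a dict of labels, and a float value. These label sets are what the PromQL selectors in the following sections filter on.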
Basic PromQL queries
Current data
Query ping_average_response_ms
Shows the ping_average_response_ms metric data
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} | 38.205 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 312.184 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 221.355 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 267.097 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} | 5.177 |
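Besides the web UI, the same expression can be sent to the Prometheus HTTP API through the instant-query endpoint `/api/v1/query`. The following is a minimal sketch using only the Python standard library. The base URL assumes the 9090 port mapping from the docker-compose.yml above, and the function names `build_query_url` and `instant_query` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable through the port mapping in docker-compose.yml.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr, base_url=PROMETHEUS_URL):
    """Run an instant query and return the list found under data.result."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

With the containers running, `instant_query("ping_average_response_ms")` should return one entry per series, each with a `metric` dict of labels and a `value` pair of timestamp and value string.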
Operators:
- = equal
Query ping_average_response_ms{service_name="github"}
Shows the metric data whose service_name is github
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} | 38.476 |
- != not equal
Query ping_average_response_ms{service_name!="github"}
Shows the metric data whose service_name is not github
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 311.721 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 222.662 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 265.805 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} | 5.479 |
Query ping_average_response_ms{service_name!="github", url="amazon.cn"}
Shows the metric data whose service_name is not github and whose url is amazon.cn
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 311.875 |
- =~ matches regex
Query ping_average_response_ms{url=~"^amazon.*"}
Shows the metric data whose url matches the regular expression, that is, starts with amazon. PromQL regex matchers are fully anchored, so the leading ^ is optional.
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 312.087 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 222.339 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 264.802 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} | 5.676 |
- !~ does not match regex
Query ping_average_response_ms{url!~"^amazon.*"}
Shows the metric data whose url does not match the regular expression, that is, does not start with amazon
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} | 38.009 |
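The four label matchers above can be mimicked in a few lines of Python, which may help when reasoning about which series a selector keeps. This is only a sketch of the semantics: the `SERIES` list copies the `service_name` and `url` labels from the results above, the helper names `matches` and `select` are hypothetical, and Python's `re` module is not the same regex engine Prometheus uses (RE2), although simple patterns behave alike.

```python
import re

# Label sets taken from the query results above (only two labels kept).
SERIES = [
    {"service_name": "github", "url": "github.com"},
    {"service_name": "amazon", "url": "amazon.cn"},
    {"service_name": "amazon", "url": "amazon.com"},
    {"service_name": "amazon", "url": "amazon.de"},
    {"service_name": "amazon", "url": "amazon.jp"},
]


def matches(labels, name, op, pattern):
    """Apply one label matcher; a missing label behaves like an empty string."""
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        # fullmatch mirrors the fully anchored regex matching in PromQL
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown operator: {op}")


def select(series, *matchers):
    """Keep the series that satisfy every matcher (matchers are ANDed)."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]
```

For example, `select(SERIES, ("service_name", "!=", "github"), ("url", "=", "amazon.cn"))` keeps only the amazon.cn series, while `select(SERIES, ("url", "=~", "amazon"))` keeps nothing, because an anchored regex has to match the whole label value.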
Value comparison
Query ping_average_response_ms{service_name!="github"} > 100
Shows the metric data whose service_name is not github and whose value is greater than 100
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 312.05 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 222.711 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 265.618 |
Range queries
Query ping_average_response_ms{service_name!="github"}[30s]
Shows the samples recorded in the last 30 seconds, each followed by its Unix timestamp
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 311.996 @1699346946 315.853 @1699346955 311.564 @1699346965 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 223.442 @1699346955 223.2 @1699346965 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 274.674 @1699346955 272.112 @1699346965 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} | 8.695 @1699346955 5.744 @1699346965 |
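A range selector returns several timestamped samples per series instead of a single value, and functions such as `avg_over_time` then reduce each window to one number. The Python sketch below illustrates that idea with the amazon.cn samples from the output above. The helper names are hypothetical, and the window boundary handling is a simplification rather than a statement of how Prometheus treats the edges.

```python
# (unix timestamp, value) pairs for url="amazon.cn", copied from the output above.
SAMPLES = [
    (1699346946, 311.996),
    (1699346955, 315.853),
    (1699346965, 311.564),
]


def window(samples, end, seconds):
    """Keep the samples whose timestamp falls within `seconds` before `end`."""
    return [(t, v) for t, v in samples if end - seconds < t <= end]


def avg_over_window(samples, end, seconds):
    """Average the values inside the window, or None if the window is empty."""
    values = [v for _, v in window(samples, end, seconds)]
    return sum(values) / len(values) if values else None
```

With `end=1699346965`, a 30 second window keeps all three samples, while a 15 second window keeps only the last two, so the averaged result depends on the range chosen in the brackets.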
Aggregation queries
Query sum(ping_average_response_ms) by (service_name)
Shows the sum of ping_average_response_ms grouped by service_name
{service_name="github"} | 37.998 |
{service_name="amazon"} | 793.574 |
Query topk(3, ping_average_response_ms)
Shows the 3 series with the largest ping_average_response_ms values
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} | 312.218 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} | 272.3 |
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} | 221.275 |
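To see what these two aggregations compute, here is a small Python sketch that applies the same grouping and ranking to an in-memory list. The values are taken from the first ping_average_response_ms result in this post, so the sums differ slightly from the output above, which was captured at a later moment. The function names `sum_by` and `topk` simply mirror the PromQL operators and are not a real client API.

```python
import heapq
from collections import defaultdict

# (labels, value) pairs copied from the first query result in this post.
SERIES = [
    ({"service_name": "github", "url": "github.com"}, 38.205),
    ({"service_name": "amazon", "url": "amazon.cn"}, 312.184),
    ({"service_name": "amazon", "url": "amazon.com"}, 221.355),
    ({"service_name": "amazon", "url": "amazon.de"}, 267.097),
    ({"service_name": "amazon", "url": "amazon.jp"}, 5.177),
]


def sum_by(series, label):
    """Group series by one label and add up the values in each group."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


def topk(k, series):
    """Return the k series with the largest values, largest first."""
    return heapq.nlargest(k, series, key=lambda item: item[1])
```

Note that `sum_by` drops every label except the grouping label, which is why the PromQL result only shows `service_name`, whereas `topk` keeps the full label set of each selected series.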
For other functions, see Prometheus Functions
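References
HiSKIO course "Prometheus + Grafana 監控和警報系統 從入門到進階" (Prometheus + Grafana monitoring and alerting system, from beginner to advanced)
Prometheus official website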
Please credit the source when reposting. If anything is wrong or unclear, feel free to leave a comment or send an email to leozheng0621@gmail.com