Prometheus Study Notes, Part 1

  1. Foreword
  2. Introduction
  3. Environment
  4. Metric Types
  5. Configuration
  6. Procedure
  7. Basic PromQL Queries
  8. Value Comparison
  9. Range Queries
  10. Aggregation Queries
  11. References

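Foreword

This post records how to put Prometheus to use.

These are my notes from the HiSKIO course "Prometheus + Grafana 監控和警報系統 從入門到進階" (Prometheus + Grafana Monitoring and Alerting System: From Beginner to Advanced).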

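Introduction

Prometheus official website

Prometheus is a system monitoring framework that collects metrics of all kinds.
Because the built-in Prometheus UI is fairly bare-bones, Prometheus can serve as the data collector and feed its data to Grafana for visualization.
For Prometheus features, see the Prometheus official website.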

Environment

Windows 10 Professional (22H2)
Docker Desktop
Docker Compose

Course sample source code

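Metric Types

Prometheus has four metric types:

  1. Counter = a cumulative count whose value only increases (it returns to zero only when the process restarts)
  2. Gauge = unlike a Counter, a Gauge can go up or down over time
  3. Histogram = counts observations in configurable buckets to show how the values are distributed, and also tracks their sum and count
  4. Summary = reports configurable quantiles of the observed values (for example the median), along with their sum and count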
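To make the difference between the types concrete, here is a minimal Python sketch of Counter, Gauge, and Histogram semantics. The class names and methods are hypothetical and written only for this note; they are not the API of any Prometheus client library. Summary is left out because streaming quantile estimation is more involved than this sketch warrants.

```python
import bisect


class Counter:
    """Hypothetical counter: the value may only go up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Hypothetical gauge: the value can be set to anything at any time."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Hypothetical histogram with cumulative buckets keyed by upper bound."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: every bucket whose upper bound is
        # greater than or equal to the value is incremented.
        first = bisect.bisect_left(self.upper_bounds, value)
        for i in range(first, len(self.upper_bounds)):
            self.bucket_counts[i] += 1
```

Because the buckets are cumulative, the count in the last (+Inf) bucket always equals the total number of observations.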

Configuration

The course sample source code has the following folder structure:

.
├── docker-compose.yml
├── telegraf/
│   ├── telegraf.d
│   │   ├── amazon.conf
│   │   └── github.conf
│   └── telegraf.conf
└── prometheus/
    └── prometheus.yml

docker-compose.yml

version: "3.0"

services:
  telegraf:
    image: telegraf:1.16.0
    restart: always
    container_name: telegraf
    hostname: telegraf
    ports:
      - 9273:9273
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf
      - ./telegraf/telegraf.d/:/etc/telegraf/telegraf.d/
    command: telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d

  prometheus:
    image: prom/prometheus:latest
    restart: always
    container_name: prometheus
    hostname: prometheus
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command: --config.file=/etc/prometheus/prometheus.yml

For the Telegraf part, see w4560000 - Telegraf Study Notes, Part 1

prometheus.yml

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "telegraf"
    static_configs:
      - targets: ["telegraf:9273"]

scrape_interval: scrape metrics once every five seconds
scrape_configs: the services whose metrics Prometheus pulls

Procedure

docker-compose up -d

# Check the services
docker-compose ps

# Output
NAME                IMAGE                    COMMAND                  SERVICE             CREATED             STATUS              PORTS
prometheus          prom/prometheus:latest   "/bin/prometheus --c…"   prometheus          37 minutes ago      Up 37 minutes       0.0.0.0:9090->9090/tcp
telegraf            telegraf:1.16.0          "/entrypoint.sh tele…"   telegraf            37 minutes ago      Up 37 minutes       8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp 

To view the Telegraf metrics, open http://localhost:9273/metrics
To view Prometheus, open http://localhost:9090/
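The /metrics page is plain text in the Prometheus exposition format, one sample per line. As a rough sketch of how such a line breaks down into a metric name, labels, and a value, the hypothetical helper parse_sample below handles the common case; it does not unescape backslash sequences inside label values and is not a full parser for the format.

```python
import re

# name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value), or None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # "# HELP" and "# TYPE" lines carry metadata, not samples
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

For example, parse_sample('ping_average_response_ms{host="telegraf",url="github.com"} 38.205') yields the metric name, a dict with the two labels, and the float 38.205.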

Basic PromQL Queries

Current data
Query ping_average_response_ms to show the data for the ping_average_response_ms metric:

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} 38.205
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 312.184
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} 221.355
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} 267.097
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} 5.177

Operators:

  1. = equal
    Query ping_average_response_ms{service_name="github"} to show the series whose service_name is github

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} 38.476

  2. != not equal
    Query ping_average_response_ms{service_name!="github"} to show the series whose service_name is not github

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 311.721
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} 222.662
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} 265.805
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} 5.479

    Query ping_average_response_ms{service_name!="github", url="amazon.cn"} to show the series whose service_name is not github and whose url is amazon.cn

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 311.875

  3. =~ matches regex
    Query ping_average_response_ms{url=~"^amazon.*"} to show the series whose url matches the regular expression, that is, begins with amazon. Prometheus anchors regex matchers at both ends, so the leading ^ is optional.

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 312.087
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} 222.339
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} 264.802
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"} 5.676

  4. !~ does not match regex
    Query ping_average_response_ms{url!~"^amazon.*"} to show the series whose url does not match the regular expression, that is, does not begin with amazon

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="1", service_name="github", url="github.com"} 38.009
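The four matchers can be mimicked in a few lines of Python, which may help when reasoning about what a selector will return. This is only a sketch: the helpers matches and select are hypothetical, only the service_name and url labels are modelled, and Python's re module stands in for the RE2 syntax that Prometheus uses. Regex matchers are fully anchored in Prometheus, hence re.fullmatch.

```python
import re

# (labels, value) pairs, taken from the "current data" snapshot in this section.
SERIES = [
    ({"service_name": "github", "url": "github.com"}, 38.205),
    ({"service_name": "amazon", "url": "amazon.cn"}, 312.184),
    ({"service_name": "amazon", "url": "amazon.com"}, 221.355),
    ({"service_name": "amazon", "url": "amazon.de"}, 267.097),
    ({"service_name": "amazon", "url": "amazon.jp"}, 5.177),
]


def matches(labels, name, op, pattern):
    """Apply one label matcher; a missing label counts as the empty string."""
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown operator: {op}")


def select(series, *matchers):
    """Keep the series that satisfy every (name, op, pattern) matcher."""
    return [
        (labels, value)
        for labels, value in series
        if all(matches(labels, *m) for m in matchers)
    ]
```

For instance, select(SERIES, ("service_name", "!=", "github"), ("url", "=", "amazon.cn")) keeps only the amazon.cn series, mirroring the combined query in this section.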

Value Comparison

Query ping_average_response_ms{service_name!="github"} > 100 to show the series whose value is greater than 100

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 312.05
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} 222.711
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} 265.618
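By default a comparison in PromQL acts as a filter: series that fail the test are dropped from the result. With the bool modifier (for example > bool 100) every series is kept and its value becomes 1 or 0 instead. A small Python sketch of both behaviours follows; the function compare_gt is hypothetical and the values come from the snapshot in the basic query section.

```python
# (labels, value) pairs for the series whose service_name is not github.
AMAZON_SERIES = [
    ({"url": "amazon.cn"}, 312.184),
    ({"url": "amazon.com"}, 221.355),
    ({"url": "amazon.de"}, 267.097),
    ({"url": "amazon.jp"}, 5.177),
]


def compare_gt(series, threshold, as_bool=False):
    """Mimic `series > threshold`, or `series > bool threshold` if as_bool."""
    result = []
    for labels, value in series:
        if as_bool:
            # bool mode: keep every series, replace the value with 1 or 0
            result.append((labels, 1.0 if value > threshold else 0.0))
        elif value > threshold:
            # filter mode: keep only passing series, with the original value
            result.append((labels, value))
    return result
```

Filter mode suits the expression browser, while the 0/1 form is handy when the result feeds further arithmetic.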

Range Queries

Query ping_average_response_ms{service_name!="github"}[30s] to show the samples recorded within the last 30 seconds. Each sample is shown as value @timestamp.

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"}
    311.996 @1699346946
    315.853 @1699346955
    311.564 @1699346965
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"}
    223.442 @1699346955
    223.2 @1699346965
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"}
    274.674 @1699346955
    272.112 @1699346965
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.jp"}
    8.695 @1699346955
    5.744 @1699346965
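A range selector returns, for each series, the samples whose timestamps fall inside the window that ends at the evaluation time. The Python sketch below reproduces that idea on the amazon.cn samples shown above. The evaluation time 1699346970 is an assumption made for illustration, the helper names are hypothetical, and the boundary handling (start excluded, end included) is likewise an assumption that may differ between Prometheus versions.

```python
# (timestamp, value) samples for url="amazon.cn", copied from the output above.
AMAZON_CN = [
    (1699346946, 311.996),
    (1699346955, 315.853),
    (1699346965, 311.564),
]


def range_select(samples, eval_time, window_seconds):
    """Return the samples with eval_time - window < timestamp <= eval_time."""
    start = eval_time - window_seconds
    return [(ts, value) for ts, value in samples if start < ts <= eval_time]


def average(samples):
    """Mean of the sample values, similar in spirit to avg_over_time()."""
    return sum(value for _, value in samples) / len(samples)


EVAL_TIME = 1699346970  # assumed evaluation time, not taken from the source

last_30s = range_select(AMAZON_CN, EVAL_TIME, 30)  # all three samples
last_20s = range_select(AMAZON_CN, EVAL_TIME, 20)  # only the two newest
```

In practice a range vector is rarely viewed raw; it is usually passed to a function such as avg_over_time or rate, which collapses each window into a single value.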

Aggregation Queries

Query sum(ping_average_response_ms) by (service_name) to group the series by service_name and sum ping_average_response_ms within each group

{service_name="github"} 37.998
{service_name="amazon"} 793.574

Query topk(3, ping_average_response_ms) to show the 3 series with the largest ping_average_response_ms values

ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.cn"} 312.218
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.de"} 272.3
ping_average_response_ms{environment="LeoTest", host="telegraf", instance="telegraf:9273", job="telegraf", pingVersion="2", service_name="amazon", url="amazon.com"} 221.275

For other functions, see Prometheus Functions
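The two aggregations above can be sketched in Python to show what they compute. The helpers sum_by and topk are hypothetical stand-ins, and the input values come from the snapshot in the basic query section, so the totals differ slightly from the query output above, which was taken from a later scrape.

```python
import heapq
from collections import defaultdict

# (labels, value) pairs from the "current data" snapshot.
SNAPSHOT = [
    ({"service_name": "github", "url": "github.com"}, 38.205),
    ({"service_name": "amazon", "url": "amazon.cn"}, 312.184),
    ({"service_name": "amazon", "url": "amazon.com"}, 221.355),
    ({"service_name": "amazon", "url": "amazon.de"}, 267.097),
    ({"service_name": "amazon", "url": "amazon.jp"}, 5.177),
]


def sum_by(series, label):
    """Mimic sum(...) by (label): one total per distinct label value."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


def topk(series, k):
    """Mimic topk(k, ...): the k series with the largest values, descending."""
    return heapq.nlargest(k, series, key=lambda item: item[1])
```

Note that sum by keeps only the grouping label in its result, while topk returns the original series with all of their labels intact.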

References

HiSKIO course: Prometheus + Grafana 監控和警報系統 從入門到進階
Prometheus official website


Please credit the source when reposting. If anything is wrong or unclear, feel free to leave a comment below or email leozheng0621@gmail.com