Telegraf 學習筆記-1

  1. 前言
  2. 介紹
  3. 作業環境
  4. 設定說明
  5. 操作流程
  6. 新增 cpu input plugin
  7. 添加全局 Lable
  8. 針對特定 input plugin 添加靜態標籤
  9. 添加動態標籤
  10. 參考文件

前言

本篇記錄如何應用 Telegraf

紀錄學習 HiSKIO 課程 Prometheus + Grafana 監控和警報系統 從入門到進階

介紹

Telegraf 官網

Telegraf 是用 GO 開發的一個開源專案,主要功能用來協助收集統計資料的一個中繼程式

GitHub 上有大量的Telegraf Plugins 可供使用

資料流程為

  1. Input = 資料來源,Telegraf 可以自己起一個監聽服務讓外界呼叫來協助轉拋資料,也可以主動去拉取其他服務的資料回來
  2. Process = 取得資料後,可對資料進一步處理或者過濾
  3. Aggregate = 接著可透過資料建立客製化的 metrics
  4. Output = 輸出 metrics 資料,可讓其他數據庫服務來取得,或者推送到其他資料庫

作業環境

Windows 10 Professional (22H2)
Docker Desktop
Docker Compose

課程範例原始碼

設定說明

課程範例原始碼資料夾樹狀結構如下

.
├── docker-compose.yml
├── telegraf/
│   ├── telegraf.d
│   │   ├── amazon.conf
│   │   └── github.conf
│   └── telegraf.conf
├── prometheus/
│   └── prometheus.yml

docker-compose.yml

version: "3.0"

services:
  telegraf:
    image: telegraf:1.16.0
    restart: always
    container_name: telegraf
    hostname: telegraf
    ports:
      - 9273:9273
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf
    command: telegraf --config /etc/telegraf/telegraf.conf

telegraf.conf

[agent]
# https://docs.influxdata.com/telegraf/v1.16/administration/configuration/#agent-configuration
interval = "30s"
flush_interval = "10s"

[[inputs.ping]]
# https://github.com/influxdata/telegraf/blob/master/plugins/inputs/ping/README.md
urls = ["github.com", "amazon.com"]
method = "exec"
count = 3

[[outputs.prometheus_client]]
# https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/README.md
## Address to listen on.
listen = ":9273"
metric_version = 2

配置 telegraf 設定檔,至少需要這三部分 agent、inputs 的 plugins、output 的 plugins

agent
interval = 每隔多長時間蒐集一次資料
flush_interval = 每隔多長時間將彙整的資料更新到 output

input.ping = input 的一種 plugin,用來 ping 確認其他服務是否有異常
urls = 要 ping 的目標 domain
method = 參數方法 (exec (透過主機系統原生的 ping 來呼叫) or native (plugin 直接呼叫))
count = 發送次數,若 ping_interval = 1,則代表 1 秒呼叫 3 次

outputs.prometheus_client = output 的一種 plugin,prometheus
listen = 會在本機建立被動的 Port 號,等待 prometheus 來取得
metric_version = metric 版本

操作流程

docker-compose up -d

# 確認一下服務
docker-compose ps

# 輸出
NAME                IMAGE               COMMAND                  SERVICE             CREATED             STATUS              PORTS
telegraf            telegraf:1.16.0     "/entrypoint.sh tele…"   telegraf            About an hour ago   Up About an hour    8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp

查看 metrics,開啟 http://localhost:9273/metrics

可以發現蒐集到的 metrics 資料,ex: 平均反應時間

ping_average_response_ms{host="telegraf",url="amazon.com"} 192.085
ping_average_response_ms{host="telegraf",url="github.com"} 56.701

ping_average_response_ms = metrics 名稱
{host=”telegraf”,url=”amazon.com”} = Label,這邊有 host、url 兩個 Label
192.085 = 數值

新增 cpu input plugin

github - telegraf plugins input.cpu

修改 telegraf.conf

[agent]
# https://docs.influxdata.com/telegraf/v1.16/administration/configuration/#agent-configuration
interval = "30s"
flush_interval = "10s"

[[inputs.ping]]
# https://github.com/influxdata/telegraf/blob/master/plugins/inputs/ping/README.md
urls = ["github.com", "amazon.com"]
method = "exec"
count = 3

+[[inputs.cpu]]
+## Whether to report per-cpu stats or not
+percpu = true
+## Whether to report total system cpu stats or not
+totalcpu = true
+## If true, collect raw CPU time metrics
+collect_cpu_time = false
+## If true, compute and report the sum of all non-idle CPU states
+report_active = false
+## If true and the info is available then add core_id and physical_id tags
+core_tags = false

[[outputs.prometheus_client]]
# https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/README.md
## Address to listen on.
listen = ":9273"
metric_version = 2

重啟 docker-compose 服務

docker compose restart

即可看到 除了 ping 的 metrics 也已經有 cpu 的 metrics 了

# TYPE cpu_usage_guest gauge
cpu_usage_guest{cpu="cpu-total",host="telegraf"} 0
cpu_usage_guest{cpu="cpu0",host="telegraf"} 0

添加全局 Lable

修改 telegraf.conf

[agent]
# https://docs.influxdata.com/telegraf/v1.16/administration/configuration/#agent-configuration
interval = "30s"
flush_interval = "10s"

+[global_tags]
+environment = "LeoTest"

[[inputs.ping]]
# https://github.com/influxdata/telegraf/blob/master/plugins/inputs/ping/README.md
urls = ["github.com", "amazon.com"]
method = "exec"
count = 3

[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states
report_active = false
## If true and the info is available then add core_id and physical_id tags
core_tags = false

[[outputs.prometheus_client]]
# https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/README.md
## Address to listen on.
listen = ":9273"
metric_version = 2

重啟 docker-compose 服務,再次查看 metrics
可以看到已經添加了 全局 environment 的 Label

ping_average_response_ms{environment="LeoTest",host="telegraf",url="amazon.com"} 191.99
ping_average_response_ms{environment="LeoTest",host="telegraf",url="github.com"} 54.649

針對特定 input plugin 添加靜態標籤

修改 telegraf.conf

[agent]
# https://docs.influxdata.com/telegraf/v1.16/administration/configuration/#agent-configuration
interval = "30s"
flush_interval = "10s"

[global_tags]
environment = "LeoTest"

[[inputs.ping]]
# https://github.com/influxdata/telegraf/blob/master/plugins/inputs/ping/README.md
urls = ["github.com", "amazon.com"]
method = "exec"
count = 3

+[inputs.ping.tags]
+pingVersion = '1'

[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states
report_active = false
## If true and the info is available then add core_id and physical_id tags
core_tags = false

+[inputs.cpu.tags]
+cpuVersion = '2'

[[outputs.prometheus_client]]
# https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/README.md
## Address to listen on.
listen = ":9273"
metric_version = 2

重啟 docker-compose 服務,再次查看 metrics
可以看到已經添加了 pingVersion、cpuVersion 的 Label

ping_average_response_ms{environment="LeoTest",host="telegraf",pingVersion="1",url="amazon.com"} 192.008
ping_average_response_ms{environment="LeoTest",host="telegraf",pingVersion="1",url="github.com"} 56.853
cpu_usage_guest{cpu="cpu-total",cpuVersion="2",environment="LeoTest",host="telegraf"} 0
cpu_usage_guest{cpu="cpu0",cpuVersion="2",environment="LeoTest",host="telegraf"} 0

添加動態標籤

使用 Telegraf Processor
透過使用 Processor Regex Plugin 來做範例

透過 Regex 解析 url 內容,將 .com 之前的字段建立新的 Lable 來存
ping_average_response_ms{environment=”LeoTest”,host=”telegraf”,url=”amazon.com”} 191.99

修改 telegraf.conf

[agent]
# https://docs.influxdata.com/telegraf/v1.16/administration/configuration/#agent-configuration
interval = "30s"
flush_interval = "10s"

[global_tags]
environment = "LeoTest"

[[inputs.ping]]
# https://github.com/influxdata/telegraf/blob/master/plugins/inputs/ping/README.md
urls = ["github.com", "amazon.com"]
method = "exec"
count = 3

+[[processors.regex]]
+  namepass = ["ping"] # 只處理 metrics 名稱為 ping 的 metrics
+
+  [[processors.regex.tags]]
+    key = "url"
+    pattern = "^([^.]+).*$"
+    replacement = "${1}"
+    result_key = "service_name"

[inputs.ping.tags]
pingVersion = '1'

[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states
report_active = false
## If true and the info is available then add core_id and physical_id tags
core_tags = false

[inputs.cpu.tags]
cpuVersion = '2'

[[outputs.prometheus_client]]
# https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/README.md
## Address to listen on.
listen = ":9273"
metric_version = 2

重啟 docker-compose 服務,再次查看 metrics
可以看到已經添加了 pingVersion、cpuVersion 的 Label

ping_average_response_ms{environment="LeoTest",host="telegraf",pingVersion="1",url="amazon.com"} 192.008
ping_average_response_ms{environment="LeoTest",host="telegraf",pingVersion="1",url="github.com"} 56.853
cpu_usage_guest{cpu="cpu-total",cpuVersion="2",environment="LeoTest",host="telegraf"} 0
cpu_usage_guest{cpu="cpu0",cpuVersion="2",environment="LeoTest",host="telegraf"} 0

參考文件

HiSKIO 課程 Prometheus + Grafana 監控和警報系統 從入門到進階
Telegraf 官網
Telegraf Github


轉載請註明來源,若有任何錯誤或表達不清楚的地方,歡迎在下方評論區留言,也可以來信至 leozheng0621@gmail.com
如果文章對您有幫助,歡迎斗內(donate),請我喝杯咖啡

斗內💰

×

歡迎斗內

github