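Introduction
process-exporter is used for process-level monitoring, for example how many processes a service is running and how much CPU, memory and other resources they consume.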
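Installation / Usage
Download the program, place it under /usr/local/bin and make it executable. The systemd unit below expects the file to be named /usr/local/bin/process_exporter.
Project: https://github.com/ncabatoff/process-exporter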
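Configuration
The following template variables can be used in the name field of process_names entries in the config file (optional):
Using PID or StartTime is not recommended. The results are rarely what you expect, and because these variables split groups down to individual processes (with PID, every group holds exactly one process), they can produce high-cardinality metrics.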
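```
{{.Comm}}       Basename of the original executable, i.e. the 2nd field of /proc/<pid>/stat, truncated to 15 characters
{{.ExeBase}}    Basename of the executable
{{.ExeFull}}    Fully qualified path of the executable
{{.Username}}   Username of the effective user
{{.Matches}}    Map of all matches produced by applying the cmdline regular expressions
{{.PID}}        PID of the process. Note that using PID means the group will contain only one process
{{.StartTime}}  Start time of the process. Useful in combination with PID, because PIDs are reused over time
{{.Cgroups}}    Cgroups of the process (/proc/self/cgroup), if supported. Particularly useful for identifying which container a process belongs to
```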
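The systemd unit in the next step points --config.path at /usr/local/process_exporter/process-conf.yaml, but the contents of that file are not shown here. The following is a minimal sketch of it, assuming the process_names format described in the project README, where a process is counted under the first entry it matches. The executable name myservice and the -jar regex are hypothetical placeholders for your own services.

```yaml
# /usr/local/process_exporter/process-conf.yaml (sketch)
process_names:
  # Hypothetical: processes whose executable is named "myservice",
  # one group per effective user, e.g. "myservice:app".
  - name: "{{.ExeBase}}:{{.Username}}"
    exe:
      - myservice

  # Hypothetical: Java processes started with "-jar <file>".
  # The named capture group "app" ends up in .Matches; printing the whole
  # map yields a group name such as "map[app:demo.jar]".
  - name: "{{.Matches}}"
    cmdline:
      - '-jar\s+(?P<app>\S+)'

  # Catch-all: every remaining process is grouped by its comm name.
  # Keep it last, because the first matching entry wins.
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
```

With {{.Matches}} the group name is the printed Go map, which is presumably why the exit alert further down filters on groupname=~"^map.*". To get a cleaner name, reference a single key instead, e.g. {{.Matches.app}}.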
Configure the systemd unit
vim /usr/lib/systemd/system/process_exporter.service
```ini
[Unit]
Description=process_exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/process_exporter \
  --config.path=/usr/local/process_exporter/process-conf.yaml
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target
```
Start process_exporter
```shell
systemctl daemon-reload
systemctl start process_exporter
systemctl enable process_exporter
```
Verify the metrics
```shell
curl http://localhost:9256/metrics
```
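Scraping with the kube-prom operator
Service & Endpoints & ServiceMonitor
process_exporter runs on hosts outside the cluster, so the Service below has no selector. Its Endpoints object, which must have the same name as the Service, lists the host IPs by hand, and the ServiceMonitor selects the Service through its app label.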
```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-process-exporter
  namespace: cattle-monitoring-system
  labels:
    app: external-process-exporter
    app.kubernetes.io/name: process-exporter
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 9256
      protocol: TCP
      targetPort: 9256
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-process-exporter
  namespace: cattle-monitoring-system
  labels:
    app: external-process-exporter
    app.kubernetes.io/name: process-exporter
subsets:
  - addresses:
      # IPs of the hosts running process_exporter
      - ip: xxx1
      - ip: xxx2
    ports:
      - name: metrics
        port: 9256
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-process-exporter
  namespace: cattle-monitoring-system
  labels:
    app: external-process-exporter
    release: prometheus
spec:
  selector:
    matchLabels:
      app: external-process-exporter
  namespaceSelector:
    matchNames:
      - cattle-monitoring-system
  endpoints:
    - port: metrics
      interval: 1m
      path: /metrics
      scheme: http
```
Common alerting rules
```yaml
- alert: ProcessAlert
  expr: sum(namedprocess_namegroup_states) by (cluster,job,instance) > 500
  for: 20s
  labels:
    severity: warning
  annotations:
    value: "The server currently has {{ $value }} processes, above the alert threshold"

- alert: ProcessAlert
  expr: sum by(cluster, job, instance, groupname) (namedprocess_namegroup_states{state="Zombie"}) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    value: "{{ $value }} zombie processes currently exist"

- alert: ProcessRestartAlert
  expr: ceil(time() - max by(cluster, job, instance, groupname) (namedprocess_namegroup_oldest_start_time_seconds)) < 60
  for: 25s
  labels:
    label: alert_once
    severity: warning
  annotations:
    value: "Process {{ $labels.groupname }} restarted {{ $value }} seconds ago"

- alert: ProcessExitAlert
  expr: up{export="process_exporter"} == 0 or max by(cluster, job, instance, groupname) (delta(namedprocess_namegroup_oldest_start_time_seconds{groupname=~"^map.*"}[10d])) < 0
  for: 55s
  labels:
    severity: warning
  annotations:
    value: "Process {{ $labels.export}} has exited"
```
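The rules above are bare rule entries. For the Prometheus Operator to load them, they have to be placed inside a PrometheusRule resource. The following is a minimal sketch that wraps one of them. It assumes that the Prometheus resource's ruleSelector matches release: prometheus, mirroring the label on the ServiceMonitor above. The resource name and rule group name are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Hypothetical name; choose your own.
  name: external-process-exporter-rules
  namespace: cattle-monitoring-system
  labels:
    # Assumption: the Prometheus resource's ruleSelector matches this label,
    # mirroring the release: prometheus label on the ServiceMonitor.
    release: prometheus
spec:
  groups:
    # Hypothetical rule group name.
    - name: process-exporter
      rules:
        - alert: ProcessAlert
          expr: sum by(cluster, job, instance, groupname) (namedprocess_namegroup_states{state="Zombie"}) > 0
          for: 1m
          labels:
            severity: warning
          annotations:
            value: "{{ $value }} zombie processes currently exist"
        # Append the remaining rules from the list above here, at the same indentation.
```

Apply it with kubectl apply -f on the saved file. If no alerts show up, check that the labels on the PrometheusRule actually match the ruleSelector of your Prometheus resource.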
Grafana dashboard ID: 22161