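A Hands-On Guide to WeChat and Email Alerting with Alertmanager
Preface
Alertmanager is an open-source alerting solution. This article explains in detail how to use Alertmanager to send alerts by email and to WeCom (Enterprise WeChat, 企业微信).
Overall architecture
Monitoring data is pushed to Prometheus (in this setup through Pushgateway), and alerting rules are configured on the Prometheus side. When a monitored metric crosses the threshold of a rule, Prometheus sends the alert to Alertmanager. Alertmanager can then deliver the alert to several destinations, such as email or WeChat. The overall alerting architecture is shown below: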

The sections below walk through the implementation.
Installing Alertmanager
Download Alertmanager, then extract and install it:
mkdir -p /usr/local/prometheus
tar -xf alertmanager-0.25.0.linux-amd64.tar.gz
mv alertmanager-0.25.0.linux-amd64 /usr/local/prometheus/alertmanager
Checking the configuration
[root@evm-c7j1a8ape0h3uq7dl640 alertmanager]# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 1 templates
SUCCESS
Starting with systemd
/usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=forking
Restart=on-failure
RestartSec=5
ExecStart=/usr/local/prometheus/alertmanager/start.sh
ExecStop=/usr/local/prometheus/alertmanager/stop.sh
[Install]
WantedBy=multi-user.target
start.sh
#!/bin/bash
nohup /usr/local/prometheus/alertmanager/alertmanager --config.file='/usr/local/prometheus/alertmanager/alertmanager.yml' --cluster.advertise-address=0.0.0.0:9983 --web.listen-address=:9983 --log.level=debug > /usr/local/prometheus/alertmanager/alertmanager.log 2>&1 &
stop.sh
#!/bin/bash
# Match the full command line of the Alertmanager binary so that this script does not match itself.
pkill -f '/usr/local/prometheus/alertmanager/alertmanager --config.file'
Starting the service
systemctl daemon-reload
systemctl restart alertmanager
Accessing Alertmanager in the browser
Web UI: localhost:9983

The alerts can also be viewed in Prometheus:

Implementing email alerts
alertmanager.yml configuration
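global:
  resolve_timeout: 1m # an alert without an end time is treated as resolved if it is not updated within 1 minute
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'xxx@qq.com'
  smtp_auth_username: 'xxx@qq.com'
  smtp_auth_password: 'xxx' # authorization code
  smtp_require_tls: false
route:
  receiver: 'mail'
  group_by: ['type','alertname'] # alert labels; alerts with the same type + alertname are merged into one notification
  group_wait: 10s # delay before the first notification for a new group
  group_interval: 10s # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 10s # wait before re-sending a notification for alerts that are still firing
receivers:
- name: 'mail'
  email_configs:
  - to: 'xxx@qq.com'
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    message: '{{ template "wechat.default.message" . }}'
    to_party: '2'
    agent_id: '1000002'
    api_secret: 'xxx'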
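To make the effect of group_by more concrete, here is a rough sketch in Go of how alerts sharing the same type and alertname end up in one bucket. It only illustrates the idea and is not Alertmanager's actual implementation; the helpers groupKey and groupAlerts and the host values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// groupKey builds a key from the values of the group_by labels, in order.
func groupKey(labels map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+labels[name])
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts so that each bucket stands for one notification.
func groupAlerts(alerts []map[string]string, groupBy []string) map[string][]map[string]string {
	groups := make(map[string][]map[string]string)
	for _, labels := range alerts {
		key := groupKey(labels, groupBy)
		groups[key] = append(groups[key], labels)
	}
	return groups
}

func main() {
	// Hypothetical alerts: two HAProxy hosts and one alert of another type.
	alerts := []map[string]string{
		{"alertname": "haproxy_status_test", "type": "haproxy", "host": "host-a"},
		{"alertname": "haproxy_status_test", "type": "haproxy", "host": "host-b"},
		{"alertname": "haproxy_status_test", "type": "other", "host": "host-c"},
	}
	for key, members := range groupAlerts(alerts, []string{"type", "alertname"}) {
		fmt.Printf("%s -> %d alert(s)\n", key, len(members))
	}
}
```

In this sketch the two haproxy alerts share one bucket, which corresponds to a single email listing both hosts, while the third alert forms its own group.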
Email alerting requires an authorization code for the mailbox. It can be obtained as follows:

Prometheus configuration
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9983 # ip:port of Alertmanager
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alert_rules.yml" # alerting rules
alert_rules.yml
groups:
- name: node
  rules:
  - alert: haproxy_status_test
    expr: haproxy_up{exported_job="xxx", type="haproxy"} == 0
    #expr: vector(1)
    for: 5s
    annotations:
      summary: "haproxy {{ $labels.host }} is down"
      # $labels.host is the host label in the result of the PromQL expression above
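Annotation templates such as the summary above are expanded with Go's text/template syntax. The sketch below approximates that expansion outside Prometheus by binding a $labels variable before the template body. It is an illustration under that assumption, not Prometheus's own code; renderSummary and the host value web-01 are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderSummary expands an annotation template against a set of alert labels.
func renderSummary(tmpl string, labels map[string]string) (string, error) {
	// Bind $labels first so that "{{ $labels.host }}" resolves inside tmpl.
	t, err := template.New("summary").Parse("{{ $labels := .Labels }}" + tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	data := struct{ Labels map[string]string }{Labels: labels}
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// "web-01" is a made-up host label value.
	out, err := renderSummary("haproxy {{ $labels.host }} is down", map[string]string{"host": "web-01"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out) // prints "haproxy web-01 is down"
}
```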
Restart Prometheus and Alertmanager, then stop one HAProxy instance. An alert email like the following should arrive:

Implementing WeCom alerts with a webhook
Configure the web.hook receiver in alertmanager.yml:
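route:
  receiver: 'web.hook' # the receiver to use
  group_by: ['type','alertname']
  group_wait: 10s # delay before the first notification for a new group
  group_interval: 300s # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 300s # wait before re-sending a notification for alerts that are still firing
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:8080/webhook'
    #http_config:
    #method: post
    #headers:
    # Content-Type: application/json
    send_resolved: true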
Once this is configured, every alert is sent to the configured endpoint URL. The payload looks roughly like this:
{
  "receiver": "web.hook",
  "status": "firing",
  "alerts": [{
    "status": "firing",
    "labels": {
      "alertname": "haproxy_status",
      "exported_job": "xxx",
      "host": "xxx",
      "instance": "xxx:9234",
      "job": "pushgateway",
      "type": "haproxy"
    },
    "annotations": {
      "summary": "haproxy is down, host: xxx"
    },
    "startsAt": "2023-03-21T08:18:32.202Z",
    "endsAt": "0001-01-01T00:00:00Z",
    "fingerprint": "aaae1485b1cbafdb"
  }],
  "groupLabels": {
    "alertname": "haproxy_status",
    "type": "haproxy"
  },
  "commonLabels": {
    "alertname": "haproxy_status",
    "exported_job": "xxx",
    "host": "xxx",
    "instance": "xxx:9234",
    "job": "pushgateway",
    "type": "haproxy"
  },
  "commonAnnotations": {
    "summary": "haproxy is down, host: xxx"
  },
  "externalURL": "xxx",
  "version": "4",
  "groupKey": "{}:{alertname=\"haproxy_status\", type=\"haproxy\"}",
  "truncatedAlerts": 0
}
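Next, write a Go program that listens on localhost:8080/webhook, parses the request body, and forwards a message to the webhook of a WeCom bot, so that the bot receives the alert. Sample code: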
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/gin-gonic/gin"
	model "github.com/yunlzheng/alertmanaer-dingtalk-webhook/model"
)

func main() {
	// targetUrl is issued when the WeCom bot is created.
	targetUrl := "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx"
	router := gin.Default()
	router.POST("/webhook", func(c *gin.Context) {
		var notification model.Notification
		// ShouldBindJSON leaves writing the error response to this handler.
		if err := c.ShouldBindJSON(&notification); err != nil {
			fmt.Printf("BindJSON fail, err=%v\n", err)
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}
		// Collect the summary annotation of every alert, skipping empty ones.
		alertSummary := ""
		for _, alert := range notification.Alerts {
			annotations := alert.Annotations
			if annotations["summary"] == "" {
				continue
			}
			alertSummary += "[" + annotations["summary"] + "]\r\n"
		}
		if alertSummary == "" {
			c.JSON(http.StatusOK, gin.H{"message": "no summary to forward"})
			return
		}
		// Marshal the message so that quotes and line breaks in the summary are escaped.
		msgContent, err := json.Marshal(gin.H{
			"msgtype": "text",
			"text":    gin.H{"content": alertSummary},
		})
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		fmt.Println(string(msgContent))
		req, err := http.NewRequest("POST", targetUrl, bytes.NewReader(msgContent))
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		req.Header.Add("Content-Type", "application/json")
		response, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("forwarding to WeCom failed:", err)
			c.JSON(http.StatusBadGateway, gin.H{"error": err.Error()})
			return
		}
		defer response.Body.Close()
		fmt.Println("WeCom responded with", response.Status)
		c.JSON(http.StatusOK, gin.H{"message": "successfully received alert notification message!"})
	})
	// Listen on :8080 to match the url in webhook_configs.
	router.Run(":8080")
}
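Because the receiver is configured with send_resolved: true, the same endpoint also receives notifications whose alerts carry the status resolved. A possible refinement, sketched here under the assumption that each alert's status and summary have already been extracted, is to prefix every line with its state so recoveries are easy to tell apart from new alerts. The alertLine type and the functions formatAlert and buildContent are hypothetical helpers, not part of the program above.

```go
package main

import (
	"fmt"
	"strings"
)

// alertLine is the minimal information needed to render one line of the message.
type alertLine struct {
	Status  string // "firing" or "resolved", as sent in the webhook payload
	Summary string
}

// formatAlert prefixes a summary with its state.
func formatAlert(status, summary string) string {
	tag := "[FIRING]"
	if status == "resolved" {
		tag = "[RESOLVED]"
	}
	return tag + " " + summary
}

// buildContent joins all non-empty summaries, one alert per line.
func buildContent(alerts []alertLine) string {
	lines := make([]string, 0, len(alerts))
	for _, a := range alerts {
		if a.Summary == "" {
			continue
		}
		lines = append(lines, formatAlert(a.Status, a.Summary))
	}
	return strings.Join(lines, "\n")
}

func main() {
	content := buildContent([]alertLine{
		{Status: "firing", Summary: "haproxy host-a is down"},
		{Status: "resolved", Summary: "haproxy host-b is down"},
	})
	fmt.Println(content)
}
```

The resulting string can be used as the content field of the text message sent to the bot.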
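Known issue with Alertmanager alerting
- If no new data is pushed, Pushgateway keeps serving the old data, so the PromQL alert expression may be evaluated against stale values and raise false alarms. The fix is to delete the data in Pushgateway periodically.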
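The periodic cleanup can be done through Pushgateway's HTTP API, which deletes a metric group on a DELETE request to /metrics/job/<job>. Below is a minimal sketch, assuming Pushgateway listens on its default port 9091 and the metrics were pushed under the placeholder job name xxx; newDeleteRequest is an illustrative helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newDeleteRequest builds the DELETE request for the metric group that is
// identified by the job label alone. Groups pushed with extra grouping
// labels (for example an instance) need those labels in the path as well.
func newDeleteRequest(baseURL, job string) (*http.Request, error) {
	target := baseURL + "/metrics/job/" + url.PathEscape(job)
	return http.NewRequest(http.MethodDelete, target, nil)
}

func main() {
	// Assumed values: default Pushgateway port and the placeholder job "xxx".
	req, err := newDeleteRequest("http://localhost:9091", "xxx")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("Pushgateway responded with", resp.Status)
}
```

Running such a program from a scheduler such as cron is one way to make the cleanup periodic. The interval should be longer than the normal push interval so that fresh data is not removed.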
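Summary
This article used a hands-on setup to show how Alertmanager can deliver alerts by email and to WeCom. The WeCom alerts are triggered through a self-implemented webhook receiver. Depending on business needs, the same approach can be used for email alerting, WeCom alerting, or both.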