Open Monitoring was not enabled when we created the cluster in Chapter 1. That is not a problem: it can still be enabled after the cluster has been created.
Select Enable open monitoring, and enable both the JMX Exporter and the Node Exporter:
Click confirm; the cluster update completes after a few minutes.
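If you prefer the CLI to the console, the same change can be made with aws kafka update-monitoring. A minimal sketch, assuming the cluster ARN is stored in CLUSTER_ARN (the same variable is set explicitly later in this post):

# The cluster's current version string is required by update-monitoring
CURRENT_VERSION=$(aws kafka describe-cluster --cluster-arn $CLUSTER_ARN \
  --query Cluster.CurrentVersion --output text)

# Enable the JMX Exporter and Node Exporter on every broker
aws kafka update-monitoring --cluster-arn $CLUSTER_ARN \
  --current-version $CURRENT_VERSION \
  --open-monitoring '{"Prometheus":{"JmxExporter":{"EnabledInBroker":true},"NodeExporter":{"EnabledInBroker":true}}}'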
After Open Monitoring is enabled, MSK exposes the JMX Exporter metrics on port 11001 and the Node Exporter metrics on port 11002. Note that the latter differs from Node Exporter's default port 9100. Reference: https://docs.aws.amazon.com/msk/latest/developerguide/open-monitoring.html
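As a quick sanity check that the exporter ports are reachable from the Cloud9 instance, you can curl them directly. A sketch; the hostname below is a placeholder for one of your broker endpoints, and the cluster's security group must allow inbound traffic on 11001/11002:

BROKER=b-1.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com   # placeholder, use one of your broker endpoints
curl -s http://$BROKER:11001/metrics | head    # JMX Exporter metrics
curl -s http://$BROKER:11002/metrics | head    # Node Exporter metrics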
Find the Prometheus download link at https://prometheus.io/download/. The current version is 2.32.0; download and extract it:
wget https://github.com/prometheus/prometheus/releases/download/v2.32.0/prometheus-2.32.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.32.0.linux-amd64.tar.gz
cd prometheus-2.32.0.linux-amd64/
At this point you could run Prometheus with ./prometheus, but some extra configuration is needed before it can scrape the MSK metrics. Replace the contents of prometheus.yml in the directory above with:
# file: prometheus.yml
# my global config
global:
  scrape_interval: 10s

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      # 9090 is the prometheus server port
      - targets: ['localhost:9090']

  - job_name: 'broker'
    file_sd_configs:
      - files:
          - 'targets.json'
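Before starting Prometheus, the configuration can be validated with promtool, which ships in the same tarball (run from the extracted directory):

./promtool check config prometheus.yml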
At the end we added a job that scrapes the brokers' exporter metrics. This job uses file-based service discovery (see an earlier post of mine). targets.json needs to list the addresses of all the JMX Exporters and Node Exporters. The first step is to get the addresses of all the brokers; the exporters run on ports 11001 and 11002 of each broker:
CLUSTER_ARN=arn:aws:kafka:ap-southeast-1:145197526627:cluster/MSKDemo/89d04308-2643-4e80-b6e2-fe996354f056-4 # replace with your cluster's ARN
aws kafka list-nodes --cluster-arn $CLUSTER_ARN \
  --query 'NodeInfoList[*].BrokerNodeInfo.Endpoints[]'
Create targets.json in the current directory and add the addresses from the output above, with :11001 appended for the JMX Exporter and :11002 for the Node Exporter:
[
  {
    "labels": {
      "job": "jmx"
    },
    "targets": [
      "b-3.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001",
      "b-6.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001",
      "b-2.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001",
      "b-5.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001",
      "b-4.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001",
      "b-1.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11001"
    ]
  },
  {
    "labels": {
      "job": "node"
    },
    "targets": [
      "b-3.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002",
      "b-6.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002",
      "b-2.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002",
      "b-5.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002",
      "b-4.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002",
      "b-1.mskdemo.mxqzz7.c4.kafka.ap-southeast-1.amazonaws.com:11002"
    ]
  }
]
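Editing this file by hand gets tedious as brokers are added or removed. As an alternative, here is a sketch that generates targets.json from the CLI output, assuming jq is available on the Cloud9 instance:

aws kafka list-nodes --cluster-arn $CLUSTER_ARN \
  --query 'NodeInfoList[*].BrokerNodeInfo.Endpoints[]' --output json \
  | jq '[
      {labels: {job: "jmx"},  targets: map(. + ":11001")},
      {labels: {job: "node"}, targets: map(. + ":11002")}
    ]' > targets.json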
Since we will access the Prometheus web UI later, make sure port 8080 is free in Cloud9. Stop the akhq instance started earlier (go to its directory and run docker-compose stop).
Start Prometheus:
./prometheus --web.listen-address="127.0.0.1:8080"
The command above serves the Prometheus web UI on port 8080, which we can now open in the browser:
Go to the Targets page:
All 12 endpoints under the broker job are in the UP state, which means Prometheus is successfully scraping the MSK JMX Exporter and Node Exporter metrics:
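The same check can be done from the terminal through Prometheus' HTTP API; the jq filter below just trims the response down to the interesting fields:

curl -s http://127.0.0.1:8080/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'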
On the Graph page we can browse the list of metrics and inspect their data.
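Queries can also be run outside the UI through the HTTP query API. Two examples, using the built-in up metric and a standard Node Exporter metric (the exact MSK JMX metric names depend on the exporter and are not listed here):

# How many targets are up, per job
curl -s 'http://127.0.0.1:8080/api/v1/query?query=sum(up)%20by%20(job)'

# CPU time reported by the Node Exporter on each broker
curl -s 'http://127.0.0.1:8080/api/v1/query?query=node_cpu_seconds_total' | head -c 500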
The Prometheus instance in this post runs in a test environment. In production you need to consider high availability, for example by deploying Prometheus on Kubernetes or using the AWS-managed Prometheus service, as well as backing up the Prometheus data.