
Version information

CORE            VERSION   PORT                                               ORDER   OTHER
Elasticsearch   7.13.3    9200, 9300                                         1       Search engine
Kibana          7.13.3    5601                                               2       Data visualization
Canal-Admin     1.1.15    8489, 11110 (admin), 11111 (tcp), 11112 (metric)   3       Canal management UI
Canal-Server    1.1.15    -                                                  4       Canal data sync, MySQL to ES (version 1.1.16 has a bug, so rolled back to a lower version)
Prometheus      2.37.0    9090                                               5       Metrics collection
Grafana         9.0.4     3000                                               6       Visualization
Node_exporter   1.3.1     9100                                               7       System metrics collection

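P.S. The server runs Linux. Replace the IP addresses in this post with your own. Every component is deployed on a single machine, and most are started as services. This post is only a brief record of the deployment process, for reference; do not copy it blindly.


Preparation

Required packages

yum -y install nano tar net-tools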

JDK

Version 1.8 or later is required

[root@eck ~]# java -version
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (build 25.332-b09, mixed mode)

If it is not installed, use the following as a reference (replace the JAVA_HOME value with your own path)

[root@eck ~]# yum -y install java-1.8.0-openjdk.x86_64
[root@eck ~]# nano /etc/profile

#set java environment
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.342.b07-1.el7_9.x86_64/jre
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib
export JAVA_HOME CLASSPATH PATH

[root@eck jre]# . /etc/profile

Disable the firewall

[root@eck] systemctl disable firewalld
[root@eck] systemctl stop firewalld
[root@eck] systemctl status firewalld

If disabling the firewall is not allowed, open the required ports manually

# elasticsearch
[root@eck] firewall-cmd --zone=public --add-port=9200/tcp --permanent
[root@eck] firewall-cmd --zone=public --add-port=9300/tcp --permanent

# kibana
[root@eck] firewall-cmd --zone=public --add-port=5601/tcp --permanent

# reload so that the permanent rules take effect
[root@eck] firewall-cmd --reload

# P.S. /sbin/iptables -I INPUT -p tcp --dport 5601 -j ACCEPT

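On a cloud server, you also need to open these ports in the security group settings on the provider's console


Linux tuning

Virtual memory (vm.max_map_count)

The default virtual memory limit available to Elasticsearch is too low; vm.max_map_count must be at least 262144

[root@eck] nano /etc/sysctl.conf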

Set the value to 262144 or higher

vm.max_map_count=262144

Apply immediately

[root@eck] sysctl -p
vm.max_map_count = 262144

Maximum file descriptors

The default file descriptor limit for the user running Elasticsearch is too low; at least 65535 is required

[root@eck] nano /etc/security/limits.conf

Set the values to at least the following

* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096

Log out and log back in, then check that the new limit is in effect

[root@eck ~]# ulimit -n
65536
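
The two limits above can be checked in one pass before starting Elasticsearch. The following is a minimal sketch, not part of the original setup: check_min is a hypothetical helper name, and the thresholds are the ones quoted in this post. Run it as the user that will start Elasticsearch, since ulimit values are per user.

```shell
#!/bin/sh
# check_min (hypothetical helper): print OK or LOW for one setting.
# Arguments: label, actual value, required minimum.
check_min() {
    name=$1
    actual=$2
    required=$3
    # "unlimited" always satisfies the requirement
    if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
        echo "OK $name=$actual"
    else
        echo "LOW $name=$actual (need >= $required)"
    fi
}

# Thresholds taken from this post: 262144 map areas, 65535 file descriptors.
# If the /proc file cannot be read, 0 is used so the check reports LOW.
check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)" 262144
check_min nofile "$(ulimit -n)" 65535
```

Any line starting with LOW points at a setting that still needs to be raised.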

Port conflict check

# Elasticsearch
[root@eck ~] netstat -an |grep :9200
[root@eck ~] netstat -an |grep :9300

# kibana
[root@eck ~] netstat -an |grep :5601
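
The per-port grep commands above can be folded into one check. This is a rough sketch under assumptions: listening_ports is a hypothetical helper, and it expects input in the column layout printed by netstat -lnt (local address in column 4, state in the last column). Unlike the grep commands, it only reports sockets in the LISTEN state.

```shell
# listening_ports (hypothetical helper): read `netstat -lnt`-style lines on
# stdin and print each requested port that has a LISTEN socket.
listening_ports() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        $NF == "LISTEN" {
            # The local address is column 4, e.g. 0.0.0.0:9200 or :::9300;
            # the port is whatever follows the last colon.
            k = split($4, parts, ":")
            if (parts[k] in want) print parts[k]
        }
    ' | sort -un
}

# Usage on a live system:
#   netstat -lnt | listening_ports 9200 9300 5601
```

An empty result means none of the listed ports is taken.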

Elasticsearch

Download

Latest version: Download Elasticsearch | Elastic

Tested version: Elasticsearch 7.13.3 | Elastic


Install

[root@eck] tar -zxvf elasticsearch-7.13.3-linux-x86_64.tar.gz -C /usr/local

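Specify the JDK

Each ES release is tied to specific JDK versions, so we pin the JDK that ES uses. ES ships with its own JDK; here we only need to make ES use that bundled JDK rather than the one the system environment variables point to. If no JDK is installed on the system, ES uses its bundled JDK by default

[root@eck] cd /usr/local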
[root@eck] mv elasticsearch-7.13.3/ elasticsearch/
[root@eck] cd /usr/local/elasticsearch/bin
[root@eck] nano elasticsearch

Add the following

########### Added to resolve the JDK version issue ###########
# Point JAVA_HOME at the JDK bundled with ES
export JAVA_HOME=/usr/local/elasticsearch/jdk
export PATH=$JAVA_HOME/bin:$PATH

if [ -x "$JAVA_HOME/bin/java" ]; then
JAVA="/usr/local/elasticsearch/jdk/bin/java"
else
JAVA=`which java`
fi

Save and exit (in nano: Ctrl+O, Enter, then Ctrl+X)


Adjust the heap size

Adjust according to your server's resources

[root@eck] nano /usr/local/elasticsearch/config/jvm.options

Add the following

-Xms2g
-Xmx2g

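The official recommendation is half of the system memory. The rest must be left for the operating system's file cache, which Lucene relies on; if you give all memory to the Elasticsearch heap and leave none for Lucene, full-text search performance will suffer badly. Do not exceed 32 GB either (on 64-bit systems; less on 32-bit). Below that threshold the JVM uses compressed object pointers, while above it pointers take up much more memory. The exact boundary differs between JDK versions, so staying under 32 GB is the recommended choice.

If you have a machine with 128 GB of memory or more, consider running two nodes on it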
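
The sizing rule above (half of RAM, kept under 32 GB) can be turned into a small calculation. This is a sketch only: heap_mb is a hypothetical helper, and the 31 GB cap is an assumed safety margin below the 32 GB boundary, not a value from the official documentation.

```shell
# heap_mb (hypothetical helper): given total RAM in MB, print half of it,
# capped at 31 GB (assumed margin below the 32 GB limit).
heap_mb() {
    half=$(( $1 / 2 ))
    cap=$(( 31 * 1024 ))
    if [ "$half" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$half"
    fi
}

# Total RAM in MB, read from /proc/meminfo (Linux only; falls back to 0)
total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null)
size=$(heap_mb "${total_mb:-0}")

# Lines in the format expected by jvm.options
echo "-Xms${size}m"
echo "-Xmx${size}m"
```

The two printed lines can be pasted into jvm.options in place of the fixed 2g values, after checking that the result leaves enough memory for the other services on the same machine.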


Create a user

P.S. ES cannot be started as root

[root@eck] useradd -s /sbin/nologin -M elastic

Set ownership

[root@eck] chown elastic:elastic -R /usr/local/elasticsearch

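Change the log and data locations

Create the following directories
P.S. Point these at the disk you have mounted for data; otherwise you will run out of space later and have to expand the volume yourself

[root@eck] mkdir -vp /home/elastic/elasticsearch/data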
[root@eck] mkdir -vp /home/elastic/elasticsearch/logs
[root@eck] chown elastic:elastic -R /home/elastic/elasticsearch/data
[root@eck] chown elastic:elastic -R /home/elastic/elasticsearch/logs

Edit the ES configuration file

[root@eck] nano /usr/local/elasticsearch/config/elasticsearch.yml

Add the following

# Data directory
path.data: /home/elastic/elasticsearch/data
# Log directory
path.logs: /home/elastic/elasticsearch/logs
# Allow remote access; otherwise ES is reachable only from the local machine
network.host: 192.168.0.88
# Cluster name, node name, and the initial master node
cluster.name: elasticsearch
node.name: es-node0
cluster.initial_master_nodes: ["es-node0"]
# Change this if the port conflicts (optional)
# http.port: 19200

Start

Start as a service

[root@eck] nano /usr/lib/systemd/system/elasticsearch.service

[Unit]
Description=elasticsearch
After=network.target

[Service]
Type=forking
User=elastic
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/elasticsearch/bin/elasticsearch -d
PrivateTmp=true
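# Maximum number of files this process may open
LimitNOFILE=65535
# Maximum number of processes this process may create
LimitNPROC=65535
# Maximum virtual memory
LimitAS=infinity
# Maximum file size
LimitFSIZE=infinity
# Stop timeout; 0 means never time out
TimeoutStopSec=0
# SIGTERM is the signal used to stop the Java process
KillSignal=SIGTERM
# Send the signal only to the JVM
KillMode=process
# Never send SIGKILL to the Java process
SendSIGKILL=no
# Exit status treated as a normal exit
SuccessExitStatus=143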

[Install]
WantedBy=multi-user.target

[root@eck elasticsearch]# systemctl start elasticsearch
[root@eck elasticsearch]# systemctl status elasticsearch
● elasticsearch.service - elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
Active: active (running) since 二 2022-08-02 17:02:05 CST; 4s ago
Process: 9969 ExecStart=/usr/local/elasticsearch/bin/elasticsearch -d (code=exited, status=0/SUCCESS)
Main PID: 10146 (java)
CGroup: /system.slice/elasticsearch.service
└─10146 /usr/local/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8...

8月 02 17:02:03 eck systemd[1]: Starting elasticsearch...
8月 02 17:02:05 eck systemd[1]: Started elasticsearch.
[root@eck elasticsearch]# systemctl enable elasticsearch
Created symlink from /etc/systemd/system/multi-user.target.wants/elasticsearch.service to /usr/lib/systemd/system/elasticsearch.service.

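Verify startup

Open http://ip:port in a browser, for example: http://192.168.1.48:9200/

[Optional] External access requires configuring Nginx

Nginx needs to listen on this port, allow the WebSocket/HTTP protocol upgrade, raise the maximum number of file descriptors, and adjust the heartbeat

In general, ES should not be exposed externally, to avoid automated vulnerability scans and attacks

server {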
listen 9200;
server_name elastic;
open_file_cache max=65535 inactive=20s;
client_max_body_size 256m;
client_header_buffer_size 32k;
large_client_header_buffers 4 32k;
location / {
proxy_pass http://127.0.0.1:9200;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header X-Real-IP $remote_addr;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}

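Set passwords

Since the 7.x releases, the security authentication features are free and X-Pack is bundled with the default Elasticsearch distribution. ES must be running before passwords can be set

[root@eck] nano /usr/local/elasticsearch/config/elasticsearch.yml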

Append the following at the end of the file

# Password settings
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

Restart ES

[root@eck] systemctl restart elasticsearch

Set the passwords

[user-es@eck] sh /usr/local/elasticsearch/bin/elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]

Open http://192.168.1.48:9200/ again to verify the password. If you can get in with the username elastic and its password, the setup succeeded

{
"name" : "es-node0",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "IIaI-dwFS1Oa7dSEyNb2xA",
"version" : {
"number" : "7.13.3",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "5d21bea28db1e89ecc1f66311ebdec9dc3aa7d64",
"build_date" : "2021-07-02T12:06:10.804015202Z",
"build_snapshot" : false,
"lucene_version" : "8.8.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Backup

Create a repository

mkdir /home/elastic/backup
mkdir /home/elastic/backup/esbackup
mkdir /home/elastic/backup/stream-backup
chown -R elastic:elastic /home/elastic/backup
nano /usr/local/elasticsearch/config/elasticsearch.yml

# Restart Elasticsearch after saving so that path.repo takes effect
path:
  repo:
    - /home/elastic/backup/esbackup
    - /home/elastic/backup/stream-backup

PUT /_snapshot/my_fs_backup
{
"type": "fs",
"settings": {
"location": "/home/elastic/backup/esbackup/My_fs_backup_location",
"compress": "true"
}
}

compress: enables compression


List repositories

GET /_snapshot/_all

{
"my_fs_backup" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : "/home/elastic/backup/esbackup/My_fs_backup_location"
}
}
}

Verify the repository

Use _verify to check that the repository is working on all nodes

POST /_snapshot/my_fs_backup/_verify

{
"nodes" : {
"foxFx1TVQmiP3X4C5OgOsg" : {
"name" : "es-node0"
}
}
}

Delete a repository

DELETE /_snapshot/my_fs_backup

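Create a snapshot

A repository can hold multiple snapshots, and each snapshot name is unique within the cluster. A snapshot contains only the data that existed when the snapshot started; later data has to be captured by taking further incremental snapshots. A snapshot is created with a PUT request, which by default backs up all readable indices and data streams in the cluster. To back up only part of them, pass parameters in the request body.

# wait_for_completion controls whether the request waits for the snapshot to finish before returning; with an empty body, the PUT request backs up all readable indices and data streams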
PUT /_snapshot/my_fs_backup/snapshot_1?wait_for_completion=true
{
"indices": "hundredsman,index_1,index_2",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
"taken_by": "Leopold",
"taken_because": "Leopold init backup"
}
}
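
Because later data is only captured by further snapshots, the PUT request is a natural candidate for a scheduled script. The sketch below shows one way to send it with curl from the shell. Everything named here is an assumption for illustration: the ES_URL and ES_PASSWORD variables, the elastic user, and the helper names snapshot_name and take_snapshot. It sends no request body, so it backs up all readable indices and data streams.

```shell
# Assumed endpoint; replace with your own address.
ES_URL="${ES_URL:-http://192.168.0.88:9200}"

# snapshot_name (hypothetical helper): build a dated, lowercase snapshot
# name such as snapshot_2022.08.15.
snapshot_name() {
    echo "snapshot_$(date +%Y.%m.%d)"
}

# take_snapshot (hypothetical helper): create today's snapshot in the given
# repository and wait until it completes.
take_snapshot() {
    repo=$1
    curl -sS -u "elastic:${ES_PASSWORD}" -X PUT \
        "${ES_URL}/_snapshot/${repo}/$(snapshot_name)?wait_for_completion=true"
}

# Usage:
#   ES_PASSWORD=... take_snapshot my_fs_backup
```

Running the function twice on the same day would reuse the same snapshot name, which Elasticsearch rejects; a finer-grained date format avoids that if more frequent snapshots are needed.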

Delete a snapshot

DELETE /_snapshot/my_fs_backup/snapshot_1
# Delete several at once with a comma-separated list or a wildcard
DELETE /_snapshot/my_fs_backup/snapshot_2,snapshot_3
DELETE /_snapshot/my_fs_backup/snap*

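If the snapshot is still being created when it is deleted, Elasticsearch also aborts the snapshot task and removes all data associated with it. Note, however, that you must not delete backup data from the repository by hand; doing so risks corrupting the data.


Monitor progress

# Show snapshots currently running
GET /_snapshot/my_fs_backup/_current

# Show specific snapshots
GET /_snapshot/my_fs_backup/snapshot_1
GET /_snapshot/my_fs_backup/snapshot_*

# List repositories (if you have created more than one)
GET /_snapshot/_all
GET /_snapshot/my_fs_backup,my_hdfs_backup
GET /_snapshot/my*

# Show detailed progress for one snapshot
GET /_snapshot/my_fs_backup/snapshot_1/_status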

For example:

# Show a specific snapshot
GET /_snapshot/my_fs_backup/snapshot_1

{
"snapshots" : [
{
"snapshot" : "snapshot_1",
"uuid" : "9gXttlxpS3ao7N3IZdVr9A",
"version_id" : 7130399,
"version" : "7.13.3",
"indices" : [
".kibana_security_session_1",
".tasks",
"pt-platform-logs-2022-08-10",
".kibana-event-log-7.13.3-000001",
".apm-custom-link",
".async-search",
".ds-ilm-history-5-2022.08.02-000001",
".security-7",
".kibana_7.13.3_001",
".apm-agent-configuration",
"i_platform_medical_record_result",
"i_platform_medical_record",
".kibana_task_manager_7.13.3_001"
],
"data_streams" : [
"ilm-history-5"
],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2022-08-15T02:17:33.167Z",
"start_time_in_millis" : 1660529853167,
"end_time" : "2022-08-15T02:21:57.109Z",
"end_time_in_millis" : 1660530117109,
"duration_in_millis" : 263942,
"failures" : [ ],
"shards" : {
"total" : 17,
"failed" : 0,
"successful" : 17
},
"feature_states" : [
{
"feature_name" : "security",
"indices" : [
".security-7"
]
},
{
"feature_name" : "async_search",
"indices" : [
".async-search"
]
},
{
"feature_name" : "kibana",
"indices" : [
".kibana_task_manager_7.13.3_001",
".kibana_7.13.3_001",
".kibana_security_session_1",
".apm-agent-configuration",
".apm-custom-link"
]
},
{
"feature_name" : "tasks",
"indices" : [
".tasks"
]
}
]
}
]
}

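Restore

# With no request body, every index and data stream in the snapshot is restored
POST /_snapshot/my_fs_backup/snapshot_1/_restore

# To restore only specific indices or data streams, list them in the request body
POST /_snapshot/my_fs_backup/snapshot_1/_restore
{
"indices": "index*",
"ignore_unavailable": true,
# include_global_state controls whether the cluster-wide global state is restored (it defaults to false for a restore and to true when creating a snapshot)
"include_global_state": false,
# Pattern matching the indices to rename, e.g. index_1
"rename_pattern": "index_(.+)",
# Replacement used for the new index name, e.g. re_index_1
"rename_replacement": "re_index_$1",
"include_aliases": false
}

# If the index already exists, the request fails with an error saying an index with the same name exists; rename it on restore.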
{
"error": {
"root_cause": [
{
"type": "snapshot_restore_exception",
"reason": "[my_fs_backup:snapshot_1/90A9o4hORUCv732HTQBfRQ] cannot restore index [index_1] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
}
]
},
"status": 500
}

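Monitor restore status

Once a restore starts, the cluster status turns yellow because the restore is recovering the primary shards of the indices. After the primaries are recovered, Elasticsearch starts rebuilding replicas according to the replica settings, and the cluster returns to green only when everything has finished. Alternatively, set the replica count of the index to 0 first and raise it to the target value after the primaries are done. Restore progress can be tracked by monitoring the recovery status of the cluster or of specific indices.

# Cluster-wide recovery status; for more, see the cat recovery API: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-recovery.html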
GET /_cat/recovery/

#! this request accesses system indices: [.apm-agent-configuration, .apm-custom-link, .async-search, .kibana_7.13.3_001, .kibana_task_manager_7.13.3_001, .security-7, .tasks], but in a future major version, direct access to system indices will be prevented by default
.kibana_7.13.3_001 0 452ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 30 0 0 100.0% 2267250 0 0 100.0%
.security-7 0 423ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 38 0 0 100.0% 195699 0 0 100.0%
.apm-custom-link 0 39ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 1 0 0 100.0% 208 0 0 100.0%
.kibana-event-log-7.13.3-000001 0 76ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 25 0 0 100.0% 24651 0 0 100.0%
.apm-agent-configuration 0 76ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 1 0 0 100.0% 208 0 0 100.0%
pt-platform-logs-2022-08-10 0 147ms snapshot done n/a n/a 192.168.0.88 es-node0 my_fs_backup snapshot_1 10 10 100.0% 10 27456 27456 100.0% 27456 0 0 100.0%
.async-search 0 96ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 4 0 0 100.0% 3481 0 0 100.0%
i_platform_medical_record_result 0 644ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 82 0 0 100.0% 326927787 0 0 100.0%
i_platform_medical_record_result 1 313ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 120 0 0 100.0% 331827725 0 0 100.0%
i_platform_medical_record_result 2 273ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 102 0 0 100.0% 328299966 0 0 100.0%
i_platform_medical_record 0 2.2s existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 147 0 0 100.0% 3344435623 0 0 100.0%
i_platform_medical_record 1 2.2s existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 141 0 0 100.0% 3333267188 0 0 100.0%
i_platform_medical_record 2 2.2s existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 138 0 0 100.0% 3328457583 0 0 100.0%
.kibana_task_manager_7.13.3_001 0 108ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 71 0 0 100.0% 132821 0 0 100.0%
.tasks 0 346ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a n/a 0 0 100.0% 24 0 0 100.0% 27283 0 0 100.0%

# Recovery status of an index; for more, see the index recovery API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-recovery.html
GET /pt-platform-logs*/_recovery

{
"pt-platform-logs-2022-08-10" : {
"shards" : [
{
"id" : 0,
"type" : "SNAPSHOT",
"stage" : "DONE",
"primary" : true,
"start_time_in_millis" : 1660530572451,
"stop_time_in_millis" : 1660530572598,
"total_time_in_millis" : 147,
"source" : {
"repository" : "my_fs_backup",
"snapshot" : "snapshot_1",
"version" : "7.13.3",
"index" : "pt-platform-logs-2022-08-10",
"restoreUUID" : "KZA2lOpRRH2_rKHNPPEUvg"
},
"target" : {
"id" : "foxFx1TVQmiP3X4C5OgOsg",
"host" : "192.168.0.88",
"transport_address" : "192.168.0.88:9300",
"ip" : "192.168.0.88",
"name" : "es-node0"
},
"index" : {
"size" : {
"total_in_bytes" : 27456,
"reused_in_bytes" : 0,
"recovered_in_bytes" : 27456,
"percent" : "100.0%"
},
"files" : {
"total" : 10,
"reused" : 0,
"recovered" : 10,
"percent" : "100.0%"
},
"total_time_in_millis" : 96,
"source_throttle_time_in_millis" : 0,
"target_throttle_time_in_millis" : 0
},
"translog" : {
"recovered" : 0,
"total" : 0,
"percent" : "100.0%",
"total_on_start" : 0,
"total_time_in_millis" : 37
},
"verify_index" : {
"check_index_time_in_millis" : 0,
"total_time_in_millis" : 0
}
}
]
}
}

Other

Stop ES

[root@eck] systemctl stop elasticsearch

Ports

9300: TCP transport port, used for communication between ES nodes in a cluster

9200: HTTP port for the RESTful API


View the ES log

[user-es@eck] tail -f -n 300 /home/elastic/elasticsearch/logs/elasticsearch.log

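Disable swapping

Swapping memory to disk is fatal for server performance. If memory is swapped out to disk, a 100-microsecond operation can turn into one that takes 10 milliseconds. Now imagine that added latency accumulating across all the other 10-microsecond operations. It is easy to see how damaging swapping is for performance.

The best option is to disable swapping completely in your operating system. This disables it temporarily:

sudo swapoff -a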

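To disable it permanently, you will probably need to edit /etc/fstab; consult the documentation for your operating system.

If disabling swap entirely is not an option for you, you can lower the swappiness value, which controls how readily the operating system swaps memory. This prevents swapping under normal circumstances while still allowing the OS to swap in an emergency. On most Linux systems it can be configured in sysctl like this:

vm.swappiness = 1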

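Setting swappiness to 1 is better than 0, because on some kernel versions swappiness=0 can trigger the OOM killer

Finally, if none of the above is possible, enable memory locking in the configuration file. It lets the JVM lock its memory so that the OS cannot swap it out. The old setting name, bootstrap.mlockall, was removed in Elasticsearch 5.0 and causes a startup error on 7.x; use bootstrap.memory_lock in elasticsearch.yml instead. When ES runs as a systemd service, LimitMEMLOCK=infinity must also be set in the unit file:

bootstrap.memory_lock: true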

Migration

On the same machine, it is enough to change the file paths and copy the files over


Kibana

Download

Latest version: Download Kibana Free | Get Started Now | Elastic

Tested version: Kibana 7.13.3 | Elastic

Note: the Elasticsearch and Kibana version numbers must match exactly
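
One way to confirm the match before installing Kibana is to read the version from the JSON that the ES root endpoint returns. This is a minimal sketch: es_version and same_version are hypothetical helper names, and the URL and credentials in the usage comment are placeholders.

```shell
# es_version (hypothetical helper): read the JSON from the ES root endpoint
# on stdin and print the value of "number".
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# same_version (hypothetical helper): succeed only when both version
# strings are identical.
same_version() {
    [ "$1" = "$2" ]
}

# Usage (curl prompts for the elastic user's password):
#   es=$(curl -s -u elastic http://192.168.1.48:9200/ | es_version)
#   same_version "$es" "7.13.3" && echo "versions match" || echo "version mismatch"
```

The second argument is the version of the Kibana archive you are about to unpack.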


Install

[root@eck] cd /root/zip
[root@eck] tar -zxvf kibana-7.13.3-linux-x86_64.tar.gz -C /usr/local
[root@eck] mv /usr/local/kibana-7.13.3-linux-x86_64/ /usr/local/kibana/
[root@eck] mkdir -p /var/log/kibana/
# Ownership of /var/log/kibana is set in the "Set ownership" step, after the kibana user has been created

Configure

[root@eck] nano /usr/local/kibana/config/kibana.yml

Add the following

server.name: kibana
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: [ "http://127.0.0.1:9200" ]
monitoring.ui.container.elasticsearch.enabled: true
i18n.locale: "zh-CN"
elasticsearch.username: "elastic"
elasticsearch.password: "test123!@#"
logging.dest: /var/log/kibana/kibana.log

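If memory is tight (Kibana typically uses about 1.4 GB on 64-bit systems), you can cap the Node.js old-generation heap by adding --max-old-space-size=200 to the Node options Kibana starts with. Adjust the value as appropriate, then restart Kibana. These options are listed in config/node.options as plain command-line flags, not as a NODE_OPTIONS assignment

[root@eck] nano /usr/local/kibana/config/node.options

Add the following

--max-old-space-size=200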

Create a user

Kibana cannot be started as root

[root@eck] useradd -s /sbin/nologin -M kibana

Set ownership

[root@eck] chown kibana:kibana -R /usr/local/kibana
[root@eck] chown kibana:kibana -R /var/log/kibana

Start

[root@eck] nano /usr/lib/systemd/system/kibana.service

[Unit]
Description=kibana
After=network.target

[Service]
Type=simple
User=kibana
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/kibana/bin/kibana
PrivateTmp=true

[Install]
WantedBy=multi-user.target

systemctl start kibana
systemctl status kibana
systemctl enable kibana

Other

Stop

systemctl stop kibana

Canal-Admin

Create a user

useradd -s /sbin/nologin -M canal

Create directories

mkdir /usr/local/canal
mkdir /usr/local/canal/admin
mkdir /usr/local/canal/server

Configure environment variables

mkdir /home/canal
nano /home/canal/.bashrc

Append the following (adjust to your actual paths)

export CANAL_HOME=/usr/local/canal/server
export PATH=$PATH:$CANAL_HOME/bin

Install

Extract

tar -zxvf canal.admin-1.1.6.tar.gz -C /usr/local/canal/admin

Set ownership

chown canal:canal -R /usr/local/canal/admin

Edit the configuration file

nano /usr/local/canal/admin/conf/application.yml

Change it as needed to the following

server:
  port: 8489
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: 192.168.0.89:3306
  database: canal_manager
  username: canal
  password: canal
  driver-class-name: com.mysql.cj.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

canal:
  adminUser: admin
  adminPasswd: admin

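Upload the MySQL driver jar that matches your MySQL version, mysql-connector-java-8.0.22.jar, to the /usr/local/canal/admin/lib directory, and delete mysql-connector-java-5.1.48.jar


MySQL

Change the MySQL binlog mode by editing my.cnf

[mysqld]
log-bin=mysql-bin # adding this line enables the binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not be the same as Canal's slaveId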

After restarting MySQL, check the result

SQL> show variables like 'binlog_format%';

Create the database user

SQL> CREATE USER canal IDENTIFIED BY 'canal';
SQL> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
SQL> grant select,insert,update,delete,create,alter on *.* to 'canal';
SQL> FLUSH PRIVILEGES;

ALTER USER 'canal'@'%' IDENTIFIED BY 'canal' PASSWORD EXPIRE NEVER;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'canal';
FLUSH PRIVILEGES;

SELECT DISTINCT CONCAT('User: ''',user,'''@''',host,''';') AS query FROM mysql.user;

Run /usr/local/canal/admin/conf/canal_manager.sql in the database

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `canal_manager` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_bin */;

USE `canal_manager`;

SET NAMES utf8;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for canal_adapter_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_adapter_config`;
CREATE TABLE `canal_adapter_config` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`category` varchar(45) NOT NULL,
`name` varchar(45) NOT NULL,
`status` varchar(45) DEFAULT NULL,
`content` text NOT NULL,
`modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_cluster
-- ----------------------------
DROP TABLE IF EXISTS `canal_cluster`;
CREATE TABLE `canal_cluster` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(63) NOT NULL,
`zk_hosts` varchar(255) NOT NULL,
`modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_config`;
CREATE TABLE `canal_config` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`cluster_id` bigint(20) DEFAULT NULL,
`server_id` bigint(20) DEFAULT NULL,
`name` varchar(45) NOT NULL,
`status` varchar(45) DEFAULT NULL,
`content` text NOT NULL,
`content_md5` varchar(128) NOT NULL,
`modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `sid_UNIQUE` (`server_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_instance_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_instance_config`;
CREATE TABLE `canal_instance_config` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`cluster_id` bigint(20) DEFAULT NULL,
`server_id` bigint(20) DEFAULT NULL,
`name` varchar(45) NOT NULL,
`status` varchar(45) DEFAULT NULL,
`content` text NOT NULL,
`content_md5` varchar(128) DEFAULT NULL,
`modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_node_server
-- ----------------------------
DROP TABLE IF EXISTS `canal_node_server`;
CREATE TABLE `canal_node_server` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`cluster_id` bigint(20) DEFAULT NULL,
`name` varchar(63) NOT NULL,
`ip` varchar(63) NOT NULL,
`admin_port` int(11) DEFAULT NULL,
`tcp_port` int(11) DEFAULT NULL,
`metric_port` int(11) DEFAULT NULL,
`status` varchar(45) DEFAULT NULL,
`modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_user
-- ----------------------------
DROP TABLE IF EXISTS `canal_user`;
CREATE TABLE `canal_user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`username` varchar(31) NOT NULL,
`password` varchar(128) NOT NULL,
`name` varchar(31) NOT NULL,
`roles` varchar(31) NOT NULL,
`introduction` varchar(255) DEFAULT NULL,
`avatar` varchar(255) DEFAULT NULL,
`creation_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET FOREIGN_KEY_CHECKS = 1;

-- ----------------------------
-- Records of canal_user
-- ----------------------------
BEGIN;
INSERT INTO `canal_user` VALUES (1, 'admin', '6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9', 'Canal Manager', 'admin', NULL, NULL, '2019-07-14 00:05:28');
COMMIT;

SET FOREIGN_KEY_CHECKS = 1;

Start

Check Java

which java
# If Java was not installed to the default location, add a symlink, for example:
ln -s /tar/jdk1.8.0_60/bin/java /usr/bin/java

nano /usr/lib/systemd/system/canal-admin.service

[Unit]
Description=canal-admin
After=network.target

[Service]
Type=forking
User=canal
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/canal/admin/bin/startup.sh
ExecStop=/usr/local/canal/admin/bin/stop.sh
PrivateTmp=true

[Install]
WantedBy=multi-user.target

systemctl start canal-admin
systemctl status canal-admin
systemctl enable canal-admin

Open http://192.168.1.48:8489/

Default username: admin

Default password: 123456


Other

View the log

tail -f -n 300 /usr/local/canal/admin/logs/admin.log

Adjust memory usage

Canal uses 3 GB by default. To change this, edit /usr/local/canal/admin/bin/startup.sh

if [ -n "$str" ]; then
# JAVA_OPTS="-server -Xms1024m -Xmx1536m -Xmn512m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC $JAVA_OPTS"
# For G1
JAVA_OPTS="-server -Xms1g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent $JAVA_OPTS"
else
JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m $JAVA_OPTS"
fi

Canal-server

Install

Extract

tar -zxvf canal.deployer-1.1.6.tar.gz -C /usr/local/canal/server/

Set ownership

chown canal:canal -R /usr/local/canal/server/

Connect to Canal-Admin

Edit the configuration

nano /usr/local/canal/server/conf/canal_local.properties

# register ip
canal.register.ip =

# canal admin config: set this to the address and port of Canal-Admin
canal.admin.manager = 192.168.1.48:8489
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
canal.admin.register.name =
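
The canal.admin.passwd value above is not the plain password. As far as I can tell (this is my reading, not something stated in this post), it is the adminPasswd from application.yml hashed the way MySQL's legacy PASSWORD() function does it: SHA1 applied twice, in uppercase hex, without the leading "*". The sketch below reproduces that computation so the value can be regenerated after changing the password. canal_passwd_hash is a hypothetical helper name, and the openssl command-line tool is assumed to be installed.

```shell
# canal_passwd_hash (hypothetical helper): print the uppercase hex of
# SHA1(SHA1(password)). Assumes the openssl CLI is available.
canal_passwd_hash() {
    printf '%s' "$1" \
        | openssl dgst -sha1 -binary \
        | openssl dgst -sha1 -hex \
        | awk '{ print toupper($NF) }'
}

# For the default adminPasswd "admin" this should reproduce the
# canal.admin.passwd value shown in the configuration above.
canal_passwd_hash admin
```

If you change adminPasswd in application.yml, run the function on the new password and put the result into canal.admin.passwd.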

Start

nano /usr/lib/systemd/system/canal-server.service

[Unit]
Description=canal-server
After=network.target

[Service]
Type=forking
User=canal
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/canal/server/bin/startup.sh local
ExecStop=/usr/local/canal/server/bin/stop.sh
PrivateTmp=true

[Install]
WantedBy=multi-user.target

systemctl start canal-server
systemctl status canal-server
systemctl enable canal-server

Other

View the log

tail -f -n 300 /usr/local/canal/server/logs/canal/canal.log

Adjust memory usage

Canal uses 3 GB by default. To change this, edit:

nano /usr/local/canal/server/bin/startup.sh

if [ -n "$str" ]; then
# JAVA_OPTS="-server -Xms1024m -Xmx1536m -Xmn512m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC $JAVA_OPTS"
# For G1
JAVA_OPTS="-server -Xms1g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent $JAVA_OPTS"
else
JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m $JAVA_OPTS"
fi

Canal configuration

#################################################
######### common argument #############
#################################################
# Local IP the canal server binds to; if not set, a local IP is chosen automatically at startup
canal.ip =
# register ip to zookeeper
canal.register.ip =
# Port on which the canal server provides its socket service
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458

# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =

canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ
canal.serverMode = tcp
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n) 16384
canal.instance.memory.buffer.size = 1048576
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024
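## memory store gets mode used MEMSIZE or ITEMSIZE
# Cache mode of canal's in-memory store
# 1. ITEMSIZE: limited by buffer.size; only the number of records is limited
# 2. MEMSIZE: limited by buffer.size * buffer.memunit; the total size of the cached records is limited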
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size = 1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
######### destinations #############
#################################################
canal.destinations = zhikong
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = manager
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
######### MQ Properties #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=

canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local

canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
######### Kafka #############
##################################################
kafka.bootstrap.servers = 127.0.0.1:6667
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0

kafka.kerberos.enable = false
kafka.kerberos.krb5.file = "../conf/kerberos/krb5.conf"
kafka.kerberos.jaas.file = "../conf/kerberos/jaas.conf"

##################################################
######### RocketMQ #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =

##################################################
######### RabbitMQ #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =
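
As a rough check of the memory settings in this file, and assuming the MEMSIZE rule quoted in the comments (buffer.size * buffer.memunit), the cap can be worked out as below. The helper name canal_buffer_bytes is made up for this illustration and is not part of Canal.

```shell
# Hypothetical helper: upper bound, in bytes, of the in-memory store under MEMSIZE mode.
# $1 = canal.instance.memory.buffer.size, $2 = canal.instance.memory.buffer.memunit
canal_buffer_bytes() {
    echo $(( $1 * $2 ))
}

canal_buffer_bytes 1048576 1024   # prints 1073741824, i.e. 1 GiB
```

With the values used here the store may hold up to about 1 GiB of events, so it is probably worth sizing the Canal JVM heap with that in mind.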

Prometheus

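Prometheus uses a rich set of mathematical functions to meet demanding monitoring needs, natively supports Kubernetes service discovery, and can follow containers as they come and go. Graphs are drawn with Grafana, and alerts are sent through Alertmanager or Grafana. Compared with other monitoring systems, its main advantages are:

1. The data format is key/value, which is simple and fast.
2. Monitoring granularity is very fine, down to the second. (Because of this sampling precision it consumes a lot of disk and can hit performance bottlenecks. It also does not support clustering, although it can be scaled out through federation.)
3. It does not depend on distributed storage: data is kept locally and no extra database needs to be configured. If long-term history matters, it can be combined with OpenTSDB.
4. The surrounding ecosystem is rich. If the monitoring requirements are not especially strict, the handful of ready-made exporters is enough.
5. It is built on a mathematical computation model with a large number of functions, so very complex monitoring is possible. (The learning curve is steep as a result: it needs some mathematical thinking, and its own query language, PromQL, is hard to pick up.)
6. It can be embedded inside many open-source tools to monitor them from within, which makes the data more trustworthy.

As the saying goes, even when the services are down, Prometheus must not be. In the 2021 Bilibili outage, for example, Prometheus kept running and observing system state while a large number of services were unreachable, which is what made the later root-cause analysis and postmortem possible. It safeguards operations during an outage and is the last line of defense when things fail.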


Installation

Download | Prometheus


Create a user

[root@eck] useradd -s /sbin/nologin -M prometheus

Deployment

[root@eck zip]# mkdir /usr/local/prometheus
[root@eck zip]# tar zxvf prometheus-2.37.0.linux-amd64.tar.gz -C /usr/local/prometheus/
[root@eck zip]# cd /usr/local/prometheus/prometheus-2.37.0.linux-amd64
[root@eck prometheus-2.37.0.linux-amd64]# mv * ../
[root@eck prometheus-2.37.0.linux-amd64]# cd ../
[root@eck prometheus]# rm -rf prometheus-2.37.0.linux-amd64/
[root@eck prometheus]# cp prometheus.yml prometheus.yml.bak
[root@eck prometheus]# mkdir /usr/local/prometheus/data
nano /usr/local/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout must not exceed scrape_interval; the global default is 10s.
  scrape_timeout: 10s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.1.48:9090"]
  - job_name: "canal"
    static_configs:
      - targets: ["192.168.1.48:11112"]

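To change the listening port, update the - targets: ["192.168.1.48:9090"] entry above to match, and start Prometheus with the new address, for example:

./prometheus --web.listen-address="0.0.0.0:9080"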

[root@eck prometheus]# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@eck prometheus]# chown prometheus:prometheus -R /usr/local/prometheus
[root@eck prometheus]# nano /usr/lib/systemd/system/prometheus.service

[Unit]
Description=The Prometheus Server
After=network.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data --log.level=info

[Install]
WantedBy=multi-user.target
[root@eck system]# systemctl daemon-reload
[root@eck system]# systemctl start prometheus
[root@eck system]# systemctl status prometheus

● prometheus.service - The Prometheus Server
Loaded: loaded (/usr/lib/systemd/system/prometheus.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-28 14:24:08 CST; 6s ago
Main PID: 40225 (prometheus)
CGroup: /system.slice/prometheus.service
└─40225 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.139Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=221.285µs
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.139Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.140Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.140Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=186.407µs wal_replay_duration=591.84…uration=1.071424ms
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:996 level=info msg="TSDB started"
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=2.101397ms db_…µs
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Hint: Some lines were ellipsized, use -l to show in full.

[root@eck system]# systemctl enable prometheus
Created symlink from /etc/systemd/system/multi-user.target.wants/prometheus.service to /usr/lib/systemd/system/prometheus.service.
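
Rather than reading the log by eye, readiness can also be polled over HTTP. The sketch below assumes curl is installed and relies on Prometheus's /-/ready endpoint; the function name wait_ready and the WAIT_INTERVAL variable are invented here for illustration.

```shell
# Hypothetical helper: poll a URL until the probe succeeds or the attempts run out.
# $1 = URL, $2 = max attempts (default 10), $3 = probe command (default: curl)
wait_ready() {
    url=$1
    attempts=${2:-10}
    probe=${3:-curl -fsS -o /dev/null}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # $probe is intentionally unquoted so "curl -fsS ..." splits into words
        if $probe "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-2}"
    done
    return 1
}
```

For example, wait_ready http://192.168.1.48:9090/-/ready 15 && echo "Prometheus is ready" waits up to about 30 seconds. The third argument exists only so the probe can be swapped out, for instance in a test.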

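P.S. The startup flags of recent Prometheus releases differ from those of older ones: many flags have been renamed or removed, and much of the sample configuration found online targets old versions, so check carefully before copying it.


Stop
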
[root@eck prometheus] systemctl stop prometheus

Grafana

Installation

Download Grafana | Grafana Labs


Create a user

[root@eck] useradd -s /sbin/nologin -M grafana

Deployment

[root@eck] mkdir /usr/local/grafana
[root@eck] tar zxvf grafana-enterprise-9.0.5.linux-amd64.tar.gz -C /usr/local/grafana/
[root@eck grafana-9.0.5]# cd /usr/local/grafana/grafana-9.0.5
[root@eck grafana-9.0.5]# mv * ../
[root@eck grafana-9.0.5]# cd ../
[root@eck grafana]# rm -rf grafana-9.0.5/
[root@eck grafana]# mkdir /usr/local/grafana/data
[root@eck grafana]# chown -R grafana:grafana /usr/local/grafana
[root@eck grafana]# nano /usr/local/grafana/conf/defaults.ini

Change the following parameters:

#################################### Paths ###############################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
#data = data
data = /usr/local/grafana/data


# Temporary files in `data` directory older than given duration will be removed
temp_data_lifetime = 24h

# Directory where grafana can store logs
#logs = data/log
logs = /usr/local/grafana/log


# Directory where grafana will automatically scan and look for plugins
#plugins = data/plugins
plugins = /usr/local/grafana/plugins


# folder that contains provisioning config files that grafana will apply on startup and while running.
#provisioning = conf/provisioning
provisioning = /usr/local/grafana/conf/provisioning

[root@eck grafana]# nano /usr/lib/systemd/system/grafana.service

Add the following content:

[Unit]
Description=Grafana
After=network.target

[Service]
User=grafana
Group=grafana
Type=notify
ExecStart=/usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana
Restart=on-failure
RestartSec=15s

[Install]
WantedBy=multi-user.target
[root@eck grafana]# systemctl daemon-reload
[root@eck grafana]# systemctl start grafana
[root@eck grafana]# systemctl status grafana

● grafana.service - Grafana
Loaded: loaded (/usr/lib/systemd/system/grafana.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-28 14:04:55 CST; 4s ago
Main PID: 33239 (grafana-server)
CGroup: /system.slice/grafana.service
└─33239 /usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana

Jul 28 14:04:50 eck grafana-server[33239]: logger=query_data t=2022-07-28T14:04:50.010286641+08:00 level=info msg="Query Service initialization"
Jul 28 14:04:50 eck grafana-server[33239]: logger=live.push_http t=2022-07-28T14:04:50.034617786+08:00 level=info msg="Live Push Gateway initialization"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert t=2022-07-28T14:04:55.216538416+08:00 level=warn msg="failed to delete old am configs" org=1 err="database is locked"
Jul 28 14:04:55 eck grafana-server[33239]: logger=infra.usagestats.collector t=2022-07-28T14:04:55.313131685+08:00 level=info msg="registering usage stat providers" usageStatsProvidersLen=2
Jul 28 14:04:55 eck grafana-server[33239]: logger=report t=2022-07-28T14:04:55.314348363+08:00 level=warn msg="Scheduling and sending of reports disabled, SMTP is not configured and enabled. Configure SMTP to enable."
Jul 28 14:04:55 eck grafana-server[33239]: logger=grafanaStorageLogger t=2022-07-28T14:04:55.316123829+08:00 level=info msg="storage starting"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert t=2022-07-28T14:04:55.317403277+08:00 level=info msg="warming cache for startup"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert.multiorg.alertmanager t=2022-07-28T14:04:55.317616335+08:00 level=info msg="starting MultiOrg Alertmanager"
Jul 28 14:04:55 eck systemd[1]: Started Grafana.
Jul 28 14:04:55 eck grafana-server[33239]: logger=http.server t=2022-07-28T14:04:55.321708016+08:00 level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=

[root@eck grafana]# systemctl enable grafana
Created symlink from /etc/systemd/system/multi-user.target.wants/grafana.service to /usr/lib/systemd/system/grafana.service.

Login

192.168.1.48:3000

Username: admin

Password: admin


Node_exporter

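Prometheus aggregates the data collected by node_exporter on multiple servers, and Grafana presents it visually, so node_exporter has to be installed on every machine that is to be monitored.


Installation

Releases · prometheus/node_exporter (github.com)
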
[root@eck zip]# tar zxvf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local
[root@eck zip]# mv /usr/local/node_exporter-1.3.1.linux-amd64/ /usr/local/node_exporter
[root@eck zip]# nano /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=The node_exporter Server
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
RestartSec=15s
SyslogIdentifier=node_exporter

[Install]
WantedBy=multi-user.target
[root@eck zip]# systemctl daemon-reload
[root@eck zip]# systemctl start node_exporter
[root@eck zip]# systemctl status node_exporter

● node_exporter.service - The node_exporter Server
Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-28 15:01:19 CST; 3s ago
Main PID: 53582 (node_exporter)
CGroup: /system.slice/node_exporter.service
└─53582 /usr/local/node_exporter/node_exporter

Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=thermal_zone
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=time
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=timex
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=udp_queues
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=uname
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=vmstat
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=xfs
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=zfs
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.365Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.367Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false

[root@eck zip]# systemctl enable node_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /usr/lib/systemd/system/node_exporter.service.

Access: 192.168.1.48:9100


Deployment

[root@eck ~]# nano /usr/local/prometheus/prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["192.168.1.48:9090"]
  - job_name: "canal"
    static_configs:
      - targets: ["192.168.1.48:11112"]
  - job_name: "group1"
    static_configs:
      - targets: ["192.168.1.48:9100"]

[root@eck ~]# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Checking /usr/local/prometheus/prometheus.yml
SUCCESS: /usr/local/prometheus/prometheus.yml is valid prometheus config file syntax

[root@eck ~]# systemctl restart prometheus
[root@eck ~]# systemctl status prometheus
● prometheus.service - The Prometheus Server
Loaded: loaded (/usr/lib/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2022-07-28 15:09:59 CST; 3s ago
Main PID: 57357 (prometheus)
CGroup: /system.slice/prometheus.service
└─57357 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --log.level info

Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.224Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.238Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.239Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=2 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.239Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=41.469µs wal_replay_duration=22.7899…ration=23.179317ms
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.240Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.241Z caller=main.go:996 level=info msg="TSDB started"
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.241Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=17.238657ms db…µs
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Hint: Some lines were ellipsized, use -l to show in full.

Mysqld-exporter

Download

Releases · prometheus/mysqld_exporter (github.com)


Deployment

[root@mysql zip]# tar zxvf mysqld_exporter-0.14.0.linux-amd64.tar.gz -C /usr/local
[root@mysql zip]# mv /usr/local/mysqld_exporter-0.14.0.linux-amd64/ /usr/local/mysqld_exporter
[root@mysql zip]# cd /usr/local/mysqld_exporter/
[root@mysql mysqld_exporter]# nano /usr/local/mysqld_exporter/my.cnf
[client]
host=192.168.1.49
port=33106
user=mysql_monitor
password=test123

-- Run in the database (remember to change the password)
USE mysql;
CREATE USER 'mysql_monitor' IDENTIFIED BY 'test123' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'%';
-- The localhost variant below did not work:
-- CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY 'test123' WITH MAX_USER_CONNECTIONS 3;
-- GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'localhost';
FLUSH PRIVILEGES;
EXIT

[root@mysql mysqld_exporter]# nano /usr/lib/systemd/system/mysqld_exporter.service
[Unit]
Description=The mysqld_exporter Server
After=network.target

[Service]
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/my.cnf --log.level=info
Restart=on-failure
RestartSec=15s
SyslogIdentifier=mysqld_exporter

[Install]
WantedBy=multi-user.target
[root@mysql mysqld_exporter]# systemctl daemon-reload
[root@mysql mysqld_exporter]# systemctl start mysqld_exporter
[root@mysql mysqld_exporter]# systemctl status mysqld_exporter
● mysqld_exporter.service - The mysqld_exporter Server
Loaded: loaded (/usr/lib/systemd/system/mysqld_exporter.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2022-07-29 09:23:11 CST; 3s ago
Main PID: 32377 (mysqld_exporter)
Tasks: 6
Memory: 8.3M
CGroup: /system.slice/mysqld_exporter.service
└─32377 /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/my.cnf --log.level=info

Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:277 level=info msg="Starting mysqld_exporter" version="(version=0.14.0, branch=HEAD, revision=ca1b9af82...b1aac73e7b68c)"
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:278 level=info msg="Build context" (gogo1.17.8,userroot@401d370ca42e,date20220304-16:25:15)=(MISSING)
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_status
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_variables
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=slave_status
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.query_response_time
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmp
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmpmem
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:303 level=info msg="Listening on address" address=:9104
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.621Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
Hint: Some lines were ellipsized, use -l to show in full.
[root@mysql mysqld_exporter]# systemctl enable mysqld_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld_exporter.service to /usr/lib/systemd/system/mysqld_exporter.service.

[root@mysql mysqld_exporter]# journalctl -f -u mysqld_exporter
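
The steps above do not show how mysqld_exporter is registered with Prometheus. Presumably it follows the same pattern as the canal and node_exporter jobs, so a small generator for that stanza may help. This is only a sketch: the function name scrape_job and the job name "mysql" are assumptions, the host is taken from my.cnf on the assumption that the exporter runs on the MySQL machine, and the port 9104 comes from the "Listening on address" log line.

```shell
# Hypothetical helper: print a static scrape job in the layout used by prometheus.yml.
# $1 = job name, $2 = host:port of the exporter
scrape_job() {
    printf '  - job_name: "%s"\n' "$1"
    printf '    static_configs:\n'
    printf '      - targets: ["%s"]\n' "$2"
}

scrape_job mysql 192.168.1.49:9104
```

The printed lines would go under scrape_configs in /usr/local/prometheus/prometheus.yml, followed by promtool check config and systemctl restart prometheus as in the node_exporter section.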

MySQL Overview dashboard for Grafana | Grafana Labs


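Panel reference

Each entry gives the panel name followed by the Meaning, Param, and Other columns of the original table. Empty columns are omitted.

MySQL Uptime
  Meaning: MySQL uptime
  Other: How long the MySQL server has been running since its last restart.

Current QPS
  Meaning: Queries per second
  Other: Taken from the result of MySQL's SHOW STATUS command, this is the number of statements the server executed during the last second. Unlike the Questions variable, it includes statements executed inside stored programs.

InnoDB Buffer Pool Size
  Meaning: InnoDB buffer pool
  Other: InnoDB maintains a storage area called the buffer pool, which caches data and indexes in memory. Understanding how the buffer pool works, and using it to keep frequently accessed data in memory, is one of the most important aspects of MySQL tuning. The goal is to keep the working set in memory. In most cases this value should be between 60% and 90% of the memory available on the host.
  Query: show status like 'Innodb_buffer_pool_resize%';

MySQL Connections
  Meaning: Number of connections
  Param:
    1. Connections: number of attempts to connect to the MySQL server
    2. Max Connections: maximum number of client connections allowed to be open at the same time
    3. Max Used Connections: highest number of connections that have been open at the same time so far
    4. Threads Connected: number of connections currently open
  Other: The maximum number of connections in use simultaneously since the server started.

MySQL Client Thread Activity
  Meaning: Active client threads
  Other: Number of threads that are not sleeping.

MySQL Questions
  Meaning: Number of statements executed by the server
  Other: Unlike the queries used for the QPS calculation, this counts only statements sent to the server by clients, not statements executed inside stored programs.

MySQL Thread Cache
  Meaning: Thread cache
  Param:
    1. Thread Cache Size: maximum number of threads the cache can hold. Threads of closed connections are placed in this cache and reused by new connections instead of creating new threads. When the cache holds free threads, MySQL can answer connection requests quickly without creating a thread per connection. Each cached thread typically uses 256KB of memory.
    2. Threads Created: total number of threads created to handle connections
  Other: When a client disconnects, its thread is put into the cache if the cache is not full.

MySQL Temporary Objects
  Meaning: Temporary table information
  Param:
    1. Created_tmp_tables: number of temporary tables the server created in memory while processing SQL statements. If this value is too high, the only remedy is to optimize the queries.
    2. Created_tmp_disk_tables: number of temporary tables the server created on disk while processing SQL statements. If this value is high, possible causes are: a. queries created temporary tables while selecting BLOB or TEXT columns; b. tmp_table_size and max_heap_table_size may be too small.
    3. Created_tmp_files: number of temporary files created by the server

MySQL Select Types
  Param:
    1. Select Full Join: number of multi-table joins performed without using an index. This is a performance killer; the SQL should be optimized.
    2. Select Full Range Join: number of multi-table joins that used a range search on a reference table
    3. Select Range: number of multi-table joins that used a range on the first table
    4. Select Range Check: number of query plans that re-check index usage for every row during a join, which is very expensive. If this value is high or growing, some queries are not finding a good index.
    5. Select Scan: number of multi-table joins that did a full scan of the first table

MySQL Sorts
  Meaning: Sort usage
  Param:
    1. Sort Rows: number of rows sorted
    2. Sort Range: number of sorts done using a range
    3. Sort Merge Passes: number of times a query led to a file sort. Optimize the SQL or increase sort_buffer_size as appropriate.
    4. Sort Scan: number of sorts done by scanning the whole table
  Other: Shows how the sort facility is currently being used.

MySQL Slow Queries
  Meaning: Slow query usage
  Param:
    1. Slow Queries: number of slow queries (execution time longer than long_query_time)
  Other: Shows current slow query activity.

MySQL Aborted Connections
  Meaning: Aborted connections
  Param:
    1. Aborted Clients: number of connections dropped because the client did not close them properly
    2. Aborted Connects: number of failed attempts to connect to the MySQL server
  Other: When a given host connects to MySQL and the connection is interrupted midway (for example because of bad credentials), MySQL records that information in a system table.

MySQL Table Locks
  Meaning: Table-level lock usage
  Param:
    1. Table Locks Immediate: number of table lock requests that were granted immediately without waiting
    2. Table Locks Waited: number of table lock requests that had to wait at the server level (storage-engine locks such as InnoDB row locks do not increase this counter). If this value is high or growing, there is a serious concurrency bottleneck.
  Other: MySQL takes several different kinds of locks for various reasons. This graph shows how many table-level locks MySQL requested from the storage engine.

MySQL Network Traffic
  Meaning: Network traffic
  Param:
    1. Bytes Sent: bytes sent
    2. Bytes Received: bytes received
  Other: How much network traffic MySQL generates. Outbound is traffic sent by MySQL, inbound is traffic received by MySQL.

MySQL Network Usage Hourly
  Meaning: Hourly network traffic
  Other: How much network traffic MySQL generates per hour.

MySQL Internal Memory Overview
  Meaning: Memory overview
  Param:
    1. Key Buffer Size: size of the key cache
  Other: Memory used by the database.

Top Command Counters
  Meaning: Command counters
  Other: How many times MySQL executed each kind of command (over the last second).

Top Command Counters Hourly
  Meaning: Command counters (hourly)

MySQL Handlers
  Meaning: Request counts
  Param:
    1. Handler_write: number of requests to insert a row into a table
    2. Handler_update: number of requests to update a row in a table
    3. Handler_delete: number of requests to delete a row from a table
    4. Handler_read_first: number of requests to read the first entry in an index
    5. Handler_read_key: number of requests to read a row based on an index value
    6. Handler_read_next: number of requests to read the next row in index order
    7. Handler_read_prev: number of requests to read the previous row in reverse index order
    8. Handler_read_rnd: number of requests to read a row based on its position
    9. Handler_read_rnd_next: number of requests to read the next row. If this number is high, it means that many statements need (the sentence is cut off in the source)

MySQL Transaction Handlers
  Param:
    1. Handler Commit: number of requests to commit a transaction
    2. Handler Rollback: number of requests to roll back a transaction
    3. Handler Savepoint: number of requests to create a transaction savepoint
    4. Handler Savepoint Rollback: number of requests to roll back to a transaction savepoint

Process States

Top Process States Hourly

MySQL Query Cache Memory

MySQL Query Cache Activity

MySQL File Openings

MySQL Open Files
  Meaning: Number of files currently open
  Other: If this is close to open_files_limit, the limit should be raised.

MySQL Table Open Cache Status
  Other: See MYSQL实践心得:table_open_cache的设置 - 盘思动 - 博客园 (cnblogs.com)

MySQL Open Tables
  Meaning: Total number of tables the MySQL server has opened (including explicitly defined temporary tables)
  Other: If this value is high, consider carefully whether to enlarge the table cache (table_open_cache).

MySQL Table Definition Cache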

Percona监控MySQL模板详解 - 爱码网 (likecs.com)


Redis_exporter

Installation

Releases · oliver006/redis_exporter (github.com)

useradd -s /sbin/nologin -M redisporter
tar -xvf redis_exporter-v1.43.0.linux-386.tar.gz -C /usr/local
cd /usr/local
mv redis_exporter-v1.43.0.linux-386/ redis_exporter
chown redisporter:redisporter -R redis_exporter
nano /usr/lib/systemd/system/redis_exporter.service
[Unit]
Description=The redis_exporter Server
After=network.target

[Service]
User=redisporter
Group=redisporter
ExecStart=/usr/local/redis_exporter/redis_exporter -redis.addr 192.168.0.89:6379 -redis.password test123
Restart=on-failure
RestartSec=15s
SyslogIdentifier=redis_exporter

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start redis_exporter
systemctl status redis_exporter
systemctl enable redis_exporter

Prometheus

nano /usr/local/prometheus/prometheus.yml

  - job_name: "redis"
    static_configs:
      - targets: ["192.168.0.89:9121"]

systemctl restart prometheus

Access (on the exporter host): 127.0.0.1:9121/metrics
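
The metrics page is long, so it can help to pull out a single value. The following is a minimal sketch, assuming curl and awk are available; metric_value is a name invented here, and it only handles samples without labels.

```shell
# Hypothetical helper: print the value of the first unlabelled sample named $1
# from Prometheus text-format metrics read on stdin.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For instance, curl -s http://127.0.0.1:9121/metrics | metric_value redis_up should print 1 when the exporter can reach Redis. The same helper works against the other exporters, for example with mysql_up on port 9104.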


Elasticsearch_exporter

Prometheus + Grafana(十)系统监控之Elasticsearch - 曹伟雄 - 博客园 (cnblogs.com)

Installation

Releases · prometheus-community/elasticsearch_exporter (github.com)

useradd -s /sbin/nologin -M esporter
tar zxvf elasticsearch_exporter-1.5.0.linux-386.tar.gz -C /usr/local
mv /usr/local/elasticsearch_exporter-1.5.0.linux-386/ /usr/local/elasticsearch_exporter
chown esporter:esporter -R /usr/local/elasticsearch_exporter
nano /usr/lib/systemd/system/elasticsearch_exporter.service
[Unit]
Description=The elasticsearch_exporter Server
After=network.target

[Service]
User=esporter
Group=esporter
ExecStart=/usr/local/elasticsearch_exporter/elasticsearch_exporter --es.all --es.indices --es.cluster_settings --es.indices_settings --es.shards --es.snapshots --es.uri http://elastic:test123!%40%23@192.168.0.88:9200
Restart=on-failure
RestartSec=15s
SyslogIdentifier=elasticsearch_exporter

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start elasticsearch_exporter
systemctl status elasticsearch_exporter
systemctl enable elasticsearch_exporter

Prometheus

nano /usr/local/prometheus/prometheus.yml

  - job_name: "elasticsearch"
    static_configs:
      - targets: ["192.168.0.88:9114"]

systemctl restart prometheus

Access (on the exporter host): 127.0.0.1:9114/metrics
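
The --es.uri value in the unit file carries the password with @ and # written as %40 and %23. One way to produce such an encoding in plain shell is sketched below, assuming ASCII-only input; urlencode is a name invented here and is not a tool shipped with the exporter.

```shell
# Hypothetical helper: percent-encode an ASCII string for use inside a URL.
# Runs in a subshell (note the parentheses) so its variables do not leak.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Calling urlencode 'test123!@#' prints test123%21%40%23; it encodes the ! as well, which is harmless. When the result is pasted into ExecStart, keep in mind that systemd treats % as the start of a specifier, and %% is the documented way to write a literal percent sign there.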


Alertmanager

Release 0.24.0 / 2022-03-24 · prometheus/alertmanager (github.com)

Prometheus监控+Grafana+Alertmanager告警安装使用 (图文详解) - 九卷 - 博客园 (cnblogs.com)

tar -zxvf alertmanager-0.24.0.linux-386.tar.gz -C /usr/local
mv /usr/local/alertmanager-0.24.0.linux-386 /usr/local/alertmanager
nano /usr/local/prometheus/prometheus.yml

  - job_name: "alertmanager"
    static_configs:
      - targets: ["eck:9093"]

useradd -s /sbin/nologin -M alertmanager
chown -R alertmanager:alertmanager /usr/local/alertmanager
nano /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=The alertmanager Server
After=network.target

[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure
RestartSec=15s
SyslogIdentifier=alertmanager

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start alertmanager
systemctl status alertmanager
systemctl enable alertmanager
systemctl restart prometheus
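
To see whether Alertmanager accepts alerts at all, a hand-made alert can be posted to its v2 API. The sketch below only builds the request body; the function name alert_payload and the label values are assumptions made for illustration.

```shell
# Hypothetical helper: build a one-alert JSON body for Alertmanager's v2 API.
# $1 = alertname label, $2 = severity label
# (values must not contain double quotes or backslashes)
alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$1" "$2"
}
```

It could be sent with curl -XPOST -H 'Content-Type: application/json' -d "$(alert_payload TestAlert warning)" http://eck:9093/api/v2/alerts, after which the alert should show up in the Alertmanager web UI on port 9093. This exercises Alertmanager on its own: the steps above only add a scrape job, while the alerting target in prometheus.yml is still commented out.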

MySQL

Scheduled backup

MysqlDump

nano /home/bak/mysql/data/backup.sh

#!/bin/bash

BACKUP_DIR=/home/bak/mysql/data
MYSQL_BIN=/usr/local/mysql/bin
MYSQL_USER=root
MYSQL_PWD='test123!@#'   # quoted so the shell keeps the special characters literally
TODAY=$(date +%F)

# Delete the backups taken exactly 7 days ago
EXPIRED=$(date +%F -d "-7 day")
rm -f "${BACKUP_DIR}"/*_"${EXPIRED}".sql.gz

# List all databases except the system schemas (-x matches whole names only)
DB_NAMES=$("${MYSQL_BIN}/mysql" -u"${MYSQL_USER}" -p"${MYSQL_PWD}" -e "show databases;" | sed '1d' | grep -vxE 'information_schema|mysql|performance_schema|sys')

# Dump each database into its own gzip-compressed file
for DB in ${DB_NAMES}
do
    "${MYSQL_BIN}/mysqldump" -u"${MYSQL_USER}" -p"${MYSQL_PWD}" --compact -B "${DB}" | gzip > "${BACKUP_DIR}/${DB}_${TODAY}.sql.gz"
done
chmod +x /home/bak/mysql/data/backup.sh
bash /home/bak/mysql/data/backup.sh
crontab -e
00 01 * * * /home/bak/mysql/data/backup.sh
crontab -l
service crond restart
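
The script removes only the files dated exactly seven days back, so a backup can linger if the job skips a day. An age-based cleanup is one alternative. This is a sketch assuming GNU find; prune_backups is a name invented here.

```shell
# Hypothetical helper: delete *.sql.gz files older than N days from a directory.
# $1 = backup directory, $2 = retention in days
prune_backups() {
    find "$1" -maxdepth 1 -type f -name '*.sql.gz' -mtime +"$2" -delete
}
```

Calling prune_backups /home/bak/mysql/data 7 at the top of backup.sh could stand in for the date-based rm line. Note that -mtime +7 matches files whose age, counted in whole days, is greater than 7.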

External access to Grafana

nginx

nano /etc/nginx/sites-available/eck

#grafana
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 3000;
    root /usr/share/nginx/www;
    index index.html index.htm;

    location ^~ /grafana/ {
        proxy_pass http://eck:3000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        add_header 'Access-Control-Allow-Origin' $http_origin;
        add_header 'Access-Control-Allow-Credentials' 'true' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
        add_header 'Access-Control-Expose-Headers' 'Content-Type,Content-Length,Content-Range';
        add_header 'Access-Control-Allow-Headers' 'Accept, Authorization, Cache-Control, Content-Type, DNT, If-Modified-Since, Keep-Alive, Origin, User-Agent, X-Requested-With' always;

        if ($request_method = 'OPTIONS') {
            return 204;
        }
    }

    # Proxy Grafana Live WebSocket connections.
    # /grafana/api/live/... is forwarded as /api/live/..., matching the prefix stripping above.
    location ^~ /grafana/api/live {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $http_host;
        proxy_pass http://eck:3000/api/live;
    }
}

#prometheus
server {
    listen 9090;
    server_name eck;
    location / {
        proxy_pass http://eck:9090;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

#node_exporter
server {
    listen 9100;
    server_name eck;
    location / {
        proxy_pass http://eck:9100;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Grafana

nano /usr/local/grafana/conf/defaults.ini

domain = 192.168.0.88
root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana
serve_from_sub_path = true