Deploying an etcd database cluster

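Reference documentation

1. Install the cfssl certificate generation tool (only needs to be done on master1)

cfssl is an open-source certificate management tool. It generates certificates from JSON files and is more convenient to use than openssl.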

cd /usr/local/bin
# Note the capital -O (output file); a lowercase -o would only write wget's log to that path
wget --no-check-certificate https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -O /usr/local/bin/cfssl
wget --no-check-certificate https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -O /usr/local/bin/cfssljson
wget --no-check-certificate https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -O /usr/local/bin/cfssl-certinfo


chmod +x /usr/local/bin/cfssl*

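The download site is hosted overseas and can be hard to reach. If you need the files, send me an email.

4. Create the etcd configuration file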

cat > /opt/etcd/cfg/etcd.conf << EOF
#[Member]
# Node name: change this to each node's own name
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/data/etcd/default.etcd"
# IP of the current etcd node
ETCD_LISTEN_PEER_URLS="https://192.168.100.45:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.100.45:2379"

#[Clustering]
# IP of the current etcd node
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.100.45:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.100.45:2379"
# Names and IPs of all nodes in the etcd cluster
ETCD_INITIAL_CLUSTER="etcd-1=https://192.168.100.45:2380,etcd-2=https://192.168.100.47:2380,etcd-3=https://192.168.100.48:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
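  • ETCD_NAME: node name, unique within the cluster
  • ETCD_DATA_DIR: data directory
  • ETCD_LISTEN_PEER_URLS: listen address for cluster (peer) traffic
  • ETCD_LISTEN_CLIENT_URLS: listen address for client traffic
  • ETCD_INITIAL_ADVERTISE_PEER_URLS: peer address advertised to the rest of the cluster
  • ETCD_ADVERTISE_CLIENT_URLS: client address advertised to clients
  • ETCD_INITIAL_CLUSTER: addresses of all cluster nodes
  • ETCD_INITIAL_CLUSTER_TOKEN: cluster token
  • ETCD_INITIAL_CLUSTER_STATE: state when joining the cluster; new means a new cluster, existing means joining an existing cluster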
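
To make the relationship between these settings concrete, here is a minimal shell sketch that prints an etcd.conf for a given node name and IP. The function name render_etcd_conf and the CLUSTER variable are illustrative and not part of etcd; the addresses are the three-node example used in this guide.

```shell
# Sketch only: render_etcd_conf and CLUSTER are illustrative names.
# CLUSTER lists every member and is identical on all nodes.
CLUSTER="etcd-1=https://192.168.100.45:2380,etcd-2=https://192.168.100.47:2380,etcd-3=https://192.168.100.48:2380"

render_etcd_conf() {
    name=$1   # unique member name, e.g. etcd-2
    ip=$2     # IP of the node this file is meant for
    cat <<EOF
#[Member]
ETCD_NAME="${name}"
ETCD_DATA_DIR="/data/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
ETCD_INITIAL_CLUSTER="${CLUSTER}"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
}

# Print the file for the second node; on the real node you would
# redirect the output to /opt/etcd/cfg/etcd.conf.
render_etcd_conf etcd-2 192.168.100.47
```

Only the name and the four per-node URLs change between nodes; the cluster list, token, and state stay the same everywhere.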

    5. Copy all etcd files to the other etcd nodes with scp

    # On master: sync the etcd directory (including the binaries) to the other etcd nodes
    cd /opt
    scp -P22 -r etcd root@192.168.100.47:/opt/ # node1
    scp -P22 -r etcd root@192.168.100.48:/opt/ # node2

    6. Edit the etcd configuration file on node1 and node2

    cat /opt/etcd/cfg/etcd.conf
    #[Member]
    # Change the name: etcd-2 on node 2, etcd-3 on node 3
    ETCD_NAME="etcd-2"
    ETCD_DATA_DIR="/data/etcd/default.etcd"
    # Change to the IP of the current server
    ETCD_LISTEN_PEER_URLS="https://192.168.100.47:2380"
    ETCD_LISTEN_CLIENT_URLS="https://192.168.100.47:2379"

    #[Clustering]
    # Change to the IP of the current server
    ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.100.47:2380"
    ETCD_ADVERTISE_CLIENT_URLS="https://192.168.100.47:2379"
    ETCD_INITIAL_CLUSTER="etcd-1=https://192.168.100.45:2380,etcd-2=https://192.168.100.47:2380,etcd-3=https://192.168.100.48:2380"
    ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
    # Initial cluster state; new means a new cluster is being created
    ETCD_INITIAL_CLUSTER_STATE="new"
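
Instead of editing the copied file by hand, the per-node values could be rewritten with sed. This is a rough sketch under the assumption that the file uses the KEY="value" layout shown above; retarget_conf is a hypothetical helper name, not an etcd tool.

```shell
# Sketch only: retarget_conf is a hypothetical helper.
retarget_conf() {
    conf=$1   # path to the copied etcd.conf
    name=$2   # this node's member name
    ip=$3     # this node's IP
    # Only ETCD_NAME and the four per-node URL settings are rewritten.
    # ETCD_INITIAL_CLUSTER is left alone because it must list every member.
    sed -e "s|^ETCD_NAME=.*|ETCD_NAME=\"${name}\"|" \
        -e "s|^\(ETCD_LISTEN_PEER_URLS\)=.*|\1=\"https://${ip}:2380\"|" \
        -e "s|^\(ETCD_INITIAL_ADVERTISE_PEER_URLS\)=.*|\1=\"https://${ip}:2380\"|" \
        -e "s|^\(ETCD_LISTEN_CLIENT_URLS\)=.*|\1=\"https://${ip}:2379\"|" \
        -e "s|^\(ETCD_ADVERTISE_CLIENT_URLS\)=.*|\1=\"https://${ip}:2379\"|" \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example on node1:
#   retarget_conf /opt/etcd/cfg/etcd.conf etcd-2 192.168.100.47
```

A blanket search-and-replace of the master IP would also change ETCD_INITIAL_CLUSTER, which is why the sketch matches on the setting names instead. As a side effect, anything after the value on a rewritten line is dropped.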

    7. Manage etcd.service with systemd

    On all nodes:
    cat > /etc/systemd/system/etcd.service << EOF
    [Unit]
    Description=Etcd Server
    After=network.target
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=notify
    EnvironmentFile=/opt/etcd/cfg/etcd.conf
    ExecStart=/opt/etcd/bin/etcd \
    --cert-file=/opt/etcd/ssl/server.pem \
    --key-file=/opt/etcd/ssl/server-key.pem \
    --peer-cert-file=/opt/etcd/ssl/server.pem \
    --peer-key-file=/opt/etcd/ssl/server-key.pem \
    --trusted-ca-file=/opt/etcd/ssl/ca.pem \
    --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem \
    --logger=zap
    Restart=on-failure
    LimitNOFILE=65536

    [Install]
    WantedBy=multi-user.target
    EOF

    # Enable etcd at boot on all nodes

    systemctl daemon-reload
    systemctl enable etcd

    # Create the etcd data directory on all nodes

    mkdir -p /data/etcd

8. Start the three etcd nodes one after another

Start etcd on master, node1, and node2 in turn.

An error was reported on startup:

tail -f /var/log/messages 
Nov 19 00:45:42 192-168-100-45 etcd: recognized and used environment variable ETCD_NAME=etcd-1# 这里各节点需要改成自己的节点名称

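Fix: a value in the config file must not be followed by a comment on the same line. As the log shows, the trailing text was read as part of ETCD_NAME. Remove any such comments; they are there only as explanation.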
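
If a config file already contains such trailing comments, they could be stripped with sed. A small sketch, assuming every value is double-quoted as in this guide; strip_trailing_comments is an illustrative name.

```shell
# Sketch only: strip_trailing_comments is an illustrative helper.
# Reads the named file (or stdin) and prints the cleaned result.
strip_trailing_comments() {
    # KEY="value"   # comment   ->   KEY="value"
    # Whole-line comments such as #[Member] do not match and are kept.
    sed -e 's/^\([A-Za-z_][A-Za-z0-9_]*="[^"]*"\)[[:space:]]*#.*$/\1/' "$@"
}

printf '%s\n' 'ETCD_NAME="etcd-1" # per-node name' | strip_trailing_comments
# prints ETCD_NAME="etcd-1"
```

To clean a real file, write the output to a temporary file first and then move it over /opt/etcd/cfg/etcd.conf.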

If only one node is started, the command line hangs: etcd is waiting internally for the other nodes to start.

systemctl daemon-reload 
systemctl enable etcd
systemctl start etcd

9. Finally, check the etcd cluster status

ETCDCTL_API=3 /opt/etcd/bin/etcdctl \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" endpoint health --write-out=table

+-----------------------------+--------+-------------+-------+
| ENDPOINT                    | HEALTH | TOOK        | ERROR |
+-----------------------------+--------+-------------+-------+
| https://192.168.100.45:2379 | true   | 25.056174ms |       |
| https://192.168.100.47:2379 | true   | 26.951606ms |       |
| https://192.168.100.48:2379 | true   | 27.763509ms |       |
+-----------------------------+--------+-------------+-------+

If you see the output above, the cluster has been deployed successfully.

If something goes wrong, check the logs first: /var/log/messages or journalctl -u etcd
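
For use in a script, the table printed by endpoint health can be checked mechanically. The sketch below works on saved or piped table output and assumes the column layout shown above; all_healthy is an illustrative name.

```shell
# Sketch only: all_healthy is an illustrative helper.
# Reads the table from stdin; succeeds only if there is at least one
# data row and every row has HEALTH equal to true.
all_healthy() {
    awk -F'|' '
        NF >= 5 && $2 !~ /ENDPOINT/ {   # data rows; borders and header are skipped
            rows++
            gsub(/ /, "", $3)           # HEALTH column
            if ($3 != "true") bad++
        }
        END {
            if (rows > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

all_healthy <<'EOF' && echo "cluster healthy"
| https://192.168.100.45:2379 | true | 25.056174ms | |
| https://192.168.100.47:2379 | true | 26.951606ms | |
EOF
# prints cluster healthy
```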

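10. Adding and removing nodes

Reference material

Run these operations on the leader node. If you do not know which node is the leader, run the following command to check: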

ETCDCTL_API=3 /opt/etcd/bin/etcdctl   --cacert=/opt/etcd/ssl/ca.pem   --cert=/opt/etcd/ssl/server.pem   --key=/opt/etcd/ssl/server-key.pem   --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" endpoint status --write-out=table

+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                    | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.100.45:2379 | 1c208007db945734 | 3.4.22  | 20 kB   | true      | false      | 139       | 15         | 15                 |        |
| https://192.168.100.47:2379 | d3126f94175c8ff7 | 3.4.22  | 20 kB   | false     | false      | 139       | 15         | 15                 |        |
| https://192.168.100.48:2379 | 6e42f9ee8f2cb08d | 3.4.22  | 20 kB   | false     | false      | 139       | 15         | 15                 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
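
The leader can also be picked out of that table automatically. A short sketch, assuming the column order shown above (IS LEADER is the fifth column); leader_endpoint is an illustrative name.

```shell
# Sketch only: leader_endpoint is an illustrative helper.
# Reads "endpoint status --write-out=table" output from stdin and
# prints the endpoint whose IS LEADER column is true.
leader_endpoint() {
    awk -F'|' '
        NF >= 10 && $2 !~ /ENDPOINT/ {   # data rows only
            ep = $2; lead = $6
            gsub(/ /, "", ep); gsub(/ /, "", lead)
            if (lead == "true") print ep
        }
    '
}

leader_endpoint <<'EOF'
| https://192.168.100.45:2379 | 1c208007db945734 | 3.4.22 | 20 kB | true | false | 139 | 15 | 15 | |
| https://192.168.100.47:2379 | d3126f94175c8ff7 | 3.4.22 | 20 kB | false | false | 139 | 15 | 15 | |
EOF
# prints https://192.168.100.45:2379
```

In practice the etcdctl command above would be piped straight into leader_endpoint.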

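Adding a member: note that the command takes only a few parameters. If you need to add several members, add them one at a time; each new member must be started before the next one can be added.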

# Add the member
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" member add etcd-4 --peer-urls="https://192.168.100.46:2380"

Member 1afea86762407038 added to cluster ebec0a3ac5e92070

ETCD_NAME="etcd-4"
ETCD_INITIAL_CLUSTER="etcd-4=https://192.168.100.46:2380,etcd-1=https://192.168.100.45:2380,etcd-3=https://192.168.100.48:2380,etcd-2=https://192.168.100.47:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.100.46:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

# After the add, the new member's STATUS is unstarted
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" member list --write-out=table
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+
| ID               | STATUS    | NAME   | PEER ADDRS                  | CLIENT ADDRS                | IS LEARNER |
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+
| 1afea86762407038 | unstarted | etcd-4 | https://192.168.100.46:2380 | https://192.168.100.46:2379 | false      |
| 1c208007db945734 | started   | etcd-1 | https://192.168.100.45:2380 | https://192.168.100.45:2379 | false      |
| 6e42f9ee8f2cb08d | started   | etcd-3 | https://192.168.100.48:2380 | https://192.168.100.48:2379 | false      |
| d3126f94175c8ff7 | started   | etcd-2 | https://192.168.100.47:2380 | https://192.168.100.47:2379 | false      |
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+

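On etcd-4, prepare the configuration file and then start etcd:

  • Copy the certificate files and the systemd unit file from etcd-1 to the corresponding locations on etcd-4.

  • In the configuration file, change new to existing and update the hostname and IP. (ETCD_INITIAL_CLUSTER_STATE must be set to existing. If it is left as new, etcd generates a new member ID that does not match the ID created when the member was added, and the log reports a member ID mismatch error.)

  • Regenerate the certificate with the IP of kd4 added, sync it to all 4 nodes, and restart etcd on the first 3 nodes. (Our certificate already included this IP when it was generated, so it does not need to be regenerated here.)

  • On the first three nodes, add the new member to the configuration file (restarting them can wait until later).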

ETCD_INITIAL_CLUSTER="etcd-1=https://192.168.100.45:2380,etcd-2=https://192.168.100.47:2380,etcd-3=https://192.168.100.48:2380,etcd-4=https://192.168.100.46:2380"
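
Updating ETCD_INITIAL_CLUSTER on the existing nodes could be scripted along these lines. A sketch, assuming the double-quoted KEY="value" layout used in this guide; append_member is a hypothetical helper name.

```shell
# Sketch only: append_member is a hypothetical helper.
append_member() {
    conf=$1       # path to etcd.conf
    name=$2       # new member name, e.g. etcd-4
    peer_url=$3   # new member peer URL
    # Do nothing if the member is already listed.
    if grep -q "^ETCD_INITIAL_CLUSTER=.*${name}=" "$conf"; then
        return 0
    fi
    # Insert ",name=url" just before the closing quote of the value.
    sed "s|^\(ETCD_INITIAL_CLUSTER=\"[^\"]*\)\"|\1,${name}=${peer_url}\"|" \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example on each of the first three nodes:
#   append_member /opt/etcd/cfg/etcd.conf etcd-4 https://192.168.100.46:2380
```

The pattern requires `="` directly after ETCD_INITIAL_CLUSTER, so the ETCD_INITIAL_CLUSTER_TOKEN and ETCD_INITIAL_CLUSTER_STATE lines are not touched.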
ETCDCTL_API=3 /opt/etcd/bin/etcdctl   --cacert=/opt/etcd/ssl/ca.pem   --cert=/opt/etcd/ssl/server.pem   --key=/opt/etcd/ssl/server-key.pem   --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379,https://192.168.100.46:2379" endpoint status --write-out=table
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                    | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.100.45:2379 | 1c208007db945734 | 3.4.22  | 20 kB   | true      | false      | 146       | 32         | 32                 |        |
| https://192.168.100.47:2379 | d3126f94175c8ff7 | 3.4.22  | 20 kB   | false     | false      | 146       | 32         | 32                 |        |
| https://192.168.100.48:2379 | 6e42f9ee8f2cb08d | 3.4.22  | 25 kB   | false     | false      | 146       | 32         | 32                 |        |
| https://192.168.100.46:2379 | 1afea86762407038 | 3.4.22  | 20 kB   | false     | false      | 146       | 32         | 32                 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Error 1

tail -f /var/log/messages

request sent was ignored (cluster ID mismatch: peer[c39bdec535db1fd5]=cdf818194e3a8c32, local=b0daaba520989844)

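# Fix: clear the stale etcd data on the node that reports the error
# Use the directory your ETCD_DATA_DIR points into: /data/etcd in this guide, /var/lib/etcd on some other setups
cd /data/etcd/ && rm -rf ./*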
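
A more cautious variant is to move the old data aside instead of deleting it, so it can still be inspected or restored. This is a substitute for the rm command above, not what was originally run; reset_data_dir is a hypothetical helper and the backup naming is an assumption.

```shell
# Sketch only: reset_data_dir is a hypothetical helper.
# Moves the directory to <dir>.bak.<timestamp>, recreates it empty,
# and prints the backup path.
reset_data_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && mkdir -p "$dir" && echo "$backup"
}

# Assumed order on the affected node only:
#   systemctl stop etcd
#   reset_data_dir /data/etcd
#   systemctl start etcd
```

Only do this on the node that failed to join; clearing the data directory of a healthy member discards that member's copy of the data.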

Removing a node

Look up the member ID

ETCDCTL_API=3 /opt/etcd/bin/etcdctl   --cacert=/opt/etcd/ssl/ca.pem   --cert=/opt/etcd/ssl/server.pem   --key=/opt/etcd/ssl/server-key.pem   --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" member list --write-out=table
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
| ID               | STATUS  | NAME   | PEER ADDRS                  | CLIENT ADDRS                | IS LEARNER |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
| 1c208007db945734 | started | etcd-1 | https://192.168.100.45:2380 | https://192.168.100.45:2379 | false      |
| 6e42f9ee8f2cb08d | started | etcd-3 | https://192.168.100.48:2380 | https://192.168.100.48:2379 | false      |
| d3126f94175c8ff7 | started | etcd-2 | https://192.168.100.47:2380 | https://192.168.100.47:2379 | false      |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+

# Remove the member
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.100.45:2379,https://192.168.100.47:2379,https://192.168.100.48:2379" member remove a8589aa8629b731b

Removed member a8589aa8629b731b from cluster
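
Since member remove takes an ID rather than a name, a small lookup can reduce copy-and-paste mistakes. A sketch that reads member list table output, assuming the column order shown above; member_id is an illustrative name.

```shell
# Sketch only: member_id is an illustrative helper.
# Reads "member list --write-out=table" output from stdin and prints
# the ID of the member with the given name.
member_id() {
    awk -F'|' -v want="$1" '
        NF >= 7 {
            id = $2; name = $4
            gsub(/ /, "", id); gsub(/ /, "", name)
            if (name == want) print id
        }
    '
}

member_id etcd-3 <<'EOF'
| 1c208007db945734 | started | etcd-1 | https://192.168.100.45:2380 | https://192.168.100.45:2379 | false |
| 6e42f9ee8f2cb08d | started | etcd-3 | https://192.168.100.48:2380 | https://192.168.100.48:2379 | false |
EOF
# prints 6e42f9ee8f2cb08d
```

The printed ID is what gets passed to member remove; check it against the table before removing anything.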