k8s deploy
install kubelet kubeadm kubectl
All nodes must install these packages.
- Step 1
```bash
# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
```
- Step 2
Note: v1.32 in the examples below is not the latest version; change it to the latest release when installing.
CentOS/RHEL/Rocky
```bash
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
```
Ubuntu/Debian
```bash
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
```
- Step 3
CentOS/RHEL/Rocky
```bash
# all nodes
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
```
Ubuntu/Debian
```bash
sudo apt install -y kubelet kubeadm kubectl
```
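The upstream install guide also suggests pinning these packages so routine apt upgrades don't move the cluster to an unplanned version:
```bash
# prevent kubelet/kubeadm/kubectl from being upgraded implicitly
sudo apt-mark hold kubelet kubeadm kubectl
```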
- Step 4: disable swap
The kubelet service requires swap to be disabled.
```bash
sudo swapoff -a
```
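swapoff -a only lasts until the next reboot; to keep swap off permanently, comment out the swap entries in /etc/fstab. A minimal sketch (back up /etc/fstab first):
```bash
# comment out every fstab line that mounts swap (idempotent)
sudo sed -i '/\sswap\s/ s/^#*/#/' /etc/fstab
```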
install Container Runtimes
Each runtime exposes a Unix domain socket; see the socket table in the "kubeadm deploy cluster" section below.
- Enable IPv4 packet forwarding
```bash
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
```
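Verify that the setting took effect:
```bash
sysctl net.ipv4.ip_forward   # expected output: net.ipv4.ip_forward = 1
```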
- cgroup driver (for containerd and CRI-O)
To set systemd as the cgroup driver, edit the cgroupDriver option of the KubeletConfiguration and set it to systemd. For example:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```
Since v1.22, when creating a cluster with kubeadm, if the user does not set the cgroupDriver field under KubeletConfiguration, kubeadm defaults it to systemd.
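A minimal sketch of passing an explicit KubeletConfiguration to kubeadm by appending it as a second YAML document to the config file (the filename kubeadm-config.yaml is an arbitrary choice here):
```bash
# append a KubeletConfiguration document to the kubeadm config file,
# then run: kubeadm init --config kubeadm-config.yaml
cat <<EOF >> kubeadm-config.yaml
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
```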
containerd
init config.toml
```bash
# the /etc/containerd directory must be created in advance
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
```
Configuring the systemd cgroup driver
```bash
sudo vim /etc/containerd/config.toml
```
```toml
# make sure the cri plugin is NOT disabled:
#disabled_plugins = ["cri"]

# and set runc to use the systemd cgroup driver:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```
```bash
sudo systemctl restart containerd
```
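To confirm the edit is active after the restart, dump the merged configuration (containerd config dump prints the final config containerd is running with):
```bash
sudo containerd config dump | grep SystemdCgroup   # expect: SystemdCgroup = true
```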
Once containerd is running, you can use its local CLI tools ctr and crictl.
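A quick sanity check with both tools (Kubernetes-managed containers live in containerd's k8s.io namespace):
```bash
sudo ctr namespaces ls          # the k8s.io namespace appears once pods run
sudo ctr -n k8s.io images ls    # images pulled by the kubelet
sudo crictl ps                  # running containers, via the CRI socket
```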
check containerd version
```bash
containerd --version
# containerd containerd.io 1.7.19
```
CRI-O (optional)
set repo
```bash
cat <<EOF | sudo tee /etc/yum.repos.d/cri-o.repo
[cri-o]
name=CRI-O
baseurl=https://mirrors.ustc.edu.cn/kubernetes/addons:/cri-o:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.ustc.edu.cn/kubernetes/addons:/cri-o:/stable:/v1.30/rpm/repodata/repomd.xml.key
EOF
```
install
```bash
dnf install -y container-selinux
dnf install -y cri-o
systemctl start crio.service
```
bootstrap cluster
```bash
swapoff -a
modprobe br_netfilter
sysctl -w net.ipv4.ip_forward=1
```
Docker Engine (optional)
install Docker first (see the existing Docker installation docs)
then install cri-dockerd; its CRI socket is /run/cri-dockerd.sock by default.
```bash
# download
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.16/cri-dockerd-0.3.16.amd64.tgz

# extract the tgz and install the binary
tar -xzvf cri-dockerd-0.3.16.amd64.tgz
sudo mv ./cri-dockerd /usr/local/bin/   # move the cri-dockerd binary into PATH

# install the systemd units
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.service
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.socket
sudo mv cri-docker.socket cri-docker.service /etc/systemd/system/
sudo sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service

# configure auto-restart: make sure the [Service] section of cri-docker.service
# contains Restart=always (Restart= is a [Service] option, not a [Socket] option)
vim /etc/systemd/system/cri-docker.service

# enable the units; cri-docker.socket -> triggers -> cri-docker.service
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket
sudo systemctl status cri-docker.socket
```
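A short verification that cri-dockerd is serving the CRI socket (the endpoint path matches the default noted above):
```bash
systemctl is-active cri-docker.socket cri-docker.service
ls -l /run/cri-dockerd.sock
sudo crictl --runtime-endpoint unix:///run/cri-dockerd.sock version
```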
crictl
crictl is a CRI client that can inspect containers running on Kubernetes nodes.
All nodes should install it.
- Configure crictl for containerd
```bash
vim /etc/crictl.yaml
```
```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock # for Docker Engine use unix:///var/run/cri-dockerd.sock
```
- verify (the output includes the runtime status and registry/mirror configuration)
```bash
sudo crictl info
```
- commands
```bash
sudo crictl pods   # list all Pods
```
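A few other crictl subcommands that are handy for node-level debugging:
```bash
sudo crictl ps -a                 # all containers, including exited ones
sudo crictl images                # images known to the runtime
sudo crictl logs <container-id>   # logs of a container
sudo crictl inspect <container-id>
```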
kubeadm deploy cluster
To use a different container runtime, or if more than one runtime is installed on the provisioned node, specify the --cri-socket argument to kubeadm.
| Runtime | Path to Unix domain socket |
| --- | --- |
| containerd | unix:///var/run/containerd/containerd.sock |
| CRI-O | unix:///var/run/crio/crio.sock |
| Docker Engine (using cri-dockerd) | unix:///var/run/cri-dockerd.sock |
Initializing control-plane node
bootstrap cluster
```bash
# use flannel as the Pod network add-on (CNI plugin); 10.244.0.0/16 is the subnet flannel expects
sudo kubeadm init --apiserver-advertise-address=10.220.32.16 --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/containerd/containerd.sock   # change the socket if using another CRI

# alternatively, drive everything from a config file:
sudo kubeadm config images pull --config kubeadm.conf   # pre-pull the required images
sudo kubeadm init --config kubeadm.conf                 # apiserver-advertise-address etc. are set inside kubeadm.conf
```
(optional) reset (roll back kubeadm init)
Run a reset when kubeadm init fails, or when you need to change the init configuration:
```bash
sudo kubeadm reset
sudo rm -rf /etc/cni/net.d
sudo rm -rf $HOME/.kube/config
sudo systemctl restart kubelet   # restarting kubelet is mandatory
```
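Note that kubeadm reset does not flush iptables or IPVS rules; the kubeadm documentation suggests cleaning them manually if needed:
```bash
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
# only if kube-proxy was running in IPVS mode:
sudo ipvsadm -C
```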
Use kubectl
```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
install flannel
```bash
# apply the flannel (CNI plugin) manifest
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
```
To use a different image registry for flannel, download kube-flannel.yml with wget first, then point the image fields at the mirror, e.g.:
```
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel:v0.26.4
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io/flannel-cni-plugin:v1.6.2-flannel1
```
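A sketch of the registry swap, assuming every image reference in the manifest starts with the ghcr.io/flannel-io prefix:
```bash
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# rewrite the image registry prefix to the mirror
sed -i 's#ghcr.io/flannel-io#swr.cn-north-4.myhuaweicloud.com/ddn-k8s/ghcr.io/flannel-io#g' kube-flannel.yml
kubectl apply -f kube-flannel.yml
```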
check that the CoreDNS Pods are Running
```bash
kubectl get pods --all-namespaces
NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
kube-system   coredns-7b5944fdcf-mgwq2            1/1     Running   0          4h22m
kube-system   coredns-7b5944fdcf-sxfvt            1/1     Running   0          4h22m
kube-system   etcd-dingo7232                      1/1     Running   1          4h22m
kube-system   kube-apiserver-dingo7232            1/1     Running   1          4h22m
kube-system   kube-controller-manager-dingo7232   1/1     Running   1          4h22m
kube-system   kube-proxy-w2lfk                    1/1     Running   0          4h22m
kube-system   kube-scheduler-dingo7232            1/1     Running   1          4h22m
```
kubeadm.conf info
```bash
sudo kubeadm config print init-defaults > kubeadm.conf
```
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.20.7.232
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: dingo7232
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.30.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
```
join node
Join worker nodes to the cluster (each worker node must already have kubelet, kubeadm, and kubectl installed).
```bash
# shangdi 16
sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```
Note: kubeadm tokens have a TTL. Use kubeadm token list to check whether a token is still valid; if it has expired, create a new one with kubeadm token create --print-join-command, as shown below.
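Both commands run on the control-plane node:
```bash
kubeadm token list                          # check existing tokens and their TTLs
kubeadm token create --print-join-command   # mint a fresh token plus the full join command
```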
- Check on the control plane
```bash
kubectl get nodes
```
check all pods
```bash
kubectl get pods --all-namespaces
NAMESPACE      NAME                               READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-gmdks              1/1     Running   0          56s
kube-flannel   kube-flannel-ds-qzmb9              1/1     Running   0          10m
kube-flannel   kube-flannel-ds-tvdjg              1/1     Running   0          3m58s
kube-system    coredns-55cb58b774-7wwq7           1/1     Running   0          74m
kube-system    coredns-55cb58b774-z8v87           1/1     Running   0          74m
kube-system    etcd-dingo127                      1/1     Running   0          74m
kube-system    kube-apiserver-dingo127            1/1     Running   0          74m
kube-system    kube-controller-manager-dingo127   1/1     Running   0          74m
kube-system    kube-proxy-6wv22                   1/1     Running   0          74m
kube-system    kube-proxy-mc6nt                   1/1     Running   0          56s
kube-system    kube-proxy-zv8fp                   1/1     Running   0          3m58s
kube-system    kube-scheduler-dingo127            1/1     Running   0          74m
```
plugin
dashboard
install online
```bash
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
```
install offline
```bash
helm upgrade --install kubernetes-dashboard kubernetes-dashboard-7.5.0.tgz --create-namespace --namespace kubernetes-dashboard
```
uninstall
```bash
helm delete kubernetes-dashboard --namespace kubernetes-dashboard
```
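After installation, a common way to reach the dashboard is a port-forward plus a ServiceAccount token. A sketch, assuming the chart's default kubernetes-dashboard-kong-proxy service name (run kubectl -n kubernetes-dashboard get svc if yours differs):
```bash
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
# in another shell, create a login token for an existing ServiceAccount:
kubectl -n kubernetes-dashboard create token <service-account-name>
```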
best practices
change etcd default port
step 1: init kubeadm config
Only the parameters being changed need to be set; the configuration below moves etcd's default ports to 12379 and 12380.
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock # Docker Engine is the container runtime here
  name: dingo127
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.30.0
networking:
  podSubnet: "10.244.0.0/16" # equivalent to the --pod-network-cidr command-line flag
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    advertise-address: "172.30.14.127"
    etcd-servers: "https://127.0.0.1:12379" # etcd service address
etcd:
  local:
    dataDir: /var/lib/k8s-etcd
    extraArgs:
      advertise-client-urls: "https://172.30.14.127:12379"
      initial-advertise-peer-urls: "https://172.30.14.127:12380"
      listen-client-urls: "https://127.0.0.1:12379,https://172.30.14.127:12379"
      listen-peer-urls: "https://172.30.14.127:12380"
      initial-cluster: "dingo127=https://172.30.14.127:12380"
      listen-metrics-urls: "http://127.0.0.1:12381"
```
Note: in kubeadm.k8s.io/v1beta3, extraArgs is a map[string]string; with kubeadm.k8s.io/v1beta4, extraArgs becomes a list instead:
```yaml
...
apiServer:
  extraArgs:
  - name: "advertise-address"
    value: "172.30.14.127"
  - name: "etcd-servers"
    value: "https://127.0.0.1:12379"
...
etcd:
  local:
    extraArgs:
    - name: "advertise-client-urls"
      value: "https://172.30.14.127:12379"
    - name: "initial-advertise-peer-urls"
      value: "https://172.30.14.127:12380"
    - name: "listen-client-urls"
      value: "https://127.0.0.1:12379,https://172.30.14.127:12379"
    - name: "listen-peer-urls"
      value: "https://172.30.14.127:12380"
    - name: "initial-cluster"
      value: "dingo127=https://172.30.14.127:12380"
    - name: "listen-metrics-urls"
      value: "http://127.0.0.1:12381"
```
step 2: run kubeadm init
```bash
sudo kubeadm init --config=kubeadm.conf --ignore-preflight-errors=Port-2379,Port-2380
```
kubeadm's preflight checks verify that etcd's default ports 2379 and 2380 are free; since etcd now uses custom ports, the checks for those two ports must be skipped.
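Once init succeeds, you can confirm that etcd is listening on the custom ports:
```bash
sudo ss -tlnp | grep -E '12379|12380|12381'
```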
rejoin a worker node
step 1: remove the node
```bash
kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <old-node-name>
```
step 2: reset kubeadm
```bash
# reset kubeadm on the worker node
sudo kubeadm reset -f   # add --cri-socket unix:///var/run/cri-dockerd.sock if using Docker Engine
# restart kubelet on the worker node
sudo systemctl restart kubelet
```
step 3: join the node
```bash
sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>   # add --cri-socket unix:///var/run/cri-dockerd.sock if using Docker Engine
```
use kubectl from another node
```bash
# method 1: copy the admin kubeconfig from the control plane
mkdir -p $HOME/.kube
scp <control-plane-ip>:/etc/kubernetes/admin.conf $HOME/.kube/config
```
log
init control-plane (kubeadm init)
```
[reset] Reading configuration from the cluster...
```
join node to cluster
```
[preflight] Running pre-flight checks
```
troubleshoot
flannel
Failed to check br_netfilter: stat /proc/sys/net/bridge/bridge-nf-call-iptables: no such file or directory
Solution: load the br_netfilter module.
Check if br_netfilter is loaded:
```bash
lsmod | grep br_netfilter
```
If not loaded, load the module:
```bash
sudo modprobe br_netfilter
```
Enable br_netfilter on boot by adding the following line to /etc/modules-load.d/k8s.conf:
```
br_netfilter
```
Ensure bridge-nf-call-iptables and bridge-nf-call-ip6tables are set to 1:
```bash
sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo sysctl net.bridge.bridge-nf-call-ip6tables=1
```
Persist the settings by adding the following to /etc/sysctl.d/k8s.conf:
```
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
```
Reload sysctl settings:
```bash
sudo sysctl --system
```
containerd
change containerd’s default data path
Identify the current data path:
```bash
containerd config default | grep "root"   # Expected output: root = "/var/lib/containerd"
```
Modify the following line in /etc/containerd/config.toml:
```toml
root = "/path/to/new/data/path" # the location where container data (images, volumes) is stored
```
Move existing data (if required):
```bash
sudo systemctl stop containerd   # optional but safer
sudo mv /var/lib/containerd /data/containerd   # the destination must match the new root value
sudo systemctl start containerd
```
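A quick check that containerd picked up the new root after the move (containerd config dump prints the merged runtime config):
```bash
containerd config dump | grep '^root'
sudo crictl info > /dev/null && echo "CRI endpoint healthy"
```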
kubelet service failed
The connection to the server 10.220.32.16:6443 was refused - did you specify the right host or port?
First check whether the API server process is up and listening on port 6443.
- disable swap and start kubelet
```bash
sudo swapoff -a
sudo systemctl start kubelet
```
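If the connection is still refused, a few checks that usually localize the problem:
```bash
sudo systemctl status kubelet                 # is kubelet itself running?
sudo journalctl -u kubelet --no-pager -n 50   # recent kubelet logs
sudo ss -tlnp | grep 6443                     # is the API server listening?
```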
control-plane NotReady
Restart the kubelet service on the control-plane node:
```bash
systemctl restart kubelet
```