Last updated: February 1, 2023

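This post describes how to completely remove calico from a calico-based cluster and then install and configure cilium as the cluster's CNI.

Why does the title say "lossy migration"? Because the cluster network is interrupted during the migration and no pod can work normally. For a lossless approach, I previously came across an article on the Jetstack blog that is worth a read if you are interested. In a test environment it hardly matters whether the migration is lossy, but I do not recommend doing this in production, and I doubt anyone would actually do it there anyway.

The calico cluster used here was deployed as described in the earlier post "k8s series 13: deploying a highly available k8s cluster with calico in BGP mode".

I have also written earlier posts on k8s fundamentals and on several cluster deployment options, which may be useful if you need them.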

1. Cluster information

1.1 Node information

[root@k8s-calico-master-10-31-90-1 ~]# kubectl get nodes -o wide
NAME                                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
k8s-calico-master-10-31-90-1.tinychen.io   Ready    control-plane   24d   v1.26.0   10.31.90.1    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14
k8s-calico-master-10-31-90-2.tinychen.io   Ready    control-plane   24d   v1.26.0   10.31.90.2    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14
k8s-calico-master-10-31-90-3.tinychen.io   Ready    control-plane   24d   v1.26.0   10.31.90.3    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14
k8s-calico-worker-10-31-90-4.tinychen.io   Ready    <none>          24d   v1.26.0   10.31.90.4    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14
k8s-calico-worker-10-31-90-5.tinychen.io   Ready    <none>          24d   v1.26.0   10.31.90.5    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14
k8s-calico-worker-10-31-90-6.tinychen.io   Ready    <none>          24d   v1.26.0   10.31.90.6    <none>        CentOS Linux 7 (Core)   6.1.4-1.el7.elrepo.x86_64   containerd://1.6.14

1.2 IP information

IP	Hostname
10.31.90.0	k8s-calico-apiserver.tinychen.io
10.31.90.1	k8s-calico-master-10-31-90-1.tinychen.io
10.31.90.2	k8s-calico-master-10-31-90-2.tinychen.io
10.31.90.3	k8s-calico-master-10-31-90-3.tinychen.io
10.31.90.4	k8s-calico-worker-10-31-90-4.tinychen.io
10.31.90.5	k8s-calico-worker-10-31-90-5.tinychen.io
10.31.90.6	k8s-calico-worker-10-31-90-6.tinychen.io
10.33.0.0/17	podSubnet
10.33.128.0/18	serviceSubnet
10.33.192.0/18	LoadBalancerSubnet
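
As a quick sanity check on the address plan above, the following sketch expands each CIDR into its first and last address using plain shell arithmetic. The helper names (ip_to_int, int_to_ip, cidr_range) are made up for this illustration and are not part of any tool used in this post.

```shell
# Convert a dotted IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back into dotted IPv4 notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last address covered by a CIDR such as 10.33.0.0/17.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  first=$(( base & ~(size - 1) ))
  last=$(( first + size - 1 ))
  echo "$(int_to_ip "$first") $(int_to_ip "$last")"
}

# The three subnets from the table above.
for cidr in 10.33.0.0/17 10.33.128.0/18 10.33.192.0/18; do
  echo "$cidr -> $(cidr_range "$cidr")"
done
```

The three ranges come out adjacent and non-overlapping, so together they cover 10.33.0.0/16.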

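1.3 Goal of the change

The goal of this change is to remove the existing calico, install cilium in its place, and enable kubeProxyReplacement together with BGP route reachability.

2. Removing calico

If calico was deployed from yaml and you still have the original files, you can uninstall it directly with them:

kubectl delete -f tigera-operator.yaml --grace-period=0 --force
kubectl delete -f custom-resources.yaml --grace-period=0 --force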

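For deploying the CNI, see the self-managed K8S tutorial on the official calico site. It offers two main approaches: the Calico operator and Calico manifests. The operator approach deploys a calico operator into the cluster as a deployment, which then manages lifecycle operations such as installing and upgrading calico. The manifests approach manages everything through yaml configuration files; it is more cumbersome to maintain than the operator, but has some advantages for heavily customized K8S clusters.

The uninstall usually leaves something behind, so check again for leftover resources: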

# List all resources whose names contain calico or tigera:
kubectl get all --all-namespaces | egrep "calico|tigera"

# List the namespaced API resources whose names contain calico or tigera:
kubectl api-resources --verbs=list --namespaced -o name | egrep "calico|tigera"

# List the API resources whose names contain calico or tigera, this time without the --namespaced filter:
kubectl api-resources --verbs=list -o name  | egrep "calico|tigera"
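
The checks above list resource types; to see whether any instances of those types are still around, something like the following sketch may help. The function names are hypothetical, the loop is read-only, and it assumes a working kubectl context.

```shell
# Keep only lines that mention calico or tigera (case-insensitive).
# "|| true" keeps the exit status clean when nothing matches.
filter_calico() {
  grep -Ei 'calico|tigera' || true
}

# Hypothetical helper: for every calico/tigera API resource type,
# print the names of the objects that still exist. Nothing is deleted.
list_calico_leftovers() {
  kubectl api-resources --verbs=list -o name | filter_calico |
  while read -r kind; do
    echo "== $kind"
    kubectl get "$kind" --all-namespaces --ignore-not-found -o name
  done
}

# Usage (on a machine with cluster access):
#   list_calico_leftovers
#   kubectl get crd -o name | filter_calico
```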

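When a resource cannot be deleted, inspect its finalizers field to find out why:

# Dump the calico-node serviceaccount and check its finalizers and the conditions under status to locate the cause
kubectl get serviceaccounts calico-node -n calico-system -o yaml

If tigera.io/cni-protector in finalizers is what prevents the resource from being deleted, try changing it to finalizers: []. This looks like an upstream Kubernetes bug; related issues can be found on GitHub, mostly involving calico deployed with tigera-operator.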

This is an upstream Kubernetes (not AKS) issue. We can confirm this impacts 1.11.x - I believe this is the main upstream bug tracking this kubernetes/kubernetes#60807 however there are many bugs filed tracking this same behavior with the finalizers.

We will be unable to resolve this until a new upstream release which addresses this issue is released by the Kubernetes team. Marking as a known issue.

https://github.com/tigera/operator/issues/2031

https://github.com/projectcalico/calico/issues/6629

https://github.com/kubernetes/kubernetes/issues/60807
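
The finalizers: [] change mentioned above can be made with kubectl edit, or non-interactively with a merge patch. A minimal sketch, assuming the stuck object is the calico-node serviceaccount from the earlier example; clear_finalizers and the KUBECTL variable are names introduced here for illustration only.

```shell
# Allow the kubectl binary to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Replace the finalizers list of one object with an empty list.
# Arguments: resource kind, object name, namespace.
clear_finalizers() {
  "$KUBECTL" patch "$1" "$2" -n "$3" --type=merge \
    -p '{"metadata":{"finalizers":[]}}'
}

# Usage:
#   clear_finalizers serviceaccount calico-node calico-system
```

Clearing finalizers skips whatever cleanup they were guarding, so it is best kept for objects you have already confirmed are stuck.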

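Finally, delete the leftover CNI configuration files on every node, then reboot all machines in the cluster:

# Delete the configuration files under the cni directory
$ rm -rf /etc/cni/net.d/

Rebooting removes the routes, iptables rules, and CNI interfaces that calico created. If you would rather not reboot, you can clean them up by hand: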

# Flush the routes installed by bird
$ ip route flush proto bird

# Delete the calico interfaces
$ ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}

# Unload the ipip module
$ modprobe -r ipip

# Remove the calico iptables rules and chains: drop every line that mentions cali, then reload the rest
$ iptables-save | grep -iv cali | iptables-restore

# Clear the ipvsadm rules
$ ipvsadm -C
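
Whether you reboot or clean up by hand, it may be worth confirming that nothing calico-related is left on a node. The sketch below uses two hypothetical helpers that read command output on stdin, so they can be tried against saved output before running them on a live node.

```shell
# Print interface names starting with "cali" from `ip -o link show` output.
# Each input line looks like: "5: cali1234567890a@if4: <...> mtu 1500 ..."
cali_links() {
  awk -F': ' '$2 ~ /^cali/ { sub(/@.*/, "", $2); print $2 }'
}

# Count the lines of `iptables-save` output that still mention cali.
# grep -c exits non-zero when the count is 0, hence "|| true".
cali_rule_count() {
  grep -ci 'cali' || true
}

# Usage on a node (as root):
#   ip -o link show | cali_links
#   iptables-save | cali_rule_count
```

An empty interface list and a count of 0 suggest the node is clean.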

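3. Deploying cilium

I have covered cilium deployment several times on this blog, including overlay mode, BGP mode, kubeProxyReplacement mode, and eBPF parameter tuning; see the earlier summary link. Here we go straight to kubeProxyReplacement mode with kube-router providing BGP route reachability, and apply the eBPF tuning parameters as part of the same cilium deployment.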

3.1 kube-proxy

Because we are using cilium's kubeProxyReplacement mode, remove kube-proxy first:

# On a master node, back up the kube-proxy configuration
$ kubectl get ds -n kube-system kube-proxy -o yaml > kube-proxy-ds.yaml
$ kubectl get cm -n kube-system kube-proxy -o yaml > kube-proxy-cm.yaml

# Delete the kube-proxy daemonset
$ kubectl -n kube-system delete ds kube-proxy
daemonset.apps "kube-proxy" deleted
# Delete the kube-proxy configmap so that a later kubeadm upgrade of K8S does not reinstall kube-proxy (K8S 1.19 and later)
$ kubectl -n kube-system delete cm kube-proxy
configmap "kube-proxy" deleted

# On every machine, as root, clear the iptables rules, the ipvs rules, and the kube-ipvs0 interface
$ iptables-save | grep -v KUBE | iptables-restore
$ ipvsadm -C
$ ip link del kube-ipvs0

3.2 cilium

First install helm:

$ wget https://get.helm.sh/helm-v3.11.0-linux-amd64.tar.gz
$ tar -zxvf helm-v3.11.0-linux-amd64.tar.gz
$ cp -rp linux-amd64/helm /usr/local/bin/
$ helm version
version.BuildInfo{Version:"v3.11.0", GitCommit:"472c5736ab01133de504a826bd9ee12cbe4e7904", GitTreeState:"clean", GoVersion:"go1.18.10"}

Then add the repo:

$ helm repo add cilium https://helm.cilium.io/
"cilium" has been added to your repositories
$ helm repo list
NAME    URL
cilium  https://helm.cilium.io/

Finally install cilium and hubble:

SEED=$(head -c12 /dev/urandom | base64 -w0)

helm install cilium cilium/cilium --version 1.12.6 \
	--namespace kube-system \
	--set k8sServiceHost=10.31.90.0 \
	--set k8sServicePort=8443 \
	--set kubeProxyReplacement=strict \
	--set tunnel=disabled \
	--set ipam.mode=kubernetes \
	--set ipv4NativeRoutingCIDR=10.33.0.0/17 \
	--set lbExternalClusterIP=true \
	--set enableIPv4Masquerade=false \
	--set enableIPv6Masquerade=false \
	--set ipam.operator.clusterPoolIPv4PodCIDRList=10.33.0.0/17 \
	--set ipam.operator.clusterPoolIPv4MaskSize=24 \
	--set hubble.relay.enabled=true \
	--set hubble.ui.enabled=true \
	--set loadBalancer.algorithm=maglev \
	--set maglev.tableSize=65521 \
	--set maglev.hashSeed=$SEED \
	--set loadBalancer.mode=hybrid \
	--set socketLB.hostNamespaceOnly=true \
	--set loadBalancer.acceleration=native
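
A note on the SEED line in the command above: as far as I understand the Cilium documentation, maglev.hashSeed is expected to be a base64-encoded 12-byte random value, which is what head -c12 piped into base64 produces. The sketch below only regenerates a seed and checks its size; DECODED_BYTES is a name introduced here for illustration.

```shell
# Generate a 12-byte random seed, base64-encoded (same command as in the install step).
SEED=$(head -c12 /dev/urandom | base64 -w0)

# 12 bytes always encode to 16 base64 characters, with no "=" padding.
echo "seed length: ${#SEED}"

# Decode the seed again and count the bytes to confirm it round-trips to 12.
DECODED_BYTES=$(printf '%s' "$SEED" | base64 -d | wc -c | tr -d ' ')
echo "decoded bytes: $DECODED_BYTES"
```

It is probably worth saving the generated seed somewhere, so that a later helm upgrade can pass the same value instead of a fresh random one.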

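Once the deployment finishes, remember to check that all the cilium pods are healthy. At the same time, pods that went into Pending or Unknown because the CNI was missing are assigned new IPs and return to Running.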
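
One way to spot pods that have not recovered is to filter the pod list by its STATUS column. A small sketch; not_ready_pods is a hypothetical helper that reads the output of kubectl get pods -A --no-headers on stdin.

```shell
# Print "namespace/name status" for every pod whose STATUS column
# (4th field of `kubectl get pods -A --no-headers`) is neither
# Running nor Completed.
not_ready_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage:
#   kubectl get pods -A --no-headers | not_ready_pods
```

This only looks at the status string, so a pod that is Running but not yet ready (for example READY 0/1) is not reported.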

3.3 kube-router

kube-router is used mainly to advertise BGP routes so that pod IPs and LoadBalancer IPs are reachable. Start by downloading the yaml file for deploying kube-router:

$ curl -LO https://raw.githubusercontent.com/cloudnativelabs/kube-router/v1.2/daemonset/generic-kuberouter-only-advertise-routes.yaml

Configure the BGP peers in the arguments. Here I added two peers, 10.31.254.253 and 10.31.100.100. Adjust peer-router-ips, peer-router-asns, and cluster-asn below to match your own environment.

- --run-router=true
- --run-firewall=false
- --run-service-proxy=false
- --enable-cni=false
- --enable-pod-egress=false
- --enable-ibgp=true
- --enable-overlay=true
- --advertise-pod-cidr=true
- --advertise-cluster-ip=true
- --advertise-external-ip=true
- --advertise-loadbalancer-ip=true
- --bgp-graceful-restart=true
- --peer-router-ips=10.31.254.253,10.31.100.100
- --peer-router-asns=64512,64516
- --cluster-asn=64517
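
In the arguments above, each entry in peer-router-ips is paired by position with one entry in peer-router-asns (that is my reading of how the two lists are used), so the lists should have the same length. A small sketch that checks this before the yaml is applied; count_items and check_peers are hypothetical helper names.

```shell
# Count the items in a comma-separated list.
count_items() {
  echo "$1" | awk -F',' '{ print NF }'
}

# Compare the number of peer IPs with the number of peer ASNs.
# Returns non-zero on a mismatch.
check_peers() {
  ips=$(count_items "$1")
  asns=$(count_items "$2")
  if [ "$ips" -eq "$asns" ]; then
    echo "ok: $ips peers"
  else
    echo "mismatch: $ips IPs vs $asns ASNs"
    return 1
  fi
}

# The values used in this post.
check_peers "10.31.254.253,10.31.100.100" "64512,64516"
```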

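Finally deploy kube-router, making sure to pass the namespace parameter:

$ kubectl apply -f generic-kuberouter-only-advertise-routes.yaml -n kube-system

After deployment, check that the routes on each node and on the corresponding BGP peers are correct.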

# Check the cilium status
$ kubectl -n kube-system exec ds/cilium -- cilium status
# List the nodes of the k8s cluster
$ kubectl -n kube-system exec ds/cilium -- cilium node list
# List the services of the k8s cluster
$ kubectl -n kube-system exec ds/cilium -- cilium service list
# List the endpoints on the node where this cilium pod runs
$ kubectl -n kube-system exec ds/cilium -- cilium endpoint list

Once all of these cilium checks pass, you can be fairly confident that the cluster network is working normally.
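
Note that exec ds/cilium runs the command in only one pod of the daemonset. To repeat a check on every node, a loop over the cilium pods is one option. A sketch under a few assumptions: the pods carry the k8s-app=cilium label, and cilium_pods, cilium_status_all, and the KUBECTL variable are names introduced here.

```shell
# Allow the kubectl binary to be overridden (useful for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# List the cilium agent pods as pod/<name>, one per line.
cilium_pods() {
  "$KUBECTL" -n kube-system get pods -l k8s-app=cilium -o name
}

# Run a brief status check inside every cilium agent pod.
cilium_status_all() {
  cilium_pods | while read -r pod; do
    echo "== $pod"
    "$KUBECTL" -n kube-system exec "$pod" -- cilium status --brief
  done
}

# Usage:
#   cilium_status_all
```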
