Background:
In Kubernetes, making a component highly available means deploying multiple replicas of it, for example several apiserver, scheduler and controller-manager instances. The apiserver is stateless, so every replica can serve at the same time, while the scheduler and controller-manager are stateful: only one instance may be active at any given moment, so a leader has to be elected.
Kubernetes itself uses leaderelection for this. Among the built-in components, kube-scheduler and kube-controller-manager both perform leader election, and this mechanism is what guarantees their high availability: under normal conditions only one replica of kube-scheduler or kube-controller-manager runs the business logic, while the other replicas keep trying to acquire the lock and compete for leadership until one of them succeeds. If the current leader exits for some reason, or loses the lock, the remaining replicas compete again, and whichever wins the lock takes over the business logic.
This election strategy is not limited to the built-in components; services we define ourselves can use the same algorithm to elect a leader. The Kubernetes client-go package exposes an interface for this; the code lives under client-go/tools/leaderelection.
Refactoring plan:
Stateless components: simply adjust the replica count; because they are stateless, scaling the number of instances is enough (see the example command after this list).
Stateful components: use leader election, so that only one replica runs the business logic while the other replicas keep trying to acquire the lock.
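For the stateless case this is a one-line scaling operation; a sketch of the command, with the deployment name and namespace as placeholders:

$ kubectl scale deployment/<name> --replicas=3 -n <namespace>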
The election-based approach for stateful components is described in detail below.
Reference code for the election:
/*
Copyright 2018 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package main

import (
	"context"
	"flag"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/google/uuid"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func buildConfig(kubeconfig string) (*rest.Config, error) {
	if kubeconfig != "" {
		cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err
		}
		return cfg, nil
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	klog.InitFlags(nil)

	var kubeconfig string
	var leaseLockName string
	var leaseLockNamespace string
	var id string

	flag.StringVar(&kubeconfig, "kubeconfig", "", "absolute path to the kubeconfig file")
	flag.StringVar(&id, "id", uuid.New().String(), "the holder identity name")
	flag.StringVar(&leaseLockName, "lease-lock-name", "", "the lease lock resource name")
	flag.StringVar(&leaseLockNamespace, "lease-lock-namespace", "", "the lease lock resource namespace")
	flag.Parse()

	if leaseLockName == "" {
		klog.Fatal("unable to get lease lock resource name (missing lease-lock-name flag).")
	}
	if leaseLockNamespace == "" {
		klog.Fatal("unable to get lease lock resource namespace (missing lease-lock-namespace flag).")
	}

	// leader election uses the Kubernetes API by writing to a
	// lock object, which can be a LeaseLock object (preferred),
	// a ConfigMap, or an Endpoints (deprecated) object.
	// Conflicting writes are detected and each client handles those actions
	// independently.
	// Build the client from the kubeconfig of the current cluster
	// (or the in-cluster config when running inside a pod).
	config, err := buildConfig(kubeconfig)
	if err != nil {
		klog.Fatal(err)
	}
	client := clientset.NewForConfigOrDie(config)

	// Business logic to run once elected.
	run := func(ctx context.Context) {
		// complete your controller loop here
		klog.Info("Controller loop...")

		select {}
	}

	// use a Go context so we can tell the leaderelection code when we
	// want to step down
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// listen for interrupts or the Linux SIGTERM signal and cancel
	// our context, which the leader election code will observe and
	// step down
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-ch
		klog.Info("Received termination, signaling shutdown")
		cancel()
	}()

	// we use the Lease lock type since edits to Leases are less common
	// and fewer objects in the cluster watch "all Leases".
	// Use a Lease object as the lock.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      leaseLockName,
			Namespace: leaseLockNamespace,
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: id,
		},
	}

	// start the leader election code loop
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// IMPORTANT: you MUST ensure that any code you have that
		// is protected by the lease must terminate **before**
		// you call cancel. Otherwise, you could have a background
		// loop still running and another process could
		// get elected before your background loop finished, violating
		// the stated goal of the lease.
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// we're notified when we start - this is where you would
				// usually put your code
				// Logic executed after becoming the leader.
				run(ctx)
			},
			OnStoppedLeading: func() {
				// we can do cleanup here
				klog.Infof("leader lost: %s", id)
				// Logic executed after losing leadership.
				os.Exit(0)
			},
			OnNewLeader: func(identity string) {
				// we're notified when new leader elected
				// Logic executed when a new leader is elected.
				if identity == id {
					// I just got the lock
					return
				}
				klog.Infof("new leader elected: %s", identity)
			},
		},
	})
}
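Assuming the file above is saved as main.go, it can be run against a cluster directly; the flags correspond to those defined in the code, and the lease name, namespace and id values here are only illustrative:

$ go run main.go -kubeconfig=$HOME/.kube/config -lease-lock-name=example -lease-lock-namespace=default -id=1

While a replica holds leadership, the lock it writes can be inspected through the Lease object it creates:

$ kubectl describe lease example -n default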
Embedding in the business code:
The usual approach is to embed the election logic in the business code itself: start serving only after being elected leader, and stop serving once leadership is lost (a sketch follows the list below).
Pros: low resource usage; no sidecar is needed, and only the one elected business container serves.
Cons: the business code must be modified, although the official library keeps the change small.
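A minimal sketch of this embedding, assuming the same LeaseLock setup as the reference code above; the port, lease name and namespace are illustrative:

package main

import (
	"context"
	"net/http"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

// serve runs the business HTTP server for as long as this replica holds the lease.
func serve(ctx context.Context) {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok")) // stand-in for the real business handlers
	})
	srv := &http.Server{Addr: ":8080"}
	go func() {
		<-ctx.Done() // leadership lost or shutdown requested
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		srv.Shutdown(shutdownCtx) // stop serving once we are no longer leader
	}()
	klog.Info("became leader, serving on :8080")
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		klog.Error(err)
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := clientset.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // use the pod name as the holder identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "election-example", Namespace: "default"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: serve, // start the business service only as leader
			OnStoppedLeading: func() {
				klog.Infof("leader lost: %s", id)
				os.Exit(0) // exit and let Kubernetes restart the pod as a follower
			},
		},
	})
}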
Sidecar mode 1:
Do not touch the business code; when deploying the workload, add a sidecar container that performs the leader election.
Sidecar implementation idea:
When the sidecar becomes leader it starts a web server listening on port 8080; when it is not the leader it stops the web server.
In the deployment, point the sidecar container's readinessProbe at port 8080 so that readiness tracks the state of the web server (see the deployment sketch after the pros and cons below).
Pros: no changes to the business code; only an extra sidecar container is added to handle the election.
Cons: only one sidecar is ever ready, so the update strategy has to be changed to Recreate; during a restart the old instances must exit before new ones start, which also affects availability.
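A minimal deployment sketch of this pattern, assuming a hypothetical leader-elector sidecar image that serves on 8080 only while it is the leader; the image names and labels are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: election-example
  namespace: default
spec:
  replicas: 2
  strategy:
    type: Recreate                            # required, since only one replica can ever become fully ready
  selector:
    matchLabels:
      app: election-example
  template:
    metadata:
      labels:
        app: election-example
    spec:
      serviceAccountName: election-example    # RBAC for the election, see the access control section below
      containers:
      - name: business                        # unchanged business container
        image: business-app:latest
      - name: leader-elector                  # sidecar that performs the leader election
        image: leader-elector:latest
        readinessProbe:                       # tracks the web server, i.e. leadership
          httpGet:
            path: /
            port: 8080

With this in place, only the pod whose sidecar currently holds the lease reports all of its containers as ready: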
$ kubectl get pod -n default | grep election-example
election-example-789687f864-mvtkn 1/2 Running 0 89s
election-example-789687f864-rb5gt 2/2 Running 0 3m22s
Checking the deployment, the replicas likewise never all show as ready, since only the leader's pod can be fully ready:
$ kubectl get deployments -n default election-example
NAME READY UP-TO-DATE AVAILABLE AGE
election-example 0/2 2 0 34d
Sidecar mode 2:
Make a small change to the business code, and add a leader-election sidecar when deploying the workload.
Idea:
The sidecar starts a web server on port 8080 and exposes an HTTP endpoint that reports whether this replica is currently the leader.
The business container does not implement the election itself, but it periodically asks the sidecar whether it is the leader, and only serves while it is (see the sketch after this list).
This is a compromise for services where embedding the election code directly is difficult.
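A minimal sketch of such a sidecar, assuming a hypothetical /is-leader endpoint on port 8080; the endpoint path, port, lease name and namespace are illustrative, and the election parameters mirror the reference code above:

package main

import (
	"context"
	"net/http"
	"os"
	"sync/atomic"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	// isLeader is flipped by the election callbacks and read by the HTTP handler.
	var isLeader atomic.Bool

	// /is-leader returns 200 while this replica holds the lease, 503 otherwise.
	http.HandleFunc("/is-leader", func(w http.ResponseWriter, r *http.Request) {
		if isLeader.Load() {
			w.Write([]byte("yes"))
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("no"))
	})
	go func() {
		klog.Fatal(http.ListenAndServe(":8080", nil))
	}()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := clientset.NewForConfigOrDie(cfg)
	id, _ := os.Hostname() // use the pod name as the holder identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "election-example", Namespace: "default"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				isLeader.Store(true) // the business container now sees "yes"
				<-ctx.Done()         // keep the flag set until leadership is lost
			},
			OnStoppedLeading: func() {
				isLeader.Store(false)
				klog.Infof("leader lost: %s", id)
			},
		},
	})
}

The business container can then gate its work on something like curl -sf http://localhost:8080/is-leader run periodically, serving only while the request succeeds.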
Access control:
To run the election, the workload must be granted the corresponding RBAC permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: election-example
  namespace: default
rules:
- apiGroups:
  - "coordination.k8s.io"
  resources:
  - leases
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  verbs:
  - get
  - create
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: election-example
  namespace: default
subjects:
  - kind: ServiceAccount
    name: election-example
roleRef:
  kind: Role
  name: election-example
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: election-example
  namespace: default
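For these permissions to take effect, the pod that runs the election must also be started with this ServiceAccount; the relevant fragment of the deployment's pod spec would be:

spec:
  template:
    spec:
      serviceAccountName: election-example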