What is etcd?
etcd is a distributed, consistent, and highly available key-value store designed to reliably hold critical data for distributed systems. In Kubernetes, it serves as the single source of truth for the entire cluster, storing all cluster state data, including:
- Pod and Service configurations
- Secrets and ConfigMaps
- Node and network information
- Role-Based Access Control (RBAC) policies
Without etcd, Kubernetes cannot function—it is the backbone of the control plane.
Why Does Kubernetes Use etcd?
etcd was chosen for Kubernetes due to its unique properties:
- Strong Consistency: Uses the Raft consensus algorithm to ensure all etcd members agree on the data.
- High Availability: Survives member failures via replication, as long as a majority of members remains available.
- Watch Functionality: The kube-apiserver watches etcd for changes and relays them to other components in real time.
- Simplicity: Lightweight and purpose-built for small metadata.
Example: Storing a Pod’s State
When you create a Pod, the kube-apiserver writes its configuration to etcd:
# etcd key structure for a Pod
/registry/pods/<namespace>/<pod-name>
The scheduler and kubelet then read this data through the kube-apiserver to schedule and run the Pod. Only the kube-apiserver talks to etcd directly.
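To make the key layout concrete, here is a minimal shell sketch that builds the key for a Pod and looks it up with etcdctl. The helper names pod_key and show_pod_key are hypothetical, and the sketch assumes etcdctl v3 is installed and already configured (endpoints and certificates) through its environment variables.

```shell
# Build the etcd key under which the kube-apiserver stores a Pod.
# pod_key is a hypothetical helper name used only for illustration.
pod_key() {
  local namespace="$1" pod_name="$2"
  printf '/registry/pods/%s/%s\n' "$namespace" "$pod_name"
}

# List the key (not the value) to confirm the Pod is present in etcd.
# Assumes etcdctl can already reach and authenticate to the cluster.
show_pod_key() {
  ETCDCTL_API=3 etcdctl get "$(pod_key "$1" "$2")" --keys-only
}
```

For example, show_pod_key default nginx would query /registry/pods/default/nginx. The stored values are serialized by the kube-apiserver, so listing keys is usually more readable than dumping values.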
etcd Security Best Practices
etcd holds sensitive data, making security critical. Here’s how to protect it:
1. Enable Authentication
Restrict access using client certificates and username/password.
# Enable TLS client certificate authentication in etcd
etcd --client-cert-auth --trusted-ca-file=ca.crt --cert-file=server.crt --key-file=server.key
2. Encrypt Data in Transit and at Rest
- In Transit: Use TLS for communication between etcd peers and clients.
- At Rest: Enable encryption via Kubernetes EncryptionConfiguration.
# Example EncryptionConfiguration (api-server)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-key>
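The <base64-encoded-key> placeholder needs a random key. As a minimal sketch, assuming a Linux host with head and base64 available, a 32-byte key (the size aescbc expects) could be generated as follows. The function name generate_encryption_key is hypothetical.

```shell
# Generate a random 32-byte key and print it base64-encoded,
# suitable for the "secret" field of an aescbc provider.
generate_encryption_key() {
  head -c 32 /dev/urandom | base64
}
```

Treat the output like any other secret: store it only in the EncryptionConfiguration file on the control plane nodes, with restrictive file permissions.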
3. RBAC for etcd Access
Limit etcd access with etcd's built-in roles, granting each user only the key prefixes it needs:
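One way to do this is with etcd's own role and user commands. The sketch below is illustrative only: the function names, the role and user names passed in, and the choice of the /registry/pods/ prefix are all assumptions. Enabling etcd authentication on a cluster that backs Kubernetes can lock out the kube-apiserver unless its identity is also granted access, so try this on a test cluster first.

```shell
# Create a read-only etcd role limited to the Pod key prefix and bind a user to it.
# Role and user names are caller-supplied; etcdctl prompts for the new user's password.
create_readonly_pod_viewer() {
  local role="$1" user="$2"
  etcdctl role add "$role"
  etcdctl role grant-permission --prefix=true "$role" read /registry/pods/
  etcdctl user add "$user"
  etcdctl user grant-role "$user" "$role"
}

# Roles are only enforced once authentication is enabled,
# and etcd requires a root user to exist before that.
enable_etcd_auth() {
  etcdctl user add root
  etcdctl auth enable
}
```

For example, create_readonly_pod_viewer pod-viewer alice creates a role that can only read keys under /registry/pods/ and assigns it to the user alice.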
4. Network Policies
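Isolate etcd traffic using network policies. A NetworkPolicy only protects etcd when it runs as a Pod on the Pod network; when etcd uses host networking (as kubeadm static Pods do) or runs outside the cluster, policies are generally not enforced and host firewall rules are needed instead. For a Pod-based etcd: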
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: etcd-firewall
spec:
  podSelector:
    matchLabels:
      component: etcd
  ingress:
    - from:
        - podSelector:
            matchLabels:
              component: kube-apiserver
      ports:
        - protocol: TCP
          port: 2379 # etcd client port
etcd Maintenance
Regular maintenance ensures performance and reliability.
1. Backup and Restore
Backup:
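A snapshot can be taken with etcdctl snapshot save. The sketch below wraps it in a hypothetical backup_etcd function; the endpoint and certificate paths are assumptions based on kubeadm defaults and will differ on other installations.

```shell
# Endpoint and certificate locations are assumptions (kubeadm defaults);
# override them through the environment for your cluster.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_PKI_DIR="${ETCD_PKI_DIR:-/etc/kubernetes/pki/etcd}"

# Save a point-in-time snapshot of etcd to the given file.
backup_etcd() {
  local snapshot_file="$1"
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_PKI_DIR/ca.crt" \
    --cert="$ETCD_PKI_DIR/server.crt" \
    --key="$ETCD_PKI_DIR/server.key" \
    snapshot save "$snapshot_file"
}
```

For example, backup_etcd "/var/backups/etcd-$(date +%F).db" writes a dated snapshot. Copy snapshots off the node, since a backup on the failed disk is of little use.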
Restore:
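A snapshot is restored into a fresh data directory, which etcd is then started against. The following is a minimal sketch with a hypothetical restore_etcd function; both paths are supplied by the caller. Newer etcd releases move this subcommand to the etcdutl tool, so check which one your version expects.

```shell
# Restore a snapshot file into a new, empty data directory.
# Refuses to run if the snapshot file does not exist.
restore_etcd() {
  local snapshot_file="$1" data_dir="$2"
  if [ ! -f "$snapshot_file" ]; then
    echo "snapshot not found: $snapshot_file" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot_file" --data-dir="$data_dir"
}
```

For example, restore_etcd /var/backups/etcd-snapshot.db /var/lib/etcd-restored. Stop etcd before restoring, then point its data directory setting at the restored directory before starting it again.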
2. Defragmentation
Fragmented databases degrade performance. Defragment periodically:
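Defragmentation is done with etcdctl defrag. Because a member is unavailable for requests while it defragments, a cautious approach is to process one member at a time, as in this sketch. The function name defragment_etcd is hypothetical, and TLS settings are assumed to come from the ETCDCTL_CACERT, ETCDCTL_CERT, and ETCDCTL_KEY environment variables.

```shell
# Defragment etcd members one at a time, stopping at the first failure.
# Each argument is the client URL of a single member.
defragment_etcd() {
  local endpoint
  for endpoint in "$@"; do
    ETCDCTL_API=3 etcdctl --endpoints="$endpoint" defrag || return 1
  done
}
```

For example, defragment_etcd https://10.0.0.1:2379 https://10.0.0.2:2379 (placeholder addresses) handles two members in sequence.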
3. Monitor Health
Track key metrics:
- Leader changes: Frequent changes indicate instability.
- Database size: etcd's default storage quota is 2 GB (configurable with --quota-backend-bytes); 8 GB is the suggested maximum.
- Request latency: Should stay below 100 ms.
Use Prometheus and Grafana for dashboards:
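Before building dashboards, it can help to look at the raw values etcd exposes on its metrics endpoint. This sketch filters Prometheus-format text down to a few metrics related to the list above: leader changes, database size, and WAL fsync duration as a disk latency signal. The function name filter_etcd_metrics is hypothetical, and the metrics URL in the usage note is an assumption.

```shell
# Read Prometheus-format metrics on stdin and print only the
# leader-change, database-size, and WAL fsync summary series.
filter_etcd_metrics() {
  grep -E '^(etcd_server_leader_changes_seen_total|etcd_mvcc_db_total_size_in_bytes|etcd_disk_wal_fsync_duration_seconds_(sum|count))'
}
```

For example, curl -s http://127.0.0.1:2381/metrics | filter_etcd_metrics, assuming etcd serves metrics on that address; the same series can then be scraped by Prometheus and graphed in Grafana.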
4. Upgrade etcd
Follow the Kubernetes version skew policy. Always test upgrades in staging.
Example: Disaster Recovery
Scenario: etcd cluster crashes due to disk failure.
- Restore from Snapshot:
- Verify Cluster Health:
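Once the snapshot has been restored and etcd restarted, the verification step can be sketched as follows. The function name verify_etcd_health is hypothetical, and TLS settings are assumed to come from the ETCDCTL_CACERT, ETCDCTL_CERT, and ETCDCTL_KEY environment variables.

```shell
# Check that the given endpoints respond as healthy, then list the members.
# Returns non-zero without listing members if the health check fails.
verify_etcd_health() {
  local endpoints="$1"
  ETCDCTL_API=3 etcdctl --endpoints="$endpoints" endpoint health || return 1
  ETCDCTL_API=3 etcdctl --endpoints="$endpoints" member list --write-out=table
}
```

For example, verify_etcd_health https://127.0.0.1:2379. After etcd reports healthy, confirm that the kube-apiserver can serve requests again, for instance by listing nodes with kubectl.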
Further Reading: