What is Horizontal Pod Autoscaler?
Category: Kubernetes
Short answer
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a Deployment, StatefulSet, or other scalable workload based on observed metrics such as CPU utilization, memory usage, or custom metrics.
How it works
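The HPA controller periodically (every 15 seconds by default) reads resource usage from the metrics API, which the metrics server provides. It then computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). If usage exceeds the target, it increases replicas. If usage falls below the target, it decreases replicas. The result always stays between minReplicas and maxReplicas, and no change is made while the ratio of current to target is within a small tolerance of 1 (10% by default). For example, 4 replicas averaging 90% CPU against a 60% target give ceil(4 × 90 / 60) = 6 replicas.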
Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
Why it matters
HPA enables elastic scaling, helping applications absorb load spikes without manual intervention. It reduces costs by scaling down during quiet periods and improves reliability by adding replicas before existing Pods run out of CPU or memory.
Tips
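- Always set resource requests on every container in the target Pods. HPA computes utilization as a percentage of the requested amount and cannot calculate it when requests are missing.
- For application-specific scaling, expose custom metrics from Prometheus through a metrics adapter (for example, Prometheus Adapter) so that HPA can read them from the custom metrics API.
- Pair HPA with the Cluster Autoscaler so that nodes are added when new replicas cannot be scheduled on the existing ones.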
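The first two tips can be sketched as manifests. This is a minimal illustration, not a production configuration: the image, the labels, the request sizes, the HPA name web-hpa-custom, and the metric name http_requests_per_second are all assumptions, and the Pods metric only works if a metrics adapter actually exposes a metric with that name.

```yaml
# Deployment "web" with explicit requests, so HPA has a baseline
# for computing CPU utilization as a percentage of the request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  # spec.replicas is left unset so that re-applying this manifest
  # does not override the replica count chosen by the HPA.
  selector:
    matchLabels:
      app: web              # assumed label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27   # placeholder image
        resources:
          requests:
            cpu: 250m       # assumed size; 60% utilization = 150m
            memory: 256Mi   # assumed size
---
# HPA scaling on a per-Pod custom metric instead of CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric
      target:
        type: AverageValue
        averageValue: "100"              # assumed target per Pod
```

With a 250m CPU request, an averageUtilization target of 60 corresponds to about 150m of CPU per Pod, which is why the request size directly shapes when scaling starts.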
Common issues
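- HPA requires the metrics server (or another provider of the resource metrics API) to be installed before it can scale on CPU or memory. Without it, the HPA reports its targets as unknown and does not scale.
- Rapid scaling can cause replica counts to flap and can trigger thundering herd effects, for example when many new Pods hit a database or cache at the same time. The behavior field of the HPA spec (stabilization windows and scaling policies) can slow this down.
- Scaling based on memory alone can be unreliable for garbage-collected languages, because the runtime may hold on to heap memory after load drops, so usage may never fall far enough to trigger a scale-down.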
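One way to dampen the rapid-scaling issue is the behavior section of the autoscaling/v2 API. The sketch below extends the web-hpa example from earlier; the specific windows and policy values are illustrative assumptions and should be tuned against real traffic.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleUp:
      # React to load immediately, but at most double the
      # replica count in any 60-second period.
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      # Act on the highest recommendation from the last 5 minutes,
      # then remove at most 2 Pods per minute.
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
```

Scaling up quickly and scaling down slowly is a common choice here: it keeps capacity available during bursty traffic while avoiding repeated add-and-remove cycles.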