What is the Horizontal Pod Autoscaler?

· Category: Kubernetes

Short answer

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a scalable workload, such as a Deployment or StatefulSet, based on observed metrics such as CPU utilization, memory usage, or custom metrics.

How it works

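The HPA controller periodically (every 15 seconds by default) reads resource usage from the Metrics API, which is usually served by metrics-server. It compares the current average value with the target and scales proportionally: desiredReplicas = ceil(currentReplicas × currentValue / targetValue). If usage is above the target, it adds replicas; if usage is below the target, it removes replicas. The result always stays between minReplicas and maxReplicas.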
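As a rough illustration of the scaling decision described above, here is a minimal Python sketch of the replica calculation. The function name and parameters are hypothetical and chosen for this example. The 0.1 tolerance mirrors the controller's default. The real controller also handles missing metrics, Pods that are not yet ready, and stabilization, all of which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of the HPA replica calculation (not the real controller)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_value / target_value)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU with a 60% target, bounds 2..20
print(desired_replicas(4, 90, 60, 2, 20))  # 6
```

With the same bounds, a reading of 62% stays at 4 replicas because it falls within the tolerance, and a reading of 15% yields 2 replicas because the computed value of 1 is raised to minReplicas.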

Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
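  minReplicas: 2    # never scale below 2 replicas
  maxReplicas: 20   # never scale above 20 replicas
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # target average CPU usage, as a percentage of each Pod's CPU request
        averageUtilization: 60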

Why it matters

HPA enables elastic scaling, helping applications absorb load spikes without manual intervention. It can reduce costs by scaling down during quiet periods and improves reliability by lowering the risk of resource exhaustion.

Tips

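  • Always set resource requests on every container in the target Pods. Utilization targets are a percentage of the request, so HPA cannot calculate utilization without them.
  • Use custom metrics from Prometheus for application-specific scaling. This requires an adapter that exposes them through the custom metrics API, such as Prometheus Adapter.
  • Pair HPA with the Cluster Autoscaler so that nodes are added when new Pods no longer fit on the existing ones.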
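To show why the first tip matters, the following Python sketch computes CPU utilization as a percentage of requested CPU. The function name and inputs are hypothetical. It uses total usage divided by total requests, which equals the per-Pod average when all Pods have the same request. Treat it as an approximation of the idea rather than the controller's exact code.

```python
def average_utilization(usage_millicores, request_millicores):
    """Return average CPU utilization as a percentage of requested CPU.

    Both arguments are lists with one entry per Pod, in millicores.
    A request of None or 0 models a Pod without a CPU request.
    """
    if any(not r for r in request_millicores):
        # Without a request there is no baseline to measure against.
        raise ValueError("every Pod needs a CPU request to compute utilization")
    return 100 * sum(usage_millicores) / sum(request_millicores)


# Three Pods, each requesting 500m and using 300m, 450m, and 150m
print(average_utilization([300, 450, 150], [500, 500, 500]))  # 60.0
```

The ValueError branch models the case where requests are missing: with no request to measure against, a utilization target cannot be evaluated.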

Common issues

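  • HPA requires metrics-server (or another Metrics API provider) to be installed for CPU and memory metrics. Without it, targets show as <unknown> and no scaling occurs.
  • Rapid scaling can cause thundering herd effects, with many new Pods starting at once and hitting shared dependencies such as databases or caches.
  • Scaling based on memory alone can be unreliable for garbage-collected languages, because the runtime may hold on to heap after load drops, so replicas may never scale back down.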
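One mitigation for the rapid scaling issue above is a stabilization window. By default the controller looks back over the last 300 seconds of recommendations when scaling down and applies the highest one. The Python sketch below illustrates that idea only. The function name and the (timestamp, replicas) history format are assumptions made for this example.

```python
def stabilized_replicas(recommendations, now, window_seconds=300):
    """Pick the replica count to apply when scaling down.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs,
    assumed to include the current recommendation.
    Returns the highest recommendation within the window, so a brief dip
    in load does not remove Pods immediately.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


history = [(0, 10), (120, 4), (240, 3), (330, 3)]
print(stabilized_replicas(history, now=330))  # 4
```

Here the recommendation of 10 is older than the window and is ignored, while the recommendation of 4 is still inside it, so the workload holds at 4 replicas rather than dropping straight to 3.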