Part 8: Scheduling & Autoscaling (requests/limits, affinity, taints, HPA/VPA/CA)

Kubernetes Toàn Tập series, 13 parts:
Part 1: Overview and architecture
Part 2: Cluster installation
Part 3: Workloads
Part 4: Networking
Part 5: Storage
Part 6: Configuration
Part 7: Security
Part 8: Scheduling & Autoscaling ← you are here
Part 9: Helm, Operator, CRD
Part 10: Production
Part 11: Service Mesh: Istio, Linkerd, Cilium
Part 12: Multi-tenancy & Multi-cluster
Part 13: Cluster API, FinOps, AI/ML

Introduction

Kubernetes does not know what your app needs; you have to declare it. This part collects the mechanisms you will tune most often once an app reaches production:

Resources: requests/limits, QoS classes.
Scheduler hint: nodeSelector, affinity, anti-affinity, taints/tolerations, topology spread.
Autoscaling: HPA, VPA, Cluster Autoscaler, Karpenter, KEDA.
Resource governance: ResourceQuota, LimitRange, PriorityClass, PodDisruptionBudget.

1. Resources: requests and limits

Every container should declare:

resources:
  requests:
    cpu: 200m         # 0.2 CPU core
    memory: 256Mi
  limits:
    cpu: 1            # 1 core
    memory: 1Gi

1.1 Meaning

requests: the amount of resources the scheduler reserves for the Pod on a node. Used for bin packing.
limits: the ceiling. Exceeding the CPU limit gets the container throttled; exceeding the memory limit gets it OOMKilled.

1.2 Units

CPU: 1 = 1 core, 500m = 0.5 core (milli-cores). The scheduler works in milli-cores.
Memory: Mi = 1024 KiB, Gi = 1024 MiB. M and G are SI units (powers of 1000). Prefer Mi/Gi.
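
To make the difference between the binary and SI suffixes concrete, here is a minimal Python sketch that converts quantity strings into milli-cores and bytes. The helper names cpu_to_millicores and memory_to_bytes are hypothetical, and the parser only covers the suffixes mentioned in this section, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary (power-of-two) and SI (power-of-ten) suffixes covered by this sketch.
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '200m', '1' or '0.5' to milli-cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' or '256M' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # plain number of bytes


print(cpu_to_millicores("200m"), cpu_to_millicores("1"))
print(memory_to_bytes("256Mi") - memory_to_bytes("256M"))  # gap between Mi and M
```

With these definitions, 256Mi is about 12 MB larger than 256M, which is why mixing the two suffixes in manifests leads to surprises.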

1.3 QoS classes

Kubernetes assigns every Pod to one of three classes based on its requests/limits:

Class	Conditions	Behavior
Guaranteed	Every container has request = limit for both CPU and memory	Evicted last when the node runs short of memory
Burstable	At least one container has a request or limit, but the Pod does not meet the Guaranteed criteria	Evicted in between
BestEffort	No container declares any request or limit	Evicted first

Production: give critical workloads Guaranteed. Workloads that can tolerate eviction (workers, batch) can run as Burstable.
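
The classification rules in the table can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the kubelet's code: the function name qos_class and the dict layout are hypothetical, and quantities are compared as written (so "1" and "1000m" would count as different).

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' cpu/memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit, if one is given.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod = [{"requests": {"cpu": "200m", "memory": "256Mi"},
        "limits": {"cpu": "1", "memory": "1Gi"}}]
print(qos_class(pod))
```

The example Pod uses the manifest from section 1, where limits are higher than requests, so it does not qualify as Guaranteed.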

1.4 Should you set a CPU limit?

A classic debate. In favor: capacity is predictable and a rogue app cannot eat the whole node. Against: CPU throttling is very hard to debug, and latency can spike even while average CPU usage looks low. Many practitioners today recommend:

Set the CPU request to the real baseline.
Do not set a CPU limit unless there is a clear reason (multi-tenant, regulated environments).
Always set a memory limit: an OOMKill of one container is at least predictable, whereas a node that runs out of memory can take every Pod on it down.
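
Why can latency spike while usage looks low? A rough Python sketch of the arithmetic, assuming the CPU limit is enforced through the Linux CFS quota with the default 100 ms period and that all busy threads run flat out. The function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cfs_quota_us(cpu_limit_millicores: int) -> int:
    """CPU time (in microseconds) the container may use per period."""
    return cpu_limit_millicores * CFS_PERIOD_US // 1000


def throttled_ms_per_period(cpu_limit_millicores: int, busy_threads: int) -> float:
    """Wall-clock time per period during which the container is frozen."""
    quota_ms = cfs_quota_us(cpu_limit_millicores) / 1000
    period_ms = CFS_PERIOD_US / 1000
    # The quota is shared by all threads, so N busy threads spend it N times faster.
    runtime_ms = min(period_ms, quota_ms / busy_threads)
    return period_ms - runtime_ms


print(throttled_ms_per_period(1000, 1))  # one thread under a 1-core limit
print(throttled_ms_per_period(1000, 4))  # four threads under the same limit
```

Under these assumptions, a 1-core limit never throttles a single busy thread, but four busy threads spend the quota in 25 ms and then wait for the remaining 75 ms of each period. A request that arrives during that window sees the extra latency even though average usage is only one core.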

2. Scheduler: deciding which node a pod runs on

2.1 Process

Filtering: drop nodes that lack resources, do not match nodeSelector, or carry a taint the pod does not tolerate.
Scoring: score the remaining nodes by least requested, balanced allocation, image locality, topology spread, …
The pod is bound to the highest-scoring node.
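
The filter-then-score flow can be sketched as a toy Python function. This is a deliberately reduced model, not the real scheduler: it filters on free CPU and nodeSelector only, and scores with a single "most free CPU left" rule. The field names (free_cpu_m, cpu_m, node_selector) are hypothetical.

```python
from typing import Optional


def schedule(pod: dict, nodes: list[dict]) -> Optional[str]:
    """Return the name of the chosen node, or None if the pod stays Pending."""
    selector = pod.get("node_selector", {})
    # Filtering: keep nodes with enough free CPU and matching labels.
    feasible = [
        node for node in nodes
        if node["free_cpu_m"] >= pod["cpu_m"]
        and all(node["labels"].get(k) == v for k, v in selector.items())
    ]
    if not feasible:
        return None
    # Scoring: prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda node: node["free_cpu_m"] - pod["cpu_m"])
    return best["name"]


nodes = [
    {"name": "n1", "free_cpu_m": 500, "labels": {"disktype": "ssd"}},
    {"name": "n2", "free_cpu_m": 2000, "labels": {"disktype": "hdd"}},
    {"name": "n3", "free_cpu_m": 1500, "labels": {"disktype": "ssd"}},
]
print(schedule({"cpu_m": 1000, "node_selector": {"disktype": "ssd"}}, nodes))
```

Here n1 is filtered out for lack of CPU and n2 for its label, so the pod lands on n3.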

2.2 nodeSelector: the simplest option

spec:
  nodeSelector:
    disktype: ssd

The Pod is only scheduled onto nodes labeled disktype=ssd. This is a hard requirement.

2.3 Node Affinity: flexible expressions

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a", "us-east-1b"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node-pool
            operator: In
            values: ["spot"]

required...: hard; the pod is not scheduled if the rule cannot be satisfied.
preferred...: soft; honored when possible.
IgnoredDuringExecution: only applies at scheduling time; a running pod is not evicted when node labels change.

2.4 Pod Affinity / Anti-Affinity

# Anti-affinity: spread pods across different nodes
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: api
        topologyKey: kubernetes.io/hostname

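Two pods labeled app=api may not share a hostname. Using topology.kubernetes.io/zone as the topologyKey works the same way, but as a hard rule it allows at most one such pod per zone; for an even spread across AZs with more replicas than zones, use the soft (preferred) form or topology spread constraints (2.6).

2.5 Taints and Tolerations

The logic is inverted: a node repels pods that lack a matching toleration. Use it to reserve nodes for special workloads.

# On the node
kubectl taint nodes gpu-1 gpu=true:NoSchedule

# On the pod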
spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule

Three effects: NoSchedule, PreferNoSchedule, NoExecute (evicts running pods that do not tolerate the taint).

Kubernetes applies some taints automatically: node not ready, network unavailable, disk pressure, control-plane. This is why you see 0/3 nodes are available: 3 node(s) had untolerated taint.
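
The matching between taints and tolerations can be sketched in Python as follows. It is a simplified model of the filtering step: PreferNoSchedule is treated as non-blocking, tolerationSeconds is ignored, and the function names are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if this toleration matches this taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates everything; otherwise keys must match.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(tolerations: list[dict], taints: list[dict]) -> bool:
    """A node is feasible if every hard taint is matched by some toleration."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
gpu_toleration = {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
print(can_schedule([gpu_toleration], [gpu_taint]))  # pod with the toleration
print(can_schedule([], [gpu_taint]))                # pod without it
```

The example mirrors the gpu=true:NoSchedule taint above: only the pod carrying the matching toleration passes the filter.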

2.6 Topology Spread Constraints

Spread pods evenly across zones/hosts:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway     # or DoNotSchedule
    labelSelector:
      matchLabels:
        app: api

Spreads 6 replicas as 2/2/2 instead of 6/0/0. With ScheduleAnyway this is only a preference; use DoNotSchedule to enforce it. Important for cross-AZ HA.
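
How maxSkew constrains placement can be sketched in Python. The sketch assumes the DoNotSchedule behavior and that every zone is an eligible domain; skew is the pod count in a zone minus the lowest count across zones. The function name allowed_zones is hypothetical.

```python
def allowed_zones(counts: dict[str, int], max_skew: int = 1) -> list[str]:
    """Zones where one more matching pod keeps the skew within max_skew."""
    lowest = min(counts.values())
    return [zone for zone, n in counts.items() if n + 1 - lowest <= max_skew]


# Place 6 replicas one by one, always taking the first allowed zone.
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(6):
    zone = allowed_zones(counts)[0]
    counts[zone] += 1
print(counts)
```

With maxSkew: 1, a zone that is already one pod ahead of the emptiest zone is excluded, so the replicas end up evenly distributed.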

2.7 PodDisruptionBudget (PDB)

When a node is drained (e.g. during an upgrade), Kubernetes avoids evicting too many pods of the same app at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

The API server rejects an eviction if it would drop the number of available pods below 2.
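
The check behind that refusal is simple arithmetic, sketched here in Python for the minAvailable form only. The function names are hypothetical, and percentages and maxUnavailable are left out.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """How many pods may be evicted right now without breaking the budget."""
    return max(0, current_healthy - min_available)


def eviction_allowed(current_healthy: int, min_available: int) -> bool:
    """True if evicting one more pod keeps at least min_available healthy."""
    return disruptions_allowed(current_healthy, min_available) > 0


print(eviction_allowed(current_healthy=3, min_available=2))  # one pod to spare
print(eviction_allowed(current_healthy=2, min_available=2))  # budget exhausted
```

With 3 healthy replicas and minAvailable: 2, a drain can evict one pod and must then wait until a replacement is healthy before evicting the next.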

3. PriorityClass and Preemption

Not all workloads are equal. A higher-priority pod can preempt lower-priority pods when the cluster is out of room.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Mission-critical workloads"
---
# Pod spec fragment that references the class
spec:
  priorityClassName: critical

Kubernetes ships two built-in classes, system-cluster-critical (2000000000) and system-node-critical (2000001000), reserved for system components.

4. ResourceQuota and LimitRange

4.1 ResourceQuota: total limits for a namespace

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "100"
    persistentvolumeclaims: "20"
    requests.storage: 100Gi
    services.loadbalancers: "2"

A team cannot exceed the declared totals. Once a quota covers CPU or memory requests/limits, every new pod in the namespace must declare those values or it is rejected; a LimitRange can supply the defaults.
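
A rough Python sketch of the admission decision, assuming all quantities are already converted to plain integers (milli-cores, MiB). The function name admits and the dict layout are hypothetical; the real quota controller tracks many more resource types.

```python
def admits(used: dict[str, int], hard: dict[str, int], new_pod: dict[str, int]) -> bool:
    """True if the pod fits within the remaining quota of the namespace."""
    for resource, limit in hard.items():
        if resource not in new_pod:
            # The quota tracks this resource, so the pod must declare it.
            return False
        if used.get(resource, 0) + new_pod[resource] > limit:
            return False
    return True


hard = {"requests.cpu": 10_000, "requests.memory": 20 * 1024}  # 10 cores, 20Gi in MiB
used = {"requests.cpu": 9_500, "requests.memory": 1024}
print(admits(used, hard, {"requests.cpu": 500, "requests.memory": 256}))
print(admits(used, hard, {"requests.cpu": 600, "requests.memory": 256}))
```

The second pod would push requests.cpu to 10.1 cores, so it is rejected even though the nodes themselves may still have room.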

4.2 LimitRange: defaults and min/max per object

apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
  namespace: team-a
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 10m
      memory: 16Mi

Containers that declare no resources get the defaults injected. Pods whose containers exceed max are rejected.

5. Horizontal Pod Autoscaler (HPA)

HPA scales the replica count of a Deployment/StatefulSet based on metrics. Requirement for CPU/memory metrics: metrics-server.

5.1 HPA on CPU (basic)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60     # % of requests.cpu
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max

Note: HPA computes the percentage against requests.cpu, not against the node's actual CPU. If the request is set too low, the app will scale erratically.
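
The core of the HPA calculation is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (10% by default). A minimal Python sketch of that formula follows; it ignores the behavior policies, stabilization windows and handling of not-ready pods, and the function names are hypothetical.

```python
import math


def utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as HPA sees it: usage relative to requests.cpu."""
    return usage_millicores / request_millicores * 100


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the basic HPA formula, clamped to min/max."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: do nothing
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


# 3 replicas at 90% of their request, target 60%.
print(desired_replicas(3, 90, 60, min_replicas=3, max_replicas=20))
# Same real usage of 150m per pod, but the request is only 100m.
print(desired_replicas(3, utilization_percent(150, 100), 60, 3, 20))
```

The second call shows the effect of an undersized request: a pod using 150m against a 100m request reports 150% utilization, so HPA asks for 8 replicas where a realistic request would have asked for far fewer.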

5.2 HPA on custom metrics (Prometheus)

Install prometheus-adapter or KEDA:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

5.3 KEDA: autoscale from any source

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with dozens of scalers: Kafka lag, RabbitMQ queue depth, AWS SQS, Redis list, cron, …

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0      # KEDA supports scale to zero
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: worker
      topic: events
      lagThreshold: "100"

6. Vertical Pod Autoscaler (VPA)

VPA adjusts requests/limits based on observed usage (it does not scale replicas). It works well for:

Single-replica workloads (which you cannot scale horizontally).
Working out suitable requests for a freshly deployed app.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"    # "Off" / "Initial" / "Recreate" / "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

Caution: VPA in Auto mode evicts pods to apply new requests. Production setups often use Off or Initial to avoid unexpected restarts.

Do not run VPA and HPA on the same metric (CPU/memory); they will fight each other. You can use VPA for memory and HPA for CPU.

7. Cluster Autoscaler (CA)

When pods are stuck Pending because no node has room, Cluster Autoscaler adds nodes. When a node sits empty, CA removes it.

CA supports cloud providers (AWS ASG, GCP MIG, Azure VMSS) and some on-prem setups. Example flags on AWS:

--cluster-name=my-cluster
--cloud-provider=aws
--nodes=2:20:eks-node-group-spot-AbCdEf
--balance-similar-node-groups
--skip-nodes-with-system-pods=false

7.1 Karpenter (AWS-first, spreading)

Karpenter (open-sourced by AWS) replaces CA with:

Looks at what the pending pods need, then picks a suitable instance type (fast).
Supports many instance types, spot, and automatic consolidation.
No dependency on ASGs: it provisions instances by calling the EC2 API directly.

Karpenter uses NodePool/NodeClass instead of ASG node groups. For production workloads of significant size on AWS, Karpenter is the one to use today.

8. Combining HPA + CA + VPA

traffic ↑
  └ HPA adds replicas
      └ pods go Pending because there is no room
          └ CA / Karpenter adds nodes
              └ pods get scheduled, traffic is served

traffic ↓
  └ HPA removes replicas
      └ nodes become underutilized
          └ CA / Karpenter consolidates and removes nodes

VPA runs in the background, tuning requests for single-replica pods or only advising (recommendation mode).

9. Topology-aware routing

By default a Service routes to every endpoint, and cross-AZ traffic costs money. Enable topology-aware routing:

metadata:
  annotations:
    service.kubernetes.io/topology-mode: Auto

kube-proxy/CNI will then prefer endpoints in the same AZ as the client, which reduces cost and latency.

10. Best practices

Every container must have resources.requests. Set a memory limit. Usually skip the CPU limit (unless you need it).
Critical prod workloads: Guaranteed QoS, a PDB with minAvailable, anti-affinity or topologySpread across AZs.
CPU-based HPA is fine for web/API. Async workers should use KEDA on queue depth.
Use VPA in recommendation mode to tune requests; avoid VPA Auto for stateful workloads.
Cluster Autoscaler or Karpenter lets the cluster breathe with demand. Allow diverse instance types plus spot.
One ResourceQuota per team namespace. LimitRange defaults keep pods that declare nothing valid.
PriorityClass for ingress-controller, monitoring, DNS, so a rogue app cannot evict them.
Watch throttling: container_cpu_cfs_throttled_seconds_total in Prometheus is an important KPI.

11. Debug

# Why is the pod Pending?
kubectl describe pod <name>
# 'FailedScheduling: 0/5 nodes available: 3 Insufficient cpu, 2 node(s) didn't match Pod's affinity'

# Top utilization
kubectl top pods -A --sort-by=cpu
kubectl top nodes

# HPA status
kubectl get hpa
kubectl describe hpa api

# Autoscaling events
kubectl get events --field-selector source=cluster-autoscaler

# Why was the pod evicted?
kubectl get pod -o wide --show-labels
kubectl get events --field-selector reason=Evicted

12. Summary

Declare requests/limits correctly → the scheduler places pods correctly → the right QoS → stability.
nodeSelector, affinity, taints, and topology spread control placement.
HPA scales replicas on metrics, KEDA handles event-driven scaling, VPA tunes requests.
Cluster Autoscaler / Karpenter scale nodes, so the cluster grows and shrinks on its own.
ResourceQuota, LimitRange, PriorityClass, and PDB are the cluster's governance tooling.

Part 9: packaging apps with Helm, and extending Kubernetes through CRDs and the Operator pattern.
